document Kerberos issues

2022-10-04 17:28:30 +02:00
parent 183ccf294f
commit b5bb0e05ea


@@ -2,8 +2,8 @@
This document describes the Kerberos issues we encountered during RHEL 8 introduction.
In RHEL we are using the `KEYRING` (kernel keyring) cache,
whereas for RHEL8 there came early the wish to use `KCM` (Kerberos Cache Manager) instead.
In RHEL 7 we are using the `KEYRING` (kernel keyring) cache,
whereas for RHEL 8 there came early the wish to use `KCM` (Kerberos Cache Manager) instead.
The Kerberos documentation contains a [reference for all available cache types](https://web.mit.edu/kerberos/www/krb5-latest/doc/basic/ccache_def.html).
@@ -13,10 +13,11 @@ The Kerberos documentation contains a [reference for all available cache types](
- ssh ticket delegation (with `GSSAPIDelegateCredentials yes`)
- AFS authentication (`aklog`)
- AFS administrative operations where the user switches to a separate admin principal (e.g. `buchel_k-adm`)
- Website authentication (`SPNEGO` with Firefox, Chrome)
- local desktop: get new TGT on login
- local desktop: ticket renewal after reauthentication on lock screen
- remote desktop with NoMachine NX: get new TGT on login
- remote desktop with NoMachine NX: ticket renewal after reconnection
- Website authentication (`SPNEGO` with Firefox, Chrome)
## KCM
@@ -25,7 +26,7 @@ The `KCM` cache is provided by a dedicated daemon, for RHEL8 this is `sssd_kcm`
### Advantages of KCM
The advantage of `KCM` is that the caches are permanent and survive daemon restarts and system reboots without the need to fiddle around with files and file permissions. This simplifies daemon and container use cases.
It also does automatically renew tickets which is handy for every use case.
It also automatically renews tickets which is handy for every use case.
### User Based vs Session Based
@@ -37,7 +38,7 @@ Problems I see with this are
- user may change his principal, e.g. for admin operations (`kinit buchel_k-adm`), which is then used by all sessions
- user may destroy the cache (it is good security practice to have a `kdestroy` in `.bash_logout` to ensure nobody on the machine can use your tokens after logging out)
- software may put tokens into the cache which suddenly are not there any more
- the magic/heuristic used to select might not work optimally for all use cases (as we see below `sshd-kcm` fails horribly..)
- the magic/heuristic used to select might not work optimally for all use cases (as we see below `sssd-kcm` fails horribly...)
So if we have more than one session on a machine (e.g. people connecting via remote desktop and ssh at the same time), the cross-session side-effects can cause unexpected behaviour.
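As an illustration of the `kdestroy` in `.bash_logout` practice mentioned in the list above, here is a minimal sketch. The helper name `destroy_default_cache` is made up for this example, and the snippet assumes the MIT Kerberos `kdestroy` client is what is installed.

```shell
# Sketch for ~/.bash_logout; destroy_default_cache is a hypothetical helper name.
destroy_default_cache() {
    # Do nothing on machines without the Kerberos client tools.
    command -v kdestroy >/dev/null 2>&1 || return 0
    # -q suppresses the beep kdestroy emits on failure; errors such as
    # "no credentials cache found" are ignored so that logout never fails.
    kdestroy -q >/dev/null 2>&1 || true
}

destroy_default_cache
```

Note that with a user based `KCM` cache this destroys the cache that all other sessions of the same user are using too, which is exactly the cross-session side-effect described above.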
@@ -49,9 +50,9 @@ A way to get `KCM` of of the business of selecting the "optimal" cache is to sel
### Problems of sssd_kcm
The most obvious and well [known problem](https://github.com/SSSD/sssd/issues/3593) of `sshd-kcm` is that does not remove expired tokens and credential caches. I agree that it should not have an impact as this is mostly cosmetic. But that is only the case when everything can cope with that...
The most obvious and [well-known problem](https://github.com/SSSD/sssd/issues/3593) of `sssd-kcm` is that it does not remove expired tokens and credential caches. I agree that it should not have an impact as this is mostly cosmetic. But that is only the case when everything can cope with that...
To check the Kerberos credential cache, you can use `klist` to look a the current default cache and `klist -l` to look at all available caches. Note that there the first listed cache is the default cache. Of course that is only valid when there is no `KRB5CCNAME` environment variable set or it is `KCM:`.
To check the Kerberos credential cache, you can use `klist` to look at the current default cache and `klist -l` to look at all available caches. Note that the first listed cache is the default cache. Of course that is only valid when the `KRB5CCNAME` environment variable is either unset or set to `KCM:`.
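The check described above could be scripted roughly as follows. This is only a sketch: the function name `kcm_default_applies` is made up for this example, and it assumes that a `KRB5CCNAME` which is set to anything other than `KCM:` pins a specific cache.

```shell
# Hypothetical helper: succeeds when KRB5CCNAME is unset or exactly "KCM:",
# i.e. when the first cache listed by `klist -l` is the default cache.
kcm_default_applies() {
    [ -z "${KRB5CCNAME+x}" ] || [ "${KRB5CCNAME}" = "KCM:" ]
}

if kcm_default_applies; then
    echo "first cache listed by 'klist -l' is the default cache"
else
    echo "KRB5CCNAME=${KRB5CCNAME} selects the cache explicitly"
fi

# Only call klist where the Kerberos client tools are installed.
if command -v klist >/dev/null 2>&1; then
    klist    || true   # current default cache
    klist -l || true   # all available caches
fi
```

Running this in each session (ssh, desktop, NoMachine NX) makes it easier to see which sessions rely on the `KCM` default and which ones got a cache name forced upon them.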
#### Use of Expired Credential Caches
In the example below you see that on ssh login I got a new default cache. But after a few minutes (there was a desktop login from my side and maybe an automatic AFS token renewal in between), I got an expired cache as the default cache.
@@ -157,7 +158,7 @@ Principal name Cache name
[buchel_k@mpc2959 ~]$
```
After I login via NoMachine NX and get an cache expired since more than two month:
After I logged in via NoMachine NX, I got a cache that had expired more than two months earlier:
```
[buchel_k@mpc2959 ~]$ klist -l
@@ -178,7 +179,7 @@ Do Sep 22 08:37:41 CEST 2022
```
Note that a non-expired cache is available, but NoMachine NX explicitly sets `KRB5CCNAME` to a specific KCM cache. And it contains a ticket/cache which is supposed to be gone.
So there is a security bug in `sssd-kcm`: it does not fully destroy tickets when being told so. And there is another security issue in the NoMachine NX -> `sssd-kcm` interaction. I assume that it talks with the `KCM` as root and gets somehow (or has saved somewhere) old caches and moves them over into user context. But the cache may not originally belong to the user...
So there is a security bug in `sssd-kcm`: it does not fully destroy tickets when being told so. And there is another security issue in the NoMachine NX -> `sssd-kcm` interaction. I assume that it talks with the `KCM` as root and gets somehow (or has saved somewhere) old caches and moves them over into user context. But the cache may originally not have belonged to the user...
I have not found a lot concerning Kerberos on the NoMachine website.
@@ -187,7 +188,7 @@ I have not found a lot concerning Kerberos on the NoMachine website.
Ideally we would get to a solution which can do the following:
- interactive user sessions are isolated and do not interfere with each other
- AFS can get hold of new tickets and inject them into the PAGs as long as the user somehow regular authenticates
- AFS can get hold of new tickets and inject them into the PAGs as long as the user somehow regularly authenticates
- `systemd --user` which is residing outside of the interactive user sessions is happy as well
- `goa-daemon` sees only one cache
@@ -210,7 +211,7 @@ I had a very short look at the `systemd` source code, but could not yet find the
#### At the Start of PAM
At some point I also made a test by setting `KRB5CCNAME` at the start of PAM to a fixed name of an existing cache, so that the TGT, etc. end up in a well known place.
That worked well, I also tested sucessfully that autheticating on the screen lock updates the TGT.
That worked well; I also tested successfully that authenticating on the screen lock updates the TGT.
Using a random, non-existing cache name resulted in a failure, not in the creation of that cache as it would happen if you do that with `kinit`.
So that self-made PAM module would need to be extended to also create the cache.
@@ -239,6 +240,8 @@ How to proceed here? Post this document and ask how to proceed?
### Other Options
- another self-made daemon to monitor/clean up `sssd-kcm`
Fill in your ideas.
## PS