This document describes the Kerberos issues we encountered during the RHEL 8 introduction.

In RHEL 7 we are using the `KEYRING` (kernel keyring) cache,
whereas for RHEL 8 the wish came up early to use `KCM` (Kerberos Cache Manager) instead,
which is also the [new default](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/considerations_in_adopting_rhel_8/identity-management_considerations-in-adopting-rhel-8#kcm-replace-keyring-default-cache_considerations-in-adopting-RHEL-8).

The Kerberos documentation contains a [reference for all available cache types](https://web.mit.edu/kerberos/www/krb5-latest/doc/basic/ccache_def.html).

[...]

- local desktop: ticket renewal after reauthentication on the lock screen
- remote desktop with NoMachine NX: get new TGT on login
- remote desktop with NoMachine NX: ticket renewal after reconnection
- website authentication (`SPNEGO` with Firefox, Chrome)
## `KCM`

The `KCM` cache is provided by a dedicated daemon. On RHEL 8 this is `sssd_kcm`, which was developed by Red Hat itself.

### Advantages of `KCM`

The advantage of `KCM` is that the caches are permanent and survive daemon restarts and system reboots without the need to fiddle around with files and file permissions. This simplifies daemon and container use cases.
It also renews tickets automatically, which is handy for every use case.

[...] Or even to recover when a new ticket becomes available again.

One way to take `KCM` out of the business of selecting the "optimal" cache is to select it yourself. You then give the session or software one specific cache by setting the `KRB5CCNAME` environment variable accordingly (e.g. `KCM:44951:66120`). Note that when it is set to `KCM:`, the default cache of `KCM` is used.
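
The following is a minimal sketch of this idea. It assumes Python 3.7+ and MIT `klist` on the `PATH`, and the helper names `env_with_ccache` and `klist_for` are made up for illustration. Pinning a program to one cache only means handing it a `KRB5CCNAME` in its environment:

```python
import os
import subprocess
from typing import Dict, Optional


def env_with_ccache(ccache: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with KRB5CCNAME pinned to one cache.

    `ccache` is a full cache name such as "KCM:44951:66120"; the value
    "KCM:" would hand the choice back to KCM's default cache.
    """
    env = dict(os.environ if base is None else base)  # copy, never mutate the original
    env["KRB5CCNAME"] = ccache
    return env


def klist_for(ccache: str) -> str:
    """Run `klist` against exactly this cache and return its output."""
    result = subprocess.run(
        ["klist"],
        env=env_with_ccache(ccache),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For example, `klist_for("KCM:44951:66120")` shows that cache regardless of which cache `KCM` currently considers the default.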
### Problems of `sssd_kcm`

To check the Kerberos credential cache, you can use `klist` to look at the current default cache and `klist -l` to look at all available caches. Note that the first listed cache is the default cache. Of course, this only holds when `KRB5CCNAME` is not set or is set to `KCM:`.
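
The `klist -l` listing can also be parsed, so this check can be done from a script rather than by eye. The following Python sketch assumes the MIT `klist -l` layout: a header, a dashed line, then one row per cache with an `(Expired)` marker after expired caches. `parse_klist_l` and `default_cache` are hypothetical helpers:

```python
from typing import List, NamedTuple, Optional


class CacheEntry(NamedTuple):
    principal: str
    cache: str
    expired: bool


def parse_klist_l(output: str) -> List[CacheEntry]:
    """Parse `klist -l` output into entries, keeping the listed order.

    Assumes the MIT layout: a "Principal name  Cache name" header, a line of
    dashes, then one "principal  cache [(Expired)]" row per cache.
    """
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Skip empty lines, the header line and the dashed separator line.
        if len(fields) < 2 or fields[0] == "Principal" or fields[0].startswith("-"):
            continue
        entries.append(CacheEntry(fields[0], fields[1], fields[-1] == "(Expired)"))
    return entries


def default_cache(entries: List[CacheEntry]) -> Optional[CacheEntry]:
    """The first listed cache is the default (if KRB5CCNAME is unset or "KCM:")."""
    return entries[0] if entries else None
```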
#### No Cleanup of Expired Caches

The most obvious and [well known problem](https://github.com/SSSD/sssd/issues/3593) of `sssd-kcm` is that it does not remove expired tickets and credential caches. I agree that this should have no impact, as it is mostly cosmetic. But that only holds as long as everything else can cope with it...

By default the number of caches is limited to 64 per user. When that limit was hit, it was no longer possible to authenticate on the lock screen:

```
Okt 05 14:57:11 lxdev01.psi.ch krb5_child[43689]: Internal credentials cache error
```

So this causes a denial-of-service problem that we need to deal with somehow, e.g. by regularly removing expired caches. Note also that these caches are persistent and are not removed on reboot.
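
A regular cleanup could look like the following Python sketch. It is only an illustration under assumptions: it runs as the affected user (e.g. from a timer), MIT `klist -l` and `kdestroy -c` are available, and the helper names are hypothetical. It removes every cache that `klist -l` marks as `(Expired)`:

```python
import subprocess
from typing import List


def expired_caches(klist_l_output: str) -> List[str]:
    """Return the names of all caches that `klist -l` marks as "(Expired)"."""
    names = []
    for line in klist_l_output.splitlines():
        fields = line.split()
        # Rows look like "principal  cache (Expired)"; header rows never match.
        if len(fields) == 3 and fields[2] == "(Expired)":
            names.append(fields[1])
    return names


def kdestroy_commands(klist_l_output: str) -> List[List[str]]:
    """Build one `kdestroy -c <cache>` command per expired cache."""
    return [["kdestroy", "-c", name] for name in expired_caches(klist_l_output)]


def cleanup_expired(dry_run: bool = True) -> List[List[str]]:
    """List the calling user's caches and destroy the expired ones.

    With dry_run=True nothing is destroyed; the commands are only returned.
    """
    listing = subprocess.run(
        ["klist", "-l"], capture_output=True, text=True, check=False
    ).stdout
    commands = kdestroy_commands(listing)
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=False)
    return commands
```

Keeping the parsing separate from the `subprocess` calls makes it easy to first look at `cleanup_expired()` in dry-run mode before letting it destroy anything.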
#### Use of Expired Credential Caches

In the example below you can see that I got a new default cache on the ssh login. But a few minutes later, an expired cache had become the default cache. In between there was a desktop login from my side and maybe an automatic AFS token renewal.

```
[...]
buchel_k@D.PSI.CH KCM:44951:85800 (Expired)
```

Note that the automatic AFS token renewal was created after we had experienced this issue.

#### Busy Loop of `goa-daemon`

If the [GNOME Online Accounts](https://wiki.gnome.org/Projects/GnomeOnlineAccounts) daemon encounters a number of Kerberos credential caches, it goes into a busy loop and causes `sssd-kcm` to consume 100% of one core. The corresponding bugs at [Red Hat](https://bugzilla.redhat.com/show_bug.cgi?id=1645624#c113) and [GNOME](https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/79) are happily ignored.
#### Zombie Caches by NoMachine NX

[...]

Ideally we would get to a solution which can do the following:

- [...]
- AFS can get hold of new tickets and inject them into the PAGs as long as the user somehow regularly authenticates
- `systemd --user`, which resides outside of the interactive user sessions, is happy as well
- `goa-daemon` sees only one cache
- expired caches are cleaned up somehow
### Only One Cache

By default `sssd-kcm` limits the number of caches per user to 64, but this can be changed to 1 with the `max_uid_ccaches` option.
There would then be only one cache, shared by all sessions, but at least `KCM` could not serve anything but the latest one.

I did not test this exact setting. However, some logins no longer work when the maximum number of caches is hit, as already documented above in the section "No Cleanup of Expired Caches".

### `renew-afstoken` Script/Daemon
For AFS we (Achim and I) made the script `renew-afstoken` which is started as pe[...]
Out of the available `KCM` caches it selects a suitable one to regularly get a new AFS token.
This now works very robustly and can also recover from expiration when a new ticket becomes available.
### Session Isolation with `KRB5CCNAME`

#### At the End of PAM

The idea is to set `KRB5CCNAME` to the very cache which has been created while going through the PAM stack.
A self-made PAM module does just this.

It works well for ssh sessions and might also work well for simple desktop sessions, but not for GNOME.
In GNOME the user programs are not started as children of the login screen and thus do not inherit its environment variables.
They are started by `systemd --user`, which sets `KRB5CCNAME` to `KCM:` instead of using the system default (which results in the same behaviour).
I had a very short look at the `systemd` source code, but could not yet find the place where `KRB5CCNAME` is set. And the RHEL 8 version of `systemd` has more than 800 patches compared with upstream... (OK, some might be backports...)
### Set Up Shared or Isolated Caches with `KRB5CCNAME` in Own PAM Module

#### At the Start of PAM

At some point I also tested setting `KRB5CCNAME` at the start of PAM to the fixed name of an existing cache, so that the TGT etc. end up in a well-known place.
That worked well. I also successfully tested that authenticating on the lock screen updates the TGT.
Using a random, non-existing cache name resulted in a failure, not in the creation of that cache as happens when you do the same with `kinit`.
So that self-made PAM module would need to be extended to also create the cache.
I assumed that the "End of PAM" solution would be easier to implement, so I opted for that.

A self-made PAM module `pam_single_kcm_cache.so` runs at session setup to set:

- `KRB5CCNAME=KCM:$UID:desktop` to use a shared credential cache for desktop sessions and `systemd --user`
- `KRB5CCNAME=KCM:$UID:$RANDOM_LETTERS` for text sessions to provide session isolation
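
The naming scheme can be illustrated with a small Python sketch. It is not the module's actual code. `ccache_name` is a hypothetical helper, and the ten-letter suffix is only an assumption modelled on the example name `KCM:44951:bzfandqspm`:

```python
import secrets
import string


def ccache_name(uid: int, desktop: bool, length: int = 10) -> str:
    """Build a KRB5CCNAME value following the naming scheme above.

    Desktop sessions (and `systemd --user`) share KCM:<uid>:desktop, while
    every text session gets KCM:<uid>:<random lowercase letters>.
    """
    if desktop:
        return f"KCM:{uid}:desktop"
    # Lowercase letters only, so these names never look like the numeric
    # KCM:<uid>:<number> caches that are created automatically.
    suffix = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
    return f"KCM:{uid}:{suffix}"
```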
#### Copy Credentials from the Automatically Generated Cache to the Self-Made Cache at Either "Start of PAM" or "End of PAM"

The latest version of my self-made PAM module can be used both in the auth (early) and in the session (late) part of the PAM stack. It sets `KRB5CCNAME` either to a randomly generated cache (e.g. `KCM:44951:bzfandqspm`) or to a given value, e.g. `suffix=desktop` creates `KCM:44951:desktop`.
The credential caches created automatically by `sshd` (ticket delegation) or by `gdm` (created in PAM by `sss.so`, I guess) end up in a new `KCM` cache with the pattern `KCM:$UID:$RANDOM_NUMBER`.
Their credentials are now copied into the newly created or already existing cache, and the former, automatically created cache is then destroyed.

To make ticket delegation work, the module copies the credentials from the default cache to the new cache.
However, there is no simple and bulletproof way to select the automatically created credential cache.
The default cache selected by `KCM` may or may not be it.
To work around this, the module iterates through all credential caches provided by `KCM`. It selects only those with the pattern `KCM:$UID` or `KCM:$UID:$RANDOM_NUMBER` which have a principal matching the username.
Of these, it selects the youngest one.
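
The selection heuristic can be sketched in Python as follows. The sketch assumes the cache names, principals and TGT start times have already been collected (e.g. from `klist -l` and `klist <cache>`). It interprets "youngest" as the most recent TGT start time, which is an assumption. `Cache` and `pick_source_cache` are hypothetical names:

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class Cache(NamedTuple):
    name: str            # e.g. "KCM:44951:85800"
    principal: str       # e.g. "buchel_k@D.PSI.CH"
    starttime: datetime  # start time of the TGT in this cache


def pick_source_cache(caches: Iterable[Cache], uid: int, user: str) -> Optional[Cache]:
    """Pick the youngest automatically created cache belonging to `user`.

    Only names like KCM:<uid> or KCM:<uid>:<digits> qualify, and the name
    part of the principal (before '@') must equal the login name.
    """
    pattern = re.compile(rf"KCM:{uid}(:\d+)?")
    candidates = [
        c for c in caches
        if pattern.fullmatch(c.name) and c.principal.split("@", 1)[0] == user
    ]
    # None if nothing qualifies, e.g. for an unknown authentication path.
    return max(candidates, key=lambda c: c.starttime, default=None)
```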

First experiments with ssh are successful. For the desktop I use the fixed cache `KCM:44951:desktop`. With this, `KRB5CCNAME` is set correctly, but in the current version the credentials are not yet available there.
Note that the reason for `systemd --user` to use the same credential cache as the desktop sessions is that at least GNOME uses it to start user programs like Evolution or Firefox.

TODOs:

- better selection of the source cache, as the default cache is not always ideal (I already have code for that in the "End of PAM"-only version)
- do not delete credentials in the target cache
- the "random" might not be necessary; at least for ssh it would be sufficient to fix it to the automatically generated cache
- or destroy the source cache?
- maybe the early part (auth, "Start of PAM") is not needed

Ideally, in the end only caches with the naming patterns `KCM:$UID:desktop` and `KCM:$UID:$RANDOM_LETTERS` exist.
If there are still some `KCM:$UID:$RANDOM_NUMBER` caches, they were not caught, e.g. because they come from a so far unknown authentication path.
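
To check whether any automatically created caches slipped through, the cache names (e.g. the second column of `klist -l`) could be sorted by these patterns. The following is a minimal Python sketch with a hypothetical `classify_caches` helper, assuming that random suffixes consist of lowercase letters only:

```python
import re
from typing import Dict, Iterable, List


def classify_caches(names: Iterable[str], uid: int) -> Dict[str, List[str]]:
    """Sort a user's KCM cache names by the naming patterns discussed above.

    "uncaught" collects KCM:<uid> and KCM:<uid>:<digits> caches, i.e.
    automatically created ones that were not taken over by the PAM module.
    """
    # Checked in this order: "desktop" would also match the "session" pattern.
    patterns = {
        "desktop": re.compile(rf"KCM:{uid}:desktop"),
        "session": re.compile(rf"KCM:{uid}:[a-z]+"),
        "uncaught": re.compile(rf"KCM:{uid}(:\d+)?"),
    }
    result: Dict[str, List[str]] = {kind: [] for kind in (*patterns, "other")}
    for name in names:
        for kind, pattern in patterns.items():
            if pattern.fullmatch(name):
                result[kind].append(name)
                break
        else:
            result["other"].append(name)
    return result
```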

Some more experiments are still required.
You can find the code in [GitLab](https://git.psi.ch/linux-infra/pam_single_kcm_cache), where there is currently an [open merge request for the initial commit](https://git.psi.ch/linux-infra/pam_single_kcm_cache/-/merge_requests/1). I plan to make it public on GitHub.
## Open Problems

- for NX and su I do not get a copy of the initial cache (or is there an initial cache?); this needs more investigation
- when unlocking the GNOME screen lock, the new TGT is put into the default `KCM` cache and not necessarily into `KCM:$UID:desktop`
- cleanup of caches, else we might end up in a DoS situation
- `pam_single_kcm_cache.so` could be extended to destroy the cache at the end of the session => not a good idea with AFS and long-running background calculations
- `pam_single_kcm_cache.so` could be extended to optionally destroy all caches at the end of the session => useful for `systemd --user`, because that ends after the last user process has ended and would then do a full cleanup. This would also ensure an empty `KCM` after a clean shutdown.
- alternatively, a `systemd --user` unit could do this, maybe also as a daemon that cleans up old expired caches
## Options for Next Steps

### Continue with `pam_single_kcm_cache.so`

I think this can give us a solution that takes `KCM` out of the business of selecting the best cache while still using the rest of its advantages.

### Try Out `KEYRING`

Maybe we can create a solution with `KEYRING` which isolates the interactive sessions and still allows the AFS token renewal to access all caches. This would then also require `renew-afstoken` to take care of Kerberos ticket renewal.

For the use cases listed above, the caches and tickets do not need to survive reboots. If something or someone needs `KCM` for some reason, it can be used specifically and privately and will not interfere with the rest of the system.
### How to Deal with `systemd --user`?

The `systemd --user` process is started at the beginning of the first session and ends at the end of the last session. And some desktop environments depend heavily on it.

So the ideal solution might be to have just one known "desktop" cache (e.g. `KCM:44951:desktop`) which is shared by `systemd --user` and all desktop sessions.

Additionally, I figured out that it is possible to inject environment variables into `systemd --user` with `systemctl --user import-environment` or `systemctl --user set-environment`, but this only affects newly started software.
Only experiments will show whether this is good enough or whether some important processes live longer than the desktop session.
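
The following Python sketch shows the `set-environment` variant. The helper name is hypothetical, and `KCM:44951:desktop` is just the example cache from above. It builds the command that points newly started `systemd --user` units at the shared desktop cache:

```python
import subprocess
from typing import List


def set_user_manager_ccache(ccache: str, dry_run: bool = True) -> List[str]:
    """Set KRB5CCNAME in the environment of the `systemd --user` manager.

    Only units started afterwards see the new value; processes that are
    already running keep their old environment.
    """
    command = ["systemctl", "--user", "set-environment", f"KRB5CCNAME={ccache}"]
    if not dry_run:
        subprocess.run(command, check=True)
    return command
```

Alternatively, `systemctl --user import-environment KRB5CCNAME` copies the value from the calling process, e.g. from a login script that already has `KRB5CCNAME` set.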
### Red Hat Ticket

I have a [ticket](https://access.redhat.com/support/cases/#/case/03280446) open with Red Hat on this case. In the first part I concentrated on the missing session isolation, but it turned out that this is the intended behaviour of a `KCM` setup.

[...] Then it is not that easy to reproduce as the problem is best seen in a long runn[...]

I posted a few strange-looking `klist` outputs and asked for an explanation, but that does not seem to have reached anyone with intimate `sssd-kcm` knowledge yet.

How to proceed here? Post this document and ask how to proceed?
I posted this document, but so far the response has not been very helpful.

### Other Options