update Kerberos state

2022-10-28 10:44:34 +02:00
parent e7eeac6a6b
commit 7986374fce


@@ -1,39 +1,74 @@
# Kerberos on RHEL 8
This document describes the Kerberos issues we encountered during RHEL 8 introduction.
This document describes the state of Kerberos on RHEL 8.
This includes the current open issues, a user guide and how we solved the KCM (Kerberos Cache Manager) issues.
In RHEL 7 we are using the `KEYRING` (kernel keyring) cache,
whereas for RHEL 8 the wish came up early to use `KCM` (Kerberos Cache Manager) instead,
which is also the [new default](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/considerations_in_adopting_rhel_8/identity-management_considerations-in-adopting-rhel-8#kcm-replace-keyring-default-cache_considerations-in-adopting-RHEL-8).
## Open Problems
- cleanup of caches, otherwise we might end up in a DoS situation. Preferably this is managed by a `systemd` unit.
- Kerberos with Firefox does not work yet.
## User Guide
### Manage Ticket for Admin User
If you need a TGT for your admin user (e.g. `buchel_k-adm`) for administrative operations, run
```
OLD_KRB5CCNAME=$KRB5CCNAME
export KRB5CCNAME=KCM:$(id -u):admin
kinit $(id -un)-adm
```
and when you are done, run
```
kdestroy
```
to delete your administrative tickets. You can now exit the shell or switch back to the previous Kerberos cache with
```
export KRB5CCNAME=$OLD_KRB5CCNAME
```
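The steps above can also be wrapped so that the switch back happens automatically. This is a minimal sketch, not part of any Kerberos tooling: `with_admin_cache` is a hypothetical helper name, and it assumes the same `KCM:<uid>:admin` cache name as the commands above.

```shell
# Sketch only: with_admin_cache is a hypothetical helper, not a Kerberos command.
# It runs the given command with KRB5CCNAME pointing at the dedicated admin cache.
# The function body is a subshell, so the caller's KRB5CCNAME is never modified.
# $1 = numeric UID, remaining arguments = command to run
with_admin_cache() (
    uid=$1
    shift
    KRB5CCNAME="KCM:${uid}:admin"
    export KRB5CCNAME
    "$@"
)
```

For example `with_admin_cache "$(id -u)" klist` would show only the admin cache, and after the call the shell still uses its previous cache, so no `OLD_KRB5CCNAME` bookkeeping is needed.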
### List all Credential Caches
```
KRB5CCNAME=KCM: klist -l
```
lists all caches and
```
KRB5CCNAME=KCM: klist -A
```
also lists the tickets therein.
The Kerberos documentation contains a [reference for all available cache types](https://web.mit.edu/kerberos/www/krb5-latest/doc/basic/ccache_def.html).
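For scripting it can be handy to get only the cache names out of the listing. The following is a small sketch under the assumption that `klist -l` prints two header lines followed by one `<principal> <cache name>` line per cache, optionally followed by `(Expired)`; `kcm_cache_names` is a hypothetical helper name.

```shell
# Hypothetical helper: read `klist -l` output on stdin and print one cache name per line.
# Assumes two header lines, then "<principal> <cache name> [(Expired)]" per cache.
kcm_cache_names() {
    awk 'NR > 2 && NF >= 2 { print $2 }'
}
```

It would be used as `KRB5CCNAME=KCM: klist -l | kcm_cache_names`.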
## Kerberos Use and Test Cases
- ssh authentication (authentication method `gssapi-with-mic`)
- ssh ticket delegation (with `GSSAPIDelegateCredentials yes`)
- ssh TGT (ticket granting ticket) delegation (with `GSSAPIDelegateCredentials yes`)
- AFS authentication (`aklog`)
- AFS administrative operations where the user switches to a separate admin principal (e.g. `buchel_k-adm`)
- local desktop: get new TGT on login
- local desktop: ticket renewal after reauthentication on lock screen
- local desktop: TGT renewal after reauthentication on lock screen
- remote desktop with NoMachine NX: get new TGT on login
- remote desktop with NoMachine NX: ticket renewal after reconnection
- remote desktop with NoMachine NX: TGT renewal after reconnection
- website authentication (`SPNEGO` with Firefox, Chrome)
## `KCM`
The `KCM` cache is provided by a dedicated daemon; on RHEL 8 this is `sssd_kcm`, which was developed by Red Hat itself.
In RHEL 7 we are using the `KEYRING` (kernel keyring) cache,
whereas for RHEL 8 the wish came up early to use KCM (Kerberos Cache Manager) instead,
which is also the [new default](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/considerations_in_adopting_rhel_8/identity-management_considerations-in-adopting-rhel-8#kcm-replace-keyring-default-cache_considerations-in-adopting-RHEL-8).
### Advantages of `KCM`
The Kerberos documentation contains a [reference for all available cache types](https://web.mit.edu/kerberos/www/krb5-latest/doc/basic/ccache_def.html).
The advantage of `KCM` is that the caches are permanent and survive daemon restarts and system reboots without the need to fiddle around with files and file permissions. This simplifies daemon and container use cases.
The KCM cache is provided by a dedicated daemon; on RHEL 8 this is `sssd_kcm`, which was developed by Red Hat itself.
### Advantages of KCM
The advantage of KCM is that the caches are permanent and survive daemon restarts and system reboots without the need to fiddle around with files and file permissions. This simplifies daemon and container use cases.
It also automatically renews tickets which is handy for every use case.
### User Based vs Session Based
Intuitively I would expect that something as delicate as authentication is managed per session (ssh, desktop, console login, ...).
Apparently with `KCM` this is not the case. It provides a default cache which is supposed to be the optimal one for you, and that can change at any time.
Apparently with KCM this is not the case. It provides a default cache which is supposed to be the optimal one for you, and that can change at any time.
Problems I see with this are
- a user may change their principal, e.g. for admin operations (`kinit buchel_k-adm`), which is then used by all sessions
@@ -46,7 +81,7 @@ So if we have more than one session on a machine (e.g. people connecting via rem
In contrast, for AFS token renewal having access to new tokens is helpful, as this allows prolonging the time a `PAG` (group of processes authenticated against AFS) keeps working as long as there is at least one valid ticket available.
It can even recover when a new ticket becomes available again.
A way to get `KCM` out of the business of selecting the "optimal" cache is to select it yourself and provide the session/software with one specific cache by setting the `KRB5CCNAME` environment variable accordingly (e.g. `KCM:44951:66120`). Note that when set to `KCM:` it will use the default cache from `KCM`.
A way to get KCM out of the business of selecting the "optimal" cache is to select it yourself and provide the session/software with one specific cache by setting the `KRB5CCNAME` environment variable accordingly (e.g. `KCM:44951:66120`). Note that when set to `KCM:` it will use as default cache the one KCM believes should be the default cache. And that can change for whatever reason.
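To check whether a session is actually pinned to one cache, a test like the following could be used. It is only a sketch: `is_pinned_kcm_cache` is a hypothetical helper, and it is an assumption of this sketch that only the three-part form `KCM:<uid>:<name>` from the example above counts as pinned.

```shell
# Hypothetical check: succeeds only if the value names one specific KCM cache
# in the form KCM:<uid>:<name>. Anything else (e.g. plain "KCM:") is treated
# as "KCM selects the cache itself". This classification is an assumption.
is_pinned_kcm_cache() {
    case $1 in
        KCM:[0-9]*:?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A login script could then run `is_pinned_kcm_cache "$KRB5CCNAME" || echo "warning: KCM chooses the cache"`.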
### Problems of `sssd_kcm`
@@ -159,7 +194,7 @@ buchel_k@D.PSI.CH KCM:44951:92516 (Expired)
Both show principals which I am very sure have not been added manually by the user. So somewhere there is a security issue, either in `sssd-kcm` or in NoMachine NX.
In another experiment I logged into a machine with `ssh` and did `kdestroy -A` which should destroy all caches:
```
[buchel_k@mpc2959 ~]$ kdestroy -A
[buchel_k@mpc2959 ~]$ klist -l
@@ -188,7 +223,7 @@ Do Sep 22 08:37:41 CEST 2022
```
Note that a non-expired cache is available, but NoMachine NX explicitly sets `KRB5CCNAME` to a specific KCM cache. And it contains a ticket/cache which is supposed to be gone.
So there is a security bug in `sssd-kcm`: it does not fully destroy tickets when told to do so. And there is another security issue in the NoMachine NX -> `sssd-kcm` interaction. I assume that it talks with the `KCM` as root and somehow gets (or has saved somewhere) old caches and moves them over into user context. But the cache may originally not have belonged to the user...
So there is a security bug in `sssd-kcm`: it does not fully destroy tickets when told to do so. And there is another security issue in the NoMachine NX -> `sssd-kcm` interaction. I assume that it talks with the KCM as root and somehow gets (or has saved somewhere) old caches and moves them over into user context. But the cache may originally not have belonged to the user...
I have not found a lot concerning Kerberos on the NoMachine website.
@@ -204,8 +239,9 @@ Ideally we would get to a solution which can do the following:
### Only One Cache
The `sssd-kcm` limits the number of caches per user to 64 by default, but that can be changed to 1 with the `max_uid_ccaches` option.
So there would be only one cache, shared by all sessions, but at least the `KCM` cannot serve anything but the latest.
So there would be only one cache, shared by all sessions, but at least the KCM cannot serve anything but the latest.
But some logins do not work any more when the maximum number of caches is hit as already documented above in the chapter "No Cleanup of Expired Caches".
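As an illustration of that setting, the following sketch prints the configuration fragment. `kcm_limit_snippet` is a hypothetical helper; the assumption here is that the option goes into the `[kcm]` section of `sssd.conf`, which should be checked against the `sssd-kcm` man page of the installed version.

```shell
# Hypothetical helper: print an sssd.conf fragment that caps the number of
# credential caches per UID. Placement in the [kcm] section is an assumption.
# $1 = maximum number of caches per UID (default 1)
kcm_limit_snippet() {
    printf '[kcm]\nmax_uid_ccaches = %s\n' "${1:-1}"
}
```

The output of `kcm_limit_snippet 1` would have to be merged into the existing configuration by hand, and the KCM daemon most likely needs a restart to pick it up.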
@@ -213,59 +249,31 @@ But some logins do not work any more when the maximum number of caches is hit as
### renew-afstoken Script/Daemon
For AFS we (Achim and I) made the script `renew-afstoken`, which is started by PAM upon login as a per-PAG daemon.
Out of the available `KCM` caches it selects a suitable one to regularly get a new AFS token.
Out of the available KCM caches it selects a suitable one to regularly get a new AFS token.
This now works very robustly and can also recover from expiration when a new ticket becomes available.
### Setup Shared or Isolated Caches with KRB5CCNAME in own PAM Module
A self-made PAM module `pam_single_kcm_cache.so` runs at session setup to set:
The self-made PAM module `pam_single_kcm_cache.so` improves the situation by setting
- `KRB5CCNAME=KCM:$UID:desktop` to use a shared credential cache for desktop sessions and `systemd --user`
- `KRB5CCNAME=KCM:$UID:$RANDOM_LETTERS` for text sessions to provide session isolation
So far I have identified two cases of the "program flow" in PAM that need to be handled:
- *TGT delegation* as done by `sshd` with authentication method `gssapi-with-mic`, where a new cache is created by `sshd` and then filled with the delegated ticket
- *TGT creation* as done by `sss.so` upon password authentication, where a new TGT is created and placed into the `KCM` managed default cache.
and providing a working TGT in these caches.
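The naming scheme can be sketched in shell as follows. This is not the module's code (the real logic lives in the PAM module); `session_ccache_name` is a hypothetical helper, and the choice of eight lowercase letters for `$RANDOM_LETTERS` is an assumption made for this sketch.

```shell
# Hypothetical sketch of the cache naming scheme, not the PAM module's code.
# $1 = numeric UID, $2 = session kind ("desktop", anything else is a text session)
session_ccache_name() {
    if [ "$2" = "desktop" ]; then
        # shared cache for desktop sessions and `systemd --user`
        printf 'KCM:%s:desktop\n' "$1"
    else
        # eight random lowercase letters stand in for $RANDOM_LETTERS (assumption)
        letters=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'a-z' | cut -c 1-8)
        printf 'KCM:%s:%s\n' "$1" "$letters"
    fi
}
```

Calling `session_ccache_name "$(id -u)" desktop` always gives the same name, while every call for a text session gives a fresh name and therefore an isolated cache.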
The current version of `pam_single_kcm_cache.so` so far handles the first case well, but not so much the second case, as I wrongly assumed that `sss.so` would also create a new cache.
So far I have identified two cases of the program flow in PAM that need to be handled:
- **TGT delegation** as done by `sshd` with authentication method `gssapi-with-mic`, where a new cache is created by `sshd` and then filled with the delegated ticket
- **TGT creation** as done by `sss.so` upon password authentication, where a new TGT is created and placed into the `KCM` managed default cache.
So the current version assumes that the TGT ends up in a new `KCM` cache with the pattern `KCM:$UID:$RANDOM_NUMBER`.
The credentials therein are then copied over into the newly created or already existing cache. The former, automatically created cache is then destroyed.
Now there is no simple and bulletproof way to select the automatically created credential cache.
The default cache selected by KCM may or may not be it.
To work around this, the module iterates through all credential caches provided by the KCM and selects only those with the pattern `KCM:$UID` or `KCM:$UID:$RANDOM_NUMBER` which have a principal fitting the username.
From all of those it selects the youngest one.
Now there is no simple and bulletproof way to determine where the TGT ends up in KCM.
The default cache designated by KCM may or may not be it.
To work around this, the module iterates through all credential caches provided by the KCM and copies a TGT which is younger than 10 s and has a principal fitting the username.
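The selection rule (principal fits the username, TGT younger than 10 s) can be illustrated like this. It is a sketch and not the module's implementation: `select_fresh_tgt_caches` is a hypothetical name, and the input format of one line per TGT with cache name, principal and issue time in epoch seconds is invented for this example.

```shell
# Hypothetical sketch of the selection rule, not the PAM module's actual code.
# stdin: one line per TGT, "<cache> <principal> <issue time in epoch seconds>"
#        (this input format is invented for the sketch)
# $1 = username, $2 = current time in epoch seconds, $3 = maximum age in seconds (default 10)
select_fresh_tgt_caches() {
    awk -v user="$1" -v now="$2" -v max_age="${3:-10}" '
        # the principal must start with "<user>@" and the TGT must be younger than max_age
        index($2, user "@") == 1 && (now - $3) < max_age { print $1 }
    '
}
```

Matching on `<user>@` at the start of the principal keeps an admin principal like `buchel_k-adm` from being picked up for the user `buchel_k`.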
Note that the reason for `systemd --user` to use the same credential cache as the desktop sessions is that at least Gnome uses it to start the user programs like Evolution or Firefox.
Ideally at the end there exist only caches with the naming pattern `KCM:$UID:desktop` and `KCM:$UID:$RANDOM_LETTERS`.
If there are still some `KCM:$UID:$RANDOM_NUMBER` caches, then they were not caught, e.g. because they use a so far unknown authentication path.
You find the code in [Gitlab](https://git.psi.ch/linux-infra/pam_single_kcm_cache), where there is currently an [open merge request for the initial commit](https://git.psi.ch/linux-infra/pam_single_kcm_cache/-/merge_requests/1). I plan to make it public on Github.
## Open Problems
- `pam_single_kcm_cache.so` needs to deal properly with new TGTs
- cleanup of caches, otherwise we might end up in a DoS situation
- `pam_single_kcm_cache.so` could be extended to destroy cache on end of session => not a good idea with AFS and long running background calculations
- `pam_single_kcm_cache.so` could be extended to optionally destroy all caches at the end of session => useful for `systemd --user`, because that ends after the last user process has ended and would then do a full cleanup. This would also ensure an empty KCM after a clean shutdown.
- alternatively we might do a `systemd --user` unit doing so, maybe also as daemon to clean up old expired caches
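What such a cleanup could run is sketched below as a dry run. `expired_cache_cleanup_cmds` is a hypothetical helper; it assumes that `klist -l` marks expired caches with a trailing `(Expired)` and it only prints the `kdestroy -c` commands instead of executing them.

```shell
# Hypothetical dry-run helper: read `klist -l` output on stdin and print a
# `kdestroy -c <cache>` command for every cache that is marked "(Expired)".
# Nothing is destroyed here; the commands are only printed.
expired_cache_cleanup_cmds() {
    awk 'NR > 2 && $NF == "(Expired)" { print "kdestroy -c " $2 }'
}
```

A `systemd --user` timer could run `KRB5CCNAME=KCM: klist -l | expired_cache_cleanup_cmds` and, once the output looks right, pipe it into `sh`.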
## Options for Next Steps
### Continue with `pam_single_kcm_cache.so`
I think here we can get to a solution where we get the KCM out of the business of selecting the best cache while still using the rest of its advantages.
### Try out KEYRING
Maybe we can try to create a solution with `KEYRING` which isolates the interactive sessions and still allows the AFS token renewal to access all caches. This then also needs `renew-afstoken` to care about Kerberos ticket renewal.
For the use cases listed above the caches and tickets do not need to survive reboots. If there is something/someone needing `KCM` for some reason, it can be used specifically and privately and will not interfere with the rest of the system.
### Red Hat Ticket
I have a [ticket](https://access.redhat.com/support/cases/#/case/03280446) open with Red Hat on this case. At first I concentrated on the missing session isolation, but it turned out that this is the intended behaviour of a KCM setup.
@@ -284,7 +292,4 @@ I hope that sharing this document with them will help.
Fill in your ideas.
## PS
There is an advantage in the broken `sssd-kcm` default cache selection: it forces us to make our stuff robust against `KCM` glitches, which might also occur with a better manager, just way less often, and then they would be harder to explain and to track down.
There is an advantage in the broken `sssd-kcm` default cache selection: it forces us to make our stuff robust against KCM glitches, which might also occur with a better manager, just way less often, and then they would be harder to explain and to track down.