# Kerberos on RHEL 8

This document describes the state of Kerberos on RHEL 8.
This includes the current open issues, a user guide and how we solved the KCM (Kerberos Cache Manager) issues.
At the bottom you will find sequence diagrams showing the interactions concerning authentication and Kerberos.

## Open Problems

- Cleanup of caches is missing; without it we might end up in a DoS situation. Ideally we do this managed by `systemd --unit`.
- Kerberos with Firefox does not work yet.

## User Guide

### Manage Ticket for Admin User

If you need a TGT from your admin user (e.g. `buchel_k-adm`) for administrative operations, then do
```
OLD_KRB5CCNAME=$KRB5CCNAME
export KRB5CCNAME=KCM:$(id -u):admin
kinit $(id -un)-adm
```
and after you are done do
```
kdestroy
export KRB5CCNAME=$OLD_KRB5CCNAME
```
to delete your administrative tickets and to get back to your normal credential cache.

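The two steps above can also be wrapped into one helper, so that restoring `KRB5CCNAME` cannot be forgotten. This is only a sketch: the function name `with_admin_ticket` is made up for this example, and it assumes the `-adm` naming convention and the `KCM:<uid>:admin` cache name used above.

```shell
# Run one command with an admin TGT in a dedicated KCM cache.
# The parentheses make the function body a subshell, so the changed
# KRB5CCNAME never leaks into the calling shell.
with_admin_ticket() (
    KRB5CCNAME="KCM:$(id -u):admin"
    export KRB5CCNAME
    # Ask for the admin password; give up if authentication fails.
    kinit "$(id -un)-adm" || exit 1
    # Run the requested command and remember its exit status.
    "$@"
    status=$?
    # Remove the admin tickets again before returning.
    kdestroy
    exit "$status"
)
```

It would be used like `with_admin_ticket some-admin-command --option`, after which the calling shell still has its normal credential cache.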
### Update TGT on Long Running Sessions

The TGT will be automatically renewed for 7 days.
Note that a screen unlock or a new connection with NoMachine NX will update the credential cache with a new TGT.

Manual reauthentication is also possible. Inside the session you can do
```
kinit
```
Outside of the session you first need to figure out which credential cache is used.
First get the process ID of the process which needs authentication, then
```
$ strings /proc/$PID/environ | grep KRB5CCNAME
KRB5CCNAME=KCM:44951:iepgjskbkd
$
```
and then a
```
KRB5CCNAME=KCM:44951:iepgjskbkd kinit
```
will update the given credential cache.

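The lookup can be scripted as well. The following is a minimal sketch with made-up function names (`ccname_from_environ`, `ccname_of_pid`); it assumes a Linux `/proc` file system and that you are allowed to read the `environ` file of the process (same user or root).

```shell
# Print the KRB5CCNAME value stored in an environ-style file
# (NUL-separated NAME=value entries, as in /proc/PID/environ).
# The match is anchored, so variables that merely end in
# "KRB5CCNAME" are not picked up.
ccname_from_environ() {
    tr '\0' '\n' < "$1" | sed -n 's/^KRB5CCNAME=//p'
}

# Convenience wrapper for a running process, given its process ID.
ccname_of_pid() {
    ccname_from_environ "/proc/$1/environ"
}
```

With this, `KRB5CCNAME="$(ccname_of_pid "$PID")" kinit` refreshes the cache of the given process. Note that `environ` shows the environment the process was started with, so later changes made inside the process are not visible there.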
Note that AFS looks in all caches for a valid TGT, so logging in on the desktop or via ssh with password or ticket delegation is sufficient to make AFS access work for another week.

### List all Credential Caches
```
KRB5CCNAME=KCM: klist -l
```
lists all caches and
```
KRB5CCNAME=KCM: klist -A
```
also the tickets therein.

## Kerberos Use and Test Cases

- ssh authentication (authentication method `gssapi-with-mic`)
- ssh TGT (ticket granting ticket) delegation (with `GSSAPIDelegateCredentials yes`)
- AFS authentication (`aklog`)
- AFS administrative operations where the user switches to a separate admin principal (e.g. `buchel_k-adm`)
- long running sessions with `nohup`, `tmux` and `screen`
- local desktop: get new TGT on login
- local desktop: TGT renewal after reauthentication on lock screen
- remote desktop with NoMachine NX: get new TGT on login
- remote desktop with NoMachine NX: TGT renewal after reconnection
- website authentication (`SPNEGO` with Firefox, Chrome)

## KCM (Kerberos Cache Manager)

In RHEL 7 we are using the `KEYRING` (kernel keyring) cache,
whereas for RHEL 8 the wish came up early to use KCM instead,
which is also the [new default](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/considerations_in_adopting_rhel_8/identity-management_considerations-in-adopting-rhel-8#kcm-replace-keyring-default-cache_considerations-in-adopting-RHEL-8).

The Kerberos documentation contains a [reference for all available cache types](https://web.mit.edu/kerberos/www/krb5-latest/doc/basic/ccache_def.html).

The KCM cache is provided by a dedicated daemon. For RHEL 8 this is `sssd_kcm`, which was developed by Red Hat itself.

### Advantages of KCM

The advantage of KCM is that the caches are permanent and survive daemon restarts and system reboots without the need to fiddle around with files and file permissions. This simplifies daemon and container use cases.
It also automatically renews tickets, which is handy for every use case.

### User Based vs Session Based

Intuitively I would expect that something as delicate as authentication is managed per session (ssh, desktop, console login, ...).

Apparently with KCM this is not the case. It provides a default cache which is supposed to be the optimal one for you, and that can change at any time.

Problems I see with this are
- the user may change their principal, e.g. for admin operations (`kinit buchel_k-adm`), which is then used by all sessions
- the user may destroy the cache (it is good security practice to have a `kdestroy` in `.bash_logout` to ensure nobody on the machine can use your tokens after logging out)
- software may put tokens into the cache which suddenly are not there any more
- the magic/heuristic used to select the default cache might not work optimally for all use cases (as we see below, `sssd-kcm` fails horribly...)

So if we have more than one session on a machine (e.g. people connecting via remote desktop and ssh at the same time), the cross-session side effects can cause unexpected behaviour.

In contrast to this, for AFS token renewal having access to new tokens is helpful, as this allows prolonging the time a `PAG` (group of processes authenticated against AFS) keeps working, as long as there is at least one valid ticket available.
It even allows recovery when a new ticket becomes available again.

A way to get KCM out of the business of selecting the "optimal" cache is to select it yourself and provide the session/software with one specific cache by setting the `KRB5CCNAME` environment variable accordingly (e.g. `KCM:44951:66120`). Note that when it is set to `KCM:`, the default cache is the one KCM believes should be the default cache. And that can change for whatever reason.

### Problems of `sssd_kcm`

To check the Kerberos credential cache, you can use `klist` to look at the current default cache and `klist -l` to look at all available caches. Note that the first listed cache is the default cache. Of course that is only valid when there is no `KRB5CCNAME` environment variable set or when it is set to `KCM:`.

#### No Cleanup of Expired Caches
The most obvious and [well known problem](https://github.com/SSSD/sssd/issues/3593) of `sssd-kcm` is that it does not remove expired tokens and credential caches. I agree that it should not have an impact, as this is mostly cosmetic. But that is only the case when everything can cope with that...

By default it is limited to 64 caches, but when that limit was hit, it was no longer possible to authenticate on the lock screen:

```
Okt 05 14:57:11 lxdev01.psi.ch krb5_child[43689]: Internal credentials cache error
```
So this causes a denial of service problem which we need to deal with somehow, e.g. by regularly removing expired caches. And note that these caches are persistent and do not get removed on reboot.

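A regular cleanup could look roughly like the sketch below. It is not the solution we deployed; the function names are made up, and it relies on `klist -l` marking caches with `(Expired)` as in the listings of this document and on `kdestroy -c` to remove one specific cache.

```shell
# Read `klist -l` output on stdin and print the names of all caches
# that klist marks as "(Expired)", one per line.
expired_caches() {
    awk '$NF == "(Expired)" { print $(NF - 1) }'
}

# Destroy every expired KCM cache of the calling user.
cleanup_expired_caches() {
    KRB5CCNAME=KCM: klist -l 2>/dev/null | expired_caches |
    while read -r cache; do
        kdestroy -c "$cache"
    done
}
```

Calling `cleanup_expired_caches` periodically for each logged-in user, for example from a timer, would keep the number of caches below the limit. Whether an expired cache is still referenced by a running session is not checked here.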
#### Use of Expired Credential Caches
In the example below you see that on the ssh login I got a new default cache. But after a few minutes (there was a desktop login from my side and maybe an automatic AFS token renewal in between), I got an expired cache as the default cache.
```
$ ssh lxdev01.psi.ch
Last login: Tue Oct 4 09:50:33 2022
[buchel_k@lxdev01 ~]$ klist -l
Principal name Cache name
-------------- ----------
buchel_k@D.PSI.CH KCM:44951:42923
buchel_k@D.PSI.CH KCM:44951:12312 (Expired)
buchel_k@D.PSI.CH KCM:44951:42199 (Expired)
buchel_k@D.PSI.CH KCM:44951:40168
buchel_k@D.PSI.CH KCM:44951:8914 (Expired)
buchel_k@D.PSI.CH KCM:44951:62275 (Expired)
buchel_k@D.PSI.CH KCM:44951:27078 (Expired)
buchel_k@D.PSI.CH KCM:44951:73924 (Expired)
buchel_k@D.PSI.CH KCM:44951:72006
buchel_k@D.PSI.CH KCM:44951:64449 (Expired)
buchel_k@D.PSI.CH KCM:44951:60061 (Expired)
buchel_k@D.PSI.CH KCM:44951:36925 (Expired)
buchel_k@D.PSI.CH KCM:44951:48361 (Expired)
buchel_k@D.PSI.CH KCM:44951:49651 (Expired)
buchel_k@D.PSI.CH KCM:44951:76984 (Expired)
buchel_k@D.PSI.CH KCM:44951:54227 (Expired)
buchel_k@D.PSI.CH KCM:44951:85800 (Expired)
[buchel_k@lxdev01 ~]$ klist -l
Principal name Cache name
-------------- ----------
buchel_k@D.PSI.CH KCM:44951:12312 (Expired)
buchel_k@D.PSI.CH KCM:44951:42199 (Expired)
buchel_k@D.PSI.CH KCM:44951:40168
buchel_k@D.PSI.CH KCM:44951:8914 (Expired)
buchel_k@D.PSI.CH KCM:44951:62275 (Expired)
buchel_k@D.PSI.CH KCM:44951:27078 (Expired)
buchel_k@D.PSI.CH KCM:44951:73924 (Expired)
buchel_k@D.PSI.CH KCM:44951:72006
buchel_k@D.PSI.CH KCM:44951:64449 (Expired)
buchel_k@D.PSI.CH KCM:44951:60061 (Expired)
buchel_k@D.PSI.CH KCM:44951:36925 (Expired)
buchel_k@D.PSI.CH KCM:44951:48361 (Expired)
buchel_k@D.PSI.CH KCM:44951:42923
buchel_k@D.PSI.CH KCM:44951:49651 (Expired)
buchel_k@D.PSI.CH KCM:44951:76984 (Expired)
buchel_k@D.PSI.CH KCM:44951:54227 (Expired)
buchel_k@D.PSI.CH KCM:44951:85800 (Expired)
[buchel_k@lxdev01 ~]$
```
Note that the automatic AFS token renewal was created after we had experienced this issue.

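To detect this situation from a script, one can look at the first cache listed by `klist -l`, which is the default cache as long as `KRB5CCNAME` is unset or `KCM:`. The helpers below are an illustrative sketch with made-up names; they assume the two header lines and the `(Expired)` marker shown in the listing above.

```shell
# Read `klist -l` output on stdin and print the first listed cache,
# i.e. the first line after the two header lines.
first_listed_cache() {
    awk 'NR == 3 { print $2; exit }'
}

# Read `klist -l` output on stdin and succeed (exit status 0) only
# if the first listed cache is marked as "(Expired)".
first_cache_expired() {
    awk 'NR == 3 { expired = ($NF == "(Expired)"); exit }
         END { exit expired ? 0 : 1 }'
}
```

For example, `if klist -l | first_cache_expired; then kinit; fi` would ask for a new ticket only when the default cache has expired.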
#### Busy Loop of `goa-daemon`
If [GNOME Online Accounts](https://wiki.gnome.org/Projects/GnomeOnlineAccounts) encounters a number of Kerberos credential caches, it goes into a busy loop and causes `sssd-kcm` to consume 100% of one core. The bugs are happily ignored at [Red Hat](https://bugzilla.redhat.com/show_bug.cgi?id=1645624#c113) and [Gnome](https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/79).

#### Zombie Caches by NoMachine NX
On a machine with remote desktop access using NoMachine NX I have seen the following cache list in the log:
```
# /usr/bin/klist -l
Principal name Cache name
-------------- ----------
fische_r@D.PSI.CH KCM:45334:73632 (Expired)
buchel_k@D.PSI.CH KCM:45334:55706 (Expired)
fische_r@D.PSI.CH KCM:45334:44226 (Expired)
fische_r@D.PSI.CH KCM:45334:40904 (Expired)
fische_r@D.PSI.CH KCM:45334:62275 (Expired)
fische_r@D.PSI.CH KCM:45334:89020 (Expired)
buchel_k@D.PSI.CH KCM:45334:25061 (Expired)
buchel_k@D.PSI.CH KCM:45334:35168 (Expired)
fische_r@D.PSI.CH KCM:45334:73845 (Expired)
fische_r@D.PSI.CH KCM:45334:47508 (Expired)
fische_r@D.PSI.CH KCM:45334:34317 (Expired)
fische_r@D.PSI.CH KCM:45334:52058 (Expired)
fische_r@D.PSI.CH KCM:45334:16150 (Expired)
fische_r@D.PSI.CH KCM:45334:84445 (Expired)
fische_r@D.PSI.CH KCM:45334:69076 (Expired)
buchel_k@D.PSI.CH KCM:45334:87346 (Expired)
fische_r@D.PSI.CH KCM:45334:57070 (Expired)
```
or on another machine in my personal list:
```
[buchel_k@pc14831 ~]$ klist -l
Principal name Cache name
-------------- ----------
buchel_k@D.PSI.CH KCM:44951:69748
buchel_k@D.PSI.CH KCM:44951:18506 (Expired)
buchel_k@D.PSI.CH KCM:44951:5113 (Expired)
buchel_k@D.PSI.CH KCM:44951:52685 (Expired)
buchel_k@D.PSI.CH KCM:44951:13951 (Expired)
PC14831$@D.PSI.CH KCM:44951:43248 (Expired)
PC14831$@D.PSI.CH KCM:44951:58459 (Expired)
buchel_k@D.PSI.CH KCM:44951:14668 (Expired)
buchel_k@D.PSI.CH KCM:44951:92516 (Expired)
[buchel_k@pc14831 ~]$
```
Both show principals which I am very sure were not added manually by the user. So somewhere there is a security issue, either in `sssd-kcm` or in NoMachine NX.

In another experiment I logged into a machine with `ssh` and did `kdestroy -A`, which should destroy all caches:

```
[buchel_k@mpc2959 ~]$ kdestroy -A
[buchel_k@mpc2959 ~]$ klist -l
Principal name Cache name
[buchel_k@mpc2959 ~]$
```

After I logged in via NoMachine NX, I got a cache that had been expired for more than two months:

```
[buchel_k@mpc2959 ~]$ klist -l
Principal name Cache name

buchel_k@D.PSI.CH KCM:44951:16795 (Expired)
buchel_k@D.PSI.CH KCM:44951:69306
[buchel_k@mpc2959 ~]$ klist
Ticket cache: KCM:44951:16795
Default principal: buchel_k@D.PSI.CH

Valid starting Expires Service principal
13.07.2022 11:35:51 13.07.2022 21:26:19 krbtgt/D.PSI.CH@D.PSI.CH
renew until 14.07.2022 11:26:19
[buchel_k@mpc2959 ~]$ date
Do Sep 22 08:37:41 CEST 2022
[buchel_k@mpc2959 ~]$
```
Note that a non-expired cache is available, but NoMachine NX explicitly sets `KRB5CCNAME` to a specific KCM cache. And it contains a ticket/cache which is supposed to be gone.

So there is a security bug in `sssd-kcm`: it does not fully destroy tickets when told to do so. And there is another security issue in the NoMachine NX -> `sssd-kcm` interaction. I assume that it talks with the KCM as root and somehow gets (or has saved somewhere) old caches and moves them over into the user context. But the cache may originally not have belonged to the user...

I have not found a lot concerning Kerberos on the NoMachine website.

## Solution Attempts

Ideally we would get to a solution which can do the following:

- interactive user sessions are isolated and do not interfere with each other
- AFS can get hold of new tickets and inject them into the PAGs as long as the user somehow regularly authenticates
- `systemd --user`, which resides outside of the interactive user sessions, is happy as well
- `goa-daemon` sees only one cache
- expired caches get somehow cleaned up

### Only One Cache

`sssd-kcm` limits the number of caches to 64 by default, but that can be changed to 1 with the `max_uid_ccaches` option.
So there would be only one cache, shared by all sessions, but at least the KCM cannot serve anything but the latest.

But some logins do not work any more when the maximum number of caches is hit, as already documented above in the chapter "No Cleanup of Expired Caches".

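For reference, the setting could be generated as in the sketch below. It assumes that `max_uid_ccaches` belongs in the `[kcm]` section of the SSSD configuration, as described in the `sssd-kcm` manual page; the function name `kcm_limit_conf` and the file name in the usage note are made up for this example.

```shell
# Print an SSSD configuration snippet which limits the number of
# KCM caches per user (per UID) to the given value.
kcm_limit_conf() {
    printf '[kcm]\nmax_uid_ccaches = %s\n' "$1"
}
```

As root, the output of `kcm_limit_conf 1` could be written to a drop-in file such as `/etc/sssd/conf.d/kcm_limit.conf` (hypothetical name, readable by root only), followed by a restart of the `sssd-kcm` service. Given the login problems described above, this is more of an experiment than a recommendation.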
### renew-afstoken Script/Daemon

For AFS we (Achim and I) made the script `renew-afstoken`, which is started by PAM upon login as a per-PAG daemon.
Out of the available KCM caches it selects a suitable one to regularly get a new AFS token.
This now works very robustly and can also recover from expiration when a new ticket becomes available.

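The selection step can be pictured with the following sketch. It is not the code of `renew-afstoken`; the function name `first_valid_cache` and the rule "first cache of the given principal which is not marked `(Expired)`" are assumptions made for illustration.

```shell
# Read `klist -l` output on stdin and print the first cache that
# belongs to the given principal and is not marked "(Expired)".
# Prints nothing if there is no such cache.
first_valid_cache() {
    awk -v principal="$1" \
        '$1 == principal && $NF != "(Expired)" { print $2; exit }'
}
```

A renewal loop could then run something like `cache=$(KRB5CCNAME=KCM: klist -l | first_valid_cache "buchel_k@D.PSI.CH")` and, if `$cache` is not empty, call `KRB5CCNAME="$cache" aklog` to obtain a fresh AFS token.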
### Setup Shared or Isolated Caches with KRB5CCNAME in own PAM Module

The self-made PAM module `pam_single_kcm_cache.so` improves the situation by setting

- `KRB5CCNAME=KCM:$UID:desktop` to use a shared credential cache for desktop sessions and `systemd --user`
- `KRB5CCNAME=KCM:$UID:$RANDOM_LETTERS` for text sessions to provide session isolation

and providing a working TGT in these caches.

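The naming scheme can be illustrated in shell as follows. This is only a sketch of the idea, not the implementation of the PAM module; the function name `session_ccname`, the session kinds and the suffix length of 10 lowercase letters are assumptions.

```shell
# Print the cache name for a session of the given kind.
#   desktop -> fixed name shared by desktop sessions and `systemd --user`
#   other   -> random suffix, so that every text session is isolated
session_ccname() {
    uid=$(id -u)
    case "$1" in
        desktop)
            printf 'KCM:%s:desktop\n' "$uid"
            ;;
        *)
            # Keep only lowercase letters from 1024 random bytes and
            # use the first 10 of them as suffix.
            suffix=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'a-z' | cut -c 1-10)
            printf 'KCM:%s:%s\n' "$uid" "$suffix"
            ;;
    esac
}
```

A text session would then run `export KRB5CCNAME="$(session_ccname text)"` before obtaining its TGT, while every desktop session of the same user ends up with the same name.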
I have identified two cases of the program flow in PAM to manage so far:
- **TGT delegation** as done by `sshd` with authentication method `gssapi-with-mic`, where a new cache is created by `sshd` and then filled with the delegated ticket
- **TGT creation** as done by `pam_sss.so` upon password authentication, where a new TGT is created and placed into the `KCM` managed default cache.

Now there is no simple and bulletproof way to tell where the TGT ends up in KCM.
It might be the default cache designated by KCM, or it might not.
To work around this, the module iterates through all credential caches provided by the KCM and copies a TGT which is younger than 10 s and has a principal fitting the username.

Note that the reason for `systemd --user` to use the same credential cache as the desktop sessions is that at least Gnome uses it to start user programs like Evolution or Firefox.

The code is publicly available on [Github](https://github.com/paulscherrerinstitute/pam_single_kcm_cache).

## Diagrams about Kerberos related Interactions

The diagrams below show how PAM and especially `pam_single_kcm_cache.so` interact with the KCM in different use cases.

### Login with SSH using Password Authentication
![Login with SSH using Password Authentication](_static/sshd_password_auth.png)

This is kind of the "common" authentication case where all important work is done in PAM. It is the same for login on the virtual console or when using `su` with password. At the end there is a shell session with a credential cache which is not used by any other session (unless the user shares it somehow manually). This way session isolation is achieved.

### Login with SSH using Kerberos Authentication and TGT Delegation
![Login with SSH using Kerberos Authentication and TGT Delegation](_static/sshd_tgt_delegation.png)

This is a bit simpler, as all the authentication is done in `sshd` and only the session setup is done by PAM. Note that `sshd` does not use the default cache, but instead always creates a new one with the delegated TGT.

### Systemd User Instance

In the diagrams above we see how `systemd --user` is started. It also uses PAM to set up its own session, but it does not do any authentication.

![Systemd User Instance](_static/systemd_user.png)

Here we use a predefined name for the credential cache so it can be shared with the desktop sessions. The next diagram shows in more detail how `systemd --user` and the Gnome desktop interact.

### Gnome Desktop

This is the most complex use case:

![Gnome Desktop](_static/gnome_desktop.png)

At the end we have a well known credential cache shared between Gnome and `systemd --user`. This is needed because `systemd --user` is used extensively by Gnome. It is important that the Kerberos setup already happens in the authentication phase, as there is no session setup phase for screen unlock: the user returns there to an already existing session.

With NoMachine NX this is configured similarly.

## PS
There is an advantage in the broken `sssd-kcm` default cache selection: it forces us to make our stuff robust against KCM glitches, which might also occur with a better manager, just far less often, and then it would be harder to explain and to track down.