Kerberos on RHEL 8
This document describes the state of Kerberos on RHEL 8. It covers the current open issues, a user guide, and how we solved the KCM (Kerberos Cache Manager) issues. At the bottom you will find sequence diagrams showing the interactions concerning authentication and Kerberos.
Open Problems
- cleanup of caches, else we might end up in DoS situation. Best we do this systemd --unit managed.
- Kerberos with Firefox does not work yet.
User Guide
Manage Ticket for Admin User
If you need a TGT for your admin user (e.g. buchel_k-adm) for administrative operations, run
OLD_KRB5CCNAME=$KRB5CCNAME
export KRB5CCNAME=KCM:$(id -u):admin
kinit $(id -un)-adm
and after you are done do
kdestroy
export KRB5CCNAME=$OLD_KRB5CCNAME
to delete your administrative tickets and to get back to your normal credential cache.
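The same steps can be wrapped in a small helper so that the normal credential cache never has to be saved and restored by hand. This is only a sketch: the function name with_ccache is made up for illustration, and it assumes a POSIX shell. The subshell keeps the changed KRB5CCNAME local to the one command.

```shell
# Run a command with KRB5CCNAME pointing at a named KCM cache of the
# current user. The function body is a subshell, so the caller's own
# KRB5CCNAME is left untouched.
# "with_ccache" is a hypothetical helper name, not an existing tool.
with_ccache() (
    name=$1
    shift
    export KRB5CCNAME="KCM:$(id -u):$name"
    "$@"
)
```

With this, with_ccache admin kinit "$(id -un)-adm" gets the admin TGT, with_ccache admin followed by an administrative command uses it, and with_ccache admin kdestroy removes it again.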
Update TGT on Long Running Sessions
The TGT will be automatically renewed for 7 days. Note that a screen unlock or a new connection with NoMachine NX will update the credential cache with a new TGT.
Manual reauthentication is also possible. Inside the session you can run
kinit
Outside of the session you first need to figure out the credential cache used. First get the process ID of the process which needs authentication, then
$ strings /proc/$PID/environ | grep KRB5CCNAME
KRB5CCNAME=KCM:44951:iepgjskbkd
$
and then a
KRB5CCNAME=KCM:44951:iepgjskbkd kinit
will update the given credential cache.
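The lookup can also be scripted. The following sketch reads the NUL-separated environment file directly instead of going through strings; the function name ccname_from_environ is made up for illustration. Note that /proc/$PID/environ is only readable by the owner of the process or by root.

```shell
# Print the KRB5CCNAME value stored in a NUL-separated environment file
# such as /proc/<PID>/environ. Prints nothing if the variable is not set.
# "ccname_from_environ" is a hypothetical helper name.
ccname_from_environ() {
    tr '\0' '\n' < "$1" | sed -n 's/^KRB5CCNAME=//p'
}
```

A reauthentication from outside the session then becomes KRB5CCNAME=$(ccname_from_environ /proc/$PID/environ) kinit.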
Note that for AFS, the token renewal looks in all caches for a valid TGT, so logging in on the desktop or via ssh with password or ticket delegation is sufficient to keep AFS access working for another week.
List all Credential Caches
KRB5CCNAME=KCM: klist -l
lists all caches and
KRB5CCNAME=KCM: klist -A
also the tickets therein.
Kerberos Use and Test Cases
- ssh authentication (authentication method gssapi-with-mic)
- ssh TGT (ticket granting ticket) delegation (with GSSAPIDelegateCredentials yes)
- AFS authentication (aklog)
- AFS administrative operations where the user switches to a separate admin principal (e.g. buchel_k-adm)
- local desktop: get new TGT on login
- local desktop: TGT renewal after reauthentication on lock screen
- remote desktop with NoMachine NX: get new TGT on login
- remote desktop with NoMachine NX: TGT renewal after reconnection
- website authentication (SPNEGO with Firefox, Chrome)
KCM (Kerberos Cache Manager)
In RHEL 7 we are using the KEYRING (kernel keyring) cache,
whereas for RHEL 8 the wish came up early to use KCM instead,
which is also the new default.
The Kerberos documentation contains a reference for all available cache types.
The KCM cache is provided by a dedicated daemon; for RHEL 8 this is sssd_kcm, which has been developed by Red Hat itself.
Advantages of KCM
The advantage of KCM is that the caches are permanent and survive daemon restarts and system reboots without the need to fiddle around with files and file permissions. This simplifies daemon and container use cases. It also automatically renews tickets, which is handy for every use case.
User Based vs Session Based
Intuitively I would expect that something as delicate as authentication is managed per session (ssh, desktop, console login, ...).
Apparently with KCM this is not the case. It provides a default cache which is supposed to be the optimal one for you, and that can change at any time.
Problems I see with this are
- user may change his principal, e.g. for admin operations (kinit buchel_k-adm), which is then used by all sessions
- user may destroy the cache (it is good security practice to have a kdestroy in .bash_logout to ensure nobody on the machine can use your tokens after logging out)
- software may put tokens into the cache which suddenly are not there any more
- the magic/heuristic used to select might not work optimally for all use cases (as we see below sssd-kcm fails horribly...)
So if we have more than one session on a machine (e.g. people connecting via remote desktop and ssh at the same time), the cross-session side-effects can cause unexpected behaviour.
In contrast to this, for AFS token renewal having access to new tokens is helpful, as this allows prolonging the time a PAG (group of processes authenticated against AFS) keeps working as long as there is at least one valid ticket available.
Or even to recover when a new ticket becomes available again.
A way to get KCM out of the business of selecting the "optimal" cache is to select it yourself and provide the session/software one specific cache by setting the KRB5CCNAME environment variable accordingly (e.g. KCM:44951:66120). Note that when it is set to KCM:, the default cache used is the one KCM believes should be the default cache. And that can change for whatever reason.
Problems of sssd_kcm
To check the Kerberos credential cache, you can use klist to look at the current default cache and klist -l to look at all available caches. Note that the first listed cache is the default cache. Of course that is only valid when there is no KRB5CCNAME environment variable set or when it is KCM:.
No Cleanup of Expired Caches
The most obvious and well known problem of sssd-kcm is that it does not remove expired tokens and credential caches. I agree that it should not have an impact as this is mostly cosmetic. But that is only the case when everything can cope with that...
By default it is limited to 64 caches, but when that limit was hit, it was no longer possible to authenticate on the lock screen:
Okt 05 14:57:11 lxdev01.psi.ch krb5_child[43689]: Internal credentials cache error
So this causes a denial of service problem which we need to deal with somehow, e.g. by regularly removing expired caches. Note that these caches are persistent and do not get removed on reboot.
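A regular cleanup could look roughly like the following sketch. It is an illustration, not a tested solution: the function names are made up, it assumes that klist -l marks expired caches with a trailing "(Expired)" in the C locale as in the listings in this document, and it relies on kdestroy -c actually removing the cache from the KCM.

```shell
# Read `klist -l` output on stdin and print the names of all caches
# that are flagged "(Expired)" in the last column.
# Both function names are hypothetical.
expired_caches() {
    awk '$NF == "(Expired)" { print $(NF - 1) }'
}

# Destroy every expired KCM cache of the calling user.
destroy_expired_caches() {
    KRB5CCNAME=KCM: LC_ALL=C klist -l | expired_caches | while read -r cache; do
        kdestroy -c "$cache"
    done
}
```

Calling destroy_expired_caches from a periodically started job in the user's context would keep the number of caches below the limit, as long as the daemon honours the destroy request.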
Use of Expired Credential Caches
In the example below you see that on ssh login I got a new default cache. But after a few minutes (there was a desktop login from my side and maybe an automatic AFS token renewal in between), I got an expired cache as default cache.
$ ssh lxdev01.psi.ch
Last login: Tue Oct 4 09:50:33 2022
[buchel_k@lxdev01 ~]$ klist -l
Principal name Cache name
-------------- ----------
buchel_k@D.PSI.CH KCM:44951:42923
buchel_k@D.PSI.CH KCM:44951:12312 (Expired)
buchel_k@D.PSI.CH KCM:44951:42199 (Expired)
buchel_k@D.PSI.CH KCM:44951:40168
buchel_k@D.PSI.CH KCM:44951:8914 (Expired)
buchel_k@D.PSI.CH KCM:44951:62275 (Expired)
buchel_k@D.PSI.CH KCM:44951:27078 (Expired)
buchel_k@D.PSI.CH KCM:44951:73924 (Expired)
buchel_k@D.PSI.CH KCM:44951:72006
buchel_k@D.PSI.CH KCM:44951:64449 (Expired)
buchel_k@D.PSI.CH KCM:44951:60061 (Expired)
buchel_k@D.PSI.CH KCM:44951:36925 (Expired)
buchel_k@D.PSI.CH KCM:44951:48361 (Expired)
buchel_k@D.PSI.CH KCM:44951:49651 (Expired)
buchel_k@D.PSI.CH KCM:44951:76984 (Expired)
buchel_k@D.PSI.CH KCM:44951:54227 (Expired)
buchel_k@D.PSI.CH KCM:44951:85800 (Expired)
[buchel_k@lxdev01 ~]$ klist -l
Principal name Cache name
-------------- ----------
buchel_k@D.PSI.CH KCM:44951:12312 (Expired)
buchel_k@D.PSI.CH KCM:44951:42199 (Expired)
buchel_k@D.PSI.CH KCM:44951:40168
buchel_k@D.PSI.CH KCM:44951:8914 (Expired)
buchel_k@D.PSI.CH KCM:44951:62275 (Expired)
buchel_k@D.PSI.CH KCM:44951:27078 (Expired)
buchel_k@D.PSI.CH KCM:44951:73924 (Expired)
buchel_k@D.PSI.CH KCM:44951:72006
buchel_k@D.PSI.CH KCM:44951:64449 (Expired)
buchel_k@D.PSI.CH KCM:44951:60061 (Expired)
buchel_k@D.PSI.CH KCM:44951:36925 (Expired)
buchel_k@D.PSI.CH KCM:44951:48361 (Expired)
buchel_k@D.PSI.CH KCM:44951:42923
buchel_k@D.PSI.CH KCM:44951:49651 (Expired)
buchel_k@D.PSI.CH KCM:44951:76984 (Expired)
buchel_k@D.PSI.CH KCM:44951:54227 (Expired)
buchel_k@D.PSI.CH KCM:44951:85800 (Expired)
[buchel_k@lxdev01 ~]$
Note that the automatic AFS token renewal was created after we had experienced this issue.
Busy Loop of goa-daemon
If GNOME Online Accounts (goa-daemon) encounters a number of Kerberos credential caches, it goes into a busy loop and causes sssd-kcm to consume 100% of one core. The corresponding bug reports at Red Hat and Gnome have been happily ignored.
Zombie Caches by NoMachine NX
On a machine with remote desktop access using NoMachine NX I have seen the following cache list in the log:
# /usr/bin/klist -l
Principal name Cache name
-------------- ----------
fische_r@D.PSI.CH KCM:45334:73632 (Expired)
buchel_k@D.PSI.CH KCM:45334:55706 (Expired)
fische_r@D.PSI.CH KCM:45334:44226 (Expired)
fische_r@D.PSI.CH KCM:45334:40904 (Expired)
fische_r@D.PSI.CH KCM:45334:62275 (Expired)
fische_r@D.PSI.CH KCM:45334:89020 (Expired)
buchel_k@D.PSI.CH KCM:45334:25061 (Expired)
buchel_k@D.PSI.CH KCM:45334:35168 (Expired)
fische_r@D.PSI.CH KCM:45334:73845 (Expired)
fische_r@D.PSI.CH KCM:45334:47508 (Expired)
fische_r@D.PSI.CH KCM:45334:34317 (Expired)
fische_r@D.PSI.CH KCM:45334:52058 (Expired)
fische_r@D.PSI.CH KCM:45334:16150 (Expired)
fische_r@D.PSI.CH KCM:45334:84445 (Expired)
fische_r@D.PSI.CH KCM:45334:69076 (Expired)
buchel_k@D.PSI.CH KCM:45334:87346 (Expired)
fische_r@D.PSI.CH KCM:45334:57070 (Expired)
or on another machine in my personal list:
[buchel_k@pc14831 ~]$ klist -l
Principal name Cache name
-------------- ----------
buchel_k@D.PSI.CH KCM:44951:69748
buchel_k@D.PSI.CH KCM:44951:18506 (Expired)
buchel_k@D.PSI.CH KCM:44951:5113 (Expired)
buchel_k@D.PSI.CH KCM:44951:52685 (Expired)
buchel_k@D.PSI.CH KCM:44951:13951 (Expired)
PC14831$@D.PSI.CH KCM:44951:43248 (Expired)
PC14831$@D.PSI.CH KCM:44951:58459 (Expired)
buchel_k@D.PSI.CH KCM:44951:14668 (Expired)
buchel_k@D.PSI.CH KCM:44951:92516 (Expired)
[buchel_k@pc14831 ~]$
Both show principals which I am very sure have not been added manually by the user. So somewhere there is a security issue, either in sssd-kcm or in NoMachine NX.
In another experiment I logged into a machine with ssh and did kdestroy -A which should destroy all caches:
[buchel_k@mpc2959 ~]$ kdestroy -A
[buchel_k@mpc2959 ~]$ klist -l
Principal name Cache name
[buchel_k@mpc2959 ~]$
After I logged in via NoMachine NX I got a cache that had expired more than two months earlier:
[buchel_k@mpc2959 ~]$ klist -l
Principal name Cache name
buchel_k@D.PSI.CH KCM:44951:16795 (Expired)
buchel_k@D.PSI.CH KCM:44951:69306
[buchel_k@mpc2959 ~]$ klist
Ticket cache: KCM:44951:16795
Default principal: buchel_k@D.PSI.CH
Valid starting Expires Service principal
13.07.2022 11:35:51 13.07.2022 21:26:19 krbtgt/D.PSI.CH@D.PSI.CH
renew until 14.07.2022 11:26:19
[buchel_k@mpc2959 ~]$ date
Do Sep 22 08:37:41 CEST 2022
[buchel_k@mpc2959 ~]$
Note that a non-expired cache is available, but NoMachine NX explicitly sets KRB5CCNAME to a specific KCM cache. And it contains a ticket/cache which is supposed to be gone.
So there is a security bug in sssd-kcm: it does not fully destroy tickets when being told so. And there is another security issue in the NoMachine NX -> sssd-kcm interaction. I assume that it talks with the KCM as root and gets somehow (or has saved somewhere) old caches and moves them over into user context. But the cache may originally not have belonged to the user...
I have not found a lot concerning Kerberos on the NoMachine website.
Solution Attempts
Ideally we would get to a solution which can do the following:
- interactive user sessions are isolated and do not interfere with each other
- AFS can get hold of new tickets and inject them into the PAGs as long as the user somehow regularly authenticates
- systemd --user which is residing outside of the interactive user sessions is happy as well
- goa-daemon sees only one cache
- expired caches get somehow cleaned up
Only One Cache
The sssd-kcm limits the number of caches by default to 64, but that can be changed to 1 with the max_uid_ccaches option.
So there would be only one cache, shared by all sessions, but at least the KCM cannot serve anything but the latest.
But some logins do not work any more when the maximum number of caches is hit as already documented above in the chapter "No Cleanup of Expired Caches".
renew-afstoken Script/Daemon
For AFS we (Achim and I) made the script renew-afstoken, which is started as a per-PAG daemon by PAM upon login.
Out of the available KCM caches it selects a suitable one to regularly get a new AFS token.
This now works very robustly and can also recover from expiration when a new ticket becomes available.
Setup Shared or Isolated Caches with KRB5CCNAME in own PAM Module
The self-made PAM module pam_single_kcm_cache.so improves the situation by setting
- KRB5CCNAME=KCM:$UID:desktop to use a shared credential cache for desktop sessions and systemd --user
- KRB5CCNAME=KCM:$UID:$RANDOM_LETTERS for text sessions to provide session isolation
and providing a working TGT in these caches.
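The naming scheme can be illustrated with a few lines of shell. This is not the code of the PAM module, only a sketch of the rule described above; the function name, the session kind labels "desktop" and "systemd-user", and the choice of 10 lowercase letters for the random part are assumptions made for the example.

```shell
# Print the KRB5CCNAME for a session: a fixed, shared name for desktop
# sessions and `systemd --user`, a random name for everything else.
# Arguments: <uid> <session kind>; the kind labels are hypothetical.
session_ccname() {
    uid=$1
    kind=$2
    case $kind in
        desktop|systemd-user)
            printf 'KCM:%s:desktop\n' "$uid"
            ;;
        *)
            # 10 random lowercase letters (the length is an assumption)
            suffix=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'a-z' | cut -c1-10)
            printf 'KCM:%s:%s\n' "$uid" "$suffix"
            ;;
    esac
}
```

For example, session_ccname "$(id -u)" desktop always yields the same cache name, whereas session_ccname "$(id -u)" ssh yields a different one on every call.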
I have so far identified two cases of the program flow in PAM to manage:
- TGT delegation as done by sshd with authentication method gssapi-with-mic, where a new cache is created by sshd and then filled with the delegated ticket
- TGT creation as done by pam_sss.so upon password authentication, where a new TGT is created and placed into the KCM managed default cache.
Now there is no simple and bulletproof way to determine where the TGT ends up in KCM. It might be the cache KCM designates as default, or it might not. To work around this, the module iterates through all credential caches provided by the KCM and copies a TGT which is younger than 10 s and has a principal fitting the username.
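The selection rule can be sketched as a filter. The real module works on the Kerberos API, so the input format used here (one line per cache with cache name, principal and TGT start time in epoch seconds) is purely hypothetical, as is the function name. "Fitting the username" is interpreted as "the part of the principal before the realm equals the username", which is an assumption.

```shell
# Read lines of "<cache> <principal> <TGT start time in epoch seconds>"
# on stdin (a hypothetical, pre-parsed format) and print the caches whose
# TGT belongs to the given user and is at most max_age seconds old.
# Arguments: <username> <current time in epoch seconds> [max_age, default 10]
fresh_caches() {
    user=$1
    now=$2
    max_age=${3:-10}
    awk -v user="$user" -v now="$now" -v max_age="$max_age" '
        {
            split($2, parts, "@")   # parts[1] is the name before the realm
            age = now - $3
            if (parts[1] == user && age >= 0 && age <= max_age) print $1
        }'
}
```

A cache listed by this filter would be the source from which the TGT is copied into the cache chosen for the session.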
Note that the reason for systemd --user to use the same credential cache as the desktop sessions is that at least Gnome uses it to start the user programs like Evolution or Firefox.
The code is publicly available on Github.
Diagrams about Kerberos related Interactions
The diagrams below show how PAM and especially pam_single_kcm_cache.so interact with the KCM in different use cases.
Login with SSH using Password Authentication
That is kind of the "common" authentication case where all important work is done in PAM. This is the same for login on the virtual console or when using su with password. At the end there is a shell session with a credential cache which is not used by any other session (unless the user shares it somehow manually). This is how session isolation is achieved.
Login with SSH using Kerberos Authentication and TGT Delegation
This is a bit simpler as all the authentication is done in sshd and only the session setup is done by PAM. Note that sshd does not use the default cache, but instead always creates a new one with the delegated TGT.
Systemd User Instance
In the diagrams above we see how systemd --user is being started. It also uses PAM to set up its own session, but it does not do any authentication.
Here we use a predefined name for the credential cache so it can be shared with the desktop sessions. The next diagram shows in more detail how systemd --user and the Gnome desktop interact.
Gnome Desktop
This is the most complex use case:
At the end we have a well known credential cache shared between Gnome and systemd --user. This is needed because systemd --user is used extensively by Gnome. It is important that the Kerberos setup already happens in the authentication phase, as there is no session setup phase for screen unlock: the user returns to an already existing session.
With NoMachine NX this is configured similarly.
PS
There is an advantage in the broken sssd-kcm default cache selection: it forces us to make our stuff robust against KCM glitches, which might also occur with a better manager, just way less often, and then they would be harder to explain and to track down.



