# Red Hat Enterprise Linux 8
## Production Ready
The central infrastructure (automatic provisioning, upstream package synchronisation and Puppet) is stable and production ready.
Configuration management is done with Puppet, as for RHEL 7. RHEL 7 and RHEL 8 hosts can share the same hierarchy in Hiera and thus also the "same" configuration. Where the configuration for RHEL 7 and RHEL 8 differs, the idea is to keep both variants in parallel in Hiera and let Puppet select the right one.
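One common way to keep both variants in parallel is a hierarchy level keyed on the OS major release. A generic Hiera 5 sketch using standard Puppet facts (illustrative only, the actual PSI hierarchy may differ):

```
# hiera.yaml excerpt (illustrative; actual PSI hierarchy may differ)
hierarchy:
  - name: "Per OS release"
    path: "os/%{facts.os.name}-%{facts.os.release.major}.yaml"
  - name: "Common"
    path: "common.yaml"
```

With such a level, `os/RedHat-7.yaml` and `os/RedHat-8.yaml` can carry the differing values while everything shared stays in `common.yaml`.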
Please also consider implementing the following two migrations when moving to RHEL 8:
- migrate from Icinga1 to [Icinga2](../admin-guide/configuration/icinga2), as Icinga1 will be decommissioned by end of 2024
- explicit [network configuration in Hiera](../admin-guide/configuration/networking) with `networking::setup`, especially if you have static IP addresses or static routes
Bugs and issues can be reported in the [Linux project in JIRA](https://jira.psi.ch/browse/PSILINUX).
## Documentation
* [Installation](installation)
* [CUDA and Nvidia Drivers](nvidia)
* [Kerberos](kerberos)
* [Desktop](desktop)
* [Hardware Compatibility](hardware_compatibility)
* [Vendor Documentation](vendor_documentation)
## Disk Layout
The default partition schema for RHEL 8 is:
- create one primary ``/boot`` partition of 1 GB;
- create the ``vg_root`` volume group that uses the rest of the disk;
- on ``vg_root`` create the following logical volumes:
  - ``lv_root`` of 14 GB size for ``/``;
  - ``lv_home`` of 2 GB size for ``/home``;
  - ``lv_var`` of 8 GB size for ``/var``;
  - ``lv_var_log`` of 3 GB size for ``/var/log``;
  - ``lv_var_tmp`` of 2 GB size for ``/var/tmp``;
  - ``lv_tmp`` of 2 GB size for ``/tmp``.
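Expressed as a kickstart sketch, this schema would look roughly like the following (illustrative only, sizes in MiB; this is not the actual provisioning template):

```
part /boot --fstype=xfs --size=1024
part pv.01 --size=1 --grow
volgroup vg_root pv.01
logvol /        --vgname=vg_root --name=lv_root    --fstype=xfs --size=14336
logvol /home    --vgname=vg_root --name=lv_home    --fstype=xfs --size=2048
logvol /var     --vgname=vg_root --name=lv_var     --fstype=xfs --size=8192
logvol /var/log --vgname=vg_root --name=lv_var_log --fstype=xfs --size=3072
logvol /var/tmp --vgname=vg_root --name=lv_var_tmp --fstype=xfs --size=2048
logvol /tmp     --vgname=vg_root --name=lv_tmp    --fstype=xfs --size=2048
```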
## Caveats
### Missing or Replaced Packages
[List of packages removed in RHEL 8](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/considerations_in_adopting_rhel_8/index#removed-packages_changes-to-packages)
| RHEL 7 | RHEL 8 | remarks |
| --- | --- | --- |
| `a2ps` | `enscript` (recommended replacement) | [`enscript` upstream](https://www.gnu.org/software/enscript/), [`a2ps` upstream](https://www.gnu.org/software/a2ps/) |
| `blt` | - | [`blt` upstream](http://blt.sourceforge.net/), does not work with newer Tk versions ([source](https://wiki.tcl-lang.org/page/BLT)) |
| `devtoolset*` | `gcc-toolset*` | |
| `git-cvs` | - | `cvs` itself is not supported in RHEL 8 but is available through EPEL; support for `git cvsimport` is still missing |
| `gnome-icon-theme-legacy` | - | was used for Icewm on RHEL 7 |
| ... | ... | here I stopped research, please report/document further packages |
### Missing RAID Drivers
#### Missing RAID Drivers during Installation
For RHEL 8, Red Hat phased out some hardware drivers. There is an [official list](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/considerations_in_adopting_rhel_8/hardware-enablement_considerations-in-adopting-rhel-8#removed-adapters_hardware-enablement), but I also found missing drivers that are not listed there.
Installation with an unsupported RAID adapter then fails as the installer does not find a system disk to use.
To figure out which driver you need, go to the installer shell or boot a rescue Linux over the network, and on the shell check the PCI device ID of the RAID controller with
```
$ lspci -nn
...
82:00.0 RAID bus controller [0104]: 3ware Inc 9750 SAS2/SATA-II RAID PCIe [13c1:1010] (rev 05)
...
```
The ID is in the rightmost square brackets. Then check if there are drivers available.
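If you want to script this, the ID can be extracted from an `lspci -nn` line with standard shell tools (a small sketch using the sample line from above):

```shell
# Extract the PCI vendor:device ID, i.e. the rightmost [xxxx:xxxx] pair.
# The class code bracket (e.g. [0104]) has no colon and is not matched.
line='82:00.0 RAID bus controller [0104]: 3ware Inc 9750 SAS2/SATA-II RAID PCIe [13c1:1010] (rev 05)'
id=$(printf '%s\n' "$line" | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tail -n 1 | tr -d '[]')
echo "$id"   # 13c1:1010
```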
I will now focus on [ElRepo](https://elrepo.org/), which provides drivers no longer supported by Red Hat. Check the PCI device ID against their [list of device IDs](https://elrepo.org/tiki/DeviceIDs). If you find a driver there, [driver disks are also provided](https://linuxsoft.cern.ch/elrepo/dud/el8/x86_64/).
There are two options for providing this driver disk to the installer:
1. Download the according `.iso` file, extract it onto a USB stick labelled `OEMDRV` and have the stick connected during installation.
2. Extend the kernel command line with `inst.dd=$URL_OF_ISO_FILE`, e.g. with a custom Grub config on the [boot server](https://git.psi.ch/linux-infra/network-boot) or with the sysdb/bob attribute `kernel_cmdline`.
([Red Hat documentation of this procedure](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/performing_an_advanced_rhel_8_installation/index#updating-drivers-during-installation_installing-rhel-as-an-experienced-user))
At the end do not forget to enable the ElRepo RPM package repository in Hiera to also get new drivers for updated kernels:
```
# enable 3rd-party drivers from ElRepo
rpm_repos::default:
- 'elrepo_rhel8'
```
#### Missing RAID Drivers on Kernel Upgrade
If the machine does not boot after provisioning or after a kernel upgrade, with
```
Warning: /dev/mapper/vg_root-lv_root does not exist
Warning: /dev/vg_root/lv_root does not exist
```
after a lot of
```
Warning: dracut-initqueue timeout - starting timeout scripts
```
then it could be that support for the RAID controller was removed with the new kernel, e.g. for the LSI MegaRAID SAS there is a [dedicated article](https://access.redhat.com/solutions/3751841).
For the LSI MegaRAID SAS a driver is still available in ElRepo, so it can be installed by Puppet during provisioning. To do so, add to Hiera:
```
base::pkg_group::....:
- 'kmod-megaraid_sas'

rpm_repos::default:
- 'elrepo_rhel8'
```
### AFS cache partition not created due to existing XFS signature
When upgrading an existing RHEL 7 installation it can happen that the Puppet run produces
```
Error: Execution of '/usr/sbin/lvcreate -n lv_openafs --size 2G vg_root' returned 5: WARNING: xfs signature detected on /dev/vg_root/lv_openafs at offset 0. Wipe it? [y/n]: [n]
```
This needs to be fixed manually:
- run the failing command manually and approve the wipe (or add `--yes`)
- run `puppet agent -t` to finalize the configuration
### Puppet run fails to install KCM related service/timer on Slurm node
The Puppet run fails with
```
Notice: /Stage[main]/Profile::Aaa/Systemd::Service[kcm-destroy]/Exec[start-global-user-service-kcm-destroy]/returns: Failed to connect to bus: Connection refused
Error: '/usr/bin/systemctl --quiet start --global kcm-destroy.service' returned 1 instead of one of [0]
Error: /Stage[main]/Profile::Aaa/Systemd::Service[kcm-destroy]/Exec[start-global-user-service-kcm-destroy]/returns: change from 'notrun' to ['0'] failed: '/usr/bin/systemctl --quiet start --global kcm-destroy.service' returned 1 instead of one of [0] (corrective)
Notice: /Stage[main]/Profile::Aaa/Profile::Custom_timer[kcm-cleanup]/Systemd::Timer[kcm-cleanup]/Exec[start-global-user-timer-kcm-cleanup]/returns: Failed to connect to bus: Connection refused
Error: '/usr/bin/systemctl --quiet start --global kcm-cleanup.timer' returned 1 instead of one of [0]
Error: /Stage[main]/Profile::Aaa/Profile::Custom_timer[kcm-cleanup]/Systemd::Timer[kcm-cleanup]/Exec[start-global-user-timer-kcm-cleanup]/returns: change from 'notrun' to ['0'] failed: '/usr/bin/systemctl --quiet start --global kcm-cleanup.timer' returned 1 instead of one of [0] (corrective)
```
This is caused by the use of KCM as the default Kerberos credential cache in RHEL 8:
- for RHEL 8 it was recommended to use the KCM provided by sssd as the Kerberos credential cache
- a major issue of this KCM is that it does not remove outdated caches
- this leads to a denial-of-service situation: once all 64 slots are filled, new logins start to fail (this is persistent, a reboot does not help)
- we fix this issue by regularly running a cleanup script in the user context
- this "user context" is handled by the `systemd --user` instance, which is started on the first login and keeps running until the last session ends
- that systemd user instance is started by `pam_systemd.so`
- `pam_systemd.so` and `pam_slurm_adopt.so` conflict because both want to set up cgroups
- because of this there is no `pam_systemd.so` configured on Slurm nodes, and thus no `systemd --user` instance
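For context, the units mentioned in the Puppet log are ordinary `systemd --user` units. A generic sketch of what such a cleanup timer can look like (the name matches the log above, but the contents here are purely illustrative assumptions, not the actual PSI unit):

```
# kcm-cleanup.timer (illustrative sketch, not the real PSI unit)
[Unit]
Description=Regularly clean up stale KCM credential caches

[Timer]
OnUnitActiveSec=1h

[Install]
WantedBy=timers.target
```

Without a running `systemd --user` instance there is simply nothing to execute such units, which is what the "Failed to connect to bus" errors above reflect.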
I see two options to solve this issue:
- do not use KCM
- get a systemd user instance running somehow
#### do not use KCM
This can be done in Hiera; to get back to the RHEL 7 behavior, set
    aaa::default_krb_cache: "KEYRING:persistent:%{literal('%')}{uid}"
then there will be no KCM magic any more.
We could also make this automatically happen in Puppet when Slurm is enabled.
#### get a systemd user instance running somehow
`pam_systemd.so` does not want to take its hands off cgroups:
https://github.com/systemd/systemd/issues/13535
But it is documented how to get (part of?) the `pam_systemd.so` functionality running with Slurm:
https://slurm.schedmd.com/pam_slurm_adopt.html#PAM_CONFIG
(the Prolog, TaskProlog and Epilog part).
I wonder whether that also starts a `systemd --user` instance, or whether the start of one could somehow be integrated there.
### Workstation Installation Takes Long and Seems to Hang
On the very first Puppet run the command to install the GUI packages takes up to 10 minutes and looks like it is hanging, usually right after the installation of `/etc/sssd/sssd.conf`. Just give it a bit of time.
### "yum/dnf search" Gives Permission Denied as Normal User
It works fine apart from the error message below:
```
Failed to store expired repos cache: [Errno 13] Permission denied: '/var/cache/dnf/x86_64/8/expired_repos.json'
```
which is IMHO OK, as a normal user should not be allowed to make changes there.
|