Deployment
A deployment roughly has the following phases:
- DHCP followed by PXE boot.
- Kickstart installation followed by a reboot.
- Initial Puppet run, followed by updates, followed by another Puppet run and a reboot.
PXE boot/iPXE
When deployment fails during the PXE phase, it is usually due to one of the following:
- No network connectivity. This is usually indicated by messages similar to No link on XXX.
- No DHCP in the connected network (e.g. DMZ, tier3). The DHCP requests by the BIOS/UEFI firmware will time out.
- Firewall (no TFTP/HTTP to the relevant servers)
- Incompatibilities between iPXE and network card (NIC)
- Incorrect sysdb entry (hence iPXE entry incorrect).
If there is no DHCP, the static network information provided manually may be wrong, or it may belong to a different network than the one the host is connected to.
Infiniband
Infiniband can cause installation problems, especially in the initial phase, when iPXE tries to load the configuration file. As a general rule, disable PXE on all Infiniband cards.
This is not always sufficient, however: iPXE sometimes still recognizes the Infiniband card as the first device (with MAC address 79:79:79:79:79:79) and tries to fetch the configuration file for it.
Kickstart
Typical problems during the Kickstart phase:
- The Kickstart file cannot be retrieved from the sysdb server sysdb.psi.ch. Typically caused by incorrect sysdb entries or firewalls.
- Partitioning fails. This can happen because:
- No disk is recognized, or the wrong disk is used
- Packages or other installation data cannot be downloaded. Can be caused by firewalls or incorrect sysdb entries.
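The two causes named above (firewalls versus incorrect sysdb entries) can often be told apart by whether the server answers at all. The following is a minimal sketch, assuming the Kickstart file is served over HTTP and that `curl` is available on a machine in the same network as the host. The function names are hypothetical, the URL must be supplied by the caller (the actual path on sysdb.psi.ch is not given here), and the mapping from status code to cause is a heuristic, not a documented behaviour of sysdb.

```shell
#!/bin/sh
# Map a curl HTTP status code to a likely cause.
# Heuristic only: "000" is what curl reports when no HTTP response arrived.
kickstart_fetch_hint() {
    case "$1" in
        2??) echo "ok" ;;                   # server answered with success
        000) echo "network-or-firewall" ;;  # no HTTP response at all
        4??) echo "sysdb-entry" ;;          # server reachable, request rejected
        *)   echo "unknown" ;;              # anything else needs a closer look
    esac
}

# Fetch the URL given as $1 (caller-supplied, hypothetical) and print the hint.
check_kickstart_url() {
    status=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
    kickstart_fetch_hint "$status"
}
```

Calling `check_kickstart_url` with the Kickstart URL of the affected host prints one of the four hints. A result of `network-or-firewall` points at connectivity, while `sysdb-entry` suggests checking the host's entry in sysdb.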
Hiera
Hiera errors are a typical problem, e.g. the following:
Info: Using configured environment 'prod'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'console::mount_root' at /srv/puppet/code/dev/envs/prod/code/modules/role/manifests/console.pp:1 on node lxdev05.psi.ch
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
The error message shows that the value for console::mount_root could not be found in Hiera.
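When the agent output is long, the missing key can be pulled out mechanically. The following is a small sketch that assumes the error text has the same form as in the example above. The function name `missing_hiera_keys` is hypothetical.

```shell
#!/bin/sh
# Read Puppet agent output on stdin and print the Hiera key named in each
# "Function lookup() did not find a value for the name '...'" error line.
missing_hiera_keys() {
    sed -n "s/.*Function lookup() did not find a value for the name '\([^']*\)'.*/\1/p"
}
```

A possible use is `puppet agent --test 2>&1 | missing_hiera_keys`, which for the error above would print `console::mount_root`. If the `puppet lookup` command is available on the Puppet server, running it with `--node` and `--environment` for the affected host may then show how that key resolves.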
Active Directory
Sometimes the Active Directory join fails, usually for one of these three reasons:
- There is already an Active Directory computer object for the same system from a previous Windows installation. In this case, delete the computer object and restart the installation.
- Firewall restrictions
- Old Puppet certificates from a previous SL6 installation are used on the system. In this case delete the certificates on the client with
find /etc/puppetlabs -name '*.pem' -delete
and clean up any certificates on the Puppet server with
puppet cert clean $HOSTNAME
Then restart the installation.
Rejoin Active Directory
If the AD join seems to be broken (failed logins, etc.), the node can be rejoined automatically:
- remove /etc/krb5.keytab
- run puppet, e.g. with
puppet agent --test
YFS / AFS
If the yfs-client does not start because the kernel module cannot be loaded and the log reports "Required key not available":
Sep 02 13:21:34 pc12661.psi.ch systemd[1]: Starting AuriStorFS Client Service...
Sep 02 13:21:34 pc12661.psi.ch modprobe[29282]: modprobe: ERROR: could not insert 'yfs': Required key not available
then Secure Boot is most probably blocking the loading of the unsigned yfs kernel module.
Disable Secure Boot in the BIOS/UEFI settings.
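Before changing firmware settings, it can help to confirm the diagnosis from the running system. The following is a minimal sketch with a hypothetical function name, `yfs_blocked_by_signature`. It only matches the error text shown above. The commented commands assume that `journalctl` and `mokutil` are installed on the host.

```shell
#!/bin/sh
# Return 0 if the log text on stdin contains the modprobe error that
# indicates the yfs module was rejected for lack of a trusted signature.
yfs_blocked_by_signature() {
    grep -q "could not insert 'yfs': Required key not available"
}

# Possible usage on the affected host (not executed here):
#   journalctl -b | yfs_blocked_by_signature && echo "yfs module rejected"
#   mokutil --sb-state    # reports whether Secure Boot is enabled
```

If the function matches and `mokutil --sb-state` reports Secure Boot as enabled, the explanation above is very likely the right one.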