Controls-docs/rhel8/nvidia.md
2022-11-15 14:26:18 +01:00

CUDA and Proprietary Nvidia GPU Drivers on RHEL 8

Managing Nvidia software comes with its own set of challenges. The most common cases are covered by our Puppet configuration and are discussed in the first chapter; more details follow further below.

Hiera Configuration

Changes in Hiera are forwarded by Puppet to the node, but not applied immediately. They take effect on the next reboot. Alternatively, you can run /opt/pli/libexec/ensure-nvidia-software at a safe moment: no process should be using CUDA, and the desktop will be restarted.

I need CUDA

Set nvidia::cuda::install_software: true in Hiera. Puppet then automatically installs suitable Nvidia drivers and the newest possible CUDA version.

To enable nvidia_persistenced you additionally need to set nvidia::cuda::nvidia_persistenced::enable: true.
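As a minimal sketch, the two settings above could look as follows in a node's Hiera data. The keys are the ones named in this section; which Hiera file they belong in depends on your hierarchy and is not shown here.

```yaml
# Install CUDA together with suitable Nvidia drivers
nvidia::cuda::install_software: true

# Optional: also enable nvidia_persistenced
nvidia::cuda::nvidia_persistenced::enable: true
```

As with any Hiera change described here, this only takes effect after a reboot or after running /opt/pli/libexec/ensure-nvidia-software.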

I need a specific CUDA version

In addition to the setting above, set nvidia::cuda::version to the desired version. The version must be fully specified (all three numbers, with X.Y.0 for the GA version).

Note that newer CUDA versions do not support older drivers. For details, see Table 3 in the CUDA Release Notes.
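A sketch of pinning a CUDA version in Hiera. The version number is only an illustration of the fully specified X.Y.Z form and is not a recommendation. Quoting the value is an assumption to make sure it is passed as a string.

```yaml
nvidia::cuda::install_software: true

# Illustrative value only: all three numbers are required,
# X.Y.0 denotes the GA release of CUDA X.Y
nvidia::cuda::version: '11.8.0'
```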

I just need the Nvidia drivers

Nothing needs to be done. The drivers are installed by default when Nvidia GPUs or accelerators are found.

I do not want the Nvidia drivers

Set nvidia::driver::enable: false in Hiera. Note that this setting is ignored if CUDA is enabled (see above).

Drivers that are already installed are not removed automatically. You need to remove them by hand.
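For illustration, the corresponding Hiera entry could look like this, using only the key named above:

```yaml
# Do not install the Nvidia drivers on this node.
# Ignored when nvidia::cuda::install_software is true.
nvidia::driver::enable: false
```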

I need the Nvidia drivers from a specific driver branch

The driver branch can be selected in Hiera with nvidia::driver::branch. Puppet then uses the latest driver version of that branch. Note that only production branches are available in the PSI package repository.
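A sketch of selecting a driver branch. The exact value format expected by nvidia::driver::branch is not documented in this section, so the bare branch number shown here is an assumption. Check it against the Puppet module before use.

```yaml
# Assumed value format: the branch number as a string.
# The branch shown is illustrative only.
nvidia::driver::branch: '470'
```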

I need an Nvidia driver of a given version

This is not recommended, but it is possible by setting the exact driver version (X.Y.Z, excluding the package iteration number) in Hiera with nvidia::driver::version.

If the driver version is too old, Puppet installs an older kernel version and you will need a second reboot to activate it.
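A sketch of pinning an exact driver version. The number below only illustrates the X.Y.Z form without the package iteration number. Whether that particular version is available in the package repository is not implied.

```yaml
# Illustrative value only: exact driver version X.Y.Z,
# without the package iteration number
nvidia::driver::version: '515.65.01'
```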

Versioning Mess

Manual Operation