GPUs fixes

caubet_m 2021-01-15 12:29:22 +01:00
parent 23b16eac18
commit c110a835fc


@@ -121,16 +121,18 @@ The following settings are required for running on the GPU nodes:
```bash
#SBATCH --gres=gpu # Always set at least this option when using GPUs
```
This option is still valid, since `--gres` may be needed for other resources. For GPUs, however, the new options (e.g. `--gpus`, `--mem-per-gpu`) can be used instead, and they provide more flexibility when running on GPUs.
Please read below **[GPU advanced settings](/merlin6/running-jobs.html#gpu-advanced-settings)** for other `--gres` and `--gpus` options.
* **`[Valid from 08.01.2021]` GPU options (instead of GRES):** Slurm must be aware that the job will use GPUs. New options are available for specifying
the GPUs as a consumable resource. These are the following:
* `--gpus=[<type>:]<number>` *instead of* (or in addition to) `--gres=gpu`: specifies the total number of GPUs required for the job.
* `--gpus-per-node=[<type>:]<number>`, `--gpus-per-socket=[<type>:]<number>`, `--gpus-per-task=[<type>:]<number>`: specify how many GPUs need to be allocated per node, per socket and/or per task.
* `--cpus-per-gpu`: specifies the number of CPUs to be used for each GPU.
* `--mem-per-gpu`: specifies the amount of memory to be used for each GPU.
* Other advanced options (e.g. `--gpu-bind`). Please see the **man** pages for **sbatch**/**srun**/**salloc** (e.g. *`man sbatch`*) for further information.
Please read below **[GPU advanced settings](/merlin6/running-jobs.html#gpu-advanced-settings)** for other `--gpus` options.
* Note that the GPU `type` can be specified in some of the options. If a type is specified, it must be specified in every option of the Slurm job that accepts one.
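A minimal job header combining the options above might look as follows. It is a sketch under assumptions: the partition, the `GTX1080Ti` type and the CPU, memory and time values are illustrative only. The type is repeated in both GPU-count options because, once a type is given, it must appear in every option that accepts one. The `echo` line relies on Slurm exporting `CUDA_VISIBLE_DEVICES` for the GPUs it allocates.

```bash
#!/bin/bash
#SBATCH --partition=gpu                # GPU partition
#SBATCH --nodes=1                      # Run on a single node
#SBATCH --gpus=GTX1080Ti:2             # Total GPUs for the job, with type (illustrative)
#SBATCH --gpus-per-node=GTX1080Ti:2    # Same type repeated, as required once a type is given
#SBATCH --cpus-per-gpu=4               # 4 CPU cores for each allocated GPU
#SBATCH --mem-per-gpu=8000             # 8000 MB of memory for each allocated GPU
#SBATCH --time=0-01:00:00              # 1 hour wall time

# Slurm exports CUDA_VISIBLE_DEVICES with the GPUs allocated to the job.
echo "GPUs visible to this job: ${CUDA_VISIBLE_DEVICES:-none}"
```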
#### GPU advanced settings
@@ -144,11 +146,12 @@ Valid ``gres`` options are: ``gpu[[:type]:count]`` where ``type=GTX1080|GTX1080T
```
**From 08.01.2021**, `--gres` is no longer needed (but can still be used), and `--gpus` and its related options should replace it. `--gpus` works in a similar way, but without
the need to specify the `gpu` resource. In other words, `--gpus` options are: ``[type:]count`` where ``type=GTX1080|GTX1080Ti|RTX2080Ti`` (which is optional) and ``count=<number of gpus to use>``. For example:
```bash
#SBATCH --gpus=GTX1080:4 # Use 4 GPUs with Type=GTX1080
```
This setting can be combined with other options, such as `--gpus-per-node`, to achieve a behaviour similar to `--gres` (whose counts apply per node).
* Note that the GPU `type` can be specified in some of the options. If a type is specified, it must be specified in every option of the Slurm job that accepts one.
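To make the correspondence between the two syntaxes concrete, here is a small bash sketch. `gres_to_gpus_per_node` is a hypothetical helper written for illustration, not a Slurm command: it maps a `--gres` GPU request (``gpu[[:type]:count]``, where the count defaults to 1 and applies per node) to the equivalent `--gpus-per-node` value.

```bash
#!/bin/bash
# Hypothetical helper (not a Slurm command): convert a --gres GPU request,
# gpu[[:type]:count], into the equivalent --gpus-per-node value.
# --gres counts apply per node and default to 1 when omitted.
gres_to_gpus_per_node() {
  local spec="$1" rest type="" count=1

  # Accept only GPU requests: gpu, gpu:N, gpu:TYPE or gpu:TYPE:N
  if [[ ! "$spec" =~ ^gpu(:.+)?$ ]]; then
    echo "unsupported --gres value: $spec" >&2
    return 1
  fi

  rest="${spec#gpu}"   # drop the leading "gpu"
  rest="${rest#:}"     # drop the ":" separator, if present

  if [[ -z "$rest" ]]; then
    :                              # "gpu": one GPU of any type
  elif [[ "$rest" =~ ^[0-9]+$ ]]; then
    count="$rest"                  # "gpu:N": N GPUs of any type
  elif [[ "$rest" == *:* ]]; then
    type="${rest%%:*}"             # "gpu:TYPE:N": N GPUs of TYPE
    count="${rest#*:}"
  else
    type="$rest"                   # "gpu:TYPE": one GPU of TYPE
  fi

  if [[ -n "$type" ]]; then
    printf '%s\n' "--gpus-per-node=${type}:${count}"
  else
    printf '%s\n' "--gpus-per-node=${count}"
  fi
}

gres_to_gpus_per_node "gpu:GTX1080:2"   # prints --gpus-per-node=GTX1080:2
```

For instance, `gres_to_gpus_per_node gpu:RTX2080Ti` prints `--gpus-per-node=RTX2080Ti:1`, since the count defaults to one GPU per node.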
{{site.data.alerts.tip}}Always check <span style="color:orange;"><b>'/etc/slurm/gres.conf'</b></span> for the available GPU <span style="color:orange;"><i>Types</i></span> and for NUMA node details.
{{site.data.alerts.end}}
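Checking `gres.conf` can also be scripted. The sketch below uses a hypothetical function, `list_gres_types`, to print the distinct GPU types declared in a `gres.conf` file. It assumes entries use `Type=<name>` key/value pairs and `#` comment lines, and it defaults to `/etc/slurm/gres.conf`.

```bash
#!/bin/bash
# Hypothetical helper: print the distinct GPU types declared in gres.conf.
# Assumes entries use "Type=<name>" (e.g. "Name=gpu Type=GTX1080 File=...")
# and that lines starting with '#' are comments.
list_gres_types() {
  local conf="${1:-/etc/slurm/gres.conf}"
  grep -v '^[[:space:]]*#' "$conf" |   # drop comment lines
    grep -o 'Type=[^[:space:]]*' |     # keep only the Type=<name> tokens
    cut -d= -f2 |                      # strip the "Type=" prefix
    sort -u                            # one line per distinct type
}

# Usage: list_gres_types                 # reads /etc/slurm/gres.conf
#        list_gres_types ./gres.conf     # reads another copy
```

As a cross-check, `sinfo -o "%N %G"` reports the generic resources that the Slurm controller has configured for each node.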
@@ -208,7 +211,7 @@ The following template should be used by any user submitting jobs to GPU nodes:
```bash
#!/bin/bash
#SBATCH --partition=<gpu|gpu-short> # Specify GPU partition
#SBATCH --gpus="<type>:<num_gpus>" # <type> is optional, <num_gpus> is mandatory
#SBATCH --time=<D-HH:MM:SS> # Strongly recommended
#SBATCH --output=<output_file> # Generate custom output file
#SBATCH --error=<error_file> # Generate custom error file
@@ -220,9 +223,9 @@ The following template should be used by any user submitting jobs to GPU nodes:
##SBATCH --ntasks=1 # Uncomment and specify the number of tasks to use
##SBATCH --cpus-per-gpu=5 # Uncomment and specify the number of cores per GPU
##SBATCH --mem-per-gpu=16000 # Uncomment and specify the memory per GPU (in MB)
##SBATCH --gpus-per-node=<type>:2 # Uncomment and specify the number of GPUs per node
##SBATCH --gpus-per-socket=<type>:2 # Uncomment and specify the number of GPUs per socket
##SBATCH --gpus-per-task=<type>:1 # Uncomment and specify the number of GPUs per task
```
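As an illustration, the template could be filled in as below for a single-task job on one GPU. The values (partition `gpu-short`, 30 minutes, the file names, 5 cores and 16000 MB per GPU) are assumptions to adapt. `%j` in the output and error file names is replaced by the job ID, and the GPU type is omitted since it is optional.

```bash
#!/bin/bash
#SBATCH --partition=gpu-short           # Short GPU partition (illustrative choice)
#SBATCH --gpus=1                        # One GPU of any type
#SBATCH --time=0-00:30:00               # 30 minutes wall time
#SBATCH --output=gpujob-%j.out          # %j is replaced by the job ID
#SBATCH --error=gpujob-%j.err           # Separate file for stderr
#SBATCH --ntasks=1                      # A single task
#SBATCH --cpus-per-gpu=5                # 5 CPU cores for the allocated GPU
#SBATCH --mem-per-gpu=16000             # 16000 MB of memory for the allocated GPU

# Show which GPU Slurm assigned to the job.
echo "Allocated GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"
```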
## Advanced configurations