From c110a835fc94d72f723582f6e08579d28a24331e Mon Sep 17 00:00:00 2001
From: caubet_m
Date: Fri, 15 Jan 2021 12:29:22 +0100
Subject: [PATCH] GPUs fixes

---
 .../merlin6/03 Job Submission/running-jobs.md | 23 +++++++++++--------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/pages/merlin6/03 Job Submission/running-jobs.md b/pages/merlin6/03 Job Submission/running-jobs.md
index f3c2aea..1219f95 100644
--- a/pages/merlin6/03 Job Submission/running-jobs.md
+++ b/pages/merlin6/03 Job Submission/running-jobs.md
@@ -121,16 +121,18 @@ The following settings are required for running on the GPU nodes:
     ```bash
     #SBATCH --gres=gpu # Always set at least this option when using GPUs
     ```
-
-    Please read below **[GPU advanced settings](/merlin6/running-gpu-jobs.html#gpu-advanced-settings)** for other `--gres` options.
+    This option is still valid as this might be needed by other resources, but for GPUs new options (i.e. `--gpus`, `--mem-per-gpu`) can be used, which provide more flexibility when running on GPUs.
+    Please read below **[GPU advanced settings](/merlin6/running-jobs.html#gpu-advanced-settings)** for other `--gpus` options.
 
 * **`[Valid from 08.01.2021]` GPU options (instead of GRES):** Slurm must be aware that the job will use GPUs. New options are available for specifying the GPUs as a consumable resource. These are the following:
-    * `--gpus` *instead of* (but also in addition with) `--gres=gpu`: specifies the total number of GPUs required for the job.
+    * `--gpus=[:]` *instead of* (but also in addition with) `--gres=gpu`: specifies the total number of GPUs required for the job.
+    * `--gpus-per-task=[:]`, `--gpus-per-socket=[:]`, `--gpus-per-node=[:]` to specify the number of GPUs per tasks and/or socket and/or node.
+    * `--gpus-per-node=[:]`, `--gpus-per-socket`, `--gpus-per-task`, to specify how many GPUs per node, socket and or tasks need to be allocated.
     * `--cpus-per-gpu`, to specify the number of CPUs to be used for each GPU.
     * `--mem-per-gpu`, to specify the amount of memory to be used for each GPU.
-    * `--gpus-per-node`, `--gpus-per-socket`, `--gpus-per-task`, to specify how many GPUs per node, socket and or tasks need to be allocated.
     * Other advanced options (i.e. `--gpu-bind`). Please see **man** pages for **sbatch**/**srun**/**salloc** (i.e. *`man sbatch`*) for further information.
-    Please read below **[GPU advanced settings](/merlin6/running-gpu-jobs.html#gpu-advanced-settings)** for other `--gpus` options.
+    Please read below **[GPU advanced settings](/merlin6/running-jobs.html#gpu-advanced-settings)** for other `--gpus` options.
+    * Please, consider that one can specify the GPU `type` on some of the options. If one needs to specify it, then it must be specified in all options defined in the Slurm job.
 
 #### GPU advanced settings
 
@@ -144,11 +146,12 @@ Valid ``gres`` options are: ``gpu[[:type]:count]`` where ``type=GTX1080|GTX1080T
 ```
 
 **From 08.01.2021**, `--gres` is not needed anymore (but can still be used), and `--gpus` and related other options should replace it. `--gpus` works in a similar way, but without
-the need of specifying the `gpu` resource. In oher words, `--gpus` options are: ``[[:type]:count]`` where ``type=GTX1080|GTX1080Ti|RTX2080Ti`` and ``count=``. In example:
+the need of specifying the `gpu` resource. In oher words, `--gpus` options are: ``[[:type]:count]`` where ``type=GTX1080|GTX1080Ti|RTX2080Ti`` (which is optional) and ``count=``. In example:
 ```bash
 #SBATCH --gpus=GTX1080:4 # Use 4 GPUs with Type=GTX1080
 ```
 This setting can use in addition other settings, such like `--gpus-per-node`, in order to accomplish a similar behaviour as with `--gres`.
+    * Please, consider that one can specify the GPU `type` in some of the options. If one needs to specify it, then it must be specified in all options defined in the Slurm job.
 
 {{site.data.alerts.tip}}Always check '/etc/slurm/gres.conf' for checking available Types and for details of the NUMA node.
 {{site.data.alerts.end}}
@@ -208,7 +211,7 @@ The following template should be used by any user submitting jobs to GPU nodes:
 ```bash
 #!/bin/bash
 #SBATCH --partition= # Specify GPU partition
-#SBATCH --gpus=":" # You should specify at least 'gpu'
+#SBATCH --gpus=":" # is optional, is mandatory
 #SBATCH --time= # Strongly recommended
 #SBATCH --output= # Generate custom output file
 #SBATCH --error=:2 # Uncomment and specify the number of GPUs per node
+##SBATCH --gpus-per-socket=:2 # Uncomment and specify the number of GPUs per socket
+##SBATCH --gpus-per-task=:1 # Uncomment and specify the number of GPUs per task
 ```
 
 ## Advanced configurations