GPUs fixes

2021-01-15 12:29:22 +01:00 · 2021-01-15 12:29:22 +01:00 · c110a835fc
commit c110a835fc
parent 23b16eac18
1 changed files with 13 additions and 10 deletions
--- a/Submission/running-jobs.md
+++ b/Submission/running-jobs.md
@@ -121,16 +121,18 @@ The following settings are required for running on the GPU nodes:
  ```bash
  #SBATCH --gres=gpu                    # Always set at least this option when using GPUs
  ```
- 
-  Please read below **[GPU advanced settings](/merlin6/running-gpu-jobs.html#gpu-advanced-settings)** for other `--gres` options.
+  This option remains valid, since `--gres` may still be needed for other resources, but for GPUs the new options (e.g. `--gpus`, `--mem-per-gpu`) provide more flexibility.
+  Please read below **[GPU advanced settings](/merlin6/running-jobs.html#gpu-advanced-settings)** for other `--gpus` options.
 * **`[Valid from 08.01.2021]` GPU options (instead of GRES):** Slurm must be aware that the job will use GPUs. New options are available for specifying
 the GPUs as a consumable resource. These are the following:
-  * `--gpus` *instead of* (but also in addition with) `--gres=gpu`: specifies the total number of GPUs required for the job.
+  * `--gpus=[<type>:]<number>` *instead of* (or in addition to) `--gres=gpu`: specifies the total number of GPUs required for the job.
+  * `--gpus-per-task=[<type>:]<number>`, `--gpus-per-socket=[<type>:]<number>`, `--gpus-per-node=[<type>:]<number>` to specify the number of GPUs per task, per socket, and/or per node.
+  * `--gpus-per-node=[<type>:]<number>`, `--gpus-per-socket`, `--gpus-per-task`, to specify how many GPUs need to be allocated per node, socket, and/or task.
  * `--cpus-per-gpu`, to specify the number of CPUs to be used for each GPU.
  * `--mem-per-gpu`, to specify the amount of memory to be used for each GPU.
-  * `--gpus-per-node`, `--gpus-per-socket`, `--gpus-per-task`, to specify how many GPUs per node, socket and or tasks need to be allocated.
  * Other advanced options (i.e. `--gpu-bind`). Please see **man** pages for **sbatch**/**srun**/**salloc** (i.e. *`man sbatch`*) for further information.
-  Please read below **[GPU advanced settings](/merlin6/running-gpu-jobs.html#gpu-advanced-settings)** for other `--gpus` options.
+  Please read below **[GPU advanced settings](/merlin6/running-jobs.html#gpu-advanced-settings)** for other `--gpus` options.
+  * Note that the GPU `type` can be specified in some of these options. If it is specified in one option, it must be specified in all such options defined in the Slurm job.

 #### GPU advanced settings

@@ -144,11 +146,12 @@ Valid ``gres`` options are: ``gpu[[:type]:count]`` where ``type=GTX1080|GTX1080T
 ```

 **From 08.01.2021**, `--gres` is not needed anymore (but can still be used), and `--gpus` and related other options should replace it. `--gpus` works in a similar way, but without
-the need of specifying the `gpu` resource. In oher words, `--gpus` options are: ``[[:type]:count]`` where ``type=GTX1080|GTX1080Ti|RTX2080Ti`` and ``count=<number of gpus to use>``. In example:
+the need to specify the `gpu` resource. In other words, `--gpus` options are: ``[type:]count`` where ``type=GTX1080|GTX1080Ti|RTX2080Ti`` (optional) and ``count=<number of gpus to use>``. For example:
 ```bash
 #SBATCH --gpus=GTX1080:4   # Use 4 GPUs with Type=GTX1080
 ```
 This setting can be combined with other settings, such as `--gpus-per-node`, to achieve behaviour similar to `--gres`.
+  * Note that the GPU `type` can be specified in some of these options. If it is specified in one option, it must be specified in all such options defined in the Slurm job.

 {{site.data.alerts.tip}}Always check <span style="color:orange;"><b>'/etc/slurm/gres.conf'</b></span> for checking available <span style="color:orange;"><i>Types</i></span> and for details of the NUMA node.
 {{site.data.alerts.end}}
@@ -208,7 +211,7 @@ The following template should be used by any user submitting jobs to GPU nodes:
 ```bash
 #!/bin/bash
 #SBATCH --partition=<gpu|gpu-short>         # Specify GPU partition
-#SBATCH --gpus="<type>:<number_gpus>"       # You should specify at least 'gpu'
+#SBATCH --gpus="<type>:<num_gpus>"          # <type> is optional, <num_gpus> is mandatory
 #SBATCH --time=<D-HH:MM:SS>                 # Strongly recommended
 #SBATCH --output=<output_file>              # Generate custom output file
 #SBATCH --error=<error_file>                # Generate custom error file
@@ -220,9 +223,9 @@ The following template should be used by any user submitting jobs to GPU nodes:
 ##SBATCH --ntasks=1                         # Uncomment and specify the number of tasks
 ##SBATCH --cpus-per-gpu=5                   # Uncomment and specify the number of CPUs per GPU
 ##SBATCH --mem-per-gpu=16000                # Uncomment and specify the memory per GPU (in MB)
-##SBATCH --gpus-per-node=2                  # Uncomment and specify the number of GPUs per node
-##SBATCH --gpus-per-socket=2                # Uncomment and specify the number of GPUs per socket
-##SBATCH --gpus-per-task=1                  # Uncomment and specify the number of GPUs per task
+##SBATCH --gpus-per-node=<type>:2           # Uncomment and specify the number of GPUs per node
+##SBATCH --gpus-per-socket=<type>:2         # Uncomment and specify the number of GPUs per socket
+##SBATCH --gpus-per-task=<type>:1           # Uncomment and specify the number of GPUs per task
 ```

 ## Advanced configurations
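
For reference, a minimal sketch of the template from this change filled in with concrete values. The partition and GPU type come from the documentation above. The GPU count, time limit, and file names are illustrative assumptions, not recommended settings. The GPU `type` is repeated in every option that takes it, following the rule added in this commit.

```bash
#!/bin/bash
#SBATCH --partition=gpu                 # GPU partition (gpu or gpu-short)
#SBATCH --gpus=GTX1080:2                # Total GPUs for the job; <type> is optional
#SBATCH --gpus-per-node=GTX1080:2       # Same type repeated, since it is set in --gpus
#SBATCH --cpus-per-gpu=5                # CPUs allocated per GPU
#SBATCH --mem-per-gpu=16000             # Memory per GPU (Slurm default unit is MB)
#SBATCH --time=0-01:00:00               # Illustrative time limit (D-HH:MM:SS)
#SBATCH --output=gpu_job.out            # Hypothetical output file name
#SBATCH --error=gpu_job.err             # Hypothetical error file name

# Assumption: Slurm exports CUDA_VISIBLE_DEVICES inside a GPU allocation.
# Outside of a GPU job the variable is unset and "none" is reported.
visible_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${visible_gpus}"
```

Submitting this with `sbatch` would request two GPUs of type `GTX1080` on a single node. The available types can be checked in `/etc/slurm/gres.conf`, as the tip in the documentation suggests.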