# Slurm Configuration: CPU vs GPU nodes

Jobs can still be submitted to the old *merlin5* computing nodes by using the option `--cluster=merlin5`.
This documentation covers only the usage of the **merlin6** Slurm cluster.
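For example, a minimal submission to the old cluster could look as follows (`myjob.sh` is a placeholder for your own batch script):

```bash
# Submit a batch script to the old merlin5 cluster instead of merlin6
sbatch --cluster=merlin5 myjob.sh
```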
## Merlin6 CPU
Basic configuration of the **merlin6** CPU cluster is detailed here.
For advanced usage, please refer to [Understanding the Slurm configuration (for advanced users)](/merlin6/slurm-configuration.html#understanding-the-slurm-configuration-for-advanced-users).
### CPU nodes definition
The following table shows the default and maximum resources that can be used per node:
| Nodes              | Def.#CPUs | Max.#CPUs | #Threads | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) |
|:------------------:| :--------:| :--------:| :------: | :---------------:| :---------------:| :----------------:| :------------:|
| merlin-c-[001-024] | 1 core    | 44 cores  | 2        | 4000             | 352000           | 352000            | 10000         |
| merlin-c-[101-124] | 1 core    | 44 cores  | 2        | 4000             | 352000           | 352000            | 10000         |
| merlin-c-[201-224] | 1 core    | 44 cores  | 2        | 4000             | 352000           | 352000            | 10000         |
If nothing is specified, each core will by default use up to 8 GB of memory (4000 MB per thread, with two threads per core). Memory can be increased with the `--mem=<mem_in_MB>` or
`--mem-per-cpu=<mem_in_MB>` options, and the maximum memory allowed is `Max.Mem/Node`.
In *Merlin6*, memory is considered a Consumable Resource, as well as the CPU.
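As an illustration, a batch script that raises the memory per CPU from the default 4000 MB to 8000 MB could look like the following sketch (`my_app` and the resource numbers are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=daily      # one of the CPU partitions described below
#SBATCH --ntasks=8             # number of tasks
#SBATCH --mem-per-cpu=8000     # request 8000 MB per CPU instead of the default 4000 MB

srun my_app
```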
### CPU partitions
A partition can be specified when submitting a job with the `--partition=<partitionname>` option.
The following *partitions* (also known as *queues*) are configured in Slurm:
| CPU Partition | Default Time | Max Time | Max Nodes | Priority | PriorityJobFactor\* |
|:-----------------: | :----------: | :------: | :-------: | :------: | :-----------------: |
| **daily** | 1 day | 1 day | 67 | medium | 500 |
| **hourly** | 1 hour | 1 hour | unlimited | highest | 1000 |
\*The **PriorityJobFactor** value will be added to the job priority (the *PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** might affect that decision). For the GPU
partitions, Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.
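For example, a short job could be sent to the high-priority `hourly` partition, and the resulting priorities can be inspected with `sprio` (`myjob.sh` is a placeholder):

```bash
# Submit a short job to the high-priority 'hourly' partition
sbatch --partition=hourly --time=00:30:00 myjob.sh

# Show the priority breakdown of pending jobs; the PARTITION column
# reflects the PriorityJobFactor of the chosen partition
sprio -l
```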
{{site.data.alerts.tip}}Submit shorter jobs to the <b>daily</b> and <b>hourly</b> partitions whenever possible: they have higher priority, and <b>general</b> has limited the number of nodes that can be used. In this way, nodes can not
be blocked by long jobs and we can always ensure resources for shorter jobs.
{{site.data.alerts.end}}
### User and job limits
In the CPU cluster we provide some limits which basically apply to jobs and to users. The idea behind this is to ensure fair usage of the resources and to
avoid a single user or job overusing the cluster. However, applying limits might affect the overall usage efficiency of the cluster.
#### Per job limits for CPU partitions

These are limits which apply to a single job. In other words, there is a maximum amount of resources a single job can use. This is described in the table below,
and limits will vary depending on the day of the week and the time (*working* vs. *non-working* hours). Limits are shown in the format `SlurmQoS(limits)`,
where the `SlurmQoS` values can be listed with the command `sacctmgr show qos`:
| Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h |
|:----------: | :------------------: | :------------: | :---------------------: |
| **general** | normal(cpu=704,mem=2750G) | normal(cpu=704,mem=2750G) | normal(cpu=704,mem=2750G) |
| **daily** | daytime(cpu=704,mem=2750G) | nighttime(cpu=1408,mem=5500G) | unlimited(cpu=2112,mem=8250G) |
| **hourly** | unlimited(cpu=2112,mem=8250G) | unlimited(cpu=2112,mem=8250G) | unlimited(cpu=2112,mem=8250G) |
By default, a job can not use more than 704 cores (max CPU per job). In the same way, memory is also proportionally limited. This is equivalent to
running a job on up to 8 full nodes at once (8 nodes × 44 cores × 2 threads = 704 CPUs). This limit applies to the **general** partition (fixed limit) and to the **daily** partition (only during working hours).
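The QoS limits listed above can be queried directly from Slurm; for example, the following shows each QoS with its per-job and per-user limits (the format fields are standard `sacctmgr` options):

```bash
# List each QoS with its per-job (MaxTRES) and per-user (MaxTRESPU) limits
sacctmgr show qos format=Name%12,MaxTRES%30,MaxTRESPU%30,MaxWall
```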
For the **hourly** partition, wider limits are provided. In order to avoid massive node draining in the cluster, this
mostly refers to "per user" limits more than to "per job" limits (in other words, users can run any number of hourly jobs, but the job size for such jobs is limited
with wide values).
#### Per user limits for CPU partitions
These are limits which apply exclusively to users. In other words, there is a maximum amount of resources a single user can use. This is described in the table below,
and limits will vary depending on the day of the week and the time (*working* vs. *non-working* hours). Limits are shown in the format `SlurmQoS(limits)`,
where the `SlurmQoS` values can be listed with the command `sacctmgr show qos`:
| Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h |
|:-----------:| :----------------: | :------------: | :---------------------: |
| **general** | normal(cpu=704,mem=2750G) | normal(cpu=704,mem=2750G) | normal(cpu=704,mem=2750G) |
| **daily** | daytime(cpu=1408,mem=5500G) | nighttime(cpu=2112,mem=8250G) | unlimited(cpu=6336,mem=24750G) |
| **hourly**  | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G) |
By default, users can not use more than 704 cores at the same time (max CPU per user). Memory is also proportionally limited in the same way. This is
equivalent to 8 exclusive nodes. This limit applies to the **general** partition (fixed limit) and to the **daily** partition (only during working hours).
For the **hourly** partition, user limits are effectively removed. Limits are softened for the **daily** partition during non-working
hours, and during the weekend they are removed.
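When a job hits one of these per-user limits, it remains pending with a QoS-related reason. One way to check this (standard `squeue` options):

```bash
# List your jobs together with the reason they are pending; a job held back
# by the per-user cap typically shows a reason like 'QOSMaxCpuPerUserLimit'
squeue -u $USER --format="%.10i %.9P %.8T %.10M %.20r"
```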
## Merlin6 GPU
Basic configuration of the **merlin6** GPU nodes is detailed here.
For advanced usage, please refer to [Understanding the Slurm configuration (for advanced users)](/merlin6/slurm-configuration.html#understanding-the-slurm-configuration-for-advanced-users).
### GPU nodes definition
| Nodes              | Def.#CPUs | Max.#CPUs | #Threads | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | GPU Type | Def.#GPUs | Max.#GPUs |
|:------------------:| :--------:| :--------:| :------: | :---------------:| :---------------:| :----------------:| :------------:| :-------:| :-------: | :-------: |
| merlin-g-[001] | 1 core | 8 cores | 1 | 4000 | 102400 | 102400 | 10000 | **GTX1080** | 1 | 2 |
| merlin-g-[002-005] | 1 core | 20 cores | 1 | 4000 | 102400 | 102400 | 10000 | **GTX1080** | 1 | 4 |
| merlin-g-[006-009] | 1 core | 20 cores | 1 | 4000 | 102400 | 102400 | 10000 | **GTX1080Ti** | 1 | 4 |
| merlin-g-[010-013] | 1 core | 20 cores | 1 | 4000 | 102400 | 102400 | 10000 | **RTX2080Ti** | 1 | 4 |
{{site.data.alerts.tip}}Always check <b>'/etc/slurm/gres.conf'</b> for changes in the GPU type and details of the NUMA node.
{{site.data.alerts.end}}
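The GRES configuration of a node can also be checked from Slurm itself; for example (the node name is just one example from the table above):

```bash
# Show the GRES (GPU type and count) configured on one of the GPU nodes
scontrol show node merlin-g-010 | grep -i gres
```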
### GPU partitions
| GPU Partition | Default Time | Max Time | Max Nodes | Priority | PriorityJobFactor\* |
|:-----------------: | :----------: | :------: | :-------: | :------: | :-----------------: |
| **<u>gpu</u>** | 1 day | 1 week | 4 | low | 1 |
| **gpu-short** | 2 hours | 2 hours | 4 | highest | 1000 |
\*The **PriorityJobFactor** value will be added to the job priority (the *PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** might affect that decision). For the GPU
partitions, Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.
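A GPU batch script could then look like the following sketch (the GPU type and counts are examples taken from the table above; `train.py` is a placeholder):

```bash
#!/bin/bash
#SBATCH --partition=gpu-short          # high-priority GPU partition, max 2 hours
#SBATCH --time=02:00:00
#SBATCH --gres=gpu:RTX2080Ti:2         # request 2 GPUs of a specific type; plain 'gpu:2' also works
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4000

srun python train.py
```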
### User and job limits
#### Per job limits
Per job limits are the same as the per user limits (see below).
#### Per user limits for GPU partitions
By default, a user can not use more than **two** GPU nodes in parallel. Hence, users are limited to at most 8 GPUs in parallel (2 nodes with 4 GPUs each).
## Understanding the Slurm configuration (for advanced users)
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.