This documentation explains only the usage of the **merlin6** Slurm cluster.
The basic configuration of the **merlin6 CPU** cluster is detailed here.
For advanced usage, please refer to [Understanding the Slurm configuration (for advanced users)](/merlin6/slurm-configuration.html#understanding-the-slurm-configuration-for-advanced-users)
## Merlin6 CPU nodes definition
The following table shows the default and maximum resources that can be used per node:
This limit, equivalent to 8 exclusive nodes, applies to the **general** partition.
For the **hourly** partition there are no such restrictions and user limits are removed. Limits are relaxed for the **daily** partition during non-working hours, and during the weekend limits are removed.
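As an illustration of how the partition choice interacts with these limits, below is a minimal sketch of a batch script targeting the **hourly** partition (where user limits are removed); the cluster name comes from this page, while the time, task, and memory values are placeholders to adapt to your workload:

```bash
#!/bin/bash
#SBATCH --clusters=merlin6      # CPU cluster described on this page
#SBATCH --partition=hourly      # short jobs; user limits do not apply here
#SBATCH --time=00:55:00         # keep the request within the partition's time limit
#SBATCH --ntasks=8              # illustrative task count
#SBATCH --mem-per-cpu=4000      # memory per CPU in MB (illustrative)

srun hostname                   # replace with the real workload
```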
## Merlin6 GPU
The basic configuration of the **merlin6 GPU** nodes is detailed here.
For advanced usage, please refer to [Understanding the Slurm configuration (for advanced users)](/merlin6/slurm-configuration.html#understanding-the-slurm-configuration-for-advanced-users)
### GPU nodes definition
| Nodes              | Def.#CPUs | Max.#CPUs | #Threads | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | GPU Type      | Def.#GPUs | Max.#GPUs |
|:------------------:| ---------:| :--------:| :------: | :---------------:| :---------------:| :----------------:| :------------:| :-----------: | :-------: | :-------: |
| merlin-g-[001]     | 1 core    | 8 cores   | 1        | 4000             | 102400           | 102400            | 10000         | **GTX1080**   | 1         | 2         |
| merlin-g-[002-005] | 1 core    | 20 cores  | 1        | 4000             | 102400           | 102400            | 10000         | **GTX1080**   | 1         | 4         |
| merlin-g-[006-009] | 1 core    | 20 cores  | 1        | 4000             | 102400           | 102400            | 10000         | **GTX1080Ti** | 1         | 4         |
| merlin-g-[010-013] | 1 core    | 20 cores  | 1        | 4000             | 102400           | 102400            | 10000         | **RTX2080Ti** | 1         | 4         |
{{site.data.alerts.tip}}Always check <b>'/etc/slurm/gres.conf'</b> for changes in the GPU types and for the details of the NUMA node binding.
{{site.data.alerts.end}}
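As an example, a job requesting GPUs within the per-node limits above could use a script like the following sketch; the GPU type string must match the one defined in `/etc/slurm/gres.conf`, and all values here are illustrative:

```bash
#!/bin/bash
#SBATCH --partition=gpu              # GPU partition (see the next subsection)
#SBATCH --gres=gpu:GTX1080Ti:2       # GPU type and count, within Max.#GPUs of the node
#SBATCH --cpus-per-task=4            # within Max.#CPUs of the chosen node type
#SBATCH --mem-per-cpu=4000           # memory per CPU in MB
#SBATCH --time=12:00:00              # adapt to the partition time limits

srun ./my_gpu_application            # placeholder for the actual GPU program
```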
### GPU partitions
| GPU Partition | Default Time | Max Time | Max Nodes | Priority | PriorityJobFactor\* |
|:-----------------: | :----------: | :------: | :-------: | :------: | :-----------------: |
| **<u>gpu</u>** | 1 day | 1 week | 4 | low | 1 |
| **gpu-short** | 2 hours | 2 hours | 4 | highest | 1000 |
\*The **PriorityJobFactor** value is added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher-priority
partitions will usually run first (although other factors, such as **job age** or, mainly, **fair share**, may affect that decision). For the GPU
partitions, Slurm will also first attempt to allocate jobs on partitions with higher priority before partitions with lower priority.
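These partition settings can be verified directly on the cluster with standard Slurm commands, for instance:

```bash
# Show the configured limits and priority of the GPU partitions
scontrol show partition gpu
scontrol show partition gpu-short

# List the priority factors of pending jobs; the PARTITION column
# reflects the PriorityJobFactor described above
sprio -l
```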
### User and job limits
The GPU cluster enforces some basic user and job limits to ensure that a single user cannot overuse the resources and that the cluster is used fairly.
The limits are described below.
#### Per job limits
These limits apply to a single job; in other words, they define the maximum amount of resources a single job can use.
Limits are defined using QoS, which is usually set at the partition level. Limits are described in the table below in the format `SlurmQoS(limits)`
(the list of possible `SlurmQoS` values can be obtained with the command `sacctmgr show qos`):
| Partition | Mon-Sun 0h-24h |
|:-------------:| :------------------------------------: |
| **gpu** | gpu_week(cpu=40,gres/gpu=8,mem=200G) |
| **gpu-short** | gpu_week(cpu=40,gres/gpu=8,mem=200G) |
With these limits, a single job can not use more than 40 CPUs, more than 8 GPUs, or more than 200GB of memory.
Any job exceeding these limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerJob`**.
Since no other QoS temporarily overrides these job limits during the week (as happens, for instance, in the CPU **daily** partition), such a job needs to be cancelled, and the requested resources must be adapted to the resource limits above.
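The exact limits enforced by a QoS can be inspected directly in Slurm; for example, the following (illustrative) query shows the per-job and per-user restrictions of the `gpu_week` QoS mentioned above:

```bash
# MaxTRESPerJob holds the per-job limits, MaxTRESPerUser the per-user limits
sacctmgr show qos format=Name%15,MaxTRESPerJob%40,MaxTRESPerUser%40 | grep gpu_week
```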
#### Per user limits for GPU partitions
These limits apply exclusively to users; in other words, they define the maximum amount of resources a single user can use.
Limits are defined using QoS, which is usually set at the partition level. Limits are described in the table below in the format `SlurmQoS(limits)`
(the list of possible `SlurmQoS` values can be obtained with the command `sacctmgr show qos`):
| Partition | Mon-Sun 0h-24h |
|:-------------:| :---------------------------------------------------------: |
| **gpu** | gpu_week(cpu=80,gres/gpu=16,mem=400G) |
| **gpu-short** | gpu_week(cpu=80,gres/gpu=16,mem=400G) |
With these limits, a single user can not use more than 80 CPUs, more than 16 GPUs, or more than 400GB of memory.
Jobs sent by a user already exceeding these limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerUser`**. In that case, the jobs wait in the queue until some of the user's running resources are freed.
Notice that the user limits are wider than the job limits: in that way, a user can run up to two 8-GPU jobs, or up to four 4-GPU jobs, and so on.
Please try to avoid occupying all GPUs of the same type for several hours or multiple days, as this would block other users needing the same
type of GPU.
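Whether a pending job is being held back by these limits can be checked from its pending reason, for example:

```bash
# Show your own jobs with job ID, partition, state and pending reason;
# reasons such as QOSMaxCpuPerUser indicate that a per-user limit was hit
squeue -u $USER -o "%.12i %.12P %.10T %.30r"
```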
## Advanced Slurm configuration
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm, and **Merlin6** has been configured with the same batch system.
Slurm is installed in a **multi-clustered** configuration, which allows multiple clusters to be integrated into the same batch system.
To understand the Slurm configuration of the cluster, it may be useful to check the following files:
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
The configuration files above, which can be found on the login nodes, correspond exclusively to the **merlin6** cluster.
Configuration files for the old **merlin5** cluster or for the **gmerlin6** cluster must be checked directly on any of the **merlin5** or **gmerlin6** computing nodes (for example, by logging in to one of the nodes while a job or an active allocation is running).
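Thanks to the multi-cluster setup, the state and the live configuration of the other clusters can also be queried from the **merlin6** login nodes with the standard `--clusters` option; a couple of illustrative examples:

```bash
# Show the partitions and node states of the other clusters
sinfo --clusters=merlin5,gmerlin6

# Print the running Slurm configuration of the gmerlin6 cluster
scontrol --clusters=gmerlin6 show config | less
```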