initial formatting changes complete

2026-01-06 16:40:15 +01:00
parent f58c1f57b8
commit 7db5d0fd05
81 changed files with 805 additions and 1112 deletions


@@ -1,12 +1,4 @@
---
title: Slurm cluster 'gmerlin6'
#tags:
keywords: configuration, partitions, node definition, gmerlin6
last_updated: 29 January 2021
summary: "This document describes a summary of the Slurm 'configuration."
sidebar: merlin6_sidebar
permalink: /gmerlin6/slurm-configuration.html
---
# Slurm cluster 'gmerlin6'
This documentation describes the basic Slurm configuration and the options needed to run jobs in the GPU cluster.
@@ -49,30 +41,35 @@ Users might need to specify the Slurm partition. If no partition is specified, i
The table below shows all partitions available to users:
| GPU Partition | Default Time | Max Time | PriorityJobFactor | PriorityTier |
|:---------------------: | :----------: | :--------: | :-----------------: | :--------------: |
| `gpu` | 1 day | 1 week | 1 | 1 |
| `gpu-short` | 2 hours | 2 hours | 1000 | 500 |
| `gwendolen` | 30 minutes | 2 hours | 1000 | 1000 |
| `gwendolen-long` | 30 minutes | 8 hours | 1 | 1 |
The **PriorityJobFactor** value will be added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or mainly **fair share** might affect that decision). For the GPU
partitions, Slurm will also first attempt to allocate jobs on partitions with higher priority over partitions with lower priority.
Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with a lower **PriorityTier** value
and, if possible, they will preempt running jobs from partitions with lower **PriorityTier** values.
**gwendolen-long** is a special partition which is enabled during non-working
hours only. As of **Nov 2023**, the current policy is to disable this partition
from Mon to Fri, from 1am to 5pm. However, jobs can be submitted anytime, but
can only be scheduled outside this time range.
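In a batch script, the partition is selected with the `--partition` option; a minimal sketch, using one of the partitions from the table above:
```bash
#SBATCH --partition=gpu-short   # Possible values: gpu, gpu-short, gwendolen, gwendolen-long
```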
### Merlin6 GPU Accounts
Users need to ensure that the public **`merlin`** account is used. If no account option is specified, jobs default to this account.
This is mostly relevant for users with multiple Slurm accounts, who may specify a different account by mistake.
```bash
#SBATCH --account=merlin # Possible values: merlin, gwendolen
```
Not all accounts can be used on all partitions. This is summarized in the table below:
| Slurm Account | Slurm Partitions |
@@ -82,14 +79,20 @@ Not all the accounts can be used on all partitions. This is resumed in the table
By default, all users belong to the `merlin` Slurm account, and jobs are submitted to the `gpu` partition when no partition is defined.
Users only need to specify the `gwendolen` account when using the `gwendolen` or `gwendolen-long` partitions; otherwise specifying an account is not needed (it will always default to `merlin`).
#### The 'gwendolen' account
For running jobs in the **`gwendolen`/`gwendolen-long`** partitions, users must
specify the **`gwendolen`** account. The `merlin` account is not allowed to
use the Gwendolen partitions.
Gwendolen is restricted to a set of users belonging to the **`unx-gwendolen`**
Unix group. If you belong to a project allowed to use **Gwendolen**, or you
would like to have access to it, please request access to the
**`unx-gwendolen`** Unix group through [PSI Service
Now](https://psi.service-now.com/): the request will be redirected to the
person responsible for the project (Andreas Adelmann).
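As a minimal sketch (the requested GPU count is illustrative), a batch script targeting Gwendolen would combine the account and partition options:
```bash
#!/bin/bash
#SBATCH --partition=gwendolen   # Or gwendolen-long, for jobs scheduled outside working hours
#SBATCH --account=gwendolen     # Required: the merlin account is not allowed on these partitions
#SBATCH --gpus=1                # Illustrative value
```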
### Slurm GPU specific options
@@ -119,16 +122,20 @@ This is detailed in the below table.
#### Constraint / Features
Instead of specifying the GPU **type**, users may sometimes need to **specify
the GPU by the amount of memory available in the GPU** card itself.
This has been defined in Slurm with **Features**, a tag which defines
the GPU memory for the different GPU cards. Users can specify which GPU memory
size needs to be used with the `--constraint` option. In that case, notice that
*in many cases there is no need to specify `[<type>:]`* in the `--gpus`
option.
```bash
#SBATCH --constraint=<Feature> # Possible values: gpumem_8gb, gpumem_11gb, gpumem_24gb, gpumem_40gb
```
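For example, a minimal sketch combining a GPU count with a memory constraint (the feature name is taken from the list above; the count is illustrative):
```bash
#SBATCH --gpus=2                  # The [<type>:] prefix can usually be omitted when constraining by memory
#SBATCH --constraint=gpumem_40gb  # Any GPU model providing 40GB of memory is eligible
```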
The table below shows the available **Features** and which GPU card models and GPU nodes they belong to:
<table>
<thead>
@@ -172,6 +179,7 @@ The table below shows the available **Features** and which GPU card models and G
#### Other GPU options
Alternative Slurm options for GPU-based jobs are available. Please refer to the **man** pages
of each Slurm command for further information (`man salloc`, `man sbatch`, `man srun`).
Below are listed the most common settings:
@@ -191,8 +199,9 @@ Please, notice that when defining `[<type>:]` once, then all other options must
#### Dealing with Hyper-Threading
The **`gmerlin6`** cluster contains the partitions `gwendolen` and `gwendolen-long`, which have a node with Hyper-Threading enabled.
In that case, one should always specify whether to use Hyper-Threading or not. If not defined, Slurm will
generally use it (exceptions apply). For this machine, HT is generally recommended.
@@ -200,14 +209,14 @@ generally use it (exceptions apply). For this machine, generally HT is recommend
```bash
#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.
```
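As a small illustration (all values are made up), a Gwendolen job that explicitly opts in to Hyper-Threading could combine these options:
```bash
#SBATCH --partition=gwendolen
#SBATCH --account=gwendolen
#SBATCH --hint=multithread    # Use the hardware threads (HT), generally recommended on this node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32    # Illustrative value; counts logical CPUs (threads) when HT is used
```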
## User and job limits
The GPU cluster enforces some basic user and job limits to ensure that a single user cannot abuse the resources and to guarantee fair usage of the cluster.
The limits are described below.
### Per job limits
These are limits applying to a single job. In other words, there is a maximum of resources a single job can use.
Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below with the format: `SlurmQoS(limits)`
(possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):
@@ -218,25 +227,29 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
| **gwendolen** | `gwendolen` | No limits |
| **gwendolen-long** | `gwendolen` | No limits, active from 9pm to 5:30am |
* With the limits in the public `gpu` and `gpu-short` partitions, a single job using the `merlin` account
(default account) can not use more than 40 CPUs, more than 8 GPUs or more than 200GB.
Any job exceeding such limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerJob`**.
Since there is no additional QoS temporarily overriding these job limits during the week (as happens, for
instance, in the CPU **daily** partition), the job needs to be cancelled, and the requested resources
must be adapted according to the above resource limits.
* The **gwendolen** and **gwendolen-long** partitions are two special partitions for a **[NVIDIA DGX A100](https://www.nvidia.com/en-us/data-center/dgx-a100/)** machine.
Only users belonging to the **`unx-gwendolen`** Unix group can run in these partitions. No limits are applied (machine resources can be completely used).
* The **`gwendolen-long`** partition is available 24h. However,
* from 5:30am to 9pm the partition is `down` (jobs can be submitted, but can not run until the partition is set to `active`).
* from 9pm to 5:30am jobs are allowed to run (partition is set to `active`).
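To check whether a pending job is being held back by one of these per-job limits, the pending reason can be inspected with `squeue` (a sketch; the format string is illustrative):
```bash
# Show pending jobs of the current user together with the pending reason
# (e.g. QOSMaxCpuPerJob, QOSMaxGRESPerJob, QOSMaxMemPerJob)
squeue --user=$USER --states=PENDING --format="%.10i %.15P %.20j %.8T %r"
```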
### Per user limits for GPU partitions
These limits apply exclusively to users. In other words, there is a maximum of
resources a single user can use. Limits are defined using QoS, and this is
usually set at the partition level. Limits are described in the table below
with the format: `SlurmQoS(limits)` (possible `SlurmQoS` values can be listed
with the command `sacctmgr show qos`):
| Partition | Slurm Account | Mon-Sun 0h-24h |
|:------------------:| :----------------: | :---------------------------------------------: |
@@ -245,13 +258,18 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
| **gwendolen** | `gwendolen` | No limits |
| **gwendolen-long** | `gwendolen` | No limits, active from 9pm to 5:30am |
* With the limits in the public `gpu` and `gpu-short` partitions, a single user can not use more than 80 CPUs, more than 16 GPUs or more than 400GB.
Jobs sent by any user already exceeding such limits will stay in the queue
with the message **`QOSMax[Cpu|GRES|Mem]PerUser`**. In that case, jobs will
wait in the queue until some of the running resources are freed.
* Notice that user limits are wider than job limits. In that way, a user can run up to two 8-GPU jobs, or up to four 4-GPU jobs, etc.
!!! warning
    Please try to avoid occupying all GPUs of the same type for several hours or
    multiple days, otherwise it would block other users needing the same type of
    GPU.
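The QoS limits mentioned above can be inspected directly with `sacctmgr` (a sketch; the exact format field names may differ slightly between Slurm versions):
```bash
# List the QoS definitions together with their per-job and per-user TRES limits
sacctmgr show qos format=Name%20,MaxTRES%30,MaxTRESPU%30
```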
## Advanced Slurm configuration
@@ -265,4 +283,8 @@ For understanding the Slurm configuration setup in the cluster, sometimes may be
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes, and is also propagated to the login nodes for user read access.
The configuration files listed above, as found on the login nodes, correspond exclusively to the **merlin6** cluster configuration files.
Configuration files for the old **merlin5** cluster or for the **gmerlin6**
cluster must be checked directly on any of the **merlin5** or **gmerlin6**
computing nodes (for example, by logging in to one of the nodes while a job or
an active allocation is running).
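As a sketch of how this could look for the **gmerlin6** cluster (partition and account names are taken from this page; resource and time values are illustrative):
```bash
# Run a short job on the gmerlin6 cluster that prints one of its configuration files
srun --clusters=gmerlin6 --partition=gpu-short --account=merlin --gpus=1 --time=00:05:00 \
     cat /etc/slurm/cgroup.conf
```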