Fixed gwendolen information
@@ -67,25 +67,28 @@ Users need to ensure that the public **`merlin`** account is specified. No speci
 This is mostly needed by users who have multiple Slurm accounts and may mistakenly specify a different account.
 
 ```bash
-#SBATCH --account=merlin # Possible values: merlin, gwendolen_public, gwendolen
+#SBATCH --account=merlin # Possible values: merlin, gwendolen
 ```
 Not all accounts can be used on all partitions. This is summarized in the table below:
 
-| Slurm Account | Slurm Partitions |
-|:-------------------: | :------------------: |
-| **`merlin`** | **`gpu`**,`gpu-short` |
-| `gwendolen_public` | `gwendolen` |
-| `gwendolen` | `gwendolen` |
+| Slurm Account | Slurm Partitions | Special QoS |
+|:-------------------: | :------------------: | :---------------------------------: |
+| **`merlin`** | **`gpu`**,`gpu-short` | |
+| `gwendolen` | `gwendolen` | `gwendolen`, **`gwendolen_public`** |
 
-By default, all users belong to the `merlin` and `gwendolen_public` Slurm accounts. `gwendolen` is a restricted account.
+By default, all users belong to the `merlin` and `gwendolen` Slurm accounts.
 
-#### The 'gwendolen' accounts
+Users only need to specify the `gwendolen` account when using the `gwendolen` partition; otherwise, specifying an account is not needed (it always defaults to `merlin`). `gwendolen` is a special account, with two different **QoS** granting different types of access (see details below).
 
-For running jobs in the **`gwendolen`** partition, users must specify one of the `gwendolen_public` or `gwendolen` accounts.
-The `merlin` account is not allowed to use the `gwendolen` partition.
+#### The 'gwendolen' account
 
-* The **`gwendolen_public`** can be used by any Merlin user, and provides restricted resource access to **`gwendolen`**.
-* The **`gwendolen`** is restricted to a set of users, and provides full access to **`gwendolen`**.
+For running jobs in the **`gwendolen`** partition, users must specify the `gwendolen` account. The `merlin` account is not allowed to use the `gwendolen` partition.
+
+In addition, Slurm has the concept of **QoS**, which stands for **Quality of Service**. The **`gwendolen`** account has two different QoS configured:
+* The **QoS** **`gwendolen_public`** is set by default for all Merlin users. It restricts the amount of resources that can be used on **Gwendolen**. For further information about restrictions, please read the ['User and Job Limits'](/gmerlin6/slurm-configuration.html#user-and-job-limits) documentation.
+* The **QoS** **`gwendolen`** provides full access to **`gwendolen`**; however, it is restricted to a set of users belonging to the **`unx-gwendolen`** Unix group.
+
+Users don't need to specify any QoS; however, they need to be aware of resource restrictions. If you belong to one of the projects allowed to use **Gwendolen** without restrictions, please request access to the **`unx-gwendolen`** Unix group through [PSI Service Now](https://psi.service-now.com/).
 
 ### Slurm GPU specific options
 
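A minimal job-script sketch for the updated `gwendolen` setup: only the account is set, and no QoS is requested, since the default QoS is applied automatically. The resource values are assumptions chosen to stay within the public per-job limits quoted in this commit (2 GPUs, 32 CPUs, 121875MB), and the workload line is a hypothetical placeholder.

```shell
#!/bin/bash
#SBATCH --partition=gwendolen   # the DGX A100 partition
#SBATCH --account=gwendolen     # required here; merlin is not allowed on gwendolen
#SBATCH --gres=gpu:2            # assumption: public per-job GPU limit
#SBATCH --cpus-per-task=32      # assumption: public per-job CPU limit
#SBATCH --mem=100G              # assumption: below the 121875MB public per-job limit
# No --qos line: per the text, users don't need to specify a QoS;
# gwendolen_public applies by default unless you are in unx-gwendolen.

# Print the account and QoS Slurm attached to this job.
# Both show "unset" when the script runs outside a Slurm job.
show_job_accounting() {
  echo "account=${SLURM_JOB_ACCOUNT:-unset} qos=${SLURM_JOB_QOS:-unset}"
}

show_job_accounting
# srun ./my_gpu_program   # hypothetical workload, replace with your own
```

Printing the assigned account and QoS at the start of the job output makes it easy to see afterwards whether a job ran with `gwendolen_public` or with the unrestricted `gwendolen` QoS.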
@@ -201,12 +204,12 @@ These are limits applying to a single job. In other words, there is a maximum of
 Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below with the format: `SlurmQoS(limits)`
 (possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):
 
 | Partition | Slurm Account | Mon-Sun 0h-24h |
-|:-------------:| :----------------: | :------------------------------------------: |
+|:-------------:| :------------: | :------------------------------------------: |
 | **gpu** | **`merlin`** | gpu_week(cpu=40,gres/gpu=8,mem=200G) |
 | **gpu-short** | **`merlin`** | gpu_week(cpu=40,gres/gpu=8,mem=200G) |
-| **gwendolen** | `gwendolen_public` | gwendolen_public(cpu=32,gres/gpu=2,mem=200G) |
-| **gwendolen** | `gwendolen` | No limits, full access granted |
+| **gwendolen** | `gwendolen` | gwendolen_public(cpu=32,gres/gpu=2,mem=200G) |
+| **gwendolen** | `gwendolen` | gwendolen(No limits, full access granted) |
 
 * With the limits in the public `gpu` and `gpu-short` partitions, a single job using the `merlin` account
 (default account) cannot use more than 40 CPUs, 8 GPUs or 200GB of memory.
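Each `SlurmQoS(limits)` entry above caps CPUs, GPUs and memory per job, and a job exceeding any one of them will not run as requested. The sketch below is a hypothetical pre-submission helper (not a Merlin or Slurm tool) that checks a request against a limit string copied from the table. It assumes Slurm's binary memory units (1G = 1024M); the authoritative values remain those reported by `sacctmgr show qos`.

```shell
#!/bin/bash
# Hypothetical helper, not part of Slurm or Merlin: check a job request against
# a limit string in the table's format, e.g. "cpu=40,gres/gpu=8,mem=200G".

# Convert a Slurm-style memory size (e.g. 200G, 121875M) to megabytes.
# Assumes binary units (1G = 1024M) and MB when no suffix is given.
to_mb() {
  case $1 in
    *T) echo $(( ${1%T} * 1024 * 1024 )) ;;
    *G) echo $(( ${1%G} * 1024 )) ;;
    *M) echo $(( ${1%M} )) ;;
    *)  echo $(( $1 )) ;;
  esac
}

# Usage: fits_qos_limits LIMITS CPUS GPUS MEM
# Returns 0 if the request fits; otherwise prints the first exceeded limit and returns 1.
fits_qos_limits() {
  local limits=$1 cpus=$2 gpus=$3 mem_mb
  mem_mb=$(to_mb "$4")
  local IFS=','            # split the limit string on commas
  local item key max
  for item in $limits; do
    key=${item%%=*}        # e.g. "cpu", "gres/gpu", "mem"
    max=${item#*=}         # e.g. "40", "8", "200G"
    case $key in
      cpu)
        [ "$cpus" -le "$max" ] || { echo "exceeds cpu=$max"; return 1; } ;;
      gres/gpu)
        [ "$gpus" -le "$max" ] || { echo "exceeds gres/gpu=$max"; return 1; } ;;
      mem)
        [ "$mem_mb" -le "$(to_mb "$max")" ] || { echo "exceeds mem=$max"; return 1; } ;;
    esac
  done
  return 0
}

# Example: fits_qos_limits "cpu=40,gres/gpu=8,mem=200G" 32 2 100G && echo "fits gpu_week"
```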
@@ -216,8 +219,8 @@ instance in the CPU **daily** partition), the job needs to be cancelled, and the
 must be adapted according to the above resource limits.
 
 * The **gwendolen** partition is a special partition with an **[NVIDIA DGX A100](https://www.nvidia.com/en-us/data-center/dgx-a100/)** machine.
-Public access is possible through the `gwendolen_public` account, however is limited to 2 GPUs per job, 32 CPUs and 121875MB of memory).
-For full access, the `gwendolen` account is needed, and this is restricted to a set of users.
+Public access is possible through the `gwendolen` account; however, it is limited to 2 GPUs, 32 CPUs and 121875MB of memory per job.
+For full access, the `gwendolen` account with the `gwendolen` **QoS** (Quality of Service) is needed, and this is restricted to a set of users (belonging to the **`unx-gwendolen`** Unix group). Any other user gets the **`gwendolen_public`** QoS by default, which restricts the resources available on Gwendolen.
 
 ### Per user limits for GPU partitions
 
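Per the text above, full access depends on membership in the `unx-gwendolen` Unix group. A quick local check is sketched below using the standard `id -nG` command; the helper name is hypothetical, and Slurm's own view of your accounts and QoS can be confirmed with `sacctmgr show assoc`.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the current user belongs to the given Unix group.
# `id -nG` prints the names of all groups of the current user, space-separated.
in_unix_group() {
  local group
  for group in $(id -nG); do
    [ "$group" = "$1" ] && return 0
  done
  return 1
}

if in_unix_group unx-gwendolen; then
  echo "unx-gwendolen member: full gwendolen QoS expected"
else
  echo "not in unx-gwendolen: gwendolen_public limits apply by default"
fi
# Slurm's own view of your accounts and QoS (run on a Slurm node):
#   sacctmgr show assoc where user="$USER" format=Account,QOS
```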
@@ -229,8 +232,8 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
 |:-------------:| :----------------: | :---------------------------------------------: |
 | **gpu** | **`merlin`** | gpu_week(cpu=80,gres/gpu=16,mem=400G) |
 | **gpu-short** | **`merlin`** | gpu_week(cpu=80,gres/gpu=16,mem=400G) |
-| **gwendolen** | `gwendolen_public` | gwendolen_public(cpu=64,gres/gpu=4,mem=243750M) |
-| **gwendolen** | `gwendolen` | No limits, full access granted |
+| **gwendolen** | `gwendolen` | gwendolen_public(cpu=64,gres/gpu=4,mem=243750M) |
+| **gwendolen** | `gwendolen` | gwendolen(No limits, full access granted) |
 
 * With the limits in the public `gpu` and `gpu-short` partitions, a single user cannot use more than 80 CPUs, 16 GPUs or 400GB of memory.
 Jobs submitted by a user who already exceeds these limits will remain pending in the queue with the reason **`QOSMax[Cpu|GRES|Mem]PerUser`**.
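Jobs held back by the per-user limits can be spotted from their pending reason. The sketch below filters `squeue` output for such reasons. The `squeue` options (`-h` for no header, `-t PENDING`, `-o "%i %r"` for job ID and reason) are standard. The helper name is hypothetical, and the pattern is deliberately loose so it also matches variants of the `QOSMax...PerUser` reason names.

```shell
#!/bin/bash
# Hypothetical helper: read "JOBID REASON" lines on stdin and print the IDs of
# jobs whose reason starts with QOSMax and contains PerUser.
# Parentheses around the reason, if present, are stripped before matching.
per_user_limited_jobs() {
  awk '{ r = $2; gsub(/[()]/, "", r); if (r ~ /^QOSMax.*PerUser/) print $1 }'
}

# On a Slurm login node:
#   squeue -u "$USER" -h -t PENDING -o "%i %r" | per_user_limited_jobs
```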