Fixed gwendolen information

This commit is contained in:
2021-05-25 16:10:52 +02:00
parent f98ee30047
commit a6d8f4541a


@ -67,25 +67,28 @@ Users need to ensure that the public **`merlin`** account is specified. No speci
This is mostly needed by users who have multiple Slurm accounts, and who may specify a different account by mistake.
```bash
#SBATCH --account=merlin   # Possible values: merlin, gwendolen
```
Not all accounts can be used on all partitions. This is summarized in the table below:
| Slurm Account        | Slurm Partitions       | Special QoS                         |
|:-------------------: | :--------------------: | :---------------------------------: |
| **`merlin`**         | **`gpu`**, `gpu-short` |                                     |
| `gwendolen`          | `gwendolen`            | `gwendolen`, **`gwendolen_public`** |
By default, all users belong to the `merlin` and `gwendolen` Slurm accounts.
Users only need to specify the `gwendolen` account when using the `gwendolen` partition; otherwise, specifying an account is not needed (it will always default to `merlin`). `gwendolen` is a special account, with two different **QoS** granting different types of access (see details below).
#### The 'gwendolen' account
For running jobs in the **`gwendolen`** partition, users must specify the `gwendolen` account. The `merlin` account is not allowed to use the `gwendolen` partition.
In addition, in Slurm there is the concept of **QoS**, which stands for **Quality of Service**. The **`gwendolen`** account has two different QoS configured:
* The **QoS** **`gwendolen_public`** is set by default for all Merlin users. This restricts the resources that can be used on **Gwendolen**. For further information about restrictions, please read the ['User and Job Limits'](/gmerlin6/slurm-configuration.html#user-and-job-limits) documentation.
* The **QoS** **`gwendolen`** provides full access to **`gwendolen`**, however this is restricted to a set of users belonging to the **`unx-gwendolen`** Unix group.
Users don't need to specify any QoS; however, they need to be aware of the resource restrictions. If you belong to one of the projects which is allowed to use **Gwendolen** without restrictions, please request access to the **`unx-gwendolen`** Unix group through [PSI Service Now](https://psi.service-now.com/).
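Putting the account and QoS rules above together, a minimal job script for the **`gwendolen`** partition could look like the sketch below. The resource numbers and the `nvidia-smi` step are illustrative examples, not site-mandated values:

```bash
#!/bin/bash
#SBATCH --partition=gwendolen   # Gwendolen is only reachable through this partition
#SBATCH --account=gwendolen     # Required: the 'merlin' account is rejected here
#SBATCH --gres=gpu:2            # Within the 2 GPUs per job of the 'gwendolen_public' QoS
#SBATCH --cpus-per-task=32      # Within the 32 CPUs per job of the public QoS
#SBATCH --time=01:00:00         # Example walltime

# No '--qos' option is needed: Slurm applies 'gwendolen_public' by default,
# or 'gwendolen' for members of the 'unx-gwendolen' Unix group.
srun nvidia-smi
```

Members of **`unx-gwendolen`** can use the same script and simply request more resources; the `gwendolen` QoS is applied to them automatically.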
### Slurm GPU specific options
@ -201,12 +204,12 @@ These are limits applying to a single job. In other words, there is a maximum of
Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below with the format: `SlurmQoS(limits)`
(possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):
| Partition     | Slurm Account  | Mon-Sun 0h-24h                               |
|:-------------:| :------------: | :------------------------------------------: |
| **gpu**       | **`merlin`**   | gpu_week(cpu=40,gres/gpu=8,mem=200G)         |
| **gpu-short** | **`merlin`**   | gpu_week(cpu=40,gres/gpu=8,mem=200G)         |
| **gwendolen** | `gwendolen`    | gwendolen_public(cpu=32,gres/gpu=2,mem=200G) |
| **gwendolen** | `gwendolen`    | gwendolen(No limits, full access granted)    |
* With the limits in the public `gpu` and `gpu-short` partitions, a single job using the `merlin` account
(default account) can not use more than 40 CPUs, more than 8 GPUs, or more than 200GB of memory.
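The exact per-job limits encoded in these QoS can be inspected on the cluster itself. As a sketch, using standard `sacctmgr` format options (the field widths are just display choices):

```bash
# List the QoS used on the GPU partitions together with their TRES limits.
# Requires access to the Merlin Slurm controller.
sacctmgr show qos gpu_week,gwendolen_public,gwendolen \
    format=Name%20,MaxTRES%40,MaxTRESPerUser%40
```

Here `MaxTRES` holds the per-job limits and `MaxTRESPerUser` the per-user limits described in the next section.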
@ -216,8 +219,8 @@ instance in the CPU **daily** partition), the job needs to be cancelled, and the
must be adapted according to the above resource limits.
* The **gwendolen** partition is a special partition with a **[NVIDIA DGX A100](https://www.nvidia.com/en-us/data-center/dgx-a100/)** machine.
Public access is possible through the `gwendolen` account; however, this is limited to 2 GPUs, 32 CPUs and 121875MB of memory per job.
For full access, the `gwendolen` account with the `gwendolen` **QoS** (Quality of Service) is needed, and this is restricted to a set of users belonging to the **`unx-gwendolen`** Unix group. Any other user will have by default the **`gwendolen_public`** QoS, which restricts resources in Gwendolen.
### Per user limits for GPU partitions
@ -229,8 +232,8 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
|:-------------:| :----------------: | :---------------------------------------------: |
| **gpu** | **`merlin`** | gpu_week(cpu=80,gres/gpu=16,mem=400G) |
| **gpu-short** | **`merlin`** | gpu_week(cpu=80,gres/gpu=16,mem=400G) |
| **gwendolen** | `gwendolen`        | gwendolen_public(cpu=64,gres/gpu=4,mem=243750M) |
| **gwendolen** | `gwendolen`        | gwendolen(No limits, full access granted)       |
* With the limits in the public `gpu` and `gpu-short` partitions, a single user can not use more than 80 CPUs, more than 16 GPUs, or more than 400GB of memory.
Jobs sent by any user already exceeding such limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerUser`**.
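When a job is held back by these per-user limits, the reason is visible in the queue. For example, using standard `squeue` format specifiers (`%r` prints the scheduler's reason):

```bash
# Show your own jobs with their state and the scheduler's reason;
# jobs blocked by per-user QoS limits report reasons such as
# QOSMaxCpuPerUser, QOSMaxGRESPerUser or QOSMaxMemPerUser.
squeue --user="$USER" --format="%.12i %.12P %.10T %.30r"
```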