Expanded PModules docs

This commit is contained in:
2021-05-21 18:39:38 +02:00
parent fcfdbf1344
commit 0fd1653938
11 changed files with 219 additions and 69 deletions

@ -50,15 +50,15 @@ The table below summarizes all possible partitions available to users:
| GPU Partition | Default Time | Max Time | PriorityJobFactor\* | PriorityTier\*\* |
|:-----------------: | :----------: | :------: | :-----------------: | :--------------: |
| `gpu` | 1 day | 1 week | 1 | 1 |
| `gpu-short` | 2 hours | 2 hours | 1000 | 500 |
| `gwendolen` | 1 hour | 12 hours | 1000 | 1000 |
\*The **PriorityJobFactor** value is added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** may affect that decision). For the GPU
partitions, Slurm will also attempt to allocate jobs on higher priority partitions before partitions with lower priority.
\*\*Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with a lower *PriorityTier* value
and, if possible, will preempt running jobs from partitions with lower *PriorityTier* values.
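As a sketch of how partition priority plays out in practice, a short test job could be submitted to the high-priority `gpu-short` partition and its priority factors then inspected with `sprio` (the job name and payload below are hypothetical):

```bash
#!/bin/bash
#SBATCH --job-name=prio-test    # hypothetical job name
#SBATCH --partition=gpu-short   # high PriorityJobFactor / PriorityTier partition
#SBATCH --time=00:30:00         # must fit within the 2 hour limit of gpu-short
#SBATCH --gpus=1

# After submission, the per-factor priority breakdown (including the
# PARTITION column mentioned above) can be checked with:
#   sprio -l -j <jobid>
srun hostname
```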
### Merlin6 GPU Accounts
@ -71,11 +71,11 @@ This is mostly needed by users who have multiple Slurm accounts, which may def
```
Not all accounts can be used on all partitions. This is summarized in the table below:
| Slurm Account | Slurm Partitions |
|:-------------------: | :------------------: |
| **`merlin`** | `gpu`, `gpu-short` |
| `gwendolen_public` | `gwendolen` |
| `gwendolen` | `gwendolen` |
By default, all users belong to the `merlin` and `gwendolen_public` Slurm accounts. `gwendolen` is a restricted account.
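For example, to run on the `gwendolen` partition, one of the `gwendolen` accounts has to be selected explicitly (a minimal sketch; the payload is hypothetical):

```bash
#!/bin/bash
#SBATCH --account=gwendolen_public   # account available to all users on gwendolen
#SBATCH --partition=gwendolen
#SBATCH --gpus=1
#SBATCH --time=00:10:00
srun hostname
```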
@ -103,14 +103,61 @@ The GPU type is optional: if left empty, it will try allocating any type of GPU.
The different `[<type>:]` values and the `<number>` of GPUs depend on the node.
This is detailed in the table below.
| Nodes | GPU Type | #GPUs |
|:---------------------: | :-----------------------: | :---: |
| **merlin-g-[001]** | **`geforce_gtx_1080`** | 2 |
| **merlin-g-[002-005]** | **`geforce_gtx_1080`** | 4 |
| **merlin-g-[006-009]** | **`geforce_gtx_1080_ti`** | 4 |
| **merlin-g-[010-013]** | **`geforce_rtx_2080_ti`** | 4 |
| **merlin-g-014** | **`geforce_rtx_2080_ti`** | 8 |
| **merlin-g-100** | **`A100`** | 8 |
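Based on the table above, a job that needs two `geforce_rtx_2080_ti` cards could request them as follows (a sketch; any of the merlin-g-[010-014] nodes could satisfy it):

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus=geforce_rtx_2080_ti:2   # [<type>:]<number>; omit the type to accept any GPU
#SBATCH --time=01:00:00
srun hostname
```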
#### Constraint / Features
Instead of specifying the GPU **type**, users sometimes need to **select a GPU by the amount of memory available on the GPU** card itself.
This is defined in Slurm with **Features**: a tag that encodes the GPU memory size of the different GPU card models.
Users can select the required GPU memory size with the `--constraint` option. In that case, note that *in many cases
there is no need to specify `[<type>:]`* in the `--gpus` option.
```bash
#SBATCH --constraint=<Feature> # Possible values: gpumem_8gb, gpumem_11gb, gpumem_40gb
```
The table below shows the available **Features** and which GPU card models and GPU nodes they belong to:
<table>
<thead>
<tr>
      <th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="3">Merlin6 GPU Computing Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Nodes</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">GPU Type</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Feature</th>
</tr>
</thead>
<tbody>
    <tr style="vertical-align:middle;text-align:center;">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-[001-005]</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`geforce_gtx_1080`</td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>`gpumem_8gb`</b></td>
</tr>
    <tr style="vertical-align:middle;text-align:center;">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-[006-009]</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`geforce_gtx_1080_ti`</td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="2"><b>`gpumem_11gb`</b></td>
</tr>
    <tr style="vertical-align:middle;text-align:center;">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-[010-014]</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`geforce_rtx_2080_ti`</td>
</tr>
    <tr style="vertical-align:middle;text-align:center;">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-100</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`A100`</td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>`gpumem_40gb`</b></td>
</tr>
</tbody>
</table>
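For instance, to get a card with 11 GB of GPU memory regardless of its exact model (`geforce_gtx_1080_ti` or `geforce_rtx_2080_ti`), the feature can be used instead of the type (a minimal sketch; the payload is hypothetical):

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus=1                   # no [<type>:] prefix needed here
#SBATCH --constraint=gpumem_11gb   # matches merlin-g-[006-014]
#SBATCH --time=01:00:00
srun hostname
```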
#### Other GPU options
@ -120,14 +167,14 @@ Below are listed the most common settings:
```bash
#SBATCH --hint=[no]multithread
#SBATCH --ntasks=<ntasks>
#SBATCH --ntasks-per-gpu=<ntasks>
#SBATCH --mem-per-gpu=<size[units]>
#SBATCH --cpus-per-gpu=<ncpus>
#SBATCH --gpus-per-node=[<type>:]<number>
#SBATCH --gpus-per-socket=[<type>:]<number>
#SBATCH --gpus-per-task=[<type>:]<number>
#SBATCH --gpu-bind=[verbose,]<type>
```
Please note that if `[<type>:]` is specified in one of these options, all other GPU options must specify it as well.
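Putting this together, a multi-task job using a consistent `[<type>:]` across its GPU options might look like the following sketch (the resource numbers and application name are illustrative only):

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --ntasks=4
#SBATCH --gpus-per-task=geforce_gtx_1080:1   # type given here...
#SBATCH --gpus-per-node=geforce_gtx_1080:4   # ...so it must be given here too
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=8000M
#SBATCH --gpu-bind=verbose,single:1          # bind each task to its own GPU, verbosely
srun my_gpu_app                              # hypothetical application
```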