Add CPU features information

2022-05-02 17:46:26 +02:00
parent 168dc84226
commit 057d792517


@ -33,10 +33,6 @@ and memory was by default oversubscribed.
{{site.data.alerts.tip}}Always check <b>'/etc/slurm/slurm.conf'</b> for changes in the hardware.
{{site.data.alerts.end}}
## Running jobs in the 'merlin6' cluster
In this chapter we will cover basic settings that users need to specify in order to run jobs in the Merlin6 CPU cluster.
### Merlin6 CPU cluster
To run jobs in the **`merlin6`** cluster, users **can optionally** specify the cluster name in Slurm:
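For example, a minimal batch-script sketch (assuming the standard `--clusters` option; the remaining directives and the `hostname` payload are purely illustrative):
```
#!/bin/bash
#SBATCH --clusters=merlin6    # optionally pin the job to the 'merlin6' cluster
#SBATCH --partition=general   # 'general' is the default partition
#SBATCH --time=01:00:00       # illustrative walltime

srun hostname
```
The same option can also be given on the command line, e.g. `sbatch --clusters=merlin6 myjob.batch` (the script name is hypothetical).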
@ -66,6 +62,7 @@ The following *partitions* (also known as *queues*) are configured in Slurm:
| **hourly** | 1 hour | 1 hour | unlimited | 1000 | 1 | 4000 |
| **asa-general** | 1 hour | 2 weeks | unlimited | 1 | 2 | 3712 |
| **asa-daily** | 1 hour | 1 week | unlimited | 500 | 2 | 3712 |
| **asa-visas** | 1 hour | 90 days | unlimited | 1000 | 4 | 3712 |
| **asa-ansys** | 1 hour | 90 days | unlimited | 1000 | 4 | 15600 |
| **mu3e** | 1 day | 7 days | unlimited | 1000 | 4 | 3712 |
@ -79,7 +76,7 @@ and, if possible, they will preempt running jobs from partitions with lower *Pri
* The **`general`** partition is the **default**. It cannot have more than 50 nodes running jobs.
* For **`daily`** this limitation is extended to 67 nodes.
* For **`hourly`** there are no limits.
* **`asa-general`,`asa-daily`,`asa-ansys`,`asa-visas` and `mu3e`** are **private** partitions, belonging to the different experiments that own the machines. **Access is restricted** in all cases. However, by agreement with the experiments, these nodes are usually also added to the **`hourly`** partition as extra resources for public use.
{{site.data.alerts.tip}}Jobs that would run for less than one day should always be sent to <b>daily</b>, while jobs that would run for less
than one hour should be sent to <b>hourly</b>. This ensures that you have the highest priority over jobs sent to lower-priority partitions,
@ -101,12 +98,13 @@ Not all accounts can be used on all partitions. This is summarized in the table
| Slurm Account | Slurm Partitions |
| :------------------: | :----------------------------------: |
| **<u>merlin</u>** | `hourly`,`daily`, `general` |
| **gfa-asa** | `asa-general`,`asa-daily`,`asa-visas`,`asa-ansys`,`hourly`,`daily`, `general` |
| **mu3e** | `mu3e` |
#### Private accounts
* The *`gfa-asa`* and *`mu3e`* accounts are private accounts. These can be used for accessing dedicated
nodes owned by different departments.
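As an illustrative sketch only (assuming membership in the corresponding group; `--account` and `--partition` are standard Slurm options), a job targeting the `mu3e` machines would request the matching account and partition:
```
#SBATCH --account=mu3e      # private account, access restricted
#SBATCH --partition=mu3e    # partition owned by the experiment
```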
### Slurm CPU specific options
@ -128,7 +126,7 @@ Below are listed the most common settings:
#SBATCH --cpu-bind=[{quiet,verbose},]<type> # only for 'srun' command
```
#### Enabling/Disabling Hyper-Threading
The **`merlin6`** cluster contains nodes with Hyper-Threading enabled. One should always specify
whether to use Hyper-Threading or not. If not defined, Slurm will generally use it (exceptions apply).
@ -138,6 +136,51 @@ whether to use Hyper-Threading or not. If not defined, Slurm will generally use
#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.
```
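For instance, a job that wants one task per physical core and no hardware threads could combine the hint with the standard `--ntasks-per-core` option (a generic sketch, not a site-specific recipe):
```
#SBATCH --hint=nomultithread    # do not use the extra hardware threads
#SBATCH --ntasks-per-core=1     # place one task per physical core
```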
#### Constraint / Features
Slurm allows defining a set of features in the node definition. These can be used to filter and select nodes according to one or more
specific features. For the CPU nodes, the following features are defined:
```
NodeName=merlin-c-[001-024,101-124,201-224] Features=mem_384gb,xeon-gold-6152
NodeName=merlin-c-[301-312] Features=mem_768gb,xeon-gold-6240r
NodeName=merlin-c-[313-318] Features=mem_768gb,xeon-gold-6240r
NodeName=merlin-c-[319-324] Features=mem_384gb,xeon-gold-6240r
```
Therefore, users running on `hourly` can select which type of node they want to use (fat memory vs. regular memory nodes, CPU type).
This is possible by using the Slurm option `--constraint=<feature_name>`.
Examples:
1. Select nodes with 48 cores only (nodes with [2 x Xeon Gold 6240R](https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html)):
```
sbatch --constraint=xeon-gold-6240r ...
```
2. Select nodes with 44 cores only (nodes with [2 x Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html)):
```
sbatch --constraint=xeon-gold-6152 ...
```
3. Select fat memory nodes only:
```
sbatch --constraint=mem_768gb ...
```
4. Select regular memory nodes only:
```
sbatch --constraint=mem_384gb ...
```
5. Select fat memory nodes with 48 cores only:
```
sbatch --constraint=mem_768gb,xeon-gold-6240r ...
```
Detailing exactly which type of node you want to use is important. Therefore, for groups with private accounts (`mu3e`, `gfa-asa`) and for
public users running on the `hourly` partition, *constraining nodes by features is recommended* (see the sketch below). This becomes even more
important with heterogeneous clusters.
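For example, an `hourly` job restricted to the fat memory 48-core nodes (`myjob.batch` is a hypothetical script name):
```
sbatch --partition=hourly --constraint=mem_768gb,xeon-gold-6240r myjob.batch
```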
### User and job limits
In the CPU cluster we provide some limits which basically apply to jobs and users. The idea behind this is to ensure a fair usage of the resources and to