Add CPU features information
{{site.data.alerts.tip}}Always check <b>'/etc/slurm/slurm.conf'</b> for changes in the hardware.
{{site.data.alerts.end}}
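
For instance, the node and partition definitions can be inspected directly on a login node (a minimal sketch; any standard text tool would do):

```
grep -E 'NodeName|PartitionName' /etc/slurm/slurm.conf
```
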
## Running jobs in the 'merlin6' cluster
In this chapter we will cover basic settings that users need to specify in order to run jobs in the Merlin6 CPU cluster.
### Merlin6 CPU cluster
To run jobs in the **`merlin6`** cluster, users **can optionally** specify the cluster name in Slurm:
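
For example, as a sketch using the standard Slurm `--clusters` option (the same value can also be passed on the `sbatch`/`srun` command line):

```
#SBATCH --clusters=merlin6
```
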
The following *partitions* (also known as *queues*) are configured in Slurm:
| **hourly** | 1 hour | 1 hour | unlimited | 1000 | 1 | 4000 |
| **asa-general** | 1 hour | 2 weeks | unlimited | 1 | 2 | 3712 |
| **asa-daily** | 1 hour | 1 week | unlimited | 500 | 2 | 3712 |
| **asa-visas** | 1 hour | 90 days | unlimited | 1000 | 4 | 3712 |
| **asa-ansys** | 1 hour | 90 days | unlimited | 1000 | 4 | 15600 |
| **mu3e** | 1 day | 7 days | unlimited | 1000 | 4 | 3712 |
* The **`general`** partition is the **default**. It cannot have more than 50 nodes running jobs.
* For **`daily`** this limitation is extended to 67 nodes.
* For **`hourly`** there are no limits.
* **`asa-general`, `asa-daily`, `asa-ansys`, `asa-visas` and `mu3e`** are **private** partitions, belonging to the different experiments that own the machines. **Access is restricted** in all cases. However, by agreement with the experiments, their nodes are usually added to the **`hourly`** partition as extra public resources.
{{site.data.alerts.tip}}Jobs that will run for less than one day should always be sent to <b>daily</b>, while jobs that will run for less
than one hour should be sent to <b>hourly</b>. This ensures that your jobs have a higher priority than jobs sent to partitions with lower priority,
Not all the accounts can be used on all partitions. This is summarized in the table below:
| Slurm Account | Slurm Partitions |
| :------------------: | :----------------------------------: |
| **<u>merlin</u>** | `hourly`,`daily`, `general` |
| **gfa-asa** | `asa-general`,`asa-daily`,`asa-visas`,`asa-ansys`,`hourly`,`daily`, `general` |
| **mu3e** | `mu3e` |
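
As an illustration, a job using the `merlin` account on one of its allowed partitions could be submitted with options along these lines (a sketch based on the tables above):

```
#SBATCH --clusters=merlin6
#SBATCH --partition=daily
#SBATCH --account=merlin
```
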
#### Private accounts
* The *`gfa-asa`* and *`mu3e`* accounts are private accounts. They can be used for accessing dedicated nodes owned by different departments, as in the example below.
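
For instance, a member of the mu3e experiment would combine the private account with the corresponding partition (a sketch; `myjob.sh` is a placeholder script name):

```
sbatch --account=mu3e --partition=mu3e myjob.sh
```
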
### Slurm CPU specific options
Below are listed the most common settings:
```
#SBATCH --cpu-bind=[{quiet,verbose},]<type> # only for 'srun' command
```
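
As a usage sketch, CPU binding can be requested for a job step and reported verbosely (`./myapp` is a placeholder binary):

```
srun --cpu-bind=verbose,cores ./myapp
```
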
#### Enabling/Disabling Hyper-Threading
The **`merlin6`** cluster contains nodes with Hyper-Threading enabled. One should always specify
whether to use Hyper-Threading or not. If not defined, Slurm will generally use it (exceptions apply).
```
#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.
```
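
Putting this together, a job header that disables Hyper-Threading and runs one task per physical core might look like the following sketch (node, partition and task count are illustrative only; 44 corresponds to the Xeon Gold 6152 nodes mentioned below):

```
#SBATCH --partition=daily
#SBATCH --nodes=1
#SBATCH --ntasks=44
#SBATCH --hint=nomultithread # one task per physical core, no Hyper-Threading
```
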
#### Constraint / Features
Slurm allows a set of features to be defined in the node definition. These can be used to filter and select nodes according to one or more
specific features. For the CPU nodes, the following features are defined:
```
NodeName=merlin-c-[001-024,101-124,201-224] Features=mem_384gb,xeon-gold-6152
NodeName=merlin-c-[301-312] Features=mem_768gb,xeon-gold-6240r
NodeName=merlin-c-[313-318] Features=mem_768gb,xeon-gold-6240r
NodeName=merlin-c-[319-324] Features=mem_384gb,xeon-gold-6240r
```
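
The features assigned to each node can also be queried at any time, for instance with standard `sinfo` format options (`%f` prints the feature list; shown here as a sketch):

```
sinfo --clusters=merlin6 --format="%35N %10c %10m %40f"
```
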
Therefore, users running on `hourly` can select which nodes they want to use (fat memory nodes vs. regular memory nodes, CPU type).
This is possible by using the option `--constraint=<feature_name>` in Slurm.
Examples:
1. Select nodes with 48 cores only (nodes with [2 x Xeon Gold 6240R](https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html)):
```
sbatch --constraint=xeon-gold-6240r ...
```
2. Select nodes with 44 cores only (nodes with [2 x Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html)):
```
sbatch --constraint=xeon-gold-6152 ...
```
3. Select fat memory nodes only:
```
sbatch --constraint=mem_768gb ...
```
4. Select regular memory nodes only:
```
sbatch --constraint=mem_384gb ...
```
5. Select fat memory nodes with 48 cores only:
```
sbatch --constraint=mem_768gb,xeon-gold-6240r ...
```
Detailing exactly which type of nodes you want to use is important. Therefore, for groups with private accounts (`mu3e`, `gfa-asa`) or for
public users running on the `hourly` partition, *constraining nodes by features is recommended*. This becomes even more important for
heterogeneous clusters.
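
For example, a public user who needs the fat memory nodes with 48 cores through the `hourly` partition could combine both options (a sketch; `myjob.sh` is a placeholder):

```
sbatch --partition=hourly --constraint=mem_768gb,xeon-gold-6240r myjob.sh
```
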
### User and job limits
In the CPU cluster we apply some limits to jobs and users. The idea behind this is to ensure a fair usage of the resources and to