From 057d79251765976be7372dbdb1dcb7fe5a971887 Mon Sep 17 00:00:00 2001
From: caubet_m
Date: Mon, 2 May 2022 17:46:26 +0200
Subject: [PATCH] Add CPU features information

---
 pages/merlin6/slurm-configuration.md | 63 +++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/pages/merlin6/slurm-configuration.md b/pages/merlin6/slurm-configuration.md
index 9e17723..30fcb5f 100644
--- a/pages/merlin6/slurm-configuration.md
+++ b/pages/merlin6/slurm-configuration.md
@@ -33,10 +33,6 @@ and memory was by default oversubscribed.
 {{site.data.alerts.tip}}Always check '/etc/slurm/slurm.conf' for changes in the hardware.
 {{site.data.alerts.end}}
 
-## Running jobs in the 'merlin6' cluster
-
-In this chapter we will cover basic settings that users need to specify in order to run jobs in the Merlin6 CPU cluster.
-
 ### Merlin6 CPU cluster
 
 To run jobs in the **`merlin6`** cluster users **can optionally** specify the cluster name in Slurm:
@@ -66,6 +62,7 @@ The following *partitions* (also known as *queues*) are configured in Slurm:
 | **hourly** | 1 hour | 1 hour | unlimited | 1000 | 1 | 4000 |
 | **asa-general** | 1 hour | 2 weeks | unlimited | 1 | 2 | 3712 |
 | **asa-daily** | 1 hour | 1 week | unlimited | 500 | 2 | 3712 |
+| **asa-visas** | 1 hour | 90 days | unlimited | 1000 | 4 | 3712 |
 | **asa-ansys** | 1 hour | 90 days | unlimited | 1000 | 4 | 15600 |
 | **mu3e** | 1 day | 7 days | unlimited | 1000 | 4 | 3712 |
 
@@ -79,7 +76,7 @@ and, if possible, they will preempt running jobs from partitions with lower *Pri
 * The **`general`** partition is the **default**. It can not have more than 50 nodes running jobs.
 * For **`daily`** this limitation is extended to 67 nodes.
 * For **`hourly`** there are no limits.
-* **`asa-general`,`asa-daily`,`asa-ansys`,`asa-visas` and `mu3e`** are **private hidden** partitions, belonging to different experiments owning the machines. **Access is restricted** in all cases. However, by agreement with the experiments, nodes are usually added to the **`hourly`** partition as extra resources for the public resources.
+* **`asa-general`,`asa-daily`,`asa-ansys`,`asa-visas` and `mu3e`** are **private** partitions belonging to the different experiments owning the machines. **Access is restricted** in all cases. However, by agreement with the experiments, their nodes are usually added to the **`hourly`** partition as extra resources for public usage.
 
 {{site.data.alerts.tip}}Jobs which would run for less than one day should be always sent to daily, while
 jobs that would run for less than one hour should be sent to hourly. This would ensure that you have highest priority over jobs sent to partitions with less priority,
@@ -101,12 +98,13 @@ Not all the accounts can be used on all partitions. This is resumed in the table
 | Slurm Account | Slurm Partitions |
 | :------------------: | :----------------------------------: |
 | **merlin** | `hourly`,`daily`, `general` |
-| **gfa-asa** | `gfa-asa`,`hourly`,`daily`, `general` |
+| **gfa-asa** | `asa-general`,`asa-daily`,`asa-visas`,`asa-ansys`,`hourly`,`daily`, `general` |
+| **mu3e** | `mu3e` |
 
-#### The 'gfa-asa' private account
+#### Private accounts
 
-For accessing the **`gfa-asa`** partition, it must be done through the **`gfa-asa`** account. This account **is restricted**
-to a group of users and is not public.
+* The *`gfa-asa`* and *`mu3e`* accounts are private accounts. These can be used for accessing dedicated
+nodes owned by different departments.
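+
+For example, a job using one of these private accounts could be submitted with a batch script similar to the
+following minimal sketch (the account, partition, time limit and application name are only illustrative and
+must be adapted to your group and workload):
+
+```
+#!/bin/bash
+#SBATCH --clusters=merlin6   # optional: run in the Merlin6 CPU cluster
+#SBATCH --account=mu3e       # private account, restricted to members of the group
+#SBATCH --partition=mu3e     # partition accessible through this account
+#SBATCH --time=1-00:00:00    # example: 1 day, within the 7 day limit of 'mu3e'
+
+srun my_application          # placeholder for the actual program
+```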
 
 ### Slurm CPU specific options
 
@@ -128,7 +126,7 @@ Below are listed the most common settings:
 #SBATCH --cpu-bind=[{quiet,verbose},] # only for 'srun' command
 ```
 
-#### Dealing with Hyper-Threading
+#### Enabling/Disabling Hyper-Threading
 
 The **`merlin6`** cluster contains nodes with Hyper-Threading enabled. One should always specify
 whether to use Hyper-Threading or not. If not defined, Slurm will generally use it (exceptions apply).
@@ -138,6 +136,51 @@ whether to use Hyper-Threading or not. If not defined, Slurm will generally use
 #SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.
 ```
 
+#### Constraint / Features
+
+Slurm allows defining a set of features in the node definition. These can be used to filter and select nodes according to one or more
+specific features. For the CPU nodes, the following features are defined:
+
+```
+NodeName=merlin-c-[001-024,101-124,201-224] Features=mem_384gb,xeon-gold-6152
+NodeName=merlin-c-[301-312] Features=mem_768gb,xeon-gold-6240r
+NodeName=merlin-c-[313-318] Features=mem_768gb,xeon-gold-6240r
+NodeName=merlin-c-[319-324] Features=mem_384gb,xeon-gold-6240r
+```
+
+Therefore, users running on `hourly` can select which type of node they want to use (fat memory nodes vs. regular memory nodes, CPU type).
+This is done with the `--constraint=` option in Slurm.
+
+Examples:
+1. Select nodes with 48 cores only (nodes with [2 x Xeon Gold 6240R](https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html)):
+```
+sbatch --constraint=xeon-gold-6240r ...
+```
+2. Select nodes with 44 cores only (nodes with [2 x Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html)):
+```
+sbatch --constraint=xeon-gold-6152 ...
+```
+3. Select fat memory nodes only:
+```
+sbatch --constraint=mem_768gb ...
+```
+4. Select regular memory nodes only:
+```
+sbatch --constraint=mem_384gb ...
+```
+5. Select fat memory nodes with 48 cores only:
+```
+sbatch --constraint=mem_768gb,xeon-gold-6240r ...
+```
+
+Detailing exactly which type of node you want to use is important. Therefore, for groups with private accounts (`mu3e`,`gfa-asa`) and for
+public users running on the `hourly` partition, *constraining nodes by features is recommended*. This becomes even more important as
+the cluster grows more heterogeneous.
+
+## Running jobs in the 'merlin6' cluster
+
+In this chapter we will cover basic settings that users need to specify in order to run jobs in the Merlin6 CPU cluster.
+
 ### User and job limits
 
 In the CPU cluster we provide some limits which basically apply to jobs and users. The idea behind this is to ensure a fair usage of the resources and to