New slurm config
parent bb93cee6e8
commit d03fe42bde
@@ -27,15 +27,15 @@ Basic usage for the **merlin6** cluster will be detailed here. For advanced usage

The following table shows the default and maximum resources that can be used per node:

| Nodes              | Def.#CPUs | Max.#CPUs | #Threads | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | Def.#GPUs | Max.#GPUs |
|:------------------:| ---------:| :--------:| :------: | :---------------:| :---------------:| :----------------:| :------------:| :-------: | :-------: |
| merlin-c-[001-024] | 1 core    | 44 cores  | 2        | 4000             | 352000           | 352000            | 10000         | N/A       | N/A       |
| merlin-c-[101-124] | 1 core    | 44 cores  | 2        | 4000             | 352000           | 352000            | 10000         | N/A       | N/A       |
| merlin-c-[201-224] | 1 core    | 44 cores  | 2        | 4000             | 352000           | 352000            | 10000         | N/A       | N/A       |
| merlin-g-[001]     | 1 core    | 8 cores   | 1        | 4000             | 102400           | 102400            | 10000         | 1         | 2         |
| merlin-g-[002-009] | 1 core    | 20 cores  | 1        | 4000             | 102400           | 102400            | 10000         | 1         | 4         |
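
For reference, the values in the table can be cross-checked against what Slurm itself reports for a node. A minimal sketch (the node name is just
one example taken from the table above; the exact fields printed by `scontrol` may vary slightly between Slurm versions):

```bash
# Show the CPU, thread and memory configuration Slurm reports for one compute node
scontrol show node merlin-c-001 | grep -E 'CPUTot|ThreadsPerCore|RealMemory'
```
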

If nothing is specified, by default each core will use up to 8GB of memory. Memory can be increased with the `--mem=<mem_in_MB>` and
`--mem-per-cpu=<mem_in_MB>` options, and the maximum memory allowed is `Max.Mem/Node`.

In *Merlin6*, memory is considered a Consumable Resource, as well as the CPU.
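
As an illustrative sketch of these options (the job name, walltime and program are hypothetical placeholders), a batch script could request
CPUs and memory like this:

```bash
#!/bin/bash
#SBATCH --job-name=mem_test     # hypothetical job name
#SBATCH --ntasks=8              # 8 tasks, one CPU each
#SBATCH --mem-per-cpu=8000      # request 8000MB per CPU instead of the 4000MB default
#SBATCH --time=02:00:00         # 2 hours of walltime

srun ./my_program               # hypothetical executable
```
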
@@ -44,27 +44,78 @@ In *Merlin6*, memory is considered a Consumable Resource, as well as the CPU.

A partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:

| Partition           | Default Time | Max Time | Max Nodes | Priority | PriorityJobFactor\* |
|:-------------------:| :----------: | :------: | :-------: | :------: | :-----------------: |
| **<u>general</u>**  | 1 day        | 1 week   | 50        | low      | 1                   |
| **daily**           | 1 day        | 1 day    | 67        | medium   | 500                 |
| **hourly**          | 1 hour       | 1 hour   | unlimited | highest  | 1000                |

\*The **PriorityJobFactor** value will be added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** might affect that decision).

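As a quick way to see this in practice, the individual priority factors of your own pending jobs can be listed. A minimal sketch (`$USER` simply
expands to your username):

```bash
# Long format includes the PARTITION priority component per job
sprio -l -u $USER
```
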
The **general** partition is the *default*: when nothing is specified, the job will be assigned to that partition. **general** can not have more
than 50 nodes running jobs. For **daily** this limitation is extended to 67 nodes, while for **hourly** there are no limits.

{{site.data.alerts.tip}}Jobs which would run for less than one day should always be sent to <b>daily</b>, while jobs that would run for less
than one hour should be sent to <b>hourly</b>. This ensures that your jobs have higher priority than jobs sent to partitions with lower priority,
and it also avoids the node limit of <b>general</b>. The idea behind this is that the cluster can not be blocked by long jobs and resources
for shorter jobs can always be ensured.
{{site.data.alerts.end}}

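Following that advice, a short job could be submitted to the high-priority **hourly** partition like this (a minimal sketch; the script name
and walltime are placeholders):

```bash
# Submit a job expected to run for less than one hour to the hourly partition
sbatch --partition=hourly --time=00:30:00 my_job.sh
```
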
### Merlin6 user and job limits

In the CPU cluster we provide some limits which apply to jobs and users. The idea behind this is to ensure fair usage of the resources and to
prevent a single user or job from abusing them. However, applying limits may reduce the overall usage efficiency of the cluster (for example,
pending jobs from a single user while many nodes sit idle due to low overall activity is something that can be seen when user limits are applied).
In the same way, these limits can also be used to improve the efficiency of the cluster (for example, without any job size limits, a job requesting
all resources of the batch system would drain the entire cluster in order to fit, which is undesirable).

Hence, limits need to be set wisely, ensuring fair usage of the resources and trying to optimize the overall efficiency of the cluster while
allowing jobs of different natures and sizes (that is, **single core** jobs **vs. parallel jobs** of different sizes) to run.

{{site.data.alerts.warning}}Wide limits are provided in the <b>daily</b> and <b>hourly</b> partitions, while for <b>general</b> the limits are
more restrictive.
<br>However, we kindly ask users to inform the Merlin administrators when they plan to submit big jobs which would require a
massive draining of nodes in order to be allocated. This applies to jobs requiring the <b>unlimited</b> QoS (see <i>"Per job limits"</i> below).
{{site.data.alerts.end}}

#### Per job limits

These are limits which apply to a single job, i.e. the maximum amount of resources a single job can use. This is described in the table below,
and the limits vary depending on the day of the week and the time (*working* vs. *non-working* hours). Limits are shown in the format `SlurmQoS(limits)`,
where the possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`:

| Partition    | Mon-Fri 08h-18h               | Sun-Thu 18h-0h                | From Fri 18h to Sun 8h        | From Sun 8h to Mon 18h        |
|:------------:| :---------------------------: | :---------------------------: | :---------------------------: | :---------------------------: |
| **general**  | normal(cpu=704,mem=2750G)     | normal(cpu=704,mem=2750G)     | normal(cpu=704,mem=2750G)     | normal(cpu=704,mem=2750G)     |
| **daily**    | daily(cpu=704,mem=2750G)      | nightly(cpu=1408,mem=5500G)   | unlimited(cpu=2112,mem=8250G) | nightly(cpu=1408,mem=5500G)   |
| **hourly**   | unlimited(cpu=2112,mem=8250G) | unlimited(cpu=2112,mem=8250G) | unlimited(cpu=2112,mem=8250G) | unlimited(cpu=2112,mem=8250G) |

By default, a job can not use more than 704 cores (max CPU per job), and memory is proportionally limited in the same way. This is equivalent to
running a job using up to 8 nodes at once. This limit applies to the **general** partition (fixed limit) and to the **daily** partition (only during
working hours). Limits are relaxed for the **daily** partition during non-working hours, and during the weekend they are even wider.

For the **hourly** partition, wider limits are provided, **even though running very large parallel jobs is not desirable** (allocating such jobs
requires a massive draining of nodes). To avoid massive node draining in the cluster when allocating huge jobs, per job limits are still needed.
Hence, the **unlimited** QoS mostly refers to "per user" limits rather than to "per job" limits (in other words, users can run any number of hourly
jobs, but the size of each such job is limited, with wide values).

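To inspect the actual limits attached to each QoS, a query like the following can be used (a sketch; the selected output fields are assumptions
and their names may differ between Slurm versions):

```bash
# List each QoS with its per-job (MaxTRES) and per-user (MaxTRESPU) limits
sacctmgr show qos format=Name,Priority,MaxTRES,MaxTRESPU
```
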
#### Per user limits

These are limits which apply exclusively to users, i.e. the maximum amount of resources a single user can use. This is described in the table below,
and the limits vary depending on the day of the week and the time (*working* vs. *non-working* hours). Limits are shown in the format `SlurmQoS(limits)`,
where the possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`:

| Partition    | Mon-Fri 08h-18h                | Sun-Thu 18h-0h                 | From Fri 18h to Sun 8h         | From Sun 8h to Mon 18h         |
|:------------:| :----------------------------: | :----------------------------: | :----------------------------: | :----------------------------: |
| **general**  | normal(cpu=704,mem=2750G)      | normal(cpu=704,mem=2750G)      | normal(cpu=704,mem=2750G)      | normal(cpu=704,mem=2750G)      |
| **daily**    | daily(cpu=1408,mem=5500G)      | nightly(cpu=2112,mem=8250G)    | unlimited(cpu=6336,mem=24750G) | nightly(cpu=2112,mem=8250G)    |
| **hourly**   | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G) |

By default, users can not use more than 704 cores at the same time (max CPU per user), and memory is proportionally limited in the same way. This is
equivalent to 8 exclusive nodes. This limit applies to the **general** partition (fixed limit) and to the **daily** partition (only during working
hours). For the **hourly** partition there are no restrictions and user limits are removed. Limits are relaxed for the **daily** partition during
non-working hours, and during the weekend the limits are removed.

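To check which account, partitions and QoS your user is mapped to (and therefore which of the limits above apply to you), the Slurm associations
can be queried. A minimal sketch (the output fields are assumptions and may vary with the Slurm version):

```bash
# Show the Slurm associations (account, partition, QoS) for the current user
sacctmgr show assoc where user=$USER format=User,Account,Partition,QOS
```
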
## Understanding the Slurm configuration (for advanced users)