---
title: Slurm Configuration
#tags:
keywords: configuration, partitions, node definition
last_updated: 20 May 2021
summary: "This document provides a summary of the Merlin5 Slurm configuration."
sidebar: merlin6_sidebar
permalink: /merlin5/slurm-configuration.html
---

This documentation shows the basic Slurm configuration and options needed to run jobs in the Merlin5 cluster.

Merlin5 is an older cluster with aging hardware, maintained on a best-effort basis to increase the overall CPU capacity of the Merlin cluster.

## Merlin5 CPU nodes definition

The following table shows the default and maximum resources that can be used per node:

| Nodes            | Def.#CPUs | Max.#CPUs | #Threads | Max.Mem/Node (MB) | Max.Swap (MB) |
|:----------------:| ---------:| :--------:| :------: | :---------------: | :-----------: |
| merlin-c-[18-30] | 1 core    | 16 cores  | 1        | 60000             | 10000         |
| merlin-c-[31-32] | 1 core    | 16 cores  | 1        | 124000            | 10000         |
| merlin-c-[33-45] | 1 core    | 16 cores  | 1        | 60000             | 10000         |
| merlin-c-[46-47] | 1 core    | 16 cores  | 1        | 124000            | 10000         |

There is one *main difference between the Merlin5 and Merlin6 clusters*: Merlin5 keeps an old configuration which does not treat memory as a *consumable resource*. Hence, users can *oversubscribe* memory. This might trigger some side effects, but this legacy configuration has been kept to ensure that old jobs keep running the same way they did a few years ago. If you know that this might be a problem for you, please always use Merlin6 instead.
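
For illustration only, memory can still be requested with the standard Slurm options (the value below is a placeholder); keep in mind that, since memory is not a consumable resource in Merlin5, it is not accounted for when other jobs are scheduled on the same node, so the memory of a node can still end up oversubscribed:

```bash
#SBATCH --mem=8000   # Placeholder memory request (~8 GB); in Merlin5 this is not
                     # accounted as a consumable resource, so other jobs sharing
                     # the node may still oversubscribe its memory
```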

## Running jobs in the 'merlin5' cluster

In this chapter we will cover basic settings that users need to specify in order to run jobs in the Merlin5 CPU cluster.

### Merlin5 CPU cluster

To run jobs in the **`merlin5`** cluster users **must** specify the cluster name in Slurm:

```bash
#SBATCH --cluster=merlin5
```
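
The same cluster selection also applies to the Slurm client commands, since Merlin runs a multi-clustered Slurm setup (see the advanced configuration section below). A small sketch, where `myjob.sh` is a placeholder for your own batch script:

```bash
# Submit a batch script to the merlin5 cluster (equivalent to having
# the '#SBATCH --cluster=merlin5' directive inside the script):
sbatch --clusters=merlin5 myjob.sh

# Inspect your jobs and the available partitions of the merlin5 cluster:
squeue --clusters=merlin5 -u $USER
sinfo --clusters=merlin5
```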

### Merlin5 CPU partitions

Users might need to specify the Slurm partition. If no partition is specified, jobs default to the **`merlin`** partition:

```bash
#SBATCH --partition=<partition_name>  # Possible <partition_name> values: merlin, merlin-long
```

The table below summarizes all partitions available to users:

| CPU Partition      | Default Time | Max Time | Max Nodes | PriorityJobFactor\* | PriorityTier\*\* |
|:------------------:| :----------: | :------: | :-------: | :-----------------: | :--------------: |
| **<u>merlin</u>**  | 5 days       | 1 week   | All nodes | 500                 | 1                |
| **merlin-long**    | 5 days       | 21 days  | 4         | 1                   | 1                |

**\***The **PriorityJobFactor** value is added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher-priority partitions will usually run first (although other factors, such as **job age** or, mainly, **fair share**, can affect that decision). Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.

**\*\***Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with lower *PriorityTier* values and, if possible, they will preempt running jobs from partitions with lower *PriorityTier* values.

The **`merlin-long`** partition **is limited to 4 nodes**, as it might contain jobs running for up to 21 days.
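
For instance, the per-factor priority breakdown of your pending jobs can be inspected with `sprio`; a small sketch, assuming the installed Slurm version supports the `-M/--clusters` option for this command:

```bash
# Show the priority factors (including the PARTITION column) of your pending jobs
sprio -l -M merlin5 -u $USER
```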

### Merlin5 CPU Accounts

Users need to ensure that the public **`merlin`** account is specified; not specifying any account option will default to this account. This is mostly relevant for users who belong to multiple Slurm accounts and who might specify a different account by mistake.

```bash
#SBATCH --account=merlin   # Possible values: merlin
```
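
If you are unsure which Slurm accounts you belong to, you can list your associations; a minimal sketch (the `format` fields shown are optional):

```bash
# List your Slurm associations (cluster, account, partition) across the clusters
sacctmgr show associations where user=$USER format=Cluster,Account,User,Partition
```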

### Slurm CPU specific options

Several Slurm options are relevant for CPU-based jobs. Please refer to the **man** pages of each Slurm command (`man salloc`, `man sbatch`, `man srun`) for further information about them. The most common settings are listed below:

```bash
#SBATCH --ntasks=<ntasks>                    # Number of tasks (processes) to run
#SBATCH --ntasks-per-core=<ntasks>           # Maximum number of tasks per allocated core
#SBATCH --ntasks-per-socket=<ntasks>         # Maximum number of tasks per allocated socket
#SBATCH --ntasks-per-node=<ntasks>           # Maximum number of tasks per allocated node
#SBATCH --mem=<size[units]>                  # Memory required per node
#SBATCH --mem-per-cpu=<size[units]>          # Memory required per allocated CPU
#SBATCH --cpus-per-task=<ncpus>              # Number of CPUs per task (e.g. for multithreaded jobs)
#SBATCH --cpu-bind=[{quiet,verbose},]<type>  # CPU binding; only for the 'srun' command
```

Notice that in **Merlin5** no hyper-threading is available (while in **Merlin6** it is). Hence, in **Merlin5** there is no need to specify the `--hint` hyper-threading-related options.
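
As an illustration, a minimal Merlin5 batch script combining the options above could look as follows (the job name, resource numbers, walltime and payload are placeholders, not recommendations):

```bash
#!/bin/bash
#SBATCH --cluster=merlin5        # Mandatory: run in the Merlin5 cluster
#SBATCH --partition=merlin       # Optional: 'merlin' is already the default partition
#SBATCH --account=merlin         # Optional: 'merlin' is already the default account
#SBATCH --job-name=example       # Placeholder job name
#SBATCH --ntasks=16              # Placeholder: 16 tasks, i.e. one full node
#SBATCH --time=1-00:00:00        # Placeholder: 1 day of walltime

# Placeholder payload: replace with your own application
srun hostname
```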

## User and job limits

In the CPU cluster we provide some limits which apply to jobs and users. The idea behind them is to ensure a fair usage of the resources and to avoid a single user or job abusing them. However, applying limits might affect the overall usage efficiency of the cluster (for example, pending jobs from a single user while many nodes stay idle due to low overall activity is something that can be seen when user limits are applied). In the same way, limits can also be used to improve the efficiency of the cluster (for example, without any job size limit, a job requesting all resources of the batch system would drain the entire cluster in order to fit the job, which is undesirable).

Hence, limits need to be set up wisely, ensuring a fair usage of the resources and trying to optimize the overall efficiency of the cluster while allowing jobs of different natures and sizes (that is, **single-core** jobs **vs. parallel jobs** of different sizes) to run.

In the **`merlin5`** cluster, since not many users run on it, these limits are wider than the ones set in the **`merlin6`** and **`gmerlin6`** clusters.

### Per job limits

These limits apply to a single job; in other words, they define the maximum amount of resources a single job can use. They are described in the table below, in the format `SlurmQoS(limits)` (the available `SlurmQoS` values can be listed with the `sacctmgr show qos` command):

| Partition        | Mon-Sun 0h-24h   | Other limits |
|:----------------:| :--------------: | :----------: |
| **merlin**       | merlin5(cpu=384) | None         |
| **merlin-long**  | merlin5(cpu=384) | Max. 4 nodes |

By default, through QoS limits, a job cannot use more than 384 cores (max. CPUs per job). For the `merlin-long` partition this is even more restricted: there is an extra limit of 4 dedicated nodes for this partition. That limit is defined at the partition level and overrides the QoS limit whenever it is more restrictive.
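
The QoS limits currently in place can be checked from the command line; a small sketch (the `format` fields shown are optional and may vary between Slurm versions):

```bash
# List the QoS definitions together with their per-job TRES and walltime limits
sacctmgr show qos format=Name,MaxTRES,MaxWall
```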

### Per user limits for CPU partitions

No per-user limits apply through QoS. For the **`merlin`** partition, a single user could fill the whole batch system with jobs (the restriction is on the job size, as explained above). For the **`merlin-long`** partition, the 4-node limitation still applies.

## Advanced Slurm configuration

Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs. Slurm has been installed in a **multi-clustered** configuration, allowing multiple clusters to be integrated in the same batch system.

To understand the Slurm configuration of the cluster, it may sometimes be useful to check the following files:

* ``/etc/slurm/slurm.conf`` - can be found in the login nodes and computing nodes.
* ``/etc/slurm/gres.conf`` - can be found in the GPU nodes, and is also propagated to login nodes and computing nodes for user read access.
* ``/etc/slurm/cgroup.conf`` - can be found in the computing nodes, and is also propagated to login nodes for user read access.

The configuration files found in the login nodes correspond exclusively to the **merlin6** cluster. Configuration files for the old **merlin5** cluster or for the **gmerlin6** cluster must be checked directly on any of the **merlin5** or **gmerlin6** computing nodes (for example, by logging in to one of the nodes while a job or an active allocation is running).
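
Alternatively, much of the running configuration can be queried through the Slurm client commands without logging in to a node; a sketch relying on the `-M/--clusters` option of `scontrol`:

```bash
# Print the running Slurm configuration of the merlin5 cluster
scontrol --clusters=merlin5 show config

# Show the definition of a specific partition, e.g. 'merlin-long'
scontrol --clusters=merlin5 show partition merlin-long
```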