first stab at mkdocs migration
docs/merlin5/cluster-introduction.md (normal file, +44 lines)
@@ -0,0 +1,44 @@
---
title: Cluster 'merlin5'
#tags:
#keywords:
last_updated: 07 April 2021
#summary: "Merlin 5 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin5/cluster-introduction.html
---

## Slurm 'merlin5' cluster

**Merlin5** was the official PSI Local HPC cluster for development and
mission-critical applications, built in 2016-2017. It was an
extension of the Merlin4 cluster and was built from existing hardware due
to a lack of central investment in Local HPC resources. **Merlin5** was
then replaced by the **[Merlin6](/merlin6/index.html)** cluster in 2019,
backed by a significant central investment of ~1.5M CHF. **Merlin5** was mostly
based on CPU resources, but it also contained a small number of GPU-based
resources, which were mostly used by the BIO experiments.

**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster**,
called **`merlin5`**. In this way, the old CPU computing nodes remain available as extra computing resources
and as an extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.

The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
cluster, which became the **main Local HPC cluster**. Hence, **[Merlin6](/merlin6/index.html)**
hosts the storage which is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) clusters (`merlin5`, `merlin6`, `gmerlin6`).

### Submitting jobs to 'merlin5'

Jobs must be submitted to the **`merlin5`** Slurm cluster from the **Merlin6** login nodes, by adding
the option `--clusters=merlin5` to any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc.).

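For illustration, a minimal batch script targeting the `merlin5` cluster could look as follows (a sketch only; the job name, resource request and payload are placeholders, not site defaults):

```bash
#!/bin/bash
#SBATCH --clusters=merlin5      # route the job to the 'merlin5' Slurm cluster
#SBATCH --job-name=test_merlin5 # placeholder job name
#SBATCH --ntasks=1              # placeholder resource request
#SBATCH --time=00:10:00         # placeholder walltime

srun hostname                   # placeholder payload: print where the job ran
```

The same option also works directly on the command line, for example `sbatch --clusters=merlin5 job.sh` or `squeue --clusters=merlin5`.
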
## The Merlin Architecture

### Multi Non-Federated Cluster Architecture Design: The Merlin cluster

The following image shows the Slurm architecture design for the Merlin cluster.
It consists of a multi non-federated cluster setup, with a central Slurm database
and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):



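Since all clusters register against the same central Slurm database, they can be queried together from a login node. A couple of illustrative commands (a sketch; the exact output depends on the site configuration):

```bash
# List the clusters registered in the central Slurm database
sacctmgr show clusters format=Cluster,ControlHost,ControlPort

# Show partition and node status for several clusters at once
sinfo --clusters=merlin5,merlin6,gmerlin6
```
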
docs/merlin5/hardware-and-software-description.md (normal file, +97 lines)
@@ -0,0 +1,97 @@
---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 09 April 2021
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin5/hardware-and-software.html
---

## Hardware

### Computing Nodes

Merlin5 is built from recycled nodes, and hardware will be decommissioned as soon as it fails (due to the expired warranty and the age of the cluster).
* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
* Connectivity is based on InfiniBand **ConnectX-3 QDR-40Gbps**:
    * 16 internal ports for intra-chassis communication
    * 2 connected external ports for inter-chassis communication and storage access.

The table below summarizes the hardware setup of the Merlin5 computing nodes:

<table>
  <thead>
    <tr>
      <th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="8">Merlin5 CPU Computing Nodes</th>
    </tr>
    <tr>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Chassis</th>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
      <th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
    </tr>
  </thead>
  <tbody>
    <tr style="vertical-align:middle;text-align:center;">
      <td style="vertical-align:middle;text-align:center;" rowspan="2"><b>#0</b></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[18-30]</b></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2"><a href="https://ark.intel.com/content/www/us/en/ark/products/64595/intel-xeon-processor-e5-2670-20m-cache-2-60-ghz-8-00-gt-s-intel-qpi.html">Intel Xeon E5-2670</a></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">16</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">1</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">50GB</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="1">64GB</td>
    </tr>
    <tr style="vertical-align:middle;text-align:center;">
      <td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[31,32]</b></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="1"><b>128GB</b></td>
    </tr>
    <tr style="vertical-align:middle;text-align:center;">
      <td style="vertical-align:middle;text-align:center;" rowspan="2"><b>#1</b></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[33-45]</b></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2"><a href="https://ark.intel.com/content/www/us/en/ark/products/64595/intel-xeon-processor-e5-2670-20m-cache-2-60-ghz-8-00-gt-s-intel-qpi.html">Intel Xeon E5-2670</a></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">16</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">1</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="2">50GB</td>
      <td style="vertical-align:middle;text-align:center;" rowspan="1">64GB</td>
    </tr>
    <tr style="vertical-align:middle;text-align:center;">
      <td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[46,47]</b></td>
      <td style="vertical-align:middle;text-align:center;" rowspan="1"><b>128GB</b></td>
    </tr>
  </tbody>
</table>

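These figures can be cross-checked against what Slurm actually advertises for a node, for example (a sketch; the exact fields shown depend on the Slurm version):

```bash
# Show the Slurm definition of one Merlin5 node (CPUs, sockets, threads, memory, ...)
scontrol --clusters=merlin5 show node merlin-c-18
```
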
### Login Nodes

The login nodes are part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
and are used to compile software and to submit jobs to the different ***Merlin Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.

### Storage

The storage is part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
and is mounted on all the ***Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.

### Network

The Merlin5 cluster connectivity is based on [InfiniBand QDR](https://en.wikipedia.org/wiki/InfiniBand) technology.
This allows fast, very low latency access to the data, as well as running highly efficient MPI-based jobs.
However, this is an old generation of InfiniBand, which requires older drivers, and the software can not take advantage of the latest features.

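As a quick check of the fabric from a compute node (a sketch, assuming the standard `infiniband-diags` tools are installed), the adapter state and link rate can be printed with:

```bash
# Print HCA state and link rate; a ConnectX-3 QDR link should report a 40 Gb/s rate
ibstat
```
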
## Software

In Merlin5, we try to keep the software stack coherent with the main cluster, [Merlin6](/merlin6/index.html).

For this reason, Merlin5 runs the following stack (a quick way to verify the installed versions is sketched below):
* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
* [**MLNX_OFED LTS v.4.9-2.2.4.0**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed), which is an old version, but required because **ConnectX-3** support has been dropped in newer OFED versions.
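
A sketch of how the installed versions can be checked on a node (`mmdiag` and `ofed_info` are the usual GPFS and MLNX_OFED tools; they may require root or a specific PATH depending on the installation):

```bash
cat /etc/redhat-release   # RHEL release
sinfo --version           # Slurm version
mmdiag --version          # GPFS / Spectrum Scale version
ofed_info -s              # MLNX_OFED version
```
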
docs/merlin5/slurm-configuration.md (normal file, +142 lines)
@@ -0,0 +1,142 @@
---
title: Slurm Configuration
#tags:
keywords: configuration, partitions, node definition
last_updated: 20 May 2021
summary: "This document summarizes the Merlin5 Slurm configuration."
sidebar: merlin6_sidebar
permalink: /merlin5/slurm-configuration.html
---

This page describes the basic Slurm configuration and the options needed to run jobs in the Merlin5 cluster.

The Merlin5 cluster is built from old hardware and is maintained on a best-effort basis in order to increase the CPU capacity of the Merlin cluster.

## Merlin5 CPU nodes definition

The following table shows the default and maximum resources that can be used per node:

| Nodes            | Def.#CPUs | Max.#CPUs | #Threads | Max.Mem/Node (MB) | Max.Swap (MB) |
|:----------------:| :--------:| :--------:| :------: | :---------------: | :-----------: |
| merlin-c-[18-30] | 1 core    | 16 cores  | 1        | 60000             | 10000         |
| merlin-c-[31-32] | 1 core    | 16 cores  | 1        | 124000            | 10000         |
| merlin-c-[33-45] | 1 core    | 16 cores  | 1        | 60000             | 10000         |
| merlin-c-[46-47] | 1 core    | 16 cores  | 1        | 124000            | 10000         |

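These values come from the Slurm node definitions and can be double-checked from a login node (a sketch; the column layout varies between Slurm versions):

```bash
# Per-node CPU and memory configuration as defined in Slurm
sinfo --clusters=merlin5 --Node --long
```
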
There is one *main difference between the Merlin5 and Merlin6 clusters*: Merlin5 keeps an old configuration which does not
treat memory as a *consumable resource*. Hence, users can *oversubscribe* memory. This might trigger some side effects, but
this legacy configuration has been kept to ensure that old jobs keep running in the same way they did a few years ago.
If you know that this might be a problem for you, please always use Merlin6 instead.

## Running jobs in the 'merlin5' cluster

This section covers the basic settings that users need to specify in order to run jobs in the Merlin5 CPU cluster.

### Merlin5 CPU cluster

To run jobs in the **`merlin5`** cluster, users **must** specify the cluster name in Slurm:

```bash
#SBATCH --clusters=merlin5
```

### Merlin5 CPU partitions

Users might need to specify the Slurm partition. If no partition is specified, it will default to **`merlin`**:

```bash
#SBATCH --partition=<partition_name>  # Possible <partition_name> values: merlin, merlin-long
```

The table below shows all partitions available to users:

| CPU Partition      | Default Time | Max Time | Max Nodes | PriorityJobFactor\* | PriorityTier\*\* |
|:------------------:| :----------: | :------: | :-------: | :-----------------: | :--------------: |
| **<u>merlin</u>**  | 5 days       | 1 week   | All nodes | 500                 | 1                |
| **merlin-long**    | 5 days       | 21 days  | 4         | 1                   | 1                |

**\***The **PriorityJobFactor** value will be added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** might affect that decision). For the GPU
partitions, Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.

**\*\***Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with lower *PriorityTier* values
and, if possible, they will preempt running jobs from partitions with lower *PriorityTier* values.

The **`merlin-long`** partition **is limited to 4 nodes**, as it might contain jobs running for up to 21 days.

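The partition limits listed above can be verified at any time against the live configuration (a sketch; this assumes the login node can reach the `merlin5` controller):

```bash
# Show time limits, node counts and priorities of the merlin5 partitions
scontrol --clusters=merlin5 show partitions
```
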
### Merlin5 CPU Accounts

Users need to ensure that the public **`merlin`** account is used. Not specifying any account option will default to this account.
This is mostly relevant for users with multiple Slurm accounts, who may specify a different account by mistake.

```bash
#SBATCH --account=merlin   # Possible values: merlin
```

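To check which accounts your user is actually associated with in the Slurm database, a query along these lines can be used (a sketch; the fields follow standard `sacctmgr` format options):

```bash
# List your Slurm associations: accounts you may submit with, per cluster
sacctmgr show associations user=$USER format=Cluster,Account,Partition,QOS
```
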
### Slurm CPU specific options

Several Slurm options are relevant for CPU-based jobs. Please refer to the **man** pages
of each Slurm command for further information (`man salloc`, `man sbatch`, `man srun`).
The most common settings are listed below:

```bash
#SBATCH --ntasks=<ntasks>
#SBATCH --ntasks-per-core=<ntasks>
#SBATCH --ntasks-per-socket=<ntasks>
#SBATCH --ntasks-per-node=<ntasks>
#SBATCH --mem=<size[units]>
#SBATCH --mem-per-cpu=<size[units]>
#SBATCH --cpus-per-task=<ncpus>
#SBATCH --cpu-bind=[{quiet,verbose},]<type>   # only for the 'srun' command
```

Notice that in **Merlin5** no hyper-threading is available (while in **Merlin6** it is).
Hence, in **Merlin5** there is no need to specify the hyper-threading related `--hint` options.

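Putting the previous settings together, a complete job script for the `merlin5` cluster is sketched below (the job name, sizes, walltime and the application binary are placeholders only):

```bash
#!/bin/bash
#SBATCH --clusters=merlin5       # mandatory: target the merlin5 cluster
#SBATCH --partition=merlin       # or 'merlin-long' for jobs running up to 21 days
#SBATCH --account=merlin         # public merlin account
#SBATCH --job-name=mpi_example   # placeholder job name
#SBATCH --ntasks=32              # e.g. two full nodes (16 cores each, no hyper-threading)
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00          # placeholder walltime

srun ./my_mpi_application        # placeholder payload
```
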
## User and job limits

The CPU cluster enforces some limits which apply to jobs and users. The idea behind this is to ensure fair usage of the resources and to
avoid abuse of the resources by a single user or job. However, applying limits might affect the overall usage efficiency of the cluster (for example,
pending jobs from a single user alongside many idle nodes, due to low overall activity, is something that can be seen when user limits are applied).
In the same way, these limits can also be used to improve the efficiency of the cluster (for example, without any job size limits, a job requesting all
resources of the batch system would drain the entire cluster in order to fit the job, which is undesirable).

Hence, limits need to be set up wisely to ensure fair usage of the resources, trying to optimize the overall efficiency
of the cluster while allowing jobs of different natures and sizes (that is, **single core** based **vs. parallel jobs** of different sizes) to run.

Since not many users run on the **`merlin5`** cluster, its limits are less strict than the ones set in the **`merlin6`** and **`gmerlin6`** clusters.

### Per job limits

These are limits which apply to a single job. In other words, they define the maximum amount of resources a single job can use. These limits are described in the table below,
in the format `SlurmQoS(limits)` (the available `SlurmQoS` values can be listed with the `sacctmgr show qos` command, as sketched further below):

| Partition        | Mon-Sun 0h-24h   | Other limits |
|:----------------:| :--------------: | :----------: |
| **merlin**       | merlin5(cpu=384) | None         |
| **merlin-long**  | merlin5(cpu=384) | Max. 4 nodes |

By default, due to the QoS limits, a job can not use more than 384 cores (max CPUs per job).
However, the `merlin-long` partition is even more restricted: there is an extra limit of 4 dedicated nodes for this partition. This is defined
at the partition level and overrides any QoS limit whenever it is more restrictive.

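The QoS limits themselves can be inspected with `sacctmgr` (a sketch; the per-job limit appears in the `MaxTRES` column, and field names may differ slightly between Slurm versions):

```bash
# Show the QoS definitions and their per-job / per-user resource limits
sacctmgr show qos format=Name%20,MaxTRES%30,MaxTRESPU%30
```
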
### Per user limits for CPU partitions

No per-user limits apply at the QoS level. For the **`merlin`** partition, a single user could fill the whole batch system with jobs (however, the restriction on job size still applies, as explained above). For the **`merlin-long`** partition, the 4-node limitation still applies.

## Advanced Slurm configuration

Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Slurm has been installed in a **multi-clustered** configuration, allowing the integration of multiple clusters in the same batch system.

To understand the Slurm configuration of the cluster, it may sometimes be useful to check the following files:

* ``/etc/slurm/slurm.conf`` - can be found on the login nodes and computing nodes.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.

The configuration files found on the login nodes correspond exclusively to the **merlin6** cluster.
Configuration files for the old **merlin5** cluster or for the **gmerlin6** cluster must be checked directly on one of the **merlin5** or **gmerlin6** computing nodes (for example, by logging in to one of the nodes while a job or an active allocation is running).
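
Alternatively, the active configuration of a remote cluster can usually be dumped from a login node without logging in to a compute node (a sketch; availability depends on the multi-cluster setup):

```bash
# Dump the running Slurm configuration of the merlin5 cluster
scontrol --clusters=merlin5 show config | less
```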