first stab at mkdocs migration
docs/merlin7/03-Slurm-General-Documentation/interactive-jobs.md
@@ -0,0 +1,213 @@
---
title: Running Interactive Jobs
#tags:
keywords: interactive, X11, X, srun, salloc, job, jobs, slurm, nomachine, nx
last_updated: 07 August 2024
summary: "This document describes how to run interactive jobs as well as X based software."
sidebar: merlin7_sidebar
permalink: /merlin7/interactive-jobs.html
---

## The Merlin7 'interactive' partition

On the **`merlin7`** cluster, it is recommended to always run interactive jobs on the **`interactive`** partition.
This partition allows CPU oversubscription (up to four users may share the same CPU) and **has the highest scheduling priority**. Access to this partition is typically quick, making it a convenient extension of the login nodes for interactive workloads.

On the **`gmerlin7`** cluster, additional interactive partitions are available, but these are primarily intended for CPU-only workloads (such as compiling GPU-based software, or creating an allocation for submitting jobs to Grace-Hopper nodes).

{{site.data.alerts.warning}}
Because <b>GPU resources are scarce and expensive</b>, interactive allocations that use GPUs should only be requested when strictly necessary and well justified.
{{site.data.alerts.end}}

## Running interactive jobs

There are two ways to run interactive jobs in Slurm, using the ``salloc`` and ``srun`` commands:

* **``salloc``**: obtains a Slurm job allocation (a set of nodes), executes command(s), and then releases the allocation when the command is finished.
* **``srun``**: runs parallel tasks.

### srun

``srun`` is used to run parallel jobs in the batch system. It can be used within a batch script
(submitted with ``sbatch``), within a job allocation (created with ``salloc``), or directly as a
command, for example from the login nodes.

When used inside a batch script or a job allocation, ``srun`` is constrained to the
amount of resources allocated by the ``sbatch``/``salloc`` commands. In ``sbatch``, these resources
are usually defined inside the batch script with the format ``#SBATCH <option>=<value>``.
In other words, if your batch script or allocation defines 88 tasks (with 1 thread per core)
and 2 nodes, ``srun`` is constrained to that amount of resources (you can use fewer, but never
exceed those limits).
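
A minimal sketch of this behaviour (the resource numbers and the program name are placeholders, not an official example):

```bash
#!/bin/bash
#SBATCH --nodes=2              # allocate 2 nodes
#SBATCH --ntasks=88            # allocate 88 tasks in total
#SBATCH --hint=nomultithread   # 1 thread per core

# 'srun' inherits the allocation above: it may launch up to 88 tasks
# across the 2 allocated nodes, but can never exceed those limits.
srun ./my_parallel_app

# Running a smaller step inside the same allocation is also allowed:
srun --ntasks=2 hostname
```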

When used from a login node, ``srun`` typically runs a specific command or program in an
interactive way. ``srun`` is a blocking process (it blocks the shell prompt until the ``srun``
command finishes, unless you run it in the background with ``&``). This can be very useful for running
interactive software which pops up a window and then submits jobs or runs sub-tasks in the
background (for example, **Relion**, **cisTEM**, etc.).
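
For instance, a sketch of launching such a tool without blocking the prompt (the program name is a placeholder):

```bash
# Run the program through Slurm, but keep the shell prompt usable
srun --clusters=merlin7 --partition=interactive --ntasks=1 ./my_gui_tool &
```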

Refer to ``man srun`` to explore all available options for this command.

<details>
<summary>[Show 'srun' example]: Running 'hostname' command on 3 nodes, using 2 cores (1 task/core) per node</summary>
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --ntasks=6 --ntasks-per-node=2 --nodes=3 hostname
cn001.merlin7.psi.ch
cn001.merlin7.psi.ch
cn002.merlin7.psi.ch
cn002.merlin7.psi.ch
cn003.merlin7.psi.ch
cn003.merlin7.psi.ch
</pre>
</details>

### salloc

**``salloc``** is used to obtain a Slurm job allocation (a set of nodes). Once the job is allocated,
users can execute interactive command(s). Once finished (``exit`` or ``Ctrl+D``),
the allocation is released. **``salloc``** is a blocking command, that is, the command will block
until the requested resources are allocated.

When running **``salloc``**, once the resources are allocated, *by default* the user gets
a ***new shell on one of the allocated resources*** (if a user has requested several nodes, a new
shell is opened on the first allocated node). However, this behaviour can be changed by adding
a shell (`$SHELL`) at the end of the `salloc` command. For example:

```bash
# Typical 'salloc' call
salloc --clusters=merlin7 --partition=interactive -N 2 -n 2

# Custom 'salloc' call
# - $SHELL will open a local shell on the login node from which 'salloc' is run
salloc --clusters=merlin7 --partition=interactive -N 2 -n 2 $SHELL
```

<details>
<summary>[Show 'salloc' example]: Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - <i>Default</i></summary>
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive -N 2 -n 2
salloc: Granted job allocation 161
salloc: Nodes cn[001-002] are ready for job

caubet_m@login001:~> srun hostname
cn002.merlin7.psi.ch
cn001.merlin7.psi.ch

caubet_m@login001:~> exit
exit
salloc: Relinquishing job allocation 161
</pre>
</details>

<details>
<summary>[Show 'salloc' example]: Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - <i>$SHELL</i></summary>
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --ntasks=2 --nodes=2 $SHELL
salloc: Granted job allocation 165
salloc: Nodes cn[001-002] are ready for job
caubet_m@login001:~> srun hostname
cn001.merlin7.psi.ch
cn002.merlin7.psi.ch
caubet_m@login001:~> exit
exit
salloc: Relinquishing job allocation 165
</pre>
</details>

## Running interactive jobs with X11 support

### Requirements

#### Graphical access

[NoMachine](/merlin7/nomachine.html) is the officially supported service for graphical
access in the Merlin cluster. This service runs on the login nodes. Check the
document [{Accessing Merlin -> NoMachine}](/merlin7/nomachine.html) for details about
how to connect to the **NoMachine** service in the Merlin cluster.

For other, not officially supported, graphical access (X11 forwarding):

* For Linux clients, please follow [{How To Use Merlin -> Accessing from Linux Clients}](/merlin7/connect-from-linux.html)
* For Windows clients, please follow [{How To Use Merlin -> Accessing from Windows Clients}](/merlin7/connect-from-windows.html)
* For MacOS clients, please follow [{How To Use Merlin -> Accessing from MacOS Clients}](/merlin7/connect-from-macos.html)

### 'srun' with X11 support

The Merlin6 and Merlin7 clusters allow running any window-based application. For that, you need to
add the option ``--x11`` to the ``srun`` command. For example:

```bash
srun --clusters=merlin7 --partition=interactive --x11 sview
```

will pop up an X11-based Slurm view of the cluster.

In the same manner, you can create a bash shell with X11 support. To do that, add
the option ``--pty`` to the ``srun --x11`` command. Once the resource is allocated,
you can interactively run X11 and non-X11 based commands from there.

```bash
srun --clusters=merlin7 --partition=interactive --x11 --pty bash
```

<details>
<summary>[Show 'srun' with X11 support examples]</summary>
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 sview

caubet_m@login001:~>

caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 --pty bash

caubet_m@cn003:~> sview

caubet_m@cn003:~> echo "This was an example"
This was an example

caubet_m@cn003:~> exit
exit
</pre>
</details>

### 'salloc' with X11 support

The **Merlin6** and **Merlin7** clusters allow running any window-based application. For that, you need to
add the option ``--x11`` to the ``salloc`` command. For example:

```bash
salloc --clusters=merlin7 --partition=interactive --x11 sview
```

will pop up an X11-based Slurm view of the cluster.

In the same manner, you can create a bash shell with X11 support. To do that, simply run
``salloc --clusters=merlin7 --partition=interactive --x11``. Once the resource is allocated,
you can interactively run X11 and non-X11 based commands from there.

```bash
salloc --clusters=merlin7 --partition=interactive --x11
```

<details>
<summary>[Show 'salloc' with X11 support examples]</summary>
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --x11 sview
salloc: Granted job allocation 174
salloc: Nodes cn001 are ready for job
salloc: Relinquishing job allocation 174

caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --x11
salloc: Granted job allocation 175
salloc: Nodes cn001 are ready for job
caubet_m@cn001:~>

caubet_m@cn001:~> sview

caubet_m@cn001:~> echo "This was an example"
This was an example

caubet_m@cn001:~> exit
exit
salloc: Relinquishing job allocation 175
</pre>
</details>
@@ -0,0 +1,59 @@
---
title: Slurm cluster 'merlin7'
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 May 2023
summary: "This document describes a summary of the Merlin7 configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/merlin7-configuration.html
---

This documentation shows the basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.

## Infrastructure

### Hardware

* 2 CPU-only login nodes
* 77 CPU-only compute nodes
* 5 GPU A100 nodes
* 8 GPU Grace Hopper nodes

The specification of the node types is:

| Node | #Nodes | CPU | RAM | GRES |
| ----: | ------ | --- | --- | ---- |
| Login Nodes | 2 | _2x_ AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) | 512GB DDR4 3200Mhz | |
| CPU Nodes | 77 | _2x_ AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) | 512GB DDR4 3200Mhz | |
| A100 GPU Nodes | 5 | _2x_ AMD EPYC 7713 (x86_64 Milan, 64 Cores, 3.2GHz) | 512GB DDR4 3200Mhz | 4 x NV_A100 (80GB) |
| GH GPU Nodes | 3 | _2x_ NVidia Grace Neoverse-V2 (SBSA ARM 64bit, 144 Cores, 3.1GHz) | _2x_ 480GB DDR5X (CPU+GPU) | 4 x NV_GH200 (120GB) |

### Network

The Merlin7 cluster builds on top of HPE/Cray technologies, including a high-performance network fabric called Slingshot. This network fabric is able
to provide up to 200 Gbit/s throughput between nodes. Further information on Slingshot can be found at [HPE](https://www.hpe.com/psnow/doc/PSN1012904596HREN) and
at <https://www.glennklockwood.com/garden/slingshot>.

Through software interfaces like [libFabric](https://ofiwg.github.io/libfabric/) (which is available on Merlin7), applications can leverage the network seamlessly.
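
As a quick sanity check, the standard libfabric utility can list the fabric interfaces visible on a node (a sketch, assuming the libfabric tools are installed and in the PATH):

```bash
# Print the fabric providers/interfaces that libfabric detects on this node
fi_info
```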

### Storage

Unlike previous iterations of the Merlin HPC clusters, Merlin7 _does not_ have any local storage. Instead, storage for the entire cluster is provided through
a dedicated storage appliance from HPE/Cray called [ClusterStor](https://www.hpe.com/psnow/doc/PSN1012842049INEN.pdf).

The appliance is built of several storage servers:

* 2 management nodes
* 2 MDS servers, 12 drives per server, 2.9 TiB (Raid10)
* 8 OSS-D servers, 106 drives per server, 14.5 TiB HDDs (Gridraid / Raid6)
* 4 OSS-F servers, 12 drives per server, 7 TiB SSDs (Raid10)

This gives an effective storage capacity of:

* 10 PB HDD
  * value visible on Linux: HDD 9302.4 TiB
* 162 TB SSD
  * value visible on Linux: SSD 151.6 TiB
* 23.6 TiB of metadata

The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.
@@ -0,0 +1,370 @@
---
title: Slurm merlin7 Configuration
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 May 2023
summary: "This document describes a summary of the Merlin7 Slurm configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-configuration.html
---

This documentation shows the basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.

## Public partitions configuration summary

### CPU public partitions

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -----------------: | -----------: | ----------: | -------: | ---------------: | --------------------: | --------------------: |
| **<u>general</u>** | 1-00:00:00 | 7-00:00:00 | Low | <u>merlin</u> | cpu=1024,mem=1920G | cpu=1024,mem=1920G |
| **daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | cpu=1024,mem=1920G | cpu=2048,mem=3840G |
| **hourly** | 0-00:30:00 | 0-01:00:00 | High | <u>merlin</u> | cpu=2048,mem=3840G | cpu=8192,mem=15T |
| **interactive** | 0-04:00:00 | 0-12:00:00 | Highest | <u>merlin</u> | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 |

### GPU public partitions

#### A100 nodes

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -------------------: | -----------: | ----------: | ---------: | -------------: | -------------------------------: | -------------------------------: |
| **a100-general** | 1-00:00:00 | 7-00:00:00 | Low | <u>merlin</u> | gres/gpu=4 | gres/gpu=8 |
| **a100-daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-hourly** | 0-00:30:00 | 0-01:00:00 | High | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-interactive** | 0-01:00:00 | 0-12:00:00 | Very High | <u>merlin</u> | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 |

#### Grace-Hopper nodes

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -------------------: | -----------: | ----------: | ---------: | -------------: | -------------------------------: | -------------------------------: |
| **gh-general** | 1-00:00:00 | 7-00:00:00 | Low | <u>merlin</u> | gres/gpu=4 | gres/gpu=8 |
| **gh-daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **gh-hourly** | 0-00:30:00 | 0-01:00:00 | High | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **gh-interactive** | 0-01:00:00 | 0-12:00:00 | Very High | <u>merlin</u> | cpu=16,gres/gpu=1,mem=46G,node=1 | cpu=16,gres/gpu=1,mem=46G,node=1 |

## CPU cluster: merlin7

**By default, jobs will be submitted to `merlin7`**, as it is the primary cluster configured on the login nodes.
Specifying the cluster name is typically unnecessary unless you have defined environment variables that could override the default cluster name.
However, when necessary, one can specify the cluster as follows:
```bash
#SBATCH --cluster=merlin7
```
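
The same can be done on the command line; `myscript.sh` below is just a placeholder for your own batch script:

```bash
# Submit explicitly to the 'merlin7' cluster
sbatch --clusters=merlin7 myscript.sh

# Query only the 'merlin7' cluster
squeue --clusters=merlin7
```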

### CPU general configuration

The **Merlin7 CPU cluster** is configured with the **`CR_CORE_MEMORY`** and **`CR_ONE_TASK_PER_CORE`** options.

* This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
  to fulfill a job's resource requirements.

By default, Slurm will allocate one task per core, which means:

* Each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job.

This behavior ensures consistent resource allocation but may result in underutilization of hyper-threading in some cases.
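
As an illustration, the following sketch (the task count is arbitrary) requests 4 tasks; with hyper-threading enabled, each task is pinned to a full core, so Slurm accounts 2 CPUs per task, i.e. 8 CPUs in total:

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --partition=hourly
#SBATCH --ntasks=4             # 4 tasks -> 4 cores -> 8 CPUs (threads) accounted

echo "CPUs allocated on this node: $SLURM_CPUS_ON_NODE"
srun hostname
```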

### CPU nodes definition

The table below provides an overview of the Slurm configuration for the different node types in the Merlin7 cluster.
This information is essential for understanding how resources are allocated, enabling users to tailor their submission
scripts accordingly.

| Nodes | Sockets | CoresPerSocket | Cores | ThreadsPerCore | CPUs | MaxMemPerNode | DefMemPerCPU | Features |
| --------------------:| -------: | --------------: | -----: | --------------: | ----: | ------------: | -----------: | ------------: |
| login[001-002] | 2 | 64 | 128 | 2 | 256 | 480G | 1920M | AMD_EPYC_7713 |
| cn[001-077] | 2 | 64 | 128 | 2 | 256 | 480G | 1920M | AMD_EPYC_7713 |

Notes on memory configuration:

* **Memory allocation options:** To request additional memory, use the following options in your submission script:
  * **`--mem=<mem_in_MB>`**: Allocates memory per node.
  * **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).

  The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
  effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
  adjusted.

  For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
  In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads, as shown in the sketch below.
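
A minimal sketch of such an MPI-style request (partition, task count and binary name are placeholders); since only one thread per core is used, `--mem-per-cpu` is doubled from the 1920M default to keep the same memory per task:

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --partition=daily
#SBATCH --ntasks=128
#SBATCH --hint=nomultithread   # one thread per core
#SBATCH --mem-per-cpu=3840     # 2 x 1920M, compensating for the halved CPU count

srun ./my_mpi_app
```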

{{site.data.alerts.tip}}
Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
{{site.data.alerts.end}}

### User and job limits with QoS

In the `merlin7` CPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
limits may result in pending jobs even when many nodes are idle due to low activity.

On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.

Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
effectively.

To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
various QoS definitions applicable to the merlin7 CPU-based cluster. Here:

* `MaxTRES` specifies resource limits per job.
* `MaxTRESPU` specifies resource limits per user.

| Name | MaxTRES | MaxTRESPU | Scope |
| -------------------: | --------------------: | --------------------: | ---------------------: |
| **normal** | | | partition |
| **cpu_general** | cpu=1024,mem=1920G | cpu=1024,mem=1920G | <u>user</u>, partition |
| **cpu_daily** | cpu=1024,mem=1920G | cpu=2048,mem=3840G | partition |
| **cpu_hourly** | cpu=2048,mem=3840G | cpu=8192,mem=15T | partition |
| **cpu_interactive** | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 | partition |

Where:

* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
  restrictions.
* **`cpu_general` QoS:** This is the **default QoS** for `merlin7` _users_. It limits the total resources available to each
  user. Additionally, this QoS is applied to the `general` partition, enforcing restrictions at the partition level and
  overriding user-level QoS.
* **`cpu_daily` QoS:** Guarantees increased resources for the `daily` partition, accommodating shorter-duration jobs
  with higher resource needs.
* **`cpu_hourly` QoS:** Offers the least constraints, allowing more resources to be used for the `hourly` partition,
  which caters to very short-duration jobs.
* **`cpu_interactive` QoS:** Is restricted to one node and a few CPUs only, and is intended to be used when interactive
  allocations are necessary (`salloc`, `srun`).

For additional details, refer to the [CPU partitions](/merlin7/slurm-configuration.html#CPU-partitions) section.

{{site.data.alerts.tip}}
Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
{{site.data.alerts.end}}

### CPU partitions

This section provides a summary of the partitions available in the `merlin7` CPU cluster.

Key concepts:

* **`PriorityJobFactor`**: This value is added to a job’s priority (visible in the `PARTITION` column of the `sprio -l` command).
  Jobs submitted to partitions with higher `PriorityJobFactor` values generally run sooner. However, other factors like *job age*
  and especially *fair share* can also influence scheduling.
* **`PriorityTier`**: Jobs submitted to partitions with higher `PriorityTier` values take precedence over pending jobs in partitions
  with lower `PriorityTier` values. Additionally, jobs from higher `PriorityTier` partitions can preempt running jobs in lower-tier
  partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
  for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
  QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.

{{site.data.alerts.tip}}
Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
{{site.data.alerts.end}}

#### CPU public partitions

| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
| -----------------: | -----------: | ----------: | --------: | ----------------: | -----------: | --------------: | -------------: |
| **<u>general</u>** | 1-00:00:00 | 7-00:00:00 | 46 | 1 | 1 | cpu_general | <u>merlin</u> |
| **daily** | 0-01:00:00 | 1-00:00:00 | 58 | 500 | 1 | cpu_daily | <u>merlin</u> |
| **hourly** | 0-00:30:00 | 0-01:00:00 | 77 | 1000 | 1 | cpu_hourly | <u>merlin</u> |
| **interactive** | 0-04:00:00 | 0-12:00:00 | 58 | 1 | 2 | cpu_interactive | <u>merlin</u> |

All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
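
Both defaults can be overridden in the submission script, for example (a sketch using the public options documented above):

```bash
#SBATCH --account=merlin       # default account; usually not needed
#SBATCH --partition=daily      # override the default 'general' partition
```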

{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>daily</b> partition.
For jobs running less than one hour, use the <b>hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>general</b>, which has limited node availability.
{{site.data.alerts.end}}

The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
**`hourly`** partition might experience delays in such scenarios.

The **`interactive`** partition is designed specifically for real-time, interactive work. Here are the key characteristics:

* **CPU Oversubscription:** This partition allows CPU oversubscription (configured as `FORCE:4`), meaning that up to four interactive
  jobs may share the same physical CPU core. This can impact performance, but enables fast access for short-term tasks.
* **Highest Scheduling Priority:** Jobs submitted to the interactive partition are always prioritized. They will be scheduled
  before any jobs in other partitions.
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
  immediate access is important.

{{site.data.alerts.warning}}
Because of CPU sharing, the performance on the <b>'interactive'</b> partition may not be optimal for compute-intensive tasks.
For long-running or production workloads, use a dedicated batch partition instead.
{{site.data.alerts.end}}

#### CPU private partitions

##### CAS / ASA

| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
| -----------------: | -----------: | ----------: | --------: | ----------------: | -----------: | ----------: | -------------: |
| **asa** | 0-01:00:00 | 14-00:00:00 | 10 | 1 | 2 | normal | asa |

##### CNM / Mu3e

| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
| -----------------: | -----------: | ----------: | --------: | ----------------: | -----------: | ----------: | -------------: |
| **mu3e** | 1-00:00:00 | 7-00:00:00 | 4 | 1 | 2 | normal | mu3e, meg |

##### CNM / MeG

| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
| -----------------: | -----------: | ----------: | --------: | ----------------: | -----------: | ----------: | -------------: |
| **meg-short** | 0-01:00:00 | 0-01:00:00 | unlimited | 1000 | 2 | normal | meg |
| **meg-long** | 1-00:00:00 | 5-00:00:00 | unlimited | 1 | 2 | normal | meg |
| **meg-prod** | 1-00:00:00 | 5-00:00:00 | unlimited | 1000 | 4 | normal | meg |

## GPU cluster: gmerlin7

As mentioned in previous sections, by default, jobs will be submitted to `merlin7`, as it is the primary cluster configured on the login nodes.
For submitting jobs to the GPU cluster, **the cluster name `gmerlin7` must be specified**, as follows:
```bash
#SBATCH --cluster=gmerlin7
```

### GPU general configuration

The **Merlin7 GPU cluster** is configured with the **`CR_CORE_MEMORY`**, **`CR_ONE_TASK_PER_CORE`**, and **`ENFORCE_BINDING_GRES`** options.

* This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
  to fulfill a job's resource requirements.
* Slurm will allocate the CPUs to the selected GPU.

By default, Slurm will allocate one task per core, which means:

* For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job.
* For the NVIDIA GraceHopper-based nodes, each task will consume 1 **CPU**.

This behavior ensures consistent resource allocation but may result in underutilization of hyper-threading in some cases.
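
For instance, a minimal sketch of a single-GPU job on the A100 nodes (partition, CPU count and binary name are placeholders):

```bash
#!/bin/bash
#SBATCH --clusters=gmerlin7
#SBATCH --partition=a100-daily
#SBATCH --gpus=1               # one GPU; the CPUs are bound to the selected GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

srun ./my_gpu_app
```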

### GPU nodes definition

The table below provides an overview of the Slurm configuration for the different node types in the Merlin7 cluster.
This information is essential for understanding how resources are allocated, enabling users to tailor their submission
scripts accordingly.

| Nodes | Sockets | CoresPerSocket | Cores | ThreadsPerCore | CPUs | MaxMemPerNode | DefMemPerCPU | Gres | Features |
| --------------------:| -------: | --------------: | -----: | --------------: | ----: | ------------: | -----------: | --------------------------: | ---------------------: |
| gpu[001-007] | 4 | 72 | 288 | 1 | 288 | 828G | 2944M | gpu:gh200:4 | GH200, NV_H100 |
| gpu[101-105] | 1 | 64 | 64 | 2 | 128 | 480G | 3840M | gpu:nvidia_a100-sxm4-80gb:4 | AMD_EPYC_7713, NV_A100 |

Notes on memory configuration:

* **Memory allocation options:** To request additional memory, use the following options in your submission script:
  * **`--mem=<mem_in_MB>`**: Allocates memory per node.
  * **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).

  The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
  effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
  adjusted.

  For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
  In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.

{{site.data.alerts.tip}}
Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
{{site.data.alerts.end}}

### User and job limits with QoS

In the `gmerlin7` GPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
limits may result in pending jobs even when many nodes are idle due to low activity.

On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.

Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
effectively.

To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
various QoS definitions applicable to the gmerlin7 GPU-based cluster. Here:

* `MaxTRES` specifies resource limits per job.
* `MaxTRESPU` specifies resource limits per user.

| Name | MaxTRES | MaxTRESPU | Scope |
| -----------------------: | -------------------------------: | -------------------------------: | ---------------------: |
| **normal** | | | partition |
| **gpu_general** | gres/gpu=4 | gres/gpu=8 | <u>user</u>, partition |
| **gpu_daily** | gres/gpu=8 | gres/gpu=8 | partition |
| **gpu_hourly** | gres/gpu=8 | gres/gpu=8 | partition |
| **gpu_gh_interactive** | cpu=16,gres/gpu=1,mem=46G,node=1 | cpu=16,gres/gpu=1,mem=46G,node=1 | partition |
| **gpu_a100_interactive** | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 | partition |

Where:

* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
  restrictions.
* **`gpu_general` QoS:** This is the **default QoS** for `gmerlin7` _users_. It limits the total resources available to each
  user. Additionally, this QoS is applied to the `[a100|gh]-general` partitions, enforcing restrictions at the partition level and
  overriding user-level QoS.
* **`gpu_daily` QoS:** Guarantees increased resources for the `[a100|gh]-daily` partitions, accommodating shorter-duration jobs
  with higher resource needs.
* **`gpu_hourly` QoS:** Offers the least constraints, allowing more resources to be used for the `[a100|gh]-hourly` partitions,
  which cater to very short-duration jobs.
* **`gpu_a100_interactive` & `gpu_gh_interactive` QoS:** Guarantee interactive access to GPU nodes for software compilation and
  small testing.

For additional details, refer to the [GPU partitions](/merlin7/slurm-configuration.html#GPU-partitions) section.

{{site.data.alerts.tip}}
Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
{{site.data.alerts.end}}

### GPU partitions

This section provides a summary of the partitions available in the `gmerlin7` GPU cluster.

Key concepts:

* **`PriorityJobFactor`**: This value is added to a job’s priority (visible in the `PARTITION` column of the `sprio -l` command).
  Jobs submitted to partitions with higher `PriorityJobFactor` values generally run sooner. However, other factors like *job age*
  and especially *fair share* can also influence scheduling.
* **`PriorityTier`**: Jobs submitted to partitions with higher `PriorityTier` values take precedence over pending jobs in partitions
  with lower `PriorityTier` values. Additionally, jobs from higher `PriorityTier` partitions can preempt running jobs in lower-tier
  partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
  for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
  QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.

{{site.data.alerts.tip}}
Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
{{site.data.alerts.end}}

#### A100-based partitions

| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
| -------------------: | -----------: | ----------: | --------: | ----------------: | -----------: | -------------------: | -------------: |
| **a100-general** | 1-00:00:00 | 7-00:00:00 | 3 | 1 | 1 | gpu_general | <u>merlin</u> |
| **a100-daily** | 0-01:00:00 | 1-00:00:00 | 4 | 500 | 1 | gpu_daily | <u>merlin</u> |
| **a100-hourly** | 0-00:30:00 | 0-01:00:00 | 5 | 1000 | 1 | gpu_hourly | <u>merlin</u> |
| **a100-interactive** | 0-01:00:00 | 0-12:00:00 | 5 | 1 | 2 | gpu_a100_interactive | <u>merlin</u> |

All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.

{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>a100-daily</b> partition.
For jobs running less than one hour, use the <b>a100-hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>a100-general</b>, which has limited node availability.
{{site.data.alerts.end}}

#### GH-based partitions

| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
| -------------------: | -----------: | ----------: | --------: | ----------------: | -----------: | -------------------: | -------------: |
| **gh-general** | 1-00:00:00 | 7-00:00:00 | 5 | 1 | 1 | gpu_general | <u>merlin</u> |
| **gh-daily** | 0-01:00:00 | 1-00:00:00 | 6 | 500 | 1 | gpu_daily | <u>merlin</u> |
| **gh-hourly** | 0-00:30:00 | 0-01:00:00 | 7 | 1000 | 1 | gpu_hourly | <u>merlin</u> |
| **gh-interactive** | 0-01:00:00 | 0-12:00:00 | 7 | 1 | 2 | gpu_gh_interactive | <u>merlin</u> |

All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.

{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>gh-daily</b> partition.
For jobs running less than one hour, use the <b>gh-hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>gh-general</b>, which has limited node availability.
{{site.data.alerts.end}}
@@ -0,0 +1,68 @@
---
title: Slurm Examples
#tags:
keywords: slurm example, template, examples, templates, running jobs, sbatch, single core based jobs, HT, multithread, no-multithread, mpi, openmp, packed jobs, hands-on, array jobs, gpu
last_updated: 24 May 2023
summary: "This document shows different template examples for running jobs in the Merlin cluster."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-examples.html
---

## Single core based job examples

```bash
#!/bin/bash
#SBATCH --partition=hourly      # Using 'hourly' will grant higher priority
#SBATCH --ntasks-per-core=2     # Request the max ntasks be invoked on each core
#SBATCH --hint=multithread      # Use extra threads with in-core multi-threading
#SBATCH --time=00:30:00         # Define max time job will run
#SBATCH --output=myscript.out   # Define your output file
#SBATCH --error=myscript.err    # Define your error file

module purge
module load $MODULE_NAME        # where $MODULE_NAME is a software in PModules
srun $MYEXEC                    # where $MYEXEC is a path to your binary file
```

## Multi-core based job examples

### Pure MPI

```bash
#!/bin/bash
#SBATCH --job-name=purempi
#SBATCH --partition=daily       # Using 'daily' will grant higher priority
#SBATCH --time=24:00:00         # Define max time job will run
#SBATCH --output=%x-%j.out      # Define your output file
#SBATCH --error=%x-%j.err       # Define your error file
#SBATCH --exclusive
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --hint=nomultithread
##SBATCH --cpus-per-task=1

module purge
module load $MODULE_NAME        # where $MODULE_NAME is a software in PModules
srun $MYEXEC                    # where $MYEXEC is a path to your binary file
```

### Hybrid

```bash
#!/bin/bash
#SBATCH --job-name=hybrid
#SBATCH --partition=daily       # Using 'daily' will grant higher priority
#SBATCH --time=24:00:00         # Define max time job will run
#SBATCH --output=%x-%j.out      # Define your output file
#SBATCH --error=%x-%j.err       # Define your error file
#SBATCH --exclusive
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --hint=multithread
#SBATCH --cpus-per-task=2

module purge
module load $MODULE_NAME        # where $MODULE_NAME is a software in PModules
srun $MYEXEC                    # where $MYEXEC is a path to your binary file
```
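
For hybrid MPI+OpenMP runs it is common to tie the OpenMP thread count to the CPUs allocated per task; a possible addition to the script above (a sketch, not part of the original template):

```bash
# Let OpenMP use exactly the CPUs allocated to each task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun $MYEXEC
```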