initial formatting changes complete

2026-01-06 16:40:15 +01:00
parent 173f822230
commit 5f759a629a
81 changed files with 806 additions and 1113 deletions


@@ -24,17 +24,17 @@ ``srun`` is used to run parallel jobs in the batch system. It can be used within a batch script
(which can be run with ``sbatch``), or within a job allocation (which can be run with ``salloc``).
It can also be used as a direct command (for example, from the login nodes).
When used inside a batch script or during a job allocation, ``srun`` is constrained to the
amount of resources allocated by the ``sbatch``/``salloc`` commands. In ``sbatch``, these
resources are usually defined inside the batch script with the format ``#SBATCH <option>=<value>``.
In other words, if you define 88 tasks (and 1 thread per core) and 2 nodes in your batch script
or allocation, ``srun`` is constrained to that amount of resources (you can use less, but never
exceed those limits).
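As a hedged sketch of this (the partition name, resource values, and program name are illustrative, reusing the numbers above):

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --partition=general        # assumed partition, adjust to your case
#SBATCH --nodes=2
#SBATCH --ntasks=88
#SBATCH --hint=nomultithread       # 1 thread per core

# 'srun' inherits the allocation defined above: up to 88 tasks on 2 nodes,
# but it can never exceed those limits.
srun ./my_parallel_program
```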
When used from the login node, ``srun`` is usually used to run a specific command or piece of software
interactively. ``srun`` is a blocking process (it will block the bash prompt until the ``srun``
command finishes, unless you run it in the background with ``&``). This can be very useful for running
interactive software which pops up a window and then submits jobs or runs sub-tasks in the
background (for example, **Relion**, **cisTEM**, etc.).
Refer to ``man srun`` to explore all possible options for that command.
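A short illustrative sketch of both behaviours from a login node (the program names are placeholders):

```bash
# Blocking: the prompt only returns once the command has finished
srun --clusters=merlin7 --partition=interactive hostname

# Non-blocking: append '&' so the shell stays usable while the job runs
srun --clusters=merlin7 --partition=interactive ./my_gui_application &
```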
@@ -65,7 +65,7 @@ prompt a new shell on the first allocated node). However, this behaviour can be
changed by adding a shell (`$SHELL`) at the end of the `salloc` command. For example:
```bash
# Typical 'salloc' call
salloc --clusters=merlin7 --partition=interactive -N 2 -n 2
# Custom 'salloc' call
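# Hypothetical completion (following the explanation above): append your shell at the end
salloc --clusters=merlin7 --partition=interactive -N 2 -n 2 $SHELL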
@@ -111,20 +111,21 @@ salloc: Relinquishing job allocation 165
#### Graphical access
[NoMachine](../02-How-To-Use-Merlin/nomachine.md) is the officially supported service for graphical
access in the Merlin cluster. This service runs on the login nodes. Check the
document [{Accessing Merlin -> NoMachine}](../02-How-To-Use-Merlin/nomachine.md) for details about
how to connect to the **NoMachine** service in the Merlin cluster.
For other, not officially supported graphical access (X11 forwarding):
* For Linux clients, please follow [{How To Use Merlin -> Accessing from Linux Clients}](../02-How-To-Use-Merlin/connect-from-linux.md)
* For Windows clients, please follow [{How To Use Merlin -> Accessing from Windows Clients}](../02-How-To-Use-Merlin/connect-from-windows.md)
* For MacOS clients, please follow [{How To Use Merlin -> Accessing from MacOS Clients}](../02-How-To-Use-Merlin/connect-from-macos.md)
### 'srun' with x11 support
The Merlin6 and Merlin7 clusters allow running graphical (X11-based) applications. For that, you need to
add the option ``--x11`` to the ``srun`` command. For example:
```bash
@@ -146,7 +147,7 @@ srun --clusters=merlin7 --partition=interactive --x11 --pty bash
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 sview
caubet_m@login001:~>
caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 --pty bash
@@ -162,7 +163,7 @@ exit
### 'salloc' with x11 support
The **Merlin6** and **Merlin7** clusters allow running graphical (X11-based) applications. For that, you need to
add the option ``--x11`` to the ``salloc`` command. For example:
```bash
@@ -172,7 +173,7 @@ salloc --clusters=merlin7 --partition=interactive --x11 sview
will pop up an X11-based Slurm view of the cluster.
In the same manner, you can create a bash shell with X11 support. To do that, just run
``salloc --clusters=merlin7 --partition=interactive --x11``. Once the resource is allocated, from
there you can interactively run X11 and non-X11 based commands.
```bash
@@ -187,10 +188,10 @@ salloc: Granted job allocation 174
salloc: Nodes cn001 are ready for job
salloc: Relinquishing job allocation 174
caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --x11
salloc: Granted job allocation 175
salloc: Nodes cn001 are ready for job
caubet_m@cn001:~>
caubet_m@cn001:~> sview


@@ -1,12 +1,4 @@
---
title: Slurm cluster 'merlin7'
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 May 2023
summary: "This document describes a summary of the Merlin7 configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/merlin7-configuration.html
---
# Slurm cluster 'merlin7'
This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.
@@ -14,10 +6,10 @@ This documentation shows basic Slurm configuration and options needed to run job
### Hardware
* 2 CPU-only login nodes
* 77 CPU-only compute nodes
* 5 GPU A100 nodes
* 8 GPU Grace Hopper nodes
The specification of the node types is:
@@ -51,9 +43,9 @@ The appliance is built of several storage servers:
With an effective storage capacity of:
* 10 PB HDD
  * value visible on Linux: HDD 9302.4 TiB
* 162 TB SSD
  * value visible on Linux: SSD 151.6 TiB
* 23.6 TiB for metadata
The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.


@@ -1,12 +1,4 @@
---
title: Slurm merlin7 Configuration
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 May 2023
summary: "This document describes a summary of the Merlin7 Slurm CPU-based configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-configuration.html
---
# Slurm merlin7 Configuration
This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.
@@ -14,7 +6,7 @@ This documentation shows basic Slurm configuration and options needed to run job
### CPU public partitions
| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -----------------: | -----------: | ----------: | -------: | ---------------: | --------------------: | --------------------: |
| **<u>general</u>** | 1-00:00:00 | 7-00:00:00 | Low | <u>merlin</u> | cpu=1024,mem=1920G | cpu=1024,mem=1920G |
| **daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | cpu=1024,mem=1920G | cpu=2048,mem=3840G |
@@ -31,7 +23,7 @@ This documentation shows basic Slurm configuration and options needed to run job
| **a100-daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-hourly** | 0-00:30:00 | 0-01:00:00 | High | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-interactive** | 0-01:00:00 | 0-12:00:00 | Very High | <u>merlin</u> | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 |
#### Grace-Hopper nodes
| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
@@ -53,8 +45,9 @@ However, when necessary, one can specify the cluster as follows:
### CPU general configuration
The **Merlin7 CPU cluster** is configured with the **`CR_CORE_MEMORY`** and **`CR_ONE_TASK_PER_CORE`** options.
* This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
to fulfill a job's resource requirements.
By default, Slurm will allocate one task per core, which means:
@@ -75,15 +68,15 @@ scripts accordingly.
Notes on memory configuration:
* **Memory allocation options:** To request additional memory, use the following options in your submission script:
* **`--mem=<mem_in_MB>`**: Allocates memory per node.
* **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).
The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
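A minimal sketch of such an MPI submission (the task count, memory value, and program name are illustrative assumptions):

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --partition=general        # assumed partition
#SBATCH --ntasks=64
#SBATCH --hint=nomultithread       # disable hyper-threading: one thread per core
#SBATCH --mem-per-cpu=8000         # doubled value to compensate for the halved CPU count

srun ./my_mpi_program
```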
!!! tip
@@ -93,19 +86,19 @@ adjusted.
In the `merlin7` CPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
efficiency. However, applying limits can occasionally impact the cluster's utilization. For example, user-specific
limits may result in pending jobs even when many nodes are idle due to low activity.
On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.
Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
effectively.
To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
* `MaxTRES` specifies resource limits per job.
* `MaxTRESPU` specifies resource limits per user.
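To inspect these QoS definitions directly on the cluster, you can query the Slurm accounting database; a minimal sketch using standard `sacctmgr` fields:

```bash
# List each QoS with its per-job (MaxTRES) and per-user (MaxTRESPU) limits
sacctmgr show qos format=Name%20,MaxTRES%40,MaxTRESPU%40
```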
@@ -119,7 +112,7 @@ various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
| **cpu_interactive** | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 | partition |
Where:
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
restrictions.
* **`cpu_general` QoS:** This is the **default QoS** for `merlin7` _users_. It limits the total resources available to each
user. Additionally, this QoS is applied to the `general` partition, enforcing restrictions at the partition level and
@@ -172,17 +165,17 @@ Similarly, if no partition is specified, jobs are automatically submitted to the
partitions provide higher priority and ensure quicker scheduling compared
to **general**, which has limited node availability.
The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
**`hourly`** partition might experience delays in such scenarios.
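As an illustrative sketch of selecting one of these partitions in a batch script (the time limit and script body are assumptions):

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --partition=hourly         # shorter, higher-priority partition
#SBATCH --time=0-00:45:00          # illustrative walltime request

srun ./my_short_job
```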
The **`interactive`** partition is designed specifically for real-time, interactive work. Here are the key characteristics:
* **CPU Oversubscription:** This partition allows CPU oversubscription (configured as `FORCE:4`), meaning that up to four interactive
jobs may share the same physical CPU core. This can impact performance, but enables fast access for short-term tasks.
* **Highest Scheduling Priority:** Jobs submitted to the interactive partition are always prioritized. They will be scheduled
before any jobs in other partitions.
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
immediate access is important.
!!! warning
@@ -223,12 +216,14 @@ For submitting jobs to the GPU cluster, **the cluster name `gmerlin7` must be specified**
### GPU general configuration
The **Merlin7 GPU cluster** is configured with the **`CR_CORE_MEMORY`**, **`CR_ONE_TASK_PER_CORE`**, and **`ENFORCE_BINDING_GRES`** options.
* This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
to fulfill a job's resource requirements.
* Slurm will allocate the CPUs to the selected GPU.
By default, Slurm will allocate one task per core, which means:
* For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job.
* For the NVIDIA GraceHopper-based nodes, each task will consume 1 **CPU**.
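A hedged sketch of a GPU submission under this configuration (the partition choice, GPU count, and program name are illustrative):

```bash
#!/bin/bash
#SBATCH --clusters=gmerlin7        # the GPU cluster must be selected explicitly
#SBATCH --partition=a100-daily     # assumed A100 partition, adjust as needed
#SBATCH --gpus=1                   # the allocated CPUs are bound to the selected GPU
#SBATCH --ntasks=1

srun ./my_gpu_program
```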
@@ -247,15 +242,16 @@ scripts accordingly.
Notes on memory configuration:
* **Memory allocation options:** To request additional memory, use the following options in your submission script:
* **`--mem=<mem_in_MB>`**: Allocates memory per node.
* **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).
The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
!!! tip
@@ -265,20 +261,22 @@ adjusted.
In the `gmerlin7` GPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
efficiency. However, applying limits can occasionally impact the cluster's utilization. For example, user-specific
limits may result in pending jobs even when many nodes are idle due to low activity.
On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.
Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
effectively.
To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
various QoS definitions applicable to the `gmerlin7` GPU-based cluster. Here:
* `MaxTRES` specifies resource limits per job.
* `MaxTRESPU` specifies resource limits per user.
@@ -292,7 +290,7 @@ various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
| **gpu_a100_interactive** | cpu=16,gres/gpu=1,mem=60G,node=1 |cpu=16,gres/gpu=1,mem=60G,node=1 | partition |
Where:
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
restrictions.
* **`gpu_general` QoS:** This is the **default QoS** for `gmerlin7` _users_. It limits the total resources available to each
user. Additionally, this QoS is applied to the `[a100|gh]-general` partitions, enforcing restrictions at the partition level and


@@ -1,12 +1,4 @@
---
title: Slurm Examples
#tags:
keywords: slurm example, template, examples, templates, running jobs, sbatch, single core based jobs, HT, multithread, no-multithread, mpi, openmp, packed jobs, hands-on, array jobs, gpu
last_updated: 24 May 2023
summary: "This document shows different template examples for running jobs in the Merlin cluster."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-examples.html
---
# Slurm Examples
## Single core based job examples