This commit is contained in:
2025-12-11 13:30:12 +01:00
parent 25c81a6b4d
commit 5cbc746528
24 changed files with 179 additions and 190 deletions


@@ -86,9 +86,8 @@ adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
{{site.data.alerts.tip}}
Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
{{site.data.alerts.end}}
!!! tip
    Always verify the Slurm `/var/spool/slurmd/conf-cache/slurm.conf` configuration file for potential changes.
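As a sketch of the doubling rule above, a hypothetical MPI job script might look like this, assuming the single-threading option discussed is Slurm's `--hint=nomultithread`; the partition name and all resource values are illustrative, not site defaults:

```shell
#!/bin/bash
# Hypothetical MPI job script; partition name and values are illustrative.
#SBATCH --partition=general
#SBATCH --ntasks=64
#SBATCH --hint=nomultithread   # one task per physical core, no hyper-threads
#SBATCH --mem-per-cpu=8000     # doubled (e.g. from 4000) since half as many CPUs are allocated
#SBATCH --time=06:00:00

srun ./my_mpi_app              # hypothetical MPI executable
```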
### User and job limits with QoS
@@ -132,11 +131,10 @@ Where:
* **`cpu_interactive` QoS:** Is restricted to one node and a few CPUs only, and is intended to be used when interactive
allocations are necessary (`salloc`, `srun`).
For additional details, refer to the [CPU partitions](slurm-configuration.md#CPU-partitions) section.
For additional details, refer to the [CPU partitions](#cpu-partitions) section.
{{site.data.alerts.tip}}
Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify QoS definitions for potential changes using the `sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"` command.
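To illustrate the `cpu_interactive` QoS described above, an interactive allocation might be requested as sketched below; whether the QoS must be passed explicitly or is attached to the partition is site-specific, and the partition name and resource sizes are assumptions:

```shell
# Hypothetical interactive allocation; partition name, explicit QoS usage,
# and resource sizes are assumptions for illustration only.
salloc --partition=interactive --qos=cpu_interactive --ntasks=4 --time=01:00:00
```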
### CPU partitions
@@ -151,11 +149,10 @@ Key concepts:
partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.
QoS settings can be found in the [User and job limits with QoS](#user-and-job-limits-with-qos) section.
{{site.data.alerts.tip}}
Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify partition configurations for potential changes using the `scontrol show partition` command.
#### CPU public partitions
@@ -169,11 +166,11 @@ Always verify partition configurations for potential changes using the <b>'scon
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>daily</b> partition.
For jobs running less than one hour, use the <b>hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>general</b>, which has limited node availability.
{{site.data.alerts.end}}
!!! tip
    For jobs running less than one day, submit them to the **daily** partition.
    For jobs running less than one hour, use the **hourly** partition. These
    partitions provide higher priority and ensure quicker scheduling compared
    to **general**, which has limited node availability.
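For example, a job known to finish well within one hour could be targeted at the higher-priority partition as sketched here; the partition name comes from the text above, the other values are illustrative:

```shell
#!/bin/bash
# Sketch: short job routed to the hourly partition for faster scheduling.
#SBATCH --partition=hourly
#SBATCH --time=00:45:00        # must fit within the partition's one-hour limit
#SBATCH --ntasks=1

srun ./short_analysis          # hypothetical executable
```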
The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
@@ -188,10 +185,10 @@ before any jobs in other partitions.
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
immediate access is important.
{{site.data.alerts.warning}}
Because of CPU sharing, the performance on the **'interactive'** partition may not be optimal for compute-intensive tasks.
For long-running or production workloads, use a dedicated batch partition instead.
{{site.data.alerts.end}}
!!! warning
    Because of CPU sharing, the performance on the **interactive** partition
    may not be optimal for compute-intensive tasks. For long-running or
    production workloads, use a dedicated batch partition instead.
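A short interactive session for debugging, compiling, or testing could be started with `srun` as sketched below; the resource values are illustrative:

```shell
# Sketch: interactive shell on the interactive partition (values illustrative).
srun --partition=interactive --pty --cpus-per-task=4 --time=00:30:00 bash
```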
#### CPU private partitions
@@ -261,9 +258,8 @@ adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
{{site.data.alerts.tip}}
Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
{{site.data.alerts.end}}
!!! tip
    Always verify the Slurm `/var/spool/slurmd/conf-cache/slurm.conf` configuration file for potential changes.
### User and job limits with QoS
@@ -308,11 +304,10 @@ Where:
* **`gpu_a100_interactive` & `gpu_gh_interactive` QoS:** Guarantee interactive access to GPU nodes for software compilation and
small testing.
For additional details, refer to the [GPU partitions](slurm-configuration.md#GPU-partitions) section.
For additional details, refer to the [GPU partitions](#gpu-partitions) section.
{{site.data.alerts.tip}}
Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify QoS definitions for potential changes using the `sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"` command.
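As an illustration of the interactive GPU QoS described above, an allocation might be requested as follows; the partition name is an assumption, and all resource values are illustrative:

```shell
# Hypothetical GPU interactive allocation; the partition name and resource
# sizes are assumptions, the QoS name comes from the section above.
salloc --partition=a100-interactive --qos=gpu_a100_interactive --gpus=1 --time=01:00:00
```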
### GPU partitions
@@ -327,11 +322,10 @@ Key concepts:
partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.
QoS settings can be found in the [User and job limits with QoS](#user-and-job-limits-with-qos) section.
{{site.data.alerts.tip}}
Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify partition configurations for potential changes using the `scontrol show partition` command.
#### A100-based partitions
@@ -345,11 +339,12 @@ Always verify partition configurations for potential changes using the <b>'scon
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>a100-daily</b> partition.
For jobs running less than one hour, use the <b>a100-hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>a100-general</b>, which has limited node availability.
{{site.data.alerts.end}}
!!! tip
    For jobs running less than one day, submit them to the **a100-daily**
    partition. For jobs running less than one hour, use the **a100-hourly**
    partition. These partitions provide higher priority and ensure quicker
    scheduling compared to **a100-general**, which has limited node
    availability.
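For instance, a GPU job expected to finish within a day could be submitted to **a100-daily** as sketched here; the GPU count and time limit are illustrative:

```shell
#!/bin/bash
# Sketch: sub-one-day GPU job on the a100-daily partition (values illustrative).
#SBATCH --partition=a100-daily
#SBATCH --gpus=1
#SBATCH --time=12:00:00

srun ./train_model             # hypothetical executable
```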
#### GH-based partitions
@@ -363,8 +358,8 @@ These partitions provide higher priority and ensure quicker scheduling compared
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>gh-daily</b> partition.
For jobs running less than one hour, use the <b>gh-hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>gh-general</b>, which has limited node availability.
{{site.data.alerts.end}}
!!! tip
    For jobs running less than one day, submit them to the **gh-daily**
    partition. For jobs running less than one hour, use the **gh-hourly**
    partition. These partitions provide higher priority and ensure quicker
    scheduling compared to **gh-general**, which has limited node availability.