This commit is contained in:
2025-12-11 13:30:12 +01:00
parent 25c81a6b4d
commit 5cbc746528
24 changed files with 179 additions and 190 deletions


@@ -86,9 +86,8 @@ adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
{{site.data.alerts.tip}}
Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
{{site.data.alerts.end}}
!!! tip
    Always verify the Slurm `/var/spool/slurmd/conf-cache/slurm.conf` configuration file for potential changes.
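As a sketch of the doubling rule above, a hypothetical MPI job script might look like this, assuming the single-threading option discussed is Slurm's `--hint=nomultithread`; the partition name and all resource values are illustrative, not site defaults:

```shell
#!/bin/bash
# Hypothetical MPI job script; partition name and values are illustrative.
#SBATCH --partition=general
#SBATCH --ntasks=64
#SBATCH --hint=nomultithread   # one task per physical core, no hyper-threads
#SBATCH --mem-per-cpu=8000     # doubled (e.g. from 4000) since half as many CPUs are allocated
#SBATCH --time=06:00:00

srun ./my_mpi_app              # hypothetical MPI executable
```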
### User and job limits with QoS
@@ -132,11 +131,10 @@ Where:
* **`cpu_interactive` QoS:** Is restricted to one node and a few CPUs only, and is intended to be used when interactive
allocations are necessary (`salloc`, `srun`).
For additional details, refer to the [CPU partitions](slurm-configuration.md#CPU-partitions) section.
For additional details, refer to the [CPU partitions](#cpu-partitions) section.
{{site.data.alerts.tip}}
Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify QoS definitions for potential changes using the `sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"` command.
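To illustrate the `cpu_interactive` QoS described above, an interactive allocation might be requested as sketched below; whether the QoS must be passed explicitly or is attached to the partition is site-specific, and the partition name and resource sizes are assumptions:

```shell
# Hypothetical interactive allocation; partition name, explicit QoS usage,
# and resource sizes are assumptions for illustration only.
salloc --partition=interactive --qos=cpu_interactive --ntasks=4 --time=01:00:00
```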
### CPU partitions
@@ -151,11 +149,10 @@ Key concepts:
partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.
QoS settings can be found in the [User and job limits with QoS](#user-and-job-limits-with-qos) section.
{{site.data.alerts.tip}}
Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify partition configurations for potential changes using the `scontrol show partition` command.
#### CPU public partitions
@@ -169,11 +166,11 @@ Always verify partition configurations for potential changes using the <b>'scon
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>daily</b> partition.
For jobs running less than one hour, use the <b>hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>general</b>, which has limited node availability.
{{site.data.alerts.end}}
!!! tip
    For jobs running less than one day, submit them to the **daily** partition.
    For jobs running less than one hour, use the **hourly** partition. These
    partitions provide higher priority and ensure quicker scheduling compared
    to **general**, which has limited node availability.
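For example, a job known to finish well within one hour could be targeted at the higher-priority partition as sketched here; the partition name comes from the text above, the other values are illustrative:

```shell
#!/bin/bash
# Sketch: short job routed to the hourly partition for faster scheduling.
#SBATCH --partition=hourly
#SBATCH --time=00:45:00        # must fit within the partition's one-hour limit
#SBATCH --ntasks=1

srun ./short_analysis          # hypothetical executable
```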
The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
@@ -188,10 +185,10 @@ before any jobs in other partitions.
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
immediate access is important.
{{site.data.alerts.warning}}
Because of CPU sharing, the performance on the **'interactive'** partition may not be optimal for compute-intensive tasks.
For long-running or production workloads, use a dedicated batch partition instead.
{{site.data.alerts.end}}
!!! warning
    Because of CPU sharing, the performance on the **interactive** partition
    may not be optimal for compute-intensive tasks. For long-running or
    production workloads, use a dedicated batch partition instead.
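A short interactive session for debugging, compiling, or testing could be started with `srun` as sketched below; the resource values are illustrative:

```shell
# Sketch: interactive shell on the interactive partition (values illustrative).
srun --partition=interactive --pty --cpus-per-task=4 --time=00:30:00 bash
```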
#### CPU private partitions
@@ -261,9 +258,8 @@ adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
{{site.data.alerts.tip}}
Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
{{site.data.alerts.end}}
!!! tip
    Always verify the Slurm `/var/spool/slurmd/conf-cache/slurm.conf` configuration file for potential changes.
### User and job limits with QoS
@@ -308,11 +304,10 @@ Where:
* **`gpu_a100_interactive` & `gpu_gh_interactive` QoS:** Guarantee interactive access to GPU nodes for software compilation and
small testing.
For additional details, refer to the [GPU partitions](slurm-configuration.md#GPU-partitions) section.
For additional details, refer to the [GPU partitions](#gpu-partitions) section.
{{site.data.alerts.tip}}
Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify QoS definitions for potential changes using the `sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"` command.
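As an illustration of the interactive GPU QoS described above, an allocation might be requested as follows; the partition name is an assumption, and all resource values are illustrative:

```shell
# Hypothetical GPU interactive allocation; the partition name and resource
# sizes are assumptions, the QoS name comes from the section above.
salloc --partition=a100-interactive --qos=gpu_a100_interactive --gpus=1 --time=01:00:00
```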
### GPU partitions
@@ -327,11 +322,10 @@ Key concepts:
partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.
QoS settings can be found in the [User and job limits with QoS](#user-and-job-limits-with-qos) section.
{{site.data.alerts.tip}}
Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
{{site.data.alerts.end}}
!!! tip
    Always verify partition configurations for potential changes using the `scontrol show partition` command.
#### A100-based partitions
@@ -345,11 +339,12 @@ Always verify partition configurations for potential changes using the <b>'scon
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>a100-daily</b> partition.
For jobs running less than one hour, use the <b>a100-hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>a100-general</b>, which has limited node availability.
{{site.data.alerts.end}}
!!! tip
    For jobs running less than one day, submit them to the **a100-daily**
    partition. For jobs running less than one hour, use the **a100-hourly**
    partition. These partitions provide higher priority and ensure quicker
    scheduling compared to **a100-general**, which has limited node
    availability.
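For instance, a GPU job expected to finish within a day could be submitted to **a100-daily** as sketched here; the GPU count and time limit are illustrative:

```shell
#!/bin/bash
# Sketch: sub-one-day GPU job on the a100-daily partition (values illustrative).
#SBATCH --partition=a100-daily
#SBATCH --gpus=1
#SBATCH --time=12:00:00

srun ./train_model             # hypothetical executable
```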
#### GH-based partitions
@@ -363,8 +358,8 @@ These partitions provide higher priority and ensure quicker scheduling compared
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
{{site.data.alerts.tip}}
For jobs running less than one day, submit them to the <b>gh-daily</b> partition.
For jobs running less than one hour, use the <b>gh-hourly</b> partition.
These partitions provide higher priority and ensure quicker scheduling compared to <b>gh-general</b>, which has limited node availability.
{{site.data.alerts.end}}
!!! tip
    For jobs running less than one day, submit them to the **gh-daily**
    partition. For jobs running less than one hour, use the **gh-hourly**
    partition. These partitions provide higher priority and ensure quicker
    scheduling compared to **gh-general**, which has limited node availability.