2025-12-11 13:30:12 +01:00
parent 84f9846a0c
commit 01ac18b3f4
24 changed files with 179 additions and 190 deletions
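
Most hunks below apply the same mechanical rewrite. A Jekyll `{{site.data.alerts.<kind>}}` ... `{{site.data.alerts.end}}` block becomes a MkDocs `!!! <kind>` admonition with a four-space-indented body. Inside it, `<b>'...'</b>` becomes inline code and `<b>...</b>` becomes `**...**`. The Python sketch below reproduces that pattern for illustration only. It is not presented as the script used for this commit, the function names are hypothetical, and it does not handle the anchor-link rewrites that some hunks also contain.

```python
import re

# A Jekyll alert block: "{{site.data.alerts.<kind>}}", body lines, "{{site.data.alerts.end}}".
# Group 1 captures the kind (tip, warning, ...), group 2 the body text.
ALERT_RE = re.compile(
    r"\{\{site\.data\.alerts\.(\w+)\}\}\n(.*?)\n\{\{site\.data\.alerts\.end\}\}",
    re.DOTALL,
)


def convert_inline(text: str) -> str:
    """Turn the inline HTML used inside the old alerts into Markdown."""
    # <b>'...'</b> wrapped commands and paths: render them as inline code.
    text = re.sub(r"<b>'(.*?)'</b>", r"`\1`", text)
    # Any remaining <b>...</b> is emphasis: render it as bold.
    return re.sub(r"<b>(.*?)</b>", r"**\1**", text)


def jekyll_alerts_to_admonitions(source: str) -> str:
    """Replace every Jekyll alert block with a MkDocs '!!! <kind>' admonition."""

    def replace(match: re.Match) -> str:
        kind = match.group(1)
        body = convert_inline(match.group(2))
        # MkDocs only treats lines indented by four spaces as the admonition body.
        indented = "\n".join(("    " + line) if line else "" for line in body.splitlines())
        return f"!!! {kind}\n{indented}"

    return ALERT_RE.sub(replace, source)


if __name__ == "__main__":
    old = (
        "{{site.data.alerts.tip}}\n"
        "Always verify partition configurations for potential changes using the "
        "<b>'scontrol show partition'</b> command.\n"
        "{{site.data.alerts.end}}\n"
    )
    print(jekyll_alerts_to_admonitions(old))
```

Text outside alert blocks is returned unchanged, so the function can be run over a whole page.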

@@ -7,9 +7,8 @@ This partition allows CPU oversubscription (up to four users may share the same
On the **`gmerlin7`** cluster, additional interactive partitions are available, but these are primarily intended for CPU-only workloads (such as compiling GPU-based software, or creating an allocation for submitting jobs to Grace-Hopper nodes).
-{{site.data.alerts.warning}}
-Because <b>GPU resources are scarce and expensive</b>, interactive allocations on GPU nodes that use GPUs should only be submitted when strictly necessary and well justified.
-{{site.data.alerts.end}}
+!!! warning
+    Because **GPU resources are scarce and expensive**, interactive allocations on GPU nodes that use GPUs should only be submitted when strictly necessary and well justified.
## Running interactive jobs

@@ -86,9 +86,8 @@ adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
-{{site.data.alerts.tip}}
-Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
-{{site.data.alerts.end}}
+!!! tip
+    Always verify the Slurm `/var/spool/slurmd/conf-cache/slurm.conf` configuration file for potential changes.
### User and job limits with QoS
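
The memory adjustment in the hunk above is simple arithmetic: when a job runs single-threaded, each allocated CPU should request twice the usual `--mem-per-cpu`. A minimal Python sketch that builds the option string; the function name is hypothetical, and the 2000 MB default is an invented value used only for illustration (the real default lives in the `slurm.conf` file mentioned above).

```python
def mem_per_cpu_option(default_mem_per_cpu_mb: int, single_threaded: bool) -> str:
    """Build an sbatch --mem-per-cpu option string.

    Following the documentation, the value is doubled for single-threaded
    runs to account for the reduced number of threads.
    """
    if default_mem_per_cpu_mb <= 0:
        raise ValueError("memory per CPU must be positive")
    mem = 2 * default_mem_per_cpu_mb if single_threaded else default_mem_per_cpu_mb
    # Slurm interprets the M suffix as megabytes.
    return f"--mem-per-cpu={mem}M"


if __name__ == "__main__":
    # 2000 MB is a hypothetical default, not Merlin's configured value.
    print(mem_per_cpu_option(2000, single_threaded=True))   # --mem-per-cpu=4000M
    print(mem_per_cpu_option(2000, single_threaded=False))  # --mem-per-cpu=2000M
```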
@@ -132,11 +131,10 @@ Where:
* **`cpu_interactive` QoS:** Is restricted to one node and a few CPUs only, and is intended to be used when interactive
allocations are necessary (`salloc`, `srun`).
-For additional details, refer to the [CPU partitions](slurm-configuration.md#CPU-partitions) section.
+For additional details, refer to the [CPU partitions](#cpu-partitions) section.
-{{site.data.alerts.tip}}
-Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
-{{site.data.alerts.end}}
+!!! tip
+    Always verify QoS definitions for potential changes using the `sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"` command.
### CPU partitions
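
To check QoS limits from a script rather than by eye, the same fields can be requested in `sacctmgr`'s parsable mode. Here `-n` drops the header and `-P` separates fields with `|`, and the TRES strings can then be split into dictionaries. This is a minimal Python sketch with hypothetical function names. The sample output in the tests is invented and does not reflect Merlin's actual limits.

```python
from __future__ import annotations

import subprocess


def parse_tres(tres: str) -> dict[str, str]:
    """Split a TRES string like 'cpu=16,mem=64G' into {'cpu': '16', 'mem': '64G'}."""
    if not tres:
        return {}
    return dict(item.split("=", 1) for item in tres.split(","))


def parse_qos_table(output: str) -> dict[str, dict[str, dict[str, str]]]:
    """Parse 'Name|MaxTRESPU|MaxTRES' lines as printed by 'sacctmgr -n -P'."""
    qos: dict[str, dict[str, dict[str, str]]] = {}
    for line in output.strip().splitlines():
        name, max_tres_pu, max_tres = line.split("|")
        qos[name] = {"MaxTRESPU": parse_tres(max_tres_pu), "MaxTRES": parse_tres(max_tres)}
    return qos


def fetch_qos() -> dict[str, dict[str, dict[str, str]]]:
    """Query the live QoS definitions (requires the sacctmgr client on PATH)."""
    result = subprocess.run(
        ["sacctmgr", "-n", "-P", "show", "qos", "format=Name,MaxTRESPU,MaxTRES"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_qos_table(result.stdout)
```

Keeping the parsing separate from the `subprocess` call lets the parser be tested without a Slurm installation.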
@@ -151,11 +149,10 @@ Key concepts:
partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
-QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.
+QoS settings can be found in the [User and job limits with QoS](#user-and-job-limits-with-qos) section.
-{{site.data.alerts.tip}}
-Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
-{{site.data.alerts.end}}
+!!! tip
+    Always verify partition configurations for potential changes using the `scontrol show partition` command.
#### CPU public partitions
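
Partition settings can be inspected in the same way. `scontrol -o show partition` prints one line per partition as space-separated `Key=Value` pairs. A minimal Python sketch that turns that output into a dictionary per partition follows. It assumes no value contains a space, the function names are hypothetical, and the sample values in the tests are invented.

```python
from __future__ import annotations

import subprocess


def parse_partitions(output: str) -> dict[str, dict[str, str]]:
    """Parse 'scontrol -o show partition' output into {partition: {key: value}}."""
    partitions: dict[str, dict[str, str]] = {}
    for line in output.strip().splitlines():
        fields: dict[str, str] = {}
        for token in line.split():
            # Split on the first '=' only: values such as TRES=cpu=8,mem=32G contain '='.
            key, _, value = token.partition("=")
            fields[key] = value
        partitions[fields["PartitionName"]] = fields
    return partitions


def fetch_partitions() -> dict[str, dict[str, str]]:
    """Query the live partition configuration (requires the scontrol client on PATH)."""
    result = subprocess.run(
        ["scontrol", "-o", "show", "partition"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_partitions(result.stdout)
```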
@@ -169,11 +166,11 @@ Always verify partition configurations for potential changes using the <b>'scon
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
-{{site.data.alerts.tip}}
-For jobs running less than one day, submit them to the <b>daily</b> partition.
-For jobs running less than one hour, use the <b>hourly</b> partition.
-These partitions provide higher priority and ensure quicker scheduling compared to <b>general</b>, which has limited node availability.
-{{site.data.alerts.end}}
+!!! tip
+    For jobs running less than one day, submit them to the **daily** partition.
+    For jobs running less than one hour, use the **hourly** partition. These
+    partitions provide higher priority and ensure quicker scheduling compared
+    to **general**, which has limited node availability.
The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
@@ -188,10 +185,10 @@ before any jobs in other partitions.
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
immediate access is important.
-{{site.data.alerts.warning}}
-Because of CPU sharing, the performance on the **'interactive'** partition may not be optimal for compute-intensive tasks.
-For long-running or production workloads, use a dedicated batch partition instead.
-{{site.data.alerts.end}}
+!!! warning
+    Because of CPU sharing, the performance on the **interactive** partition
+    may not be optimal for compute-intensive tasks. For long-running or
+    production workloads, use a dedicated batch partition instead.
#### CPU private partitions
@@ -261,9 +258,8 @@ adjusted.
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
-{{site.data.alerts.tip}}
-Always verify the Slurm <b>'/var/spool/slurmd/conf-cache/slurm.conf'</b> configuration file for potential changes.
-{{site.data.alerts.end}}
+!!! tip
+    Always verify the Slurm `/var/spool/slurmd/conf-cache/slurm.conf` configuration file for potential changes.
### User and job limits with QoS
@@ -308,11 +304,10 @@ Where:
* **`gpu_a100_interactive` & `gpu_gh_interactive` QoS:** Guarantee interactive access to GPU nodes for software compilation and
small testing.
-For additional details, refer to the [GPU partitions](slurm-configuration.md#GPU-partitions) section.
+For additional details, refer to the [GPU partitions](#gpu-partitions) section.
-{{site.data.alerts.tip}}
-Always verify QoS definitions for potential changes using the <b>'sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"'</b> command.
-{{site.data.alerts.end}}
+!!! tip
+    Always verify QoS definitions for potential changes using the `sacctmgr show qos format="Name%22,MaxTRESPU%35,MaxTRES%35"` command.
### GPU partitions
@@ -327,11 +322,10 @@ Key concepts:
partitions, where applicable.
* **`QoS`**: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability
for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various
-QoS settings can be found in the [User and job limits with QoS](/merlin7/slurm-configuration.html#user-and-job-limits-with-qos) section.
+QoS settings can be found in the [User and job limits with QoS](#user-and-job-limits-with-qos) section.
-{{site.data.alerts.tip}}
-Always verify partition configurations for potential changes using the <b>'scontrol show partition'</b> command.
-{{site.data.alerts.end}}
+!!! tip
+    Always verify partition configurations for potential changes using the `scontrol show partition` command.
#### A100-based partitions
@@ -345,11 +339,12 @@ Always verify partition configurations for potential changes using the <b>'scon
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
-{{site.data.alerts.tip}}
-For jobs running less than one day, submit them to the <b>a100-daily</b> partition.
-For jobs running less than one hour, use the <b>a100-hourly</b> partition.
-These partitions provide higher priority and ensure quicker scheduling compared to <b>a100-general</b>, which has limited node availability.
-{{site.data.alerts.end}}
+!!! tip
+    For jobs running less than one day, submit them to the **a100-daily**
+    partition. For jobs running less than one hour, use the **a100-hourly**
+    partition. These partitions provide higher priority and ensure quicker
+    scheduling compared to **a100-general**, which has limited node
+    availability.
#### GH-based partitions
@@ -363,8 +358,8 @@ These partitions provide higher priority and ensure quicker scheduling compared
All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
-{{site.data.alerts.tip}}
-For jobs running less than one day, submit them to the <b>gh-daily</b> partition.
-For jobs running less than one hour, use the <b>gh-hourly</b> partition.
-These partitions provide higher priority and ensure quicker scheduling compared to <b>gh-general</b>, which has limited node availability.
-{{site.data.alerts.end}}
+!!! tip
+    For jobs running less than one day, submit them to the **gh-daily**
+    partition. For jobs running less than one hour, use the **gh-hourly**
+    partition. These partitions provide higher priority and ensure quicker
+    scheduling compared to **gh-general**, which has limited node availability.
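
The runtime tips for the CPU, A100, and GH partitions all encode one rule. Pick `hourly`, `daily`, or `general` by expected runtime, then add the node-family prefix (`a100-`, `gh-`, or none for CPU). The Python sketch below shows that rule. It follows the documented wording literally ("less than one hour", "less than one day"), whereas the effective limits are the partitions' configured `MaxTime` values. The function and mapping names are hypothetical.

```python
from datetime import timedelta

# Partition-name prefix per node family, as used in the tips (hypothetical mapping name).
FAMILY_PREFIX = {"cpu": "", "a100": "a100-", "gh": "gh-"}


def pick_partition(expected_runtime: timedelta, family: str = "cpu") -> str:
    """Return the partition the documentation suggests for a given runtime."""
    if family not in FAMILY_PREFIX:
        raise ValueError(f"unknown node family: {family!r}")
    if expected_runtime <= timedelta(0):
        raise ValueError("expected runtime must be positive")
    prefix = FAMILY_PREFIX[family]
    if expected_runtime < timedelta(hours=1):
        return prefix + "hourly"
    if expected_runtime < timedelta(days=1):
        return prefix + "daily"
    # Longer jobs fall back to the general partition, which has limited node availability.
    return prefix + "general"


if __name__ == "__main__":
    print(pick_partition(timedelta(minutes=30)))              # hourly
    print(pick_partition(timedelta(hours=6), family="a100"))  # a100-daily
    print(pick_partition(timedelta(days=2), family="gh"))     # gh-general
```

The returned name would then be passed to `sbatch --partition=...` together with a matching `--time` request.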