---
title: Slurm merlin7 Configuration
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 May 2023
summary: "This document summarizes the Merlin7 Slurm CPU-based configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-configuration.html
---

# Slurm merlin7 Configuration

This documentation describes the basic Slurm configuration and the options needed to run jobs on the Merlin7 cluster.

### CPU public partitions

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -----------------: | -----------: | ----------: | -------: | ---------------: | --------------------: | --------------------: |
| **<u>general</u>** | 1-00:00:00 | 7-00:00:00 | Low | <u>merlin</u> | cpu=1024,mem=1920G | cpu=1024,mem=1920G |
| **daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | cpu=1024,mem=1920G | cpu=2048,mem=3840G |
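
For example, to target one of these partitions explicitly (a minimal sketch; `job.sh` stands in for your actual batch script):

```bash
# Submit to the daily partition, which offers medium priority for jobs up to one day
sbatch --partition=daily job.sh
```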

### GPU public partitions

#### A100 nodes

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -----------------: | -----------: | ----------: | -------: | ---------------: | --------------------: | --------------------: |
| **a100-daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-hourly** | 0-00:30:00 | 0-01:00:00 | High | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-interactive** | 0-01:00:00 | 0-12:00:00 | Very High | <u>merlin</u> | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 |

#### Grace-Hopper nodes

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -----------------: | -----------: | ----------: | -------: | ---------------: | --------------------: | --------------------: |

### CPU general configuration

The **Merlin7 CPU cluster** is configured with the **`CR_CORE_MEMORY`** and **`CR_ONE_TASK_PER_CORE`** options.

* This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
  to fulfill a job's resource requirements.

By default, Slurm will allocate one task per core, which means:

* Each task will consume 2 **CPUs** (the two threads of a core), regardless of whether both threads are actively used by the job.

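As a minimal illustration (the task count is chosen arbitrarily):

```bash
#!/bin/bash
#SBATCH --ntasks=32   # 32 tasks, one core each
# With hyper-threading enabled and one task per core, Slurm accounts
# 2 CPUs per task: this job consumes 64 CPUs even if each task is single-threaded.

srun hostname   # trivial placeholder command
```
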
Notes on memory configuration:
* **Memory allocation options:** To request additional memory, use the following options in your submission script:
  * **`--mem=<mem_in_MB>`**: Allocates memory per node.
  * **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).

  The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
  effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
  adjusted.

  For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
  In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads. A sketch
  of a submission script using these options follows the list.
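
A minimal sketch of such a submission script, assuming a placeholder MPI binary `./my_mpi_app` and illustrative resource values:

```bash
#!/bin/bash
#SBATCH --partition=general      # public CPU partition (see table above)
#SBATCH --ntasks=64              # MPI ranks
#SBATCH --hint=nomultithread     # single-threaded cores: halves the available CPU count
#SBATCH --mem-per-cpu=8000       # MB per CPU; doubled to compensate for nomultithread

srun ./my_mpi_app                # placeholder application
```
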
!!! tip

In the `merlin7` CPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
limits may result in pending jobs even when many nodes are idle due to low activity.

On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.

Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
effectively.

To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
* `MaxTRES` specifies resource limits per job.
* `MaxTRESPU` specifies resource limits per user.

| QoS | MaxTRES | MaxTRESPU | Scope |
| ------------------: | --------------------: | --------------------: | --------: |
| **cpu_interactive** | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 | partition |

Where:
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
  restrictions.
* **`cpu_general` QoS:** This is the **default QoS** for `merlin7` _users_. It limits the total resources available to each
  user. Additionally, this QoS is applied to the `general` partition, enforcing restrictions at the partition level and
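
The QoS definitions and their limits can also be inspected directly from the Slurm accounting database, for example:

```bash
# List QoS names with their per-job (MaxTRES) and per-user (MaxTRESPU) limits
sacctmgr show qos format=Name%20,MaxTRES%40,MaxTRESPU%40
```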

The **daily** and **hourly** partitions provide higher priority and ensure quicker scheduling compared
to **general**, which has limited node availability.

The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
**`hourly`** partition might experience delays in such scenarios.

The **`interactive`** partition is designed specifically for real-time, interactive work. Here are the key characteristics:

* **CPU Oversubscription:** This partition allows CPU oversubscription (configured as `FORCE:4`), meaning that up to four interactive
  jobs may share the same physical CPU core. This can impact performance, but enables fast access for short-term tasks.
* **Highest Scheduling Priority:** Jobs submitted to the interactive partition are always prioritized. They will be scheduled
  before any jobs in other partitions.
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
  immediate access is important. An example of requesting an interactive allocation is shown after this list.

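For example, an interactive session could be requested as follows (a sketch; the resource values are illustrative):

```bash
# Allocate resources on the interactive partition and open a shell
salloc --partition=interactive --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00

# Alternatively, run a shell directly with a pseudo-terminal attached
srun --partition=interactive --pty bash -l
```
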
!!! warning

### GPU general configuration

The **Merlin7 GPU cluster** is configured with the **`CR_CORE_MEMORY`**, **`CR_ONE_TASK_PER_CORE`**, and **`ENFORCE_BINDING_GRES`** options.

* This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
  to fulfill a job's resource requirements.
* With **`ENFORCE_BINDING_GRES`**, Slurm allocates CPUs that are bound to the selected GPU.

By default, Slurm will allocate one task per core, which means:

* For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job.
* For the NVIDIA GraceHopper-based nodes, each task will consume 1 **CPU**.

Notes on memory configuration:
* **Memory allocation options:** To request additional memory, use the following options in your submission script:
  * **`--mem=<mem_in_MB>`**: Allocates memory per node.
  * **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).

  The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
  effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
  adjusted.

  For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
  In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads. A sketch
  of a GPU submission script follows the list.
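
A minimal sketch of such a script for the A100 nodes (the binary `./my_gpu_app` and the resource values are illustrative):

```bash
#!/bin/bash
#SBATCH --clusters=gmerlin7      # GPU jobs must target the gmerlin7 cluster
#SBATCH --partition=a100-daily   # A100 partition (see table above)
#SBATCH --gpus=1                 # one GPU; CPUs are bound to it via ENFORCE_BINDING_GRES
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=60G                # per-node memory, within MaxMemPerNode

srun ./my_gpu_app                # placeholder application
```
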
!!! tip

In the `gmerlin7` GPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
limits may result in pending jobs even when many nodes are idle due to low activity.

On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.

Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
effectively.

To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
various QoS definitions applicable to the gmerlin7 GPU-based cluster. Here:
* `MaxTRES` specifies resource limits per job.
* `MaxTRESPU` specifies resource limits per user.

| QoS | MaxTRES | MaxTRESPU | Scope |
| ------------------------: | -------------------------------: | -------------------------------: | --------: |
| **gpu_a100_interactive** | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 | partition |

Where:
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
  restrictions.
* **`gpu_general` QoS:** This is the **default QoS** for `gmerlin7` _users_. It limits the total resources available to each
  user. Additionally, this QoS is applied to the `[a100|gh]-general` partitions, enforcing restrictions at the partition level and