Add packed jobs documentation
Build and deploy documentation / build-and-deploy-docs (push) Successful in 32s
@@ -1,6 +1,8 @@
# Slurm Examples

## Basic examples

### Single core based job examples

```bash
#!/bin/bash
@@ -16,9 +18,9 @@ module load $MODULE_NAME # where $MODULE_NAME is a software in PModules

srun $MYEXEC # where $MYEXEC is a path to your binary file
```
### Multi-core based jobs example

#### Pure MPI

```bash
#!/bin/bash
@@ -38,7 +40,7 @@ module load $MODULE_NAME # where $MODULE_NAME is a software in PModules

srun $MYEXEC # where $MYEXEC is a path to your binary file
```

#### Hybrid

```bash
#!/bin/bash
@@ -58,3 +60,173 @@ module load $MODULE_NAME # where $MODULE_NAME is a software in PModules

srun $MYEXEC # where $MYEXEC is a path to your binary file
```
## Advanced examples

### Packed jobs: running many short tasks inside one allocation

Launching a Slurm job has some overhead. If you have hundreds or thousands of short, independent tasks, avoid submitting each task as a separate Slurm job. **This creates unnecessary scheduler load** and often increases the total time your workflow spends waiting in the queue.

A better approach is to use a **packed job**: request one Slurm allocation with enough CPUs, then run several short tasks in parallel inside that allocation.
!!! tip "When packed jobs are useful"

    - Each task is short
    - The tasks are independent of each other
    - Each task uses one or a small fixed number of CPU cores
    - You want to limit how many tasks run at the same time
!!! danger "When not to use packed jobs"

    Packed jobs are not always the best solution. Consider other approaches if:

    - Each task is long-running;
    - Tasks have very different runtimes and need advanced scheduling;
    - Each task requires many nodes;
    - You need Slurm accounting for each task as a separate job;
    - Failed tasks must be automatically retried or tracked individually.

    In those cases, a Slurm job array may be more appropriate.
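For comparison, a job array expresses the same fan-out as independent Slurm jobs. A minimal sketch (all values here are placeholders; `%4` throttles the array to 4 concurrently running elements):

```bash
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --array=1-12%4       # 12 elements, at most 4 running at once
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00

# Slurm sets SLURM_ARRAY_TASK_ID for each element; use it to select the input.
# "echo" stands in for your real command, e.g. ./myprog "${SLURM_ARRAY_TASK_ID}".
echo "processing element ${SLURM_ARRAY_TASK_ID:-unset}"
```

Each element is scheduled and accounted as its own job, which is exactly the per-job overhead that packed jobs avoid when tasks are short.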
!!! warning

    Do not start more parallel tasks than the number of CPUs requested from Slurm. For example, if your job requests `--cpus-per-task=4`, run at most 4 single-threaded tasks at the same time.
#### Recommended pattern: Control parallelism inside the job script

The following example requests 4 CPUs from Slurm and runs 12 short tasks in total. At most 4 tasks are active at the same time, matching the number of CPUs requested with `--cpus-per-task=4`.
```bash
#!/bin/bash
#SBATCH --job-name=stress_single_job
#SBATCH --partition=hourly
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=1G
#SBATCH --output=stress_single_job_%j.out

set -euo pipefail

TASKS=12
MAX_PARALLEL="${SLURM_CPUS_PER_TASK:-1}"

run_one_task() {
    local idx="$1"

    echo "[$(date '+%F %T')] starting task ${idx} on host $(hostname)"

    # Replace this command with your real workload.
    # This simulates around 20 seconds of single-core CPU work.
    stress-ng --cpu 1 --timeout 20s --metrics-brief

    echo "[$(date '+%F %T')] finished task ${idx}"
}

active=0

for i in $(seq 1 "${TASKS}"); do
    # Launch the task in the background; the background subshell inherits
    # the function definition, so no export is needed.
    run_one_task "${i}" &
    active=$((active + 1))

    # Once MAX_PARALLEL tasks are in flight, wait for one to finish before
    # starting the next. The POSIX arithmetic form is used here because
    # ((active-=1)) returns a non-zero status when the counter reaches 0,
    # which would abort the script under "set -e".
    if [ "${active}" -ge "${MAX_PARALLEL}" ]; then
        wait -n
        active=$((active - 1))
    fi
done

wait
echo "All tasks completed"
```
!!! note

    Replace the `stress-ng` command with the command you actually want to run. For example:

    ```bash
    ./myprog "${idx}"
    ```
In this example:

- Slurm allocates one job with 4 CPUs.
- The script launches 12 tasks in total.
- Only 4 tasks run in parallel.
- When one task finishes, the next one starts.
- The job finishes only after all background tasks have completed.

The `&` starts each task in the background. The `wait -n` command waits until one background task finishes before launching more work. The final `wait` ensures that the script does not exit until all remaining tasks have completed.
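The throttling pattern itself can be tried on any machine without Slurm. A minimal standalone sketch (task count and limit are arbitrary here, and `sleep` stands in for real work):

```bash
#!/bin/bash
# Demo of the throttling pattern: 6 dummy tasks, at most 2 in flight.
set -euo pipefail

TASKS=6
MAX_PARALLEL=2
active=0

for i in $(seq 1 "${TASKS}"); do
    ( sleep 0.2; echo "done ${i}" ) &   # stand-in for a real task
    active=$((active + 1))

    if [ "${active}" -ge "${MAX_PARALLEL}" ]; then
        wait -n                         # returns as soon as one task exits
        active=$((active - 1))
    fi
done

wait
echo "all ${TASKS} tasks completed"
```

Note that `wait -n` requires Bash 4.3 or newer.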
!!! tip

    The number of simultaneously running tasks should match the resources requested from Slurm.

    * *For single-threaded tasks*, an example would be:

        ```bash
        #SBATCH --cpus-per-task=8

        MAX_PARALLEL="${SLURM_CPUS_PER_TASK:-1}"
        ```

        This means that up to 8 single-threaded tasks may run at the same time.

    * *For tasks that use multiple threads each*, reduce the number of parallel tasks accordingly.
      For example, if every task uses 4 CPU threads and the job requests 16 CPUs, then run at most 4 tasks in parallel.

        ```bash
        #SBATCH --cpus-per-task=16

        CPUS_PER_WORKER=4
        MAX_PARALLEL=$((SLURM_CPUS_PER_TASK / CPUS_PER_WORKER))
        ```

        You should also make sure that the application itself uses the expected number of threads, for example:

        ```bash
        export OMP_NUM_THREADS="${CPUS_PER_WORKER}"
        ```
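Instead of a hand-written loop, the throttling can also be delegated to a tool such as `xargs` (GNU Parallel offers similar control via its `-j` option). A sketch, with `echo` standing in for the real per-task command:

```bash
# xargs -P keeps at most MAX_PARALLEL tasks running at any time.
MAX_PARALLEL="${SLURM_CPUS_PER_TASK:-1}"
seq 1 12 | xargs -P "${MAX_PARALLEL}" -I{} echo "task {} done"
```

One caveat of this one-liner is that `xargs` runs a single command per task, so anything more complex belongs in a small helper script that `xargs` invokes.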
#### Alternative pattern: Using `srun` for each packed task

For some workflows it can be useful to launch each internal task with `srun`. This gives Slurm more visibility of each step inside the allocation.

!!! danger

    Using `srun` inside a packed job is valid and gives Slurm visibility of each task as a job step. However, **every `srun` creates additional Slurm step-management overhead.**
    For many very short tasks, it is usually better to request the required CPUs once and control the parallelism inside the job script using Bash, GNU Parallel, or a similar workflow tool. Use `srun` mainly when you need Slurm to launch MPI tasks, enforce step-level resource isolation, or track each task as a Slurm job step.
```bash
#!/bin/bash
#SBATCH --job-name=packed_srun_example
#SBATCH --partition=hourly
#SBATCH --time=00:10:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=packed_srun_example_%j.out

set -euo pipefail

TASKS=12
MAX_PARALLEL="${SLURM_NTASKS:-1}"

active=0

for i in $(seq 1 "${TASKS}"); do
    # Each srun becomes its own job step, consuming one of the 4 Slurm tasks.
    srun --nodes=1 --ntasks=1 --cpus-per-task=1 --exclusive ./myprog "${i}" &
    active=$((active + 1))

    # As above, the POSIX arithmetic form avoids a non-zero status from
    # ((active-=1)) at 0, which would abort the script under "set -e".
    if [ "${active}" -ge "${MAX_PARALLEL}" ]; then
        wait -n
        active=$((active - 1))
    fi
done

wait
echo "All tasks completed"
```
In this case, the job requests 4 Slurm tasks, and each internal `srun` step consumes one of them. The `--exclusive` option on the `srun` command prevents several job steps from sharing the same allocated CPU resources.
!!! note

    The `--exclusive` option shown here belongs to `srun`. It is not the same as using `#SBATCH --exclusive`, which would request exclusive access to whole nodes and is usually not what you want for packed short tasks.