From e39402e98cfe5ba3629ac7349638050c0b40e0ec Mon Sep 17 00:00:00 2001
From: Derek Feichtinger
Date: Fri, 11 Oct 2019 17:12:54 +0200
Subject: [PATCH] improved wording and some corrections to array/packed jobs

---
 .../03 merlin6-slurm/slurm-examples.md | 24 ++++++++++---------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/pages/merlin6/03 merlin6-slurm/slurm-examples.md b/pages/merlin6/03 merlin6-slurm/slurm-examples.md
index 1f03a3b..1d1c107 100644
--- a/pages/merlin6/03 merlin6-slurm/slurm-examples.md
+++ b/pages/merlin6/03 merlin6-slurm/slurm-examples.md
@@ -154,10 +154,10 @@ options can be found in the following link: https://slurm.schedmd.com/sbatch.htm
 If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run advanced configurations unless your are sure of what you are doing.
 
-## Array Jobs - how to launch a big number of similar jobs
+## Array Jobs - launching a large number of related jobs
 
-If you need to run a larger number of jobs using the same program with systematically varying inputs,
-e.g. a parameter sweep, you can do this most easily in form of a **simple array job**
+If you need to run a large number of jobs based on the same executable with systematically varying inputs,
+e.g. for a parameter sweep, you can do this most easily in form of a **simple array job**.
 
 ``` bash
 #!/bin/bash
@@ -172,9 +172,11 @@ srun myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat
 
 ```
 
-This will run 8 independent jobs, where each job can use the counter variable `SLURM_ARRAY_TASK_ID`
-to feed the correct input arguments or configuration file to the "myprogram" executable. Each job
-will receive the same set of configurations (e.g. time limit of 8h in the example above).
+This will run 8 independent jobs, where each job can use the counter
+variable `SLURM_ARRAY_TASK_ID` defined by Slurm inside of the job's
+environment to feed the correct input arguments or configuration file
+to the "myprogram" executable. Each job will receive the same set of
+configurations (e.g. time limit of 8h in the example above).
 The jobs are independent, but they will run in parallel (if the cluster resources allow for it). The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they also will each
@@ -192,18 +194,18 @@ FILES=(/path/to/data/*)
 srun ./myprogram ${FILES[$SLURM_ARRAY_TASK_ID]}
 ```
 
-Or for a trivial case you could supply the parameter to scan in form
-of a parameter list
+Or for a trivial case you could supply the values for a parameter scan in form
+of a argument list that gets fed to the program using the counter variable.
 
 ``` bash
 ARGS=(0.05 0.25 0.5 1 2 5 100)
 srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}
 ```
 
-## Array jobs for long running tasks with checkpoint files
+## Array jobs for running very long tasks with checkpoint files
 
-If you need to run a job for a much longer than the queues (partitions) allow, and
-your executable is able to create checkpoints at intervals, you can use this
+If you need to run a job for much longer than the queues (partitions) permit, and
+your executable is able to create checkpoint files, you can use this
 strategy:
 
 ``` bash