improved wording and some corrections to array/packed jobs

2019-10-11 17:12:54 +02:00
parent ce72b9e9c2
commit e39402e98c


@ -154,10 +154,10 @@ options can be found in the following link: https://slurm.schedmd.com/sbatch.htm
If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run
advanced configurations unless you are sure of what you are doing.
-## Array Jobs - how to launch a big number of similar jobs
+## Array Jobs - launching a large number of related jobs
-If you need to run a larger number of jobs using the same program with systematically varying inputs,
-e.g. a parameter sweep, you can do this most easily in form of a **simple array job**
+If you need to run a large number of jobs based on the same executable with systematically varying inputs,
+e.g. for a parameter sweep, you can do this most easily in the form of a **simple array job**.
``` bash
#!/bin/bash
@ -172,9 +172,11 @@ srun myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat
```
-This will run 8 independent jobs, where each job can use the counter variable `SLURM_ARRAY_TASK_ID`
-to feed the correct input arguments or configuration file to the "myprogram" executable. Each job
-will receive the same set of configurations (e.g. time limit of 8h in the example above).
+This will run 8 independent jobs, where each job can use the counter
+variable `SLURM_ARRAY_TASK_ID`, which Slurm defines in the job's
+environment, to feed the correct input arguments or configuration file
+to the "myprogram" executable. Each job will receive the same set of
+configurations (e.g. the time limit of 8h in the example above).
The jobs are independent and will run in parallel (if the cluster resources allow for
it). The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they will also each
@ -192,18 +194,18 @@ FILES=(/path/to/data/*)
srun ./myprogram ${FILES[$SLURM_ARRAY_TASK_ID]}
```
-Or for a trivial case you could supply the parameter to scan in form
-of a parameter list
+Or, for a trivial case, you could supply the values for a parameter scan in the form
+of an argument list that is indexed with the counter variable and passed to the program.
``` bash
ARGS=(0.05 0.25 0.5 1 2 5 100)
srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}
```
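The array range has to match the length of the argument list: with the seven values above, valid indices run from 0 to 6, so the batch script would need `--array=0-6` (a range of `1-7` would skip `0.05` and hand the last task an empty argument). As a sketch, assuming a bash shell, the dry run below emulates `SLURM_ARRAY_TASK_ID` locally, without submitting anything to Slurm, to print what each array task would run and which range to request; `my_program.exe` is just the placeholder executable from the example above.

``` bash
#!/bin/bash
# Dry run (no Slurm needed): emulate SLURM_ARRAY_TASK_ID locally to check
# which argument each array task would pass to the program.
ARGS=(0.05 0.25 0.5 1 2 5 100)
LAST=$(( ${#ARGS[@]} - 1 ))   # highest valid index (6 for seven values)

# "${!ARGS[@]}" expands to the array's indices: 0 1 2 ... LAST
for SLURM_ARRAY_TASK_ID in "${!ARGS[@]}"; do
    # Only print the command; srun is not called here.
    echo "task ${SLURM_ARRAY_TASK_ID}: ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}"
done

# Array range to request in the batch script
echo "#SBATCH --array=0-${LAST}"
```

Once the printed commands look right, the same `ARGS` array and `srun` line go into the batch script together with the printed `--array` range.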
-## Array jobs for long running tasks with checkpoint files
+## Array jobs for running very long tasks with checkpoint files
-If you need to run a job for a much longer than the queues (partitions) allow, and
-your executable is able to create checkpoints at intervals, you can use this
+If you need to run a job for much longer than the queues (partitions) permit, and
+your executable is able to create checkpoint files, you can use this
strategy:
``` bash