improved wording and some corrections to array/packed jobs
```` diff
@@ -154,10 +154,10 @@ options can be found in the following link: https://slurm.schedmd.com/sbatch.htm
 If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run
 advanced configurations unless your are sure of what you are doing.
 
-## Array Jobs - how to launch a big number of similar jobs
+## Array Jobs - launching a large number of related jobs
 
-If you need to run a larger number of jobs using the same program with systematically varying inputs,
-e.g. a parameter sweep, you can do this most easily in form of a **simple array job**
+If you need to run a large number of jobs based on the same executable with systematically varying inputs,
+e.g. for a parameter sweep, you can do this most easily in form of a **simple array job**.
 
 ``` bash
 #!/bin/bash
````
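The simple array job introduced in this hunk relies on each task finding its own input. As a minimal sketch, assuming the program reads one small configuration file per task following the `config-file-<N>.dat` naming used by the array script this commit touches, the files for a sweep could be generated before submission. The helper `make_sweep_configs` and the `param=<value>` file format are hypothetical, not part of the documentation being changed:

``` bash
#!/bin/bash
# Hypothetical helper: write one config file per sweep value, numbered from 0
# so that array task N would read config-file-N.dat.
make_sweep_configs() {
    local outdir=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "no sweep values given" >&2
        return 1
    fi
    local i=0 value
    for value in "$@"; do
        # "param=<value>" is an assumed format; adapt it to what your program reads
        printf 'param=%s\n' "$value" > "${outdir}/config-file-${i}.dat"
        i=$((i + 1))
    done
    # Print the matching --array range, 0 to (number of values - 1)
    echo "0-$((i - 1))"
}
```

For example, `make_sweep_configs . 0.1 0.2 0.4` writes three files and prints `0-2`, the index range the array job would need to cover.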
```` diff
@@ -172,9 +172,11 @@ srun myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat
 
 ```
 
-This will run 8 independent jobs, where each job can use the counter variable `SLURM_ARRAY_TASK_ID`
-to feed the correct input arguments or configuration file to the "myprogram" executable. Each job
-will receive the same set of configurations (e.g. time limit of 8h in the example above).
+This will run 8 independent jobs, where each job can use the counter
+variable `SLURM_ARRAY_TASK_ID` defined by Slurm inside of the job's
+environment to feed the correct input arguments or configuration file
+to the "myprogram" executable. Each job will receive the same set of
+configurations (e.g. time limit of 8h in the example above).
 
 The jobs are independent, but they will run in parallel (if the cluster resources allow for
 it). The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they also will each
````
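To see which input each task would receive before submitting, the mapping described in this hunk can be emulated locally. This is a hedged sketch only: on the cluster Slurm sets `SLURM_ARRAY_JOB_ID` and `SLURM_ARRAY_TASK_ID` itself, whereas here they are set by hand, and the function name `dry_run_array` and the job ID `12345` are made up for illustration:

``` bash
#!/bin/bash
# Local dry run: print what each array task would execute.
# On the cluster Slurm sets these variables; here they are set by hand.
dry_run_array() {
    local job_id=$1 first=$2 last=$3
    local SLURM_ARRAY_JOB_ID=$job_id
    local SLURM_ARRAY_TASK_ID
    for ((SLURM_ARRAY_TASK_ID = first; SLURM_ARRAY_TASK_ID <= last; SLURM_ARRAY_TASK_ID++)); do
        # Task JobIDs take the form <job id>_<task id>, e.g. 12345_0
        echo "${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}: myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat"
    done
}
```

Running `dry_run_array 12345 0 7` prints eight lines, from `12345_0: myprogram config-file-0.dat` to `12345_7: myprogram config-file-7.dat`, mirroring the `{some-number}_0` to `{some-number}_7` JobIDs described above.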
```` diff
@@ -192,18 +194,18 @@ FILES=(/path/to/data/*)
 srun ./myprogram ${FILES[$SLURM_ARRAY_TASK_ID]}
 ```
 
-Or for a trivial case you could supply the parameter to scan in form
-of a parameter list
+Or for a trivial case you could supply the values for a parameter scan in form
+of a argument list that gets fed to the program using the counter variable.
 
 ``` bash
 ARGS=(0.05 0.25 0.5 1 2 5 100)
 srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}
 ```
 
-## Array jobs for long running tasks with checkpoint files
+## Array jobs for running very long tasks with checkpoint files
 
-If you need to run a job for a much longer than the queues (partitions) allow, and
-your executable is able to create checkpoints at intervals, you can use this
+If you need to run a job for much longer than the queues (partitions) permit, and
+your executable is able to create checkpoint files, you can use this
 strategy:
 
 ``` bash
````
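One detail worth checking in the argument-list variant: `ARGS` holds seven values, so only task IDs 0 to 6 have an entry, and a task with a higher ID would expand `${ARGS[$SLURM_ARRAY_TASK_ID]}` to an empty string. The same applies to the `FILES=(/path/to/data/*)` variant. A hedged sketch of deriving the index range from the list and guarding against missing entries follows; the helper names `array_range` and `arg_for_task` are hypothetical:

``` bash
#!/bin/bash
# The ARGS list from the example above: 7 values, so valid indices are 0..6.
ARGS=(0.05 0.25 0.5 1 2 5 100)

# Print the index range that covers every element of ARGS.
array_range() {
    echo "0-$(( ${#ARGS[@]} - 1 ))"
}

# Print the argument for a task ID, or fail if the ID has no entry
# (an out-of-range index would otherwise expand to an empty string).
arg_for_task() {
    local id=$1
    if (( id < 0 || id >= ${#ARGS[@]} )); then
        echo "task ${id} has no entry in ARGS" >&2
        return 1
    fi
    echo "${ARGS[$id]}"
}
```

Inside the job script one might then write `arg=$(arg_for_task "$SLURM_ARRAY_TASK_ID") || exit 1` before calling `srun ./my_program.exe "$arg"`, and request the range printed by `array_range` (here `0-6`) with `sbatch --array=...`.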