improved wording and some corrections to array/packed jobs
[...] options can be found in the following link: https://slurm.schedmd.com/sbatch.htm

If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run
advanced configurations unless you are sure of what you are doing.

## Array Jobs - launching a large number of related jobs

If you need to run a large number of jobs based on the same executable with systematically varying inputs,
e.g. for a parameter sweep, you can do this most easily in the form of a **simple array job**.

``` bash
#!/bin/bash
# ...
srun myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat
```

This will run 8 independent jobs, where each job can use the counter
variable `SLURM_ARRAY_TASK_ID`, which Slurm defines in each job's
environment, to feed the correct input arguments or configuration file
to the "myprogram" executable. Each job will receive the same job
settings (e.g. the time limit of 8h in the example above).

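To see the mechanism without a cluster, the sketch below emulates the 8 tasks of the example locally by setting `SLURM_ARRAY_TASK_ID` by hand, which is what Slurm does separately for each task. It is only a dry run, not a job script: the helper `config_for_task` is a hypothetical name, and the `config-file-<N>.dat` naming is taken from the example script above.

``` bash
#!/bin/bash
# Local dry run (no Slurm needed): emulate the 8 array tasks (IDs 0-7) by
# setting SLURM_ARRAY_TASK_ID manually. Inside a real array job, Slurm sets
# this variable itself, once per task.

# Hypothetical helper: map a task ID to its configuration file name,
# following the "config-file-<N>.dat" scheme of the example script.
config_for_task() {
    local task_id=$1
    echo "config-file-${task_id}.dat"
}

# Print which configuration file each of the 8 tasks would receive.
for SLURM_ARRAY_TASK_ID in $(seq 0 7); do
    echo "task ${SLURM_ARRAY_TASK_ID} -> $(config_for_task "$SLURM_ARRAY_TASK_ID")"
done
```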
The jobs are independent, but they will run in parallel if the cluster resources allow it.
The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they will also each [...]

``` bash
# ...
FILES=(/path/to/data/*)
srun ./myprogram "${FILES[$SLURM_ARRAY_TASK_ID]}"
```

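With the file-list approach, the array range has to match the number of files, otherwise some tasks receive an empty argument. Below is a minimal sketch for computing the range before submitting, assuming the data directory does not change between submission and the time the tasks run (both globs must expand to the same alphabetically sorted list). `array_range_for_dir` and `myjob.sh` are hypothetical names; the only Slurm behaviour relied on is that an option given to `sbatch` on the command line overrides the corresponding `#SBATCH` line in the script.

``` bash
#!/bin/bash
# Sketch: derive the --array range (0 .. N-1) from the number of input files,
# so that ${FILES[$SLURM_ARRAY_TASK_ID]} never points past the end of the list.
# With nullglob, an empty directory yields zero files instead of a literal "*".
shopt -s nullglob

# Hypothetical helper: print "0-<N-1>" for the files in directory $1.
array_range_for_dir() {
    local -a files
    files=("$1"/*)
    local n=${#files[@]}
    if (( n == 0 )); then
        echo "no input files in $1" >&2
        return 1
    fi
    echo "0-$(( n - 1 ))"
}

# Usage (myjob.sh is a placeholder for the array job script; /path/to/data
# is the placeholder directory from the example above):
#   sbatch --array="$(array_range_for_dir /path/to/data)" myjob.sh
```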
Or, for a trivial case, you could supply the values for a parameter scan in the form
of an argument list that gets fed to the program using the counter variable.

``` bash
# 7 values (indices 0-6): declare the job array with a matching range, e.g. --array=0-6
ARGS=(0.05 0.25 0.5 1 2 5 100)
srun ./my_program.exe "${ARGS[$SLURM_ARRAY_TASK_ID]}"
```
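The argument list above holds 7 values, so only task IDs 0 to 6 have a parameter; an array declared with one task too many would start `my_program.exe` with an empty argument. A small guard, sketched here with a hypothetical helper `param_for_task`, makes such a task fail loudly instead. This is one possible approach under those assumptions, not a required part of the job script.

``` bash
#!/bin/bash
# Sketch: look up this task's parameter and refuse to run if the task ID
# falls outside the ARGS list (with 7 values, valid task IDs are 0-6).
ARGS=(0.05 0.25 0.5 1 2 5 100)

# Hypothetical helper: print the parameter for task $1, or fail.
param_for_task() {
    local task_id=$1
    if (( task_id < 0 || task_id >= ${#ARGS[@]} )); then
        echo "task ${task_id}: no parameter, ARGS has only ${#ARGS[@]} values" >&2
        return 1
    fi
    echo "${ARGS[$task_id]}"
}

# Inside the job script this would be followed by (not executed here):
#   PARAM=$(param_for_task "$SLURM_ARRAY_TASK_ID") || exit 1
#   srun ./my_program.exe "$PARAM"
```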

## Array jobs for running very long tasks with checkpoint files

If you need to run a job for much longer than the queues (partitions) permit, and
your executable is able to create checkpoint files, you can use this
strategy:

``` bash