improved wording and some corrections to array/packed jobs

2019-10-11 17:12:54 +02:00
parent ce72b9e9c2
commit e39402e98c


@@ -154,10 +154,10 @@ options can be found in the following link: https://slurm.schedmd.com/sbatch.htm
If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run
advanced configurations unless you are sure of what you are doing.
## Array Jobs - launching a large number of related jobs
If you need to run a large number of jobs based on the same executable with systematically varying inputs,
e.g. for a parameter sweep, you can do this most easily in the form of a **simple array job**.
``` bash
#!/bin/bash
@@ -172,9 +172,11 @@ srun myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat
```
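The hunk above elides the middle of the script. As a point of reference, here is a minimal sketch of how the complete submit script might look, assuming only what the surrounding text states (an 8-task array and an 8 h time limit; the job name and output pattern are placeholders):

``` bash
#!/bin/bash
#SBATCH --job-name=sweep        # placeholder job name
#SBATCH --time=08:00:00         # assumed: the 8 h limit mentioned below
#SBATCH --array=0-7             # assumed: 8 tasks, SLURM_ARRAY_TASK_ID = 0..7
#SBATCH --output=%A_%a.out      # %A = master JobID, %a = array task ID

srun myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat
```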
This will run 8 independent jobs, where each job can use the counter
variable `SLURM_ARRAY_TASK_ID` defined by Slurm inside the job's
environment to feed the correct input arguments or configuration file
to the "myprogram" executable. Each job will receive the same set of
configurations (e.g. time limit of 8h in the example above).
The jobs are independent, but they will run in parallel (if the cluster resources allow for
it). The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they also will each
@@ -192,18 +194,18 @@ FILES=(/path/to/data/*)
srun ./myprogram ${FILES[$SLURM_ARRAY_TASK_ID]}
```
Or for a trivial case you could supply the values for a parameter scan in the form
of an argument list that gets fed to the program using the counter variable.
``` bash
ARGS=(0.05 0.25 0.5 1 2 5 100)
srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}
```
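The same sizing rule applies here: seven values in `ARGS` need the task IDs 0 to 6, so the matching directive in the script header would be:

``` bash
#SBATCH --array=0-6   # 7 values in ARGS, so indices 0..6
```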
## Array jobs for running very long tasks with checkpoint files
If you need to run a job for much longer than the queues (partitions) permit, and
your executable is able to create checkpoint files, you can use this
strategy:
``` bash
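#!/bin/bash
# Sketch under assumptions, since the original block is truncated at this
# point: a common form of this strategy is an array throttled to one task
# at a time ('%1'), so the tasks run one after another and each task
# resumes from the checkpoint file left by the previous one. 'myprogram'
# and its '--restart' option are placeholders, not a real interface.
#SBATCH --array=1-10%1
#SBATCH --time=24:00:00

srun myprogram --restart checkpoint.dat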