---
title: Running Jobs
last_updated: 18 June 2019
sidebar: merlin6_sidebar
permalink: /merlin6/running-jobs.html
---
## Commands for running jobs
`sbatch`
: to submit a batch script to Slurm

`squeue`
: for checking the status of your jobs

`scancel`
: for deleting a job from the queue

`srun`
: to run parallel jobs in the batch system

`salloc`
: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the commands are finished. `salloc` is equivalent to an interactive run.
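As a quick orientation, a typical session combining these commands might look like the following sketch (the script name `myjob.batch`, the job ID and the allocation size are placeholders):

```bash
sbatch myjob.batch    # submit the batch script; Slurm prints the assigned job ID
squeue -u $USER       # check the status of your own jobs
scancel 12345678      # delete job 12345678 from the queue

salloc --ntasks=4     # obtain an interactive allocation for 4 tasks
srun hostname         # run a command in parallel inside the allocation
exit                  # release the allocation
```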
## Shared nodes and exclusivity
The Merlin6 cluster has been designed to run MPI/OpenMP jobs as well as single core based jobs. To allow this co-existence, nodes are configured in shared mode by default: multiple jobs from multiple users may land on the same node. Users can change this behaviour if they require exclusive usage of nodes.

By default, Slurm will try to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way, mixed nodes are filled up first, ensuring that fully free nodes remain available for MPI/OpenMP jobs.
Exclusive usage of a node can be requested by specifying the `--exclusive` option as follows:
```bash
#SBATCH --exclusive
```
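As a minimal sketch, a batch script requesting a full node could then look like this (the task count and the program `my_mpi_program` are hypothetical placeholders):

```bash
#!/bin/bash
#SBATCH --exclusive       # do not share the node with other jobs
#SBATCH --ntasks=8        # hypothetical task count; adjust to your application

srun ./my_mpi_program     # placeholder for your MPI/OpenMP application
```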
## Output and Errors
By default, Slurm will write standard output and standard error to the directory from which you submit the batch script:

- standard output will be written into a file `slurm-$SLURM_JOB_ID.out`.
- standard error will be written into the same file, unless a separate file is requested with the `--error` option.
If you want to change the default names, this can be done with the `--output` and `--error` options. For example:
```bash
#SBATCH --output=logs/myJob.%N.%j.out  # Generate an output file per hostname and job ID
#SBATCH --error=logs/myJob.%N.%j.err   # Generate an error file per hostname and job ID
```
Use `man sbatch` (e.g. `man sbatch | grep -A36 '^filename pattern'`) to get the full specification of the available filename patterns.
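Note that Slurm does not create missing directories for output files: when using a relative directory such as `logs/` as in the example above, make sure it exists before submitting the job, otherwise the job will typically fail without writing any output. For instance (`myJob.batch` is a placeholder for your script):

```bash
mkdir -p logs        # create the log directory; Slurm will not create it for you
sbatch myJob.batch
```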
## Partitions
Merlin6 contains 3 partitions for general purpose: `general`, `daily` and `hourly`. If no partition is defined, `general` will be the default. The partition can be set with the `--partition` option as follows:
```bash
#SBATCH --partition=<general|daily|hourly>  # Name of the Slurm partition to submit to; 'general' is the default.
```
Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about Merlin6 partition setup.
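To list the available partitions together with their time limits and availability, the standard `sinfo` command can be used; one possible invocation (generic Slurm, not Merlin6-specific) is:

```bash
sinfo -o "%P %l %a"   # partition name, time limit, availability
```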
## CPU-based Jobs Settings
CPU-based jobs are available to all PSI users. Users must belong to the `merlin6` Slurm Account in order to run on the CPU-based nodes. All users registered in Merlin6 are automatically included in this Account.
### Slurm CPU Mandatory Settings
The following options are mandatory settings that must be included in your batch scripts:
```bash
#SBATCH --constraint=mc  # Always set it to 'mc' for CPU jobs.
```
### Slurm CPU Recommended Settings
Some settings are not mandatory, but are useful or even necessary to specify in many cases:
- `--time`: mostly used when you need to specify longer runs in the `general` partition, but also useful for specifying shorter times. This may affect scheduling priorities.

  ```bash
  #SBATCH --time=<D-HH:MM:SS>  # Time the job needs to run
  ```
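Putting the mandatory and recommended CPU settings together, a complete batch script might look like the following sketch (the partition, run time, log file names and the program `my_program` are hypothetical and should be adapted):

```bash
#!/bin/bash
#SBATCH --partition=daily              # one of general|daily|hourly
#SBATCH --constraint=mc                # mandatory setting for CPU jobs
#SBATCH --time=0-02:00:00              # hypothetical run time of 2 hours
#SBATCH --output=logs/myJob.%N.%j.out  # output file per hostname and job ID
#SBATCH --error=logs/myJob.%N.%j.err   # error file per hostname and job ID

srun ./my_program                      # placeholder for your application
```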
## GPU-based Jobs Settings
GPU-based jobs are restricted to BIO users; however, access for other PSI users can be requested on demand. Users must belong to the `merlin6-gpu` Slurm Account in order to run on the GPU-based nodes. BIO users belonging to any BIO group are automatically registered in the `merlin6-gpu` Account. Other users should request access from the Merlin6 administrators.
### Slurm GPU Mandatory Settings
The following options are mandatory settings that must be included in your batch scripts:
```bash
#SBATCH --constraint=gpu  # Always set it to 'gpu' for GPU jobs.
#SBATCH --gres=gpu        # Always set at least this option when using GPUs
```
### Slurm GPU Recommended Settings
GPUs are also a shared resource, so multiple users can run jobs on a single node; however, each user process must use at most one GPU. Users can define which GPU resources they need with the `--gres` option, according to the following rules:
- All machines except `merlin-g-001` have up to 4 GPUs; `merlin-g-001` has up to 2 GPUs.
- Two different NVIDIA model profiles exist: `GTX1080` and `GTX1080Ti`.
Valid `gres` options have the form `gpu[[:type]:count]`, where:

`type`
: can be `GTX1080` or `GTX1080Ti`

`count`
: is the number of GPUs to use
For example:
```bash
#SBATCH --gres=gpu:GTX1080:4  # Use 4 x GTX1080 GPUs
```
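Putting the GPU settings together, a complete batch script might look like the following sketch (the partition, GPU count, run time and the program `my_gpu_program` are hypothetical and should be adapted):

```bash
#!/bin/bash
#SBATCH --partition=hourly    # one of general|daily|hourly
#SBATCH --constraint=gpu      # mandatory setting for GPU jobs
#SBATCH --gres=gpu:GTX1080:2  # request 2 x GTX1080 GPUs
#SBATCH --time=0-00:30:00     # hypothetical run time of 30 minutes

srun ./my_gpu_program         # placeholder for your GPU application
```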