---
title: Running Jobs
#tags:
#keywords:
last_updated: 18 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/running-jobs.html
---

## Commands for running jobs

* ``sbatch``: to submit a batch script to Slurm
* ``squeue``: for checking the status of your jobs
* ``scancel``: for deleting a job from the queue
* ``srun``: to run parallel jobs in the batch system
* ``salloc``: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
  * ``salloc`` is equivalent to an interactive run

## Shared nodes and exclusivity

The **Merlin6** cluster has been designed to allow running MPI/OpenMP processes as well as single core based jobs. To allow this co-existence, nodes are configured in shared mode by default. This means that multiple jobs from multiple users may land on the same node. Users can change this behaviour if they require exclusive usage of nodes.

By default, Slurm will try to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way, mixed nodes are filled up first, and full free nodes remain available for MPI/OpenMP jobs.

Exclusive usage of a node can be requested by specifying the ``--exclusive`` option as follows:

```bash
#SBATCH --exclusive
```

## Output and Errors

By default, Slurm will generate standard output and standard error files in the directory from which you submit the batch script:

* standard output will be written into a file ``slurm-$SLURM_JOB_ID.out``.
* standard error will be written into a file ``slurm-$SLURM_JOB_ID.err``.

If you want to change the default names, this can be done with the options ``--output`` and ``--error``. For example:

```bash
#SBATCH --output=logs/myJob.%N.%j.out   # Generate an output file per hostname and jobid
#SBATCH --error=logs/myJob.%N.%j.err    # Generate an error file per hostname and jobid
```

Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) to get the full specification of the available **filename patterns**.

## Partitions

Merlin6 contains 3 general purpose partitions: ``general``, ``daily`` and ``hourly``. If no partition is defined, ``general`` will be the default.

The partition can be defined with the ``--partition`` option as follows:

```bash
#SBATCH --partition=<partition_name>   # Name of the Slurm partition to submit to. 'general' is the default.
```

Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about the Merlin6 partition setup.

## CPU-based Jobs Settings

CPU-based jobs are available for all PSI users. Users must belong to the ``merlin6`` Slurm ``Account`` in order to be able to run on CPU-based nodes. All users registered in Merlin6 are automatically included in this ``Account``.

### Slurm CPU Mandatory Settings

The following options are mandatory settings that **must be included** in your batch scripts:

```bash
#SBATCH --constraint=mc   # Always set it to 'mc' for CPU jobs.
```

### Slurm CPU Recommended Settings

Some settings are not mandatory, but may be needed or useful for your jobs:

* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, but also useful for specifying shorter times. This may affect scheduling priorities.

```bash
#SBATCH --time=<D-HH:MM:SS>   # Time the job needs to run
```
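Putting the options above together, a minimal CPU batch script could look like the following sketch. The job name, task count, log file names and the ``myapp`` program are illustrative placeholders and not part of the Merlin6 documentation:

```bash
#!/bin/bash
#SBATCH --job-name=cpu-test             # Illustrative job name (placeholder)
#SBATCH --partition=hourly              # Job runs well under 1 hour, so 'hourly' is suitable
#SBATCH --time=00:30:00                 # Expected run time
#SBATCH --constraint=mc                 # Mandatory for CPU jobs
#SBATCH --ntasks=4                      # Example: 4 tasks (e.g. MPI ranks)
#SBATCH --output=logs/myJob.%N.%j.out   # The 'logs' directory must already exist
#SBATCH --error=logs/myJob.%N.%j.err

srun ./myapp                            # Replace './myapp' with your own program
```

Such a script would be submitted with ``sbatch`` and monitored with ``squeue``, as listed in the commands section above.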
## GPU-based Jobs Settings

GPU-based jobs are restricted to BIO users; however, access for other PSI users can be requested on demand. Users must belong to the ``merlin6-gpu`` Slurm ``Account`` in order to be able to run on GPU-based nodes. BIO users belonging to any BIO group are automatically registered in the ``merlin6-gpu`` account. Other users should request access from the Merlin6 administrators.

### Slurm GPU Mandatory Settings

The following options are mandatory settings that **must be included** in your batch scripts:

```bash
#SBATCH --constraint=gpu   # Always set it to 'gpu' for GPU jobs.
#SBATCH --gres=gpu         # Always set at least this option when using GPUs
```

### Slurm GPU Recommended Settings

GPUs are also a shared resource. Hence, multiple users can run jobs on a single node, but each user process must use only one GPU. Users can define which GPU resources they need with the ``--gres`` option, according to the following rules:

* All machines except ``merlin-g-001`` have up to 4 GPUs; ``merlin-g-001`` has up to 2 GPUs.
* Two different NVIDIA model profiles exist: ``GTX1080`` and ``GTX1080Ti``.

Valid ``gres`` options are ``gpu[[:type]:count]``, where:

* ``type``: can be ``GTX1080`` or ``GTX1080Ti``
* ``count``: is the number of GPUs to use

For example:

```bash
#SBATCH --gres=gpu:GTX1080:4   # Use 4 x GTX1080 GPUs
```
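Similarly, a minimal GPU batch script could look like the following sketch. The job name, GPU count, log file names and the ``mygpuapp`` program are illustrative placeholders and not part of the Merlin6 documentation:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test           # Illustrative job name (placeholder)
#SBATCH --time=01:00:00               # Expected run time
#SBATCH --constraint=gpu              # Mandatory for GPU jobs
#SBATCH --gres=gpu:GTX1080:2          # Example: request 2 x GTX1080 GPUs
#SBATCH --output=myGpuJob.%N.%j.out   # Output file per hostname and jobid
#SBATCH --error=myGpuJob.%N.%j.err    # Error file per hostname and jobid

srun ./mygpuapp                       # Replace './mygpuapp' with your own program
```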