---
title: Running Slurm Scripts
#tags:
keywords: batch script, slurm, sbatch, srun
last_updated: 23 January 2020
summary: "This document describes how to run batch scripts in Slurm."
sidebar: merlin6_sidebar
permalink: /merlin6/running-jobs.html
---

## The rules

Before starting to use the cluster, please read the following rules:

1. Always try to **estimate and** to **define a proper run time** of your jobs:
   * Use the ``--time`` option for that.
   * This will ease the scheduling: Slurm will schedule the queued jobs more efficiently.
   * For very long runs, please consider using ***[Job Arrays with Checkpointing](/merlin6/running-jobs.html#array-jobs-running-very-long-tasks-with-checkpoint-files)***.
2. Try to optimize your jobs for running within **one day**. Please consider the following:
   * Some software can simply scale up by using more nodes while drastically reducing the run time.
   * Some software allows saving a specific state, so that a second job can start from that state.
     * ***[Job Arrays with Checkpointing](/merlin6/running-jobs.html#array-jobs-running-very-long-tasks-with-checkpoint-files)*** can help you with that.
   * Use the **'daily'** partition when you are sure that your job can run within one day:
     * ***'daily'*** **will give you more priority than running in the** ***'general'*** **queue!**
3. It is **forbidden** to run **very short jobs**:
   * Running jobs of a few seconds can cause severe problems.
   * Running very short jobs causes a lot of overhead.
   * ***Question:*** Is my job a very short job?
     * ***Answer:*** If it lasts only a few seconds or very few minutes, yes.
   * ***Question:*** How long should my job run?
     * ***Answer:*** As a *rule of thumb*, from 5 minutes it starts being acceptable; 15 minutes or more is preferred.
   * Use ***[Packed Jobs](/merlin6/running-jobs.html#packed-jobs-running-a-large-number-of-short-tasks)*** for running a large number of short tasks.
   * For short runs lasting less than 1 hour, please use the **hourly** partition:
     * ***'hourly'*** **will give you more priority than running in the** ***'daily'*** **queue!**
4. Do not submit hundreds of similar jobs!
   * Use ***[Array Jobs](/merlin6/running-jobs.html#array-jobs-launching-a-large-number-of-related-jobs)*** for gathering jobs instead.

## Basic commands for running batch scripts

* Use **``sbatch``** for submitting a batch script to Slurm.
* Use **``srun``** for running parallel tasks.
  * As an alternative, ``mpirun`` and ``mpiexec`` can be used. However, it is ***strongly recommended to use ``srun``*** instead.
* Use **``squeue``** for checking the status of your jobs.
* Use **``scancel``** for deleting a job from the queue.
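As a quick illustration of these commands, the snippet below submits a batch script, checks the status of your jobs and cancels one of them. The script name ``myjob.batch`` and the job ID ``1234567`` are placeholders:

```bash
# Submit a batch script; Slurm prints the job ID that was assigned
sbatch myjob.batch

# Check the status of your own jobs in the queue
squeue -u $USER

# Cancel a job using its job ID (as reported by sbatch or squeue)
scancel 1234567
```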
## Basic settings

For a complete list of available options and parameters, it is recommended to read the **man** pages (``man sbatch``, ``man srun``, ``man salloc``). Please notice that the behaviour of some parameters might change depending on the command (for example, ``--exclusive`` behaves differently in ``sbatch`` and in ``srun``). In this chapter we show the basic parameters which are usually needed in the Merlin cluster.

### Clusters

* For running jobs on the **Merlin6** computing nodes, users have to add the following option:

  ```bash
  #SBATCH --clusters=merlin6
  ```

* For running jobs on the **Merlin5** computing nodes, users have to add the following option:

  ```bash
  #SBATCH --clusters=merlin5
  ```

***For advanced users:*** If you do not care where your jobs run (**Merlin5** or **Merlin6**), you can skip this setting. However, you must make sure that your code can run on both clusters without any problem and that you have defined proper settings in your *batch* script.

### Partitions

**Merlin6** contains 4 partitions for general purpose, while **Merlin5** contains a single CPU partition (for historical reasons):

* **Merlin6** has 3 CPU partitions: ``general``, ``daily`` and ``hourly``.
* **Merlin6** has 1 GPU partition: ``gpu``.
* **Merlin5** has 1 CPU partition: ``merlin``.

For Merlin6, if no partition is defined, ``general`` will be the default; for Merlin5 the default is ``merlin``. The partition can be changed by defining the ``--partition`` option as follows:

```bash
#SBATCH --partition=<partition>  # Partition to use. 'general' is the default in Merlin6.
```

Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about the Merlin6 partition setup.

### Hyperthreaded vs non-hyperthreaded jobs

Computing nodes in **merlin6** have hyperthreading enabled: every core runs two threads. In many cases hyperthreading needs to be disabled, since only multithread-based applications will benefit from it. The following parameters must be applied:

* For **hyperthreaded jobs**, users ***must*** specify the following options:

  ```bash
  #SBATCH --hint=multithread     # Mandatory for multithreaded jobs
  #SBATCH --ntasks-per-core=2    # Only needed when a task fits into a core
  ```

* For **non-hyperthreaded jobs**, users ***must*** specify the following options:

  ```bash
  #SBATCH --hint=nomultithread   # Mandatory for non-multithreaded jobs
  #SBATCH --ntasks-per-core=1    # Only needed when a task fits into a core
  ```

{{site.data.alerts.tip}}
In general, --hint=[no]multithread is a mandatory field. On the other hand, --ntasks-per-core is only needed when one needs to define how a task should be handled within a core; this setting will generally not be used in hybrid MPI/OpenMP jobs, where multiple cores are needed for a single task.
{{site.data.alerts.end}}

### Shared vs exclusive nodes

The **Merlin5** and **Merlin6** clusters are designed in a way that should allow running MPI/OpenMP processes as well as single-core jobs. To allow this co-existence, nodes are configured in shared mode by default. This means that multiple jobs from multiple users may land on the same node. Users can change this behaviour if they require exclusive usage of a node.

By default, Slurm will try to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way, mixed nodes are filled up first and full free nodes remain available for MPI/OpenMP jobs.

Exclusive usage of a node can be requested by specifying the ``--exclusive`` option as follows:

```bash
#SBATCH --exclusive
```

### Time

Some settings are not mandatory but are needed or useful to specify. These are the following:

* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, but also useful for specifying shorter times. **This will affect scheduling priorities**, hence it is important to define it (and to define it properly).

```bash
#SBATCH --time=<time>  # Time the job needs to run
```
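For reference, Slurm accepts time limits in several formats. The values below are only illustrative examples, not recommendations:

```bash
#SBATCH --time=30            # 30 minutes (a plain number is interpreted as minutes)
#SBATCH --time=02:00:00      # 2 hours   (HH:MM:SS)
#SBATCH --time=1-00:00:00    # 1 day     (D-HH:MM:SS)
```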
### Output and Errors

By default, Slurm will generate the standard output and standard error files in the directory from where you submit the batch script:

* standard output will be written into a file ``slurm-$SLURM_JOB_ID.out``.
* standard error will be written into a file ``slurm-$SLURM_JOB_ID.err``.

If you want to change the default names, this can be done with the ``--output`` and ``--error`` options. For example:

```bash
#SBATCH --output=logs/myJob.%N.%j.out  # Generate an output file per hostname and jobid
#SBATCH --error=logs/myJob.%N.%j.err   # Generate an error file per hostname and jobid
```

Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) for getting the complete specification of **filename patterns**.

### GPU specific settings

#### Slurm account

When using GPUs, users must switch to the **merlin-gpu** Slurm account in order to be able to run on GPU-based nodes. This is done with the ``--account`` setting as follows:

```bash
#SBATCH --account=merlin-gpu  # The account 'merlin-gpu' must be used
```

#### GRES

The following option is a mandatory setting that **must be included** in your batch scripts:

```bash
#SBATCH --gres=gpu  # Always set at least this option when using GPUs
```

##### GRES advanced settings

GPUs are also a shared resource. Hence, multiple users can run jobs on a single node, but only one GPU per user process must be used. Users can define which GPU resources they need with the ``--gres`` option. Valid ``gres`` options are ``gpu[[:type]:count]``, where ``type=GTX1080|GTX1080Ti`` and ``count`` is the number of GPUs to use. For example:

```bash
#SBATCH --gres=gpu:GTX1080:4  # Use a node with 4 x GTX1080 GPUs
```

***Important note:*** Due to a bug in the configuration, ``[:type]`` (i.e. ``GTX1080`` or ``GTX1080Ti``) is not working. Users should skip it and use only ``gpu[:count]``. This will be fixed in one of the upcoming downtimes, as it requires a full restart of the batch system.
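Until the ``[:type]`` issue is fixed, a request without the GPU type might look as follows; the count of 2 is just an illustrative value:

```bash
#SBATCH --gres=gpu:2  # Request 2 GPUs of any type (workaround: omit the [:type] field)
```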
## Batch script templates

### CPU-based jobs templates

The following examples apply to the **Merlin6** cluster.

#### Non-multithreaded jobs template

The following template should be used by any user submitting non-multithreaded jobs to CPU nodes:

```bash
#!/bin/bash
#SBATCH --partition=<general|daily|hourly>  # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=<time>                       # Strongly recommended
#SBATCH --output=<output_file>              # Generate custom output file
#SBATCH --error=<error_file>                # Generate custom error file
#SBATCH --hint=nomultithread                # Mandatory for non-multithreaded jobs
##SBATCH --exclusive                        # Uncomment if you need exclusive node usage
##SBATCH --ntasks-per-core=1                # Only mandatory for non-multithreaded single tasks

## Advanced options example
##SBATCH --nodes=1                          # Uncomment and specify the number of nodes to use
##SBATCH --ntasks=44                        # Uncomment and specify the number of tasks
##SBATCH --ntasks-per-node=44               # Uncomment and specify the number of tasks per node
##SBATCH --cpus-per-task=44                 # Uncomment and specify the number of cores per task
```

#### Multithreaded jobs template

The following template should be used by any user submitting multithreaded jobs to CPU nodes:

```bash
#!/bin/bash
#SBATCH --partition=<general|daily|hourly>  # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=<time>                       # Strongly recommended
#SBATCH --output=<output_file>              # Generate custom output file
#SBATCH --error=<error_file>                # Generate custom error file
#SBATCH --hint=multithread                  # Mandatory for multithreaded jobs
##SBATCH --exclusive                        # Uncomment if you need exclusive node usage
##SBATCH --ntasks-per-core=2                # Only mandatory for multithreaded single tasks

## Advanced options example
##SBATCH --nodes=1                          # Uncomment and specify the number of nodes to use
##SBATCH --ntasks=88                        # Uncomment and specify the number of tasks
##SBATCH --ntasks-per-node=88               # Uncomment and specify the number of tasks per node
##SBATCH --cpus-per-task=88                 # Uncomment and specify the number of cores per task
```

### GPU-based jobs templates

The following template should be used by any user submitting jobs to GPU nodes:

```bash
#!/bin/bash
#SBATCH --partition=gpu                     # Specify the GPU partition ('gpu')
#SBATCH --gres=gpu:<type>:<count>           # You should specify at least 'gpu'
#SBATCH --time=<time>                       # Strongly recommended
#SBATCH --output=<output_file>              # Generate custom output file
#SBATCH --error=<error_file>                # Generate custom error file
```
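To put the pieces together, here is a minimal sketch of a complete non-multithreaded CPU job script. The partition, time limit, task count and output file names are only illustrative values, and ``srun hostname`` stands in for your real application:

```bash
#!/bin/bash
#SBATCH --clusters=merlin6        # Run on the Merlin6 cluster
#SBATCH --partition=hourly        # Short test job, so 'hourly' gives the best priority
#SBATCH --time=00:30:00           # 30 minutes is enough for this example
#SBATCH --ntasks=4                # Run 4 tasks in total
#SBATCH --hint=nomultithread      # Non-multithreaded job
#SBATCH --output=myJob.%j.out     # Custom output file
#SBATCH --error=myJob.%j.err      # Custom error file

# Replace 'hostname' with your real application (e.g. 'srun ./my_program')
srun hostname
```

Saved as a file (for example ``test.batch``) and submitted with ``sbatch test.batch``, this would launch 4 parallel tasks through ``srun`` within the allocated resources.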