
---
title: Slurm Examples
last_updated: 28 June 2019
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-examples.html
---

## Basic single core job

### Basic single core job - Example 1

```bash
#!/bin/bash
#SBATCH --partition=hourly      # Using 'hourly' will grant higher priority
#SBATCH --constraint=mc         # Use CPU batch system
#SBATCH --ntasks-per-core=1     # Force no Hyper-Threading, will run 1 task per core
#SBATCH --mem-per-cpu=8000      # Double the default memory per CPU
#SBATCH --time=00:30:00         # Define max time job will run
#SBATCH --output=myscript.out   # Define your output file
#SBATCH --error=myscript.err    # Define your error file

my_script
```

In this example we run a single core job by defining `--ntasks-per-core=1` (which is also the default). Since the default memory per CPU is 4000MB (in Slurm this is equivalent to the memory per thread), and we are using a single thread per core, the default memory per CPU should be doubled: a single thread is always accounted as if the job used the whole physical core (which has 2 available hyper-threads), hence we request the memory as if we were using 2 threads.
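After submitting, it can be worth double-checking what Slurm actually granted. A minimal sketch, where `myscript.batch` and `<jobid>` are placeholders for your batch file and the job ID printed by `sbatch`:

```bash
# Submit the job and note the job ID printed by sbatch
sbatch myscript.batch

# Inspect the allocation; fields such as NumCPUs and MinMemoryCPU show the
# granted CPUs and the memory per CPU (8000MB in this example)
scontrol show job <jobid>
```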

### Basic single core job - Example 2

```bash
#!/bin/bash
#SBATCH --partition=hourly      # Using 'hourly' will grant higher priority
#SBATCH --constraint=mc         # Use CPU batch system
#SBATCH --ntasks-per-core=1     # Force no Hyper-Threading, will run 1 task per core
#SBATCH --mem=352000            # We want to use the whole memory
#SBATCH --time=00:30:00         # Define max time job will run
#SBATCH --output=myscript.out   # Define your output file
#SBATCH --error=myscript.err    # Define your error file

my_script
```

In this example we run a single core job by defining `--ntasks-per-core=1` (which is also the default). In addition, we declare that the job will use the whole memory of a node with `--mem=352000` (the maximum memory available per Apollo node). Whenever a job needs more memory than the default (4000MB per thread), it is very important to specify the amount of memory the job will use, in order to avoid conflicts with jobs from other users.
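To see how much memory the nodes actually offer, the standard Slurm query tools can be used; a quick sketch (the node name below is only an illustrative placeholder):

```bash
# List node names and their configured memory in MB (%n = hostname, %m = memory)
sinfo --partition=hourly -o "%n %m"

# Or inspect a single node in detail (replace the placeholder node name)
scontrol show node <nodename> | grep -i mem
```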

## Basic MPI with hyper-threading

```bash
#!/bin/bash
#SBATCH --partition=hourly      # Using 'hourly' will grant higher priority
#SBATCH --exclusive             # Use the node in exclusive mode
#SBATCH --ntasks=88             # Job will run 88 tasks
#SBATCH --ntasks-per-core=2     # Force Hyper-Threading, will run 2 tasks per core
#SBATCH --constraint=mc         # Use CPU batch system
#SBATCH --time=00:30:00         # Define max time job will run
#SBATCH --output=myscript.out   # Define your output file
#SBATCH --error=myscript.err    # Define your error file

module load gcc/8.3.0 openmpi/3.1.3

MPI_script
```

In this example we run a job with 88 tasks. Merlin6 Apollo nodes have 44 cores, each with hyper-threading enabled, so we can run 2 threads per core, i.e. 88 threads in total. We add the option `--exclusive` to ensure that the node is used exclusively and no other jobs run there. Finally, since the default memory per thread is 4000MB, the job can use up to 352000MB of memory in total (88 x 4000MB), which is the maximum allowed in a single node.
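`MPI_script` above is a placeholder. A minimal sketch of what it could contain, assuming an MPI binary `./my_app` built with the same gcc/openmpi modules loaded in the batch script:

```bash
# OpenMPI built with Slurm support picks up the allocation automatically,
# so mpirun starts one rank per Slurm task (88 here) without an explicit -np
mpirun ./my_app
```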

## Basic MPI without hyper-threading

```bash
#!/bin/bash
#SBATCH --partition=hourly      # Using 'hourly' will grant higher priority
#SBATCH --exclusive             # Use the node in exclusive mode
#SBATCH --ntasks=44             # Job will run 44 tasks
#SBATCH --ntasks-per-core=1     # Force no Hyper-Threading, will run 1 task per core
#SBATCH --mem=352000            # Define the whole memory of the node
#SBATCH --constraint=mc         # Use CPU batch system
#SBATCH --time=00:30:00         # Define max time job will run
#SBATCH --output=myscript.out   # Define your output file
#SBATCH --error=myscript.err    # Define your error file

module load gcc/8.3.0 openmpi/3.1.3

MPI_script
```

In this example we run a job with 44 tasks, and hyper-threading will not be used. Merlin6 Apollo nodes have 44 cores, each with hyper-threading enabled. However, by defining `--ntasks-per-core=1` we force the use of a single thread per core (this is the default, but it is recommended to add it explicitly). Each task will run in 1 thread, and each task will be assigned to an independent core. We add the option `--exclusive` to ensure that the node usage is exclusive and no other jobs run there. Finally, since the default memory per thread is 4000MB and we use only 1 thread per core, we want to avoid being limited to half of the memory: we have to specify that we will use the whole memory of the node with the option `--mem=352000` (the maximum memory available in the node).
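To confirm from inside the job that the layout matches what was requested, the environment variables Slurm sets for every job can be printed; a small sketch that could be added to the batch script:

```bash
# Slurm exports these variables inside every job; printing them is a cheap
# way to verify the task count, CPUs and node list that were granted
echo "Tasks:        ${SLURM_NTASKS}"
echo "CPUs on node: ${SLURM_CPUS_ON_NODE}"
echo "Nodes:        ${SLURM_JOB_NODELIST}"
```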

## Advanced Slurm Example

Copy-paste the following example into a file called `myAdvancedTest.batch`:

```bash
#!/bin/bash
#SBATCH --partition=daily  # name of slurm partition to submit
#SBATCH --time=2:00:00     # limit the execution of this job to 2 hours, see sinfo for the max. allowance
#SBATCH --nodes=2          # number of nodes
#SBATCH --ntasks=44        # number of tasks

module load gcc/8.3.0 openmpi/3.1.3
module list

echo "Example no-MPI:" ; hostname        # will print one hostname per node
echo "Example MPI:"    ; mpirun hostname # will print one hostname per ntask
```

In the above example the options `--nodes=2` and `--ntasks=44` are specified. This means that up to 2 nodes are requested and 44 tasks are expected to run, hence 44 cores are needed for running that job (we do not specify `--ntasks-per-core`, so it defaults to 1). Slurm will try to allocate a maximum of 2 nodes that together provide at least 44 cores. Since our nodes have 44 cores each, if a node is empty (no other users have running jobs there) the job can land on a single node, as it has enough cores to run 44 tasks.
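A quick way to see where such a job actually landed while it is pending or running (`<jobid>` is the placeholder for the ID printed by `sbatch`):

```bash
sbatch myAdvancedTest.batch                       # prints: Submitted batch job <jobid>
squeue -j <jobid> -o "%.10i %.9P %.8T %.6D %R"    # job ID, partition, state, node count, node list
```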

If we want to ensure that the job uses at least two different nodes (e.g. for boosting CPU frequency, or because the job requires more memory per core), other options should be specified.

A good example is `--ntasks-per-node=22`. This will equally distribute the 44 tasks over the 2 nodes, running 22 tasks on each.

```bash
#SBATCH --ntasks-per-node=22
```
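With that option in place, the `mpirun hostname` line in the example above should print 22 hostnames per node; a compact way to verify the distribution, for instance:

```bash
# Count how many ranks ran on each node: with --ntasks-per-node=22,
# each of the 2 hostnames should appear 22 times
mpirun hostname | sort | uniq -c
```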

A different example could be to specify how much memory per core is needed. For instance, `--mem-per-cpu=32000` will reserve ~32000MB per core. Since we have a maximum of 352000MB per Apollo node, Slurm will only be able to allocate 11 cores per node (32000MB x 11 cores = 352000MB). This means that 4 nodes will be needed (a maximum of 11 tasks per node due to the memory definition, and we need to run 44 tasks), so in this case we need to change to `--nodes=4` (or remove `--nodes`). Alternatively, we can decrease `--mem-per-cpu` to a lower value that allows more cores per node (e.g. with 16000MB per core, 22 cores fit on each node, so 2 nodes are enough for the 44 tasks).

```bash
#SBATCH --mem-per-cpu=16000
```
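The arithmetic behind this choice can be checked in a couple of lines; a small sketch assuming 352000MB of usable memory per Apollo node:

```bash
NODE_MEM=352000    # usable memory per Apollo node (MB)
MEM_PER_CPU=16000  # requested memory per core (MB)
echo "Cores usable per node:    $(( NODE_MEM / MEM_PER_CPU ))"  # -> 22
echo "Nodes needed for 44 tasks: $(( (44 + 21) / 22 ))"         # -> 2 (ceiling division)
```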

Finally, in order to ensure exclusivity of the node, the option `--exclusive` can be used (see below). This ensures that the requested nodes are exclusive to the job: no other users' jobs will run on these nodes, and only completely free nodes will be allocated.

```bash
#SBATCH --exclusive
```

This can be combined with the previous examples, as sketched below.
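For instance, a possible header combining the options discussed above (2 exclusive nodes, 22 tasks per node, 16000MB per core, so 22 x 16000MB = 352000MB per node); the `mpirun hostname` line stands in for the real MPI application:

```bash
#!/bin/bash
#SBATCH --partition=daily       # name of slurm partition to submit
#SBATCH --time=2:00:00          # limit the execution of this job to 2 hours
#SBATCH --nodes=2               # number of nodes
#SBATCH --ntasks=44             # number of tasks
#SBATCH --ntasks-per-node=22    # distribute the tasks equally over the 2 nodes
#SBATCH --mem-per-cpu=16000     # 22 x 16000MB = 352000MB, the whole node memory
#SBATCH --exclusive             # do not share the nodes with other jobs

module load gcc/8.3.0 openmpi/3.1.3

mpirun hostname                 # placeholder for the real MPI application
```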

More advanced configurations can be defined and combined with the previous examples. More information about advanced options can be found at https://slurm.schedmd.com/sbatch.html (or by running `man sbatch`).

If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run advanced configurations unless you are sure of what you are doing.