---
layout: default
title: Slurm Basic Commands
parent: Merlin6 Slurm
grand_parent: Merlin6 User Guide
nav_order: 1
---

# Slurm Basic Commands
{: .no_toc }

## Table of contents
{: .no_toc .text-delta }

- TOC
{:toc}
## Basic commands

Useful Slurm commands:
```bash
sinfo              # list nodes, their occupancy, Slurm partitions and limits (try the "-l" option)
squeue             # list currently running/pending jobs in Slurm (the "-l" option may also be useful)
sbatch Script.sh   # submit a batch script (example below) to Slurm
srun <command>     # run a command through Slurm; accepts the same options as 'sbatch'
salloc             # allocate compute nodes; useful for interactive jobs (ANSYS, Python notebooks, etc.)
scancel <job_id>   # cancel a Slurm job; job_id is the numeric ID shown by 'squeue'
```
Other useful commands:
```bash
sinfo -N -l   # list nodes, their state, resources (number of CPUs, memory per node, etc.) and other information
sshare -a     # list the shares of all associations in the cluster
sprio -l      # view the factors that make up a job's scheduling priority (add "-u <username>" to filter by user)
```
## Basic Slurm example
You can copy-paste the following example into a file called `mySlurm.batch`. Some basic parameters are explained in the comments. Note that `#SBATCH` enables an option, while `##SBATCH` comments it out (no effect).
```bash
#!/bin/sh
#SBATCH --partition=daily         # Slurm partition to submit to: 'general' (default if not specified), 'daily' or 'hourly'
#SBATCH --job-name="mySlurmTest"  # job name; useful for filtering when submitting different types of jobs (e.g. with 'squeue')
#SBATCH --time=0-12:00:00         # time limit; here shortened to 12 hours (default and maximum for 'daily' is 1 day)
#SBATCH --exclude=merlin-c-001    # nodes to exclude from the allocation
#SBATCH --nodes=10                # number of nodes to allocate for the job
#SBATCH --ntasks=440              # number of tasks to run
##SBATCH --exclusive              # enable for exclusive usage of the nodes; without it, nodes are shared by default
##SBATCH --ntasks-per-node=32     # tasks per node; each Apollo node has 44 cores, and using fewer in exclusive mode may help turbo-boost the CPU frequency. If enabled, set --ntasks and --nodes accordingly.

module load gcc/8.3.0 openmpi/3.1.3

echo "Example no-MPI:" ; hostname          # will print one hostname per node
echo "Example MPI:"    ; mpirun hostname   # will print one hostname per task
```
## Submitting a job
Submit the job to Slurm and check its status:
```bash
sbatch mySlurm.batch   # submit the job to Slurm
squeue                 # check its status
```
## Advanced Slurm test script
Copy-paste the following example into a file called `myAdvancedTest.batch`:
```bash
#!/bin/bash
#SBATCH --partition=merlin   # Slurm partition to submit to
#SBATCH --time=2:00:00       # limit the job to 2 hours; see 'sinfo' for the maximum allowance
#SBATCH --nodes=2            # number of nodes
#SBATCH --ntasks=24          # number of tasks

module load gcc/8.3.0 openmpi/3.1.3
module list

echo "Example no-MPI:" ; hostname          # will print one hostname per node
echo "Example MPI:"    ; mpirun hostname   # will print one hostname per task
```
The example above specifies `--nodes=2` and `--ntasks=24`: up to 2 nodes are requested, and the job is expected to run 24 tasks, so 24 cores are needed. Slurm will allocate at most 2 nodes that together provide at least 24 cores. Since each of our nodes has 44 cores, if nodes are empty (no other users have jobs running there), the job will land on a single node, which has enough cores to run all 24 tasks.
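The allocation arithmetic above can be sketched as a quick shell check (the 44-cores-per-node figure comes from this guide; everything else is plain ceiling division):

```bash
# Estimate the minimum number of nodes a job needs,
# assuming 44 cores per node (Merlin6 Apollo nodes).
ntasks=24
cores_per_node=44
min_nodes=$(( (ntasks + cores_per_node - 1) / cores_per_node ))   # ceiling division
echo "minimum nodes for ${ntasks} tasks: ${min_nodes}"
```

With 24 tasks and 44 cores per node the minimum is 1 node, which is why the job can land on a single empty node.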
To ensure that the job uses at least two different nodes (e.g. to boost the CPU frequency, or because the job needs more memory per core), you should specify additional options. A good example is `--ntasks-per-node=12`, which places exactly 12 of the 24 tasks on each of the 2 nodes:

```bash
#SBATCH --ntasks-per-node=12
```
A different example is specifying how much memory per core is needed. For instance, `--mem-per-cpu=32000` reserves ~32000 MB per core. Since our nodes have a maximum of 352000 MB, Slurm will only be able to allocate 11 cores per node (11 cores x 32000 MB = 352000 MB). The job would then need 3 nodes (only 11 tasks fit per node, not 12), unless `--mem-per-cpu` is lowered to a value that allows at least 12 cores per node (e.g. 28000):

```bash
#SBATCH --mem-per-cpu=28000
```
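The memory arithmetic above can be verified with a short shell sketch (the 352000 MB node limit and the other figures come from the example; the calculation itself is the point):

```bash
# How many cores fit on one node with --mem-per-cpu=32000,
# and how many nodes 24 tasks then need. Assumes 352000 MB per node.
mem_per_node=352000
mem_per_cpu=32000
ntasks=24
cores_per_node=$(( mem_per_node / mem_per_cpu ))                     # 11 cores fit per node
nodes_needed=$(( (ntasks + cores_per_node - 1) / cores_per_node ))   # ceiling division -> 3 nodes
echo "cores per node: ${cores_per_node}, nodes needed: ${nodes_needed}"
```

Lowering `--mem-per-cpu` to 28000 lets 12 cores fit per node (12 x 28000 MB = 336000 MB), so 2 nodes suffice again.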
Finally, to ensure exclusive use of the nodes, the `--exclusive` option can be used. It guarantees that the requested nodes are dedicated to the job: no other users' jobs will run on them, and only completely free nodes will be allocated.

```bash
#SBATCH --exclusive
```

This can be combined with the previous examples.
More advanced configurations can be defined and combined with the previous examples. More information about advanced options can be found at <https://slurm.schedmd.com/sbatch.html> (or run `man sbatch`).

If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run advanced configurations unless you are sure of what you are doing.
## Environment Modules

On top of the operating system stack we provide different software using the PSI-developed Pmodules system. Useful commands:
```bash
module avail                                       # list the software available through Pmodules
module load gnuplot/5.2.0                          # load a specific version of the gnuplot package
module search hdf                                  # see which versions of the hdf5 package are provided and with which dependencies
module load gcc/6.2.0 openmpi/1.10.2 hdf5/1.8.17   # load a specific hdf5 version, compiled with specific gcc and openmpi versions
module use unstable                                # access packages not yet considered fully stable (very recent, or not yet tested by the community)
module list                                        # list the software currently loaded in your environment
```
## Requests for New Software

If a package or version you need is missing, please contact us.