---
layout: default
title: Slurm Basic Commands
parent: Merlin6 Slurm
grand_parent: Merlin6 User Guide
nav_order: 1
---

# Slurm Basic Commands
{: .no_toc }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Basic commands

Useful Slurm commands:

```bash
sinfo            # list nodes, their occupancy, the Slurm partitions, and limits (try the "-l" option)
squeue           # list the jobs currently running or waiting in Slurm (the "-l" option may also be useful)
sbatch Script.sh # submit a batch script (example below) to Slurm
srun command     # run a command through Slurm; accepts the same options as 'sbatch'
salloc           # allocate compute nodes; useful for running interactive jobs (ANSYS, Python notebooks, etc.)
scancel job_id   # cancel a Slurm job; job_id is the numeric ID shown by squeue
```
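
For example, an interactive session with ``salloc`` might look like this (a minimal sketch; the 'hourly' partition, task count, and time limit are placeholder choices, not recommendations):

```bash
salloc --partition=hourly --ntasks=4 --time=00:30:00 # request an interactive allocation (placeholder values)
srun hostname                                        # runs inside the allocation: prints one hostname per task
exit                                                 # release the allocation when finished
```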

Other, more advanced commands:

```bash
sinfo -N -l # list nodes, their state, resources (number of CPUs, memory per node, etc.), and other information
sshare -a   # list the shares of all associations in the cluster
sprio -l    # view the factors that comprise a job's scheduling priority (add -u <username> to filter by user)
```
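
For instance, to restrict the output to your own jobs and user (a small sketch using the ``-u`` filter mentioned above):

```bash
sprio -l -u $USER # priority factors for your own jobs only
sshare -u $USER   # fair-share information for your own user
```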

---

## Basic Slurm example

You can copy-paste the following example into a file called ``mySlurm.batch``.
Some basic parameters are explained in the example.
Note that a line starting with ``#SBATCH`` is an enabled option, while ``##SBATCH`` marks a commented-out option (no effect).

```bash
#!/bin/sh
#SBATCH --partition=daily        # Slurm partition to submit to: 'general' (default if not specified), 'daily', or 'hourly'
#SBATCH --job-name="mySlurmTest" # name of the job; useful for filtering when submitting different types of jobs (e.g. with 'squeue')
#SBATCH --time=0-12:00:00        # time limit; here shortened to 12 hours (default and maximum for 'daily' is 1 day)
#SBATCH --exclude=merlin-c-001   # nodes that the job must not run on
#SBATCH --nodes=10               # number of nodes to allocate for the job
#SBATCH --ntasks=440             # number of tasks to run
##SBATCH --exclusive             # enable for exclusive usage of a node; if not specified, nodes are shared by default
##SBATCH --ntasks-per-node=32    # number of tasks per node. Each Apollo node has 44 cores; using fewer in exclusive mode may help turbo boost the CPU frequency. If enabled, set --ntasks and --nodes accordingly.

module load gcc/8.3.0 openmpi/3.1.3

echo "Example no-MPI:" ; hostname     # without MPI: prints a single hostname (the node running the batch script)
echo "Example MPI:" ; mpirun hostname # with MPI: prints one hostname per task
```

### Submitting a job

Submit the job to Slurm and check its status:

```bash
sbatch mySlurm.batch # submit the job to Slurm
squeue               # check its status
```
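
Because the script sets ``--job-name="mySlurmTest"``, you can also filter on that name; the numeric job ID reported by ``squeue`` can then be passed to ``scancel``. A small sketch (the job ID below is hypothetical):

```bash
squeue -n mySlurmTest # list only jobs named "mySlurmTest"
squeue -u $USER       # list only your own jobs
scancel 12345         # cancel by numeric job ID (hypothetical ID, as reported by squeue)
```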

---

## Advanced Slurm test script

Copy-paste the following example into a file called ``myAdvancedTest.batch``:

```bash
#!/bin/bash
#SBATCH --partition=merlin # name of the Slurm partition to submit to
#SBATCH --time=2:00:00     # limit the execution of this job to 2 hours; see sinfo for the maximum allowed
#SBATCH --nodes=2          # number of nodes
#SBATCH --ntasks=24        # number of tasks

module load gcc/8.3.0 openmpi/3.1.3
module list

echo "Example no-MPI:" ; hostname     # without MPI: prints a single hostname (the node running the batch script)
echo "Example MPI:" ; mpirun hostname # with MPI: prints one hostname per task
```

The example above specifies the options ``--nodes=2`` and ``--ntasks=24``: up to 2 nodes are requested, and 24 tasks are expected to run, so 24 cores are needed. Slurm will allocate at most 2 nodes that together provide at least 24 cores. Since our nodes have 44 cores each, if the nodes are empty (no other users have jobs running there), the job will land on a single node, which has enough cores to run all 24 tasks.

If you want to ensure that the job uses at least two different nodes (e.g. for boosting the CPU frequency, or because the job requires more memory per core), you should specify additional options.
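
To verify where such a job actually landed, inspect it after submission (a sketch; the job ID is hypothetical, and ``scontrol`` field names can vary slightly between Slurm versions):

```bash
sbatch myAdvancedTest.batch                           # prints "Submitted batch job <jobid>"
scontrol show job 12345 | grep -E 'NumNodes|NodeList' # hypothetical ID: shows how many and which nodes were allocated
```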

A good example is ``--ntasks-per-node=12``, which distributes the 24 tasks equally over the 2 nodes (12 per node):

```bash
#SBATCH --ntasks-per-node=12
```

A different approach is to specify how much memory per core is needed. For instance, ``--mem-per-cpu=32000`` reserves ~32000 MB per core. Since our nodes have at most 352000 MB of memory, Slurm can allocate at most 11 cores per node (32000 MB x 11 cores = 352000 MB). The job would then need 3 nodes (only 11, not 12, tasks fit per node), unless ``--mem-per-cpu`` is lowered to a value that allows at least 12 cores per node (e.g. ``28000``):

```bash
#SBATCH --mem-per-cpu=28000
```
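
As a quick sanity check, the same arithmetic can be done in the shell (a throwaway sketch; 352000 MB per node is the figure quoted above):

```bash
mem_per_node=352000
echo $(( mem_per_node / 32000 )) # => 11: only 11 cores fit with 32000 MB each
echo $(( mem_per_node / 28000 )) # => 12: 12 cores fit with 28000 MB each
```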

Finally, to ensure exclusive use of a node, the option ``--exclusive`` can be added (see below). This guarantees that the requested nodes are exclusive to the job: no other users' jobs will run on those nodes, and only completely free nodes will be allocated.

```bash
#SBATCH --exclusive
```

This can be combined with the previous examples.

More advanced configurations can be defined and combined with the previous examples. More information about advanced options can be found at https://slurm.schedmd.com/sbatch.html (or run ``man sbatch``).

If you have questions about how to properly run your jobs, please contact us at merlin-admins@lists.psi.ch. Do not use advanced configurations unless you are sure of what you are doing.

---

## Environment Modules

On top of the operating system stack we provide different software using the PSI-developed Pmodules system. Useful commands:

```bash
module avail                                     # list the software available via Pmodules
module load gnuplot/5.2.0                        # load a specific version of the gnuplot package
module search hdf                                # see which versions of the hdf5 package are provided, and with which dependencies
module load gcc/6.2.0 openmpi/1.10.2 hdf5/1.8.17 # load a specific version of hdf5, compiled with specific versions of gcc and openmpi
module use unstable                              # get access to packages not yet considered fully stable by the module provider (may be very new, or not yet tested by the community)
module list                                      # list the software currently loaded in your environment
```

### Requests for New Software

If you are missing a package or version, please contact us at merlin-admins@lists.psi.ch.