Update Merlin6 documentation with latest changes, added new pages

2019-06-13 08:41:29 +02:00
parent caa63db616
commit caa558a090
9 changed files with 942 additions and 0 deletions


@@ -0,0 +1,13 @@
---
layout: default
title: Merlin6 Slurm
parent: Merlin6 User Guide
nav_order: 5
has_children: true
permalink: /docs/merlin6-user-guide/merlin6-slurm/merlin6-slurm.html
---
# Merlin6 Slurm
Welcome to the PSI Merlin6 Slurm Cluster.


@@ -0,0 +1,152 @@
---
layout: default
title: Slurm Basic Commands
parent: Merlin6 Slurm
grand_parent: Merlin6 User Guide
nav_order: 1
---
# Slurm Basic Commands
{: .no_toc }
## Table of contents
{: .no_toc .text-delta }
1. TOC
{:toc}
---
## Basic commands
Useful Slurm commands:
```bash
sinfo # to see the names of the nodes, their occupancy, the Slurm partitions and limits (try the "-l" option)
squeue # to see the currently running/waiting jobs in Slurm (the additional "-l" option may also be useful)
sbatch Script.sh # to submit a script (example below) to Slurm
srun command # to submit a command to Slurm; the same options as in 'sbatch' can be used
salloc # to allocate computing nodes; useful for running interactive jobs (ANSYS, Python Notebooks, etc.)
scancel job_id # to cancel a Slurm job; job_id is the numeric ID shown by squeue
```
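For interactive work, ``salloc`` can be combined with ``srun``. A minimal sketch (the partition and resources are illustrative; adjust them to your needs):
```bash
salloc --partition=hourly --ntasks=1 # request an interactive allocation of 1 task
srun --pty bash                      # open an interactive shell on the allocated node
exit                                 # leave the shell and release the allocation
```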
Other useful advanced commands:
```bash
sinfo -N -l # list nodes, state, resources (number of CPUs, memory per node, etc.), and other information
sshare -a # to list shares of associations to a cluster
sprio -l # to view the factors that comprise a job's scheduling priority (add -u <username> for filtering user)
```
---
## Basic slurm example
You can copy-paste the following example into a file called ``mySlurm.batch``.
Some basic parameters are explained in the example.
Please notice that ``#`` marks an enabled option, while ``##`` marks a commented-out option (no effect).
```bash
#!/bin/sh
#SBATCH --partition=daily # slurm partition to submit to. Can be 'general' (default if not specified), 'daily' or 'hourly'.
#SBATCH --job-name="mySlurmTest" # name of the job. Useful when submitting different types of jobs for filtering (e.g. with the 'squeue' command)
#SBATCH --time=0-12:00:00 # time limit. Here it is shortened to 12 hours (the default and maximum for 'daily' is 1 day).
#SBATCH --exclude=merlin-c-001 # nodes on which the job should not run
#SBATCH --nodes=10 # number of nodes to allocate for the job
#SBATCH --ntasks=440 # number of tasks to run
##SBATCH --exclusive # enable if you need exclusive usage of a node. If this option is not specified, nodes are shared by default.
##SBATCH --ntasks-per-node=32 # number of tasks per node. Each Apollo node has 44 cores; using fewer in exclusive mode may help to turbo boost the CPU frequency. If this option is enabled, set --ntasks and --nodes accordingly.
module load gcc/8.3.0 openmpi/3.1.3
echo "Example no-MPI:" ; hostname # will print one hostname per node
echo "Example MPI:" ; mpirun hostname # will print one hostname per ntask
```
### Submitting a job
Submit the job to Slurm and check its status:
```bash
sbatch mySlurm.batch # submit this job to slurm
squeue # check its status
```
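A couple of related sketches for monitoring and cancelling jobs (``<job_id>`` stands for the numeric ID reported by ``squeue``):
```bash
squeue -u $USER             # list only your own jobs
squeue --name="mySlurmTest" # filter jobs by the name given with --job-name
scancel <job_id>            # cancel the job with the given numeric ID
```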
---
## Advanced slurm test script
Copy-paste the following example into a file called ``myAdvancedTest.batch``:
```bash
#!/bin/bash
#SBATCH --partition=merlin # name of the slurm partition to submit to
#SBATCH --time=2:00:00 # limit the execution of this job to 2 hours, see sinfo for the max. allowance
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks=24 # number of tasks
module load gcc/8.3.0 openmpi/3.1.3
module list
echo "Example no-MPI:" ; hostname # will print one hostname per node
echo "Example MPI:" ; mpirun hostname # will print one hostname per ntask
```
In the above example, the options ``--nodes=2`` and ``--ntasks=24`` are specified. This means that up to 2 nodes are requested,
which together are expected to run 24 tasks; hence, 24 cores are needed for running the job. Slurm will try to allocate at most 2 nodes
having at least 24 cores in total. Since each of our nodes has 44 cores, if the nodes are empty (no other users
have running jobs there), the job will land on a single node (it has enough cores to run all 24 tasks).
If you want to ensure that the job uses at least two different nodes (e.g. for boosting the CPU frequency, or because the job requires
more memory per core), you should specify additional options.
A good example is ``--ntasks-per-node=12``, which distributes the 24 tasks equally across the 2 nodes, 12 per node:
```bash
#SBATCH --ntasks-per-node=12
```
Another example is specifying how much memory per core is needed. For instance, ``--mem-per-cpu=32000`` will reserve
~32000MB per core. Since a node has at most 352000MB, Slurm will only be able to allocate 11 cores per node (32000MB x 11 cores = 352000MB).
This means that 3 nodes would be needed (the job can not run 12 tasks per node, only 11), unless we decrease ``--mem-per-cpu`` to a value
that allows at least 12 cores per node (e.g. ``28000``):
```bash
#SBATCH --mem-per-cpu=28000
```
Finally, in order to ensure exclusive use of a node, the option ``--exclusive`` can be used (see below). This ensures that
the requested nodes are exclusive to the job (no other users' jobs will run on those nodes, and only completely
free nodes will be allocated):
```bash
#SBATCH --exclusive
```
This can be combined with the previous examples.
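Putting the previous options together, a combined job header could look as follows (a sketch only; adjust the numbers to your own job):
```bash
#!/bin/bash
#SBATCH --partition=daily    # one of 'general', 'daily', 'hourly'
#SBATCH --nodes=2            # allocate 2 nodes
#SBATCH --ntasks=24          # run 24 tasks in total
#SBATCH --ntasks-per-node=12 # spread the tasks equally: 12 per node
#SBATCH --mem-per-cpu=28000  # 28000MB per core (12 x 28000MB = 336000MB <= 352000MB per node)
#SBATCH --exclusive          # do not share the allocated nodes with other jobs

module load gcc/8.3.0 openmpi/3.1.3
mpirun hostname              # will print one hostname per task
```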
More advanced configurations can be defined and combined with the previous examples. More information about advanced
options can be found at https://slurm.schedmd.com/sbatch.html (or by running ``man sbatch``).
If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run
advanced configurations unless you are sure of what you are doing.
---
## Environment Modules
On top of the operating system stack we provide different software using the PSI-developed
pmodule system. Useful commands:
```bash
module avail # to see the list of available software provided via pmodules
module load gnuplot/5.2.0 # to load a specific version of the gnuplot package
module search hdf # try it out to see which versions of the hdf5 package are provided, and with which dependencies
module load gcc/6.2.0 openmpi/1.10.2 hdf5/1.8.17 # to load a specific version of hdf5, compiled with specific versions of gcc and openmpi
module use unstable # to get access to packages not yet considered fully stable by the module provider (may be a very fresh version, or not yet tested by the community)
module list # to see which software is loaded in your environment
```
### Requests for New Software
If you are missing a package or version, please contact us.


@@ -0,0 +1,90 @@
---
layout: default
title: Slurm Configuration
parent: Merlin6 Slurm
grand_parent: Merlin6 User Guide
nav_order: 2
---
# Slurm Configuration
{: .no_toc }
## Table of contents
{: .no_toc .text-delta }
1. TOC
{:toc}
---
## Using the Slurm batch system
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm. In the same way, **Merlin6** has also been configured with this batch system.
Slurm has been installed in a **multi-clustered** configuration, allowing multiple clusters to be integrated in the same batch system.
To understand the Slurm configuration of the cluster, it may sometimes be useful to check the following files:
* ``/etc/slurm/slurm.conf`` - can be found on the login nodes and computing nodes.
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
The configuration files found on the login nodes correspond exclusively to the **merlin6** cluster. Configuration files for the old
**merlin5** cluster must be checked directly on any of the **merlin5** computing nodes: these are not propagated
to the **merlin6** login nodes.
### About Merlin5 & Merlin6
The new Slurm cluster is called **merlin6**. However, the old Slurm *merlin* cluster will be kept for some time, and it has been renamed to **merlin5**.
This allows jobs to keep running on the old computing nodes until users have fully migrated their codes to the new cluster.
From July 2019, **merlin6** becomes the **default cluster**: any job submitted to Slurm goes to that cluster. Users can keep submitting to
the old *merlin5* computing nodes by using the option ``--cluster=merlin5``.
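For example (a minimal sketch, assuming an existing submission script ``myScript.batch``):
```bash
sbatch --cluster=merlin5 myScript.batch # submit the script to the old merlin5 cluster
squeue --cluster=merlin5                # check the job status on the merlin5 cluster
```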
This documentation covers only the usage of the **merlin6** Slurm cluster.
### Using Slurm 'merlin6' cluster
Basic usage of the **merlin6** cluster is detailed here. For advanced usage, please read the following document: [LINK TO SLURM ADVANCED CONFIG]()
#### Merlin6 Node definition
The following table shows the default and maximum resources that can be used per node:
| Nodes | Def.#CPUs | Max.#CPUs | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | Def.#GPUs | Max.#GPUs |
|:---------------------------------- | ---------:| ---------:| ----------------:| ----------------:| -----------------:| -------------:| --------- | --------- |
| merlin-c-[001-022,101-122,201-222] | 1 core | 44 cores | 8000 | 352000 | 352000 | 10000 | N/A | N/A |
| merlin-g-[001] | 1 core | 8 cores | 8000 | 102498 | 102498 | 10000 | 1 | 2 |
| merlin-g-[002-009] | 1 core | 10 cores | 8000 | 102498 | 102498 | 10000 | 1 | 4 |
If nothing is specified, each core will use up to 8000MB (~8GB) of memory by default. More memory can be requested with the ``--mem=<memory>`` option,
and the maximum memory allowed is ``Max.Mem/Node``.
In *Merlin6*, memory is considered a Consumable Resource, as is the CPU.
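As a sketch, a job requesting more than the default memory could look like this (the values are illustrative):
```bash
#SBATCH --ntasks=1  # a single task
#SBATCH --mem=16000 # request 16000MB of memory instead of the default 8000MB per core
```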
#### Merlin6 Slurm partitions
The partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:
| Partition | Default Partition | Default Time | Max Time | Max Nodes | Priority |
|:----------- | ----------------- | ------------ | -------- | --------- | -------- |
| **general** | true | 1 day | 1 week | 50 | low |
| **daily** | false | 1 day | 1 day | 60 | medium |
| **hourly** | false | 1 hour | 1 hour | unlimited | highest |
**general** is the *default partition*: when nothing is specified, jobs are assigned to it. No more than 50 nodes can run **general** jobs
at a time; for **daily** this limit is extended to 60 nodes, while for **hourly** there is no limit. Shorter jobs have higher priority than
longer ones and will in general be scheduled earlier (however, other factors such as the user's fair share value can affect this decision).
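For example, a short job can take advantage of the higher priority of the **hourly** partition (a sketch; the defaults from the table above apply otherwise):
```bash
#SBATCH --partition=hourly # highest priority, maximum 1 hour of run time
#SBATCH --time=00:30:00    # stay well within the 1 hour limit
```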
#### Merlin6 User limits
By default, users can not use more than 528 cores at the same time (maximum number of CPUs per user). This limit applies to the **general** and **daily** partitions; for the **hourly** partition there is no restriction.
These limits are relaxed for the **daily** partition outside working hours and during the weekend, as follows:
| Partition | Mon-Fri 08h-18h | Sun-Thu 18h-0h | From Fri 18h to Sun 8h | From Sun 8h to Mon 18h |
|:----------- | --------------- | -------------- | ----------------------- | ---------------------- |
| **general** | 528 | 528 | 528 | 528 |
| **daily** | 528 | 792 | Unlimited | 792 |
| **hourly** | Unlimited | Unlimited | Unlimited | Unlimited |