Reorganize merlin6 pages to follow navigation menu

The folders are only used for source organization; URLs remain flat.
This commit is contained in:
Spencer Bliven
2019-07-29 15:18:22 +02:00
parent b3f62ee51f
commit 95f511a203
23 changed files with 11 additions and 22 deletions


@ -0,0 +1,172 @@
---
title: Running Jobs
#tags:
#keywords:
last_updated: 18 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/running-jobs.html
---
## Commands for running jobs
* ``sbatch``: submit a batch script to Slurm. Use ``squeue`` to check job status and ``scancel`` to delete a job from the queue.
* ``srun``: run parallel jobs in the batch system.
* ``salloc``: obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
  This is equivalent to an interactive run (see the sketch below).
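For instance, an interactive allocation could look like this (a minimal sketch; the resource values are hypothetical and should be adapted to your needs):
```bash
salloc --ntasks=4 --time=00:30:00  # allocate 4 tasks for 30 minutes
srun hostname                      # runs inside the allocation
exit                               # release the allocation
```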
## Running on Merlin5
The **Merlin5** cluster will remain available at least until the 1st of November 2019. In the meantime, users can keep submitting jobs to the old cluster,
but they need to add a couple of extra options to their scripts.
```bash
#SBATCH --clusters=merlin5
```
Adding ``--clusters=merlin5`` sends the jobs to the old Merlin5 computing nodes. In addition, ``--partition=<merlin|gpu>`` can be specified to
select one of the old Merlin5 partitions.
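Putting both options together, a Merlin5 job script could start as follows (a sketch; choose the partition according to your needs):
```bash
#!/bin/bash
#SBATCH --clusters=merlin5   # Submit to the old Merlin5 cluster
#SBATCH --partition=merlin   # Old Merlin5 CPU partition ('gpu' for the GPU partition)
```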
## Running on Merlin6
In order to run on the **Merlin6** cluster, users have to add the following options:
```bash
#SBATCH --clusters=merlin6
```
Adding ``--clusters=merlin6`` sends the jobs to the Merlin6 computing nodes.
## Shared nodes and exclusivity
The **Merlin6** cluster is designed to support MPI/OpenMP workloads as well as single-core jobs. To allow them to co-exist,
nodes are configured in shared mode by default. This means that multiple jobs from multiple users may land on the same node. Users can
change this behaviour if they require exclusive usage of nodes.
By default, Slurm tries to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way,
mixed nodes are filled up first, ensuring that fully free nodes remain available for MPI/OpenMP jobs.
Exclusive usage of a node can be requested with the ``--exclusive`` option as follows:
```bash
#SBATCH --exclusive
```
## Output and Errors
By default, Slurm writes the standard output and standard error files to the directory from which
you submit the batch script:
* standard output is written to a file ``slurm-$SLURM_JOB_ID.out``.
* standard error is written to a file ``slurm-$SLURM_JOB_ID.err``.
If you want to change the default names, use the ``--output`` and ``--error`` options. For example:
```bash
#SBATCH --output=logs/myJob.%N.%j.out # Generate an output file per hostname and jobid
#SBATCH --error=logs/myJob.%N.%j.err # Generate an error file per hostname and jobid
```
Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) for the full specification of **filename patterns**.
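As an illustration, with the ``logs/myJob.%N.%j.out`` pattern above, a hypothetical job with ID 1234 whose first node is merlin-c-001 would produce files named:
```bash
logs/myJob.merlin-c-001.1234.out   # %N expands to the node hostname, %j to the job ID
logs/myJob.merlin-c-001.1234.err
```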
## Partitions
Merlin6 contains the following partitions for general purpose use:
* For the CPU nodes these are ``general``, ``daily`` and ``hourly``.
* For the GPU nodes there is ``gpu``.
If no partition is defined, ``general`` will be the default. A partition can be selected with the ``--partition`` option as follows:
```bash
#SBATCH --partition=<partition_name> # Partition to use. 'general' is the 'default'.
```
Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about the Merlin6 partition setup.
## CPU-based Jobs Settings
CPU-based jobs are available to all PSI users. Users must belong to the ``merlin6`` Slurm ``Account`` in order to be able
to run on the CPU-based nodes. All users registered in Merlin6 are automatically included in this ``Account``.
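To verify that your user belongs to the expected account, something like the following should work (a sketch; the exact output columns may differ):
```bash
sacctmgr show associations where user=$USER format=Cluster,Account,User
```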
### Slurm CPU Recommended Settings
There are some settings that are not mandatory but may be needed or useful to specify. These are the following:
* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, and also useful for specifying
  shorter times. This may affect scheduling priorities.
```bash
#SBATCH --time=<D-HH:MM:SS> # Time job needs to run
```
### Slurm CPU Template
The following template should be used by any user submitting jobs to CPU nodes:
```bash
#!/bin/sh
#SBATCH --partition=<general|daily|hourly> # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=<D-HH:MM:SS> # Strictly recommended when using 'general' partition.
#SBATCH --output=<output_file> # Generate custom output file
#SBATCH --error=<error_file> # Generate custom error file
#SBATCH --ntasks-per-core=1 # Recommended one thread per core
##SBATCH --exclusive # Uncomment if you need exclusive node usage
## Advanced options example
##SBATCH --nodes=1 # Uncomment and specify #nodes to use
##SBATCH --ntasks=44                       # Uncomment and specify #tasks to use
##SBATCH --ntasks-per-node=44 # Uncomment and specify #tasks per node
##SBATCH --ntasks-per-core=2 # Uncomment and specify #tasks per core (a.k.a. threads)
##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task
```
* Users needing hyper-threading can specify ``--ntasks-per-core=2`` instead. This is not recommended for general usage.
## GPU-based Jobs Settings
GPU-based jobs are restricted to BIO users; however, access for other PSI users can be requested on demand. Users must belong to
the ``merlin6-gpu`` Slurm ``Account`` in order to be able to run on the GPU-based nodes. BIO users belonging to any BIO group
are automatically registered in the ``merlin6-gpu`` account. Other users should request access from the Merlin6 administrators.
### Slurm GPU Mandatory Settings
The following options are mandatory settings that **must be included** in your batch scripts:
```bash
#SBATCH --gres=gpu # Always set at least this option when using GPUs
```
### Slurm GPU Recommended Settings
GPUs are also a shared resource. Hence, multiple users can run jobs on a single node, but only one GPU per user process
must be used. Users can define which GPU resources they need with the ``--gres`` option.
Valid ``gres`` options are ``gpu[[:type]:count]``, where ``type=GTX1080|GTX1080Ti`` and ``count=<number of GPUs to use>``.
For example:
```bash
#SBATCH --gres=gpu:GTX1080:8 # Use 8 x GTX1080 GPUs
```
### Slurm GPU Template
The following template should be used by any user submitting jobs to GPU nodes:
```bash
#!/bin/sh
#SBATCH --partition=gpu_<general|daily|hourly> # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=<D-HH:MM:SS> # Strictly recommended when using 'general' partition.
#SBATCH --output=<output_file> # Generate custom output file
#SBATCH --error=<error_file>                   # Generate custom error file
#SBATCH --gres="gpu:<type>:<number_gpus>" # You should specify at least 'gpu'
#SBATCH --ntasks-per-core=1 # GPU nodes have hyper-threading disabled
##SBATCH --exclusive # Uncomment if you need exclusive node usage
## Advanced options example
##SBATCH --nodes=1 # Uncomment and specify number of nodes to use
##SBATCH --ntasks=44                           # Uncomment and specify number of tasks to use
##SBATCH --ntasks-per-node=44 # Uncomment and specify number of tasks per node
##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task
```


@ -0,0 +1,59 @@
---
title: Slurm Basic Commands
#tags:
#keywords:
last_updated: 19 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-basics.html
---
This document shows some basic commands for using Slurm. Advanced examples of some of them
are explained in the other Merlin6 Slurm pages. You can always consult the ``man <command>`` pages for more
information about options and examples.
## Basic commands
Useful Slurm commands:
```bash
sinfo            # to see the names of nodes, their occupancy,
                 # the names of slurm partitions, and limits (try the "-l" option)
squeue           # to see the currently running/waiting jobs in slurm
                 # (the additional "-l" option may also be useful)
sbatch Script.sh # to submit a script (example below) to slurm
srun <command>   # to submit a command to slurm; the same options as in 'sbatch' can be used
salloc           # to allocate computing nodes; use it for interactive runs
scancel job_id   # to cancel a slurm job; job_id is the numeric ID shown by squeue
```
---
## More advanced commands
```bash
sinfo -N -l # list nodes, state, resources (#CPUs, memory per node, ...), etc.
sshare -a # to list shares of associations to a cluster
sprio -l # to view the factors that comprise a job's scheduling priority
# add '-u <username>' for filtering user
```
## Show information for specific cluster
By default, each of the above commands shows information for the local cluster, which is **merlin6**.
If you want to see the same information for **merlin5** you have to add the parameter ``--clusters=merlin5``.
If you want to see both clusters at the same time, add the option ``--federation``.
Examples:
```bash
sinfo # 'sinfo' local cluster which is 'merlin6'
sinfo --clusters=merlin5 # 'sinfo' non-local cluster 'merlin5'
sinfo --federation # 'sinfo' all clusters which are 'merlin5' & 'merlin6'
squeue # 'squeue' local cluster which is 'merlin6'
squeue --clusters=merlin5 # 'squeue' non-local cluster 'merlin5'
squeue --federation # 'squeue' all clusters which are 'merlin5' & 'merlin6'
```
---


@ -0,0 +1,84 @@
---
title: Slurm Configuration
#tags:
#keywords:
last_updated: 18 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-configuration.html
---
## About Merlin5 & Merlin6
The new Slurm cluster is called **merlin6**. However, the old Slurm *merlin* cluster will be kept for some time, renamed as **merlin5**.
This allows users to keep running jobs on the old computing nodes until they have fully migrated their code to the new cluster.
Since July 2019, **merlin6** is the **default cluster**, and any job submitted to Slurm goes to that cluster unless stated otherwise. Users can keep submitting to
the old *merlin5* computing nodes by using the option ``--cluster=merlin5``.
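For example, submitting a hypothetical script ``myjob.sh`` to the old cluster:
```bash
sbatch --clusters=merlin5 myjob.sh   # Runs on merlin5 instead of the default merlin6
```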
This documentation only covers the usage of the **merlin6** Slurm cluster.
## Using Slurm 'merlin6' cluster
Basic usage of the **merlin6** cluster is detailed here. For advanced usage, please refer to the following document: [LINK TO SLURM ADVANCED CONFIG]()
### Merlin6 Node definition
The following table shows the default and maximum resources that can be used per node:
| Nodes | Def.#CPUs | Max.#CPUs | #Threads | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | Def.#GPUs | Max.#GPUs |
|:-------------------| ---------:| ---------:| -------- | ----------------:| ----------------:| -----------------:| -------------:| --------- | --------- |
| merlin-c-[001-022] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A |
| merlin-c-[101-122] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A |
| merlin-c-[201-222] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A |
| merlin-g-[001] | 1 core | 8 cores | 1 | 4000 | 102400 | 102400 | 10000 | 1 | 2 |
| merlin-g-[002-009] | 1 core | 20 cores | 1 | 4000 | 102400 | 102400 | 10000 | 1 | 4 |
If nothing is specified, each core will use up to 8GB of memory by default (4000MB per thread). More memory can be requested with the
``--mem=<memory>`` (memory per node) or ``--mem-per-cpu=<memory>`` (memory per thread) options; the maximum memory allowed is ``Max.Mem/Node``.
In *Merlin6*, memory, like CPU, is considered a Consumable Resource.
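For example, a job needing more than the default memory could request it as follows (a sketch; the values must respect the per-node maxima listed above):
```bash
#SBATCH --mem-per-cpu=8000   # e.g. request 8000MB per thread instead of the default 4000MB
```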
### Merlin6 Slurm partitions
A partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:
| Partition | Default Partition | Default Time | Max Time | Max Nodes | Priority |
|:----------- | ----------------- | ------------ | -------- | --------- | -------- |
| **general** | true | 1 day | 1 week | 50 | low |
| **daily** | false | 1 day | 1 day | 60 | medium |
| **hourly** | false | 1 hour | 1 hour | unlimited | highest |
**general** is the *default* partition: when nothing is specified, jobs are assigned to it. **general** cannot have more than 50 nodes
running jobs. For **daily** this limit is extended to 60 nodes, while for **hourly** there is no limit. Shorter jobs have higher priority than
longer ones, so in general they will be scheduled earlier (however, other factors, such as a user's fair-share value, can affect this decision).
### Merlin6 User limits
By default, users cannot use more than 704 cores at the same time (the maximum CPU per user). This is equivalent to 8 exclusive nodes.
This limit applies to the **general** and **daily** partitions. For the **hourly** partition there is no restriction: user limits are removed.
Limits are relaxed for the **daily** partition outside working hours and during the weekend, as follows:
| Partition | Mon-Fri 08h-18h | Sun-Thu 18h-0h | From Fri 18h to Sun 8h | From Sun 8h to Mon 18h |
|:----------- | ---------------- | -------------- | ----------------------- | ---------------------- |
| **general** | 704 (user limit) | 704 | 704 | 704 |
| **daily** | 704 (user limit) | 1408 | Unlimited | 1408 |
| **hourly** | Unlimited | Unlimited | Unlimited | Unlimited |
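Assuming these limits are implemented as Slurm QOS limits (an assumption; the actual mechanism may differ), they could be inspected with:
```bash
sacctmgr show qos format=Name,MaxTRESPU%30   # per-user TRES limits of each QOS
```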
## Understanding the Slurm configuration (for advanced users)
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm, and **Merlin6** has been configured with the same batch system.
Slurm is installed in a **multi-cluster** configuration, which integrates multiple clusters in the same batch system.
To understand the Slurm configuration of the cluster, it can be useful to check the following files:
* ``/etc/slurm/slurm.conf`` - can be found on the login nodes and computing nodes.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
The configuration files found on the login nodes correspond exclusively to the **merlin6** cluster.
Configuration files for the old **merlin5** cluster must be checked directly on one of the **merlin5** computing nodes: these are not propagated
to the **merlin6** login nodes.
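For instance, the configuration in effect can be inspected from a login node as follows:
```bash
scontrol show config | less    # Dump the running Slurm configuration
less /etc/slurm/slurm.conf     # Or read the configuration file directly
```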


@ -0,0 +1,155 @@
---
title: Slurm Examples
#tags:
#keywords:
last_updated: 28 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-examples.html
---
## Basic single core job
### Basic single core job - Example 1
```bash
#!/bin/bash
#SBATCH --partition=hourly # Using 'hourly' will grant higher priority
#SBATCH --ntasks-per-core=1 # Force no Hyper-Threading, will run 1 task per core
#SBATCH --mem-per-cpu=8000 # Double the default memory per cpu
#SBATCH --time=00:30:00 # Define max time job will run
#SBATCH --output=myscript.out # Define your output file
#SBATCH --error=myscript.err # Define your error file
my_script
```
In this example we run a single-core job by defining ``--ntasks-per-core=1`` (which is also the default). Since the default memory per
CPU is 4000MB (in Slurm, this is equivalent to the memory per thread), and we are using a single thread per core, the default memory per CPU
should be doubled: using a single thread is always accounted as if the job were using the whole physical core (which has 2 available
hyperthreads), hence we want to use the memory as if we were using 2 threads.
### Basic single core job - Example 2
```bash
#!/bin/bash
#SBATCH --partition=hourly # Using 'hourly' will grant higher priority
#SBATCH --ntasks-per-core=1 # Force no Hyper-Threading, will run 1 task per core
#SBATCH --mem=352000 # We want to use the whole memory
#SBATCH --time=00:30:00 # Define max time job will run
#SBATCH --output=myscript.out # Define your output file
#SBATCH --error=myscript.err # Define your error file
my_script
```
In this example we run a single-core job by defining ``--ntasks-per-core=1`` (which is also the default). We also specify that the
job will use the whole memory of a node with ``--mem=352000`` (the maximum memory available per Apollo node). Whenever
you run a job needing more memory than the default (4000MB per thread), it is very important to specify the amount of memory
the job will use. This must be done in order to avoid conflicts with jobs from other users.
## Basic MPI with hyper-threading
```bash
#!/bin/bash
#SBATCH --partition=hourly # Using 'hourly' will grant higher priority
#SBATCH --exclusive # Use the node in exclusive mode
#SBATCH --ntasks=88 # Job will run 88 tasks
#SBATCH --ntasks-per-core=2 # Force Hyper-Threading, will run 2 tasks per core
#SBATCH --time=00:30:00 # Define max time job will run
#SBATCH --output=myscript.out # Define your output file
#SBATCH --error=myscript.err # Define your error file
module load gcc/8.3.0 openmpi/3.1.3
MPI_script
```
In this example we run a job with 88 tasks. Merlin6 Apollo nodes have 44 cores, each with hyper-threading
enabled. This means that we can run 2 threads per core, for a total of 88 threads. We add the ``--exclusive`` option to
ensure that node usage is exclusive and no other jobs run there. Finally, since the default memory
per thread is 4000MB, this job can use up to 352000MB of memory in total, which is the maximum allowed in a single node.
## Basic MPI without hyper-threading
```bash
#!/bin/bash
#SBATCH --partition=hourly # Using 'hourly' will grant higher priority
#SBATCH --exclusive # Use the node in exclusive mode
#SBATCH --ntasks=44 # Job will run 44 tasks
#SBATCH --ntasks-per-core=1 # Force no Hyper-Threading, will run 1 task per core
#SBATCH --mem=352000 # Define the whole memory of the node
#SBATCH --time=00:30:00 # Define max time job will run
#SBATCH --output=myscript.out # Define your output file
#SBATCH --error=myscript.err # Define your error file
module load gcc/8.3.0 openmpi/3.1.3
MPI_script
```
In this example we run a job with 44 tasks, without hyper-threading. Merlin6 Apollo nodes have 44 cores,
each with hyper-threading enabled. By defining ``--ntasks-per-core=1`` we force the use of a single thread per core (this is
the default, but it is recommended to add it explicitly). Each task runs in 1 thread, and each task is assigned to an
independent core. We add the ``--exclusive`` option to ensure that node usage is exclusive and no other jobs run there.
Finally, since the default memory per thread is 4000MB and we use only 1 thread per core, we would otherwise get only half of
the memory: we have to specify that we will use the whole memory of the node with the
option ``--mem=352000`` (which is the maximum memory available in the node).
## Advanced Slurm Example
Copy-paste the following example into a file called ``myAdvancedTest.batch``:
```bash
#!/bin/bash
#SBATCH --partition=daily # name of slurm partition to submit
#SBATCH --time=2:00:00 # limit the execution of this job to 2 hours, see sinfo for the max. allowance
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks=44 # number of tasks
module load gcc/8.3.0 openmpi/3.1.3
module list
echo "Example no-MPI:" ; hostname # will print one hostname per node
echo "Example MPI:" ; mpirun hostname # will print one hostname per ntask
```
The above example specifies the options ``--nodes=2`` and ``--ntasks=44``. This means that up to 2 nodes are requested,
and 44 tasks are expected to run. Hence, 44 cores are needed to run the job (we do not specify ``--ntasks-per-core``, so it
defaults to ``1``). Slurm will try to allocate a maximum of 2 nodes which together provide at least 44 cores.
Since our nodes have 44 cores each, if the nodes are empty (no other users have jobs running there), the job can land on a single node
(it has enough cores to run 44 tasks).
If you want to ensure that the job uses at least two different nodes (e.g. for boosting CPU frequency, or because the job requires
more memory per core), you should specify additional options.
A good example is ``--ntasks-per-node=22``, which distributes the tasks equally over 2 nodes, 22 tasks per node:
```bash
#SBATCH --ntasks-per-node=22
```
A different example specifies how much memory per core is needed. For instance, ``--mem-per-cpu=32000`` reserves
~32000MB per core. Since an Apollo node has a maximum of 352000MB, Slurm can only allocate 11 cores per node (32000MB x 11 cores = 352000MB).
This means that 4 nodes are needed (a maximum of 11 tasks per node due to the memory request, and we need to run 44 tasks), so we have to change to ``--nodes=4``
(or remove ``--nodes``). Alternatively, we can decrease ``--mem-per-cpu`` to a value that allows using at least 22 cores per node (e.g. with ``16000``,
the 44 tasks fit on 2 nodes):
```bash
#SBATCH --mem-per-cpu=16000
```
Finally, to ensure exclusive use of nodes, the *--exclusive* option can be used (see below). This ensures that
the requested nodes are exclusive to the job (no other users' jobs will run on those nodes, and only completely
free nodes will be allocated).
```bash
#SBATCH --exclusive
```
This can be combined with the previous examples, as sketched below.
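As an illustration, a combined sketch using values from the examples above could look like this:
```bash
#!/bin/bash
#SBATCH --partition=daily        # Jobs of up to one day
#SBATCH --time=2:00:00           # Limit the job to 2 hours
#SBATCH --nodes=2                # Use 2 nodes
#SBATCH --ntasks=44              # Run 44 tasks in total
#SBATCH --ntasks-per-node=22     # Distribute the tasks equally: 22 per node
#SBATCH --mem-per-cpu=16000      # 16000MB per core (22 cores x 16000MB = 352000MB per node)
#SBATCH --exclusive              # Do not share the nodes with other jobs

module load gcc/8.3.0 openmpi/3.1.3
mpirun hostname                  # Prints one hostname per task
```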
More advanced configurations can be defined and combined with the previous examples. More information about advanced
options can be found at https://slurm.schedmd.com/sbatch.html (or run ``man sbatch``).
If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run
advanced configurations unless you are sure of what you are doing.


@ -0,0 +1,64 @@
---
title: Using PModules
#tags:
#keywords:
last_updated: 20 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/using-modules.html
---
## Environment Modules
On top of the operating system stack we provide various software packages using the PSI-developed PModules system.
PModules is the officially supported way of providing software, and each package is deployed by a specific expert. PModules
usually contains software that is used by many people.
If you miss a package or version, or a software feature, contact us. We will evaluate whether it is feasible to install it.
### Basic commands:
The basic generic commands are:
```bash
module avail # to see the list of available software packages provided via pmodules
module use unstable # to get access to a set of packages not fully tested by the community
module load <package>/<version> # to load specific software package with a specific version
module search <string> # to search for a specific software package and its dependencies.
module list # to list which software is loaded in your environment
module purge # unload all loaded packages and cleanup the environment
```
Also, you can load multiple packages at once. This can be useful, for instance, when loading a package together with its dependencies:
```bash
# Single line
module load gcc/8.3.0 openmpi/3.1.3
# Multiple line
module load gcc/8.3.0
module load openmpi/3.1.3
```
In the example above, we load ``openmpi/3.1.3``, but we also specify ``gcc/8.3.0``, which is a strict dependency. The dependency must be
loaded in advance.
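If you are unsure about the dependencies of a package, ``module search`` can help (the output format may vary):
```bash
module search openmpi   # lists matching packages together with their dependencies
```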
---
## When to request new PModules packages
### Missing software
If you cannot find a specific software package and you know that other people are interested in it, it can be installed in PModules. Please contact us
and we will try to help with that. Deploying new software in PModules may take a few days.
Installation of new software is usually possible as long as a few users will use it. If you are interested in maintaining this software,
please let us know.
### Missing version
If the existing PModules versions of a specific package do not fit your needs, it is possible to ask for a new version.
Installation of newer versions is usually supported, as long as a few users will use it. Installation of intermediate versions can
be supported if strictly justified.