Update Merlin6 documentation with latest changes, added new pages

2019-06-13 08:41:29 +02:00
parent caa63db616
commit caa558a090
9 changed files with 942 additions and 0 deletions


@@ -0,0 +1,13 @@
---
layout: default
title: Merlin6 Slurm
parent: Merlin6 User Guide
nav_order: 5
has_children: true
permalink: /docs/merlin6-user-guide/merlin6-slurm/merlin6-slurm.html
---
# Merlin6 Slurm
Welcome to the PSI Merlin6 Slurm Cluster.


@@ -0,0 +1,152 @@
---
layout: default
title: Slurm Basic Commands
parent: Merlin6 Slurm
grand_parent: Merlin6 User Guide
nav_order: 1
---
# Slurm Basic Commands
{: .no_toc }
## Table of contents
{: .no_toc .text-delta }
1. TOC
{:toc}
---
## Basic commands
Useful Slurm commands:
```bash
sinfo # to see the names of the nodes, their occupancy, the Slurm partitions and limits (try the "-l" option)
squeue # to see the currently running/waiting jobs in Slurm (the additional "-l" option may also be useful)
sbatch Script.sh # to submit a script (example below) to Slurm
srun command # to submit a command to Slurm; the same options as in 'sbatch' can be used
salloc # to allocate computing nodes; useful for running interactive jobs (ANSYS, Python Notebooks, etc.)
scancel job_id # to cancel a Slurm job; job_id is the numeric ID shown by squeue
```
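For interactive work, ``salloc`` can be combined with ``srun``. A minimal sketch (the partition and resources are illustrative; adjust them to your needs):
```bash
salloc --partition=hourly --ntasks=1 # request an interactive allocation of 1 task
srun --pty bash                      # open an interactive shell on the allocated node
exit                                 # leave the shell and release the allocation
```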
Other useful advanced commands:
```bash
sinfo -N -l # list nodes, state, resources (number of CPUs, memory per node, etc.), and other information
sshare -a # to list shares of associations to a cluster
sprio -l # to view the factors that comprise a job's scheduling priority (add -u <username> for filtering user)
```
---
## Basic slurm example
You can copy-paste the following example into a file called ``mySlurm.batch``.
Some basic parameters are explained in the example.
Please notice that ``#`` marks an enabled option, while ``##`` marks a commented-out option (no effect).
```bash
#!/bin/sh
#SBATCH --partition=daily # slurm partition to submit to. Can be 'general' (default if not specified), 'daily' or 'hourly'.
#SBATCH --job-name="mySlurmTest" # name of the job. Useful when submitting different types of jobs for filtering (e.g. with the 'squeue' command)
#SBATCH --time=0-12:00:00 # time limit. Here it is shortened to 12 hours (the default and maximum for 'daily' is 1 day).
#SBATCH --exclude=merlin-c-001 # nodes on which the job should not run
#SBATCH --nodes=10 # number of nodes to allocate for the job
#SBATCH --ntasks=440 # number of tasks to run
##SBATCH --exclusive # enable if you need exclusive usage of a node. If this option is not specified, nodes are shared by default.
##SBATCH --ntasks-per-node=32 # number of tasks per node. Each Apollo node has 44 cores; using fewer in exclusive mode may help to turbo boost the CPU frequency. If this option is enabled, set --ntasks and --nodes accordingly.
module load gcc/8.3.0 openmpi/3.1.3
echo "Example no-MPI:" ; hostname # will print one hostname per node
echo "Example MPI:" ; mpirun hostname # will print one hostname per ntask
```
### Submitting a job
Submit the job to Slurm and check its status:
```bash
sbatch mySlurm.batch # submit this job to slurm
squeue # check its status
```
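A couple of related sketches for monitoring and cancelling jobs (``<job_id>`` stands for the numeric ID reported by ``squeue``):
```bash
squeue -u $USER             # list only your own jobs
squeue --name="mySlurmTest" # filter jobs by the name given with --job-name
scancel <job_id>            # cancel the job with the given numeric ID
```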
---
## Advanced slurm test script
Copy-paste the following example into a file called ``myAdvancedTest.batch``:
```bash
#!/bin/bash
#SBATCH --partition=merlin # name of the slurm partition to submit to
#SBATCH --time=2:00:00 # limit the execution of this job to 2 hours, see sinfo for the max. allowance
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks=24 # number of tasks
module load gcc/8.3.0 openmpi/3.1.3
module list
echo "Example no-MPI:" ; hostname # will print one hostname per node
echo "Example MPI:" ; mpirun hostname # will print one hostname per ntask
```
In the above example, the options ``--nodes=2`` and ``--ntasks=24`` are specified. This means that up to 2 nodes are requested,
which together are expected to run 24 tasks; hence, 24 cores are needed for running the job. Slurm will try to allocate at most 2 nodes
having at least 24 cores in total. Since each of our nodes has 44 cores, if the nodes are empty (no other users
have running jobs there), the job will land on a single node (it has enough cores to run all 24 tasks).
If you want to ensure that the job uses at least two different nodes (e.g. for boosting the CPU frequency, or because the job requires
more memory per core), you should specify additional options.
A good example is ``--ntasks-per-node=12``, which distributes the 24 tasks equally across the 2 nodes, 12 per node:
```bash
#SBATCH --ntasks-per-node=12
```
Another example is specifying how much memory per core is needed. For instance, ``--mem-per-cpu=32000`` will reserve
~32000MB per core. Since a node has at most 352000MB, Slurm will only be able to allocate 11 cores per node (32000MB x 11 cores = 352000MB).
This means that 3 nodes would be needed (the job can not run 12 tasks per node, only 11), unless we decrease ``--mem-per-cpu`` to a value
that allows at least 12 cores per node (e.g. ``28000``):
```bash
#SBATCH --mem-per-cpu=28000
```
Finally, in order to ensure exclusive use of a node, the option ``--exclusive`` can be used (see below). This ensures that
the requested nodes are exclusive to the job (no other users' jobs will run on those nodes, and only completely
free nodes will be allocated):
```bash
#SBATCH --exclusive
```
This can be combined with the previous examples.
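Putting the previous options together, a combined job header could look as follows (a sketch only; adjust the numbers to your own job):
```bash
#!/bin/bash
#SBATCH --partition=daily    # one of 'general', 'daily', 'hourly'
#SBATCH --nodes=2            # allocate 2 nodes
#SBATCH --ntasks=24          # run 24 tasks in total
#SBATCH --ntasks-per-node=12 # spread the tasks equally: 12 per node
#SBATCH --mem-per-cpu=28000  # 28000MB per core (12 x 28000MB = 336000MB <= 352000MB per node)
#SBATCH --exclusive          # do not share the allocated nodes with other jobs

module load gcc/8.3.0 openmpi/3.1.3
mpirun hostname              # will print one hostname per task
```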
More advanced configurations can be defined and combined with the previous examples. More information about advanced
options can be found at https://slurm.schedmd.com/sbatch.html (or by running ``man sbatch``).
If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run
advanced configurations unless you are sure of what you are doing.
---
## Environment Modules
On top of the operating system stack we provide different software using the PSI-developed
pmodule system. Useful commands:
```bash
module avail # to see the list of available software provided via pmodules
module load gnuplot/5.2.0 # to load a specific version of the gnuplot package
module search hdf # try it out to see which versions of the hdf5 package are provided, and with which dependencies
module load gcc/6.2.0 openmpi/1.10.2 hdf5/1.8.17 # to load a specific version of hdf5, compiled with specific versions of gcc and openmpi
module use unstable # to get access to packages not yet considered fully stable by the module provider (may be a very fresh version, or not yet tested by the community)
module list # to see which software is loaded in your environment
```
### Requests for New Software
If you are missing a package or version, please contact us.


@@ -0,0 +1,90 @@
---
layout: default
title: Slurm Configuration
parent: Merlin6 Slurm
grand_parent: Merlin6 User Guide
nav_order: 2
---
# Slurm Configuration
{: .no_toc }
## Table of contents
{: .no_toc .text-delta }
1. TOC
{:toc}
---
## Using the Slurm batch system
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm. In the same way, **Merlin6** has also been configured with this batch system.
Slurm has been installed in a **multi-clustered** configuration, allowing multiple clusters to be integrated in the same batch system.
To understand the Slurm configuration of the cluster, it may sometimes be useful to check the following files:
* ``/etc/slurm/slurm.conf`` - can be found on the login nodes and computing nodes.
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
The configuration files found on the login nodes correspond exclusively to the **merlin6** cluster. Configuration files for the old
**merlin5** cluster must be checked directly on any of the **merlin5** computing nodes: these are not propagated
to the **merlin6** login nodes.
### About Merlin5 & Merlin6
The new Slurm cluster is called **merlin6**. However, the old Slurm *merlin* cluster will be kept for some time, and it has been renamed to **merlin5**.
This allows jobs to keep running on the old computing nodes until users have fully migrated their codes to the new cluster.
From July 2019, **merlin6** becomes the **default cluster**: any job submitted to Slurm goes to that cluster. Users can keep submitting to
the old *merlin5* computing nodes by using the option ``--cluster=merlin5``.
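For example (a minimal sketch, assuming an existing submission script ``myScript.batch``):
```bash
sbatch --cluster=merlin5 myScript.batch # submit the script to the old merlin5 cluster
squeue --cluster=merlin5                # check the job status on the merlin5 cluster
```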
This documentation covers only the usage of the **merlin6** Slurm cluster.
### Using Slurm 'merlin6' cluster
Basic usage of the **merlin6** cluster is detailed here. For advanced usage, please read the following document: [LINK TO SLURM ADVANCED CONFIG]()
#### Merlin6 Node definition
The following table shows the default and maximum resources that can be used per node:
| Nodes | Def.#CPUs | Max.#CPUs | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | Def.#GPUs | Max.#GPUs |
|:---------------------------------- | ---------:| ---------:| ----------------:| ----------------:| -----------------:| -------------:| --------- | --------- |
| merlin-c-[001-022,101-122,201-222] | 1 core | 44 cores | 8000 | 352000 | 352000 | 10000 | N/A | N/A |
| merlin-g-[001] | 1 core | 8 cores | 8000 | 102498 | 102498 | 10000 | 1 | 2 |
| merlin-g-[002-009] | 1 core | 10 cores | 8000 | 102498 | 102498 | 10000 | 1 | 4 |
If nothing is specified, each core will use up to 8000MB (~8GB) of memory by default. More memory can be requested with the ``--mem=<memory>`` option,
and the maximum memory allowed is ``Max.Mem/Node``.
In *Merlin6*, memory is considered a Consumable Resource, as is the CPU.
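As a sketch, a job requesting more than the default memory could look like this (the values are illustrative):
```bash
#SBATCH --ntasks=1  # a single task
#SBATCH --mem=16000 # request 16000MB of memory instead of the default 8000MB per core
```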
#### Merlin6 Slurm partitions
The partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:
| Partition | Default Partition | Default Time | Max Time | Max Nodes | Priority |
|:----------- | ----------------- | ------------ | -------- | --------- | -------- |
| **general** | true | 1 day | 1 week | 50 | low |
| **daily** | false | 1 day | 1 day | 60 | medium |
| **hourly** | false | 1 hour | 1 hour | unlimited | highest |
**general** is the *default partition*: when nothing is specified, jobs are assigned to it. No more than 50 nodes can run **general** jobs
at a time; for **daily** this limit is extended to 60 nodes, while for **hourly** there is no limit. Shorter jobs have higher priority than
longer ones and will in general be scheduled earlier (however, other factors such as the user's fair share value can affect this decision).
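For example, a short job can take advantage of the higher priority of the **hourly** partition (a sketch; the defaults from the table above apply otherwise):
```bash
#SBATCH --partition=hourly # highest priority, maximum 1 hour of run time
#SBATCH --time=00:30:00    # stay well within the 1 hour limit
```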
#### Merlin6 User limits
By default, users can not use more than 528 cores at the same time (maximum number of CPUs per user). This limit applies to the **general** and **daily** partitions; for the **hourly** partition there is no restriction.
These limits are relaxed for the **daily** partition outside working hours and during the weekend, as follows:
| Partition | Mon-Fri 08h-18h | Sun-Thu 18h-0h | From Fri 18h to Sun 8h | From Sun 8h to Mon 18h |
|:----------- | --------------- | -------------- | ----------------------- | ---------------------- |
| **general** | 528 | 528 | 528 | 528 |
| **daily** | 528 | 792 | Unlimited | 792 |
| **hourly** | Unlimited | Unlimited | Unlimited | Unlimited |