Expanded PModules docs

caubet_m 2021-05-21 18:39:38 +02:00
parent fcfdbf1344
commit 0fd1653938
11 changed files with 219 additions and 69 deletions

View File

@ -13,6 +13,6 @@ entries:
- title: News
url: /news.html
output: web
- title: Merlin 6
- title: The Merlin Local HPC Cluster
url: /merlin6/introduction.html
output: web

View File

@ -12,21 +12,23 @@ topnav:
topnav_dropdowns:
- title: Topnav dropdowns
folders:
- title: Merlin 6
- title: Quick Start
folderitems:
- title: Introduction
url: /merlin6/introduction.html
- title: Contact
url: /merlin6/contact.html
- title: Using Merlin6
url: /merlin6/use.html
- title: User Guide
url: /merlin6/user-guide.html
- title: Slurm
- title: Requesting Accounts
url: /merlin6/request-account.html
- title: Requesting Projects
url: /merlin6/request-project.html
- title: Accessing the Interactive Nodes
url: /merlin6/interactive.html
- title: Accessing the Slurm Clusters
url: /merlin6/slurm-access.html
- title: Merlin Slurm Clusters
folderitems:
- title: Cluster 'merlin5'
url: /merlin5/slurm-cluster.html
url: /merlin5/slurm-configuration.html
- title: Cluster 'merlin6'
url: /gmerlin6/slurm-cluster.html
url: /merlin6/slurm-configuration.html
- title: Cluster 'gmerlin6'
url: /gmerlin6/slurm-cluster.html
url: /gmerlin6/slurm-configuration.html

View File

@ -50,15 +50,15 @@ The table below summarizes all possible partitions available to users:
| GPU Partition | Default Time | Max Time | PriorityJobFactor\* | PriorityTier\*\* |
|:-----------------: | :----------: | :------: | :-----------------: | :--------------: |
| **<u>gpu</u>** | 1 day | 1 week | 1 | 1 |
| **gpu-short** | 2 hours | 2 hours | 1000 | 500 |
| **gwendolen** | 1 hour | 12 hours | 1000 | 1000 |
| `gpu` | 1 day | 1 week | 1 | 1 |
| `gpu-short` | 2 hours | 2 hours | 1000 | 500 |
| `gwendolen` | 1 hour | 12 hours | 1000 | 1000 |
**\***The **PriorityJobFactor** value will be added to the job priority (*PARTITION* column in `sprio -l` ). In other words, jobs sent to higher priority
\*The **PriorityJobFactor** value will be added to the job priority (*PARTITION* column in `sprio -l` ). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** might affect that decision). For the GPU
partitions, Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.
**\*\***Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partition with lower *PriorityTier* value
\*\*Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with a lower *PriorityTier* value
and, if possible, they will preempt running jobs from partitions with lower *PriorityTier* values.
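To see how these factors contribute to the priority of pending jobs, the breakdown mentioned above can be inspected directly (a minimal illustration):
```bash
# Long-format priority report: the PARTITION column shows the contribution
# of the partition's PriorityJobFactor to each pending job's priority
sprio -l
```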
### Merlin6 GPU Accounts
@ -71,11 +71,11 @@ This is mostly needed by users who have multiple Slurm accounts, which may def
```
Not all the accounts can be used on all partitions. This is summarized in the table below:
| Slurm Account | Slurm Partitions |
|:-------------------: | :--------------: |
| **<u>merlin</u>** | `gpu`,`gpu-short` |
| **gwendolen_public** | `gwendolen` |
| **gwendolen** | `gwendolen` |
| Slurm Account | Slurm Partitions |
|:-------------------: | :------------------: |
| **`merlin`** | **`gpu`**,`gpu-short` |
| `gwendolen_public` | `gwendolen` |
| `gwendolen` | `gwendolen` |
By default, all users belong to the `merlin` and `gwendolen_public` Slurm accounts. `gwendolen` is a restricted account.
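As a minimal sketch (the script contents, time limit and application name are hypothetical), a job targeting the restricted `gwendolen` node through the public account could look like this:
```bash
#!/bin/bash
#SBATCH --cluster=gmerlin6          # GPU Slurm cluster
#SBATCH --partition=gwendolen       # partition restricted to the gwendolen node
#SBATCH --account=gwendolen_public  # account available to all users on this partition
#SBATCH --time=01:00:00             # within the 12 hour partition limit

srun ./my_gpu_application           # hypothetical application binary
```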
@ -103,14 +103,61 @@ The GPU type is optional: if left empty, it will try allocating any type of GPU.
The different `[<type>:]` values and the `<number>` of GPUs depend on the node.
This is detailed in the table below.
| Nodes | GPU Type | #GPUs |
|:------------------:| :-------------------: | :---: |
| merlin-g-[001] | `geforce_gtx_1080` | 2 |
| merlin-g-[002-005] | `geforce_gtx_1080` | 4 |
| merlin-g-[006-009] | `geforce_gtx_1080_ti` | 4 |
| merlin-g-[010-013] | `geforce_rtx_2080_ti` | 4 |
| merlin-g-014 | `geforce_rtx_2080_ti` | 8 |
| merlin-g-100 | `A100` | 8 |
| Nodes | GPU Type | #GPUs |
|:---------------------: | :-----------------------: | :---: |
| **merlin-g-[001]** | **`geforce_gtx_1080`** | 2 |
| **merlin-g-[002-005]** | **`geforce_gtx_1080`** | 4 |
| **merlin-g-[006-009]** | **`geforce_gtx_1080_ti`** | 4 |
| **merlin-g-[010-013]** | **`geforce_rtx_2080_ti`** | 4 |
| **merlin-g-014** | **`geforce_rtx_2080_ti`** | 8 |
| **merlin-g-100** | **`A100`** | 8 |
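For instance, a minimal sketch (the number of cards is arbitrary) requesting GPUs by type:
```bash
#SBATCH --partition=gpu
#SBATCH --gpus=geforce_rtx_2080_ti:2  # allocated on merlin-g-[010-014], which host this GPU model
```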
#### Constraint / Features
Instead of specifying the GPU **type**, users sometimes need to **select a GPU by the amount of memory available on the GPU card** itself.
This is defined in Slurm with **Features**: a tag describing the GPU memory of the different GPU card models.
Users can specify which GPU memory size is needed with the `--constraint` option. In that case, notice that *in many cases
there is no need to specify `[<type>:]`* in the `--gpus` option.
```bash
#SBATCH --constraint=<Feature> # Possible values: gpumem_8gb, gpumem_11gb, gpumem_40gb
```
The table below shows the available **Features** and which GPU card models and GPU nodes they belong to:
<table>
<thead>
<tr>
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="3">Merlin6 GPU Computing Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Nodes</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">GPU Type</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Feature</th>
</tr>
</thead>
<tbody>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-[001-005]</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`geforce_gtx_1080`</td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>`gpumem_8gb`</b></td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-[006-009]</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`geforce_gtx_1080_ti`</td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="2"><b>`gpumem_11gb`</b></td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-[010-014]</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`geforce_rtx_2080_ti`</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-100</b></td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1">`A100`</td>
<td markdown="span" style="vertical-align:middle;text-align:center;" rowspan="1"><b>`gpumem_40gb`</b></td>
</tr>
</tbody>
</table>
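As an illustrative sketch (the GPU count is arbitrary), requesting two cards with 11 GB of GPU memory, regardless of the exact model:
```bash
#SBATCH --partition=gpu
#SBATCH --gpus=2                  # no [<type>:] prefix needed here
#SBATCH --constraint=gpumem_11gb  # matches both geforce_gtx_1080_ti and geforce_rtx_2080_ti nodes
```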
#### Other GPU options
@ -120,14 +167,14 @@ Below are listed the most common settings:
```bash
#SBATCH --hint=[no]multithread
#SBATCH --ntasks=<ntasks>
#SBATCH --ntasks-per-gpu=<ntasks>
#SBATCH --mem-per-gpu=<size[units]>
#SBATCH --cpus-per-gpu=<ncpus>
#SBATCH --gpus-per-node=[<type>:]<number>
#SBATCH --gpus-per-socket=[<type>:]<number>
#SBATCH --gpus-per-task=[<type>:]<number>
#SBATCH --gpu-bind=[verbose,]<type>
#SBATCH --ntasks=\<ntasks\>
#SBATCH --ntasks-per-gpu=\<ntasks\>
#SBATCH --mem-per-gpu=\<size[units]\>
#SBATCH --cpus-per-gpu=\<ncpus\>
#SBATCH --gpus-per-node=[\<type\>:]\<number\>
#SBATCH --gpus-per-socket=[\<type\>:]\<number\>
#SBATCH --gpus-per-task=[\<type\>:]\<number\>
#SBATCH --gpu-bind=[verbose,]\<type\>
```
Please note that if `[<type>:]` is defined in one option, then all other options must include it too!
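Putting several of these options together, a sketch of a multi-task GPU job might look as follows (the resource amounts and the application name are made up for illustration):
```bash
#!/bin/bash
#SBATCH --cluster=gmerlin6                  # GPU Slurm cluster
#SBATCH --partition=gpu
#SBATCH --ntasks=4                          # one task per GPU
#SBATCH --gpus-per-task=geforce_gtx_1080:1  # 4 x GTX 1080 in total
#SBATCH --cpus-per-gpu=4                    # CPU cores allocated per GPU
#SBATCH --mem-per-gpu=8G                    # host memory allocated per GPU
#SBATCH --hint=nomultithread

srun ./my_gpu_application                   # hypothetical application binary
```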

View File

@ -13,9 +13,11 @@ redirect_from:
## The Merlin local HPC cluster
Historically, the local HPC clusters at PSI were named Merlin. Over the years,
Historically, the local HPC clusters at PSI were named **Merlin**. Over the years,
multiple generations of Merlin have been deployed.
At present, the **Merlin local HPC cluster** comprises _two_ generations: the older **Merlin5** cluster and the newest one, **Merlin6**.
### Merlin6
Merlin6 is the official PSI Local HPC cluster for development and
@ -26,16 +28,27 @@ Merlin6 is designed to be extensible, so it is technically possible to add
more compute nodes and cluster storage without a significant increase in manpower and operational costs.
Merlin6 is mostly based on **CPU** resources, but also contains a small amount
of **GPU**-based resources which are mostly used by the BIO Division and Deep Learning projects:
* The Merlin6 CPU nodes are in a dedicated Slurm cluster called [**`merlin6`**](/merlin6/slurm-configuration.html).
* This is the default Slurm cluster configured in the login nodes, and any job submitted without the option `--cluster` will be submited to this cluster.
* The Merlin6 GPU resources are in a dedicated Slurm cluster called [**`gmerlin6`**](/gmerlin6/slurm-configuration.html).
Merlin6 contains all the main services needed for running a cluster, including
**login nodes**, **storage**, **computing nodes** and other *subservices*,
connected to the central PSI IT infrastructure.
#### CPU and GPU Slurm clusters
The Merlin6 **computing nodes** are mostly based on **CPU** resources. However,
the cluster also contains a small amount of **GPU**-based resources, which are mostly used
by the BIO Division and by Deep Learning projects.
These computational resources are split into **two** different **[Slurm](https://slurm.schedmd.com/overview.html)** clusters:
* The Merlin6 CPU nodes are in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster called [**`merlin6`**](/merlin6/slurm-configuration.html).
* This is the **default Slurm cluster** configured in the login nodes: any job submitted without the option `--cluster` will be submitted to this cluster.
* The Merlin6 GPU resources are in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster called [**`gmerlin6`**](/gmerlin6/slurm-configuration.html).
* Users submitting to the **`gmerlin6`** GPU cluster need to specify the option ``--cluster=gmerlin6``, as in the sketch below.
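A minimal sketch (the job script names are hypothetical) of submitting to each cluster:
```bash
# Default cluster (merlin6, CPU): no --cluster option needed
sbatch my_cpu_job.sh

# GPU cluster: the target cluster must be named explicitly
sbatch --cluster=gmerlin6 my_gpu_job.sh
```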
### Merlin5
The old Slurm **CPU** *merlin* cluster is still active and is maintained in a best effort basis.
**Merlin5** only contains **computing node** resources in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster.
* The Merlin5 CPU cluster is called [**merlin5**](/merlin5/slurm-configuration.html).
## Merlin Architecture

View File

@ -28,7 +28,7 @@ Official X11 Forwarding support is through NoMachine. Please follow the document
we provide a small recipe for enabling X11 Forwarding in Linux.
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
to implicitly add ``-Y`` to all ssh connections:
to implicitly add ``-X`` to all ssh connections:
```bash
ForwardAgent yes
@ -38,9 +38,9 @@ to implicitly add ``-Y`` to all ssh connections:
* Alternatively, you can add the option ``-X`` to the ``ssh`` command. For example:
```bash
ssh -Y $username@merlin-l-01.psi.ch
ssh -Y $username@merlin-l-001.psi.ch
ssh -Y $username@merlin-l-002.psi.ch
ssh -X $username@merlin-l-01.psi.ch
ssh -X $username@merlin-l-001.psi.ch
ssh -X $username@merlin-l-002.psi.ch
```
* For testing that X11 forwarding works, just run ``xclock``. An X11-based clock should

View File

@ -38,7 +38,7 @@ we provide a small recipe for enabling X11 Forwarding in MacOS.
* Ensure that **[XQuartz](https://www.xquartz.org/)** is installed and running in your MacOS.
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
to implicitly add ``-Y`` to all ssh connections:
to implicitly add ``-X`` to all ssh connections:
```bash
ForwardAgent yes
@ -48,9 +48,9 @@ to implicitly add ``-Y`` to all ssh connections:
* Alternatively, you can add the option ``-X`` to the ``ssh`` command. For example:
```bash
ssh -Y $username@merlin-l-01.psi.ch
ssh -Y $username@merlin-l-001.psi.ch
ssh -Y $username@merlin-l-002.psi.ch
ssh -X $username@merlin-l-01.psi.ch
ssh -X $username@merlin-l-001.psi.ch
ssh -X $username@merlin-l-002.psi.ch
```
* For testing that X11 forwarding works, just run ``xclock``. An X11-based clock should

View File

@ -2,7 +2,7 @@
title: Using PModules
#tags:
#keywords:
last_updated: 20 June 2019
last_updated: 21 May 2021
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/using-modules.html
@ -17,11 +17,47 @@ software which is used by many people will be found.
If you are missing any package/version, or a software package lacks a specific feature, please contact us. We will assess whether it is feasible to install it.
### Basic commands:
## Module release stages
Basic generic commands would be:
Three different **release stages** are available in PModules, ensuring proper software life-cycle management: **`unstable`**, **`stable`** and **`deprecated`**.
### Unstable release stage
The **`unstable`** release stage contains *unstable* releases of software. Software compilations here are usually under development or not yet fully production-ready.
This release stage is **not directly visible** to end users, and needs to be explicitly invoked as follows:
```bash
module use unstable
```
Once software is validated and considered production-ready, it is moved to the `stable` release stage.
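A typical workflow could look as follows (the package name and version are made up for illustration):
```bash
module use unstable               # make the unstable release stage visible
module avail                      # unstable packages now appear in the listing
module load mypackage/2.0.0-rc1   # hypothetical package only published as unstable
```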
### Stable release stage
The **`stable`** release stage contains *stable* releases of software, which have been thoroughly tested and are fully supported.
This is the ***default*** release stage and is visible without any extra invocation. Whenever possible, users are strongly advised to use packages from this release stage.
### Deprecated release stage
The **`deprecated`** release stage contains *deprecated* releases of software. Software in this release stage has usually been deprecated or discontinued by its developers.
In addition, minor versions or redundant compilations are moved here as long as a valid alternative remains in the *stable* release stage.
This release stage is **not directly visible** to users, and needs to be explicitly invoked as follows:
```bash
module use deprecated
```
However, software moved to this release stage can still be loaded directly, without invoking the stage first. This ensures proper life-cycle management of the software and makes the transition transparent to end users.
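For example (the package and version are hypothetical), a module that has been moved to `deprecated` can still be loaded without invoking the stage first:
```bash
# Works even though 'module use deprecated' has not been invoked
module load somepackage/1.2.3
```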
## PModules commands
Below is a summary of the available commands:
```bash
module use # show all available PModule Software Groups as well as Release Stages
module avail # to see the list of available software packages provided via pmodules
module use unstable # to get access to a set of packages not fully tested by the community
module load <package>/<version> # to load specific software package with a specific version
@ -30,21 +66,73 @@ module list # to list which software is loaded in your envi
module purge # unload all loaded packages and cleanup the environment
```
Also, you can load multiple packages at once. This can be useful for instance when loading a package with its dependencies:
### module use/unuse
Without any parameter, `use` **lists** all available PModule **Software Groups and Release Stages**.
```bash
module use
```
When followed by a parameter, `use`/`unuse` invokes/uninvokes a PModule **Software Group** or **Release Stage**.
```bash
module use EM # Invokes the 'EM' software group
module unuse EM # Uninvokes the 'EM' software group
module use unstable # Invokes the 'unstable' release stage
module unuse unstable # Uninvokes the 'unstable' release stage
```
### module avail
This option **lists** all available PModule **Software Groups and their packages**.
Please run `module avail --help` for further listing options.
### module search
This is used to **search** for **software packages**. By default, if no **Release Stage** or **Software Group** is specified
in the options of the `module search` command, it will search within the already invoked *Software Groups* and *Release Stages*.
Direct package dependencies will also be shown.
```bash
(base) [caubet_m@merlin-l-001 caubet_m]$ module search openmpi/4.0.5_slurm
Module Release Group Requires
---------------------------------------------------------------------------
openmpi/4.0.5_slurm stable Compiler gcc/8.4.0
openmpi/4.0.5_slurm stable Compiler gcc/9.2.0
openmpi/4.0.5_slurm stable Compiler gcc/9.3.0
openmpi/4.0.5_slurm stable Compiler intel/20.4
(base) [caubet_m@merlin-l-001 caubet_m]$ module load intel/20.4 openmpi/4.0.5_slurm
```
Please run `module search --help` for further search options.
### module load/unload
This loads/unloads specific software packages. Packages might have direct dependencies that need to be loaded first. Other dependencies
will be automatically loaded.
In the example below, the ``openmpi/4.0.5_slurm`` package is loaded; ``gcc/9.3.0`` must be loaded as well, since it is a strict dependency and direct dependencies must be loaded in advance. Packages can be loaded one by one or all at once, which is useful when loading a package together with its direct dependencies.
```bash
# Single line
module load gcc/9.2.0 openmpi/3.1.5-1_merlin6
module load gcc/9.3.0 openmpi/4.0.5_slurm
# Multiple line
module load gcc/9.2.0
module load openmpi/3.1.5-1_merlin6
module load gcc/9.3.0
module load openmpi/4.0.5_slurm
```
In the example above, we load ``openmpi/3.1.5-1_merlin6`` but we also specify ``gcc/9.2.0`` which is a strict dependency. The dependency must be
loaded in advance.
### module purge
---
This command is an alternative to `module unload` that unloads **all** currently loaded modulefiles at once.
```bash
module purge
```
## When to request new PModules packages

View File

@ -26,21 +26,21 @@ Link processing in Jekyll
Code | Result | Baseurl
---- | ------ | -------
`{%raw%}[Normal link to source]{%endraw%}{%raw%}(/pages/merlin6/01 introduction/introduction.md){%endraw%}` | [Normal link to source](/pages/merlin6/01 introduction/introduction.md) | ✅
`{%raw%}[Normal link to source]{%endraw%}{%raw%}(/pages/merlin6/01-Quick-Start-Guide/introduction.md){%endraw%}` | [Normal link to source](/pages/merlin6/01-Quick-Start-Guide/introduction.md) | ✅
`{%raw%}[Normal link to result](/merlin6/introduction.html){%endraw%}` | [Normal link to result](/merlin6/introduction.html) | ❌
`{%raw%}[Invalid Escaped link to source]({{"/pages/merlin6/01 introduction/introduction.md"}}){%endraw%}` | [Invalid Escaped link to source]({{"/pages/merlin6/01 introduction/introduction.md"}}) | ❌❗
`{%raw%}[Invalid Escaped link to source]({{"/pages/merlin6/01-Quick-Start-Guide/introduction.md"}}){%endraw%}` | [Invalid Escaped link to source]({{"/pages/merlin6/01-Quick-Start-Guide/introduction.md"}}) | ❌❗
`{%raw%}[Escaped link to result]({{"/merlin6/introduction.html"}}){%endraw%}` | [Escaped link to result]({{"/merlin6/introduction.html"}}) | ❌
`{%raw%}[Reference link to source](srcRef){%endraw%}` | [Reference link to source][srcRef] | ✅
`{%raw%}[Reference link to result](dstRef){%endraw%}` | [Reference link to result][dstRef] | ❌
`{%raw%}[Liquid Link]({% link pages/merlin6/01 introduction/introduction.md %}){%endraw%}` | [Liquid Link]({% link pages/merlin6/01 introduction/introduction.md %}) | ❌
`{%raw%}[Liquid Link]({% link pages/merlin6/01-Quick-Start-Guide/introduction.md %}){%endraw%}` | [Liquid Link]({% link pages/merlin6/01-Quick-Start-Guide/introduction.md %}) | ❌
`{%raw%}![PSI Logo](/images/psi-logo.png){%endraw%}` | ![PSI Logo](/images/psi-logo.png) | ✅
`{%raw%}![Escaped PSI Logo]({{ "/images/psi-logo.png" }}){%endraw%}` | ![PSI Logo from liquid]({{ "/images/psi-logo.png" }}) | ❌
`{%raw%}{% include inline_image.html file="psi-logo.png" alt="Included PSI Logo" %}{%endraw%}` | {% include inline_image.html file="psi-logo.png" alt="Included PSI Logo" -%} | | ❌
`{%raw%}{{ "/pages/merlin6/01 introduction/introduction.md" | relative_url }}{%endraw%}` | {{ "/pages/merlin6/01 introduction/introduction.md" | relative_url }} | ✅❗
`{%raw%}{{ "/pages/merlin6/01-Quick-Start-Guide/introduction.md" | relative_url }}{%endraw%}` | {{ "/pages/merlin6/01-Quick-Start-Guide/introduction.md" | relative_url }} | ✅❗
`{%raw%}{{ "/merlin6/introduction.html" | relative_url }}{%endraw%}` | {{ "/merlin6/introduction.html" | relative_url }} | ✅
`{%raw%}{% link pages/merlin6/01 introduction/introduction.md %}{%endraw%}` | {% link pages/merlin6/01 introduction/introduction.md %} | ✅
`{%raw%}{% link pages/merlin6/01-Quick-Start-Guide/introduction.md %}{%endraw%}` | {% link pages/merlin6/01-Quick-Start-Guide/introduction.md %} | ✅
[srcRef]: /pages/merlin6/01 introduction/introduction.md
[srcRef]: /pages/merlin6/01-Quick-Start-Guide/introduction.md
[dstRef]: /merlin6/introduction.html
Key: