# OpenMPI

This document describes which OpenMPI versions in PModules are supported
in the Merlin6 cluster.

## Usage

### srun

We strongly recommend using **`srun`** instead of **`mpirun`** or **`mpiexec`**.
**`srun`** properly binds tasks to cores and requires less customization, while
**`mpirun`** and **`mpiexec`** may need more advanced configuration and should
only be used by advanced users. Please ***always*** adapt your scripts to use
**`srun`** before opening a support ticket. Also, please contact us if you run
into any problem when using a module.

Example:

```bash
srun ./app
```

!!! tip
    Always run OpenMPI with the **`srun`** command. The only exception is for
    advanced users; even then, **`srun`** is still recommended.
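
For reference, a minimal sketch of a Slurm batch script using **`srun`** could
look as follows (the module version and resource values are illustrative only;
pick one of the supported builds listed below):

```bash
#!/bin/bash
#SBATCH --job-name=mpi-test   # job name shown in the queue
#SBATCH --ntasks=8            # number of MPI tasks
#SBATCH --time=00:10:00       # wall time limit

# Load a Slurm-aware OpenMPI build (version is an example)
module load openmpi/4.0.5_slurm

# srun picks up the task count and binding from Slurm
srun ./app
```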

### OpenMPI with UCX

**OpenMPI** supports **UCX** starting from version 3.0, but it is recommended
to use version 4.0 or higher due to stability and performance improvements.
**UCX** should be used only by advanced users, as it requires running with
**`mpirun`** (which needs advanced knowledge) and is the exception to the rule
of running MPI with **`srun`** (**UCX** is not integrated with **`srun`** at
PSI).

To run UCX, one should either:

* add the following options to **`mpirun`**:

```bash
--mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1
```

* or, alternatively, export the following variables **before** calling
  **`mpirun`**:

```bash
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1
```

In addition, one can add the following options for debugging purposes (visit
[UCX Logging](https://github.com/openucx/ucx/wiki/Logging) for possible
`UCX_LOG_LEVEL` values):

```bash
-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>
```

These options can also be set externally before the **`mpirun`** call (see the
second example below). Full examples:

* Within the **`mpirun`** command:

```bash
mpirun -np $SLURM_NTASKS --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_LOG_LEVEL=data -x UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log ./app
```

* Outside the **`mpirun`** command:

```bash
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1
export UCX_LOG_LEVEL=data
export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log

mpirun -np $SLURM_NTASKS ./app
```

## Supported OpenMPI versions

For OpenMPI to run properly in a Slurm batch system, ***OpenMPI and Slurm must
be compiled accordingly***.

The PModules central repositories contain a large number of OpenMPI builds.
However, only some of them are suitable for running in a Slurm cluster: ***any
OpenMPI version with the suffix `_slurm` is suitable for running in the Merlin6
cluster***. Builds with the suffix `_merlin6` can also be used, but these will
eventually be fully replaced by the `_slurm` series (which can be used on any
Slurm cluster at PSI). Please ***avoid using any other OpenMPI releases***.

!!! tip
    Suitable **OpenMPI** versions for running in the Merlin6 cluster:

    * `openmpi/<version>_slurm` *[Recommended]*
    * `openmpi/<version>_merlin6`
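
For example, one could search for the available builds and load a suitable one
as follows (a minimal sketch; the exact version strings will vary):

```bash
# List the OpenMPI builds known to PModules
module search openmpi

# Load a Slurm-aware build (version is an example)
module load openmpi/4.0.5_slurm
```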

### 'unstable' repository

New OpenMPI versions that need to be tested will first be compiled in the
**`unstable`** repository and, once validated, will be moved to **`stable`**.
We cannot guarantee that modules in that repository are production ready, but
you can use them *at your own risk*.

To use *unstable* modules, you might need to load the **`unstable`** PModules
repository as follows:

```bash
module use unstable
```
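
Once the repository is loaded, unstable builds show up in searches and can be
loaded like any other module (a sketch; the version string is illustrative):

```bash
module use unstable
# An unstable build is now visible and loadable (version is an example)
module load openmpi/5.0.0_slurm
```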

### 'stable' repository

Officially supported [OpenMPI versions](https://www.open-mpi.org/software/ompi)
will be available in the **`stable`** repository (which is the repository
loaded by *default*).

For further information, please check the [*Current* and *still
supported*](https://www.open-mpi.org/software/ompi/) versions in the left-hand
sidebar.

Usually, no more than two minor update releases will be present in the
**`stable`** repository. Older minor update releases will be moved to
**`deprecated`** even when they are still officially supported. This ensures
that users compile new software with the latest stable versions, while older
versions remain available for software that was compiled with them.

### 'deprecated' repository

Old OpenMPI versions (that is, any official OpenMPI version which has been
moved to **retired** or **ancient**) will be moved to the ***`deprecated`***
PModules repository. For further information, please check the [*Older
versions*](https://www.open-mpi.org/software/ompi/) in the left-hand sidebar.

Also, as mentioned [before](#stable-repository), older officially supported
OpenMPI releases (minor updates) will be moved to `deprecated`.

To use *deprecated* modules, you might need to load the **`deprecated`**
PModules repository as follows:

```bash
module use deprecated
```

However, this is usually not needed: when directly loading a specific version,
if it is not found in `stable`, PModules will try to search for it and fall
back to the other repositories (`deprecated` or `unstable`).
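
For instance, directly loading an older release (the version string below is
just an example) should trigger this fallback search automatically:

```bash
# No 'module use deprecated' needed: if this exact version is not found
# in 'stable', PModules falls back to 'deprecated' or 'unstable'
module load openmpi/3.1.6_slurm
```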

### About missing versions

#### Missing OpenMPI versions

For legacy software, some users might require a different OpenMPI version. **We
always encourage** users to try one of the existing stable versions first
(*always OpenMPI with the suffix `_slurm` or `_merlin6`!*), as they contain the
latest bug fixes and usually should work. In the worst case, you can also try
the versions in the `deprecated` repository (again, *always with the suffix
`_slurm` or `_merlin6`!*). For very old software based on OpenMPI v1, you can
follow the guide [FAQ: Removed MPI
constructs](https://www.open-mpi.org/faq/?category=mpi-removed), which provides
some easy steps for migrating from OpenMPI v1 to v2 or later and is also useful
for finding out why your code does not compile properly.
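
When a suitable build exists, recompiling the legacy application against it is
usually straightforward (a minimal sketch; the module versions below are
examples only):

```bash
# Load a compiler and a Slurm-aware OpenMPI build (versions are examples)
module load gcc/9.3.0 openmpi/4.0.5_slurm

# Recompile the application with the OpenMPI compiler wrapper
mpicc -o app app.c

# Run it through Slurm as recommended above
srun ./app
```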

If, after trying the mentioned versions and guides, you are still facing
problems, please contact us. Also, please contact us if you require a newer
version built with a different `gcc` or `intel` compiler (for example,
Intel v19).