# OpenMPI

This document describes which OpenMPI versions available in PModules are
supported in the Merlin6 cluster.

## Usage

### srun

We strongly recommend using **`srun`** instead of **`mpirun`** or **`mpiexec`**.
**`srun`** binds tasks to cores properly and needs less customization,
while **`mpirun`** and **`mpiexec`** might need more advanced
configuration and should only be used by advanced users. Please ***always***
adapt your scripts to use **`srun`** before opening a support ticket. Also,
please contact us if you encounter any problem when using a module.

Example:

```bash
srun ./app
```

!!! tip
    Always run OpenMPI with the **`srun`** command. The only exception is for
    advanced users; even then, **`srun`** is still recommended.

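As a rough illustration of the recommendation above, a batch script might look like the following sketch. The job name, task count, time limit and the `launch` helper are hypothetical example values, and `<version>` is a placeholder that has to be replaced with a real module version. The helper only prints the command when `srun` is not available, so the script can be checked outside a Slurm allocation.

```bash
#!/bin/bash
#SBATCH --job-name=mpi-app     # hypothetical job name
#SBATCH --ntasks=4             # example number of MPI tasks
#SBATCH --time=00:10:00        # example time limit

# Placeholder module name: replace <version> with a real version.
OPENMPI_MODULE="openmpi/<version>_slurm"

# Start the given program through srun. When srun is not available
# (for example outside the cluster), only print the command instead.
launch() {
    if command -v srun >/dev/null 2>&1; then
        module load "$OPENMPI_MODULE"
        # srun takes the number of tasks from the Slurm allocation,
        # so no explicit -np style option is needed here.
        srun "$@"
    else
        echo "would run: srun $*"
    fi
}

launch ./app
```

Such a script would typically be submitted with `sbatch`. The task count is requested once in the `#SBATCH` header and is picked up by `srun` automatically.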
### OpenMPI with UCX

**OpenMPI** supports **UCX** starting from version 3.0, but we recommend
using version 4.0 or higher for its stability and performance improvements.
**UCX** should only be used by advanced users, as it has to be run with
**`mpirun`** (which requires advanced knowledge). It is an exception to the
rule of running MPI with **`srun`**, because **UCX** is not integrated with
**`srun`** at PSI.

To run with UCX, one should:

* add the following options to **`mpirun`**:

```bash
--mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1
```

* or alternatively, export the following variables **before** calling **`mpirun`**:

```bash
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1
```

In addition, one can add the following options for debugging purposes (see
[UCX Logging](https://github.com/openucx/ucx/wiki/Logging) for possible
`UCX_LOG_LEVEL` values):

```bash
-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>
```

These options can also be set as environment variables before the
**`mpirun`** call (see the second example below). Full examples:

* Within the **`mpirun`** command:

```bash
mpirun -np $SLURM_NTASKS --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_LOG_LEVEL=data -x UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log ./app
```

* Outside the **`mpirun`** command:

```bash
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1
export UCX_LOG_LEVEL=data
export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log

mpirun -np $SLURM_NTASKS ./app
```

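When the same settings are needed in several job scripts, the exports shown above could be collected in a small shell function. The following is only a sketch: the function name `setup_ucx_env` and the `nojob` fallback used when `SLURM_JOB_ID` is unset are assumptions made for illustration, while the variable names and values are the ones listed in this section.

```bash
#!/bin/bash
# Export the UCX-related settings described in this section.
# Usage: setup_ucx_env [log-level]
setup_ucx_env() {
    export OMPI_MCA_pml="ucx"
    export OMPI_MCA_btl="^vader,tcp,openib,uct"
    # Keep a device that the caller has already selected, otherwise
    # use the one from the examples in this document.
    export UCX_NET_DEVICES="${UCX_NET_DEVICES:-mlx5_0:1}"

    # Enable UCX logging only when a log level is given as first argument.
    if [ -n "${1:-}" ]; then
        export UCX_LOG_LEVEL="$1"
        # "nojob" is a hypothetical fallback for runs outside a Slurm job.
        export UCX_LOG_FILE="UCX-${SLURM_JOB_ID:-nojob}.log"
    fi
}
```

In a job script, one could call `setup_ucx_env data` (or `setup_ucx_env` without an argument to skip logging) right before the `mpirun -np $SLURM_NTASKS ./app` line.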
## Supported OpenMPI versions

To run OpenMPI properly in a Slurm batch system, ***OpenMPI and Slurm must
be compiled with support for each other***.

The PModules central repositories contain a large number of OpenMPI builds.
However, only some of them are suitable for running in a Slurm cluster:
***any OpenMPI version with the suffix `_slurm` is suitable for running in
the Merlin6 cluster***. OpenMPI versions with the suffix `_merlin6` can also
be used, but these will be fully replaced in the future by the `_slurm`
series (which can be used on any Slurm cluster at PSI). Please ***avoid
using any other OpenMPI releases***.

!!! tip
    Suitable **OpenMPI** versions for running in the Merlin6 cluster:

    * `openmpi/<version>_slurm` *[Recommended]*
    * `openmpi/<version>_merlin6`

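The suffix rule above can be checked mechanically, for example in a wrapper script that refuses to run with an unsuitable module. The sketch below is an illustration only: the function name `is_supported_openmpi` is hypothetical, and `4.0.5` is an arbitrary example version that is not necessarily installed.

```bash
#!/bin/bash
# Return success (0) when a module name carries one of the suffixes
# that this document lists as suitable for the Merlin6 cluster.
is_supported_openmpi() {
    case "$1" in
        openmpi/*_slurm)   return 0 ;;  # recommended series
        openmpi/*_merlin6) return 0 ;;  # usable, to be replaced by _slurm
        *)                 return 1 ;;  # any other build should be avoided
    esac
}

# 4.0.5 is an arbitrary example version.
for m in "openmpi/4.0.5_slurm" "openmpi/4.0.5_merlin6" "openmpi/4.0.5"; do
    if is_supported_openmpi "$m"; then
        echo "$m: suitable"
    else
        echo "$m: avoid"
    fi
done
```

The check looks only at the module name. It does not verify that the module actually exists in any PModules repository.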
### 'unstable' repository

New OpenMPI versions that need to be tested are first compiled in the
**`unstable`** repository and, once validated, are moved to **`stable`**.
We cannot ensure that modules in the **`unstable`** repository are
production ready, but you can use them *at your own risk*.

To use *unstable* modules, you might need to load the **`unstable`**
PModules repository as follows:

```bash
module use unstable
```

### 'stable' repository

Officially supported [OpenMPI versions](https://www.open-mpi.org/software/ompi)
are available in the **`stable`** repository (which is the repository loaded
by *default*).

For further information, please check the [*Current* and *still
supported*](https://www.open-mpi.org/software/ompi/) versions in the left-hand
sidebar of the OpenMPI website.

Usually, no more than 2 minor update releases are present in the
**`stable`** repository. Older minor update releases are moved to
**`deprecated`** even though they are still officially supported. This ensures
that users compile new software with the latest stable versions, while the
old versions remain available for software that was compiled with them.

#### 'deprecated' repository

Old OpenMPI versions (that is, any official OpenMPI version which has been
moved to **retired** or **ancient**) are moved to the ***`deprecated`***
PModules repository. For further information, please check the [*Older
versions*](https://www.open-mpi.org/software/ompi/) in the left-hand sidebar
of the OpenMPI website.

Also, as mentioned [before](#stable-repository), older officially supported
OpenMPI releases (minor updates) are moved to `deprecated`.

To use *deprecated* modules, you might need to load the **`deprecated`**
PModules repository as follows:

```bash
module use deprecated
```

However, this is usually not needed: when a specific version is loaded
directly and it is not found in `stable`, the `module` command tries to
fall back to the other repositories (`deprecated` or `unstable`).

### About missing versions

#### Missing OpenMPI versions

For legacy software, some users might require a different OpenMPI version. **We
always encourage** users to first try one of the existing stable versions
(*always OpenMPI with the suffix `_slurm` or `_merlin6`!*), as they contain the
latest bug fixes and usually work. Failing that, you can also try the versions
in the deprecated repository (again, *always OpenMPI with the suffix `_slurm`
or `_merlin6`!*). For very old software based on OpenMPI v1, you can follow the
guide [FAQ: Removed MPI
constructs](https://www.open-mpi.org/faq/?category=mpi-removed), which provides
some easy steps for migrating from OpenMPI v1 to v2 or later and is also
useful for finding out why your code does not compile properly.

If you are still facing problems after trying the mentioned versions and guide,
please contact us. Also, please contact us if you require a newer version
built with a different `gcc` or `intel` compiler (for example, Intel v19).