---
title: OpenMPI Support
#tags:
last_updated: 13 March 2020
keywords: software, openmpi, slurm
summary: "This document describes how to use OpenMPI in the Merlin6 cluster"
sidebar: merlin6_sidebar
permalink: /merlin6/openmpi.html
---

## Introduction

This document describes which OpenMPI versions available in PModules are supported in the Merlin6 cluster.

### srun

We strongly recommend the use of **'srun'** over **'mpirun'** or **'mpiexec'**. **'srun'** properly binds tasks to cores and needs little extra configuration, while **'mpirun'** and **'mpiexec'** may require more advanced configuration and should only be used by advanced users. Please ***always*** adapt your scripts to use **'srun'** before opening a support ticket, and please contact us about any problem you encounter when using a module.

Example:

```bash
srun ./app
```

{{site.data.alerts.tip}} Always run OpenMPI applications with the <b>srun</b> command. The only exception is for advanced users, and even then <b>srun</b> is still recommended.
{{site.data.alerts.end}}
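
As a minimal illustration, a complete job script could look like the following sketch (the module name/version and the resource values are placeholders; adapt them to your needs):

```bash
#!/bin/bash
#SBATCH --ntasks=8          # number of MPI tasks (example value)
#SBATCH --time=00:30:00     # walltime limit (example value)

# Load an OpenMPI build with the '_slurm' suffix (placeholder version)
module load openmpi/4.0.5_slurm

# srun launches the MPI ranks and binds them to cores
srun ./app
```
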
### OpenMPI with UCX

**OpenMPI** supports **UCX** starting from version 3.0, but version 4.0 or higher is recommended due to stability and performance improvements.

**UCX** should be used only by advanced users, as it must be run through **mpirun** (which requires advanced knowledge) and is the one exception for running MPI without **srun** (**UCX** is not integrated with **srun** at PSI).

To run with UCX, one should either:

* add the following options to **mpirun**:
  ```bash
  --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1
  ```
* or, alternatively, export the following environment variables **before calling mpirun**:
  ```bash
  export OMPI_MCA_pml="ucx"
  export OMPI_MCA_btl="^vader,tcp,openib,uct"
  export UCX_NET_DEVICES=mlx5_0:1
  ```

In addition, one can add the following options for debugging purposes (visit [UCX Logging](https://github.com/openucx/ucx/wiki/Logging) for possible `UCX_LOG_LEVEL` values):

```bash
-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>
```

These options can also be set externally before the **mpirun** call, as shown in the second example below. Full examples:

* Within the **mpirun** command:
  ```bash
  mpirun -np $SLURM_NTASKS --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_LOG_LEVEL=data -x UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log ./app
  ```
* Outside the **mpirun** command:
  ```bash
  export OMPI_MCA_pml="ucx"
  export OMPI_MCA_btl="^vader,tcp,openib,uct"
  export UCX_NET_DEVICES=mlx5_0:1
  export UCX_LOG_LEVEL=data
  export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log

  mpirun -np $SLURM_NTASKS ./app
  ```
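
Either variant can be embedded in a batch script. A minimal sketch of the second variant (module name/version and resource values are placeholders; adapt them to your setup):

```bash
#!/bin/bash
#SBATCH --ntasks=8          # number of MPI ranks (example value)
#SBATCH --time=01:00:00     # walltime limit (example value)

# Placeholder module; pick an installed '_slurm' build
module load openmpi/4.0.5_slurm

# UCX is the documented exception that is launched with mpirun instead of srun
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1

mpirun -np $SLURM_NTASKS ./app
```
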
## Supported OpenMPI versions

For running OpenMPI properly in a Slurm batch system, ***OpenMPI and Slurm must be compiled accordingly***.

The PModules central repositories contain a large number of OpenMPI builds, but only some of them are suitable for running in a Slurm cluster: ***any OpenMPI version with the `_slurm` suffix is suitable for running in the Merlin6 cluster***. Builds with the `_merlin6` suffix can also be used, but they will eventually be fully replaced by the `_slurm` series (which can be used on any Slurm cluster at PSI). Please ***avoid using any other OpenMPI releases***.

{{site.data.alerts.tip}} Suitable <b>OpenMPI</b> versions for running in the Merlin6 cluster:
<p> - <span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false"><b>openmpi/&lt;version&gt;_slurm</b></span> <b>[<u>Recommended</u>]</b></p>
<p> - <span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">openmpi/&lt;version&gt;_merlin6</span></p>
{{site.data.alerts.end}}
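
For example, to find and load one of these builds (assuming the standard PModules commands; the version below is a placeholder, pick one actually listed):

```bash
module search openmpi
module load openmpi/4.0.2_slurm
```
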
#### 'unstable' repository

New OpenMPI versions that need testing will first be compiled in the **``unstable``** repository and, once validated, moved to **``stable``**. We cannot guarantee that modules in that repository are production ready, but you can use them *at your own risk*.

To use *unstable* modules, you might need to load the **``unstable``** PModules repository as follows:
```bash
module use unstable
```

#### 'stable' repository

Officially supported OpenMPI versions (https://www.open-mpi.org/) are available in the **``stable``** repository, which is the repository loaded by *default*. For further information, please check the [Current & Still Supported](https://www.open-mpi.org/software/ompi/) versions at https://www.open-mpi.org/software/ompi/.

Usually, no more than two minor update releases will be present in the **``stable``** repository. Older minor update releases will be moved to **``deprecated``** even though they are still officially supported. This ensures that users compile new software against the latest stable versions, while the older versions remain available for software that was compiled with them.

#### 'deprecated' repository

Old OpenMPI versions (that is, any official OpenMPI version which has been moved to **retired** or **ancient** status) will be moved to the ***'deprecated'*** PModules repository.
For further information, please check the [Older Versions](https://www.open-mpi.org/software/ompi/) list at https://www.open-mpi.org/software/ompi/.

Also, as mentioned [before](/merlin6/openmpi.html#stable-repository), older officially supported OpenMPI releases (minor updates) will be moved to ``deprecated``.

To use *deprecated* modules, you might need to load the **``deprecated``** PModules repository as follows:
```bash
module use deprecated
```

However, this is usually not needed: when loading a specific version directly, if it is not found in ``stable``, PModules will search for it and fall back to the other repositories (``deprecated`` or ``unstable``).
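
For example (the version shown is hypothetical), loading an old build directly triggers this fallback search:

```bash
module load openmpi/3.1.4_slurm
```
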
#### About missing versions

##### Missing OpenMPI versions

For legacy software, some users might require a different OpenMPI version. **We always encourage** users to first try one of the existing stable versions (*always an OpenMPI build with the ``_slurm`` or ``_merlin6`` suffix!*), as these contain the latest bug fixes and usually work. In the worst case, you can also try the versions in the deprecated repository (again, *always with the ``_slurm`` or ``_merlin6`` suffix!*). For very old software based on OpenMPI v1, you can follow the guide [FAQ: Removed MPI constructs](https://www.open-mpi.org/faq/?category=mpi-removed), which provides simple steps for migrating from OpenMPI v1 to v2 or later, and is also useful for finding out why your code does not compile properly.

If you are still facing problems after trying the mentioned versions and guide, please contact us. Also, please contact us if you require a newer version built with a different ``gcc`` or ``intel`` compiler (for example, Intel v19).