---
title: OpenMPI Support
#tags:
last_updated: 13 March 2020
keywords: software, openmpi, slurm
summary: "This document describes how to use OpenMPI in the Merlin6 cluster"
sidebar: merlin6_sidebar
permalink: /merlin6/openmpi.html
---

## Introduction

This document describes which OpenMPI versions from PModules are supported in the Merlin6 cluster.

### srun

We strongly recommend using **`srun`** over **`mpirun`** or **`mpiexec`**. **`srun`** properly binds tasks to cores and needs little customization, whereas **`mpirun`** and **`mpiexec`** may require more advanced configuration and should only be used by advanced users. Please ***always*** adapt your scripts to use **`srun`** before opening a support ticket, and please contact us about any problem you encounter when using a module.

Example:

```bash
srun ./app
```

{{site.data.alerts.tip}} Always run OpenMPI applications with the <b>srun</b> command. The only exception is for advanced users, and even then <b>srun</b> is still recommended.
{{site.data.alerts.end}}

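Putting this together, a minimal Slurm batch script launching an OpenMPI application with **`srun`** might look as follows (the module name, task count, and walltime are placeholder values to adapt to your job):

```bash
#!/bin/bash
#SBATCH --ntasks=8            # total number of MPI tasks
#SBATCH --time=00:10:00       # walltime limit
#SBATCH --output=app-%j.out   # stdout/stderr file (%j = job ID)

# Load an OpenMPI build suited for Slurm (version shown is an example)
module load openmpi/4.0.2_slurm

# srun launches the MPI ranks and binds them to cores automatically
srun ./app
```

Submit it with `sbatch`; no `mpirun` options are needed since Slurm provides the task count and binding.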
### OpenMPI with UCX

**OpenMPI** supports **UCX** starting from version 3.0, but version 4.0 or higher is recommended due to stability and performance improvements.
**UCX** should be used only by advanced users, as it must be run with **mpirun** (which requires advanced knowledge) and is the one exception to running MPI without **srun** (**UCX** is not integrated with **srun** at PSI).

To run with UCX, one should:

* add the following options to **mpirun**:
  ```bash
  --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1
  ```
* or, alternatively, export the following variables **before calling mpirun**:
  ```bash
  export OMPI_MCA_pml="ucx"
  export OMPI_MCA_btl="^vader,tcp,openib,uct"
  export UCX_NET_DEVICES=mlx5_0:1
  ```

In addition, one can add the following options for debugging purposes (visit [UCX Logging](https://github.com/openucx/ucx/wiki/Logging) for the possible `UCX_LOG_LEVEL` values):

```bash
-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>
```

These can also be set externally before the **mpirun** call. Full examples:

* Within the **mpirun** command:
  ```bash
  mpirun -np $SLURM_NTASKS --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_LOG_LEVEL=data -x UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log ./app
  ```
* Outside the **mpirun** command:
  ```bash
  export OMPI_MCA_pml="ucx"
  export OMPI_MCA_btl="^vader,tcp,openib,uct"
  export UCX_NET_DEVICES=mlx5_0:1
  export UCX_LOG_LEVEL=data
  export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log

  mpirun -np $SLURM_NTASKS ./app
  ```

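For reference, the pieces above can be combined into a complete Slurm batch script for a UCX run (module name, task count, and network device are assumptions to adapt):

```bash
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --time=00:10:00

# Example module; use a `_slurm` build (see the next section)
module load openmpi/4.0.2_slurm

# UCX settings, exported instead of passed as mpirun options
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1

# UCX is the exception that runs through mpirun instead of srun;
# Slurm still provides the task count via $SLURM_NTASKS
mpirun -np $SLURM_NTASKS ./app
```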
## Supported OpenMPI versions

To run OpenMPI properly in a Slurm batch system, ***OpenMPI and Slurm must be compiled accordingly***.

A large number of OpenMPI builds can be found in the PModules central repositories, but only some of them are suitable for running in a Slurm cluster: ***any OpenMPI version with the suffix `_slurm` is suitable for running in the Merlin6 cluster***. OpenMPI builds with the suffix `_merlin6` can also be used, but these will eventually be fully replaced by the `_slurm` series (which can be used on any Slurm cluster at PSI). Please ***avoid using any other OpenMPI releases***.

{{site.data.alerts.tip}} Suitable <b>OpenMPI</b> versions for running in the Merlin6 cluster:
<p> - <span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false"><b>openmpi/&lt;version&gt;_slurm</b></span> <b>[<u>Recommended</u>]</b></p>
<p> - <span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">openmpi/&lt;version&gt;_merlin6</span></p>
{{site.data.alerts.end}}

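To find these builds, one can search PModules before loading (the version shown is only an example, and the `module search` subcommand is assumed to be available in your PModules release):

```bash
# List available OpenMPI builds and pick one with the `_slurm` suffix
module search openmpi

# Load a suitable build (example version)
module load openmpi/4.0.2_slurm
```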
#### 'unstable' repository

New OpenMPI versions that need testing are compiled first in the **``unstable``** repository and, once validated, are moved to **``stable``**.
We cannot guarantee that modules in this repository are production ready, but you can use them *at your own risk*.

To use *unstable* modules, you might need to load the **``unstable``** PModules repository as follows:

```bash
module use unstable
```

#### 'stable' repository

Officially supported [OpenMPI](https://www.open-mpi.org/) versions will be available in the **``stable``** repository (which is the *default* loaded repository).
For further information, please check the [Current & Still Supported](https://www.open-mpi.org/software/ompi/) versions at open-mpi.org.

Usually, no more than two minor release series will be present in the **``stable``** repository. Older minor release series will be moved to **``deprecated``**, even while still officially supported. This ensures that users compile new software with the latest stable versions, while the older versions remain available for software that was compiled against them.

#### 'deprecated' repository

Old OpenMPI versions (that is, any official OpenMPI version which has been moved to **retired** or **ancient**) will be moved to the ***'deprecated'*** PModules repository.
For further information, please check the [Older Versions](https://www.open-mpi.org/software/ompi/) at open-mpi.org.

Also, as mentioned [before](/merlin6/openmpi.html#stable-repository), older officially supported OpenMPI releases (minor updates) will be moved to ``deprecated``.

To use *deprecated* modules, you might need to load the **``deprecated``** PModules repository as follows:

```bash
module use deprecated
```

However, this is usually not needed: when a specific version is loaded directly and is not found in ``stable``, PModules will fall back to searching the other repositories (``deprecated`` and ``unstable``).

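In other words, directly loading a version that only exists in ``deprecated`` normally works without any `module use` (the version below is a hypothetical example):

```bash
# No `module use deprecated` needed: PModules falls back automatically
# when the requested version is not found in `stable`
module load openmpi/2.1.6_slurm
```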
#### About missing versions
##### Missing OpenMPI versions

For legacy software, some users might require a different OpenMPI version. **We always encourage** users to first try one of the existing stable versions (*always an OpenMPI build with the suffix ``_slurm`` or ``_merlin6``!*), as these contain the latest bug fixes and usually work. Failing that, you can try the versions in the deprecated repository (again, *only with the suffix ``_slurm`` or ``_merlin6``!*). For very old software based on OpenMPI v1, you can follow the guide [FAQ: Removed MPI constructs](https://www.open-mpi.org/faq/?category=mpi-removed), which provides simple steps for migrating from OpenMPI v1 to v2 or later and is also useful for finding out why your code does not compile properly.

If you still face problems after trying the mentioned versions and guide, please contact us. Also, please contact us if you require a newer version built with a different ``gcc`` or ``intel`` compiler (for example, Intel v19).