Changes in OpenMPI

This commit is contained in:
2020-05-22 17:43:09 +02:00
parent 34a79b5569
commit 05883f5735


@@ -19,18 +19,39 @@ bind tasks in to cores and less customization is needed, while **'mpirun'** and
configuration and should only be used by advanced users. Please ***always*** adapt your scripts to use **'srun'**
before opening a support ticket. Also, please contact us about any problem you encounter when using a module.
{{site.data.alerts.tip}} Always run OpenMPI with the <b>srun</b> command.
{{site.data.alerts.tip}} Always run OpenMPI with the <b>srun</b> command. The only exception is for advanced users.
{{site.data.alerts.end}}
### PModules
### OpenMPI with UCX
**OpenMPI** supports **UCX** starting from version 3.0, but it's recommended to use version 4.0 or higher due to stability and performance improvements.
**UCX** should be used only by advanced users: it must be launched with **mpirun** (which requires advanced knowledge), and it is the exception to the rule
of always running MPI with **srun** (at PSI, **UCX** is not integrated with **srun**).
To run with UCX, add the following options to **mpirun**:
```bash
mpirun --np $SLURM_NTASKS --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 ./app
```
Additionally, for debugging purposes one can add the following options to the **mpirun** command (visit [UCX Logging](https://github.com/openucx/ucx/wiki/Logging) for possible `UCX_LOG_LEVEL` values):
```bash
-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>
```
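The two option sets above can be combined into a single command line. The following is a minimal sketch rather than a PSI-provided tool: the function name `build_ucx_mpirun_cmd`, the log level `info` and the log file name `ucx.log` are assumptions made for illustration, while the `mpirun` options are the ones listed above. The function only prints the command, so it can be inspected before being run inside a Slurm allocation.

```bash
# Sketch: assemble the UCX-enabled mpirun command line, optionally with the
# UCX debugging options. The function name and the example log level and
# log file name are illustrative assumptions.
build_ucx_mpirun_cmd() {
    ntasks="$1"          # number of MPI tasks, e.g. $SLURM_NTASKS
    app="$2"             # application to launch, e.g. ./app
    log_level="${3:-}"   # optional UCX_LOG_LEVEL value
    log_file="${4:-}"    # optional UCX_LOG_FILE value

    cmd="mpirun --np $ntasks --mca pml ucx --mca btl ^vader,tcp,openib,uct"
    cmd="$cmd -x UCX_NET_DEVICES=mlx5_0:1"

    # Append the debugging options only when they are requested.
    if [ -n "$log_level" ]; then
        cmd="$cmd -x UCX_LOG_LEVEL=$log_level"
    fi
    if [ -n "$log_file" ]; then
        cmd="$cmd -x UCX_LOG_FILE=$log_file"
    fi

    printf '%s\n' "$cmd $app"
}

# Print the command for inspection (falls back to 1 task outside Slurm).
build_ucx_mpirun_cmd "${SLURM_NTASKS:-1}" ./app info ucx.log
```

Leaving out the last two arguments yields the plain UCX command without any logging options.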
## Supported OpenMPI versions
To run OpenMPI properly in a Slurm batch system, ***OpenMPI and Slurm must be compiled accordingly***.
The PModules central repositories contain a large number of OpenMPI builds. However, only
some of them are suitable for running in a Slurm cluster: ***any OpenMPI versions with suffixes ``_slurm`` or ``_merlin6``
are suitable for running in the Merlin6 cluster***. Please, ***avoid using any other OpenMPI releases***.
some of them are suitable for running in a Slurm cluster: ***any OpenMPI version with the suffix `_slurm`
is suitable for running in the Merlin6 cluster***. OpenMPI versions with the suffix `_merlin6` can also be used, but they will be fully
replaced in the future by the `_slurm` series, which can be used on any Slurm cluster at PSI. Please ***avoid using any other OpenMPI releases***.
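The suffix rule can be checked in a job script before a module is loaded. The helper below is only a sketch: the function name `openmpi_module_status` and the version number in the example call are hypothetical, and the check looks at nothing but the `_slurm` and `_merlin6` suffixes described here.

```bash
# Sketch: classify an OpenMPI module name by its suffix.
# The function name and the example version number are hypothetical.
openmpi_module_status() {
    case "$1" in
        openmpi/*_slurm)   echo "recommended" ;;
        openmpi/*_merlin6) echo "supported, to be replaced by _slurm" ;;
        *)                 echo "avoid" ;;
    esac
}

# Example call with a hypothetical version number.
openmpi_module_status "openmpi/4.0.3_slurm"
```

A script could refuse to continue when the result is `avoid`, so that an unsuitable OpenMPI build is caught before the job starts.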
{{site.data.alerts.tip}} Suitable <b>OpenMPI</b> versions for running in the Merlin6 cluster:
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;-&nbsp;
<span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">openmpi/&lt;version&gt;&#95;slurm
<span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false"><b>openmpi/&lt;version&gt;&#95;slurm</b>
</span>&nbsp;<b>[<u>Recommended</u>]</b>
</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;-&nbsp;