Changes in OpenMPI
@@ -19,18 +19,39 @@ bind tasks in to cores and less customization is needed, while **'mpirun'** and
configuration and should only be used by advanced users. Please ***always*** adapt your scripts to use **'srun'**
before opening a support ticket. Also, please contact us about any problem you encounter when using a module.

{{site.data.alerts.tip}} Always run OpenMPI with the <b>srun</b> command.
{{site.data.alerts.tip}} Always run OpenMPI with the <b>srun</b> command. The only exception is for advanced users.
{{site.data.alerts.end}}

### PModules
### OpenMPI with UCX

**OpenMPI** supports **UCX** starting from version 3.0, but version 4.0 or higher is recommended for its stability and performance improvements.
**UCX** should be used only by advanced users: it must be launched with **mpirun**, which requires advanced knowledge, and it is the one exception to running MPI
with **srun** (**UCX** is not integrated within **srun** at PSI).

To run with UCX, add the following options to **mpirun**:

```bash
mpirun --np $SLURM_NTASKS -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 ./app
```

In addition, the following options can be added for debugging purposes (see [UCX Logging](https://github.com/openucx/ucx/wiki/Logging) for possible `UCX_LOG_LEVEL` values):

```bash
-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>
```
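The two option sets above can be combined in a small helper. The following is a minimal sketch, not an official PSI script: the function name `build_ucx_mpirun`, the variables `UCX_DEBUG_LEVEL` and `UCX_DEBUG_FILE`, and the default log file name `ucx.log` are hypothetical names introduced for illustration. The `mpirun` options themselves are the ones listed above. The helper only prints the command line, so it can be inspected before it is run inside a Slurm allocation.

```bash
#!/bin/bash
# Sketch: assemble the UCX mpirun command line described above.
# build_ucx_mpirun, UCX_DEBUG_LEVEL, UCX_DEBUG_FILE and ucx.log are
# hypothetical names used only in this example.
build_ucx_mpirun() {
    local app ntasks cmd
    app="$1"
    # SLURM_NTASKS is set by Slurm inside an allocation; fall back to 1 otherwise.
    ntasks="${SLURM_NTASKS:-1}"
    cmd="mpirun -np ${ntasks} --mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1"
    # Append the UCX logging options only when a log level was requested.
    if [ -n "${UCX_DEBUG_LEVEL:-}" ]; then
        cmd="${cmd} -x UCX_LOG_LEVEL=${UCX_DEBUG_LEVEL} -x UCX_LOG_FILE=${UCX_DEBUG_FILE:-ucx.log}"
    fi
    # Print the command instead of executing it.
    printf '%s\n' "${cmd} ${app}"
}
```

For example, `SLURM_NTASKS=4 UCX_DEBUG_LEVEL=info build_ucx_mpirun ./app` prints the base command with `-x UCX_LOG_LEVEL=info -x UCX_LOG_FILE=ucx.log` inserted before `./app`.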

## Supported OpenMPI versions

To run OpenMPI properly in a Slurm batch system, ***OpenMPI and Slurm must be compiled accordingly***.

The PModules central repositories contain a large number of OpenMPI builds. However, only
some of them are suitable for running in a Slurm cluster: ***any OpenMPI versions with suffixes ``_slurm`` or ``_merlin6``
are suitable for running in the Merlin6 cluster***. Please, ***avoid using any other OpenMPI releases***.
some of them are suitable for running in a Slurm cluster: ***any OpenMPI version with the suffix `_slurm`
is suitable for running in the Merlin6 cluster***. OpenMPI versions with the suffix `_merlin6` can also be used, but these will be fully
replaced by the `_slurm` series in the future (which can be used on any Slurm cluster at PSI). Please ***avoid using any other OpenMPI releases***.
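The naming rule above can be checked mechanically. The following is a minimal sketch, assuming module names of the form `openmpi/<version><suffix>`; the function name `classify_openmpi_module` and the version numbers in the usage note are hypothetical and are not taken from the PModules repositories.

```bash
#!/bin/bash
# Sketch: classify an OpenMPI module name by its suffix, following the rule above.
# classify_openmpi_module is a hypothetical helper, not a PModules command.
classify_openmpi_module() {
    case "$1" in
        # _slurm builds are the recommended series.
        openmpi/*_slurm)   echo "recommended" ;;
        # _merlin6 builds still work but will be replaced by the _slurm series.
        openmpi/*_merlin6) echo "usable, to be replaced by _slurm" ;;
        # Anything else should be avoided on the Merlin6 cluster.
        *)                 echo "avoid" ;;
    esac
}
```

With a hypothetical version number, `classify_openmpi_module openmpi/4.0.5_slurm` prints `recommended`, while a name without one of the two suffixes prints `avoid`.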

{{site.data.alerts.tip}} Suitable <b>OpenMPI</b> versions for running in the Merlin6 cluster:
<p> -
<span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">openmpi/<version>_slurm
<span class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false"><b>openmpi/<version>_slurm</b>
</span> <b>[<u>Recommended</u>]</b>
</p>
<p> -