# OpenMPI
This document describes which OpenMPI versions available in PModules are supported in the Merlin6 cluster.
## Usage

### srun
We strongly recommend the use of `srun` over `mpirun` or `mpiexec`.
`srun` properly binds tasks to cores and needs less customization, while
`mpirun` and `mpiexec` might need more advanced configuration and should
only be used by advanced users. Please always adapt your scripts to use
`srun` before opening a support ticket. Also, please contact us about any
problem when using a module.
Example:

```bash
srun ./app
```
!!! tip
    Always run OpenMPI with the `srun` command. The only exception is for
    advanced users, and even then `srun` is still recommended.
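As a point of reference, a minimal batch script using `srun` could look like the sketch below. The partition defaults, resource numbers, and module version are placeholders, not values taken from this documentation, and must be adapted to your job.

```bash
#!/bin/bash
#SBATCH --job-name=mpi-app      # job name (placeholder)
#SBATCH --ntasks=8              # number of MPI tasks (placeholder)
#SBATCH --time=01:00:00         # wall time limit (placeholder)

# Load an OpenMPI build with Slurm support (replace <version> accordingly)
module load openmpi/<version>_slurm

# srun starts the MPI tasks and takes care of binding them to cores
srun ./app
```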
### OpenMPI with UCX
OpenMPI supports UCX starting from version 3.0, but it is recommended to
use version 4.0 or higher due to stability and performance improvements.
UCX should be used only by advanced users, as it requires running with
`mpirun` (which needs advanced knowledge) and is the exception for running
MPI without `srun` (UCX is not integrated with `srun` at PSI).
For running UCX, one should:

- add the following options to `mpirun`:

    ```bash
    -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1
    ```

- or alternatively, add the following options before `mpirun`:

    ```bash
    export OMPI_MCA_pml="ucx"
    export OMPI_MCA_btl="^vader,tcp,openib,uct"
    export UCX_NET_DEVICES=mlx5_0:1
    ```
In addition, one can add the following options for debugging purposes (visit
UCX Logging for possible `UCX_LOG_LEVEL` values):

```bash
-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>
```
These can also be added externally before the `mpirun` call (see the example
below). Full example:

- Within the `mpirun` command:

    ```bash
    mpirun -np $SLURM_NTASKS -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_LOG_LEVEL=data -x UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log ./app
    ```

- Outside the `mpirun` command:

    ```bash
    export OMPI_MCA_pml="ucx"
    export OMPI_MCA_btl="^vader,tcp,openib,uct"
    export UCX_NET_DEVICES=mlx5_0:1
    export UCX_LOG_LEVEL=data
    export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log
    mpirun -np $SLURM_NTASKS ./app
    ```
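For illustration, the UCX variant could be embedded in a batch script roughly as follows. This is only a sketch: the resource numbers and module version are placeholders, and the UCX settings are simply the ones listed above.

```bash
#!/bin/bash
#SBATCH --ntasks=8              # number of MPI tasks (placeholder)
#SBATCH --time=01:00:00         # wall time limit (placeholder)

# Load an OpenMPI build with Slurm support (replace <version> accordingly)
module load openmpi/<version>_slurm

# Select UCX and exclude the conflicting byte-transfer layers
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1

# Optional UCX debugging output
export UCX_LOG_LEVEL=data
export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log

# UCX is the exception where mpirun is used instead of srun
mpirun -np $SLURM_NTASKS ./app
```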
## Supported OpenMPI versions
For running OpenMPI properly in a Slurm batch system, OpenMPI and Slurm must be compiled accordingly.
A large number of OpenMPI builds can be found in the central PModules
repositories. However, only some of them are suitable for running in a Slurm
cluster: any OpenMPI version with the suffix `_slurm` is suitable for running
in the Merlin6 cluster. OpenMPI modules with the suffix `_merlin6` can also be
used, but these will be fully replaced by the `_slurm` series in the future
(which can be used on any Slurm cluster at PSI). Please avoid using any other
OpenMPI releases.
!!! tip
    Suitable OpenMPI versions for running in the Merlin6 cluster:

    * `openmpi/<version>_slurm` *[Recommended]*
    * `openmpi/<version>_merlin6`
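To check which of these builds are currently visible, you can query PModules before loading one. The sketch below assumes the Pmodules `module search` subcommand is available in your environment; the version string is a placeholder.

```bash
# List the OpenMPI builds visible in the currently loaded repositories
module search openmpi

# Load a Slurm-enabled build (replace <version> with one of the listed versions)
module load openmpi/<version>_slurm
```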
### 'unstable' repository
New OpenMPI versions that need to be tested will be compiled first in the
unstable repository and, once validated, will be moved to stable. We cannot
ensure that modules in the unstable repository are production ready, but you
can use them at your own risk.
For using unstable modules, you might need to load the unstable PModules
repository as follows:

```bash
module use unstable
```
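After that, an unstable build can be loaded like any other module; the version below is only a placeholder for whatever is currently available in unstable.

```bash
module use unstable
module load openmpi/<version>_slurm
```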
### 'stable' repository
Officially supported OpenMPI versions are available in the stable repository
(which is the repository loaded by default).
For further information, please check Current and still supported versions in the left-hand sidebar.
Usually, no more than two minor update releases will be present in the stable
repository. Older minor update releases will be moved to deprecated even
though they are still officially supported. This ensures that users compile
new software with the latest stable versions, while the older versions remain
available for software that was compiled with them.
### 'deprecated' repository
Old OpenMPI versions (that is, any official OpenMPI version which has been
moved to retired or ancient) will be moved to the deprecated PModules
repository. For further information, please check Older versions in the
left-hand sidebar.
Also, as mentioned before, older officially supported OpenMPI releases (minor
updates) will be moved to deprecated.
For using deprecated modules, you might need to load the deprecated PModules
repository as follows:

```bash
module use deprecated
```
However, this is usually not needed: when a specific version is loaded
directly and is not found in stable, PModules will try to search for it and
fall back to the other repositories (deprecated or unstable).
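In other words, directly loading a specific older build will often work without adding the repository explicitly; a short sketch, with the version as a placeholder:

```bash
# Directly load a specific older build; if it is not found in 'stable',
# PModules falls back to the other repositories (deprecated or unstable)
module load openmpi/<version>_slurm
```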
## About missing versions

### Missing OpenMPI versions
For legacy software, some users might require a different OpenMPI version. We
always encourage users to first try one of the existing stable versions
(always with the suffix `_slurm` or `_merlin6`!), as they contain the latest
bug fixes and usually should work. In the worst case, you can also try the
ones in the deprecated repository (again, always with the suffix `_slurm` or
`_merlin6`!). For very old software based on OpenMPI v1, you can follow the
guide FAQ: Removed MPI constructs, which provides some easy steps for
migrating from OpenMPI v1 to v2 or higher, and which is also useful for
finding out why your code does not compile properly.
If, after trying the mentioned versions and the guide, you are still facing
problems, please contact us. Also, please contact us if you require a newer
version built with a different gcc or Intel compiler (for example, Intel v19).
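Before contacting us, it can also help to confirm which OpenMPI build your software actually picks up. The following is only a sketch; the module version is a placeholder, and depending on the PModules hierarchy a compiler module may need to be loaded first.

```bash
# Load the OpenMPI build to test (replace <version> accordingly)
module load openmpi/<version>_slurm

# Confirm which compiler wrapper and OpenMPI installation are in use
which mpicc
ompi_info | head

# Rebuild the application against this build
mpicc -o app app.c
```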