---
title: OpenMPI Support
last_updated: 13 March 2020
keywords: software, openmpi, slurm
summary: "This document describes how to use OpenMPI in the Merlin6 cluster"
sidebar: merlin6_sidebar
permalink: /merlin6/openmpi.html
---

Introduction

This document describes which OpenMPI versions available in PModules are supported in the Merlin6 cluster.

srun

We strongly recommend the use of 'srun' over 'mpirun' or 'mpiexec'. 'srun' properly binds tasks to cores and requires little extra configuration, while 'mpirun' and 'mpiexec' may need more advanced configuration and should only be used by advanced users. Please always adapt your scripts to use 'srun' before opening a support ticket, and please contact us about any problem you encounter when using a module.

Example:

srun ./app

{{site.data.alerts.tip}} Always run OpenMPI with the srun command. The only exception is for advanced users; even then, srun is still recommended. {{site.data.alerts.end}}
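As a reference, below is a minimal sketch of a Slurm batch script launching an MPI application with srun; the module name uses the <version> placeholder and the task count is only illustrative, so adapt both to your job:

#!/bin/bash
#SBATCH --ntasks=4                     # number of MPI tasks (adapt to your job)

# Load a Slurm-aware OpenMPI build (see "Supported OpenMPI versions" below)
module load openmpi/<version>_slurm

# srun starts one MPI rank per Slurm task and takes care of the binding
srun ./app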

OpenMPI with UCX

OpenMPI supports UCX starting from version 3.0, but it is recommended to use version 4.0 or higher due to stability and performance improvements. UCX should be used only by advanced users, as it requires running with mpirun (which needs advanced knowledge) and is the exception where MPI is run without srun (UCX is not integrated with srun at PSI).

For running UCX, one should:

  • add the following options to mpirun:
    -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1
    
  • or, alternatively, export the following environment variables before calling mpirun:
    export OMPI_MCA_pml="ucx"
    export OMPI_MCA_btl="^vader,tcp,openib,uct"
    export UCX_NET_DEVICES=mlx5_0:1
    

In addition, one can add the following options for debugging purposes (visit UCX Logging for possible UCX_LOG_LEVEL values):

-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>

These settings can also be exported before the mpirun call, as in the second example below. Full examples (a complete batch-script sketch follows this list):

  • Within the mpirun command:
    mpirun -np $SLURM_NTASKS -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_LOG_LEVEL=data -x UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log ./app
    
  • Outside the mpirun command:
    export OMPI_MCA_pml="ucx"
    export OMPI_MCA_btl="^vader,tcp,openib,uct"
    export UCX_NET_DEVICES=mlx5_0:1
    export UCX_LOG_LEVEL=data
    export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log
    
    mpirun -np $SLURM_NTASKS ./app
    

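Putting the pieces together, the following is a minimal batch-script sketch for the UCX case; it assumes a Slurm-aware OpenMPI build (the <version> placeholder) and the mlx5_0:1 device used above:

#!/bin/bash
#SBATCH --ntasks=4                          # number of MPI tasks (adapt to your job)

module load openmpi/<version>_slurm         # placeholder; pick an installed version

# Select the UCX PML and disable the conflicting BTL components
export OMPI_MCA_pml="ucx"
export OMPI_MCA_btl="^vader,tcp,openib,uct"
export UCX_NET_DEVICES=mlx5_0:1

# Optional: UCX debugging output
export UCX_LOG_LEVEL=data
export UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log

mpirun -np $SLURM_NTASKS ./app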
Supported OpenMPI versions

For running OpenMPI properly in a Slurm batch system, OpenMPI must be compiled with support for the corresponding Slurm installation.

A large number of OpenMPI builds can be found in the PModules central repositories. However, only some of them are suitable for running in a Slurm cluster: any OpenMPI version with the suffix _slurm is suitable for running in the Merlin6 cluster. OpenMPI versions with the suffix _merlin6 can also be used, but these will eventually be fully replaced by the _slurm series (which can be used on any Slurm cluster at PSI). Please avoid using any other OpenMPI releases.

{{site.data.alerts.tip}} Suitable OpenMPI versions for running in the Merlin6 cluster:

- openmpi/<version>_slurm [Recommended]
- openmpi/<version>_merlin6

{{site.data.alerts.end}}
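As an illustration, a suitable build can be located and loaded as follows; <version> is a placeholder, and the module search command is used here to list the builds actually installed in PModules:

module search openmpi                  # list the OpenMPI builds available in PModules
module load openmpi/<version>_slurm    # pick a build with the _slurm suffix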

'unstable' repository

New OpenMPI versions that need to be tested will be compiled first in the unstable repository and, once validated, will be moved to stable. We cannot guarantee that modules in that repository are production ready, but you can use them at your own risk.

For using unstable modules, you might need to load the unstable PModules repository as follows:

module use unstable
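For instance, testing a not-yet-validated build would look as follows (version placeholder, for illustration only):

module use unstable                    # make the unstable repository visible
module load openmpi/<version>_slurm    # load the build to be tested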

'stable' repository

Officially supported OpenMPI versions (https://www.open-mpi.org/) will be available in the stable repository (which is the default loaded repository). For further information, please check https://www.open-mpi.org/software/ompi/ -> Current & Still Supported versions.

Usually, no more than two minor update releases will be present in the stable repository. Older minor update releases will be moved to deprecated even if they are still officially supported. This ensures that users compile new software with the latest stable versions, while the older versions remain available for software that was compiled with them.

'deprecated' repository

Old OpenMPI versions (that is, any official OpenMPI version which has been moved to retired or ancient) will be moved to the 'deprecated' PModules repository. For further information, please check https://www.open-mpi.org/software/ompi/ -> 'Older Versions'.

Also, as mentioned above, older officially supported OpenMPI releases (minor updates) will be moved to deprecated.

For using deprecated modules, you might need to load the deprecated PModules repository as follows:

module use deprecated

However, this is usually not needed: when directly loading a specific version that is not found in stable, PModules will try to fall back to the other repositories (deprecated or unstable).
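For example, loading an old build directly, without 'module use deprecated', should work thanks to the fallback described above (version placeholder):

module load openmpi/<version>_slurm    # if not found in stable, PModules falls back to deprecated/unstable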

About missing versions

Missing OpenMPI versions

For legacy software, some users might require a different OpenMPI version. We always encourage users to first try one of the existing stable versions (always an OpenMPI build with the suffix _slurm or _merlin6!), as they contain the latest bug fixes and usually work. Failing that, you can also try the versions in the deprecated repository (again, always with the suffix _slurm or _merlin6!). For very old software based on OpenMPI v1, you can follow the guide FAQ: Removed MPI constructs, which provides some easy steps for migrating from OpenMPI v1 to v2 or later and is also useful for finding out why your code does not compile properly.

If, after trying the mentioned versions and guide, you are still facing problems, please contact us. Also, please contact us if you require a newer version built with a different gcc or Intel compiler (for example, Intel v19).