---
title: Slurm cluster 'merlin7'
keywords: configuration, partitions, node definition
summary: "This document describes a summary of the Merlin7 configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-configuration.html
---
{{site.data.alerts.warning}}The Merlin7 documentation is Work In Progress.
Please do not use or rely on this documentation until this becomes official.
This applies to any page under https://lsm-hpce.gitpages.psi.ch/merlin7/
{{site.data.alerts.end}}
This documentation describes the basic Slurm configuration and the options needed to run jobs on the Merlin7 cluster.
## Infrastructure

### Hardware
The current configuration for the preproduction phase (and likely also the production phase) consists of:
- 92 nodes in total for Merlin7:
- 2 CPU-only login nodes
- 77 CPU-only compute nodes
- 5 GPU A100 nodes
- 8 GPU Grace Hopper nodes
The specification of the node types is:
Node | CPU | RAM | GRES | Notes |
---|---|---|---|---|
Multi-core node | 2x AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) | 512GB DDR4 3200MHz | | For both the login and CPU-only compute nodes |
A100 node | 2x AMD EPYC 7713 (x86_64 Milan, 64 Cores, 3.2GHz) | 512GB DDR4 3200MHz | 4x NVidia A100 (Ampere, 80GB) | |
GH node | 2x NVidia Grace Neoverse-V2 (SBSA ARM 64bit, 144 Cores, 3.1GHz) | 2x 480GB DDR5X (CPU + GPU) | 4x NVidia GH200 (Hopper, 120GB) | |
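A job can target one of these node types by requesting the corresponding resources in a batch script. The following is a minimal sketch using Slurm's standard `sbatch` directives; the partition name `gpu` and the application name are placeholders, not confirmed Merlin7 values (check `sinfo` for the actual partition names):

```shell
#!/bin/bash
#SBATCH --job-name=a100-test
#SBATCH --partition=gpu        # hypothetical partition name; run `sinfo` to list the real ones
#SBATCH --gres=gpu:2           # request 2 of the 4 A100 GPUs on a node
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=01:00:00

srun ./my_gpu_application      # placeholder application
```

Submitting with `sbatch job.sh` queues the job; `squeue -u $USER` shows its state.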
### Network
The Merlin7 cluster builds on top of HPE/Cray technologies, including the Slingshot high-performance network fabric, which provides up to 200 Gbit/s of throughput between nodes. Further information on Slingshot can be found at https://www.glennklockwood.com/garden/slingshot.
Through software interfaces such as libFabric (which is available on Merlin7), applications can leverage the network seamlessly.
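To inspect which libFabric providers are available on a node, the `fi_info` utility that ships with libfabric can be used; on Slingshot systems the relevant provider is `cxi`. A sketch, assuming libfabric is provided as an environment module (the module name is an assumption; adjust to what `module avail` reports):

```shell
# Load the libfabric environment (module name is an assumption)
module load libfabric

# List the fabric providers libfabric was built with
fi_info -l

# Show details for the Slingshot provider, if present on this node
fi_info -p cxi
```

If `cxi` does not appear in the output, the job is likely running on a node or in an environment without Slingshot access.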