update slurm infra info

viessm_h 2024-11-21 16:01:26 +01:00
parent 5fc3e79c4d
commit 358132a5c6
Signed by: viessm_h
GPG Key ID: 0C24C120CDED56F0
2 changed files with 46 additions and 35 deletions

@ -0,0 +1,46 @@
---
title: Slurm cluster 'merlin7'
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 Mai 2023
summary: "This document describes a summary of the Merlin7 configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-configuration.html
---
![Work In Progress](/images/WIP/WIP1.webp){:style="display:block; margin-left:auto; margin-right:auto"}
{{site.data.alerts.warning}}The Merlin7 documentation is <b>Work In Progress</b>.
Please do not use or rely on this documentation until this becomes official.
This applies to any page under <b><a href="https://lsm-hpce.gitpages.psi.ch/merlin7/">https://lsm-hpce.gitpages.psi.ch/merlin7/</a></b>
{{site.data.alerts.end}}
This documentation describes the basic Slurm configuration and the options needed to run jobs on the Merlin7 cluster.
## Infrastructure
### Hardware
The current configuration for the _preproduction_ phase (and likely the production phase) consists of:
* 92 nodes in total for Merlin7:
  * 2 CPU-only login nodes
  * 77 CPU-only compute nodes
  * 5 GPU A100 nodes
  * 8 GPU Grace Hopper nodes
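
As a quick consistency check of the list above, the per-type node counts can be summed to confirm the stated total. This is only an illustrative sketch: the dictionary keys are labels chosen here for readability, not Slurm node or partition names.

```python
# Node counts as listed above for the Merlin7 preproduction phase.
# The keys are descriptive labels only (hypothetical, not Slurm names).
node_counts = {
    "CPU-only login": 2,
    "CPU-only compute": 77,
    "GPU A100": 5,
    "GPU Grace Hopper": 8,
}

total_nodes = sum(node_counts.values())
gpu_nodes = node_counts["GPU A100"] + node_counts["GPU Grace Hopper"]

print(f"total nodes: {total_nodes}")
print(f"GPU nodes:   {gpu_nodes}")
```

The sum matches the 92 nodes quoted for Merlin7, of which 13 carry GPUs.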
The specification of the node types is:
| Node | CPU | RAM | GRES | Notes |
| ---- | --- | --- | ---- | ----- |
| Multi-core node | _2x_ AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) | 512GB DDR4 3200MHz | | For both the login and CPU-only compute nodes |
| A100 node | _2x_ AMD EPYC 7713 (x86_64 Milan, 64 Cores, 3.2GHz) | 512GB DDR4 3200MHz | _4x_ NVIDIA A100 (Ampere, 80GB) | |
| GH node | _2x_ NVIDIA Grace Neoverse-V2 (SBSA ARM 64bit, 144 Cores, 3.1GHz) | _2x_ 480GB DDR5X (CPU + GPU) | _4x_ NVIDIA GH200 (Hopper, 120GB) | |
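
To give a rough sense of scale, the table can be combined with the node counts to estimate aggregate resources. The sketch below assumes that "64 Cores" in the table is a per-socket figure (so a dual-socket node has 128 cores) and that every node of a given type is identical; both are assumptions made here, not statements from the cluster configuration. The variable names are illustrative.

```python
# Rough aggregate figures derived from the hardware table.
# Assumption: "64 Cores" is per socket, and each x86_64 node has 2 sockets.
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64
cores_per_x86_node = SOCKETS_PER_NODE * CORES_PER_SOCKET

cpu_compute_nodes = 77   # CPU-only compute nodes
a100_nodes = 5           # nodes with 4x A100 each
gh_nodes = 8             # nodes with 4x GH200 each
gpus_per_gpu_node = 4    # GRES column: 4 GPUs per GPU node

cpu_only_cores = cpu_compute_nodes * cores_per_x86_node
a100_gpus = a100_nodes * gpus_per_gpu_node
gh200_gpus = gh_nodes * gpus_per_gpu_node
a100_memory_gb = a100_gpus * 80  # 80GB per A100, from the table

print(f"cores per x86_64 node:        {cores_per_x86_node}")
print(f"cores on CPU-only compute:    {cpu_only_cores}")
print(f"A100 GPUs / total GPU memory: {a100_gpus} / {a100_memory_gb} GB")
print(f"GH200 GPUs:                   {gh200_gpus}")
```

These numbers are upper bounds on what is physically installed; what Slurm actually exposes to jobs depends on the cluster configuration.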
### Network
The Merlin7 cluster builds on top of HPE/Cray technologies, including a high-performance network fabric called Slingshot. This network fabric is able
to provide up to 200 Gbit/s throughput between nodes. Further information on Slingshot can be found at <https://www.glennklockwood.com/garden/slingshot>.
Through software interfaces like [libfabric](https://ofiwg.github.io/libfabric/) (which is available on Merlin7), applications can leverage the network seamlessly.
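
To put the 200 Gbit/s figure into perspective, the following sketch converts it to bytes per second and estimates an idealized node-to-node transfer time. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead, congestion and storage speed, so real transfers will be slower. The function name and the 100 GB example size are hypothetical.

```python
# Idealized transfer-time estimate for the peak throughput quoted above.
# Assumptions: decimal units (1 GB = 1e9 bytes), no protocol overhead.
LINK_GBIT_PER_S = 200  # peak throughput between nodes, in Gbit/s


def ideal_transfer_seconds(size_gb: float,
                           link_gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Return the best-case time in seconds to move size_gb gigabytes."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8  # 8 bits per byte
    return size_gb * 1e9 / bytes_per_second


peak_gb_per_s = LINK_GBIT_PER_S / 8
seconds_for_100_gb = ideal_transfer_seconds(100)

print(f"peak throughput: {peak_gb_per_s} GB/s")
print(f"100 GB at peak:  {seconds_for_100_gb} s")
```

At peak, the link moves 25 GB/s, so a 100 GB dataset would need at least 4 seconds to cross between two nodes.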

@ -1,35 +0,0 @@
---
title: Slurm cluster 'merlin7'
#tags:
keywords: configuration, partitions, node definition
last_updated: 24 Mai 2023
summary: "This document describes a summary of the Merlin7 configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-configuration.html
---
![Work In Progress](/images/WIP/WIP1.webp){:style="display:block; margin-left:auto; margin-right:auto"}
{{site.data.alerts.warning}}The Merlin7 documentation is <b>Work In Progress</b>.
Please do not use or rely on this documentation until this becomes official.
This applies to any page under <b><a href="https://lsm-hpce.gitpages.psi.ch/merlin7/">https://lsm-hpce.gitpages.psi.ch/merlin7/</a></b>
{{site.data.alerts.end}}
This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.
### Infrastructure
#### Hardware
The current configuration for the _preproduction_ phase is made up as:
* nodes for the _PSI-Dev_ development system
* 2 CPU-only login nodes
* 77 CPU-only compute nodes
* 4 GPU nodes
| Node | CPU | RAM | GRES | Notes |
| ---- | --- | --- | ---- | ----- |
| Login node | _2x_ AMD EPYC 7742 (x86_64 Rome, 64 Cores, 3.2GHz) | 512GB DRR4 3200Mhz | | |
| CPU node | _2x_ AMD EPYC 7742 (x86_64 Rome, 64 Cores, 3.2GHz) | 512GB DRR4 3200Mhz | | |
| GPU node | _2x_ AMD EPYC 7713 (x86_64 Milan, 64 Cores, 3.2GHz) | 512GB DDR4 3200Mhz | _4x_ NVidia A100 (Ampere, 80GB) | |