---
title: Introduction
#tags:
keywords: introduction, home, welcome, architecture, design
last_updated: 07 September 2022
sidebar: merlin7_sidebar
permalink: /merlin7/introduction.html
redirect_from:
- /merlin7
- /merlin7/index.html
---
![Work In Progress](/images/WIP/WIP1.webp){:style="display:block; margin-left:auto; margin-right:auto"}
{{site.data.alerts.warning}}The Merlin7 documentation is <b>Work In Progress</b> as the system is still evolving.
{{site.data.alerts.end}}
## About Merlin7
The Merlin7 cluster has been in **preproduction** state since August 2024. We are moving the system towards production from January 2025 onwards; the schedule for migrating users and communities depends on the resolution of some remaining issues on the platform. You will be notified well in advance about the migration of your data.
All PSI users can request access to Merlin7.
In case you identify errors or missing information, please provide feedback through the [merlin-admins mailing list](mailto:merlin-admins@lists.psi.ch) or [submit a ticket via the PSI service portal](https://psi.service-now.com/psisp).
## Infrastructure
### Hardware
The Merlin7 cluster consists of the following node types:

| Node | # Nodes | CPU | RAM | GPU | GPUs/node |
| ----: | ---: | --- | --- | ----: | ---: |
| Login | 2 | 2× AMD EPYC 7742 (64 cores, 2.25 GHz) | 512 GB | | |
| CPU | 77 | 2× AMD EPYC 7742 (64 cores, 2.25 GHz) | 512 GB | | |
| GPU A100 | 8 | 2× AMD EPYC 7713 (64 cores, 3.2 GHz) | 512 GB | A100 80 GB | 4 |
| GPU GH | 5 | NVIDIA Grace (Arm Neoverse V2, 144 cores, 3.1 GHz) | 864 GB (unified) | GH200 120 GB | 4 |
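
The core count and memory reported inside a job can be sanity-checked against this table with standard POSIX calls. The following is a minimal C sketch, not a Merlin7-specific tool; the expected values in the comments are taken from the table above.

```c
/* Minimal sketch: report the logical CPU count and physical RAM of the
 * current node, for comparison against the hardware table above. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long cpus      = sysconf(_SC_NPROCESSORS_ONLN); /* online logical CPUs */
    long pages     = sysconf(_SC_PHYS_PAGES);       /* physical memory pages */
    long page_size = sysconf(_SC_PAGE_SIZE);        /* bytes per page */

    /* A CPU node (2x EPYC 7742, 64 cores each) has 128 physical cores,
     * so it reports 128 or 256 logical CPUs depending on SMT settings,
     * and roughly 512 GB of RAM. */
    printf("logical CPUs: %ld\n", cpus);
    printf("RAM: %.1f GiB\n",
           (double)pages * page_size / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```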
### Network
The Merlin7 cluster builds on top of HPE/Cray technologies, including a high-performance network fabric called Slingshot, which provides up to 200 Gbit/s of throughput between nodes. Further information on Slingshot can be found at [HPE](https://www.hpe.com/psnow/doc/PSN1012904596HREN) and
at <https://www.glennklockwood.com/garden/slingshot>.
Through software interfaces such as [libFabric](https://ofiwg.github.io/libfabric/), which is available on Merlin7, applications can leverage the network seamlessly.
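
As a small illustration, libFabric can be queried directly for the providers and fabrics it sees on a node. This is a minimal C sketch assuming a standard libfabric installation; on a Slingshot system one would typically expect the HPE `cxi` provider among the results, but treat that expectation as an assumption rather than a guarantee.

```c
/* Minimal sketch: list the libfabric providers visible on this node.
 * Compile with: cc fi_list.c -lfabric -o fi_list */
#include <stdio.h>
#include <rdma/fabric.h>

int main(void) {
    struct fi_info *info = NULL;

    /* NULL hints: ask for every provider/fabric combination available. */
    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, NULL, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo failed: %s\n", fi_strerror(-ret));
        return 1;
    }
    for (struct fi_info *cur = info; cur; cur = cur->next)
        printf("provider=%s fabric=%s\n",
               cur->fabric_attr->prov_name, cur->fabric_attr->name);

    fi_freeinfo(info);
    return 0;
}
```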
### Storage
Unlike previous iterations of the Merlin HPC clusters, Merlin7 _does not_ have any local storage. Instead, storage for the entire cluster is provided through
a dedicated storage appliance from HPE/Cray called [ClusterStor](https://www.hpe.com/psnow/doc/PSN1012842049INEN.pdf).
The appliance is built from several storage servers:
* 2 management nodes
* 2 MDS servers, 12 drives per server, 2.9 TiB (RAID 10)
* 8 OSS-D servers, 106 drives per server, 14.5 TiB HDDs (GridRAID/RAID 6)
* 4 OSS-F servers, 12 drives per server, 7 TiB SSDs (RAID 10)
This yields an effective storage capacity of:
* 10 PB of HDD storage (visible on Linux as 9302.4 TiB)
* 162 TB of SSD storage (visible on Linux as 151.6 TiB)
* 23.6 TiB for metadata
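
The two sets of numbers differ mainly in units: Linux tools report binary units (1 TiB = 2^40 bytes), while the nominal capacities are decimal (1 PB = 10^15 bytes). For example, 9302.4 TiB × 2^40 bytes/TiB ≈ 1.02 × 10^16 bytes ≈ 10 PB.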
The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.
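
ClusterStor is a Lustre-based appliance; the MDS and OSS servers listed above are the Lustre metadata and object storage servers. A file's layout across the OSS servers can therefore be controlled with the usual Lustre tooling. Below is a minimal C sketch using the Lustre API (`lustreapi`); the path and stripe values are illustrative assumptions, not site defaults.

```c
/* Minimal sketch: create a file with an explicit Lustre stripe layout.
 * Compile with: cc stripe.c -llustreapi -o stripe
 * The path and stripe parameters are hypothetical examples. */
#include <stdio.h>
#include <lustre/lustreapi.h>

int main(void) {
    int rc = llapi_file_create("/data/user/example.dat", /* hypothetical path */
                               4ULL << 20, /* stripe_size: 4 MiB per stripe */
                               -1,         /* stripe_offset: any starting OST */
                               8,          /* stripe_count: spread over 8 OSTs */
                               0);         /* stripe_pattern: 0 = default */
    if (rc < 0) {
        fprintf(stderr, "llapi_file_create failed: %d\n", rc);
        return 1;
    }
    printf("created striped file\n");
    return 0;
}
```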