expand merlin7 hardware description

2024-11-21 16:12:39 +01:00
parent 358132a5c6
commit 8695c0dc42


@@ -40,7 +40,29 @@ The specification of the node types is:
### Network
The Merlin7 cluster builds on top of HPE/Cray technologies, including a high-performance network fabric called Slingshot. This network fabric is able
to provide up to 200 Gbit/s throughput between nodes. Further information on Slingshot can be found at [HPE](https://www.hpe.com/psnow/doc/PSN1012904596HREN) and
at <https://www.glennklockwood.com/garden/slingshot>.
Through software interfaces like [libfabric](https://ofiwg.github.io/libfabric/) (which is available on Merlin7), applications can leverage the network seamlessly.
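
To put the quoted link rate into more familiar units, the following minimal sketch converts 200 Gbit/s into GB/s and estimates an idealized transfer time. The function names are hypothetical and chosen for illustration; the estimate assumes a single link and ignores protocol overhead and contention, so real transfers will be slower.

```python
# Illustrative helper, not part of the Merlin7 tooling: converts the nominal
# Slingshot link rate into bytes per second and derives a best-case transfer time.

LINK_RATE_GBIT_S = 200  # nominal throughput between nodes, as quoted above


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to GB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s / 8


def ideal_transfer_seconds(size_gb: float, gbit_per_s: float = LINK_RATE_GBIT_S) -> float:
    """Lower bound on the time needed to move size_gb gigabytes over one link."""
    return size_gb / gbit_to_gbyte_per_s(gbit_per_s)


if __name__ == "__main__":
    print(f"{gbit_to_gbyte_per_s(LINK_RATE_GBIT_S):.1f} GB/s")
    print(f"{ideal_transfer_seconds(100):.1f} s to move 100 GB (best case)")
```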
### Storage
Unlike previous iterations of the Merlin HPC clusters, Merlin7 _does not_ have any local storage. Instead, storage for the entire cluster is provided through
a dedicated storage appliance from HPE/Cray called [ClusterStor](https://www.hpe.com/psnow/doc/PSN1012842049INEN.pdf).
The appliance consists of several storage servers:
* 2 management nodes
* 2 MDS servers, 12 drives per server, 2.9 TiB (RAID10)
* 8 OSS-D servers, 106 drives per server, 14.5 TB HDDs (GridRAID / RAID6)
* 4 OSS-F servers, 12 drives per server, 7 TiB SSDs (RAID10)
With an effective storage capacity of:
* 10 PB HDD
    * value visible on Linux: HDD 9302.4 TiB
* 162 TB SSD
    * value visible on Linux: SSD 151.6 TiB
* 23.6 TiB for metadata
The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.
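
The capacities above mix decimal units (TB, PB) with the binary units (TiB) reported on Linux. The sketch below shows how the two relate; the helper names are hypothetical, and the input values are simply the Linux-visible figures listed above.

```python
# Illustrative unit conversion between binary (TiB) and decimal (TB, PB) units.
# Helper names are made up for this example.

TIB = 2**40   # bytes in one tebibyte
TB = 10**12   # bytes in one terabyte
PB = 10**15   # bytes in one petabyte


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes to terabytes."""
    return tib * TIB / TB


def tib_to_pb(tib: float) -> float:
    """Convert tebibytes to petabytes."""
    return tib * TIB / PB


if __name__ == "__main__":
    # Linux-visible values from the list above
    print(f"HDD:      9302.4 TiB = {tib_to_pb(9302.4):.2f} PB")
    print(f"SSD:       151.6 TiB = {tib_to_tb(151.6):.1f} TB")
    print(f"Metadata:   23.6 TiB = {tib_to_tb(23.6):.1f} TB")
```

One TiB is about 10% larger than one TB, so a value reported in TiB always looks smaller than the same capacity expressed in TB.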