---
title: Hardware And Software Description
last_updated: 13 June 2019
sidebar: merlin6_sidebar
permalink: /merlin6/hardware-and-software.html
---
## Hardware

### Computing Nodes
The new Merlin6 cluster is based on four HPE Apollo k6000 chassis:

- Three of them contain 24 x HP Apollo XL230K Gen10 blades each.
- A fourth chassis was purchased in 2021 with HP Apollo XL230K Gen10 blades dedicated to a few experiments. These blades have slightly different components, depending on the specific project requirements.

The connectivity of the Merlin6 cluster is based on ConnectX-5 EDR-100Gbps, and each chassis contains:

- 1 x HPE Apollo InfiniBand EDR 36-port Unmanaged Switch
  - 24 internal EDR-100Gbps ports (1 port per blade, for internal low latency connectivity)
  - 12 external EDR-100Gbps ports (for external low latency connectivity)
**Merlin6 CPU Computing Nodes**

| Chassis | Node              | Processor             | Sockets | Cores | Threads | Scratch | Memory |
|---------|-------------------|-----------------------|---------|-------|---------|---------|--------|
| #0      | merlin-c-0[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
| #1      | merlin-c-1[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
| #2      | merlin-c-2[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
| #3      | merlin-c-3[01-12] | Intel Xeon Gold 6240R | 2       | 48    | 2       | 1.2TB   | 768GB  |
|         | merlin-c-3[13-18] |                       |         |       | 1       |         |        |
|         | merlin-c-3[19-24] |                       |         |       | 2       |         | 384GB  |
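As a quick sanity check, the per-node resources in the table above can be compared against what Slurm reports. The following is a minimal sketch, assuming the Slurm client tools are available on a login node and that node names follow the `merlin-c-XYZ` pattern listed above (`merlin-c-001` is used here only as an example name):

```bash
# Expand a node range from the table into individual hostnames
scontrol show hostnames "merlin-c-0[01-24]"

# Show the resources (sockets, cores, threads, memory) Slurm knows about for one node
scontrol show node merlin-c-001

# Compact overview of all nodes: hostname, CPUs and memory (in MB)
sinfo -N -o "%N %c %m"
```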
### Login Nodes

One older login node (`merlin-l-01.psi.ch`) is inherited from the previous Merlin5 cluster. It is mainly used for running some BIO services (`cryosparc`) and for submitting jobs.

Two newer login nodes (`merlin-l-001.psi.ch`, `merlin-l-002.psi.ch`), with a configuration similar to the Merlin6 computing nodes, are available to users. They are mainly used for compiling software and submitting jobs.

The connectivity is based on ConnectX-5 EDR-100Gbps for the new login nodes, and on ConnectIB FDR-56Gbps for the old one.
**Merlin6 Login Nodes**

| Hardware | Node             | Processor             | Sockets | Cores | Threads | Scratch | Memory |
|----------|------------------|-----------------------|---------|-------|---------|---------|--------|
| Old      | merlin-l-01      | Intel Xeon E5-2697AV4 | 2       | 16    | 2       | 100GB   | 512GB  |
| New      | merlin-l-00[1,2] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.8TB   | 384GB  |
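Once logged in, it is easy to verify which login node you landed on and whether its resources match the table above. This is a minimal sketch using standard Linux tools; the `/scratch` path is an assumption about where the local scratch area is mounted:

```bash
# Which login node am I on?
hostname

# CPU layout: sockets, cores per socket and threads per core
lscpu | grep -E 'Socket|Core|Thread'

# Installed memory
free -h

# Local scratch space (assuming it is mounted under /scratch)
df -h /scratch
```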
### Storage

The storage is based on the Lenovo Distributed Storage Solution for IBM Spectrum Scale:

- 2 x Lenovo DSS G240 systems, each composed of 2 ThinkSystem SR650 I/O nodes attached to 4 x Lenovo Storage D3284 High Density Expansion Enclosures.
- Each I/O node has a connectivity of 400Gbps (4 x EDR 100Gbps ports, 2 of them ConnectX-5 and 2 ConnectX-4).

The storage solution is connected to the HPC clusters through 2 x Mellanox SB7800 InfiniBand 1U switches for high availability and load balancing.
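Users can check that the Spectrum Scale (GPFS) file systems are mounted and see their overall capacity with standard tools. A minimal sketch; it only assumes that the file systems appear with type `gpfs` on the node you are on:

```bash
# List all mounted GPFS (Spectrum Scale) file systems
mount -t gpfs

# Show capacity and usage of the GPFS mounts only
df -h -t gpfs
```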
### Network

Merlin6 cluster connectivity is based on InfiniBand technology. This allows fast, very low latency access to the data, as well as running MPI-based jobs very efficiently:

- Connectivity amongst computing nodes in different chassis provides up to 1200Gbps of aggregated bandwidth (the 12 external EDR ports per chassis at 100Gbps each).
- Connectivity amongst computing nodes within the same chassis provides up to 2400Gbps of aggregated bandwidth (the 24 internal EDR ports at 100Gbps each).
- Communication to the storage provides up to 800Gbps of aggregated bandwidth.
The Merlin6 cluster currently contains 5 InfiniBand managed switches and 3 InfiniBand unmanaged switches (one per HP Apollo chassis):

- 1 x MSX6710 (FDR), connecting the old GPU nodes, the old login nodes and the MeG cluster to the Merlin6 cluster (and its storage). No high availability mode is possible here.
- 2 x MSB7800 (EDR), connecting the login nodes, the storage and other nodes in high availability mode.
- 3 x HP EDR unmanaged switches, each one embedded in an HP Apollo k6000 chassis.
- 2 x MSB7700 (EDR) as the top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
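On any node it is possible to verify the InfiniBand link type and rate that this topology provides. A minimal sketch, assuming the `infiniband-diags` tools are installed (they ship with MLNX_OFED):

```bash
# Show the state and rate of the local InfiniBand ports
# (an EDR link should report 100 Gb/sec, an FDR link 56 Gb/sec)
ibstatus

# More detailed view of the local HCA (firmware, port state, rate)
ibstat
```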
## Software

On Merlin6 we try to keep the software stack up to date in order to benefit from the latest features and improvements. Hence, Merlin6 runs:

- Red Hat Enterprise Linux 7
- Slurm, which we usually keep up to date with the most recent releases.
- GPFS v5
- MLNX_OFED LTS v.5.2-2.2.0.0 or newer for all ConnectX-5 or newer cards.
- MLNX_OFED LTS v.4.9-2.2.4.0 for the remaining ConnectX-3 and ConnectIB cards.
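The exact versions installed on a node can be checked as follows. A minimal sketch; `ofed_info` comes with MLNX_OFED, and the `gpfs.base` package name is an assumption about how Spectrum Scale is packaged on these nodes:

```bash
# Operating system release
cat /etc/redhat-release

# Slurm version
sinfo --version

# MLNX_OFED version (short form)
ofed_info -s

# GPFS / Spectrum Scale version (assuming the gpfs.base RPM is installed)
rpm -q gpfs.base
```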