---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/hardware-and-software.html
---

## Hardware

### Computing Nodes

The new Merlin6 cluster is based on **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw):

* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades each.
* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. These blades have slightly different components depending on specific project requirements.

The connectivity for the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:

* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
  * 24 internal EDR-100Gbps ports (1 port per blade, for internal low latency connectivity)
  * 12 external EDR-100Gbps ports (for external low latency connectivity)
**Merlin6 CPU Computing Nodes**

| Chassis | Node              | Processor             | Sockets | Cores | Threads | Scratch | Memory |
|---------|-------------------|-----------------------|---------|-------|---------|---------|--------|
| #0      | merlin-c-0[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
| #1      | merlin-c-1[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
| #2      | merlin-c-2[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
| #3      | merlin-c-3[01-12] | Intel Xeon Gold 6240R | 2       | 48    | 2       | 1.2TB   | 768GB  |
| #3      | merlin-c-3[13-18] | Intel Xeon Gold 6240R | 2       | 48    | 1       | 1.2TB   | 768GB  |
| #3      | merlin-c-3[19-24] | Intel Xeon Gold 6240R | 2       | 48    | 2       | 1.2TB   | 384GB  |
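
The socket, core, thread and memory figures above can also be cross-checked directly from Slurm. The following is a minimal sketch, assuming the standard Slurm client tools are available on a login node and using the node naming scheme from the table (e.g. `merlin-c-001`):

```bash
# One line per compute node, including CPU count, memory and state
sinfo --Node --long

# Detailed view of a single node: Sockets, CoresPerSocket, ThreadsPerCore, RealMemory
scontrol show node merlin-c-001
```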
Each blade contains an NVMe disk, where up to 300GB are dedicated to the O.S. and ~1.2TB are reserved for local `/scratch`.

### Login Nodes

*One old login node* (``merlin-l-01.psi.ch``) is inherited from the previous Merlin5 cluster. It is mainly used for running some BIO services (`cryosparc`) and for submitting jobs.

*Two new login nodes* (``merlin-l-001.psi.ch``, ``merlin-l-002.psi.ch``) with a configuration similar to the Merlin6 computing nodes are available to the users. They are mainly used for compiling software and submitting jobs.

The connectivity is based on **ConnectX-5 EDR-100Gbps** for the new login nodes, and **ConnectIB FDR-56Gbps** for the old one.
**Merlin6 Login Nodes**

| Hardware | Node             | Processor             | Sockets | Cores | Threads | Scratch | Memory |
|----------|------------------|-----------------------|---------|-------|---------|---------|--------|
| Old      | merlin-l-01      | Intel Xeon E5-2697AV4 | 2       | 16    | 2       | 100GB   | 512GB  |
| New      | merlin-l-00[1,2] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.8TB   | 384GB  |
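
As an illustration, after logging in to one of the new login nodes you can verify the local scratch and CPU layout listed above. This is a sketch, assuming a valid PSI account and that the local scratch is mounted under `/scratch` as on the compute nodes:

```bash
# Connect to one of the new login nodes
ssh $USER@merlin-l-001.psi.ch

# Check the local scratch space (~1.8TB on the new login nodes)
df -h /scratch

# Check sockets, cores per socket and threads per core
lscpu | grep -E 'Socket|Core|Thread'
```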
### Storage

The storage solution is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5):

* 2 x **Lenovo DSS G240** systems, each one composed of 2 **ThinkSystem SR650** IO nodes mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
* Each IO node has a connectivity of 400Gbps (4 x EDR 100Gbps ports, 2 of them **ConnectX-5** and 2 of them **ConnectX-4**).

The storage solution is connected to the HPC clusters through 2 x **Mellanox SB7800 InfiniBand 1U switches** for high availability and load balancing.

### Network

Merlin6 cluster connectivity is based on [**InfiniBand**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access to the data with very low latency, as well as running extremely efficient MPI-based jobs:

* Connectivity amongst computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
* Intra-chassis connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.

The Merlin6 cluster currently contains 5 InfiniBand managed switches and 3 InfiniBand unmanaged switches (one per HP Apollo chassis):

* 1 x **MSX6710** (FDR) for connecting old GPU nodes, old login nodes and the MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
* 2 x **MSB7800** (EDR) for connecting login nodes, storage and other nodes in High Availability mode.
* 3 x **HP EDR Unmanaged** switches, each one embedded in an HP Apollo k6000 chassis.
* 2 x **MSB7700** (EDR) as top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).

## Software

In Merlin6 we try to keep the software stack up to date in order to benefit from the latest features and improvements. Therefore, **Merlin6** runs:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
* [**MLNX_OFED LTS v.5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-5** or superior cards.
* [MLNX_OFED LTS v.4.9-2.2.4.0](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for the remaining **ConnectX-3** and **ConnectIB** cards.
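
The versions listed above can be inspected from a shell on any Merlin6 node. The sketch below assumes the standard RHEL, Slurm and Mellanox OFED client tools are installed; the GPFS package name `gpfs.base` is an assumption and may differ on your node:

```bash
# Operating system release (RedHat Enterprise Linux 7.x)
cat /etc/redhat-release

# Slurm version reported by the client tools
sinfo --version

# MLNX_OFED driver version
ofed_info -s

# InfiniBand HCA model (ConnectX-5, ConnectIB, ...) and firmware version
ibv_devinfo | grep -E 'hca_id|fw_ver'

# GPFS (IBM Spectrum Scale) package version, if installed locally (assumed package name)
rpm -q gpfs.base
```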