---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/hardware-and-software.html
---

## Hardware

### Computing Nodes

The new Merlin6 cluster is based on a solution with **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw):

* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades each.
* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. These blades have slightly different components, depending on the specific project requirements.

The connectivity for the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:

* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw), with:
  * 24 internal EDR-100Gbps ports (1 port per blade, for low-latency connectivity inside the chassis)
  * 12 external EDR-100Gbps ports (for external low-latency connectivity)

<table>
<thead>
<tr>
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="8">Merlin6 CPU Computing Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Chassis</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
</tr>
</thead>
<tbody>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>#0</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-0[01-24]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>#1</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-1[01-24]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>#2</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-2[01-24]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="3"><b>#3</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-3[01-12]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="3"><a href="https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html">Intel Xeon Gold 6240R</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="3">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="3">48</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="3">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">768GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-3[13-18]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
</tr>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-3[19-24]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
</tbody>
</table>

Each blade contains an NVMe disk, of which up to 300GB are dedicated to the O.S. and ~1.2TB are reserved for the local `/scratch`.
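
For I/O-intensive jobs, this node-local NVMe `/scratch` is usually the fastest place for temporary files. The batch script below is only an illustrative sketch: the exact scratch layout and Slurm options on Merlin6 may differ, and the `/scratch/$USER/$SLURM_JOB_ID` path is an assumption, not a documented convention.

```bash
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Hypothetical per-job directory on the node-local NVMe scratch disk
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"
cd "$SCRATCHDIR"

# ... run I/O-intensive work here; /scratch is local to the node, not shared ...

# Copy results back to shared storage before cleanup (destination is a placeholder)
# cp results.dat "$HOME"/results/

# Clean up before the job ends
rm -rf "$SCRATCHDIR"
```
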
### Login Nodes

*One old login node* (``merlin-l-01.psi.ch``) is inherited from the previous Merlin5 cluster. It is mainly used for running some BIO services (`cryosparc`) and for submitting jobs.
*Two new login nodes* (``merlin-l-001.psi.ch``, ``merlin-l-002.psi.ch``), with a configuration similar to the Merlin6 computing nodes, are available to users. They are mainly used for compiling software and submitting jobs.

The connectivity is based on **ConnectX-5 EDR-100Gbps** for the new login nodes, and on **ConnectIB FDR-56Gbps** for the old one.
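
As a quick sanity check (a sketch only; tool availability depends on the installed OFED stack), the standard `ibstat` utility reports the InfiniBand link rate of the node you are logged in to: 100 for EDR, 56 for FDR.

```bash
# Print the state of the local InfiniBand adapter(s).
# On the new login nodes (ConnectX-5 EDR) the "Rate" field should be 100;
# on the old login node (ConnectIB FDR) it should be 56.
ibstat

# A similar view through libibverbs:
ibv_devinfo
```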

<table>
<thead>
<tr>
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="8">Merlin6 Login Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Hardware</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
</tr>
</thead>
<tbody>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>Old</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-l-01</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/products/91768/Intel-Xeon-Processor-E5-2697A-v4-40M-Cache-2-60-GHz-">Intel Xeon E5-2697A v4</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">16</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">100GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">512GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>New</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-l-00[1,2]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.8TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
</tbody>
</table>

### Storage

The storage solution is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5):

* 2 x **Lenovo DSS G240** systems, each composed of 2 x **ThinkSystem SR650** I/O nodes mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
* Each I/O node has a connectivity of 400Gbps (4 x EDR-100Gbps ports; 2 of them are **ConnectX-5** and 2 are **ConnectX-4**).

The storage solution is connected to the HPC clusters through 2 x **Mellanox SB7800 InfiniBand 1U switches** for high availability and load balancing.
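
From a login or computing node, the Spectrum Scale (GPFS) filesystems mounted on that node can be inspected with standard commands; the `/data` path below is just a placeholder, not a documented Merlin6 mount point.

```bash
# List all filesystems of type gpfs currently mounted on this node
mount -t gpfs

# Check capacity and usage of a given GPFS mount point (placeholder path)
df -h /data
```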

### Network

The Merlin6 cluster connectivity is based on [**InfiniBand**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast, very low-latency access to the data, as well as running extremely efficient MPI-based jobs:

* Inter-chassis connectivity (communication amongst computing nodes in different chassis) ensures up to 1200Gbps of aggregated bandwidth (12 external EDR-100Gbps ports per chassis).
* Intra-chassis connectivity (communication amongst computing nodes within the same chassis) ensures up to 2400Gbps of aggregated bandwidth (24 internal EDR-100Gbps ports per chassis).
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.

The Merlin6 cluster currently contains 5 InfiniBand managed switches and 3 InfiniBand unmanaged switches (one per HP Apollo chassis):

* 1 x **MSX6710** (FDR), for connecting the old GPU nodes, the old login nodes and the MeG cluster to the Merlin6 cluster (and storage). No High Availability mode is possible.
* 2 x **MSB7800** (EDR), for connecting the login nodes, the storage and other nodes in High Availability mode.
* 3 x **HP EDR Unmanaged** switches, one embedded in each HP Apollo k6000 chassis.
* 2 x **MSB7700** (EDR) as top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
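
The fabric described above can be explored with the standard InfiniBand diagnostic tools shipped with OFED (a sketch only; these commands typically require root privileges and a reachable subnet manager):

```bash
# Enumerate all switches visible on the InfiniBand fabric
ibswitches

# Show link state, width and speed for every port on the fabric
iblinkinfo
```
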
## Software

In Merlin6, we try to keep the software stack updated with the latest releases, in order to benefit from the latest features and improvements. Hence, **Merlin6** runs:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually keep close to the most recent stable release.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
* [**MLNX_OFED LTS v.5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-5** or newer cards.
* [**MLNX_OFED LTS v.4.9-2.2.4.0**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for the remaining **ConnectX-3** and **ConnectIB** cards.
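
The versions actually installed on a node can be checked with a few standard commands (a sketch; the GPFS package name may differ per installation):

```bash
cat /etc/redhat-release   # RHEL release
sinfo --version           # Slurm version, as reported by the client tools
ofed_info -s              # Installed MLNX_OFED release
rpm -q gpfs.base          # GPFS (Spectrum Scale) base package; name may vary
```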