This commit is contained in:
caubet_m 2021-04-15 17:38:45 +02:00
parent 19c7f9bb79
commit e9861ef6b5
6 changed files with 356 additions and 88 deletions


@@ -5,22 +5,36 @@ entries:
- product: Merlin
  version: 6
  folders:
  - title: Quick Start Guide
    # URLs for top-level folders are optional. If omitted it is a bit easier to toggle the accordion.
    #url: /merlin6/introduction.html
    folderitems:
    - title: Introduction
      url: /merlin6/introduction.html
    - title: Code Of Conduct
      url: /merlin6/code-of-conduct.html
    - title: Requesting Accounts
      url: /merlin6/request-account.html
    - title: Requesting Projects
      url: /merlin6/request-project.html
  - title: Slurm CPU 'merlin5'
    folderitems:
    - title: Introduction
      url: /merlin5/introduction.html
    - title: Hardware And Software Description
      url: /merlin5/hardware-and-software.html
  - title: Slurm CPU 'merlin6'
    folderitems:
    - title: Introduction
      url: /merlin6/introduction.html
    - title: Hardware And Software Description
      url: /merlin6/hardware-and-software.html
  - title: Slurm GPU 'gmerlin6'
    folderitems:
    - title: Introduction
      url: /gmerlin6/introduction.html
    - title: Hardware And Software Description
      url: /gmerlin6/hardware-and-software.html
  - title: Accessing Merlin
    folderitems:
    - title: Accessing Interactive Nodes
      url: /merlin6/interactive.html
    - title: Accessing from a Linux client


@@ -22,3 +22,11 @@ topnav_dropdowns:
        url: /merlin6/use.html
      - title: User Guide
        url: /merlin6/user-guide.html
      - title: Slurm
        folderitems:
        - title: Cluster 'merlin5'
          url: /merlin5/slurm-cluster.html
        - title: Cluster 'merlin6'
          url: /merlin6/slurm-cluster.html
        - title: Cluster 'gmerlin6'
          url: /gmerlin6/slurm-cluster.html


@ -0,0 +1,47 @@
---
title: Cluster 'gmerlin6'
#tags:
#keywords:
last_updated: 07 April 2021
#summary: "GPU Merlin 6 cluster overview"
sidebar: merlin6_sidebar
permalink: /gmerlin6/introduction.html
redirect_from:
- /gmerlin6
- /gmerlin6/index.html
---
## Slurm 'merlin5' cluster
**Merlin5**, built in 2016-2017, was the old official PSI Local HPC cluster for development and
mission-critical applications. It was an extension of the Merlin4 cluster and was built from
existing hardware due to the lack of central investment in Local HPC resources. **Merlin5** was
then replaced by the **[Merlin6](/merlin6/index.html)** cluster in 2019, backed by a substantial
central investment of ~1.5M CHF. **Merlin5** was mostly based on CPU resources, but it also
contained a small number of GPU nodes, which were mostly used by the BIO experiments.

**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster** called **`merlin5`**.
In this way, the old CPU computing nodes remain available as extra computing resources and as an
extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.
The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
cluster, which became the **main Local HPC cluster**. Hence, **[Merlin6](/merlin6/index.html)**
hosts the storage that is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) clusters (`merlin5`, `merlin6`, `gmerlin6`).
### Submitting jobs to 'merlin5'
Jobs must be submitted to the **`merlin5`** Slurm cluster from the **Merlin6** login nodes, by adding
the option `--clusters=merlin5` to any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc.).
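As a minimal sketch (the job script name and resource values below are illustrative, not Merlin defaults):

```bash
# Submit a batch job to the merlin5 cluster from a Merlin6 login node
sbatch --clusters=merlin5 myjob.sh

# Or request a small interactive allocation on merlin5
salloc --clusters=merlin5 --ntasks=1 --time=01:00:00
```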
## The Merlin Architecture
### Multi Non-Federated Cluster Architecture Design: The Merlin cluster
The following image shows the Slurm architecture design for the Merlin cluster.
It is a non-federated multi-cluster setup, with a central Slurm database
and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):
![Merlin6 Slurm Architecture Design]({{ "/images/merlin-slurm-architecture.png" }})
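Since all clusters are registered in the same central Slurm database, they can be listed from any Merlin6 login node. A possible check (a sketch; the format fields shown are just a selection):

```bash
# List the clusters registered in the central Slurm accounting database (slurmdbd)
sacctmgr show cluster format=Cluster,ControlHost,ControlPort
```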


@ -0,0 +1,97 @@
---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 09 April 2021
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin5/hardware-and-software.html
---
## Hardware
### Computing Nodes
Merlin5 is built from recycled nodes; since the warranty has expired and the cluster is old, hardware is decommissioned as soon as it fails.
* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
* Connectivity is based on Infiniband **ConnectX-3 QDR-40Gbps**:
  * 16 internal ports for intra-chassis communication.
  * 2 connected external ports for inter-chassis communication and storage access.

The table below summarizes the hardware setup of the Merlin5 computing nodes:
<table>
<thead>
<tr>
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="8">Merlin5 CPU Computing Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Chassis</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
</tr>
</thead>
<tbody>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="2"><b>#0</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[18-30]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="2"><a href="https://ark.intel.com/content/www/us/en/ark/products/64595/intel-xeon-processor-e5-2670-20m-cache-2-60-ghz-8-00-gt-s-intel-qpi.html">Intel Xeon E5-2670</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">16</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">50GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">64GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td rowspan="1"><b>merlin-c-[31,32]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>128GB</b></td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="2"><b>#1</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[33-45]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="2"><a href="https://ark.intel.com/content/www/us/en/ark/products/64595/intel-xeon-processor-e5-2670-20m-cache-2-60-ghz-8-00-gt-s-intel-qpi.html">Intel Xeon E5-2670</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">16</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">50GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">64GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td rowspan="1"><b>merlin-c-[46,47]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>128GB</b></td>
</tr>
</tbody>
</table>
### Login Nodes
The login nodes are part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
and are used to compile software and to submit jobs to the different ***Merlin Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.
### Storage
The storage is part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
and is mounted on all the ***Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.
### Network
Merlin5 cluster connectivity is based on the [Infiniband QDR](https://en.wikipedia.org/wiki/InfiniBand) technology.
This allows fast, very low-latency access to the data, as well as running extremely efficient MPI-based jobs.
However, this is an old generation of Infiniband which requires older drivers, so the software cannot take advantage of the latest features.
## Software
In Merlin5, we try to keep the software stack coherent with the main [Merlin6](/merlin6/index.html) cluster.
Hence, Merlin5 runs:
* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
* [**MLNX_OFED LTS v.4.9-2.2.4.0**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed), which is an old version, but required because **ConnectX-3** support has been dropped on newer OFED versions.
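As a rough way to verify this on a Merlin5 node (assuming the MLNX_OFED tools are installed and in the PATH):

```bash
# Print the installed MLNX_OFED release (expected to be the 4.9 LTS branch on Merlin5)
ofed_info -s

# Show the local Infiniband adapter (ConnectX-3 on Merlin5 nodes) and its link state
ibstat
```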


@ -0,0 +1,47 @@
---
title: Cluster 'merlin5'
#tags:
#keywords:
last_updated: 07 April 2021
#summary: "Merlin 5 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin5/introduction.html
redirect_from:
- /merlin5
- /merlin5/index.html
---
## Slurm 'merlin5' cluster
**Merlin5**, built in 2016-2017, was the old official PSI Local HPC cluster for development and
mission-critical applications. It was an extension of the Merlin4 cluster and was built from
existing hardware due to the lack of central investment in Local HPC resources. **Merlin5** was
then replaced by the **[Merlin6](/merlin6/index.html)** cluster in 2019, backed by a substantial
central investment of ~1.5M CHF. **Merlin5** was mostly based on CPU resources, but it also
contained a small number of GPU nodes, which were mostly used by the BIO experiments.

**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster** called **`merlin5`**.
In this way, the old CPU computing nodes remain available as extra computing resources and as an
extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.
The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
cluster, which became the **main Local HPC cluster**. Hence, **[Merlin6](/merlin6/index.html)**
hosts the storage that is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) clusters (`merlin5`, `merlin6`, `gmerlin6`).
### Submitting jobs to 'merlin5'
Jobs must be submitted to the **`merlin5`** Slurm cluster from the **Merlin6** login nodes, by adding
the option `--clusters=merlin5` to any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc.).
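The same `--clusters` option also applies when inspecting jobs afterwards, for example (a sketch with an illustrative time range):

```bash
# Show your pending and running jobs on the merlin5 cluster
squeue --clusters=merlin5 --user=$USER

# Show accounting information for your merlin5 jobs from the last day
sacct --clusters=merlin5 --user=$USER --starttime=now-1days
```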
## The Merlin Architecture
### Multi Non-Federated Cluster Architecture Design: The Merlin cluster
The following image shows the Slurm architecture design for the Merlin cluster.
It is a non-federated multi-cluster setup, with a central Slurm database
and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):
![Merlin6 Slurm Architecture Design]({{ "/images/merlin-slurm-architecture.png" }})
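From a login node, the partitions of the independent clusters can be compared side by side, for example:

```bash
# Show partition and node status for each of the three Slurm clusters
sinfo --clusters=merlin5,merlin6,gmerlin6
```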


@@ -8,104 +8,159 @@ sidebar: merlin6_sidebar
permalink: /merlin6/hardware-and-software.html
---
## Hardware

### Computing Nodes

The new Merlin6 cluster is a solution based on **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw):
* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades each.
* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. These blades have slightly different components, depending on the specific project requirements.

The connectivity for the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:
* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
  * 24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)
  * 12 external EDR-100Gbps ports (for external low latency connectivity)

<table>
<thead>
<tr>
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="8">Merlin6 CPU Computing Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Chassis</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
</tr>
</thead>
<tbody>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>#0</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-0[01-24]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>#1</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-1[01-24]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>#2</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-2[01-24]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="2"><b>#3</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-3[01-06]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="2"><a href="https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html">Intel Xeon Gold 6240R</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">48</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="2">1.2TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td rowspan="1"><b>merlin-c-3[07-12]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">768GB</td>
</tr>
</tbody>
</table>
Each blade contains an NVMe disk, where ~300GB are dedicated to the O.S. and ~1.2TB are reserved for the local `/scratch`.
### Login Nodes

*One old login node* (``merlin-l-01.psi.ch``) is inherited from the previous Merlin5 cluster. It is mainly used for running some BIO services (`cryosparc`) and for submitting jobs.
*Two new login nodes* (``merlin-l-001.psi.ch``, ``merlin-l-002.psi.ch``), with a configuration similar to the Merlin6 computing nodes, are available to users. They are mainly used for compiling software and submitting jobs.

The connectivity is based on **ConnectX-5 EDR-100Gbps** for the new login nodes, and on **ConnectIB FDR-56Gbps** for the old one.

<table>
<thead>
<tr>
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="8">Merlin6 Login Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Hardware</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
</tr>
</thead>
<tbody>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>Old</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-l-01</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/products/91768/Intel-Xeon-Processor-E5-2697A-v4-40M-Cache-2-60-GHz-">Intel Xeon E5-2697AV4</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">16</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">100GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">512GB</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>New</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-l-00[1,2]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html">Intel Xeon Gold 6152</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">44</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.8TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
</tr>
</tbody>
</table>
### Storage

The storage is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5):
* 2 x **Lenovo DSS G240** systems, each one composed of 2 x **ThinkSystem SR650** IO nodes mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
* Each IO node has 400Gbps of connectivity (4 x EDR-100Gbps ports, 2 of them **ConnectX-5** and 2 of them **ConnectX-4**).

The storage solution is connected to the HPC clusters through 2 x **Mellanox SB7800 InfiniBand 1U Switches**, for high availability and load balancing.
### Network

Merlin6 cluster connectivity is based on the [**Infiniband**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast, very low-latency access to the data, as well as running extremely efficient MPI-based jobs:
* Connectivity amongst computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
* Intra-chassis connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.

The Merlin6 cluster currently contains 5 Infiniband managed switches and 3 Infiniband unmanaged switches (one per HP Apollo chassis):
* 1 x **MSX6710** (FDR) for connecting old GPU nodes, old login nodes and the MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
* 2 x **MSB7800** (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
* 3 x **HP EDR Unmanaged** switches, each one embedded in an HP Apollo k6000 chassis.
* 2 x **MSB7700** (EDR) as top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
## Software
In Merlin6, we try to keep the software stack close to the latest releases to benefit from the newest features and improvements. Hence, **Merlin6** runs:
* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
* [**MLNX_OFED LTS v.5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-5** or newer cards.
  * [MLNX_OFED LTS v.4.9-2.2.4.0](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) is installed for the remaining **ConnectX-3** and **ConnectIB** cards.
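A rough sketch for confirming this stack on a node (the GPFS package naming below is an assumption and may differ):

```bash
# Operating system release
cat /etc/redhat-release

# Slurm version in use
scontrol --version

# Installed GPFS (IBM Spectrum Scale) packages, assuming the usual 'gpfs*' package names
rpm -qa 'gpfs*'
```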