Docs v2

parent 19c7f9bb79
commit e9861ef6b5

@@ -5,22 +5,36 @@ entries:
   - product: Merlin
     version: 6
     folders:
-      - title: Introduction
+      - title: Quick Start Guide
         # URLs for top-level folders are optional. If omitted it is a bit easier to toggle the accordion.
         #url: /merlin6/introduction.html
         folderitems:
-          - title: Introduction
-            url: /merlin6/introduction.html
           - title: Code Of Conduct
             url: /merlin6/code-of-conduct.html
-          - title: Hardware And Software Description
-            url: /merlin6/hardware-and-software.html
-      - title: Accessing Merlin
-        folderitems:
           - title: Requesting Accounts
             url: /merlin6/request-account.html
           - title: Requesting Projects
             url: /merlin6/request-project.html
+      - title: Slurm CPU 'merlin5'
+        folderitems:
+          - title: Introduction
+            url: /merlin5/introduction.html
+          - title: Hardware And Software Description
+            url: /merlin5/hardware-and-software.html
+      - title: Slurm CPU 'merlin6'
+        folderitems:
+          - title: Introduction
+            url: /merlin6/introduction.html
+          - title: Hardware And Software Description
+            url: /merlin6/hardware-and-software.html
+      - title: Slurm GPU 'gmerlin6'
+        folderitems:
+          - title: Introduction
+            url: /gmerlin6/introduction.html
+          - title: Hardware And Software Description
+            url: /gmerlin6/hardware-and-software.html
+      - title: Accessing Merlin
+        folderitems:
           - title: Accessing Interactive Nodes
             url: /merlin6/interactive.html
           - title: Accessing from a Linux client

@@ -22,3 +22,11 @@ topnav_dropdowns:
             url: /merlin6/use.html
           - title: User Guide
             url: /merlin6/user-guide.html
+      - title: Slurm
+        folderitems:
+          - title: Cluster 'merlin5'
+            url: /merlin5/slurm-cluster.html
+          - title: Cluster 'merlin6'
+            url: /gmerlin6/slurm-cluster.html
+          - title: Cluster 'gmerlin6'
+            url: /gmerlin6/slurm-cluster.html

pages/gmerlin6/introduction.md (new file, 47 lines)
@@ -0,0 +1,47 @@
---
title: Cluster 'gmerlin6'
#tags:
#keywords:
last_updated: 07 April 2021
#summary: "GPU Merlin 6 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin5/introduction.html
redirect_from:
  - /gmerlin6
  - /gmerlin6/index.html
---

## Slurm 'merlin5' cluster

**Merlin5** was the old official PSI Local HPC cluster for development and
mission-critical applications, built in 2016-2017. It was an extension of the
Merlin4 cluster and was built from existing hardware due to a lack of central
investment in Local HPC resources. **Merlin5** was then replaced by the
**[Merlin6](/merlin6/index.html)** cluster in 2019, with a significant central
investment of ~1.5M CHF. **Merlin5** was mostly based on CPU resources, but it
also contained a small number of GPU-based resources, which were mostly used by
the BIO experiments.

**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster**,
called **`merlin5`**. In this way, the old CPU computing nodes are still available as extra computation resources
and as an extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.

The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
cluster, which became the **main Local HPC Cluster**. Hence, **[Merlin6](/merlin6/index.html)**
contains the storage which is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) clusters (`merlin5`, `merlin6`, `gmerlin6`).

### Submitting jobs to 'merlin5'

Jobs must be submitted to the **`merlin5`** Slurm cluster from the **Merlin6** login nodes, by adding
the option `--clusters=merlin5` to any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc.).
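
As a minimal sketch (assuming a hypothetical batch script `myjob.sh`), this looks as follows from a Merlin6 login node:

```bash
# Submit a batch script to the 'merlin5' Slurm cluster
sbatch --clusters=merlin5 myjob.sh

# Request an interactive allocation on 'merlin5'
salloc --clusters=merlin5 --ntasks=1 --time=01:00:00

# Show the queue of the 'merlin5' cluster
squeue --clusters=merlin5
```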

## The Merlin Architecture

### Multi Non-Federated Cluster Architecture Design: The Merlin cluster

The following image shows the Slurm architecture design for the Merlin cluster.
It contains a multi non-federated cluster setup, with a central Slurm database
and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):

![Merlin Non-Federated Clusters]({{ "/images/Slurm/non-federated-clusters.png" }})
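
As a small, hedged illustration (standard Slurm commands, not a Merlin-specific tool), the clusters registered in the central Slurm database can be listed and then addressed explicitly by name:

```bash
# List the clusters registered in the central Slurm database
sacctmgr show clusters format=Cluster,ControlHost

# Query an individual cluster by name
sinfo --clusters=merlin5
squeue --clusters=gmerlin6
```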

pages/merlin5/hardware-and-software-description.md (new file, 97 lines)
@@ -0,0 +1,97 @@
---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 09 April 2021
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin5/hardware-and-software.html
---

## Hardware

### Computing Nodes

Merlin5 is built from recycled nodes, and hardware will be decommissioned as soon as it fails (due to the expired warranty and the age of the cluster).
* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
* Connectivity is based on Infiniband **ConnectX-3 QDR-40Gbps**:
  * 16 internal ports for intra-chassis communication.
  * 2 connected external ports for inter-chassis communication and storage access.

The table below summarizes the hardware setup of the Merlin5 computing nodes:

**Merlin5 CPU Computing Nodes**

| Chassis | Node | Processor | Sockets | Cores | Threads | Scratch | Memory |
|---------|------|-----------|---------|-------|---------|---------|--------|
| #0 | merlin-c-[18-30] | [Intel Xeon E5-2670](https://ark.intel.com/content/www/us/en/ark/products/64595/intel-xeon-processor-e5-2670-20m-cache-2-60-ghz-8-00-gt-s-intel-qpi.html) | 2 | 16 | 1 | 50GB | 64GB |
| #0 | merlin-c-[31,32] | Intel Xeon E5-2670 | 2 | 16 | 1 | 50GB | 128GB |
| #1 | merlin-c-[33-45] | Intel Xeon E5-2670 | 2 | 16 | 1 | 50GB | 64GB |
| #1 | merlin-c-[46,47] | Intel Xeon E5-2670 | 2 | 16 | 1 | 50GB | 128GB |

### Login Nodes

The login nodes are part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
and are used for compiling software and submitting jobs to the different ***Merlin Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.
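
As a hedged sketch (assuming a hypothetical batch script `myjob.sh`), the target cluster is selected from a login node with the Slurm `--clusters` option:

```bash
# Submit the same batch script to the different Merlin Slurm clusters
sbatch --clusters=merlin5 myjob.sh
sbatch --clusters=merlin6 myjob.sh
sbatch --clusters=gmerlin6 myjob.sh
```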

### Storage

The storage is part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
and is mounted on all the ***Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.

### Network

Merlin5 cluster connectivity is based on the [Infiniband QDR](https://en.wikipedia.org/wiki/InfiniBand) technology.
This allows fast access to the data with very low latencies, as well as running extremely efficient MPI-based jobs.
However, this is an old generation of Infiniband, which requires older drivers, and the software cannot take advantage of the latest features.

## Software

In Merlin5, we try to keep the software stack consistent with the main cluster, [Merlin6](/merlin6/index.html).

Due to this, Merlin5 runs:
* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
* [**MLNX_OFED LTS v.4.9-2.2.4.0**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed), which is an old version, but required because **ConnectX-3** support has been dropped in newer OFED versions.
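
As a small, hedged illustration (standard commands, not a Merlin-specific procedure), some of these components can be verified from a shell on a Merlin5 node:

```bash
# Show the installed OS release
cat /etc/redhat-release

# Show the Slurm version provided by the client tools
sinfo --version

# Show the installed MLNX_OFED version
ofed_info -s
```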

pages/merlin5/introduction.md (new file, 47 lines)
@@ -0,0 +1,47 @@
---
title: Cluster 'merlin5'
#tags:
#keywords:
last_updated: 07 April 2021
#summary: "Merlin 5 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin5/introduction.html
redirect_from:
  - /merlin5
  - /merlin5/index.html
---

## Slurm 'merlin5' cluster

**Merlin5** was the old official PSI Local HPC cluster for development and
mission-critical applications, built in 2016-2017. It was an extension of the
Merlin4 cluster and was built from existing hardware due to a lack of central
investment in Local HPC resources. **Merlin5** was then replaced by the
**[Merlin6](/merlin6/index.html)** cluster in 2019, with a significant central
investment of ~1.5M CHF. **Merlin5** was mostly based on CPU resources, but it
also contained a small number of GPU-based resources, which were mostly used by
the BIO experiments.

**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster**,
called **`merlin5`**. In this way, the old CPU computing nodes are still available as extra computation resources
and as an extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.

The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
cluster, which became the **main Local HPC Cluster**. Hence, **[Merlin6](/merlin6/index.html)**
contains the storage which is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) clusters (`merlin5`, `merlin6`, `gmerlin6`).

### Submitting jobs to 'merlin5'

Jobs must be submitted to the **`merlin5`** Slurm cluster from the **Merlin6** login nodes, by adding
the option `--clusters=merlin5` to any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc.).
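
As a minimal sketch (assuming a hypothetical batch script `myjob.sh`), this looks as follows from a Merlin6 login node:

```bash
# Submit a batch script to the 'merlin5' Slurm cluster
sbatch --clusters=merlin5 myjob.sh

# Run a quick interactive test on 'merlin5'
srun --clusters=merlin5 --ntasks=1 hostname
```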

## The Merlin Architecture

### Multi Non-Federated Cluster Architecture Design: The Merlin cluster

The following image shows the Slurm architecture design for the Merlin cluster.
It contains a multi non-federated cluster setup, with a central Slurm database
and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):

![Merlin Non-Federated Clusters]({{ "/images/Slurm/non-federated-clusters.png" }})

@@ -8,104 +8,159 @@ sidebar: merlin6_sidebar
 permalink: /merlin6/hardware-and-software.html
 ---
 
-# Hardware And Software Description
-{: .no_toc }
-
-## Table of contents
-{: .no_toc .text-delta }
-
-1. TOC
-{:toc}
-
----
-
-## Computing Nodes
-
-The new Merlin6 cluster contains an homogeneous solution based on *three* HP Apollo k6000 systems. Each HP Apollo k6000 chassis contains 22 HP XL320k Gen10 blades. However,
-each chassis can contain up to 24 blades, so is possible to upgradew with up to 2 nodes per chassis.
-
-Each HP XL320k Gen 10 blade can contain up to two processors of the latest Intel® Xeon® Scalable Processor family. The hardware and software configuration is the following:
-* 3 x HP Apollo k6000 chassis systems, each one:
-  * 22 x [HP Apollo XL230K Gen10](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw), each one:
-    * 2 x *22 core* [Intel® Xeon® Gold 6152 Scalable Processor](https://ark.intel.com/products/120491/Intel-Xeon-Gold-6152-Processor-30-25M-Cache-2-10-GHz-) (2.10-3.70GHz).
-    * 12 x 32 GB (384 GB in total) of DDR4 memory clocked 2666 MHz.
-    * Dual Port !InfiniBand !ConnectX-5 EDR-100Gbps (low latency network); one active port per chassis.
-    * 1 x 1.6TB NVMe SSD Disk
-      * ~300GB reserved for the O.S.
-      * ~1.2TB reserved for local fast scratch ``/scratch``.
-    * Software:
-      * RedHat Enterprise Linux 7.6
-      * [Slurm](https://slurm.schedmd.com/) v18.08
-      * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2
-  * 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
+## Hardware
+
+### Computing Nodes
+
+The new Merlin6 cluster contains a solution based on **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw):
+* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades.
+* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. These blades have slightly different components, depending on specific project requirements.
+
+The connectivity of the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:
+* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
   * 24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)
   * 12 external EDR-100Gbps ports (for external low latency connectivity)
-
----
-
-## Login Nodes
-
-### merlin-l-0[1,2]
-
-Two login nodes are inherit from the previous Merlin5 cluster: ``merlin-l-01.psi.ch``, ``merlin-l-02.psi.ch``. The hardware and software configuration is the following:
-
-* 2 x HP DL380 Gen9, each one:
-  * 2 x *16 core* [Intel® Xeon® Processor E5-2697AV4 Family](https://ark.intel.com/products/91768/Intel-Xeon-Processor-E5-2697A-v4-40M-Cache-2-60-GHz-) (2.60-3.60GHz)
-    * Hyper-Threading disabled
-  * 16 x 32 GB (512 GB in total) of DDR4 memory clocked 2400 MHz.
-  * Dual Port Infiniband !ConnectIB FDR-56Gbps (low latency network).
-  * Software:
-    * RedHat Enterprise Linux 7.6
-    * [Slurm](https://slurm.schedmd.com/) v18.08
-    * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2
-
-### merlin-l-00[1,2]
-
-Two new login nodes are available in the new cluster: ``merlin-l-001.psi.ch``, ``merlin-l-002.psi.ch``. The hardware and software configuration is the following:
-
-* 2 x HP DL380 Gen10, each one:
-  * 2 x *22 core* [Intel® Xeon® Gold 6152 Scalable Processor](https://ark.intel.com/products/120491/Intel-Xeon-Gold-6152-Processor-30-25M-Cache-2-10-GHz-) (2.10-3.70GHz).
-    * Hyper-threading enabled.
-  * 24 x 16GB (384 GB in total) of DDR4 memory clocked 2666 MHz.
-  * Dual Port Infiniband !ConnectX-5 EDR-100Gbps (low latency network).
-  * Software:
-    * [NoMachine Terminal Server](https://www.nomachine.com/)
-      * Currently only on: ``merlin-l-001.psi.ch``.
-    * RedHat Enterprise Linux 7.6
-    * [Slurm](https://slurm.schedmd.com/) v18.08
-    * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2 (merlin-l-001) v5.0.3 (merlin-l-002)
-
----
-
-## Storage
+
+**Merlin6 CPU Computing Nodes**
+
+| Chassis | Node | Processor | Sockets | Cores | Threads | Scratch | Memory |
+|---------|------|-----------|---------|-------|---------|---------|--------|
+| #0 | merlin-c-0[01-24] | [Intel Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html) | 2 | 44 | 2 | 1.2TB | 384GB |
+| #1 | merlin-c-1[01-24] | Intel Xeon Gold 6152 | 2 | 44 | 2 | 1.2TB | 384GB |
+| #2 | merlin-c-2[01-24] | Intel Xeon Gold 6152 | 2 | 44 | 2 | 1.2TB | 384GB |
+| #3 | merlin-c-3[01-06] | [Intel Xeon Gold 6240R](https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html) | 2 | 48 | 2 | 1.2TB | 384GB |
+| #3 | merlin-c-3[07-12] | Intel Xeon Gold 6240R | 2 | 48 | 2 | 1.2TB | 768GB |
+
+Each blade contains an NVMe disk, where ~300GB are dedicated to the O.S. and ~1.2TB are reserved for local `/scratch`.
+
+### Login Nodes
+
+*One old login node* (``merlin-l-01.psi.ch``) is inherited from the previous Merlin5 cluster. Its main use is running some BIO services (`cryosparc`) and submitting jobs.
+*Two new login nodes* (``merlin-l-001.psi.ch``, ``merlin-l-002.psi.ch``) with a configuration similar to the Merlin6 computing nodes are available to users. Their main use
+is compiling software and submitting jobs.
+
+The connectivity is based on **ConnectX-5 EDR-100Gbps** for the new login nodes, and on **ConnectIB FDR-56Gbps** for the old one.
+
+**Merlin6 Login Nodes**
+
+| Hardware | Node | Processor | Sockets | Cores | Threads | Scratch | Memory |
+|----------|------|-----------|---------|-------|---------|---------|--------|
+| Old | merlin-l-01 | [Intel Xeon E5-2697AV4](https://ark.intel.com/products/91768/Intel-Xeon-Processor-E5-2697A-v4-40M-Cache-2-60-GHz-) | 2 | 16 | 2 | 100GB | 512GB |
+| New | merlin-l-00[1,2] | [Intel Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html) | 2 | 44 | 2 | 1.8TB | 384GB |
+
+### Storage
+
 The storage node is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5).
-The solution is equipped with 334 x 10TB disks providing a useable capacity of 2.316 PiB (2.608PB). THe overall solution can provide a maximum read performance of 20GB/s.
-* 1 x Lenovo DSS G240, composed by:
-  * 2 x ThinkSystem SR650, each one:
-    * 2 x Dual Port Infiniband ConnectX-5 EDR-100Gbps (low latency network).
-    * 2 x Dual Port Infiniband ConnectX-4 EDR-100Gbps (low latency network).
-    * 1 x ThinkSystem RAID 930-8i 2GB Flash PCIe 12Gb Adapter
-  * 1 x ThinkSystem SR630
-    * 1 x Dual Port Infiniband ConnectX-5 EDR-100Gbps (low latency network).
-    * 1 x Dual Port Infiniband ConnectX-4 EDR-100Gbps (low latency network).
-  * 4 x Lenovo Storage D3284 High Density Expansion Enclosure, each one:
-    * Holds 84 x 3.5" hot-swap drive bays in two drawers. Each drawer has three rows of drives, and each row has 14 drives.
-    * Each drive bay will contain a 10TB Helium 7.2K NL-SAS HDD.
-* 2 x Mellanox SB7800 InfiniBand 1U Switch for High Availability and fast access to the storage with very low latency. Each one:
-  * 36 EDR-100Gbps ports
-
----
-
-## Network
-
-Merlin6 cluster connectivity is based on the [Infiniband](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access with very low latencies to the data as well as running
+* 2 x **Lenovo DSS G240** systems, each one composed of 2 **ThinkSystem SR650** IO nodes mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
+* Each IO node has a connectivity of 400Gbps (4 x EDR 100Gbps ports, 2 of them **ConnectX-5** and 2 of them **ConnectX-4**).
+
+The storage solution is connected to the HPC clusters through 2 x **Mellanox SB7800 InfiniBand 1U Switches** for high availability and load balancing.
+
+### Network
+
+Merlin6 cluster connectivity is based on the [**Infiniband**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access to the data with very low latencies, as well as running
 extremely efficient MPI-based jobs:
 * Connectivity amongst different computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
 * Intra-chassis connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
 * Communication to the storage ensures up to 800Gbps of aggregated bandwidth.
 
 Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniband Unmanaged switches (one per HP Apollo chassis):
-* 1 * MSX6710 (FDR) for connecting old GPU nodes, old login nodes and MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
-* 2 * MSB7800 (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
-* 3 * HP EDR Unmanaged switches, each one embedded to each HP Apollo k6000 chassis solution.
-* 2 * MSB7700 (EDR) are the top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
+* 1 x **MSX6710** (FDR) for connecting old GPU nodes, old login nodes and the MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
+* 2 x **MSB7800** (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
+* 3 x **HP EDR Unmanaged** switches, each one embedded in each HP Apollo k6000 chassis solution.
+* 2 x **MSB7700** (EDR) as the top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
+
+## Software
+
+In Merlin6, we try to keep the software stack at the latest release to benefit from the latest features and improvements. Due to this, **Merlin6** runs:
+* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
+* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
+* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
+* [**MLNX_OFED LTS v.5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-5** or newer cards.
+* [MLNX_OFED LTS v.4.9-2.2.4.0](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) is installed for the remaining **ConnectX-3** and **ConnectIB** cards.