Docs v2
This commit is contained in:
97
pages/merlin5/hardware-and-software-description.md
Normal file
97
pages/merlin5/hardware-and-software-description.md
Normal file
@ -0,0 +1,97 @@
|
||||
---
|
||||
title: Hardware And Software Description
|
||||
#tags:
|
||||
#keywords:
|
||||
last_updated: 09 April 2021
|
||||
#summary: ""
|
||||
sidebar: merlin6_sidebar
|
||||
permalink: /merlin5/hardware-and-software.html
|
||||
---
|
||||
|
||||
## Hardware
|
||||
|
||||
### Computing Nodes
|
||||
|
||||
Merlin5 is built from recycled nodes, and hardware will be decomissioned as soon as it fails (due to expired warranty and age of the cluster).
|
||||
* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
|
||||
* Connectivity is based on Infiniband **ConnectX-3 QDR-40Gbps**
|
||||
* 16 internal ports for intra chassis communication
|
||||
* 2 connected external ports for inter chassis communication and storage access.
|
||||
|
||||
The below table summarizes the hardware setup for the Merlin5 computing nodes:
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="8">Merlin5 CPU Computing Nodes</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Chassis</th>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
|
||||
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr style="vertical-align:middle;text-align:center;" ralign="center">
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2"><b>#0</b></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[18-30]</b></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2"><a href="https://ark.intel.com/content/www/us/en/ark/products/64595/intel-xeon-processor-e5-2670-20m-cache-2-60-ghz-8-00-gt-s-intel-qpi.html">Intel Xeon E5-2670</a></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">16</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">1</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">50GB</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="1">64GB</td>
|
||||
</tr>
|
||||
<tr style="vertical-align:middle;text-align:center;" ralign="center">
|
||||
<td rowspan="1"><b>merlin-c-[31,32]</b></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>128GB</b></td>
|
||||
</tr>
|
||||
<tr style="vertical-align:middle;text-align:center;" ralign="center">
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2"><b>#1</b></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-c-[33-45]</b></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2"><a href="https://ark.intel.com/content/www/us/en/ark/products/64595/intel-xeon-processor-e5-2670-20m-cache-2-60-ghz-8-00-gt-s-intel-qpi.html">Intel Xeon E5-2670</a></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">2</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">16</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">1</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="2">50GB</td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="1">64GB</td>
|
||||
</tr>
|
||||
<tr style="vertical-align:middle;text-align:center;" ralign="center">
|
||||
<td rowspan="1"><b>merlin-c-[46,47]</b></td>
|
||||
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>128GB</b></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
### Login Nodes
|
||||
|
||||
The login nodes are part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
|
||||
and are used to compile and to submit jobs to the different ***Merlin Slurm clusters*** (`merlin5`,`merlin6`,`gmerlin6`,etc.).
|
||||
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.
|
||||
|
||||
### Storage
|
||||
|
||||
The storage is part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
|
||||
and is mounted in all the ***Slurm clusters*** (`merlin5`,`merlin6`,`gmerlin6`,etc.).
|
||||
Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.
|
||||
|
||||
### Network
|
||||
|
||||
Merlin5 cluster connectivity is based on the [Infiniband QDR](https://en.wikipedia.org/wiki/InfiniBand) technology.
|
||||
This allows fast access with very low latencies to the data as well as running extremely efficient MPI-based jobs.
|
||||
However, this is an old version of Infiniband which requires older drivers and software can not take advantage of the latest features.
|
||||
|
||||
## Software
|
||||
|
||||
In Merlin5, we try to keep software stack coherency with the main cluster [Merlin6](/merlin6/index.html).
|
||||
|
||||
Due to this, Merlin5 runs:
|
||||
* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
|
||||
* [**Slurm**](https://slurm.schedmd.com/), we usually try to keep it up to date with the most recent versions.
|
||||
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
|
||||
* [**MLNX_OFED LTS v.4.9-2.2.4.0**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed), which is an old version, but required because **ConnectX-3** support has been dropped on newer OFED versions.
|
47
pages/merlin5/introduction.md
Normal file
47
pages/merlin5/introduction.md
Normal file
@ -0,0 +1,47 @@
|
||||
---
|
||||
title: Cluster 'merlin5'
|
||||
#tags:
|
||||
#keywords:
|
||||
last_updated: 07 April 2021
|
||||
#summary: "Merlin 5 cluster overview"
|
||||
sidebar: merlin6_sidebar
|
||||
permalink: /merlin5/introduction.html
|
||||
redirect_from:
|
||||
- /merlin5
|
||||
- /merlin5/index.html
|
||||
---
|
||||
|
||||
## Slurm 'merlin5' cluster
|
||||
|
||||
**Merlin5** was the old official PSI Local HPC cluster for development and
|
||||
mission-critical applications which was built in 2016-2017. It was an
|
||||
extension of the Merlin4 cluster and built from existing hardware due
|
||||
to a lack of central investment on Local HPC Resources. **Merlin5** was
|
||||
then replaced by the **[Merlin6](/merlin6/index.html)** cluster in 2019,
|
||||
with an important central investment of ~1,5M CHF. **Merlin5** was mostly
|
||||
based on CPU resources, but also contained a small amount of GPU-based
|
||||
resources which were mostly used by the BIO experiments.
|
||||
|
||||
**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster**,
|
||||
called **`merlin5`**. In that way, the old CPU computing nodes are still available as extra computation resources,
|
||||
and as an extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.
|
||||
|
||||
The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
|
||||
cluster, which becomes the **main Local HPC Cluster**. Hence, **[Merlin6](/merlin6/index.html)**
|
||||
contains the storage which is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) Clusters (`merlin5`, `merlin6`, `gmerlin6`).
|
||||
|
||||
### Submitting jobs to 'merlin5'
|
||||
|
||||
To submit jobs to the **`merlin5`** Slurm cluster, it must be done from the **Merlin6** login nodes by using
|
||||
the option `--clusters=merlin5` on any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc. commands).
|
||||
|
||||
## The Merlin Architecture
|
||||
|
||||
### Multi Non-Federated Cluster Architecture Design: The Merlin cluster
|
||||
|
||||
The following image shows the Slurm architecture design for Merlin cluster.
|
||||
It contains a multi non-federated cluster setup, with a central Slurm database
|
||||
and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):
|
||||
|
||||

|
||||
|
Reference in New Issue
Block a user