first stab at mkdocs migration
refactor CSCS and Meg content
add merlin6 quick start
update merlin6 nomachine docs
give the userdoc its own color scheme we use the Materials default one
refactored slurm general docs merlin6
add merlin6 JB docs
add software support m6 docs
add all files to nav
vibed changes #1
add missing pages
further vibing #2
vibe #3
further fixes
151
docs/gmerlin6/hardware-and-software-description.md
Normal file
@@ -0,0 +1,151 @@
---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 19 April 2021
#summary: ""
sidebar: merlin6_sidebar
permalink: /gmerlin6/hardware-and-software.html
---

## Hardware

### GPU Computing Nodes

The Merlin6 GPU cluster was initially built from recycled workstations contributed by different groups in the BIO division.
Since then, it has been expanded little by little with new nodes funded by sporadic investments from the same division; a single large central investment was never possible.
As a result, the Merlin6 GPU computing cluster is not homogeneous and consists of a wide variety of hardware types and components.

In 2018, for the common good, BIO decided to open the cluster to Merlin users and make it widely accessible to PSI scientists.

The table below summarizes the hardware setup of the Merlin6 GPU computing nodes:

<table>
<thead>
<tr>
<th scope='colgroup' style="vertical-align:middle;text-align:center;" colspan="9">Merlin6 GPU Computing Nodes</th>
</tr>
<tr>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Node</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Processor</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Sockets</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Cores</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Threads</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Scratch</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">Memory</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">GPUs</th>
<th scope='col' style="vertical-align:middle;text-align:center;" colspan="1">GPU Model</th>
</tr>
</thead>
<tbody>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-001</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/82930/intel-core-i7-5960x-processor-extreme-edition-20m-cache-up-to-3-50-ghz.html">Intel Core i7-5960X</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">16</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.8TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">128GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">GTX1080</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-00[2-5]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/92984/intel-xeon-processor-e5-2640-v4-25m-cache-2-40-ghz.html">Intel Xeon E5-2640 v4</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">20</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.8TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">128GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">4</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">GTX1080</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-006</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/92984/intel-xeon-processor-e5-2640-v4-25m-cache-2-40-ghz.html">Intel Xeon E5-2640 v4</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">20</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">800GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">128GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">4</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">GTX1080Ti</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-00[7-9]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/92984/intel-xeon-processor-e5-2640-v4-25m-cache-2-40-ghz.html">Intel Xeon E5-2640 v4</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">20</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">3.5TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">128GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">4</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">GTX1080Ti</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-01[0-3]</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://ark.intel.com/content/www/us/en/ark/products/197098/intel-xeon-silver-4210r-processor-13-75m-cache-2-40-ghz.html">Intel Xeon Silver 4210R</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">20</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1.7TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">128GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">4</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">RTX2080Ti</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-014</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://www.intel.com/content/www/us/en/products/sku/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz/specifications.html?wapkw=Intel(R)%20Xeon(R)%20Gold%206240R%20CP">Intel Xeon Gold 6240R</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">48</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2.9TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">8</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">RTX2080Ti</td>
</tr>
<tr style="vertical-align:middle;text-align:center;" ralign="center">
<td style="vertical-align:middle;text-align:center;" rowspan="1"><b>merlin-g-015</b></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1"><a href="https://www.intel.com/content/www/us/en/products/sku/215279/intel-xeon-gold-5318s-processor-36m-cache-2-10-ghz/specifications.html">Intel Xeon Gold 5318S</a></td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">48</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">1</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">2.9TB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">384GB</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">8</td>
<td style="vertical-align:middle;text-align:center;" rowspan="1">RTX A5000</td>
</tr>
</tbody>
</table>

### Login Nodes

The login nodes are part of the **[Merlin6](../merlin6/cluster-introduction.md)** HPC cluster,
and are used to compile code and to submit jobs to the different ***Merlin Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](../merlin6/hardware-and-software-description.md)** for further information.
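
As a rough illustration of submitting from a login node to one of these clusters, the sketch below builds an `sbatch` command line using Slurm's `--clusters` option. The helper name `build_submit_cmd` and the script name `job.sh` are hypothetical and not part of the Merlin tooling; the function only prints the command (a dry run), so it can be inspected before anything is submitted.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the sbatch command that would submit
# a job script to the given Slurm cluster, without executing it.
build_submit_cmd() {
    local cluster="$1"   # e.g. merlin5, merlin6 or gmerlin6
    local script="$2"    # path to the batch script (placeholder name)
    echo "sbatch --clusters=${cluster} ${script}"
}

# Usage on a login node:
#   build_submit_cmd gmerlin6 job.sh
```

Printing the command instead of running it keeps the sketch safe to try anywhere; on a real login node the printed line could then be executed as is.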

### Storage

The storage is part of the **[Merlin6](../merlin6/cluster-introduction.md)** HPC cluster,
and is mounted on all the ***Slurm clusters*** (`merlin5`, `merlin6`, `gmerlin6`, etc.).
Please refer to the **[Merlin6 Hardware Documentation](../merlin6/hardware-and-software-description.md)** for further information.

### Network

The Merlin6 cluster connectivity is based on [InfiniBand FDR and EDR](https://en.wikipedia.org/wiki/InfiniBand) technologies.
This allows fast, very low-latency access to the data and efficient execution of MPI-based jobs.
To check the network speed of a given machine (56 Gbps for **FDR**, 100 Gbps for **EDR**), run the following command on that node:

```bash
ibstat | grep Rate
```
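
To interpret the reported rate, a small wrapper around the command above can label each port as FDR or EDR. This is a minimal sketch under assumptions: the function names `ib_generation` and `classify_ib_rates` are hypothetical, `ibstat` is assumed to print lines of the form `Rate: 56`, and only the two rates mentioned in this section are recognized.

```bash
#!/usr/bin/env bash
# Map an InfiniBand port rate (in Gbps) to its link generation.
# Only the two rates named in this section are recognized.
ib_generation() {
    case "$1" in
        56)  echo "FDR" ;;
        100) echo "EDR" ;;
        *)   echo "unknown" ;;
    esac
}

# Read ibstat-style output on stdin and print one
# "<rate> Gbps (<generation>)" line per "Rate:" line found.
classify_ib_rates() {
    awk '/Rate:/ {print $2}' | while read -r rate; do
        echo "${rate} Gbps ($(ib_generation "$rate"))"
    done
}

# Usage on a node (requires ibstat):
#   ibstat | classify_ib_rates
```

Because `classify_ib_rates` reads from standard input, it can also be tried on saved `ibstat` output without access to InfiniBand hardware.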

## Software

On the Merlin6 GPU computing nodes, we try to keep the software stack consistent with the main [Merlin6](../merlin6/index.md) cluster.

For this reason, the Merlin6 GPU nodes run:

* [**Red Hat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
* [**MLNX_OFED LTS v5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-4** or newer cards.
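
To check whether a node meets the MLNX_OFED minimum listed above, the installed version can be compared with `5.2-2.2.0.0`. The following is a sketch under assumptions: `version_ge` and `ofed_version` are hypothetical helper names, the comparison relies on GNU `sort -V`, and `ofed_info -s` is assumed to print a string of the form `MLNX_OFED_LINUX-5.2-2.2.0.0:`.

```bash
#!/usr/bin/env bash
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort's version ordering (sort -V).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the assumed "MLNX_OFED_LINUX-" prefix and trailing ":" from the
# string read on stdin, leaving only the version number.
ofed_version() {
    sed -e 's/^MLNX_OFED_LINUX-//' -e 's/:$//'
}

# Usage on a node (requires MLNX_OFED to be installed):
#   if version_ge "$(ofed_info -s | ofed_version)" "5.2-2.2.0.0"; then
#       echo "MLNX_OFED meets the documented minimum"
#   fi
```

The same `version_ge` helper could be reused for other version checks, provided the version strings sort sensibly under `sort -V`.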