Reorganize merlin6 pages to follow navigation menu
The folders are only used for source organization; URLs remain flat.
42
pages/merlin6/01 introduction/code-of-conduct.md
Normal file
@ -0,0 +1,42 @@
---
title: Code Of Conduct
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/code-of-conduct.html
---

## The Basic Principle

The basic principle is courtesy and consideration for other users.

* Merlin6 is a system shared by many users, so you are kindly requested to apply common courtesy when using its resources. Please follow our guidelines, which aim at providing and maintaining an efficient compute environment for all our users.
* Basic shell programming skills are an essential requirement in a Linux/UNIX HPC cluster environment; proficiency in shell programming is greatly beneficial.

## Interactive nodes

* The interactive nodes (also known as login nodes) are for development and quick testing:
  * It is **strictly forbidden to run production jobs** on the login nodes. All production jobs must be submitted to the batch system (see the sketch after this list).
  * It is **forbidden to run long processes** occupying large parts of a login node's resources.
  * In line with the previous rules, **misbehaving running processes will be killed** in order to keep the system responsive for other users.
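As an illustration of how such a submission looks, the following is a minimal sketch of a Slurm batch script; the job name, resource values and output file are placeholders, and no Merlin6-specific partition or account is assumed:

```bash
#!/bin/bash
#SBATCH --job-name=example     # name shown in the queue (placeholder)
#SBATCH --ntasks=1             # number of tasks (placeholder)
#SBATCH --time=00:10:00        # wall-time limit (placeholder)
#SBATCH --output=example.out   # file collecting stdout/stderr

# Everything below runs on the allocated compute node, not on the login node
hostname
```

Submit it with ``sbatch example.sbatch``; ``squeue -u $USER`` then shows its state in the queue.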

## Batch system

* Make sure that no broken or run-away processes are left behind when your job is done. Keep the process space clean on all nodes.
* During the runtime of a job, it is mandatory to use the ``/scratch`` and ``/shared-scratch`` partitions for temporary data (see the job script sketch after this list):
  * It is **forbidden** to use ``/data/user``, ``/data/project`` or ``/psi/home/`` for that purpose.
  * Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes.
  * Prefer ``/scratch`` over ``/shared-scratch`` and use the latter only when you require the temporary files to be visible from multiple nodes.
  * Read the description in **[Merlin6 directory structure](### Merlin6 directory structure)** to learn about the correct usage of each partition type.
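The following job script fragment is a minimal sketch of how temporary data can be kept on ``/scratch`` and cleaned up when the job ends; the per-user/per-job directory layout is an assumption for illustration, not a documented Merlin6 convention:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Assumed layout: a private, per-job directory on the local fast scratch
SCRATCHDIR="/scratch/${USER}/${SLURM_JOB_ID}"
mkdir -p "${SCRATCHDIR}"

# Remove the temporary data when the job script exits (also on most failures)
trap 'rm -rf "${SCRATCHDIR}"' EXIT

cd "${SCRATCHDIR}"
# ... run the workload here, writing temporary files into ${SCRATCHDIR} ...

# Copy back only the results you need, e.g. to your data area, before exiting
# cp results.dat /data/user/${USER}/
```
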
## System Administrator Rights

* The system administrator has the right to temporarily block access to Merlin6 for an account violating the Code of Conduct, in order to maintain the efficiency and stability of the system.
  * Repeated violations by the same user will be escalated to the user's supervisor.
* The system administrator has the right to delete files in the **scratch** directories:
  * after a job, if the job failed to clean up its files;
  * during the job, in order to prevent a job from destabilizing a node or multiple nodes.
* The system administrator has the right to kill any misbehaving running process.
@ -0,0 +1,111 @@
---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/hardware-and-software.html
---

# Hardware And Software Description
{: .no_toc }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Computing Nodes

The new Merlin6 cluster is a homogeneous solution based on *three* HP Apollo k6000 systems. Each HP Apollo k6000 chassis currently contains 22 HP Apollo XL230K Gen10 blades; however, each chassis can hold up to 24 blades, so it is possible to upgrade each one with up to 2 additional nodes.

Each HP Apollo XL230K Gen10 blade can contain up to two processors of the latest Intel® Xeon® Scalable Processor family. The hardware and software configuration is the following:

* 3 x HP Apollo k6000 chassis systems, each one with:
  * 22 x [HP Apollo XL230K Gen10](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw), each one with:
    * 2 x *22 core* [Intel® Xeon® Gold 6152 Scalable Processor](https://ark.intel.com/products/120491/Intel-Xeon-Gold-6152-Processor-30-25M-Cache-2-10-GHz-) (2.10-3.70GHz).
    * 12 x 32 GB (384 GB in total) of DDR4 memory clocked at 2666 MHz.
    * Dual Port InfiniBand ConnectX-5 EDR-100Gbps (low latency network); one active port per chassis.
    * 1 x 1.6TB NVMe SSD disk:
      * ~300GB reserved for the O.S.
      * ~1.2TB reserved for local fast scratch ``/scratch``.
    * Software:
      * RedHat Enterprise Linux 7.6
      * [Slurm](https://slurm.schedmd.com/) v18.08
      * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2
  * 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw):
    * 24 internal EDR-100Gbps ports (1 port per blade, for internal low latency connectivity)
    * 12 external EDR-100Gbps ports (for external low latency connectivity)
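If you want to cross-check this configuration from within the cluster, Slurm can report what it knows about the nodes; a short sketch (the node name is a placeholder, not a documented Merlin6 hostname):

```bash
# Node-oriented overview of all nodes and their state
sinfo -N -l

# Detailed hardware view (CPUs, memory, features) of a single node;
# replace <nodename> with a real node name taken from the sinfo output
scontrol show node <nodename>
```
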
---

## Login Nodes

### merlin-l-0[1,2]

Two login nodes are inherited from the previous Merlin5 cluster: ``merlin-l-01.psi.ch`` and ``merlin-l-02.psi.ch``. The hardware and software configuration is the following:

* 2 x HP DL380 Gen9, each one with:
  * 2 x *16 core* [Intel® Xeon® Processor E5-2697AV4 Family](https://ark.intel.com/products/91768/Intel-Xeon-Processor-E5-2697A-v4-40M-Cache-2-60-GHz-) (2.60-3.60GHz)
    * Hyper-threading disabled
  * 16 x 32 GB (512 GB in total) of DDR4 memory clocked at 2400 MHz.
  * Dual Port InfiniBand ConnectIB FDR-56Gbps (low latency network).
  * Software:
    * RedHat Enterprise Linux 7.6
    * [Slurm](https://slurm.schedmd.com/) v18.08
    * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2

### merlin-l-00[1,2]

Two new login nodes are available in the new cluster: ``merlin-l-001.psi.ch`` and ``merlin-l-002.psi.ch``. The hardware and software configuration is the following:

* 2 x HP DL380 Gen10, each one with:
  * 2 x *22 core* [Intel® Xeon® Gold 6152 Scalable Processor](https://ark.intel.com/products/120491/Intel-Xeon-Gold-6152-Processor-30-25M-Cache-2-10-GHz-) (2.10-3.70GHz).
    * Hyper-threading enabled.
  * 24 x 16GB (384 GB in total) of DDR4 memory clocked at 2666 MHz.
  * Dual Port InfiniBand ConnectX-5 EDR-100Gbps (low latency network).
  * Software:
    * [NoMachine Terminal Server](https://www.nomachine.com/)
      * Currently only on ``merlin-l-001.psi.ch``.
    * RedHat Enterprise Linux 7.6
    * [Slurm](https://slurm.schedmd.com/) v18.08
    * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2

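Access to the login nodes is via SSH (or via NoMachine on ``merlin-l-001.psi.ch``); a minimal example, with the username as a placeholder for your PSI account:

```bash
# Interactive shell on a Merlin6 login node; -X enables X11 forwarding if needed
ssh -X your_psi_username@merlin-l-001.psi.ch
```
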
---

## Storage

The storage node is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5).
The solution is equipped with 334 x 10TB disks providing a usable capacity of 2.316 PiB (2.608 PB). The overall solution can provide a maximum read performance of 20GB/s.

* 1 x Lenovo DSS G240, composed of:
  * 2 x ThinkSystem SR650, each one with:
    * 2 x Dual Port InfiniBand ConnectX-5 EDR-100Gbps (low latency network).
    * 2 x Dual Port InfiniBand ConnectX-4 EDR-100Gbps (low latency network).
    * 1 x ThinkSystem RAID 930-8i 2GB Flash PCIe 12Gb Adapter
  * 1 x ThinkSystem SR630, with:
    * 1 x Dual Port InfiniBand ConnectX-5 EDR-100Gbps (low latency network).
    * 1 x Dual Port InfiniBand ConnectX-4 EDR-100Gbps (low latency network).
  * 4 x Lenovo Storage D3284 High Density Expansion Enclosures, each one:
    * Holds 84 x 3.5" hot-swap drive bays in two drawers. Each drawer has three rows of drives, and each row has 14 drives.
    * Each drive bay contains a 10TB Helium 7.2K NL-SAS HDD.
* 2 x Mellanox SB7800 InfiniBand 1U switches for High Availability and fast, very low latency access to the storage, each one with:
  * 36 EDR-100Gbps ports
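From a user's perspective this storage is visible as the GPFS and scratch areas described in the Code of Conduct; a quick way to check their capacity and current usage is sketched below (assuming the mount points match the paths referenced in this documentation):

```bash
# Capacity and usage of the shared filesystems and the local fast scratch
df -h /data/user /data/project /shared-scratch /scratch
```
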
---

## Network

Merlin6 cluster connectivity is based on [InfiniBand](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access to the data with very low latency, as well as running extremely efficient MPI-based jobs:

* Connectivity amongst computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
* Intra-chassis connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.

The Merlin6 cluster currently contains 5 InfiniBand managed switches and 3 InfiniBand unmanaged switches (one per HP Apollo chassis):

* 1 x MSX6710 (FDR) for connecting old GPU nodes, old login nodes and the MeG cluster to the Merlin6 cluster (and storage). No High Availability mode is possible.
* 2 x MSB7800 (EDR) for connecting login nodes, storage and other nodes in High Availability mode.
* 3 x HP EDR unmanaged switches, one embedded in each HP Apollo k6000 chassis.
* 2 x MSB7700 (EDR) as top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
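To verify that a node's InfiniBand ports are up and running at the expected rate, the standard InfiniBand diagnostic tools can be used; a sketch, assuming the ``infiniband-diags`` and ``libibverbs`` utilities are installed on the node:

```bash
# Port state and rate (e.g. EDR ~100 Gbps) of the local InfiniBand HCAs
ibstat

# Lower-level device and port details as seen by the verbs library
ibv_devinfo
```
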
41
pages/merlin6/01 introduction/introduction.md
Normal file
@ -0,0 +1,41 @@
---
title: Introduction
#tags:
#keywords:
last_updated: 28 June 2019
#summary: "Merlin 6 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin6/introduction.html
redirect_from:
- /merlin6
- /merlin6/index.html
---

## About Merlin6

Merlin6 is the official PSI local HPC cluster for development and mission-critical applications. It was built in 2019 and replaces the Merlin5 cluster.

Merlin6 is designed to be extensible, so it is technically possible to add more compute nodes and cluster storage without a significant increase in manpower and operations costs.

Merlin6 is mostly based on CPU resources, but it also contains a small number of GPU-based resources, which are mostly used by the BIO experiments.

---

## Merlin6 Architecture

### Merlin6 Cluster Architecture Diagram

The following image shows the Merlin6 cluster architecture diagram:



### Merlin5 + Merlin6 Slurm Cluster Architecture Design

The following image shows the Slurm architecture design for the Merlin5 & Merlin6 clusters:

