diff --git a/_data/sidebars/merlin6_sidebar.yml b/_data/sidebars/merlin6_sidebar.yml
index 88b1331..d724d24 100644
--- a/_data/sidebars/merlin6_sidebar.yml
+++ b/_data/sidebars/merlin6_sidebar.yml
@@ -5,22 +5,36 @@ entries:
- product: Merlin
version: 6
folders:
- - title: Introduction
+ - title: Quick Start Guide
# URLs for top-level folders are optional. If omitted it is a bit easier to toggle the accordion.
#url: /merlin6/introduction.html
folderitems:
- - title: Introduction
- url: /merlin6/introduction.html
- title: Code Of Conduct
url: /merlin6/code-of-conduct.html
- - title: Hardware And Software Description
- url: /merlin6/hardware-and-software.html
- - title: Accessing Merlin
- folderitems:
- title: Requesting Accounts
url: /merlin6/request-account.html
- title: Requesting Projects
url: /merlin6/request-project.html
+ - title: Slurm CPU 'merlin5'
+ folderitems:
+ - title: Introduction
+ url: /merlin5/introduction.html
+ - title: Hardware And Software Description
+ url: /merlin5/hardware-and-software.html
+ - title: Slurm CPU 'merlin6'
+ folderitems:
+ - title: Introduction
+ url: /merlin6/introduction.html
+ - title: Hardware And Software Description
+ url: /merlin6/hardware-and-software.html
+ - title: Slurm GPU 'gmerlin6'
+ folderitems:
+ - title: Introduction
+ url: /gmerlin6/introduction.html
+ - title: Hardware And Software Description
+ url: /gmerlin6/hardware-and-software.html
+ - title: Accessing Merlin
+ folderitems:
- title: Accessing Interactive Nodes
url: /merlin6/interactive.html
- title: Accessing from a Linux client
diff --git a/_data/topnav.yml b/_data/topnav.yml
index 5680e7d..fe1d0c7 100644
--- a/_data/topnav.yml
+++ b/_data/topnav.yml
@@ -22,3 +22,11 @@ topnav_dropdowns:
url: /merlin6/use.html
- title: User Guide
url: /merlin6/user-guide.html
+ - title: Slurm
+ folderitems:
+ - title: Cluster 'merlin5'
+ url: /merlin5/slurm-cluster.html
+ - title: Cluster 'merlin6'
+ url: /merlin6/slurm-cluster.html
+ - title: Cluster 'gmerlin6'
+ url: /gmerlin6/slurm-cluster.html
diff --git a/pages/gmerlin6/introduction.md b/pages/gmerlin6/introduction.md
new file mode 100644
index 0000000..6fc3dff
--- /dev/null
+++ b/pages/gmerlin6/introduction.md
@@ -0,0 +1,47 @@
+---
+title: Cluster 'gmerlin6'
+#tags:
+#keywords:
+last_updated: 07 April 2021
+#summary: "GPU Merlin 6 cluster overview"
+sidebar: merlin6_sidebar
+permalink: /gmerlin6/introduction.html
+redirect_from:
+ - /gmerlin6
+ - /gmerlin6/index.html
+---
+
+## Slurm 'merlin5' cluster
+
+**Merlin5**, built in 2016-2017, was the former official PSI Local HPC cluster for development and
+mission-critical applications. It was an extension of the Merlin4 cluster and was built from
+existing hardware due to a lack of central investment in Local HPC resources. **Merlin5** was
+replaced in 2019 by the **[Merlin6](/merlin6/index.html)** cluster, backed by a substantial central
+investment of ~1.5M CHF. **Merlin5** was mostly based on CPU resources, but also contained a small
+number of GPU nodes which were mostly used by the BIO experiments.
+
+**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster**,
+called **`merlin5`**. This way, the old CPU computing nodes remain available as extra computing resources,
+and as an extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.
+
+The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
+cluster, which became the **main Local HPC cluster**. Hence, **[Merlin6](/merlin6/index.html)**
+provides the storage that is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) clusters (`merlin5`, `merlin6`, `gmerlin6`).
+
+### Submitting jobs to 'merlin5'
+
+Jobs must be submitted to the **`merlin5`** Slurm cluster from the **Merlin6** login nodes, by adding
+the option `--clusters=merlin5` to any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc.).
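+
+For example, a minimal sketch (the job script name `myjob.sh` is a placeholder):
+
+```bash
+# Submit a batch script to the 'merlin5' cluster from a Merlin6 login node
+sbatch --clusters=merlin5 myjob.sh
+
+# Check the queue of the 'merlin5' cluster
+squeue --clusters=merlin5
+```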
+
+## The Merlin Architecture
+
+### Multi Non-Federated Cluster Architecture Design: The Merlin cluster
+
+The following image shows the Slurm architecture design for the Merlin cluster.
+It is a multi-cluster, non-federated setup, with a central Slurm database
+and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):
+
+
+
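+The independent clusters can be inspected from a login node by querying them explicitly with the
+`--clusters` option (a minimal sketch; the output depends on the current cluster configuration):
+
+```bash
+# List the partitions of each Slurm cluster registered in the central database
+sinfo --clusters=merlin5,merlin6,gmerlin6
+
+# Or simply query all clusters at once
+sinfo --clusters=all
+```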
diff --git a/pages/merlin5/hardware-and-software-description.md b/pages/merlin5/hardware-and-software-description.md
new file mode 100644
index 0000000..bf06ab9
--- /dev/null
+++ b/pages/merlin5/hardware-and-software-description.md
@@ -0,0 +1,97 @@
+---
+title: Hardware And Software Description
+#tags:
+#keywords:
+last_updated: 09 April 2021
+#summary: ""
+sidebar: merlin6_sidebar
+permalink: /merlin5/hardware-and-software.html
+---
+
+## Hardware
+
+### Computing Nodes
+
+Merlin5 is built from recycled nodes, and hardware will be decommissioned as soon as it fails (due to the expired warranty and the age of the cluster).
+* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
+* Connectivity is based on Infiniband **ConnectX-3 QDR-40Gbps**
+ * 16 internal ports for intra chassis communication
+ * 2 connected external ports for inter chassis communication and storage access.
+
+The table below summarizes the hardware setup of the Merlin5 computing nodes:
+
+**Merlin5 CPU Computing Nodes**
+
+| Chassis | Node             | Processor          | Sockets | Cores | Threads | Scratch | Memory |
+|---------|------------------|--------------------|---------|-------|---------|---------|--------|
+| #0      | merlin-c-[18-30] | Intel Xeon E5-2670 | 2       | 16    | 1       | 50GB    | 64GB   |
+| #0      | merlin-c-[31,32] | Intel Xeon E5-2670 | 2       | 16    | 1       | 50GB    | 128GB  |
+| #1      | merlin-c-[33-45] | Intel Xeon E5-2670 | 2       | 16    | 1       | 50GB    | 64GB   |
+| #1      | merlin-c-[46,47] | Intel Xeon E5-2670 | 2       | 16    | 1       | 50GB    | 128GB  |
+
+### Login Nodes
+
+The login nodes are part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
+and are used to compile and to submit jobs to the different ***Merlin Slurm clusters*** (`merlin5`,`merlin6`,`gmerlin6`,etc.).
+Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.
+
+### Storage
+
+The storage is part of the **[Merlin6](/merlin6/introduction.html)** HPC cluster,
+and is mounted in all the ***Slurm clusters*** (`merlin5`,`merlin6`,`gmerlin6`,etc.).
+Please refer to the **[Merlin6 Hardware Documentation](/merlin6/hardware-and-software.html)** for further information.
+
+### Network
+
+Merlin5 cluster connectivity is based on [InfiniBand QDR](https://en.wikipedia.org/wiki/InfiniBand) technology.
+This allows fast, very low-latency access to the data, as well as running extremely efficient MPI-based jobs.
+However, this is an old generation of InfiniBand, which requires older drivers, and software cannot take advantage of the latest features.
+
+## Software
+
+In Merlin5, we try to keep the software stack consistent with the main [Merlin6](/merlin6/index.html) cluster.
+
+Due to this, Merlin5 runs:
+* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
+* [**Slurm**](https://slurm.schedmd.com/), which we try to keep up to date with the most recent releases.
+* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
+* [**MLNX_OFED LTS v.4.9-2.2.4.0**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed), which is an old version, but required because **ConnectX-3** support has been dropped in newer OFED versions.
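+
+The installed versions can be checked directly from a node, e.g. (a sketch; some commands may
+require specific paths or privileges):
+
+```bash
+cat /etc/redhat-release   # RHEL release
+sinfo --version           # Slurm version
+ofed_info -s              # Mellanox OFED release
+rpm -qa | grep -i gpfs    # GPFS (IBM Spectrum Scale) packages
+```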
diff --git a/pages/merlin5/introduction.md b/pages/merlin5/introduction.md
new file mode 100644
index 0000000..40ccfab
--- /dev/null
+++ b/pages/merlin5/introduction.md
@@ -0,0 +1,47 @@
+---
+title: Cluster 'merlin5'
+#tags:
+#keywords:
+last_updated: 07 April 2021
+#summary: "Merlin 5 cluster overview"
+sidebar: merlin6_sidebar
+permalink: /merlin5/introduction.html
+redirect_from:
+ - /merlin5
+ - /merlin5/index.html
+---
+
+## Slurm 'merlin5' cluster
+
+**Merlin5**, built in 2016-2017, was the former official PSI Local HPC cluster for development and
+mission-critical applications. It was an extension of the Merlin4 cluster and was built from
+existing hardware due to a lack of central investment in Local HPC resources. **Merlin5** was
+replaced in 2019 by the **[Merlin6](/merlin6/index.html)** cluster, backed by a substantial central
+investment of ~1.5M CHF. **Merlin5** was mostly based on CPU resources, but also contained a small
+number of GPU nodes which were mostly used by the BIO experiments.
+
+**Merlin5** has been kept as a **Local HPC [Slurm](https://slurm.schedmd.com/overview.html) cluster**,
+called **`merlin5`**. This way, the old CPU computing nodes remain available as extra computing resources,
+and as an extension of the official production **`merlin6`** [Slurm](https://slurm.schedmd.com/overview.html) cluster.
+
+The old Merlin5 _**login nodes**_, _**GPU nodes**_ and _**storage**_ were fully migrated to the **[Merlin6](/merlin6/index.html)**
+cluster, which became the **main Local HPC cluster**. Hence, **[Merlin6](/merlin6/index.html)**
+provides the storage that is mounted on the different Merlin HPC [Slurm](https://slurm.schedmd.com/overview.html) clusters (`merlin5`, `merlin6`, `gmerlin6`).
+
+### Submitting jobs to 'merlin5'
+
+Jobs must be submitted to the **`merlin5`** Slurm cluster from the **Merlin6** login nodes, by adding
+the option `--clusters=merlin5` to any of the Slurm commands (`sbatch`, `salloc`, `srun`, etc.).
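+
+A minimal sketch of a batch script targeting `merlin5` (the resource requests are placeholders):
+
+```bash
+#!/bin/bash
+#SBATCH --clusters=merlin5   # target the 'merlin5' Slurm cluster
+#SBATCH --ntasks=8           # placeholder resource request
+#SBATCH --time=01:00:00
+
+srun hostname
+```
+
+The script would then be submitted from a Merlin6 login node with `sbatch myjob.sh`, and the same
+`--clusters=merlin5` option works interactively with `salloc` and `srun`.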
+
+## The Merlin Architecture
+
+### Multi Non-Federated Cluster Architecture Design: The Merlin cluster
+
+The following image shows the Slurm architecture design for the Merlin cluster.
+It is a multi-cluster, non-federated setup, with a central Slurm database
+and multiple independent clusters (`merlin5`, `merlin6`, `gmerlin6`):
+
+
+
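+Since all clusters share a central Slurm database, jobs and accounting data can be inspected per
+cluster from any Merlin6 login node (a minimal sketch; the date is only an example):
+
+```bash
+# Show the jobs of every cluster registered in the central Slurm database
+squeue --clusters=all
+
+# Query accounting data for a single cluster
+sacct --clusters=merlin5 --starttime=2021-04-01
+```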
diff --git a/pages/merlin6/01 introduction/hardware-and-software-description.md b/pages/merlin6/01 introduction/hardware-and-software-description.md
index 4439a8a..139757a 100644
--- a/pages/merlin6/01 introduction/hardware-and-software-description.md
+++ b/pages/merlin6/01 introduction/hardware-and-software-description.md
@@ -8,104 +8,159 @@ sidebar: merlin6_sidebar
permalink: /merlin6/hardware-and-software.html
---
-# Hardware And Software Description
-{: .no_toc }
+## Hardware
-## Table of contents
-{: .no_toc .text-delta }
+### Computing Nodes
-1. TOC
-{:toc}
+The new Merlin6 cluster is based on **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw):
+* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades each.
+* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. These blades have slightly different components depending on the specific project requirements.
----
+The connectivity for the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:
+* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
+ * 24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)
+ * 12 external EDR-100Gbps ports (for external low latency connectivity)
-## Computing Nodes
+
+**Merlin6 CPU Computing Nodes**
+
+| Chassis | Node              | Processor             | Sockets | Cores | Threads | Scratch | Memory |
+|---------|-------------------|-----------------------|---------|-------|---------|---------|--------|
+| #0      | merlin-c-0[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
+| #1      | merlin-c-1[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
+| #2      | merlin-c-2[01-24] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.2TB   | 384GB  |
+| #3      | merlin-c-3[01-06] | Intel Xeon Gold 6240R | 2       | 48    | 2       | 1.2TB   | 384GB  |
+| #3      | merlin-c-3[07-12] | Intel Xeon Gold 6240R | 2       | 48    | 2       | 1.2TB   | 768GB  |
+
+Each blade contains an NVMe disk, where ~300GB are dedicated to the O.S. and ~1.2TB are reserved for local `/scratch`.
-The new Merlin6 cluster contains an homogeneous solution based on *three* HP Apollo k6000 systems. Each HP Apollo k6000 chassis contains 22 HP XL320k Gen10 blades. However,
-each chassis can contain up to 24 blades, so is possible to upgradew with up to 2 nodes per chassis.
+### Login Nodes
-Each HP XL320k Gen 10 blade can contain up to two processors of the latest Intel® Xeon® Scalable Processor family. The hardware and software configuration is the following:
-* 3 x HP Apollo k6000 chassis systems, each one:
- * 22 x [HP Apollo XL230K Gen10](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw), each one:
- * 2 x *22 core* [Intel® Xeon® Gold 6152 Scalable Processor](https://ark.intel.com/products/120491/Intel-Xeon-Gold-6152-Processor-30-25M-Cache-2-10-GHz-) (2.10-3.70GHz).
- * 12 x 32 GB (384 GB in total) of DDR4 memory clocked 2666 MHz.
- * Dual Port !InfiniBand !ConnectX-5 EDR-100Gbps (low latency network); one active port per chassis.
- * 1 x 1.6TB NVMe SSD Disk
- * ~300GB reserved for the O.S.
- * ~1.2TB reserved for local fast scratch ``/scratch``.
- * Software:
- * RedHat Enterprise Linux 7.6
- * [Slurm](https://slurm.schedmd.com/) v18.08
- * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2
- * 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
- * 24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)
- * 12 external EDR-100Gbps ports (for external for internal low latency connectivity)
----
+*One old login node* (``merlin-l-01.psi.ch``) is inherited from the previous Merlin5 cluster. It is mainly used for running some BIO services (`cryosparc`) and for submitting jobs.
+*Two new login nodes* (``merlin-l-001.psi.ch``,``merlin-l-002.psi.ch``), with a configuration similar to the Merlin6 computing nodes, are available to users. They are mainly used
+for compiling software and submitting jobs.
-## Login Nodes
+The connectivity is based on **ConnectX-5 EDR-100Gbps** for the new login nodes, and **ConnectIB FDR-56Gbps** for the old one.
-### merlin-l-0[1,2]
+
+**Merlin6 Login Nodes**
+
+| Hardware | Node             | Processor             | Sockets | Cores | Threads | Scratch | Memory |
+|----------|------------------|-----------------------|---------|-------|---------|---------|--------|
+| Old      | merlin-l-01      | Intel Xeon E5-2697AV4 | 2       | 16    | 2       | 100GB   | 512GB  |
+| New      | merlin-l-00[1,2] | Intel Xeon Gold 6152  | 2       | 44    | 2       | 1.8TB   | 384GB  |
+
-Two login nodes are inherit from the previous Merlin5 cluster: ``merlin-l-01.psi.ch``, ``merlin-l-02.psi.ch``. The hardware and software configuration is the following:
-
-* 2 x HP DL380 Gen9, each one:
- * 2 x *16 core* [Intel® Xeon® Processor E5-2697AV4 Family](https://ark.intel.com/products/91768/Intel-Xeon-Processor-E5-2697A-v4-40M-Cache-2-60-GHz-) (2.60-3.60GHz)
- * Hyper-Threading disabled
- * 16 x 32 GB (512 GB in total) of DDR4 memory clocked 2400 MHz.
- * Dual Port Infiniband !ConnectIB FDR-56Gbps (low latency network).
- * Software:
- * RedHat Enterprise Linux 7.6
- * [Slurm](https://slurm.schedmd.com/) v18.08
- * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2
-
-### merlin-l-00[1,2]
-
-Two new login nodes are available in the new cluster: ``merlin-l-001.psi.ch``, ``merlin-l-002.psi.ch``. The hardware and software configuration is the following:
-
-* 2 x HP DL380 Gen10, each one:
- * 2 x *22 core* [Intel® Xeon® Gold 6152 Scalable Processor](https://ark.intel.com/products/120491/Intel-Xeon-Gold-6152-Processor-30-25M-Cache-2-10-GHz-) (2.10-3.70GHz).
- * Hyper-threading enabled.
- * 24 x 16GB (384 GB in total) of DDR4 memory clocked 2666 MHz.
- * Dual Port Infiniband !ConnectX-5 EDR-100Gbps (low latency network).
- * Software:
- * [NoMachine Terminal Server](https://www.nomachine.com/)
- * Currently only on: ``merlin-l-001.psi.ch``.
- * RedHat Enterprise Linux 7.6
- * [Slurm](https://slurm.schedmd.com/) v18.08
- * [GPFS](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html) v5.0.2 (merlin-l-001) v5.0.3 (merlin-l-002)
-
----
-
-## Storage
+### Storage
The storage node is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5).
-The solution is equipped with 334 x 10TB disks providing a useable capacity of 2.316 PiB (2.608PB). THe overall solution can provide a maximum read performance of 20GB/s.
-* 1 x Lenovo DSS G240, composed by:
- * 2 x ThinkSystem SR650, each one:
- * 2 x Dual Port Infiniband ConnectX-5 EDR-100Gbps (low latency network).
- * 2 x Dual Port Infiniband ConnectX-4 EDR-100Gbps (low latency network).
- * 1 x ThinkSystem RAID 930-8i 2GB Flash PCIe 12Gb Adapter
- * 1 x ThinkSystem SR630
- * 1 x Dual Port Infiniband ConnectX-5 EDR-100Gbps (low latency network).
- * 1 x Dual Port Infiniband ConnectX-4 EDR-100Gbps (low latency network).
- * 4 x Lenovo Storage D3284 High Density Expansion Enclosure, each one:
- * Holds 84 x 3.5" hot-swap drive bays in two drawers. Each drawer has three rows of drives, and each row has 14 drives.
- * Each drive bay will contain a 10TB Helium 7.2K NL-SAS HDD.
- * 2 x Mellanox SB7800 InfiniBand 1U Switch for High Availability and fast access to the storage with very low latency. Each one:
- * 36 EDR-100Gbps ports
+* 2 x **Lenovo DSS G240** systems, each one composed of 2 **ThinkSystem SR650** IO nodes mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
+* Each IO node provides 400Gbps of connectivity (4 x EDR-100Gbps ports: 2 x **ConnectX-5** and 2 x **ConnectX-4**).
----
+The storage solution is connected to the HPC clusters through 2 x **Mellanox SB7800 InfiniBand 1U Switches** for high availability and load balancing.
-## Network
+### Network
-Merlin6 cluster connectivity is based on the [Infiniband](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access with very low latencies to the data as well as running
+Merlin6 cluster connectivity is based on [**InfiniBand**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast, very low-latency access to the data, as well as running
extremely efficient MPI-based jobs:
* Connectivity amongst different computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
* Inter connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.
Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniband Unmanaged switches (one per HP Apollo chassis):
-* 1 * MSX6710 (FDR) for connecting old GPU nodes, old login nodes and MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
-* 2 * MSB7800 (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
-* 3 * HP EDR Unmanaged switches, each one embedded to each HP Apollo k6000 chassis solution.
-* 2 * MSB7700 (EDR) are the top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
+* 1 x **MSX6710** (FDR) for connecting old GPU nodes, old login nodes and MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
+* 2 x **MSB7800** (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
+* 3 x **HP EDR Unmanaged** switches, each one embedded in an HP Apollo k6000 chassis.
+* 2 x **MSB7700** (EDR) are the top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).
+
+## Software
+
+In Merlin6, we try to run the latest software stack releases to benefit from the latest features and improvements. Hence, **Merlin6** runs:
+* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
+* [**Slurm**](https://slurm.schedmd.com/), which we try to keep up to date with the most recent releases.
+* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
+* [**MLNX_OFED LTS v.5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-5** or newer cards.
+ * [MLNX_OFED LTS v.4.9-2.2.4.0](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) is installed for the remaining **ConnectX-3** and **ConnectIB** cards.
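+
+Since different nodes carry different generations of InfiniBand cards, the active OFED release and
+adapter status can be checked per node (a sketch; `ibstat` is part of the standard InfiniBand tools):
+
+```bash
+ofed_info -s   # installed MLNX_OFED release
+ibstat         # adapter and link status (CA type, rate)
+```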