initial formatting changes complete
@@ -1,22 +1,15 @@
---
title: Hardware And Software Description
#tags:
#keywords:
last_updated: 09 April 2021
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin5/hardware-and-software.html
---
# Hardware And Software Description

## Hardware

### Computing Nodes

Merlin5 is built from recycled nodes, and hardware will be decommissioned as soon as it fails (due to the expired warranty and the age of the cluster).

* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
* Connectivity is based on InfiniBand **ConnectX-3 QDR-40Gbps**:
  * 16 internal ports for intra-chassis communication.
  * 2 connected external ports for inter-chassis communication and storage access.

The table below summarizes the hardware setup for the Merlin5 computing nodes:

@@ -91,6 +84,7 @@ However, this is an old version of Infiniband which requires older drivers and s
In Merlin5, we try to keep the software stack coherent with that of the main cluster, [Merlin6](../merlin6/index.md).

Due to this, Merlin5 runs:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
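
As a quick sanity check, the versions of this stack can usually be queried from a login node. This is only a sketch; the GPFS package naming is an assumption and may differ on the actual nodes:

```bash
# OS release (RHEL 7.x expected)
cat /etc/redhat-release

# Slurm version reported by the client tools
sinfo --version

# GPFS / IBM Spectrum Scale client packages, if installed via RPM (assumed naming)
rpm -qa 'gpfs*'
```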

@@ -1,12 +1,4 @@
---
title: Slurm Configuration
#tags:
keywords: configuration, partitions, node definition
last_updated: 20 May 2021
summary: "This document summarizes the Merlin5 Slurm configuration."
sidebar: merlin6_sidebar
permalink: /merlin5/slurm-configuration.html
---
# Slurm Configuration

This documentation shows the basic Slurm configuration and the options needed to run jobs in the Merlin5 cluster.

@@ -28,7 +20,6 @@ consider the memory as a *consumable resource*. Hence, users can *oversubscribe*
this legacy configuration has been kept to ensure that old jobs can keep running in the same way they did a few years ago.
If you know that this might be a problem for you, please always use Merlin6 instead.

## Running jobs in the 'merlin5' cluster

In this chapter we cover the basic settings that users need to specify in order to run jobs in the Merlin5 CPU cluster; a minimal batch script is sketched after the notes below.
@@ -96,11 +87,11 @@ Below are listed the most common settings:
Notice that in **Merlin5** no hyper-threading is available (while in **Merlin6** it is).
Hence, in **Merlin5** there is no need to specify the hyper-threading related `--hint` options.

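As an illustration, a minimal batch script could look as follows. This is a sketch, not the official template: the partition names `merlin` and `merlin-long` come from the QoS table below, and `--clusters=merlin5` assumes the cluster is addressed through Slurm's standard multi-cluster option.

```bash
#!/bin/bash
#SBATCH --clusters=merlin5      # assumed: select the merlin5 cluster in a multi-cluster setup
#SBATCH --partition=merlin      # CPU partition; 'merlin-long' for longer runs (see table below)
#SBATCH --ntasks=8              # number of tasks (cores); no --hint needed, no hyper-threading
#SBATCH --time=01:00:00         # walltime limit
#SBATCH --output=%j.out         # output file named after the job ID

srun hostname                   # replace with the actual application
```
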
## User and job limits

In the CPU cluster we enforce some limits which apply to jobs and users. The idea behind this is to ensure a fair usage of the resources and to
avoid overuse of the resources by a single user or job. However, applying limits might affect the overall usage efficiency of the cluster (for example,
pending jobs from a single user while many nodes stay idle due to low overall activity is something that can be seen when user limits are applied).
In the same way, these limits can also be used to improve the efficiency of the cluster (for example, without any job size limits, a job requesting all
resources from the batch system would drain the entire cluster to fit that job, which is undesirable).

@@ -119,7 +110,7 @@ with the format `SlurmQoS(limits)` (`SlurmQoS` can be listed from the `sacctmgr
| **merlin** | merlin5(cpu=384) | None |
| **merlin-long** | merlin5(cpu=384) | Max. 4 nodes |

By default, by QoS limits, a job cannot use more than 384 cores (the maximum CPU per job).
However, for `merlin-long` this is even more restricted: there is an extra limit of 4 dedicated nodes for this partition. This limit is defined
at the partition level, and it overrides any QoS limit as long as it is more restrictive.
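
As hinted at above, the QoS limits can be inspected with `sacctmgr`. A minimal sketch; the format fields chosen here are an assumption, any valid QoS fields work:

```bash
# List each QoS with its overall and per-user TRES limits (e.g. cpu=384)
sacctmgr show qos format=Name%15,MaxTRES%25,MaxTRESPU%25
```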