Migrating merlin6 user guide from jekyll-example1

From lsm-hpce/jekyll-example1 1eada07
This commit is contained in:
Spencer Bliven
2019-06-14 15:38:22 +02:00
parent 7c6f7b177d
commit ebff53c62c
19 changed files with 598 additions and 763 deletions


@@ -1,25 +1,16 @@
---
layout: default
title: Slurm Configuration
parent: Merlin6 Slurm
grand_parent: Merlin6 User Guide
nav_order: 2
---
# Slurm Configuration
{: .no_toc }
## Table of contents
{: .no_toc .text-delta }
1. TOC
{:toc}
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-configuration.html
---
## Using the Slurm batch system
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm, and **Merlin6** has likewise been configured with this batch system.
Slurm has been installed in a **multi-clustered** configuration, allowing multiple clusters to be integrated into the same batch system.
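
As an illustration of the multi-cluster setup, the clusters known to the batch system can be listed from a login node. This is a minimal sketch using standard Slurm commands; the cluster names are the ones described on this page:

```bash
# List the clusters registered in the Slurm database
sacctmgr show clusters format=Cluster

# Show the partitions and node states of a specific cluster
sinfo --clusters=merlin6
```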
@@ -30,7 +21,7 @@ For understanding the Slurm configuration setup in the cluster, sometimes may be
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
The configuration files found on the login nodes correspond exclusively to the **merlin6** cluster.
Configuration files for the old **merlin5** cluster must be checked directly on any of the **merlin5** computing nodes: they are not propagated
to the **merlin6** login nodes.
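
For a quick look at the configuration described above, the files can be read directly from a login node, or the running configuration can be queried with ``scontrol``. A minimal sketch:

```bash
# Read the merlin6 Slurm configuration from a login node
less /etc/slurm/slurm.conf

# Alternatively, query the configuration of the running cluster
scontrol show config | less
```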
@@ -40,9 +31,9 @@ The new Slurm cluster is called **merlin6**. However, the old Slurm *merlin* clu
This allows jobs to keep running on the old computing nodes until users have fully migrated their codes to the new cluster.
From July 2019, **merlin6** becomes the **default cluster** and any job submitted to Slurm will be submitted to that cluster. Users can keep submitting to
the old *merlin5* computing nodes by using the option ``--cluster=merlin5``, as sketched below.
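
For example, a batch script can be directed to either cluster at submission time. This is a minimal sketch; ``myjob.sh`` stands for any batch script of your own:

```bash
# Submit a batch script to the default cluster (merlin6 from July 2019)
sbatch myjob.sh

# Submit the same script to the old merlin5 cluster instead
# (long option --clusters, short form -M)
sbatch --clusters=merlin5 myjob.sh

# Check the queue of a specific cluster
squeue --clusters=merlin5
```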
This documentation only covers the usage of the **merlin6** Slurm cluster.
### Using Slurm 'merlin6' cluster
@@ -65,26 +56,26 @@ In *Merlin6*, memory is considered a Consumable Resource, as well as the CPU.
#### Merlin6 Slurm partitions
A partition can be specified when submitting a job with the ``--partition=<partitionname>`` option (see the example below).
The following *partitions* (also known as *queues*) are configured in Slurm:
| Partition   | Default Partition | Default Time | Max Time | Max Nodes | Priority |
|:----------- | ----------------- | ------------ | -------- | --------- | -------- |
| **general** | true              | 1 day        | 1 week   | 50        | low      |
| **daily**   | false             | 1 day        | 1 day    | 60        | medium   |
| **hourly**  | false             | 1 hour       | 1 hour   | unlimited | highest  |
**general** is the *default* partition, so when nothing is specified a job is assigned to it. **general** can not have more than 50 nodes
running jobs. For **daily** this limit is raised to 60 nodes, while **hourly** has no limit. Shorter jobs have higher priority than
longer jobs and are therefore in general scheduled earlier (however, other factors, such as the user's fair share value, can affect this decision).
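
As an illustration, the partition can be selected inside the batch script itself. This is a minimal sketch; the job name, time limit and resource request are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=daily     # one of: general (default), daily, hourly
#SBATCH --time=04:00:00       # placeholder; must fit within the partition's Max Time
#SBATCH --job-name=example    # placeholder job name
#SBATCH --ntasks=1            # placeholder resource request

srun hostname
```

The same choice can be made on the command line, e.g. ``sbatch --partition=hourly myjob.sh``.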
#### Merlin6 User limits
By default, users can not use more than 528 cores at the same time (Max CPU per user). This limit applies to the **general** and **daily** partitions. For the **hourly** partition, there is no restriction.
These limits are relaxed for the **daily** partition during non-working hours and over the weekend, as follows:
| Partition   | Mon-Fri 08h-18h | Sun-Thu 18h-0h | From Fri 18h to Sun 8h | From Sun 8h to Mon 18h |
|:----------- | --------------- | -------------- | ---------------------- | ---------------------- |
| **general** | 528             | 528            | 528                    | 528                    |
| **daily**   | 528             | 792            | Unlimited              | 792                    |
| **hourly**  | Unlimited       | Unlimited      | Unlimited              | Unlimited              |
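
To get a rough idea of how close your running jobs are to these limits, the allocated cores can be summed with ``squeue``. A sketch using standard ``squeue`` options, not an official accounting tool:

```bash
# Rough count of the cores currently allocated to your running jobs
squeue --user="$USER" --states=RUNNING --noheader --Format=NumCPUs \
  | awk '{total += $1} END {print total, "cores in use"}'
```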