Migrating merlin6 user guide from jekyll-example1
From lsm-hpce/jekyll-example1 1eada07

---
title: Slurm Configuration
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-configuration.html
---

# Slurm Configuration

## Using the Slurm batch system

Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm. In the same way, **Merlin6** has also been configured with this batch system.

Slurm has been installed in a **multi-clustered** configuration, which allows multiple clusters to be integrated into the same batch system.
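
For example, the partitions of each cluster can be listed from a login node with the standard ``sinfo`` command and its ``--clusters`` option (a minimal sketch; the exact output depends on the site configuration):

```bash
# Show the partitions of the default cluster (merlin6):
sinfo

# Show the partitions of a specific cluster, or of several clusters
# at once, in the multi-cluster setup:
sinfo --clusters=merlin5
sinfo --clusters=merlin5,merlin6
```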

For understanding the Slurm configuration setup in the cluster, it may sometimes be useful to check the following files:

* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.

The configuration files listed above, which can be found on the login nodes, correspond exclusively to the **merlin6** cluster.
Configuration files for the old **merlin5** cluster must be checked directly on any of the **merlin5** computing nodes: these are not propagated
to the **merlin6** login nodes.
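
For instance, these files can be inspected directly from a login node, or the effective configuration can be queried through ``scontrol`` (a minimal sketch using standard Slurm commands):

```bash
# Read the propagated configuration files on a login node:
less /etc/slurm/cgroup.conf
less /etc/slurm/gres.conf

# Query the effective configuration of the active (merlin6) cluster:
scontrol show config | less
```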

The new Slurm cluster is called **merlin6**. However, the old Slurm *merlin* cluster is still active.
This allows jobs to keep running on the old computing nodes until users have fully migrated their code to the new cluster.

From July 2019, **merlin6** becomes the **default cluster** and any job submitted to Slurm will be sent to that cluster. Users can keep submitting to
the old *merlin5* computing nodes by using the option ``--cluster=merlin5``.
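
As a minimal sketch (job name, resources, and executable are placeholders), the cluster can be selected either inside the batch script or on the ``sbatch`` command line:

```bash
#!/bin/bash
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --cluster=merlin5       # submit to the old merlin5 cluster;
                                # omit this line to use the default merlin6 cluster

srun hostname                   # placeholder command
```

The same option can also be given on the command line, e.g. ``sbatch --cluster=merlin5 myjob.sh``, which overrides whatever is set in the script.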

This documentation covers only the usage of the **merlin6** Slurm cluster.

### Using Slurm 'merlin6' cluster

In *Merlin6*, memory is considered a Consumable Resource, as well as the CPU.
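
Because memory is consumable, it should be requested explicitly whenever the default allocation is not sufficient. A minimal sketch using standard Slurm options (all values are placeholders):

```bash
#!/bin/bash
#SBATCH --ntasks=4              # number of tasks (CPUs are consumable resources)
#SBATCH --mem-per-cpu=4000      # memory per allocated CPU, in MB (placeholder value)

srun ./my_program               # placeholder executable
```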
#### Merlin6 Slurm partitions

A partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:

| Partition   | Default Partition | Default Time | Max Time | Max Nodes | Priority |
|:----------- | ----------------- | ------------ | -------- | --------- | -------- |
| **general** | true              | 1 day        | 1 week   | 50        | low      |
| **daily**   | false             | 1 day        | 1 day    | 60        | medium   |
| **hourly**  | false             | 1 hour       | 1 hour   | unlimited | highest  |

**general** is the *default* partition, so when nothing is specified a job is assigned to it. **general** can not have more than 50 nodes
running jobs. For **daily** this limit is extended to 60 nodes, while for **hourly** there are no limits. Shorter jobs have higher priority than
longer jobs and will in general be scheduled earlier (however, other factors, such as the user's fair share value, can affect this decision).
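
For example, a job that is guaranteed to finish within one hour can target the **hourly** partition to benefit from its higher priority (a minimal sketch; walltime and executable are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=hourly      # one of: general (default), daily, hourly
#SBATCH --time=00:30:00         # requested walltime; must fit within the partition's Max Time

srun ./my_program               # placeholder executable
```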
#### Merlin6 User limits

By default, users can not use more than 528 cores at the same time (Max CPU per user). This limit applies to the **general** and **daily** partitions. For the **hourly** partition, there is no restriction.
These limits are relaxed for the **daily** partition during non-working hours and during the weekend, as follows:

| Partition   | Mon-Fri 08h-18h | Sun-Thu 18h-0h | From Fri 18h to Sun 8h | From Sun 8h to Mon 18h |
|:----------- | --------------- | -------------- | ---------------------- | ---------------------- |
| **general** | 528             | 528            | 528                    | 528                    |
| **daily**   | 528             | 792            | Unlimited              | 792                    |
| **hourly**  | Unlimited       | Unlimited      | Unlimited              | Unlimited              |
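
To check how many cores your running jobs currently consume against these limits, the standard ``squeue`` output fields can be summed (a minimal sketch; the ``%C`` format field prints the number of CPUs allocated to each job):

```bash
# Total CPU cores allocated to your currently running jobs:
squeue --user=$USER --states=RUNNING --noheader --format=%C \
  | awk '{sum += $1} END {print sum + 0}'
```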