From 226dd00bfb6581b6fc19a75152cac2caf84f66cb Mon Sep 17 00:00:00 2001
From: caubet_m
Date: Fri, 29 Jan 2021 12:31:41 +0100
Subject: [PATCH] Slurm configuration, changed per job and per user limits

---
 .../03 Job Submission/slurm-configuration.md | 38 ++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/pages/merlin6/03 Job Submission/slurm-configuration.md b/pages/merlin6/03 Job Submission/slurm-configuration.md
index 9562282..f7b3d8c 100644
--- a/pages/merlin6/03 Job Submission/slurm-configuration.md
+++ b/pages/merlin6/03 Job Submission/slurm-configuration.md
@@ -2,7 +2,7 @@
 title: Slurm Configuration
 #tags:
 keywords: configuration, partitions, node definition
-last_updated: 23 January 2020
+last_updated: 29 January 2021
 summary: "This document describes a summary of the Merlin6 configuration."
 sidebar: merlin6_sidebar
 permalink: /merlin6/slurm-configuration.html
@@ -105,7 +105,7 @@ with wide values).
 
 #### Per user limits for CPU partitions
 
-These limits which apply exclusively to users. In other words, there is a maximum of resource a single user can use. This is described in the table below,
+These limits apply exclusively to users. In other words, there is a maximum of resources a single user can use. This is described in the table below,
 and limits will vary depending on the day of the week and the time (*working* vs *non-working* hours). Limits are shown in format: `SlurmQoS(limits)`,
 where `SlurmQoS` can be seen with the command `sacctmgr show qos`:
 
@@ -150,13 +150,41 @@ partitions, Slurm will also attempt first to allocate jobs on partitions with hi
 
 ### User and job limits
 
+The GPU cluster enforces some basic per-user and per-job limits to prevent a single user from monopolizing the resources and to ensure fair usage of the cluster.
+The limits are described below.
+
 #### Per job limits
 
-Per job limits are the same as the per user limits (see below).
+These limits apply to a single job. In other words, they define the maximum amount of resources a single job can use.
+Limits are defined using QoS, usually set at the partition level. Limits are described in the table below and are shown in the format `SlurmQoS(limits)`
+(the possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):
+
+| Partition     | Mon-Sun 0h-24h                          |
+|:-------------:| :------------------------------------: |
+| **gpu**       | gpu_week(cpu=40,gres/gpu=8,mem=200G)    |
+| **gpu-short** | gpu_week(cpu=40,gres/gpu=8,mem=200G)    |
+
+With these limits, a single job cannot use more than 40 CPUs, more than 8 GPUs, or more than 200GB of memory.
+Any job exceeding these limits will stay in the queue with the pending reason **`QOSMax[Cpu|GRES|Mem]PerJob`**.
+Since no other QoS is applied during the week that could raise these job limits (as happens, for instance, for the CPU **daily** partition), such a job needs to be cancelled and resubmitted with resource requests that fit the limits above.
 
-#### Per user limits for CPU partitions
+#### Per user limits for GPU partitions
 
-By default, a user can not use more than **two** GPU nodes in parallel. Hence, users are limited to use 8 GPUs in parallel at most (from 2 nodes).
+These limits apply exclusively to users. In other words, they define the maximum amount of resources a single user can use.
+Limits are defined using QoS, usually set at the partition level. Limits are described in the table below and are shown in the format `SlurmQoS(limits)`
+(the possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):
+
+| Partition     | Mon-Sun 0h-24h                                               |
+|:-------------:| :---------------------------------------------------------: |
+| **gpu**       | gpu_week(cpu=80,gres/gpu=16,mem=400G)                        |
+| **gpu-short** | gpu_week(cpu=80,gres/gpu=16,mem=400G)                        |
+
+With these limits, a single user cannot use more than 80 CPUs, more than 16 GPUs, or more than 400GB of memory in total.
+Jobs submitted by a user who already exceeds these limits will stay in the queue with the pending reason **`QOSMax[Cpu|GRES|Mem]PerUser`**. In that case, the jobs simply wait until some of the resources currently used by that user are freed.
+
+Notice that the per-user limits are twice the per-job limits. In this way, a user can run, for example, up to two 8-GPU jobs or up to four 4-GPU jobs in parallel.
+Please avoid occupying all GPUs of the same type for several hours or multiple days, as this would block other users who need that same
+type of GPU.
 
 ## Understanding the Slurm configuration (for advanced users)
 
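Both tables above map to the same `gpu_week` QoS, so the effective per-job and per-user limits, as well as the reason why a job is still pending, can be checked directly from the command line. The following is a minimal sketch, assuming the QoS name `gpu_week` from the tables above and a standard Slurm setup; column widths and output formatting may need adjusting:

```bash
# Per-job limits are reported as MaxTRES, per-user limits as MaxTRESPU
sacctmgr show qos gpu_week format=Name%12,MaxTRES%45,MaxTRESPU%45

# List your own jobs in the GPU partitions; for pending jobs the last column
# shows the reason, e.g. QOSMax[Cpu|GRES|Mem]PerJob or ...PerUser as described above
squeue -u $USER -p gpu,gpu-short -o "%.10i %.12P %.8T %.10M %R"
```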
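As an illustration of the per-job caps (40 CPUs, 8 GPUs, 200GB), a submission script staying within them could look roughly like the following. This is only a sketch: the script name, GPU count, CPU count, and memory value are arbitrary examples, and the exact `--gres` syntax (for instance with an explicit GPU type) depends on the local configuration and is not taken from this page.

```bash
#!/bin/bash
#SBATCH --partition=gpu        # or gpu-short for short runs
#SBATCH --gres=gpu:4           # 4 GPUs per job, below the 8-GPU per-job cap
#SBATCH --cpus-per-task=20     # below the 40-CPU per-job cap
#SBATCH --mem=100G             # below the 200G per-job cap

srun ./my_gpu_application      # hypothetical executable
```

A request above the per-job caps (for example asking for 10 GPUs) would remain pending with the `QOSMax[Cpu|GRES|Mem]PerJob` reason described above and should be cancelled and resubmitted with a smaller request.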