From 48ba72fa35fb8e8df5344f28767b8d7ac119913c Mon Sep 17 00:00:00 2001
From: caubet_m
Date: Fri, 29 Jan 2021 13:12:34 +0100
Subject: [PATCH] Updated missing Last Modified dates

---
 pages/merlin6/03 Job Submission/slurm-configuration.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/pages/merlin6/03 Job Submission/slurm-configuration.md b/pages/merlin6/03 Job Submission/slurm-configuration.md
index f7b3d8c..3fe9a97 100644
--- a/pages/merlin6/03 Job Submission/slurm-configuration.md
+++ b/pages/merlin6/03 Job Submission/slurm-configuration.md
@@ -156,7 +156,7 @@ The limits are described below.
 #### Per job limits
 These are limits applying to a single job. In other words, there is a maximum of resources a single job can use.
-Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below and are showed in format: `SlurmQoS(limits)`,
+Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below with the format `SlurmQoS(limits)`
 (list of possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):
 
 | Partition | Mon-Sun 0h-24h |
 |-----------|----------------|
@@ -165,13 +165,13 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
 | **gpu-short** | gpu_week(cpu=40,gres/gpu=8,mem=200G) |
 
 With these limits, a single job can not use more than 40 CPUs, more than 8 GPUs or more than 200GB.
-Any job exceeding such limits will stay in the queue with error **`QOSMax[Cpu|GRES|Mem]PerJob`**.
-Since there are no more QoS during the week which can increase job limits (this happens for instance in the CPU **daily** partition), the job needs to be cancelled and requested resources must be adapted according to the above resource limits.
+Any job exceeding such limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerJob`**.
+Since there is no other QoS during the week that temporarily overrides job limits (this happens, for instance, in the CPU **daily** partition), the job needs to be cancelled, and the requested resources must be adapted to the above resource limits.
 
 #### Per user limits for CPU partitions
 These limits apply exclusively to users. In other words, there is a maximum of resources a single user can use.
-Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below and are showed in format: `SlurmQoS(limits)`,
+Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below with the format `SlurmQoS(limits)`
 (list of possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):
 
 | Partition | Mon-Sun 0h-24h |
 |-----------|----------------|
@@ -180,7 +180,7 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
 | **gpu-short** | gpu_week(cpu=80,gres/gpu=16,mem=400G) |
 
 With these limits, a single user can not use more than 80 CPUs, more than 16 GPUs or more than 400GB.
-Jobs sent by any user already exceeding such limits will stay in the queue with error **`QOSMax[Cpu|GRES|Mem]PerUser`**. In that case, job can wait until some of the running resources by this user are freed.
+Jobs sent by any user already exceeding such limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerUser`**. In that case, the job can wait in the queue until some of the resources held by this user's running jobs are freed.
 
 Notice that user limits are wider than job limits.
 In that way, a user can run up to two 8 GPUs based jobs, or up to four 4 GPUs based jobs, etc. Please try to avoid occupying all GPUs of the same type for several hours or multiple days, otherwise it would block other users needing the same
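
The limits and queue messages described in the hunks above can be checked directly on the cluster with standard Slurm commands. The following is a minimal sketch, assuming a standard Slurm client installation and that the QoS is really named `gpu_week` as shown in the tables; `my_job.sh` is a purely illustrative script name.

```bash
# Show the per-job (MaxTRES) and per-user (MaxTRESPU) limits of the
# gpu_week QoS referenced in the tables above.
sacctmgr show qos gpu_week format=Name%15,MaxTRES%40,MaxTRESPU%40

# Example request that stays within the per-job limits of gpu-short
# (cpu=40, gres/gpu=8, mem=200G); "my_job.sh" is a hypothetical script.
sbatch --partition=gpu-short --cpus-per-task=40 --gres=gpu:8 --mem=200G my_job.sh

# For jobs held back by these limits, the squeue reason column shows
# the QOSMax[Cpu|GRES|Mem]Per[Job|User] messages mentioned above.
squeue --user "$USER" --states=PENDING --format="%.12i %.12P %.10T %.30r"
```

As the last hunk notes, a job pending on a per-user limit can simply wait until enough of that user's running jobs free their resources, whereas a job exceeding a per-job limit must be cancelled and resubmitted with a smaller request.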