From a1f13cd07124f8afd1d4bea4cc0a513dfb18fd3f Mon Sep 17 00:00:00 2001
From: caubet_m
Date: Thu, 9 Apr 2026 16:13:45 +0200
Subject: [PATCH] Add Slurm-Mail docs

---
 .../slurm-mail.md | 177 ++++++++++++++++++
 mkdocs.yml | 1 +
 2 files changed, 178 insertions(+)
 create mode 100644 docs/merlin7/03-Slurm-General-Documentation/slurm-mail.md

diff --git a/docs/merlin7/03-Slurm-General-Documentation/slurm-mail.md b/docs/merlin7/03-Slurm-General-Documentation/slurm-mail.md
new file mode 100644
index 00000000..58c3c457
--- /dev/null
+++ b/docs/merlin7/03-Slurm-General-Documentation/slurm-mail.md
@@ -0,0 +1,177 @@
+# Slurm-Mail
+
+## How to enable email notifications
+
+**`slurm-mail`** only sends messages if you ask Slurm to send them when submitting the job. The two main options are:
+
+* `--mail-user=`
+* `--mail-type=`
+
+!!! tip
+    Some common values for `--mail-type` are:
+
+    * `BEGIN` → send an email when the job starts running
+    * `END` → send an email when the job finishes
+    * `FAIL` → send an email if the job fails
+    * `TIME_LIMIT` → send an email if the job reaches its time limit
+    * `REQUEUE` → send an email if the job is requeued
+    * `ALL` → send emails for all supported events
+
+### Minimal example
+
+For most jobs, a good choice is:
+
+```
+#SBATCH --mail-user=marc.caubet@psi.ch
+#SBATCH --mail-type=END,FAIL
+```
+
+## Understanding the information shown in `slurm-mail`
+
+When a Slurm job finishes, `slurm-mail` sends a summary table with the most important information about the job. This page explains those fields in a simple way.
+
+!!! tip
+    A Slurm job goes through several phases:
+
+    - it is **submitted**
+    - it may **wait in queue**
+    - it starts **running**
+    - it finishes with some final **state**
+
+    The **Slurm email report** helps you answer questions like:
+
+    - Did my job really start?
+    - How long did it wait?
+    - How long did it run?
+    - Did it finish normally or fail?
+    - ***Did it use the resources efficiently?***
+
+### Common fields
+
+* **ID**: The unique number of your job in Slurm.
+* **Account**: The Slurm account charged for the job. This is important mainly when different projects, groups, or budgets exist. By default, all users use the `merlin` account.
+* **Name**: The name of the job. This is usually set with `--job-name=myjob`. If you do not set it, Slurm may use the script name or a default name. This field is only for identification. It does not affect performance.
+* **Nodes**: The number of nodes used by the job.
+* **Partition**: The Slurm partition where the job ran. A partition is a group of nodes with a certain purpose or policy. Think of it as the queue or class of machines used for the job.
+* **Requested Memory**: The amount of memory requested or allocated for the job. If your job uses more memory than allowed, it may fail with `OUT_OF_MEMORY`. A common mistake is to request too little memory. Another common mistake is to request far more than necessary, which can make scheduling harder.
+* **TimeLimit**: The runtime requested for the job. Note that this is **not** the actual runtime of the job. The default can be overridden with `--time=D-HH:MM:SS`.
+* **Std out**: Location of the standard output file of the job. The default can be overridden with `--output=my.out`.
+* **Std err**: Location of the standard error file of the job. The default can be overridden with `--error=my.err`.
+* **Work dir**: The working directory used by the job.
+* **Admin Comment**: The admin comment added to the job.
+* **Comment**: The job's comment.
+* **Start**: The date and time when the job actually started running on a node: this is the moment when resources were assigned and your job began execution.
+* **End**: The date and time when the job finished.
+* **Elapsed**: The real elapsed runtime (wallclock) of the job. In simple terms, **wallclock is how much real time passed from start to finish**. If your job started at 10:00 and ended at 10:37, the wallclock (`Elapsed`) time is 37 minutes. This is different from `CPU Time` (see below). Even if a job uses many CPUs, wallclock is just the normal time seen on a clock.
+* **CPU Time**: The total amount of CPU work used by the job. Unlike **`Elapsed`**, which is normal clock time from start to finish, **`CPU Time`** adds up the CPU usage across all allocated tasks and CPUs.
+Because of that, **it can be larger than `Elapsed`**, especially for parallel jobs.
+    * ***This value is very important***; further details here: [About CPU Time and CPU Efficiency](/merlin7/03-Slurm-General-Documentation/slurm-mail/#about-cpu-time-and-cpu-efficiency)
+* **CPU Efficiency**: How well the allocated CPUs were actually used.
+    * ***This value is very important***; further details here: [About CPU Time and CPU Efficiency](/merlin7/03-Slurm-General-Documentation/slurm-mail/#about-cpu-time-and-cpu-efficiency)
+* **Maximum memory usage per node**: The highest memory consumption observed on one node during the job. If the job used multiple nodes, this value does **not** represent the total memory used by the whole job, only the worst case on a single node. This helps detect whether one node was close to the memory limit, even if the average usage across the job was lower.
+* **Wallclock Accuracy**: This tells you how close your requested Time Limit was to the actual runtime.
+    * ***This value is very important***; further details here: [About Wallclock Accuracy](/merlin7/03-Slurm-General-Documentation/slurm-mail/#about-wallclock-accuracy)
+* **Node List**: The list of nodes used by the job.
+* **Exit Code**: The return code of the job. Usually, `0:0` means success; anything else usually means an error somewhere. For many users, a simple rule is: `0` = good, `non-zero` = something probably failed. Note that a job can sometimes look finished but still have a non-zero exit code if the application itself failed.
+* **Exit State**: The final result of the job. Common examples:
+    * `COMPLETED` → job finished successfully
+    * `FAILED` → job ended with an error
+    * `CANCELLED` → job was cancelled by user or admin
+    * `TIMEOUT` → job hit the time limit
+    * `OUT_OF_MEMORY` → job used more memory than allowed
+    * `NODE_FAIL` → a node problem affected the job
+    * `WALLCLOCK EXCEEDED` → the job has exceeded the wallclock.
+
+### Trackable Resources
+
+These fields show the main resources that Slurm accounted for in the job. In Slurm, these are called **TRES** (**Trackable RESources**): resources that can be tracked for usage and, in some cases, also used for limits, fairshare, or billing.
+
+* **`billing`**: A billing value calculated by Slurm from the job resources, using the partition's configured [TRES billing weights](https://slurm.schedmd.com/tres.html#TRESBillingWeights). It is mainly used internally for accounting, fairshare, and enforcing some usage limits. It is not always equal to the number of CPUs, although on some systems it may end up looking similar. ***This value can typically be ignored for day-to-day job debugging.***
+* **`cpu`**: The number of CPUs allocated to the job. For most users, this is simply the number of CPU slots reserved for the job.
+* **`mem`**: The memory allocated to the job. This is the memory reservation used by Slurm for the job allocation, not necessarily the memory the job actually consumed. To see actual memory usage, look at memory-usage fields such as maximum or peak memory instead.
+* **`node`**: The number of nodes allocated to the job.
+
+!!! tip
+    These values describe the resources assigned or accounted for by Slurm, ***not necessarily the resources actually used by the application.***
+    In particular, **`mem`** here is typically the allocated/requested memory, while actual memory consumption should be checked in the memory usage fields of the report.
+    **`billing`** is often the least intuitive field for users, because it depends on site configuration such as `TRESBillingWeights`.
+
+    For most users, the most relevant fields here are **`cpu`**, **`mem`**, and **`node`**. The **`billing`** value is mainly useful for understanding accounting
+    and fairshare, and can usually be ignored for day-to-day job debugging.
+
+### About CPU Time and CPU Efficiency
+
+**CPU Time** is the total amount of CPU work used by the job. It measures how much processing time the CPUs spent on the job, and it grows when more CPUs are used in parallel.
+
+Unlike **Elapsed**, which is normal clock time from start to finish, **CPU Time** adds up the CPU usage across all allocated tasks and CPUs. Because of that, it can be **larger than Elapsed**, especially for parallel jobs.
+
+!!! tip
+    * **Elapsed** = how long the job took on the clock
+    * **CPU Time** = how much total CPU work was done
+
+    So if a job runs for **1 hour** on **4 CPUs** and keeps them busy, the **CPU Time** will be roughly **4 hours**, while **Elapsed** will still be about **1 hour**.
+    This follows Slurm’s accounting model, where CPU-related time is accumulated across the job rather than shown only as wall-clock duration.
+
+**CPU Efficiency** is about how well the allocated CPUs were actually used:
+
+* **High CPU efficiency** usually means the CPUs were busy doing useful work.
+* **Low CPU efficiency** may mean the job was waiting, sleeping, doing I/O, or using fewer CPUs than requested (typically the case when using `--hint=nomultithread`).
+
+!!! warning
+    When using `--hint=nomultithread`, CPU Efficiency will appear lower, roughly by half. This is normal and expected.
+
+    This happens because only one hardware thread per core is used, while Slurm accounting may still reflect the full threaded CPU capacity of the node.
+
+    **This is commonly the right choice for many MPI jobs**, since using both hardware threads of a core often does not improve performance and can even make it worse.
+    In other words, ***a lower reported CPU Efficiency in this case does not necessarily mean the job is inefficient.***
+
+
+The **CPU Time** field is useful together with **CPU Efficiency**:
+
+* If **CPU Time** is much lower than expected, the job may have spent time waiting, sleeping, or doing I/O.
+* If **CPU Time** is high and **CPU Efficiency** is also high, the CPUs were likely kept busy doing useful work.
+
+!!! tip
+    **CPU Time** and **CPU Efficiency** fields are very useful for improving future submissions.
+
+### About Wallclock Accuracy
+
+The wallclock accuracy of the job tells you how close your requested Time Limit was to the actual runtime.
+
+In simple terms:
+
+* **Good accuracy** means you requested a runtime close to what the job really needed.
+* **Poor accuracy** means you asked for much more time than the job used, or the job came too close to the limit.
+
+Why this matters:
+
+* **If you request far too much time**, your jobs may wait longer in queue than necessary.
+* **If you request too little time**, jobs may be killed before finishing.
+
+A practical way to think about it:
+
+* Very low usage of the time limit → **probably overestimated**
+* Very close to 100% → **maybe risky**, the job could hit the limit next time
+* Somewhere reasonably in between → **usually better**
+
+!!! warning
+    * For the actual runtime of the job, look at `Elapsed` instead.
+    * **`Wallclock Accuracy` can sometimes be slightly above 100%.** This usually happens because a small amount of extra time may be spent in Slurm-related steps at the end of the job, beyond the application runtime itself.
+
+### Wallclock estimation examples
+
+The following example probably means the job asked for much more time than needed:
+
+```yaml
+Time Limit: 01:00:00
+Elapsed: 00:10:00
+Wallclock Accuracy: ~16.67% # Approx. value; may differ slightly in real runs
+```
+
+The example below is much closer and usually a better estimate:
+
+```yaml
+Time Limit: 01:00:00
+Elapsed: 00:55:00
+Wallclock Accuracy: ~91.67% # Approx. value; may differ slightly in real runs
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index 864dee0d..9d0db566 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -115,6 +115,7 @@ nav:
       - merlin7/03-Slurm-General-Documentation/slurm-configuration.md
       - merlin7/03-Slurm-General-Documentation/interactive-jobs.md
       - merlin7/03-Slurm-General-Documentation/slurm-examples.md
+      - merlin7/03-Slurm-General-Documentation/slurm-mail.md
     - Jupyterhub:
       - merlin7/04-Jupyterhub/jupyterhub.md
     - Software Support: