diff --git a/_data/sidebars/merlin6_sidebar.yml b/_data/sidebars/merlin6_sidebar.yml index 31e2bc6..0fc6893 100644 --- a/_data/sidebars/merlin6_sidebar.yml +++ b/_data/sidebars/merlin6_sidebar.yml @@ -35,7 +35,7 @@ entries: url: /merlin6/storage.html - title: Transferring Data url: /merlin6/transfer-data.html - - title: NoMachine + - title: Remote Desktop Access url: /merlin6/nomachine.html - title: Job Submission folderitems: @@ -43,9 +43,9 @@ entries: url: /merlin6/using-modules.html - title: Slurm Basic Commands url: /merlin6/slurm-basics.html - - title: Running Jobs + - title: Running Batch Scripts url: /merlin6/running-jobs.html - - title: Interactive Jobs + - title: Running Interactive Jobs url: /merlin6/interactive-jobs.html - title: Slurm Examples url: /merlin6/slurm-examples.html diff --git a/images/Slurm/sview.png b/images/Slurm/sview.png new file mode 100644 index 0000000..d6081c7 Binary files /dev/null and b/images/Slurm/sview.png differ diff --git a/pages/merlin6/03 Job Submission/interactive-jobs.md b/pages/merlin6/03 Job Submission/interactive-jobs.md index 4c1873b..f10cf1d 100644 --- a/pages/merlin6/03 Job Submission/interactive-jobs.md +++ b/pages/merlin6/03 Job Submission/interactive-jobs.md @@ -1,8 +1,8 @@ --- -title: Interactive Jobs +title: Running Interactive Jobs #tags: keywords: interactive, X11, X, srun -last_updated: 22 October 2019 +last_updated: 23 January 2020 summary: "This document describes how to run interactive jobs as well as X based software." sidebar: merlin6_sidebar permalink: /merlin6/interactive-jobs.html @@ -11,7 +11,10 @@ permalink: /merlin6/interactive-jobs.html ## Running interactive jobs There are two different ways for running interactive jobs in Slurm. This is possible by using -the ``srun`` or the ``salloc`` commands. +the ``salloc`` and ``srun`` commands: + +* **``salloc``**: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished. +* **``srun``**: is used for running parallel tasks. ### srun @@ -143,65 +146,9 @@ For other non officially supported graphical access (X11 forwarding): * For Windows clients, please follow [{Accessing Merlin -> Accessing from Windows Clients}](/merlin6/connect-from-windows.html) * For MacOS clients, please follow [{Accessing Merlin -> Accessing from MacOS Clients}](/merlin6/connect-from-macos.html) -#### Enable SSH Keys authentication - -For running ``srun`` with **X11** support (``srun --x11``) , you need to setup RSA keys properly. - -1. Generate the RSA keys as follows: - - ```bash - ssh-keygen -t rsa - ``` - - You will be requested for an *optional* passphrase. Entering it, provides more security (if somebody steals your private key he will - need to know the passphrase, however every time you use RSA keys you will need to type it). Whether to set a passphrase or not is up - to the users. - -2. Add the public key to the ``~/.ssh/authorized_keys`` file - - ```bash - cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys - ``` - -3. Ensure that ``~/.ssh/authorized_keys`` has proper permissions: - - ```bash - chmod 600 ~/.ssh/authorized_keys - ``` - -
-[Show 'ssh-keygen' example]: Generate RSA keys with default key filenames -
-(base) [caubet_m@merlin-l-001 .ssh]$ ssh-keygen -t rsa
-Generating public/private rsa key pair.
-Enter file in which to save the key (/psi/home/caubet_m/.ssh/id_rsa): 
-Enter passphrase (empty for no passphrase): 
-Enter same passphrase again: 
-Your identification has been saved in /psi/home/caubet_m/.ssh/id_rsa.
-Your public key has been saved in /psi/home/caubet_m/.ssh/id_rsa.pub.
-The key fingerprint is:
-SHA256:AMvGhBWxXs1MXHvwTpvXCOpjUZgy30E+5V38bcj4k2I caubet_m@merlin-l-001.psi.ch
-The key's randomart image is:
-+---[RSA 2048]----+
-|   o*o ...o . ...|
-|  .+ + =.  O o .o|
-|    * o * + Xo..+|
-|   o . . + B.*oo+|
-|    .   S + =.oo.|
-|         . .E.+  |
-|          +. . . |
-|         . .     |
-|                 |
-+----[SHA256]-----+
-
-(base) [caubet_m@merlin-l-001 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
-(base) [caubet_m@merlin-l-001 .ssh]$ chmod 600 ~/.ssh/authorized_keys
-
-
- ### 'srun' with x11 support -Once RSA keys are setup, you can run any windows based application. For that, you need to +Merlin5 and Merlin6 clusters allow running any windows based applications. For that, you need to add the option ``--x11`` to the ``srun`` command. In example: ```bash @@ -243,7 +190,7 @@ exit ### 'salloc' with x11 support -Once RSA keys are setup, you can run any windows based application. For that, you need to +**Merlin5** and **Merlin6** clusters allow running any windows based applications. For that, you need to add the option ``--x11`` to the ``salloc`` command. In example: ```bash diff --git a/pages/merlin6/03 Job Submission/monitoring.md b/pages/merlin6/03 Job Submission/monitoring.md index f7ff08f..01cf3c4 100644 --- a/pages/merlin6/03 Job Submission/monitoring.md +++ b/pages/merlin6/03 Job Submission/monitoring.md @@ -8,7 +8,200 @@ sidebar: merlin6_sidebar permalink: /merlin6/monitoring.html --- -## Monitoring +## Slurm Monitoring + +### Job status + +The status of submitted jobs can be check with the ``squeue`` command: + +```bash +squeue -u $username +``` + +Common statuses: + +* **merlin-\***: Running on the specified host +* **(Priority)**: Waiting in the queue +* **(Resources)**: At the head of the queue, waiting for machines to become available +* **(AssocGrpCpuLimit), (AssocGrpNodeLimit)**: Job would exceed per-user limitations on + the number of simultaneous CPUs/Nodes. Use `scancel` to remove the job and + resubmit with fewer resources, or else wait for your other jobs to finish. +* **(PartitionNodeLimit)**: Exceeds all resources available on this partition. + Run `scancel` and resubmit to a different partition (`-p`) or with fewer + resources. + +Check in the **man** pages (``man squeue``) for all possible options for this command. + +
+[Show 'squeue' example] +
+[root@merlin-slurmctld01 ~]# squeue -u feichtinger
+             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
+         134332544   general spawner- feichtin  R 5-06:47:45      1 merlin-c-204
+         134321376   general subm-tal feichtin  R 5-22:27:59      1 merlin-c-204
+
+
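+
+If the default columns are not enough, the output of ``squeue`` can be tailored and filtered. A possible
+invocation (using standard ``squeue`` format fields and filters; adapt it to your needs) is:
+
+```bash
+# Show your own jobs with job ID, partition, name, state, elapsed time and reason/nodes
+squeue -u $USER --format="%.12i %.10P %.20j %.8T %.10M %R"
+
+# Show only pending jobs in the 'general' partition
+squeue -u $USER --partition=general --states=PENDING
+```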
+ +### Partition status + +The status of the nodes and partitions (a.k.a. queues) can be seen with the ``sinfo`` command: + +```bash +sinfo +``` + +Check in the **man** pages (``man sinfo``) for all possible options for this command. + +
+[Show 'sinfo' example] +
+[root@merlin-l-001 ~]# sinfo -l
+Thu Jan 23 16:34:49 2020
+PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE NODELIST
+test         up 1-00:00:00 1-infinite   no       NO        all      3       mixed merlin-c-[024,223-224]
+test         up 1-00:00:00 1-infinite   no       NO        all      2   allocated merlin-c-[123-124]
+test         up 1-00:00:00 1-infinite   no       NO        all      1        idle merlin-c-023
+general*     up 7-00:00:00       1-50   no       NO        all      6       mixed merlin-c-[007,204,207-209,219]
+general*     up 7-00:00:00       1-50   no       NO        all     57   allocated merlin-c-[001-005,008-020,101-122,201-203,205-206,210-218,220-222]
+general*     up 7-00:00:00       1-50   no       NO        all      3        idle merlin-c-[006,021-022]
+daily        up 1-00:00:00       1-60   no       NO        all      9       mixed merlin-c-[007,024,204,207-209,219,223-224]
+daily        up 1-00:00:00       1-60   no       NO        all     59   allocated merlin-c-[001-005,008-020,101-124,201-203,205-206,210-218,220-222]
+daily        up 1-00:00:00       1-60   no       NO        all      4        idle merlin-c-[006,021-023]
+hourly       up    1:00:00 1-infinite   no       NO        all      9       mixed merlin-c-[007,024,204,207-209,219,223-224]
+hourly       up    1:00:00 1-infinite   no       NO        all     59   allocated merlin-c-[001-005,008-020,101-124,201-203,205-206,210-218,220-222]
+hourly       up    1:00:00 1-infinite   no       NO        all      4        idle merlin-c-[006,021-023]
+gpu          up 7-00:00:00 1-infinite   no       NO        all      1       mixed merlin-g-007
+gpu          up 7-00:00:00 1-infinite   no       NO        all      8   allocated merlin-g-[001-006,008-009]
+
+
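+
+For a per-node rather than per-partition view (for instance, to see the state, CPU and memory configuration
+of each node), a node-oriented listing can be used; this is a standard ``sinfo`` option:
+
+```bash
+# One line per node, with state, CPUs and memory
+sinfo --Node --long
+
+# The same, restricted to a single partition
+sinfo --Node --long --partition=daily
+```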
+
+### Job efficiency
+
+Users can check how efficient their jobs are. For that, the ``seff`` command is available:
+
+```bash
+seff $jobid
+```
+
+
+[Show 'seff' example] +
+[root@merlin-slurmctld01 ~]# seff 134333893
+Job ID: 134333893
+Cluster: merlin6
+User/Group: albajacas_a/unx-sls
+State: COMPLETED (exit code 0)
+Nodes: 1
+Cores per node: 8
+CPU Utilized: 00:26:15
+CPU Efficiency: 49.47% of 00:53:04 core-walltime
+Job Wall-clock time: 00:06:38
+Memory Utilized: 60.73 MB
+Memory Efficiency: 0.19% of 31.25 GB
+
+
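+
+Note that ``seff`` reports complete numbers only once a job has finished. For a job that is still running,
+a rough picture of its resource usage can be obtained with ``sstat`` (sketch below; the fields actually
+available depend on the accounting configuration of the cluster):
+
+```bash
+# CPU and memory usage of the steps of a running job
+sstat --jobs=$jobid --format=JobID,AveCPU,AveRSS,MaxRSS
+```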
+
+### List job attributes
+
+The ``sjstat`` command displays statistics of jobs under the control of Slurm. To use it:
+
+```bash
+sjstat
+```
+
+
+[Show 'sjstat' example] +
+[root@merlin-l-001 ~]# sjstat -v
+
+Scheduling pool data:
+----------------------------------------------------------------------------------
+                           Total  Usable   Free   Node   Time      Other          
+Pool         Memory  Cpus  Nodes   Nodes  Nodes  Limit  Limit      traits         
+----------------------------------------------------------------------------------
+test        373502Mb    88      6       6      1  UNLIM 1-00:00:00   
+general*    373502Mb    88     66      66      8     50 7-00:00:00   
+daily       373502Mb    88     72      72      9     60 1-00:00:00   
+hourly      373502Mb    88     72      72      9  UNLIM   01:00:00   
+gpu         128000Mb     8      1       1      0  UNLIM 7-00:00:00   
+gpu         128000Mb    20      8       8      0  UNLIM 7-00:00:00   
+
+Running job data:
+---------------------------------------------------------------------------------------------------
+                                                 Time        Time            Time                  
+JobID    User      Procs Pool      Status        Used       Limit         Started  Master/Other    
+---------------------------------------------------------------------------------------------------
+13433377 collu_g       1 gpu       PD            0:00    24:00:00             N/A  (Resources)
+13433389 collu_g      20 gpu       PD            0:00    24:00:00             N/A  (Resources)
+13433382 jaervine      4 gpu       PD            0:00    24:00:00             N/A  (Priority)
+13433386 barret_d     20 gpu       PD            0:00    24:00:00             N/A  (Priority)
+13433382 pamula_f     20 gpu       PD            0:00   168:00:00             N/A  (Priority)
+13433387 pamula_f      4 gpu       PD            0:00    24:00:00             N/A  (Priority)
+13433365 andreani    132 daily     PD            0:00    24:00:00             N/A  (Dependency)
+13433388 marino_j      6 gpu       R          1:43:12   168:00:00  01-23T14:54:57  merlin-g-007
+13433377 choi_s       40 gpu       R          2:09:55    48:00:00  01-23T14:28:14  merlin-g-006
+13433373 qi_c         20 gpu       R          7:00:04    24:00:00  01-23T09:38:05  merlin-g-004
+13433390 jaervine      2 gpu       R             5:18    24:00:00  01-23T16:32:51  merlin-g-007
+13433390 jaervine      2 gpu       R            15:18    24:00:00  01-23T16:22:51  merlin-g-007
+13433375 bellotti      4 gpu       R          7:35:44     9:00:00  01-23T09:02:25  merlin-g-001
+13433358 bellotti      1 gpu       R       1-05:52:19   144:00:00  01-22T10:45:50  merlin-g-007
+13433377 lavriha_     20 gpu       R          5:13:24    24:00:00  01-23T11:24:45  merlin-g-008
+13433370 lavriha_     40 gpu       R         22:43:09    24:00:00  01-22T17:55:00  merlin-g-003
+13433373 qi_c         20 gpu       R         15:03:15    24:00:00  01-23T01:34:54  merlin-g-002
+13433371 qi_c          4 gpu       R         22:14:14   168:00:00  01-22T18:23:55  merlin-g-001
+13433254 feichtin      2 general   R       5-07:26:11   156:00:00  01-18T09:11:58  merlin-c-204
+13432137 feichtin      2 general   R       5-23:06:25   160:00:00  01-17T17:31:44  merlin-c-204
+13433389 albajaca     32 hourly    R            41:19     1:00:00  01-23T15:56:50  merlin-c-219
+13433387 riemann_      2 general   R          1:51:47     4:00:00  01-23T14:46:22  merlin-c-204
+13433370 jimenez_      2 general   R         23:20:45   168:00:00  01-22T17:17:24  merlin-c-106
+13433381 jimenez_      2 general   R          4:55:33   168:00:00  01-23T11:42:36  merlin-c-219
+13433390 sayed_m     128 daily     R            21:49    10:00:00  01-23T16:16:20  merlin-c-223
+13433359 adelmann      2 general   R       1-05:00:09    48:00:00  01-22T11:38:00  merlin-c-204
+13433377 zimmerma      2 daily     R          6:13:38    24:00:00  01-23T10:24:31  merlin-c-007
+13433375 zohdirad     24 daily     R          7:33:16    10:00:00  01-23T09:04:53  merlin-c-218
+13433363 zimmerma      6 general   R       1-02:54:20    47:50:00  01-22T13:43:49  merlin-c-106
+13433376 zimmerma      6 general   R          7:25:42    23:50:00  01-23T09:12:27  merlin-c-007
+13433371 vazquez_     16 daily     R         21:46:31    23:59:00  01-22T18:51:38  merlin-c-106
+13433382 vazquez_     16 daily     R          4:09:23    23:59:00  01-23T12:28:46  merlin-c-024
+13433376 jiang_j1    440 daily     R          7:11:14    10:00:00  01-23T09:26:55  merlin-c-123
+13433376 jiang_j1     24 daily     R          7:08:19    10:00:00  01-23T09:29:50  merlin-c-220
+13433384 kranjcev    440 daily     R          2:48:19    24:00:00  01-23T13:49:50  merlin-c-108
+13433371 vazquez_     16 general   R         20:15:15   120:00:00  01-22T20:22:54  merlin-c-210
+13433371 vazquez_     16 general   R         21:15:51   120:00:00  01-22T19:22:18  merlin-c-210
+13433374 colonna_    176 daily     R          8:23:18    24:00:00  01-23T08:14:51  merlin-c-211
+13433374 bures_l      88 daily     R         10:45:06    24:00:00  01-23T05:53:03  merlin-c-001
+13433375 derlet       88 daily     R          7:32:05    24:00:00  01-23T09:06:04  merlin-c-107
+13433373 derlet       88 daily     R         17:21:57    24:00:00  01-22T23:16:12  merlin-c-002
+13433373 derlet       88 daily     R         18:13:05    24:00:00  01-22T22:25:04  merlin-c-112
+13433365 andreani    264 daily     R          4:10:08    24:00:00  01-23T12:28:01  merlin-c-003
+13431187 mahrous_     88 general   R       6-15:59:16   168:00:00  01-17T00:38:53  merlin-c-111
+13433387 kranjcev      2 general   R          1:48:47     4:00:00  01-23T14:49:22  merlin-c-204
+13433368 karalis_    352 general   R       1-00:05:22    96:00:00  01-22T16:32:47  merlin-c-013
+13433367 karalis_    352 general   R       1-00:06:44    96:00:00  01-22T16:31:25  merlin-c-118
+13433385 karalis_    352 general   R          1:37:24    96:00:00  01-23T15:00:45  merlin-c-213
+13433374 sato        256 general   R         14:55:55    24:00:00  01-23T01:42:14  merlin-c-204
+13433374 sato         64 general   R         10:43:35    24:00:00  01-23T05:54:34  merlin-c-106
+67723568 sato         32 general   R         10:40:07    24:00:00  01-23T05:58:02  merlin-c-007
+13433265 khanppna    440 general   R       3-18:20:58   168:00:00  01-19T22:17:11  merlin-c-008
+13433375 khanppna    704 general   R          7:31:24    24:00:00  01-23T09:06:45  merlin-c-101
+13433371 khanppna    616 general   R         21:40:33    24:00:00  01-22T18:57:36  merlin-c-208
+
+
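+
+For the history of your own jobs, including completed and failed ones, ``sacct`` can complement the commands
+above. This is a sketch and assumes that job accounting is enabled on the cluster:
+
+```bash
+# Jobs started since a given date, with their state, exit code and elapsed time
+sacct -u $USER --starttime=2020-01-20 --format=JobID,JobName%20,Partition,State,ExitCode,Elapsed
+```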
+ +### Graphical user interface + +When using **ssh** with X11 forwarding (``ssh -XY``) users can use ``sview``. **SView** is a graphical user +interface to view and modify Slurm state. To run **sview**: + +```bash +ssh -XY $username@merlin-l-001.psi.ch +sview +``` + +!['sview' graphical user interface]({{ "/images/Slurm/sview.png" }}) + + +## General Monitoring The following pages contain basic monitoring for Slurm and computing nodes. Currently, monitoring is based on Grafana + InfluxDB. In the future it will @@ -20,16 +213,16 @@ support: ### Merlin6 Monitoring Pages * Slurm monitoring: - * [Merlin6 Slurm Live Status](https://hpc-monitor01.psi.ch/d/vpwNKUhZz/merlin6-slurm-live-status?refresh=10s&orgId=1) - * [Merlin6 Slurm Overview](https://hpc-monitor01.psi.ch/d/QzBI6QoZz/merlin5-slurm-overview?refresh=10s&orgId=1) + * [Merlin6 Slurm Live Status](https://hpc-monitor02.psi.ch/d/QNcbW1AZk/merlin6-slurm-live-status?orgId=1&refresh=10s) + * [Merlin6 Slurm Overview](https://hpc-monitor02.psi.ch/d/94UxWJ0Zz/merlin6-slurm-overview?orgId=1&refresh=10s) * Nodes monitoring: - * [Merlin6 CPU Nodes Overview](https://hpc-monitor01.psi.ch/d/JmvLR8gZz/merlin6-computing-cpu-nodes?refresh=10s&orgId=1) - * [Merlin6 GPU Nodes Overview](https://hpc-monitor01.psi.ch/d/98l409-mk/merlin6-computing-gpu-nodes?refresh=5s&orgId=1) + * [Merlin6 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/JmvLR8gZz/merlin6-computing-cpu-nodes?orgId=1&refresh=10s) + * [Merlin6 GPU Nodes Overview](https://hpc-monitor02.psi.ch/d/gOo1Z10Wk/merlin6-computing-gpu-nodes?orgId=1&refresh=10s) ### Merlin5 Monitoring Pages * Slurm monitoring: - * [Merlin5 Slurm Live Status](https://hpc-monitor01.psi.ch/d/UbKbewTWz/merlin5-slurm-live-status?refresh=10s&orgId=1) - * [Merlin5 Slurm Overview](https://hpc-monitor01.psi.ch/d/QzBI6QoZz/merlin5-slurm-overview?refresh=10s&orgId=1) + * [Merlin5 Slurm Live Status](https://hpc-monitor02.psi.ch/d/o8msZJ0Zz/merlin5-slurm-live-status?orgId=1&refresh=10s) + * [Merlin5 Slurm Overview](https://hpc-monitor02.psi.ch/d/eWLEW1AWz/merlin5-slurm-overview?orgId=1&refresh=10s) * Nodes monitoring: - * [Merlin5 CPU Nodes Overview](https://hpc-monitor01.psi.ch/d/a-TsfGpZk/merlin5-computing-cpu-nodes?refresh=10s&orgId=1) + * [Merlin5 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/ejTyWJAWk/merlin5-computing-cpu-nodes?orgId=1&refresh=10s) diff --git a/pages/merlin6/03 Job Submission/running-jobs.md b/pages/merlin6/03 Job Submission/running-jobs.md index 913a001..2fa0a77 100644 --- a/pages/merlin6/03 Job Submission/running-jobs.md +++ b/pages/merlin6/03 Job Submission/running-jobs.md @@ -1,40 +1,69 @@ --- -title: Running Jobs +title: Running Slurm Scripts #tags: -#keywords: -last_updated: 18 June 2019 -#summary: "" +keywords: batch script, slurm, sbatch, srun +last_updated: 23 January 2020 +summary: "This document describes how to run batch scripts in Slurm." sidebar: merlin6_sidebar permalink: /merlin6/running-jobs.html --- -## Commands for running jobs -* **``sbatch``**: to submit a batch script to Slurm +## The rules + +Before starting using the cluster, please read the following rules: + +1. Always try to **estimate and** to **define a proper run time** of your jobs: + * Use ``--time=`` for that. + * This will ease the scheduling. + * Slurm will schedule efficiently the queued jobs. + * For very long runs, please consider using ***[Job Arrays with Checkpointing](/merlin6/running-jobs.html#array-jobs-running-very-long-tasks-with-checkpoint-files)*** +2. Try to optimize your jobs for running within **one day**. 
Please consider the following:
+ * Some software can simply scale up by using more nodes while drastically reducing the run time.
+ * Some software allows saving a specific state, and a second job can start from that state.
+ * ***[Job Arrays with Checkpointing](/merlin6/running-jobs.html#array-jobs-running-very-long-tasks-with-checkpoint-files)*** can help you with that.
+ * Use the **'daily'** partition when you are sure that your job can run within one day:
+ * ***'daily'*** **will give you more priority than running in the** ***'general'*** **queue!**
+3. It is **forbidden** to run **very short jobs**:
+ * Running jobs of a few seconds can cause severe problems.
+ * Running very short jobs causes a lot of overhead.
+ * ***Question:*** Is my job a very short job?
+ * ***Answer:*** If it finishes within a few seconds or a very few minutes, yes.
+ * ***Question:*** How long should my job run?
+ * ***Answer:*** As a *rule of thumb*, from 5 minutes on is acceptable, and from 15 minutes on is preferred.
+ * Use ***[Packed Jobs](/merlin6/running-jobs.html#packed-jobs-running-a-large-number-of-short-tasks)*** for running a large number of short tasks.
+ * For short runs lasting less than 1 hour, please use the **hourly** partition.
+ * ***'hourly'*** **will give you more priority than running in the** ***'daily'*** **queue!**
+4. Do not submit hundreds of similar jobs!
+ * Use ***[Array Jobs](/merlin6/running-jobs.html#array-jobs-launching-a-large-number-of-related-jobs)*** to group them instead.
+
+## Basic commands for running batch scripts
+
+**``sbatch``** is the command used for submitting a batch script to Slurm.
+ * Use **``srun``** to run parallel tasks.
+ * As an alternative, ``mpirun`` and ``mpiexec`` can be used. However, it is ***strongly recommended*** to use ``srun`` instead.
 * Use **``squeue``** for checking jobs status
 * Use **``scancel``** for deleting a job from the queue.
-* **``srun``**: to run parallel jobs in the batch system
-* **``salloc``**: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
 
-## Slurm parameters
+## Basic settings
 
 For a complete list of options and parameters available is recommended to use the **man** pages (``man sbatch``, ``man srun``, ``man salloc``).
 Please, notice that behaviour for some parameters might change depending on the command (in example, ``--exclusive`` behaviour in ``sbatch``
 differs from ``srun``.
 
 In this chapter we show the basic parameters which are usually needed in the Merlin cluster.
 
-### Running in Merlin5 & Merlin6
+### Clusters
 
 * For running jobs in the **Merlin6** computing nodes, users have to add the following option:
 
-```bash
-#SBATCH --clusters=merlin6
-```
+ ```bash
+ #SBATCH --clusters=merlin6
+ ```
 
 * For running jobs in the **Merlin5** computing nodes, users have to add the following options:
 
-```bash
-#SBATCH --clusters=merlin5
-```
+ ```bash
+ #SBATCH --clusters=merlin5
+ ```
 
 ***For advanced users:*** If you do not care where to run the jobs (**Merlin5** or **Merlin6**) you can skip this setting, however you
 must make sure that your code can run on both clusters without any problem and you have defined proper settings in your *batch* script.
 
@@ -54,24 +83,25 @@ For Merlin6, if no partition is defined, ``general`` will be the default, while
 Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about Merlin6 partition setup.
 
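+
+As a minimal illustration of the settings above, the top of a batch script could look as follows (the job
+name, output file and executable are placeholders):
+
+```bash
+#!/bin/bash
+#SBATCH --clusters=merlin6     # Run on the Merlin6 cluster
+#SBATCH --partition=daily      # The job is expected to finish within one day
+#SBATCH --time=02:00:00        # Upper bound for the run time
+#SBATCH --job-name=mytest      # Placeholder job name
+#SBATCH --output=mytest.out    # Placeholder output file
+
+srun ./myprogram               # Placeholder executable
+```
+
+Such a script is submitted with ``sbatch myscript.sh``, where ``myscript.sh`` is the file containing the lines above.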
-### Enabling/disabling hyperthreading +### Hyperthreaded vs non-Hyperthreaded jobs Computing nodes in **merlin6** have hyperthreading enabled: every core is running two threads. It means that for many cases it needs to be disabled and only those multithread-based applications will benefit from that. There are some parameters that users must apply: * For **hyperthreaded based jobs** users ***must*** specify the following options: -```bash -#SBATCH --ntasks-per-core=2 # Mandatory for multithreaded jobs -#SBATCH --hint=multithread # Mandatory for multithreaded jobs -``` + ```bash + #SBATCH --ntasks-per-core=2 # Mandatory for multithreaded jobs + #SBATCH --hint=multithread # Mandatory for multithreaded jobs + ``` * For **non-hyperthreaded based jobs** users ***must*** specify the following options: -```bash -#SBATCH --ntasks-per-core=1 # Mandatory for non-multithreaded jobs -#SBATCH --hint=nomultithread # Mandatory for non-multithreaded jobs -``` -### Shared nodes and exclusivity + ```bash + #SBATCH --ntasks-per-core=1 # Mandatory for non-multithreaded jobs + #SBATCH --hint=nomultithread # Mandatory for non-multithreaded jobs + ``` + +### Shared vs exclusive nodes The **Merlin5** and **Merlin6** clusters are designed in a way that should allow running MPI/OpenMP processes as well as single core based jobs. For allowing co-existence, nodes are configured by default in a shared mode. It means, that multiple jobs from multiple users may land in the same node. This behaviour can be changed by a user if they require exclusive usage of nodes. @@ -83,7 +113,7 @@ Exclusivity of a node can be setup by specific the ``--exclusive`` option as fol #SBATCH --exclusive ``` -### Slurm CPU Recommended Settings +### Time There are some settings that are not mandatory but would be needed or useful to specify. These are the following: @@ -111,62 +141,9 @@ If you want to the default names it can be done with the options ``--output`` an Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) for getting a list specification of **filename patterns**. -## CPU-based Jobs Settings +### GPU specific settings -CPU-based jobs are available for all PSI users. Users must belong to the ``merlin6`` Slurm ``Account`` in order to be able -to run on CPU-based nodes. All users registered in Merlin6 are automatically included in the ``Account``. - -### Slurm CPU Templates - -The following examples apply to the **Merlin6** cluster. - -#### Nomultithreaded jobs example - -The following template should be used by any user submitting jobs to CPU nodes: - -```bash -#!/bin/sh -#SBATCH --partition= # Specify 'general' or 'daily' or 'hourly' -#SBATCH --time= # Strictly recommended when using 'general' partition. 
-#SBATCH --output= # Generate custom output file -#SBATCH --error= # Generate custom error file -#SBATCH --ntasks-per-core=1 # Mandatory for non-multithreaded jobs -#SBATCH --hint=nomultithread # Mandatory for non-multithreaded jobs -##SBATCH --exclusive # Uncomment if you need exclusive node usage - -## Advanced options example -##SBATCH --nodes=1 # Uncomment and specify #nodes to use -##SBATCH --ntasks=44 # Uncomment and specify #nodes to use -##SBATCH --ntasks-per-node=44 # Uncomment and specify #tasks per node -##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task -``` - -#### Multithreaded jobs - -The following template should be used by any user submitting jobs to CPU nodes: - -```bash -#!/bin/sh -#SBATCH --partition= # Specify 'general' or 'daily' or 'hourly' -#SBATCH --time= # Strictly recommended when using 'general' partition. -#SBATCH --output= # Generate custom output file -#SBATCH --error= # Generate custom error file -#SBATCH --ntasks-per-core=2 # Mandatory for multithreaded jobs -#SBATCH --hint=multithread # Mandatory for multithreaded jobs -##SBATCH --exclusive # Uncomment if you need exclusive node usage - -## Advanced options example -##SBATCH --nodes=1 # Uncomment and specify #nodes to use -##SBATCH --ntasks=88 # Uncomment and specify #nodes to use -##SBATCH --ntasks-per-node=88 # Uncomment and specify #tasks per node -##SBATCH --cpus-per-task=88 # Uncomment and specify the number of cores per task -``` - -## GPU-based Jobs Settings - -**Merlin6** GPUs are available for all PSI users, however, this is restricted to any user belonging to the ``merlin-gpu`` account. By default, all users are added to this account (exceptions could apply). - -### Merlin6 GPU account +#### Slurm account When using GPUs, users must switch to the **merlin-gpu** Slurm account in order to be able to run on GPU-based nodes. This is done with the ``--account`` setting as follows: @@ -174,7 +151,7 @@ When using GPUs, users must switch to the **merlin-gpu** Slurm account in order #SBATCH --account=merlin-gpu # The account 'merlin-gpu' must be used ``` -### Slurm CPU Mandatory Settings +#### GRES The following options are mandatory settings that **must be included** in your batch scripts: @@ -182,7 +159,7 @@ The following options are mandatory settings that **must be included** in your b #SBATCH --gres=gpu # Always set at least this option when using GPUs ``` -#### Slurm GPU Recommended Settings +##### GRES advanced settings GPUs are also a shared resource. Hence, multiple users can run jobs on a single node, but only one GPU per user process must be used. Users can define which GPUs resources they need with the ``--gres`` option. @@ -197,15 +174,63 @@ In example: ***Important note:*** Due to a bug in the configuration, ``[:type]`` (i.e. ``GTX1080`` or ``GTX1080Ti``) is not working. Users should skip that and use only ``gpu[:count]``. This will be fixed in the upcoming downtimes as it requires a full restart of the batch system. -### Slurm GPU Template +## Batch script templates + +### CPU-based jobs templates + +The following examples apply to the **Merlin6** cluster. 
+ +#### Nomultithreaded jobs template + +The following template should be used by any user submitting jobs to CPU nodes: + +```bash +#!/bin/bash +#SBATCH --partition= # Specify 'general' or 'daily' or 'hourly' +#SBATCH --time= # Strongly recommended +#SBATCH --output= # Generate custom output file +#SBATCH --error= # Generate custom error file +#SBATCH --ntasks-per-core=1 # Mandatory for non-multithreaded jobs +#SBATCH --hint=nomultithread # Mandatory for non-multithreaded jobs +##SBATCH --exclusive # Uncomment if you need exclusive node usage + +## Advanced options example +##SBATCH --nodes=1 # Uncomment and specify #nodes to use +##SBATCH --ntasks=44 # Uncomment and specify #nodes to use +##SBATCH --ntasks-per-node=44 # Uncomment and specify #tasks per node +##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task +``` + +#### Multithreaded jobs template + +The following template should be used by any user submitting jobs to CPU nodes: + +```bash +#!/bin/bash +#SBATCH --partition= # Specify 'general' or 'daily' or 'hourly' +#SBATCH --time= # Strongly recommended +#SBATCH --output= # Generate custom output file +#SBATCH --error= # Generate custom error file +#SBATCH --ntasks-per-core=2 # Mandatory for multithreaded jobs +#SBATCH --hint=multithread # Mandatory for multithreaded jobs +##SBATCH --exclusive # Uncomment if you need exclusive node usage + +## Advanced options example +##SBATCH --nodes=1 # Uncomment and specify #nodes to use +##SBATCH --ntasks=88 # Uncomment and specify #nodes to use +##SBATCH --ntasks-per-node=88 # Uncomment and specify #tasks per node +##SBATCH --cpus-per-task=88 # Uncomment and specify the number of cores per task +``` + +### GPU-based jobs templates The following template should be used by any user submitting jobs to GPU nodes: ```bash -#!/bin/sh +#!/bin/bash #SBATCH --partition=gpu_ # Specify 'general' or 'daily' or 'hourly' #SBATCH --gres="gpu::" # You should specify at least 'gpu' -#SBATCH --time= # Strictly recommended when using 'general' partition. +#SBATCH --time= # Strongly recommended #SBATCH --output= # Generate custom output file #SBATCH --error= squeue -u bliven_s - JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) - 134507729 gpu test_scr bliven_s PD 0:00 3 (AssocGrpNodeLimit) - 134507768 general test_scr bliven_s PD 0:00 19 (AssocGrpCpuLimit) - 134507729 gpu test_scr bliven_s PD 0:00 3 (Resources) - 134506301 gpu test_scr bliven_s PD 0:00 1 (Priority) - 134506288 gpu test_scr bliven_s R 9:16 1 merlin-g-008 ``` -Common Statuses: +This will run 8 independent jobs, where each job can use the counter +variable `SLURM_ARRAY_TASK_ID` defined by Slurm inside of the job's +environment to feed the correct input arguments or configuration file +to the "myprogram" executable. Each job will receive the same set of +configurations (e.g. time limit of 8h in the example above). -* **merlin-\***: Running on the specified host -* **(Priority)**: Waiting in the queue -* **(Resources)**: At the head of the queue, waiting for machines to become available -* **(AssocGrpCpuLimit), (AssocGrpNodeLimit)**: Job would exceed per-user limitations on - the number of simultaneous CPUs/Nodes. Use `scancel` to remove the job and - resubmit with fewer resources, or else wait for your other jobs to finish. -* **(PartitionNodeLimit)**: Exceeds all resources available on this partition. - Run `scancel` and resubmit to a different partition (`-p`) or with fewer - resources. 
+The jobs are independent, but they will run in parallel (if the cluster resources allow for +it). The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they also will each +have their own output file. + +**Note:** + * Do not use such jobs if you have very short tasks, since each array sub job will incur the full overhead for launching an independent Slurm job. For such cases you should used a **packed job** (see below). + * If you want to control how many of these jobs can run in parallel, you can use the `#SBATCH --array=1-100%5` syntax. The `%5` will define + that only 5 sub jobs may ever run in parallel. + +You also can use an array job approach to run over all files in a directory, substituting the payload with + +``` bash +FILES=(/path/to/data/*) +srun ./myprogram ${FILES[$SLURM_ARRAY_TASK_ID]} +``` + +Or for a trivial case you could supply the values for a parameter scan in form +of a argument list that gets fed to the program using the counter variable. + +``` bash +ARGS=(0.05 0.25 0.5 1 2 5 100) +srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]} +``` + +### Array jobs: running very long tasks with checkpoint files + +If you need to run a job for much longer than the queues (partitions) permit, and +your executable is able to create checkpoint files, you can use this +strategy: + +``` bash +#!/bin/bash +#SBATCH --job-name=test-checkpoint +#SBATCH --partition=general +#SBATCH --ntasks=1 +#SBATCH --time=7-00:00:00 # each job can run for 7 days +#SBATCH --cpus-per-task=1 +#SBATCH --array=1-10%1 # Run a 10-job array, one job at a time. +if test -e checkpointfile; then + # There is a checkpoint file; + myprogram --read-checkp checkpointfile +else + # There is no checkpoint file, start a new simulation. + myprogram +fi +``` + +The `%1` in the `#SBATCH --array=1-10%1` statement defines that only 1 subjob can ever run in parallel, so +this will result in subjob n+1 only being started when job n has finished. It will read the checkpoint file +if it is present. + + +### Packed jobs: running a large number of short tasks + +Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate +Slurm job. Use job packing, i.e. you run the short tasks within the loop of a single Slurm job. + +You can launch the short tasks using `srun` with the `--exclusive` switch (not to be confused with the +switch of the same name used in the SBATCH commands). This switch will ensure that only a specified +number of tasks can run in parallel. + +As an example, the following job submission script will ask Slurm for +44 cores (threads), then it will run the =myprog= program 1000 times with +arguments passed from 1 to 1000. But with the =-N1 -n1 -c1 +--exclusive= option, it will control that at any point in time only 44 +instances are effectively running, each being allocated one CPU. You +can at this point decide to allocate several CPUs or tasks by adapting +the corresponding parameters. + +``` bash +#! /bin/bash +#SBATCH --job-name=test-checkpoint +#SBATCH --partition=general +#SBATCH --ntasks=1 +#SBATCH --time=7-00:00:00 +#SBATCH --ntasks=44 # defines the number of parallel tasks +for i in {1..1000} +do + srun -N1 -n1 -c1 --exclusive ./myprog $i & +done +wait +``` + +**Note:** The `&` at the end of the `srun` line is needed to not have the script waiting (blocking). +The `wait` command waits for all such background tasks to finish and returns the exit code. 
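+
+Once written, such a script is submitted and monitored like any other batch job (the file name below is a
+placeholder):
+
+```bash
+sbatch packed_jobs.sh    # submit the script; Slurm prints the job ID
+squeue -u $USER          # check the state of your running and pending jobs
+```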
diff --git a/pages/merlin6/03 Job Submission/slurm-configuration.md b/pages/merlin6/03 Job Submission/slurm-configuration.md index 616d2f1..9619ab9 100644 --- a/pages/merlin6/03 Job Submission/slurm-configuration.md +++ b/pages/merlin6/03 Job Submission/slurm-configuration.md @@ -1,9 +1,9 @@ --- title: Slurm Configuration #tags: -#keywords: -last_updated: 18 June 2019 -#summary: "" +keywords: configuration, partitions, node definition +last_updated: 23 January 2020 +summary: "This document describes a summary of the Merlin6 configuration." sidebar: merlin6_sidebar permalink: /merlin6/slurm-configuration.html --- @@ -28,9 +28,9 @@ The following table show default and maximum resources that can be used per node | Nodes | Def.#CPUs | Max.#CPUs | #Threads | Def.Mem/CPU | Max.Mem/CPU | Max.Mem/Node | Max.Swap | Def.#GPUs | Max.#GPUs | |:-------------------| ---------:| ---------:| -------- | -----------:| -----------:| ------------:| --------:| --------- | --------- | -| merlin-c-[001-022] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A | -| merlin-c-[101-122] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A | -| merlin-c-[201-222] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A | +| merlin-c-[001-024] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A | +| merlin-c-[101-124] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A | +| merlin-c-[201-224] | 1 core | 44 cores | 2 | 4000 | 352000 | 352000 | 10000 | N/A | N/A | | merlin-g-[001] | 1 core | 8 cores | 1 | 4000 | 102400 | 102400 | 10000 | 1 | 2 | | merlin-g-[002-009] | 1 core | 20 cores | 1 | 4000 | 102400 | 102400 | 10000 | 1 | 4 | diff --git a/pages/merlin6/03 Job Submission/slurm-examples.md b/pages/merlin6/03 Job Submission/slurm-examples.md index 1d1c107..cc8f2e8 100644 --- a/pages/merlin6/03 Job Submission/slurm-examples.md +++ b/pages/merlin6/03 Job Submission/slurm-examples.md @@ -1,160 +1,110 @@ --- title: Slurm Examples #tags: -#keywords: +keywords: example, template, examples, templates, running jobs, sbatch last_updated: 28 June 2019 -#summary: "" +summary: "This document shows different template examples for running jobs in the Merlin cluster." sidebar: merlin6_sidebar permalink: /merlin6/slurm-examples.html --- -## Basic single core job +## Single core based job examples -### Basic single core job - Example 1 +### Example 1 + +In this example we want to do not use hyper-threading (``--ntasks-per-core=1`` and ``--hint=nomultithread``). In our Merlin6 configuration, +the default memory per cpu (in Slurm, this is equivalent to memory per thread) is 4000MB, but in this example we are using 1 single thread +per core. As we are not using the second thread in the core, we can double the memory used by the single thread to 8000MB. When using one +single thread per core, doubling the memory is recommended (however, some applications might not need it). 
```bash #!/bin/bash #SBATCH --partition=hourly # Using 'hourly' will grant higher priority -#SBATCH --ntasks-per-core=1 # Force no Hyper-Threading, will run 1 task per core +#SBATCH --ntasks-per-core=1 # Request the max ntasks be invoked on each core +#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading #SBATCH --mem-per-cpu=8000 # Double the default memory per cpu #SBATCH --time=00:30:00 # Define max time job will run #SBATCH --output=myscript.out # Define your output file #SBATCH --error=myscript.err # Define your error file -my_script +module load $module # ... +My_Script || srun $task # ... ``` -In this example we run a single core job by defining ``--ntasks-per-core=1`` (which is also the default). Since the default memory per -cpu is 4000MB (in Slurm, this is equivalent to the memory per thread), and we are using 1 single thread per core, default memory per CPU -should be doubled: using a single thread will always be accounted as if the job was using the whole physical core (which has 2 available -hyperthreads), hence we want to use the memory as if we were using 2 threads. +### Example 2 -### Basic single core job - Example 2 +In this example we want to do not use hyper-threading (``--ntasks-per-core=1`` and ``--hint=nomultithread``). We want to run a single +task but we need to use all the memory available in the node. For that, we need to define that the job will use the whole memory of +a node with ``--mem=352000`` (which is the maximum memory available on a single Apollo node). Whenever you want to run a job requiring +more memory than the default (4000MB per thread) is very important to specify the amount of memory that the job will use. ```bash #!/bin/bash #SBATCH --partition=hourly # Using 'hourly' will grant higher priority -#SBATCH --ntasks-per-core=1 # Force no Hyper-Threading, will run 1 task per core +#SBATCH --ntasks-per-core=1 # Request the max ntasks be invoked on each core +#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading #SBATCH --mem=352000 # We want to use the whole memory #SBATCH --time=00:30:00 # Define max time job will run #SBATCH --output=myscript.out # Define your output file #SBATCH --error=myscript.err # Define your error file -my_script +module load $module # ... +My_Script || srun $task # ... ``` -In this example we run a single core job by defining ``--ntasks-per-core=1`` (which is also the default). Also, we define that the -job will use the whole memory of a node with ``--mem=352000`` (which is the maximum memory available per Apollo node). Whenever -you want to run a job needing more memory than the default (4000MB per thread) is very important to specify the amount of memory that -the job will use. This must be done in order to avoid conflicts with other jobs from other users. +## Multi core based job examples -## Basic MPI with hyper-threading +### Example 1: with Hyper-Threading + +In this example we run a job that will run 88 tasks. Merlin6 Apollo nodes have 44 cores each one with hyper-threading +enabled. This means that we can run 2 threads per core, in total 88 threads. To accomplish that, users should specify +``--ntasks-per-core=2`` and ``--hint=multithread``. On the other hand, we add the option ``--exclusive`` to ensure +that the node usage is exclusive and no other jobs are running there. Finally, notice that the default memory per +thread is 4000MB; hence, in total this job can use up to 352000MB memory which is the maximum allowed in a single node. 
```bash #!/bin/bash #SBATCH --partition=hourly # Using 'hourly' will grant higher priority #SBATCH --exclusive # Use the node in exclusive mode #SBATCH --ntasks=88 # Job will run 88 tasks -#SBATCH --ntasks-per-core=2 # Force Hyper-Threading, will run 2 tasks per core +#SBATCH --ntasks-per-core=2 # Request the max ntasks be invoked on each core +#SBATCH --hint=multithread # Use extra threads with in-core multi-threading #SBATCH --time=00:30:00 # Define max time job will run #SBATCH --output=myscript.out # Define your output file #SBATCH --error=myscript.err # Define your error file -module load gcc/8.3.0 openmpi/3.1.3 - -MPI_script +module load $module # ... +My_Script || srun $task # ... ``` -In this example we run a job that will run 88 tasks. Merlin6 Apollo nodes have 44 cores each one with HT -enabled. This means that we can run 2 threads per core, in total 88 threads. We add the option ``--exclusive`` to -ensure that the node usage is exclusive and no other jobs are running there. Finally, the default memory -per thread is 4000MB, in total this job can use up to 352000MB memory which is the maximum allowed in a single node. +### Example 2: without Hyper-Threading -## Basic MPI without hyper-threading +In this example we want to run a job that will run 44 tasks, and for performance reason we want to disable hyper-threading. +Merlin6 Apollo nodes have 44 cores each one with hyper-threading enabled. For ensure that only 1 thread will be used, users +should specify ``--ntasks-per-core=1`` and ``--hint=nomultithread``. With this configuration, each task will run in 1 thread, +and each tasks will be assigned to an independent core. We add the option ``--exclusive`` to ensure that the node usage is +exclusive and no other jobs are running there. Finally, in our Slurm configuration the default memory per thread is 4000MB, +but we want to use only 1 thread. This means that only half of the memory would be used. If the job requires more memory, +users need to increase it by either by setting ``--mem=352000`` or (exclusive) by setting ``--mem-per-cpu=8000``. ```bash #!/bin/bash #SBATCH --partition=hourly # Using 'hourly' will grant higher priority #SBATCH --ntasks=44 # Job will run 44 tasks -#SBATCH --ntasks-per-core=1 # Force no Hyper-Threading, will run 1 task per core +#SBATCH --ntasks-per-core=2 # Request the max ntasks be invoked on each core +#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading #SBATCH --mem=352000 # Define the whole memory of the node #SBATCH --time=00:30:00 # Define max time job will run #SBATCH --output=myscript.out # Define your output file #SBATCH --error=myscript.err # Define your output file -module load gcc/8.3.0 openmpi/3.1.3 - -MPI_script +module load $module # ... +My_Script || srun $task # ... ``` -In this example we run a job that will run 44 tasks, and Hyper-Threading will not be used. Merlin6 Apollo nodes have 44 cores -each one with HT enabled. However, defining ``--ntasks-per-core=1`` we force the use of one single thread per core (if this is -not defined, will be the default, but is recommended to add it explicitly). Each task will -run in 1 thread, and each tasks will be assigned to an independent core. We add the option ``--exclusive`` to ensre that the node -usage is exclusive and no other jobs are running there. 
Finally, since the default memory per thread is 4000MB and we use only -1 thread, we want to avoid using half of the memory: we have to specify that we will use the whole memory of the node with the -option ``--mem=352000`` (which is the maximum memory available in the node)`. +## Advanced examples -## Advanced Slurm Example - -Copy-paste the following example in a file called myAdvancedTest.batch): - -```bash -#!/bin/bash -#SBATCH --partition=daily # name of slurm partition to submit -#SBATCH --time=2:00:00 # limit the execution of this job to 2 hours, see sinfo for the max. allowance -#SBATCH --nodes=2 # number of nodes -#SBATCH --ntasks=44 # number of tasks - -module load gcc/8.3.0 openmpi/3.1.3 -module list - -echo "Example no-MPI:" ; hostname # will print one hostname per node -echo "Example MPI:" ; mpirun hostname # will print one hostname per ntask -``` - -In the above example are specified the options ``--nodes=2`` and ``--ntasks=44``. This means that up 2 nodes are requested, -and is expected to run 44 tasks. Hence, 44 cores are needed for running that job (we do not specify ``--ntasks-per-core``, so it will -default to ``1``). Slurm will try to allocate a maximum of 2 nodes, both together having at least 44 cores. -Since our nodes have 44 cores / each, if nodes are empty (no other users have running jobs there), job can land on a single node -(it has enough cores to run 44 tasks). - -If we want to ensure that job is using at least two different nodes (i.e. for boosting CPU frequency, or because the job requires -more memory per core) you should specify other options. - -A good example is ``--ntasks-per-node=22``. This will equally distribute 22 tasks on 2 nodes. - -```bash -#SBATCH --ntasks-per-node=22 -``` - -A different example could be by specifying how much memory per core is needed. For instance ``--mem-per-cpu=32000`` will reserve -~32000MB per core. Since we have a maximum of 352000MB per Apollo node, Slurm will be only able to allocate 11 cores (32000MB x 11cores = 352000MB) per node. -It means that 4 nodes will be needed (max 11 tasks per node due to memory definition, and we need to run 44 tasks), in this case we need to change ``--nodes=4`` -(or remove ``--nodes``). Alternatively, we can decrease ``--mem-per-cpu`` to a lower value which can allow the use of at least 44 cores per node (i.e. with ``16000`` -should be able to use 2 nodes) - -```bash -#SBATCH --mem-per-cpu=16000 -``` - -Finally, in order to ensure exclusivity of the node, an option *--exclusive* can be used (see below). This will ensure that -the requested nodes are exclusive for the job (no other users jobs will interact with this node, and only completely -free nodes will be allocated). - -```bash -#SBATCH --exclusive -``` - -This can be combined with the previous examples. - -More advanced configurations can be defined and can be combined with the previous examples. More information about advanced -options can be found in the following link: https://slurm.schedmd.com/sbatch.html (or run 'man sbatch'). - -If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run -advanced configurations unless your are sure of what you are doing. - -## Array Jobs - launching a large number of related jobs +### Array Jobs: launching a large number of related jobs If you need to run a large number of jobs based on the same executable with systematically varying inputs, e.g. 
for a parameter sweep, you can do this most easily in form of a **simple array job**. @@ -202,7 +152,7 @@ ARGS=(0.05 0.25 0.5 1 2 5 100) srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]} ``` -## Array jobs for running very long tasks with checkpoint files +### Array jobs: running very long tasks with checkpoint files If you need to run a job for much longer than the queues (partitions) permit, and your executable is able to create checkpoint files, you can use this @@ -230,7 +180,7 @@ this will result in subjob n+1 only being started when job n has finished. It wi if it is present. -## Packed jobs - running a large number of short tasks +### Packed jobs: running a large number of short tasks Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate Slurm job. Use job packing, i.e. you run the short tasks within the loop of a single Slurm job. @@ -264,3 +214,62 @@ wait **Note:** The `&` at the end of the `srun` line is needed to not have the script waiting (blocking). The `wait` command waits for all such background tasks to finish and returns the exit code. +## Hands-On Example + +Copy-paste the following example in a file called myAdvancedTest.batch): + +```bash +#!/bin/bash +#SBATCH --partition=daily # name of slurm partition to submit +#SBATCH --time=2:00:00 # limit the execution of this job to 2 hours, see sinfo for the max. allowance +#SBATCH --nodes=2 # number of nodes +#SBATCH --ntasks=44 # number of tasks +#SBATCH --ntasks-per-core=1 # Request the max ntasks be invoked on each core +#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading + +module load gcc/9.2.0 openmpi/3.1.5_merlin6 +module list + +echo "Example no-MPI:" ; hostname # will print one hostname per node +echo "Example MPI:" ; mpirun hostname # will print one hostname per ntask +``` + +In the above example are specified the options ``--nodes=2`` and ``--ntasks=44``. This means that up 2 nodes are requested, +and is expected to run 44 tasks. Hence, 44 cores are needed for running that job. Slurm will try to allocate a maximum of +2 nodes, both together having at least 44 cores. Since our nodes have 44 cores / each, if nodes are empty (no other users +have running jobs there), job can land on a single node (it has enough cores to run 44 tasks). + +If we want to ensure that job is using at least two different nodes (i.e. for boosting CPU frequency, or because the job +requires more memory per core) you should specify other options. + +A good example is ``--ntasks-per-node=22``. This will equally distribute 22 tasks on 2 nodes. + +```bash +#SBATCH --ntasks-per-node=22 +``` + +A different example could be by specifying how much memory per core is needed. For instance ``--mem-per-cpu=32000`` will reserve +~32000MB per core. Since we have a maximum of 352000MB per Apollo node, Slurm will be only able to allocate 11 cores (32000MB x 11cores = 352000MB) per node. +It means that 4 nodes will be needed (max 11 tasks per node due to memory definition, and we need to run 44 tasks), in this case we need to change ``--nodes=4`` +(or remove ``--nodes``). Alternatively, we can decrease ``--mem-per-cpu`` to a lower value which can allow the use of at least 44 cores per node (i.e. with ``16000`` +should be able to use 2 nodes) + +```bash +#SBATCH --mem-per-cpu=16000 +``` + +Finally, in order to ensure exclusivity of the node, an option *--exclusive* can be used (see below). 
This will ensure that +the requested nodes are exclusive for the job (no other users jobs will interact with this node, and only completely +free nodes will be allocated). + +```bash +#SBATCH --exclusive +``` + +This can be combined with the previous examples. + +More advanced configurations can be defined and can be combined with the previous examples. More information about advanced +options can be found in the following link: https://slurm.schedmd.com/sbatch.html (or run 'man sbatch'). + +If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. Do not run +advanced configurations unless your are sure of what you are doing. diff --git a/pages/merlin6/03 Job Submission/using-modules.md b/pages/merlin6/03 Job Submission/using-modules.md index 00377a5..6c0c9a2 100644 --- a/pages/merlin6/03 Job Submission/using-modules.md +++ b/pages/merlin6/03 Job Submission/using-modules.md @@ -34,14 +34,14 @@ Also, you can load multiple packages at once. This can be useful for instance wh ```bash # Single line -module load gcc/8.3.0 openmpi/3.1.3 +module load gcc/9.2.0 openmpi/3.1.5_merlin6 # Multiple line -module load gcc/8.3.0 -module load openmpi/3.1.3 +module load gcc/9.2.0 +module load openmpi/3.1.5_merlin6 ``` -In the example above, we load ``openmpi/3.1.3`` but we also specify ``gcc/8.3.0`` which is a strict dependency. The dependency must be +In the example above, we load ``openmpi/3.1.5_merlin6`` but we also specify ``gcc/9.2.0`` which is a strict dependency. The dependency must be loaded in advance. ---
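+
+To verify what ended up in the environment, or to start again from a clean state before switching toolchains,
+the usual module commands can be used (``module purge`` is assumed to be supported by the module system in use):
+
+```bash
+module purge                                 # remove all currently loaded modules
+module load gcc/9.2.0 openmpi/3.1.5_merlin6  # load the compiler and the matching MPI
+module list                                  # verify that both modules are loaded
+```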