Added Running Jobs
commit 2b901919c8
parent 2b990bfdef
@@ -29,10 +29,12 @@ entries:
       url: /merlin6/slurm-access.html
   - title: Merlin6 Slurm
     folderitems:
-    - title: Slurm Basic Commands
-      url: /merlin6/slurm-basics.html
     - title: Slurm Configuration
       url: /merlin6/slurm-configuration.html
+    - title: Slurm Basic Commands
+      url: /merlin6/slurm-basics.html
+    - title: Running Jobs
+      url: /merlin6/running-jobs.html
   - title: Support
     folderitems:
     - title: Contact

pages/merlin6/merlin6-slurm/running-jobs.md (new file, 122 lines)
@@ -0,0 +1,122 @@
---
title: Running Jobs
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/running-jobs.html
---
## Commands for running jobs

* ``sbatch``: submit a batch script to Slurm
* ``squeue``: check the status of your jobs
* ``scancel``: delete a job from the queue
* ``srun``: run parallel jobs in the batch system
* ``salloc``: obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
  * ``salloc`` is equivalent to an interactive run
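A minimal command-line sketch of how these fit together; the script name ``myjob.sh`` and the job ID ``12345`` are placeholders:

```bash
# Submit a batch script to the queue
sbatch myjob.sh

# Check the status of your own jobs
squeue -u $USER

# Cancel a queued or running job by its job ID
scancel 12345

# Interactive run: allocate 4 tasks for 30 minutes, run a command in the
# allocation, then release it automatically when the command finishes
salloc --ntasks=4 --time=00:30:00 srun hostname
```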
## Slurm settings

### Shared nodes and exclusivity

The **Merlin6** cluster has been designed to allow running MPI/OpenMP processes as well as single-core jobs. To allow
this co-existence, nodes are configured in shared mode by default. This means that multiple jobs from multiple users may land on the same node.
Users can change this behaviour if they require exclusive usage of nodes.

By default, Slurm will try to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way,
mixed nodes are filled up first, ensuring that fully free nodes remain available for MPI/OpenMP jobs.

Exclusive usage of a node can be requested by specifying the ``--exclusive`` option as follows:

```bash
#SBATCH --exclusive
```
### Output and Errors

By default, Slurm will generate the standard output and standard error files in the directory from which
you submit the batch script:

* standard output will be written into a file ``slurm-$SLURM_JOB_ID.out``.
* standard error will be written into a file ``slurm-$SLURM_JOB_ID.err``.

If you want to change the default names, this can be done with the ``--output`` and ``--error`` options. For example:

```bash
#SBATCH --output=logs/myJob.%N.%j.out   # Generate an output file per hostname and job ID
#SBATCH --error=logs/myJob.%N.%j.err    # Generate an error file per hostname and job ID
```

Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) to get the full specification of the available **filename patterns**.
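With the options above, a hypothetical job with ID 12345 running on node ``merlin-c-001`` would produce files named as follows (``%N`` expands to the node hostname and ``%j`` to the job ID):

```bash
$ ls logs/
myJob.merlin-c-001.12345.err  myJob.merlin-c-001.12345.out
```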
### Partitions

Merlin6 contains 3 general purpose partitions: ``general``, ``daily`` and ``hourly``. If no partition is defined,
``general`` will be the default. The partition can be selected with the ``--partition`` option as follows:

```bash
#SBATCH --partition=<general|daily|hourly>   # Name of the Slurm partition to submit to. 'general' is the default.
```

Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about the Merlin6 partition setup.
### CPU-based Jobs Settings

CPU-based jobs are available to all PSI users. Users must belong to the ``merlin6`` Slurm ``Account`` in order to be able
to run on CPU-based nodes. All users registered in Merlin6 are automatically included in this ``Account``.

#### Slurm CPU Mandatory Settings

The following options are mandatory settings that **must be included** in your batch scripts:

```bash
#SBATCH --constraint=mc   # Always set it to 'mc' for CPU jobs.
```

#### Slurm CPU Recommended Settings

Some settings are not mandatory but may be needed or useful to specify. These are the following:

* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, but also useful for specifying
shorter times. This may affect scheduling priorities.

```bash
#SBATCH --time=<D-HH:MM:SS>   # Time the job needs to run
```
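Putting the pieces together, a minimal CPU batch script could look like the sketch below; the job name, task count, run time and the final application command are illustrative assumptions, not Merlin6 requirements:

```bash
#!/bin/bash
#SBATCH --job-name=cpu-example         # Illustrative job name
#SBATCH --partition=daily              # One of: general, daily, hourly
#SBATCH --constraint=mc                # Mandatory for CPU jobs
#SBATCH --ntasks=44                    # Example: 44 tasks (one full CPU node)
#SBATCH --time=0-01:00:00              # Example: 1 hour
#SBATCH --output=logs/myJob.%N.%j.out  # Output file per hostname and job ID
#SBATCH --error=logs/myJob.%N.%j.err   # Error file per hostname and job ID

# Placeholder for your own application
srun my_mpi_application
```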
### GPU-based Jobs Settings

GPU-based jobs are restricted to BIO users; however, access for other PSI users can be requested on demand. Users must belong to
the ``merlin6-gpu`` Slurm ``Account`` in order to be able to run on GPU-based nodes. BIO users belonging to any BIO group
are automatically registered in the ``merlin6-gpu`` account. Other users should request access from the Merlin6 administrators.

#### Slurm GPU Mandatory Settings

The following options are mandatory settings that **must be included** in your batch scripts:

```bash
#SBATCH --constraint=gpu   # Always set it to 'gpu' for GPU jobs.
#SBATCH --gres=gpu         # Always set at least this option when using GPUs
```

#### Slurm GPU Recommended Settings

GPUs are also a shared resource: multiple users can run jobs on a single node, but only one GPU per user process
should be used. Users can define which GPU resources they need with the ``--gres`` option,
according to the following rules:

* All machines except ``merlin-g-001`` have up to 4 GPUs; ``merlin-g-001`` has up to 2 GPUs.
* Two different NVIDIA GPU models exist: ``GTX1080`` and ``GTX1080Ti``.

Valid ``gres`` options have the form ``gpu[[:type]:count]``, where:

* ``type``: can be ``GTX1080`` or ``GTX1080Ti``
* ``count``: the number of GPUs to use

For example:

```bash
#SBATCH --gres=gpu:GTX1080:4   # Use 4 x GTX1080 GPUs
```
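As with the CPU case, a minimal GPU batch script could look like the following sketch; the job name, GPU count, run time and the application command are illustrative assumptions:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example         # Illustrative job name
#SBATCH --partition=hourly             # One of: general, daily, hourly
#SBATCH --constraint=gpu               # Mandatory for GPU jobs
#SBATCH --gres=gpu:GTX1080:2           # Example: request 2 x GTX1080 GPUs
#SBATCH --time=0-00:30:00              # Example: 30 minutes
#SBATCH --output=logs/myJob.%N.%j.out  # Output file per hostname and job ID
#SBATCH --error=logs/myJob.%N.%j.err   # Error file per hostname and job ID

# Placeholder for your own GPU application
srun my_gpu_application
```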
pages/merlin6/merlin6-slurm/slurm-configuration.md
@@ -2,7 +2,7 @@
 title: Slurm Configuration
 #tags:
 #keywords:
-last_updated: 13 June 2019
+last_updated: 18 June 2019
 #summary: ""
 sidebar: merlin6_sidebar
 permalink: /merlin6/slurm-configuration.html
@@ -43,11 +43,13 @@ Basic usage for the **merlin6** cluster will be detailed here. For advanced usag
 
 The following table shows the default and maximum resources that can be used per node:
 
-| Nodes                              | Def.#CPUs | Max.#CPUs | Def.Mem/CPU | Max.Mem/CPU | Max.Mem/Node | Max.Swap | Def.#GPUs | Max.#GPUs |
-|:---------------------------------- | ---------:| ---------:| -----------:| -----------:| ------------:| --------:| --------- | --------- |
-| merlin-c-[001-022,101-122,201-222] | 1 core    | 44 cores  | 8000        | 352000      | 352000       | 10000    | N/A       | N/A       |
-| merlin-g-[001]                     | 1 core    | 8 cores   | 8000        | 102498      | 102498       | 10000    | 1         | 2         |
-| merlin-g-[002-009]                 | 1 core    | 10 cores  | 8000        | 102498      | 102498       | 10000    | 1         | 4         |
+| Nodes              | Def.#CPUs | Max.#CPUs | #Threads | Def.Mem/CPU | Max.Mem/CPU | Max.Mem/Node | Max.Swap | Def.#GPUs | Max.#GPUs |
+|:------------------ | ---------:| ---------:| -------- | -----------:| -----------:| ------------:| --------:| --------- | --------- |
+| merlin-c-[001-022] | 1 core    | 44 cores  | 1        | 8000        | 352000      | 352000       | 10000    | N/A       | N/A       |
+| merlin-c-[101-122] | 1 core    | 44 cores  | 1        | 8000        | 352000      | 352000       | 10000    | N/A       | N/A       |
+| merlin-c-[201-222] | 1 core    | 44 cores  | 1        | 8000        | 352000      | 352000       | 10000    | N/A       | N/A       |
+| merlin-g-[001]     | 1 core    | 8 cores   | 1        | 8000        | 102498      | 102498       | 10000    | 1         | 2         |
+| merlin-g-[002-009] | 1 core    | 10 cores  | 1        | 8000        | 102498      | 102498       | 10000    | 1         | 4         |
 
 If nothing is specified, by default each core will use up to 8GB of memory. More memory per core can be specified with the ``--mem=<memory>`` option,
 and maximum memory allowed is ``Max.Mem/Node``.