Finished running jobs
@ -10,12 +10,10 @@ permalink: /merlin6/running-jobs.html
|
||||
|
||||
## Commands for running jobs
|
||||
|
||||
* ``sbatch``: to submit a batch script to Slurm
|
||||
* ``squeue``: for checking the status of your jobs
|
||||
* ``scancel``: for deleting a job from the queue
|
||||
* ``sbatch``: to submit a batch script to Slurm. Use ``squeue`` to check the status of your jobs and ``scancel`` to delete a job from the queue.
|
||||
* ``srun``: to run parallel jobs in the batch system
|
||||
* ``salloc``: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
|
||||
* ``salloc`` is equivalent to an interactive run
|
||||
This is equivalent to an interactive run.
|
||||
|
||||
## Shared nodes and exclusivity
|
||||
|
||||
@ -28,7 +26,7 @@ we fill up first mixed nodes and we ensure that free full resources are availabl
|
||||
|
||||
Exclusive use of a node can be requested by specifying the ``--exclusive`` option as follows:
|
||||
|
||||
```bash
|
||||
```
|
||||
#SBATCH --exclusive
|
||||
```
|
||||
|
||||
@ -42,7 +40,7 @@ you submit the batch script:
|
||||
|
||||
If you want to change the default names, you can do so with the options ``--output`` and ``--error``. For example:
|
||||
|
||||
```batch
|
||||
```
|
||||
#SBATCH --output=logs/myJob.%N.%j.out # Generate an output file per hostname and jobid
|
||||
#SBATCH --error=logs/myJob.%N.%j.err # Generate an error file per hostname and jobid
|
||||
```
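To make the filename patterns above concrete, the following sketch imitates how ``%N`` (short hostname) and ``%j`` (job ID) would be expanded. The helper ``expand_pattern``, the node name ``merlin-c-001`` and the job ID ``1234`` are hypothetical and used only for illustration; Slurm performs the real substitution itself when the job starts.

```bash
#!/bin/sh
# Hypothetical helper: mimics the substitution of %N and %j for illustration only.
expand_pattern() {
    pattern=$1   # e.g. logs/myJob.%N.%j.out
    node=$2      # assumed short hostname of the node
    jobid=$3     # assumed job ID
    printf '%s\n' "$pattern" | sed -e "s/%N/$node/g" -e "s/%j/$jobid/g"
}

expand_pattern 'logs/myJob.%N.%j.out' merlin-c-001 1234
expand_pattern 'logs/myJob.%N.%j.err' merlin-c-001 1234
```

With these assumed values, the two calls print the file names that the ``--output`` and ``--error`` patterns would resolve to for that node and job.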
|
||||
@ -54,7 +52,7 @@ Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) for getting
|
||||
Merlin6 provides three general-purpose partitions: ``general``, ``daily`` and ``hourly``. If no partition is defined,
|
||||
``general`` will be the default. The partition can be defined with the ``--partition`` option as follows:
|
||||
|
||||
```bash
|
||||
```
|
||||
#SBATCH --partition=<general|daily|hourly> # Name of the Slurm partition to submit to. 'general' is the default.
|
||||
```
|
||||
|
||||
@ -69,7 +67,7 @@ to run on CPU-based nodes. All users registered in Merlin6 are automatically inc
|
||||
|
||||
The following options are mandatory settings that **must be included** in your batch scripts:
|
||||
|
||||
```bash
|
||||
```
|
||||
#SBATCH --constraint=mc # Always set it to 'mc' for CPU jobs.
|
||||
```
|
||||
|
||||
@ -80,10 +78,34 @@ There are some settings that are not mandatory but would be needed or useful to
|
||||
* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, also useful for specifying
|
||||
shorter times. This may affect scheduling priorities.
|
||||
|
||||
```bash
|
||||
```
|
||||
#SBATCH --time=<D-HH:MM:SS> # Time job needs to run
|
||||
```
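As an illustration of the ``D-HH:MM:SS`` format, the sketch below converts a run time given in seconds into a value suitable for ``--time``. The helper name ``slurm_time`` is hypothetical and not part of Slurm.

```bash
#!/bin/sh
# Hypothetical helper: format a duration in seconds as D-HH:MM:SS.
slurm_time() {
    total=$1
    printf '%d-%02d:%02d:%02d\n' \
        $((total / 86400)) \
        $((total % 86400 / 3600)) \
        $((total % 3600 / 60)) \
        $((total % 60))
}

slurm_time 5400     # 1.5 hours         -> 0-01:30:00
slurm_time 93784    # 1 day, 2h, 3m, 4s -> 1-02:03:04
```

The printed value can be pasted directly after ``#SBATCH --time=``.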
|
||||
|
||||
### Slurm CPU Template
|
||||
|
||||
The following template should be used by any user submitting jobs to CPU nodes:
|
||||
|
||||
```
|
||||
#!/bin/sh
|
||||
#SBATCH --partition=<general|daily|hourly> # Specify 'general' or 'daily' or 'hourly'
|
||||
#SBATCH --time=<D-HH:MM:SS> # Recommended; strongly recommended when using the 'general' partition.
|
||||
#SBATCH --output=<output_file> # Generate custom output file
|
||||
#SBATCH --error=<error_file> # Generate custom error file
|
||||
#SBATCH --constraint=mc # You must specify 'mc' for CPU jobs
|
||||
#SBATCH --ntasks-per-core=1 # Recommended one thread per core
|
||||
##SBATCH --exclusive # Uncomment if you need exclusive node usage
|
||||
|
||||
## Advanced options example
|
||||
##SBATCH --nodes=1 # Uncomment and specify number of nodes to use
|
||||
##SBATCH --ntasks=44 # Uncomment and specify number of tasks to run
|
||||
##SBATCH --ntasks-per-node=44 # Uncomment and specify number of tasks per node
|
||||
##SBATCH --ntasks-per-core=2 # Uncomment and specify number of tasks per core (threads)
|
||||
##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task
|
||||
```
|
||||
|
||||
* Users needing hyper-threading can specify ``--ntasks-per-core=2`` instead. This is not recommended for generic usage.
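A filled-in version of the CPU template might look like the sketch below. The file name ``cpu_job.sh``, the 30-minute time limit, the log file names and the ``srun hostname`` workload are placeholders chosen for illustration, and it is an assumption that the ``hourly`` partition accepts a 30-minute job. The script only writes the batch file and prints the submit command; it does not submit anything.

```bash
#!/bin/sh
# Write a filled-in copy of the CPU template to cpu_job.sh (hypothetical file name).
cat > cpu_job.sh <<'EOF'
#!/bin/sh
#SBATCH --partition=hourly
#SBATCH --time=0-00:30:00
#SBATCH --output=myJob.%j.out
#SBATCH --error=myJob.%j.err
#SBATCH --constraint=mc
#SBATCH --ntasks-per-core=1

srun hostname
EOF

# Show how the generated script would be submitted.
echo "Submit with: sbatch cpu_job.sh"
```

Keeping the ``#SBATCH`` lines directly below the interpreter line matters, since ``sbatch`` stops reading directives at the first executable command.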
|
||||
|
||||
## GPU-based Jobs Settings
|
||||
|
||||
GPU-based jobs are restricted to BIO users; however, access for PSI users can be requested on demand. Users must belong to
|
||||
@ -94,27 +116,42 @@ are automatically registered to the ``merlin6-gpu`` account. Other users should
|
||||
|
||||
The following options are mandatory settings that **must be included** in your batch scripts:
|
||||
|
||||
```bash
|
||||
```
|
||||
#SBATCH --constraint=gpu # Always set it to 'gpu' for GPU jobs.
|
||||
#SBATCH --gres=gpu # Always set at least this option when using GPUs
|
||||
```
|
||||
|
||||
## Slurm GPU Recommended Settings
|
||||
### Slurm GPU Recommended Settings
|
||||
|
||||
GPUs are also a shared resource. Hence, multiple users can run jobs on a single node, but only one GPU per user process
|
||||
must be used. Users can define which GPU resources they need with the ``--gres`` option.
|
||||
Valid ``gres`` options are: ``gpu[[:type]:count]`` where ``type=GTX1080|GTX1080Ti`` and ``count=<number of gpus to use>``
|
||||
This would be according to the following rules:
|
||||
|
||||
* All machines except ``merlin-g-001`` have up to 4 GPUs. ``merlin-g-001`` has up to 2 GPUs.
|
||||
* Two different NVIDIA GPU models exist: ``GTX1080`` and ``GTX1080Ti``.
|
||||
|
||||
Valid ``gres`` options are: ``gpu[[:type]:count]`` where:
|
||||
|
||||
* ``type``: can be ``GTX1080`` or ``GTX1080Ti``
|
||||
* ``count``: will be the number of GPUs to use
|
||||
|
||||
For example:
|
||||
|
||||
```batch
|
||||
#SBATCH --gres=gpu:GTX1080:4 # Use 4 x GTX1080 GPUs
|
||||
```
|
||||
#SBATCH --gres=gpu:GTX1080:8 # Use 8 x GTX1080 GPUs
|
||||
```
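The rules above can be turned into a small check before writing the ``--gres`` line. This is only a sketch: the helper ``gres_option`` is hypothetical, and it validates against the limits stated in this section (two GPU types, at most 4 GPUs per node) rather than querying Slurm.

```bash
#!/bin/sh
# Hypothetical helper: build a --gres option after checking type and count
# against the limits described in this section.
gres_option() {
    gpu_type=$1
    count=$2
    case $gpu_type in
        GTX1080|GTX1080Ti) ;;
        *) echo "unknown GPU type: $gpu_type" >&2; return 1 ;;
    esac
    case $count in
        [1-4]) ;;
        *) echo "count must be between 1 and 4: $count" >&2; return 1 ;;
    esac
    echo "--gres=gpu:$gpu_type:$count"
}

gres_option GTX1080 4
gres_option GTX1080Ti 2
gres_option GTX1080 8 || echo "rejected: no node offers 8 GPUs"
```

Note that ``merlin-g-001`` has only 2 GPUs, so a request for more than 2 can never be scheduled on that node.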
|
||||
|
||||
### Slurm GPU Template
|
||||
|
||||
The following template should be used by any user submitting jobs to GPU nodes:
|
||||
|
||||
```
|
||||
#!/bin/sh
|
||||
#SBATCH --partition=<general|daily|hourly> # Specify 'general' or 'daily' or 'hourly'
|
||||
#SBATCH --time=<D-HH:MM:SS> # Recommended; strongly recommended when using the 'general' partition.
|
||||
#SBATCH --output=<output_file> # Generate custom output file
|
||||
#SBATCH --error=<error_file> # Generate custom error file
|
||||
#SBATCH --constraint=gpu # You must specify 'gpu' for using GPUs
|
||||
#SBATCH --gres="gpu:<type>:<number_gpus>" # You should specify at least 'gpu'
|
||||
#SBATCH --ntasks-per-core=1 # GPU nodes have hyper-threading disabled
|
||||
##SBATCH --exclusive # Uncomment if you need exclusive node usage
|
||||
|
||||
## Advanced options example
|
||||
##SBATCH --nodes=1 # Uncomment and specify number of nodes to use
|
||||
##SBATCH --ntasks=44 # Uncomment and specify number of tasks to run
|
||||
##SBATCH --ntasks-per-node=44 # Uncomment and specify number of tasks per node
|
||||
##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task
|
||||
```
|
||||
|