# Running Interactive Jobs

## Running interactive jobs

There are two ways to run interactive jobs in Slurm, using the `salloc` and `srun` commands:

* **`salloc`**: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
* **`srun`**: to run parallel tasks.

### srun

`srun` is used to run parallel jobs in the batch system. It can be used within a batch script
(submitted with `sbatch`), or within a job allocation (obtained with `salloc`).
It can also be used as a direct command (for example, from the login nodes).

When used inside a batch script or within a job allocation, `srun` is constrained to the
amount of resources allocated by the `sbatch`/`salloc` command. With `sbatch`, these
resources are usually defined inside the batch script with directives of the form
`#SBATCH --<option>=<value>`. In other words, if your batch script or allocation defines
88 tasks (with 1 thread per core) and 2 nodes, `srun` is constrained to that amount of
resources (you can use less, but never exceed those limits).

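For illustration, a minimal batch script matching the 88-task / 2-node example above might
look like the following sketch (the `#SBATCH` values are placeholders to adapt to your own job):

```bash
#!/bin/bash
#SBATCH --clusters=merlin6    # target cluster
#SBATCH --nodes=2             # allocate 2 nodes
#SBATCH --ntasks=88           # allocate 88 tasks in total
#SBATCH --ntasks-per-core=1   # 1 task (thread) per core

# 'srun' can use up to, but never more than, the allocated resources:
srun hostname                 # uses all 88 allocated tasks
srun --ntasks=4 hostname      # uses a subset of the allocation (allowed)
# srun --ntasks=100 hostname  # would exceed the allocation and fail
```
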
When used from a login node, `srun` is usually used to run a specific command or software in an
interactive way. `srun` is a blocking process: it will block the bash prompt until the `srun`
command finishes, unless you run it in the background with `&`. This can be very useful for
running interactive software which pops up a window and then submits jobs or runs sub-tasks in
the background (for example, **Relion**, **cisTEM**, etc.).

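As a sketch of that pattern (here `relion` stands for whichever interactive application you use,
and `--x11`, described further below, forwards the graphical display):

```bash
# Launch an interactive GUI application through Slurm from the login node.
# The trailing '&' returns the bash prompt immediately, while 'srun'
# keeps blocking in the background until the application exits.
srun --clusters=merlin6 --ntasks=1 --x11 relion &
```
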
Refer to `man srun` to explore all possible options for that command.

??? note "Running 'hostname' command on 3 nodes, using 2 cores (1 task/core) per node"
|
|
```console
|
|
(base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --ntasks=6 --ntasks-per-node=2 --nodes=3 hostname
|
|
srun: job 135088230 queued and waiting for resources
|
|
srun: job 135088230 has been allocated resources
|
|
merlin-c-102.psi.ch
|
|
merlin-c-102.psi.ch
|
|
merlin-c-101.psi.ch
|
|
merlin-c-101.psi.ch
|
|
merlin-c-103.psi.ch
|
|
merlin-c-103.psi.ch
|
|
```
|
|
|
|
### salloc

**`salloc`** is used to obtain a Slurm job allocation (a set of nodes). Once the job is allocated,
users are able to execute interactive command(s). Once finished (`exit` or `Ctrl+D`),
the allocation is released. **`salloc`** is a blocking command, that is, the command blocks
until the requested resources are allocated.

When running **`salloc`**, once the resources are allocated, *by default* the user will get
a ***new shell on one of the allocated resources*** (if a user has requested several nodes, it
will prompt a new shell on the first allocated node). However, this behaviour can be changed by
adding a shell (`$SHELL`) at the end of the `salloc` command. For example:

```bash
# Typical 'salloc' call
# - Same as running:
#   'salloc --clusters=merlin6 -N 2 -n 2 srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL'
salloc --clusters=merlin6 -N 2 -n 2

# Custom 'salloc' call
# - '$SHELL' will open a local shell on the login node from where 'salloc' is running
salloc --clusters=merlin6 -N 2 -n 2 $SHELL
```

??? note "Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - *default*"
|
|
```console
|
|
(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --ntasks=2 --nodes=2
|
|
salloc: Pending job allocation 135171306
|
|
salloc: job 135171306 queued and waiting for resources
|
|
salloc: job 135171306 has been allocated resources
|
|
salloc: Granted job allocation 135171306
|
|
|
|
(base) [caubet_m@merlin-c-213 ~]$ srun hostname
|
|
merlin-c-213.psi.ch
|
|
merlin-c-214.psi.ch
|
|
|
|
(base) [caubet_m@merlin-c-213 ~]$ exit
|
|
exit
|
|
salloc: Relinquishing job allocation 135171306
|
|
|
|
(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 -N 2 -n 2 srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL
|
|
salloc: Pending job allocation 135171342
|
|
salloc: job 135171342 queued and waiting for resources
|
|
salloc: job 135171342 has been allocated resources
|
|
salloc: Granted job allocation 135171342
|
|
|
|
(base) [caubet_m@merlin-c-021 ~]$ srun hostname
|
|
merlin-c-021.psi.ch
|
|
merlin-c-022.psi.ch
|
|
|
|
(base) [caubet_m@merlin-c-021 ~]$ exit
|
|
exit
|
|
salloc: Relinquishing job allocation 135171342
|
|
```
|
|
|
|
??? note "Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - `$SHELL`"
|
|
```console
|
|
(base) [caubet_m@merlin-export-01 ~]$ salloc --clusters=merlin6 --ntasks=2 --nodes=2 $SHELL
|
|
salloc: Pending job allocation 135171308
|
|
salloc: job 135171308 queued and waiting for resources
|
|
salloc: job 135171308 has been allocated resources
|
|
salloc: Granted job allocation 135171308
|
|
|
|
(base) [caubet_m@merlin-export-01 ~]$ srun hostname
|
|
merlin-c-218.psi.ch
|
|
merlin-c-117.psi.ch
|
|
|
|
(base) [caubet_m@merlin-export-01 ~]$ exit
|
|
exit
|
|
salloc: Relinquishing job allocation 135171308
|
|
```
|
|
|
|
## Running interactive jobs with X11 support

### Requirements

#### Graphical access

[NoMachine](../../how-to-use-merlin/nomachine.md) is the officially supported service for graphical
access in the Merlin cluster. This service runs on the login nodes. Check the
document [{Accessing Merlin -> NoMachine}](../../how-to-use-merlin/nomachine.md) for details about
how to connect to the **NoMachine** service in the Merlin cluster.

For other, not officially supported graphical access (X11 forwarding), see the guides below
(a minimal SSH sketch follows the list):

* For Linux clients, please follow [{How To Use Merlin -> Accessing from Linux Clients}](../../how-to-use-merlin/connect-from-linux.md)
* For Windows clients, please follow [{How To Use Merlin -> Accessing from Windows Clients}](../../how-to-use-merlin/connect-from-windows.md)
* For macOS clients, please follow [{How To Use Merlin -> Accessing from MacOS Clients}](../../how-to-use-merlin/connect-from-macos.md)

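For instance, X11 forwarding over SSH from a Linux or macOS client could look like the sketch
below (assuming `merlin-l-001.psi.ch` is your login node; refer to the guides above for the full,
client-specific setup):

```bash
# '-X' enables X11 forwarding; some setups may require '-Y' (trusted
# forwarding) instead. Replace $USER with your PSI account name.
ssh -X $USER@merlin-l-001.psi.ch
```
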
### 'srun' with X11 support

The **Merlin5** and **Merlin6** clusters allow running window-based applications. For that, you
need to add the option `--x11` to the `srun` command. For example, the following command:

```bash
srun --clusters=merlin6 --x11 xclock
```

will pop up an X11-based clock.

In the same manner, you can create a bash shell with X11 support. To do that, you need
to add the option `--pty` to the `srun --x11` command. Once the resource is allocated, you can
interactively run X11 and non-X11 based commands from there.

```bash
srun --clusters=merlin6 --x11 --pty bash
```

??? note "Using 'srun' with X11 support"
|
|
```console
|
|
(base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --x11 xclock
|
|
srun: job 135095591 queued and waiting for resources
|
|
srun: job 135095591 has been allocated resources
|
|
|
|
(base) [caubet_m@merlin-l-001 ~]$
|
|
|
|
(base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --x11 --pty bash
|
|
srun: job 135095592 queued and waiting for resources
|
|
srun: job 135095592 has been allocated resources
|
|
|
|
(base) [caubet_m@merlin-c-205 ~]$ xclock
|
|
|
|
(base) [caubet_m@merlin-c-205 ~]$ echo "This was an example"
|
|
This was an example
|
|
|
|
(base) [caubet_m@merlin-c-205 ~]$ exit
|
|
exit
|
|
```
|
|
|
|
### 'salloc' with X11 support

The **Merlin5** and **Merlin6** clusters allow running window-based applications. For that, you
need to add the option `--x11` to the `salloc` command. For example, the following command:

```bash
salloc --clusters=merlin6 --x11 xclock
```

will pop up an X11-based clock.

In the same manner, you can create a bash shell with X11 support. To do that, you just need
to run `salloc --clusters=merlin6 --x11`. Once the resource is allocated, you can
interactively run X11 and non-X11 based commands from there.

```bash
salloc --clusters=merlin6 --x11
```

??? note "Using 'salloc' with X11 support examples"
|
|
```console
|
|
(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --x11 xclock
|
|
salloc: Pending job allocation 135171355
|
|
salloc: job 135171355 queued and waiting for resources
|
|
salloc: job 135171355 has been allocated resources
|
|
salloc: Granted job allocation 135171355
|
|
salloc: Relinquishing job allocation 135171355
|
|
|
|
(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --x11
|
|
salloc: Pending job allocation 135171349
|
|
salloc: job 135171349 queued and waiting for resources
|
|
salloc: job 135171349 has been allocated resources
|
|
salloc: Granted job allocation 135171349
|
|
salloc: Waiting for resource configuration
|
|
salloc: Nodes merlin-c-117 are ready for job
|
|
|
|
(base) [caubet_m@merlin-c-117 ~]$ xclock
|
|
|
|
(base) [caubet_m@merlin-c-117 ~]$ echo "This was an example"
|
|
This was an example
|
|
|
|
(base) [caubet_m@merlin-c-117 ~]$ exit
|
|
exit
|
|
salloc: Relinquishing job allocation 135171349
|
|
```
|