# Running Interactive Jobs

## Running interactive jobs

There are two different ways of running interactive jobs in Slurm: the `salloc` and `srun` commands.

* **`salloc`**: obtains a Slurm job allocation (a set of nodes), lets you execute command(s), and then releases the allocation when the commands finish.
* **`srun`**: runs parallel tasks.

### srun

`srun` is used to run parallel jobs in the batch system. It can be used within a batch script
(submitted with `sbatch`), or within a job allocation (obtained with `salloc`).
It can also be used as a direct command (for example, from the login nodes).

When used inside a batch script or a job allocation, `srun` is constrained to the
amount of resources allocated by the `sbatch`/`salloc` commands. In `sbatch`, these resources
are usually defined inside the batch script with directives of the form `#SBATCH <option>=<value>`.
In other words, if your batch script or allocation defines 88 tasks (with 1 thread per core)
and 2 nodes, `srun` is constrained to that amount of resources (you can use less, but never
exceed those limits).
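
As an illustration, the minimal batch script sketch below shows this constraint. The
application name `./my_parallel_app` is a hypothetical placeholder, and the resource values
simply mirror the example above:

```bash
#!/bin/bash
#SBATCH --clusters=merlin6
#SBATCH --nodes=2
#SBATCH --ntasks=88

# OK: this job step uses exactly the 88 tasks / 2 nodes allocated above
srun --ntasks=88 ./my_parallel_app

# Also OK: a job step may use a subset of the allocation
srun --ntasks=44 --nodes=1 ./my_parallel_app

# Not OK: 96 tasks exceed the 88-task allocation, so this step would fail
# srun --ntasks=96 ./my_parallel_app
```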

When used from the login node, `srun` typically runs a specific command or piece of software
interactively. `srun` is a blocking process (it blocks the bash prompt until the `srun`
command finishes, unless you run it in the background with `&`). This can be very useful for running
interactive software which pops up a window and then submits jobs or runs sub-tasks in the
background (for example, **Relion**, **cisTEM**, etc.).
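
A minimal sketch of this background pattern (`sleep` stands in for a long-running interactive program):

```bash
# The trailing '&' sends the blocking 'srun' process to the background,
# so the prompt stays usable while the job step runs on a compute node.
srun --clusters=merlin6 --ntasks=1 sleep 300 &

# 'jobs' lists the backgrounded 'srun' process
jobs
```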

Refer to `man srun` to explore all the available options for this command.

??? note "Running 'hostname' command on 3 nodes, using 2 cores (1 task/core) per node"
    ```console
    (base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --ntasks=6 --ntasks-per-node=2 --nodes=3 hostname
    srun: job 135088230 queued and waiting for resources
    srun: job 135088230 has been allocated resources
    merlin-c-102.psi.ch
    merlin-c-102.psi.ch
    merlin-c-101.psi.ch
    merlin-c-101.psi.ch
    merlin-c-103.psi.ch
    merlin-c-103.psi.ch
    ```

### salloc

**`salloc`** is used to obtain a Slurm job allocation (a set of nodes). Once the job is allocated,
users can execute interactive command(s). Once finished (`exit` or `Ctrl+D`),
the allocation is released. **`salloc`** is a blocking command, that is, it blocks
until the requested resources are allocated.

When running **`salloc`**, once the resources are allocated, *by default* the user gets
a ***new shell on one of the allocated resources*** (if a user has requested several nodes,
a new shell is opened on the first allocated node). However, this behaviour can be changed
by appending a shell (`$SHELL`) to the `salloc` command. For example:

```bash
# Typical 'salloc' call
# - Same as running:
#   'salloc --clusters=merlin6 -N 2 -n 2 srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL'
salloc --clusters=merlin6 -N 2 -n 2

# Custom 'salloc' call
# - '$SHELL' will open a local shell on the login node from where 'salloc' is running
salloc --clusters=merlin6 -N 2 -n 2 $SHELL
```
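
Once the allocation is granted, the granted resources can be verified from within the new
shell through the environment variables Slurm defines for the job; a quick sketch:

```bash
# Run these inside the shell spawned by 'salloc'
echo $SLURM_JOB_ID        # ID of the current job allocation
echo $SLURM_JOB_NODELIST  # list of allocated nodes
echo $SLURM_NTASKS        # number of tasks in the allocation
```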

??? note "Allocating 2 cores (1 task/core) on 2 nodes (1 core/node) - *default*"
    ```console
    (base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --ntasks=2 --nodes=2
    salloc: Pending job allocation 135171306
    salloc: job 135171306 queued and waiting for resources
    salloc: job 135171306 has been allocated resources
    salloc: Granted job allocation 135171306

    (base) [caubet_m@merlin-c-213 ~]$ srun hostname
    merlin-c-213.psi.ch
    merlin-c-214.psi.ch

    (base) [caubet_m@merlin-c-213 ~]$ exit
    exit
    salloc: Relinquishing job allocation 135171306

    (base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 -N 2 -n 2 srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL
    salloc: Pending job allocation 135171342
    salloc: job 135171342 queued and waiting for resources
    salloc: job 135171342 has been allocated resources
    salloc: Granted job allocation 135171342

    (base) [caubet_m@merlin-c-021 ~]$ srun hostname
    merlin-c-021.psi.ch
    merlin-c-022.psi.ch

    (base) [caubet_m@merlin-c-021 ~]$ exit
    exit
    salloc: Relinquishing job allocation 135171342
    ```

??? note "Allocating 2 cores (1 task/core) on 2 nodes (1 core/node) - `$SHELL`"
    ```console
    (base) [caubet_m@merlin-export-01 ~]$ salloc --clusters=merlin6 --ntasks=2 --nodes=2 $SHELL
    salloc: Pending job allocation 135171308
    salloc: job 135171308 queued and waiting for resources
    salloc: job 135171308 has been allocated resources
    salloc: Granted job allocation 135171308

    (base) [caubet_m@merlin-export-01 ~]$ srun hostname
    merlin-c-218.psi.ch
    merlin-c-117.psi.ch

    (base) [caubet_m@merlin-export-01 ~]$ exit
    exit
    salloc: Relinquishing job allocation 135171308
    ```

## Running interactive jobs with X11 support

### Requirements

#### Graphical access

[NoMachine](../how-to-use-merlin/nomachine.md) is the officially supported service for graphical
access in the Merlin cluster. This service runs on the login nodes. Check the
document [{Accessing Merlin -> NoMachine}](../how-to-use-merlin/nomachine.md) for details about
how to connect to the **NoMachine** service in the Merlin cluster.

For other graphical access methods which are not officially supported (X11 forwarding):

* For Linux clients, please follow [{How To Use Merlin -> Accessing from Linux Clients}](../how-to-use-merlin/connect-from-linux.md)
* For Windows clients, please follow [{How To Use Merlin -> Accessing from Windows Clients}](../how-to-use-merlin/connect-from-windows.md)
* For MacOS clients, please follow [{How To Use Merlin -> Accessing from MacOS Clients}](../how-to-use-merlin/connect-from-macos.md)

### 'srun' with X11 support

The Merlin5 and Merlin6 clusters allow running window-based (graphical) applications. For that, you need to
add the option ``--x11`` to the ``srun`` command. For example:

```bash
srun --clusters=merlin6 --x11 xclock
```

will pop up an X11-based clock.

In the same manner, you can create a bash shell with X11 support. To do that, you need
to add the option ``--pty`` to the ``srun --x11`` command. Once the resource is allocated,
you can interactively run X11 and non-X11 based commands from there.

```bash
srun --clusters=merlin6 --x11 --pty bash
```

??? note "Using 'srun' with X11 support"
    ```console
    (base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --x11 xclock
    srun: job 135095591 queued and waiting for resources
    srun: job 135095591 has been allocated resources

    (base) [caubet_m@merlin-l-001 ~]$

    (base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --x11 --pty bash
    srun: job 135095592 queued and waiting for resources
    srun: job 135095592 has been allocated resources

    (base) [caubet_m@merlin-c-205 ~]$ xclock

    (base) [caubet_m@merlin-c-205 ~]$ echo "This was an example"
    This was an example

    (base) [caubet_m@merlin-c-205 ~]$ exit
    exit
    ```

### 'salloc' with X11 support

The **Merlin5** and **Merlin6** clusters allow running window-based (graphical) applications. For that, you need to
add the option ``--x11`` to the ``salloc`` command. For example:

```bash
salloc --clusters=merlin6 --x11 xclock
```

will pop up an X11-based clock.

In the same manner, you can create a bash shell with X11 support. To do that, just run
``salloc --clusters=merlin6 --x11``. Once the resource is allocated,
you can interactively run X11 and non-X11 based commands from there.

```bash
salloc --clusters=merlin6 --x11
```

??? note "Using 'salloc' with X11 support"
    ```console
    (base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --x11 xclock
    salloc: Pending job allocation 135171355
    salloc: job 135171355 queued and waiting for resources
    salloc: job 135171355 has been allocated resources
    salloc: Granted job allocation 135171355
    salloc: Relinquishing job allocation 135171355

    (base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --x11
    salloc: Pending job allocation 135171349
    salloc: job 135171349 queued and waiting for resources
    salloc: job 135171349 has been allocated resources
    salloc: Granted job allocation 135171349
    salloc: Waiting for resource configuration
    salloc: Nodes merlin-c-117 are ready for job

    (base) [caubet_m@merlin-c-117 ~]$ xclock

    (base) [caubet_m@merlin-c-117 ~]$ echo "This was an example"
    This was an example

    (base) [caubet_m@merlin-c-117 ~]$ exit
    exit
    salloc: Relinquishing job allocation 135171349
    ```