---
title: Running Interactive Jobs
keywords: interactive, X11, X, srun, salloc, job, jobs, slurm, nomachine, nx
last_updated: 07 August 2024
summary: "This document describes how to run interactive jobs as well as X based software."
sidebar: merlin7_sidebar
permalink: /merlin7/interactive-jobs.html
---

## Running interactive jobs

There are two different ways of running interactive jobs in Slurm, using the `salloc` and `srun` commands:

* **`salloc`**: obtains a Slurm job allocation (a set of nodes), executes command(s), and then releases the allocation when the command is finished.
* **`srun`**: is used for running parallel tasks.

## srun

`srun` is used to run parallel jobs in the batch system. It can be used within a batch script (submitted with `sbatch`), within a job allocation (obtained with `salloc`), or as a direct command (for example, from the login nodes).

When used inside a batch script or a job allocation, `srun` is constrained by the amount of resources allocated by the `sbatch`/`salloc` commands. In `sbatch`, these resources are usually defined inside the batch script with directives of the form `#SBATCH --<option>=<value>`. In other words, if your batch script or allocation defines 88 tasks (with 1 thread per core) and 2 nodes, `srun` is constrained to that amount of resources (you can use less, but never exceed those limits).
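
As a minimal sketch (the resource values, task split and walltime are illustrative, not a recommendation for Merlin7), a batch script defining 88 tasks on 2 nodes constrains any `srun` call inside it as follows:

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --nodes=2              # 2 nodes
#SBATCH --ntasks=88            # 88 tasks in total (1 thread per core)
#SBATCH --ntasks-per-node=44   # illustrative split of the 88 tasks over the 2 nodes
#SBATCH --time=00:10:00        # illustrative walltime

# srun may use up to the allocated resources, but can never exceed them
srun hostname                  # runs 88 tasks across the 2 allocated nodes
srun --ntasks=4 hostname       # using fewer resources than allocated is fine
```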

When run from the login node, `srun` is usually used to run a specific command or software interactively. `srun` is a blocking process (it will block the bash prompt until the command finishes, unless you run it in the background with `&`). This can be very useful for running interactive software which pops up a window and then submits jobs or runs sub-tasks in the background (for example, Relion, cisTEM, etc.).
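
For instance, a quick sketch of running `srun` in the background so the prompt stays usable (the `sleep` command is just a stand-in for a real application):

```bash
# Launch a single task on a compute node in the background (note the trailing '&')
srun --clusters=merlin7 --ntasks=1 sleep 300 &

# The bash prompt remains available, e.g. to check the job state
squeue --clusters=merlin7 -u $USER
```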

Refer to `man srun` to explore all possible options for this command.

**Example:** Running the `hostname` command on 3 nodes, using 2 cores (1 task per core) per node:

```bash
caubet_m@login001:~> srun --clusters=merlin7 --ntasks=6 --ntasks-per-node=2 --nodes=3 hostname
cn001.merlin7.psi.ch
cn001.merlin7.psi.ch
cn002.merlin7.psi.ch
cn002.merlin7.psi.ch
cn003.merlin7.psi.ch
cn003.merlin7.psi.ch
```

## salloc

`salloc` is used to obtain a Slurm job allocation (a set of nodes). Once the job is allocated, users can execute interactive command(s). Once finished (`exit` or `Ctrl+D`), the allocation is released. `salloc` is a blocking command, that is, it will block until the requested resources are allocated.

When running `salloc`, once the resources are allocated, by default the user gets a new shell on one of the allocated nodes (if more than one node was requested, the shell is opened on the first allocated node). However, this behaviour can be changed by appending a shell (`$SHELL`) to the `salloc` command. For example:

```bash
# Typical 'salloc' call
salloc --clusters=merlin7 -N 2 -n 2

# Custom 'salloc' call
#   - $SHELL will open a local shell on the login node from where 'salloc' is run
salloc --clusters=merlin7 -N 2 -n 2 $SHELL
```

**Example:** Allocating 2 cores (1 task/core) on 2 nodes (1 core/node), default behaviour:

```bash
caubet_m@login001:~> salloc --clusters=merlin7 -N 2 -n 2
salloc: Granted job allocation 161
salloc: Nodes cn[001-002] are ready for job

caubet_m@login001:~> srun hostname
cn002.merlin7.psi.ch
cn001.merlin7.psi.ch

caubet_m@login001:~> exit
exit
salloc: Relinquishing job allocation 161
```

**Example:** Allocating 2 cores (1 task/core) on 2 nodes (1 core/node), with `$SHELL`:

```bash
caubet_m@login001:~> salloc --clusters=merlin7 --ntasks=2 --nodes=2 $SHELL
salloc: Granted job allocation 165
salloc: Nodes cn[001-002] are ready for job
caubet_m@login001:~> srun hostname
cn001.merlin7.psi.ch
cn002.merlin7.psi.ch
caubet_m@login001:~> exit
exit
salloc: Relinquishing job allocation 165
```

## Running interactive jobs with X11 support

### Requirements

#### Graphical access

NoMachine is the officially supported service for graphical access to the Merlin cluster. This service runs on the login nodes. Check the document {Accessing Merlin -> NoMachine} for details about how to connect to the NoMachine service in the Merlin cluster.

For other graphical access methods which are not officially supported (X11 forwarding):
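
For example, a minimal sketch of plain X11 forwarding over SSH (the login node hostname is an assumption based on the examples below; on Windows or macOS you additionally need a local X server such as Xming or XQuartz):

```bash
# X11 forwarding over SSH (not officially supported; NoMachine is the recommended access method)
ssh -XY $USER@login001.merlin7.psi.ch
```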

### 'srun' with X11 support

The Merlin6 and Merlin7 clusters allow running X11-based (graphical) applications. For that, you need to add the option `--x11` to the `srun` command. For example:

```bash
srun --clusters=merlin7 --x11 sview
```

will pop up an X11-based Slurm view of the cluster.

In the same manner, you can create a bash shell with X11 support. To do that, add the option `--pty` to the `srun --x11` command. Once the resource is allocated, you can interactively run X11 and non-X11 based commands from there.

```bash
srun --clusters=merlin7 --x11 --pty bash
```

**Example:** `srun` with X11 support:

```bash
caubet_m@login001:~> srun --clusters=merlin7 --x11 sview

caubet_m@login001:~>

caubet_m@login001:~> srun --clusters=merlin7 --x11 --pty bash

caubet_m@cn003:~> sview

caubet_m@cn003:~> echo "This was an example"
This was an example

caubet_m@cn003:~> exit
exit
```

### 'salloc' with X11 support

The Merlin6 and Merlin7 clusters allow running X11-based (graphical) applications. For that, you need to add the option `--x11` to the `salloc` command. For example:

```bash
salloc --clusters=merlin7 --x11 sview
```

will pop up an X11-based Slurm view of the cluster.

In the same manner, you can create a bash shell with X11 support. To do that, simply run `salloc --clusters=merlin7 --x11` without specifying any command. Once the resource is allocated, you can interactively run X11 and non-X11 based commands from there.

```bash
salloc --clusters=merlin7 --x11
```

**Example:** `salloc` with X11 support:

```bash
caubet_m@login001:~> salloc --clusters=merlin7 --x11 sview
salloc: Granted job allocation 174
salloc: Nodes cn001 are ready for job
salloc: Relinquishing job allocation 174

caubet_m@login001:~> salloc --clusters=merlin7 --x11
salloc: Granted job allocation 175
salloc: Nodes cn001 are ready for job
caubet_m@cn001:~>

caubet_m@cn001:~> sview

caubet_m@cn001:~> echo "This was an example"
This was an example

caubet_m@cn001:~> exit
exit
salloc: Relinquishing job allocation 175
```