NoMachine fix + running jobs
This commit is contained in:
parent
5baf53df77
commit
6169b7a8dc
@@ -1,5 +1,5 @@
---
title: Remote Desktop Access

#tags:
#keywords:
@@ -9,9 +9,9 @@ sidebar: merlin6_sidebar
permalink: /merlin6/nomachine.html
---

Users can log in to Merlin through a Linux Remote Desktop Session. NoMachine
is a desktop virtualization tool, similar to VNC or Remote Desktop. It uses
the NX protocol to enable a graphical login to remote servers.

## Installation

@@ -10,41 +10,72 @@ permalink: /merlin6/running-jobs.html

## Commands for running jobs

* **``sbatch``**: to submit a batch script to Slurm
  * Use **``squeue``** for checking job status
  * Use **``scancel``** for deleting a job from the queue
* **``srun``**: to run parallel jobs in the batch system
* **``salloc``**: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished. This is equivalent to an interactive run. A minimal usage sketch follows this list.

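The sketch below strings these commands together for a trivial workflow; the script name ``myjob.sh`` and the job ID are placeholders.

```bash
sbatch myjob.sh        # Submit a batch script; Slurm prints the assigned job ID
squeue -u $USER        # Check the status of your pending and running jobs
scancel 123456         # Cancel a job using its numeric job ID

salloc --ntasks=1      # Allocate resources for an interactive session
srun hostname          # Run a command inside the allocation
exit                   # Release the allocation
```
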
## Slurm parameters

For a complete list of available options and parameters it is recommended to use the **man** pages (``man sbatch``, ``man srun``, ``man salloc``). Please notice that the behaviour of some parameters may change depending on the command (for example, ``--exclusive`` behaves differently in ``sbatch`` than in ``srun``).

In this chapter we show the basic parameters which are usually needed on the Merlin cluster.

### Running in Merlin5 & Merlin6

* For running jobs on the **Merlin6** computing nodes, users have to add the following option:

```bash
#SBATCH --clusters=merlin6
```

* For running jobs on the **Merlin5** computing nodes, users have to add the following option:

```bash
#SBATCH --clusters=merlin5
```

***For advanced users:*** If you do not care where the jobs run (**Merlin5** or **Merlin6**), you can skip this setting; however, you must make sure that your code can run on both clusters without problems and that you have defined proper settings in your *batch* script. A minimal combined example follows.

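As an illustration, a hypothetical batch script header that pins a job to Merlin6 could look as follows; the job name and task count are placeholders.

```bash
#!/bin/sh
#SBATCH --clusters=merlin6     # Run on the Merlin6 cluster
#SBATCH --job-name=where-am-i  # Hypothetical job name
#SBATCH --ntasks=1             # A single task is enough for this test

srun hostname                  # Print the node the job landed on
```
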
### Partitions

**Merlin6** contains 4 partitions for general purpose, while **Merlin5** contains a single CPU partition (for historical reasons):

* **Merlin6** CPU partitions are 3: ``general``, ``daily`` and ``hourly``.
* **Merlin6** GPU partition is 1: ``gpu``.
* **Merlin5** CPU partition is 1: ``merlin``.

For Merlin6, if no partition is defined, ``general`` will be the default, while for Merlin5 the default is ``merlin``. The partition can be changed by defining the ``--partition`` option as follows:

```bash
#SBATCH --partition=<partition_name>  # Partition to use. 'general' is the 'default' in Merlin6.
```

Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about the Merlin6 partition setup.

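To check which partitions are currently available on each cluster, together with their state and time limits, the standard ``sinfo`` command can be used; a short sketch:

```bash
sinfo --clusters=merlin6   # Partitions, node states and time limits on Merlin6
sinfo --clusters=merlin5   # The same for Merlin5
```
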
### Enabling/disabling hyperthreading

Computing nodes in **merlin6** have hyperthreading enabled: every core runs two threads. In many cases hyperthreading should be disabled, as only multithread-based applications benefit from it. Users must apply the following parameters (a combined example is given after these blocks):

* For **hyperthreaded jobs** users ***must*** specify the following options:

```bash
#SBATCH --ntasks-per-core=2   # Mandatory for multithreaded jobs
#SBATCH --hint=multithread    # Mandatory for multithreaded jobs
```

* For **non-hyperthreaded jobs** users ***must*** specify the following options:

```bash
#SBATCH --ntasks-per-core=1   # Mandatory for non-multithreaded jobs
#SBATCH --hint=nomultithread  # Mandatory for non-multithreaded jobs
```

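For a multithreaded (e.g. OpenMP) application these options are typically combined with ``--cpus-per-task``; a hedged sketch, where the thread count and application name are placeholders:

```bash
#SBATCH --ntasks=1              # One task ...
#SBATCH --cpus-per-task=8       # ... using 8 hardware threads
#SBATCH --ntasks-per-core=2     # Use both hyperthreads of each core
#SBATCH --hint=multithread      # Mandatory for multithreaded jobs

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK  # Match the thread count to the allocation
srun ./my_openmp_app                         # Placeholder application
```
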
### Shared nodes and exclusivity

The **Merlin5** and **Merlin6** clusters are designed in a way that should allow running MPI/OpenMP processes as well as single-core jobs. To allow co-existence, nodes are configured by default in a shared mode: multiple jobs from multiple users may land on the same node. This behaviour can be changed by a user if they require exclusive usage of nodes.

By default, Slurm will try to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way, we fill up mixed nodes first and ensure that full free nodes are available for MPI/OpenMP jobs.

Exclusivity of a node can be set up by specifying the ``--exclusive`` option as follows:

```bash
#SBATCH --exclusive
```

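The same flag also works for interactive tests with ``salloc``; a sketch with placeholder partition and time values:

```bash
salloc --clusters=merlin6 --partition=hourly --exclusive --time=01:00:00
srun hostname   # Runs on the exclusively allocated node
exit            # Release the node
```
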
### Slurm CPU Recommended Settings

There are some settings that are not mandatory but are useful or necessary to specify. These are the following:

* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, but also useful for specifying shorter times. **This will affect scheduling priorities**, hence it is important to define it (and to define it properly). A concrete example follows the block below.

```bash
#SBATCH --time=<D-HH:MM:SS>  # Time the job needs to run
```

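As an illustration of the ``D-HH:MM:SS`` format (pick one value only; both are placeholders):

```bash
#SBATCH --time=0-01:30:00   # 1 hour and 30 minutes
#SBATCH --time=2-00:00:00   # 2 days
```
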
### Output and Errors

By default, Slurm will generate the standard output and error files in the directory from where
you submit the batch script. If you want to change the default names, this can be done with the
``--output`` and ``--error`` options.

Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) for the full specification of **filename patterns**.

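For instance, the ``%j`` (job ID) filename pattern keeps output from different jobs apart; a sketch, where the ``logs/`` directory is a placeholder and must already exist:

```bash
#SBATCH --output=logs/myjob-%j.out   # Standard output, one file per job ID
#SBATCH --error=logs/myjob-%j.err    # Standard error, one file per job ID
```
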
## CPU-based Jobs Settings

CPU-based jobs are available to all PSI users. Users must belong to the ``merlin6`` Slurm ``Account`` in order to be able
to run on CPU-based nodes. All users registered in Merlin6 are automatically included in this ``Account``.

### Slurm CPU Templates

The following examples apply to the **Merlin6** cluster.

#### Non-multithreaded jobs example

The following template should be used by any user submitting non-multithreaded jobs to CPU nodes:

```bash
#!/bin/sh
#SBATCH --partition=<general|daily|hourly>  # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=<D-HH:MM:SS>                 # Strictly recommended when using the 'general' partition
#SBATCH --output=<output_file>              # Generate custom output file
#SBATCH --error=<error_file>                # Generate custom error file
#SBATCH --ntasks-per-core=1                 # Mandatory for non-multithreaded jobs
#SBATCH --hint=nomultithread                # Mandatory for non-multithreaded jobs
##SBATCH --exclusive                        # Uncomment if you need exclusive node usage

## Advanced options example
##SBATCH --nodes=1                          # Uncomment and specify the number of nodes to use
##SBATCH --ntasks=44                        # Uncomment and specify the number of tasks to use
##SBATCH --ntasks-per-node=44               # Uncomment and specify the number of tasks per node
##SBATCH --cpus-per-task=44                 # Uncomment and specify the number of cores per task
```

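Once saved (e.g. as ``mycpujob.sh``, a placeholder name), the script would typically be submitted and monitored like this:

```bash
sbatch --clusters=merlin6 mycpujob.sh   # Submit to the Merlin6 cluster
squeue --clusters=merlin6 -u $USER      # Follow its status
```
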
#### Multithreaded jobs example

The following template should be used by any user submitting multithreaded jobs to CPU nodes:

```bash
#!/bin/sh
#SBATCH --partition=<general|daily|hourly>  # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=<D-HH:MM:SS>                 # Strictly recommended when using the 'general' partition
#SBATCH --output=<output_file>              # Generate custom output file
#SBATCH --error=<error_file>                # Generate custom error file
#SBATCH --ntasks-per-core=2                 # Mandatory for multithreaded jobs
#SBATCH --hint=multithread                  # Mandatory for multithreaded jobs
##SBATCH --exclusive                        # Uncomment if you need exclusive node usage

## Advanced options example
##SBATCH --nodes=1                          # Uncomment and specify the number of nodes to use
##SBATCH --ntasks=88                        # Uncomment and specify the number of tasks to use
##SBATCH --ntasks-per-node=88               # Uncomment and specify the number of tasks per node
##SBATCH --cpus-per-task=88                 # Uncomment and specify the number of cores per task
```

## GPU-based Jobs Settings

**Merlin6** GPUs are available to all PSI users; however, access is restricted to users belonging to the ``merlin-gpu`` account. By default, all users are added to this account (exceptions may apply).

### Merlin6 GPU account

When using GPUs, users must switch to the **merlin-gpu** Slurm account in order to be able to run on GPU-based nodes. This is done with the ``--account`` setting as follows:

```bash
#SBATCH --account=merlin-gpu  # The account 'merlin-gpu' must be used
```

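To verify which Slurm accounts your user is associated with, the standard ``sacctmgr`` command can be used; a sketch (the chosen format columns are an assumption):

```bash
sacctmgr show associations user=$USER format=cluster,account,user
```
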
### Slurm GPU Mandatory Settings

The following options are mandatory settings that **must be included** in your batch script:

```bash
#SBATCH --gres=gpu  # Always set at least this option when using GPUs
```

### Slurm GPU Recommended Settings

GPUs are also a shared resource. Hence, multiple users can run jobs on a single node, but only one GPU per user process
must be used. Users can define which GPU resources they need with the ``--gres`` option.

For example:

```bash
#SBATCH --gres=gpu:GTX1080:4  # Use a node with 4 x GTX1080 GPUs
```

***Important note:*** Due to a bug in the configuration, ``[:type]`` (e.g. ``GTX1080`` or ``GTX1080Ti``) is not working. Users should skip it and use only ``gpu[:count]``. This will be fixed during an upcoming downtime, as it requires a full restart of the batch system.

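Until that fix is in place, a request without the type field would look like this (the count of 2 is a placeholder):

```bash
#SBATCH --gres=gpu:2   # Request 2 GPUs of any type on the node
```
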
### Slurm GPU Template

The following template should be used by any user submitting jobs to GPU nodes:

```bash
#!/bin/sh
#SBATCH --partition=gpu_<general|daily|hourly>  # Specify 'general' or 'daily' or 'hourly'
#SBATCH --gres="gpu:<type>:<number_gpus>"       # You should specify at least 'gpu'
#SBATCH --time=<D-HH:MM:SS>                     # Strictly recommended when using the 'general' partition
#SBATCH --output=<output_file>                  # Generate custom output file
#SBATCH --error=<error_file>                    # Generate custom error file
#SBATCH --ntasks-per-core=1                     # GPU nodes have hyper-threading disabled
#SBATCH --account=merlin-gpu                    # The account 'merlin-gpu' must be used
##SBATCH --exclusive                            # Uncomment if you need exclusive node usage

## Advanced options example
##SBATCH --nodes=1                              # Uncomment and specify the number of nodes to use
##SBATCH --ntasks=44                            # Uncomment and specify the number of tasks to use
```

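A minimal end-to-end sketch based on this template; the partition, time and the ``nvidia-smi`` call are placeholders and assumptions:

```bash
#!/bin/sh
#SBATCH --partition=gpu_hourly   # Short test run (placeholder partition)
#SBATCH --gres=gpu:1             # One GPU of any type (see the note above)
#SBATCH --time=00:10:00          # Ten minutes
#SBATCH --ntasks-per-core=1      # GPU nodes have hyper-threading disabled
#SBATCH --account=merlin-gpu     # The GPU account

nvidia-smi                       # Show the allocated GPU (assumes NVIDIA tools on the node)
```
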
The status of submitted jobs can be checked with the `squeue` command:

```bash
$> squeue -u bliven_s
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
134507729 gpu test_scr bliven_s PD 0:00 3 (AssocGrpNodeLimit)
134507768 general test_scr bliven_s PD 0:00 19 (AssocGrpCpuLimit)
```

Common Statuses:

* **merlin-\***: Running on the specified host
* **(Priority)**: Waiting in the queue
* **(Resources)**: At the head of the queue, waiting for machines to become available
* **(AssocGrpCpuLimit), (AssocGrpNodeLimit)**: Job would exceed per-user limitations on
  the number of simultaneous CPUs/Nodes. Use `scancel` to remove the job and
  resubmit with fewer resources, or else wait for your other jobs to finish (see the example below).
* **(PartitionNodeLimit)**: Exceeds all resources available on this partition.
  Run `scancel` and resubmit to a different partition (`-p`) or with fewer
  resources.

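For the limit-related states, the remedy described above looks like this in practice (job ID and script name are placeholders):

```bash
scancel 134507768                  # Remove the blocked job
sbatch --ntasks=8 test_script.sh   # Resubmit with fewer resources
```
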
@@ -25,6 +25,9 @@ sbatch Script.sh # to submit a script (example below) to the slurm.

```bash
srun <command>   # to submit a command to Slurm. Same options as in 'sbatch' can be used.
salloc           # to allocate computing nodes. Use for interactive runs.
scancel job_id   # to cancel a slurm job; the job id is the numeric id seen in squeue.
sview            # X interface for managing jobs and tracking job run information.
seff             # Calculates the efficiency of a job
sjstat           # Lists attributes of jobs under SLURM control
```

---
|
Loading…
x
Reference in New Issue
Block a user