diff --git a/pages/merlin6/02 accessing-merlin6/nomachine.md b/pages/merlin6/02 accessing-merlin6/nomachine.md
index 21f0207..e712d0d 100644
--- a/pages/merlin6/02 accessing-merlin6/nomachine.md
+++ b/pages/merlin6/02 accessing-merlin6/nomachine.md
@@ -1,5 +1,5 @@
 ---
-title: NoMachine
+title: Remote Desktop Access
 
 #tags:
 #keywords:
@@ -9,9 +9,9 @@ sidebar: merlin6_sidebar
 permalink: /merlin6/nomachine.html
 ---
 
-NoMachine is a desktop virtualization tool. It is similar to VNC, Remote
-Desktop, etc. It uses the NX protocol to enable a graphical login to remote
-servers.
+Users can log in to Merlin through a Linux Remote Desktop session. NoMachine
+is a desktop virtualization tool, similar to VNC, Remote Desktop, etc.
+It uses the NX protocol to enable a graphical login to remote servers.
 
 ## Installation
 
diff --git a/pages/merlin6/03 Job Submission/running-jobs.md b/pages/merlin6/03 Job Submission/running-jobs.md
index accbaf8..913a001 100644
--- a/pages/merlin6/03 Job Submission/running-jobs.md
+++ b/pages/merlin6/03 Job Submission/running-jobs.md
@@ -10,41 +10,72 @@ permalink: /merlin6/running-jobs.html
 
 ## Commands for running jobs
 
-* ``sbatch``: to submit a batch script to Slurm. Use ``squeue`` for checking jobs status and ``scancel`` for deleting a job from the queue.
-* ``srun``: to run parallel jobs in the batch system
-* ``salloc``: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
-This is equivalent to interactive run.
+* **``sbatch``**: to submit a batch script to Slurm
+  * Use **``squeue``** for checking the status of your jobs
+  * Use **``scancel``** for deleting a job from the queue
+* **``srun``**: to run parallel jobs in the batch system
+* **``salloc``**: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.
 
-## Running on Merlin5
+## Slurm parameters
 
-The **Merlin5** cluster will be available at least until 1st of November 2019. In the meantime, users can keep submitting jobs to the old cluster
-but they will need to specify a couple of extra options to their scripts.
+For a complete list of the available options and parameters, it is recommended to use the **man** pages (``man sbatch``, ``man srun``, ``man salloc``). Please notice that the behaviour of some parameters may change depending on the command (for example, ``--exclusive`` behaves differently in ``sbatch`` than in ``srun``).
 
-```bash
-#SBATCH --clusters=merlin5
-```
+In this chapter we show the basic parameters that are usually needed in the Merlin cluster.
 
-By adding ``--clusters=merlin5`` it will send the jobs to the old Merlin5 computing nodes. Also, ``--partition=`` can be specified in
-order to use the old Merlin5 partitions.
+### Running on Merlin5 & Merlin6
 
-## Running on Merlin6
-
-In order to run on the **Merlin6** cluster, users have to add the following options:
+* For running jobs on the **Merlin6** computing nodes, users have to add the following option:
 
 ```bash
 #SBATCH --clusters=merlin6
 ```
 
-By adding ``--clusters=merlin6`` it will send the jobs to the old Merlin6 computing nodes.
+* For running jobs on the **Merlin5** computing nodes, users have to add the following option:
 
-## Shared nodes and exclusivity
 
+```bash
+#SBATCH --clusters=merlin5
+```
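+
+For illustration only (the script name and job ID below are placeholders), the same ``--clusters`` flag can also be passed to the related Slurm commands in order to address a specific cluster explicitly:
+
+```bash
+sbatch --clusters=merlin6 myjob.batch   # submit the batch script to the Merlin6 cluster
+squeue --clusters=merlin6 -u $USER      # show only your jobs on the Merlin6 cluster
+scancel --clusters=merlin6 <job_id>     # cancel a Merlin6 job by its numeric ID
+```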
 
-The **Merlin6** cluster has been designed in a way that should allow running MPI/OpenMP processes as well as single core based jobs. For allowing
-co-existence, nodes are configured by default in a shared mode. It means, that multiple jobs from multiple users may land in the same node. This
-behaviour can be changed by a user if they require exclusive usage of nodes.
+***For advanced users:*** If you do not care where your jobs run (**Merlin5** or **Merlin6**), you can skip this setting; however, you must make sure that your code can run on both clusters without problems and that you have defined proper settings in your *batch* script.
 
-By default, Slurm will try to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way,
-we fill up first mixed nodes and we ensure that free full resources are available for MPI/OpenMP jobs.
+### Partitions
+
+**Merlin6** contains 4 general purpose partitions, while **Merlin5** contains a single CPU partition (for historical reasons):
+
+ * **Merlin6** has 3 CPU partitions: ``general``, ``daily`` and ``hourly``.
+ * **Merlin6** has 1 GPU partition: ``gpu``.
+ * **Merlin5** has 1 CPU partition: ``merlin``.
+
+For Merlin6, if no partition is defined, ``general`` will be the default, while for Merlin5 the default is ``merlin``. Partitions can be changed by defining the ``--partition`` option as follows:
+
+```bash
+#SBATCH --partition=<partition_name>  # Partition to use. 'general' is the 'default' in Merlin6.
+```
+
+Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about the Merlin6 partition setup.
+
+### Enabling/disabling hyperthreading
+
+Computing nodes in **merlin6** have hyperthreading enabled: every core runs two threads. For many use cases hyperthreading needs to be disabled, and only multithread-based applications will benefit from it. Hence, users must apply the following parameters:
+
+* For **hyperthreaded jobs** users ***must*** specify the following options:
+
+```bash
+#SBATCH --ntasks-per-core=2   # Mandatory for multithreaded jobs
+#SBATCH --hint=multithread    # Mandatory for multithreaded jobs
+```
+
+* For **non-hyperthreaded jobs** users ***must*** specify the following options:
+
+```bash
+#SBATCH --ntasks-per-core=1   # Mandatory for non-multithreaded jobs
+#SBATCH --hint=nomultithread  # Mandatory for non-multithreaded jobs
+```
+
+### Shared nodes and exclusivity
+
+The **Merlin5** and **Merlin6** clusters are designed in a way that should allow running MPI/OpenMP processes as well as single-core based jobs. To allow co-existence, nodes are configured in shared mode by default. This means that multiple jobs from multiple users may land on the same node. This behaviour can be changed by users who require exclusive usage of nodes.
+
+By default, Slurm will try to allocate jobs on nodes that are already occupied by processes not requiring exclusive usage of a node. In this way, we first fill up the mixed nodes and ensure that fully free nodes remain available for MPI/OpenMP jobs.
 
 Exclusivity of a node can be setup by specific the ``--exclusive`` option as follows:
 
@@ -52,7 +83,18 @@ Exclusivity of a node can be setup by specific the ``--exclusive`` option as fol
 #SBATCH --exclusive
 ```
 
-## Output and Errors
+### Slurm CPU Recommended Settings
+
+There are some settings that are not mandatory but may be needed or useful to specify. These are the following:
+
+* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, also useful for specifying
+shorter times. **This will affect scheduling priorities**, hence it is important to define it (and to define it properly).
+
+  ```bash
+  #SBATCH --time=<time>  # Time the job needs to run
+  ```
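+
+  For illustration (the values below are arbitrary examples), ``--time`` accepts several formats, such as ``minutes``, ``hours:minutes:seconds`` or ``days-hours:minutes:seconds``:
+
+  ```bash
+  #SBATCH --time=30          # 30 minutes
+  #SBATCH --time=2:00:00     # 2 hours
+  #SBATCH --time=1-12:00:00  # 1 day and 12 hours
+  ```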
+
+### Output and Errors
 
 By default, Slurm script will generate standard output and errors files in the directory from where you submit the batch script:
 
@@ -69,38 +111,16 @@ If you want to the default names it can be done with the options ``--output`` an
 
 Use **man sbatch** (``man sbatch | grep -A36 '^filename pattern'``) for getting a list specification of **filename patterns**.
 
-## Partitions
-
-Merlin6 contains 6 partitions for general purpose:
-
- * For the CPU these are ``general``, ``daily`` and ``hourly``.
 - * For the GPU these are ``gpu``.
-
-If no partition is defined, ``general`` will be the default. Partition can be defined with the ``--partition`` option as follows:
-
-```bash
-#SBATCH --partition=  # Partition to use. 'general' is the 'default'.
-```
-
-Please check the section [Slurm Configuration#Merlin6 Slurm Partitions] for more information about Merlin6 partition setup.
-
 ## CPU-based Jobs Settings
 
 CPU-based jobs are available for all PSI users. Users must belong to the ``merlin6`` Slurm ``Account`` in order to be able to run on CPU-based nodes. All users registered in Merlin6 are automatically included in the ``Account``.
 
-### Slurm CPU Recommended Settings
+### Slurm CPU Templates
 
-There are some settings that are not mandatory but would be needed or useful to specify. These are the following:
+The following examples apply to the **Merlin6** cluster.
 
-* ``--time``: mostly used when you need to specify longer runs in the ``general`` partition, also useful for specifying
-shorter times. This may affect scheduling priorities.
-
-  ```bash
-  #SBATCH --time=  # Time job needs to run
-  ```
-
-### Slurm CPU Template
+#### Non-multithreaded jobs example
 
 The following template should be used by any user submitting jobs to CPU nodes:
 
@@ -110,24 +130,49 @@ The following template should be used by any user submitting jobs to CPU nodes:
 #!/bin/sh
 #SBATCH --partition=<partition_name>  # Specify 'general' or 'daily' or 'hourly'
 #SBATCH --time=<time>                 # Strictly recommended when using 'general' partition.
 #SBATCH --output=<output_file>        # Generate custom output file
 #SBATCH --error=<error_file>          # Generate custom error file
-#SBATCH --ntasks-per-core=1           # Recommended one thread per core
+#SBATCH --ntasks-per-core=1           # Mandatory for non-multithreaded jobs
+#SBATCH --hint=nomultithread          # Mandatory for non-multithreaded jobs
 ##SBATCH --exclusive                  # Uncomment if you need exclusive node usage
 
 ## Advanced options example
 ##SBATCH --nodes=1                    # Uncomment and specify #nodes to use
 ##SBATCH --ntasks=44                  # Uncomment and specify #nodes to use
 ##SBATCH --ntasks-per-node=44         # Uncomment and specify #tasks per node
-##SBATCH --ntasks-per-core=2          # Uncomment and specify #tasks per core (a.k.a. threads)
 ##SBATCH --cpus-per-task=44           # Uncomment and specify the number of cores per task
 ```
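+
+As an illustration only (the node count is an arbitrary example; the 44 physical cores per node are taken from the comments in the template above), a non-multithreaded MPI job spanning two full nodes could combine the following options:
+
+```bash
+#SBATCH --nodes=2             # Two full nodes
+#SBATCH --ntasks=88           # 2 nodes x 44 physical cores per node
+#SBATCH --ntasks-per-core=1   # One task per physical core
+#SBATCH --hint=nomultithread  # Disable hyperthreading for this job
+```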
 
-* Users needing hyper-threading can specify ``--ntasks-per-core=2`` instead. This is not recommended for generic usage.
+#### Multithreaded jobs example
+
+The following template should be used by any user submitting multithreaded jobs to CPU nodes:
+
+```bash
+#!/bin/sh
+#SBATCH --partition=<partition_name>  # Specify 'general' or 'daily' or 'hourly'
+#SBATCH --time=<time>                 # Strictly recommended when using 'general' partition.
+#SBATCH --output=<output_file>        # Generate custom output file
+#SBATCH --error=<error_file>          # Generate custom error file
+#SBATCH --ntasks-per-core=2           # Mandatory for multithreaded jobs
+#SBATCH --hint=multithread            # Mandatory for multithreaded jobs
+##SBATCH --exclusive                  # Uncomment if you need exclusive node usage
+
+## Advanced options example
+##SBATCH --nodes=1                    # Uncomment and specify #nodes to use
+##SBATCH --ntasks=88                  # Uncomment and specify #tasks to use
+##SBATCH --ntasks-per-node=88         # Uncomment and specify #tasks per node
+##SBATCH --cpus-per-task=88           # Uncomment and specify the number of cores per task
+```
 
 ## GPU-based Jobs Settings
 
-GPU-base jobs are restricted to BIO users, however access for PSI users can be requested on demand. Users must belong to
-the ``merlin6-gpu`` Slurm ``Account`` in order to be able to run GPU-based nodes. BIO users belonging to any BIO group
-are automatically registered to the ``merlin6-gpu`` account. Other users should request access to the Merlin6 administrators.
+**Merlin6** GPUs are available to all PSI users; however, access is restricted to users belonging to the ``merlin-gpu`` account. By default, all users are added to this account (exceptions may apply).
+
+### Merlin6 GPU account
+
+When using GPUs, users must switch to the **merlin-gpu** Slurm account in order to be able to run on GPU-based nodes. This is done with the ``--account`` setting as follows:
+
+```bash
+#SBATCH --account=merlin-gpu  # The account 'merlin-gpu' must be used
+```
 
 ### Slurm CPU Mandatory Settings
 
@@ -137,7 +182,7 @@ The following options are mandatory settings that **must be included** in your b
 #SBATCH --gres=gpu  # Always set at least this option when using GPUs
 ```
 
-### Slurm GPU Recommended Settings
+#### Slurm GPU Recommended Settings
 
 GPUs are also a shared resource. Hence, multiple users can run jobs on a single node, but only one GPU per user process must be used. Users can define which GPUs resources they need with the ``--gres`` option.
 
@@ -147,9 +192,11 @@ This would be according to the following rules:
 In example:
 
 ```bash
-#SBATCH --gres=gpu:GTX1080:8  # Use 8 x GTX1080 GPUs
+#SBATCH --gres=gpu:GTX1080:4  # Use a node with 4 x GTX1080 GPUs
 ```
 
+***Important note:*** Due to a bug in the configuration, ``[:type]`` (e.g. ``GTX1080`` or ``GTX1080Ti``) is not working. Users should skip it and use only ``gpu[:count]``. This will be fixed in an upcoming downtime, as it requires a full restart of the batch system.
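+
+Until the fix is in place, a request like the one above would therefore be written without the GPU type, for example:
+
+```bash
+#SBATCH --gres=gpu:4  # Request 4 GPUs without specifying the type
+```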
+
 ### Slurm GPU Template
 
 The following template should be used by any user submitting jobs to GPU nodes:
 
@@ -157,13 +204,15 @@ The following template should be used by any user submitting jobs to GPU nodes:
 ```bash
 #!/bin/sh
 #SBATCH --partition=gpu_              # Specify 'general' or 'daily' or 'hourly'
+#SBATCH --gres="gpu:<type>:<count>"   # You should specify at least 'gpu'
 #SBATCH --time=<time>                 # Strictly recommended when using 'general' partition.
 #SBATCH --output=<output_file>        # Generate custom output file
+#SBATCH --error=<error_file>          # Generate custom error file
 
 squeue -u bliven_s
+   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
+134507729       gpu test_scr bliven_s PD  0:00     3 (AssocGrpNodeLimit)
+134507768   general test_scr bliven_s PD  0:00    19 (AssocGrpCpuLimit)
+134507729       gpu test_scr bliven_s PD  0:00     3 (Resources)
+134506301       gpu test_scr bliven_s PD  0:00     1 (Priority)
+134506288       gpu test_scr bliven_s  R  9:16     1 merlin-g-008
 ```
 
 Common Statuses:
-- *merlin-\** Running on the specified host
-- *(Priority)* Waiting in the queue
-- *(Resources)* At the head of the queue, waiting for machines to become available
-- *(AssocGrpCpuLimit), (AssocGrpNodeLimit)* Job would exceed per-user limitations on
+
+* **merlin-\***: Running on the specified host
+* **(Priority)**: Waiting in the queue
+* **(Resources)**: At the head of the queue, waiting for machines to become available
+* **(AssocGrpCpuLimit), (AssocGrpNodeLimit)**: Job would exceed per-user limitations on
   the number of simultaneous CPUs/Nodes. Use `scancel` to remove the job and resubmit
   with fewer resources, or else wait for your other jobs to finish.
-- *(PartitionNodeLimit)* Exceeds all resources available on this partition.
+* **(PartitionNodeLimit)**: Exceeds all resources available on this partition.
   Run `scancel` and resubmit to a different partition (`-p`) or with fewer resources.
 
diff --git a/pages/merlin6/03 Job Submission/slurm-basic-commands.md b/pages/merlin6/03 Job Submission/slurm-basic-commands.md
index 6e1b1f6..92e9bee 100644
--- a/pages/merlin6/03 Job Submission/slurm-basic-commands.md
+++ b/pages/merlin6/03 Job Submission/slurm-basic-commands.md
@@ -25,6 +25,9 @@ sbatch Script.sh # to submit a script (example below) to the slurm.
 srun             # to submit a command to Slurm. Same options as in 'sbatch' can be used.
 salloc           # to allocate computing nodes. Use for interactive runs.
 scancel job_id   # to cancel slurm job, job id is the numeric id, seen by the squeue.
+sview            # X interface for managing jobs and tracking job run information.
+seff             # to calculate the efficiency of a job.
+sjstat           # to list attributes of jobs under Slurm control.
 ```
 
 ---
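+
+As a quick illustration (the job ID below is taken from the ``squeue`` example further above), ``seff`` is typically invoked with a job ID once the job has finished:
+
+```bash
+seff 134506288   # report CPU and memory efficiency for job 134506288
+```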