
| title | last_updated | keywords | summary | sidebar | permalink |
| --- | --- | --- | --- | --- | --- |
| GOTHIC | 37 February 2022 | software, ansys, slurm | This document describes how to run Gothic in the Merlin cluster | merlin6_sidebar | /merlin6/gothic.html |

This document provides generic information on how to run Gothic in the Merlin cluster.

Gothic installation

Gothic is locally installed in Merlin in the following directory:

/data/project/general/software/gothic

Multiple versions are available. As of August 22, 2022, the latest installed version is Gothic 8.3 QA.

Future releases will be added to the PSI Modules system, so at some point it will be possible to load Gothic through PModules. In the meantime, one has to use the existing installations in /data/project/general/software/gothic.
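To see which versions are currently installed, one can simply list the installation directory. Once Gothic is published in PModules, the usual module workflow would apply; the module name below is only an assumption for illustration:

# List the locally installed Gothic versions
ls /data/project/general/software/gothic

# Once available in PModules (module name is an assumption, not yet confirmed):
module search gothic
module load gothic/8.3qa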

Running Gothic

General requirements

When running Gothic in interactive or batch mode, one has to consider the following requirements:

  • Always use one node only: Gothic runs as a single instance and therefore cannot run across multiple nodes. Adding the option --nodes=1-1 or -N 1-1 is strongly recommended: this prevents Slurm from allocating multiple nodes if the allocation definition is ambiguous.
  • Use one task only: Gothic spawns one main process, which then spawns multiple threads depending on the number of available cores. Therefore, one has to specify a single task (--ntasks=1 or -n 1).
  • Use multiple CPUs: since Gothic spawns multiple threads, multiple CPUs can be used. Adding --cpus-per-task=<num_cpus> or -c <num_cpus> is in general recommended. Notice that <num_cpus> must never exceed the maximum number of CPUs in a compute node (usually 88).
  • Use multithreading: Gothic is OpenMP-based software, therefore running in hyper-threading mode is strongly recommended. Use the option --hint=multithread to enforce hyper-threading.
  • [Optional] Memory setup: the default memory per CPU (4000MB) is usually enough for running Gothic. If you require more memory, you can always set the --mem=<mem_in_MB> option; this is in general not necessary. A combined example of these options is shown below.
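Putting the requirements above together, a minimal submission line could look like the following sketch (partition defaults apply, and the CPU count and script name are placeholders; complete interactive and batch examples are given later in this document):

# Minimal sketch combining the general requirements (values are placeholders)
sbatch --nodes=1-1 --ntasks=1 --cpus-per-task=44 --hint=multithread my_gothic_job.sh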

Interactive

It is not allowed to run CPU-intensive interactive jobs on the login nodes. Only applications capable of limiting the number of cores they use are allowed to run there for a longer time. Also, running on the login nodes is not efficient, since resources are shared with other processes and users.

It is possible to submit interactive jobs to the cluster by allocating a full compute node, or even by allocating a few cores only. This grants dedicated CPUs and resources and, in general, does not affect other users.

For interactive jobs, it is strongly recommended to use the hourly partition, which usually has good node availability.

For longer runs, one should use the daily (or general) partition. However, getting interactive access to nodes in these partitions is sometimes more difficult when the cluster is fairly full.

To submit an interactive job, consider the following requirements:

  • X11 forwarding must be enabled: Gothic spawns an interactive window, which requires X11 forwarding when working remotely; therefore, the Slurm option --x11 is necessary.

  • Ensure that the scratch area is accessible: For running Gothic, one has to define a scratch area with the GTHTMP environment variable. There are two options:

    1. Use local scratch: Each compute node has its own /scratch area. This area is independent on each node and therefore not visible from other nodes. Using the top directory /scratch for interactive jobs is the simplest way, and it can be defined before or after creating the allocation, as follows:
      # Example 1: Define GTHTMP before the allocation
      export GTHTMP=/scratch
      salloc ...
      
      # Example 2: Define GTHTMP after the allocation
      salloc ...
      export GTHTMP=/scratch
      

    Notice that if you want to create a custom sub-directory (e.g. /scratch/$USER), the sub-directory has to be created on every new allocation. For example:
      # Example 1: Define GTHTMP before the allocation
      export GTHTMP=/scratch/$USER
      salloc ...
      mkdir -p $GTHTMP
      
      # Example 2: Define GTHTMP after the allocation
      salloc ...
      export GTHTMP=/scratch/$USER
      mkdir -p $GTHTMP
      

    Creating sub-directories makes the process more complex; therefore, using just /scratch is simpler and recommended.

    2. Shared scratch: Using shared scratch provides a directory visible from all compute nodes and login nodes. Therefore, one can use /shared-scratch to achieve the same as in 1., but the sub-directory needs to be created just once (a short sketch is shown after this list).

    Please consider that /scratch usually provides better performance and, in addition, offloads the central storage. Therefore, using the local scratch is strongly recommended; use the shared scratch only when strictly necessary.

  • Use the hourly partition: the hourly partition is recommended for running interactive jobs (waiting times are in general shorter). However, daily and general are also available if you expect longer runs, but in these cases you should expect longer waiting times.

These requirements are in addition to the requirements previously described in the General requirements section.
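For completeness, the following is a minimal sketch of the shared scratch option described above (the sub-directory name under /shared-scratch is only an example):

# Create the sub-directory once; it is visible from all login and compute nodes
mkdir -p /shared-scratch/$USER
export GTHTMP=/shared-scratch/$USER
salloc ...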

Interactive allocations: examples

  • Requesting a full node:
salloc --partition=hourly  -N 1 -n 1 -c 88 --hint=multithread --x11 --exclusive --mem=0 
  • Requesting 22 CPUs from a node, with default memory per CPU (4000MB/CPU):
num_cpus=22
salloc --partition=hourly -N 1 -n 1 -c $num_cpus --hint=multithread --x11
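Once the allocation is granted, one can define GTHTMP and start Gothic. The wrapper script below is the same one used in the batch examples further down; depending on how the interactive allocation is set up you may need to launch it through srun, and the exact way of starting the Gothic GUI may differ between versions:

export GTHTMP=/scratch
/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh MY_INPUT.SIN -m -np $SLURM_CPUS_PER_TASK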

Batch job

The Slurm cluster is mainly used for non-interactive batch jobs: users submit a job, which goes into a queue and waits until Slurm can assign resources to it. In general, the longer the job, the longer the waiting time, unless there are enough free resources to start running it immediately.

Running Gothic in a Slurm batch script is pretty simple. One mainly has to consider the requirements described in the General requirements section, and:

  • Use local scratch for running batch jobs. In general, defining GTHTMP in a batch script is simpler than in an interactive allocation. If you plan to run multiple jobs on the same node, you can even create a second sub-directory level based on the Slurm job ID:
    mkdir -p /scratch/$USER/$SLURM_JOB_ID
    export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
    ... # Run Gothic here
    rm -rf /scratch/$USER/$SLURM_JOB_ID
    
    Temporary data generated by the job in GTHTMP must be removed at the end of the job, as shown above.
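    If you want the scratch sub-directory to be removed even when Gothic fails, a bash trap can be used instead of a plain rm at the end of the script. This is only a sketch (it cannot help if the job is killed hard, e.g. with SIGKILL):
      # Remove the scratch sub-directory whenever the script exits, even on errors
      export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
      mkdir -p $GTHTMP
      trap 'rm -rf /scratch/$USER/$SLURM_JOB_ID' EXIT
      ... # Run Gothic here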

Batch script: examples

  • Requesting a full node:
    #!/bin/bash -l
    #SBATCH --job-name=Gothic
    #SBATCH --time=3-00:00:00
    #SBATCH --partition=general
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=88
    #SBATCH --hint=multithread
    #SBATCH --exclusive
    #SBATCH --mem=0
    #SBATCH --clusters=merlin6
    
    INPUT_FILE='MY_INPUT.SIN'
    
    mkdir -p /scratch/$USER/$SLURM_JOB_ID
    export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
    
    /data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
    gth_exit_code=$?
    
    rm -rf /scratch/$USER/$SLURM_JOB_ID
    exit $gth_exit_code
    
  • Requesting 22 CPUs from a node, with default memory per CPU (4000MB/CPU):
    #!/bin/bash -l
    #SBATCH --job-name=Gothic
    #SBATCH --time=3-00:00:00
    #SBATCH --partition=general
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=22
    #SBATCH --hint=multithread
    #SBATCH --clusters=merlin6
    
    INPUT_FILE='MY_INPUT.SIN'
    
    mkdir -p /scratch/$USER/$SLURM_JOB_ID
    export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
    
    /data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
    gth_exit_code=$?
    
    rm -rf /scratch/$USER/$SLURM_JOB_ID
    exit $gth_exit_code
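Either script can be saved to a file (for example gothic.batch; the name is arbitrary) and submitted as usual:

sbatch gothic.batch
squeue -u $USER   # check the state of your jobs (add --clusters=merlin6 if needed)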