---
title: GOTHIC
keywords: software, gothic, slurm, interactive, batch job
last_updated: 07 September 2022
summary: "This document describes how to run Gothic in the Merlin cluster"
sidebar: merlin6_sidebar
permalink: /merlin6/gothic.html
---
This document describes general information about how to run Gothic in the Merlin cluster.
## Gothic installation

Gothic is locally installed in Merlin in the following directory:

```
/data/project/general/software/gothic
```
Multiple versions are available. As of August 22, 2022, the latest installed version is Gothic 8.3 QA.
Future releases will be placed in the PSI Modules system; therefore, loading Gothic through PModules will be possible at some point. In the meantime, however, one has to use the existing installations present in `/data/project/general/software/gothic`.
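To check which versions are currently installed, one can simply list the installation directory. A minimal sketch (the `gothic8.3qa` sub-directory name is taken from the batch examples below; other versions may be named differently):

```bash
# List the locally installed Gothic versions
ls /data/project/general/software/gothic
# e.g. gothic8.3qa
```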
## Running Gothic

### General requirements
When running Gothic in interactive or batch mode, one has to consider the following requirements:
- **Always use one node only:** Gothic runs as a single instance, so it cannot run on multiple nodes. Adding the option `--nodes=1-1` or `-N 1-1` is strongly recommended: this prevents Slurm from allocating multiple nodes if the allocation definition is ambiguous.
- **Use one task only:** Gothic spawns one main process, which will then spawn multiple threads depending on the number of available cores. Therefore, one has to specify 1 task (`--ntasks=1` or `-n 1`).
- **Use multiple CPUs:** since Gothic spawns multiple threads, multiple CPUs can be used. Adding `--cpus-per-task=<num_cpus>` or `-c <num_cpus>` is in general recommended. Notice that `<num_cpus>` must never exceed the maximum number of CPUs in a compute node (usually 88).
- **Use multithreading:** Gothic is OpenMP-based software; therefore, running in hyper-threading mode is strongly recommended. Use the option `--hint=multithread` to enforce hyper-threading.
- **[Optional] Memory setup:** the default memory per CPU (4000MB) is usually enough for running Gothic. If you require more memory, you can always set the `--mem=<mem_in_MB>` option. This is in general not necessary.
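Taken together, these requirements translate into a Slurm option set along the following lines (a minimal sketch; complete interactive and batch examples are given in the sections below):

```bash
# One node, one task, multiple CPUs, hyper-threading enabled
salloc --nodes=1-1 --ntasks=1 --cpus-per-task=22 --hint=multithread
```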
### Interactive

It is not allowed to run CPU-intensive interactive jobs on the login nodes. Only applications capable of limiting the number of cores in use are allowed to run for a longer time. In addition, running on the login nodes is not efficient, since resources are shared with other processes and users.

It is possible to submit interactive jobs to the cluster by allocating a full compute node, or even by allocating a few cores only. This grants dedicated CPUs and resources, and in general it will not affect other users.
For interactive jobs, it is strongly recommended to use the `hourly` partition, which usually has a good availability of nodes. For longer runs, one should use the `daily` (or `general`) partition. However, getting interactive access to nodes on these partitions is sometimes more difficult when the cluster is fairly full.
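To get a quick view of how busy a partition is before requesting an allocation, one can query Slurm directly. A small sketch using standard Slurm commands (the cluster and partition names are the ones used throughout this page):

```bash
# Show node availability in the hourly partition
sinfo --clusters=merlin6 --partition=hourly
```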
To submit an interactive job, consider the following requirements:
- **X11 forwarding must be enabled:** Gothic spawns an interactive window, which requires X11 forwarding when used remotely; therefore, the Slurm option `--x11` is necessary.
- **Ensure that the scratch area is accessible:** for running Gothic, one has to define a scratch area with the `GTHTMP` environment variable. There are two options:
  1. **Use local scratch:** each compute node has its own `/scratch` area. This area is independent on each node, and therefore not visible from other nodes. Using the top directory `/scratch` for interactive jobs is the simplest way, and it can be defined before or after the allocation is created, as follows:

     ```bash
     # Example 1: Define GTHTMP before the allocation
     export GTHTMP=/scratch
     salloc ...

     # Example 2: Define GTHTMP after the allocation
     salloc ...
     export GTHTMP=/scratch
     ```

     Notice that if you want to create a custom sub-directory (i.e. `/scratch/$USER`), the sub-directory has to be created on every new allocation. For example:

     ```bash
     # Example 1:
     export GTHTMP=/scratch/$USER
     salloc ...
     mkdir -p $GTHTMP

     # Example 2:
     salloc ...
     export GTHTMP=/scratch/$USER
     mkdir -p $GTHTMP
     ```

     Creating sub-directories makes the process more complex; therefore, using just `/scratch` is simpler and recommended.
  2. **Use shared scratch:** the shared scratch area provides a directory visible from all compute nodes and login nodes. Therefore, one can use `/shared-scratch` to achieve the same as in 1., but the sub-directory needs to be created just once.

  Please consider that `/scratch` usually provides better performance and, in addition, offloads the main storage. Therefore, using local scratch is strongly recommended; use the shared scratch only when strictly necessary.
- **Use the `hourly` partition:** using the `hourly` partition is recommended for running interactive jobs (latency is in general lower). However, `daily` and `general` are also available if you expect longer runs, but in those cases you should expect longer waiting times.
These requirements are in addition to the requirements previously described in the General requirements section.
#### Interactive allocations: examples

- Requesting a full node:

  ```bash
  salloc --partition=hourly -N 1 -n 1 -c 88 --hint=multithread --x11 --exclusive --mem=0
  ```

- Requesting 22 CPUs from a node, with the default memory per CPU (4000MB/CPU):

  ```bash
  num_cpus=22
  salloc --partition=hourly -N 1 -n 1 -c $num_cpus --hint=multithread --x11
  ```
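Once the allocation is granted, Gothic can be started from within it. A minimal sketch, reusing the `gothic_s.sh` wrapper and the options from the batch examples below (the input file name is a placeholder):

```bash
# Inside the interactive allocation:
export GTHTMP=/scratch
/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh MY_INPUT.SIN -m -np $SLURM_CPUS_PER_TASK
```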
### Batch job

The Slurm cluster is mainly used by non-interactive batch jobs: users submit a job, which goes into a queue and waits until Slurm can assign resources to it. In general, the longer the job, the longer the waiting time, unless there are enough free resources to start running it immediately.

Running Gothic in a Slurm batch script is pretty simple. One mainly has to consider the requirements described in the General requirements section, and:
- **Use local scratch** for running batch jobs. In general, defining `GTHTMP` in a batch script is simpler than in an allocation. If you plan to run multiple jobs on the same node, you can even create a second sub-directory level based on the Slurm job ID:

  ```bash
  mkdir -p /scratch/$USER/$SLURM_JOB_ID
  export GTHTMP=/scratch/$USER/$SLURM_JOB_ID

  ... # Run Gothic here

  rm -rf /scratch/$USER/$SLURM_JOB_ID
  ```

  Temporary data generated by the job in `GTHTMP` must be removed at the end of the job, as shown above.
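Note that the final `rm -rf` is skipped if the job fails or is cancelled before reaching it. As an optional hardening step (an assumption, not part of the original examples), a bash `trap` can be used so that the scratch directory is cleaned up whenever the script exits:

```bash
mkdir -p /scratch/$USER/$SLURM_JOB_ID
export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
# Remove the scratch sub-directory on any script exit, normal or not
trap 'rm -rf /scratch/$USER/$SLURM_JOB_ID' EXIT

... # Run Gothic here
```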
#### Batch script: examples

- Requesting a full node:

  ```bash
  #!/bin/bash -l
  #SBATCH --job-name=Gothic
  #SBATCH --time=3-00:00:00
  #SBATCH --partition=general
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=88
  #SBATCH --hint=multithread
  #SBATCH --exclusive
  #SBATCH --mem=0
  #SBATCH --clusters=merlin6

  INPUT_FILE='MY_INPUT.SIN'

  mkdir -p /scratch/$USER/$SLURM_JOB_ID
  export GTHTMP=/scratch/$USER/$SLURM_JOB_ID

  /data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
  gth_exit_code=$?

  # Clean up data in /scratch
  rm -rf /scratch/$USER/$SLURM_JOB_ID

  # Return exit code from GOTHIC
  exit $gth_exit_code
  ```
- Requesting 22 CPUs from a node, with the default memory per CPU (4000MB/CPU):

  ```bash
  #!/bin/bash -l
  #SBATCH --job-name=Gothic
  #SBATCH --time=3-00:00:00
  #SBATCH --partition=general
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=22
  #SBATCH --hint=multithread
  #SBATCH --clusters=merlin6

  INPUT_FILE='MY_INPUT.SIN'

  mkdir -p /scratch/$USER/$SLURM_JOB_ID
  export GTHTMP=/scratch/$USER/$SLURM_JOB_ID

  /data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
  gth_exit_code=$?

  # Clean up data in /scratch
  rm -rf /scratch/$USER/$SLURM_JOB_ID

  # Return exit code from GOTHIC
  exit $gth_exit_code
  ```
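Assuming the script above is saved as, say, `gothic_job.sh` (a hypothetical file name), it can be submitted and monitored with the standard Slurm commands:

```bash
sbatch gothic_job.sh                  # submit the job; prints the job ID
squeue --clusters=merlin6 -u $USER    # check the job state in the queue
```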