gitea-pages/pages/merlin6/99-support/known-problems.md
2023-09-14 11:25:47 +02:00

---
title: Known Problems
#tags:
keywords: known problems, troubleshooting, illegal instructions, paraview, ansys, shell, opengl, mesa, vglrun, module: command not found, error
last_updated: 07 September 2022
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/known-problems.html
---

Common errors

Illegal instruction error

Code compiled on one machine may fail to run on another, aborting with an error such as "(Illegal instruction)". This usually happens because the software was compiled with an instruction set newer than the one available on the node where it runs; which instructions are available depends mostly on the processor generation.

For example, merlin-l-001 and merlin-l-002 have a newer generation of processors than the old GPU nodes or the Merlin5 cluster. Hence, unless the software is compiled for compatibility with the instruction set of the older processors, it will not run on the old nodes. Sometimes the compiler defaults already ensure this, but not always.

For GCC, please refer to GCC x86 Options for the relevant compiler options. In case of doubt, contact us.
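As a rough way to compare nodes, the sketch below checks which instruction set extensions a node's CPU advertises. The helper name `cpu_has_flag` and the list of flags are illustrative assumptions, not part of the Merlin tooling; the function only reads the standard Linux `/proc/cpuinfo` file.

```shell
# cpu_has_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears in the first "flags" line of CPUINFO_FILE
# (default: /proc/cpuinfo, Linux only). Hypothetical helper for illustration.
cpu_has_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | tr ' ' '\n' | grep -qxF "$1"
}

# Report which of a few common x86 extensions this node advertises.
for flag in sse4_2 avx avx2 avx512f; do
    if cpu_has_flag "$flag"; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done

# To build for the generic x86-64 baseline rather than the build host's CPU
# (myprog.c is a hypothetical source file):
#   gcc -O2 -march=x86-64 -mtune=generic -o myprog myprog.c
```

Running this on both the build node and the target node shows whether the target lacks an extension that the build node has. The commented `gcc` line uses the standard `-march` and `-mtune` options; which baseline is appropriate depends on the oldest node the code must run on.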

Slurm

sbatch using one core despite setting -c/--cpus-per-task

Since Slurm v22.05.6, the behavior of srun has changed. Merlin was updated to this version on Tuesday 13.12.2022.

srun no longer reads SLURM_CPUS_PER_TASK, which is typically set when -c/--cpus-per-task is defined in the sbatch command. This means you have to explicitly specify -c/--cpus-per-task on your srun calls as well, or set the new SRUN_CPUS_PER_TASK environment variable to accomplish the same thing. Unless this is explicitly specified, srun will use only one core per task (resulting in 2 CPUs per task when multithreading is enabled).

An example of setting up srun with -c/--cpus-per-task:

(base)[caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method1
#!/bin/bash
#SBATCH -n 1
#SBATCH --cpus-per-task=8

echo 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'
srun python -c "import os; print(os.sched_getaffinity(0))"

echo 'One has to implicitly specify $SLURM_CPUS_PER_TASK'
echo 'In this example, by setting -c/--cpus-per-task in srun'
srun --cpus-per-task=$SLURM_CPUS_PER_TASK python -c "import os; print(os.sched_getaffinity(0))"

(base)[caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method1
Submitted batch job 8000813

(base)[caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000813.out 
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
{1, 45}
One has to implicitly specify $SLURM_CPUS_PER_TASK
In this example, by setting -c/--cpus-per-task in srun
{1, 2, 3, 4, 45, 46, 47, 48}

An example to accomplish the same thing with the SRUN_CPUS_PER_TASK environment variable:

(base)[caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method2
#!/bin/bash
#SBATCH -n 1
#SBATCH --cpus-per-task=8

echo 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'
srun python -c "import os; print(os.sched_getaffinity(0))"

echo 'One has to implicitly specify $SLURM_CPUS_PER_TASK'
echo 'In this example, by setting an environment variable SRUN_CPUS_PER_TASK'
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun python -c "import os; print(os.sched_getaffinity(0))"


(base)[caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method2
Submitted batch job 8000815

(base)[caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000815.out 
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
{1, 45}
One has to implicitly specify $SLURM_CPUS_PER_TASK
In this example, by setting an environment variable SRUN_CPUS_PER_TASK
{1, 2, 3, 4, 45, 46, 47, 48}
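A slightly more defensive variant of the second method is sketched below: it exports SRUN_CPUS_PER_TASK only when sbatch has actually set SLURM_CPUS_PER_TASK (that is, when -c/--cpus-per-task was given). The function name `propagate_cpus_per_task` is a hypothetical helper introduced here for illustration.

```shell
# Copy the value chosen with -c/--cpus-per-task into SRUN_CPUS_PER_TASK,
# but only if sbatch actually set it; otherwise leave srun's default alone.
# (propagate_cpus_per_task is a hypothetical helper name.)
propagate_cpus_per_task() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        export SRUN_CPUS_PER_TASK="$SLURM_CPUS_PER_TASK"
    fi
}

propagate_cpus_per_task
echo "SRUN_CPUS_PER_TASK=${SRUN_CPUS_PER_TASK:-unset}"
```

Placed near the top of a batch script, before the first srun call, this lets the same script work both with and without --cpus-per-task.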

General topics

Default SHELL

In general, /bin/bash is the recommended default SHELL for users working in Merlin.

Some users might notice that BASH is not their default SHELL when logging in to Merlin systems, or they might need to run a different SHELL. This is probably because no SHELL was specified when the PSI account was requested, or because the requestor explicitly asked for a different one. Users can check which default SHELL is set for their PSI account with the following command:

getent passwd $USER | awk -F: '{print $NF}'

If this SHELL is not the one you need, you should request a central change. Merlin accounts are central PSI accounts, so the change must be requested via PSI Service Now.

Alternatively, if you work on other PSI Linux systems but need a different SHELL type on Merlin only, you can switch shells temporarily during login startup. To do so, update one of the following files:

  • ~/.login
  • ~/.profile
  • Any rc or profile file in your home directory (e.g. .cshrc, .bashrc, .bash_profile, etc.)

with the following lines:

# Replace the value of MY_SHELL with the shell you need
MY_SHELL=/bin/bash
exec $MY_SHELL -l

Note that the available shells are listed in the following file:

cat /etc/shells
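Before adding an `exec` line to a startup file, it can help to confirm that the target shell is actually listed and executable, since a failing `exec` at login can lock you out of the session. The following is a minimal sketch; `shell_is_listed` is a hypothetical helper name, and MY_SHELL is the same placeholder used above.

```shell
# shell_is_listed SHELL_PATH [SHELLS_FILE]
# Succeeds if SHELL_PATH appears as a full line in SHELLS_FILE
# (default: /etc/shells). Hypothetical helper for illustration.
shell_is_listed() {
    grep -qxF "$1" "${2:-/etc/shells}" 2>/dev/null
}

MY_SHELL=/bin/bash
if shell_is_listed "$MY_SHELL" && [ -x "$MY_SHELL" ]; then
    echo "$MY_SHELL is listed and executable"
else
    echo "$MY_SHELL is not listed in /etc/shells or is not executable" >&2
fi
```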

3D acceleration: OpenGL vs Mesa

Some applications can run with OpenGL support. This is only possible when the node contains a GPU card.

In general, X11 with the Mesa driver is the recommended method, as it works in all cases (no GPU required). For example, for ParaView:

module load paraview
paraview-mesa paraview   # 'paraview --mesa' for old releases

However, if one needs to run with OpenGL support, this is still possible with vglrun. For example, to run ParaView:

module load paraview
vglrun paraview

Officially, the supported method for running vglrun is through the NoMachine remote desktop. Running vglrun over SSH with X11 forwarding is also possible. However, it is very slow and is only recommended when running in Slurm (from NoMachine). Please avoid running vglrun over SSH from a desktop or laptop.
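To see which of the two paths a session is actually using, one can inspect the OpenGL renderer string. The sketch below assumes the `glxinfo` utility is installed and that an X display is available, neither of which is confirmed here for Merlin; `classify_renderer` is a hypothetical helper that treats Mesa's CPU rasterizers (llvmpipe, softpipe) as software rendering.

```shell
# classify_renderer "RENDERER STRING"
# Prints "software" for Mesa's CPU rasterizers (llvmpipe, softpipe),
# "unknown" for an empty string, and "hardware" otherwise.
# Hypothetical helper for illustration.
classify_renderer() {
    case "$1" in
        "") echo unknown ;;
        *llvmpipe*|*softpipe*) echo software ;;
        *) echo hardware ;;
    esac
}

# Assumption: glxinfo is installed and an X display is available.
if command -v glxinfo >/dev/null 2>&1; then
    renderer=$(glxinfo 2>/dev/null | sed -n 's/^OpenGL renderer string: //p')
    echo "renderer: ${renderer:-none} ($(classify_renderer "$renderer"))"
fi
```

Comparing the result of a plain run with one where `glxinfo` is started through `vglrun` should show whether the GPU is being picked up.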

Software

ANSYS

Sometimes, running ANSYS/Fluent requires X11 support. In that case, run fluent as follows:

module load ANSYS
fluent -driver x11

ParaView

ParaView can be run with either Mesa support or OpenGL support. Please refer to OpenGL vs Mesa for further information about how to run it.

Module command not found

In some circumstances the module command may not be initialized properly. For instance, you may see the following error upon login:

bash: module: command not found

The most common cause for this is a custom .bashrc file which fails to source the global /etc/bashrc responsible for setting up PModules in some OS versions. To fix this, add the following to $HOME/.bashrc:

if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

It can also be fixed temporarily in an existing terminal by running . /etc/bashrc manually.
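For scripts that may run in environments where the module command is missing, the same fix can be wrapped in a small guard. This is only a sketch: `ensure_module` is a hypothetical helper name, and the optional argument exists mainly so the function can be tried against a file other than /etc/bashrc.

```shell
# ensure_module [RC_FILE]
# If the "module" command is missing, source RC_FILE (default: /etc/bashrc),
# then succeed only if "module" is available afterwards.
# Hypothetical helper for illustration.
ensure_module() {
    if ! type module >/dev/null 2>&1; then
        if [ -f "${1:-/etc/bashrc}" ]; then
            . "${1:-/etc/bashrc}"
        fi
    fi
    type module >/dev/null 2>&1
}
```

A script could then call `ensure_module || echo "module command still unavailable" >&2` before its first `module load`.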