Spencer Bliven ebff53c62c Migrating merlin6 user guide from jekyll-example1
From lsm-hpce/jekyll-example1 1eada07
2019-06-14 15:38:22 +02:00


title: Known Problems and Troubleshooting
last_updated: 13 June 2019
sidebar: merlin6_sidebar
permalink: /merlin6/troubleshooting.html

Known Problems

ParaView, ANSYS and OpenGL

For ParaView and ANSYS, try the X11 (Mesa) driver instead of OpenGL:

# ANSYS
module load ANSYS
fluent -driver x11

# ParaView
module load paraview
paraview --mesa
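If you switch between the two applications often, a small shell function can keep these options in one place. This is a minimal sketch. The function name run_without_opengl and the DRY_RUN variable are hypothetical. The module names and options are exactly those shown above.

```bash
# Hypothetical helper: start ANSYS Fluent or ParaView with the X11/Mesa
# options shown above instead of hardware OpenGL.
# Usage: run_without_opengl fluent|paraview
# With DRY_RUN=1 the command is printed instead of executed.
run_without_opengl() {
  local app="$1" mod flag
  case "$app" in
    fluent)   mod=ANSYS;    flag="-driver x11" ;;
    paraview) mod=paraview; flag="--mesa" ;;
    *) echo "unsupported application: $app" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "module load $mod && $app $flag"
    return 0
  fi
  module load "$mod" || return 1
  # $flag is deliberately unquoted so that "-driver x11" becomes two words
  "$app" $flag
}

# Example: DRY_RUN=1 run_without_opengl fluent
```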

Illegal instructions

Code compiled on one machine may fail on another with an "Illegal instruction" error. This happens when the binary uses CPU instructions that the other processor does not support. Use the "hostname" command to check which node you are on, and compare it with the names from the first item. We have seen a few applications that cannot run on merlin-c-01..16 because of this problem (note that these machines are more than 5 years old). Hint: you can choose a particular flavour of machine for your Slurm job with the "--cores-per-socket" option of sbatch:

sbatch --cores-per-socket=8 Script.sh # only nodes with at least 8 cores per socket; this excludes the oldest machines, merlin-c-01..16
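An "Illegal instruction" error usually means the binary uses an instruction-set extension, such as AVX2 or AVX-512, that the current node's processor does not provide. The following is a minimal sketch for checking this. It assumes Linux nodes that expose their CPU flags in /proc/cpuinfo. Save the flag list on the node where the code was compiled and on the node where it fails, then print the flags that only the first node has. The function names cpu_flags and missing_flags and the file names are hypothetical.

```bash
# Hypothetical helpers (not part of any Merlin tooling).

# Print the CPU feature flags of the current node, one per line (Linux only).
# All cores of a node are assumed to share one instruction set, so the first
# "flags" line (x86) or "Features" line (ARM) of /proc/cpuinfo is enough.
cpu_flags() {
  grep -m1 -E '^(flags|Features)[[:space:]]*:' /proc/cpuinfo \
    | cut -d: -f2 | tr ' ' '\n' | sed '/^$/d' | sort -u
}

# Print the flags listed in file $1 that are absent from file $2.
missing_flags() {
  sort -u "$1" | grep -vxF -f "$2" || true
}

# Example workflow:
#   on the build node:    cpu_flags > build-node-flags.txt
#   on the failing node:  cpu_flags > run-node-flags.txt
#   then:                 missing_flags build-node-flags.txt run-node-flags.txt
```

A non-empty result from missing_flags suggests this is the cause. The --cores-per-socket filter shown above is one way to keep the job off the older nodes.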

Troubleshooting

Before asking for help

If you have problems running jobs and want to report something or ask for help, please gather the following information in advance and attach it to your request:

  • Unix username and session (output of the "who am i" command)
  • Environment settings (output of the "env" command)
  • Slurm batch script location (path to the script and its input/output files)
  • Slurm job ID (returned by sbatch/salloc, and also shown by the squeue command)
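The items above can be collected into one text file and attached to the request. This is a minimal sketch. The function name collect_report and the default file name support-report.txt are hypothetical. Review the file before sending it, because env output can contain tokens or other private settings.

```bash
# Hypothetical helper: collect_report JOB_ID SCRIPT_PATH [OUTFILE]
# Writes the information requested above into OUTFILE (default:
# support-report.txt) and prints the file name.
collect_report() {
  local job_id="$1" script_path="$2" out="${3:-support-report.txt}"
  {
    echo "== who am i =="
    who am i || true          # may print nothing when not run from a terminal
    echo "== user name =="
    id -un || id -u
    echo "== environment (env) =="
    env | sort
    echo "== Slurm batch script =="
    echo "$script_path"
    echo "== Slurm job id =="
    echo "$job_id"
    # Add the job's queue entry when squeue is available on this host.
    if command -v squeue >/dev/null 2>&1; then
      echo "== squeue -j $job_id =="
      squeue -j "$job_id" || true
    fi
  } > "$out" 2>&1
  echo "$out"
}

# Example: collect_report 12345 "$PWD/Script.sh"
```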

Troubleshooting SSH

Run ssh with the "-vvv" option and copy and paste the output as text (no screenshots, please) into your Service-Now request. For example:

ssh -Y -vvv bond_j@merlin-l-01
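Copying long debug output from the terminal is error-prone. Below is a minimal sketch of a hypothetical helper, ssh_debug_log. It runs the same verbose ssh command, saves all of its output to a text file, and keeps ssh's exit status. It passes the remote command true so that ssh exits right after logging in. If your problem only appears in an interactive session, run the plain command above instead.

```bash
# Hypothetical helper: ssh_debug_log USER@HOST [LOGFILE]
# Saves the verbose ssh output (default file: ssh-debug.log), prints it,
# and returns ssh's exit status.
ssh_debug_log() {
  local target="$1" log="${2:-ssh-debug.log}" rc=0
  # ssh writes its -vvv debug messages to stderr, so stderr goes to the log too.
  ssh -Y -vvv "$target" true >"$log" 2>&1 || rc=$?
  cat "$log"
  return "$rc"
}

# Example: ssh_debug_log bond_j@merlin-l-01
```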

Troubleshooting SLURM

Slurm commands or batch scripts copied from another cluster may need some (often minor) changes to run successfully on Merlin5. Examine the error messages carefully, especially those concerning the options used in the Slurm commands.

Try to submit jobs using the examples given in the section "Using Batch System to Submit Jobs to Merlin5". If you can successfully run an example for a job type (OpenMP, MPI) similar to yours, edit that example to run your application.
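A minimal single-task job is a quick way to check that basic submission works before you debug a larger script. This sketch uses a hypothetical function, write_test_job, that writes such a script. The job name, time limit and output file pattern are placeholders. Any partition or account options required on Merlin are not shown.

```bash
# Hypothetical helper: write_test_job [FILE]
# Writes a minimal one-task batch script (default: test-job.sh) and prints
# its name, so it can be submitted with: sbatch "$(write_test_job)"
write_test_job() {
  local file="${1:-test-job.sh}"
  cat > "$file" <<'EOF'
#!/bin/bash
#SBATCH --job-name=test-job
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=test-job-%j.out

echo "Job ${SLURM_JOB_ID:-<not under Slurm>} running on $(hostname)"
EOF
  echo "$file"
}
```

Running the generated file directly with bash also works, because the #SBATCH lines are ordinary comments outside of Slurm.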

If the problem persists, describe it in your Service-Now request in enough detail for us to reproduce it, and include the output of the following commands:

date
hostname
pwd
module list

# All Slurm commands used, with their corresponding output
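The commands above can be captured in one file so their output can be attached unchanged. This is a minimal sketch with a hypothetical function, collect_diagnostics. You still have to add the Slurm commands you used, and their output, by hand. Both Environment Modules and Lmod typically print the "module list" listing on standard error, hence the redirection.

```bash
# Hypothetical helper: collect_diagnostics [OUTFILE]
# Records date, hostname, pwd and module list (default: diagnostics.txt),
# each preceded by a "$ command" line, and prints the file name.
collect_diagnostics() {
  local out="${1:-diagnostics.txt}"
  {
    echo "\$ date";     date
    echo "\$ hostname"; hostname 2>/dev/null || uname -n
    echo "\$ pwd";      pwd
    echo "\$ module list"
    # "module" is normally a shell function defined by the modules init scripts.
    if command -v module >/dev/null 2>&1; then
      module list 2>&1
    else
      echo "(module command not available in this shell)"
    fi
  } > "$out"
  echo "$out"
}
```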

Do not delete any output or error files generated by Slurm. If you would like to edit the failed job script in the meantime, make a copy of it first.
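One way to keep the failed version intact is to make a timestamped copy of the script before editing it. This is a minimal sketch. The function name keep_failed_copy and the .failed-YYYYmmdd-HHMMSS suffix are hypothetical conventions.

```bash
# Hypothetical helper: keep_failed_copy SCRIPT
# Copies SCRIPT to SCRIPT.failed-YYYYmmdd-HHMMSS and prints the copy's name.
# Only the script is copied; Slurm's output and error files are left alone.
keep_failed_copy() {
  local script="$1" copy
  if [ ! -f "$script" ]; then
    echo "no such file: $script" >&2
    return 1
  fi
  copy="$script.failed-$(date +%Y%m%d-%H%M%S)"
  # -p preserves the original modification time and permissions
  cp -p "$script" "$copy" && echo "$copy"
}

# Example: keep_failed_copy Script.sh
```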