---
title: Known Problems and Troubleshooting
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/troubleshooting.html
---

## Known Problems

### ParaView, ANSYS and OpenGL

Try using the X11 (Mesa) driver for ParaView and ANSYS instead of OpenGL:

```bash
# ANSYS
module load ANSYS
fluent -driver x11

# ParaView
module load paraview
paraview --mesa
```

### Illegal instructions

Code compiled on one machine may fail to run on another, aborting with an error such as "Illegal instruction".
Use the ``hostname`` command to check which node you are on and compare it with the affected nodes named below. A few applications
are known not to run on merlin-c-01..16 because of this problem (note that these machines are more than 5 years old). Hint: you can
select a particular flavour of machine for your Slurm job; see the ``--cores-per-socket`` option for sbatch:

```bash
sbatch --cores-per-socket=8 Script.sh  # filters the machine selection and excludes the oldest ones, merlin-c-01..16
```

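When a binary fails on a particular node, it can help to record the node name together with the instruction-set extensions its CPU advertises. The following is a minimal sketch, assuming a Linux node that exposes `/proc/cpuinfo`. The function names and the list of extensions checked (`sse4_2`, `avx`, `avx2`, `avx512f`) are illustrative choices, not taken from the Merlin documentation.

```bash
#!/bin/bash
# Print the current node name and whether selected CPU extensions are advertised.
# Assumption: Linux with /proc/cpuinfo; the extension list is only an example.

cpu_has_flag() {
    # Succeeds if the flag appears as a whole word in the first "flags" line.
    grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | grep -qw -- "$1"
}

report_cpu_flags() {
    echo "node: $(hostname 2>/dev/null || uname -n)"
    for flag in sse4_2 avx avx2 avx512f; do
        if cpu_has_flag "$flag"; then
            echo "$flag: yes"
        else
            echo "$flag: no"
        fi
    done
}

report_cpu_flags
```

Running this on the node where the code was compiled and on the node where it fails shows which extensions differ. Building without host-specific optimisation flags (for example, avoiding `-march=native` with GCC) is one common way to avoid such a mismatch.
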
## Troubleshooting

### Before asking for help

If you have problems running jobs and want to report an issue or ask for help,
please gather and attach the following information in advance:

* Unix username and session (``who am i`` command output)
* Environment settings (``env`` command output)
* Slurm batch script location (path to script and input/output files)
* Slurm job ID (returned by the ``sbatch``/``salloc`` command; it can also be taken from the ``squeue`` command output)

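The items above can be collected into a single text file before opening a request. This is only a sketch: the function name, the output file name and the section layout are assumptions, and ``squeue`` is called only if it is available on the current host.

```bash
#!/bin/bash
# Collect the information requested above into one text file.
# Assumptions: file name and section headers are illustrative choices.

collect_support_info() {
    local outfile="${1:-support_info.txt}"
    {
        echo "== session (who am i) =="
        who am i
        echo "== username =="
        id -un
        echo "== environment (env) =="
        env
        echo "== Slurm jobs (squeue) =="
        if command -v squeue >/dev/null 2>&1; then
            # List the jobs of the current user, including their job IDs
            squeue -u "$(id -un)"
        else
            echo "squeue not found on this host"
        fi
    } > "$outfile" 2>&1
    echo "Wrote $outfile"
}

collect_support_info "support_info_$(date +%Y%m%d_%H%M%S).txt"
```

The ``env`` output may contain tokens or passwords, so it is worth reviewing the file before attaching it. The batch script path and its input/output file locations still need to be added by hand.
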
### Troubleshooting SSH

Run the ssh command with the ``-vvv`` option and copy and paste the output (no screenshots, please)
into your Service-Now request. Example:

```bash
ssh -Y -vvv bond_j@merlin-l-01
```

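Because ssh writes its verbose debug messages to standard error, they can be saved to a text file and pasted from there. The helper below is a hedged sketch: the function name and the log file name are assumptions, and running ``exit`` as the remote command is just one way to make the session close right after login.

```bash
#!/bin/bash
# Save the verbose ssh output to a text file for pasting into a request.
# Assumption: function name and default log file name are illustrative.

capture_ssh_debug() {
    local target="$1"
    local logfile="${2:-ssh_debug.log}"
    # The -v debug messages go to stderr, so redirect stderr to the log file.
    # "exit" is run as the remote command so the session ends after login.
    ssh -Y -vvv "$target" exit 2> "$logfile"
    echo "ssh exit status: $?" >> "$logfile"
}

# Example call (replace with your own user name and login node):
# capture_ssh_debug bond_j@merlin-l-01 ssh_debug.log
```
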
### Troubleshooting SLURM

If you copy Slurm commands or batch scripts from another cluster,
they may need some (often minor) changes to run successfully on Merlin5.
Examine the error message carefully, especially regarding the options
used in the Slurm commands.

Try to submit jobs using the examples given in the section "Using Batch System to Submit Jobs to Merlin5".
If you can successfully run an example for a job type (OpenMP, MPI) similar to yours,
try editing that example to run your application.

If the problem persists, describe it in your Service-Now request in enough detail
to reproduce it. Include the output of the following commands:

```bash
date
hostname
pwd
module list

# All slurm commands used with the corresponding output
```

Do not delete any output or error files generated by Slurm.
Make a copy of the failed job script if you want to edit it in the meantime.
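
One way to keep an untouched copy of the failed job script is a timestamped backup. This is a small sketch: the function name and the ``.failed-<timestamp>`` suffix are assumptions chosen for illustration.

```bash
#!/bin/bash
# Keep an untouched copy of a failed job script before editing it.
# Assumption: the ".failed-<timestamp>" suffix is an illustrative naming choice.

backup_job_script() {
    local script="$1"
    local copy="${script}.failed-$(date +%Y%m%d_%H%M%S)"
    # -p preserves the modification time and permissions of the original
    cp -p -- "$script" "$copy" && echo "$copy"
}

# Example call (Script.sh stands in for your own batch script):
# backup_job_script Script.sh
```
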