Added Known Problems and Troubleshooting (splitted)

This commit is contained in:
caubet_m 2019-06-20 20:18:13 +02:00
parent f847e9358d
commit f7283512da
4 changed files with 85 additions and 87 deletions

View File

@ -15,8 +15,6 @@ entries:
url: /merlin6/code-of-conduct.html url: /merlin6/code-of-conduct.html
- title: Hardware And Software Description - title: Hardware And Software Description
url: /merlin6/hardware-and-software.html url: /merlin6/hardware-and-software.html
- title: Migrating From Merlin5
url: /merlin6/migrating.html
- title: Accessing Merlin - title: Accessing Merlin
folderitems: folderitems:
- title: Requesting Accounts - title: Requesting Accounts
@ -39,7 +37,11 @@ entries:
url: /merlin6/running-jobs.html url: /merlin6/running-jobs.html
- title: Support - title: Support
folderitems: folderitems:
- title: Migrating From Merlin5
url: /merlin6/migrating.html
- title: Known Problems
url: /merlin6/known-problems.html
- title: Troubleshooting
url: /merlin6/troubleshooting.html
- title: Contact - title: Contact
url: /merlin6/contact.html url: /merlin6/contact.html
- title: Known Problems and Troubleshooting
url: /merlin6/troubleshooting.html

View File

@ -1,83 +0,0 @@
---
title: Known Problems and Troubleshooting
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/troubleshooting.html
---
## Known Problems
### Paraview, ANSYS and OpenGL
Try to use X11(mesa) driver for Paraview and ANSYS instead of OpenGL:
```bash
# ANSYS
module load ANSYS
fluent -driver x11
# ParaView
module load paraview
paraview --mesa
```
###+ Illegal instructions
It may happened that your code, compiled on one machine will not be executed on another throwing exception like "(Illegal instruction)".
Check (with "hostname" command) on which of the node you are and compare it with the names from first item. We observe few applications
that can't be run on merlin-c-01..16 because of this problem (notice that these machines are more then 5 years old). Hint: you may
choose the particular flavour of the machines for your slurm job, check the "--cores-per-node" option for sbatch:
```bash
sbatch --cores-per-socket=8 Script.sh # will filter the selection of the machine and exclude the oldest one, merlin-c-01..16
```
## Troubleshooting
### Before asking for help
Please, if you have problems running jobs and you want to report something or just ask for help,
please gather and attach in advance the following information:
* Unix username and session (``who am i`` command output)
* Environment settings (``env`` command output)
* Slurm batch script location (path to script and input/output files)
* Slurm job_id (``id`` is returned on ``sbatch``/``salloc`` command, but also can be taken from ``squeue`` commmand)
### Troubleshooting SSH
Use the ssh command with the "-vvv" option and copy and paste (no screenshot please)
the output to your request in Service-Now. Example
```bash
ssh -Y -vvv bond_j@merlin-l-01
```
### Troubleshooting SLURM
If one copies Slurm commands or batch scripts from another cluster,
they may need some changes (often minor) to run successfully on Merlin5.
Examine carefully the error message, especially concerning the options
used in the slurm commands.
Try to submit jobs using the examples given in the section "Using Batch System to Submit Jobs to Merlin5".
If you can run successfully an example for a type of job (!OpenMP, MPI) similar to your one,
try to edit the example to run your application.
If the problem remains, then, in your request in Service-Now, describe the problem in details that
are needed to reproduce it. Include the output of the following commands:
```bash
date
hostname
pwd
module list
# All slurm commands used with the corresponding output
```
Do not delete any output and error files generated by Slurm.
Make a copy of the failed job script if you like to edit it meanwhile.

View File

@ -0,0 +1,36 @@
---
title: Known Problems
#tags:
#keywords:
last_updated: 20 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/known-problems.html
---
## Known Problems
### Paraview, ANSYS and OpenGL
Try to use X11(mesa) driver for Paraview and ANSYS instead of OpenGL:
```bash
# ANSYS
module load ANSYS
fluent -driver x11
# ParaView
module load paraview
paraview --mesa
```
###+ Illegal instructions
It may happened that your code, compiled on one machine will not be executed on another throwing exception like "(Illegal instruction)".
Check (with "hostname" command) on which of the node you are and compare it with the names from first item. We observe few applications
that can't be run on merlin-c-01..16 because of this problem (notice that these machines are more then 5 years old). Hint: you may
choose the particular flavour of the machines for your slurm job, check the "--cores-per-node" option for sbatch:
```bash
sbatch --cores-per-socket=8 Script.sh # will filter the selection of the machine and exclude the oldest one, merlin-c-01..16
```

View File

@ -0,0 +1,43 @@
---
title: Troubleshooting
#tags:
#keywords:
last_updated: 20 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/troubleshooting.html
---
For troubleshooting, please contact us through the official channels. See [Contact](/merlin6/contact.html)
for more information.
## Troubleshooting running Slurm jobs
If you want to report a problem or request for help when running jobs, please **always provide**
the following information:
1. Provide your batch script or, alternatively, the path to your batch script.
2. Add **always** the following commands to your batch script
```bash
echo "User information:"; who am i
echo "Running hostname:"; hostname
echo "Current location:"; pwd
echo "User environment:"; env
echo "List of PModules:"; module list
```
3. Whenever possible, provide the Slurm JobID.
Providing this information is **extremely important** in order to ease debugging, otherwise
only with the description of the issue or just the error message is completely insufficient
in most cases.
### Troubleshooting SSH
Use the ssh command with the "-vvv" option and copy and paste (no screenshots please)
the output to your request in Service-Now. Example
```bash
ssh -Y -vvv $username@merlin-l-01.psi.ch
```