Added Known Problems and Troubleshooting (split)
parent f847e9358d
commit f7283512da
@@ -15,8 +15,6 @@ entries:
url: /merlin6/code-of-conduct.html
- title: Hardware And Software Description
url: /merlin6/hardware-and-software.html
- title: Migrating From Merlin5
url: /merlin6/migrating.html
- title: Accessing Merlin
folderitems:
- title: Requesting Accounts
@@ -39,7 +37,11 @@ entries:
url: /merlin6/running-jobs.html
- title: Support
folderitems:
- title: Migrating From Merlin5
url: /merlin6/migrating.html
- title: Known Problems
url: /merlin6/known-problems.html
- title: Troubleshooting
url: /merlin6/troubleshooting.html
- title: Contact
url: /merlin6/contact.html
- title: Known Problems and Troubleshooting
url: /merlin6/troubleshooting.html
@@ -1,83 +0,0 @@
---
title: Known Problems and Troubleshooting
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/troubleshooting.html
---

## Known Problems

### ParaView, ANSYS and OpenGL

Try using the X11 (Mesa) driver for ParaView and ANSYS instead of OpenGL:

```bash
# ANSYS
module load ANSYS
fluent -driver x11

# ParaView
module load paraview
paraview --mesa
```

### Illegal instructions

Code compiled on one machine may fail to run on another, aborting with an error such as "Illegal instruction".
Use the `hostname` command to check which node you are on and compare it with the names from the first item. We have observed a few applications
that cannot run on merlin-c-01..16 because of this problem (note that these machines are more than 5 years old). Hint: you can
select a particular flavour of machine for your Slurm job with the `--cores-per-socket` option of sbatch:

```bash
sbatch --cores-per-socket=8 Script.sh  # filters the machine selection and excludes the oldest nodes, merlin-c-01..16
```

## Troubleshooting

### Before asking for help

If you have problems running jobs and want to report an issue or ask for help,
please gather the following information in advance and attach it to your request:

* Unix username and session (``who am i`` command output)
* Environment settings (``env`` command output)
* Slurm batch script location (path to script and input/output files)
* Slurm job ID (returned by the ``sbatch``/``salloc`` command; it can also be taken from the ``squeue`` command output)

### Troubleshooting SSH

Run the ssh command with the "-vvv" option and copy and paste the output (no screenshots, please)
into your request in Service-Now. Example:

```bash
ssh -Y -vvv bond_j@merlin-l-01
```

### Troubleshooting SLURM

Slurm commands or batch scripts copied from another cluster
may need some (often minor) changes to run successfully on Merlin5.
Examine the error message carefully, especially where it concerns the options
used in the Slurm commands.

Try to submit jobs using the examples given in the section "Using Batch System to Submit Jobs to Merlin5".
If you can successfully run an example for a type of job (OpenMP, MPI) similar to yours,
try editing that example to run your application.

If the problem remains, describe it in your Service-Now request in enough detail
to reproduce it. Include the output of the following commands:

```bash
date
hostname
pwd
module list

# All slurm commands used with the corresponding output
```

Do not delete any output and error files generated by Slurm.
Make a copy of the failed job script if you would like to edit it in the meantime.
36
pages/merlin6/known-problems.md
Normal file
@@ -0,0 +1,36 @@
---
title: Known Problems
#tags:
#keywords:
last_updated: 20 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/known-problems.html
---

## Known Problems

### ParaView, ANSYS and OpenGL

Try using the X11 (Mesa) driver for ParaView and ANSYS instead of OpenGL:

```bash
# ANSYS
module load ANSYS
fluent -driver x11

# ParaView
module load paraview
paraview --mesa
```

### Illegal instructions

Code compiled on one machine may fail to run on another, aborting with an error such as "Illegal instruction".
Use the `hostname` command to check which node you are on and compare it with the names from the first item. We have observed a few applications
that cannot run on merlin-c-01..16 because of this problem (note that these machines are more than 5 years old). Hint: you can
select a particular flavour of machine for your Slurm job with the `--cores-per-socket` option of sbatch:

```bash
sbatch --cores-per-socket=8 Script.sh  # filters the machine selection and excludes the oldest nodes, merlin-c-01..16
```
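
When an "Illegal instruction" error does occur, it helps to know which node and CPU the job ran on. The following is a minimal sketch, not an official Merlin script: the function name `report_cpu` is hypothetical, and reading `/proc/cpuinfo` assumes a Linux node.

```bash
#!/bin/bash
#SBATCH --cores-per-socket=8   # same filter as the sbatch example above

# Hypothetical helper: print the node name and CPU model at the start of a job,
# so a failing run can be matched to the hardware it ran on.
report_cpu() {
    echo "Node: $(hostname)"
    # /proc/cpuinfo is Linux-specific; fall back to "unknown" if it cannot be read
    local model
    model="$(grep -m1 'model name' /proc/cpuinfo 2>/dev/null | cut -d: -f2-)"
    echo "CPU model:${model:- unknown}"
}

report_cpu
```

The two lines end up in the Slurm output file of the job and can be attached to a support request.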
43
pages/merlin6/troubleshooting.md
Normal file
@@ -0,0 +1,43 @@
---
title: Troubleshooting
#tags:
#keywords:
last_updated: 20 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/troubleshooting.html
---

For troubleshooting, please contact us through the official channels. See [Contact](/merlin6/contact.html)
for more information.

## Troubleshooting running Slurm jobs

If you want to report a problem or request help with running jobs, please **always provide**
the following information:

1. Provide your batch script or, alternatively, the path to your batch script.
2. **Always** add the following commands to your batch script:

```bash
echo "User information:"; who am i
echo "Running hostname:"; hostname
echo "Current location:"; pwd
echo "User environment:"; env
echo "List of PModules:"; module list
```

3. Whenever possible, provide the Slurm JobID.

Providing this information is **extremely important** to ease debugging; a description
of the issue or the error message alone is insufficient
in most cases.

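The commands listed above can be wrapped in a small function placed at the top of a batch script. This is a sketch under assumptions rather than a required template: the function name `print_job_diagnostics` is hypothetical, and `SLURM_JOB_ID` is the environment variable that Slurm sets inside a running job.

```bash
#!/bin/bash

# Hypothetical helper: print the diagnostic information requested above.
print_job_diagnostics() {
    # 'who am i' may print nothing when no terminal is attached (e.g. in a batch job)
    echo "User information:"; who am i || true
    echo "Running hostname:"; hostname
    echo "Current location:"; pwd
    echo "User environment:"; env
    echo "List of PModules:"
    # 'module' is only defined where the module system is initialised
    if type module >/dev/null 2>&1; then
        module list 2>&1   # merge stderr into stdout so the listing stays in one stream
    else
        echo "(module command not available)"
    fi
    # SLURM_JOB_ID is set by Slurm inside a job; report its absence otherwise
    echo "Slurm JobID: ${SLURM_JOB_ID:-not running under Slurm}"
}

print_job_diagnostics
```

Calling the function before the actual workload means the information is recorded even if the job fails later.
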
### Troubleshooting SSH

Run the ssh command with the "-vvv" option and copy and paste the output (no screenshots, please)
into your request in Service-Now. Example:

```bash
ssh -Y -vvv $username@merlin-l-01.psi.ch
```
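
Because ssh writes its debug messages to standard error, the output can be saved to a file and pasted from there. The sketch below is one possible way to do this; the function name `capture_ssh_debug` and the default log file name `ssh_debug.log` are hypothetical.

```bash
#!/bin/bash

# Hypothetical helper: run ssh with full debug output and save that output to a log file.
capture_ssh_debug() {
    local target="$1"
    local logfile="${2:-ssh_debug.log}"
    local status=0
    # ssh prints its -vvv debug messages on stderr, so redirect stderr to the log file
    ssh -Y -vvv "$target" 2> "$logfile" || status=$?
    echo "ssh exited with status $status; debug output saved to $logfile"
    return "$status"
}

# Example call, following the pattern used above:
# capture_ssh_debug "$username@merlin-l-01.psi.ch"
```

Note that with this redirection ssh error messages no longer appear in the terminal; they are in the log file instead.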