merge and move support pages
this are now under the /support path, meaning that this is unified for all clusters.
This commit is contained in:
172
docs/support/known-problems.md
Normal file
172
docs/support/known-problems.md
Normal file
@@ -0,0 +1,172 @@
|
||||
# Known Problems
|
||||
|
||||
## Common errors
|
||||
|
||||
### Illegal instruction error
|
||||
|
||||
It may happened that your code, compiled on one machine will not be executed on another throwing exception like **"(Illegal instruction)"**.
|
||||
This is usually because the software was compiled with a set of instructions newer than the ones available in the node where the software runs,
|
||||
and it mostly depends on the processor generation.
|
||||
|
||||
In example, `merlin-l-001` and `merlin-l-002` contain a newer generation of processors than the old GPUs nodes, or than the Merlin5 cluster.
|
||||
Hence, unless one compiles the software with compatibility with set of instructions from older processors, it will not run on old nodes.
|
||||
Sometimes, this is properly set by default at the compilation time, but sometimes is not.
|
||||
|
||||
For GCC, please refer to [GCC x86 Options](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html) for compiling options. In case of doubts, contact us.
|
||||
|
||||
## Slurm
|
||||
|
||||
### sbatch using one core despite setting -c/--cpus-per-task
|
||||
|
||||
From **Slurm v22.05.6**, the behavior of `srun` has changed. Merlin has been updated to this version since *Tuesday 13.12.2022*.
|
||||
|
||||
`srun` will no longer read in `SLURM_CPUS_PER_TASK`, which is typically set when defining `-c/--cpus-per-task` in the `sbatch` command.
|
||||
This means you will implicitly have to specify `-c\--cpus-per-task` also on your `srun` calls, or set the new `SRUN_CPUS_PER_TASK` environment variable to accomplish the same thing.
|
||||
Therefore, unless this is implicitly specified, `srun` will use only one Core per task (resulting in 2 CPUs per task when multithreading is enabled)
|
||||
|
||||
An example for setting up `srun` with `-c\--cpus-per-task`:
|
||||
|
||||
```bash
|
||||
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method1
|
||||
#!/bin/bash
|
||||
#SBATCH -n 1
|
||||
#SBATCH --cpus-per-task=8
|
||||
|
||||
echo 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'
|
||||
srun python -c "import os; print(os.sched_getaffinity(0))"
|
||||
|
||||
echo 'One has to implicitly specify $SLURM_CPUS_PER_TASK'
|
||||
echo 'In this example, by setting -c/--cpus-per-task in srun'
|
||||
srun --cpus-per-task=$SLURM_CPUS_PER_TASK python -c "import os; print(os.sched_getaffinity(0))"
|
||||
|
||||
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method1
|
||||
Submitted batch job 8000813
|
||||
|
||||
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000813.out
|
||||
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
|
||||
{1, 45}
|
||||
One has to implicitly specify $SLURM_CPUS_PER_TASK
|
||||
In this example, by setting -c/--cpus-per-task in srun
|
||||
{1, 2, 3, 4, 45, 46, 47, 48}
|
||||
```
|
||||
|
||||
An example to accomplish the same thing with the `SRUN_CPUS_PER_TASK` environment variable:
|
||||
|
||||
```bash
|
||||
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method2
|
||||
#!/bin/bash
|
||||
#SBATCH -n 1
|
||||
#SBATCH --cpus-per-task=8
|
||||
|
||||
echo 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'
|
||||
srun python -c "import os; print(os.sched_getaffinity(0))"
|
||||
|
||||
echo 'One has to implicitly specify $SLURM_CPUS_PER_TASK'
|
||||
echo 'In this example, by setting an environment variable SRUN_CPUS_PER_TASK'
|
||||
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
|
||||
srun python -c "import os; print(os.sched_getaffinity(0))"
|
||||
|
||||
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method2
|
||||
Submitted batch job 8000815
|
||||
|
||||
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000815.out
|
||||
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
|
||||
{1, 45}
|
||||
One has to implicitly specify $SLURM_CPUS_PER_TASK
|
||||
In this example, by setting an environment variable SRUN_CPUS_PER_TASK
|
||||
{1, 2, 3, 4, 45, 46, 47, 48}
|
||||
```
|
||||
|
||||
## General topics
|
||||
|
||||
### Default SHELL
|
||||
|
||||
In general, **`/bin/bash` is the recommended default user's SHELL** when working in Merlin.
|
||||
|
||||
Some users might notice that BASH is not the default SHELL when logging in to Merlin systems, or they might need to run a different SHELL.
|
||||
This is probably because when the PSI account was requested, no SHELL description was specified or a different one was requested explicitly by the requestor.
|
||||
Users can check which is the default SHELL specified in the PSI account with the following command:
|
||||
|
||||
```bash
|
||||
getent passwd $USER | awk -F: '{print $NF}'
|
||||
```
|
||||
|
||||
If SHELL does not correspond to the one you need to use, you should request a central change for it.
|
||||
This is because Merlin accounts are central PSI accounts. Hence, **change must be requested** via [PSI Service Now](index.md#psi-service-now).
|
||||
|
||||
Alternatively, if you work on other PSI Linux systems but for Merlin you need a different SHELL type, a temporary change can be performed during login startup.
|
||||
You can update one of the following files:
|
||||
|
||||
* `~/.login`
|
||||
* `~/.profile`
|
||||
* Any `rc` or `profile` file in your home directory (i.e. `.cshrc`, `.bashrc`, `.bash_profile`, etc.)
|
||||
|
||||
with the following lines:
|
||||
|
||||
```bash
|
||||
# Replace MY_SHELL with the bash type you need
|
||||
MY_SHELL=/bin/bash
|
||||
exec $MY_SHELL -l
|
||||
```
|
||||
|
||||
Notice that available *shells* can be found in the following file:
|
||||
|
||||
```bash
|
||||
cat /etc/shells
|
||||
```
|
||||
|
||||
### 3D acceleration: OpenGL vs Mesa
|
||||
|
||||
Some applications can run with OpenGL support. This is only possible when the node contains a GPU card.
|
||||
|
||||
In general, X11 with Mesa Driver is the recommended method as it will work in all cases (no need of GPUs). In example, for ParaView:
|
||||
|
||||
```bash
|
||||
module load paraview
|
||||
paraview-mesa paraview # 'paraview --mesa' for old releases
|
||||
```
|
||||
|
||||
However, if one needs to run with OpenGL support, this is still possible by running `vglrun`. In example, for running Paraview:
|
||||
|
||||
```bash
|
||||
module load paraview
|
||||
vglrun paraview
|
||||
```
|
||||
|
||||
Officially, the supported method for running `vglrun` is by using the [NoMachine remote desktop](../merlin7/02-How-To-Use-Merlin/nomachine.md).
|
||||
Running `vglrun` it's also possible using SSH with X11 Forwarding. However, it's very slow and it's only recommended when running
|
||||
in Slurm (from [NoMachine](../merlin7/02-How-To-Use-Merlin/nomachine.md)). Please, avoid running `vglrun` over SSH from a desktop or laptop.
|
||||
|
||||
## Software
|
||||
|
||||
### ANSYS
|
||||
|
||||
Sometimes, running ANSYS/Fluent requires X11 support. For that, one should run fluent as follows.
|
||||
|
||||
```bash
|
||||
module load ANSYS
|
||||
fluent -driver x11
|
||||
```
|
||||
|
||||
### Paraview
|
||||
|
||||
For running Paraview, one can run it with Mesa support or OpenGL support. Please refer to [OpenGL vs Mesa](#3d-acceleration-opengl-vs-mesa) for
|
||||
further information about how to run it.
|
||||
|
||||
### Module command not found
|
||||
|
||||
In some circumstances the module command may not be initialized properly. For instance, you may see the following error upon logon:
|
||||
|
||||
```
|
||||
bash: module: command not found
|
||||
```
|
||||
|
||||
The most common cause for this is a custom `.bashrc` file which fails to source the global `/etc/bashrc` responsible for setting up PModules in some OS versions. To fix this, add the following to `$HOME/.bashrc`:
|
||||
|
||||
```bash
|
||||
if [ -f /etc/bashrc ]; then
|
||||
. /etc/bashrc
|
||||
fi
|
||||
```
|
||||
|
||||
It can also be fixed temporarily in an existing terminal by running `. /etc/bashrc` manually.
|
||||
Reference in New Issue
Block a user