merge and move support pages

These pages are now under the `/support` path, meaning that they are unified for all clusters.

docs/support/faq.md
@@ -0,0 +1,65 @@
---
title: "FAQ"
---

# Frequently Asked Questions

## How do I register for Merlin?

See [Requesting Merlin Access](../merlin7/01-Quick-Start-Guide/requesting-accounts.md).

## How do I get information about downtimes and updates?

See [Get updated through the Merlin User list!](index.md#merlin-user-mailing-list).

## How can I request access to a Merlin project directory?

Merlin projects are placed in the `/data/project` directory. Access to each
project is controlled by Unix group membership. If you require access to an
existing project, please request group membership as described in
[Requesting Unix Group Membership](../merlin7/01-Quick-Start-Guide/requesting-projects.md#requesting-unix-group-membership).

Your project leader or project colleagues will know which Unix group you should
belong to. Otherwise, you can check which Unix group is allowed to access that
project directory by running `ls -ltrhd` on it; the owning group is shown in the fourth column.
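
As a rough sketch, the shell function below automates that check: it prints the group that owns a directory (GNU `stat -c '%G'`) and whether your current session already belongs to it (`id -nG`). The function name `check_project_group` is hypothetical and not part of any Merlin tooling.

```bash
#!/bin/bash
# Hypothetical helper: report the Unix group owning a directory and
# whether the current session is already a member of that group.
check_project_group() {
    local dir="$1" group
    # %G = name of the group owning the file (GNU coreutils stat)
    group=$(stat -c '%G' -- "$dir") || return 1
    # id -nG lists the groups of the current session; tr puts one per line
    if id -nG | tr ' ' '\n' | grep -qxF -- "$group"; then
        echo "member of $group"
    else
        echo "not a member of $group"
    fi
}
```

For example, run `check_project_group /data/project/<name>` with the real project path. Since `id -nG` reflects the groups of the current login session, a newly granted membership may only show up after you log in again.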

## Can I install software myself?

Most software can be installed in user directories without any special
permissions. We recommend using `/data/user/$USER/bin` for software since home
directories are fairly small. For software that will be used by multiple
groups/users you can also [request that the admins](index.md) install it as a
[module](../merlin7/05-Software-Support/pmodules.md).

How to install depends a bit on the software itself. There are three common
installation procedures:

* *binary distributions*. These are easy; just put them in a directory (e.g.
`/data/user/$USER/bin`) and add that directory to your `PATH`.
* *source compilation* using make/CMake/autoconf, etc. Usually the
build scripts accept a `--prefix=/data/user/$USER` option that sets where
to install the software. They then place files under `<prefix>/bin`, `<prefix>/lib`,
etc. The exact syntax should be documented in the installation instructions.

!!! note inline end
    The following is based on `merlin6`, but should still be valid for `merlin7`.

* *conda environment*. This is now becoming standard for Python-based
software, including many AI tools. First follow the [initial setup
instructions](../merlin6/software-support/python.md#anaconda) to configure conda to
use `/data/user` instead of your home directory. Then you can create
environments like:

    ```bash
    module load anaconda/2019.07
    # if the project provides an environment.yml
    conda env create -f environment.yml

    # or to create one manually
    conda create --name myenv python=3.9 ...

    conda activate myenv
    ```
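
For the *binary distributions* case above, here is a minimal sketch of adding the install directory to your `PATH`, e.g. from `~/.bashrc`. The helper name `path_prepend` is hypothetical; it only prepends the directory when it is not already listed, so sourcing the file repeatedly does not keep growing `PATH`.

```bash
# Hypothetical helper: prepend a directory to PATH unless it is already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                   # already present, nothing to do
        *) PATH="$1${PATH:+:$PATH}" ;; # prepend; no stray ':' if PATH was empty
    esac
    export PATH
}

# Location recommended above for user-installed software
path_prepend "/data/user/$USER/bin"
```

The same works for *source compilation*: after installing with `--prefix=/data/user/$USER`, the executables end up in `/data/user/$USER/bin`.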

## Something doesn't work

Check the list of [known problems](known-problems.md) to see if a solution is known.
If not, please [contact the admins](index.md).

docs/support/index.md
@@ -0,0 +1,56 @@
# Getting Support

!!! tip
    We strongly recommend that you subscribe to the [user mailing
    list](#merlin-user-mailing-list); that way you will receive the latest
    announcements concerning the status of the clusters, information regarding
    maintenance actions, and other tasks that might affect your work.

There are several channels you can use to get support:

* the **preferred** choice is to submit a ticket with [PSI Service Now](https://psi.service-now.com/psisp); alternatively,
* you can also use our [user mailing list](#merlin-user-mailing-list); or, lastly,
* you can email the admins directly at <merlin-admins@lists.psi.ch>.

!!! info
    Basic contact information is also displayed on every shell login to the
    system using the *Message of the Day* mechanism.

## PSI Service Now

[PSI Service Now](https://psi.service-now.com/psisp) is the official tool for
opening tickets and requests.

* PSI HelpDesk will redirect the incident to the corresponding department, or
* you can always assign it directly by checking the box `I know which service
is affected` and providing the service name `Local HPC Resources (e.g.
Merlin) [CF]` (just type in `Local` and you should get the valid
completions).

## Merlin User mailing list

This mailing list is the official channel used by Merlin administrators to inform users about downtimes,
interventions or problems. Users can be subscribed in two ways:

* *Preferred way*: Self-registration through [Sympa](https://psilists.ethz.ch/sympa/info/merlin-users)
* If you need to subscribe many people (e.g. your whole group), send a
request to the admin list <merlin-admins@lists.psi.ch>
with a list of their email addresses.

## Email the Admins

This is the official way to contact Merlin Administrators for discussions which
do not fit well into the incident category. Do not hesitate to contact us in
such cases.

**E-Mail**: <merlin-admins@lists.psi.ch>

---

## Who are we?

The PSI Merlin clusters are managed by the **[High Performance Computing and
Emerging Technologies Group](https://www.psi.ch/de/lsm/hpce-group)**, which is
part of the [Science IT Infrastructure and Services department
(AWI)](https://www.psi.ch/en/awi) in PSI's [Center for Scientific Computing,
Theory and Data (SCD)](https://www.psi.ch/en/csd).

docs/support/known-problems.md
@@ -0,0 +1,172 @@
# Known Problems

## Common errors

### Illegal instruction error

Code compiled on one machine may fail to run on another, aborting with an **"(Illegal instruction)"** error.
This usually happens because the software was compiled for an instruction set newer than the one supported by the processor of the node where it runs,
so it mostly depends on the processor generation.

For example, `merlin-l-001` and `merlin-l-002` have a newer processor generation than the old GPU nodes or the Merlin5 cluster.
Hence, unless the software is compiled to be compatible with the instruction sets of older processors, it will not run on the old nodes.
Sometimes this is set correctly by default at compile time, but sometimes it is not.

For GCC, please refer to [GCC x86 Options](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html) for the relevant compiler options. If in doubt, contact us.
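
To see which instruction-set extensions a node supports, you can inspect the `flags` line of `/proc/cpuinfo` (x86 Linux). The sketch below assumes that layout, and the helper name `has_cpu_flag` is hypothetical. Running it on the node where you compile and on the node where the job fails shows which extensions differ. A conservative way to stay compatible with older nodes is to compile for a generic target such as `-march=x86-64` instead of `-march=native`.

```bash
# Hypothetical helper: succeed if the current CPU advertises the given
# instruction-set flag (e.g. avx2, avx512f) in /proc/cpuinfo (x86 Linux).
has_cpu_flag() {
    grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$1"
}

# Report a few common extensions for the node this runs on
for flag in sse4_2 avx avx2 avx512f; do
    if has_cpu_flag "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: not supported"
    fi
done
```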

## Slurm

### sbatch using one core despite setting -c/--cpus-per-task

From **Slurm v22.05.6**, the behavior of `srun` has changed. Merlin was updated to this version on *Tuesday 13.12.2022*.

`srun` no longer reads `SLURM_CPUS_PER_TASK`, which is typically set when defining `-c/--cpus-per-task` in the `sbatch` command.
This means you have to explicitly specify `-c/--cpus-per-task` on your `srun` calls as well, or set the new `SRUN_CPUS_PER_TASK` environment variable to accomplish the same thing.
Otherwise, `srun` will use only one core per task (resulting in 2 CPUs per task when multithreading is enabled).

An example of setting up `srun` with `-c/--cpus-per-task`:

```bash
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method1
#!/bin/bash
#SBATCH -n 1
#SBATCH --cpus-per-task=8

echo 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'
srun python -c "import os; print(os.sched_getaffinity(0))"

echo 'One has to implicitly specify $SLURM_CPUS_PER_TASK'
echo 'In this example, by setting -c/--cpus-per-task in srun'
srun --cpus-per-task=$SLURM_CPUS_PER_TASK python -c "import os; print(os.sched_getaffinity(0))"

(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method1
Submitted batch job 8000813

(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000813.out
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
{1, 45}
One has to implicitly specify $SLURM_CPUS_PER_TASK
In this example, by setting -c/--cpus-per-task in srun
{1, 2, 3, 4, 45, 46, 47, 48}
```

An example of accomplishing the same thing with the `SRUN_CPUS_PER_TASK` environment variable:

```bash
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method2
#!/bin/bash
#SBATCH -n 1
#SBATCH --cpus-per-task=8

echo 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'
srun python -c "import os; print(os.sched_getaffinity(0))"

echo 'One has to implicitly specify $SLURM_CPUS_PER_TASK'
echo 'In this example, by setting an environment variable SRUN_CPUS_PER_TASK'
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun python -c "import os; print(os.sched_getaffinity(0))"

(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method2
Submitted batch job 8000815

(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000815.out
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
{1, 45}
One has to implicitly specify $SLURM_CPUS_PER_TASK
In this example, by setting an environment variable SRUN_CPUS_PER_TASK
{1, 2, 3, 4, 45, 46, 47, 48}
```

## General topics

### Default SHELL

In general, **`/bin/bash` is the recommended default user SHELL** when working in Merlin.

Some users might notice that BASH is not the default SHELL when logging in to Merlin systems, or they might need to run a different SHELL.
This is probably because no SHELL was specified when the PSI account was requested, or because a different one was explicitly requested.
Users can check which default SHELL is specified in their PSI account with the following command:

```bash
getent passwd "$USER" | awk -F: '{print $NF}'
```

If this SHELL does not correspond to the one you need to use, you should request a central change.
This is because Merlin accounts are central PSI accounts. Hence, **the change must be requested** via [PSI Service Now](index.md#psi-service-now).

Alternatively, if you work on other PSI Linux systems but need a different SHELL on Merlin, a temporary change can be performed during login startup.
You can update one of the following files:

* `~/.login`
* `~/.profile`
* Any `rc` or `profile` file in your home directory (e.g. `.cshrc`, `.bashrc`, `.bash_profile`, etc.)

with the following lines (these use `sh`/`bash` syntax; csh-style files such as `~/.login` or `.cshrc` need the equivalent `csh` syntax):

```bash
# Replace MY_SHELL with the shell you need
MY_SHELL=/bin/bash
# Only switch if not already running MY_SHELL, to avoid an endless exec loop
if [ "$SHELL" != "$MY_SHELL" ]; then
    SHELL="$MY_SHELL" exec "$MY_SHELL" -l
fi
```

Note that the available *shells* are listed in the following file:

```bash
cat /etc/shells
```
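
The two checks above can be combined, for example before opening a ticket. This is only a sketch: the function name `check_login_shell` and its return codes are hypothetical conventions chosen for illustration. It returns 0 if the account already uses the wanted shell, 1 if it uses a different one, and 2 if the wanted shell is not listed in `/etc/shells`.

```bash
# Hypothetical helper: compare the login shell recorded for an account with
# the shell you want. Return codes (illustrative convention):
#   0 = already the login shell, 1 = different login shell,
#   2 = the wanted shell is not listed in /etc/shells
check_login_shell() {
    local wanted="$1" user="${2:-$(id -un)}" current
    current=$(getent passwd "$user" | awk -F: '{print $NF}')
    echo "login shell of $user: ${current:-unknown}"
    if ! grep -qxF -- "$wanted" /etc/shells 2>/dev/null; then
        echo "$wanted is not listed in /etc/shells"
        return 2
    fi
    [ "$current" = "$wanted" ]
}
```

For example, `check_login_shell /bin/bash` reports your current login shell and returns 1 if a central change is needed.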

### 3D acceleration: OpenGL vs Mesa

Some applications can run with OpenGL support. This is only possible when the node contains a GPU card.

In general, X11 with the Mesa driver is the recommended method, as it works in all cases (no GPU needed). For example, for ParaView:

```bash
module load paraview
paraview-mesa paraview # 'paraview --mesa' for old releases
```

However, if you need OpenGL support, this is still possible by running `vglrun`. For example, to run ParaView:

```bash
module load paraview
vglrun paraview
```

Officially, the supported method for running `vglrun` is through the [NoMachine remote desktop](../merlin7/02-How-To-Use-Merlin/nomachine.md).
Running `vglrun` is also possible over SSH with X11 forwarding. However, it is very slow and only recommended when running
in Slurm (from [NoMachine](../merlin7/02-How-To-Use-Merlin/nomachine.md)). Please avoid running `vglrun` over SSH from a desktop or laptop.

## Software

### ANSYS

Sometimes, running ANSYS/Fluent requires X11 support. In that case, run Fluent as follows:

```bash
module load ANSYS
fluent -driver x11
```

### Paraview

ParaView can be run with either Mesa or OpenGL support. Please refer to [OpenGL vs Mesa](#3d-acceleration-opengl-vs-mesa) for
further information about how to run it.

### Module command not found

In some circumstances the `module` command may not be initialized properly. For instance, you may see the following error upon login:

```
bash: module: command not found
```

The most common cause for this is a custom `.bashrc` file which fails to source the global `/etc/bashrc`, which is responsible for setting up PModules in some OS versions. To fix this, add the following to `$HOME/.bashrc`:

```bash
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
```

It can also be fixed temporarily in an existing terminal by running `. /etc/bashrc` manually.

docs/support/troubleshooting.md
@@ -0,0 +1,42 @@
# Troubleshooting

For troubleshooting, please contact us through the official channels. See
[Getting Support](index.md) for more information.

## Known Problems

Before contacting us for support, please check the [Known
Problems](known-problems.md) page to see if there is an existing workaround for
your specific problem.

## Troubleshooting Slurm Jobs

If you want to report a problem or request help when running jobs, please
**always provide** the following information:

1. Your batch script or, alternatively, the path to your batch script.
2. The output of the following commands, which you should **always** add to your batch script
   (`id` is used because `who am i` prints nothing when there is no terminal, as in batch jobs):

    ```bash
    echo "User information:"; id
    echo "Running hostname:"; hostname
    echo "Current location:"; pwd
    echo "User environment:"; env
    echo "List of PModules:"; module list
    ```

3. Whenever possible, the Slurm JobID.

Providing this information is **extremely important** to ease
debugging; in most cases, a description of the issue or just the error
message alone is insufficient.
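
The diagnostics above, plus the JobID, can be bundled into a single shell function called near the top of a batch script. This is a sketch: the name `print_job_diagnostics` is hypothetical. It relies on Slurm setting `SLURM_JOB_ID` inside a job, uses `id` for the user information, and uses `uname -n` to print the node name.

```bash
# Hypothetical helper: print the diagnostics requested above in one go.
print_job_diagnostics() {
    # SLURM_JOB_ID is set by Slurm inside a job; fall back to a marker otherwise
    echo "Slurm JobID: ${SLURM_JOB_ID:-not running under Slurm}"
    echo "User information:"; id
    echo "Running hostname:"; uname -n   # node name, as shown by hostname
    echo "Current location:"; pwd
    echo "List of PModules:"
    # 'module list' may write to stderr; merge it so it lands in the job output
    module list 2>&1 || echo "module command not available"
    echo "User environment:"; env | sort
}
```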

## Troubleshooting SSH

Run the `ssh` command with the `-vvv` option and copy and paste its output as
text (**please don't send us screenshots**) into your request in
Service-Now. Example:

```bash
ssh -Y -vvv $username@<hostname>
```