first stab at mkdocs migration
refactor CSCS and Meg content
add merlin6 quick start
update merlin6 nomachine docs
give the userdoc its own color scheme (we use the Material default one)
refactor merlin6 general Slurm docs
add merlin6 JB docs
add software support m6 docs
add all files to nav
vibed changes #1
add missing pages
further vibing #2
vibe #3
further fixes
docs/merlin6/quick-start-guide/accessing-interactive-nodes.md
@@ -0,0 +1,49 @@
# Accessing Interactive Nodes

## SSH Access

For interactive command shell access, use an SSH client. We recommend enabling SSH's X11 forwarding so that you can use graphical
applications (e.g. a text editor; for more performant graphical access, refer to the sections below). X applications are supported
on the login nodes, and X11 forwarding can be used by users who have properly configured X11 support on their desktops. However:

* Merlin6 administrators **do not offer support** for user desktop configuration (Windows, MacOS, Linux).
    * Hence, Merlin6 administrators **do not offer official support** for X11 client setup.
    * Nevertheless, a generic guide for X11 client setup (*Linux*, *Windows* and *MacOS*) is provided below.
* PSI desktop configuration issues must be addressed through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
    * The ticket will be redirected to the corresponding desktop support group (Windows, Linux).

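As a minimal sketch of such a session (hostname taken from the table below; `-Y` enables trusted X11 forwarding in OpenSSH):

```bash
# log in to a Merlin6 login node with X11 forwarding enabled
ssh -Y $USER@merlin-l-001.psi.ch

# once logged in, verify that X11 forwarding works
# (xclock is a simple X application shipped with most X11 installations)
xclock
```
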
### Accessing from a Linux client

Refer to [{How To Use Merlin -> Accessing from Linux Clients}](../how-to-use-merlin/connect-from-linux.md) for **Linux** SSH client and X11 configuration.

### Accessing from a Windows client

Refer to [{How To Use Merlin -> Accessing from Windows Clients}](../how-to-use-merlin/connect-from-windows.md) for **Windows** SSH client and X11 configuration.

### Accessing from a MacOS client

Refer to [{How To Use Merlin -> Accessing from MacOS Clients}](../how-to-use-merlin/connect-from-macos.md) for **MacOS** SSH client and X11 configuration.

## NoMachine Remote Desktop Access

X applications are supported on the login nodes and can run efficiently through a **NoMachine** client. This is the officially supported way to run more demanding X applications on Merlin6.

* On PSI Windows workstations, the client can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing it, please request support through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
* For other workstations, the client software can be downloaded from the [NoMachine website](https://www.nomachine.com/product&p=NoMachine%20Enterprise%20Client).

### Configuring NoMachine

Refer to [{How To Use Merlin -> Remote Desktop Access}](../how-to-use-merlin/nomachine.md) for further instructions on how to configure the NoMachine client and how to access it from inside and outside PSI.

## Login nodes hardware description

The Merlin6 login nodes are the official machines for accessing the resources of Merlin6.
From these machines, users can submit jobs to the Slurm batch system as well as visualize or compile their software.

The Merlin6 login nodes are the following:

| Hostname            | SSH | NoMachine | Cores  | Threads/core | CPU                   | Memory | Scratch    | Scratch mountpoint |
| ------------------- | --- | --------- | ------ |:------------:| :-------------------- | ------ | ---------- | :----------------- |
| merlin-l-001.psi.ch | yes | yes       | 2 x 22 | 2            | Intel Xeon Gold 6152  | 384GB  | 1.8TB NVMe | ``/scratch``       |
| merlin-l-002.psi.ch | yes | yes       | 2 x 22 | 2            | Intel Xeon Gold 6142  | 384GB  | 1.8TB NVMe | ``/scratch``       |
| merlin-l-01.psi.ch  | yes | -         | 2 x 16 | 2            | Intel Xeon E5-2697Av4 | 512GB  | 100GB SAS  | ``/scratch``       |

docs/merlin6/quick-start-guide/accessing-slurm.md
@@ -0,0 +1,49 @@
# Accessing Slurm Cluster

## The Merlin Slurm clusters

Merlin is a multi-cluster setup, where multiple Slurm clusters coexist under the same umbrella.
It contains the following clusters:

* The **Merlin6 Slurm CPU cluster**, called [**`merlin6`**](#merlin6-cpu-cluster-access).
* The **Merlin6 Slurm GPU cluster**, called [**`gmerlin6`**](#merlin6-gpu-cluster-access).
* The *old Merlin5 Slurm CPU cluster*, called [**`merlin5`**](#merlin5-cpu-cluster-access), still supported on a best-effort basis.

## Accessing the Slurm clusters

Any job submission must be performed from a **Merlin login node**. Please refer to the [**Accessing the Interactive Nodes documentation**](accessing-interactive-nodes.md)
for further information about how to access the cluster.

In addition, any job *must be submitted from a high-performance storage area visible to both the login nodes and the computing nodes*. The possible storage areas are the following:

* `/data/user`
* `/data/project`
* `/shared-scratch`

Please avoid using `/psi/home` directories for submitting jobs.

### Merlin6 CPU cluster access

The **Merlin6 CPU cluster** (**`merlin6`**) is the default cluster configured
on the login nodes. Any job submission will use this cluster by default, unless
the `--cluster` option is specified with another of the existing clusters.

For further information about how to use this cluster, please visit the [**Merlin6 CPU Slurm Cluster documentation**](../slurm-configuration.md).

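As a minimal sketch (`myjob.sh` is a placeholder batch script):

```bash
# without --cluster, the job goes to the default merlin6 CPU cluster
sbatch myjob.sh

# check the job in the merlin6 queue
squeue --cluster=merlin6 -u $USER
```
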
### Merlin6 GPU cluster access

The **Merlin6 GPU cluster** (**`gmerlin6`**) is visible from the login nodes. However, to submit jobs to this cluster, one needs to specify the option `--cluster=gmerlin6` when submitting a job or allocation.

For further information about how to use this cluster, please visit the [**Merlin6 GPU Slurm Cluster documentation**](../../gmerlin6/slurm-configuration.md).

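For example (the script name is again a placeholder; the same pattern applies to `merlin5` below):

```bash
# submit a batch job to the gmerlin6 GPU cluster
sbatch --cluster=gmerlin6 myjob.sh

# request an interactive allocation on the GPU cluster
salloc --cluster=gmerlin6
```
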
### Merlin5 CPU cluster access

The **Merlin5 CPU cluster** (**`merlin5`**) is visible from the login nodes. However, to submit jobs
to this cluster, one needs to specify the option `--cluster=merlin5` when submitting a job or allocation.

Using this cluster is in general not recommended; however, it is still
available for old users needing extra computational resources or longer jobs.
Keep in mind that this cluster is supported only on a **best-effort basis**,
and it contains very old hardware and configurations.

For further information about how to use this cluster, please visit the [**Merlin5 CPU Slurm Cluster documentation**](../../merlin5/slurm-configuration.md).

docs/merlin6/quick-start-guide/code-of-conduct.md
@@ -0,0 +1,48 @@
# Code Of Conduct

## The basic principle

The basic principle is courtesy and consideration for other users.

* Merlin6 is a system shared by many users, therefore you are kindly requested to apply common courtesy when using its resources. Please follow our guidelines, which aim to provide and maintain an efficient compute environment for all our users.
* Basic shell programming skills are an essential requirement in a Linux/UNIX HPC cluster environment; proficiency in shell programming is greatly beneficial.

## Interactive nodes

* The interactive nodes (also known as login nodes) are for development and quick testing:
    * It is **strictly forbidden to run production jobs** on the login nodes. All production jobs must
      be submitted to the batch system.
    * It is **forbidden to run long processes** occupying big parts of a login node's resources.
    * According to the previous rules, **misbehaving processes will be killed**
      in order to keep the system responsive for other users.

## Batch system

* Make sure that no broken or run-away processes are left behind when your job is done. Keep the process space clean on all nodes.
* During the runtime of a job, it is mandatory to use the ``/scratch`` and ``/shared-scratch`` partitions for temporary data (see the sketch after this list):
    * It is **forbidden** to use ``/data/user``, ``/data/project`` or ``/psi/home/`` for that purpose.
    * Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes.
    * Prefer ``/scratch`` over ``/shared-scratch`` and use the latter only when you require the temporary files to be visible from multiple nodes.
    * Read the description in **[Merlin6 directory structure](../how-to-use-merlin/storage.md#merlin6-directories)** to learn about the correct usage of each partition type.

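A minimal sketch of this pattern in a batch script (paths and the application line are illustrative; `$SLURM_JOB_ID` is set by Slurm inside a job):

```bash
#!/bin/bash
#SBATCH --job-name=scratch-example

# create a per-job temporary directory on the node-local /scratch
TMPDIR="/scratch/$USER/$SLURM_JOB_ID"
mkdir -p "$TMPDIR"

# run the application, writing temporary data only to $TMPDIR
# ./my_application --tmpdir "$TMPDIR"

# clean up the temporary data before the job ends
rm -rf "$TMPDIR"
```
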
## User and project data

* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows shares).
* **`/psi/home`**, as it contains a small amount of data, is the only directory for which we can provide daily snapshots for one week. These can be found in the directory **`/psi/home/.snapshot/`**.
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of old users without an existing and valid PSI account will be recycled.

!!! warning

    When a user leaves PSI and their account has been removed, their storage space in Merlin may be recycled.
    Hence, **when a user leaves PSI**, they, their supervisor or their team **must ensure that the data is backed up to an external storage**.

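For instance, to recover an older version of a home-directory file from the daily snapshots (the snapshot name and file path are illustrative):

```bash
# list the available daily snapshots of the home directories
ls /psi/home/.snapshot/

# copy a file back from one of the snapshots
cp /psi/home/.snapshot/<snapshot-name>/$USER/myfile.txt ~/myfile.txt
```
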
## System administrator rights

* The system administrator has the right to temporarily block access to
  Merlin6 for an account violating the Code of Conduct, in order to maintain the
  efficiency and stability of the system.
* Repeated violations by the same user will be escalated to the user's supervisor.
* The system administrator has the right to delete files in the **scratch** directories
    * after a job, if the job failed to clean up its files;
    * during the job, in order to prevent a job from destabilizing one or multiple nodes.
* The system administrator has the right to kill any misbehaving running process.

docs/merlin6/quick-start-guide/introduction.md
@@ -0,0 +1,56 @@
# Introduction

## The Merlin local HPC cluster

Historically, the local HPC clusters at PSI have been named **Merlin**. Over the years,
multiple generations of Merlin have been deployed.

At present, the **Merlin local HPC cluster** comprises _two_ generations:

* the old **Merlin5** cluster (`merlin5` Slurm cluster), and
* the newest generation, **Merlin6**, which is divided into two Slurm clusters:
    * `merlin6` as the Slurm CPU cluster, and
    * `gmerlin6` as the Slurm GPU cluster.

Access to the different Slurm clusters is possible from the [**Merlin login nodes**](accessing-interactive-nodes.md),
which can be accessed through the [SSH protocol](accessing-interactive-nodes.md#ssh-access) or the [NoMachine (NX) service](../how-to-use-merlin/nomachine.md).

The following image shows the Slurm architecture design for the Merlin5 & Merlin6 (CPU & GPU) clusters:



### Merlin6

Merlin6 is the official PSI local HPC cluster for development and
mission-critical applications. It was built in 2019 and replaces
the Merlin5 cluster.

Merlin6 is designed to be extensible, so it is technically possible to add
more compute nodes and cluster storage without a significant increase in
manpower and operational costs.

Merlin6 contains all the main services needed for running the cluster, including
**login nodes**, **storage**, **computing nodes** and other _subservices_,
connected to the central PSI IT infrastructure.

#### CPU and GPU Slurm clusters

The Merlin6 **computing nodes** are mostly based on **CPU** resources. However,
the cluster also contains a small amount of **GPU**-based resources, which are mostly used
by the BIO Division and by Deep Learning projects.

These computational resources are split into **two** different **[Slurm](https://slurm.schedmd.com/overview.html)** clusters (see the quick check after this list):

* The Merlin6 CPU nodes are in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster called [**`merlin6`**](../slurm-configuration.md).
    * This is the **default Slurm cluster** configured on the login nodes: any job submitted without the option `--cluster` will be submitted to this cluster.
* The Merlin6 GPU resources are in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster called [**`gmerlin6`**](../../gmerlin6/slurm-configuration.md).
    * Users submitting to the **`gmerlin6`** GPU cluster need to specify the option ``--cluster=gmerlin6``.

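From a login node, one can quickly list the partitions of each of these clusters (cluster names as introduced above):

```bash
# show the partitions of the CPU and GPU Slurm clusters
sinfo --cluster=merlin6
sinfo --cluster=gmerlin6
```
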
### Merlin5

The old Slurm **CPU** _Merlin_ cluster is still active and is maintained on a best-effort basis.

**Merlin5** only contains **computing node** resources, in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster.

* The Merlin5 CPU cluster is called [**merlin5**](../../merlin5/slurm-configuration.md).

docs/merlin6/quick-start-guide/requesting-accounts.md
@@ -0,0 +1,42 @@
# Requesting Merlin Accounts

## Requesting Access to Merlin6

Access to Merlin6 is regulated by the PSI user's account being a member of the **`svc-cluster_merlin6`** group. Membership in this group also grants access to older generations of Merlin (`merlin5`).

Requesting **Merlin6** access *has to be done* with the corresponding **[Request Linux Group Membership](https://psi.service-now.com/psisp?id=psi_new_sc_cat_item&sys_id=84f2c0c81b04f110679febd9bb4bcbb1)** form, available in the [PSI Service Now Service Catalog](https://psi.service-now.com/psisp).



The mandatory customizable fields are the following:

* **`Order Access for user`**, which defaults to the logged-in user. However, requesting access for another user is also possible.
* **`Request membership for group`**: for Merlin6, **`svc-cluster_merlin6`** must be selected.
* **`Justification`**: please add a short justification of why access to Merlin6 is necessary.

Once submitted, the Merlin responsible will approve the request as soon as possible (within the next few hours on working days). Once the request is approved, *it may take up to 30 minutes until the account is fully configured*.

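Once the account is configured, a quick way to verify the membership from any PSI Linux machine is shown below (the same check works for `svc-cluster_merlin5`):

```bash
# check that your account is a member of the Merlin6 access group
id -nG $USER | tr ' ' '\n' | grep -x svc-cluster_merlin6
```
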
## Requesting Access to Merlin5

Access to Merlin5 is regulated by the PSI user's account being a member of the **`svc-cluster_merlin5`** group. Membership in this group does not grant access to newer generations of Merlin (`merlin6`, `gmerlin6`, and future ones).

Requesting **Merlin5** access *has to be done* with the corresponding **[Request Linux Group Membership](https://psi.service-now.com/psisp?id=psi_new_sc_cat_item&sys_id=84f2c0c81b04f110679febd9bb4bcbb1)** form, available in the [PSI Service Now Service Catalog](https://psi.service-now.com/psisp).



The mandatory customizable fields are the following:

* **`Order Access for user`**, which defaults to the logged-in user. However, requesting access for another user is also possible.
* **`Request membership for group`**: for Merlin5, **`svc-cluster_merlin5`** must be selected.
* **`Justification`**: please add a short justification of why access to Merlin5 is necessary.

Once submitted, the Merlin responsible will approve the request as soon as possible (within the next few hours on working days). Once the request is approved, *it may take up to 30 minutes until the account is fully configured*.

## Further documentation

Further information is also available in the Linux Central Documentation:

* [Unix Group / Group Management for users](https://linux.psi.ch/documentation/services/user-guide/unix_groups.html)
* [Unix Group / Group Management for group managers](https://linux.psi.ch/documentation/services/admin-guide/unix_groups.html)

**Special thanks** to the **Linux Central Team** and **AIT** for making this possible.

docs/merlin6/quick-start-guide/requesting-projects.md
@@ -0,0 +1,122 @@
# Requesting a Merlin Project

A project owns its own storage area in Merlin, which can be accessed by other group members.

Projects can receive a higher storage quota than user areas and should be the primary way of organizing bigger storage requirements
in a multi-user collaboration.

Access to a project's directories is governed by the project members belonging to a common **Unix group**. You may use an existing
Unix group or have a new Unix group created especially for the project. The **project responsible** will be the owner of
the Unix group (*this is important*)!

This document explains how to request a new Unix group, how to request membership for existing groups, and the procedure for requesting a Merlin project.

## About Unix groups

Before requesting a Merlin project, it is important to have a Unix group that can be used to grant the different project members
access to it.

Unix groups in the PSI Active Directory (the PSI central database containing user and group information, and more) are identified by the `unx-` prefix, followed by a name.
In general, PSI employees working on Linux systems (including HPC clusters, like Merlin) can request a new Unix group and become responsible for managing it.
In addition, a list of administrators can be set. The administrators, together with the group manager, can approve or deny membership requests. Further information about this topic
is covered in the [Linux Documentation - Services Admin Guides: Unix Groups / Group Management](https://linux.psi.ch/documentation/services/admin-guide/unix_groups.html), managed by the Central Linux Team.

To grant access to specific Merlin project directories, users may need to be added to specific **Unix groups**:

* Each Merlin project (i.e. `/data/project/{bio|general}/$projectname`) or experiment (i.e. `/data/experiment/$experimentname`) directory has access restricted by ownership and group membership (with very few exceptions allowing public access).
* Users requiring access to a specific restricted project or experiment directory have to request membership of the corresponding Unix group owning the directory.

### Requesting a new Unix group

**If you need a new Unix group** to be created, you first need to request it through a separate
**[PSI Service Now ticket](https://psi.service-now.com/psisp)**. **Please use the following template.**
You can also specify the login names of the initial group members and the **owner** of the group.
The owner of the group is the person who will be allowed to modify the group.

* Please open an *Incident Request* with the subject:

    ```text
    Subject: Request for new unix group xxxx
    ```

* and base the text field of the request on this template:

    ```text
    Dear HelpDesk

    I would like to request a new unix group.

    Unix Group Name: unx-xxxxx
    Initial Group Members: xxxxx, yyyyy, zzzzz, ...
    Group Owner: xxxxx
    Group Administrators: aaaaa, bbbbb, ccccc, ....

    Best regards,
    ```

### Requesting Unix group membership

Existing Merlin projects already have a Unix group assigned. To have access to a project, users must belong to the **Unix group** owning that project.
Supervisors should inform new users which extra groups are needed for their project(s). If this information is not known, one can check the permissions of the corresponding directory. For example:

```bash
# check the owner and group of a project directory
ls -ltrhd /data/project/general/$projectname
ls -ltrhd /data/project/bio/$projectname
```

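To read off just the required group and check your own membership, something like the following can be used (the group name is a placeholder):

```bash
# print only the Unix group owning a project directory
stat -c '%G' /data/project/general/$projectname

# check whether your account already belongs to that group
id -nG | tr ' ' '\n' | grep -x unx-myproject
```
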
Requesting membership of a specific Unix group *has to be done* with the corresponding **[Request Linux Group Membership](https://psi.service-now.com/psisp?id=psi_new_sc_cat_item&sys_id=84f2c0c81b04f110679febd9bb4bcbb1)** form, available in the [PSI Service Now Service Catalog](https://psi.service-now.com/psisp).



Once submitted, the responsible of the Unix group has to approve the request.

**Important note**: requesting access to specific Unix groups requires validation by the responsible of each Unix group. If you ask for inclusion in many groups, it may take longer, since the fulfillment of the request will depend on more people.

Further information can be found in the [Linux Documentation - Services User Guide: Unix Groups / Group Management](https://linux.psi.ch/documentation/services/user-guide/unix_groups.html).

### Managing Unix Groups

Other administration operations on Unix groups are mainly covered in the [Linux Documentation - Services Admin Guides: Unix Groups / Group Management](https://linux.psi.ch/documentation/services/admin-guide/unix_groups.html), managed by the Central Linux Team.

## Requesting a Merlin project

Once a Unix group is available, a Merlin project can be requested.
To request a project, please provide the following information in a **[PSI Service Now ticket](https://psi.service-now.com/psisp)**:

* Please open an *Incident Request* with the subject:

    ```text
    Subject: [Merlin6] Project Request for project name xxxxxx
    ```

* and base the text field of the request on this template:

    ```text
    Dear HelpDesk

    I would like to request a new Merlin6 project.

    Project Name: xxxxx
    UnixGroup: xxxxx   # Must be an existing Unix Group

    The project responsible is the Owner of the Unix Group.
    If you need a storage quota exceeding the defaults, please provide a description
    and motivation for the higher storage needs:

    Storage Quota: 1TB with a maximum of 1M Files
    Reason: (None for default 1TB/1M)

    Best regards,
    ```

The **default storage quota** for a project is 1TB (with a maximum *number of files* of 1M). If you need a larger assignment, you
need to request it and provide a description of your storage needs.

## Further documentation

Further information is also available in the Linux Central Documentation:

* [Unix Group / Group Management for users](https://linux.psi.ch/documentation/services/user-guide/unix_groups.html)
* [Unix Group / Group Management for group managers](https://linux.psi.ch/documentation/services/admin-guide/unix_groups.html)

**Special thanks** to the **Linux Central Team** and **AIT** for making this possible.