first stab at mkdocs migration

refactor CSCS and Meg content

add merlin6 quick start

update merlin6 nomachine docs

give the userdoc its own color scheme

we use the Materials default one

refactored slurm general docs merlin6

add merlin6 JB docs

add software support m6 docs

add all files to nav

vibed changes #1

add missing pages

further vibing #2

vibe #3

further fixes
2025-11-26 17:28:07 +01:00
parent 149de6fb18
commit bde174b726
313 changed files with 2608 additions and 11593 deletions


@@ -0,0 +1,47 @@
# Accessing Interactive Nodes
## SSH Access
For interactive command shell access, use an SSH client. We recommend activating SSH's X11 forwarding so that you can use graphical
applications (e.g. a text editor; for more performant graphical access, refer to the sections below). X applications are supported
on the login nodes, and X11 forwarding can be used by users who have properly configured X11 support on their desktops. However:
* Merlin7 administrators **do not offer support** for user desktop configuration (Windows, MacOS, Linux).
* Hence, Merlin7 administrators **do not offer official support** for X11 client setup.
* Nevertheless, a generic guide for X11 client setup (*Linux*, *Windows* and *MacOS*) is provided below.
* PSI desktop configuration issues must be addressed through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
* Ticket will be redirected to the corresponding Desktop support group (Windows, Linux).
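For a quick start, a minimal connection with X11 forwarding enabled looks like the following sketch (replace `<psi-username>` with your PSI account name; the available login nodes are listed at the end of this page):
```bash
# Log in to a Merlin7 login node with X11 forwarding enabled (-Y)
ssh -Y <psi-username>@login001.merlin7.psi.ch
```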
### Accessing from a Linux client
Refer to [{How To Use Merlin -> Accessing from Linux Clients}](../02-How-To-Use-Merlin/connect-from-linux.md) for **Linux** SSH client and X11 configuration.
### Accessing from a Windows client
Refer to [{How To Use Merlin -> Accessing from Windows Clients}](../02-How-To-Use-Merlin/connect-from-windows.md) for **Windows** SSH client and X11 configuration.
### Accessing from a MacOS client
Refer to [{How To Use Merlin -> Accessing from MacOS Clients}](../02-How-To-Use-Merlin/connect-from-macos.md) for **MacOS** SSH client and X11 configuration.
## NoMachine Remote Desktop Access
X applications are supported on the login nodes and can run efficiently through a **NoMachine** client. This is the officially supported way to run more demanding X applications on Merlin7.
* For PSI Windows workstations, NoMachine can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing it, please request support through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
* For other workstations, the client software can be downloaded from the [NoMachine website](https://www.nomachine.com/product&p=NoMachine%20Enterprise%20Client).
### Configuring NoMachine
Refer to [{How To Use Merlin -> Remote Desktop Access}](../02-How-To-Use-Merlin/nomachine.md) for further instructions on how to configure the NoMachine client and how to access it from inside and outside PSI.
## Login nodes hardware description
The Merlin7 login nodes are the official machines for accessing the resources of Merlin7.
From these machines, users can submit jobs to the Slurm batch system as well as visualize or compile their software.
The Merlin7 login nodes are the following:
| Hostname | SSH | NoMachine | Scratch | Scratch Mountpoint |
| ----------------------- | --- | --------- | -------- | :------------------ |
| login001.merlin7.psi.ch | yes | yes | 1TB NVMe | ``/scratch`` |
| login002.merlin7.psi.ch | yes | yes | 1TB NVMe | ``/scratch`` |


@@ -0,0 +1,40 @@
---
title: Accessing Slurm Cluster
#tags:
keywords: slurm, batch system, merlin5, merlin7, gmerlin7, cpu, gpu
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-access.html
---
## The Merlin Slurm clusters
Merlin is a multi-cluster setup, where multiple Slurm clusters coexist under the same umbrella.
It contains the following clusters:
* The **Merlin7 Slurm CPU cluster**, which is called [**`merlin7`**](#merlin7-cpu-cluster-access).
* The **Merlin7 Slurm GPU cluster**, which is called [**`gmerlin7`**](#merlin7-gpu-cluster-access).
## Accessing the Slurm clusters
Any job submission must be performed from a **Merlin login node**. Please refer to the [**Accessing the Interactive Nodes documentation**](accessing-interactive-nodes.md)
for further information about how to access the cluster.
In addition, any job *must be submitted from a high-performance storage area visible to both the login nodes and the computing nodes*. The possible storage areas are the following:
* `/data/user`
* `/data/project`
* `/data/scratch/shared`
### Merlin7 CPU cluster access
The **Merlin7 CPU cluster** (**`merlin7`**) is the default cluster configured on the login nodes. Any job submission will use this cluster by default, unless
another of the existing clusters is specified with the `--cluster` option.
For further information about how to use this cluster, please visit: [**Merlin7 CPU Slurm Cluster documentation**](../03-Slurm-General-Documentation/slurm-configuration.md#cpu-cluster-merlin7).
### Merlin7 GPU cluster access
The **Merlin7 GPU cluster** (**`gmerlin7`**) is visible from the login nodes. However, to submit jobs to this cluster, one needs to specify the option `--cluster=gmerlin7` when submitting a job or requesting an allocation, as shown in the example below.
For further information about how to use this cluster, please visit: [**Merlin7 GPU Slurm Cluster documentation**](../03-Slurm-General-Documentation/slurm-configuration.md#gpu-cluster-gmerlin7).
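As an illustration, submission to either cluster differs only in the `--cluster` option (a minimal sketch; `job.sh` stands for a hypothetical batch script):
```bash
# Submit to the default CPU cluster (merlin7)
sbatch job.sh

# Submit the same script to the GPU cluster
sbatch --cluster=gmerlin7 job.sh

# Request an interactive allocation on the GPU cluster with one GPU
salloc --cluster=gmerlin7 --gpus=1
```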


@@ -0,0 +1,53 @@
---
title: Code Of Conduct
#tags:
keywords: code of conduct, rules, principle, policy, policies, administrator, backup
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/code-of-conduct.html
---
## The Basic principle
The basic principle is courtesy and consideration for other users.
* Merlin7 is a system shared by many users; therefore you are kindly requested to apply common courtesy when using its resources. Please follow our guidelines, which aim at providing and maintaining an efficient compute environment for all our users.
* Basic shell programming skills are an essential requirement in a Linux/UNIX HPC cluster environment; proficiency in shell programming is greatly beneficial.
## Interactive nodes
* The interactive nodes (also known as login nodes) are for development and quick testing:
    * It is **strictly forbidden to run production jobs** on the login nodes. All production jobs must be submitted to the batch system.
    * It is **forbidden to run long processes** occupying large parts of a login node's resources.
    * In line with the previous rules, **misbehaving processes will be killed** in order to keep the system responsive for other users.
## Batch system
* Make sure that no broken or run-away processes are left when your job is done. Keep the process space clean on all nodes.
* During the runtime of a job, it is mandatory to use the ``/scratch`` and ``/data/scratch/shared`` storage areas for temporary data (see the sketch after this list):
    * It is **forbidden** to use ``/data/user`` or ``/data/project`` for that purpose.
    * Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes.
    * Prefer ``/scratch`` over ``/data/scratch/shared``, and _use the latter only when you require the temporary files to be visible from multiple nodes_.
* Read the description in **[Merlin7 directory structure](../02-How-To-Use-Merlin/storage.md#merlin7-directories)** to learn about the correct usage of each storage area.
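The following minimal sketch shows this pattern in a Slurm batch script (the job name and the per-user directory layout under `/scratch` are illustrative assumptions):
```bash
#!/bin/bash
#SBATCH --job-name=scratch-demo

# Create a per-job temporary directory on node-local scratch
# (illustrative layout; adapt to your workflow)
TMPDIR="/scratch/${USER}/${SLURM_JOB_ID}"
mkdir -p "$TMPDIR"

# ... run your computation, pointing temporary output at $TMPDIR ...

# Clean up before the job finishes, keeping the scratch space tidy
rm -rf "$TMPDIR"
```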
## User and project data
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
* ***When a user leaves PSI, the user or their supervisor/team is responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of former users who no longer have an existing and valid PSI account will be recycled.
!!! warning
    When a user leaves PSI and their account has been removed, their storage
    space in Merlin may be recycled. Hence, **when a user leaves PSI**, they,
    their supervisor or their team **must ensure that the data is backed up
    to an external storage**!
## System Administrator Rights
* The system administrator has the right to temporarily block access to Merlin7 for an account violating the Code of Conduct, in order to maintain the efficiency and stability of the system.
* Repetitive violations by the same user will be escalated to the user's supervisor.
* The system administrator has the right to delete files in the **scratch** directories:
    * after a job, if the job failed to clean up its files.
    * during the job, in order to prevent a job from destabilizing a node or multiple nodes.
* The system administrator has the right to kill any misbehaving running processes.


@@ -0,0 +1,64 @@
---
title: Introduction
#tags:
keywords: introduction, home, welcome, architecture, design
last_updated: 07 September 2022
sidebar: merlin7_sidebar
permalink: /merlin7/introduction.html
redirect_from:
- /merlin7
- /merlin7/index.html
---
## About Merlin7
The Merlin7 cluster has been moving toward **production** state since August 2024; full production is expected by Q4 2025 at the latest. The system has been generally available
since January 2025, but due to some remaining issues with the platform, the schedule for migrating users and communities has been delayed. You will be notified well in advance
regarding the migration of data.
All PSI users can request access to Merlin7; please go to the [Requesting Merlin Accounts](requesting-accounts.md) page and complete the steps given there.
In case you identify errors or missing information, please provide feedback through the [merlin-admins mailing list](mailto:merlin-admins@lists.psi.ch) or [submit a ticket using the PSI service portal](https://psi.service-now.com/psisp).
## Infrastructure
### Hardware
The Merlin7 cluster consists of the following node types:
| Node | #N | CPU | RAM | GPU | #GPUs |
| ----: | -- | --- | --- | ----: | ---: |
| Login | 2 | 2 AMD EPYC 7742 (64 Cores 2.25GHz) | 512GB | | |
| CPU | 77 | 2 AMD EPYC 7742 (64 Cores 2.25GHz) | 512GB | | |
| GPU A100 | 8 | 2 AMD EPYC 7713 (64 Cores 3.2GHz) | 512GB | A100 80GB | 4 |
| GPU GH | 5 | NVIDIA ARM Grace Neoverse v2 (144 Cores 3.1GHz) | 864GB (Unified) | GH200 120GB | 4 |
### Network
The Merlin7 cluster builds on top of HPE/Cray technologies, including a high-performance network fabric called Slingshot. This network fabric is able
to provide up to 200 Gbit/s throughput between nodes. Further information on Slingshot can be found at [HPE](https://www.hpe.com/psnow/doc/PSN1012904596HREN) and
at <https://www.glennklockwood.com/garden/slingshot>.
Through software interfaces like [libFabric](https://ofiwg.github.io/libfabric/) (which is available on Merlin7), applications can leverage the network seamlessly.
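As a quick sanity check, the `fi_info` utility shipped with libfabric can list the fabric providers visible on a node (a sketch, assuming the utility is installed and in your `PATH`; the Slingshot provider name `cxi` is an assumption typical for such systems):
```bash
# List all libfabric providers visible on this node
fi_info

# Restrict the output to the Slingshot provider (assumed name: cxi)
fi_info -p cxi
```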
### Storage
Unlike previous iterations of the Merlin HPC clusters, Merlin7 _does not_ have any local storage. Instead, storage for the entire cluster is provided through
a dedicated storage appliance from HPE/Cray called [ClusterStor](https://www.hpe.com/psnow/doc/PSN1012842049INEN.pdf).
The appliance is built from several storage servers:
* 2 management nodes
* 2 MDS servers, 12 drives per server, 2.9 TiB (RAID10)
* 8 OSS-D servers, 106 drives per server, 14.5 TiB HDDs (GridRAID / RAID6)
* 4 OSS-F servers, 12 drives per server, 7 TiB SSDs (RAID10)
This yields an effective storage capacity of:
* 10 PB HDD
    * value visible on Linux: 9302.4 TiB
* 162 TB SSD
    * value visible on Linux: 151.6 TiB
* 23.6 TiB for metadata
The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.


@@ -0,0 +1,24 @@
---
title: Requesting Merlin Accounts
#tags:
keywords: registration, register, account, merlin5, merlin7, snow, service now
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/request-account.html
---
## Requesting Access to Merlin7
All PSI users can ask for access to the Merlin7 cluster. Access to Merlin7 is governed by membership of the PSI user's account in the **`svc-cluster_merlin7`** access group.
Requesting **Merlin7** access *has to be done* using the **[Request Linux Group Membership](https://psi.service-now.com/psisp?id=psi_new_sc_cat_item&sys_id=84f2c0c81b04f110679febd9bb4bcbb1)** form, available in [PSI's central Service Catalog](https://psi.service-now.com/psisp) on Service Now.
![Example: Requesting access to Merlin7](../../images/Access/01-request-merlin7-membership.png)
Mandatory fields you need to fill in:
* **`Order Access for user:`** Defaults to the logged-in user. However, requesting access for another user is also possible.
* **`Request membership for group:`** Choose **`svc-cluster_merlin7`**.
* **`Justification:`** Please add a short justification of what you will be running on Merlin7.
Once submitted, the Merlin administrators will approve the request as soon as possible (within the next few hours on working days). Once the request is approved, *it may take up to 30 minutes until the account is fully configured*.
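Once the change has propagated, you can verify the membership from any PSI Linux host (a minimal check against the group name requested above):
```bash
# Check whether the Merlin7 access group shows up among your groups
id | grep -q svc-cluster_merlin7 \
  && echo "Merlin7 access group is active" \
  || echo "Not (yet) a member of svc-cluster_merlin7"
```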


@@ -0,0 +1,123 @@
---
title: Requesting a Merlin Project
#tags:
keywords: merlin project, project, snow, service now
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/request-project.html
---
A project owns its own storage area in Merlin, which can be accessed by other group members.
Projects can receive a higher storage quota than user areas and should be the primary way of organizing bigger storage requirements
in a multi-user collaboration.
Access to a project's directories is governed by project members belonging to a common **Unix group**. You may use an existing
Unix group or you may have a new Unix group created especially for the project. The **project responsible** will be the owner of
the Unix group (*this is important*)!
This document explains how to request a new Unix group, how to request membership for existing groups, and the procedure for requesting a Merlin project.
## About Unix groups
Before requesting a Merlin project, it is important to have a Unix group that can be used to grant the different members of the project access to it.
Unix groups in the PSI Active Directory (the central PSI database containing user and group information, among other things) are identified by the `unx-` prefix, followed by a name.
In general, PSI employees working on Linux systems (including HPC clusters like Merlin) can request a new Unix group and become responsible for managing it.
In addition, a list of administrators can be set. The administrators, together with the group manager, can approve or deny membership requests. Further information about this topic
is covered in the [Linux Documentation - Services Admin Guides: Unix Groups / Group Management](https://linux.psi.ch/admin-guide/configuration/basic/users_and_groups.html), managed by the Central Linux Team.
To grant access to specific Merlin project directories, users may need to be added to specific **Unix groups**:
* Each Merlin project (i.e. `/data/project/{bio|general}/$projectname`) or experiment (i.e. `/data/experiment/$experimentname`) directory has access restricted by ownership and group membership (with very few exceptions allowing public access).
* Users requiring access to a specific restricted project or experiment directory have to request membership for the corresponding Unix group owning the directory.
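To see which Unix groups your account already belongs to, you can run the following on any Merlin login node:
```bash
# List the Unix groups of the current user
groups

# The same information, including numeric IDs
id
```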
### Requesting a new Unix group
**If you need a new Unix group** to be created, you first need to request it through a separate
**[PSI Service Now ticket](https://psi.service-now.com/psisp)**. **Please use the following template.**
You can also specify the login names of the initial group members and the **owner** of the group.
The owner of the group is the person who will be allowed to modify the group.
* Please open an *Incident Request* with subject:
```
Subject: Request for new unix group xxxx
```
* and base the text field of the request on this template
```
Dear HelpDesk
I would like to request a new unix group.
Unix Group Name: unx-xxxxx
Initial Group Members: xxxxx, yyyyy, zzzzz, ...
Group Owner: xxxxx
Group Administrators: aaaaa, bbbbb, ccccc, ....
Best regards,
```
### Requesting Unix group membership
Existing Merlin projects already have a Unix group assigned. To have access to a project, users must belong to the **Unix group** owning that project.
Supervisors should inform new users which extra groups are needed for their project(s). If this information is not known, one can check the permissions of the corresponding directory. For example:
```bash
# The fourth column of the output shows the Unix group owning the directory
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/general/$projectname
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/bio/$projectname
```
Requesting membership for a specific Unix group *has to be done* with the corresponding **[Request Linux Group Membership](https://psi.service-now.com/psisp?id=psi_new_sc_cat_item&sys_id=84f2c0c81b04f110679febd9bb4bcbb1)** form, available in the [PSI Service Now Service Catalog](https://psi.service-now.com/psisp).
![Example: Requesting Unix Group membership](../../images/Access/01-request-unx-group-membership.png)
Once submitted, the responsible of the Unix group has to approve the request.
**Important note**: Requesting access to specific Unix groups requires validation by the responsible of each Unix group. If you ask for inclusion in many groups, it may take longer, since the fulfillment of the request depends on more people.
Further information can be found in the [Linux Documentation - Services User guide: Unix Groups / Group Management](https://linux.psi.ch/documentation/services/user-guide/unix_groups.html).
### Managing Unix Groups
Other administrative operations on Unix groups are covered in the [Linux Documentation - Services Admin Guides: Unix Groups / Group Management](https://linux.psi.ch/documentation/services/admin-guide/unix_groups.html), maintained by the Central Linux Team.
## Requesting a Merlin project
Once a Unix group is available, a Merlin project can be requested.
To request a project, please provide the following information in a **[PSI Service Now ticket](https://psi.service-now.com/psisp)**:
* Please open an *Incident Request* with subject:
```
Subject: [Merlin7] Project Request for project name xxxxxx
```
* and base the text field of the request on this template
```
Dear HelpDesk
I would like to request a new Merlin7 project.
Project Name: xxxxx
UnixGroup: xxxxx # Must be an existing Unix Group
The project responsible is the Owner of the Unix Group.
If you need a storage quota exceeding the defaults, please provide a description
and motivation for the higher storage needs:
Storage Quota: 1TB with a maximum of 1M Files
Reason: (None for default 1TB/1M)
Best regards,
```
The **default storage quota** for a project is 1TB (with a maximum *Number of Files* of 1M). If you need a larger assignment, you
need to request this and provide a description of your storage needs.
## Further documentation
Further information is also available in the Linux Central Documentation:
* [Unix Group / Group Management for users](https://linux.psi.ch/documentation/services/user-guide/unix_groups.html)
* [Unix Group / Group Management for group managers](https://linux.psi.ch/documentation/services/admin-guide/unix_groups.html)
**Special thanks** to the **Central Linux Team** and **AIT** for making this possible.