initial formatting changes complete

This commit is contained in:
2026-01-06 16:40:15 +01:00
parent 173f822230
commit 5f759a629a
81 changed files with 806 additions and 1113 deletions

View File

@@ -52,4 +52,4 @@ For 2025 we can offer access to [CSCS Alps](https://www.cscs.ch/computers/alps)
* Mailing list contact: <psi-hpc-at-cscs-admin@lists.psi.ch>
* Marc Caubet Serrabou <marc.caubet@psi.ch>
* Derek Feichtinger <derek.feichtinger@psi.ch>
* Mailing list for receiving user notifications and survey information: <psi-hpc-at-cscs@lists.psi.ch> [(subscribe)](https://psilists.ethz.ch/sympa/subscribe/psi-hpc-at-cscs)

View File

@@ -1,12 +1,4 @@
----
-title: Introduction
-#tags:
-#keywords:
-last_updated: 28 June 2019
-#summary: "GPU Merlin 6 cluster overview"
-sidebar: merlin6_sidebar
-permalink: /gmerlin6/cluster-introduction.html
----
# Introduction

## About Merlin6 GPU cluster

View File

@@ -1,12 +1,4 @@
----
-title: Hardware And Software Description
-#tags:
-#keywords:
-last_updated: 19 April 2021
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /gmerlin6/hardware-and-software.html
----
# Hardware And Software Description

## Hardware
@@ -145,6 +137,7 @@ ibstat | grep Rate
In the Merlin6 GPU computing nodes, we try to keep software stack coherency with the main cluster [Merlin6](../merlin6/index.md).
Due to this, the Merlin6 GPU nodes run:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)

View File

@@ -1,12 +1,4 @@
----
-title: Slurm cluster 'gmerlin6'
-#tags:
-keywords: configuration, partitions, node definition, gmerlin6
-last_updated: 29 January 2021
-summary: "This document describes a summary of the Slurm 'configuration."
-sidebar: merlin6_sidebar
-permalink: /gmerlin6/slurm-configuration.html
----
# Slurm cluster 'gmerlin6'

This documentation shows basic Slurm configuration and options needed to run jobs in the GPU cluster.
@@ -49,30 +41,35 @@ Users might need to specify the Slurm partition. If no partition is specified, i
The table below shows all possible partitions available to users:

| GPU Partition | Default Time | Max Time | PriorityJobFactor | PriorityTier |
|:---------------------: | :----------: | :--------: | :-----------------: | :--------------: |
| `gpu` | 1 day | 1 week | 1 | 1 |
| `gpu-short` | 2 hours | 2 hours | 1000 | 500 |
| `gwendolen` | 30 minutes | 2 hours | 1000 | 1000 |
| `gwendolen-long` | 30 minutes | 8 hours | 1 | 1 |

The **PriorityJobFactor** value will be added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or mainly **fair share** might affect that decision). For the GPU
partitions, Slurm will also attempt first to allocate jobs on partitions with higher priority over partitions with lower priority.

Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with a lower **PriorityTier** value
and, if possible, they will preempt running jobs from partitions with lower **PriorityTier** values.

**gwendolen-long** is a special partition which is enabled during non-working
hours only. As of **Nov 2023**, the current policy is to disable this partition
from Mon to Fri, from 1am to 5pm. However, jobs can be submitted anytime, but
can only be scheduled outside this time range.

### Merlin6 GPU Accounts

Users need to ensure that the public **`merlin`** account is specified. If no account option is given, this account is used by default.
This is mostly needed by users who have multiple Slurm accounts and may define a different account by mistake.

```bash
#SBATCH --account=merlin # Possible values: merlin, gwendolen
```
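As a minimal, hedged sketch (not taken verbatim from this page), a job header for the public GPU partition could look like the following; the cluster name and the executable are assumptions:

```bash
#!/bin/bash
#SBATCH --clusters=gmerlin6   # assumption: the GPU cluster is addressed as a separate Slurm cluster named gmerlin6
#SBATCH --partition=gpu       # public GPU partition (see table above)
#SBATCH --account=merlin      # public account
#SBATCH --time=0-12:00:00     # must stay below the 1 week partition limit
#SBATCH --gpus=2              # number of GPUs requested for the job

srun ./my_gpu_application     # hypothetical executable
```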
Not all the accounts can be used on all partitions. This is summarized in the table below:

| Slurm Account | Slurm Partitions |
@@ -86,10 +83,16 @@ Users only need to specify the `gwendolen` account when using the `gwendolen` or
#### The 'gwendolen' account

For running jobs in the **`gwendolen`/`gwendolen-long`** partitions, users must
specify the **`gwendolen`** account. The `merlin` account is not allowed to
use the Gwendolen partitions.

Gwendolen is restricted to a set of users belonging to the **`unx-gwendolen`**
Unix group. If you belong to a project allowed to use **Gwendolen**, or you are
a user who would like to have access to it, please request access to the
**`unx-gwendolen`** Unix group through [PSI Service
Now](https://psi.service-now.com/): the request will be redirected to the
person responsible for the project (Andreas Adelmann).
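A hedged sketch of the corresponding job directives (you can check your group membership with `id -Gn | grep unx-gwendolen`); the cluster name and GPU count are illustrative assumptions:

```bash
#SBATCH --clusters=gmerlin6    # assumption: the GPU cluster is addressed by this name
#SBATCH --partition=gwendolen  # or gwendolen-long outside working hours
#SBATCH --account=gwendolen    # required; the merlin account is rejected on these partitions
#SBATCH --gpus=4               # illustrative GPU count on the DGX A100 node
```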
### Slurm GPU specific options
@@ -119,10 +122,14 @@ This is detailed in the below table.
#### Constraint / Features

Instead of specifying the GPU **type**, sometimes users may need to **specify
the GPU by the amount of memory available in the GPU** card itself.

This has been defined in Slurm with **Features**, a tag which defines
the GPU memory for the different GPU cards. Users can specify which GPU memory
size needs to be used with the `--constraint` option. In that case, notice that
*in many cases there is no need to specify `[<type>:]`* in the `--gpus`
option.

```bash
#SBATCH --constraint=<Feature> # Possible values: gpumem_8gb, gpumem_11gb, gpumem_24gb, gpumem_40gb
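# Hedged example (an illustration, not from the original page): request one GPU
# with 40 GB of on-board memory without pinning a specific GPU type in --gpus
#SBATCH --constraint=gpumem_40gb
#SBATCH --gpus=1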
@@ -172,6 +179,7 @@ The table below shows the available **Features** and which GPU card models and G
#### Other GPU options

Alternative Slurm options for GPU based jobs are available. Please refer to the **man** pages
for each Slurm command for further information (`man salloc`, `man sbatch`, `man srun`).
Below are listed the most common settings:
@@ -192,6 +200,7 @@ Please, notice that when defining `[<type>:]` once, then all other options must
#### Dealing with Hyper-Threading

The **`gmerlin6`** cluster contains the partitions `gwendolen` and `gwendolen-long`, which have a node with Hyper-Threading enabled.
In that case, one should always specify whether to use Hyper-Threading or not. If not defined, Slurm will
generally use it (exceptions apply). For this machine, HT is generally recommended.
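A hedged sketch using standard Slurm hint flags (not copied from this page) to make that choice explicit in the job script:

```bash
#SBATCH --hint=multithread     # use Hyper-Threading (generally recommended on the gwendolen node)
# or, to disable it explicitly:
##SBATCH --hint=nomultithread
```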
@@ -219,6 +228,7 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
| **gwendolen-long** | `gwendolen` | No limits, active from 9pm to 5:30am |

* With the limits in the public `gpu` and `gpu-short` partitions, a single job using the `merlin` account
(default account) cannot use more than 40 CPUs, more than 8 GPUs or more than 200GB.
Any job exceeding such limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerJob`**.
As there are no other QoS temporarily overriding job limits during the week (this happens for
@@ -226,6 +236,7 @@ instance in the CPU **daily** partition), the job needs to be cancelled, and the
must be adapted according to the above resource limits.
* The **gwendolen** and **gwendolen-long** partitions are two special partitions for a **[NVIDIA DGX A100](https://www.nvidia.com/en-us/data-center/dgx-a100/)** machine.
Only users belonging to the **`unx-gwendolen`** Unix group can run in these partitions. No limits are applied (machine resources can be completely used).
* The **`gwendolen-long`** partition is available 24h. However,
@@ -234,9 +245,11 @@ Only users belonging to the **`unx-gwendolen`** Unix group can run in these part
### Per user limits for GPU partitions

These limits apply exclusively to users. In other words, there is a maximum of
resources a single user can use. Limits are defined using QoS, and this is
usually set at the partition level. Limits are described in the table below
with the format: `SlurmQoS(limits)` (possible `SlurmQoS` values can be listed
with the command `sacctmgr show qos`):

| Partition | Slurm Account | Mon-Sun 0h-24h |
|:------------------:| :----------------: | :---------------------------------------------: |
@@ -246,12 +259,17 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
| **gwendolen-long** | `gwendolen` | No limits, active from 9pm to 5:30am |

* With the limits in the public `gpu` and `gpu-short` partitions, a single user cannot use more than 80 CPUs, more than 16 GPUs or more than 400GB.
Jobs sent by any user already exceeding such limits will stay in the queue
with the message **`QOSMax[Cpu|GRES|Mem]PerUser`**. In that case, the job can
wait in the queue until some of the running resources are freed (see the sketch
after this list for a quick way to check the pending reason).
* Notice that user limits are wider than job limits. This way, a user can run up to two 8-GPU jobs, or up to four 4-GPU jobs, etc.

!!! warning
    Please try to avoid occupying all GPUs of the same type for several hours or
    multiple days, otherwise it would block other users needing the same type of
    GPU.
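A hedged way to check why a job is still pending against these QoS limits (standard `squeue` fields; the cluster flag is an assumption):

```bash
# The %r column prints reasons such as QOSMaxGRESPerUser or QOSMaxCpuPerJob
squeue --clusters=gmerlin6 -u $USER --format="%.12i %.12P %.10T %r"
```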
## Advanced Slurm configuration
@@ -265,4 +283,8 @@ For understanding the Slurm configuration setup in the cluster, sometimes may be
* ``/etc/slurm/cgroup.conf`` - can be found in the computing nodes, is also propagated to login nodes for user read access.

The previous configuration files, which can be found in the login nodes, correspond exclusively to the **merlin6** cluster configuration files.
Configuration files for the old **merlin5** cluster or for the **gmerlin6**
cluster must be checked directly on any of the **merlin5** or **gmerlin6**
computing nodes (for example, by logging in to one of the nodes while a job or an
active allocation is running).
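As a sketch of that last suggestion (assuming `--clusters=gmerlin6` reaches the GPU cluster and a one-off `srun` command is acceptable):

```bash
# Print the gmerlin6 Slurm configuration directly from a gmerlin6 computing node
srun --clusters=gmerlin6 --partition=gpu --account=merlin --gpus=1 cat /etc/slurm/slurm.conf
```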

docs/images/merlin_cave.png: new binary file (4.4 MiB), not shown.

View File

@@ -6,9 +6,9 @@ hide:
# HPCE User Documentation

-![The HPCE clusters](images/front_page.png){ width="500" }
+![The HPCE clusters](images/merlin_cave.png){ width="650px" }
/// caption
-The magical trio 🪄
+_Within his lair, the wizard ever strives for the perfection of his art._
///

The [HPCE
@@ -16,11 +16,7 @@ group](https://www.psi.ch/en/awi/high-performance-computing-and-emerging-technol
is part of the [PSI Center for Scientific Computing, Theory and
Data](https://www.psi.ch/en/csd) at [Paul Scherrer
Institute](https://www.psi.ch). It provides a range of HPC services for PSI
-scientists, such as the Merlin series of HPC clusters, and also engages in
-research activities on technologies (data analysis and machine learning
-technologies) used on these systems.
+researchers, staff, and external collaborators, such as the Merlin series of
+HPC clusters. Furthermore, the HPCE group engages in research activities on
+technologies (data analysis and machine learning technologies) used on these
+systems.

-## Quick Links
-
-- user support
-- news

View File

@@ -38,6 +38,7 @@ A `experiment_migration.setup` migration script must be executed from **any MeG
* The script **must be executed after every reboot** of the destination nodes.
* **Reason:** On Merlin7, the home directory for the `root` user resides on ephemeral storage (no physical disk).
After a reboot, this directory is cleaned, so **SSH keys need to be redeployed** before running the migration again.
#### When using a PSI Active Directory (AD) account #### When using a PSI Active Directory (AD) account
@@ -76,17 +77,21 @@ If you are stuck, email: [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists
    * Please, before starting the transfer, ensure that:
        * The source and destination directories are correct.
        * The destination directories exist.
2. **Run additional syncs if needed**
    * Subsequent syncs can be executed to transfer changes.
    * Ensure that **only one sync for the same directory runs at a time**.
    * Multiple syncs are often required since the first one may take several hours or even days.
3. Schedule a date for the final migration:
    * Any activity must be stopped on the source directory.
    * In the same way, no activity must be done on the destination until the migration is complete.
4. **Perform a final sync with the `-E` option** (if it applies)
    * Use `-E` **only if you need to delete files on the destination that were removed from the source.**
    * This ensures the destination becomes an exact mirror of the source.
    * **Never use `-E` after the destination has gone into production**, as it will delete new data created there.
5. Disable access on the source folder.
6. Enable access on the destination folder.
    * At this point, **no new syncs have to be performed.**
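Purely as an illustration of the pattern above (initial sync, incremental re-syncs, final mirroring pass), and *not* the documented Merlin migration command, an rsync-style equivalent would look roughly like this; hosts and paths are hypothetical:

```bash
# First sync and any later incremental syncs: copy changes, never delete on the destination
rsync -avAHXS /merlin6/data/project/general/myproject/ merlin7:/data/project/general/myproject/

# Final sync only, after all activity on the source has stopped:
# --delete mirrors deletions from the source (the role the -E option plays above)
rsync -avAHXS --delete /merlin6/data/project/general/myproject/ merlin7:/data/project/general/myproject/
```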

View File

@@ -1,18 +1,11 @@
----
-title: Hardware And Software Description
-#tags:
-#keywords:
-last_updated: 09 April 2021
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin5/hardware-and-software.html
----
# Hardware And Software Description

## Hardware

### Computing Nodes

Merlin5 is built from recycled nodes, and hardware will be decommissioned as soon as it fails (due to expired warranty and age of the cluster).

* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
* Connectivity is based on Infiniband **ConnectX-3 QDR-40Gbps**
    * 16 internal ports for intra chassis communication
@@ -91,6 +84,7 @@ However, this is an old version of Infiniband which requires older drivers and s
In Merlin5, we try to keep software stack coherency with the main cluster [Merlin6](../merlin6/index.md).
Due to this, Merlin5 runs:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)

View File

@@ -1,12 +1,4 @@
----
-title: Slurm Configuration
-#tags:
-keywords: configuration, partitions, node definition
-last_updated: 20 May 2021
-summary: "This document describes a summary of the Merlin5 Slurm configuration."
-sidebar: merlin6_sidebar
-permalink: /merlin5/slurm-configuration.html
----
# Slurm Configuration

This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin5 cluster.
@@ -28,7 +20,6 @@ consider the memory as a *consumable resource*. Hence, users can *oversubscribe*
this legacy configuration has been kept to ensure that old jobs can keep running in the same way they did a few years ago.
If you know that this might be a problem for you, please, always use Merlin6 instead.

## Running jobs in the 'merlin5' cluster

In this chapter we will cover basic settings that users need to specify in order to run jobs in the Merlin5 CPU cluster.

View File

@@ -1,12 +1,4 @@
----
-title: Downtimes
-#tags:
-#keywords:
-last_updated: 28 June 2019
-#summary: "Merlin 6 cluster overview"
-sidebar: merlin6_sidebar
-permalink: /merlin6/downtimes.html
----
# Downtimes

On the first Monday of each month the Merlin6 cluster might be subject to interruption due to maintenance.
Users will be informed at least one week in advance when a downtime is scheduled for the next month.
@@ -22,7 +14,9 @@ Scheduled downtimes mostly affecting the storage and Slurm configurantions may r
When this is required, users will be informed accordingly. Two different types of draining are possible:

* **soft drain**: new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition.
  Jobs already running on the partition continue to run. This will be the **default** drain method.
* **hard drain**: no new jobs may be queued on the partition (job submission requests will be denied with an error message),
  but jobs already queued on the partition may be allocated to nodes and run.

View File

@@ -1,12 +1,4 @@
----
-title: Past Downtimes
-#tags:
-#keywords:
-last_updated: 03 September 2019
-#summary: "Merlin 6 cluster overview"
-sidebar: merlin6_sidebar
-permalink: /merlin6/past-downtimes.html
----
# Past Downtimes

## Past Downtimes: Log Changes

View File

@@ -1,12 +1,4 @@
----
-title: Contact
-#tags:
-keywords: contact, support, snow, service now, mailing list, mailing, email, mail, merlin-admins@lists.psi.ch, merlin-users@lists.psi.ch, merlin users
-last_updated: 07 September 2022
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin6/contact.html
----
# Contact

## Support

View File

@@ -1,14 +1,4 @@
----
-title: FAQ
-#tags:
-keywords: faq, frequently asked questions, support
-last_updated: 27 October 2022
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin6/faq.html
----
-{%include toc.html %}
# FAQ

## How do I register for Merlin?
@@ -35,7 +25,7 @@ How to install depends a bit on the software itself. There are three common inst
2. *source compilation* using make/cmake/autoconfig/etc. Usually the compilation scripts accept a `--prefix=/data/user/$USER` directory for where to install it. Then they place files under `<prefix>/bin`, `<prefix>/lib`, etc. The exact syntax should be documented in the installation instructions.
3. *conda environment*. This is now becoming standard for python-based software, including lots of the AI tools. First follow the [initial setup instructions](../software-support/python.md#anaconda) to configure conda to use /data/user instead of your home directory. Then you can create environments like:

```bash
module load anaconda/2019.07
# if they provide environment.yml
conda env create -f environment.yml
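# Otherwise, create and activate an environment explicitly
# (environment name and package list below are only an illustration)
conda create -n myenv python=3.11 numpy
conda activate myenv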

View File

@@ -1,12 +1,4 @@
----
-title: Known Problems
-#tags:
-keywords: "known problems, troubleshooting, illegal instructions, paraview, ansys, shell, opengl, mesa, vglrun, module: command not found, error"
-last_updated: 07 September 2022
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin6/known-problems.html
----
# Known Problems

## Common errors
@@ -72,7 +64,6 @@ echo 'In this example, by setting an environment variable SRUN_CPUS_PER_TASK'
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun python -c "import os; print(os.sched_getaffinity(0))"
(base)[caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method2
Submitted batch job 8000815
@@ -84,7 +75,6 @@ In this example, by setting an environment variable SRUN_CPUS_PER_TASK
{1, 2, 3, 4, 45, 46, 47, 48}
```

## General topics

### Default SHELL

View File

@@ -1,12 +1,4 @@
----
-title: Migration From Merlin5
-#tags:
-keywords: merlin5, merlin6, migration, rsync, archive, archiving, lts, long-term storage
-last_updated: 07 September 2022
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin6/migrating.html
----
# Migration From Merlin5

## Directories
@@ -33,6 +25,7 @@ where:
* **USR**: Quota is set up individually per user name
* **GRP**: Quota is set up individually per Unix Group name
* **Fileset**: Quota is set up per project root directory.
* User data directory ``/data/user`` has a strict user block quota limit policy. If more disk space is required, a 'project' must be created.
* Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.
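A hedged way to check where you stand against these quotas (the `merlin_quotas` helper and the GPFS `mmlsquota` command are both referenced elsewhere in these docs):

```bash
# Summary of all your quotas on Merlin
merlin_quotas

# Or query the user fileset directly via GPFS
mmlsquota -u $USER --block-size auto merlin-user
```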

View File

@@ -1,12 +1,4 @@
----
-title: Troubleshooting
-#tags:
-keywords: troubleshooting, problems, faq, known problems
-last_updated: 07 September 2022
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin6/troubleshooting.html
----
# Troubleshooting

For troubleshooting, please contact us through the official channels. See [Contact](contact.md)
for more information.

View File

@@ -1,22 +1,16 @@
----
-title: Hardware And Software Description
-#tags:
-#keywords:
-last_updated: 13 June 2019
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin6/hardware-and-software.html
----
# Hardware And Software Description

## Hardware

### Computing Nodes

The new Merlin6 cluster contains a solution based on **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw)

* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades.
* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. Blades have slightly different components depending on specific project requirements.

The connectivity for the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:

* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
    * 24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)
    * 12 external EDR-100Gbps ports (for external low latency connectivity)
@@ -142,6 +136,7 @@ The connectivity is based on **ConnectX-5 EDR-100Gbps** for the new login nodes,
### Storage

The storage node is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5).

* 2 x **Lenovo DSS G240** systems, each composed of 2 IO Nodes **ThinkSystem SR650** mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
* Each IO node has a connectivity of 400Gbps (4 x EDR 100Gbps ports, 2 of them are **ConnectX-5** and 2 are **ConnectX-4**).
@@ -151,11 +146,13 @@ The storage solution is connected to the HPC clusters through 2 x **Mellanox SB7
Merlin6 cluster connectivity is based on the [**Infiniband**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access with very low latencies to the data as well as running
extremely efficient MPI-based jobs:

* Connectivity amongst different computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
* Inter connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.

The Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniband Unmanaged switches (one per HP Apollo chassis):

* 1 x **MSX6710** (FDR) for connecting old GPU nodes, old login nodes and the MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
* 2 x **MSB7800** (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
* 3 x **HP EDR Unmanaged** switches, one embedded in each HP Apollo k6000 chassis solution.
@@ -164,6 +161,7 @@ Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniban
## Software

In Merlin6, we try to keep the latest software stack release to get the latest features and improvements. Due to this, **Merlin6** runs:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)

View File

@@ -48,11 +48,12 @@ Below are the main steps for using the Data Catalog.
**`/data/project`**. It would be also necessary when the Merlin export server (**`merlin-archive.psi.ch`**)
is down for any reason.
* Archive the dataset:
    * Visit <https://discovery.psi.ch>
    * Click **`Archive`** for the dataset
    * The system will now copy the data to the PetaByte Archive at CSCS
* Retrieve data from the catalog:
    * Find the dataset on <https://discovery.psi.ch> and click **`Retrieve`**
    * Wait for the data to be copied to the PSI retrieval system
    * Run the **`datasetRetriever`** script
@@ -266,7 +267,6 @@ step will take a long time and may appear to have hung. You can check what files
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
2019/11/06 11:04:43 Latest version: 1.1.11
2019/11/06 11:04:43 Your version of this program is up-to-date
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
2019/11/06 11:04:43 Your username:
@@ -316,7 +316,6 @@ user_n@pb-archive.psi.ch's password:
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
The data must first be copied to a rsync cache server.
2019/11/06 11:05:04 Do you want to continue (Y/n)?
Y
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012

View File

@@ -1,12 +1,4 @@
----
-title: Connecting from a MacOS Client
-#tags:
-keywords: MacOS, mac os, mac, connecting, client, configuration, SSH, X11
-last_updated: 07 September 2022
-summary: "This document describes a recommended setup for a MacOS client."
-sidebar: merlin6_sidebar
-permalink: /merlin6/connect-from-macos.html
----
# Connecting from a MacOS Client

## SSH without X11 Forwarding
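As a hedged sketch (the login host name is taken from examples elsewhere in these docs and may differ for your setup):

```bash
# Plain SSH session from the macOS Terminal, no X11 forwarding
ssh $USER@merlin-l-001.psi.ch
```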

View File

@@ -1,11 +1,4 @@
----
-title: Connecting from a Windows Client
-keywords: microsoft, mocosoft, windows, putty, xming, connecting, client, configuration, SSH, X11
-last_updated: 07 September 2022
-summary: "This document describes a recommended setup for a Windows client."
-sidebar: merlin6_sidebar
-permalink: /merlin6/connect-from-windows.html
----
# Connecting from a Windows Client

## SSH with PuTTY without X11 Forwarding
@@ -13,6 +6,7 @@ PuTTY is one of the most common tools for SSH.
Check if the following software packages are installed on the Windows workstation by
inspecting the *Start* menu (hint: use the *Search* box to save time):

* PuTTY (should be already installed)
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
@@ -28,7 +22,6 @@ If they are missing, you can install them using the Software Kiosk icon on the D
![Create Merlin Session](../../images/PuTTY/Putty_Session.png)

## SSH with PuTTY with X11 Forwarding

Official X11 Forwarding support is through NoMachine. Please follow the document

View File

@@ -1,12 +1,4 @@
----
-title: Kerberos and AFS authentication
-#tags:
-keywords: kerberos, AFS, kinit, klist, keytab, tickets, connecting, client, configuration, slurm
-last_updated: 07 September 2022
-summary: "This document describes how to use Kerberos."
-sidebar: merlin6_sidebar
-permalink: /merlin6/kerberos.html
----
# Kerberos and AFS authentication

Projects and users have their own areas in the central PSI AFS service. In order
to access these areas, valid Kerberos and AFS tickets must be granted.
@@ -20,7 +12,6 @@ time is 10 hours. It means than one needs to constantly renew (`krenew` command)
granting tickets, and their validity cannot be extended longer than 7 days. At this point,
one needs to obtain new granting tickets.

## Obtaining granting tickets with username and password

As already described above, the most common use case is to obtain Kerberos and AFS granting tickets
@@ -28,6 +19,7 @@ by introducing username and password:
* When logging in to Merlin through the SSH protocol, if this is done with username + password authentication,
tickets for Kerberos and AFS will be automatically obtained.
* When logging in to Merlin through NoMachine, no Kerberos and AFS tickets are granted. Therefore, users need to
run `kinit` (to obtain a granting Kerberos ticket) followed by `aklog` (to obtain a granting AFS ticket).
See further details below.
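For instance, after a NoMachine login the sequence would look roughly like this (a sketch; the realm `D.PSI.CH` is the one used in the keytab example further below):

```bash
kinit $USER@D.PSI.CH   # obtain a Kerberos granting ticket (asks for your password)
aklog                  # obtain the AFS token from the Kerberos ticket
klist                  # optionally verify which tickets you now hold
```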
@@ -52,13 +44,13 @@ krenew
* Keep in mind that the maximum lifetime for granting tickets is 7 days, therefore `krenew` cannot be used beyond that limit,
and then `kinit` should be used instead.

## Obtaining granting tickets with keytab

Sometimes, obtaining granting tickets by using password authentication is not possible. An example is user Slurm jobs
requiring access to private areas in AFS. For that, there's the possibility to generate a **keytab** file.
Be aware that the **keytab** file must be **private**, **fully protected** by correct permissions and not shared with any
other users.

### Creating a keytab file
@@ -70,6 +62,7 @@ For generating a **keytab**, one has to:
module load krb5/1.20
```

2. Create a private directory for storing the Kerberos **keytab** file

```bash
mkdir -p ~/.k5
```
@@ -78,6 +71,7 @@ mkdir -p ~/.k5
ktutil
```

4. In the `ktutil` console, one has to generate a **keytab** file as follows:

```bash
# Replace $USER by your username
add_entry -password -k 0 -f -p $USER
@@ -85,6 +79,7 @@ wkt /psi/home/$USER/.k5/krb5.keytab
exit
```

Notice that you will need to enter your password once. This step is required for generating the **keytab** file.

5. Once back in the main shell, one has to ensure that the file has the proper permissions:

```bash
chmod 0600 ~/.k5/krb5.keytab
@@ -112,14 +107,17 @@ The steps should be the following:
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
```

* To obtain a Kerberos5 granting ticket, run `kinit` by using your keytab:

```bash
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
```

* To obtain a granting AFS ticket, run `aklog`:

```bash
aklog
```

* At the end of the job, you can destroy existing Kerberos tickets.

```bash
kdestroy
```
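Putting the documented steps together, a hedged sketch of a batch script that needs AFS access could look like this (job name and time limit are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=afs-job   # hypothetical job name
#SBATCH --time=01:00:00

# Private credential cache for this job, as described above
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
aklog

# ... work that reads/writes your AFS areas ...

kdestroy   # clean up the tickets at the end of the job
```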

View File

@@ -1,13 +1,4 @@
----
-title: Configuring SSH Keys in Merlin
-#tags:
-keywords: linux, connecting, client, configuration, SSH, Keys, SSH-Keys, RSA, authorization, authentication
-last_updated: 15 Jul 2020
-summary: "This document describes how to deploy SSH Keys in Merlin."
-sidebar: merlin6_sidebar
-permalink: /merlin6/ssh-keys.html
----
# Configuring SSH Keys in Merlin

Merlin users sometimes need to access the different Merlin services without being constantly asked for a password.
One can achieve that with Kerberos authentication; however, in some cases some software requires the setup of SSH keys.
@@ -30,6 +21,7 @@ For creating **SSH RSA Keys**, one should:
1. Run `ssh-keygen`; a password will be requested twice. You **must remember** this password for the future.
    * For security reasons, ***always try protecting it with a password***. There is only one exception, when running ANSYS software, which in general should not use a password, to simplify the way of running the software in Slurm.
    * This will generate a private key **id_rsa** and a public key **id_rsa.pub** in your **~/.ssh** directory.
2. Add your public key to the **`authorized_keys`** file, and ensure proper permissions for that file, as follows:

```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
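# Hedged addition (the exact mode may already be shown further down in the full page):
# keep the file readable only by you
chmod 600 ~/.ssh/authorized_keys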

View File

@@ -8,6 +8,7 @@ This document describes the different directories of the Merlin6 cluster.
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on third party independent systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
* **`/psi/home`**, as this contains a small amount of data, is the only directory where we can provide daily snapshots for one week. These can be found in the following directory: **`/psi/home/.snapshot/`**
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster***: every few months, the storage space will be recycled for those old users who do not have an existing and valid PSI account.

!!! warning
@@ -31,9 +32,11 @@ merlin_quotas
Merlin6 offers the following directory classes for users:

* ``/psi/home/<username>``: Private user **home** directory
* ``/data/user/<username>``: Private user **data** directory
* ``/data/project/general/<projectname>``: Shared **Project** directory
    * For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
* ``/scratch``: Local *scratch* disk (only visible by the node running a job).
* ``/shared-scratch``: Shared *scratch* disk (visible from all nodes).
* ``/export``: Export directory for data transfer, visible from `ra-merlin-01.psi.ch`, `ra-merlin-02.psi.ch` and Merlin login nodes.
@@ -95,6 +98,7 @@ quota -s
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
* It is **forbidden** to use the home directories for IO-intensive tasks.
    * Use `/scratch`, `/shared-scratch`, `/data/user` or `/data/project` for this purpose.
* Users can retrieve up to 1 week of their lost data thanks to the automatic **daily snapshots for 1 week**.
Snapshots can be accessed at this path:
@@ -122,6 +126,7 @@ mmlsquota -u <username> --block-size auto merlin-user
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime.
    * Use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backup policy is applied for user data directories: users are responsible for backing up their data.

### Project data directory

View File

@@ -1,12 +1,4 @@
----
-title: Transferring Data
-#tags:
-keywords: transferring data, data transfer, rsync, winscp, copy data, copying, sftp, import, export, hopx, vpn
-last_updated: 24 August 2023
-#summary: ""
-sidebar: merlin6_sidebar
-permalink: /merlin6/transfer-data.html
----
# Transferring Data

## Overview
@@ -24,7 +16,6 @@ visibility.
- Systems on the internet can access the [PSI Data Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer) service
`datatransfer.psi.ch`, using ssh-based protocols and [Globus](https://www.globus.org/)

## Direct transfer via Merlin6 login nodes

The following methods transfer data directly via the [login
@@ -50,7 +41,6 @@ rsync -avAHXS ~/localdata user@merlin-l-01.psi.ch:/data/project/general/myprojec
You can resume interrupted transfers by simply rerunning the command. Previously You can resume interrupted transfers by simply rerunning the command. Previously
transferred files will be skipped. transferred files will be skipped.
### WinSCP ### WinSCP
The WinSCP tool can be used for remote file transfer on Windows. It is available The WinSCP tool can be used for remote file transfer on Windows. It is available
@@ -84,9 +74,11 @@ The following filesystems are mounted:
* `/merlin/export` which points to the `/export` directory in Merlin. * `/merlin/export` which points to the `/export` directory in Merlin.
* `/merlin/data/experiment/mu3e` which points to the `/data/experiment/mu3e` directories in Merlin. * `/merlin/data/experiment/mu3e` which points to the `/data/experiment/mu3e` directories in Merlin.
* Mu3e sub-directories are mounted in RW (read-write), except for `data` (read-only mounted) * Mu3e sub-directories are mounted in RW (read-write), except for `data` (read-only mounted)
* `/merlin/data/project/general` which points to the `/data/project/general` directories in Merlin. * `/merlin/data/project/general` which points to the `/data/project/general` directories in Merlin.
* Owners of Merlin projects should request explicit access to it. * Owners of Merlin projects should request explicit access to it.
* Currently, only `CSCS` is available for transferring files between PizDaint/Alps and Merlin * Currently, only `CSCS` is available for transferring files between PizDaint/Alps and Merlin
* `/merlin/data/project/bio` which points to the `/data/project/bio` directories in Merlin. * `/merlin/data/project/bio` which points to the `/data/project/bio` directories in Merlin.
* `/merlin/data/user` which points to the `/data/user` directories in Merlin. * `/merlin/data/user` which points to the `/data/user` directories in Merlin.
@@ -128,6 +120,7 @@ For exporting data from Merlin to outside PSI by using `/export`, one has to:
For importing data from outside PSI to Merlin by using `/export`, one has to: For importing data from outside PSI to Merlin by using `/export`, one has to:
* From **`datatransfer.psi.ch`**, copy the data from outside PSI to `/merlin/export`. * From **`datatransfer.psi.ch`**, copy the data from outside PSI to `/merlin/export`.
Make sure your directories and files are secured with proper permissions. Make sure your directories and files are secured with proper permissions.
* Once data is copied, from a Merlin login node, copy your data from `/export` to any directory (i.e. `/data/project`, `/data/user`, `/scratch`). * Once data is copied, from a Merlin login node, copy your data from `/export` to any directory (i.e. `/data/project`, `/data/user`, `/scratch`).
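As an illustrative sketch of these two steps (the hostnames `datatransfer.psi.ch`, `/merlin/export` and `/export` come from this page; the remote host, user, dataset and target project directory are placeholders):

```bash
# Step 1: on datatransfer.psi.ch, pull the data from the external system into /merlin/export
rsync -avAHXS myuser@remote.example.org:/path/to/dataset /merlin/export/

# Step 2: on a Merlin login node, move the data from /export to its final destination
rsync -avAHXS /export/dataset /data/project/general/myproject/
```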
@@ -161,5 +154,4 @@ provides a helpful wrapper over the Gnome storage utilities, and provides suppor
- FTP, SFTP - FTP, SFTP
- [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome) - [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
[More instruction on using `merlin_rmount`](../software-support/merlin-rmount.md) [More instruction on using `merlin_rmount`](../software-support/merlin-rmount.md)

View File

@@ -1,12 +1,3 @@
---
#tags:
keywords: Pmodules, software, stable, unstable, deprecated, overlay, overlays, release stage, module, package, packages, library, libraries
last_updated: 07 September 2022
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/using-modules.html
---
# Using PModules # Using PModules
## Environment Modules ## Environment Modules

View File

@@ -5,15 +5,17 @@
* The new Slurm CPU cluster is called **`merlin6`**. * The new Slurm CPU cluster is called **`merlin6`**.
* The new Slurm GPU cluster is called [**`gmerlin6`**](../gmerlin6/cluster-introduction.md) * The new Slurm GPU cluster is called [**`gmerlin6`**](../gmerlin6/cluster-introduction.md)
* The old Slurm *merlin* cluster is still active and best effort support is provided. * The old Slurm *merlin* cluster is still active and best effort support is provided.
The cluster was renamed to [**merlin5**](../merlin5/cluster-introduction.md). The cluster was renamed to [**merlin5**](../merlin5/cluster-introduction.md).
From July 2019, **`merlin6`** became the **default Slurm cluster**: any job submitted from the login nodes will be sent to that cluster unless another cluster is explicitly specified. From July 2019, **`merlin6`** became the **default Slurm cluster**: any job submitted from the login nodes will be sent to that cluster unless another cluster is explicitly specified.
* Users can keep submitting to the old *`merlin5`* computing nodes by using the option ``--cluster=merlin5``. * Users can keep submitting to the old *`merlin5`* computing nodes by using the option ``--cluster=merlin5``.
* Users submitting to the **`gmerlin6`** GPU cluster need to specify the option ``--cluster=gmerlin6``. * Users submitting to the **`gmerlin6`** GPU cluster need to specify the option ``--cluster=gmerlin6``.
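For illustration, a few minimal examples of the `--cluster` option described above (`job.sh` is a placeholder job script):

```bash
sbatch job.sh                      # goes to the default cluster (merlin6)
sbatch --cluster=merlin5 job.sh    # goes to the old merlin5 CPU cluster
sbatch --cluster=gmerlin6 job.sh   # goes to the gmerlin6 GPU cluster
squeue --clusters=gmerlin6         # show the queue of the gmerlin6 cluster
```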
### Slurm 'merlin6' ### Slurm 'merlin6'
**CPU nodes** are configured in a **Slurm** cluster, called **`merlin6`**, and **CPU nodes** are configured in a **Slurm** cluster, called **`merlin6`**, and
this is the _**default Slurm cluster**_. Hence, by default, if no Slurm cluster is this is the ***default Slurm cluster***. Hence, by default, if no Slurm cluster is
specified (with the `--cluster` option), this will be the cluster to which the jobs specified (with the `--cluster` option), this will be the cluster to which the jobs
will be sent. will be sent.

View File

@@ -23,12 +23,14 @@ The basic principle is courtesy and consideration for other users.
* It is **forbidden** to use the ``/data/user``, ``/data/project`` or ``/psi/home/`` for that purpose. * It is **forbidden** to use the ``/data/user``, ``/data/project`` or ``/psi/home/`` for that purpose.
* Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes. * Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes.
* Prefer ``/scratch`` over ``/shared-scratch`` and use the latter only when you require the temporary files to be visible from multiple nodes. * Prefer ``/scratch`` over ``/shared-scratch`` and use the latter only when you require the temporary files to be visible from multiple nodes.
* Read the description in **[Merlin6 directory structure](../how-to-use-merlin/storage.md#merlin6-directories)** for learning about the correct usage of each partition type. * Read the description in **[Merlin6 directory structure](../how-to-use-merlin/storage.md#merlin6-directories)** for learning about the correct usage of each partition type.
## User and project data ## User and project data
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.). * ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
* **`/psi/home`**, as this contains a small amount of data, is the only directory where we can provide daily snapshots for one week. This can be found in the following directory **`/psi/home/.snapshot/`** * **`/psi/home`**, as this contains a small amount of data, is the only directory where we can provide daily snapshots for one week. This can be found in the following directory **`/psi/home/.snapshot/`**
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster***: every few months, the storage space will be recycled for old users who no longer have an existing and valid PSI account. * ***When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster***: every few months, the storage space will be recycled for old users who no longer have an existing and valid PSI account.
!!! warning !!! warning

View File

@@ -1,12 +1,4 @@
--- # Slurm Configuration
title: Slurm Configuration
#tags:
keywords: configuration, partitions, node definition
last_updated: 29 January 2021
summary: "This document describes a summary of the Merlin6 configuration."
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-configuration.html
---
This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin6 CPU cluster. This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin6 CPU cluster.
@@ -28,6 +20,7 @@ If nothing is specified, by default each core will use up to 8GB of memory. Memo
In **`merlin6`**, memory is considered a Consumable Resource, as well as the CPU. Hence, both resources are accounted for when submitting a job, In **`merlin6`**, memory is considered a Consumable Resource, as well as the CPU. Hence, both resources are accounted for when submitting a job,
and by default resources cannot be oversubscribed. This is a main difference from the old **`merlin5`** cluster, where only CPUs were accounted, and by default resources cannot be oversubscribed. This is a main difference from the old **`merlin5`** cluster, where only CPUs were accounted,
and memory was by default oversubscribed. and memory was by default oversubscribed.
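As a short sketch (values are examples only), memory can be requested explicitly per core or per node, and both count against the limits described further below:

```bash
#SBATCH --mem-per-cpu=8000   # memory per core in MB (example close to the default of ~8GB per core)
##SBATCH --mem=350000        # alternative: total memory per node in MB (do not combine with --mem-per-cpu)
```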
!!! tip "Check Configuration" !!! tip "Check Configuration"
@@ -66,12 +59,12 @@ The following *partitions* (also known as *queues*) are configured in Slurm:
| **asa-ansys** | 1 hour | 90 days | unlimited | 1000 | 4 | 15600 | | **asa-ansys** | 1 hour | 90 days | unlimited | 1000 | 4 | 15600 |
| **mu3e** | 1 day | 7 days | unlimited | 1000 | 4 | 3712 | | **mu3e** | 1 day | 7 days | unlimited | 1000 | 4 | 3712 |
\*The **PriorityJobFactor** value will be added to the job priority (*PARTITION* column in `sprio -l` ). In other words, jobs sent to higher priority The **PriorityJobFactor** value will be added to the job priority (**PARTITION** column in `sprio -l` ). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or mainly **fair share** might affect that decision). For the GPU partitions will usually run first (however, other factors such as **job age** or mainly **fair share** might affect that decision). For the GPU
partitions, Slurm will also attempt first to allocate jobs on partitions with higher priority over partitions with lesser priority. partitions, Slurm will also attempt first to allocate jobs on partitions with higher priority over partitions with lesser priority.
**\*\***Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partition with lower *PriorityTier* value Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partition with lower *PriorityTier* value
and, if possible, they will preempt running jobs from partitions with lower *PriorityTier* values. and, if possible, they will preempt running jobs from partitions with lower **PriorityTier** values.
* The **`general`** partition is the **default**. It can not have more than 50 nodes running jobs. * The **`general`** partition is the **default**. It can not have more than 50 nodes running jobs.
* For **`daily`** this limitation is extended to 67 nodes. * For **`daily`** this limitation is extended to 67 nodes.
@@ -79,11 +72,18 @@ and, if possible, they will preempt running jobs from partitions with lower *Pri
* **`asa-general`,`asa-daily`,`asa-ansys`,`asa-visas` and `mu3e`** are **private** partitions, belonging to different experiments owning the machines. **Access is restricted** in all cases. However, by agreement with the experiments, nodes are usually added to the **`hourly`** partition as extra resources for the public resources. * **`asa-general`,`asa-daily`,`asa-ansys`,`asa-visas` and `mu3e`** are **private** partitions, belonging to different experiments owning the machines. **Access is restricted** in all cases. However, by agreement with the experiments, nodes are usually added to the **`hourly`** partition as extra resources for the public resources.
!!! tip "Partition Selection" !!! tip "Partition Selection"
Jobs that will run for less than one day should always be sent to **daily**,
while jobs that will run for less than one hour should be sent to **hourly**.
This gives your jobs higher priority than jobs sent to lower-priority
partitions, and it also matters because **general** limits the number of nodes
that can be used. The idea is that the cluster cannot be blocked by long jobs,
so resources always remain available for shorter jobs.
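For instance (with `job.sh` as a placeholder and purely illustrative run times):

```bash
sbatch --partition=hourly  --time=00:45:00 job.sh    # fits within one hour
sbatch --partition=daily   --time=12:00:00 job.sh    # fits within one day
sbatch --partition=general --time=3-00:00:00 job.sh  # longer jobs
```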
### Merlin5 CPU Accounts ### Merlin5 CPU Accounts
Users need to ensure that the public **`merlin`** account is specified. Not specifying any account option will default to this account. Users need to ensure that the public **`merlin`** account is specified. Not specifying any account option will default to this account.
This is mostly needed by users who have multiple Slurm accounts and may specify a different account by mistake. This is mostly needed by users who have multiple Slurm accounts and may specify a different account by mistake.
```bash ```bash
@@ -100,16 +100,14 @@ Not all the accounts can be used on all partitions. This is resumed in the table
#### Private accounts #### Private accounts
* The *`gfa-asa`* and *`mu3e`* accounts are private accounts. These can be used for accessing dedicated * The *`gfa-asa`* and *`mu3e`* accounts are private accounts. These can be used for accessing dedicated partitions with nodes owned by different groups.
partitions with nodes owned by different groups.
### Slurm CPU specific options ### Slurm CPU specific options
Some options are available when using CPUs. These are detailed here. Some options are available when using CPUs. These are detailed here.
Alternative Slurm options for CPU based jobs are available. Please refer to the
Alternative Slurm options for CPU based jobs are available. Please refer to the **man** pages **man** pages for each Slurm command for further information about it (`man
for each Slurm command for further information about it (`man salloc`, `man sbatch`, `man srun`). salloc`, `man sbatch`, `man srun`). Below are listed the most common settings:
Below are listed the most common settings:
```bash ```bash
#SBATCH --hint=[no]multithread #SBATCH --hint=[no]multithread
@@ -125,8 +123,9 @@ Below are listed the most common settings:
#### Enabling/Disabling Hyper-Threading #### Enabling/Disabling Hyper-Threading
The **`merlin6`** cluster contains nodes with Hyper-Threading enabled. One should always specify The **`merlin6`** cluster contains nodes with Hyper-Threading enabled. One
whether to use Hyper-Threading or not. If not defined, Slurm will generally use it (exceptions apply). should always specify whether to use Hyper-Threading or not. If not defined,
Slurm will generally use it (exceptions apply).
```bash ```bash
#SBATCH --hint=multithread # Use extra threads with in-core multi-threading. #SBATCH --hint=multithread # Use extra threads with in-core multi-threading.
@@ -138,7 +137,7 @@ whether to use Hyper-Threading or not. If not defined, Slurm will generally use
Slurm allows to define a set of features in the node definition. This can be used to filter and select nodes according to one or more Slurm allows to define a set of features in the node definition. This can be used to filter and select nodes according to one or more
specific features. For the CPU nodes, we have the following features: specific features. For the CPU nodes, we have the following features:
``` ```text
NodeName=merlin-c-[001-024,101-124,201-224] Features=mem_384gb,xeon-gold-6152 NodeName=merlin-c-[001-024,101-124,201-224] Features=mem_384gb,xeon-gold-6152
NodeName=merlin-c-[301-312] Features=mem_768gb,xeon-gold-6240r NodeName=merlin-c-[301-312] Features=mem_768gb,xeon-gold-6240r
NodeName=merlin-c-[313-318] Features=mem_768gb,xeon-gold-6240r NodeName=merlin-c-[313-318] Features=mem_768gb,xeon-gold-6240r
@@ -149,24 +148,34 @@ Therefore, users running on `hourly` can select which node they want to use (fat
This is possible by using the option `--constraint=<feature_name>` in Slurm. This is possible by using the option `--constraint=<feature_name>` in Slurm.
Examples: Examples:
1. Select nodes with 48 cores only (nodes with [2 x Xeon Gold 6240R](https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html)): 1. Select nodes with 48 cores only (nodes with [2 x Xeon Gold 6240R](https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html)):
```
```bash
sbatch --constraint=xeon-gold-6240r ... sbatch --constraint=xeon-gold-6240r ...
``` ```
2. Select nodes with 44 cores only (nodes with [2 x Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html)):
``` 1. Select nodes with 44 cores only (nodes with [2 x Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html)):
```bash
sbatch --constraint=xeon-gold-6152 ... sbatch --constraint=xeon-gold-6152 ...
``` ```
3. Select fat memory nodes only:
``` 1. Select fat memory nodes only:
```bash
sbatch --constraint=mem_768gb ... sbatch --constraint=mem_768gb ...
``` ```
4. Select regular memory nodes only:
``` 1. Select regular memory nodes only:
```bash
sbatch --constraint=mem_384gb ... sbatch --constraint=mem_384gb ...
``` ```
5. Select fat memory nodes with 48 cores only:
``` 1. Select fat memory nodes with 48 cores only:
```bash
sbatch --constraint=mem_768gb,xeon-gold-6240r ... sbatch --constraint=mem_768gb,xeon-gold-6240r ...
``` ```
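To check which features (and how many cores and how much memory) each node advertises before picking a constraint, a standard `sinfo` query such as the following can be used:

```bash
# Node list with CPU count, memory and available features
sinfo -M merlin6 -o "%20N %8c %10m %30f"
```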
@@ -190,14 +199,24 @@ Hence, there is a need of setting up wise limits and to ensure that there is a f
of the cluster while allowing jobs of different nature and sizes (it is, **single core** based **vs parallel jobs** of different sizes) to run. of the cluster while allowing jobs of different nature and sizes (it is, **single core** based **vs parallel jobs** of different sizes) to run.
!!! warning "Resource Limits" !!! warning "Resource Limits"
Wide limits are provided in the **daily** and **hourly** partitions, while for **general** those limits are more restrictive. However, we kindly ask users to inform the Merlin administrators when there are plans to send big jobs which would require a massive draining of nodes for allocating such jobs. This would apply to jobs requiring the **unlimited** QoS (see below "Per job limits"). Wide limits are provided in the **daily** and **hourly** partitions, while
for **general** those limits are more restrictive. However, we kindly ask
users to inform the Merlin administrators when there are plans to send big
jobs which would require a massive draining of nodes for allocating such
jobs. This would apply to jobs requiring the **unlimited** QoS (see below
"Per job limits").
!!! tip "Custom Requirements" !!! tip "Custom Requirements"
If you have different requirements, please let us know, we will try to accommodate or propose a solution for you. If you have different requirements, please let us know, we will try to
accommodate or propose a solution for you.
#### Per job limits #### Per job limits
These are limits which apply to a single job. In other words, there is a maximum of resources a single job can use. Limits are described in the table below with the format: `SlurmQoS(limits)` (possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`). Some limits will vary depending on the day and time of the week. These are limits which apply to a single job. In other words, there is a
maximum of resources a single job can use. Limits are described in the table
below with the format: `SlurmQoS(limits)` (possible `SlurmQoS` values can be
listed with the command `sacctmgr show qos`). Some limits will vary depending
on the day and time of the week.
| Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h | | Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h |
|:----------: | :------------------------------: | :------------------------------: | :------------------------------: | |:----------: | :------------------------------: | :------------------------------: | :------------------------------: |
@@ -205,18 +224,29 @@ These are limits which apply to a single job. In other words, there is a maximum
| **daily** | daytime(cpu=704,mem=2750G) | nighttime(cpu=1408,mem=5500G) | unlimited(cpu=2200,mem=8593.75G) | | **daily** | daytime(cpu=704,mem=2750G) | nighttime(cpu=1408,mem=5500G) | unlimited(cpu=2200,mem=8593.75G) |
| **hourly** | unlimited(cpu=2200,mem=8593.75G) | unlimited(cpu=2200,mem=8593.75G) | unlimited(cpu=2200,mem=8593.75G) | | **hourly** | unlimited(cpu=2200,mem=8593.75G) | unlimited(cpu=2200,mem=8593.75G) | unlimited(cpu=2200,mem=8593.75G) |
By default, a job cannot use more than 704 cores (max CPU per job). In the
same way, memory is also proportionally limited. This is equivalent to running
a job on up to 8 nodes at once. This limit applies to the **general**
partition (fixed limit) and to the **daily** partition (only during working
hours).

Limits are softened for the **daily** partition during non-working hours, and
during the weekend limits are even wider. For the **hourly** partition,
**despite the fact that running many parallel jobs is not desirable** (allocating
such jobs requires massive draining of nodes), wider limits are provided. In
order to avoid massive node draining in the cluster when allocating huge jobs,
setting per-job limits is necessary. Hence, the **unlimited** QoS mostly
refers to "per user" limits rather than to "per job" limits (in other words,
users can run any number of hourly jobs, but the job size of such jobs is
limited with wide values).
#### Per user limits for CPU partitions #### Per user limits for CPU partitions
These are limits which apply exclusively to users. In other words, there is a
maximum of resources a single user can use. Limits are described in the table
below with the format: `SlurmQoS(limits)` (possible `SlurmQoS` values can be
listed with the command `sacctmgr show qos`). Some limits will vary depending
on the day and time of the week.
| Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h | | Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h |
|:-----------:| :----------------------------: | :---------------------------: | :----------------------------: | |:-----------:| :----------------------------: | :---------------------------: | :----------------------------: |
@@ -224,15 +254,22 @@ These limits which apply exclusively to users. In other words, there is a maximu
| **daily** | daytime(cpu=1408,mem=5500G) | nighttime(cpu=2112,mem=8250G) | unlimited(cpu=6336,mem=24750G) | | **daily** | daytime(cpu=1408,mem=5500G) | nighttime(cpu=2112,mem=8250G) | unlimited(cpu=6336,mem=24750G) |
| **hourly** | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G)| unlimited(cpu=6336,mem=24750G) | | **hourly** | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G)| unlimited(cpu=6336,mem=24750G) |
By default, users cannot use more than 704 cores at the same time (max CPU per
user). Memory is also proportionally limited in the same way. This is
equivalent to 8 exclusive nodes. This limit applies to the **general**
partition (fixed limit) and to the **daily** partition (only during working
hours).

For the **hourly** partition, there are no limit restrictions and user limits
are removed. Limits are softened for the **daily** partition during non-working
hours, and during the weekend limits are removed.
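The exact QoS limits currently in place can be checked directly with `sacctmgr`, for example (`MaxTRES` holds the per-job limits, `MaxTRESPU` the per-user limits):

```bash
sacctmgr show qos format=Name,MaxTRES,MaxTRESPU
```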
## Advanced Slurm configuration ## Advanced Slurm configuration
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs. Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as
Slurm has been installed in a **multi-clustered** configuration, allowing to integrate multiple clusters in the same batch system. the batch system technology for managing and scheduling jobs. Slurm has been
installed in a **multi-clustered** configuration, allowing to integrate
multiple clusters in the same batch system.
For understanding the Slurm configuration setup in the cluster, it may sometimes be useful to check the following files: For understanding the Slurm configuration setup in the cluster, it may sometimes be useful to check the following files:
@@ -240,5 +277,10 @@ For understanding the Slurm configuration setup in the cluster, sometimes may be
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to login nodes and computing nodes for user read access. * ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to login nodes and computing nodes for user read access.
* ``/etc/slurm/cgroup.conf`` - can be found in the computing nodes, is also propagated to login nodes for user read access. * ``/etc/slurm/cgroup.conf`` - can be found in the computing nodes, is also propagated to login nodes for user read access.
The configuration files mentioned above, which can be found on the login
nodes, correspond exclusively to the **merlin6** cluster configuration files.
Configuration files for the old **merlin5** cluster or for the **gmerlin6**
cluster must be checked directly on any of the **merlin5** or **gmerlin6**
computing nodes (for example, by logging in to one of the nodes while a job or
an active allocation is running).
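Besides reading those files, the running configuration can also be queried through Slurm itself, for example:

```bash
scontrol show config | less             # full running configuration (default cluster, merlin6)
scontrol show partition general         # settings of a single partition
scontrol -M gmerlin6 show partition     # same query against the gmerlin6 cluster
```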

View File

@@ -249,7 +249,6 @@ sview
!['sview' graphical user interface](../../images/slurm/sview.png) !['sview' graphical user interface](../../images/slurm/sview.png)
## General Monitoring ## General Monitoring
The following pages contain basic monitoring for Slurm and computing nodes. The following pages contain basic monitoring for Slurm and computing nodes.

View File

@@ -37,6 +37,7 @@ Before starting using the cluster, please read the following rules:
## Basic settings ## Basic settings
For a complete list of available options and parameters, it is recommended to use the **man pages** (i.e. `man sbatch`, `man srun`, `man salloc`). For a complete list of available options and parameters, it is recommended to use the **man pages** (i.e. `man sbatch`, `man srun`, `man salloc`).
Please note that the behaviour of some parameters might change depending on the command used when running jobs (for example, `--exclusive` behaviour in `sbatch` differs from `srun`). Please note that the behaviour of some parameters might change depending on the command used when running jobs (for example, `--exclusive` behaviour in `sbatch` differs from `srun`).
In this chapter we show the basic parameters which are usually needed in the Merlin cluster. In this chapter we show the basic parameters which are usually needed in the Merlin cluster.

View File

@@ -251,7 +251,6 @@ The `%1` in the `#SBATCH --array=1-10%1` statement defines that only 1 subjob ca
this will result in subjob n+1 only being started when job n has finished. It will read the checkpoint file this will result in subjob n+1 only being started when job n has finished. It will read the checkpoint file
if it is present. if it is present.
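A minimal sketch of such a chained array job; the program `./mysim`, its flags and the checkpoint file name are hypothetical:

```bash
#!/bin/bash
#SBATCH --array=1-10%1        # 10 subjobs, only 1 running at a time
#SBATCH --time=23:00:00
#SBATCH --output=run_%A_%a.out

# Resume from the checkpoint written by the previous subjob, if present
if [[ -f checkpoint.dat ]]; then
    ./mysim --restart checkpoint.dat
else
    ./mysim --new-run
fi
```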
### Packed jobs: running a large number of short tasks ### Packed jobs: running a large number of short tasks
Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate

View File

@@ -116,6 +116,7 @@ fi
In the above example, one can increase the number of *nodes* and/or *ntasks* if needed and combine it In the above example, one can increase the number of *nodes* and/or *ntasks* if needed and combine it
with `--exclusive` whenever needed. In general, **no hyperthreading** is recommended for MPI-based jobs. with `--exclusive` whenever needed. In general, **no hyperthreading** is recommended for MPI-based jobs.
Also, one can combine it with `--exclusive` when necessary. Finally, one can change the MPI technology in `-start-method` Also, one can combine it with `--exclusive` when necessary. Finally, one can change the MPI technology in `-start-method`
(check CFX documentation for possible values). (check CFX documentation for possible values).

View File

@@ -1,6 +1,4 @@
--- # ANSYS - MAPDL
title: ANSYS - MAPDL
---
# ANSYS - Mechanical APDL # ANSYS - Mechanical APDL

View File

@@ -62,6 +62,7 @@ option. This will show the location of the different ANSYS releases as follows:
### ANSYS RSM ### ANSYS RSM
**ANSYS Remote Solve Manager (RSM)** is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop. **ANSYS Remote Solve Manager (RSM)** is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop.
Therefore, PSI workstations with direct access to Merlin can submit jobs by using RSM. Therefore, PSI workstations with direct access to Merlin can submit jobs by using RSM.
For further information, please visit the **[ANSYS RSM](ansys-rsm.md)** section. For further information, please visit the **[ANSYS RSM](ansys-rsm.md)** section.

View File

@@ -27,6 +27,7 @@ Refer to [{How To Use Merlin -> Accessing from MacOS Clients}](../02-How-To-Use-
## NoMachine Remote Desktop Access ## NoMachine Remote Desktop Access
X applications are supported in the login nodes and can run efficiently through a **NoMachine** client. This is the officially supported way to run more demanding X applications on Merlin7. X applications are supported in the login nodes and can run efficiently through a **NoMachine** client. This is the officially supported way to run more demanding X applications on Merlin7.
* For PSI Windows workstations, this can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing, please request support through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*. * For PSI Windows workstations, this can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing, please request support through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
* For other workstations, the client software can be downloaded from the [Nomachine Website](https://www.nomachine.com/product&p=NoMachine%20Enterprise%20Client). * For other workstations, the client software can be downloaded from the [Nomachine Website](https://www.nomachine.com/product&p=NoMachine%20Enterprise%20Client).

View File

@@ -1,12 +1,4 @@
--- # Accessing Slurm Cluster
title: Accessing Slurm Cluster
#tags:
keywords: slurm, batch system, merlin5, merlin7, gmerlin7, cpu, gpu
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-access.html
---
## The Merlin Slurm clusters ## The Merlin Slurm clusters

View File

@@ -1,12 +1,4 @@
--- # Code Of Conduct
title: Code Of Conduct
#tags:
keywords: code of conduct, rules, principle, policy, policies, administrator, backup
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/code-of-conduct.html
---
## The Basic principle ## The Basic principle

View File

@@ -1,14 +1,4 @@
--- # Introduction
title: Introduction
#tags:
keywords: introduction, home, welcome, architecture, design
last_updated: 07 September 2022
sidebar: merlin7_sidebar
permalink: /merlin7/introduction.html
redirect_from:
- /merlin7
- /merlin7/index.html
---
## About Merlin7 ## About Merlin7

View File

@@ -1,12 +1,4 @@
--- # Requesting Merlin Accounts
title: Requesting Merlin Accounts
#tags:
keywords: registration, register, account, merlin5, merlin7, snow, service now
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/request-account.html
---
## Requesting Access to Merlin7 ## Requesting Access to Merlin7

View File

@@ -1,12 +1,4 @@
--- # Requesting a Merlin Project
title: Requesting a Merlin Project
#tags:
keywords: merlin project, project, snow, service now
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/request-project.html
---
A project owns its own storage area in Merlin, which can be accessed by other group members. A project owns its own storage area in Merlin, which can be accessed by other group members.
@@ -62,6 +54,7 @@ The owner of the group is the person who will be allowed to modify the group.
### Requesting Unix group membership ### Requesting Unix group membership
Existing Merlin projects have already a Unix group assigned. To have access to a project, users must belong to the proper **Unix group** owning that project. Existing Merlin projects have already a Unix group assigned. To have access to a project, users must belong to the proper **Unix group** owning that project.
Supervisors should inform new users which extra groups are needed for their project(s). If this information is not known, one can check the permissions of that directory. For example: Supervisors should inform new users which extra groups are needed for their project(s). If this information is not known, one can check the permissions of that directory. For example:
```bash ```bash
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/general/$projectname (base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/general/$projectname
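# To check which Unix groups your account already belongs to (and hence whether
# you already have access), standard commands such as these can be used; the
# group name below is only an example:
id -Gn $USER
getent group unx-myproject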

View File

@@ -1,12 +1,4 @@
--- # Archive & PSI Data Catalog
title: Archive & PSI Data Catalog
#tags:
keywords: linux, archive, data catalog, archiving, lts, tape, long term storage, ingestion, datacatalog
last_updated: 31 January 2020
summary: "This document describes how to use the PSI Data Catalog for archiving Merlin7 data."
sidebar: merlin7_sidebar
permalink: /merlin7/archive.html
---
## PSI Data Catalog as a PSI Central Service ## PSI Data Catalog as a PSI Central Service
@@ -55,11 +47,11 @@ Below are the main steps for using the Data Catalog.
**``/data/project``**. It would be also necessary when the Merlin export server (**``merlin-archive.psi.ch``**) **``/data/project``**. It would be also necessary when the Merlin export server (**``merlin-archive.psi.ch``**)
is down for any reason. is down for any reason.
* Archive the dataset: * Archive the dataset:
* Visit [https://discovery.psi.ch](https://discovery.psi.ch) * Visit <https://discovery.psi.ch>
* Click **``Archive``** for the dataset * Click **``Archive``** for the dataset
* The system will now copy the data to the PetaByte Archive at CSCS * The system will now copy the data to the PetaByte Archive at CSCS
* Retrieve data from the catalog: * Retrieve data from the catalog:
* Find the dataset on [https://discovery.psi.ch](https://discovery.psi.ch) and click **``Retrieve``** * Find the dataset on <https://discovery.psi.ch> and click **``Retrieve``**
* Wait for the data to be copied to the PSI retrieval system * Wait for the data to be copied to the PSI retrieval system
* Run **``datasetRetriever``** script * Run **``datasetRetriever``** script
@@ -179,6 +171,7 @@ datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json
You will be asked whether you want to copy the data to the central system: You will be asked whether you want to copy the data to the central system:
* If you are on the Merlin cluster and you are archiving data from ``/data/user`` or ``/data/project``, answer 'no' since the data catalog can * If you are on the Merlin cluster and you are archiving data from ``/data/user`` or ``/data/project``, answer 'no' since the data catalog can
directly read the data. directly read the data.
* If you are on a directory other than ``/data/user`` and ``/data/project``, or you are on a desktop computer, answer 'yes'. Copying large datasets * If you are on a directory other than ``/data/user`` and ``/data/project``, or you are on a desktop computer, answer 'yes'. Copying large datasets
to the PSI archive system may take quite a while (minutes to hours). to the PSI archive system may take quite a while (minutes to hours).
@@ -188,7 +181,7 @@ This is important, since the next step is for the system to copy all the data to
this process may take several days, and it will fail if any modifications are detected. this process may take several days, and it will fail if any modifications are detected.
If using the ``--autoarchive`` option as suggested above, your dataset should now be in the queue. Check the data catalog: If using the ``--autoarchive`` option as suggested above, your dataset should now be in the queue. Check the data catalog:
[https://discovery.psi.ch](https://discovery.psi.ch). Your job should have status 'WorkInProgress'. You will receive an email when the ingestion <https://discovery.psi.ch>. Your job should have status 'WorkInProgress'. You will receive an email when the ingestion
is complete. is complete.
If you didn't use ``--autoarchive``, you need to manually move the dataset into the archive queue. From **discovery.psi.ch**, navigate to the 'Archive' If you didn't use ``--autoarchive``, you need to manually move the dataset into the archive queue. From **discovery.psi.ch**, navigate to the 'Archive'
@@ -271,7 +264,6 @@ step will take a long time and may appear to have hung. You can check what files
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json /data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
2019/11/06 11:04:43 Latest version: 1.1.11 2019/11/06 11:04:43 Latest version: 1.1.11
2019/11/06 11:04:43 Your version of this program is up-to-date 2019/11/06 11:04:43 Your version of this program is up-to-date
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment... 2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
2019/11/06 11:04:43 Your username: 2019/11/06 11:04:43 Your username:
@@ -321,7 +313,6 @@ user_n@pb-archive.psi.ch's password:
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case). 2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
The data must first be copied to a rsync cache server. The data must first be copied to a rsync cache server.
2019/11/06 11:05:04 Do you want to continue (Y/n)? 2019/11/06 11:05:04 Do you want to continue (Y/n)?
Y Y
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012 2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
@@ -359,7 +350,7 @@ user_n@pb-archive.psi.ch's password:
### Publishing ### Publishing
After datasets are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets available on http://doi.psi.ch. After datasets are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets available on <http://doi.psi.ch>.
For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8). For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8).

View File

@@ -1,12 +1,4 @@
--- # Connecting from a Linux Client
title: Connecting from a Linux Client
#tags:
keywords: linux, connecting, client, configuration, SSH, X11
last_updated: 07 September 2022
summary: "This document describes a recommended setup for a Linux client."
sidebar: merlin7_sidebar
permalink: /merlin7/connect-from-linux.html
---
## SSH without X11 Forwarding ## SSH without X11 Forwarding

View File

@@ -1,12 +1,4 @@
--- # Connecting from a MacOS Client
title: Connecting from a MacOS Client
#tags:
keywords: MacOS, mac os, mac, connecting, client, configuration, SSH, X11
last_updated: 07 September 2022
summary: "This document describes a recommended setup for a MacOS client."
sidebar: merlin7_sidebar
permalink: /merlin7/connect-from-macos.html
---
## SSH without X11 Forwarding ## SSH without X11 Forwarding

View File

@@ -6,6 +6,7 @@ PuTTY is one of the most common tools for SSH.
Check if the following software packages are installed on the Windows workstation by Check if the following software packages are installed on the Windows workstation by
inspecting the *Start* menu (hint: use the *Search* box to save time): inspecting the *Start* menu (hint: use the *Search* box to save time):
* PuTTY (should be already installed) * PuTTY (should be already installed)
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding)) * *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
@@ -21,7 +22,6 @@ If they are missing, you can install them using the Software Kiosk icon on the D
![Create Merlin Session](../../images/PuTTY/Putty_Session.png) ![Create Merlin Session](../../images/PuTTY/Putty_Session.png)
## SSH with PuTTY with X11 Forwarding ## SSH with PuTTY with X11 Forwarding
Official X11 Forwarding support is through NoMachine. Please follow the document Official X11 Forwarding support is through NoMachine. Please follow the document

View File

@@ -1,12 +1,4 @@
--- # Kerberos and AFS authentication
title: Kerberos and AFS authentication
#tags:
keywords: kerberos, AFS, kinit, klist, keytab, tickets, connecting, client, configuration, slurm
last_updated: 07 September 2022
summary: "This document describes how to use Kerberos."
sidebar: merlin7_sidebar
permalink: /merlin7/kerberos.html
---
Projects and users have their own areas in the central PSI AFS service. In order Projects and users have their own areas in the central PSI AFS service. In order
to access these areas, valid Kerberos and AFS tickets must be granted. to access these areas, valid Kerberos and AFS tickets must be granted.
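As a quick reference, the typical ticket workflow looks roughly like this (the realm and AFS cell come from the central configuration, so the bare commands are normally sufficient):

```bash
kinit     # obtain a Kerberos ticket (prompts for your PSI password)
aklog     # derive an AFS token from the Kerberos ticket
klist     # list the current Kerberos tickets
tokens    # list the current AFS tokens
```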

View File

@@ -10,10 +10,8 @@ provides a helpful wrapper over the Gnome storage utilities (GIO and GVFS), and
- FTP, SFTP - FTP, SFTP
- [complete list](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome) - [complete list](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
## Usage ## Usage
### Start a session ### Start a session
First, start a new session. This will start a new bash shell in the current terminal where you can add further commands. First, start a new session. This will start a new bash shell in the current terminal where you can add further commands.
@@ -47,7 +45,6 @@ Other endpoints can be mounted using the `merlin_rmount --mount <endpoint>` comm
![merlin_rmount --mount](../../images/rmount/mount.png) ![merlin_rmount --mount](../../images/rmount/mount.png)
### Accessing Files ### Accessing Files
After mounting a volume the script will print the mountpoint. It should be of the form After mounting a volume the script will print the mountpoint. It should be of the form
@@ -67,7 +64,6 @@ ln -s ~/mnt /run/user/$UID/gvfs
Files are accessible as long as the `merlin_rmount` shell remains open. Files are accessible as long as the `merlin_rmount` shell remains open.
### Disconnecting ### Disconnecting
To disconnect, close the session with one of the following: To disconnect, close the session with one of the following:
@@ -78,7 +74,6 @@ To disconnect, close the session with one of the following:
Disconnecting will unmount all volumes. Disconnecting will unmount all volumes.
## Alternatives ## Alternatives
### Thunar ### Thunar

View File

@@ -1,12 +1,4 @@
--- # Merlin7 Tools
title: Merlin7 Tools
#tags:
keywords: merlin_quotas
#last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/tools.html
---
## About ## About
@@ -105,4 +97,3 @@ If you are added/removed from a project, you can update this config file by
calling `merlin_quotas genconf --force` (notice the `--force`, which will overwrite calling `merlin_quotas genconf --force` (notice the `--force`, which will overwrite
your existing config file) or by editing the file by hand (*not recommended*). your existing config file) or by editing the file by hand (*not recommended*).

View File

@@ -1,10 +1,4 @@
--- # Remote Desktop Access to Merlin7
title: Remote Desktop Access to Merlin7
keywords: NX, NoMachine, remote desktop access, login node, login001, login002, merlin7-nx-01, merlin7-nx, nx.psi.ch, VPN, browser access
last_updated: 07 August 2024
sidebar: merlin7_sidebar
permalink: /merlin7/nomachine.html
---
## Overview ## Overview
@@ -21,7 +15,7 @@ If you are inside the PSI network, you can directly connect to the Merlin7 NoMac
#### Method 1: Using a Web Browser #### Method 1: Using a Web Browser
Open your web browser and navigate to [https://merlin7-nx.psi.ch:4443](https://merlin7-nx.psi.ch:4443). Open your web browser and navigate to <https://merlin7-nx.psi.ch:4443>.
#### Method 2: Using the NoMachine Client #### Method 2: Using the NoMachine Client
@@ -42,7 +36,7 @@ Documentation about the `nx.psi.ch` service can be found [here](https://www.psi.
##### Using a Web Browser ##### Using a Web Browser
Open your web browser and navigate to [https://nx.psi.ch](https://nx.psi.ch). Open your web browser and navigate to <https://nx.psi.ch>.
##### Using the NoMachine Client ##### Using the NoMachine Client

View File

@@ -1,12 +1,4 @@
--- # Software repositories
title: Software repositories
#tags:
keywords: modules, software, stable, unstable, deprecated, spack, repository, repositories
last_updated: 16 January 2024
summary: "This page contains information about the different software repositories"
sidebar: merlin7_sidebar
permalink: /merlin7/software-repositories.html
---
## Module Systems in Merlin7 ## Module Systems in Merlin7

View File

@@ -1,13 +1,4 @@
--- # Configuring SSH Keys in Merlin
title: Configuring SSH Keys in Merlin
#tags:
keywords: linux, connecting, client, configuration, SSH, Keys, SSH-Keys, RSA, authorization, authentication
last_updated: 15 Jul 2020
summary: "This document describes how to deploy SSH Keys in Merlin."
sidebar: merlin7_sidebar
permalink: /merlin7/ssh-keys.html
---
Merlin users will sometimes need to access the different Merlin services without being constantly asked for a password. Merlin users will sometimes need to access the different Merlin services without being constantly asked for a password.
One can achieve this with Kerberos authentication; however, in some cases software requires the setup of SSH keys. One can achieve this with Kerberos authentication; however, in some cases software requires the setup of SSH keys.
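As a generic sketch only (the page's own instructions take precedence; the hostname is one of the Merlin7 login nodes mentioned elsewhere in this documentation):

```bash
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519                    # generate a key pair, ideally with a passphrase
ssh-copy-id -i ~/.ssh/id_ed25519.pub login001.merlin7.psi.ch  # append the public key to ~/.ssh/authorized_keys
ssh login001.merlin7.psi.ch                                   # should no longer prompt for a password
```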

View File

@@ -1,13 +1,4 @@
--- # Merlin7 Storage
title: Merlin7 Storage
#tags:
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
#last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---
## Introduction ## Introduction

View File

@@ -1,26 +1,19 @@
--- # Transferring Data
title: Transferring Data
#tags:
keywords: transferring data, data transfer, rsync, winscp, copy data, copying, sftp, import, export, hop, vpn
last_updated: 24 August 2023
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/transfer-data.html
---
## Overview ## Overview
Most data transfer methods support both sending and receiving, so you may initiate the transfer from either **Merlin** or the other system — depending on **network visibility**. Most data transfer methods support both sending and receiving, so you may initiate the transfer from either **Merlin** or the other system — depending on **network visibility**.
- **From PSI Network to Merlin:** Merlin login nodes are visible from the PSI network, so direct transfers using `rsync`, or **ftp** are generally preferable. Transfers **from Merlin7 to PSI may require special firewall rules**.
- **From Merlin to the Internet:** Merlin login nodes can access the internet with a **limited set of protocols**:
- HTTP-based protocols on ports `80` or `445` (e.g., HTTPS, WebDAV).
- Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.
- **From the Internet to PSI:** Systems outside PSI can access the [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) at `datatransfer.psi.ch` using SSH-based protocols or [Globus](https://www.globus.org/).
> SSH-based protocols using port `22` **to most PSI servers** are generally **not permitted**. * **From PSI Network to Merlin:** Merlin login nodes are visible from the PSI network, so direct transfers using `rsync`, or **ftp** are generally preferable. Transfers **from Merlin7 to PSI may require special firewall rules**.
> * However, **transfers from any PSI host to Merlin7 using port 22 are allowed**. * **From Merlin to the Internet:** Merlin login nodes can access the internet with a **limited set of protocols**:
> * HTTP-based protocols on ports `80` or `445` (e.g., HTTPS, WebDAV).
> Port `21` is also available for FTP transfers from PSI to Merlin7. * Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.
* **From the Internet to PSI:** Systems outside PSI can access the [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) at `datatransfer.psi.ch` using SSH-based protocols or [Globus](https://www.globus.org/).
!!! note
SSH-based protocols using port `22` **to most PSI servers** are generally **not permitted**.
However, **transfers from any PSI host to Merlin7 using port 22 are allowed**.
Port `21` is also available for FTP transfers from PSI to Merlin7.
### Choosing the best transfer method ### Choosing the best transfer method
@@ -46,6 +39,7 @@ The following methods transfer data directly via the [login nodes](../01-Quick-S
### Rsync (Recommended for Linux/macOS) ### Rsync (Recommended for Linux/macOS)
Rsync is the **preferred** method for small datasets from Linux/macOS systems. It supports **resuming interrupted transfers** and **skips already transferred files**. Syntax: Rsync is the **preferred** method for small datasets from Linux/macOS systems. It supports **resuming interrupted transfers** and **skips already transferred files**. Syntax:
```bash ```bash
rsync -avAHXS <src> <dst> rsync -avAHXS <src> <dst>
``` ```
@@ -65,12 +59,15 @@ rsync -avAHXS ~/localdata $USER@login001.merlin7.psi.ch:/data/project/general/my
### SCP ### SCP
SCP works similarly to `rsync` but **does not support resuming** interrupted transfers. It may be used for quick one-off transfers. Example: SCP works similarly to `rsync` but **does not support resuming** interrupted transfers. It may be used for quick one-off transfers. Example:
```bash ```bash
scp ~/localfile.txt $USER@login001.merlin7.psi.ch:/data/project/general/myproject/ scp ~/localfile.txt $USER@login001.merlin7.psi.ch:/data/project/general/myproject/
``` ```
### Secure FTP ### Secure FTP
A `vsftpd` service is available on the login nodes, providing high-speed transfers. Choose the server based on your **speed vs. encryption** needs: A `vsftpd` service is available on the login nodes, providing high-speed transfers. Choose the server based on your **speed vs. encryption** needs:
* **`login001.merlin7.psi.ch`:** Encrypted control & data channels. * **`login001.merlin7.psi.ch`:** Encrypted control & data channels.
**Use if your data is sensitive**. **Slower**, but secure. **Use if your data is sensitive**. **Slower**, but secure.
* **`service03.merlin7.psi.ch`**: Encrypted control channel only. * **`service03.merlin7.psi.ch`**: Encrypted control channel only.
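As an illustration only, assuming an FTPS-capable command-line client such as `lftp` is available on your machine, a connection to the fully encrypted endpoint could look like this:

```bash
# Explicit FTPS on port 21 with encrypted control and data channels
lftp -u $USER -e "set ftp:ssl-force true; set ftp:ssl-protect-data true" login001.merlin7.psi.ch
```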
@@ -80,9 +77,11 @@ A `vsftpd` service is available on the login nodes, providing high-speed transfe
The **control channel** is always **encrypted**, therefore, authentication is encrypted and secured. The **control channel** is always **encrypted**, therefore, authentication is encrypted and secured.
## UI-based Clients for Data Transfer ## UI-based Clients for Data Transfer
### WinSCP (Windows) ### WinSCP (Windows)
Available in the **Software Kiosk** on PSI Windows machines. Available in the **Software Kiosk** on PSI Windows machines.
* Using your PSI credentials, connect to * Using your PSI credentials, connect to
* when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`. * when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
* when using port 21, connect to: * when using port 21, connect to:
@@ -95,6 +94,7 @@ Available in the **Software Kiosk** on PSI Windows machines.
### FileZilla (Linux/MacOS/Windows) ### FileZilla (Linux/MacOS/Windows)
Download from [FileZilla Project](https://filezilla-project.org/), or install from your Linux software repositories if available. Download from [FileZilla Project](https://filezilla-project.org/), or install from your Linux software repositories if available.
* Using your PSI credentials, connect to * Using your PSI credentials, connect to
* when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`. * when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
* when using port 21, connect to: * when using port 21, connect to:
@@ -105,20 +105,23 @@ Download from [FileZilla Project](https://filezilla-project.org/), or install fr
## Sharing Files with SWITCHfilesender ## Sharing Files with SWITCHfilesender
**[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is a Swiss-hosted installation of the [FileSender](https://filesender.org/) project — a web-based application that allows authenticated users to securely and easily send **arbitrarily large files** to other users. Features: **[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is a Swiss-hosted installation of the [FileSender](https://filesender.org/) project — a web-based application that allows authenticated users to securely and easily send **arbitrarily large files** to other users. Features:
- **Secure large file transfers:** Send files that exceed normal email attachment limits.
- **Time-limited availability:** Files are automatically deleted after the chosen expiration date or number of downloads. * **Secure large file transfers:** Send files that exceed normal email attachment limits.
- **Voucher system:** Authenticated users can send upload vouchers to external recipients without an account. * **Time-limited availability:** Files are automatically deleted after the chosen expiration date or number of downloads.
- **Designed for research & education:** Developed to meet the needs of universities and research institutions. * **Voucher system:** Authenticated users can send upload vouchers to external recipients without an account.
* **Designed for research & education:** Developed to meet the needs of universities and research institutions.
About the authentication: About the authentication:
- It uses **SimpleSAMLphp**, supporting multiple authentication mechanisms: SAML2, LDAP, RADIUS and more.
- It's fully integrated with PSI's **Authentication and Authorization Infrastructure (AAI)**. * It uses **SimpleSAMLphp**, supporting multiple authentication mechanisms: SAML2, LDAP, RADIUS and more.
- PSI employees can log in using their PSI account: * It's fully integrated with PSI's **Authentication and Authorization Infrastructure (AAI)**.
* PSI employees can log in using their PSI account:
1. Open [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload). 1. Open [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload).
2. Select **PSI** as the institution. 2. Select **PSI** as the institution.
3. Authenticate with your PSI credentials. 3. Authenticate with your PSI credentials.
The service is designed to **send large files for temporary availability**, not as a permanent publishing platform. Typical use case: The service is designed to **send large files for temporary availability**, not as a permanent publishing platform. Typical use case:
1. Upload a file. 1. Upload a file.
2. Share the download link with a recipient. 2. Share the download link with a recipient.
3. The file remains available until the specified **expiration date** or **download limit** is reached. 3. The file remains available until the specified **expiration date** or **download limit** is reached.
@@ -134,6 +137,7 @@ From August 2024, Merlin is connected to the **[PSI Data Transfer](https://www.p
[reported](../99-support/contact.md) to the Merlin administrators, who will forward the request if necessary. [reported](../99-support/contact.md) to the Merlin administrators, who will forward the request if necessary.
The PSI Data Transfer servers support the following protocols: The PSI Data Transfer servers support the following protocols:
* Data Transfer - SSH (scp / rsync) * Data Transfer - SSH (scp / rsync)
* Data Transfer - Globus * Data Transfer - Globus
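As a hedged sketch for the SSH-based option (the remote path is purely illustrative; check the PSI Data Transfer documentation for the actual directory layout exposed for Merlin):

```bash
# Push a local dataset to Merlin through the PSI Data Transfer service using rsync over SSH
# "your_psi_username" and the destination path are placeholders, not confirmed values
rsync -avAHXS ~/dataset/ your_psi_username@datatransfer.psi.ch:/merlin/data/project/general/myproject/
```

Authentication uses your PSI credentials together with the multi-factor authentication noted below.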
@@ -150,27 +154,25 @@ Therefore, having the Microsoft Authenticator App is required as explained [here
## Connecting to Merlin7 from outside PSI ## Connecting to Merlin7 from outside PSI
Merlin7 is fully accessible from within the PSI network. To connect from outside you can use: Merlin7 is fully accessible from within the PSI network. To connect from outside you can use:
- [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
- [SSH hopx](https://www.psi.ch/en/computing/ssh-hop) * [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
* [SSH hopx](https://www.psi.ch/en/computing/ssh-hop)
* Please avoid transferring large amounts of data through **hop** * Please avoid transferring large amounts of data through **hop**
- [No Machine](nomachine.md) * [No Machine](nomachine.md)
* Remote Interactive Access through [**'nx.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access) * Remote Interactive Access through [**'nx.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access)
* Please avoid transferring large amounts of data through **NoMachine** * Please avoid transferring large amounts of data through **NoMachine**
{% comment %}
## Connecting from Merlin7 to outside file shares ## Connecting from Merlin7 to outside file shares
### `merlin_rmount` command ### `merlin_rmount` command
Merlin provides a command for mounting remote file systems, called `merlin_rmount`. This Merlin provides a command for mounting remote file systems, called `merlin_rmount`. This
provides a helpful wrapper over the Gnome storage utilities and supports a wide range of remote protocols, including provides a helpful wrapper over the Gnome storage utilities and supports a wide range of remote protocols, including
- SMB/CIFS (Windows shared folders)
- WebDav
- AFP
- FTP, SFTP
- [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
* SMB/CIFS (Windows shared folders)
* WebDav
* AFP
* FTP, SFTP
* [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
[More instruction on using `merlin_rmount`](merlin-rmount.md) [More instruction on using `merlin_rmount`](merlin-rmount.md)
{% endcomment %}
@@ -119,6 +119,7 @@ how to connect to the **NoMachine** service in the Merlin cluster.
For other non officially supported graphical access (X11 forwarding): For other non officially supported graphical access (X11 forwarding):
* For Linux clients, please follow [{How To Use Merlin -> Accessing from Linux Clients}](../02-How-To-Use-Merlin/connect-from-linux.md) * For Linux clients, please follow [{How To Use Merlin -> Accessing from Linux Clients}](../02-How-To-Use-Merlin/connect-from-linux.md)
* For Windows clients, please follow [{How To Use Merlin -> Accessing from Windows Clients}](../02-How-To-Use-Merlin/connect-from-windows.md) * For Windows clients, please follow [{How To Use Merlin -> Accessing from Windows Clients}](../02-How-To-Use-Merlin/connect-from-windows.md)
* For MacOS clients, please follow [{How To Use Merlin -> Accessing from MacOS Clients}](../02-How-To-Use-Merlin/connect-from-macos.md) * For MacOS clients, please follow [{How To Use Merlin -> Accessing from MacOS Clients}](../02-How-To-Use-Merlin/connect-from-macos.md)
@@ -1,12 +1,4 @@
--- # Slurm cluster 'merlin7'
title: Slurm cluster 'merlin7'
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 Mai 2023
summary: "This document describes a summary of the Merlin7 configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/merlin7-configuration.html
---
This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster. This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.
@@ -1,12 +1,4 @@
--- # Slurm merlin7 Configuration
title: Slurm merlin7 Configuration
#tags:
keywords: configuration, partitions, node definition
#last_updated: 24 Mai 2023
summary: "This document describes a summary of the Merlin7 Slurm CPU-based configuration."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-configuration.html
---
This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster. This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.
@@ -53,6 +45,7 @@ However, when necessary, one can specify the cluster as follows:
### CPU general configuration ### CPU general configuration
The **Merlin7 CPU cluster** is configured with the **`CR_CORE_MEMORY`** and **`CR_ONE_TASK_PER_CORE`** options. The **Merlin7 CPU cluster** is configured with the **`CR_CORE_MEMORY`** and **`CR_ONE_TASK_PER_CORE`** options.
* This configuration treats both cores and memory as consumable resources. * This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU * Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
to fulfill a job's resource requirements. to fulfill a job's resource requirements.
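In practice, a CPU job request under these settings might look like the following minimal sketch (the executable and resource values are placeholders, not recommended settings):

```bash
#!/bin/bash
#SBATCH --clusters=merlin7      # CPU cluster
#SBATCH --ntasks=4              # 4 tasks; with hyper-threading on, each task occupies one core (2 CPUs)
#SBATCH --cpus-per-task=2       # both hardware threads of the core are assigned to the task
#SBATCH --time=01:00:00

srun ./my_app                   # placeholder executable
```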
@@ -223,12 +216,14 @@ For submittng jobs to the GPU cluster, **the cluster name `gmerlin7` must be spe
### GPU general configuration ### GPU general configuration
The **Merlin7 GPU cluster** is configured with the **`CR_CORE_MEMORY`**, **`CR_ONE_TASK_PER_CORE`**, and **`ENFORCE_BINDING_GRES`** options. The **Merlin7 GPU cluster** is configured with the **`CR_CORE_MEMORY`**, **`CR_ONE_TASK_PER_CORE`**, and **`ENFORCE_BINDING_GRES`** options.
* This configuration treats both cores and memory as consumable resources. * This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU * Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
to fulfill a job's resource requirements. to fulfill a job's resource requirements.
* Slurm will allocate the CPUs to the selected GPU. * Slurm will allocate the CPUs to the selected GPU.
By default, Slurm will allocate one task per core, which means: By default, Slurm will allocate one task per core, which means:
* For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job. * For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job.
* For the NVIDIA GraceHopper-based nodes, each task will consume 1 **CPU**. * For the NVIDIA GraceHopper-based nodes, each task will consume 1 **CPU**.
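A minimal GPU job sketch under these settings could look as follows (resource values and the executable are illustrative assumptions only):

```bash
#!/bin/bash
#SBATCH --clusters=gmerlin7     # the GPU cluster must be named explicitly
#SBATCH --gpus=1                # one GPU; the allocated CPUs are bound to it (ENFORCE_BINDING_GRES)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=02:00:00

srun ./my_gpu_app               # placeholder executable
```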
@@ -251,6 +246,7 @@ Notes on memory configuration:
* **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread). * **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).
The total memory requested cannot exceed the **`MaxMemPerNode`** value. The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core, * **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
adjusted. adjusted.
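For example (memory values are arbitrary, `job.sh` is a placeholder script, and the usual `--clusters` option is omitted for brevity):

```bash
# 8 tasks x 2 CPUs (hyper-threaded cores) x 2000 MB per CPU = 32000 MB granted to the job
sbatch --ntasks=8 --mem-per-cpu=2000 job.sh

# With --hint=nomultithread each core counts as a single CPU, so the same request
# would be granted only half the memory; double --mem-per-cpu to keep the total constant
sbatch --ntasks=8 --hint=nomultithread --mem-per-cpu=4000 job.sh
```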
@@ -278,7 +274,9 @@ effectively.
To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
**to specific partitions** in line with the established resource allocation policies. The table below outlines the **to specific partitions** in line with the established resource allocation policies. The table below outlines the
various QoS definitions applicable to the merlin7 CPU-based cluster. Here: various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
* `MaxTRES` specifies resource limits per job. * `MaxTRES` specifies resource limits per job.
* `MaxTRESPU` specifies resource limits per user. * `MaxTRESPU` specifies resource limits per user.
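To inspect the QoS limits currently in force, a query along these lines should work from a login node (field widths are cosmetic; available columns may vary with the Slurm version):

```bash
# List the defined QoS policies with their per-job and per-user resource limits
sacctmgr show qos format=Name%20,MaxTRES%30,MaxTRESPerUser%30,MaxWall
```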
@@ -1,12 +1,4 @@
--- # Slurm Examples
title: Slurm Examples
#tags:
keywords: slurm example, template, examples, templates, running jobs, sbatch, single core based jobs, HT, multithread, no-multithread, mpi, openmp, packed jobs, hands-on, array jobs, gpu
last_updated: 24 Mai 2023
summary: "This document shows different template examples for running jobs in the Merlin cluster."
sidebar: merlin7_sidebar
permalink: /merlin7/slurm-examples.html
---
## Single core based job examples ## Single core based job examples
@@ -1,12 +1,4 @@
--- # Jupyterhub on Merlin7
title: Jupyterhub on Merlin7
#tags:
keywords: jupyterhub, jupyter, jupyterlab, notebook, notebooks
last_updated: 24 July 2025
summary: "Jupyterhub service description"
sidebar: merlin7_sidebar
permalink: /merlin7/jupyterhub.html
---
Jupyterhub provides [jupyter notebooks](https://jupyter.org/) that are launched on Jupyterhub provides [jupyter notebooks](https://jupyter.org/) that are launched on
cluster nodes of merlin and can be accessed through a web portal. cluster nodes of merlin and can be accessed through a web portal.
@@ -1,12 +1,4 @@
--- # ANSYS RSM (Remote Resolve Manager)
title: ANSYS RSM (Remote Resolve Manager)
#tags:
keywords: software, ansys, rsm, slurm, interactive, rsm, windows
last_updated: 23 August 2024
summary: "This document describes how to use the ANSYS Remote Resolve Manager service in the Merlin7 cluster"
sidebar: merlin7_sidebar
permalink: /merlin7/ansys-rsm.html
---
## ANSYS Remote Resolve Manager ## ANSYS Remote Resolve Manager
@@ -34,6 +26,7 @@ The different steps and settings required to make it work are that following:
3. In the **HPC Resource** tab, fill up the corresponding fields as follows: 3. In the **HPC Resource** tab, fill up the corresponding fields as follows:
![HPC Resource](../../images/ANSYS/merlin7/rsm-2-add_cluster.png) ![HPC Resource](../../images/ANSYS/merlin7/rsm-2-add_cluster.png)
* **"Name"**: Add here the preffered name for the cluster. For example: `Merlin7 cluster` * **"Name"**: Add here the preffered name for the cluster. For example: `Merlin7 cluster`
* **"HPC Type"**: Select `SLURM` * **"HPC Type"**: Select `SLURM`
* **"Submit host"**: `service03.merlin7.psi.ch` * **"Submit host"**: `service03.merlin7.psi.ch`
* **"Slurm Job submission arguments (optional)"**: Add any required Slurm options for running your jobs. * **"Slurm Job submission arguments (optional)"**: Add any required Slurm options for running your jobs.
@@ -1,12 +1,4 @@
--- # ANSYS
title: ANSYS
#tags:
keywords: software, ansys, slurm, interactive, rsm, pmodules, overlay, overlays
last_updated: 23 August 2024
summary: "This document describes how to load and use ANSYS in the Merlin7 cluster"
sidebar: merlin7_sidebar
permalink: /merlin7/ansys.html
---
This document describes generic information of how to load and run ANSYS software in the Merlin cluster This document describes generic information of how to load and run ANSYS software in the Merlin cluster
@@ -22,7 +14,6 @@ Merlin high performance storage and we have made it available from Pmodules.
### Loading Merlin7 ANSYS ### Loading Merlin7 ANSYS
```bash ```bash
module purge module purge
module use unstable # Optional module use unstable # Optional
@@ -69,7 +60,6 @@ ANSYS/2025R2:
</pre> </pre>
</details> </details>
!!! tip !!! tip
Please always run **ANSYS/2024R2 or superior**. Please always run **ANSYS/2024R2 or superior**.
@@ -1,11 +1,4 @@
--- # CP2k
title: CP2k
keywords: CP2k software, compile
summary: "CP2k is a quantum chemistry and solid state physics software package"
sidebar: merlin7_sidebar
toc: false
permalink: /merlin7/cp2k.html
---
## CP2k ## CP2k
@@ -131,14 +124,13 @@ module purge
module use Spack unstable module use Spack unstable
module load gcc/12.3 openmpi/5.0.8-r5lz-A100-gpu dbcsr/2.8.0-3r22-A100-gpu-omp cosma/2.7.0-y2tr-gpu cuda/12.6.0-3y6a dftd4/3.7.0-4k4c-omp elpa/2025.01.002-bovg-A100-gpu-omp fftw/3.3.10-syba-omp hdf5/1.14.6-pcsd libint/2.11.1-3lxv libxc/7.0.0-u556 libxsmm/1.17-2azz netlib-scalapack/2.2.2-rmcf openblas/0.3.30-ynou-omp plumed/2.9.2-47hk py-fypp/3.1-z25p py-numpy/2.3.2-45ay python/3.13.5-qivs sirius/develop-qz4c-A100-gpu-omp spglib/2.5.0-jl5l-omp spla/1.6.1-hrgf-gpu cmake/3.31.8-j47l ninja/1.12.1-afxy module load gcc/12.3 openmpi/5.0.8-r5lz-A100-gpu dbcsr/2.8.0-3r22-A100-gpu-omp cosma/2.7.0-y2tr-gpu cuda/12.6.0-3y6a dftd4/3.7.0-4k4c-omp elpa/2025.01.002-bovg-A100-gpu-omp fftw/3.3.10-syba-omp hdf5/1.14.6-pcsd libint/2.11.1-3lxv libxc/7.0.0-u556 libxsmm/1.17-2azz netlib-scalapack/2.2.2-rmcf openblas/0.3.30-ynou-omp plumed/2.9.2-47hk py-fypp/3.1-z25p py-numpy/2.3.2-45ay python/3.13.5-qivs sirius/develop-qz4c-A100-gpu-omp spglib/2.5.0-jl5l-omp spla/1.6.1-hrgf-gpu cmake/3.31.8-j47l ninja/1.12.1-afxy
git clone https://github.com/cp2k/cp2k.git git clone https://github.com/cp2k/cp2k.git
cd cp2k cd cp2k
mkdir build && cd build mkdir build && cd build
CC=mpicc CXX=mpic++ FC=mpifort cmake -GNinja -DCMAKE_CUDA_HOST_COMPILER=mpicc -DCP2K_USE_LIBXC=ON -DCP2K_USE_LIBINT2=ON -DCP2K_USE_SPGLIB=ON -DCP2K_USE_ELPA=ON -DCP2K_USE_SPLA=ON -DCP2K_USE_SIRIUS=ON -DCP2K_USE_PLUMED=ON -DCP2K_USE_DFTD4=ON -DCP2K_USE_COSMA=ON -DCP2K_USE_ACCEL=CUDA -DCMAKE_CUDA_ARCHITECTURES=80 -DCP2K_USE_FFTW3=ON .. CC=mpicc CXX=mpic++ FC=mpifort cmake -GNinja -DCMAKE_CUDA_HOST_COMPILER=mpicc -DCP2K_USE_LIBXC=ON -DCP2K_USE_LIBINT2=ON -DCP2K_USE_SPGLIB=ON -DCP2K_USE_ELPA=ON -DCP2K_USE_SPLA=ON -DCP2K_USE_SIRIUS=ON -DCP2K_USE_PLUMED=ON -DCP2K_USE_DFTD4=ON -DCP2K_USE_COSMA=ON -DCP2K_USE_ACCEL=CUDA -DCMAKE_CUDA_ARCHITECTURES=80 -DCP2K_USE_FFTW3=ON ..
ninja -j 16 ninja -j 16
``` ```
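Once the build completes, a batch job could be launched roughly as follows; this is a sketch only, assuming the binary ends up in `build/bin/cp2k.psmp` and that the same modules as for the build are loaded in the job script (`my_input.inp` is a placeholder input file):

```bash
#!/bin/bash
#SBATCH --clusters=gmerlin7
#SBATCH --ntasks=4
#SBATCH --gpus=4
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./build/bin/cp2k.psmp -i my_input.inp
```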
#### GH200 #### GH200
[![Pipeline](https://gitea.psi.ch/HPCE/spack-psi/actions/workflows/cp2k_gh_merlin7.yml/badge.svg?branch=main)](https://gitea.psi.ch/HPCE/spack-psi) [![Pipeline](https://gitea.psi.ch/HPCE/spack-psi/actions/workflows/cp2k_gh_merlin7.yml/badge.svg?branch=main)](https://gitea.psi.ch/HPCE/spack-psi)
@@ -1,12 +1,4 @@
--- # Cray Programming Environment
title: Cray Programming Environment
#tags:
keywords: cray, module
last_updated: 24 Mai 2023
summary: "This document describes how to use the Cray Programming Environment on Merlin7."
sidebar: merlin7_sidebar
permalink: /merlin7/cray-module-env.html
---
## Loading the Cray module ## Loading the Cray module
@@ -38,7 +30,7 @@ as a whole called Programming Environment. In the Cray Programming Environment,
* `cray-libsci` is a collection of numerical routines tuned for performance on Cray systems. * `cray-libsci` is a collection of numerical routines tuned for performance on Cray systems.
* `libfabric` is an important low-level library that allows you to take advantage of the high performance Slingshot network. * `libfabric` is an important low-level library that allows you to take advantage of the high performance Slingshot network.
* `cray-mpich` is a CUDA-aware MPI implementation, optimized for Cray systems. * `cray-mpich` is a CUDA-aware MPI implementation, optimized for Cray systems.
* `cce` is the compiler from Cray. C/C++ compilers are based on Clang/LLVM while Fortran supports Fortran 2018 standard. More info: https://user.cscs.ch/computing/compilation/cray/ * `cce` is the compiler from Cray. C/C++ compilers are based on Clang/LLVM while Fortran supports Fortran 2018 standard. More info: <https://user.cscs.ch/computing/compilation/cray/>
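As a brief illustration of how these pieces fit together (a sketch only; the exact module name and wrapper behaviour on Merlin7 may differ), the Cray compiler wrappers `cc`, `CC` and `ftn` automatically link against `cray-mpich` and `cray-libsci`:

```bash
# Load the Cray programming environment and compile an MPI Fortran program
# my_solver.f90 is a placeholder source file
module load cray
ftn -O2 my_solver.f90 -o my_solver
```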
You can switch between different programming environments. You can check the available module with the `module avail` command, as follows: You can switch between different programming environments. You can check the available module with the `module avail` command, as follows:
@@ -1,11 +1,4 @@
--- # GROMACS
title: GROMACS
keywords: GROMACS software, compile
summary: "GROMACS (GROningen Machine for Chemical Simulations) is a versatile and widely-used open source package to perform molecular dynamics"
sidebar: merlin7_sidebar
toc: false
permalink: /merlin7/gromacs.html
---
## GROMACS ## GROMACS
@@ -1,11 +1,4 @@
--- # IPPL
title: IPPL
keywords: IPPL software, compile
summary: "Independent Parallel Particle Layer (IPPL) is a performance portable C++ library for Particle-Mesh methods"
sidebar: merlin7_sidebar
toc: false
permalink: /merlin7/ippl.html
---

@@ -1,11 +1,4 @@
--- # LAMMPS
title: LAMMPS
keywords: LAMMPS software, compile
summary: "LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state"
sidebar: merlin7_sidebar
toc: false
permalink: /merlin7/lammps.html
---

@@ -1,11 +1,4 @@
--- # OPAL-X
title: OPAL-X
keywords: OPAL-X software, compile
summary: "OPAL (Object Oriented Particle Accelerator Library) is an open source C++ framework for general particle accelerator simulations including 3D space charge, short range wake fields and particle matter interaction."
sidebar: merlin7_sidebar
toc: false
permalink: /merlin7/opal-x.html
---
## OPAL ## OPAL
@@ -1,12 +1,4 @@
--- # OpenMPI Support
title: OpenMPI Support
#tags:
last_updated: 15 January 2025
keywords: software, openmpi, slurm
summary: "This document describes how to use OpenMPI in the Merlin7 cluster"
sidebar: merlin7_sidebar
permalink: /merlin7/openmpi.html
---
## Introduction ## Introduction
@@ -1,12 +1,4 @@
--- # PSI Modules
title: PSI Modules
#tags:
keywords: Pmodules, software, stable, unstable, deprecated, overlay, overlays, release stage, module, package, packages, library, libraries
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/pmodules.html
---
## PSI Environment Modules ## PSI Environment Modules
@@ -1,11 +1,4 @@
--- # Quantum Espresso
title: Quantum Espresso
keywords: Quantum Espresso software, compile
summary: "Quantum Espresso code for electronic-structure calculations and materials modeling at the nanoscale"
sidebar: merlin7_sidebar
toc: false
permalink: /merlin7/quantum-espresso.html
---
## Quantum ESPRESSO ## Quantum ESPRESSO
@@ -121,7 +114,6 @@ module purge
module use Spack unstable module use Spack unstable
module load nvhpc/25.3 openmpi/5.0.7-e3bf-GH200-gpu fftw/3.3.10-sfpw-omp hdf5/develop-2.0-ztvo nvpl-blas/0.4.0.1-3zpg nvpl-lapack/0.3.0-ymy5 netlib-scalapack/2.2.2-qrhq cmake/3.31.6-5dl7 module load nvhpc/25.3 openmpi/5.0.7-e3bf-GH200-gpu fftw/3.3.10-sfpw-omp hdf5/develop-2.0-ztvo nvpl-blas/0.4.0.1-3zpg nvpl-lapack/0.3.0-ymy5 netlib-scalapack/2.2.2-qrhq cmake/3.31.6-5dl7
cd <path to QE source directory> cd <path to QE source directory>
mkdir build mkdir build
cd build cd build
@@ -1,11 +1,4 @@
--- # Spack
title: Spack
keywords: spack, python, software, compile
summary: "Spack the HPC package manager documentation"
sidebar: merlin7_sidebar
toc: false
permalink: /merlin7/spack.html
---
For Merlin7 the *package manager for supercomputing* [Spack](https://spack.io/) is available. It is meant to complement the existing PModules For Merlin7 the *package manager for supercomputing* [Spack](https://spack.io/) is available. It is meant to complement the existing PModules
solution, giving users the opportunity to manage their own software environments. solution, giving users the opportunity to manage their own software environments.
@@ -1,12 +1,4 @@
--- # Contact
title: Contact
#tags:
keywords: contact, support, snow, service now, mailing list, mailing, email, mail, merlin-admins@lists.psi.ch, merlin-users@lists.psi.ch, merlin users
last_updated: 15. Jan 2025
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/contact.html
---
## Support ## Support
@@ -16,10 +8,10 @@ Support can be asked through:
Basic contact information is also displayed on every shell login to the system using the *Message of the Day* mechanism. Basic contact information is also displayed on every shell login to the system using the *Message of the Day* mechanism.
### PSI Service Now ### PSI Service Now
**[PSI Service Now](https://psi.service-now.com/psisp)** is the official tool for opening incident requests. **[PSI Service Now](https://psi.service-now.com/psisp)** is the official tool for opening incident requests.
* PSI HelpDesk will redirect the incident to the corresponding department, or * PSI HelpDesk will redirect the incident to the corresponding department, or
* you can always assign it directly by checking the box `I know which service is affected` and providing the service name `Local HPC Resources (e.g. Merlin) [CF]` (just type in `Local` and you should get the valid completions). * you can always assign it directly by checking the box `I know which service is affected` and providing the service name `Local HPC Resources (e.g. Merlin) [CF]` (just type in `Local` and you should get the valid completions).
@@ -1,12 +1,3 @@
---
#tags:
keywords: merlin6, merlin7, migration, fpsync, rsync
#summary: ""
sidebar: merlin7_sidebar
last_updated: 28 May 2025
permalink: /merlin7/migrating.html
---
# Merlin6 to Merlin7 Migration Guide # Merlin6 to Merlin7 Migration Guide
Welcome to the official documentation for migrating your data from **Merlin6** to **Merlin7**. Please follow the instructions carefully to ensure a smooth and secure transition. Welcome to the official documentation for migrating your data from **Merlin6** to **Merlin7**. Please follow the instructions carefully to ensure a smooth and secure transition.
@@ -73,10 +64,12 @@ Before starting the migration, make sure you:
* If not yet registered, please do so following [these instructions](../01-Quick-Start-Guide/requesting-accounts.md) * If not yet registered, please do so following [these instructions](../01-Quick-Start-Guide/requesting-accounts.md)
* **have cleaned up your data to reduce migration time and space usage**. * **have cleaned up your data to reduce migration time and space usage**.
* **For the user data migration**, ensure your total usage on Merlin6 (`/psi/home`+`/data/user`) is **well below the 1TB quota** (use the `merlin_quotas` command). Remember: * **For the user data migration**, ensure your total usage on Merlin6 (`/psi/home`+`/data/user`) is **well below the 1TB quota** (use the `merlin_quotas` command). Remember:
* **Merlin7 also has a 1TB quota on your home directory**, and you might already have data there. * **Merlin7 also has a 1TB quota on your home directory**, and you might already have data there.
* If your usage exceeds this during the transfer, the process might fail. * If your usage exceeds this during the transfer, the process might fail.
* No activity should be performed on Merlin6 while the transfer is in progress. * No activity should be performed on Merlin6 while the transfer is in progress.
### Recommended Cleanup Actions ### Recommended Cleanup Actions
@@ -117,6 +110,7 @@ This script will:
* `~/merlin6home` → copy of your old home * `~/merlin6home` → copy of your old home
> ⚠️ **Important:** If `~/merlin6home` or `~/merlin6data` already exist on Merlin7, the script will exit. > ⚠️ **Important:** If `~/merlin6home` or `~/merlin6data` already exist on Merlin7, the script will exit.
> **Please remove them or contact support**. > **Please remove them or contact support**.
If there are issues, the script will: If there are issues, the script will:
@@ -181,10 +181,10 @@ nav:
- merlin6/99-support/known-problems.md - merlin6/99-support/known-problems.md
- merlin6/99-support/migration-from-merlin5.md - merlin6/99-support/migration-from-merlin5.md
- merlin6/99-support/troubleshooting.md - merlin6/99-support/troubleshooting.md
- PSI@CSCS:
- cscs-userlab/index.md
- cscs-userlab/transfer-data.md
- MeG: - MeG:
- meg/index.md - meg/index.md
- meg/contact.md - meg/contact.md
- meg/migration-to-merlin7.md - meg/migration-to-merlin7.md
- PSI@CSCS:
- cscs-userlab/index.md
- cscs-userlab/transfer-data.md