initial formatting changes complete
@@ -43,13 +43,13 @@ For 2025 we can offer access to [CSCS Alps](https://www.cscs.ch/computers/alps)

* [CSCS User Portal](https://user.cscs.ch/)
* Documentation
    * [CSCS Eiger CPU multicore cluster](https://docs.cscs.ch/clusters/eiger/)
    * [CSCS Daint GPU cluster](https://docs.cscs.ch/clusters/daint/)

## Contact information

* PSI Contacts:
    * Mailing list contact: <psi-hpc-at-cscs-admin@lists.psi.ch>
    * Marc Caubet Serrabou <marc.caubet@psi.ch>
    * Derek Feichtinger <derek.feichtinger@psi.ch>
* Mailing list for receiving user notifications and survey information: <psi-hpc-at-cscs@lists.psi.ch> [(subscribe)](https://psilists.ethz.ch/sympa/subscribe/psi-hpc-at-cscs)

@@ -1,12 +1,4 @@

# Introduction

## About Merlin6 GPU cluster

@@ -25,10 +17,10 @@ of **GPU**-based resources which are mostly used by the BIO experiments.

### Slurm 'gmerlin6'

The **GPU nodes** have a dedicated **Slurm** cluster, called **`gmerlin6`**.

This cluster provides the same shared storage resources (`/data/user`, `/data/project`, `/shared-scratch`, `/afs`, `/psi/home`)
as the other Merlin Slurm clusters (`merlin5`, `merlin6`). The Slurm `gmerlin6` cluster is maintained
independently to simplify access for the users and to keep separate user accounting.

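Since `gmerlin6` is a separate Slurm cluster, Slurm commands typically need to address it explicitly. A minimal sketch, assuming Slurm's standard multi-cluster option `--clusters` (short form `-M`) is how this cluster is selected:

```bash
sinfo --clusters=gmerlin6            # show partitions and nodes of the GPU cluster
sbatch --clusters=gmerlin6 job.sh    # submit a batch script to the GPU cluster
```
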
## Merlin6 Architecture
@@ -1,19 +1,11 @@

# Hardware And Software Description

## Hardware

### GPU Computing Nodes

The GPU Merlin6 cluster was initially built from recycled workstations from different groups in the BIO division.
Since then, it has been gradually extended with new nodes funded by sporadic investments from the same division; a single large central investment was never possible.
As a result, the Merlin6 GPU computing cluster is a non-homogeneous system, consisting of a wide variety of hardware types and components.

In 2018, for the common good, BIO decided to open the cluster to the Merlin users and make it widely accessible to PSI scientists.

@@ -145,6 +137,7 @@ ibstat | grep Rate

On the Merlin6 GPU computing nodes, we try to keep the software stack coherent with the main [Merlin6](../merlin6/index.md) cluster.

For this reason, the Merlin6 GPU nodes run:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)

@@ -1,12 +1,4 @@

# Slurm cluster 'gmerlin6'

This documentation shows basic Slurm configuration and options needed to run jobs in the GPU cluster.

@@ -49,30 +41,35 @@ Users might need to specify the Slurm partition. If no partition is specified, i

The table below shows all partitions available to users:

| GPU Partition          | Default Time | Max Time   | PriorityJobFactor   | PriorityTier     |
|:---------------------: | :----------: | :--------: | :-----------------: | :--------------: |
| `gpu`                  | 1 day        | 1 week     | 1                   | 1                |
| `gpu-short`            | 2 hours      | 2 hours    | 1000                | 500              |
| `gwendolen`            | 30 minutes   | 2 hours    | 1000                | 1000             |
| `gwendolen-long`       | 30 minutes   | 8 hours    | 1                   | 1                |

The **PriorityJobFactor** value is added to the job priority (*PARTITION* column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** may affect this decision). For the GPU
partitions, Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.

Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with lower **PriorityTier** values
and, if possible, they will preempt running jobs from partitions with lower **PriorityTier** values.

**gwendolen-long** is a special partition which is enabled during non-working
hours only. As of **Nov 2023**, the current policy is to disable this partition
from Mon to Fri, from 1am to 5pm. However, jobs can be submitted at any time, but
they can only be scheduled outside this time range.

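As a minimal sketch (partition names as in the table above), a job is directed to a specific GPU partition with the standard `--partition` option:

```bash
#SBATCH --partition=gpu-short   # one of: gpu, gpu-short, gwendolen, gwendolen-long
```
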
### Merlin6 GPU Accounts

Users need to ensure that the public **`merlin`** account is specified. Not specifying any account option will default to this account.

This is mostly relevant for users with multiple Slurm accounts, who might specify a different account by mistake.

```bash
#SBATCH --account=merlin   # Possible values: merlin, gwendolen
```

Not all the accounts can be used on all partitions. This is summarized in the table below:

| Slurm Account | Slurm Partitions |
@@ -82,14 +79,20 @@ Not all the accounts can be used on all partitions. This is resumed in the table

By default, all users belong to the `merlin` Slurm account, and jobs are submitted to the `gpu` partition when no partition is defined.

Users only need to specify the `gwendolen` account when using the `gwendolen` or `gwendolen-long` partitions; otherwise specifying an account is not needed (it will always default to `merlin`).

#### The 'gwendolen' account

For running jobs in the **`gwendolen`**/**`gwendolen-long`** partitions, users must
specify the **`gwendolen`** account. The `merlin` account is not allowed to
use the Gwendolen partitions.

Gwendolen is restricted to a set of users belonging to the **`unx-gwendolen`** Unix group.
If you belong to a project allowed to use **Gwendolen**, or you are a user who would like to have access to it,
please request access to the **`unx-gwendolen`** Unix group through [PSI Service Now](https://psi.service-now.com/):
the request will be redirected to the person responsible for the project (Andreas Adelmann).

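A minimal sketch of a Gwendolen job header, combining the options described above:

```bash
#SBATCH --partition=gwendolen   # or gwendolen-long (enabled during non-working hours only)
#SBATCH --account=gwendolen     # the merlin account is not allowed on these partitions
```
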
### Slurm GPU specific options
@@ -119,16 +122,20 @@ This is detailed in the below table.

#### Constraint / Features

Instead of specifying the GPU **type**, users sometimes need to **specify the GPU
by the amount of memory available on the GPU card** itself.

This has been defined in Slurm with **Features**, tags which define the GPU memory
of the different GPU cards. Users can specify which GPU memory size should be used
with the `--constraint` option. In that case, notice that *in many cases there is
no need to specify `[<type>:]`* in the `--gpus` option.

```bash
#SBATCH --constraint=<Feature> # Possible values: gpumem_8gb, gpumem_11gb, gpumem_24gb, gpumem_40gb
```

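For example (a sketch; feature names as listed in the comment above), requesting two GPUs on cards tagged with the `gpumem_11gb` feature, without specifying a GPU type, could look like:

```bash
#SBATCH --gpus=2
#SBATCH --constraint=gpumem_11gb
```
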
The table below shows the available **Features** and which GPU card models and GPU nodes they belong to:

<table>
<thead>
@@ -172,6 +179,7 @@ The table below shows the available **Features** and which GPU card models and G

#### Other GPU options

Alternative Slurm options for GPU based jobs are available. Please refer to the **man** pages
of each Slurm command for further information (`man salloc`, `man sbatch`, `man srun`).
The most common settings are listed below:

@@ -191,8 +199,9 @@ Please, notice that when defining `[<type>:]` once, then all other options must

#### Dealing with Hyper-Threading

The **`gmerlin6`** cluster contains the partitions `gwendolen` and `gwendolen-long`, which have a node with Hyper-Threading enabled.
On that node, one should always specify whether or not to use Hyper-Threading. If not defined, Slurm will
generally use it (exceptions apply). For this machine, HT is generally recommended.

```bash
@@ -200,14 +209,14 @@ generally use it (exceptions apply). For this machine, generally HT is recommend
#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.
```

## User and job limits

The GPU cluster enforces some basic user and job limits to ensure that a single user cannot overuse the resources, and to guarantee fair usage of the cluster.
The limits are described below.

### Per job limits

These limits apply to a single job; in other words, there is a maximum amount of resources a single job can use.
Limits are defined using QoS, which is usually set at the partition level. Limits are described in the table below in the format `SlurmQoS(limits)`
(possible `SlurmQoS` values can be listed with the command `sacctmgr show qos`):

@@ -218,25 +227,29 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
| **gwendolen**      | `gwendolen`   | No limits                            |
| **gwendolen-long** | `gwendolen`   | No limits, active from 9pm to 5:30am |

* With the limits in the public `gpu` and `gpu-short` partitions, a single job using the `merlin` account
  (the default account) cannot use more than 40 CPUs, more than 8 GPUs or more than 200GB.
  Any job exceeding these limits will stay in the queue with the message **`QOSMax[Cpu|GRES|Mem]PerJob`**.
  Since there are no additional QoS that temporarily override the job limits during the week (as happens, for
  instance, in the CPU **daily** partition), such a job needs to be cancelled, and the requested resources
  must be adapted to the above resource limits.

* The **gwendolen** and **gwendolen-long** partitions are two special partitions for an **[NVIDIA DGX A100](https://www.nvidia.com/en-us/data-center/dgx-a100/)** machine.
  Only users belonging to the **`unx-gwendolen`** Unix group can run in these partitions. No limits are applied (the machine resources can be used completely).

* The **`gwendolen-long`** partition is available 24h. However,
    * from 5:30am to 9pm the partition is `down` (jobs can be submitted, but cannot run until the partition is set to `active`).
    * from 9pm to 5:30am jobs are allowed to run (the partition is set to `active`).

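To inspect the QoS limits behind these values yourself, the accounting query mentioned above can be extended with format fields (a sketch; available field names may vary slightly between Slurm versions):

```bash
sacctmgr show qos format=Name,Priority,MaxTRESPerJob,MaxTRESPerUser
```
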
### Per user limits for GPU partitions

These limits apply exclusively to users; in other words, there is a maximum amount of
resources a single user can use. Limits are defined using QoS, which is
usually set at the partition level. Limits are described in the table below
in the format `SlurmQoS(limits)` (possible `SlurmQoS` values can be listed
with the command `sacctmgr show qos`):

| Partition | Slurm Account | Mon-Sun 0h-24h |
|:------------------:| :----------------: | :---------------------------------------------: |
@@ -245,13 +258,18 @@ Limits are defined using QoS, and this is usually set at the partition level. Li
| **gwendolen**      | `gwendolen`   | No limits                            |
| **gwendolen-long** | `gwendolen`   | No limits, active from 9pm to 5:30am |

* With the limits in the public `gpu` and `gpu-short` partitions, a single user cannot use more than 80 CPUs, more than 16 GPUs or more than 400GB.
  Jobs sent by any user already exceeding these limits will stay in the queue
  with the message **`QOSMax[Cpu|GRES|Mem]PerUser`**. In that case, the jobs
  wait in the queue until some of the running resources are freed.

* Notice that the user limits are wider than the job limits. In this way, a user can run up to two 8-GPU jobs, or up to four 4-GPU jobs, etc.

!!! warning
    Please try to avoid occupying all GPUs of the same type for several hours or
    multiple days, otherwise it would block other users needing the same type of
    GPU.

## Advanced Slurm configuration

@@ -265,4 +283,8 @@ For understanding the Slurm configuration setup in the cluster, sometimes may be

* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.

The configuration files listed above, which can be found on the login nodes, correspond exclusively to the **merlin6** cluster configuration files.
Configuration files for the old **merlin5** cluster or for the **gmerlin6**
cluster must be checked directly on any of the **merlin5** or **gmerlin6**
computing nodes (for example, by logging in to one of the nodes while a job or an
active allocation is running).

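As a rough sketch of that last step (assuming Slurm's standard multi-cluster option `--clusters` is used to address the `gmerlin6` cluster; adjust partition and account as described earlier), one could open a small interactive allocation and read the file on the allocated node:

```bash
salloc --clusters=gmerlin6 --partition=gpu --account=merlin --gpus=1   # interactive allocation on a GPU node
srun cat /etc/slurm/cgroup.conf                                        # inspect the file on the allocated node
```
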
BIN docs/images/merlin_cave.png (new file; binary not shown, 4.4 MiB)
@@ -6,9 +6,9 @@ hide:

# HPCE User Documentation

{ width="650px" }
/// caption
_Within his lair, the wizard ever strives for the perfection of his art._
///

The [HPCE
@@ -16,11 +16,7 @@ group](https://www.psi.ch/en/awi/high-performance-computing-and-emerging-technol

is part of the [PSI Center for Scientific Computing, Theory and
Data](https://www.psi.ch/en/csd) at [Paul Scherrer
Institute](https://www.psi.ch). It provides a range of HPC services for PSI
researchers, staff, and external collaborators, such as the Merlin series of
HPC clusters. Furthermore, the HPCE group engages in research activities on
technologies (such as data analysis and machine learning) used on these
systems.

@@ -15,8 +15,8 @@ Welcome to the official documentation for migrating experiment data from **MEG**

| merlin7 | /data/user/`$USER` | /data/user/`$USER` | /data/project/meg | |

* The **Merlin6 home and user data directories have been merged** into the single new home directory `/data/user/$USER` on Merlin7.
* The same applies to the home directory on the MEG cluster, which has to be merged into `/data/user/$USER` on Merlin7.
* Users are responsible for moving the data.
* The **experiment directory has been integrated into `/data/project/meg`**.

### Recommended Cleanup Actions
@@ -38,16 +38,17 @@ A `experiment_migration.setup` migration script must be executed from **any MeG

* The script **must be executed after every reboot** of the destination nodes.
* **Reason:** On Merlin7, the home directory for the `root` user resides on ephemeral storage (no physical disk).

After a reboot, this directory is cleaned, so **SSH keys need to be redeployed** before running the migration again.

#### When using a PSI Active Directory (AD) account

* Applicable accounts include, for example:
    * `gac-meg2_data`
    * `gac-meg2`
* The script only needs to be executed **once**, provided that:
    * The home directory for the AD account is located on a shared storage area.
    * This shared storage is accessible from the node executing the transfer.
* **Reason:** On Merlin7, these accounts have their home directories on persistent shared storage, so the SSH keys remain available across reboots.

To run it:
@@ -71,25 +72,29 @@ If you are stuck, email: [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists

### Migration Procedure

1. **Run an initial sync**, ideally within a `tmux` session (see the sketch after this list)
    * This copies the bulk of the data from MeG to Merlin7.
    * **IMPORTANT: Do not modify the destination directories**
    * Before starting the transfer, please ensure that:
        * The source and destination directories are correct.
        * The destination directories exist.

2. **Run additional syncs if needed**
    * Subsequent syncs can be executed to transfer changes.
    * Ensure that **only one sync for the same directory runs at a time**.
    * Multiple syncs are often required, since the first one may take several hours or even days.

3. Schedule a date for the final migration:
    * Any activity must be stopped on the source directory.
    * Likewise, no activity must take place on the destination until the migration is complete.

4. **Perform a final sync with the `-E` option** (if applicable)
    * Use `-E` **only if you need to delete files on the destination that were removed from the source.**
    * This ensures the destination becomes an exact mirror of the source.
    * **Never use `-E` after the destination has gone into production**, as it will delete new data created there.

5. Disable access on the source folder.
6. Enable access on the destination folder.
    * At this point, **no new syncs have to be performed.**

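A minimal `tmux` sketch for step 1 (the session name is arbitrary):

```bash
tmux new -s meg-migration      # start a named session; detach with Ctrl-b d
# ... run the initial sync command here ...
tmux attach -t meg-migration   # re-attach later to check progress
```
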
!!! note "Important"
@@ -160,9 +165,9 @@ The following example demonstrates how to migrate the **entire `online`** direct
From: /meg/data1/online
To: login001.merlin7.psi.ch:/data/project/meg/data1/online
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:

Please confirm to start (y/N):
❌ Transfer cancelled by user.
```

@@ -181,7 +186,7 @@ The following example demonstrates how to migrate **only a subdirectory**. In th
From: /meg/data1/shared/subprojects/meg1
To: login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:

Please confirm to start (y/N): N
❌ Transfer cancelled by user.
@@ -196,5 +201,5 @@ This command initiates the migration of the directory, by creating the destinati
```

* Runs `fpsync` with 10 threads and N parts of at most 20000 files or 100G each:
    * Source: `/meg/data1/shared/subprojects/meg1`
    * Destination: `login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1`
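For reference, an equivalent plain `fpsync` invocation with these parameters would look roughly like the sketch below (the migration script wraps this for you, and the exact options it uses may differ):

```bash
fpsync -n 10 -f 20000 -s 100G \
    /meg/data1/shared/subprojects/meg1/ \
    login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1/
```
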
@@ -1,22 +1,15 @@

# Hardware And Software Description

## Hardware

### Computing Nodes

Merlin5 is built from recycled nodes, and hardware will be decommissioned as soon as it fails (due to the expired warranty and the age of the cluster).

* Merlin5 is based on the [**HPE c7000 Enclosure**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128339) solution, with 16 x [**HPE ProLiant BL460c Gen8**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04123239) nodes per chassis.
* Connectivity is based on Infiniband **ConnectX-3 QDR-40Gbps**:
    * 16 internal ports for intra-chassis communication
    * 2 connected external ports for inter-chassis communication and storage access.

The table below summarizes the hardware setup for the Merlin5 computing nodes:

@@ -91,6 +84,7 @@ However, this is an old version of Infiniband which requires older drivers and s

In Merlin5, we try to keep the software stack coherent with the main [Merlin6](../merlin6/index.md) cluster.

For this reason, Merlin5 runs:

* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)

@@ -1,12 +1,4 @@

# Slurm Configuration

This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin5 cluster.

@@ -28,7 +20,6 @@ consider the memory as a *consumable resource*. Hence, users can *oversubscribe*

this legacy configuration has been kept to ensure that old jobs can keep running in the same way they did a few years ago.
If you know that this might be a problem for you, please use Merlin6 instead.

## Running jobs in the 'merlin5' cluster

In this chapter we cover the basic settings that users need to specify in order to run jobs in the Merlin5 CPU cluster.
@@ -96,11 +87,11 @@ Below are listed the most common settings:

Notice that in **Merlin5** no hyper-threading is available (while in **Merlin6** it is).
Hence, in **Merlin5** there is no need to specify the `--hint` hyper-threading related options.

## User and job limits

In the CPU cluster we provide some limits which basically apply to jobs and users. The idea behind this is to ensure fair usage of the resources and to
avoid overuse of the resources by a single user or job. However, applying limits might affect the overall usage efficiency of the cluster (for example,
pending jobs from a single user while many nodes are idle due to low overall activity is something that can be seen when user limits are applied).
In the same way, these limits can also be used to improve the efficiency of the cluster (for example, without any job size limits, a job requesting all
resources from the batch system would drain the entire cluster in order to fit the job, which is undesirable).

@@ -119,7 +110,7 @@ with the format `SlurmQoS(limits)` (`SlurmQoS` can be listed from the `sacctmgr
| **merlin**      | merlin5(cpu=384) | None         |
| **merlin-long** | merlin5(cpu=384) | Max. 4 nodes |

By default, by QoS limits, a job cannot use more than 384 cores (max CPUs per job).
However, for `merlin-long` this is even more restricted: there is an extra limit of 4 dedicated nodes for this partition. This is defined
at the partition level, and it will override any QoS limit as long as it is more restrictive.

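As a quick illustration of these limits (a sketch; it assumes the merlin5 cluster is addressed with Slurm's standard `--clusters` option and uses the partition names from the table above), a job staying within the 384-core cap could request:

```bash
#SBATCH --clusters=merlin5
#SBATCH --partition=merlin
#SBATCH --ntasks=384        # at the QoS maximum; merlin-long is further capped at 4 nodes
```
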
@@ -1,17 +1,9 @@

# Downtimes

On the first Monday of each month the Merlin6 cluster might be subject to interruption due to maintenance.
Users will be informed at least one week in advance when a downtime is scheduled for the next month.

Downtimes will be announced to users through the <merlin-users@lists.psi.ch> mailing list. Also, a detailed description
of the next scheduled interventions will be available in [Next Scheduled Downtimes](#next-scheduled-downtimes).

---
@@ -21,12 +13,14 @@ for the nexts scheduled interventions will be available in [Next Scheduled Downt

Scheduled downtimes mostly affecting the storage and Slurm configurations may require draining the nodes.
When this is required, users will be informed accordingly. Two different types of draining are possible:

* **soft drain**: new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition.
  Jobs already running on the partition continue to run. This will be the **default** drain method.

* **hard drain**: no new jobs may be queued on the partition (job submission requests will be denied with an error message),
  but jobs already queued on the partition may be allocated to nodes and run.

Unless explicitly specified, the default draining policy for each partition will be the following:

* The **daily** and **general** partitions will be soft drained 12h before the downtime.
* The **hourly** partition will be soft drained 1 hour before the downtime.

@@ -1,12 +1,4 @@

# Past Downtimes

## Past Downtimes: Log Changes

@@ -1,12 +1,4 @@

# Contact

## Support

@@ -1,14 +1,4 @@

# FAQ

## How do I register for Merlin?

@@ -35,7 +25,7 @@ How to install depends a bit on the software itself. There are three common inst

2. *source compilation* using make/cmake/autoconf/etc. Usually the compilation scripts accept a `--prefix=/data/user/$USER` option specifying where to install. They then place files under `<prefix>/bin`, `<prefix>/lib`, etc. The exact syntax should be documented in the installation instructions.
3. *conda environment*. This is now becoming standard for python-based software, including lots of the AI tools. First follow the [initial setup instructions](../software-support/python.md#anaconda) to configure conda to use /data/user instead of your home directory. Then you can create environments like:

```bash
module load anaconda/2019.07
# if they provide environment.yml
conda env create -f environment.yml
@@ -1,12 +1,4 @@

# Known Problems

## Common errors

@@ -17,8 +9,8 @@ This is usually because the software was compiled with a set of instructions new

and it mostly depends on the processor generation.

For example, `merlin-l-001` and `merlin-l-002` contain a newer generation of processors than the old GPU nodes or the Merlin5 cluster.
Hence, unless the software is compiled to be compatible with the instruction set of older processors, it will not run on the old nodes.
Sometimes this is set properly by default at compilation time, but sometimes it is not.

For GCC, please refer to [GCC x86 Options](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html) for compiling options. In case of doubt, contact us.

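As a rough sketch (the right flags depend on the oldest CPU you want to support; see the GCC page linked above), building against a conservative baseline instruction set could look like:

```bash
# Target the generic x86-64 baseline so the binary also runs on the older nodes
# (file and program names are hypothetical).
gcc -O2 -march=x86-64 -mtune=generic -o myapp myapp.c
```
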
@@ -49,7 +41,7 @@ srun --cpus-per-task=$SLURM_CPUS_PER_TASK python -c "import os; print(os.sched_g
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method1
Submitted batch job 8000813

(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000813.out
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
{1, 45}
One has to implicitly specify $SLURM_CPUS_PER_TASK
@@ -72,11 +64,10 @@ echo 'In this example, by setting an environment variable SRUN_CPUS_PER_TASK'
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun python -c "import os; print(os.sched_getaffinity(0))"

(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method2
Submitted batch job 8000815

(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000815.out
From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK
{1, 45}
One has to implicitly specify $SLURM_CPUS_PER_TASK
@@ -84,14 +75,13 @@ In this example, by setting an environment variable SRUN_CPUS_PER_TASK
{1, 2, 3, 4, 45, 46, 47, 48}
```

## General topics

### Default SHELL

In general, **`/bin/bash` is the recommended default user SHELL** when working on Merlin.

Some users might notice that BASH is not the default SHELL when logging in to the Merlin systems, or they might need to run a different SHELL.
This is probably because, when the PSI account was requested, no SHELL was specified or a different one was explicitly requested by the requestor.
Users can check the default SHELL configured for their PSI account with the following command:

@@ -99,10 +89,10 @@ Users can check which is the default SHELL specified in the PSI account with the
getent passwd $USER | awk -F: '{print $NF}'
```

If the SHELL does not correspond to the one you need to use, you should request a central change for it.
This is because Merlin accounts are central PSI accounts; hence, **the change must be requested via [PSI Service Now](contact.md#psi-service-now)**.

Alternatively, if you work on other PSI Linux systems but need a different SHELL type for Merlin, a temporary change can be performed during login startup.
You can update one of the following files:

* `~/.login`
* `~/.profile`
@@ -116,7 +106,7 @@ MY_SHELL=/bin/bash
exec $MY_SHELL -l
```

Notice that available *shells* can be found in the following file:

```bash
cat /etc/shells
@@ -1,12 +1,4 @@

# Migration From Merlin5

## Directories

@@ -30,9 +22,10 @@ where:
|
|||||||
* **Block** is capacity size in GB and TB
|
* **Block** is capacity size in GB and TB
|
||||||
* **Files** is number of files + directories in Millions (M)
|
* **Files** is number of files + directories in Millions (M)
|
||||||
* **Quota types** are the following:
|
* **Quota types** are the following:
|
||||||
* **USR**: Quota is setup individually per user name
|
* **USR**: Quota is setup individually per user name
|
||||||
* **GRP**: Quota is setup individually per Unix Group name
|
* **GRP**: Quota is setup individually per Unix Group name
|
||||||
* **Fileset**: Quota is setup per project root directory.
|
* **Fileset**: Quota is setup per project root directory.
|
||||||
|
|
||||||
* User data directory ``/data/user`` has a strict user block quota limit policy. If more disk space is required, a 'project' must be created.
|
* User data directory ``/data/user`` has a strict user block quota limit policy. If more disk space is required, a 'project' must be created.
|
||||||
* Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.
|
* Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.
|
||||||
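Before migrating, it may help to check how close your directories are to these limits. A minimal sketch, assuming the `merlin_quotas` helper and the standard `quota` command referenced elsewhere in this documentation are available on the login nodes:

```bash
# Show current block and file usage against the soft/hard quotas described above
merlin_quotas    # Merlin helper summarizing usage per storage area
quota -s         # standard quota report with human-readable sizes
```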
|
|
||||||
|
|||||||
@@ -1,15 +1,7 @@
|
|||||||
---
|
# Troubleshooting
|
||||||
title: Troubleshooting
|
|
||||||
#tags:
|
|
||||||
keywords: troubleshooting, problems, faq, known problems
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/troubleshooting.html
|
|
||||||
---
|
|
||||||
|
|
||||||
For troubleshooting, please contact us through the official channels. See [Contact](contact.md)
|
For troubleshooting, please contact us through the official channels. See [Contact](contact.md)
|
||||||
for more information.
|
for more information.
|
||||||
|
|
||||||
## Known Problems
|
## Known Problems
|
||||||
|
|
||||||
@@ -30,7 +22,7 @@ the following information:
|
|||||||
echo "Current location:"; pwd
|
echo "Current location:"; pwd
|
||||||
echo "User environment:"; env
|
echo "User environment:"; env
|
||||||
echo "List of PModules:"; module list
|
echo "List of PModules:"; module list
|
||||||
```
|
```
|
||||||
|
|
||||||
3. Whenever possible, provide the Slurm JobID.
|
3. Whenever possible, provide the Slurm JobID.
|
||||||
|
|
||||||
|
|||||||
@@ -1,25 +1,19 @@
|
|||||||
---
|
# Hardware And Software Description
|
||||||
title: Hardware And Software Description
|
|
||||||
#tags:
|
|
||||||
#keywords:
|
|
||||||
last_updated: 13 June 2019
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/hardware-and-software.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Hardware
|
## Hardware
|
||||||
|
|
||||||
### Computing Nodes
|
### Computing Nodes
|
||||||
|
|
||||||
The new Merlin6 cluster contains a solution based on **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw)
|
The new Merlin6 cluster contains a solution based on **four** [**HPE Apollo k6000 Chassis**](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016641enw)
|
||||||
|
|
||||||
* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades.
|
* *Three* of them contain 24 x [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades.
|
||||||
* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. Blades have slightly different components depending on specific project requirements.
|
* A *fourth* chassis was purchased in 2021 with [**HP Apollo XL230K Gen10**](https://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=a00016634enw) blades dedicated to a few experiments. Blades have slightly different components depending on specific project requirements.
|
||||||
|
|
||||||
The connectivity for the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:
|
The connectivity for the Merlin6 cluster is based on **ConnectX-5 EDR-100Gbps**, and each chassis contains:
|
||||||
|
|
||||||
* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
|
* 1 x [HPE Apollo InfiniBand EDR 36-port Unmanaged Switch](https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00016643enw)
|
||||||
* 24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)
|
* 24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)
|
||||||
* 12 external EDR-100Gbps ports (for external low latency connectivity)
|
* 12 external EDR-100Gbps ports (for external low latency connectivity)
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<thead>
|
<thead>
|
||||||
@@ -142,6 +136,7 @@ The connectivity is based on **ConnectX-5 EDR-100Gbps** for the new login nodes,
|
|||||||
### Storage
|
### Storage
|
||||||
|
|
||||||
The storage node is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5).
|
The storage node is based on the [Lenovo Distributed Storage Solution for IBM Spectrum Scale](https://lenovopress.com/lp0626-lenovo-distributed-storage-solution-for-ibm-spectrum-scale-x3650-m5).
|
||||||
|
|
||||||
* 2 x **Lenovo DSS G240** systems, each one composed of 2 IO Nodes **ThinkSystem SR650** mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
|
* 2 x **Lenovo DSS G240** systems, each one composed of 2 IO Nodes **ThinkSystem SR650** mounting 4 x **Lenovo Storage D3284 High Density Expansion** enclosures.
|
||||||
* Each IO node has a connectivity of 400Gbps (4 x EDR 100Gbps ports, 2 of them are **ConnectX-5** and 2 are **ConnectX-4**).
|
* Each IO node has a connectivity of 400Gbps (4 x EDR 100Gbps ports, 2 of them are **ConnectX-5** and 2 are **ConnectX-4**).
|
||||||
|
|
||||||
@@ -151,11 +146,13 @@ The storage solution is connected to the HPC clusters through 2 x **Mellanox SB7
|
|||||||
|
|
||||||
Merlin6 cluster connectivity is based on the [**Infiniband**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access with very low latencies to the data as well as running
|
Merlin6 cluster connectivity is based on the [**Infiniband**](https://en.wikipedia.org/wiki/InfiniBand) technology. This allows fast access with very low latencies to the data as well as running
|
||||||
extremely efficient MPI-based jobs:
|
extremely efficient MPI-based jobs:
|
||||||
|
|
||||||
* Connectivity amongst different computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
|
* Connectivity amongst different computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.
|
||||||
* Intra-chassis connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
|
* Intra-chassis connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.
|
||||||
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.
|
* Communication to the storage ensures up to 800Gbps of aggregated bandwidth.
|
||||||
|
|
||||||
Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniband Unmanaged switches (one per HP Apollo chassis):
|
Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniband Unmanaged switches (one per HP Apollo chassis):
|
||||||
|
|
||||||
* 1 x **MSX6710** (FDR) for connecting old GPU nodes, old login nodes and MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
|
* 1 x **MSX6710** (FDR) for connecting old GPU nodes, old login nodes and MeG cluster to the Merlin6 cluster (and storage). No High Availability mode possible.
|
||||||
* 2 x **MSB7800** (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
|
* 2 x **MSB7800** (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.
|
||||||
* 3 x **HP EDR Unmanaged** switches, each one embedded to each HP Apollo k6000 chassis solution.
|
* 3 x **HP EDR Unmanaged** switches, each one embedded to each HP Apollo k6000 chassis solution.
|
||||||
@@ -164,8 +161,9 @@ Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniban
|
|||||||
## Software
|
## Software
|
||||||
|
|
||||||
In Merlin6, we try to keep the latest software stack release to get the latest features and improvements. Due to this, **Merlin6** runs:
|
In Merlin6, we try to keep the latest software stack release to get the latest features and improvements. Due to this, **Merlin6** runs:
|
||||||
|
|
||||||
* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
|
* [**RedHat Enterprise Linux 7**](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.9_release_notes/index)
|
||||||
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
|
* [**Slurm**](https://slurm.schedmd.com/), which we usually try to keep up to date with the most recent versions.
|
||||||
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
|
* [**GPFS v5**](https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/ibmspectrumscale502_welcome.html)
|
||||||
* [**MLNX_OFED LTS v.5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-5** or superior cards.
|
* [**MLNX_OFED LTS v.5.2-2.2.0.0 or newer**](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) for all **ConnectX-5** or superior cards.
|
||||||
* [MLNX_OFED LTS v.4.9-2.2.4.0](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) is installed for remaining **ConnectX-3** and **ConnectIB** cards.
|
* [MLNX_OFED LTS v.4.9-2.2.4.0](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) is installed for remaining **ConnectX-3** and **ConnectIB** cards.
|
||||||
|
|||||||
@@ -41,20 +41,21 @@ Archiving can be done from any node accessible by the users (usually from the lo
|
|||||||
Below are the main steps for using the Data Catalog.
|
Below are the main steps for using the Data Catalog.
|
||||||
|
|
||||||
* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
|
* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
|
||||||
* Prepare a metadata file describing the dataset
|
* Prepare a metadata file describing the dataset
|
||||||
* Run **`datasetIngestor`** script
|
* Run **`datasetIngestor`** script
|
||||||
* If necessary, the script will copy the data to the PSI archive servers
|
* If necessary, the script will copy the data to the PSI archive servers
|
||||||
* Usually this is necessary when archiving from directories other than **`/data/user`** or
|
* Usually this is necessary when archiving from directories other than **`/data/user`** or
|
||||||
**`/data/project`**. It is also necessary when the Merlin export server (**`merlin-archive.psi.ch`**)
|
**`/data/project`**. It is also necessary when the Merlin export server (**`merlin-archive.psi.ch`**)
|
||||||
is down for any reason.
|
is down for any reason.
|
||||||
* Archive the dataset:
|
* Archive the dataset:
|
||||||
* Visit [https://discovery.psi.ch](https://discovery.psi.ch)
|
* Visit <https://discovery.psi.ch>
|
||||||
* Click **`Archive`** for the dataset
|
* Click **`Archive`** for the dataset
|
||||||
* The system will now copy the data to the PetaByte Archive at CSCS
|
* The system will now copy the data to the PetaByte Archive at CSCS
|
||||||
|
|
||||||
* Retrieve data from the catalog:
|
* Retrieve data from the catalog:
|
||||||
* Find the dataset on [https://discovery.psi.ch](https://discovery.psi.ch) and click **`Retrieve`**
|
* Find the dataset on <https://discovery.psi.ch> and click **`Retrieve`**
|
||||||
* Wait for the data to be copied to the PSI retrieval system
|
* Wait for the data to be copied to the PSI retrieval system
|
||||||
* Run the **`datasetRetriever`** script (see the sketch below)
|
* Run the **`datasetRetriever`** script (see the sketch below)
|
||||||
|
|
||||||
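As a compact illustration of the ingest step above, a sketch is shown below; the flags mirror the full example further down this page, and `metadata.json` is the metadata file prepared in the first step:

```bash
# Sketch of the ingest step, run from the node holding the dataset.
# Flags are the ones used in the full example later on this page.
datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
# Afterwards, archive the dataset via https://discovery.psi.ch ("Archive"),
# and retrieve it later with the datasetRetriever script ("Retrieve").
```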
Since large data sets may take a lot of time to transfer, some steps are
|
Since large data sets may take a lot of time to transfer, some steps are
|
||||||
designed to happen in the background. The discovery website can be used to
|
designed to happen in the background. The discovery website can be used to
|
||||||
@@ -246,7 +247,7 @@ step will take a long time and may appear to have hung. You can check what files
|
|||||||
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.
|
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.
|
||||||
|
|
||||||
* There is currently a limit on the number of files per dataset (technically, the limit comes from the total length of all file paths). It is recommended to break up datasets into 300'000 files or fewer.
|
* There is currently a limit on the number of files per dataset (technically, the limit comes from the total length of all file paths). It is recommended to break up datasets into 300'000 files or fewer.
|
||||||
* If it is not possible or desirable to split data between multiple datasets, an alternative workaround is to package the files into a tarball. For datasets which are already compressed, omit the `-z` option for a considerable speedup:
|
* If it is not possible or desirable to split data between multiple datasets, an alternative workaround is to package the files into a tarball. For datasets which are already compressed, omit the `-z` option for a considerable speedup:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
tar -cf [output].tar [srcdir]
|
tar -cf [output].tar [srcdir]
|
||||||
@@ -266,7 +267,6 @@ step will take a long time and may appear to have hung. You can check what files
|
|||||||
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
|
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
|
||||||
2019/11/06 11:04:43 Latest version: 1.1.11
|
2019/11/06 11:04:43 Latest version: 1.1.11
|
||||||
|
|
||||||
|
|
||||||
2019/11/06 11:04:43 Your version of this program is up-to-date
|
2019/11/06 11:04:43 Your version of this program is up-to-date
|
||||||
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
|
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
|
||||||
2019/11/06 11:04:43 Your username:
|
2019/11/06 11:04:43 Your username:
|
||||||
@@ -316,7 +316,6 @@ user_n@pb-archive.psi.ch's password:
|
|||||||
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
|
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
|
||||||
The data must first be copied to a rsync cache server.
|
The data must first be copied to a rsync cache server.
|
||||||
|
|
||||||
|
|
||||||
2019/11/06 11:05:04 Do you want to continue (Y/n)?
|
2019/11/06 11:05:04 Do you want to continue (Y/n)?
|
||||||
Y
|
Y
|
||||||
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
|
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Connecting from a MacOS Client
|
||||||
title: Connecting from a MacOS Client
|
|
||||||
#tags:
|
|
||||||
keywords: MacOS, mac os, mac, connecting, client, configuration, SSH, X11
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
summary: "This document describes a recommended setup for a MacOS client."
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/connect-from-macos.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## SSH without X11 Forwarding
|
## SSH without X11 Forwarding
|
||||||
|
|
||||||
|
|||||||
@@ -1,18 +1,12 @@
|
|||||||
---
|
# Connecting from a Windows Client
|
||||||
title: Connecting from a Windows Client
|
|
||||||
keywords: microsoft, mocosoft, windows, putty, xming, connecting, client, configuration, SSH, X11
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
summary: "This document describes a recommended setup for a Windows client."
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/connect-from-windows.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## SSH with PuTTY without X11 Forwarding
|
## SSH with PuTTY without X11 Forwarding
|
||||||
|
|
||||||
PuTTY is one of the most common tools for SSH.
|
PuTTY is one of the most common tools for SSH.
|
||||||
|
|
||||||
Check whether the following software packages are installed on the Windows workstation by
|
Check whether the following software packages are installed on the Windows workstation by
|
||||||
inspecting the *Start* menu (hint: use the *Search* box to save time):
|
inspecting the *Start* menu (hint: use the *Search* box to save time):
|
||||||
|
|
||||||
* PuTTY (should be already installed)
|
* PuTTY (should be already installed)
|
||||||
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
|
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
|
||||||
|
|
||||||
@@ -28,7 +22,6 @@ If they are missing, you can install them using the Software Kiosk icon on the D
|
|||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
|
||||||
## SSH with PuTTY with X11 Forwarding
|
## SSH with PuTTY with X11 Forwarding
|
||||||
|
|
||||||
Official X11 Forwarding support is through NoMachine. Please follow the document
|
Official X11 Forwarding support is through NoMachine. Please follow the document
|
||||||
|
|||||||
@@ -1,26 +1,17 @@
|
|||||||
---
|
# Kerberos and AFS authentication
|
||||||
title: Kerberos and AFS authentication
|
|
||||||
#tags:
|
|
||||||
keywords: kerberos, AFS, kinit, klist, keytab, tickets, connecting, client, configuration, slurm
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
summary: "This document describes how to use Kerberos."
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/kerberos.html
|
|
||||||
---
|
|
||||||
|
|
||||||
Projects and users have their own areas in the central PSI AFS service. In order
|
Projects and users have their own areas in the central PSI AFS service. In order
|
||||||
to access these areas, valid Kerberos and AFS tickets must be granted.
|
to access these areas, valid Kerberos and AFS tickets must be granted.
|
||||||
|
|
||||||
These tickets are automatically granted when accessing through SSH with
|
These tickets are automatically granted when accessing through SSH with
|
||||||
username and password. Alternatively, one can get a granting ticket with the `kinit` (Kerberos)
|
username and password. Alternatively, one can get a granting ticket with the `kinit` (Kerberos)
|
||||||
and `aklog` (AFS ticket, which needs to be run after `kinit`) commands.
|
and `aklog` (AFS ticket, which needs to be run after `kinit`) commands.
|
||||||
|
|
||||||
Due to PSI security policies, the maximum lifetime of the ticket is 7 days, and the default
|
Due to PSI security policies, the maximum lifetime of the ticket is 7 days, and the default
|
||||||
time is 10 hours. This means that one needs to regularly renew (with the `krenew` command) the existing
|
time is 10 hours. This means that one needs to regularly renew (with the `krenew` command) the existing
|
||||||
granting tickets, and their validity cannot be extended beyond 7 days. After that,
|
granting tickets, and their validity cannot be extended beyond 7 days. After that,
|
||||||
one needs to obtain new granting tickets.
|
one needs to obtain new granting tickets.
|
||||||
|
|
||||||
|
|
||||||
## Obtaining granting tickets with username and password
|
## Obtaining granting tickets with username and password
|
||||||
|
|
||||||
As already described above, the most common use case is to obtain Kerberos and AFS granting tickets
|
As already described above, the most common use case is to obtain Kerberos and AFS granting tickets
|
||||||
@@ -28,8 +19,9 @@ by introducing username and password:
|
|||||||
* When logging in to Merlin through the SSH protocol with username + password authentication,
|
* When logging in to Merlin through the SSH protocol with username + password authentication,
|
||||||
tickets for Kerberos and AFS will be automatically obtained.
|
tickets for Kerberos and AFS will be automatically obtained.
|
||||||
* When logging in to Merlin through NoMachine, no Kerberos or AFS tickets are granted. Therefore, users need to
|
* When logging in to Merlin through NoMachine, no Kerberos or AFS tickets are granted. Therefore, users need to
|
||||||
|
|
||||||
run `kinit` (to obtain a granting Kerberos ticket) followed by `aklog` (to obtain a granting AFS ticket).
|
run `kinit` (to obtain a granting Kerberos ticket) followed by `aklog` (to obtain a granting AFS ticket).
|
||||||
See further details below.
|
See further details below.
|
||||||
|
|
||||||
To manually obtain granting tickets, one has to:
|
To manually obtain granting tickets, one has to:
|
||||||
1. Run `kinit $USER` and enter the PSI password to obtain a granting Kerberos ticket.
|
1. Run `kinit $USER` and enter the PSI password to obtain a granting Kerberos ticket.
|
||||||
@@ -49,16 +41,16 @@ klist
|
|||||||
```bash
|
```bash
|
||||||
krenew
|
krenew
|
||||||
```
|
```
|
||||||
* Keep in mind that the maximum lifetime for granting tickets is 7 days; therefore, `krenew` cannot be used beyond that limit,
|
* Keep in mind that the maximum lifetime for granting tickets is 7 days; therefore, `krenew` cannot be used beyond that limit,
|
||||||
and `kinit` must be used instead.
|
and `kinit` must be used instead.
|
||||||
|
|
||||||
|
|
||||||
## Obtaining granting tickets with a keytab
|
## Obtaining granting tickets with a keytab
|
||||||
|
|
||||||
Sometimes, obtaining granting tickets with password authentication is not possible. An example is a user Slurm job
|
Sometimes, obtaining granting tickets with password authentication is not possible. An example is a user Slurm job
|
||||||
requiring access to private areas in AFS. For such cases, it is possible to generate a **keytab** file.
|
requiring access to private areas in AFS. For such cases, it is possible to generate a **keytab** file.
|
||||||
|
|
||||||
|
Be aware that the **keytab** file must be **private**, **fully protected** by correct permissions and not shared with any
|
||||||
|
|
||||||
Be aware that the **keytab** file must be **private**, **fully protected** by correct permissions and not shared with any
|
|
||||||
other users.
|
other users.
|
||||||
|
|
||||||
### Creating a keytab file
|
### Creating a keytab file
|
||||||
@@ -70,6 +62,7 @@ For generating a **keytab**, one has to:
|
|||||||
module load krb5/1.20
|
module load krb5/1.20
|
||||||
```
|
```
|
||||||
2. Create a private directory for storing the Kerberos **keytab** file
|
2. Create a private directory for storing the Kerberos **keytab** file
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
mkdir -p ~/.k5
|
mkdir -p ~/.k5
|
||||||
```
|
```
|
||||||
@@ -78,6 +71,7 @@ mkdir -p ~/.k5
|
|||||||
ktutil
|
ktutil
|
||||||
```
|
```
|
||||||
4. In the `ktutil` console, one has to generate a **keytab** file as follows:
|
4. In the `ktutil` console, one has to generate a **keytab** file as follows:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Replace $USER by your username
|
# Replace $USER by your username
|
||||||
add_entry -password -k 0 -f -p $USER
|
add_entry -password -k 0 -f -p $USER
|
||||||
@@ -85,6 +79,7 @@ wkt /psi/home/$USER/.k5/krb5.keytab
|
|||||||
exit
|
exit
|
||||||
```
|
```
|
||||||
Notice that you will need to add your password once. This step is required for generating the **keytab** file.
|
Notice that you will need to add your password once. This step is required for generating the **keytab** file.
|
||||||
|
|
||||||
5. Once back in the main shell, one has to ensure that the file has the proper permissions:
|
5. Once back in the main shell, one has to ensure that the file has the proper permissions:
|
||||||
```bash
|
```bash
|
||||||
chmod 0600 ~/.k5/krb5.keytab
|
chmod 0600 ~/.k5/krb5.keytab
|
||||||
@@ -112,14 +107,17 @@ The steps should be the following:
|
|||||||
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
|
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
|
||||||
```
|
```
|
||||||
* To obtain a Kerberos5 granting ticket, run `kinit` by using your keytab:
|
* To obtain a Kerberos5 granting ticket, run `kinit` by using your keytab:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
|
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
|
||||||
```
|
```
|
||||||
* To obtain a granting AFS ticket, run `aklog`:
|
* To obtain a granting AFS ticket, run `aklog`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
aklog
|
aklog
|
||||||
```
|
```
|
||||||
* At the end of the job, you can destroy the existing Kerberos tickets (see the combined sketch below).
|
* At the end of the job, you can destroy the existing Kerberos tickets (see the combined sketch below).
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
kdestroy
|
kdestroy
|
||||||
```
|
```
|
||||||
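Putting these steps together, a minimal sketch of a session (or job) that needs AFS access via the keytab could look as follows; the paths and principal are the same as in the commands above:

```bash
# Combined sketch of the keytab-based workflow described above
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"    # private credential cache
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH           # Kerberos ticket from the keytab
aklog                                                      # AFS token from the Kerberos ticket
# ... run the commands that need access to private AFS areas here ...
kdestroy                                                   # destroy the tickets at the end
```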
@@ -137,7 +135,7 @@ This is the **recommended** way. At the end of the job, is strongly recommended
|
|||||||
#SBATCH --output=run.out # Generate custom output file
|
#SBATCH --output=run.out # Generate custom output file
|
||||||
#SBATCH --error=run.err # Generate custom error file
|
#SBATCH --error=run.err # Generate custom error file
|
||||||
#SBATCH --nodes=1 # Uncomment and specify #nodes to use
|
#SBATCH --nodes=1 # Uncomment and specify #nodes to use
|
||||||
#SBATCH --ntasks=1 # Uncomment and specify #tasks to use
|
#SBATCH --ntasks=1 # Uncomment and specify #tasks to use
|
||||||
#SBATCH --cpus-per-task=1
|
#SBATCH --cpus-per-task=1
|
||||||
#SBATCH --constraint=xeon-gold-6152
|
#SBATCH --constraint=xeon-gold-6152
|
||||||
#SBATCH --hint=nomultithread
|
#SBATCH --hint=nomultithread
|
||||||
|
|||||||
@@ -103,5 +103,5 @@ These settings prevent "bluriness" at the cost of some performance! (You might w
|
|||||||
* Display > Resize remote display (forces 1:1 pixel sizes)
|
* Display > Resize remote display (forces 1:1 pixel sizes)
|
||||||
* Display > Change settings > Quality: Choose Medium-Best Quality
|
* Display > Change settings > Quality: Choose Medium-Best Quality
|
||||||
* Display > Change settings > Modify advanced settings
|
* Display > Change settings > Modify advanced settings
|
||||||
* Check: Disable network-adaptive display quality (disables lossy compression)
|
* Check: Disable network-adaptive display quality (disables lossy compression)
|
||||||
* Check: Disable client side image post-processing
|
* Check: Disable client side image post-processing
|
||||||
|
|||||||
@@ -1,13 +1,4 @@
|
|||||||
---
|
# Configuring SSH Keys in Merlin
|
||||||
title: Configuring SSH Keys in Merlin
|
|
||||||
|
|
||||||
#tags:
|
|
||||||
keywords: linux, connecting, client, configuration, SSH, Keys, SSH-Keys, RSA, authorization, authentication
|
|
||||||
last_updated: 15 Jul 2020
|
|
||||||
summary: "This document describes how to deploy SSH Keys in Merlin."
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/ssh-keys.html
|
|
||||||
---
|
|
||||||
|
|
||||||
Merlin users sometimes need to access the different Merlin services without being repeatedly prompted for a password.
|
Merlin users sometimes need to access the different Merlin services without being repeatedly prompted for a password.
|
||||||
One can achieve that with Kerberos authentication; however, some software requires SSH keys to be set up.
|
One can achieve that with Kerberos authentication; however, some software requires SSH keys to be set up.
|
||||||
@@ -22,14 +13,15 @@ User can check whether a SSH key already exists. These would be placed in the **
|
|||||||
is usually the default one, and the files there are typically **`id_rsa`** (private key) and **`id_rsa.pub`** (public key).
|
is usually the default one, and the files there are typically **`id_rsa`** (private key) and **`id_rsa.pub`** (public key).
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ls ~/.ssh/id*
|
ls ~/.ssh/id*
|
||||||
```
|
```
|
||||||
|
|
||||||
For creating **SSH RSA Keys**, one should:
|
For creating **SSH RSA Keys**, one should:
|
||||||
|
|
||||||
1. Run `ssh-keygen`; a passphrase will be requested twice. You **must remember** this passphrase for the future.
|
1. Run `ssh-keygen`; a passphrase will be requested twice. You **must remember** this passphrase for the future.
|
||||||
* For security reasons, ***always try to protect it with a passphrase***. The only exception is ANSYS software, which in general should use a key without a passphrase to simplify running the software in Slurm.
|
* For security reasons, ***always try to protect it with a passphrase***. The only exception is ANSYS software, which in general should use a key without a passphrase to simplify running the software in Slurm.
|
||||||
* This will generate a private key **id_rsa**, and a public key **id_rsa.pub** in your **~/.ssh** directory.
|
* This will generate a private key **id_rsa**, and a public key **id_rsa.pub** in your **~/.ssh** directory.
|
||||||
|
|
||||||
2. Add your public key to the **`authorized_keys`** file, and ensure proper permissions for that file, as follows:
|
2. Add your public key to the **`authorized_keys`** file, and ensure proper permissions for that file, as follows:
|
||||||
```bash
|
```bash
|
||||||
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
|
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
|
||||||
@@ -57,16 +49,16 @@ For creating **SSH RSA Keys**, one should:
|
|||||||
|
|
||||||
### Using Authentication Agent in SSH session
|
### Using Authentication Agent in SSH session
|
||||||
|
|
||||||
By default, when accessing the login node via SSH (with `ForwardAgent=yes`), it will automatically add your
|
By default, when accessing the login node via SSH (with `ForwardAgent=yes`), it will automatically add your
|
||||||
SSH Keys to the authentication agent. Hence, no further action should be needed by the user. One can configure
|
SSH Keys to the authentication agent. Hence, no further action should be needed by the user. One can configure
|
||||||
`ForwardAgent=yes` as follows:
|
`ForwardAgent=yes` as follows:
|
||||||
|
|
||||||
* **(Recommended)** In your local Linux (workstation, laptop or desktop) add the following line in the
|
* **(Recommended)** In your local Linux (workstation, laptop or desktop) add the following line in the
|
||||||
`$HOME/.ssh/config` (or alternatively in `/etc/ssh/ssh_config`) file:
|
`$HOME/.ssh/config` (or alternatively in `/etc/ssh/ssh_config`) file:
|
||||||
```
|
```
|
||||||
ForwardAgent yes
|
ForwardAgent yes
|
||||||
```
|
```
|
||||||
* Alternatively, you can add the option `ForwardAgent=yes` to each SSH command. For example:
|
* Alternatively, you can add the option `ForwardAgent=yes` to each SSH command. For example:
|
||||||
```bash
|
```bash
|
||||||
ssh -XY -o ForwardAgent=yes merlin-l-001.psi.ch
|
ssh -XY -o ForwardAgent=yes merlin-l-001.psi.ch
|
||||||
```
|
```
|
||||||
@@ -74,12 +66,12 @@ SSH Keys to the authentication agent. Hence, no actions should not be needed by
|
|||||||
If `ForwardAgent` is not enabled as shown above, one needs to run the authentication agent and then add your key
|
If `ForwardAgent` is not enabled as shown above, one needs to run the authentication agent and then add your key
|
||||||
to the **ssh-agent**. This must be done once per SSH session, as follows:
|
to the **ssh-agent**. This must be done once per SSH session, as follows:
|
||||||
|
|
||||||
* Run `eval $(ssh-agent -s)` to run the **ssh-agent** in that SSH session
|
* Run `eval $(ssh-agent -s)` to run the **ssh-agent** in that SSH session
|
||||||
* Check whether the authentication agent has your key already added:
|
* Check whether the authentication agent has your key already added:
|
||||||
```bash
|
```bash
|
||||||
ssh-add -l | grep "/psi/home/$(whoami)/.ssh"
|
ssh-add -l | grep "/psi/home/$(whoami)/.ssh"
|
||||||
```
|
```
|
||||||
* If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
|
* If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
|
||||||
You will be prompted for the **passphrase** of your key; add it by running:
|
You will be prompted for the **passphrase** of your key; add it by running:
|
||||||
```bash
|
```bash
|
||||||
ssh-add
|
ssh-add
|
||||||
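# Optional combined sketch of the agent steps above, for one SSH session
# without ForwardAgent (commands are the ones shown earlier in this section):
eval $(ssh-agent -s)                                        # start the agent for this session
ssh-add -l | grep "/psi/home/$(whoami)/.ssh" || ssh-add     # add the key only if it is not loaded yet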
@@ -96,7 +88,7 @@ However, for NoMachine one always need to add the private key identity to the au
|
|||||||
```bash
|
```bash
|
||||||
ssh-add -l | grep "/psi/home/$(whoami)/.ssh"
|
ssh-add -l | grep "/psi/home/$(whoami)/.ssh"
|
||||||
```
|
```
|
||||||
2. If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
|
2. If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
|
||||||
You will be prompted for the **passphrase** of your key; add it by running:
|
You will be prompted for the **passphrase** of your key; add it by running:
|
||||||
```bash
|
```bash
|
||||||
ssh-add
|
ssh-add
|
||||||
|
|||||||
@@ -7,7 +7,8 @@ This document describes the different directories of the Merlin6 cluster.
|
|||||||
### User and project data
|
### User and project data
|
||||||
|
|
||||||
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
|
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
|
||||||
* **`/psi/home`**, as this contains a small amount of data, is the only directory where we can provide daily snapshots for one week. These snapshots can be found in the directory **`/psi/home/.snapshot/`**
|
* **`/psi/home`**, as this contains a small amount of data, is the only directory where we can provide daily snapshots for one week. These snapshots can be found in the directory **`/psi/home/.snapshot/`**
|
||||||
|
|
||||||
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of former users who no longer have an existing and valid PSI account will be recycled.
|
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of former users who no longer have an existing and valid PSI account will be recycled.
|
||||||
|
|
||||||
!!! warning
|
!!! warning
|
||||||
@@ -31,13 +32,15 @@ merlin_quotas
|
|||||||
Merlin6 offers the following directory classes for users:
|
Merlin6 offers the following directory classes for users:
|
||||||
|
|
||||||
* ``/psi/home/<username>``: Private user **home** directory
|
* ``/psi/home/<username>``: Private user **home** directory
|
||||||
|
|
||||||
* ``/data/user/<username>``: Private user **data** directory
|
* ``/data/user/<username>``: Private user **data** directory
|
||||||
* ``/data/project/general/<projectname>``: Shared **Project** directory
|
* ``/data/project/general/<projectname>``: Shared **Project** directory
|
||||||
* For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
|
* For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
|
||||||
|
|
||||||
* ``/scratch``: Local *scratch* disk (only visible from the node running a job).
|
* ``/scratch``: Local *scratch* disk (only visible from the node running a job).
|
||||||
* ``/shared-scratch``: Shared *scratch* disk (visible from all nodes).
|
* ``/shared-scratch``: Shared *scratch* disk (visible from all nodes).
|
||||||
* ``/export``: Export directory for data transfer, visible from `ra-merlin-01.psi.ch`, `ra-merlin-02.psi.ch` and Merlin login nodes.
|
* ``/export``: Export directory for data transfer, visible from `ra-merlin-01.psi.ch`, `ra-merlin-02.psi.ch` and Merlin login nodes.
|
||||||
* Refer to **[Transferring Data](../how-to-use-merlin/transfer-data.md)** for more information about the export area and data transfer service.
|
* Refer to **[Transferring Data](../how-to-use-merlin/transfer-data.md)** for more information about the export area and data transfer service.
|
||||||
|
|
||||||
!!! tip
|
!!! tip
|
||||||
|
|
||||||
@@ -65,7 +68,7 @@ Properties of the directory classes:
|
|||||||
|
|
||||||
The use of **scratch** and **export** areas as an extension of the quota
|
The use of **scratch** and **export** areas as an extension of the quota
|
||||||
_is forbidden_. **scratch** and **export** areas _must not contain_ final
|
_is forbidden_. **scratch** and **export** areas _must not contain_ final
|
||||||
data.
|
data.
|
||||||
|
|
||||||
**_Auto cleanup policies_** in the **scratch** and **export** areas are applied.
|
**_Auto cleanup policies_** in the **scratch** and **export** areas are applied.
|
||||||
|
|
||||||
@@ -94,7 +97,8 @@ quota -s
|
|||||||
|
|
||||||
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
|
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
|
||||||
* It is **forbidden** to use the home directories for I/O-intensive tasks
|
* It is **forbidden** to use the home directories for I/O-intensive tasks
|
||||||
* Use `/scratch`, `/shared-scratch`, `/data/user` or `/data/project` for this purpose.
|
* Use `/scratch`, `/shared-scratch`, `/data/user` or `/data/project` for this purpose.
|
||||||
|
|
||||||
* Users can retrieve up to 1 week of their lost data thanks to the automatic **daily snapshots for 1 week**.
|
* Users can retrieve up to 1 week of their lost data thanks to the automatic **daily snapshots for 1 week**.
|
||||||
Snapshots can be accessed at this path:
|
Snapshots can be accessed at this path:
|
||||||
|
|
||||||
@@ -121,7 +125,8 @@ mmlsquota -u <username> --block-size auto merlin-user
|
|||||||
|
|
||||||
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
|
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
|
||||||
* It is **forbidden** to use the data directories as a ``scratch`` area during job runtime.
|
* It is **forbidden** to use the data directories as a ``scratch`` area during job runtime.
|
||||||
* Use ``/scratch``, ``/shared-scratch`` for this purpose.
|
* Use ``/scratch``, ``/shared-scratch`` for this purpose.
|
||||||
|
|
||||||
* No backup policy is applied for user data directories: users are responsible for backing up their data.
|
* No backup policy is applied for user data directories: users are responsible for backing up their data.
|
||||||
|
|
||||||
### Project data directory
|
### Project data directory
|
||||||
@@ -178,7 +183,7 @@ The properties of the available scratch storage spaces are given in the followin
|
|||||||
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
|
* Read **[Important: Code of Conduct](../quick-start-guide/code-of-conduct.md)** for more information about Merlin6 policies.
|
||||||
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
|
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
|
||||||
* Temporary files *must be deleted at the end of the job by the user*.
|
* Temporary files *must be deleted at the end of the job by the user*.
|
||||||
* Remaining files will be deleted by the system if detected.
|
* Remaining files will be deleted by the system if detected.
|
||||||
* Files not accessed within 28 days will be automatically cleaned up by the system.
|
* Files not accessed within 28 days will be automatically cleaned up by the system.
|
||||||
* If for some reason the scratch areas get full, admins have the right to clean up the oldest data.
|
* If for some reason the scratch areas get full, admins have the right to clean up the oldest data.
|
||||||
|
|
||||||
@@ -190,6 +195,6 @@ Please read **[Transferring Data](../how-to-use-merlin/transfer-data.md)** for m
|
|||||||
#### Export directory policy
|
#### Export directory policy
|
||||||
|
|
||||||
* Temporary files *must be deleted at the end of the job by the user*.
|
* Temporary files *must be deleted at the end of the job by the user*.
|
||||||
* Remaining files will be deleted by the system if detected.
|
* Remaining files will be deleted by the system if detected.
|
||||||
* Files not accessed within 28 days will be automatically cleaned up by the system.
|
* Files not accessed within 28 days will be automatically cleaned up by the system.
|
||||||
* If for some reason the export area gets full, admins have the right to clean up the oldest data.
|
* If for some reason the export area gets full, admins have the right to clean up the oldest data.
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Transferring Data
|
||||||
title: Transferring Data
|
|
||||||
#tags:
|
|
||||||
keywords: transferring data, data transfer, rsync, winscp, copy data, copying, sftp, import, export, hopx, vpn
|
|
||||||
last_updated: 24 August 2023
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/transfer-data.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
@@ -24,7 +16,6 @@ visibility.
|
|||||||
- Systems on the internet can access the [PSI Data Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer) service
|
- Systems on the internet can access the [PSI Data Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer) service
|
||||||
`datatransfer.psi.ch`, using ssh-based protocols and [Globus](https://www.globus.org/)
|
`datatransfer.psi.ch`, using ssh-based protocols and [Globus](https://www.globus.org/)
|
||||||
|
|
||||||
|
|
||||||
## Direct transfer via Merlin6 login nodes
|
## Direct transfer via Merlin6 login nodes
|
||||||
|
|
||||||
The following methods transfer data directly via the [login
|
The following methods transfer data directly via the [login
|
||||||
@@ -50,7 +41,6 @@ rsync -avAHXS ~/localdata user@merlin-l-01.psi.ch:/data/project/general/myprojec
|
|||||||
You can resume interrupted transfers by simply rerunning the command. Previously
|
You can resume interrupted transfers by simply rerunning the command. Previously
|
||||||
transferred files will be skipped.
|
transferred files will be skipped.
|
||||||
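For example, an interrupted transfer can be restarted with the same command; `--partial` (a standard rsync option, not a Merlin-specific requirement) additionally keeps partially transferred files so that large files do not restart from zero:

```bash
# Re-run the same transfer; files that already arrived are skipped.
rsync -avAHXS --partial ~/localdata user@merlin-l-01.psi.ch:/data/project/general/myproject/
```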
|
|
||||||
|
|
||||||
### WinSCP
|
### WinSCP
|
||||||
|
|
||||||
The WinSCP tool can be used for remote file transfer on Windows. It is available
|
The WinSCP tool can be used for remote file transfer on Windows. It is available
|
||||||
@@ -64,7 +54,7 @@ local computer and merlin.
|
|||||||
|
|
||||||
Authentication of users is provided through SimpleSAMLphp, supporting SAML2, LDAP and RADIUS and more. Users without an account can be sent an upload voucher by an authenticated user. FileSender is developed to the requirements of the higher education and research community.
|
Authentication of users is provided through SimpleSAMLphp, supporting SAML2, LDAP and RADIUS and more. Users without an account can be sent an upload voucher by an authenticated user. FileSender is developed to the requirements of the higher education and research community.
|
||||||
|
|
||||||
The purpose of the software is to send a large file to someone, have that file available for download for a certain number of downloads and/or a certain amount of time, and after that automatically delete the file. The software is not intended as a permanent file publishing platform.
|
The purpose of the software is to send a large file to someone, have that file available for download for a certain number of downloads and/or a certain amount of time, and after that automatically delete the file. The software is not intended as a permanent file publishing platform.
|
||||||
|
|
||||||
**[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is fully integrated with PSI, therefore, PSI employees can log in by using their PSI account (through Authentication and Authorization Infrastructure / AAI, by selecting PSI as the institution to be used for log in).
|
**[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is fully integrated with PSI, therefore, PSI employees can log in by using their PSI account (through Authentication and Authorization Infrastructure / AAI, by selecting PSI as the institution to be used for log in).
|
||||||
|
|
||||||
@@ -82,11 +72,13 @@ Notice that `datatransfer.psi.ch` does not allow SSH login, only `rsync`, `scp`
|
|||||||
|
|
||||||
The following filesystems are mounted:
|
The following filesystems are mounted:
|
||||||
* `/merlin/export` which points to the `/export` directory in Merlin.
|
* `/merlin/export` which points to the `/export` directory in Merlin.
|
||||||
* `/merlin/data/experiment/mu3e` which points to the `/data/experiment/mu3e` directories in Merlin.
|
* `/merlin/data/experiment/mu3e` which points to the `/data/experiment/mu3e` directories in Merlin.
|
||||||
* Mu3e sub-directories are mounted in RW (read-write), except for `data` (read-only mounted)
|
* Mu3e sub-directories are mounted in RW (read-write), except for `data` (read-only mounted)
|
||||||
|
|
||||||
* `/merlin/data/project/general` which points to the `/data/project/general` directories in Merlin.
|
* `/merlin/data/project/general` which points to the `/data/project/general` directories in Merlin.
|
||||||
* Owners of Merlin projects should request explicit access to it.
|
* Owners of Merlin projects should request explicit access to it.
|
||||||
* Currently, only `CSCS` is available for transferring files between PizDaint/Alps and Merlin
|
* Currently, only `CSCS` is available for transferring files between PizDaint/Alps and Merlin
|
||||||
|
|
||||||
* `/merlin/data/project/bio` which points to the `/data/project/bio` directories in Merlin.
|
* `/merlin/data/project/bio` which points to the `/data/project/bio` directories in Merlin.
|
||||||
* `/merlin/data/user` which points to the `/data/user` directories in Merlin.
|
* `/merlin/data/user` which points to the `/data/user` directories in Merlin.
|
||||||
|
|
||||||
@@ -120,16 +112,17 @@ Transferring big amounts of data from outside PSI to Merlin is always possible t
|
|||||||
##### Exporting data from Merlin
|
##### Exporting data from Merlin
|
||||||
|
|
||||||
For exporting data from Merlin to outside PSI by using `/export`, one has to:
|
For exporting data from Merlin to outside PSI by using `/export`, one has to:
|
||||||
* From a Merlin login node, copy your data from any directory (e.g. `/data/project`, `/data/user`, `/scratch`) to
|
* From a Merlin login node, copy your data from any directory (e.g. `/data/project`, `/data/user`, `/scratch`) to
|
||||||
`/export`. Make sure your directories and files are secured with proper permissions.
|
`/export`. Make sure your directories and files are secured with proper permissions.
|
||||||
* Once data is copied, from **`datatransfer.psi.ch`**, copy the data from `/merlin/export` to outside PSI
|
* Once data is copied, from **`datatransfer.psi.ch`**, copy the data from `/merlin/export` to outside PSI
|
||||||
|
|
||||||
##### Importing data to Merlin
|
##### Importing data to Merlin
|
||||||
|
|
||||||
For importing data from outside PSI to Merlin by using `/export`, one has to:
|
For importing data from outside PSI to Merlin by using `/export`, one has to:
|
||||||
* From **`datatransfer.psi.ch`**, copy the data from outside PSI to `/merlin/export`.
|
* From **`datatransfer.psi.ch`**, copy the data from outside PSI to `/merlin/export`.
|
||||||
|
|
||||||
Make sure your directories and files are secured with proper permissions.
|
Make sure your directories and files are secured with proper permissions.
|
||||||
* Once data is copied, from a Merlin login node, copy your data from `/export` to any directory (e.g. `/data/project`, `/data/user`, `/scratch`), as sketched below.
|
* Once data is copied, from a Merlin login node, copy your data from `/export` to any directory (e.g. `/data/project`, `/data/user`, `/scratch`), as sketched below.
|
||||||
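A possible end-to-end sketch of both directions, using `rsync` through the PSI Data Transfer service (the `$USER` subdirectories, the remote host and the destination paths are placeholders, not fixed conventions):

```bash
# Export: stage the data on a Merlin login node ...
rsync -av /data/user/$USER/results/ /export/$USER/results/
# ... then, from the system outside PSI, pull it via the PSI Data Transfer service
rsync -av user@datatransfer.psi.ch:/merlin/export/$USER/results/ ./results/

# Import: from the system outside PSI, push the data to the export area ...
rsync -av ./input/ user@datatransfer.psi.ch:/merlin/export/$USER/input/
# ... then, on a Merlin login node, move it to its final location
rsync -av /export/$USER/input/ /data/project/general/myproject/
```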
|
|
||||||
#### Request access to your project directory
|
#### Request access to your project directory
|
||||||
|
|
||||||
@@ -144,10 +137,10 @@ Merlin6 is fully accessible from within the PSI network. To connect from outside
|
|||||||
|
|
||||||
- [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
|
- [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
|
||||||
- [SSH hopx](https://www.psi.ch/en/computing/ssh-hop)
|
- [SSH hopx](https://www.psi.ch/en/computing/ssh-hop)
|
||||||
* Please avoid transferring big amounts of data through **hopx**
|
* Please avoid transferring big amounts of data through **hopx**
|
||||||
- [No Machine](nomachine.md)
|
- [No Machine](nomachine.md)
|
||||||
* Remote Interactive Access through [**'rem-acc.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access)
|
* Remote Interactive Access through [**'rem-acc.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access)
|
||||||
* Please avoid transferring big amounts of data through **NoMachine**
|
* Please avoid transferring big amounts of data through **NoMachine**
|
||||||
|
|
||||||
## Connecting from Merlin6 to outside file shares
|
## Connecting from Merlin6 to outside file shares
|
||||||
|
|
||||||
@@ -161,5 +154,4 @@ provides a helpful wrapper over the Gnome storage utilities, and provides suppor
|
|||||||
- FTP, SFTP
|
- FTP, SFTP
|
||||||
- [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
|
- [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
|
||||||
|
|
||||||
|
|
||||||
[More instructions on using `merlin_rmount`](../software-support/merlin-rmount.md)
|
[More instructions on using `merlin_rmount`](../software-support/merlin-rmount.md)
|
||||||
|
|||||||
@@ -1,17 +1,8 @@
|
|||||||
---
|
|
||||||
#tags:
|
|
||||||
keywords: Pmodules, software, stable, unstable, deprecated, overlay, overlays, release stage, module, package, packages, library, libraries
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin6_sidebar
|
|
||||||
permalink: /merlin6/using-modules.html
|
|
||||||
---
|
|
||||||
|
|
||||||
# Using PModules
|
# Using PModules
|
||||||
|
|
||||||
## Environment Modules
|
## Environment Modules
|
||||||
|
|
||||||
On top of the operating system stack we provide different software using the PSI developed PModule system.
|
On top of the operating system stack we provide different software using the PSI developed PModule system.
|
||||||
|
|
||||||
PModules is the officially supported way, and each package is deployed by a specific expert. Usually,
|
PModules is the officially supported way, and each package is deployed by a specific expert. Usually,
|
||||||
software which is used by many people can be found in PModules.
|
software which is used by many people can be found in PModules.
|
||||||
@@ -79,8 +70,8 @@ module use overlay_merlin
|
|||||||
```
|
```
|
||||||
|
|
||||||
Then, once `overlay_merlin` is invoked, it will hide central software installations with the same version (if they exist), which are replaced
|
Then, once `overlay_merlin` is invoked, it will hide central software installations with the same version (if they exist), which are replaced
|
||||||
by the local ones in Merlin. Releases from the central Pmodules repository which do not have a copy in the Merlin overlay will remain
|
by the local ones in Merlin. Releases from the central Pmodules repository which do not have a copy in the Merlin overlay will remain
|
||||||
visible. For example, for each ANSYS release, one can identify where it is installed by searching for ANSYS in PModules with the `--verbose`
|
visible. For example, for each ANSYS release, one can identify where it is installed by searching for ANSYS in PModules with the `--verbose`
|
||||||
option. This will show the location of the different ANSYS releases as follows:
|
option. This will show the location of the different ANSYS releases as follows:
|
||||||
* For ANSYS releases installed in the central repositories, the path starts with `/opt/psi`
|
* For ANSYS releases installed in the central repositories, the path starts with `/opt/psi`
|
||||||
* For ANSYS releases installed in the Merlin6 repository (and/or overriding the central ones), the path starts with `/data/software/pmodules`
|
* For ANSYS releases installed in the Merlin6 repository (and/or overriding the central ones), the path starts with `/data/software/pmodules`
|
||||||
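For instance, a quick way to check which ANSYS releases come from the Merlin overlay rather than from the central repositories could be the following sketch (the search syntax follows the `--verbose` usage described above):

```bash
# Enable the Merlin overlay, then list ANSYS releases with their installation paths.
# Paths starting with /opt/psi are central; /data/software/pmodules are Merlin-local.
module use overlay_merlin
module search ANSYS --verbose
```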
|
|||||||
@@ -4,16 +4,18 @@
|
|||||||
|
|
||||||
* The new Slurm CPU cluster is called **`merlin6`**.
|
* The new Slurm CPU cluster is called **`merlin6`**.
|
||||||
* The new Slurm GPU cluster is called [**`gmerlin6`**](../gmerlin6/cluster-introduction.md)
|
* The new Slurm GPU cluster is called [**`gmerlin6`**](../gmerlin6/cluster-introduction.md)
|
||||||
* The old Slurm *merlin* cluster is still active and best effort support is provided.
|
* The old Slurm *merlin* cluster is still active and best effort support is provided.
|
||||||
|
|
||||||
This cluster was renamed to [**merlin5**](../merlin5/cluster-introduction.md).
|
This cluster was renamed to [**merlin5**](../merlin5/cluster-introduction.md).
|
||||||
|
|
||||||
From July 2019, **`merlin6`** becomes the **default Slurm cluster** and any job submitted from the login node will be submitted to that cluster if not .
|
From July 2019, **`merlin6`** becomes the **default Slurm cluster**, and any job submitted from the login node will be submitted to that cluster unless a different cluster is specified.
|
||||||
|
|
||||||
* Users can keep submitting to the old *`merlin5`* computing nodes by using the option ``--cluster=merlin5``.
|
* Users can keep submitting to the old *`merlin5`* computing nodes by using the option ``--cluster=merlin5``.
|
||||||
* Users submitting to the **`gmerlin6`** GPU cluster need to specify the option ``--cluster=gmerlin6``.
|
* Users submitting to the **`gmerlin6`** GPU cluster need to specify the option ``--cluster=gmerlin6``.
|
||||||
|
|
||||||
### Slurm 'merlin6'
|
### Slurm 'merlin6'
|
||||||
|
|
||||||
**CPU nodes** are configured in a **Slurm** cluster, called **`merlin6`**, and
|
**CPU nodes** are configured in a **Slurm** cluster, called **`merlin6`**, and
|
||||||
this is the _**default Slurm cluster**_. Hence, by default, if no Slurm cluster is
|
this is the ***default Slurm cluster***. Hence, by default, if no Slurm cluster is
|
||||||
specified (with the `--cluster` option), this will be the cluster to which the jobs
|
specified (with the `--cluster` option), this will be the cluster to which the jobs
|
||||||
will be sent.
|
will be sent.
|
||||||
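For example, selecting the target cluster at submission time (`myjob.sh` is a placeholder batch script):

```bash
sbatch myjob.sh                       # default cluster: merlin6
sbatch --cluster=merlin5 myjob.sh     # old merlin5 CPU nodes
sbatch --cluster=gmerlin6 myjob.sh    # gmerlin6 GPU cluster
```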
|
|||||||
@@ -7,10 +7,10 @@ applications (e.g. a text editor, but for more performant graphical access, refe
in the login nodes, and X11 forwarding can be used by those users who have properly configured X11 support on their desktops. However:

* Merlin6 administrators **do not offer support** for user desktop configuration (Windows, MacOS, Linux).
  * Hence, Merlin6 administrators **do not offer official support** for X11 client setup.
  * Nevertheless, a generic guide for X11 client setup (*Linux*, *Windows* and *MacOS*) is provided below.
* PSI desktop configuration issues must be addressed through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
  * The ticket will be redirected to the corresponding Desktop support group (Windows, Linux).

### Accessing from a Linux client
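
As a minimal, non-authoritative sketch of the typical connection from a Linux client with a working local X11 server (the hostname is one of the Merlin login nodes; see the access documentation for the full list):

```bash
# Forward X11 so that graphical applications started on the login node are displayed locally
ssh -X <username>@merlin-l-001.psi.ch
```
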
@@ -10,25 +10,27 @@ The basic principle is courtesy and consideration for other users.
## Interactive nodes

* The interactive nodes (also known as login nodes) are for development and quick testing:
  * It is **strictly forbidden to run production jobs** on the login nodes. All production jobs must
    be submitted to the batch system.
  * It is **forbidden to run long processes** occupying large parts of a login node's resources.
  * According to the previous rules, **misbehaving running processes will have to be killed**
    in order to keep the system responsive for other users.

## Batch system

* Make sure that no broken or run-away processes are left when your job is done. Keep the process space clean on all nodes.
* During the runtime of a job, it is mandatory to use the `/scratch` and `/shared-scratch` partitions for temporary data (a minimal sketch is shown after this list):
  * It is **forbidden** to use `/data/user`, `/data/project` or `/psi/home/` for that purpose.
  * Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes.
  * Prefer `/scratch` over `/shared-scratch` and use the latter only when you require the temporary files to be visible from multiple nodes.
* Read the description in **[Merlin6 directory structure](../how-to-use-merlin/storage.md#merlin6-directories)** to learn about the correct usage of each partition type.
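
A minimal sketch of the intended pattern, assuming a per-job directory layout under `/scratch` (the layout is only an illustration; `$SLURM_JOB_ID` is set by Slurm inside the job):

```bash
#!/bin/bash
#SBATCH --time=01:00:00

# Create a per-job directory on the local scratch area and remove it when the job exits
TMPDIR="/scratch/${USER}/${SLURM_JOB_ID}"
mkdir -p "$TMPDIR"
trap 'rm -rf "$TMPDIR"' EXIT

# ... run the actual workload here, writing temporary data only to "$TMPDIR" ...
```
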

## User and project data

* ***Users are responsible for backing up their own data***. It is recommended to back up the data on third-party independent systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
  * **`/psi/home`**, as it contains only a small amount of data, is the only directory for which we can provide daily snapshots for one week. These can be found in the directory **`/psi/home/.snapshot/`**
* ***When a user leaves PSI, the user or their supervisor/team are responsible for backing up and moving the data out of the cluster***: every few months, the storage space will be recycled for those old users who do not have an existing and valid PSI account.

!!! warning
@@ -41,8 +43,8 @@ The basic principle is courtesy and consideration for other users.
* The system administrator has the right to temporarily block the access to
  Merlin6 for an account violating the Code of Conduct in order to maintain the
  efficiency and stability of the system.
  * Repeated violations by the same user will be escalated to the user's supervisor.
* The system administrator has the right to delete files in the **scratch** directories
  * after a job, if the job failed to clean up its files.
  * during the job, in order to prevent a job from destabilizing a node or multiple nodes.
* The system administrator has the right to kill any misbehaving running processes.
@@ -9,8 +9,8 @@ At present, the **Merlin local HPC cluster** contains _two_ generations of it:
* the old **Merlin5** cluster (the `merlin5` Slurm cluster), and
* the newest generation, **Merlin6**, which is divided into two Slurm clusters:
  * `merlin6` as the Slurm CPU cluster
  * `gmerlin6` as the Slurm GPU cluster.

Access to the different Slurm clusters is possible from the [**Merlin login nodes**](accessing-interactive-nodes.md),
which can be accessed through the [SSH protocol](accessing-interactive-nodes.md#ssh-access) or the [NoMachine (NX) service](../how-to-use-merlin/nomachine.md).
@@ -42,9 +42,9 @@ by the BIO Division and by Deep Leaning project.
These computational resources are split into **two** different **[Slurm](https://slurm.schedmd.com/overview.html)** clusters:

* The Merlin6 CPU nodes are in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster called [**`merlin6`**](../slurm-configuration.md).
  * This is the **default Slurm cluster** configured in the login nodes: any job submitted without the option `--cluster` will be submitted to this cluster.
* The Merlin6 GPU resources are in a dedicated **[Slurm](https://slurm.schedmd.com/overview.html)** cluster called [**`gmerlin6`**](../../gmerlin6/slurm-configuration.md).
  * Users submitting to the **`gmerlin6`** GPU cluster need to specify the option `--cluster=gmerlin6`.

### Merlin5
@@ -43,9 +43,9 @@ The owner of the group is the person who will be allowed to modify the group.
```text
Dear HelpDesk

I would like to request a new unix group.

Unix Group Name: unx-xxxxx
Initial Group Members: xxxxx, yyyyy, zzzzz, ...
Group Owner: xxxxx
@@ -93,16 +93,16 @@ To request a project, please provide the following information in a **[PSI Servi
```text
Dear HelpDesk

I would like to request a new Merlin6 project.

Project Name: xxxxx
UnixGroup: xxxxx     # Must be an existing Unix Group

The project responsible is the Owner of the Unix Group.
If you need a storage quota exceeding the defaults, please provide a description
and motivation for the higher storage needs:

Storage Quota: 1TB with a maximum of 1M Files
Reason: (None for default 1TB/1M)
@@ -1,12 +1,4 @@
# Slurm Configuration

This documentation shows the basic Slurm configuration and the options needed to run jobs in the Merlin6 CPU cluster.
@@ -23,11 +15,12 @@ The following table show default and maximum resources that can be used per node
| merlin-c-[313-318] | 1 core | 44 cores | 1 | 748800 | 748800 | 10000 | N/A | N/A |
| merlin-c-[319-324] | 1 core | 44 cores | 2 | 748800 | 748800 | 10000 | N/A | N/A |

If nothing is specified, each core will by default use up to 8GB of memory. Memory can be increased with the `--mem=<mem_in_MB>` and
`--mem-per-cpu=<mem_in_MB>` options, and the maximum memory allowed is `Max.Mem/Node`.

In **`merlin6`**, memory is considered a Consumable Resource, as is the CPU. Hence, both resources are accounted for when submitting a job,
and by default resources can not be oversubscribed. This is a main difference from the old **`merlin5`** cluster, where only CPUs were accounted for
and memory was oversubscribed by default.
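
As an illustration of the memory options mentioned above (the values are placeholders, not recommendations):

```bash
# Request 4 tasks with 16000 MB per core instead of the 8GB-per-core default
sbatch --ntasks=4 --mem-per-cpu=16000 myjob.sh

# Alternatively, request the total memory per node for the job
sbatch --ntasks=4 --mem=64000 myjob.sh
```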

!!! tip "Check Configuration"
@@ -66,12 +59,12 @@ The following *partitions* (also known as *queues*) are configured in Slurm:
| **asa-ansys** | 1 hour | 90 days | unlimited | 1000 | 4 | 15600 |
| **mu3e** | 1 day | 7 days | unlimited | 1000 | 4 | 3712 |

The **PriorityJobFactor** value will be added to the job priority (**PARTITION** column in `sprio -l`). In other words, jobs sent to higher priority
partitions will usually run first (however, other factors such as **job age** or, mainly, **fair share** might affect that decision). For the GPU
partitions, Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.

Jobs submitted to a partition with a higher **PriorityTier** value will be dispatched before pending jobs in partitions with a lower **PriorityTier** value
and, if possible, they will preempt running jobs from partitions with lower **PriorityTier** values.

* The **`general`** partition is the **default**. It can not have more than 50 nodes running jobs.
* For **`daily`** this limitation is extended to 67 nodes.
@@ -79,11 +72,18 @@ and, if possible, they will preempt running jobs from partitions with lower *Pri
* **`asa-general`, `asa-daily`, `asa-ansys`, `asa-visas` and `mu3e`** are **private** partitions, belonging to the different experiments owning the machines. **Access is restricted** in all cases. However, by agreement with the experiments, nodes are usually added to the **`hourly`** partition as extra resources for the public resources.

!!! tip "Partition Selection"
    Jobs which would run for less than one day should always be sent to
    **daily**, while jobs that would run for less than one hour should be sent
    to **hourly**. This ensures that you have higher priority over jobs sent to
    partitions with less priority, and also matters because **general** limits
    the number of nodes that can be used. The idea behind this is that the
    cluster can not be blocked by long jobs and we can always ensure resources
    for shorter jobs.
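
A hedged sketch of how the partition and a matching time limit are typically combined (the values and the `myjob.sh` script are placeholders):

```bash
# A job expected to finish within 30 minutes belongs in 'hourly'
sbatch --partition=hourly --time=00:30:00 myjob.sh

# A job expected to finish within a few hours belongs in 'daily'
sbatch --partition=daily --time=06:00:00 myjob.sh
```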

### Merlin5 CPU Accounts

Users need to ensure that the public **`merlin`** account is specified. If no account option is specified, Slurm will default to this account.
This is mostly needed by users who have multiple Slurm accounts, who may specify a different account by mistake.

```bash
@@ -100,16 +100,14 @@ Not all the accounts can be used on all partitions. This is resumed in the table
#### Private accounts

* The *`gfa-asa`* and *`mu3e`* accounts are private accounts. These can be used for accessing dedicated partitions with nodes owned by different groups.

### Slurm CPU specific options

Some options are only relevant when using CPUs. These are detailed here.
Alternative Slurm options for CPU-based jobs are available. Please refer to the **man** pages
of each Slurm command for further information about them (`man salloc`, `man sbatch`, `man srun`).
Below are listed the most common settings:

```bash
#SBATCH --hint=[no]multithread
@@ -125,8 +123,9 @@ Below are listed the most common settings:
#### Enabling/Disabling Hyper-Threading

The **`merlin6`** cluster contains nodes with Hyper-Threading enabled. One
should always specify whether to use Hyper-Threading or not. If not defined,
Slurm will generally use it (exceptions apply).

```bash
#SBATCH --hint=multithread # Use extra threads with in-core multi-threading.
@@ -138,7 +137,7 @@ whether to use Hyper-Threading or not. If not defined, Slurm will generally use
Slurm allows defining a set of features in the node definition. This can be used to filter and select nodes according to one or more
specific features. For the CPU nodes, we have the following features:

```text
NodeName=merlin-c-[001-024,101-124,201-224] Features=mem_384gb,xeon-gold-6152
NodeName=merlin-c-[301-312] Features=mem_768gb,xeon-gold-6240r
NodeName=merlin-c-[313-318] Features=mem_768gb,xeon-gold-6240r
@@ -149,26 +148,36 @@ Therefore, users running on `hourly` can select which node they want to use (fat
This is possible by using the option `--constraint=<feature_name>` in Slurm.

Examples:

1. Select nodes with 48 cores only (nodes with [2 x Xeon Gold 6240R](https://ark.intel.com/content/www/us/en/ark/products/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz.html)):

    ```bash
    sbatch --constraint=xeon-gold-6240r ...
    ```

1. Select nodes with 44 cores only (nodes with [2 x Xeon Gold 6152](https://ark.intel.com/content/www/us/en/ark/products/120491/intel-xeon-gold-6152-processor-30-25m-cache-2-10-ghz.html)):

    ```bash
    sbatch --constraint=xeon-gold-6152 ...
    ```

1. Select fat memory nodes only:

    ```bash
    sbatch --constraint=mem_768gb ...
    ```

1. Select regular memory nodes only:

    ```bash
    sbatch --constraint=mem_384gb ...
    ```

1. Select fat memory nodes with 48 cores only:

    ```bash
    sbatch --constraint=mem_768gb,xeon-gold-6240r ...
    ```

Detailing exactly which type of nodes you want to use is important; therefore, for groups with private accounts (`mu3e`, `gfa-asa`) or for
public users running on the `hourly` partition, *constraining nodes by features is recommended*. This becomes even more important when
@@ -178,11 +187,11 @@ having heterogeneous clusters.
In this chapter we will cover the basic settings that users need to specify in order to run jobs in the Merlin6 CPU cluster.

### User and job limits

In the CPU cluster we enforce some limits which apply to jobs and to users. The idea behind this is to ensure a fair usage of the resources and to
avoid overuse of the resources by a single user or job. However, applying limits might affect the overall usage efficiency of the cluster (for example,
pending jobs from a single user while many nodes sit idle due to low overall activity is something that can be seen when user limits are applied).
In the same way, these limits can also be used to improve the efficiency of the cluster (for example, without any job size limits, a job requesting all
resources of the batch system would drain the entire cluster just to fit that job, which is undesirable).
@@ -190,14 +199,24 @@ Hence, there is a need of setting up wise limits and to ensure that there is a f
of the cluster while allowing jobs of different natures and sizes (that is, **single core** based **vs. parallel jobs** of different sizes) to run.

!!! warning "Resource Limits"
    Wide limits are provided in the **daily** and **hourly** partitions, while
    for **general** those limits are more restrictive. However, we kindly ask
    users to inform the Merlin administrators when there are plans to send big
    jobs which would require a massive draining of nodes for allocating such
    jobs. This would apply to jobs requiring the **unlimited** QoS (see below,
    "Per job limits").

!!! tip "Custom Requirements"
    If you have different requirements, please let us know; we will try to
    accommodate or propose a solution for you.

#### Per job limits

These are limits which apply to a single job. In other words, there is a
maximum of resources a single job can use. Limits are described in the table
below with the format `SlurmQoS(limits)` (possible `SlurmQoS` values can be
listed with the command `sacctmgr show qos`). Some limits will vary depending
on the day and time of the week.

| Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h |
|:----------: | :------------------------------: | :------------------------------: | :------------------------------: |
@@ -205,18 +224,29 @@ These are limits which apply to a single job. In other words, there is a maximum
| **daily** | daytime(cpu=704,mem=2750G) | nighttime(cpu=1408,mem=5500G) | unlimited(cpu=2200,mem=8593.75G) |
| **hourly** | unlimited(cpu=2200,mem=8593.75G) | unlimited(cpu=2200,mem=8593.75G) | unlimited(cpu=2200,mem=8593.75G) |

By default, a job can not use more than 704 cores (max CPU per job). In the
same way, memory is also proportionally limited. This is equivalent to running
a job using up to 8 nodes at once. This limit applies to the **general**
partition (fixed limit) and to the **daily** partition (only during working
hours).

Limits are relaxed for the **daily** partition during non-working hours, and
during the weekend limits are even wider. For the **hourly** partition,
**despite running many parallel jobs being something undesirable** (allocating
such jobs requires massive draining of nodes), wider limits are provided. In
order to avoid massive node draining in the cluster when allocating huge jobs,
setting per job limits is necessary. Hence, the **unlimited** QoS mostly refers
to "per user" limits rather than to "per job" limits (in other words, users can
run any number of hourly jobs, but the job size of such jobs is limited with
wide values).
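
To see the QoS limits that are currently configured, the `sacctmgr` command mentioned above can be used; a minimal sketch (the selected format fields are only a convenient subset and may vary slightly between Slurm versions):

```bash
# Show all QoS definitions with their per-job and per-user limits
sacctmgr show qos format=Name,Priority,MaxTRES,MaxTRESPerUser,MaxWall
```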

#### Per user limits for CPU partitions

These limits apply exclusively to users. In other words, there is a
maximum of resources a single user can use. Limits are described in the table
below with the format `SlurmQoS(limits)` (possible `SlurmQoS` values can be
listed with the command `sacctmgr show qos`). Some limits will vary depending
on the day and time of the week.

| Partition | Mon-Fri 0h-18h | Sun-Thu 18h-0h | From Fri 18h to Mon 0h |
|:-----------:| :----------------------------: | :---------------------------: | :----------------------------: |
@@ -224,15 +254,22 @@ These limits which apply exclusively to users. In other words, there is a maximu
| **daily** | daytime(cpu=1408,mem=5500G) | nighttime(cpu=2112,mem=8250G) | unlimited(cpu=6336,mem=24750G) |
| **hourly** | unlimited(cpu=6336,mem=24750G) | unlimited(cpu=6336,mem=24750G)| unlimited(cpu=6336,mem=24750G) |

By default, users can not use more than 704 cores at the same time (max CPU per
user). Memory is also proportionally limited in the same way. This is
equivalent to 8 exclusive nodes. This limit applies to the **general**
partition (fixed limit) and to the **daily** partition (only during working
hours).

For the **hourly** partition, there is no such restriction and user limits are
removed. Limits are relaxed for the **daily** partition during non-working
hours, and during the weekend limits are removed.
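
To check how much of the per-user allowance you are currently using, one possible sketch is to list your own jobs with their partition, QoS and size (the format string is just an illustrative selection of fields):

```bash
# Show your running and pending jobs on the merlin6 cluster
squeue --clusters=merlin6 --user=$USER --format="%.12i %.9P %.8q %.6C %.10m %.10M %.8T"
```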

## Advanced Slurm configuration

Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as
the batch system technology for managing and scheduling jobs. Slurm has been
installed in a **multi-clustered** configuration, allowing the integration of
multiple clusters in the same batch system.

For understanding the Slurm configuration setup in the cluster, it may sometimes be useful to check the following files:
@@ -240,5 +277,10 @@ For understanding the Slurm configuration setup in the cluster, sometimes may be
* `/etc/slurm/gres.conf` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
* `/etc/slurm/cgroup.conf` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.

The previous configuration files, which can be found on the login nodes,
correspond exclusively to the **merlin6** cluster configuration files.

Configuration files for the old **merlin5** cluster or for the **gmerlin6**
cluster must be checked directly on any of the **merlin5** or **gmerlin6**
computing nodes (for example, by logging in to one of the nodes while a job or an
active allocation is running).
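
A hedged sketch of how this inspection might look in practice (the cluster name below is only an example):

```bash
# On a login node: the merlin6 configuration files
less /etc/slurm/slurm.conf

# Query the running configuration of another cluster without logging in to one of its nodes
scontrol --clusters=gmerlin6 show config | head -n 20
```
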
@@ -57,7 +57,7 @@ a shell (`$SHELL`) at the end of the `salloc` command. In example:
```bash
# Typical 'salloc' call
# - Same as running:
#   'salloc --clusters=merlin6 -N 2 -n 2 srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL'
salloc --clusters=merlin6 -N 2 -n 2

# Custom 'salloc' call
@@ -155,7 +155,7 @@ srun --clusters=merlin6 --x11 --pty bash
srun: job 135095591 queued and waiting for resources
srun: job 135095591 has been allocated resources

(base) [caubet_m@merlin-l-001 ~]$

(base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --x11 --pty bash
srun: job 135095592 queued and waiting for resources
@@ -198,7 +198,7 @@ salloc --clusters=merlin6 --x11
salloc: Granted job allocation 135171355
salloc: Relinquishing job allocation 135171355

(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --x11
salloc: Pending job allocation 135171349
salloc: job 135171349 queued and waiting for resources
salloc: job 135171349 has been allocated resources
@@ -166,20 +166,20 @@ sjstat
Scheduling pool data:
----------------------------------------------------------------------------------
                           Total  Usable   Free   Node   Time  Other
Pool         Memory  Cpus  Nodes   Nodes  Nodes  Limit  Limit  traits
----------------------------------------------------------------------------------
test       373502Mb    88      6       6      1  UNLIM  1-00:00:00
general*   373502Mb    88     66      66      8     50  7-00:00:00
daily      373502Mb    88     72      72      9     60  1-00:00:00
hourly     373502Mb    88     72      72      9  UNLIM  01:00:00
gpu        128000Mb     8      1       1      0  UNLIM  7-00:00:00
gpu        128000Mb    20      8       8      0  UNLIM  7-00:00:00

Running job data:
---------------------------------------------------------------------------------------------------
                                          Time        Time        Time
JobID    User     Procs Pool   Status     Used        Limit       Started      Master/Other
---------------------------------------------------------------------------------------------------
13433377 collu_g      1 gpu    PD         0:00        24:00:00    N/A          (Resources)
13433389 collu_g     20 gpu    PD         0:00        24:00:00    N/A          (Resources)
@@ -249,11 +249,10 @@ sview


## General Monitoring

The following pages contain basic monitoring for Slurm and the computing nodes.
Currently, monitoring is based on Grafana + InfluxDB. In the future it will
be moved to a different service based on ElasticSearch + LogStash + Kibana.

In the meantime, the following monitoring pages are available in a best effort
@@ -262,17 +261,17 @@ support:
### Merlin6 Monitoring Pages

* Slurm monitoring:
  * ***[Merlin6 Slurm Statistics - XDMOD](https://merlin-slurmmon01.psi.ch/)***
  * [Merlin6 Slurm Live Status](https://hpc-monitor02.psi.ch/d/QNcbW1AZk/merlin6-slurm-live-status?orgId=1&refresh=10s)
  * [Merlin6 Slurm Overview](https://hpc-monitor02.psi.ch/d/94UxWJ0Zz/merlin6-slurm-overview?orgId=1&refresh=10s)
* Nodes monitoring:
  * [Merlin6 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/JmvLR8gZz/merlin6-computing-cpu-nodes?orgId=1&refresh=10s)
  * [Merlin6 GPU Nodes Overview](https://hpc-monitor02.psi.ch/d/gOo1Z10Wk/merlin6-computing-gpu-nodes?orgId=1&refresh=10s)

### Merlin5 Monitoring Pages

* Slurm monitoring:
  * [Merlin5 Slurm Live Status](https://hpc-monitor02.psi.ch/d/o8msZJ0Zz/merlin5-slurm-live-status?orgId=1&refresh=10s)
  * [Merlin5 Slurm Overview](https://hpc-monitor02.psi.ch/d/eWLEW1AWz/merlin5-slurm-overview?orgId=1&refresh=10s)
* Nodes monitoring:
  * [Merlin5 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/ejTyWJAWk/merlin5-computing-cpu-nodes?orgId=1&refresh=10s)
@@ -5,19 +5,19 @@
Before starting to use the cluster, please read the following rules:

1. To ease and improve *scheduling* and *backfilling*, always try to **estimate** and **define a proper run time** for your jobs:
    * Use `--time=<D-HH:MM:SS>` for that.
    * For very long runs, please consider using ***[Job Arrays with Checkpointing](#array-jobs-running-very-long-tasks-with-checkpoint-files)***
2. Try to optimize your jobs to run within at most **one day**. Please consider the following:
    * Some software can simply scale up by using more nodes while drastically reducing the run time.
    * Some software allows saving a specific state, and a second job can start from that state: ***[Job Arrays with Checkpointing](#array-jobs-running-very-long-tasks-with-checkpoint-files)*** can help you with that.
    * Jobs submitted to **`hourly`** get more priority than jobs submitted to **`daily`**: always use **`hourly`** for jobs shorter than 1 hour.
    * Jobs submitted to **`daily`** get more priority than jobs submitted to **`general`**: always use **`daily`** for jobs shorter than 1 day.
3. It is **forbidden** to run **very short jobs**, as they cause a lot of overhead and can also cause severe problems for the main scheduler.
    * ***Question:*** Is my job a very short job? ***Answer:*** If it lasts a few seconds or very few minutes, yes.
    * ***Question:*** How long should my job run? ***Answer:*** As a *rule of thumb*, from 5 minutes it starts being OK; from 15 minutes it is preferred.
    * Use ***[Packed Jobs](#packed-jobs-running-a-large-number-of-short-tasks)*** for running a large number of short tasks.
4. Do not submit hundreds of similar jobs!
    * Use ***[Array Jobs](#array-jobs-launching-a-large-number-of-related-jobs)*** for gathering jobs instead.

!!! tip
    Having a good estimation of the *time* needed by your jobs, a proper way for
@@ -37,6 +37,7 @@ Before starting using the cluster, please read the following rules:
## Basic settings

For a complete list of available options and parameters, it is recommended to use the **man pages** (i.e. `man sbatch`, `man srun`, `man salloc`).

Please notice that the behaviour of some parameters might change depending on the command used when running jobs (for example, the `--exclusive` behaviour in `sbatch` differs from `srun`).

In this chapter we show the basic parameters which are usually needed in the Merlin cluster.
@@ -115,20 +116,20 @@ The following template should be used by any user submitting jobs to the Merlin6
```bash
#!/bin/bash
#SBATCH --cluster=merlin6                 # Cluster name
#SBATCH --partition=general,daily,hourly  # Specify one or multiple partitions
#SBATCH --time=<D-HH:MM:SS>               # Strongly recommended
#SBATCH --output=<output_file>            # Generate custom output file
#SBATCH --error=<error_file>              # Generate custom error file
#SBATCH --hint=nomultithread              # Mandatory for multithreaded jobs
##SBATCH --exclusive                      # Uncomment if you need exclusive node usage
##SBATCH --ntasks-per-core=1              # Only mandatory for multithreaded single tasks

## Advanced options example
##SBATCH --nodes=1                        # Uncomment and specify the number of nodes to use
##SBATCH --ntasks=44                      # Uncomment and specify the number of tasks to use
##SBATCH --ntasks-per-node=44             # Uncomment and specify the number of tasks per node
##SBATCH --cpus-per-task=44               # Uncomment and specify the number of cores per task
```

#### Multithreaded jobs template
@@ -241,7 +242,7 @@ strategy:
#SBATCH --time=7-00:00:00    # each job can run for 7 days
#SBATCH --cpus-per-task=1
#SBATCH --array=1-10%1       # Run a 10-job array, one job at a time.
if test -e checkpointfile; then
  # There is a checkpoint file; resume from it
  myprogram --read-checkp checkpointfile
else
@@ -9,9 +9,9 @@ information about options and examples.
Useful commands for Slurm:

```bash
sinfo              # to see the names of the nodes, their occupancy,
                   # the names of the Slurm partitions, limits (try out the "-l" option)
squeue             # to see the currently running/waiting jobs in Slurm
                   # (the additional "-l" option may also be useful)
sbatch Script.sh   # to submit a script (example below) to Slurm
srun <command>     # to submit a command to Slurm. The same options as in 'sbatch' can be used.
@@ -30,7 +30,7 @@ sacct # Show job accounting, useful for checking details of finished
```bash
sinfo -N -l   # list nodes, state, resources (#CPUs, memory per node, ...), etc.
sshare -a     # to list the shares of associations to a cluster
sprio -l      # to view the factors that comprise a job's scheduling priority
              # add '-u <username>' to filter by user
```
|||||||
@@ -251,7 +251,6 @@ The `%1` in the `#SBATCH --array=1-10%1` statement defines that only 1 subjob ca
|
|||||||
this will result in subjob n+1 only being started when job n has finished. It will read the checkpoint file
|
this will result in subjob n+1 only being started when job n has finished. It will read the checkpoint file
|
||||||
if it is present.
|
if it is present.
|
||||||
|
|
||||||
|
|
||||||
### Packed jobs: running a large number of short tasks
|
### Packed jobs: running a large number of short tasks
|
||||||
|
|
||||||
Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate
|
Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate
|
||||||
|
|||||||
@@ -54,7 +54,7 @@ export ANSYSLI_SERVERS=2325@$LICENSE_SERVER
|
|||||||
# [Optional:END]
|
# [Optional:END]
|
||||||
|
|
||||||
SOLVER_FILE=/data/user/caubet_m/CFX5/mysolver.in
|
SOLVER_FILE=/data/user/caubet_m/CFX5/mysolver.in
|
||||||
cfx5solve -batch -def "$JOURNAL_FILE"
|
cfx5solve -batch -def "$JOURNAL_FILE"
|
||||||
```
|
```
|
||||||
|
|
||||||
One can enable hypertheading by defining `--hint=multithread`,
|
One can enable hypertheading by defining `--hint=multithread`,
|
||||||
@@ -99,23 +99,24 @@ if [ "$INTELMPI" == "yes" ]
then
  export I_MPI_DEBUG=4
  export I_MPI_PIN_CELL=core

  # Simple example: cfx5solve -batch -def "$JOURNAL_FILE" -par-dist "$HOSTLIST" \
  #                           -part $SLURM_NTASKS \
  #                           -start-method 'Intel MPI Distributed Parallel'
  cfx5solve -batch -part-large -double -verbose -def "$JOURNAL_FILE" -par-dist "$HOSTLIST" \
            -part $SLURM_NTASKS -par-local -start-method 'Intel MPI Distributed Parallel'
else
  # Simple example: cfx5solve -batch -def "$JOURNAL_FILE" -par-dist "$HOSTLIST" \
  #                           -part $SLURM_NTASKS \
  #                           -start-method 'IBM MPI Distributed Parallel'
  cfx5solve -batch -part-large -double -verbose -def "$JOURNAL_FILE" -par-dist "$HOSTLIST" \
            -part $SLURM_NTASKS -par-local -start-method 'IBM MPI Distributed Parallel'
fi
```

In the above example, one can increase the number of *nodes* and/or *ntasks* if needed, and combine it
with `--exclusive` whenever necessary. In general, **no hyperthreading** is recommended for MPI based jobs.
Finally, one can change the MPI technology in `-start-method`
(check the CFX documentation for possible values).
|||||||
@@ -75,10 +75,10 @@ To setup HFSS RSM for using it with the Merlin cluster, it must be done from the
|
|||||||
|
|
||||||

|

|
||||||
|
|
||||||
* **Select Scheduler**: `Remote RSM`.
|
* **Select Scheduler**: `Remote RSM`.
|
||||||
* **Server**: Add a Merlin login node.
|
* **Server**: Add a Merlin login node.
|
||||||
* **User name**: Add your Merlin username.
|
* **User name**: Add your Merlin username.
|
||||||
* **Password**: Add your Merlin username password.
|
||||||
|
|
||||||
Once *refreshed*, the **Scheduler info** box must provide **Slurm**
|
Once *refreshed*, the **Scheduler info** box must provide **Slurm**
|
||||||
information of the server (see above picture). If the box contains that
|
information of the server (see above picture). If the box contains that
|
||||||
@@ -92,7 +92,7 @@ To setup HFSS RSM for using it with the Merlin cluster, it must be done from the
|
|||||||
|
|
||||||

|

|
||||||
|
|
||||||
* For example, for **ANSYS/2022R1**, the location is `/data/software/pmodules/Tools/ANSYS/2021R1/v211/AnsysEM21.1/Linux64/ansysedt.exe`.
|
||||||
|
|
||||||
### HFSS Slurm (from login node only)
|
### HFSS Slurm (from login node only)
|
||||||
|
|
||||||
@@ -118,10 +118,10 @@ Desktop** to submit to Slurm. This can set as follows:
|
|||||||
|
|
||||||

|

|
||||||
|
|
||||||
* **Select Scheduler**: `Slurm`.
|
* **Select Scheduler**: `Slurm`.
|
||||||
* **Server**: must point to `localhost`.
|
* **Server**: must point to `localhost`.
|
||||||
* **User name**: must be empty.
|
* **User name**: must be empty.
|
||||||
* **Password**: must be empty.
|
* **Password**: must be empty.
|
||||||
|
|
||||||
The **Server, User name** and **Password** boxes can't be modified, but if the
values do not match the above settings, they should be changed by
|
||||||
|
|||||||
@@ -1,6 +1,4 @@
|
|||||||
---
|
# ANSYS - MAPDL
|
||||||
title: ANSYS - MAPDL
|
|
||||||
---
|
|
||||||
|
|
||||||
# ANSYS - Mechanical APDL
|
# ANSYS - Mechanical APDL
|
||||||
|
|
||||||
@@ -143,12 +141,12 @@ then
|
|||||||
# When using -mpi=intelmpi, KMP Affinity must be disabled
|
# When using -mpi=intelmpi, KMP Affinity must be disabled
|
||||||
export KMP_AFFINITY=disabled
|
export KMP_AFFINITY=disabled
|
||||||
|
|
||||||
# INTELMPI is not aware about distribution of tasks.
|
# INTELMPI is not aware about distribution of tasks.
|
||||||
# - We need to define tasks distribution.
|
# - We need to define tasks distribution.
|
||||||
HOSTLIST=$(srun hostname | sort | uniq -c | awk '{print $2 ":" $1}' | tr '\n' ':' | sed 's/:$/\n/g')
|
HOSTLIST=$(srun hostname | sort | uniq -c | awk '{print $2 ":" $1}' | tr '\n' ':' | sed 's/:$/\n/g')
|
||||||
mapdl -b -dis -mpi intelmpi -machines $HOSTLIST -np ${SLURM_NTASKS} -i "$SOLVER_FILE"
|
mapdl -b -dis -mpi intelmpi -machines $HOSTLIST -np ${SLURM_NTASKS} -i "$SOLVER_FILE"
|
||||||
else
|
else
|
||||||
# IBMMPI (default) will be aware of the distribution of tasks.
|
# IBMMPI (default) will be aware of the distribution of tasks.
|
||||||
# - In principle, no need to force tasks distribution
|
# - In principle, no need to force tasks distribution
|
||||||
mapdl -b -dis -mpi ibmmpi -np ${SLURM_NTASKS} -i "$SOLVER_FILE"
|
mapdl -b -dis -mpi ibmmpi -np ${SLURM_NTASKS} -i "$SOLVER_FILE"
|
||||||
fi
|
fi
|
||||||
|
|||||||
@@ -56,25 +56,25 @@ The different steps and settings required to make it work are that following:
|
|||||||
2. Right-click the **HPC Resources** icon followed by **Add HPC Resource...**
|
2. Right-click the **HPC Resources** icon followed by **Add HPC Resource...**
|
||||||

|

|
||||||
3. In the **HPC Resource** tab, fill in the corresponding fields as follows:
|
||||||

|

|
||||||
* **"Name"**: Add here the preferred name for the cluster. For example: `Merlin6 cluster - merlin-l-001`
|
||||||
* **"HPC Type"**: Select `SLURM`
|
* **"HPC Type"**: Select `SLURM`
|
||||||
* **"Submit host"**: Add one of the login nodes. For example, `merlin-l-001`.
|
||||||
* **"Slurm Job submission arguments (optional)"**: Add any required Slurm options for running your jobs.
|
* **"Slurm Job submission arguments (optional)"**: Add any required Slurm options for running your jobs.
|
||||||
* In general, `--hint=nomultithread` should be at least present.
|
* In general, `--hint=nomultithread` should be at least present.
|
||||||
* Check **"Use SSH protocol for inter and intra-node communication (Linux only)"**
|
* Check **"Use SSH protocol for inter and intra-node communication (Linux only)"**
|
||||||
* Select **"Able to directly submit and monitor HPC jobs"**.
|
* Select **"Able to directly submit and monitor HPC jobs"**.
|
||||||
* **"Apply"** changes.
|
* **"Apply"** changes.
|
||||||
4. In the **"File Management"** tab, fill in the corresponding fields as follows:
|
||||||

|

|
||||||
* Select **"RSM internal file transfer mechanism"** and add **`/shared-scratch`** as the **"Staging directory path on Cluster"**
|
* Select **"RSM internal file transfer mechanism"** and add **`/shared-scratch`** as the **"Staging directory path on Cluster"**
|
||||||
* Select **"Scratch directory local to the execution node(s)"** and add **`/scratch`** as the **HPC scratch directory**.
|
* Select **"Scratch directory local to the execution node(s)"** and add **`/scratch`** as the **HPC scratch directory**.
|
||||||
* **Never check** the option "Keep job files in the staging directory when job is complete" if the previous
|
* **Never check** the option "Keep job files in the staging directory when job is complete" if the previous
|
||||||
option "Scratch directory local to the execution node(s)" was set.
|
option "Scratch directory local to the execution node(s)" was set.
|
||||||
* **"Apply"** changes.
|
* **"Apply"** changes.
|
||||||
5. In the **"Queues"** tab, use the left button to auto-discover partitions
|
5. In the **"Queues"** tab, use the left button to auto-discover partitions
|
||||||

|

|
||||||
* If no authentication method was configured before, an authentication window will appear. Use your
|
* If no authentication method was configured before, an authentication window will appear. Use your
|
||||||
PSI account to authenticate. Notice that the **`PSICH\`** prefix **must not be added**.
|
PSI account to authenticate. Notice that the **`PSICH\`** prefix **must not be added**.
|
||||||

|

|
||||||
* From the partition list, select the ones you want to typically use.
|
* From the partition list, select the ones you want to typically use.
|
||||||
|
|||||||
@@ -40,17 +40,17 @@ option. This will show the location of the different ANSYS releases as follows:
|
|||||||
|
|
||||||
Module Rel.stage Group Dependencies/Modulefile
|
Module Rel.stage Group Dependencies/Modulefile
|
||||||
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||||
ANSYS/2019R3 stable Tools dependencies:
|
ANSYS/2019R3 stable Tools dependencies:
|
||||||
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2019R3
|
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2019R3
|
||||||
ANSYS/2020R1 stable Tools dependencies:
|
ANSYS/2020R1 stable Tools dependencies:
|
||||||
modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1
|
modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1
|
||||||
ANSYS/2020R1-1 stable Tools dependencies:
|
ANSYS/2020R1-1 stable Tools dependencies:
|
||||||
modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1-1
|
modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1-1
|
||||||
ANSYS/2020R2 stable Tools dependencies:
|
ANSYS/2020R2 stable Tools dependencies:
|
||||||
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2020R2
|
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2020R2
|
||||||
ANSYS/2021R1 stable Tools dependencies:
|
ANSYS/2021R1 stable Tools dependencies:
|
||||||
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R1
|
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R1
|
||||||
ANSYS/2021R2 stable Tools dependencies:
|
ANSYS/2021R2 stable Tools dependencies:
|
||||||
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R2
|
modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R2
|
||||||
```
|
```
|
||||||
|
|
||||||
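Assuming the usual Pmodules workflow on Merlin, one of the listed releases would then be loaded roughly as follows (the release shown is only an example):

```bash
# the ANSYS modulefiles belong to the 'Tools' group; enable it if it is not already visible
module use Tools
module load ANSYS/2021R2
```
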
@@ -62,6 +62,7 @@ option. This will show the location of the different ANSYS releases as follows:
|
|||||||
### ANSYS RSM
|
### ANSYS RSM
|
||||||
|
|
||||||
**ANSYS Remote Solve Manager (RSM)** is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop.
|
**ANSYS Remote Solve Manager (RSM)** is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop.
|
||||||
|
|
||||||
Therefore, PSI workstations with direct access to Merlin can submit jobs by using RSM.
|
Therefore, PSI workstations with direct access to Merlin can submit jobs by using RSM.
|
||||||
|
|
||||||
For further information, please visit the **[ANSYS RSM](ansys-rsm.md)** section.
|
For further information, please visit the **[ANSYS RSM](ansys-rsm.md)** section.
|
||||||
|
|||||||
@@ -78,7 +78,7 @@ To submit an interactive job, consider the following requirements:
|
|||||||
# Example 1: Define GTHTMP before the allocation
|
# Example 1: Define GTHTMP before the allocation
|
||||||
export GTHTMP=/scratch
|
export GTHTMP=/scratch
|
||||||
salloc ...
|
salloc ...
|
||||||
|
|
||||||
# Example 2: Define GTHTMP after the allocation
|
# Example 2: Define GTHTMP after the allocation
|
||||||
salloc ...
|
salloc ...
|
||||||
export GTHTMP=/scratch
|
export GTHTMP=/scratch
|
||||||
@@ -89,7 +89,7 @@ To submit an interactive job, consider the following requirements:
|
|||||||
allocation! For example:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Example 1:
|
# Example 1:
|
||||||
export GTHTMP=/scratch/$USER
|
export GTHTMP=/scratch/$USER
|
||||||
salloc ...
|
salloc ...
|
||||||
mkdir -p $GTHTMP
|
mkdir -p $GTHTMP
|
||||||
@@ -125,7 +125,7 @@ the [General requirements](#general-requirements) section.
|
|||||||
* Requesting a full node:
|
* Requesting a full node:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
salloc --partition=hourly -N 1 -n 1 -c 88 --hint=multithread --x11 --exclusive --mem=0
|
salloc --partition=hourly -N 1 -n 1 -c 88 --hint=multithread --x11 --exclusive --mem=0
|
||||||
```
|
```
|
||||||
|
|
||||||
* Requesting 22 CPUs from a node, with default memory per CPU (4000MB/CPU):
|
* Requesting 22 CPUs from a node, with default memory per CPU (4000MB/CPU):
|
||||||
@@ -177,16 +177,16 @@ requirements](#general-requirements) section, and:
|
|||||||
#SBATCH --exclusive
|
#SBATCH --exclusive
|
||||||
#SBATCH --mem=0
|
#SBATCH --mem=0
|
||||||
#SBATCH --clusters=merlin6
|
#SBATCH --clusters=merlin6
|
||||||
|
|
||||||
INPUT_FILE='MY_INPUT.SIN'
|
INPUT_FILE='MY_INPUT.SIN'
|
||||||
|
|
||||||
mkdir -p /scratch/$USER/$SLURM_JOB_ID
|
mkdir -p /scratch/$USER/$SLURM_JOB_ID
|
||||||
export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
|
export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
|
||||||
|
|
||||||
/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
|
/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
|
||||||
gth_exit_code=$?
|
gth_exit_code=$?
|
||||||
|
|
||||||
# Clean up data in /scratch
|
# Clean up data in /scratch
|
||||||
rm -rf /scratch/$USER/$SLURM_JOB_ID
|
rm -rf /scratch/$USER/$SLURM_JOB_ID
|
||||||
|
|
||||||
# Return exit code from GOTHIC
|
# Return exit code from GOTHIC
|
||||||
@@ -205,16 +205,16 @@ requirements](#general-requirements) section, and:
|
|||||||
#SBATCH --cpus-per-task=22
|
#SBATCH --cpus-per-task=22
|
||||||
#SBATCH --hint=multithread
|
#SBATCH --hint=multithread
|
||||||
#SBATCH --clusters=merlin6
|
#SBATCH --clusters=merlin6
|
||||||
|
|
||||||
INPUT_FILE='MY_INPUT.SIN'
|
INPUT_FILE='MY_INPUT.SIN'
|
||||||
|
|
||||||
mkdir -p /scratch/$USER/$SLURM_JOB_ID
|
mkdir -p /scratch/$USER/$SLURM_JOB_ID
|
||||||
export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
|
export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
|
||||||
|
|
||||||
/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
|
/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK
|
||||||
gth_exit_code=$?
|
gth_exit_code=$?
|
||||||
|
|
||||||
# Clean up data in /scratch
|
# Clean up data in /scratch
|
||||||
rm -rf /scratch/$USER/$SLURM_JOB_ID
|
rm -rf /scratch/$USER/$SLURM_JOB_ID
|
||||||
|
|
||||||
# Return exit code from GOTHIC
|
# Return exit code from GOTHIC
|
||||||
|
|||||||
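# A possible hardening of the two templates above (a sketch, not the official script):
# register a trap so the scratch area is removed even if GOTHIC aborts early.
export GTHTMP=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$GTHTMP"
trap 'rm -rf "$GTHTMP"' EXIT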
@@ -2,15 +2,15 @@
|
|||||||
|
|
||||||
## SSH Access
|
## SSH Access
|
||||||
|
|
||||||
For interactive command shell access, use an SSH client. We recommend activating SSH's X11 forwarding so that you can use graphical
applications (e.g. a text editor; for more performant graphical access, refer to the sections below). X applications are supported
on the login nodes and X11 forwarding can be used by users who have properly configured X11 support on their desktops, however:
|
||||||
|
|
||||||
* Merlin7 administrators **do not offer support** for user desktop configuration (Windows, MacOS, Linux).
|
* Merlin7 administrators **do not offer support** for user desktop configuration (Windows, MacOS, Linux).
|
||||||
* Hence, Merlin7 administrators **do not offer official support** for X11 client setup.
|
* Hence, Merlin7 administrators **do not offer official support** for X11 client setup.
|
||||||
* Nevertheless, a generic guide for X11 client setup (*Linux*, *Windows* and *MacOS*) is provided below.
|
* Nevertheless, a generic guide for X11 client setup (*Linux*, *Windows* and *MacOS*) is provided below.
|
||||||
* PSI desktop configuration issues must be addressed through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
|
* PSI desktop configuration issues must be addressed through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
|
||||||
* Ticket will be redirected to the corresponding Desktop support group (Windows, Linux).
|
* Ticket will be redirected to the corresponding Desktop support group (Windows, Linux).
|
||||||
|
|
||||||
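As an illustration only (client-side setup remains unsupported), a connection with X11 forwarding from a correctly configured client typically looks like this; the login host name below is a placeholder:

```bash
# -X enables X11 forwarding (-Y requests trusted forwarding)
ssh -X <username>@<merlin7-login-node>.psi.ch
# quick check that forwarding works; xclock is just an example X client
xclock
```
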
### Accessing from a Linux client
|
### Accessing from a Linux client
|
||||||
|
|
||||||
@@ -26,7 +26,8 @@ Refer to [{How To Use Merlin -> Accessing from MacOS Clients}](../02-How-To-Use-
|
|||||||
|
|
||||||
## NoMachine Remote Desktop Access
|
## NoMachine Remote Desktop Access
|
||||||
|
|
||||||
X applications are supported on the login nodes and can run efficiently through a **NoMachine** client. This is the officially supported way to run more demanding X applications on Merlin7.
|
||||||
|
|
||||||
* For PSI Windows workstations, this can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing, please request support through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
|
* For PSI Windows workstations, this can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing, please request support through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
|
||||||
* For other workstations, the client software can be downloaded from the [Nomachine Website](https://www.nomachine.com/product&p=NoMachine%20Enterprise%20Client).
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Accessing Slurm Cluster
|
||||||
title: Accessing Slurm Cluster
|
|
||||||
#tags:
|
|
||||||
keywords: slurm, batch system, merlin5, merlin7, gmerlin7, cpu, gpu
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/slurm-access.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## The Merlin Slurm clusters
|
## The Merlin Slurm clusters
|
||||||
|
|
||||||
@@ -28,7 +20,7 @@ In addition, any job *must be submitted from a high performance storage area vis
|
|||||||
|
|
||||||
### Merlin7 CPU cluster access
|
### Merlin7 CPU cluster access
|
||||||
|
|
||||||
The **Merlin7 CPU cluster** (**`merlin7`**) is the default cluster configured on the login nodes. Any job submission will use this cluster by default, unless
|
||||||
the option `--cluster` is specified with another of the existing clusters.
|
the option `--cluster` is specified with another of the existing clusters.
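
For illustration, given a job script `job.sh` (a placeholder), cluster selection with `--clusters` looks like this; `gmerlin7` is used here only as an example of another cluster name:

```bash
# default: the job goes to the merlin7 CPU cluster
sbatch job.sh

# explicitly target another cluster and query its queue
sbatch --clusters=gmerlin7 job.sh
squeue --clusters=gmerlin7 -u $USER
```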
|
||||||
|
|
||||||
For further information about how to use this cluster, please visit: [**Merlin7 CPU Slurm Cluster documentation**](../03-Slurm-General-Documentation/slurm-configuration.md#cpu-cluster-merlin7).
|
For further information about how to use this cluster, please visit: [**Merlin7 CPU Slurm Cluster documentation**](../03-Slurm-General-Documentation/slurm-configuration.md#cpu-cluster-merlin7).
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Code Of Conduct
|
||||||
title: Code Of Conduct
|
|
||||||
#tags:
|
|
||||||
keywords: code of conduct, rules, principle, policy, policies, administrator, backup
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/code-of-conduct.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## The Basic principle
|
## The Basic principle
|
||||||
|
|
||||||
@@ -18,10 +10,10 @@ The basic principle is courtesy and consideration for other users.
|
|||||||
## Interactive nodes
|
## Interactive nodes
|
||||||
|
|
||||||
* The interactive nodes (also known as login nodes) are for development and quick testing:
|
* The interactive nodes (also known as login nodes) are for development and quick testing:
|
||||||
* It is **strictly forbidden to run production jobs** on the login nodes. All production jobs must be submitted to the batch system.
|
* It is **strictly forbidden to run production jobs** on the login nodes. All production jobs must be submitted to the batch system.
|
||||||
* It is **forbidden to run long processes** occupying big parts of a login node's resources.
|
* It is **forbidden to run long processes** occupying big parts of a login node's resources.
|
||||||
* According to the previous rules, **misbehaving running processes will have to be killed**
in order to keep the system responsive for other users.
|
||||||
|
|
||||||
## Batch system
|
## Batch system
|
||||||
|
|
||||||
@@ -35,7 +27,7 @@ The basic principle is courtesy and consideration for other users.
|
|||||||
## User and project data
|
## User and project data
|
||||||
|
|
||||||
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on third-party independent systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
|
||||||
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster***: every few months, the storage space will be recycled for those old users who do not have an existing and valid PSI account.
|
||||||
|
|
||||||
!!! warning
|
!!! warning
|
||||||
When a user leaves PSI and their account has been removed, their storage space
|
||||||
@@ -46,8 +38,8 @@ The basic principle is courtesy and consideration for other users.
|
|||||||
## System Administrator Rights
|
## System Administrator Rights
|
||||||
|
|
||||||
* The system administrator has the right to temporarily block the access to Merlin7 for an account violating the Code of Conduct in order to maintain the efficiency and stability of the system.
|
* The system administrator has the right to temporarily block the access to Merlin7 for an account violating the Code of Conduct in order to maintain the efficiency and stability of the system.
|
||||||
* Repetitive violations by the same user will be escalated to the user's supervisor.
|
* Repetitive violations by the same user will be escalated to the user's supervisor.
|
||||||
* The system administrator has the right to delete files in the **scratch** directories
|
* The system administrator has the right to delete files in the **scratch** directories
|
||||||
* after a job, if the job failed to clean up its files.
|
* after a job, if the job failed to clean up its files.
|
||||||
* during the job in order to prevent a job from destabilizing a node or multiple nodes.
|
* during the job in order to prevent a job from destabilizing a node or multiple nodes.
|
||||||
* The system administrator has the right to kill any misbehaving running processes.
|
* The system administrator has the right to kill any misbehaving running processes.
|
||||||
|
|||||||
@@ -1,14 +1,4 @@
|
|||||||
---
|
# Introduction
|
||||||
title: Introduction
|
|
||||||
#tags:
|
|
||||||
keywords: introduction, home, welcome, architecture, design
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/introduction.html
|
|
||||||
redirect_from:
|
|
||||||
- /merlin7
|
|
||||||
- /merlin7/index.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## About Merlin7
|
## About Merlin7
|
||||||
|
|
||||||
@@ -56,9 +46,9 @@ The appliance is built of several storage servers:
|
|||||||
With effective storage capacity of:
|
With effective storage capacity of:
|
||||||
|
|
||||||
* 10 PB HDD
|
* 10 PB HDD
|
||||||
* value visible on linux: HDD 9302.4 TiB
|
* value visible on linux: HDD 9302.4 TiB
|
||||||
* 162 TB SSD
|
* 162 TB SSD
|
||||||
* value visible on linux: SSD 151.6 TiB
|
* value visible on linux: SSD 151.6 TiB
|
||||||
* 23.6 TiB on Metadata
|
* 23.6 TiB on Metadata
|
||||||
|
|
||||||
The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.
|
The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Requesting Merlin Accounts
|
||||||
title: Requesting Merlin Accounts
|
|
||||||
#tags:
|
|
||||||
keywords: registration, register, account, merlin5, merlin7, snow, service now
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/request-account.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Requesting Access to Merlin7
|
## Requesting Access to Merlin7
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Requesting a Merlin Project
|
||||||
title: Requesting a Merlin Project
|
|
||||||
#tags:
|
|
||||||
keywords: merlin project, project, snow, service now
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/request-project.html
|
|
||||||
---
|
|
||||||
|
|
||||||
A project owns its own storage area in Merlin, which can be accessed by other group members.
|
A project owns its own storage area in Merlin, which can be accessed by other group members.
|
||||||
|
|
||||||
@@ -21,21 +13,21 @@ This document explains how to request new Unix group, to request membership for
|
|||||||
|
|
||||||
## About Unix groups
|
## About Unix groups
|
||||||
|
|
||||||
Before requesting a Merlin project, it is important to have a Unix group that can be used to grant access to it to different members
|
Before requesting a Merlin project, it is important to have a Unix group that can be used to grant access to it to different members
|
||||||
of the project.
|
of the project.
|
||||||
|
|
||||||
Unix groups in the PSI Active Directory (which is the PSI central database containing user and group information, and more) are defined by the `unx-` prefix, followed by a name.
|
Unix groups in the PSI Active Directory (which is the PSI central database containing user and group information, and more) are defined by the `unx-` prefix, followed by a name.
|
||||||
In general, PSI employees working on Linux systems (including HPC clusters, like Merlin) can request a new Unix group, and can become responsible for managing it.
|
||||||
In addition, a list of administrators can be set. The administrators, together with the group manager, can approve or deny membership requests. Further information about this topic
|
In addition, a list of administrators can be set. The administrators, together with the group manager, can approve or deny membership requests. Further information about this topic
|
||||||
is covered in the [Linux Documentation - Services Admin Guides: Unix Groups / Group Management](https://linux.psi.ch/admin-guide/configuration/basic/users_and_groups.html), managed by the Central Linux Team.
|
is covered in the [Linux Documentation - Services Admin Guides: Unix Groups / Group Management](https://linux.psi.ch/admin-guide/configuration/basic/users_and_groups.html), managed by the Central Linux Team.
|
||||||
|
|
||||||
To grant access to specific Merlin project directories, some users may need to be added to specific **Unix groups**:
|
||||||
* Each Merlin project (i.e. `/data/project/{bio|general}/$projectname`) or experiment (i.e. `/data/experiment/$experimentname`) directory has access restricted by ownership and group membership (with a very few exceptions allowing public access).
|
* Each Merlin project (i.e. `/data/project/{bio|general}/$projectname`) or experiment (i.e. `/data/experiment/$experimentname`) directory has access restricted by ownership and group membership (with a very few exceptions allowing public access).
|
||||||
* Users requiring access to a specific restricted project or experiment directory have to request membership for the corresponding Unix group owning the directory.
|
* Users requiring access to a specific restricted project or experiment directory have to request membership for the corresponding Unix group owning the directory.
|
||||||
|
|
||||||
### Requesting a new Unix group
|
### Requesting a new Unix group
|
||||||
|
|
||||||
**If you need a new Unix group** to be created, you need to first get this group through a separate
|
**If you need a new Unix group** to be created, you need to first get this group through a separate
|
||||||
**[PSI Service Now ticket](https://psi.service-now.com/psisp)**. **Please use the following template.**
|
**[PSI Service Now ticket](https://psi.service-now.com/psisp)**. **Please use the following template.**
|
||||||
You can also specify the login names of the initial group members and the **owner** of the group.
|
You can also specify the login names of the initial group members and the **owner** of the group.
|
||||||
The owner of the group is the person who will be allowed to modify the group.
|
The owner of the group is the person who will be allowed to modify the group.
|
||||||
@@ -48,9 +40,9 @@ The owner of the group is the person who will be allowed to modify the group.
|
|||||||
* and base the text field of the request on this template
|
* and base the text field of the request on this template
|
||||||
```
|
```
|
||||||
Dear HelpDesk
|
Dear HelpDesk
|
||||||
|
|
||||||
I would like to request a new unix group.
|
I would like to request a new unix group.
|
||||||
|
|
||||||
Unix Group Name: unx-xxxxx
|
Unix Group Name: unx-xxxxx
|
||||||
Initial Group Members: xxxxx, yyyyy, zzzzz, ...
|
Initial Group Members: xxxxx, yyyyy, zzzzz, ...
|
||||||
Group Owner: xxxxx
|
Group Owner: xxxxx
|
||||||
@@ -62,6 +54,7 @@ The owner of the group is the person who will be allowed to modify the group.
|
|||||||
### Requesting Unix group membership
|
### Requesting Unix group membership
|
||||||
|
|
||||||
Existing Merlin projects already have a Unix group assigned. To have access to a project, users must belong to the proper **Unix group** owning that project.
|
||||||
|
|
||||||
Supervisors should inform new users which extra groups are needed for their project(s). If this information is not known, one can check the permissions for that directory. For example:
|
||||||
```bash
|
```bash
|
||||||
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/general/$projectname
|
(base) ❄ [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/general/$projectname
|
||||||
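To check which Unix groups an account already belongs to (and hence whether a membership request is needed at all), the standard `id` command can be used:

```bash
id -nG              # groups of the current user
id -nG <username>   # groups of another user (placeholder)
```
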
@@ -95,16 +88,16 @@ To request a project, please provide the following information in a **[PSI Servi
|
|||||||
* and base the text field of the request on this template
|
* and base the text field of the request on this template
|
||||||
```
|
```
|
||||||
Dear HelpDesk
|
Dear HelpDesk
|
||||||
|
|
||||||
I would like to request a new Merlin7 project.
|
I would like to request a new Merlin7 project.
|
||||||
|
|
||||||
Project Name: xxxxx
|
Project Name: xxxxx
|
||||||
UnixGroup: xxxxx # Must be an existing Unix Group
|
UnixGroup: xxxxx # Must be an existing Unix Group
|
||||||
|
|
||||||
The project responsible is the Owner of the Unix Group.
|
The project responsible is the Owner of the Unix Group.
|
||||||
If you need a storage quota exceeding the defaults, please provide a description
|
If you need a storage quota exceeding the defaults, please provide a description
|
||||||
and motivation for the higher storage needs:
|
and motivation for the higher storage needs:
|
||||||
|
|
||||||
Storage Quota: 1TB with a maximum of 1M Files
|
Storage Quota: 1TB with a maximum of 1M Files
|
||||||
Reason: (None for default 1TB/1M)
|
Reason: (None for default 1TB/1M)
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Archive & PSI Data Catalog
|
||||||
title: Archive & PSI Data Catalog
|
|
||||||
#tags:
|
|
||||||
keywords: linux, archive, data catalog, archiving, lts, tape, long term storage, ingestion, datacatalog
|
|
||||||
last_updated: 31 January 2020
|
|
||||||
summary: "This document describes how to use the PSI Data Catalog for archiving Merlin7 data."
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/archive.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## PSI Data Catalog as a PSI Central Service
|
## PSI Data Catalog as a PSI Central Service
|
||||||
|
|
||||||
@@ -19,14 +11,14 @@ The Data Catalog and Archive is suitable for:
|
|||||||
* Derived data produced by processing some inputs
|
* Derived data produced by processing some inputs
|
||||||
* Data required to reproduce PSI research and publications
|
* Data required to reproduce PSI research and publications
|
||||||
|
|
||||||
The Data Catalog is part of PSI's effort to conform to the FAIR principles for data management.
|
The Data Catalog is part of PSI's effort to conform to the FAIR principles for data management.
|
||||||
In accordance with this policy, ***data will be publicly released under CC-BY-SA 4.0 after an
|
In accordance with this policy, ***data will be publicly released under CC-BY-SA 4.0 after an
|
||||||
embargo period expires.***
|
embargo period expires.***
|
||||||
|
|
||||||
The Merlin cluster is connected to the Data Catalog. Hence, users archive data stored in the
|
The Merlin cluster is connected to the Data Catalog. Hence, users archive data stored in the
|
||||||
Merlin storage under the ``/data`` directories (currently, ``/data/user`` and ``/data/project``).
|
||||||
Archiving from other directories is also possible, however the process is much slower as data
|
Archiving from other directories is also possible, however the process is much slower as data
|
||||||
can not be directly retrieved by the PSI archive central servers (**central mode**), and needs to
|
can not be directly retrieved by the PSI archive central servers (**central mode**), and needs to
|
||||||
be indirectly copied to these (**decentral mode**).
|
be indirectly copied to these (**decentral mode**).
|
||||||
|
|
||||||
Archiving can be done from any node accessible by the users (usually from the login nodes).
|
Archiving can be done from any node accessible by the users (usually from the login nodes).
|
||||||
@@ -48,33 +40,33 @@ Archiving can be done from any node accessible by the users (usually from the lo
|
|||||||
Below are the main steps for using the Data Catalog.
|
Below are the main steps for using the Data Catalog.
|
||||||
|
|
||||||
* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
|
* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
|
||||||
* Prepare a metadata file describing the dataset
|
* Prepare a metadata file describing the dataset
|
||||||
* Run **``datasetIngestor``** script
|
* Run **``datasetIngestor``** script
|
||||||
* If necessary, the script will copy the data to the PSI archive servers
|
* If necessary, the script will copy the data to the PSI archive servers
|
||||||
* Usually this is necessary when archiving from directories other than **``/data/user``** or
|
* Usually this is necessary when archiving from directories other than **``/data/user``** or
|
||||||
**``/data/project``**. It would also be necessary when the Merlin export server (**``merlin-archive.psi.ch``**)
|
||||||
is down for any reason.
|
is down for any reason.
|
||||||
* Archive the dataset:
|
* Archive the dataset:
|
||||||
* Visit [https://discovery.psi.ch](https://discovery.psi.ch)
* Visit <https://discovery.psi.ch>
|
||||||
* Click **``Archive``** for the dataset
|
* Click **``Archive``** for the dataset
|
||||||
* The system will now copy the data to the PetaByte Archive at CSCS
|
* The system will now copy the data to the PetaByte Archive at CSCS
|
||||||
* Retrieve data from the catalog:
|
* Retrieve data from the catalog:
|
||||||
* Find the dataset on [https://discovery.psi.ch](https://discovery.psi.ch) and click **``Retrieve``**
* Find the dataset on <https://discovery.psi.ch> and click **``Retrieve``**
|
||||||
* Wait for the data to be copied to the PSI retrieval system
|
* Wait for the data to be copied to the PSI retrieval system
|
||||||
* Run **``datasetRetriever``** script
|
* Run **``datasetRetriever``** script
|
||||||
|
|
||||||
Since large data sets may take a lot of time to transfer, some steps are designed to happen in the
|
Since large data sets may take a lot of time to transfer, some steps are designed to happen in the
|
||||||
background. The discovery website can be used to track the progress of each step.
|
background. The discovery website can be used to track the progress of each step.
|
||||||
|
|
||||||
### Account Registration
|
### Account Registration
|
||||||
|
|
||||||
Two types of account permit access to the Data Catalog. If your data was collected at a ***beamline***, you may
|
Two types of account permit access to the Data Catalog. If your data was collected at a ***beamline***, you may
|
||||||
have been assigned a **``p-group``** (e.g. ``p12345``) for the experiment. Other users are assigned **``a-group``**
|
have been assigned a **``p-group``** (e.g. ``p12345``) for the experiment. Other users are assigned **``a-group``**
|
||||||
(e.g. ``a-12345``).
|
(e.g. ``a-12345``).
|
||||||
|
|
||||||
Groups are usually assigned to a PI, and then individual user accounts are added to the group. This must be done
|
Groups are usually assigned to a PI, and then individual user accounts are added to the group. This must be done
|
||||||
under user request through PSI Service Now. For existing **a-groups** and **p-groups**, you can follow the standard
|
under user request through PSI Service Now. For existing **a-groups** and **p-groups**, you can follow the standard
|
||||||
central procedures. Alternatively, if you do not know how to do that, follow the Merlin7
|
central procedures. Alternatively, if you do not know how to do that, follow the Merlin7
|
||||||
**[Requesting extra Unix groups](../01-Quick-Start-Guide/requesting-accounts.md)** procedure, or open
|
**[Requesting extra Unix groups](../01-Quick-Start-Guide/requesting-accounts.md)** procedure, or open
|
||||||
a **[PSI Service Now](https://psi.service-now.com/psisp)** ticket.
|
a **[PSI Service Now](https://psi.service-now.com/psisp)** ticket.
|
||||||
|
|
||||||
@@ -114,11 +106,11 @@ $ SCICAT_TOKEN=RqYMZcqpqMJqluplbNYXLeSyJISLXfnkwlfBKuvTSdnlpKkU
|
|||||||
|
|
||||||
Tokens expire after 2 weeks and will need to be fetched from the website again.
|
Tokens expire after 2 weeks and will need to be fetched from the website again.
|
||||||
|
|
||||||
### Ingestion
|
### Ingestion
|
||||||
|
|
||||||
The first step to ingesting your data into the catalog is to prepare a file describing what data you have. This is called
|
The first step to ingesting your data into the catalog is to prepare a file describing what data you have. This is called
|
||||||
**``metadata.json``**, and can be created with a text editor (e.g. *``vim``*). It can in principle be saved anywhere,
|
**``metadata.json``**, and can be created with a text editor (e.g. *``vim``*). It can in principle be saved anywhere,
|
||||||
but keeping it with your archived data is recommended. For more information about the format, see the 'Bio metadata'
|
but keeping it with your archived data is recommended. For more information about the format, see the 'Bio metadata'
|
||||||
section below. An example follows:
|
section below. An example follows:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
@@ -176,30 +168,31 @@ It will ask for your PSI credentials and then print some info about the data to
|
|||||||
datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json
|
datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json
|
||||||
```
|
```
|
||||||
|
|
||||||
You will be asked whether you want to copy the data to the central system:
|
You will be asked whether you want to copy the data to the central system:
|
||||||
|
|
||||||
|
* If you are on the Merlin cluster and you are archiving data from ``/data/user`` or ``/data/project``, answer 'no' since the data catalog can
directly read the data.
|
||||||
* If you are on a directory other than ``/data/user`` and ``/data/project``, or you are on a desktop computer, answer 'yes'. Copying large datasets
|
||||||
to the PSI archive system may take quite a while (minutes to hours).
|
to the PSI archive system may take quite a while (minutes to hours).
|
||||||
|
|
||||||
If there are no errors, your data has been accepted into the data catalog! From now on, no changes should be made to the ingested data.
|
If there are no errors, your data has been accepted into the data catalog! From now on, no changes should be made to the ingested data.
|
||||||
This is important, since the next step is for the system to copy all the data to the CSCS Petabyte archive. Writing to tape is slow, so
|
This is important, since the next step is for the system to copy all the data to the CSCS Petabyte archive. Writing to tape is slow, so
|
||||||
this process may take several days, and it will fail if any modifications are detected.
|
this process may take several days, and it will fail if any modifications are detected.
|
||||||
|
|
||||||
If using the ``--autoarchive`` option as suggested above, your dataset should now be in the queue. Check the data catalog:
|
If using the ``--autoarchive`` option as suggested above, your dataset should now be in the queue. Check the data catalog:
|
||||||
[https://discovery.psi.ch](https://discovery.psi.ch). Your job should have status 'WorkInProgress'. You will receive an email when the ingestion
<https://discovery.psi.ch>. Your job should have status 'WorkInProgress'. You will receive an email when the ingestion
|
||||||
is complete.
|
is complete.
|
||||||
|
|
||||||
If you didn't use ``--autoarchive``, you need to manually move the dataset into the archive queue. From **discovery.psi.ch**, navigate to the 'Archive'
|
If you didn't use ``--autoarchive``, you need to manually move the dataset into the archive queue. From **discovery.psi.ch**, navigate to the 'Archive'
|
||||||
tab. You should see the newly ingested dataset. Check the dataset and click **``Archive``**. You should see the status change from **``datasetCreated``** to
|
tab. You should see the newly ingested dataset. Check the dataset and click **``Archive``**. You should see the status change from **``datasetCreated``** to
|
||||||
**``scheduleArchiveJob``**. This indicates that the data is in the process of being transferred to CSCS.
|
**``scheduleArchiveJob``**. This indicates that the data is in the process of being transferred to CSCS.
|
||||||
|
|
||||||
After a few days the dataset's status will change to **``datasetOnAchive``** indicating the data is stored. At this point it is safe to delete the data.
|
After a few days the dataset's status will change to **``datasetOnAchive``** indicating the data is stored. At this point it is safe to delete the data.
|
||||||
|
|
||||||
#### Useful commands
|
#### Useful commands
|
||||||
|
|
||||||
Running the datasetIngestor in dry mode (**without** ``--ingest``) finds most errors. However, it is sometimes convenient to find potential errors
|
Running the datasetIngestor in dry mode (**without** ``--ingest``) finds most errors. However, it is sometimes convenient to find potential errors
|
||||||
yourself with simple unix commands.
|
yourself with simple unix commands.
|
||||||
|
|
||||||
Find problematic filenames
|
Find problematic filenames
|
||||||
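As an illustration, simple checks along these lines already catch the most common offenders, i.e. names containing spaces or non-ASCII characters:

```bash
# file or directory names containing spaces
find . -name '* *'
# names containing non-printable or non-ASCII characters
LC_ALL=C find . -name '*[! -~]*'
```
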
@@ -239,8 +232,8 @@ find . -name '*#autosave#' -delete
|
|||||||
Certificate invalid: name is not a listed principal
|
Certificate invalid: name is not a listed principal
|
||||||
```
|
```
|
||||||
It indicates that no Kerberos token was provided for authentication. You can avoid the warning by first running `kinit` (PSI Linux systems).
|
||||||
|
|
||||||
* For decentral ingestion cases, the copy step is indicated by a message ``Running [/usr/bin/rsync -e ssh -avxz ...``. It is expected that this
|
* For decentral ingestion cases, the copy step is indicated by a message ``Running [/usr/bin/rsync -e ssh -avxz ...``. It is expected that this
|
||||||
step will take a long time and may appear to have hung. You can check which files have been successfully transferred using rsync:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
@@ -250,7 +243,7 @@ step will take a long time and may appear to have hung. You can check what files
|
|||||||
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.
|
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.
|
||||||
|
|
||||||
* There is currently a limit on the number of files per dataset (technically, the limit is from the total length of all file paths). It is recommended to break up datasets into 300'000 files or less.
|
* There is currently a limit on the number of files per dataset (technically, the limit is from the total length of all file paths). It is recommended to break up datasets into 300'000 files or less.
|
||||||
* If it is not possible or desirable to split data between multiple datasets, an alternate work-around is to package files into a tarball. For datasets which are already compressed, omit the -z option for a considerable speedup:
|
* If it is not possible or desirable to split data between multiple datasets, an alternate work-around is to package files into a tarball. For datasets which are already compressed, omit the -z option for a considerable speedup:
|
||||||
|
|
||||||
```
|
```
|
||||||
tar -cf [output].tar [srcdir]
|
||||||
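# conversely, for data that is not already compressed, keep -z to produce a
# compressed tarball (same placeholder names as above):
tar -czf [output].tar.gz [srcdir]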
@@ -271,7 +264,6 @@ step will take a long time and may appear to have hung. You can check what files
|
|||||||
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
|
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
|
||||||
2019/11/06 11:04:43 Latest version: 1.1.11
|
2019/11/06 11:04:43 Latest version: 1.1.11
|
||||||
|
|
||||||
|
|
||||||
2019/11/06 11:04:43 Your version of this program is up-to-date
|
2019/11/06 11:04:43 Your version of this program is up-to-date
|
||||||
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
|
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
|
||||||
2019/11/06 11:04:43 Your username:
|
2019/11/06 11:04:43 Your username:
|
||||||
@@ -321,7 +313,6 @@ user_n@pb-archive.psi.ch's password:
|
|||||||
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
|
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
|
||||||
The data must first be copied to a rsync cache server.
|
The data must first be copied to a rsync cache server.
|
||||||
|
|
||||||
|
|
||||||
2019/11/06 11:05:04 Do you want to continue (Y/n)?
|
2019/11/06 11:05:04 Do you want to continue (Y/n)?
|
||||||
Y
|
Y
|
||||||
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
|
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
|
||||||
@@ -359,7 +350,7 @@ user_n@pb-archive.psi.ch's password:
|
|||||||
|
|
||||||
### Publishing
|
### Publishing
|
||||||
|
|
||||||
After datasets are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets available on http://doi.psi.ch.
After datasets are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets available on <http://doi.psi.ch>.
|
||||||
|
|
||||||
For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8).
|
For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8).
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Connecting from a Linux Client
|
||||||
title: Connecting from a Linux Client
|
|
||||||
#tags:
|
|
||||||
keywords: linux, connecting, client, configuration, SSH, X11
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
summary: "This document describes a recommended setup for a Linux client."
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/connect-from-linux.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## SSH without X11 Forwarding
|
## SSH without X11 Forwarding
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Connecting from a MacOS Client
|
||||||
title: Connecting from a MacOS Client
|
|
||||||
#tags:
|
|
||||||
keywords: MacOS, mac os, mac, connecting, client, configuration, SSH, X11
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
summary: "This document describes a recommended setup for a MacOS client."
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/connect-from-macos.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## SSH without X11 Forwarding
|
## SSH without X11 Forwarding
|
||||||
|
|
||||||
@@ -37,7 +29,7 @@ we provide a small recipe for enabling X11 Forwarding in MacOS.
|
|||||||
|
|
||||||
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
|
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
|
||||||
to implicitly add ``-X`` to all ssh connections:
|
to implicitly add ``-X`` to all ssh connections:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ForwardAgent yes
|
ForwardAgent yes
|
||||||
ForwardX11Trusted yes
|
ForwardX11Trusted yes
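# A fuller ~/.ssh/config host stanza might look like this sketch;
# the host alias, hostname and username below are placeholders:
Host merlin7
    HostName <merlin7-login-node>.psi.ch
    User <your_psi_username>
    ForwardAgent yes
    ForwardX11 yes
    ForwardX11Trusted yes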
|
||||||
|
|||||||
@@ -4,8 +4,9 @@
|
|||||||
|
|
||||||
PuTTY is one of the most common tools for SSH.
|
PuTTY is one of the most common tools for SSH.
|
||||||
|
|
||||||
Check if the following software packages are installed on the Windows workstation by
|
||||||
inspecting the *Start* menu (hint: use the *Search* box to save time):
|
inspecting the *Start* menu (hint: use the *Search* box to save time):
|
||||||
|
|
||||||
* PuTTY (should be already installed)
|
* PuTTY (should be already installed)
|
||||||
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
|
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
|
||||||
|
|
||||||
@@ -21,7 +22,6 @@ If they are missing, you can install them using the Software Kiosk icon on the D
|
|||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
|
||||||
## SSH with PuTTY with X11 Forwarding
|
## SSH with PuTTY with X11 Forwarding
|
||||||
|
|
||||||
Official X11 Forwarding support is through NoMachine. Please follow the document
|
Official X11 Forwarding support is through NoMachine. Please follow the document
|
||||||
@@ -29,9 +29,9 @@ Official X11 Forwarding support is through NoMachine. Please follow the document
|
|||||||
[{Accessing Merlin -> NoMachine}](../02-How-To-Use-Merlin/nomachine.md) for more details. However,
|
[{Accessing Merlin -> NoMachine}](../02-How-To-Use-Merlin/nomachine.md) for more details. However,
|
||||||
we provide a small recipe for enabling X11 Forwarding in Windows.
|
we provide a small recipe for enabling X11 Forwarding in Windows.
|
||||||
|
|
||||||
Check if **Xming** is installed on the Windows workstation by inspecting the
|
||||||
*Start* menu (hint: use the *Search* box to save time). If missing, you can install it by
|
*Start* menu (hint: use the *Search* box to save time). If missing, you can install it by
|
||||||
using the Software Kiosk icon (should be located on the Desktop).
|
using the Software Kiosk icon (should be located on the Desktop).
|
||||||
|
|
||||||
1. Ensure that an X server (**Xming**) is running. Otherwise, start it.
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Kerberos and AFS authentication
|
||||||
title: Kerberos and AFS authentication
|
|
||||||
#tags:
|
|
||||||
keywords: kerberos, AFS, kinit, klist, keytab, tickets, connecting, client, configuration, slurm
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
summary: "This document describes how to use Kerberos."
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/kerberos.html
|
|
||||||
---
|
|
||||||
|
|
||||||
Projects and users have their own areas in the central PSI AFS service. In order
to access these areas, valid Kerberos and AFS tickets must be granted.
|
||||||
@@ -58,7 +50,7 @@ Kerberos ticket is mandatory.
|
|||||||
krenew
|
krenew
|
||||||
```
|
```
|
||||||
|
|
||||||
* Keep in mind that the maximum lifetime for granting tickets is 7 days, therefore `krenew` can not be used beyond that limit,
|
* Keep in mind that the maximum lifetime for granting tickets is 7 days, therefore `krenew` can not be used beyond that limit,
|
||||||
and then `kinit` should be used instead.
|
and then `kinit` should be used instead.
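
A typical sequence, shown here only as a hedged example (the Kerberos realm is assumed), is:

```bash
kinit $USER@D.PSI.CH   # obtain a new Kerberos ticket
aklog                  # derive an AFS token from the Kerberos ticket
klist                  # verify the ticket and its lifetime
```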
|
||||||
|
|
||||||
## Obtaining granting tickets with a keytab
|
||||||
@@ -95,8 +87,8 @@ For generating a **keytab**, one has to:
|
|||||||
```
|
```
|
||||||
|
|
||||||
Please note:
|
Please note:
|
||||||
* That you will need to add your password once. This step is required for generating the **keytab** file.
|
* That you will need to add your password once. This step is required for generating the **keytab** file.
|
||||||
* `ktutil` does **not** report an error if you enter a wrong password! You can test with the `kinit` command documented below. If `kinit` fails with an error message like "pre-authentication failed", this is usually due to a wrong password/key in the keytab file. In this case **you have to remove the keytab file** and re-run the `ktutil` command. See "Updating the keytab file" in the section below.
|
||||||
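As a hedged illustration of that verification step (the keytab path is a placeholder; use the location you chose when creating the file):

```bash
# This must succeed without prompting for a password if the keytab is correct
kinit -kt /path/to/your.keytab $USER
klist
```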

### Updating an existing keytab file

@@ -177,7 +169,7 @@ This is the **recommended** way. At the end of the job, is strongly recommended
#SBATCH --output=run.out       # Generate custom output file
#SBATCH --error=run.err        # Generate custom error file
#SBATCH --nodes=1              # Uncomment and specify #nodes to use
#SBATCH --ntasks=1             # Uncomment and specify #tasks to use
#SBATCH --cpus-per-task=1
#SBATCH --constraint=xeon-gold-6152
#SBATCH --hint=nomultithread
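# --- Hedged sketch (not part of the original example): one way to obtain Kerberos/AFS
# --- credentials from a keytab inside the job and destroy them at the end.
# --- The keytab path is an assumption; adapt it to your own setup.
export KRB5CCNAME="$(mktemp /tmp/krb5cc_${USER}_XXXXXX)"   # job-private credential cache
kinit -kt /path/to/your.keytab $USER
aklog

# ... run your workload here ...

kdestroy   # remove the Kerberos ticket when the job finishes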
@@ -10,10 +10,8 @@ provides a helpful wrapper over the Gnome storage utilities (GIO and GVFS), and
- FTP, SFTP
- [complete list](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)

## Usage

### Start a session

First, start a new session. This will start a new bash shell in the current terminal where you can add further commands.
@@ -38,7 +36,7 @@ merlin_rmount --select-mount
Select the desired url using the arrow keys.

![NoMachine Player](../../images/rmount-daint-select.png)

From this list any of the standard supported endpoints can be mounted.

### Other endpoints
@@ -47,7 +45,6 @@ Other endpoints can be mounted using the `merlin_rmount --mount <endpoint>` comm

![NoMachine Player](../../images/rmount-daint-select.png)

### Accessing Files

After mounting a volume the script will print the mountpoint. It should be of the form
@@ -67,7 +64,6 @@ ln -s ~/mnt /run/user/$UID/gvfs

Files are accessible as long as the `merlin_rmount` shell remains open.
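Putting these steps together, a typical session might look like the following hedged sketch (the SMB URL is a placeholder; any of the supported endpoint types can be used):

```bash
merlin_rmount --mount smb://myfileserver.psi.ch/myshare   # mount a remote share
ls /run/user/$UID/gvfs/                                   # mounted volumes appear below this mountpoint
cp /run/user/$UID/gvfs/*/results.dat ~/                   # copy data while the session is open
exit                                                      # leaving the session unmounts all volumes
```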

### Disconnecting

To disconnect, close the session with one of the following:
@@ -78,7 +74,6 @@ To disconnect, close the session with one of the following:

Disconnecting will unmount all volumes.

## Alternatives

### Thunar
@@ -1,12 +1,4 @@
----
-title: Merlin7 Tools
-#tags:
-keywords: merlin_quotas
-#last_updated: 07 September 2022
-#summary: ""
-sidebar: merlin7_sidebar
-permalink: /merlin7/tools.html
----
+# Merlin7 Tools

## About

@@ -27,17 +19,17 @@ found on the [Storage page](storage.md#dir_classes).
Simply calling `merlin_quotas` will show you a table of our quotas:

```console
$ merlin_quotas
Path           SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
-------------- --------- ---------- ------- --------- ---------- -------
/data/user     30.26G    1T         03%     367296    2097152    18%
└─ <USERNAME>
/afs/psi.ch    3.4G      9.5G       36%     0         0          00%
└─ user/<USERDIR>
/data/project  2.457T    10T        25%     58        2097152    00%
└─ bio/shared
/data/project  338.3G    10T        03%     199391    2097152    10%
└─ bio/hpce
```

!!! tip
@@ -105,4 +97,3 @@ If you are added/removed from a project, you can update this config file by
calling `merlin_quotas genconf --force` (notice the `--force`, which will overwrite
your existing config file) or by editing the file by hand (*not recommended*).

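For convenience, the update workflow described above condensed into two commands:

```bash
merlin_quotas genconf --force   # regenerate the config after project membership changes
merlin_quotas                   # display the refreshed quota table
```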
@@ -1,10 +1,4 @@
----
-title: Remote Desktop Access to Merlin7
-keywords: NX, NoMachine, remote desktop access, login node, login001, login002, merlin7-nx-01, merlin7-nx, nx.psi.ch, VPN, browser access
-last_updated: 07 August 2024
-sidebar: merlin7_sidebar
-permalink: /merlin7/nomachine.html
----
+# Remote Desktop Access to Merlin7

## Overview

@@ -21,7 +15,7 @@ If you are inside the PSI network, you can directly connect to the Merlin7 NoMac

#### Method 1: Using a Web Browser

-Open your web browser and navigate to [https://merlin7-nx.psi.ch:4443](https://merlin7-nx.psi.ch:4443).
+Open your web browser and navigate to <https://merlin7-nx.psi.ch:4443>.

#### Method 2: Using the NoMachine Client

@@ -42,7 +36,7 @@ Documentation about the `nx.psi.ch` service can be found [here](https://www.psi.

##### Using a Web Browser

-Open your web browser and navigate to [https://nx.psi.ch](https://nx.psi.ch).
+Open your web browser and navigate to <https://nx.psi.ch>.

##### Using the NoMachine Client

@@ -1,16 +1,8 @@
----
-title: Software repositories
-#tags:
-keywords: modules, software, stable, unstable, deprecated, spack, repository, repositories
-last_updated: 16 January 2024
-summary: "This page contains information about the different software repositories"
-sidebar: merlin7_sidebar
-permalink: /merlin7/software-repositories.html
----
+# Software repositories

## Module Systems in Merlin7

Merlin7 provides a modular environment to ensure flexibility, compatibility, and optimized performance.
The system supports three primary module types: PSI Environment Modules (PModules), Spack Modules, and Cray Environment Modules.

### PSI Environment Modules (PModules)
@@ -35,7 +27,7 @@ Merlin7 also provides Spack modules, offering a modern and flexible package mana
### Cray Environment Modules

Merlin7 also supports Cray Environment Modules, which include compilers, MPI implementations, and libraries optimized
for Cray systems. However, Cray modules are not recommended as the default choice due to potential backward compatibility
issues when the Cray Programming Environment (CPE) is upgraded to a newer version.
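Before the recommendations below, a hedged sketch of the day-to-day module workflow (the module names and versions are placeholders, not actual Merlin7 entries):

```bash
module avail                             # list the software provided by the active repositories
module load gcc/12.3.0 openmpi/4.1.6     # load a compiler and an MPI stack (placeholder versions)
module list                              # show what is currently loaded
```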

Recommendations:
@@ -1,13 +1,4 @@
----
-title: Configuring SSH Keys in Merlin
-
-#tags:
-keywords: linux, connecting, client, configuration, SSH, Keys, SSH-Keys, RSA, authorization, authentication
-last_updated: 15 Jul 2020
-summary: "This document describes how to deploy SSH Keys in Merlin."
-sidebar: merlin7_sidebar
-permalink: /merlin7/ssh-keys.html
----
+# Configuring SSH Keys in Merlin

Merlin users sometimes will need to access the different Merlin services without constantly being asked for a password.
One can achieve that with Kerberos authentication; however, in some cases some software requires SSH keys to be set up.
@@ -22,14 +13,14 @@ User can check whether a SSH key already exists. These would be placed in the **
is usually the default one, and files in there would be **`id_rsa`** (private key) and **`id_rsa.pub`** (public key).

```bash
ls ~/.ssh/id*
```

For creating **SSH RSA Keys**, one should:

1. Run `ssh-keygen`; a password will be requested twice. You **must remember** this password for the future.
   * Due to security reasons, ***always try protecting it with a password***. There is only one exception, when running ANSYS software, which in general should not use a password, to simplify the way of running the software in Slurm.
   * This will generate a private key **id_rsa**, and a public key **id_rsa.pub** in your **~/.ssh** directory.
2. Add your public key to the **`authorized_keys`** file, and ensure proper permissions for that file, as follows:

```bash
@@ -92,7 +83,7 @@ to the **ssh-agent**. This must be done once per SSH session, as follows:
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
```

* If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
  You will be requested for the **passphrase** of your key, and it can be done by running:

```bash
@@ -111,7 +102,7 @@ However, for NoMachine one always need to add the private key identity to the au
```bash
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
```
2. If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
   You will be requested for the **passphrase** of your key, and it can be done by running:

```bash
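# Hedged sketch (the exact command is documented in the full page; the key path under
# /data/user is an assumption -- adapt it to where your private key lives):
eval $(ssh-agent)                              # start an agent if none is running
ssh-add /data/user/$(whoami)/.ssh/id_rsa       # you will be asked for the passphrase
ssh-add -l                                     # verify that the identity is now loaded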
@@ -1,13 +1,4 @@
----
-title: Merlin7 Storage
-#tags:
-keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
-#last_updated: 07 September 2022
-#summary: ""
-sidebar: merlin7_sidebar
-redirect_from: /merlin7/data-directories.html
-permalink: /merlin7/storage.html
----
+# Merlin7 Storage

## Introduction

@@ -30,13 +21,13 @@ Some of the Merlin7 directories have quotas applied. A way for checking the quot
This command is useful to show all quotas for the different user storage directories and partitions (including AFS). To check your quotas, please run:

```console
$ merlin_quotas
Path           SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
-------------- --------- ---------- ------- --------- ---------- -------
/data/user     30.26G    1T         03%     367296    2097152    18%
└─ <USERNAME>
/afs/psi.ch    3.4G      9.5G       36%     0         0          0%
└─ user/<USERDIR>
/data/scratch  688.9M    2T         00%     368471    0          00%
└─ shared
/data/project  3.373T    11T        31%     425644    2097152    20%
@@ -117,7 +108,7 @@ Directory policies:
* No backup policy is applied for the user home directories: **users are responsible for backing up their data**.

Home directory quotas are defined on a per-Lustre-project basis. The quota can be checked using the `merlin_quotas` command described
[above](storage.md#how-to-check-quotas).

### Project data directory

@@ -151,7 +142,7 @@ Directory policies:

* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.
* It is **forbidden** to use the data directories as `/scratch` area during a job's runtime, i.e. for high throughput I/O for a job's temporary files.
    * Please use `/scratch`, `/data/scratch/shared` for this purpose.
* No backups: users are responsible for managing the backups of their data directories.

#### Dedicated project directories
@@ -190,6 +181,6 @@ Scratch directories policies:
* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
* Temporary files *must be deleted at the end of the job by the user*.
    * Remaining files will be deleted by the system if detected.
* Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the scratch areas get full, admins have the rights to clean up the oldest data.
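A hedged sketch of how a batch job can follow these scratch policies (the per-user/per-job directory layout under `/scratch` is an assumption):

```bash
#!/bin/bash
#SBATCH --job-name=scratch-example
# Write temporary files to local scratch during the job and remove them before it ends.
TMP_DIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$TMP_DIR"

# ... run the workload, pointing its temporary output at "$TMP_DIR" ...

rm -rf "$TMP_DIR"   # clean up temporary files at the end of the job, as required by the policy
```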
@@ -1,26 +1,19 @@
----
-title: Transferring Data
-#tags:
-keywords: transferring data, data transfer, rsync, winscp, copy data, copying, sftp, import, export, hop, vpn
-last_updated: 24 August 2023
-#summary: ""
-sidebar: merlin7_sidebar
-permalink: /merlin7/transfer-data.html
----
+# Transferring Data

## Overview

Most data transfer methods support both sending and receiving, so you may initiate the transfer from either **Merlin** or the other system — depending on **network visibility**.
-- **From PSI Network to Merlin:** Merlin login nodes are visible from the PSI network, so direct transfers using `rsync`, or **ftp** are generally preferable. Transfers **from Merlin7 to PSI may require special firewall rules**.
-- **From Merlin to the Internet:** Merlin login nodes can access the internet with a **limited set of protocols**:
-  - HTTP-based protocols on ports `80` or `445` (e.g., HTTPS, WebDAV).
-  - Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.
-- **From the Internet to PSI:** Systems outside PSI can access the [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) at `datatransfer.psi.ch` using SSH-based protocols or [Globus](https://www.globus.org/).
-
-> SSH-based protocols using port `22` **to most PSI servers** are generally **not permitted**.
-> * However, **transfers from any PSI host to Merlin7 using port 22 are allowed**.
->
-> Port `21` is also available for FTP transfers from PSI to Merlin7.
+* **From PSI Network to Merlin:** Merlin login nodes are visible from the PSI network, so direct transfers using `rsync`, or **ftp** are generally preferable. Transfers **from Merlin7 to PSI may require special firewall rules**.
+* **From Merlin to the Internet:** Merlin login nodes can access the internet with a **limited set of protocols**:
+    * HTTP-based protocols on ports `80` or `445` (e.g., HTTPS, WebDAV).
+    * Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.
+* **From the Internet to PSI:** Systems outside PSI can access the [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) at `datatransfer.psi.ch` using SSH-based protocols or [Globus](https://www.globus.org/).
+
+!!! note
+    SSH-based protocols using port `22` **to most PSI servers** are generally **not permitted**.
+    However, **transfers from any PSI host to Merlin7 using port 22 are allowed**.
+    Port `21` is also available for FTP transfers from PSI to Merlin7.

### Choosing the best transfer method

@@ -46,6 +39,7 @@ The following methods transfer data directly via the [login nodes](../01-Quick-S
### Rsync (Recommended for Linux/macOS)

Rsync is the **preferred** method for small datasets from Linux/macOS systems. It supports **resuming interrupted transfers** and **skips already transferred files**. Syntax:

```bash
rsync -avAHXS <src> <dst>
```
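Because interrupted transfers can be resumed, a long copy can simply be re-run with the same command. A hedged example (the destination path is illustrative; `--partial` and `--progress` are standard rsync options, not Merlin-specific):

```bash
rsync -avAHXS --partial --progress ~/localdata $USER@login001.merlin7.psi.ch:/data/user/$USER/
```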
@@ -65,12 +59,15 @@ rsync -avAHXS ~/localdata $USER@login001.merlin7.psi.ch:/data/project/general/my
### SCP

SCP works similarly to `rsync` but **does not support resuming** interrupted transfers. It may be used for quick one-off transfers. Example:

```bash
scp ~/localfile.txt $USER@login001.merlin7.psi.ch:/data/project/general/myproject/
```

### Secure FTP

A `vsftpd` service is available on the login nodes, providing high-speed transfers. Choose the server based on your **speed vs. encryption** needs:

* **`login001.merlin7.psi.ch`:** Encrypted control & data channels.
  **Use if your data is sensitive**. **Slower**, but secure.
* **`service03.merlin7.psi.ch`**: Encrypted control channel only.
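For command-line transfers over this FTP service, a client such as `lftp` can be used. A hedged sketch, assuming `lftp` is installed and the server accepts explicit FTPS on port 21:

```bash
lftp -u $USER ftp://ftp-encrypted.merlin7.psi.ch
# inside the lftp session, use e.g. 'put', 'get' or 'mirror' to transfer data
```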
@@ -80,14 +77,16 @@ A `vsftpd` service is available on the login nodes, providing high-speed transfe
The **control channel** is always **encrypted**, therefore, authentication is encrypted and secured.

## UI-based Clients for Data Transfer

### WinSCP (Windows)

Available in the **Software Kiosk** on PSI Windows machines.

* Using your PSI credentials, connect to
    * when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
    * when using port 21, connect to:
        * `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
        * `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
* Drag and drop files between your PC and Merlin.

* FTP (port 21)
@@ -95,30 +94,34 @@ Available in the **Software Kiosk** on PSI Windows machines.
### FileZilla (Linux/MacOS/Windows)

Download from [FileZilla Project](https://filezilla-project.org/), or install from your Linux software repositories if available.

* Using your PSI credentials, connect to
    * when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
    * when using port 21, connect to:
        * `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
        * `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
* Supports drag-and-drop file transfers.

## Sharing Files with SWITCHfilesender

**[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is a Swiss-hosted installation of the [FileSender](https://filesender.org/) project — a web-based application that allows authenticated users to securely and easily send **arbitrarily large files** to other users. Features:

* **Secure large file transfers:** Send files that exceed normal email attachment limits.
* **Time-limited availability:** Files are automatically deleted after the chosen expiration date or number of downloads.
* **Voucher system:** Authenticated users can send upload vouchers to external recipients without an account.
* **Designed for research & education:** Developed to meet the needs of universities and research institutions.

About the authentication:

* It uses **SimpleSAMLphp**, supporting multiple authentication mechanisms: SAML2, LDAP, RADIUS and more.
* It's fully integrated with PSI's **Authentication and Authorization Infrastructure (AAI)**.
* PSI employees can log in using their PSI account:

1. Open [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload).
2. Select **PSI** as the institution.
3. Authenticate with your PSI credentials.

The service is designed to **send large files for temporary availability**, not as a permanent publishing platform. Typical use case:

1. Upload a file.
2. Share the download link with a recipient.
3. File remains available until the specified **expiration date** is reached, or the **download limit** is reached.
@@ -130,10 +133,11 @@ The service is designed to **send large files for temporary availability**, not
## PSI Data Transfer

From August 2024, Merlin is connected to the **[PSI Data Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer)** service,
`datatransfer.psi.ch`. This is a central service managed by the **[Linux team](https://linux.psi.ch/index.html)**. However, any problems or questions related to it can be directly
[reported](../99-support/contact.md) to the Merlin administrators, who will forward the request if necessary.

The PSI Data Transfer servers support the following protocols:

* Data Transfer - SSH (scp / rsync)
* Data Transfer - Globus

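As a hedged example of the SSH-based option from an external machine (the remote path is an assumption; multi-factor authentication will be requested as described below):

```bash
rsync -avAHXS $USER@datatransfer.psi.ch:/data/user/$USER/results/ ./results/
```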
@@ -150,27 +154,25 @@ Therefore, having the Microsoft Authenticator App is required as explained [here
## Connecting to Merlin7 from outside PSI

Merlin7 is fully accessible from within the PSI network. To connect from outside you can use:

* [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
* [SSH hopx](https://www.psi.ch/en/computing/ssh-hop)
    * Please avoid transferring big amounts of data through **hop**
* [No Machine](nomachine.md)
    * Remote Interactive Access through [**'nx.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access)
    * Please avoid transferring big amounts of data through **NoMachine**

-{% comment %}
## Connecting from Merlin7 to outside file shares

### `merlin_rmount` command

Merlin provides a command for mounting remote file systems, called `merlin_rmount`. This
provides a helpful wrapper over the Gnome storage utilities, and provides support for a wide range of remote file formats, including

* SMB/CIFS (Windows shared folders)
* WebDav
* AFP
* FTP, SFTP
* [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)

[More instruction on using `merlin_rmount`](merlin-rmount.md)
-{% endcomment %}

@@ -24,17 +24,17 @@ Is run is used to run parallel jobs in the batch system. It can be used within a
(which can be run with ``sbatch``), or within a job allocation (which can be run with ``salloc``).
Also, it can be used as a direct command (for example, from the login nodes).

When used inside a batch script or during a job allocation, ``srun`` is constrained to the
amount of resources allocated by the ``sbatch``/``salloc`` commands. In ``sbatch``, usually
these resources are defined inside the batch script with the format ``#SBATCH <option>=<value>``.
In other words, if you define in your batch script or allocation 88 tasks (and 1 thread / core)
and 2 nodes, ``srun`` is constrained to this amount of resources (you can use less, but never
exceed those limits).

When used from the login node, it is usually used to run a specific command or software in an
interactive way. ``srun`` is a blocking process (it will block the bash prompt until the ``srun``
command finishes, unless you run it in the background with ``&``). This can be very useful to run
interactive software which pops up a window and then submits jobs or runs sub-tasks in the
background (for example, **Relion**, **cisTEM**, etc.)

Refer to ``man srun`` for exploring all possible options for that command.
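A hedged sketch of that constraint in practice (cluster and partition names are taken from this documentation; the resource numbers mirror the example above):

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --partition=general
#SBATCH --nodes=2
#SBATCH --ntasks=88

# 'srun' inherits the allocation above: it may use up to 88 tasks on 2 nodes, never more.
srun ./my_mpi_program          # uses the full allocation
srun -n 44 ./my_mpi_program    # using less is allowed; requesting more than 88 tasks is not
```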
@@ -65,7 +65,7 @@ prompt a new shell on the first allocated node). However, this behaviour can be
a shell (`$SHELL`) at the end of the `salloc` command. For example:

```bash
# Typical 'salloc' call
salloc --clusters=merlin7 --partition=interactive -N 2 -n 2

# Custom 'salloc' call
@@ -111,20 +111,21 @@ salloc: Relinquishing job allocation 165

#### Graphical access

[NoMachine](../02-How-To-Use-Merlin/nomachine.md) is the officially supported service for graphical
access in the Merlin cluster. This service is running on the login nodes. Check the
document [{Accessing Merlin -> NoMachine}](../02-How-To-Use-Merlin/nomachine.md) for details about
how to connect to the **NoMachine** service in the Merlin cluster.

For other graphical access that is not officially supported (X11 forwarding):

* For Linux clients, please follow [{How To Use Merlin -> Accessing from Linux Clients}](../02-How-To-Use-Merlin/connect-from-linux.md)

* For Windows clients, please follow [{How To Use Merlin -> Accessing from Windows Clients}](../02-How-To-Use-Merlin/connect-from-windows.md)
* For MacOS clients, please follow [{How To Use Merlin -> Accessing from MacOS Clients}](../02-How-To-Use-Merlin/connect-from-macos.md)

### 'srun' with x11 support

Merlin6 and merlin7 clusters allow running window-based applications. For that, you need to
add the option ``--x11`` to the ``srun`` command. For example:

```bash
@@ -146,7 +147,7 @@ srun --clusters=merlin7 --partition=interactive --x11 --pty bash
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 sview

caubet_m@login001:~>

caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 --pty bash

@@ -162,7 +163,7 @@ exit

### 'salloc' with x11 support

**Merlin6** and **merlin7** clusters allow running window-based applications. For that, you need to
add the option ``--x11`` to the ``salloc`` command. For example:

```bash
@@ -172,7 +173,7 @@ salloc --clusters=merlin7 --partition=interactive --x11 sview
will pop up an X11-based Slurm view of the cluster.

In the same manner, you can create a bash shell with x11 support. For doing that, you just need
to run ``salloc --clusters=merlin7 --partition=interactive --x11``. Once the resource is allocated, from
there you can interactively run X11 and non-X11 based commands.

```bash
@@ -187,10 +188,10 @@ salloc: Granted job allocation 174
salloc: Nodes cn001 are ready for job
salloc: Relinquishing job allocation 174

caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --x11
salloc: Granted job allocation 175
salloc: Nodes cn001 are ready for job
caubet_m@cn001:~>

caubet_m@cn001:~> sview

@@ -1,12 +1,4 @@
----
-title: Slurm cluster 'merlin7'
-#tags:
-keywords: configuration, partitions, node definition
-#last_updated: 24 Mai 2023
-summary: "This document describes a summary of the Merlin7 configuration."
-sidebar: merlin7_sidebar
-permalink: /merlin7/merlin7-configuration.html
----
+# Slurm cluster 'merlin7'

This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.

@@ -14,10 +6,10 @@ This documentation shows basic Slurm configuration and options needed to run job

### Hardware

* 2 CPU-only login nodes
* 77 CPU-only compute nodes
* 5 GPU A100 nodes
* 8 GPU Grace Hopper nodes

The specification of the node types is:

@@ -51,9 +43,9 @@ The appliance is built of several storage servers:
With effective storage capacity of:

* 10 PB HDD
    * value visible on linux: HDD 9302.4 TiB
* 162 TB SSD
    * value visible on linux: SSD 151.6 TiB
* 23.6 TiB on Metadata

The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.
@@ -1,12 +1,4 @@
----
-title: Slurm merlin7 Configuration
-#tags:
-keywords: configuration, partitions, node definition
-#last_updated: 24 Mai 2023
-summary: "This document describes a summary of the Merlin7 Slurm CPU-based configuration."
-sidebar: merlin7_sidebar
-permalink: /merlin7/slurm-configuration.html
----
+# Slurm merlin7 Configuration

This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.

@@ -14,7 +6,7 @@ This documentation shows basic Slurm configuration and options needed to run job

### CPU public partitions

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
| -----------------: | -----------: | ----------: | -------: | ---------------: | --------------------: | --------------------: |
| **<u>general</u>** | 1-00:00:00 | 7-00:00:00 | Low | <u>merlin</u> | cpu=1024,mem=1920G | cpu=1024,mem=1920G |
| **daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | cpu=1024,mem=1920G | cpu=2048,mem=3840G |
@@ -31,7 +23,7 @@ This documentation shows basic Slurm configuration and options needed to run job
| **a100-daily** | 0-01:00:00 | 1-00:00:00 | Medium | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-hourly** | 0-00:30:00 | 0-01:00:00 | High | <u>merlin</u> | gres/gpu=8 | gres/gpu=8 |
| **a100-interactive** | 0-01:00:00 | 0-12:00:00 | Very High | <u>merlin</u> | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 |

#### Grace-Hopper nodes

| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
@@ -53,8 +45,9 @@ However, when necessary, one can specify the cluster as follows:
### CPU general configuration

The **Merlin7 CPU cluster** is configured with the **`CR_CORE_MEMORY`** and **`CR_ONE_TASK_PER_CORE`** options.

* This configuration treats both cores and memory as consumable resources.
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
  to fulfill a job's resource requirements.

By default, Slurm will allocate one task per core, which means:
@@ -75,15 +68,15 @@ scripts accordingly.

Notes on memory configuration:
* **Memory allocation options:** To request additional memory, use the following options in your submission script:
    * **`--mem=<mem_in_MB>`**: Allocates memory per node.
    * **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).

  The total memory requested cannot exceed the **`MaxMemPerNode`** value.
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
  effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
  adjusted.

For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
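A hedged sketch of such a submission (the task count and memory values are illustrative only):

```bash
#!/bin/bash
#SBATCH --clusters=merlin7
#SBATCH --partition=general
#SBATCH --ntasks=64
#SBATCH --hint=nomultithread     # one task per physical core; halves the visible CPUs
#SBATCH --mem-per-cpu=8000       # doubled (e.g. from 4000 MB) to compensate, as explained above

srun ./my_mpi_program
```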

!!! tip
@@ -93,19 +86,19 @@ adjusted.
|
|||||||
|
|
||||||
In the `merlin7` CPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
|
In the `merlin7` CPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
|
||||||
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
|
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
|
||||||
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
|
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
|
||||||
limits may result in pending jobs even when many nodes are idle due to low activity.
|
limits may result in pending jobs even when many nodes are idle due to low activity.
|
||||||
|
|
||||||
On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
|
On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
|
||||||
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
|
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
|
||||||
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.
|
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.
|
||||||
|
|
||||||
Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
|
Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
|
||||||
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
|
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
|
||||||
effectively.
|
effectively.
|
||||||
|
|
||||||
To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
|
To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
|
||||||
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
|
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
|
||||||
various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
|
various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
|
||||||
* `MaxTRES` specifies resource limits per job.
|
* `MaxTRES` specifies resource limits per job.
|
||||||
* `MaxTRESPU` specifies resource limits per user.
|
* `MaxTRESPU` specifies resource limits per user.
|
||||||
@@ -119,7 +112,7 @@ various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
|
|||||||
| **cpu_interactive** | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 | partition |
|
| **cpu_interactive** | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 | partition |
|
||||||
|
|
||||||
Where:
|
Where:
|
||||||
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
|
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
|
||||||
restrictions.
|
restrictions.
|
||||||
* **`cpu_general` QoS:** This is the **default QoS** for `merlin7` _users_. It limits the total resources available to each
|
* **`cpu_general` QoS:** This is the **default QoS** for `merlin7` _users_. It limits the total resources available to each
|
||||||
user. Additionally, this QoS is applied to the `general` partition, enforcing restrictions at the partition level and
|
user. Additionally, this QoS is applied to the `general` partition, enforcing restrictions at the partition level and
|
||||||
@@ -172,17 +165,17 @@ Similarly, if no partition is specified, jobs are automatically submitted to the
|
|||||||
partitions provide higher priority and ensure quicker scheduling compared
|
partitions provide higher priority and ensure quicker scheduling compared
|
||||||
to **general**, which has limited node availability.
|
to **general**, which has limited node availability.
|
||||||
|
|
||||||
The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
|
The **`hourly`** partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed
|
||||||
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
|
by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the
|
||||||
**`hourly`** partition might experience delays in such scenarios.
|
**`hourly`** partition might experience delays in such scenarios.
|
||||||
|
|
||||||
The **`interactive`** partition is designed specifically for real-time, interactive work. Here are the key characteristics:
|
The **`interactive`** partition is designed specifically for real-time, interactive work. Here are the key characteristics:
|
||||||
|
|
||||||
* **CPU Oversubscription:** This partition allows CPU oversubscription (configured as `FORCE:4`), meaning that up to four interactive
|
* **CPU Oversubscription:** This partition allows CPU oversubscription (configured as `FORCE:4`), meaning that up to four interactive
|
||||||
jobs may share the same physical CPU core. This can impact performance, but enables fast access for short-term tasks.
|
jobs may share the same physical CPU core. This can impact performance, but enables fast access for short-term tasks.
|
||||||
* **Highest Scheduling Priority:** Jobs submitted to the interactive partition are always prioritized. They will be scheduled
|
* **Highest Scheduling Priority:** Jobs submitted to the interactive partition are always prioritized. They will be scheduled
|
||||||
before any jobs in other partitions.
|
before any jobs in other partitions.
|
||||||
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
|
* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
|
||||||
immediate access is important.
|
immediate access is important.
|
||||||
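A minimal sketch of requesting such an interactive allocation, assuming the partition is named `interactive` as above and using illustrative resource values:

```bash
# Request a small, short-lived interactive allocation for debugging or compiling
salloc --partition=interactive --ntasks=1 --cpus-per-task=4 --time=01:00:00
```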
|
|
||||||
!!! warning
|
!!! warning
|
||||||
@@ -223,12 +216,14 @@ For submittng jobs to the GPU cluster, **the cluster name `gmerlin7` must be spe
|
|||||||
### GPU general configuration
|
### GPU general configuration
|
||||||
|
|
||||||
The **Merlin7 GPU cluster** is configured with the **`CR_CORE_MEMORY`**, **`CR_ONE_TASK_PER_CORE`**, and **`ENFORCE_BINDING_GRES`** options.
|
The **Merlin7 GPU cluster** is configured with the **`CR_CORE_MEMORY`**, **`CR_ONE_TASK_PER_CORE`**, and **`ENFORCE_BINDING_GRES`** options.
|
||||||
|
|
||||||
* This configuration treats both cores and memory as consumable resources.
|
* This configuration treats both cores and memory as consumable resources.
|
||||||
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
|
* Since the nodes are running with **hyper-threading** enabled, each core thread is counted as a CPU
|
||||||
to fulfill a job's resource requirements.
|
to fulfill a job's resource requirements.
|
||||||
* Slurm will allocate the CPUs to the selected GPU.
|
* Slurm will allocate the CPUs to the selected GPU.
|
||||||
|
|
||||||
By default, Slurm will allocate one task per core, which means:
|
By default, Slurm will allocate one task per core, which means:
|
||||||
|
|
||||||
* For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job.
|
* For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 **CPUs**, regardless of whether both threads are actively used by the job.
|
||||||
* For the NVIDIA GraceHopper-based nodes, each task will consume 1 **CPU**.
|
* For the NVIDIA GraceHopper-based nodes, each task will consume 1 **CPU**.
|
||||||
|
|
||||||
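To make this accounting concrete, a minimal batch-script sketch for a single GPU on an A100 node. The partition name follows the `a100-general` naming used elsewhere in these pages, and the application binary is hypothetical:

```bash
#!/bin/bash
#SBATCH --clusters=gmerlin7
#SBATCH --partition=a100-general   # assumed partition name
#SBATCH --gpus=1
#SBATCH --ntasks=8                 # on the hyper-threaded A100 nodes this accounts for 16 CPUs (2 per task)
#SBATCH --time=02:00:00

srun ./my_gpu_app                  # hypothetical application binary
```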
@@ -247,15 +242,16 @@ scripts accordingly.
|
|||||||
|
|
||||||
Notes on memory configuration:
|
Notes on memory configuration:
|
||||||
* **Memory allocation options:** To request additional memory, use the following options in your submission script:
|
* **Memory allocation options:** To request additional memory, use the following options in your submission script:
|
||||||
* **`--mem=<mem_in_MB>`**: Allocates memory per node.
|
* **`--mem=<mem_in_MB>`**: Allocates memory per node.
|
||||||
* **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).
|
* **`--mem-per-cpu=<mem_in_MB>`**: Allocates memory per CPU (equivalent to a core thread).
|
||||||
|
|
||||||
The total memory requested cannot exceed the **`MaxMemPerNode`** value.
|
The total memory requested cannot exceed the **`MaxMemPerNode`** value.
|
||||||
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
|
|
||||||
|
* **Impact of disabling Hyper-Threading:** Using the **`--hint=nomultithread`** option disables one thread per core,
|
||||||
effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
|
effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly
|
||||||
adjusted.
|
adjusted.
|
||||||
|
|
||||||
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
|
For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended.
|
||||||
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
|
In such cases, you should double the **`--mem-per-cpu`** value to account for the reduced number of threads.
|
||||||
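A minimal sketch for such an MPI job, assuming it previously ran with hyper-threading and `--mem-per-cpu=4000`; the binary name is hypothetical:

```bash
#!/bin/bash
#SBATCH --ntasks=32
#SBATCH --hint=nomultithread   # one thread per core: half of the CPUs are no longer counted
#SBATCH --mem-per-cpu=8000     # doubled from 4000 MB so each task keeps the same total memory

srun ./my_mpi_app              # hypothetical MPI binary
```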
|
|
||||||
!!! tip
|
!!! tip
|
||||||
@@ -265,20 +261,22 @@ adjusted.
|
|||||||
|
|
||||||
In the `gmerlin7` GPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
|
In the `gmerlin7` GPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent
|
||||||
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
|
overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster
|
||||||
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
|
efficiency. However, applying limits can occasionally impact the cluster’s utilization. For example, user-specific
|
||||||
limits may result in pending jobs even when many nodes are idle due to low activity.
|
limits may result in pending jobs even when many nodes are idle due to low activity.
|
||||||
|
|
||||||
On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
|
On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing
|
||||||
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
|
all available resources, which could block other jobs from running. Without job size limits, for instance, a large job
|
||||||
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.
|
might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.
|
||||||
|
|
||||||
Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
|
Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These
|
||||||
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
|
limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist
|
||||||
effectively.
|
effectively.
|
||||||
|
|
||||||
To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
|
To implement these limits, **we utilize Quality of Service (QoS)**. Different QoS policies are defined and applied
|
||||||
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
|
**to specific partitions** in line with the established resource allocation policies. The table below outlines the
|
||||||
|
|
||||||
various QoS definitions applicable to the gmerlin7 GPU cluster. Here:
|
various QoS definitions applicable to the gmerlin7 GPU cluster. Here:
|
||||||
|
|
||||||
* `MaxTRES` specifies resource limits per job.
|
* `MaxTRES` specifies resource limits per job.
|
||||||
* `MaxTRESPU` specifies resource limits per user.
|
* `MaxTRESPU` specifies resource limits per user.
|
||||||
|
|
||||||
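To check which QoS, and therefore which limits, apply to your own account on the GPU cluster, a hedged sketch using the Slurm accounting tools:

```bash
# Show the associations, including the QoS, for the current user on gmerlin7
sacctmgr show assoc user=$USER cluster=gmerlin7 format=User,Account,Partition,QOS%40
```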
@@ -292,7 +290,7 @@ various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
|
|||||||
| **gpu_a100_interactive** | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 | partition |
|
| **gpu_a100_interactive** | cpu=16,gres/gpu=1,mem=60G,node=1 | cpu=16,gres/gpu=1,mem=60G,node=1 | partition |
|
||||||
|
|
||||||
Where:
|
Where:
|
||||||
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
|
* **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
|
||||||
restrictions.
|
restrictions.
|
||||||
* **`gpu_general` QoS:** This is the **default QoS** for `gmerlin7` _users_. It limits the total resources available to each
|
* **`gpu_general` QoS:** This is the **default QoS** for `gmerlin7` _users_. It limits the total resources available to each
|
||||||
user. Additionally, this QoS is applied to the `[a100|gh]-general` partitions, enforcing restrictions at the partition level and
|
user. Additionally, this QoS is applied to the `[a100|gh]-general` partitions, enforcing restrictions at the partition level and
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Slurm Examples
|
||||||
title: Slurm Examples
|
|
||||||
#tags:
|
|
||||||
keywords: slurm example, template, examples, templates, running jobs, sbatch, single core based jobs, HT, multithread, no-multithread, mpi, openmp, packed jobs, hands-on, array jobs, gpu
|
|
||||||
last_updated: 24 Mai 2023
|
|
||||||
summary: "This document shows different template examples for running jobs in the Merlin cluster."
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/slurm-examples.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Single core based job examples
|
## Single core based job examples
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Jupyterhub on Merlin7
|
||||||
title: Jupyterhub on Merlin7
|
|
||||||
#tags:
|
|
||||||
keywords: jupyterhub, jupyter, jupyterlab, notebook, notebooks
|
|
||||||
last_updated: 24 July 2025
|
|
||||||
summary: "Jupyterhub service description"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/jupyterhub.html
|
|
||||||
---
|
|
||||||
|
|
||||||
Jupyterhub provides [jupyter notebooks](https://jupyter.org/) that are launched on
|
Jupyterhub provides [jupyter notebooks](https://jupyter.org/) that are launched on
|
||||||
Merlin cluster nodes and can be accessed through a web portal.
|
Merlin cluster nodes and can be accessed through a web portal.
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# ANSYS RSM (Remote Resolve Manager)
|
||||||
title: ANSYS RSM (Remote Resolve Manager)
|
|
||||||
#tags:
|
|
||||||
keywords: software, ansys, rsm, slurm, interactive, rsm, windows
|
|
||||||
last_updated: 23 August 2024
|
|
||||||
summary: "This document describes how to use the ANSYS Remote Resolve Manager service in the Merlin7 cluster"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/ansys-rsm.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## ANSYS Remote Resolve Manager
|
## ANSYS Remote Resolve Manager
|
||||||
|
|
||||||
@@ -32,18 +24,19 @@ The different steps and settings required to make it work are that following:
|
|||||||
2. Right-click the **HPC Resources** icon followed by **Add HPC Resource...**
|
2. Right-click the **HPC Resources** icon followed by **Add HPC Resource...**
|
||||||

|

|
||||||
3. In the **HPC Resource** tab, fill up the corresponding fields as follows:
|
3. In the **HPC Resource** tab, fill up the corresponding fields as follows:
|
||||||

|

|
||||||
* **"Name"**: Add here the preffered name for the cluster. For example: `Merlin7 cluster`
|
* **"Name"**: Add here the preffered name for the cluster. For example: `Merlin7 cluster`
|
||||||
|
|
||||||
* **"HPC Type"**: Select `SLURM`
|
* **"HPC Type"**: Select `SLURM`
|
||||||
* **"Submit host"**: `service03.merlin7.psi.ch`
|
* **"Submit host"**: `service03.merlin7.psi.ch`
|
||||||
* **"Slurm Job submission arguments (optional)"**: Add any required Slurm options for running your jobs.
|
* **"Slurm Job submission arguments (optional)"**: Add any required Slurm options for running your jobs.
|
||||||
* `--hint=nomultithread` must be present.
|
* `--hint=nomultithread` must be present.
|
||||||
* `--exclusive` must also be present for now, due to a bug in the `Slingshot` interconnect which does not allow running shared nodes.
|
* `--exclusive` must also be present for now, due to a bug in the `Slingshot` interconnect which does not allow running shared nodes.
|
||||||
* Check **"Use SSH protocol for inter and intra-node communication (Linux only)"**
|
* Check **"Use SSH protocol for inter and intra-node communication (Linux only)"**
|
||||||
* Select **"Able to directly submit and monitor HPC jobs"**.
|
* Select **"Able to directly submit and monitor HPC jobs"**.
|
||||||
* **"Apply"** changes.
|
* **"Apply"** changes.
|
||||||
4. In the **"File Management"** tab, fill up the corresponding fields as follows:
|
4. In the **"File Management"** tab, fill up the corresponding fields as follows:
|
||||||

|

|
||||||
* Select **"RSM internal file transfer mechanism"** and add **`/data/scratch/shared`** as the **"Staging directory path on Cluster"**
|
* Select **"RSM internal file transfer mechanism"** and add **`/data/scratch/shared`** as the **"Staging directory path on Cluster"**
|
||||||
* Select **"Scratch directory local to the execution node(s)"** and add **`/scratch`** as the **HPC scratch directory**.
|
* Select **"Scratch directory local to the execution node(s)"** and add **`/scratch`** as the **HPC scratch directory**.
|
||||||
* **Never check** the option "Keep job files in the staging directory when job is complete" if the previous
|
* **Never check** the option "Keep job files in the staging directory when job is complete" if the previous
|
||||||
@@ -51,12 +44,12 @@ option "Scratch directory local to the execution node(s)" was set.
|
|||||||
* **"Apply"** changes.
|
* **"Apply"** changes.
|
||||||
5. In the **"Queues"** tab, use the left button to auto-discover partitions
|
5. In the **"Queues"** tab, use the left button to auto-discover partitions
|
||||||

|

|
||||||
* If no authentication method was configured before, an authentication window will appear. Use your
|
* If no authentication method was configured before, an authentication window will appear. Use your
|
||||||
PSI account to authenticate. Notice that the **`PSICH\`** prefix **must not be added**.
|
PSI account to authenticate. Notice that the **`PSICH\`** prefix **must not be added**.
|
||||||

|

|
||||||
* From the partition list, select the ones you want to typically use.
|
* From the partition list, select the ones you want to typically use.
|
||||||
* In general, standard Merlin users must use **`hourly`**, **`daily`** and **`general`** only.
|
* In general, standard Merlin users must use **`hourly`**, **`daily`** and **`general`** only.
|
||||||
* Other partitions are reserved for allowed users only.
|
* Other partitions are reserved for allowed users only.
|
||||||
* **"Apply"** changes.
|
* **"Apply"** changes.
|
||||||

|

|
||||||
6. *[Optional]* You can perform a test by submitting a test job on each partition by clicking on the **Submit** button
|
6. *[Optional]* You can perform a test by submitting a test job on each partition by clicking on the **Submit** button
|
||||||
@@ -67,7 +60,7 @@ for each selected partition.
|
|||||||
|
|
||||||
## Using RSM in ANSYS
|
## Using RSM in ANSYS
|
||||||
|
|
||||||
Using the RSM service in ANSYS is slightly different depending on the ANSYS software being used.
|
Using the RSM service in ANSYS is slightly different depending on the ANSYS software being used.
|
||||||
Please follow the official ANSYS documentation for details about how to use it for that specific software.
|
Please follow the official ANSYS documentation for details about how to use it for that specific software.
|
||||||
|
|
||||||
Alternatively, please refer to the examples shown in the following chapters (ANSYS-specific software).
|
Alternatively, please refer to the examples shown in the following chapters (ANSYS-specific software).
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# ANSYS
|
||||||
title: ANSYS
|
|
||||||
#tags:
|
|
||||||
keywords: software, ansys, slurm, interactive, rsm, pmodules, overlay, overlays
|
|
||||||
last_updated: 23 August 2024
|
|
||||||
summary: "This document describes how to load and use ANSYS in the Merlin7 cluster"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/ansys.html
|
|
||||||
---
|
|
||||||
|
|
||||||
This document provides general information on how to load and run ANSYS software in the Merlin cluster
|
This document provides general information on how to load and run ANSYS software in the Merlin cluster
|
||||||
|
|
||||||
@@ -14,15 +6,14 @@ This document describes generic information of how to load and run ANSYS softwar
|
|||||||
|
|
||||||
The ANSYS software can be loaded through **[PModules](pmodules.md)**.
|
The ANSYS software can be loaded through **[PModules](pmodules.md)**.
|
||||||
|
|
||||||
The default ANSYS versions are loaded from the central PModules repository.
|
The default ANSYS versions are loaded from the central PModules repository.
|
||||||
|
|
||||||
However, we provide local installations on Merlin7, which are mainly needed for some ANSYS packages such as ANSYS RSM.
|
However, we provide local installations on Merlin7, which are mainly needed for some ANSYS packages such as ANSYS RSM.
|
||||||
For this reason, and to improve the interactive user experience, ANSYS has also been installed on the
|
For this reason, and to improve the interactive user experience, ANSYS has also been installed on the
|
||||||
Merlin high-performance storage and made available through PModules.
|
Merlin high-performance storage and made available through PModules.
|
||||||
|
|
||||||
### Loading Merlin7 ANSYS
|
### Loading Merlin7 ANSYS
|
||||||
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
module purge
|
module purge
|
||||||
module use unstable # Optional
|
module use unstable # Optional
|
||||||
@@ -37,9 +28,9 @@ module load ANSYS/2025R2
|
|||||||
<details>
|
<details>
|
||||||
<summary>[Example] Loading ANSYS from the Merlin7 PModules repository</summary>
|
<summary>[Example] Loading ANSYS from the Merlin7 PModules repository</summary>
|
||||||
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
|
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
|
||||||
🔥 [caubet_m@login001:~]# module purge
|
🔥 [caubet_m@login001:~]# module purge
|
||||||
🔥 [caubet_m@login001:~]# module use unstable
|
🔥 [caubet_m@login001:~]# module use unstable
|
||||||
🔥 [caubet_m@login001:~]# module load cray
|
🔥 [caubet_m@login001:~]# module load cray
|
||||||
|
|
||||||
🔥 [caubet_m@login002:~]# module search ANSYS --verbose
|
🔥 [caubet_m@login002:~]# module search ANSYS --verbose
|
||||||
ANSYS/2022R2:
|
ANSYS/2022R2:
|
||||||
@@ -69,7 +60,6 @@ ANSYS/2025R2:
|
|||||||
</pre>
|
</pre>
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
|
|
||||||
!!! tip
|
!!! tip
|
||||||
    Please always run **ANSYS/2024R2 or newer**.
|
    Please always run **ANSYS/2024R2 or newer**.
|
||||||
|
|
||||||
|
|||||||
@@ -1,11 +1,4 @@
|
|||||||
---
|
# CP2k
|
||||||
title: CP2k
|
|
||||||
keywords: CP2k software, compile
|
|
||||||
summary: "CP2k is a quantum chemistry and solid state physics software package"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
toc: false
|
|
||||||
permalink: /merlin7/cp2k.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## CP2k
|
## CP2k
|
||||||
|
|
||||||
@@ -131,14 +124,13 @@ module purge
|
|||||||
module use Spack unstable
|
module use Spack unstable
|
||||||
module load gcc/12.3 openmpi/5.0.8-r5lz-A100-gpu dbcsr/2.8.0-3r22-A100-gpu-omp cosma/2.7.0-y2tr-gpu cuda/12.6.0-3y6a dftd4/3.7.0-4k4c-omp elpa/2025.01.002-bovg-A100-gpu-omp fftw/3.3.10-syba-omp hdf5/1.14.6-pcsd libint/2.11.1-3lxv libxc/7.0.0-u556 libxsmm/1.17-2azz netlib-scalapack/2.2.2-rmcf openblas/0.3.30-ynou-omp plumed/2.9.2-47hk py-fypp/3.1-z25p py-numpy/2.3.2-45ay python/3.13.5-qivs sirius/develop-qz4c-A100-gpu-omp spglib/2.5.0-jl5l-omp spla/1.6.1-hrgf-gpu cmake/3.31.8-j47l ninja/1.12.1-afxy
|
module load gcc/12.3 openmpi/5.0.8-r5lz-A100-gpu dbcsr/2.8.0-3r22-A100-gpu-omp cosma/2.7.0-y2tr-gpu cuda/12.6.0-3y6a dftd4/3.7.0-4k4c-omp elpa/2025.01.002-bovg-A100-gpu-omp fftw/3.3.10-syba-omp hdf5/1.14.6-pcsd libint/2.11.1-3lxv libxc/7.0.0-u556 libxsmm/1.17-2azz netlib-scalapack/2.2.2-rmcf openblas/0.3.30-ynou-omp plumed/2.9.2-47hk py-fypp/3.1-z25p py-numpy/2.3.2-45ay python/3.13.5-qivs sirius/develop-qz4c-A100-gpu-omp spglib/2.5.0-jl5l-omp spla/1.6.1-hrgf-gpu cmake/3.31.8-j47l ninja/1.12.1-afxy
|
||||||
|
|
||||||
git clone https://github.com/cp2k/cp2k.git
|
git clone https://github.com/cp2k/cp2k.git
|
||||||
cd cp2k
|
cd cp2k
|
||||||
|
|
||||||
mkdir build && cd build
|
mkdir build && cd build
|
||||||
CC=mpicc CXX=mpic++ FC=mpifort cmake -GNinja -DCMAKE_CUDA_HOST_COMPILER=mpicc -DCP2K_USE_LIBXC=ON -DCP2K_USE_LIBINT2=ON -DCP2K_USE_SPGLIB=ON -DCP2K_USE_ELPA=ON -DCP2K_USE_SPLA=ON -DCP2K_USE_SIRIUS=ON -DCP2K_USE_PLUMED=ON -DCP2K_USE_DFTD4=ON -DCP2K_USE_COSMA=ON -DCP2K_USE_ACCEL=CUDA -DCMAKE_CUDA_ARCHITECTURES=80 -DCP2K_USE_FFTW3=ON ..
|
CC=mpicc CXX=mpic++ FC=mpifort cmake -GNinja -DCMAKE_CUDA_HOST_COMPILER=mpicc -DCP2K_USE_LIBXC=ON -DCP2K_USE_LIBINT2=ON -DCP2K_USE_SPGLIB=ON -DCP2K_USE_ELPA=ON -DCP2K_USE_SPLA=ON -DCP2K_USE_SIRIUS=ON -DCP2K_USE_PLUMED=ON -DCP2K_USE_DFTD4=ON -DCP2K_USE_COSMA=ON -DCP2K_USE_ACCEL=CUDA -DCMAKE_CUDA_ARCHITECTURES=80 -DCP2K_USE_FFTW3=ON ..
|
||||||
ninja -j 16
|
ninja -j 16
|
||||||
|
|
||||||
|
|
||||||
```
|
```
|
||||||
#### GH200
|
#### GH200
|
||||||
[](https://gitea.psi.ch/HPCE/spack-psi)
|
[](https://gitea.psi.ch/HPCE/spack-psi)
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Cray Programming Environment
|
||||||
title: Cray Programming Environment
|
|
||||||
#tags:
|
|
||||||
keywords: cray, module
|
|
||||||
last_updated: 24 Mai 2023
|
|
||||||
summary: "This document describes how to use the Cray Programming Environment on Merlin7."
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/cray-module-env.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Loading the Cray module
|
## Loading the Cray module
|
||||||
|
|
||||||
@@ -24,21 +16,21 @@ The Cray Programming Environment will load all the necessary dependencies. In ex
|
|||||||
🔥 [caubet_m@login001:~]# module list
|
🔥 [caubet_m@login001:~]# module list
|
||||||
Currently Loaded Modules:
|
Currently Loaded Modules:
|
||||||
1) craype-x86-rome 2) libfabric/1.15.2.0
|
1) craype-x86-rome 2) libfabric/1.15.2.0
|
||||||
3) craype-network-ofi
|
3) craype-network-ofi
|
||||||
4) xpmem/2.9.6-1.1_20240510205610__g087dc11fc19d 5) PrgEnv-cray/8.5.0
|
4) xpmem/2.9.6-1.1_20240510205610__g087dc11fc19d 5) PrgEnv-cray/8.5.0
|
||||||
6) cce/17.0.0 7) cray-libsci/23.12.5
|
6) cce/17.0.0 7) cray-libsci/23.12.5
|
||||||
8) cray-mpich/8.1.28 9) craype/2.7.30
|
8) cray-mpich/8.1.28 9) craype/2.7.30
|
||||||
10) perftools-base/23.12.0 11) cpe/23.12
|
10) perftools-base/23.12.0 11) cpe/23.12
|
||||||
12) cray/23.12
|
12) cray/23.12
|
||||||
```
|
```
|
||||||
|
|
||||||
You will notice an unfamiliar `PrgEnv-cray/8.5.0` module that was loaded. This is a meta-module that Cray provides to simplify switching between compilers and their associated dependencies and libraries,
|
You will notice an unfamiliar `PrgEnv-cray/8.5.0` module that was loaded. This is a meta-module that Cray provides to simplify switching between compilers and their associated dependencies and libraries,
|
||||||
collectively called a Programming Environment. The Cray Programming Environment has four key modules.
|
collectively called a Programming Environment. The Cray Programming Environment has four key modules.
|
||||||
|
|
||||||
* `cray-libsci` is a collection of numerical routines tuned for performance on Cray systems.
|
* `cray-libsci` is a collection of numerical routines tuned for performance on Cray systems.
|
||||||
* `libfabric` is an important low-level library that allows you to take advantage of the high performance Slingshot network.
|
* `libfabric` is an important low-level library that allows you to take advantage of the high performance Slingshot network.
|
||||||
* `cray-mpich` is a CUDA-aware MPI implementation, optimized for Cray systems.
|
* `cray-mpich` is a CUDA-aware MPI implementation, optimized for Cray systems.
|
||||||
* `cce` is the compiler from Cray. C/C++ compilers are based on Clang/LLVM while Fortran supports Fortran 2018 standard. More info: https://user.cscs.ch/computing/compilation/cray/
|
* `cce` is the compiler from Cray. C/C++ compilers are based on Clang/LLVM while Fortran supports Fortran 2018 standard. More info: <https://user.cscs.ch/computing/compilation/cray/>
|
||||||
|
|
||||||
You can switch between different programming environments. You can check the available module with the `module avail` command, as follows:
|
You can switch between different programming environments. You can check the available module with the `module avail` command, as follows:
|
||||||
|
|
||||||
@@ -46,13 +38,13 @@ You can switch between different programming environments. You can check the ava
|
|||||||
🔥 [caubet_m@login001:~]# module avail PrgEnv
|
🔥 [caubet_m@login001:~]# module avail PrgEnv
|
||||||
--------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------
|
--------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------
|
||||||
|
|
||||||
PrgEnv-cray/8.5.0 PrgEnv-gnu/8.5.0
|
PrgEnv-cray/8.5.0 PrgEnv-gnu/8.5.0
|
||||||
PrgEnv-nvhpc/8.5.0 PrgEnv-nvidia/8.5.0
|
PrgEnv-nvhpc/8.5.0 PrgEnv-nvidia/8.5.0
|
||||||
```
|
```
|
||||||
## Switching compiler suites
|
## Switching compiler suites
|
||||||
|
|
||||||
Compiler suites can be exchanged with PrgEnv (Programming Environments) provided by HPE-Cray. The wrappers call the correct compiler with appropriate options to build
|
Compiler suites can be exchanged with PrgEnv (Programming Environments) provided by HPE-Cray. The wrappers call the correct compiler with appropriate options to build
|
||||||
and link applications with relevant libraries, as required by the loaded modules (only dynamic linking is supported) and therefore should replace direct calls to compiler
|
and link applications with relevant libraries, as required by the loaded modules (only dynamic linking is supported) and therefore should replace direct calls to compiler
|
||||||
drivers in Makefiles and build scripts.
|
drivers in Makefiles and build scripts.
|
||||||
|
|
||||||
To swap the compiler suite from the default Cray to the GNU compiler, one can run the following.
|
To swap the compiler suite from the default Cray to the GNU compiler, one can run the following.
|
||||||
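On HPE-Cray systems this is typically done with `module swap`, after which the `cc`/`CC`/`ftn` wrappers drive the GNU compilers automatically. A minimal sketch, with a hypothetical source file:

```bash
# Switch from the Cray to the GNU programming environment
module swap PrgEnv-cray PrgEnv-gnu

# The compiler wrappers now invoke the GNU compilers with the flags and
# libraries required by the loaded modules
cc -O2 -o hello hello.c
```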
|
|||||||
@@ -1,11 +1,4 @@
|
|||||||
---
|
# GROMACS
|
||||||
title: GROMACS
|
|
||||||
keywords: GROMACS software, compile
|
|
||||||
summary: "GROMACS (GROningen Machine for Chemical Simulations) is a versatile and widely-used open source package to perform molecular dynamics"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
toc: false
|
|
||||||
permalink: /merlin7/gromacs.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## GROMACS
|
## GROMACS
|
||||||
|
|
||||||
|
|||||||
@@ -1,11 +1,4 @@
|
|||||||
---
|
# IPPL
|
||||||
title: IPPL
|
|
||||||
keywords: IPPL software, compile
|
|
||||||
summary: "Independent Parallel Particle Layer (IPPL) is a performance portable C++ library for Particle-Mesh methods"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
toc: false
|
|
||||||
permalink: /merlin7/ippl.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## IPPL
|
## IPPL
|
||||||
|
|
||||||
@@ -15,12 +8,12 @@ Independent Parallel Particle Layer (IPPL) is a performance portable C++ library
|
|||||||
|
|
||||||
GNU GPLv3
|
GNU GPLv3
|
||||||
|
|
||||||
## How to run on Merlin7
|
## How to run on Merlin7
|
||||||
### A100 nodes
|
### A100 nodes
|
||||||
[](https://gitea.psi.ch/HPCE/spack-psi)
|
[](https://gitea.psi.ch/HPCE/spack-psi)
|
||||||
```bash
|
```bash
|
||||||
module use Spack unstable
|
module use Spack unstable
|
||||||
module load gcc/13.2.0 openmpi/4.1.6-57rc-A100-gpu
|
module load gcc/13.2.0 openmpi/4.1.6-57rc-A100-gpu
|
||||||
module load boost/1.82.0-e7gp fftw/3.3.10 gnutls/3.8.3 googletest/1.14.0 gsl/2.8 h5hut/2.0.0rc7 openblas/0.3.26-omp cmake/3.31.6-oe7u
|
module load boost/1.82.0-e7gp fftw/3.3.10 gnutls/3.8.3 googletest/1.14.0 gsl/2.8 h5hut/2.0.0rc7 openblas/0.3.26-omp cmake/3.31.6-oe7u
|
||||||
|
|
||||||
cd <path to IPPL source directory>
|
cd <path to IPPL source directory>
|
||||||
@@ -39,8 +32,8 @@ salloc --partition=gh-daily --clusters=gmerlin7 --time=08:00:00 --ntasks=4 --nod
|
|||||||
ssh <allocated_gpu>
|
ssh <allocated_gpu>
|
||||||
|
|
||||||
module use Spack unstable
|
module use Spack unstable
|
||||||
module load gcc/13.2.0 openmpi/5.0.3-3lmi-GH200-gpu
|
module load gcc/13.2.0 openmpi/5.0.3-3lmi-GH200-gpu
|
||||||
module load boost/1.82.0-3ns6 fftw/3.3.10 gnutls/3.8.3 googletest/1.14.0 gsl/2.7.1 h5hut/2.0.0rc7 openblas/0.3.26 cmake/3.31.4-u2nm
|
module load boost/1.82.0-3ns6 fftw/3.3.10 gnutls/3.8.3 googletest/1.14.0 gsl/2.7.1 h5hut/2.0.0rc7 openblas/0.3.26 cmake/3.31.4-u2nm
|
||||||
|
|
||||||
cd <path to IPPL source directory>
|
cd <path to IPPL source directory>
|
||||||
mkdir build_gh
|
mkdir build_gh
|
||||||
|
|||||||
@@ -1,11 +1,4 @@
|
|||||||
---
|
# LAMMPS
|
||||||
title: LAMMPS
|
|
||||||
keywords: LAMMPS software, compile
|
|
||||||
summary: "LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
toc: false
|
|
||||||
permalink: /merlin7/lammps.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## LAMMPS
|
## LAMMPS
|
||||||
|
|
||||||
|
|||||||
@@ -1,11 +1,4 @@
|
|||||||
---
|
# OPAL-X
|
||||||
title: OPAL-X
|
|
||||||
keywords: OPAL-X software, compile
|
|
||||||
summary: "OPAL (Object Oriented Particle Accelerator Library) is an open source C++ framework for general particle accelerator simulations including 3D space charge, short range wake fields and particle matter interaction."
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
toc: false
|
|
||||||
permalink: /merlin7/opal-x.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## OPAL
|
## OPAL
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# OpenMPI Support
|
||||||
title: OpenMPI Support
|
|
||||||
#tags:
|
|
||||||
last_updated: 15 January 2025
|
|
||||||
keywords: software, openmpi, slurm
|
|
||||||
summary: "This document describes how to use OpenMPI in the Merlin7 cluster"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/openmpi.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Introduction
|
## Introduction
|
||||||
|
|
||||||
@@ -70,7 +62,7 @@ specific pmix plugin versions available: pmix_v5,pmix_v4,pmix_v3,pmix_v2
|
|||||||
```
|
```
|
||||||
|
|
||||||
Important Notes:
|
Important Notes:
|
||||||
* For OpenMPI, always use `pmix` by specifying the appropriate version (`pmix_$version`).
|
* For OpenMPI, always use `pmix` by specifying the appropriate version (`pmix_$version`).
|
||||||
When loading an OpenMPI module (via [PModules](pmodules.md) or [Spack](spack.md)), the corresponding PMIx version will be automatically loaded.
|
When loading an OpenMPI module (via [PModules](pmodules.md) or [Spack](spack.md)), the corresponding PMIx version will be automatically loaded.
|
||||||
* Users do not need to manually manage PMIx compatibility.
|
* Users do not need to manually manage PMIx compatibility.
|
||||||
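A minimal usage sketch; the plugin version should match one of those reported by `srun --mpi=list`, and the binary name is hypothetical:

```bash
# Launch an OpenMPI application through Slurm using the PMIx plugin
srun --mpi=pmix_v5 -n 4 ./my_mpi_app
```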
|
|
||||||
|
|||||||
@@ -1,16 +1,8 @@
|
|||||||
---
|
# PSI Modules
|
||||||
title: PSI Modules
|
|
||||||
#tags:
|
|
||||||
keywords: Pmodules, software, stable, unstable, deprecated, overlay, overlays, release stage, module, package, packages, library, libraries
|
|
||||||
last_updated: 07 September 2022
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/pmodules.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## PSI Environment Modules
|
## PSI Environment Modules
|
||||||
|
|
||||||
On top of the operating system stack we provide different software using the PSI developed PModule system.
|
On top of the operating system stack we provide different software using the PSI developed PModule system.
|
||||||
|
|
||||||
PModules is the officially supported way of providing software, and each package is deployed by a specific expert. PModules typically
|
PModules is the officially supported way of providing software, and each package is deployed by a specific expert. PModules typically
|
||||||
contains software that is used by many people.
|
contains software that is used by many people.
|
||||||
@@ -22,25 +14,25 @@ If you miss any package/versions or a software with a specific missing feature,
|
|||||||
To ensure proper software lifecycle management, PModules uses three release stages: unstable, stable, and deprecated.
|
To ensure proper software lifecycle management, PModules uses three release stages: unstable, stable, and deprecated.
|
||||||
|
|
||||||
1. **Unstable Release Stage:**
|
1. **Unstable Release Stage:**
|
||||||
* Contains experimental or under-development software versions.
|
* Contains experimental or under-development software versions.
|
||||||
* Not visible to users by default. Use explicitly:
|
* Not visible to users by default. Use explicitly:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
module use unstable
|
module use unstable
|
||||||
```
|
```
|
||||||
* Software is promoted to **stable** after validation.
|
* Software is promoted to **stable** after validation.
|
||||||
2. **Stable Release Stage:**
|
2. **Stable Release Stage:**
|
||||||
* Default stage, containing fully tested and supported software versions.
|
* Default stage, containing fully tested and supported software versions.
|
||||||
* Recommended for all production workloads.
|
* Recommended for all production workloads.
|
||||||
|
|
||||||
3. **Deprecated Release Stage:**
|
3. **Deprecated Release Stage:**
|
||||||
* Contains software versions that are outdated or discontinued.
|
* Contains software versions that are outdated or discontinued.
|
||||||
* These versions are hidden by default but can be explicitly accessed:
|
* These versions are hidden by default but can be explicitly accessed:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
module use deprecated
|
module use deprecated
|
||||||
```
|
```
|
||||||
* Deprecated software can still be loaded directly without additional configuration to ensure user transparency.
|
* Deprecated software can still be loaded directly without additional configuration to ensure user transparency.
|
||||||
|
|
||||||
## PModules commands
|
## PModules commands
|
||||||
|
|
||||||
@@ -57,7 +49,7 @@ module purge # unload all loaded packages and cleanup the en
|
|||||||
```
|
```
|
||||||
|
|
||||||
Please refer to the **external [PSI Modules](https://pmodules.gitpages.psi.ch/chap3.html) document** for
|
Please refer to the **external [PSI Modules](https://pmodules.gitpages.psi.ch/chap3.html) document** for
|
||||||
detailed information about the `module` command.
|
detailed information about the `module` command.
|
||||||
|
|
||||||
### module use/unuse
|
### module use/unuse
|
||||||
|
|
||||||
@@ -85,7 +77,7 @@ Please run `module avail --help` for further listing options.
|
|||||||
### module search
|
### module search
|
||||||
|
|
||||||
This is used to **search** for **software packages**. By default, if no **Release Stage** or **Software Group** is specified
|
This is used to **search** for **software packages**. By default, if no **Release Stage** or **Software Group** is specified
|
||||||
in the options of the `module search` command, it will search within the already invoked *Software Groups* and *Release Stages*.
|
in the options of the `module search` command, it will search within the already invoked *Software Groups* and *Release Stages*.
|
||||||
Direct package dependencies will also be shown.
|
Direct package dependencies will also be shown.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|||||||
@@ -1,11 +1,4 @@
|
|||||||
---
|
# Quantum Espresso
|
||||||
title: Quantum Espresso
|
|
||||||
keywords: Quantum Espresso software, compile
|
|
||||||
summary: "Quantum Espresso code for electronic-structure calculations and materials modeling at the nanoscale"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
toc: false
|
|
||||||
permalink: /merlin7/quantum-espresso.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Quantum ESPRESSO
|
## Quantum ESPRESSO
|
||||||
|
|
||||||
@@ -121,7 +114,6 @@ module purge
|
|||||||
module use Spack unstable
|
module use Spack unstable
|
||||||
module load nvhpc/25.3 openmpi/5.0.7-e3bf-GH200-gpu fftw/3.3.10-sfpw-omp hdf5/develop-2.0-ztvo nvpl-blas/0.4.0.1-3zpg nvpl-lapack/0.3.0-ymy5 netlib-scalapack/2.2.2-qrhq cmake/3.31.6-5dl7
|
module load nvhpc/25.3 openmpi/5.0.7-e3bf-GH200-gpu fftw/3.3.10-sfpw-omp hdf5/develop-2.0-ztvo nvpl-blas/0.4.0.1-3zpg nvpl-lapack/0.3.0-ymy5 netlib-scalapack/2.2.2-qrhq cmake/3.31.6-5dl7
|
||||||
|
|
||||||
|
|
||||||
cd <path to QE source directory>
|
cd <path to QE source directory>
|
||||||
mkdir build
|
mkdir build
|
||||||
cd build
|
cd build
|
||||||
|
|||||||
@@ -1,11 +1,4 @@
|
|||||||
---
|
# Spack
|
||||||
title: Spack
|
|
||||||
keywords: spack, python, software, compile
|
|
||||||
summary: "Spack the HPC package manager documentation"
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
toc: false
|
|
||||||
permalink: /merlin7/spack.html
|
|
||||||
---
|
|
||||||
|
|
||||||
For Merlin7, the *package manager for supercomputing* [Spack](https://spack.io/) is available. It is meant to complement the existing PModules
|
For Merlin7, the *package manager for supercomputing* [Spack](https://spack.io/) is available. It is meant to complement the existing PModules
|
||||||
solution, giving users the opportunity to manage their own software environments.
|
solution, giving users the opportunity to manage their own software environments.
|
||||||
|
|||||||
@@ -1,12 +1,4 @@
|
|||||||
---
|
# Contact
|
||||||
title: Contact
|
|
||||||
#tags:
|
|
||||||
keywords: contact, support, snow, service now, mailing list, mailing, email, mail, merlin-admins@lists.psi.ch, merlin-users@lists.psi.ch, merlin users
|
|
||||||
last_updated: 15. Jan 2025
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
permalink: /merlin7/contact.html
|
|
||||||
---
|
|
||||||
|
|
||||||
## Support
|
## Support
|
||||||
|
|
||||||
@@ -16,10 +8,10 @@ Support can be asked through:
|
|||||||
|
|
||||||
Basic contact information is also displayed on every shell login to the system using the *Message of the Day* mechanism.
|
Basic contact information is also displayed on every shell login to the system using the *Message of the Day* mechanism.
|
||||||
|
|
||||||
|
|
||||||
### PSI Service Now
|
### PSI Service Now
|
||||||
|
|
||||||
**[PSI Service Now](https://psi.service-now.com/psisp)**: is the official tool for opening incident requests.
|
**[PSI Service Now](https://psi.service-now.com/psisp)**: is the official tool for opening incident requests.
|
||||||
|
|
||||||
* PSI HelpDesk will redirect the incident to the corresponding department, or
|
* PSI HelpDesk will redirect the incident to the corresponding department, or
|
||||||
* you can always assign it directly by checking the box `I know which service is affected` and providing the service name `Local HPC Resources (e.g. Merlin) [CF]` (just type in `Local` and you should get the valid completions).
|
* you can always assign it directly by checking the box `I know which service is affected` and providing the service name `Local HPC Resources (e.g. Merlin) [CF]` (just type in `Local` and you should get the valid completions).
|
||||||
|
|
||||||
@@ -35,7 +27,7 @@ Basic contact information is also displayed on every shell login to the system u
|
|||||||
|
|
||||||
It is strongly recommended that users subscribe to the Merlin Users mailing list: **<merlin-users@lists.psi.ch>**
|
It is strongly recommended that users subscribe to the Merlin Users mailing list: **<merlin-users@lists.psi.ch>**
|
||||||
|
|
||||||
This mailing list is the official channel used by Merlin administrators to inform users about downtimes,
|
This mailing list is the official channel used by Merlin administrators to inform users about downtimes,
|
||||||
interventions or problems. Users can be subscribed in two ways:
|
interventions or problems. Users can be subscribed in two ways:
|
||||||
|
|
||||||
* *(Preferred way)* Self-registration through **[Sympa](https://psilists.ethz.ch/sympa/info/merlin-users)**
|
* *(Preferred way)* Self-registration through **[Sympa](https://psilists.ethz.ch/sympa/info/merlin-users)**
|
||||||
|
|||||||
@@ -1,12 +1,3 @@
|
|||||||
---
|
|
||||||
#tags:
|
|
||||||
keywords: merlin6, merlin7, migration, fpsync, rsync
|
|
||||||
#summary: ""
|
|
||||||
sidebar: merlin7_sidebar
|
|
||||||
last_updated: 28 May 2025
|
|
||||||
permalink: /merlin7/migrating.html
|
|
||||||
---
|
|
||||||
|
|
||||||
# Merlin6 to Merlin7 Migration Guide
|
# Merlin6 to Merlin7 Migration Guide
|
||||||
|
|
||||||
Welcome to the official documentation for migrating your data from **Merlin6** to **Merlin7**. Please follow the instructions carefully to ensure a smooth and secure transition.
|
Welcome to the official documentation for migrating your data from **Merlin6** to **Merlin7**. Please follow the instructions carefully to ensure a smooth and secure transition.
|
||||||
@@ -15,7 +6,7 @@ Welcome to the official documentation for migrating your data from **Merlin6** t
|
|||||||
|
|
||||||
### Phase 1: Users without Projects — **Deadline: July 11**
|
### Phase 1: Users without Projects — **Deadline: July 11**
|
||||||
|
|
||||||
If you **do not belong to any Merlin project**, i.e. for
|
If you **do not belong to any Merlin project**, i.e. for
|
||||||
|
|
||||||
* Users not in any group project (`/data/projects/general`)
|
* Users not in any group project (`/data/projects/general`)
|
||||||
* Users not in BIO, MEG, Mu3e
|
* Users not in BIO, MEG, Mu3e
|
||||||
@@ -59,8 +50,8 @@ for further information.
|
|||||||
* The **home directory and user data directory have been merged** into the single new home directory `/data/user/$USER`.
|
* The **home directory and user data directory have been merged** into the single new home directory `/data/user/$USER`.
|
||||||
* The **experiments directory has been integrated into `/data/project/`**:
|
* The **experiments directory has been integrated into `/data/project/`**:
|
||||||
|
|
||||||
* `/data/project/general` contains general Merlin7 projects.
|
* `/data/project/general` contains general Merlin7 projects.
|
||||||
* Other subdirectories are used for large-scale projects such as CLS division, Mu3e, and MeG.
|
* Other subdirectories are used for large-scale projects such as CLS division, Mu3e, and MeG.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -70,13 +61,15 @@ Before starting the migration, make sure you:
|
|||||||
|
|
||||||
* are **registered on Merlin7**.
|
* are **registered on Merlin7**.
|
||||||
|
|
||||||
* If not yet registered, please do so following [these instructions](../01-Quick-Start-Guide/requesting-accounts.md)
|
* If not yet registered, please do so following [these instructions](../01-Quick-Start-Guide/requesting-accounts.md)
|
||||||
|
|
||||||
* **have cleaned up your data to reduce migration time and space usage**.
|
* **have cleaned up your data to reduce migration time and space usage**.
|
||||||
|
|
||||||
* **For the user data migration**, ensure your total usage on Merlin6 (`/psi/home`+`/data/user`) is **well below the 1 TB quota** (use the `merlin_quotas` command). Remember:
|
* **For the user data migration**, ensure your total usage on Merlin6 (`/psi/home`+`/data/user`) is **well below the 1 TB quota** (use the `merlin_quotas` command). Remember:
|
||||||
|
|
||||||
* **Merlin7 also has a 1 TB quota on your home directory**, and you might already have data there.
|
* **Merlin7 also has a 1 TB quota on your home directory**, and you might already have data there.
|
||||||
* If your usage exceeds this during the transfer, the process might fail.
|
* If your usage exceeds this during the transfer, the process might fail.
|
||||||
|
|
||||||
* No activity should be running / performed on Merlin6 when the transfer process is ongoing.
|
* No activity should be running / performed on Merlin6 when the transfer process is ongoing.
|
||||||
|
|
||||||
### Recommended Cleanup Actions
|
### Recommended Cleanup Actions
|
||||||
@@ -85,13 +78,13 @@ Before starting the migration, make sure you:
|
|||||||
* Archive large, inactive data sets.
|
* Archive large, inactive data sets.
|
||||||
* Delete or clean up unused `conda` or `virtualenv` Python environments:
|
* Delete or clean up unused `conda` or `virtualenv` Python environments:
|
||||||
|
|
||||||
* These are often large and may not work as-is on Merlin7.
|
* These are often large and may not work as-is on Merlin7.
|
||||||
* You can export your conda environment description to a file with:
|
* You can export your conda environment description to a file with:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
conda env export -n myenv > $HOME/myenv.yml
|
conda env export -n myenv > $HOME/myenv.yml
|
||||||
```
|
```
|
||||||
* Then recreate them later on Merlin7 from these files.
|
* Then recreate them later on Merlin7 from these files.
|
||||||
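For instance, a hedged sketch of recreating such an environment on Merlin7 from the exported file (names follow the export example above):

```bash
conda env create -n myenv -f $HOME/myenv.yml
```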
|
|
||||||
> 🧹 For the **user data**, you can always remove more old data **after** migration — it will be copied into `~/merlin6data` and `~/merlin6home` on Merlin7.
|
> 🧹 For the **user data**, you can always remove more old data **after** migration — it will be copied into `~/merlin6data` and `~/merlin6home` on Merlin7.
|
||||||
|
|
||||||
@@ -113,10 +106,11 @@ This script will:
|
|||||||
* Configure and check that your environment is ready for transferring files via Slurm job.
|
* Configure and check that your environment is ready for transferring files via Slurm job.
|
||||||
* **Create two directories:**
|
* **Create two directories:**
|
||||||
|
|
||||||
* `~/merlin6data` → copy of your old /data/user
|
* `~/merlin6data` → copy of your old /data/user
|
||||||
* `~/merlin6home` → copy of your old home
|
* `~/merlin6home` → copy of your old home
|
||||||
|
|
||||||
|
> ⚠️ **Important:** If `~/merlin6home` or `~/merlin6data` already exist on Merlin7, the script will exit.
|
||||||
|
|
||||||
> ⚠️ **Important:** If `~/merlin6home` or `~/merlin6data` already exist on Merlin7, the script will exit.
|
|
||||||
> **Please remove them or contact support**.
|
> **Please remove them or contact support**.
|
||||||
|
|
||||||
If there are issues, the script will:
|
If there are issues, the script will:
|
||||||
@@ -159,9 +153,9 @@ If a problem occurs during the migration process:
|
|||||||
* 🔍 **Check the job log files** mentioned in the script output. They contain detailed messages that explain what failed and why.
|
* 🔍 **Check the job log files** mentioned in the script output. They contain detailed messages that explain what failed and why.
|
||||||
* 🛠️ **Fix the root cause** on the source system. Common issues include:
|
* 🛠️ **Fix the root cause** on the source system. Common issues include:
|
||||||
|
|
||||||
* Files with incorrect permissions
|
* Files with incorrect permissions
|
||||||
* Ownership mismatches
|
* Ownership mismatches
|
||||||
* Disk quota exceeded on Merlin7
|
* Disk quota exceeded on Merlin7
|
||||||
* 📚 Refer to the [⚠️ Common rsync/fpsync Migration Issues](#common-rsyncfpsync-migration-issues) section below for detailed explanations and solutions.
|
* 📚 Refer to the [⚠️ Common rsync/fpsync Migration Issues](#common-rsyncfpsync-migration-issues) section below for detailed explanations and solutions.
|
||||||
|
|
||||||
> ℹ️ **Important:** If `migrate_merlin6data.batch` fails, the migration process will automatically cancel `migrate_merlin6home.batch` to avoid ending in an inconsistent state.
|
> ℹ️ **Important:** If `migrate_merlin6data.batch` fails, the migration process will automatically cancel `migrate_merlin6home.batch` to avoid ending in an inconsistent state.
|
||||||
@@ -200,10 +194,10 @@ merlin7_migration.setup
|
|||||||
*Expected output:*
|
*Expected output:*
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
✅ login002.merlin7.psi.ch
|
✅ login002.merlin7.psi.ch
|
||||||
✅ `$USER` is a member of svc-cluster_merlin7
|
✅ `$USER` is a member of svc-cluster_merlin7
|
||||||
✅ Skipping key generation
|
✅ Skipping key generation
|
||||||
✅ SSH key already added to agent.
|
✅ SSH key already added to agent.
|
||||||
✅ SSH ID successfully copied to login00[1|2].merlin7.psi.ch.
|
✅ SSH ID successfully copied to login00[1|2].merlin7.psi.ch.
|
||||||
✅ Test successful.
|
✅ Test successful.
|
||||||
✅ /data/software/xfer_logs/caubet_m created.
|
✅ /data/software/xfer_logs/caubet_m created.
|
||||||
@@ -287,7 +281,7 @@ Further instructions will be sent via email once the owning team is contacted by
|
|||||||
* **Cause**: Source files are owned by another user (e.g. root or a collaborator).
|
* **Cause**: Source files are owned by another user (e.g. root or a collaborator).
|
||||||
* **Solution**:
|
* **Solution**:
|
||||||
|
|
||||||
* Change ownership before migration:
|
* Change ownership before migration:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
chown -R $USER /path/to/file
|
chown -R $USER /path/to/file
|
||||||
|
|||||||
@@ -181,10 +181,10 @@ nav:
|
|||||||
- merlin6/99-support/known-problems.md
|
- merlin6/99-support/known-problems.md
|
||||||
- merlin6/99-support/migration-from-merlin5.md
|
- merlin6/99-support/migration-from-merlin5.md
|
||||||
- merlin6/99-support/troubleshooting.md
|
- merlin6/99-support/troubleshooting.md
|
||||||
- PSI@CSCS:
|
|
||||||
- cscs-userlab/index.md
|
|
||||||
- cscs-userlab/transfer-data.md
|
|
||||||
- MeG:
|
- MeG:
|
||||||
- meg/index.md
|
- meg/index.md
|
||||||
- meg/contact.md
|
- meg/contact.md
|
||||||
- meg/migration-to-merlin7.md
|
- meg/migration-to-merlin7.md
|
||||||
|
- PSI@CSCS:
|
||||||
|
- cscs-userlab/index.md
|
||||||
|
- cscs-userlab/transfer-data.md
|
||||||
|
|||||||