initial formatting changes complete

2026-01-06 16:40:15 +01:00
parent f58c1f57b8
commit 7db5d0fd05
81 changed files with 805 additions and 1112 deletions

@@ -1,12 +1,4 @@
---
title: Archive & PSI Data Catalog
#tags:
keywords: linux, archive, data catalog, archiving, lts, tape, long term storage, ingestion, datacatalog
last_updated: 31 January 2020
summary: "This document describes how to use the PSI Data Catalog for archiving Merlin7 data."
sidebar: merlin7_sidebar
permalink: /merlin7/archive.html
---
# Archive & PSI Data Catalog
## PSI Data Catalog as a PSI Central Service
@@ -19,14 +11,14 @@ The Data Catalog and Archive is suitable for:
* Derived data produced by processing some inputs
* Data required to reproduce PSI research and publications
The Data Catalog is part of PSI's effort to conform to the FAIR principles for data management.
In accordance with this policy, ***data will be publicly released under CC-BY-SA 4.0 after an
embargo period expires.***
The Merlin cluster is connected to the Data Catalog. Hence, users can archive data stored in the
Merlin storage under the ``/data`` directories (currently, ``/data/user`` and ``/data/project``).
Archiving from other directories is also possible; however, the process is much slower, as the data
cannot be directly retrieved by the PSI archive central servers (**central mode**) and needs to
be indirectly copied to them (**decentral mode**).
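To illustrate the central/decentral distinction, here is a minimal sketch of a hypothetical helper, `archive_mode`, which is not part of any PSI tool. It classifies a source directory by path prefix only. It does not resolve symlinks or `..`, and it does not check whether the Merlin export server is reachable. That is a second reason data may have to be copied.

```bash
# Hypothetical helper (not part of datasetIngestor): decide from the path alone
# whether the archive servers could read the data directly (central mode) or
# whether it must be copied to them first (decentral mode).
archive_mode() {
  local src="${1:?missing path argument}"
  # Turn relative paths into absolute ones; symlinks and '..' are NOT resolved.
  case "$src" in
    /*) ;;
    *) src="$PWD/$src" ;;
  esac
  case "$src" in
    /data/user | /data/user/* | /data/project | /data/project/*) echo central ;;
    *) echo decentral ;;
  esac
}
```

For example, `archive_mode /data/project/bio/myproject/archive` prints `central`, whereas `archive_mode /data/scratch/shared` prints `decentral`.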
Archiving can be done from any node accessible by the users (usually from the login nodes).
@@ -48,33 +40,33 @@ Archiving can be done from any node accessible by the users (usually from the lo
Below are the main steps for using the Data Catalog.
* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
    * Prepare a metadata file describing the dataset
    * Run the **``datasetIngestor``** script
    * If necessary, the script will copy the data to the PSI archive servers
        * Usually this is necessary when archiving from directories other than **``/data/user``** or
          **``/data/project``**. It would also be necessary when the Merlin export server (**``merlin-archive.psi.ch``**)
          is down for any reason.
* Archive the dataset:
    * Visit <https://discovery.psi.ch>
    * Click **``Archive``** for the dataset
    * The system will now copy the data to the PetaByte Archive at CSCS
* Retrieve data from the catalog:
    * Find the dataset on <https://discovery.psi.ch> and click **``Retrieve``**
    * Wait for the data to be copied to the PSI retrieval system
    * Run the **``datasetRetriever``** script
Since large data sets may take a lot of time to transfer, some steps are designed to happen in the
background. The discovery website can be used to track the progress of each step.
### Account Registration
Two types of account permit access to the Data Catalog. If your data was collected at a ***beamline***, you may
have been assigned a **``p-group``** (e.g. ``p12345``) for the experiment. Other users are assigned an **``a-group``**
(e.g. ``a-12345``).
Groups are usually assigned to a PI, and individual user accounts are then added to the group. This must be
requested through PSI Service Now. For existing **a-groups** and **p-groups**, you can follow the standard
central procedures. Alternatively, if you do not know how to do that, follow the Merlin7
**[Requesting extra Unix groups](../01-Quick-Start-Guide/requesting-accounts.md)** procedure, or open
a **[PSI Service Now](https://psi.service-now.com/psisp)** ticket.
@@ -114,11 +106,11 @@ $ SCICAT_TOKEN=RqYMZcqpqMJqluplbNYXLeSyJISLXfnkwlfBKuvTSdnlpKkU
Tokens expire after 2 weeks and will need to be fetched from the website again.
### Ingestion
The first step to ingesting your data into the catalog is to prepare a file describing what data you have. This is called
**``metadata.json``**, and can be created with a text editor (e.g. *``vim``*). It can in principle be saved anywhere,
but keeping it with your archived data is recommended. For more information about the format, see the 'Bio metadata'
section below. An example follows:
```yaml
@@ -176,30 +168,31 @@ It will ask for your PSI credentials and then print some info about the data to
datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json
```
You will be asked whether you want to copy the data to the central system:
* If you are on the Merlin cluster and you are archiving data from ``/data/user`` or ``/data/project``, answer 'no', since the Data Catalog can
directly read the data.
* If you are archiving from a directory other than ``/data/user`` and ``/data/project``, or you are on a desktop computer, answer 'yes'. Copying large datasets
to the PSI archive system may take quite a while (minutes to hours).
If there are no errors, your data has been accepted into the data catalog! From now on, no changes should be made to the ingested data.
This is important, since the next step is for the system to copy all the data to the CSCS Petabyte archive. Writing to tape is slow, so
this process may take several days, and it will fail if any modifications are detected.
If using the ``--autoarchive`` option as suggested above, your dataset should now be in the queue. Check the data catalog:
<https://discovery.psi.ch>. Your job should have status 'WorkInProgress'. You will receive an email when the ingestion
is complete.
If you didn't use ``--autoarchive``, you need to manually move the dataset into the archive queue. From **discovery.psi.ch**, navigate to the 'Archive'
tab. You should see the newly ingested dataset. Check the dataset and click **``Archive``**. You should see the status change from **``datasetCreated``** to
**``scheduleArchiveJob``**. This indicates that the data is in the process of being transferred to CSCS.
After a few days the dataset's status will change to **``datasetOnArchive``**, indicating that the data is stored. At this point it is safe to delete the data.
#### Useful commands
Running the datasetIngestor in dry mode (**without** ``--ingest``) finds most errors. However, it is sometimes convenient to find potential errors
yourself with simple unix commands.
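For example, the recommended limit of 300'000 files per dataset (see the notes further down) can be checked before ingestion with `find` and `awk`. The function `dataset_stats` below is a hypothetical helper, not part of `datasetIngestor`. The summed path length it reports is only a rough indicator, since the exact limit enforced by the ingestor is not given here.

```bash
# Hypothetical helper (not part of datasetIngestor): count the regular files
# below DIR and add up the lengths of their relative paths, as measured by awk.
# Returns 1 and prints a warning if the count exceeds MAX_FILES
# (default: the recommended 300000 files per dataset).
dataset_stats() {
  local dir="${1:?missing directory argument}"
  local max_files="${2:-300000}"
  local stats nfiles pathchars
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 2; }
  # Single pass over the file list: count the lines and sum their lengths.
  stats=$(cd "$dir" && find . -type f | awk '{ n++; len += length($0) } END { printf "%d %d\n", n, len }')
  nfiles=${stats% *}
  pathchars=${stats#* }
  echo "files=$nfiles pathchars=$pathchars"
  if [ "$nfiles" -gt "$max_files" ]; then
    echo "warning: $nfiles files exceed $max_files; consider splitting the dataset" >&2
    return 1
  fi
}
```

Run it on the directory you are about to ingest, e.g. `dataset_stats /data/project/bio/myproject/archive`.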
Find problematic filenames
@@ -239,8 +232,8 @@ find . -name '*#autosave#' -delete
Certificate invalid: name is not a listed principal
```
It indicates that no Kerberos ticket was provided for authentication. You can avoid the warning by first running `kinit` (on PSI Linux systems).
* For decentral ingestion cases, the copy step is indicated by a message ``Running [/usr/bin/rsync -e ssh -avxz ...``. It is expected that this
step will take a long time and may appear to have hung. You can check which files have been successfully transferred using rsync:
```bash
@@ -250,7 +243,7 @@ step will take a long time and may appear to have hung. You can check what files
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.
* There is currently a limit on the number of files per dataset (technically, the limit comes from the total length of all file paths). It is recommended to break up datasets into 300'000 files or less.
* If it is not possible or desirable to split data between multiple datasets, an alternative workaround is to package files into a tarball. For datasets which are already compressed, omit the `-z` option for a considerable speedup:
```
tar -cf [output].tar [srcdir]
@@ -271,7 +264,6 @@ step will take a long time and may appear to have hung. You can check what files
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
2019/11/06 11:04:43 Latest version: 1.1.11
2019/11/06 11:04:43 Your version of this program is up-to-date
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
2019/11/06 11:04:43 Your username:
@@ -321,7 +313,6 @@ user_n@pb-archive.psi.ch's password:
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
The data must first be copied to a rsync cache server.
2019/11/06 11:05:04 Do you want to continue (Y/n)?
Y
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
@@ -359,7 +350,7 @@ user_n@pb-archive.psi.ch's password:
### Publishing
After datasets are ingested, they can be assigned a public DOI. This can be included in publications and will make the datasets available on <http://doi.psi.ch>.
For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8).

@@ -1,12 +1,4 @@
---
title: Connecting from a Linux Client
#tags:
keywords: linux, connecting, client, configuration, SSH, X11
last_updated: 07 September 2022
summary: "This document describes a recommended setup for a Linux client."
sidebar: merlin7_sidebar
permalink: /merlin7/connect-from-linux.html
---
# Connecting from a Linux Client
## SSH without X11 Forwarding

@@ -1,12 +1,4 @@
---
title: Connecting from a MacOS Client
#tags:
keywords: MacOS, mac os, mac, connecting, client, configuration, SSH, X11
last_updated: 07 September 2022
summary: "This document describes a recommended setup for a MacOS client."
sidebar: merlin7_sidebar
permalink: /merlin7/connect-from-macos.html
---
# Connecting from a MacOS Client
## SSH without X11 Forwarding
@@ -37,7 +29,7 @@ we provide a small recipe for enabling X11 Forwarding in MacOS.
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
to implicitly add ``-X`` to all ssh connections:
```bash
ForwardAgent yes
ForwardX11Trusted yes

@@ -4,8 +4,9 @@
PuTTY is one of the most common tools for SSH.
Check whether the following software packages are installed on the Windows workstation by
inspecting the *Start* menu (hint: use the *Search* box to save time):
* PuTTY (should be already installed)
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
@@ -21,7 +22,6 @@ If they are missing, you can install them using the Software Kiosk icon on the D
![Create Merlin Session](../../images/PuTTY/Putty_Session.png)
## SSH with PuTTY with X11 Forwarding
Official X11 Forwarding support is through NoMachine. Please follow the document
@@ -29,9 +29,9 @@ Official X11 Forwarding support is through NoMachine. Please follow the document
[{Accessing Merlin -> NoMachine}](../02-How-To-Use-Merlin/nomachine.md) for more details. However,
we provide a small recipe for enabling X11 Forwarding in Windows.
Check whether **Xming** is installed on the Windows workstation by inspecting the
*Start* menu (hint: use the *Search* box to save time). If it is missing, you can install it by
using the Software Kiosk icon (should be located on the Desktop).
1. Ensure that an X server (**Xming**) is running. Otherwise, start it.

@@ -1,12 +1,4 @@
---
title: Kerberos and AFS authentication
#tags:
keywords: kerberos, AFS, kinit, klist, keytab, tickets, connecting, client, configuration, slurm
last_updated: 07 September 2022
summary: "This document describes how to use Kerberos."
sidebar: merlin7_sidebar
permalink: /merlin7/kerberos.html
---
# Kerberos and AFS authentication
Projects and users have their own areas in the central PSI AFS service. In order
to access these areas, valid Kerberos and AFS tickets must be granted.
@@ -58,7 +50,7 @@ Kerberos ticket is mandatory.
krenew
```
* Keep in mind that the maximum lifetime for granting tickets is 7 days; therefore, `krenew` cannot be used beyond that limit,
and then `kinit` should be used instead.
## Obtaining granting tickets with keytab
@@ -95,8 +87,8 @@ For generating a **keytab**, one has to:
```
Please note:
* You will need to enter your password once. This step is required for generating the **keytab** file.
* `ktutil` does **not** report an error if you enter a wrong password! You can test with the `kinit` command documented below. If `kinit` fails with an error message like "pre-authentication failed", this is usually due to a wrong password/key in the keytab file. In this case **you have to remove the keytab file** and re-run the `ktutil` command. See "Updating an existing keytab file" in the section below.
### Updating an existing keytab file
@@ -177,7 +169,7 @@ This is the **recommended** way. At the end of the job, is strongly recommended
#SBATCH --output=run.out # Generate custom output file
#SBATCH --error=run.err # Generate custom error file
#SBATCH --nodes=1 # Number of nodes to use
#SBATCH --ntasks=1 # Number of tasks to use
#SBATCH --cpus-per-task=1
#SBATCH --constraint=xeon-gold-6152
#SBATCH --hint=nomultithread

@@ -10,10 +10,8 @@ provides a helpful wrapper over the Gnome storage utilities (GIO and GVFS), and
- FTP, SFTP
- [complete list](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
## Usage
### Start a session
First, start a new session. This will start a new bash shell in the current terminal where you can add further commands.
@@ -38,7 +36,7 @@ merlin_rmount --select-mount
Select the desired url using the arrow keys.
![merlin_rmount --select-mount](../../images/rmount/select-mount.png)
From this list any of the standard supported endpoints can be mounted.
### Other endpoints
@@ -47,7 +45,6 @@ Other endpoints can be mounted using the `merlin_rmount --mount <endpoint>` comm
![merlin_rmount --mount](../../images/rmount/mount.png)
### Accessing Files
After mounting a volume the script will print the mountpoint. It should be of the form
@@ -67,7 +64,6 @@ ln -s ~/mnt /run/user/$UID/gvfs
Files are accessible as long as the `merlin_rmount` shell remains open.
### Disconnecting
To disconnect, close the session with one of the following:
@@ -78,7 +74,6 @@ To disconnect, close the session with one of the following:
Disconnecting will unmount all volumes.
## Alternatives
### Thunar

@@ -1,12 +1,4 @@
---
title: Merlin7 Tools
#tags:
keywords: merlin_quotas
#last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/tools.html
---
# Merlin7 Tools
## About
@@ -27,17 +19,17 @@ found on the [Storage page](storage.md#dir_classes).
Simply calling `merlin_quotas` will show you a table of your quotas:
```console
$ merlin_quotas
Path           SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
-------------- --------- ---------- ------- --------- ---------- -------
/data/user        30.26G         1T     03%    367296    2097152     18%
└─ <USERNAME>
/afs/psi.ch         3.4G       9.5G     36%         0          0     00%
└─ user/<USERDIR>
/data/project     2.457T        10T     25%        58    2097152     00%
└─ bio/shared
/data/project     338.3G        10T     03%    199391    2097152     10%
└─ bio/hpce
```
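If you want an early warning before reaching a limit, the table can be filtered with `awk`. The sketch below uses a hypothetical helper, `quota_warnings`, and assumes two things. First, `merlin_quotas` writes the plain-text layout shown above to standard output. Second, each quota row has seven whitespace-separated columns and is followed by a line holding the sub-path. If the output format differs, adapt the field positions.

```bash
# Hypothetical helper: read merlin_quotas output on stdin and print every entry
# whose space or file usage is at or above THRESHOLD percent (default 80).
quota_warnings() {
  awk -v t="${1:-80}" '
    # Quota rows start with a mount path, e.g. "/data/user 30.26G 1T 03% ..."
    /^\// {
      path = $1; space = $4; files = $7
      sub(/%$/, "", space); sub(/%$/, "", files)
      pending = 1
      next
    }
    # The next non-empty line holds the sub-path, e.g. "└─ bio/shared"
    pending && NF > 0 {
      if (space + 0 >= t + 0 || files + 0 >= t + 0)
        printf "%s/%s space=%s%% files=%s%%\n", path, $NF, space, files
      pending = 0
    }
  '
}
```

Typical use: `merlin_quotas | quota_warnings 90`. With a threshold of 30, the example table above yields the single line `/afs/psi.ch/user/<USERDIR> space=36% files=00%`.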
!!! tip
@@ -105,4 +97,3 @@ If you are added/removed from a project, you can update this config file by
calling `merlin_quotas genconf --force` (notice the `--force`, which will overwrite
your existing config file) or by editing the file by hand (*not recommended*).

@@ -1,10 +1,4 @@
---
title: Remote Desktop Access to Merlin7
keywords: NX, NoMachine, remote desktop access, login node, login001, login002, merlin7-nx-01, merlin7-nx, nx.psi.ch, VPN, browser access
last_updated: 07 August 2024
sidebar: merlin7_sidebar
permalink: /merlin7/nomachine.html
---
# Remote Desktop Access to Merlin7
## Overview
@@ -21,7 +15,7 @@ If you are inside the PSI network, you can directly connect to the Merlin7 NoMac
#### Method 1: Using a Web Browser
Open your web browser and navigate to <https://merlin7-nx.psi.ch:4443>.
#### Method 2: Using the NoMachine Client
@@ -42,7 +36,7 @@ Documentation about the `nx.psi.ch` service can be found [here](https://www.psi.
##### Using a Web Browser
Open your web browser and navigate to <https://nx.psi.ch>.
##### Using the NoMachine Client

@@ -1,16 +1,8 @@
---
title: Software repositories
#tags:
keywords: modules, software, stable, unstable, deprecated, spack, repository, repositories
last_updated: 16 January 2024
summary: "This page contains information about the different software repositories"
sidebar: merlin7_sidebar
permalink: /merlin7/software-repositories.html
---
# Software repositories
## Module Systems in Merlin7
Merlin7 provides a modular environment to ensure flexibility, compatibility, and optimized performance.
The system supports three primary module types: PSI Environment Modules (PModules), Spack Modules, and Cray Environment Modules.
### PSI Environment Modules (PModules)
@@ -35,7 +27,7 @@ Merlin7 also provides Spack modules, offering a modern and flexible package mana
### Cray Environment Modules
Merlin7 also supports Cray Environment Modules, which include compilers, MPI implementations, and libraries optimized
for Cray systems. However, Cray modules are not recommended as the default choice due to potential backward compatibility
issues when the Cray Programming Environment (CPE) is upgraded to a newer version.
Recommendations:

@@ -1,13 +1,4 @@
---
title: Configuring SSH Keys in Merlin
#tags:
keywords: linux, connecting, client, configuration, SSH, Keys, SSH-Keys, RSA, authorization, authentication
last_updated: 15 Jul 2020
summary: "This document describes how to deploy SSH Keys in Merlin."
sidebar: merlin7_sidebar
permalink: /merlin7/ssh-keys.html
---
# Configuring SSH Keys in Merlin
Merlin users sometimes need to access the different Merlin services without being repeatedly prompted for a password.
One can achieve that with Kerberos authentication; however, some software requires the setup of SSH keys.
@@ -22,14 +13,14 @@ User can check whether a SSH key already exists. These would be placed in the **
is usually the default one, and files in there would be **`id_rsa`** (private key) and **`id_rsa.pub`** (public key).
```bash
ls ~/.ssh/id*
```
For creating **SSH RSA Keys**, one should:
1. Run `ssh-keygen`, a password will be requested twice. You **must remember** this password for the future.
    * For security reasons, ***always protect the key with a password***. The only exception is ANSYS software, for which the key should generally not be password-protected, to simplify running the software in Slurm.
    * This will generate a private key **id_rsa** and a public key **id_rsa.pub** in your **~/.ssh** directory.
2. Add your public key to the **`authorized_keys`** file, and ensure proper permissions for that file, as follows:
```bash
@@ -92,7 +83,7 @@ to the **ssh-agent**. This must be done once per SSH session, as follows:
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
```
* If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
You will be prompted for the **passphrase** of your key. This can be done by running:
```bash
@@ -111,7 +102,7 @@ However, for NoMachine one always need to add the private key identity to the au
```bash
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
```
2. If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
You will be prompted for the **passphrase** of your key. This can be done by running:
```bash

@@ -1,13 +1,4 @@
---
title: Merlin7 Storage
#tags:
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
#last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---
# Merlin7 Storage
## Introduction
@@ -30,13 +21,13 @@ Some of the Merlin7 directories have quotas applied. A way for checking the quot
This command is useful to show all quotas for the different user storage directories and partitions (including AFS). To check your quotas, please run:
```console
$ merlin_quotas
Path SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
-------------- --------- ---------- ------- --------- ---------- -------
/data/user 30.26G 1T 03% 367296 2097152 18%
└─ <USERNAME>
/afs/psi.ch 3.4G 9.5G 36% 0 0 0%
└─ user/<USERDIR>
/data/scratch 688.9M 2T 00% 368471 0 00%
└─ shared
/data/project 3.373T 11T 31% 425644 2097152 20%
@@ -117,7 +108,7 @@ Directory policies:
* No backup policy is applied for the user home directories: **users are responsible for backing up their data**.
Home directory quotas are defined on a per-Lustre-project basis. The quota can be checked using the `merlin_quotas` command described
[above](storage.md#how-to-check-quotas).
### Project data directory
@@ -151,7 +142,7 @@ Directory policies:
* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.
* It is **forbidden** to use the data directories as `/scratch` area during a job's runtime, i.e. for high throughput I/O for a job's temporary files.
    * Please use `/scratch` or `/data/scratch/shared` for this purpose.
* No backups: users are responsible for managing the backups of their data directories.
#### Dedicated project directories
@@ -190,6 +181,6 @@ Scratch directories policies:
* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
* Temporary files *must be deleted at the end of the job by the user*.
    * Remaining files will be deleted by the system if detected.
* Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the scratch areas get full, admins have the rights to cleanup the oldest data.
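One way to follow the rule that temporary files must be deleted at the end of the job is an `EXIT` trap. The trap runs when the (sub)shell finishes, whether the command succeeded or failed. It does not run if the process is killed with `SIGKILL`, so leftovers are still possible. The function below is a hypothetical sketch. `SCRATCH_BASE` is an illustrative variable that defaults to the node-local `/scratch` area, and the per-user directory naming is an assumption, not a Merlin7 convention.

```bash
# Hypothetical helper: run a command inside a private temporary directory under
# $SCRATCH_BASE (default: /scratch) and remove that directory afterwards.
with_scratch_dir() {
  local base="${SCRATCH_BASE:-/scratch}"
  (
    workdir=$(mktemp -d "$base/${USER:-user}.XXXXXX") || exit 1
    # The EXIT trap fires when this subshell ends, after the command has
    # finished, whether it succeeded or failed.
    trap 'rm -rf -- "$workdir"' EXIT
    cd "$workdir" || exit 1
    "$@"
  )
}
```

In a batch script this could be used as `with_scratch_dir /data/user/$USER/bin/run_step.sh`, where `run_step.sh` is a hypothetical script. It should copy anything worth keeping back to `/data/user` or `/data/project` before it exits, because the directory is deleted afterwards.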

@@ -1,26 +1,19 @@
---
title: Transferring Data
#tags:
keywords: transferring data, data transfer, rsync, winscp, copy data, copying, sftp, import, export, hop, vpn
last_updated: 24 August 2023
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/transfer-data.html
---
# Transferring Data
## Overview
Most data transfer methods support both sending and receiving, so you may initiate the transfer from either **Merlin** or the other system — depending on **network visibility**.
* **From PSI Network to Merlin:** Merlin login nodes are visible from the PSI network, so direct transfers using `rsync` or **FTP** are generally preferable. Transfers **from Merlin7 to PSI may require special firewall rules**.
* **From Merlin to the Internet:** Merlin login nodes can access the internet with a **limited set of protocols**:
    * HTTP-based protocols on ports `80` or `443` (e.g., HTTPS, WebDAV).
    * Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.
* **From the Internet to PSI:** Systems outside PSI can access the [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) at `datatransfer.psi.ch` using SSH-based protocols or [Globus](https://www.globus.org/).
!!! note
    SSH-based protocols using port `22` **to most PSI servers** are generally **not permitted**.
    However, **transfers from any PSI host to Merlin7 using port 22 are allowed**.
    Port `21` is also available for FTP transfers from PSI to Merlin7.
### Choosing the best transfer method
@@ -46,6 +39,7 @@ The following methods transfer data directly via the [login nodes](../01-Quick-S
### Rsync (Recommended for Linux/macOS)
Rsync is the **preferred** method for small datasets from Linux/macOS systems. It supports **resuming interrupted transfers** and **skips already transferred files**. Syntax:
```bash
rsync -avAHXS <src> <dst>
```
@@ -65,12 +59,15 @@ rsync -avAHXS ~/localdata $USER@login001.merlin7.psi.ch:/data/project/general/my
### SCP
SCP works similarly to `rsync` but **does not support resuming** interrupted transfers. It may be used for quick one-off transfers. Example:
```bash
scp ~/localfile.txt $USER@login001.merlin7.psi.ch:/data/project/general/myproject/
```
### Secure FTP
A `vsftpd` service is available on the login nodes, providing high-speed transfers. Choose the server based on your **speed vs. encryption** needs:
* **`login001.merlin7.psi.ch`:** Encrypted control & data channels.
**Use if your data is sensitive**. **Slower**, but secure.
* **`service03.merlin7.psi.ch`**: Encrypted control channel only.
@@ -80,14 +77,16 @@ A `vsftpd` service is available on the login nodes, providing high-speed transfe
The **control channel** is always **encrypted**; therefore, authentication is always secure.
## UI-based Clients for Data Transfer
### WinSCP (Windows)
Available in the **Software Kiosk** on PSI Windows machines.
* Connect with your PSI credentials:
    * Port 22 (SSH): `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
    * Port 21 (FTP):
        * `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
        * `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
* Drag and drop files between your PC and Merlin.
* FTP (port 21)
### FileZilla (Linux/MacOS/Windows)
Download from [FileZilla Project](https://filezilla-project.org/), or install from your Linux software repositories if available.
* Connect with your PSI credentials:
    * Port 22 (SSH): `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
    * Port 21 (FTP):
        * `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
        * `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
* Supports drag-and-drop file transfers.
## Sharing Files with SWITCHfilesender
**[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is a Swiss-hosted installation of the [FileSender](https://filesender.org/) project — a web-based application that allows authenticated users to securely and easily send **arbitrarily large files** to other users. Features:
* **Secure large file transfers:** Send files that exceed normal email attachment limits.
* **Time-limited availability:** Files are automatically deleted after the chosen expiration date or number of downloads.
* **Voucher system:** Authenticated users can send upload vouchers to external recipients without an account.
* **Designed for research & education:** Developed to meet the needs of universities and research institutions.
Authentication:
* It uses **SimpleSAMLphp**, supporting multiple authentication mechanisms: SAML2, LDAP, RADIUS and more.
* It's fully integrated with PSI's **Authentication and Authorization Infrastructure (AAI)**.
* PSI employees can log in using their PSI account:
1. Open [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload).
2. Select **PSI** as the institution.
3. Authenticate with your PSI credentials.
The service is designed for **temporarily sharing large files**, not as a permanent publishing platform. Typical use case:
1. Upload a file.
2. Share the download link with a recipient.
3. The file remains available until the specified **expiration date** or **download limit** is reached.
## PSI Data Transfer
Since August 2024, Merlin has been connected to the **[PSI Data Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer)** service,
`datatransfer.psi.ch`. This is a central service managed by the **[Linux team](https://linux.psi.ch/index.html)**. However, problems or questions related to it can be
[reported](../99-support/contact.md) directly to the Merlin administrators, who will forward the request if necessary.
The PSI Data Transfer servers support the following protocols:
* Data Transfer - SSH (scp / rsync)
* Data Transfer - Globus
## Connecting to Merlin7 from outside PSI
Merlin7 is fully accessible from within the PSI network. To connect from outside PSI, you can use:
* [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
* [SSH hopx](https://www.psi.ch/en/computing/ssh-hop)
    * Please avoid transferring large amounts of data through **hop**.
* [No Machine](nomachine.md)
    * Remote Interactive Access through [**'nx.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access)
    * Please avoid transferring large amounts of data through **NoMachine**.
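From outside PSI, an SSH jump host lets you log in to Merlin7 with a single command using OpenSSH's `-J` (ProxyJump) option. This sketch assumes that the hop host is `hopx.psi.ch`, inferred from the link name above; confirm the actual host name on the SSH hop page before use. `ssh -G` prints the resolved client configuration and exits without connecting, so it is a safe way to check the setup.

```bash
#!/usr/bin/env bash
# Sketch: reach a Merlin7 login node through the PSI SSH hop from outside PSI.
# Assumption: the hop host is hopx.psi.ch (verify on the SSH hop page).
HOP=hopx.psi.ch
TARGET=login001.merlin7.psi.ch
ME=${USER:-$(id -un)}

# Interactive login via the jump host (-J = ProxyJump):
#   ssh -J "$ME@$HOP" "$ME@$TARGET"

# Dry run: `ssh -G` prints the resolved client configuration and exits
# without connecting, which confirms that the jump host is picked up.
ssh -G -J "$ME@$HOP" "$ME@$TARGET" | grep -E '^(hostname|proxyjump) '
```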
{% comment %}
## Connecting from Merlin7 to outside file shares
### `merlin_rmount` command
Merlin provides a command called `merlin_rmount` for mounting remote file systems. It wraps the GNOME storage utilities (GVfs) and supports a wide range of remote file-sharing protocols, including:
* SMB/CIFS (Windows shared folders)
* WebDAV
* AFP
* FTP, SFTP
* [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
[More instructions on using `merlin_rmount`](merlin-rmount.md)
{% endcomment %}