first stab at mkdocs migration

refactor CSCS and Meg content

add merlin6 quick start

update merlin6 nomachine docs

give the userdoc its own color scheme

we use the Materials default one

refactored slurm general docs merlin6

add merlin6 JB docs

add software support m6 docs

add all files to nav

vibed changes #1

add missing pages

further vibing #2

vibe #3

further fixes
This commit is contained in:
2025-11-26 17:28:07 +01:00
parent 149de6fb18
commit bde174b726
313 changed files with 2608 additions and 11593 deletions

View File

@@ -0,0 +1,378 @@
---
title: Archive & PSI Data Catalog
#tags:
keywords: linux, archive, data catalog, archiving, lts, tape, long term storage, ingestion, datacatalog
last_updated: 31 January 2020
summary: "This document describes how to use the PSI Data Catalog for archiving Merlin7 data."
sidebar: merlin7_sidebar
permalink: /merlin7/archive.html
---
## PSI Data Catalog as a PSI Central Service
PSI provides access to the ***Data Catalog*** for **long-term data storage and retrieval**. Data is
stored on the ***PetaByte Archive*** at the **Swiss National Supercomputing Centre (CSCS)**.
The Data Catalog and Archive are suitable for:
* Raw data generated by PSI instruments
* Derived data produced by processing some inputs
* Data required to reproduce PSI research and publications
The Data Catalog is part of PSI's effort to conform to the FAIR principles for data management.
In accordance with this policy, ***data will be publicly released under CC-BY-SA 4.0 after an
embargo period expires.***
The Merlin cluster is connected to the Data Catalog. Hence, users can archive data stored in the
Merlin storage under the ``/data`` directories (currently, ``/data/user`` and ``/data/project``).
Archiving from other directories is also possible; however, the process is much slower, as the data
cannot be retrieved directly by the central PSI archive servers (**central mode**) and instead needs to
be copied to them first (**decentral mode**).
Archiving can be done from any node accessible by the users (usually from the login nodes).
!!! tip
    Archiving can be done in two different ways:

    * **Central mode**: possible for the user and project data directories. This is
      the fastest way, as it does not require a remote copy (data is directly retrieved
      by the central AIT servers from Merlin through `merlin-archive.psi.ch`).
    * **Decentral mode**: possible for any directory. This is the slowest way of
      archiving, as it requires copying (`rsync`) the data from Merlin to the
      central AIT servers.
## Procedure
### Overview
Below are the main steps for using the Data Catalog.
* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
* Prepare a metadata file describing the dataset
* Run **``datasetIngestor``** script
* If necessary, the script will copy the data to the PSI archive servers
* Usually this is necessary when archiving from directories other than **``/data/user``** or
**``/data/project``**. It would be also necessary when the Merlin export server (**``merlin-archive.psi.ch``**)
is down for any reason.
* Archive the dataset:
* Visit [https://discovery.psi.ch](https://discovery.psi.ch)
* Click **``Archive``** for the dataset
* The system will now copy the data to the PetaByte Archive at CSCS
* Retrieve data from the catalog:
* Find the dataset on [https://discovery.psi.ch](https://discovery.psi.ch) and click **``Retrieve``**
* Wait for the data to be copied to the PSI retrieval system
* Run **``datasetRetriever``** script
Since large data sets may take a lot of time to transfer, some steps are designed to happen in the
background. The discovery website can be used to track the progress of each step.
### Account Registration
Two types of account groups permit access to the Data Catalog. If your data was collected at a ***beamline***, you may
have been assigned a **``p-group``** (e.g. ``p12345``) for the experiment. Other users are assigned an **``a-group``**
(e.g. ``a-12345``).
Groups are usually assigned to a PI, and individual user accounts are then added to the group. This must be
requested by the user through PSI Service Now. For existing **a-groups** and **p-groups**, you can follow the standard
central procedures. Alternatively, if you do not know how to do that, follow the Merlin7
**[Requesting extra Unix groups](../01-Quick-Start-Guide/requesting-accounts.md)** procedure, or open
a **[PSI Service Now](https://psi.service-now.com/psisp)** ticket.
### Documentation
Accessing the Data Catalog is done through the [SciCat software](https://melanie.gitpages.psi.ch/SciCatPages/).
Documentation is here: [ingestManual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html).
#### Loading datacatalog tools
The latest datacatalog software is maintained in the PSI module system. To access it from the Merlin systems, run the following command:
```bash
module load datacatalog
```
This can be done from any host in the Merlin cluster that is accessible to users; usually, the login nodes are used for archiving.
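As a quick sanity check (the tool names below are the ones referenced on this page), you can verify that the commands are available after loading the module:
```bash
# Should print the full paths of the tools provided by the datacatalog module
which datasetIngestor datasetRetriever SciCat
```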
### Finding your token
As of 2022-04-14 a secure token is required to interact with the data catalog. This is a long random string that replaces the previous user/password authentication (allowing access for non-PSI use cases). **This string should be treated like a password and not shared.**
1. Go to discovery.psi.ch
1. Click 'Sign in' in the top right corner. Click 'Login with PSI account' and log in on the PSI login page.
1. You should be redirected to your user settings and see a 'User Information' section. If not, click on your username in the top right and choose 'Settings' from the menu.
1. Look for the field 'Catamel Token'. This should be a 64-character string. Click the icon to copy the token.
![SciCat website](../../images/scicat_token.png)
You will need to save this token for later steps. To avoid including it in all the commands, we suggest saving it to an environment variable (Linux):
```
$ SCICAT_TOKEN=RqYMZcqpqMJqluplbNYXLeSyJISLXfnkwlfBKuvTSdnlpKkU
```
(Hint: prefix this line with a space to avoid saving the token to your bash history.)
Tokens expire after 2 weeks and will need to be fetched from the website again.
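If you prefer not to paste the token into every new shell, one possible approach (a sketch; the file name `~/.scicat_token` is just an example) is to keep it in a private file and read it from there:
```bash
# Create an empty file readable only by you, then paste the token into it with a text editor
install -m 600 /dev/null ~/.scicat_token
# In later sessions, load the token without typing it (until it expires)
SCICAT_TOKEN=$(cat ~/.scicat_token)
```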
### Ingestion
The first step to ingesting your data into the catalog is to prepare a file describing what data you have. This is called
**``metadata.json``**, and can be created with a text editor (e.g. *``vim``*). It can in principle be saved anywhere,
but keeping it with your archived data is recommended. For more information about the format, see the 'Bio metadata'
section below. An example follows:
```json
{
  "principalInvestigator": "albrecht.gessler@psi.ch",
  "creationLocation": "/PSI/EMF/JEOL2200FS",
  "dataFormat": "TIFF+LZW Image Stack",
  "sourceFolder": "/gpfs/group/LBR/pXXX/myimages",
  "owner": "Wilhelm Tell",
  "ownerEmail": "wilhelm.tell@psi.ch",
  "type": "raw",
  "description": "EM micrographs of amygdalin",
  "ownerGroup": "a-12345",
  "scientificMetadata": {
    "description": "EM micrographs of amygdalin",
    "sample": {
      "name": "Amygdalin beta-glucosidase 1",
      "uniprot": "P29259",
      "species": "Apple"
    },
    "dataCollection": {
      "date": "2018-08-01"
    },
    "microscopeParameters": {
      "pixel size": {
        "v": 0.885,
        "u": "A"
      },
      "voltage": {
        "v": 200,
        "u": "kV"
      },
      "dosePerFrame": {
        "v": 1.277,
        "u": "e/A2"
      }
    }
  }
}
```
It is recommended to use the [ScicatEditor](https://bliven_s.gitpages.psi.ch/SciCatEditor/) for creating metadata files. This is a browser-based tool specifically for ingesting PSI data. Using the tool avoids syntax errors and provides templates for common data sets and options. The finished JSON file can then be downloaded to merlin or copied into a text editor.
Another option is to use the SciCat graphical interface from NoMachine. This provides a graphical interface for selecting data to archive, and is particularly useful for data associated with a DUO experiment and p-group. Type `SciCat` to get started after loading the `datacatalog` module. The GUI also replaces the command-line ingestion described below.
The following steps can be run from wherever you saved your ``metadata.json``. First, perform a "dry-run" which will check the metadata for errors:
```bash
datasetIngestor --token $SCICAT_TOKEN metadata.json
```
It will ask for your PSI credentials and then print some info about the data to be ingested. If there are no errors, proceed to the real ingestion:
```bash
datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json
```
You will be asked whether you want to copy the data to the central system:
* If you are on the Merlin cluster and you are archiving data from ``/data/user`` or ``/data/project``, answer 'no' since the data catalog can
directly read the data.
* If the data is in a directory other than ``/data/user`` and ``/data/project``, or you are on a desktop computer, answer 'yes'. Copying large datasets
to the PSI archive system may take quite a while (minutes to hours).
If there are no errors, your data has been accepted into the data catalog! From now on, no changes should be made to the ingested data.
This is important, since the next step is for the system to copy all the data to the CSCS Petabyte archive. Writing to tape is slow, so
this process may take several days, and it will fail if any modifications are detected.
If using the ``--autoarchive`` option as suggested above, your dataset should now be in the queue. Check the data catalog:
[https://discovery.psi.ch](https://discovery.psi.ch). Your job should have status 'WorkInProgress'. You will receive an email when the ingestion
is complete.
If you didn't use ``--autoarchive``, you need to manually move the dataset into the archive queue. From **discovery.psi.ch**, navigate to the 'Archive'
tab. You should see the newly ingested dataset. Check the dataset and click **``Archive``**. You should see the status change from **``datasetCreated``** to
**``scheduleArchiveJob``**. This indicates that the data is in the process of being transferred to CSCS.
After a few days the dataset's status will change to **``datasetOnArchive``**, indicating the data is stored. At this point it is safe to delete the original data.
#### Useful commands
Running the datasetIngestor in dry mode (**without** ``--ingest``) finds most errors. However, it is sometimes convenient to find potential errors
yourself with simple unix commands.
Find problematic filenames
```bash
find . -iregex '.*/[^/]*[^a-zA-Z0-9_ ./-][^/]*'
```
Find broken links
```bash
find -L . -type l
```
Find outside links
```bash
find . -type l -exec bash -c 'realpath --relative-base "`pwd`" "$0" 2>/dev/null |egrep "^[./]" |sed "s|^|$0 ->|" ' '{}' ';'
```
Delete certain files (use with caution)
```bash
# Empty directories
find . -type d -empty -delete
# Backup files
find . -name '*~' -delete
find . -name '*#autosave#' -delete
```
#### Troubleshooting & Known Bugs
* The following message can be safely ignored:
```bash
key_cert_check_authority: invalid certificate
Certificate invalid: name is not a listed principal
```
It indicates that no Kerberos token was provided for authentication. You can avoid the warning by first running `kinit` (PSI Linux systems).
* For decentral ingestion cases, the copy step is indicated by a message ``Running [/usr/bin/rsync -e ssh -avxz ...``. It is expected that this
step will take a long time and may appear to have hung. You can check which files have been successfully transferred using rsync:
```bash
rsync --list-only user_n@pb-archive.psi.ch:archive/UID/PATH/
```
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.
* There is currently a limit on the number of files per dataset (technically, the limit comes from the total length of all file paths). It is recommended to break up datasets into 300'000 files or less (see the quick check after this list).
* If it is not possible or desirable to split data between multiple datasets, an alternate work-around is to package files into a tarball. For datasets which are already compressed, omit the -z option for a considerable speedup:
```bash
tar -cf [output].tar [srcdir]
```
Uncompressed data can be compressed on the cluster using the following command:
```
sbatch /data/software/Slurm/Utilities/Parallel_TarGz.batch -s [srcdir] -t [output].tar -n
```
Run `/data/software/Slurm/Utilities/Parallel_TarGz.batch -h` for more details and options.
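Related to the file-count limit mentioned above, a quick way to check how many files a dataset contains before ingestion (a generic sketch using standard tools; replace the path with your dataset's ``sourceFolder``):
```bash
# Compare the result against the recommended maximum of ~300'000 files per dataset
find /data/project/bio/myproject/archive -type f | wc -l
```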
#### Sample ingestion output (datasetIngestor 1.1.11)
<details>
<summary>[Show Example]: Sample ingestion output (datasetIngestor 1.1.11)</summary>
<pre>
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
2019/11/06 11:04:43 Latest version: 1.1.11
2019/11/06 11:04:43 Your version of this program is up-to-date
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
2019/11/06 11:04:43 Your username:
user_n
2019/11/06 11:04:48 Your password:
2019/11/06 11:04:52 User authenticated: XXX
2019/11/06 11:04:52 User is member in following a or p groups: XXX
2019/11/06 11:04:52 OwnerGroup information a-XXX verified successfully.
2019/11/06 11:04:52 contactEmail field added: XXX
2019/11/06 11:04:52 Scanning files in dataset /data/project/bio/myproject/archive
2019/11/06 11:04:52 No explicit filelistingPath defined - full folder /data/project/bio/myproject/archive is used.
2019/11/06 11:04:52 Source Folder: /data/project/bio/myproject/archive at /data/project/bio/myproject/archive
2019/11/06 11:04:57 The dataset contains 100000 files with a total size of 50000000000 bytes.
2019/11/06 11:04:57 creationTime field added: 2019-07-29 18:47:08 +0200 CEST
2019/11/06 11:04:57 endTime field added: 2019-11-06 10:52:17.256033 +0100 CET
2019/11/06 11:04:57 license field added: CC BY-SA 4.0
2019/11/06 11:04:57 isPublished field added: false
2019/11/06 11:04:57 classification field added: IN=medium,AV=low,CO=low
2019/11/06 11:04:57 Updated metadata object:
{
"accessGroups": [
"XXX"
],
"classification": "IN=medium,AV=low,CO=low",
"contactEmail": "XXX",
"creationLocation": "XXX",
"creationTime": "2019-07-29T18:47:08+02:00",
"dataFormat": "XXX",
"description": "XXX",
"endTime": "2019-11-06T10:52:17.256033+01:00",
"isPublished": false,
"license": "CC BY-SA 4.0",
"owner": "XXX",
"ownerEmail": "XXX",
"ownerGroup": "a-XXX",
"principalInvestigator": "XXX",
"scientificMetadata": {
...
},
"sourceFolder": "/data/project/bio/myproject/archive",
"type": "raw"
}
2019/11/06 11:04:57 Running [/usr/bin/ssh -l user_n pb-archive.psi.ch test -d /data/project/bio/myproject/archive].
key_cert_check_authority: invalid certificate
Certificate invalid: name is not a listed principal
user_n@pb-archive.psi.ch's password:
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
The data must first be copied to a rsync cache server.
2019/11/06 11:05:04 Do you want to continue (Y/n)?
Y
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
2019/11/06 11:05:09 The dataset contains 108057 files.
2019/11/06 11:05:10 Created file block 0 from file 0 to 1000 with total size of 413229990 bytes
2019/11/06 11:05:10 Created file block 1 from file 1000 to 2000 with total size of 416024000 bytes
2019/11/06 11:05:10 Created file block 2 from file 2000 to 3000 with total size of 416024000 bytes
2019/11/06 11:05:10 Created file block 3 from file 3000 to 4000 with total size of 416024000 bytes
...
2019/11/06 11:05:26 Created file block 105 from file 105000 to 106000 with total size of 416024000 bytes
2019/11/06 11:05:27 Created file block 106 from file 106000 to 107000 with total size of 416024000 bytes
2019/11/06 11:05:27 Created file block 107 from file 107000 to 108000 with total size of 850195143 bytes
2019/11/06 11:05:27 Created file block 108 from file 108000 to 108057 with total size of 151904903 bytes
2019/11/06 11:05:27 short dataset id: 0a9fe316-c9e7-4cc5-8856-e1346dd31e31
2019/11/06 11:05:27 Running [/usr/bin/rsync -e ssh -avxz /data/project/bio/myproject/archive/ user_n@pb-archive.psi.ch:archive
/0a9fe316-c9e7-4cc5-8856-e1346dd31e31/data/project/bio/myproject/archive].
key_cert_check_authority: invalid certificate
Certificate invalid: name is not a listed principal
user_n@pb-archive.psi.ch's password:
Permission denied, please try again.
user_n@pb-archive.psi.ch's password:
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
...
2019/11/06 12:05:08 Successfully updated {"pid":"12.345.67890/12345678-1234-1234-1234-123456789012",...}
2019/11/06 12:05:08 Submitting Archive Job for the ingested datasets.
2019/11/06 12:05:08 Job response Status: okay
2019/11/06 12:05:08 A confirmation email will be sent to XXX
12.345.67890/12345678-1234-1234-1234-123456789012
</pre>
</details>
### Publishing
After datasets are ingested they can be assigned a public DOI. This can be included in publications and will make the dataset available at http://doi.psi.ch.
For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8).
### Retrieving data
Retrieving data from the archive is also initiated through the Data Catalog. Please read the ['Retrieve' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-6).
## Further Information
* [PSI Data Catalog](https://discovery.psi.ch)
* [Full Documentation](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html)
* [Published Datasets (doi.psi.ch)](https://doi.psi.ch)
* Data Catalog [PSI page](https://www.psi.ch/photon-science-data-services/data-catalog-and-archive)
* Data catalog [SciCat Software](https://scicatproject.github.io/)
* [FAIR](https://www.nature.com/articles/sdata201618) definition and [SNF Research Policy](http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx#FAIR%20Data%20Principles%20for%20Research%20Data%20Management)
* [Petabyte Archive at CSCS](https://www.cscs.ch/fileadmin/user_upload/contents_publications/annual_reports/AR2017_Online.pdf)

View File

@@ -0,0 +1,48 @@
---
title: Connecting from a Linux Client
#tags:
keywords: linux, connecting, client, configuration, SSH, X11
last_updated: 07 September 2022
summary: "This document describes a recommended setup for a Linux client."
sidebar: merlin7_sidebar
permalink: /merlin7/connect-from-linux.html
---
## SSH without X11 Forwarding
This is the standard method. Official X11 support is provided through [NoMachine](nomachine.md).
For normal SSH sessions, use your SSH client as follows:
```bash
ssh $username@login001.merlin7.psi.ch
ssh $username@login002.merlin7.psi.ch
```
## SSH with X11 Forwarding
Official X11 Forwarding support is through NoMachine. Please follow the document
[{Job Submission -> Interactive Jobs}](../03-Slurm-General-Documentation/interactive-jobs.md#requirements) and
[{Accessing Merlin -> NoMachine}](nomachine.md) for more details. However,
we provide a small recipe for enabling X11 Forwarding in Linux.
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
to implicitly enable X11 forwarding for all ssh connections (equivalent to passing ``-X``/``-Y``):
```bash
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
```
* Alternatively, you can add the option ``-X`` (or ``-Y`` for trusted forwarding) to the ``ssh`` command. For example:
```bash
ssh -X $username@login001.merlin7.psi.ch
ssh -X $username@login002.merlin7.psi.ch
```
* To test that X11 forwarding works, just run ``sview``. An X11-based Slurm view of the cluster should
pop up in your client session:
```bash
sview
```
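As a convenience, you can also define a host alias in ``~/.ssh/config`` so that these options apply only to Merlin connections (a sketch; the alias and the username are placeholders):
```bash
Host merlin7-login
    HostName login001.merlin7.psi.ch
    User your_psi_username
    ForwardX11 yes
    ForwardX11Trusted yes
```
With such an entry, ``ssh merlin7-login`` opens a session with X11 forwarding enabled.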

View File

@@ -0,0 +1,58 @@
---
title: Connecting from a MacOS Client
#tags:
keywords: MacOS, mac os, mac, connecting, client, configuration, SSH, X11
last_updated: 07 September 2022
summary: "This document describes a recommended setup for a MacOS client."
sidebar: merlin7_sidebar
permalink: /merlin7/connect-from-macos.html
---
## SSH without X11 Forwarding
This is the standard method. Official X11 support is provided through [NoMachine](nomachine.md).
For normal SSH sessions, use your SSH client as follows:
```bash
ssh $username@login001.merlin7.psi.ch
ssh $username@login002.merlin7.psi.ch
```
## SSH with X11 Forwarding
### Requirements
For running SSH with X11 Forwarding on MacOS, an X server must be running on the client.
The official X server for MacOS is **[XQuartz](https://www.xquartz.org/)**. Please ensure
it is running before starting an SSH connection with X11 forwarding.
### SSH with X11 Forwarding in MacOS
Official X11 support is through NoMachine. Please follow the document
[{Job Submission -> Interactive Jobs}](../03-Slurm-General-Documentation/interactive-jobs.md#requirements) and
[{Accessing Merlin -> NoMachine}](nomachine.md) for more details. However,
we provide a small recipe for enabling X11 Forwarding in MacOS.
* Ensure that **[XQuartz](https://www.xquartz.org/)** is installed and running on your Mac.
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
to implicitly enable X11 forwarding for all ssh connections (equivalent to passing ``-X``/``-Y``):
```bash
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
```
* Alternatively, you can add the option ``-X`` (or ``-Y`` for trusted forwarding) to the ``ssh`` command. For example:
```bash
ssh -X $username@login001.merlin7.psi.ch
ssh -X $username@login002.merlin7.psi.ch
```
* To test that X11 forwarding works, just run ``sview``. An X11-based Slurm view of the cluster should
pop up in your client session.
```bash
sview
```
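If ``sview`` is not available in your session, a generic way to confirm that forwarding is active is to check the ``DISPLAY`` variable set by the SSH server (a simple check; the exact value varies):
```bash
# A non-empty value such as 'localhost:10.0' indicates X11 forwarding is active
echo $DISPLAY
```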

View File

@@ -0,0 +1,40 @@
# Connecting from a Windows Client
## SSH with PuTTY without X11 Forwarding
PuTTY is one of the most common tools for SSH.
Check whether the following software packages are installed on the Windows workstation by
inspecting the *Start* menu (hint: use the *Search* box to save time):
* PuTTY (should be already installed)
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](#ssh-with-putty-with-x11-forwarding))
If they are missing, you can install them using the Software Kiosk icon on the Desktop.
1. Start PuTTY
2. *[Optional]* Enable ``xterm`` to get mouse behaviour similar to Linux:
![Enable 'xterm'](../../images/PuTTY/Putty_Mouse_XTerm.png)
3. Create session to a Merlin login node and *Open*:
![Create Merlin Session](../../images/PuTTY/Putty_Session.png)
## SSH with PuTTY with X11 Forwarding
Official X11 Forwarding support is through NoMachine. Please follow the document
[{Job Submission -> Interactive Jobs}](../03-Slurm-General-Documentation/interactive-jobs.md#requirements) and
[{Accessing Merlin -> NoMachine}](../02-How-To-Use-Merlin/nomachine.md) for more details. However,
we provide a small recipe for enabling X11 Forwarding in Windows.
Check whether **Xming** is installed on the Windows workstation by inspecting the
*Start* menu (hint: use the *Search* box to save time). If it is missing, you can install it by
using the Software Kiosk icon (should be located on the Desktop).
1. Ensure that an X server (**Xming**) is running; start it if necessary.
2. Enable X11 Forwarding in your SSH client. For example, in PuTTY:
![Enable X11 Forwarding in Putty](../../images/PuTTY/Putty_X11_Forwarding.png)

View File

@@ -0,0 +1,230 @@
---
title: Kerberos and AFS authentication
#tags:
keywords: kerberos, AFS, kinit, klist, keytab, tickets, connecting, client, configuration, slurm
last_updated: 07 September 2022
summary: "This document describes how to use Kerberos."
sidebar: merlin7_sidebar
permalink: /merlin7/kerberos.html
---
Projects and users have their own areas in the central PSI AFS service. In order
to access these areas, valid Kerberos and AFS tickets must be granted.
These tickets are automatically granted when accessing through SSH with
username and password. Alternatively, one can get a granting ticket with the `kinit` (Kerberos)
and `aklog` (AFS ticket, which needs to be run after `kinit`) commands.
Due to PSI security policies, the maximum lifetime of a ticket is 7 days, and the default
lifetime is 10 hours. This means that one needs to regularly renew (`krenew` command) the existing
granting tickets, and their validity cannot be extended beyond 7 days. At that point,
one needs to obtain new granting tickets.
## Obtaining granting tickets with username and password
As already described above, the most common way to obtain Kerberos and AFS granting tickets
is by entering username and password:
* When logging in to Merlin through SSH with username + password authentication,
tickets for Kerberos and AFS will be obtained automatically.
* When logging in to Merlin through NoMachine, no Kerberos or AFS tickets are granted. Therefore, users need to
run `kinit` (to obtain a granting Kerberos ticket) followed by `aklog` (to obtain a granting AFS ticket).
See further details below.
To manually obtain granting tickets, one has to:
1. To obtain a granting Kerberos ticket, one needs to run `kinit $USER` and enter the PSI password.
```bash
kinit $USER@D.PSI.CH
```
2. To obtain a granting ticket for AFS, one needs to run `aklog`. No password is necessary, but a valid
Kerberos ticket is mandatory.
```bash
aklog
```
3. To list the status of your granted tickets, users can use the `klist` command.
```bash
klist
```
4. To extend the validity of existing granting tickets, users can use the `krenew` command.
```bash
krenew
```
* Keep in mind that the maximum lifetime for granting tickets is 7 days, therefore `krenew` can not be used beyond that limit,
and then `kinit` should be used instead.
## Obtaining granting tickets with a keytab
Sometimes, obtaining granting tickets using password authentication is not possible. An example is user Slurm jobs
requiring access to private areas in AFS. For such cases, it is possible to generate a **keytab** file.
Be aware that the **keytab** file must be **private**, **fully protected** by correct permissions and not shared with any
other users.
### Creating a keytab file
For generating a **keytab**, one has to:
1. Create a private directory for storing the Kerberos **keytab** file
```bash
mkdir -p ~/.k5
```
2. Run the `ktutil` utility:
```bash
ktutil
```
3. In the `ktutil` console, one has to generate a **keytab** file as follows:
```bash
# Replace $USER by your username
add_entry -password -k 0 -f -p $USER
wkt /data/user/$USER/.k5/krb5.keytab
exit
```
Please note:
* You will need to add your password once. This step is required for generating the **keytab** file.
* `ktutil` does **not** report an error if you enter a wrong password! You can test with the `kinit` command documented below. If `kinit` fails with an error message like "pre-authentication failed", this is usually due to a wrong password/key in the keytab file. In this case **you have to remove the keytab file** and re-run the `ktutil` commands. See "Updating an existing keytab file" below.
### Updating an existing keytab file
After a password change you have to update your **keytab**:
1. Remove the old **keytab** file
```bash
rm -f ~/.k5/krb5.keytab
```
2. Run the `ktutil` utility:
```bash
ktutil
```
3. In the `ktutil` console, one has to generate a **keytab** file as follows:
```bash
# Replace $USER by your username
add_entry -password -k 0 -f -p $USER
wkt /data/user/$USER/.k5/krb5.keytab
exit
```
### Obtaining tickets by using keytab files
Once the keytab is created, one can obtain kerberos tickets without being prompted for a password as follows:
```bash
kinit -kt ~/.k5/krb5.keytab $USER
aklog
```
## Slurm jobs accessing AFS
Some jobs may require access to private areas in AFS. For that, having a valid [**keytab**](kerberos.md#creating-a-keytab-file) file is required.
Then, from inside the batch script one can obtain granting tickets for Kerberos and AFS, which can be used for accessing AFS private areas.
The steps should be the following:
* Set up `KRB5CCNAME`, which specifies the location of the Kerberos5 credentials (ticket) cache. In general it should point to a shared area
(`$HOME/.k5` is a good location), and it is strongly recommended to use an independent Kerberos5 credential cache (that is, to create a new credential cache per Slurm job):
```bash
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
```
* To obtain a Kerberos5 granting ticket, run `kinit` by using your keytab:
```bash
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
```
* To obtain a granting AFS ticket, run `aklog`:
```bash
aklog
```
* At the end of the job, you can destroy the existing Kerberos tickets.
```bash
kdestroy
```
### Slurm batch script example: obtaining KRB+AFS granting tickets
#### Example 1: Independent credential cache per Slurm job
This is the **recommended** way. At the end of the job, it is strongly recommended to destroy the existing Kerberos tickets.
```bash
#!/bin/bash
#SBATCH --partition=hourly # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=01:00:00 # Strictly recommended when using 'general' partition.
#SBATCH --output=run.out # Generate custom output file
#SBATCH --error=run.err # Generate custom error file
#SBATCH --nodes=1       # Number of nodes to use
#SBATCH --ntasks=1      # Number of tasks to run
#SBATCH --cpus-per-task=1
#SBATCH --constraint=xeon-gold-6152
#SBATCH --hint=nomultithread
#SBATCH --job-name=krb5
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
aklog
klist
echo "Here should go my batch script code."
# Destroy Kerberos tickets created for this job only
kdestroy
klist
```
#### Example 2: Shared credential cache
Some users may need or prefer to run with a shared cache file. To do that, one needs to
set up `KRB5CCNAME` in the **login node** session, before submitting the job.
```bash
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
```
Then, you can run one or multiple job scripts (or a parallel job with `srun`). `KRB5CCNAME` will be propagated to the
job script or to the parallel job, therefore a single credential cache will be shared amongst different Slurm runs.
```bash
#!/bin/bash
#SBATCH --partition=hourly # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=01:00:00 # Strictly recommended when using 'general' partition.
#SBATCH --output=run.out # Generate custom output file
#SBATCH --error=run.err # Generate custom error file
#SBATCH --nodes=1       # Number of nodes to use
#SBATCH --ntasks=1      # Number of tasks to run
#SBATCH --cpus-per-task=1
#SBATCH --constraint=xeon-gold-6152
#SBATCH --hint=nomultithread
#SBATCH --job-name=krb5
# KRB5CCNAME is inherited from the login node session
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
srun aklog
echo "Here should go my batch script code."
echo "No need to run 'kdestroy', as it may have to survive for running other jobs"
```
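For completeness, a minimal sketch of how the shared cache could be used from a login node (the batch script names are placeholders; the second example above would be one such script):
```bash
# On the login node: create the shared credential cache location once
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
# Submit one or more jobs; KRB5CCNAME is propagated to each job script,
# so they all share the same credential cache
sbatch my_krb5_job.batch
sbatch another_job.batch
```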

View File

@@ -0,0 +1,99 @@
# Using merlin_rmount
## Background
Merlin provides a command for mounting remote file systems, called `merlin_rmount`. This
provides a helpful wrapper over the Gnome storage utilities (GIO and GVFS) and supports a wide range of remote filesystem protocols, including
- SMB/CIFS (Windows shared folders)
- WebDAV
- AFP
- FTP, SFTP
- [complete list](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
## Usage
### Start a session
First, start a new session. This will start a new bash shell in the current terminal where you can add further commands.
```
$ merlin_rmount --init
[INFO] Starting new D-Bus RMOUNT session
(RMOUNT STARTED) [bliven_s@login002 ~]$
```
Note that behind the scenes this is creating a new dbus daemon. Running multiple daemons on the same login node leads to unpredictable results, so it is best not to initialize multiple sessions in parallel.
### Standard Endpoints
Standard endpoints can be mounted using
```
merlin_rmount --select-mount
```
Select the desired url using the arrow keys.
![merlin_rmount --select-mount](../../images/rmount/select-mount.png)
From this list any of the standard supported endpoints can be mounted.
### Other endpoints
Other endpoints can be mounted using the `merlin_rmount --mount <endpoint>` command.
![merlin_rmount --mount](../../images/rmount/mount.png)
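For illustration, a hypothetical SMB share could be mounted like this (the server and share names are placeholders; use the URL of your actual endpoint):
```bash
merlin_rmount --mount smb://fileserver.psi.ch/myshare
```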
### Accessing Files
After mounting a volume the script will print the mountpoint. It should be of the form
```
/run/user/$UID/gvfs/<endpoint>
```
where `$UID` gives your unix user id (a 5-digit number, also viewable with `id -u`) and
`<endpoint>` is some string generated from the mount options.
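Standard shell tools work on this path. A small sketch (the endpoint directory name depends on your mount; paths are examples):
```bash
# List the currently mounted volumes for your user
ls "/run/user/$(id -u)/gvfs/"
# Copy a file from a mounted share into your data area
cp "/run/user/$(id -u)/gvfs/<endpoint>/somefile.dat" "/data/user/$USER/"
```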
For convenience, it may be useful to add a symbolic link to this gvfs directory. For instance, this would allow all mounted volumes to be accessed under ~/mnt/:
```
ln -s /run/user/$UID/gvfs ~/mnt
```
Files are accessible as long as the `merlin_rmount` shell remains open.
### Disconnecting
To disconnect, close the session with one of the following:
- The exit command
- CTRL-D
- Closing the terminal
Disconnecting will unmount all volumes.
## Alternatives
### Thunar
Users that prefer a graphical file browser may use the `thunar` command, which opens the Thunar file browser. This is also available in NoMachine sessions in the bottom bar (1). Thunar supports the same remote filesystems as `merlin_rmount`; just type the URL in the address bar (2).
![Mounting with thunar](../../images/rmount/thunar_mount.png)
When using thunar within a NoMachine session, file transfers continue after closing NoMachine (as long as the NoMachine session stays active).
Files can also be accessed at the command line as needed (see 'Accessing Files' above).
## Resources
- [BIO docs](https://intranet.psi.ch/en/bio/webdav-data) on using these tools for
transferring EM data
- [Red Hat docs on GVFS](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8)
- [gio reference](https://developer-old.gnome.org/gio/stable/gio.html)

View File

@@ -0,0 +1,108 @@
---
title: Merlin7 Tools
#tags:
keywords: merlin_quotas
#last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/tools.html
---
## About
We provide tools to help users get the most out of the cluster. The tools
described here are organised by use case and include usage examples.
## Files and Directories
### `merlin_quotas`
This tool is available on all of the login nodes and provides a brief overview of
a user's filesystem quotas. These are limits which restrict how much storage a user can consume (or how
many files they can create). A generic table of filesystem quotas can be
found on the [Storage page](storage.md#dir_classes).
#### Example #1: Viewing quotas
Simply calling `merlin_quotas` will show you a table of your quotas:
```console
$ merlin_quotas
Path SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
-------------- --------- ---------- ------- --------- ---------- -------
/data/user 30.26G 1T 03% 367296 2097152 18%
└─ <USERNAME>
/afs/psi.ch 3.4G 9.5G 36% 0 0 00%
└─ user/<USERDIR>
/data/project 2.457T 10T 25% 58 2097152 00%
└─ bio/shared
/data/project 338.3G 10T 03% 199391 2097152 10%
└─ bio/hpce
```
!!! tip
You can change the width of the table by either passing `--no-wrap` (to
disable wrapping of the *Path*) or `--width N` (to explicitly set some
width by `N` characters).
#### Example #2: Project view
The tool can also be used to list information about which project directories
exist and who owns/manages them:
```console
$ merlin_quotas projects
Project ID Path Owner Group
---------- ------------------------ --------- --------------
600000000 /data/project/bio/shared germann_e unx-merlin_adm
600000001 /data/project/bio/hpce assman_g unx-merlin_adm
```
By default this only shows information on projects that you have access to, but
to view the whole list you can pass the `--all` flag:
```console
$ merlin_quotas projects --all
Project ID Path Owner Group
---------- ------------------------------- -------------- -----------------
500000000 /data/project/general/mcnp gac-mcnp unx-mcnp_all
500000001 /data/project/general/vis_as talanov_v unx-vis_as
500000002 /data/project/general/mmm krack org-7302
500000003 /data/project/general laeuch_a org-7201
└─ LTC_CompPhys
600000000 /data/project/bio/shared germann_e unx-merlin_adm
600000001 /data/project/bio/hpce assman_g unx-merlin_adm
600000002 /data/project/bio/abrahams abrahams_j unx-bio_abrahams
600000003 /data/project/bio/benoit benoit_r unx-bio_benoit
600000004 /data/project/bio/ishikawa ishikawa unx-bio_ishikawa
600000005 /data/project/bio/kammerer kammerer_r unx-bio_kammerer
600000006 /data/project/bio/korkhov korkhov_v unx-bio_korkhov
600000007 /data/project/bio/luo luo_j unx-bio_luo
600000008 /data/project/bio/mueller mueller_e unx-bio_mueller
600000009 /data/project/bio/poghosyan poghosyan_e unx-bio_poghosyan
600000010 /data/project/bio/schertler schertler_g unx-bio_schertler
600000011 /data/project/bio/shivashankar shivashankar_g unx-bio_shivashan
600000012 /data/project/bio/standfuss standfuss unx-bio_standfuss
600000013 /data/project/bio/steinmetz steinmetz unx-bio_steinmetz
```
!!! tip
    As above, you can change the table width by passing either `--no-wrap` or
    `--width N`.
#### Example #3: Project config
To make tracking project quotas easier, `merlin_quotas` generates a config
file in your home directory (called `~/.merlin_quotas`) which defines the projects to show when you call the
tool.
The config file simply contains a list (one per line) of project IDs which should
be tracked. In theory any (or all) available projects can be tracked, but due to
UNIX and Lustre permissions, accessing quota information for a project you're not
a member of **is not possible**.
If you are added to or removed from a project, you can update this config file by
calling `merlin_quotas genconf --force` (note the `--force`, which will overwrite
your existing config file) or by editing the file by hand (*not recommended*).
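For illustration, such a config file might look like this (project IDs taken from the example listing above):
```console
$ cat ~/.merlin_quotas
600000000
600000001
```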

View File

@@ -0,0 +1,147 @@
---
title: Remote Desktop Access to Merlin7
keywords: NX, NoMachine, remote desktop access, login node, login001, login002, merlin7-nx-01, merlin7-nx, nx.psi.ch, VPN, browser access
last_updated: 07 August 2024
sidebar: merlin7_sidebar
permalink: /merlin7/nomachine.html
---
## Overview
Merlin7 NoMachine provides users with remote desktop access to the Merlin7 computing environment. This service enables users to connect to their computing resources from any location, whether they are inside the PSI network or accessing from outside via secure methods.
## Accessing Merlin7 NoMachine
### From Inside PSI
If you are inside the PSI network, you can directly connect to the Merlin7 NoMachine service without the need to go through another service.
1. **Ensure Network Connectivity**: Make sure you are connected to the PSI internal network.
2. **Choose Your Access Method**: You can access Merlin7 using either a web browser or the NoMachine client.
#### Method 1: Using a Web Browser
Open your web browser and navigate to [https://merlin7-nx.psi.ch:4443](https://merlin7-nx.psi.ch:4443).
#### Method 2: Using the NoMachine Client
Settings for the NoMachine client:
- **Host**: `merlin7-nx.psi.ch`
- **Port**: `4000`
- **Protocol**: `NX`
- **Authentication**: `Use password authentication`
### From Outside PSI
Users outside the PSI network have two options for accessing the Merlin7 NoMachine service: through `nx.psi.ch` or via a VPN connection.
#### Option 1: Via `nx.psi.ch`
Documentation about the `nx.psi.ch` service can be found [here](https://www.psi.ch/en/photon-science-data-services/remote-desktop-nomachine).
##### Using a Web Browser
Open your web browser and navigate to [https://nx.psi.ch](https://nx.psi.ch).
##### Using the NoMachine Client
Settings for the NoMachine client:
- **Host**: `nx.psi.ch`
- **Port**: `4000`
- **Protocol**: `NX`
- **Authentication**: `Use password authentication`
#### Option 2: Via VPN
Alternatively, you can use a VPN connection to access Merlin7 as if you were inside the PSI network.
1. **Request VPN Access**: Contact the IT department to request VPN access if you do not already have it. Submit a request through the PSI Service Now ticketing system: [VPN Access (PSI employees)](https://psi.service-now.com/psisp?id=psi_new_sc_cat_item&sys_id=beccc01b6f44a200d02a82eeae3ee440).
2. **Connect to the VPN**: Once access is granted, connect to the PSI VPN using your credentials.
3. **Access Merlin7 NoMachine**: Once connected to the VPN, you can access Merlin7 using either a web browser or the NoMachine client as if you were inside the PSI network.
## The NoMachine Client
### Installation
#### Windows
The NoMachine client is available for PSI Windows computers in the Software Kiosk under the name **NX Client**.
#### macOS and Linux
The NoMachine client can be downloaded from [NoMachine's download page](https://downloads.nomachine.com).
### Connection Configuration
1. **Launch NoMachine Client**: Open the NoMachine client on your computer.
2. **Create a New Connection**: Click the **Add** button to create a new connection.
- On the **Address** tab configure:
- **Name**: Enter a name for your connection. This can be anything.
- **Host**: Enter the appropriate hostname (e.g. `merlin7-nx.psi.ch`).
- **Port**: Enter `4000`.
- **Protocol**: Select `NX`.
![Create New NoMachine Connection](../../images/nomachine/screen_nx_address.png)
- On the **Configuration** tab ensure:
- **Authentication**: Select `Use password authentication`.
![Create New NoMachine Connection](../../images/nomachine/screen_nx_configuration.png)
- Click the **Add** button to finish creating the new connection.
## Authenticating
When prompted, use your PSI credentials to authenticate.
![Create New NoMachine Connection](../../images/nomachine/screen_nx_auth.png)
## Managing Sessions
The Merlin7 NoMachine service is managed through a front-end server and back-end nodes, facilitating balanced and efficient access to remote desktop sessions.
### Architecture Overview
- **Front-End Server**: `merlin7-nx.psi.ch`
- Serves as the entry point for users connecting to the NoMachine service.
- Handles load-balancing and directs users to available back-end nodes.
- **Back-End Nodes**:
- `login001.merlin7.psi.ch`
- `login002.merlin7.psi.ch`
- These nodes host the NoMachine desktop service and manage the individual desktop sessions.
Access to the login node desktops must be initiated through the `merlin7-nx.psi.ch` front-end. The front-end service will distribute sessions across available nodes in the back-end, ensuring optimal resource usage.
### Opening NoMachine Desktop Sessions
When connecting to the `merlin7-nx.psi.ch` front-end, a new session automatically opens if no existing session is found. Users can manage their sessions as follows:
- **Reconnect to an Existing Session**: If you have an active session, you can reconnect to it by selecting the appropriate icon in the NoMachine client interface. This allows you to resume work without losing any progress.
![Open an existing Session](../../images/nomachine/screen_nx_single_session.png)
- **Create a Second Session**: If you require a separate session, you can select the **`New Desktop`** button. This option creates a second session on another login node, provided the node is available and operational.
### Session Management Considerations
- **Load Balancing**: The front-end service ensures that sessions are evenly distributed across the available back-end nodes to optimize performance and resource utilization.
- **Session Limits**: Users are limited to one session per back-end node to maintain system stability and efficiency.
## Support and Resources
If you encounter any issues or need further assistance with the Merlin7 NoMachine service, support is available via email. Please contact us at [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists.psi.ch), and our support team will be happy to assist you.
### Advanced Display Settings
NoMachine provides several options to optimize the display settings for better performance and clarity. These settings can be accessed and adjusted when creating a new session or by clicking the top right corner of a running session.
#### Prevent Rescaling
Preventing rescaling can help eliminate "blurriness" in your display, though it may affect performance. Adjust these settings based on your performance needs:
- Display: Choose `Resize remote display` (forces 1:1 pixel sizes)
- Display > Change settings > Quality: Choose medium-best quality
- Display > Change settings > Modify the advanced display settings
- Check: Disable network-adaptive display quality (turns off lossy compression)
- Check: Disable client side image post-processing

View File

@@ -0,0 +1,47 @@
---
title: Software repositories
#tags:
keywords: modules, software, stable, unstable, deprecated, spack, repository, repositories
last_updated: 16 January 2024
summary: "This page contains information about the different software repositories"
sidebar: merlin7_sidebar
permalink: /merlin7/software-repositories.html
---
## Module Systems in Merlin7
Merlin7 provides a modular environment to ensure flexibility, compatibility, and optimized performance.
The system supports three primary module types: PSI Environment Modules (PModules), Spack Modules, and Cray Environment Modules.
### PSI Environment Modules (PModules)
The PModules system, developed by PSI, is the officially supported module system on Merlin7. It is the preferred choice for accessing validated software across a wide range of applications.
Key Features:
* **Expert Deployment:** Each package is deployed and maintained by specific experts to ensure reliability and compatibility.
* **Broad Availability:** Commonly used software, such as OpenMPI, ANSYS, MATLAB, and others, is provided within PModules.
* **Custom Requests:** If a package, version, or feature is missing, users can contact the support team to explore the feasibility of installing it.
!!! tip
For further information about **PModules** on Merlin7 please refer to the [PSI Modules](../05-Software-Support/pmodules.md) chapter.
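As a quick orientation, loading software from PModules follows the usual Environment Modules workflow (package names and versions below are only illustrative; check what is actually available on Merlin7):
```bash
module avail            # list the packages provided through PModules
module load MATLAB      # load a package, using the exact name/version shown by 'module avail'
module list             # show the currently loaded modules
```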
### Spack Modules
Merlin7 also provides Spack modules, offering a modern and flexible package management system. Spack supports a wide variety of software packages and versions. For more information, refer to the **external [PSI Spack](https://gitea.psi.ch/HPCE/spack-psi) documentation**.
!!! tip
For further information about **Spack** on Merlin7 please refer to the [Spack](../05-Software-Support/spack.md) chapter.
### Cray Environment Modules
Merlin7 also supports Cray Environment Modules, which include compilers, MPI implementations, and libraries optimized
for Cray systems. However, Cray modules are not recommended as the default choice due to potential backward compatibility
issues when the Cray Programming Environment (CPE) is upgraded to a newer version.
Recommendations:
* **Compiling Software:** Cray modules can be used when optimization for Cray hardware is essential.
* **General Use:** For most applications, prefer PModules, which ensure stability, backward compatibility, and long-term support.
!!! tip
For further information about **CPE** on Merlin7 please refer to the [Cray Modules](../05-Software-Support/cray-module.env.md) chapter.

View File

@@ -0,0 +1,184 @@
---
title: Configuring SSH Keys in Merlin
#tags:
keywords: linux, connecting, client, configuration, SSH, Keys, SSH-Keys, RSA, authorization, authentication
last_updated: 15 Jul 2020
summary: "This document describes how to deploy SSH Keys in Merlin."
sidebar: merlin7_sidebar
permalink: /merlin7/ssh-keys.html
---
Merlin users sometimes need to access the different Merlin services without being asked for a password each time.
One can achieve that with Kerberos authentication; however, in some cases software requires the setup of SSH keys.
One example is ANSYS Fluent: when used interactively, the GUI communicates with the different nodes
through the SSH protocol, and the use of SSH keys is enforced.
## Setting up SSH Keys on Merlin
For security reasons, users **must always protect SSH keys with a passphrase**.
Users can check whether an SSH key already exists. Keys are placed in the **~/.ssh/** directory. `RSA` encryption
is usually the default, and the corresponding files are **`id_rsa`** (private key) and **`id_rsa.pub`** (public key).
```bash
ls ~/.ssh/id*
```
For creating **SSH RSA Keys**, one should:
1. Run `ssh-keygen`; a passphrase will be requested twice. You **must remember** this passphrase for the future.
* For security reasons, ***always protect the key with a passphrase***. There is only one exception: when running ANSYS software, which in general should not use a passphrase, to simplify running the software in Slurm.
* This will generate a private key **id_rsa**, and a public key **id_rsa.pub** in your **~/.ssh** directory.
2. Add your public key to the **`authorized_keys`** file, and ensure proper permissions for that file, as follows:
```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```
3. Configure the SSH client in order to force the usage of the **psi.ch** domain for trusting keys:
```bash
echo "CanonicalizeHostname yes" >> ~/.ssh/config
```
4. Configure further SSH options as follows:
```bash
echo "AddKeysToAgent yes" >> ~/.ssh/config
echo "ForwardAgent yes" >> ~/.ssh/config
```
Other options may be added.
5. Check that your SSH config file contains at least the lines mentioned in steps 3 and 4:
```console
# cat ~/.ssh/config
CanonicalizeHostname yes
AddKeysToAgent yes
ForwardAgent yes
```
## Using the SSH Keys
### Using Authentication Agent in SSH session
By default, when accessing the login node via SSH (with `ForwardAgent=yes`), your
SSH keys will be automatically added to the authentication agent. Hence, no action should be needed by the user. One can configure
`ForwardAgent=yes` as follows:
* **(Recommended)** In your local Linux (workstation, laptop or desktop) add the following line in the
`$HOME/.ssh/config` (or alternatively in `/etc/ssh/ssh_config`) file:
```ssh_config
ForwardAgent yes
```
* Alternatively, you can add the option `ForwardAgent=yes` to each SSH command. For example:
```bash
ssh -XY -o ForwardAgent=yes login001.merlin7.psi.ch
```
If `ForwardAgent` is not enabled as shown above, one needs to run the authentication agent and then add your key
to the **ssh-agent**. This must be done once per SSH session, as follows:
* Run `eval $(ssh-agent -s)` to run the **ssh-agent** in that SSH session
* Check whether the authentication agent has your key already added:
```bash
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
```
* If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
You will be requested for the **passphrase** of your key, and it can be done by running:
```bash
ssh-add
```
### Using Authentication Agent in NoMachine Session
By default, when using a NoMachine session, the `ssh-agent` should be started automatically. Hence, there is no need to
start the agent or forward it.
However, for NoMachine one always needs to add the private key identity to the authentication agent. This can be done as follows:
1. Check whether the authentication agent has already the key added:
```bash
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
```
2. If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
You will be requested for the **passphrase** of your key, and it can be done by running:
```bash
ssh-add
```
You just need to run it once per NoMachine session, and it will apply to all terminal windows within that NoMachine session.
## Troubleshooting
### Errors when running 'ssh-add'
If the error `Could not open a connection to your authentication agent.` appears when running `ssh-add`, it means
that the authentication agent is not running. Please follow the previous procedures for starting it.
### Add/Update SSH RSA Key password
If an existing SSH key does not have a passphrase, or you want to update an existing passphrase with a new one, you can do it as follows:
```bash
ssh-keygen -p -f ~/.ssh/id_rsa
```
### SSH Keys deployed but not working
Please ensure proper permissions on the involved files, and check for typos in the file names:
```bash
chmod u+rwx,go-rwx,g+s ~/.ssh
chmod u+rw-x,go-rwx ~/.ssh/authorized_keys
chmod u+rw-x,go-rwx ~/.ssh/id_rsa
chmod u+rw-x,go+r-wx ~/.ssh/id_rsa.pub
```
### Testing SSH Keys
Once the SSH key is created, you can test that it is valid as follows:
1. Create a **new** SSH session in one of the login nodes:
```bash
ssh login001
```
2. In the login node session, destroy any existing Kerberos ticket or active SSH Key:
```bash
kdestroy
ssh-add -D
```
3. Add the new private key identity to the authentication agent. You will be prompted for the passphrase.
```bash
ssh-add
```
4. Check that your key is listed by the SSH agent:
```bash
ssh-add -l
```
5. SSH to the second login node. No password should be requested:
```bash
ssh -vvv login002
```
If the last step succeeds, it means that your SSH key is properly set up.

View File

@@ -0,0 +1,195 @@
---
title: Merlin7 Storage
#tags:
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
#last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---
## Introduction
This document describes the different directories of the Merlin7 cluster.
### Backup and data policies
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of old users who no longer have an existing and valid PSI account will be recycled.
!!! warning
When a user leaves PSI and their account is removed, their storage space in
Merlin may be recycled. Hence, **when a user leaves PSI**, they, their
supervisor or team **must ensure that the data is backed up to an external
storage**!
### How to check quotas
Some of the Merlin7 directories have quotas applied. A way of checking the quotas is provided by the `merlin_quotas` command.
This command shows all quotas for the different user storage directories and partitions (including AFS). To check your quotas, please run:
```console
$ merlin_quotas
Path SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
-------------- --------- ---------- ------- --------- ---------- -------
/data/user 30.26G 1T 03% 367296 2097152 18%
└─ <USERNAME>
/afs/psi.ch 3.4G 9.5G 36% 0 0 0%
└─ user/<USERDIR>
/data/scratch 688.9M 2T 00% 368471 0 00%
└─ shared
/data/project 3.373T 11T 31% 425644 2097152 20%
└─ bio/shared
/data/project 4.142T 11T 38% 579596 2097152 28%
└─ bio/hpce
```
!!! note
On first use you will see a message about some configuration being
generated, this is expected. Don't be surprised that it takes some time.
After this using `merlin_quotas` should be faster.
The output shows, for each filesystem with quotas, the quota limits and how much of the quota you are using. Notice that some users will have
one or more `/data/project/...` directories showing, depending on whether you are part of a specific PSI research group or project.
The general quota constraints for the different directories are shown in the [table below](#dir_classes). Further details on how to use `merlin_quotas`
can be found on the [Tools page](merlin_tools.md).
!!! tip
    If you're interested, you can retrieve the Lustre-based quota information
    directly by calling `lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data`.
    Using the `merlin_quotas` command is more convenient and
    shows all your relevant filesystem quotas.
## Merlin7 directories
Merlin7 offers the following directory classes for users (a quick way to check your access follows the list):
* `/data/user/<username>`: Private user **home** directory
* `/data/project/general`: project directory for Merlin
* `/data/project/bio/$projectname`: project directory for BIO
* `/data/project/mu3e/$projectname`: project directory for Mu3e
* `/data/project/meg/$projectname`: project directory for MEG
* `/scratch`: Local *scratch* disk (visible only from the node running a job).
* `/data/scratch/shared`: Shared *scratch* disk (visible from all nodes).
!!! tip
    In Lustre there is a concept called **grace time**. Filesystems have a
    block quota (amount of data) and an inode quota (number of files), each
    with a soft and a hard limit. Once the soft limit is reached, users can
    keep writing up to the hard limit during the **grace period**. Once the
    **grace time** expires or the hard limit is reached, users are unable to
    write and need to remove data until they are below the soft limit again
    (or ask for a quota increase where possible, see the table below).
<a name="dir_classes"></a>Properties of the directory classes:
| Directory | Block Quota [Soft:Hard] | Inode Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block | Quota Change Policy: Inodes | Backup |
| ---------------------------------- | ----------------------- | ----------------------- | :-------: | :--------------------------------- |:-------------------------------- | ------ |
| /data/user/$username | PRJ [1TB:1.074TB] | PRJ [2M:2.1M] | 7d | Immutable. Need a project. | Changeable when justified. | no |
| /data/project/bio/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
| /data/project/general/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
| /data/scratch/shared | USR [512GB:2TB] | | 7d | Up to x2 when strongly justified. | Changeable when justified. | no |
| /scratch | *Undef* | *Undef* | N/A | N/A | N/A | no |
!!! warning
The use of `/scratch` and `/data/scratch/shared` areas as an extension of
the quota *is forbidden*. The `/scratch` and `/data/scratch/shared` areas
***must not contain*** final data. Keep in mind that ***auto cleanup
policies*** in the `/scratch` and `/data/scratch/shared` areas are applied.
### User home directory
This is the default directory users land in when logging in to any Merlin7 machine.
It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.
The home directories are mounted in the login and computing nodes under the directory
```bash
/data/user/$username
```
Directory policies:
* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.
* It is **forbidden** to use the home directories for I/O-intensive tasks; use one of the **[scratch](storage.md#scratch-directories)** areas instead!
* No backup policy is applied for the user home directories: **users are responsible for backing up their data**.
Home directory quotas are defined on a per-Lustre-project basis. The quota can be checked using the `merlin_quotas` command described
[above](storage.md#how-to-check-quotas).
### Project data directory
This storage is intended for keeping large amounts of a project's data, where the data can also be
shared by all members of the project (the project's corresponding UNIX group). We recommend keeping most data in
project-related storage spaces, since this allows users to coordinate their data management. In addition, project spaces have more flexible policies
for extending the available storage space.
Scientists can request a Merlin project space as described in **[[Accessing Merlin -> Requesting a Project]](../01-Quick-Start-Guide/requesting-projects.md)**.
By default, Merlin offers **general** project space, centrally covered, as long as it does not exceed 10TB (larger requests have to be justified).
General Merlin projects might need to be reviewed after one year of their creation.
Once a Merlin project is created, the directory will be mounted in the login and computing nodes under the directory:
```bash
/data/project/general/$projectname
```
Project quotas are defined on a per-Lustre-project basis. Users can check the project quota by running the following command:
```bash
lfs quota -h -p $projectid /data
```
!!! warning
Checking **quotas** for the Merlin projects is not yet possible. In the
future, a list of `projectid` will be provided, so users can check their
quotas.
Directory policies:
* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.
* It is **forbidden** to use the data directories as `/scratch` area during a job's runtime, i.e. for high throughput I/O for a job's temporary files.
  * Please use `/scratch` or `/data/scratch/shared` for this purpose.
* No backups: users are responsible for managing the backups of their data directories.
#### Dedicated project directories
Some departments or divisions have larger storage requirements on Merlin7. At present, `bio`, `mu3e` and `meg` are the main ones.
These are mounted under the following paths:
```bash
/data/project/bio
/data/project/mu3e
/data/project/meg
```
They follow the same rules as the general projects, except that they have more space assigned.
### Scratch directories
There are two different types of scratch storage: **local** (`/scratch`) and **shared** (`/data/scratch/shared`).
* **local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially
true for all jobs running on a single node. Mount path:
```bash
/scratch
```
* **shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster
and all tasks need to do I/O on the same temporary files.
```bash
/data/scratch/shared
```
Scratch directories policies:
* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
* Temporary files *must be deleted by the user at the end of the job* (see the sketch after this list).
* Remaining files will be deleted by the system if detected.
* Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the scratch areas get full, admins have the right to clean up the oldest data.
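A minimal sketch of how a batch job can manage its own local scratch area (assuming Slurm; `myapp` and the input path are hypothetical placeholders):
```bash
#!/bin/bash
#SBATCH --job-name=scratch-example
#SBATCH --ntasks=1

# Create a per-job directory on the node-local scratch disk
SCRATCHDIR=/scratch/${USER}/${SLURM_JOB_ID}
mkdir -p "$SCRATCHDIR"

# Remove the temporary files when the job ends, even if it fails
trap 'rm -rf "$SCRATCHDIR"' EXIT

# Run the application, keeping its temporary files on local scratch
myapp --tmpdir "$SCRATCHDIR" --input /data/project/general/myproject/input.dat
```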


@@ -0,0 +1,176 @@
---
title: Transferring Data
#tags:
keywords: transferring data, data transfer, rsync, winscp, copy data, copying, sftp, import, export, hop, vpn
last_updated: 24 August 2023
#summary: ""
sidebar: merlin7_sidebar
permalink: /merlin7/transfer-data.html
---
## Overview
Most data transfer methods support both sending and receiving, so you may initiate the transfer from either **Merlin** or the other system — depending on **network visibility**.
- **From the PSI network to Merlin:** Merlin login nodes are visible from the PSI network, so direct transfers using `rsync` or **FTP** are generally preferable. Transfers **from Merlin7 to other PSI systems may require special firewall rules**.
- **From Merlin to the Internet:** Merlin login nodes can access the internet with a **limited set of protocols**:
    - HTTP-based protocols on ports `80` or `443` (e.g., HTTPS, WebDAV).
- Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.
- **From the Internet to PSI:** Systems outside PSI can access the [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) at `datatransfer.psi.ch` using SSH-based protocols or [Globus](https://www.globus.org/).
> SSH-based protocols using port `22` **to most PSI servers** are generally **not permitted**.
> * However, **transfers from any PSI host to Merlin7 using port 22 are allowed**.
>
> Port `21` is also available for FTP transfers from PSI to Merlin7.
### Choosing the best transfer method
| **Scenario** | **Recommended Method** | **Reason** |
| ------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------- |
| Small dataset, Linux/macOS | `rsync` | Resume support, skips existing files, works over SSH |
| Quick one-time small transfer | `scp` | Simple syntax, no need to install extra tools |
| Large dataset, high speed needed (not sensitive) | FTP via `service03.merlin7.psi.ch` | Fastest transfer speed (unencrypted data channel) |
| Large dataset, high speed needed (sensitive data) | FTP via `ftp-encrypted.merlin7.psi.ch` | Encrypted control & data channels for security, but slower than `service03` |
| Windows interactive GUI transfer | WinSCP | User-friendly interface, PSI Software Kiosk, supports drag-and-drop |
| Cross-platform interactive GUI transfer | FileZilla | User-friendly interface, works on Linux/macOS/Windows, supports drag-and-drop |
| From the internet to PSI | [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) | Supports SSH-based protocols and Globus |
| Need to share large files | [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload) | Supports sharing of large files with an expiration date |
| PSI -> Merlin7 over FTP | Any FTP-based client | Port 21 allowed from PSI to Merlin7 |
| PSI -> Merlin7 over SSH | Any SSH-based method | Port 22 allowed from PSI to Merlin7 |
The next chapters contain detailed information about the different transfer methods available on Merlin7.
## Direct Transfer via Merlin7 Login Nodes
The following methods transfer data directly via the [login nodes](../01-Quick-Start-Guide/accessing-interactive-nodes.md#login-nodes-hardware-description). They are suitable for use from **within the PSI network**.
### Rsync (Recommended for Linux/macOS)
Rsync is the **preferred** method for small datasets from Linux/macOS systems. It supports **resuming interrupted transfers** and **skips already transferred files**. Syntax:
```bash
rsync -avAHXS <src> <dst>
```
**An example** of transferring local files to a Merlin project directory:
```bash
rsync -avAHXS ~/localdata $USER@login001.merlin7.psi.ch:/data/project/general/myproject/
```
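Transfers in the opposite direction work the same way; for example, pulling results from the same (hypothetical) project directory back to a local machine:
```bash
rsync -avAHXS $USER@login001.merlin7.psi.ch:/data/project/general/myproject/results ~/localdata/
```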
!!! tip
If a transfer is interrupted, just rerun the command: `rsync` will skip existing files.
!!! warning
Rsync uses SSH (port 22). For large datasets, transfer speed might be limited.
### SCP
SCP works similarly to `rsync` but **does not support resuming** interrupted transfers. It may be used for quick one-off transfers. Example:
```bash
scp ~/localfile.txt $USER@login001.merlin7.psi.ch:/data/project/general/myproject/
```
### Secure FTP
A `vsftpd` service is available on the login nodes, providing high-speed transfers. Choose the server based on your **speed vs. encryption** needs:
* **`login001.merlin7.psi.ch`:** Encrypted control & data channels.
**Use if your data is sensitive**. **Slower**, but secure.
* **`service03.merlin7.psi.ch`**: Encrypted control channel only.
Use if your data can be transferred unencrypted. **Fastest** method.
!!! tip
    The **control channel** is always **encrypted**; therefore, authentication credentials are always protected.
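As a minimal sketch, assuming the `lftp` command-line client is available (any FTP client with explicit TLS support should work similarly); the project path in the comments is hypothetical:
```bash
# Fully encrypted endpoint (control and data channels) -- use for sensitive data
lftp -u $USER ftp://login001.merlin7.psi.ch

# Faster endpoint (only the control channel is encrypted) -- for non-sensitive data
lftp -u $USER ftp://service03.merlin7.psi.ch

# Inside the lftp session, a local directory can be uploaded recursively, e.g.:
#   mirror -R ./localdata /data/project/general/myproject/localdata
# Depending on your client configuration you may need to enable explicit TLS,
# e.g. with 'set ftp:ssl-force true'.
```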
## UI-based Clients for Data Transfer
### WinSCP (Windows)
Available in the **Software Kiosk** on PSI Windows machines.
* Using your PSI credentials, connect to:
    * when using port 22: `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
    * when using port 21 (FTP):
        * `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
        * `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
* Drag and drop files between your PC and Merlin.
### FileZilla (Linux/MacOS/Windows)
Download from [FileZilla Project](https://filezilla-project.org/), or install from your Linux software repositories if available.
* Using your PSI credentials, connect to:
    * when using port 22: `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
    * when using port 21 (FTP):
        * `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
        * `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
* Supports drag-and-drop file transfers.
## Sharing Files with SWITCHfilesender
**[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is a Swiss-hosted installation of the [FileSender](https://filesender.org/) project — a web-based application that allows authenticated users to securely and easily send **arbitrarily large files** to other users. Features:
- **Secure large file transfers:** Send files that exceed normal email attachment limits.
- **Time-limited availability:** Files are automatically deleted after the chosen expiration date or number of downloads.
- **Voucher system:** Authenticated users can send upload vouchers to external recipients without an account.
- **Designed for research & education:** Developed to meet the needs of universities and research institutions.
About the authentication:
- It uses **SimpleSAMLphp**, supporting multiple authentication mechanisms: SAML2, LDAP, RADIUS and more.
- It's fully integrated with PSI's **Authentication and Authorization Infrastructure (AAI)**.
- PSI employees can log in using their PSI account:
1. Open [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload).
2. Select **PSI** as the institution.
3. Authenticate with your PSI credentials.
The service is designed to **send large files for temporary availability**, not as a permanent publishing platform. Typical use case:
1. Upload a file.
2. Share the download link with a recipient.
3. The file remains available until the specified **expiration date** or **download limit** is reached.
4. The file is **automatically deleted** after expiration.
!!! warning
SWITCHfilesender **is not** a long-term storage or archiving solution.
## PSI Data Transfer
From August 2024, Merlin is connected to the **[PSI Data Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer)** service,
`datatransfer.psi.ch`. This is a central service managed by the **[Linux team](https://linux.psi.ch/index.html)**. However, any problems or questions related to it can be directly
[reported](../99-support/contact.md) to the Merlin administrators, who will forward the request if necessary.
The PSI Data Transfer servers support the following protocols:
* Data Transfer - SSH (scp / rsync)
* Data Transfer - Globus
Notice that `datatransfer.psi.ch` does not allow interactive SSH login; only `rsync`, `scp` and [Globus](https://www.globus.org/) access is allowed.
Access to the PSI Data Transfer service uses ***multi-factor authentication*** (MFA).
Therefore, the Microsoft Authenticator app is required, as explained [here](https://www.psi.ch/en/computing/change-to-mfa).
!!! tip
Please follow the [Official PSI Data
Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer)
documentation for further instructions.
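For orientation, a transfer initiated from a machine outside PSI might look like the sketch below. The destination path shown is hypothetical; check the official documentation above for the exact path layout exposed by `datatransfer.psi.ch`:
```bash
# Push a local results directory to Merlin7 via the PSI Data Transfer service
# (you will be asked for your PSI password and the MFA confirmation)
rsync -av ~/results/ $USER@datatransfer.psi.ch:/data/user/$USER/results/
```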
## Connecting to Merlin7 from outside PSI
Merlin7 is fully accessible from within the PSI network. To connect from outside you can use:
- [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
- [SSH hopx](https://www.psi.ch/en/computing/ssh-hop)
    * Please avoid transferring large amounts of data through **hop**
- [NoMachine](nomachine.md)
    * Remote interactive access through [**'nx.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access)
    * Please avoid transferring large amounts of data through **NoMachine**
{% comment %}
## Connecting from Merlin7 to outside file shares
### `merlin_rmount` command
Merlin provides a command for mounting remote file systems, called `merlin_rmount`. This
provides a helpful wrapper over the Gnome storage utilities, and provides support for a wide range of remote file formats, including
- SMB/CIFS (Windows shared folders)
- WebDav
- AFP
- FTP, SFTP
- [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
[More instruction on using `merlin_rmount`](merlin-rmount.md)
{% endcomment %}