first stab at mkdocs migration
379
docs/merlin7/02-How-To-Use-Merlin/archive.md
Normal file
@@ -0,0 +1,379 @@
|
||||
---
|
||||
title: Archive & PSI Data Catalog
|
||||
#tags:
|
||||
keywords: linux, archive, data catalog, archiving, lts, tape, long term storage, ingestion, datacatalog
|
||||
last_updated: 31 January 2020
|
||||
summary: "This document describes how to use the PSI Data Catalog for archiving Merlin7 data."
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/archive.html
|
||||
---
|
||||
|
||||
## PSI Data Catalog as a PSI Central Service
|
||||
|
||||
PSI provides access to the ***Data Catalog*** for **long-term data storage and retrieval**. Data is
|
||||
stored on the ***PetaByte Archive*** at the **Swiss National Supercomputing Centre (CSCS)**.
|
||||
|
||||
The Data Catalog and Archive is suitable for:
|
||||
|
||||
* Raw data generated by PSI instruments
|
||||
* Derived data produced by processing some inputs
|
||||
* Data required to reproduce PSI research and publications
|
||||
|
||||
The Data Catalog is part of PSI's effort to conform to the FAIR principles for data management.
|
||||
In accordance with this policy, ***data will be publicly released under CC-BY-SA 4.0 after an
|
||||
embargo period expires.***
|
||||
|
||||
The Merlin cluster is connected to the Data Catalog. Hence, users can archive data stored in the
Merlin storage under the ``/data`` directories (currently, ``/data/user`` and ``/data/project``).
Archiving from other directories is also possible; however, the process is much slower because the data
cannot be retrieved directly by the PSI archive central servers (**central mode**) and must instead
be copied to them (**decentral mode**).
|
||||
|
||||
Archiving can be done from any node accessible by the users (usually from the login nodes).
|
||||
|
||||
{{site.data.alerts.tip}} Archiving can be done in two different ways:
|
||||
<br>
|
||||
<b>'Central mode':</b> Possible for the user and project data directories, is the
|
||||
fastest way as it does not require remote copy (data is directly retrieved by central AIT servers from Merlin
|
||||
through 'merlin-archive.psi.ch').
|
||||
<br>
|
||||
<br>
|
||||
<b>'Decentral mode':</b> Possible for any directory, but the slowest way of archiving, as the data
must be copied ('rsync') from Merlin to the central AIT servers.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
## Procedure
|
||||
|
||||
### Overview
|
||||
|
||||
Below are the main steps for using the Data Catalog.
|
||||
|
||||
* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
|
||||
* Prepare a metadata file describing the dataset
|
||||
* Run **``datasetIngestor``** script
|
||||
* If necessary, the script will copy the data to the PSI archive servers
|
||||
* Usually this is necessary when archiving from directories other than **``/data/user``** or
|
||||
**``/data/project``**. It is also necessary when the Merlin export server (**``merlin-archive.psi.ch``**)
|
||||
is down for any reason.
|
||||
* Archive the dataset:
|
||||
* Visit [https://discovery.psi.ch](https://discovery.psi.ch)
|
||||
* Click **``Archive``** for the dataset
|
||||
* The system will now copy the data to the PetaByte Archive at CSCS
|
||||
* Retrieve data from the catalog:
|
||||
* Find the dataset on [https://discovery.psi.ch](https://discovery.psi.ch) and click **``Retrieve``**
|
||||
* Wait for the data to be copied to the PSI retrieval system
|
||||
* Run **``datasetRetriever``** script
|
||||
|
||||
Since large data sets may take a lot of time to transfer, some steps are designed to happen in the
|
||||
background. The discovery website can be used to track the progress of each step.
|
||||
|
||||
### Account Registration
|
||||
|
||||
Two types of account permit access to the Data Catalog. If your data was collected at a ***beamline***, you may
|
||||
have been assigned a **``p-group``** (e.g. ``p12345``) for the experiment. Other users are assigned **``a-group``**
|
||||
(e.g. ``a-12345``).
|
||||
|
||||
Groups are usually assigned to a PI, and then individual user accounts are added to the group. This must be done
|
||||
on user request through PSI Service Now. For existing **a-groups** and **p-groups**, you can follow the standard
|
||||
central procedures. Alternatively, if you do not know how to do that, follow the Merlin7
|
||||
**[Requesting extra Unix groups](/merlin7/request-account.html#requesting-extra-unix-groups)** procedure, or open
|
||||
a **[PSI Service Now](https://psi.service-now.com/psisp)** ticket.
|
||||
|
||||
### Documentation
|
||||
|
||||
Accessing the Data Catalog is done through the [SciCat software](https://melanie.gitpages.psi.ch/SciCatPages/).
|
||||
Documentation is here: [ingestManual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html).
|
||||
|
||||
#### Loading datacatalog tools
|
||||
|
||||
The latest datacatalog software is maintained in the PSI module system. To access it from the Merlin systems, run the following command:
|
||||
|
||||
```bash
|
||||
module load datacatalog
|
||||
```
|
||||
|
||||
It can be done from any host in the Merlin cluster accessible by users. Usually, login nodes will be the nodes used for archiving.
|
||||
|
||||
### Finding your token
|
||||
|
||||
As of 2022-04-14 a secure token is required to interact with the data catalog. This is a long random string that replaces the previous user/password authentication (allowing access for non-PSI use cases). **This string should be treated like a password and not shared.**
|
||||
|
||||
1. Go to discovery.psi.ch
|
||||
1. Click 'Sign in' in the top right corner. Click 'Login with PSI account' and log in on the PSI login page.
|
||||
1. You should be redirected to your user settings and see a 'User Information' section. If not, click on your username in the top right and choose 'Settings' from the menu.
|
||||
1. Look for the field 'Catamel Token'. This should be a 64-character string. Click the icon to copy the token.
|
||||
|
||||

|
||||
|
||||
You will need to save this token for later steps. To avoid including it in all the commands, I suggest saving it to an environment variable (Linux):
|
||||
|
||||
```
|
||||
$ SCICAT_TOKEN=RqYMZcqpqMJqluplbNYXLeSyJISLXfnkwlfBKuvTSdnlpKkU
|
||||
```
|
||||
|
||||
(Hint: prefix this line with a space to avoid saving the token to your bash history.)
|
||||
|
||||
Tokens expire after 2 weeks and will need to be fetched from the website again.
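
One way to keep the token out of both the shell history and the command line is to store it in a file that only you can read, and to load it when needed. The following is a minimal sketch, not part of the datacatalog tools: the file location (`~/.scicat_token`, overridable through `SCICAT_TOKEN_FILE`) and the function names are assumptions made for illustration.

```bash
# Hypothetical token file location; any private path works.
SCICAT_TOKEN_FILE="${SCICAT_TOKEN_FILE:-$HOME/.scicat_token}"

# Read the token from standard input (without echoing it on a terminal)
# and store it in a file that only the owner can read.
save_scicat_token() {
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r token
    if [ -t 0 ]; then stty echo; fi
    ( umask 077; printf '%s\n' "$token" > "$SCICAT_TOKEN_FILE" )
    chmod 600 "$SCICAT_TOKEN_FILE"
    unset token
}

# Load the stored token into SCICAT_TOKEN for the current shell session.
load_scicat_token() {
    if [ ! -r "$SCICAT_TOKEN_FILE" ]; then
        echo "No readable token file at $SCICAT_TOKEN_FILE" >&2
        return 1
    fi
    SCICAT_TOKEN="$(cat "$SCICAT_TOKEN_FILE")"
}
```

Run `save_scicat_token`, paste the token and press Enter; in later sessions, `load_scicat_token` sets `SCICAT_TOKEN` again. Since tokens expire, the file has to be refreshed whenever a new token is fetched from the website.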
|
||||
|
||||
### Ingestion
|
||||
|
||||
The first step to ingesting your data into the catalog is to prepare a file describing what data you have. This is called
|
||||
**``metadata.json``**, and can be created with a text editor (e.g. *``vim``*). It can in principle be saved anywhere,
|
||||
but keeping it with your archived data is recommended. For more information about the format, see the 'Bio metadata'
|
||||
section below. An example follows:
|
||||
|
||||
```json
|
||||
{
|
||||
"principalInvestigator": "albrecht.gessler@psi.ch",
|
||||
"creationLocation": "/PSI/EMF/JEOL2200FS",
|
||||
"dataFormat": "TIFF+LZW Image Stack",
|
||||
"sourceFolder": "/gpfs/group/LBR/pXXX/myimages",
|
||||
"owner": "Wilhelm Tell",
|
||||
"ownerEmail": "wilhelm.tell@psi.ch",
|
||||
"type": "raw",
|
||||
"description": "EM micrographs of amygdalin",
|
||||
"ownerGroup": "a-12345",
|
||||
"scientificMetadata": {
|
||||
"description": "EM micrographs of amygdalin",
|
||||
"sample": {
|
||||
"name": "Amygdalin beta-glucosidase 1",
|
||||
"uniprot": "P29259",
|
||||
"species": "Apple"
|
||||
},
|
||||
"dataCollection": {
|
||||
"date": "2018-08-01"
|
||||
},
|
||||
"microscopeParameters": {
|
||||
"pixel size": {
|
||||
"v": 0.885,
|
||||
"u": "A"
|
||||
},
|
||||
"voltage": {
|
||||
"v": 200,
|
||||
"u": "kV"
|
||||
},
|
||||
"dosePerFrame": {
|
||||
"v": 1.277,
|
||||
"u": "e/A2"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
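
Because a hand-edited `metadata.json` is easily broken by a missing comma or quote, it can help to check the syntax before running the ingestor. The following is a minimal sketch that assumes `python3` is available on the node. The function name `check_metadata_json` is illustrative, and the check only verifies that the file is well-formed JSON, not that the Data Catalog accepts its fields.

```bash
# Hypothetical helper: report whether a file is well-formed JSON.
check_metadata_json() {
    # json.tool exits non-zero and prints the parse error if the file is not valid JSON
    if python3 -m json.tool "$1" > /dev/null; then
        echo "$1: valid JSON"
    else
        echo "$1: invalid JSON" >&2
        return 1
    fi
}
```

For example, `check_metadata_json metadata.json` can be run before the dry-run of the ingestor.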
|
||||
|
||||
It is recommended to use the [ScicatEditor](https://bliven_s.gitpages.psi.ch/SciCatEditor/) for creating metadata files. This is a browser-based tool specifically for ingesting PSI data. Using the tool avoids syntax errors and provides templates for common data sets and options. The finished JSON file can then be downloaded to Merlin or copied into a text editor.
|
||||
|
||||
Another option is to use the SciCat graphical interface from NoMachine. This provides a graphical interface for selecting data to archive. This is particularly useful for data associated with a DUO experiment and p-group. Type `SciCat` to get started after loading the `datacatalog` module. The GUI also replaces the command-line ingestion described below.
|
||||
|
||||
The following steps can be run from wherever you saved your ``metadata.json``. First, perform a "dry-run" which will check the metadata for errors:
|
||||
|
||||
```bash
|
||||
datasetIngestor --token $SCICAT_TOKEN metadata.json
|
||||
```
|
||||
|
||||
It will ask for your PSI credentials and then print some info about the data to be ingested. If there are no errors, proceed to the real ingestion:
|
||||
|
||||
```bash
|
||||
datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json
|
||||
```
|
||||
|
||||
You will be asked whether you want to copy the data to the central system:
|
||||
|
||||
* If you are on the Merlin cluster and you are archiving data from ``/data/user`` or ``/data/project``, answer 'no' since the data catalog can
|
||||
directly read the data.
|
||||
* If you are archiving from a directory other than ``/data/user`` or ``/data/project``, or you are on a desktop computer, answer 'yes'. Copying large datasets
|
||||
to the PSI archive system may take quite a while (minutes to hours).
|
||||
|
||||
If there are no errors, your data has been accepted into the data catalog! From now on, no changes should be made to the ingested data.
|
||||
This is important, since the next step is for the system to copy all the data to the CSCS Petabyte archive. Writing to tape is slow, so
|
||||
this process may take several days, and it will fail if any modifications are detected.
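
Since any change to the files between ingestion and archiving makes the job fail, it may be useful to record checksums before ingesting and to verify them if a problem is suspected. The sketch below uses the standard `sha256sum`, `find`, `sort` and `xargs` tools (GNU versions assumed). The function names and the manifest location are assumptions for illustration and are not part of the Data Catalog tooling.

```bash
# Write a checksum manifest for all regular files below a dataset directory.
# Usage: make_manifest DATASET_DIR MANIFEST_FILE
# The manifest should be stored outside the dataset so that it does not modify it.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Verify the files against a previously written manifest.
# Usage: verify_manifest DATASET_DIR MANIFEST_FILE
# Returns non-zero if any listed file is missing or has changed.
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

Note that this only detects changed or missing files; files added to the directory after the manifest was written are not reported.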
|
||||
|
||||
If using the ``--autoarchive`` option as suggested above, your dataset should now be in the queue. Check the data catalog:
|
||||
[https://discovery.psi.ch](https://discovery.psi.ch). Your job should have status 'WorkInProgress'. You will receive an email when the ingestion
|
||||
is complete.
|
||||
|
||||
If you didn't use ``--autoarchive``, you need to manually move the dataset into the archive queue. From **discovery.psi.ch**, navigate to the 'Archive'
|
||||
tab. You should see the newly ingested dataset. Check the dataset and click **``Archive``**. You should see the status change from **``datasetCreated``** to
|
||||
**``scheduleArchiveJob``**. This indicates that the data is in the process of being transferred to CSCS.
|
||||
|
||||
After a few days the dataset's status will change to **``datasetOnAchive``** indicating the data is stored. At this point it is safe to delete the data.
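
The dry-run and the real ingestion shown earlier in this section can be chained so that the ingestion only starts when the dry-run reports no problems. This is a sketch under two assumptions that should be checked before relying on it: that `SCICAT_TOKEN` is set, and that `datasetIngestor` returns a non-zero exit status when the dry-run finds errors. The function name is hypothetical.

```bash
# Run the dry-run first and start the real ingestion only if it succeeds.
# Usage: ingest_with_dry_run [METADATA_FILE]   (defaults to metadata.json)
ingest_with_dry_run() {
    metadata="${1:-metadata.json}"
    # Dry-run: checks the metadata without ingesting anything
    if ! datasetIngestor --token "$SCICAT_TOKEN" "$metadata"; then
        echo "Dry-run failed; fix the reported errors before ingesting." >&2
        return 1
    fi
    # Real ingestion, queued for archiving right away
    datasetIngestor --token "$SCICAT_TOKEN" --ingest --autoarchive "$metadata"
}
```

The interactive questions asked by the ingestor (such as whether to copy the data) still have to be answered by hand.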
|
||||
|
||||
#### Useful commands
|
||||
|
||||
Running the datasetIngestor in dry mode (**without** ``--ingest``) finds most errors. However, it is sometimes convenient to find potential errors
|
||||
yourself with simple unix commands.
|
||||
|
||||
Find problematic filenames
|
||||
|
||||
```bash
|
||||
find . -iregex '.*/[^/]*[^a-zA-Z0-9_ ./-][^/]*'
|
||||
```
|
||||
|
||||
Find broken links
|
||||
|
||||
```bash
|
||||
find -L . -type l
|
||||
```
|
||||
|
||||
Find outside links
|
||||
|
||||
```bash
|
||||
find . -type l -exec bash -c 'realpath --relative-base "`pwd`" "$0" 2>/dev/null |egrep "^[./]" |sed "s|^|$0 ->|" ' '{}' ';'
|
||||
```
|
||||
|
||||
Delete certain files (use with caution)
|
||||
|
||||
```bash
|
||||
# Empty directories
|
||||
find . -type d -empty -delete
|
||||
# Backup files
|
||||
find . -name '*~' -delete
|
||||
find . -name '*#autosave#' -delete
|
||||
```
|
||||
|
||||
#### Troubleshooting & Known Bugs
|
||||
|
||||
* The following message can be safely ignored:
|
||||
|
||||
```bash
|
||||
key_cert_check_authority: invalid certificate
|
||||
Certificate invalid: name is not a listed principal
|
||||
```
|
||||
It indicates that no Kerberos token was provided for authentication. You can avoid the warning by first running `kinit` (PSI Linux systems).
|
||||
|
||||
* For decentral ingestion cases, the copy step is indicated by a message ``Running [/usr/bin/rsync -e ssh -avxz ...``. It is expected that this
|
||||
step will take a long time and may appear to have hung. You can check what files have been successfully transferred using rsync:
|
||||
|
||||
```bash
|
||||
rsync --list-only user_n@pb-archive.psi.ch:archive/UID/PATH/
|
||||
```
|
||||
|
||||
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.
|
||||
|
||||
* There is currently a limit on the number of files per dataset (technically, the limit is from the total length of all file paths). It is recommended to break up datasets into 300'000 files or fewer.
|
||||
* If it is not possible or desirable to split data between multiple datasets, an alternate work-around is to package files into a tarball. For datasets which are already compressed, omit the -z option for a considerable speedup:
|
||||
|
||||
```
|
||||
tar -cf [output].tar [srcdir]
|
||||
```
|
||||
|
||||
Uncompressed data can be compressed on the cluster using the following command:
|
||||
|
||||
```
|
||||
sbatch /data/software/Slurm/Utilities/Parallel_TarGz.batch -s [srcdir] -t [output].tar -n
|
||||
```
|
||||
|
||||
Run `/data/software/Slurm/Utilities/Parallel_TarGz.batch -h` for more details and options.
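
To see in advance whether a dataset is likely to hit the file limit mentioned above, the files can be counted before ingestion. This is a minimal sketch with hypothetical function names. It only counts files, which is a proxy for the actual limit on the total length of all file paths.

```bash
# Count regular files below a directory (robust against unusual file names).
count_dataset_files() {
    find "$1" -type f -print0 | tr -dc '\0' | wc -c | tr -d ' '
}

# Warn if the count exceeds the recommended maximum (300000 by default).
# Usage: check_file_limit DATASET_DIR [LIMIT]
check_file_limit() {
    n="$(count_dataset_files "$1")"
    limit="${2:-300000}"
    if [ "$n" -gt "$limit" ]; then
        echo "$1: $n files exceeds the recommended limit of $limit" >&2
        return 1
    fi
    echo "$1: $n files"
}
```

If the check fails, the dataset can be split or packaged into a tarball as described above.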
|
||||
|
||||
#### Sample ingestion output (datasetIngestor 1.1.11)
|
||||
<details>
|
||||
<summary>[Show Example]: Sample ingestion output (datasetIngestor 1.1.11)</summary>
|
||||
<pre class="terminal code highlight js-syntax-highlight plaintext" lang="plaintext" markdown="false">
|
||||
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
|
||||
2019/11/06 11:04:43 Latest version: 1.1.11
|
||||
|
||||
|
||||
2019/11/06 11:04:43 Your version of this program is up-to-date
|
||||
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
|
||||
2019/11/06 11:04:43 Your username:
|
||||
user_n
|
||||
2019/11/06 11:04:48 Your password:
|
||||
2019/11/06 11:04:52 User authenticated: XXX
|
||||
2019/11/06 11:04:52 User is member in following a or p groups: XXX
|
||||
2019/11/06 11:04:52 OwnerGroup information a-XXX verified successfully.
|
||||
2019/11/06 11:04:52 contactEmail field added: XXX
|
||||
2019/11/06 11:04:52 Scanning files in dataset /data/project/bio/myproject/archive
|
||||
2019/11/06 11:04:52 No explicit filelistingPath defined - full folder /data/project/bio/myproject/archive is used.
|
||||
2019/11/06 11:04:52 Source Folder: /data/project/bio/myproject/archive at /data/project/bio/myproject/archive
|
||||
2019/11/06 11:04:57 The dataset contains 100000 files with a total size of 50000000000 bytes.
|
||||
2019/11/06 11:04:57 creationTime field added: 2019-07-29 18:47:08 +0200 CEST
|
||||
2019/11/06 11:04:57 endTime field added: 2019-11-06 10:52:17.256033 +0100 CET
|
||||
2019/11/06 11:04:57 license field added: CC BY-SA 4.0
|
||||
2019/11/06 11:04:57 isPublished field added: false
|
||||
2019/11/06 11:04:57 classification field added: IN=medium,AV=low,CO=low
|
||||
2019/11/06 11:04:57 Updated metadata object:
|
||||
{
|
||||
"accessGroups": [
|
||||
"XXX"
|
||||
],
|
||||
"classification": "IN=medium,AV=low,CO=low",
|
||||
"contactEmail": "XXX",
|
||||
"creationLocation": "XXX",
|
||||
"creationTime": "2019-07-29T18:47:08+02:00",
|
||||
"dataFormat": "XXX",
|
||||
"description": "XXX",
|
||||
"endTime": "2019-11-06T10:52:17.256033+01:00",
|
||||
"isPublished": false,
|
||||
"license": "CC BY-SA 4.0",
|
||||
"owner": "XXX",
|
||||
"ownerEmail": "XXX",
|
||||
"ownerGroup": "a-XXX",
|
||||
"principalInvestigator": "XXX",
|
||||
"scientificMetadata": {
|
||||
...
|
||||
},
|
||||
"sourceFolder": "/data/project/bio/myproject/archive",
|
||||
"type": "raw"
|
||||
}
|
||||
2019/11/06 11:04:57 Running [/usr/bin/ssh -l user_n pb-archive.psi.ch test -d /data/project/bio/myproject/archive].
|
||||
key_cert_check_authority: invalid certificate
|
||||
Certificate invalid: name is not a listed principal
|
||||
user_n@pb-archive.psi.ch's password:
|
||||
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
|
||||
The data must first be copied to a rsync cache server.
|
||||
|
||||
|
||||
2019/11/06 11:05:04 Do you want to continue (Y/n)?
|
||||
Y
|
||||
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
|
||||
2019/11/06 11:05:09 The dataset contains 108057 files.
|
||||
2019/11/06 11:05:10 Created file block 0 from file 0 to 1000 with total size of 413229990 bytes
|
||||
2019/11/06 11:05:10 Created file block 1 from file 1000 to 2000 with total size of 416024000 bytes
|
||||
2019/11/06 11:05:10 Created file block 2 from file 2000 to 3000 with total size of 416024000 bytes
|
||||
2019/11/06 11:05:10 Created file block 3 from file 3000 to 4000 with total size of 416024000 bytes
|
||||
...
|
||||
2019/11/06 11:05:26 Created file block 105 from file 105000 to 106000 with total size of 416024000 bytes
|
||||
2019/11/06 11:05:27 Created file block 106 from file 106000 to 107000 with total size of 416024000 bytes
|
||||
2019/11/06 11:05:27 Created file block 107 from file 107000 to 108000 with total size of 850195143 bytes
|
||||
2019/11/06 11:05:27 Created file block 108 from file 108000 to 108057 with total size of 151904903 bytes
|
||||
2019/11/06 11:05:27 short dataset id: 0a9fe316-c9e7-4cc5-8856-e1346dd31e31
|
||||
2019/11/06 11:05:27 Running [/usr/bin/rsync -e ssh -avxz /data/project/bio/myproject/archive/ user_n@pb-archive.psi.ch:archive
|
||||
/0a9fe316-c9e7-4cc5-8856-e1346dd31e31/data/project/bio/myproject/archive].
|
||||
key_cert_check_authority: invalid certificate
|
||||
Certificate invalid: name is not a listed principal
|
||||
user_n@pb-archive.psi.ch's password:
|
||||
Permission denied, please try again.
|
||||
user_n@pb-archive.psi.ch's password:
|
||||
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
|
||||
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
|
||||
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
|
||||
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
|
||||
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
|
||||
...
|
||||
2019/11/06 12:05:08 Successfully updated {"pid":"12.345.67890/12345678-1234-1234-1234-123456789012",...}
|
||||
2019/11/06 12:05:08 Submitting Archive Job for the ingested datasets.
|
||||
2019/11/06 12:05:08 Job response Status: okay
|
||||
2019/11/06 12:05:08 A confirmation email will be sent to XXX
|
||||
12.345.67890/12345678-1234-1234-1234-123456789012
|
||||
</pre>
|
||||
</details>
|
||||
|
||||
### Publishing
|
||||
|
||||
After datasets are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets available on http://doi.psi.ch.
|
||||
|
||||
For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8).
|
||||
|
||||
### Retrieving data
|
||||
|
||||
Retrieving data from the archive is also initiated through the Data Catalog. Please read the ['Retrieve' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-6).
|
||||
|
||||
## Further Information
|
||||
|
||||
* [PSI Data Catalog](https://discovery.psi.ch)
|
||||
* [Full Documentation](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html)
|
||||
* [Published Datasets (doi.psi.ch)](https://doi.psi.ch)
|
||||
* Data Catalog [PSI page](https://www.psi.ch/photon-science-data-services/data-catalog-and-archive)
|
||||
* Data catalog [SciCat Software](https://scicatproject.github.io/)
|
||||
* [FAIR](https://www.nature.com/articles/sdata201618) definition and [SNF Research Policy](http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx#FAIR%20Data%20Principles%20for%20Research%20Data%20Management)
|
||||
* [Petabyte Archive at CSCS](https://www.cscs.ch/fileadmin/user_upload/contents_publications/annual_reports/AR2017_Online.pdf)
|
||||
48
docs/merlin7/02-How-To-Use-Merlin/connect-from-linux.md
Normal file
@@ -0,0 +1,48 @@
|
||||
---
|
||||
title: Connecting from a Linux Client
|
||||
#tags:
|
||||
keywords: linux, connecting, client, configuration, SSH, X11
|
||||
last_updated: 07 September 2022
|
||||
summary: "This document describes a recommended setup for a Linux client."
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/connect-from-linux.html
|
||||
---
|
||||
|
||||
## SSH without X11 Forwarding
|
||||
|
||||
This is the standard method. Official X11 support is provided through [NoMachine](/merlin7/nomachine.html).
|
||||
For normal SSH sessions, use your SSH client as follows:
|
||||
|
||||
```bash
|
||||
ssh $username@login001.merlin7.psi.ch
|
||||
ssh $username@login002.merlin7.psi.ch
|
||||
```
|
||||
|
||||
## SSH with X11 Forwarding
|
||||
|
||||
Official X11 Forwarding support is through NoMachine. Please follow the document
|
||||
[{Job Submission -> Interactive Jobs}](/merlin7/interactive-jobs.html#Requirements) and
|
||||
[{Accessing Merlin -> NoMachine}](/merlin7/nomachine.html) for more details. However,
|
||||
we provide a small recipe for enabling X11 Forwarding in Linux.
|
||||
|
||||
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
|
||||
to implicitly add ``-X`` to all ssh connections:
|
||||
|
||||
```bash
|
||||
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
|
||||
```
|
||||
|
||||
* Alternatively, you can add the option ``-X`` (or ``-Y`` for trusted X11 forwarding) to the ``ssh`` command. For example:
|
||||
|
||||
```bash
|
||||
ssh -X $username@login001.merlin7.psi.ch
|
||||
ssh -X $username@login002.merlin7.psi.ch
|
||||
```
|
||||
|
||||
* For testing that X11 forwarding works, just run ``sview``. An X11-based Slurm view of the cluster should
pop up in your client session:
|
||||
|
||||
```bash
|
||||
sview
|
||||
```
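
As an alternative to enabling X11 forwarding for all SSH connections, the options can be limited to the Merlin7 login nodes with a `Host` block in ``~/.ssh/config``. This is only a sketch: `your_username` is a placeholder, and the choice to restrict forwarding to these two hosts is an assumption about what is desirable, not an official recommendation.

```text
# Hypothetical ~/.ssh/config entry; replace "your_username" with your PSI username
Host login001.merlin7.psi.ch login002.merlin7.psi.ch
    User your_username
    ForwardX11 yes
    ForwardX11Trusted yes
```

With such an entry, `ssh login001.merlin7.psi.ch` forwards X11 without extra options, while connections to other hosts keep the SSH client defaults.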
|
||||
58
docs/merlin7/02-How-To-Use-Merlin/connect-from-macos.md
Normal file
@@ -0,0 +1,58 @@
|
||||
---
|
||||
title: Connecting from a MacOS Client
|
||||
#tags:
|
||||
keywords: MacOS, mac os, mac, connecting, client, configuration, SSH, X11
|
||||
last_updated: 07 September 2022
|
||||
summary: "This document describes a recommended setup for a MacOS client."
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/connect-from-macos.html
|
||||
---
|
||||
|
||||
## SSH without X11 Forwarding
|
||||
|
||||
This is the standard method. Official X11 support is provided through [NoMachine](/merlin7/nomachine.html).
|
||||
For normal SSH sessions, use your SSH client as follows:
|
||||
|
||||
```bash
|
||||
ssh $username@login001.merlin7.psi.ch
|
||||
ssh $username@login002.merlin7.psi.ch
|
||||
```
|
||||
|
||||
## SSH with X11 Forwarding
|
||||
|
||||
### Requirements
|
||||
|
||||
For running SSH with X11 Forwarding in MacOS, one needs to have an X server running in MacOS.
|
||||
The official X Server for MacOS is **[XQuartz](https://www.xquartz.org/)**. Please ensure
|
||||
you have it running before starting an SSH connection with X11 forwarding.
|
||||
|
||||
### SSH with X11 Forwarding in MacOS
|
||||
|
||||
Official X11 support is through NoMachine. Please follow the document
|
||||
[{Job Submission -> Interactive Jobs}](/merlin7/interactive-jobs.html#Requirements) and
|
||||
[{Accessing Merlin -> NoMachine}](/merlin7/nomachine.html) for more details. However,
|
||||
we provide a small recipe for enabling X11 Forwarding in MacOS.
|
||||
|
||||
* Ensure that **[XQuartz](https://www.xquartz.org/)** is installed and running on your MacOS system.
|
||||
|
||||
* For enabling client X11 forwarding, add the following to the start of ``~/.ssh/config``
|
||||
to implicitly add ``-X`` to all ssh connections:
|
||||
|
||||
```bash
|
||||
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
|
||||
```
|
||||
|
||||
* Alternatively, you can add the option ``-X`` (or ``-Y`` for trusted X11 forwarding) to the ``ssh`` command. For example:
|
||||
|
||||
```bash
|
||||
ssh -X $username@login001.merlin7.psi.ch
|
||||
ssh -X $username@login002.merlin7.psi.ch
|
||||
```
|
||||
|
||||
* For testing that X11 forwarding works, just run ``sview``. An X11-based Slurm view of the cluster should
pop up in your client session.
|
||||
|
||||
```bash
|
||||
sview
|
||||
```
|
||||
47
docs/merlin7/02-How-To-Use-Merlin/connect-from-windows.md
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
title: Connecting from a Windows Client
|
||||
keywords: microsoft, mocosoft, windows, putty, xming, connecting, client, configuration, SSH, X11
|
||||
last_updated: 07 September 2022
|
||||
summary: "This document describes a recommended setup for a Windows client."
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/connect-from-windows.html
|
||||
---
|
||||
|
||||
## SSH with PuTTY without X11 Forwarding
|
||||
|
||||
PuTTY is one of the most common tools for SSH.
|
||||
|
||||
Check whether the following software packages are installed on the Windows workstation by
|
||||
inspecting the *Start* menu (hint: use the *Search* box to save time):
|
||||
* PuTTY (should be already installed)
|
||||
* *[Optional]* Xming (needed for [SSH with X11 Forwarding](/merlin7/connect-from-windows.html#ssh-with-x11-forwarding))
|
||||
|
||||
If they are missing, you can install them using the Software Kiosk icon on the Desktop.
|
||||
|
||||
1. Start PuTTY
|
||||
|
||||
2. *[Optional]* Enable ``xterm`` to have similar mouse behaviour as in Linux:
|
||||
|
||||

|
||||
|
||||
3. Create a session to a Merlin login node and *Open*:
|
||||
|
||||

|
||||
|
||||
|
||||
## SSH with PuTTY with X11 Forwarding
|
||||
|
||||
Official X11 Forwarding support is through NoMachine. Please follow the document
|
||||
[{Job Submission -> Interactive Jobs}](/merlin7/interactive-jobs.html#Requirements) and
|
||||
[{Accessing Merlin -> NoMachine}](/merlin7/nomachine.html) for more details. However,
|
||||
we provide a small recipe for enabling X11 Forwarding in Windows.
|
||||
|
||||
Check whether **Xming** is installed on the Windows workstation by inspecting the
|
||||
*Start* menu (hint: use the *Search* box to save time). If missing, you can install it by
|
||||
using the Software Kiosk icon (should be located on the Desktop).
|
||||
|
||||
1. Ensure that an X server (**Xming**) is running. Otherwise, start it.
|
||||
|
||||
2. Enable X11 Forwarding in your SSH client. For example, for PuTTY:
|
||||
|
||||

|
||||
230
docs/merlin7/02-How-To-Use-Merlin/kerberos.md
Normal file
@@ -0,0 +1,230 @@
|
||||
---
|
||||
title: Kerberos and AFS authentication
|
||||
#tags:
|
||||
keywords: kerberos, AFS, kinit, klist, keytab, tickets, connecting, client, configuration, slurm
|
||||
last_updated: 07 September 2022
|
||||
summary: "This document describes how to use Kerberos."
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/kerberos.html
|
||||
---
|
||||
|
||||
Projects and users have their own areas in the central PSI AFS service. In order
|
||||
to access these areas, valid Kerberos and AFS tickets must be granted.
|
||||
|
||||
These tickets are automatically granted when accessing through SSH with
|
||||
username and password. Alternatively, one can get a granting ticket with the `kinit` (Kerberos)
|
||||
and `aklog` (AFS ticket, which needs to be run after `kinit`) commands.
|
||||
|
||||
Due to PSI security policies, the maximum lifetime of a ticket is 7 days, and the default
lifetime is 10 hours. This means that one needs to regularly renew (`krenew` command) the existing
granting tickets, and that their validity cannot be extended beyond 7 days. Once this limit is reached,
one needs to obtain new granting tickets.
|
||||
|
||||
## Obtaining granting tickets with username and password
|
||||
|
||||
As already described above, the most common use case is to obtain Kerberos and AFS granting tickets
|
||||
by entering username and password:
|
||||
|
||||
* When logging in to Merlin through SSH, if this is done with username + password authentication,
|
||||
tickets for Kerberos and AFS will be automatically obtained.
|
||||
* When logging in to Merlin through NoMachine, no Kerberos and AFS tickets are granted. Therefore, users need to
|
||||
run `kinit` (to obtain a granting Kerberos ticket) followed by `aklog` (to obtain a granting AFS ticket).
|
||||
See further details below.
|
||||
|
||||
To manually obtain granting tickets, one has to:
|
||||
|
||||
1. To obtain a granting Kerberos ticket, one needs to run `kinit $USER` and enter the PSI password.
|
||||
|
||||
```bash
|
||||
kinit $USER@D.PSI.CH
|
||||
```
|
||||
|
||||
2. To obtain a granting ticket for AFS, one needs to run `aklog`. No password is necessary, but a valid
|
||||
Kerberos ticket is mandatory.
|
||||
|
||||
```bash
|
||||
aklog
|
||||
```
|
||||
|
||||
3. To list the status of your granted tickets, users can use the `klist` command.
|
||||
|
||||
```bash
|
||||
klist
|
||||
```
|
||||
|
||||
4. To extend the validity of existing granting tickets, users can use the `krenew` command.
|
||||
|
||||
```bash
|
||||
krenew
|
||||
```
|
||||
|
||||
* Keep in mind that the maximum lifetime for granting tickets is 7 days, therefore `krenew` can not be used beyond that limit,
|
||||
and then `kinit` should be used instead.
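
The manual steps above can be combined into a small helper that only asks for the password when no valid ticket is present. This is a sketch, with a hypothetical function name, that relies on `klist -s`, which runs silently and reports through its exit status whether a valid credentials cache exists.

```bash
# Obtain Kerberos and AFS tickets only when no valid Kerberos ticket is present.
ensure_tickets() {
    if klist -s; then
        echo "Valid Kerberos ticket found."
    else
        # Prompts for the PSI password
        kinit "${USER}@D.PSI.CH" || return 1
    fi
    # (Re)obtain the AFS ticket from the Kerberos ticket
    aklog
}
```

Calling `ensure_tickets` at the start of a NoMachine session, for example, avoids running `kinit` when the existing ticket is still valid.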
|
||||
|
||||
## Obtaining granting tickets with keytab
|
||||
|
||||
Sometimes, obtaining granting tickets by using password authentication is not possible. One example is user Slurm jobs
|
||||
requiring access to private areas in AFS. For that, there's the possibility to generate a **keytab** file.
|
||||
|
||||
Be aware that the **keytab** file must be **private**, **fully protected** by correct permissions and not shared with any
|
||||
other users.
|
||||
|
||||
### Creating a keytab file
|
||||
|
||||
For generating a **keytab**, one has to:
|
||||
|
||||
1. Create a private directory for storing the Kerberos **keytab** file
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.k5
|
||||
```
|
||||
|
||||
2. Run the `ktutil` utility:
|
||||
|
||||
```bash
|
||||
ktutil
|
||||
```
|
||||
|
||||
3. In the `ktutil` console, one has to generate a **keytab** file as follows:
|
||||
|
||||
```bash
|
||||
# Replace $USER by your username
|
||||
add_entry -password -k 0 -f -p $USER
|
||||
wkt /data/user/$USER/.k5/krb5.keytab
|
||||
exit
|
||||
```
|
||||
|
||||
Please note:
|
||||
* You will need to enter your password once. This step is required for generating the **keytab** file.
|
||||
* `ktutil` does **not** report an error if you enter a wrong password! You can test with the `kinit` command documented below. If `kinit` fails with an error message like "pre-authentication failed", this is usually due to a wrong password/key in the keytab file. In this case **you have to remove the keytab file** and re-run the `ktutil` command. See "Updating an existing keytab file" in the section below.
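
Since the **keytab** file must be readable only by its owner, the permissions of the directory and the file can be tightened explicitly after creating it. This is a minimal sketch; the function name is hypothetical and the default path assumes the `~/.k5/krb5.keytab` location used above.

```bash
# Restrict the keytab directory and file to the owner only.
# Usage: secure_keytab [KEYTAB_PATH]   (defaults to ~/.k5/krb5.keytab)
secure_keytab() {
    keytab="${1:-$HOME/.k5/krb5.keytab}"
    # 700: only the owner may enter the directory; 600: only the owner may read the file
    chmod 700 "$(dirname "$keytab")" && chmod 600 "$keytab"
}
```

Running `ls -ld ~/.k5 ~/.k5/krb5.keytab` afterwards shows whether the permissions are as expected.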
|
||||
|
||||
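Since the keytab must stay private, it can help to tighten its permissions right after creating it. The following is a minimal sketch, assuming the keytab lives on a POSIX file system (such as `/data/user`) and that GNU `stat` is available; `secure_keytab` is a hypothetical helper name used only for illustration.

```bash
# Hypothetical helper: restrict the keytab and its directory to the owner.
secure_keytab() {
    local keytab="${1:-$HOME/.k5/krb5.keytab}"
    local dir
    dir="$(dirname "$keytab")"
    [ -f "$keytab" ] || { echo "no keytab at $keytab" >&2; return 1; }
    chmod 700 "$dir"      # only the owner may enter the directory
    chmod 600 "$keytab"   # only the owner may read or write the keytab
    # GNU stat: print the octal permissions for a quick visual check.
    stat -c '%a %n' "$dir" "$keytab"
}
```

Running `secure_keytab` without arguments would act on `~/.k5/krb5.keytab`, the location used in the steps above.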
### Updating an existing keytab file
|
||||
|
||||
After a password change you have to update your **keytab**:
|
||||
|
||||
1. Remove the old **keytab** file
|
||||
|
||||
```bash
|
||||
rm -f ~/.k5/krb5.keytab
|
||||
```
|
||||
|
||||
2. Run the `ktutil` utility:
|
||||
|
||||
```bash
|
||||
ktutil
|
||||
```
|
||||
|
||||
3. In the `ktutil` console, one has to generate a **keytab** file as follows:
|
||||
|
||||
```bash
|
||||
# Replace $USER by your username
|
||||
add_entry -password -k 0 -f -p $USER
|
||||
wkt /data/user/$USER/.k5/krb5.keytab
|
||||
exit
|
||||
```
|
||||
|
||||
### Obtaining tickets by using keytab files
|
||||
|
||||
Once the keytab is created, one can obtain kerberos tickets without being prompted for a password as follows:
|
||||
|
||||
```bash
|
||||
kinit -kt ~/.k5/krb5.keytab $USER
|
||||
aklog
|
||||
```
|
||||
|
||||
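Because `ktutil` accepts a wrong password silently, it may be worth checking a new keytab before relying on it in jobs. The sketch below is one possible way to do that, not an official procedure: `verify_keytab` is a hypothetical helper, and it authenticates into a throw-away credential cache so that any tickets you already hold are left untouched.

```bash
# Hypothetical helper: return 0 if the keytab yields a ticket, non-zero otherwise.
verify_keytab() {
    local keytab="${1:-$HOME/.k5/krb5.keytab}"
    local principal="${2:-$USER}"
    local cache
    cache="$(mktemp)" || return 2
    # Point kinit at a temporary cache for this one command only.
    if KRB5CCNAME="FILE:$cache" kinit -kt "$keytab" "$principal" 2>/dev/null; then
        echo "keytab OK for $principal"
        rm -f "$cache"
        return 0
    fi
    echo "keytab rejected for $principal: remove it and re-run ktutil" >&2
    rm -f "$cache"
    return 1
}
```

A non-zero return here corresponds to the "pre-authentication failed" situation described earlier, in which the keytab has to be removed and regenerated.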
## Slurm jobs accessing AFS
|
||||
|
||||
Some jobs may require to access private areas in AFS. For that, having a valid [**keytab**](/merlin7/kerberos.html#generating-granting-tickets-with-keytab) file is required.
|
||||
Then, from inside the batch script one can obtain granting tickets for Kerberos and AFS, which can be used for accessing AFS private areas.
|
||||
|
||||
The steps should be the following:
|
||||
|
||||
* Set up `KRB5CCNAME`, which specifies the location of the Kerberos5 credentials (ticket) cache. In general it should point to a shared area
(`$HOME/.k5` is a good location), and it is strongly recommended to generate an independent Kerberos5 credential cache (that is, to create a new credential cache per Slurm job):
|
||||
|
||||
```bash
|
||||
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
|
||||
```
|
||||
|
||||
* To obtain a Kerberos5 granting ticket, run `kinit` by using your keytab:
|
||||
|
||||
```bash
|
||||
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
|
||||
```
|
||||
|
||||
* To obtain a granting AFS ticket, run `aklog`:
|
||||
|
||||
```bash
|
||||
aklog
|
||||
```
|
||||
|
||||
* At the end of the job, you can destroy the existing Kerberos tickets:
|
||||
|
||||
```bash
|
||||
kdestroy
|
||||
```
|
||||
|
||||
### Slurm batch script example: obtaining KRB+AFS granting tickets
|
||||
|
||||
#### Example 1: Independent credential cache per Slurm job

This is the **recommended** way. At the end of the job, it is strongly recommended to destroy the existing Kerberos tickets.
|
||||
|
||||
```bash
#!/bin/bash
#SBATCH --partition=hourly     # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=01:00:00        # Strongly recommended when using the 'general' partition
#SBATCH --output=run.out       # Generate custom output file
#SBATCH --error=run.err        # Generate custom error file
#SBATCH --nodes=1              # Number of nodes to use
#SBATCH --ntasks=1             # Number of tasks to use
#SBATCH --cpus-per-task=1
#SBATCH --constraint=xeon-gold-6152
#SBATCH --hint=nomultithread
#SBATCH --job-name=krb5

# Create an independent credential cache for this job only
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
aklog
klist

echo "Here should go my batch script code."

# Destroy Kerberos tickets created for this job only
kdestroy
klist
```
|
||||
|
||||
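If the job body fails before reaching `kdestroy`, the per-job cache would be left behind. One way to guard against that is a shell `trap`, sketched below. The function names `setup_job_credentials` and `cleanup_job_credentials` are hypothetical and only illustrate the pattern; the Kerberos commands and paths are the ones used in Example 1.

```bash
# Hypothetical helpers for a batch script; the names are illustrative only.
cleanup_job_credentials() {
    kdestroy 2>/dev/null || true   # drop the tickets held in $KRB5CCNAME
    rm -f "$KRB5CCNAME"            # remove the cache file if it is still there
}

setup_job_credentials() {
    mkdir -p "$HOME/.k5"
    KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")" || return 1
    export KRB5CCNAME
    # Run the cleanup whenever the script exits, also after a failing command.
    trap cleanup_job_credentials EXIT
    kinit -kt "$HOME/.k5/krb5.keytab" "$USER@D.PSI.CH" && aklog
}
```

In a batch script, `setup_job_credentials` would be called right after the `#SBATCH` header, and the job body would then need no explicit `kdestroy` at the end.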
#### Example 2: Shared credential cache
|
||||
|
||||
Some users may need or prefer to run with a shared cache file. To do so, set up
`KRB5CCNAME` in the **login node** session before submitting the job.
|
||||
|
||||
```bash
|
||||
export KRB5CCNAME="$(mktemp "$HOME/.k5/krb5cc_XXXXXX")"
|
||||
```
|
||||
|
||||
Then you can run one or more job scripts (or a parallel job with `srun`). `KRB5CCNAME` is propagated to the
job script or to the parallel job, so a single credential cache is shared among the different Slurm runs.
|
||||
|
||||
```bash
#!/bin/bash
#SBATCH --partition=hourly     # Specify 'general' or 'daily' or 'hourly'
#SBATCH --time=01:00:00        # Strongly recommended when using the 'general' partition
#SBATCH --output=run.out       # Generate custom output file
#SBATCH --error=run.err        # Generate custom error file
#SBATCH --nodes=1              # Number of nodes to use
#SBATCH --ntasks=1             # Number of tasks to use
#SBATCH --cpus-per-task=1
#SBATCH --constraint=xeon-gold-6152
#SBATCH --hint=nomultithread
#SBATCH --job-name=krb5

# KRB5CCNAME is inherited from the login node session
kinit -kt "$HOME/.k5/krb5.keytab" $USER@D.PSI.CH
srun aklog

echo "Here should go my batch script code."

echo "No need to run 'kdestroy', as it may have to survive for running other jobs"
```
|
||||
109
docs/merlin7/02-How-To-Use-Merlin/merlin-rmount.md
Normal file
@@ -0,0 +1,109 @@
|
||||
---
|
||||
title: Using merlin_rmount
|
||||
#tags:
|
||||
keywords: >-
|
||||
transferring data, data transfer, rsync, dav, webdav, sftp, ftp, smb, cifs,
|
||||
copy data, copying, mount, file, folder, sharing
|
||||
last_updated: 24 August 2023
|
||||
#summary: ""
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/merlin-rmount.html
|
||||
---
|
||||
|
||||
## Background
|
||||
|
||||
Merlin provides a command for mounting remote file systems, called `merlin_rmount`. It is a
helpful wrapper over the GNOME storage utilities (GIO and GVFS) and supports a wide range of remote file systems and protocols, including

- SMB/CIFS (Windows shared folders)
- WebDAV
- AFP
- FTP, SFTP
- [complete list](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
|
||||
|
||||
|
||||
## Usage
|
||||
|
||||
|
||||
### Start a session
|
||||
|
||||
First, start a new session. This will start a new bash shell in the current terminal where you can add further commands.
|
||||
|
||||
```
|
||||
$ merlin_rmount --init
|
||||
[INFO] Starting new D-Bus RMOUNT session
|
||||
|
||||
(RMOUNT STARTED) [bliven_s@login002 ~]$
|
||||
```
|
||||
|
||||
Note that behind the scenes this is creating a new dbus daemon. Running multiple daemons on the same login node leads to unpredictable results, so it is best not to initialize multiple sessions in parallel.
|
||||
|
||||
### Standard Endpoints
|
||||
|
||||
Standard endpoints can be mounted using
|
||||
|
||||
```
|
||||
merlin_rmount --select-mount
|
||||
```
|
||||
|
||||
Select the desired url using the arrow keys.
|
||||
|
||||

|
||||
|
||||
From this list any of the standard supported endpoints can be mounted.
|
||||
|
||||
### Other endpoints
|
||||
|
||||
Other endpoints can be mounted using the `merlin_rmount --mount <endpoint>` command.
|
||||
|
||||

|
||||
|
||||
|
||||
### Accessing Files
|
||||
|
||||
After mounting a volume the script will print the mountpoint. It should be of the form
|
||||
|
||||
```
|
||||
/run/user/$UID/gvfs/<endpoint>
|
||||
```
|
||||
|
||||
where `$UID` gives your unix user id (a 5-digit number, also viewable with `id -u`) and
|
||||
`<endpoint>` is some string generated from the mount options.
|
||||
|
||||
For convenience, it may be useful to add a symbolic link to this gvfs directory. For instance, this would allow all volumes to be accessed in `~/mnt/`:

```
ln -s /run/user/$UID/gvfs ~/mnt
```
|
||||
|
||||
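A slightly more defensive variant of the symbolic link step is sketched here. `link_gvfs` is a hypothetical helper name; the sketch only assumes standard `ln` and `id`, and it refuses to overwrite anything at the link location that is not already a symbolic link.

```bash
# Hypothetical helper: (re)create ~/mnt as a symlink to the gvfs directory.
link_gvfs() {
    local target="${1:-/run/user/$(id -u)/gvfs}"
    local link="${2:-$HOME/mnt}"
    # Never replace a real file or directory by accident.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace non-symlink: $link" >&2
        return 1
    fi
    # -s: symbolic, -f: replace an existing link, -n: do not follow it
    ln -sfn "$target" "$link"
}
```

Note the argument order of `ln -s`: the existing directory comes first and the name of the new link second.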
Files are accessible as long as the `merlin_rmount` shell remains open.
|
||||
|
||||
|
||||
### Disconnecting
|
||||
|
||||
To disconnect, close the session with one of the following:
|
||||
|
||||
- The exit command
|
||||
- CTRL-D
|
||||
- Closing the terminal
|
||||
|
||||
Disconnecting will unmount all volumes.
|
||||
|
||||
|
||||
## Alternatives
|
||||
|
||||
### Thunar
|
||||
|
||||
Users who prefer a GUI file browser can use the `thunar` command, which opens the Thunar file manager. This is also available in NoMachine sessions in the bottom bar (1). Thunar supports the same remote filesystems as `merlin_rmount`; just type the URL in the address bar (2).
|
||||
|
||||

|
||||
|
||||
When using thunar within a NoMachine session, file transfers continue after closing NoMachine (as long as the NoMachine session stays active).
|
||||
|
||||
Files can also be accessed at the command line as needed (see 'Accessing Files' above).
|
||||
|
||||
## Resources
|
||||
|
||||
- [BIO docs](https://intranet.psi.ch/en/bio/webdav-data) on using these tools for
  transferring EM data
- [Red Hat docs on GVFS](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8)
|
||||
- [gio reference](https://developer-old.gnome.org/gio/stable/gio.html)
|
||||
108
docs/merlin7/02-How-To-Use-Merlin/merlin_tools.md
Normal file
@@ -0,0 +1,108 @@
|
||||
---
|
||||
title: Merlin7 Tools
|
||||
#tags:
|
||||
keywords: merlin_quotas
|
||||
#last_updated: 07 September 2022
|
||||
#summary: ""
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/tools.html
|
||||
---
|
||||
|
||||
## About
|
||||
|
||||
We provide tools to help users get the most out of the cluster. The tools
described here are organised by use case and include usage examples.
|
||||
|
||||
## Files and Directories
|
||||
|
||||
### `merlin_quotas`
|
||||
|
||||
This tool is available on all of the login nodes and provides a brief overview of
|
||||
a user's filesystem quotas. These are limits which restrict how much storage (or
|
||||
number of files) a user can create. A generic table of filesystem quotas can be
|
||||
found on the [Storage page](/merlin7/storage.html#dir_classes).
|
||||
|
||||
#### Example #1: Viewing quotas
|
||||
|
||||
Simply calling `merlin_quotas` will show you a table of your quotas:
|
||||
|
||||
```console
|
||||
$ merlin_quotas
|
||||
Path SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
|
||||
-------------- --------- ---------- ------- --------- ---------- -------
|
||||
/data/user 30.26G 1T 03% 367296 2097152 18%
|
||||
└─ <USERNAME>
|
||||
/afs/psi.ch 3.4G 9.5G 36% 0 0 00%
|
||||
└─ user/<USERDIR>
|
||||
/data/project 2.457T 10T 25% 58 2097152 00%
|
||||
└─ bio/shared
|
||||
/data/project 338.3G 10T 03% 199391 2097152 10%
|
||||
└─ bio/hpce
|
||||
```
|
||||
|
||||
{{site.data.alerts.tip}}You can change the width of the table by either passing
|
||||
<code>--no-wrap</code> (to disable wrapping of the <i>Path</i>) or <code>--width N</code>
|
||||
(to explicitly set some width by <code>N</code> characters).
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
#### Example #2: Project view
|
||||
|
||||
The tool can also be used to list which project directories
exist and who owns/manages them:
|
||||
|
||||
```console
|
||||
$ merlin_quotas projects
|
||||
Project ID Path Owner Group
|
||||
---------- ------------------------ --------- --------------
|
||||
600000000 /data/project/bio/shared germann_e unx-merlin_adm
|
||||
600000001 /data/project/bio/hpce assman_g unx-merlin_adm
|
||||
```
|
||||
|
||||
By default this only shows information on projects that you have access to. To
view the whole list, pass the `--all` flag:
|
||||
|
||||
```console
|
||||
$ merlin_quotas projects --all
|
||||
Project ID Path Owner Group
|
||||
---------- ------------------------------- -------------- -----------------
|
||||
500000000 /data/project/general/mcnp gac-mcnp unx-mcnp_all
|
||||
500000001 /data/project/general/vis_as talanov_v unx-vis_as
|
||||
500000002 /data/project/general/mmm krack org-7302
|
||||
500000003 /data/project/general laeuch_a org-7201
|
||||
└─ LTC_CompPhys
|
||||
600000000 /data/project/bio/shared germann_e unx-merlin_adm
|
||||
600000001 /data/project/bio/hpce assman_g unx-merlin_adm
|
||||
600000002 /data/project/bio/abrahams abrahams_j unx-bio_abrahams
|
||||
600000003 /data/project/bio/benoit benoit_r unx-bio_benoit
|
||||
600000004 /data/project/bio/ishikawa ishikawa unx-bio_ishikawa
|
||||
600000005 /data/project/bio/kammerer kammerer_r unx-bio_kammerer
|
||||
600000006 /data/project/bio/korkhov korkhov_v unx-bio_korkhov
|
||||
600000007 /data/project/bio/luo luo_j unx-bio_luo
|
||||
600000008 /data/project/bio/mueller mueller_e unx-bio_mueller
|
||||
600000009 /data/project/bio/poghosyan poghosyan_e unx-bio_poghosyan
|
||||
600000010 /data/project/bio/schertler schertler_g unx-bio_schertler
|
||||
600000011 /data/project/bio/shivashankar shivashankar_g unx-bio_shivashan
|
||||
600000012 /data/project/bio/standfuss standfuss unx-bio_standfuss
|
||||
600000013 /data/project/bio/steinmetz steinmetz unx-bio_steinmetz
|
||||
```
|
||||
|
||||
{{site.data.alerts.tip}}As above, you can change the table width by passing either
|
||||
<code>--no-wrap</code> or <code>--width N</code>.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
#### Example #3: Project config
|
||||
|
||||
To make tracking quotas of projects easier, `merlin_quotas` generates a config
file in your home directory (`~/.merlin_quotas`) which defines the projects to
show when you call the tool.

The config file simply contains a list (one per line) of project IDs which should
be tracked. In theory any (or all) of the available projects can be tracked, but due to
UNIX and Lustre permissions, accessing quota information for a project you're not
a member of **is not possible**.

If you are added to or removed from a project, you can update this config file by
calling `merlin_quotas genconf --force` (notice the `--force`, which will overwrite
your existing config file) or by editing the file by hand (*not recommended*).
|
||||
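To check whether a given project is currently tracked without editing the file, a read-only lookup is enough. The sketch below assumes the config format is exactly as described (one project ID per line, nothing else); `is_project_tracked` is a hypothetical helper name and not part of `merlin_quotas`.

```bash
# Hypothetical helper: succeed if the given project ID is listed in the config.
# Assumes one project ID per line, as described for ~/.merlin_quotas.
is_project_tracked() {
    local id="$1"
    local conf="${2:-$HOME/.merlin_quotas}"
    [ -r "$conf" ] || return 2            # config missing or unreadable
    # -x: match the whole line, -F: treat the ID as a fixed string
    grep -qxF -- "$id" "$conf"
}
```

For example, `is_project_tracked 600000000 && echo tracked` would print `tracked` only if that ID appears on a line of its own in the config file.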
|
||||
|
||||
147
docs/merlin7/02-How-To-Use-Merlin/nomachine.md
Normal file
@@ -0,0 +1,147 @@
|
||||
---
|
||||
title: Remote Desktop Access to Merlin7
|
||||
keywords: NX, NoMachine, remote desktop access, login node, login001, login002, merlin7-nx-01, merlin7-nx, nx.psi.ch, VPN, browser access
|
||||
last_updated: 07 August 2024
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/nomachine.html
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Merlin7 NoMachine provides users with remote desktop access to the Merlin7 computing environment. This service enables users to connect to their computing resources from any location, whether they are inside the PSI network or accessing from outside via secure methods.
|
||||
|
||||
## Accessing Merlin7 NoMachine
|
||||
|
||||
### From Inside PSI
|
||||
|
||||
If you are inside the PSI network, you can directly connect to the Merlin7 NoMachine service without the need to go through another service.
|
||||
|
||||
1. **Ensure Network Connectivity**: Make sure you are connected to the PSI internal network.
|
||||
2. **Choose Your Access Method**: You can access Merlin7 using either a web browser or the NoMachine client.
|
||||
|
||||
#### Method 1: Using a Web Browser
|
||||
|
||||
Open your web browser and navigate to [https://merlin7-nx.psi.ch:4443](https://merlin7-nx.psi.ch:4443).
|
||||
|
||||
#### Method 2: Using the NoMachine Client
|
||||
|
||||
Settings for the NoMachine client:
|
||||
|
||||
- **Host**: `merlin7-nx.psi.ch`
|
||||
- **Port**: `4000`
|
||||
- **Protocol**: `NX`
|
||||
- **Authentication**: `Use password authentication`
|
||||
|
||||
### From Outside PSI
|
||||
|
||||
Users outside the PSI network have two options for accessing the Merlin7 NoMachine service: through `nx.psi.ch` or via a VPN connection.
|
||||
|
||||
#### Option 1: Via `nx.psi.ch`
|
||||
|
||||
Documentation about the `nx.psi.ch` service can be found [here](https://www.psi.ch/en/photon-science-data-services/remote-desktop-nomachine).
|
||||
|
||||
##### Using a Web Browser
|
||||
|
||||
Open your web browser and navigate to [https://nx.psi.ch](https://nx.psi.ch).
|
||||
|
||||
##### Using the NoMachine Client
|
||||
|
||||
Settings for the NoMachine client:
|
||||
|
||||
- **Host**: `nx.psi.ch`
|
||||
- **Port**: `4000`
|
||||
- **Protocol**: `NX`
|
||||
- **Authentication**: `Use password authentication`
|
||||
|
||||
#### Option 2: Via VPN
|
||||
|
||||
Alternatively, you can use a VPN connection to access Merlin7 as if you were inside the PSI network.
|
||||
|
||||
1. **Request VPN Access**: Contact the IT department to request VPN access if you do not already have it. Submit a request through the PSI Service Now ticketing system: [VPN Access (PSI employees)](https://psi.service-now.com/psisp?id=psi_new_sc_cat_item&sys_id=beccc01b6f44a200d02a82eeae3ee440).
|
||||
2. **Connect to the VPN**: Once access is granted, connect to the PSI VPN using your credentials.
|
||||
3. **Access Merlin7 NoMachine**: Once connected to the VPN, you can access Merlin7 using either a web browser or the NoMachine client as if you were inside the PSI network.
|
||||
|
||||
## The NoMachine Client
|
||||
|
||||
### Installation
|
||||
|
||||
#### Windows
|
||||
|
||||
The NoMachine client is available for PSI Windows computers in the Software Kiosk under the name **NX Client**.
|
||||
|
||||
#### macOS and Linux
|
||||
|
||||
The NoMachine client can be downloaded from [NoMachine's download page](https://downloads.nomachine.com).
|
||||
|
||||
### Connection Configuration
|
||||
|
||||
1. **Launch NoMachine Client**: Open the NoMachine client on your computer.
|
||||
2. **Create a New Connection**: Click the **Add** button to create a new connection.
|
||||
- On the **Address** tab configure:
|
||||
- **Name**: Enter a name for your connection. This can be anything.
|
||||
- **Host**: Enter the appropriate hostname (e.g. `merlin7-nx.psi.ch`).
|
||||
- **Port**: Enter `4000`.
|
||||
- **Protocol**: Select `NX`.
|
||||
|
||||

|
||||
|
||||
- On the **Configuration** tab ensure:
|
||||
- **Authentication**: Select `Use password authentication`.
|
||||
|
||||

|
||||
|
||||
- Click the **Add** button to finish creating the new connection.
|
||||
|
||||
## Authenticating
|
||||
|
||||
When prompted, use your PSI credentials to authenticate.
|
||||
|
||||

|
||||
|
||||
## Managing Sessions
|
||||
|
||||
The Merlin7 NoMachine service is managed through a front-end server and back-end nodes, facilitating balanced and efficient access to remote desktop sessions.
|
||||
|
||||
### Architecture Overview
|
||||
|
||||
- **Front-End Server**: `merlin7-nx.psi.ch`
|
||||
- Serves as the entry point for users connecting to the NoMachine service.
|
||||
- Handles load-balancing and directs users to available back-end nodes.
|
||||
|
||||
- **Back-End Nodes**:
|
||||
- `login001.merlin7.psi.ch`
|
||||
- `login002.merlin7.psi.ch`
|
||||
- These nodes host the NoMachine desktop service and manage the individual desktop sessions.
|
||||
|
||||
Access to the login node desktops must be initiated through the `merlin7-nx.psi.ch` front-end. The front-end service will distribute sessions across available nodes in the back-end, ensuring optimal resource usage.
|
||||
|
||||
### Opening NoMachine Desktop Sessions
|
||||
|
||||
When connecting to the `merlin7-nx.psi.ch` front-end, a new session automatically opens if no existing session is found. Users can manage their sessions as follows:
|
||||
|
||||
- **Reconnect to an Existing Session**: If you have an active session, you can reconnect to it by selecting the appropriate icon in the NoMachine client interface. This allows you to resume work without losing any progress.
|
||||

|
||||
- **Create a Second Session**: If you require a separate session, you can select the **`New Desktop`** button. This option creates a second session on another login node, provided the node is available and operational.
|
||||
|
||||
### Session Management Considerations
|
||||
|
||||
- **Load Balancing**: The front-end service ensures that sessions are evenly distributed across the available back-end nodes to optimize performance and resource utilization.
|
||||
- **Session Limits**: Users are limited to one session per back-end node to maintain system stability and efficiency.
|
||||
|
||||
## Support and Resources
|
||||
|
||||
If you encounter any issues or need further assistance with the Merlin7 NoMachine service, support is available via email. Please contact us at [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists.psi.ch), and our support team will be happy to assist you.
|
||||
|
||||
### Advanced Display Settings
|
||||
|
||||
NoMachine provides several options to optimize the display settings for better performance and clarity. These settings can be accessed and adjusted when creating a new session or by clicking the top right corner of a running session.
|
||||
|
||||
#### Prevent Rescaling
|
||||
|
||||
Preventing rescaling can help eliminate "blurriness" in your display, though it may affect performance. Adjust these settings based on your performance needs:
|
||||
|
||||
- Display: Choose `Resize remote display` (forces 1:1 pixel sizes)
|
||||
- Display > Change settings > Quality: Choose medium-best quality
|
||||
- Display > Change settings > Modify the advanced display settings
|
||||
- Check: Disable network-adaptive display quality (turns off lossy compression)
|
||||
- Check: Disable client side image post-processing
|
||||
50
docs/merlin7/02-How-To-Use-Merlin/software-repositories.md
Normal file
@@ -0,0 +1,50 @@
|
||||
---
|
||||
title: Software repositories
|
||||
#tags:
|
||||
keywords: modules, software, stable, unstable, deprecated, spack, repository, repositories
|
||||
last_updated: 16 January 2024
|
||||
summary: "This page contains information about the different software repositories"
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/software-repositories.html
|
||||
---
|
||||
|
||||
## Module Systems in Merlin7
|
||||
|
||||
Merlin7 provides a modular environment to ensure flexibility, compatibility, and optimized performance.
|
||||
The system supports three primary module types: PSI Environment Modules (PModules), Spack Modules, and Cray Environment Modules.
|
||||
|
||||
### PSI Environment Modules (PModules)
|
||||
|
||||
The PModules system, developed by PSI, is the officially supported module system on Merlin7. It is the preferred choice for accessing validated software across a wide range of applications.
|
||||
|
||||
Key Features:
|
||||
* **Expert Deployment:** Each package is deployed and maintained by specific experts to ensure reliability and compatibility.
|
||||
* **Broad Availability:** Commonly used software, such as OpenMPI, ANSYS, MATLAB, and others, is provided within PModules.
|
||||
* **Custom Requests:** If a package, version, or feature is missing, users can contact the support team to explore feasibility for installation.
|
||||
|
||||
{{site.data.alerts.tip}}
|
||||
For further information about <b>Pmodules</b> on Merlin7 please refer to the <b><a href="/merlin7/pmodules.html">PSI Modules</a></b> chapter.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
### Spack Modules
|
||||
|
||||
Merlin7 also provides Spack modules, offering a modern and flexible package management system. Spack supports a wide variety of software packages and versions. For more information, refer to the **external [PSI Spack](https://gitea.psi.ch/HPCE/spack-psi) documentation**.
|
||||
|
||||
{{site.data.alerts.tip}}
|
||||
For further information about <b>Spack</b> on Merlin7 please refer to the <b><a href="/merlin7/spack.html">Spack</a></b> chapter.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
### Cray Environment Modules
|
||||
|
||||
Merlin7 also supports Cray Environment Modules, which include compilers, MPI implementations, and libraries optimized
|
||||
for Cray systems. However, Cray modules are not recommended as the default choice due to potential backward compatibility
|
||||
issues when the Cray Programming Environment (CPE) is upgraded to a newer version.
|
||||
|
||||
Recommendations:
|
||||
* **Compiling Software:** Cray modules can be used when optimization for Cray hardware is essential.
|
||||
* **General Use:** For most applications, prefer PModules, which ensure stability, backward compatibility, and long-term support.
|
||||
|
||||
{{site.data.alerts.tip}}
|
||||
For further information about <b>CPE</b> on Merlin7 please refer to the <b><a href="/merlin7/cray-module-env.html">Cray Modules</a></b> chapter.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
184
docs/merlin7/02-How-To-Use-Merlin/ssh-keys.md
Normal file
@@ -0,0 +1,184 @@
|
||||
---
|
||||
title: Configuring SSH Keys in Merlin
|
||||
|
||||
#tags:
|
||||
keywords: linux, connecting, client, configuration, SSH, Keys, SSH-Keys, RSA, authorization, authentication
|
||||
last_updated: 15 Jul 2020
|
||||
summary: "This document describes how to deploy SSH Keys in Merlin."
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/ssh-keys.html
|
||||
---
|
||||
|
||||
Merlin users sometimes need to access the different Merlin services without being repeatedly prompted for a password.
This can be achieved with Kerberos authentication; however, some software requires SSH Keys to be set up.
One example is ANSYS Fluent: when it is used interactively, the GUI communicates with the different nodes
through the SSH protocol, and the use of SSH Keys is enforced.
|
||||
|
||||
## Setting up SSH Keys on Merlin
|
||||
|
||||
For security reasons, users **must always protect SSH Keys with a passphrase**.

Users can check whether an SSH key already exists. Keys are placed in the **~/.ssh/** directory. `RSA` encryption
is usually the default, in which case the files are **`id_rsa`** (private key) and **`id_rsa.pub`** (public key).
|
||||
|
||||
```bash
|
||||
ls ~/.ssh/id*
|
||||
```
|
||||
|
||||
For creating **SSH RSA Keys**, one should:

1. Run `ssh-keygen`; a passphrase will be requested twice. You **must remember** this passphrase for the future.
   * For security reasons, ***always try to protect the key with a passphrase***. The only exception is when running ANSYS software, which in general should use a key without a passphrase to simplify running the software in Slurm.
   * This will generate a private key **id_rsa**, and a public key **id_rsa.pub** in your **~/.ssh** directory.
2. Add your public key to the **`authorized_keys`** file, and ensure proper permissions for that file, as follows:
|
||||
|
||||
```bash
|
||||
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
|
||||
chmod 0600 ~/.ssh/authorized_keys
|
||||
```
|
||||
|
||||
3. Configure the SSH client in order to force the usage of the **psi.ch** domain for trusting keys:
|
||||
|
||||
```bash
|
||||
echo "CanonicalizeHostname yes" >> ~/.ssh/config
|
||||
```
|
||||
|
||||
4. Configure further SSH options as follows:
|
||||
|
||||
```bash
|
||||
echo "AddKeysToAgent yes" >> ~/.ssh/config
|
||||
echo "ForwardAgent yes" >> ~/.ssh/config
|
||||
```
|
||||
|
||||
Other options may be added.
|
||||
|
||||
5. Check that your SSH config file contains at least the lines mentioned in steps 3 and 4:
|
||||
|
||||
```console
|
||||
# cat ~/.ssh/config
|
||||
CanonicalizeHostname yes
|
||||
AddKeysToAgent yes
|
||||
ForwardAgent yes
|
||||
```
|
||||
|
||||
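Note that `echo ... >> ~/.ssh/config` appends a new line every time it is run, so repeating steps 3 and 4 leaves duplicate entries. A possible idempotent variant is sketched below; `add_ssh_option` is a hypothetical helper name, and the sketch relies only on standard `grep` options.

```bash
# Hypothetical helper: append an option line only if it is not already present.
add_ssh_option() {
    local line="$1"
    local conf="${2:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    chmod 600 "$conf"     # keep the config private to the owner
    # -x: whole-line match, -F: fixed string (no regex interpretation)
    grep -qxF -- "$line" "$conf" || echo "$line" >> "$conf"
}

# Example calls (commented out so that sourcing this file changes nothing):
# add_ssh_option "CanonicalizeHostname yes"
# add_ssh_option "AddKeysToAgent yes"
# add_ssh_option "ForwardAgent yes"
```

Running the same call twice leaves a single copy of the option in the file.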
## Using the SSH Keys
|
||||
|
||||
### Using Authentication Agent in SSH session
|
||||
|
||||
By default, when accessing the login node via SSH (with `ForwardAgent=yes`), your SSH Keys are automatically
added to the authentication agent. Hence, no action should be needed by the user. One can configure
`ForwardAgent=yes` as follows:
|
||||
|
||||
* **(Recommended)** In your local Linux (workstation, laptop or desktop) add the following line in the
|
||||
`$HOME/.ssh/config` (or alternatively in `/etc/ssh/ssh_config`) file:
|
||||
|
||||
```ssh_config
|
||||
ForwardAgent yes
|
||||
```
|
||||
|
||||
* Alternatively, you can pass the option `ForwardAgent=yes` on each SSH command. For example:
|
||||
|
||||
```bash
|
||||
ssh -XY -o ForwardAgent=yes login001.merlin7.psi.ch
|
||||
```
|
||||
|
||||
If `ForwardAgent` is not enabled as shown above, one needs to run the authentication agent and then add your key
|
||||
to the **ssh-agent**. This must be done once per SSH session, as follows:
|
||||
|
||||
* Run `eval $(ssh-agent -s)` to run the **ssh-agent** in that SSH session
|
||||
* Check whether the authentication agent has your key already added:
|
||||
|
||||
```bash
|
||||
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
|
||||
```
|
||||
|
||||
* If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
  You will be prompted for the **passphrase** of your key. This can be done by running:
|
||||
|
||||
```bash
|
||||
ssh-add
|
||||
```
|
||||
|
||||
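The manual procedure (start the agent if needed, check for a loaded key, add one if missing) can be condensed into one function. This is a sketch under stated assumptions rather than a supported tool: `ensure_agent_key` is a hypothetical name, and the logic relies on the exit status of `ssh-add -l` (0 when identities are loaded, 1 when the agent holds none, 2 when no agent can be contacted) instead of the `grep` on the key path used above.

```bash
# Hypothetical helper: make sure an agent is running and holds a key.
ensure_agent_key() {
    local rc=0
    ssh-add -l >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "key already loaded"; return 0 ;;
        2) eval "$(ssh-agent -s)" >/dev/null ;;   # no agent reachable: start one
    esac
    # rc=1 (agent running, no identities) or a freshly started agent:
    # add the default identity; this prompts for the passphrase.
    ssh-add && echo "key added"
}
```

Because the function may start an agent via `eval`, it has to be run in the current shell (for example after sourcing it from `~/.bashrc`), not in a subshell.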
### Using Authentication Agent in NoMachine Session
|
||||
|
||||
By default, when using a NoMachine session, the `ssh-agent` should be automatically started. Hence, there is no need to
start the agent or forward it.

However, in NoMachine one always needs to add the private key identity to the authentication agent. This can be done as follows:

1. Check whether the authentication agent already has the key added:
|
||||
|
||||
```bash
|
||||
ssh-add -l | grep "/data/user/$(whoami)/.ssh"
|
||||
```
|
||||
2. If no key is returned in the previous step, you have to add the private key identity to the authentication agent.
   You will be prompted for the **passphrase** of your key. This can be done by running:
|
||||
|
||||
```bash
|
||||
ssh-add
|
||||
```
|
||||
|
||||
You just need to run it once per NoMachine session, and it would apply to all terminal windows within that NoMachine session.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Errors when running 'ssh-add'
|
||||
|
||||
If the error `Could not open a connection to your authentication agent.` appears when running `ssh-add`, it means
|
||||
that the authentication agent is not running. Please follow the previous procedures for starting it.
|
||||
|
||||
### Add/Update SSH RSA Key password
|
||||
|
||||
If an existing SSH Key does not have a passphrase, or you want to replace the existing passphrase with a new one, you can do so as follows:
|
||||
|
||||
```bash
|
||||
ssh-keygen -p -f ~/.ssh/id_rsa
|
||||
```
|
||||
|
||||
### SSH Keys deployed but not working
|
||||
|
||||
Please ensure proper permissions of the involved files, as well as any typos in the file names involved:
|
||||
|
||||
```bash
|
||||
chmod u+rwx,go-rwx,g+s ~/.ssh
|
||||
chmod u+rw-x,go-rwx ~/.ssh/authorized_keys
|
||||
chmod u+rw-x,go-rwx ~/.ssh/id_rsa
|
||||
chmod u+rw-x,go+r-wx ~/.ssh/id_rsa.pub
|
||||
```
|
||||
|
||||
### Testing SSH Keys
|
||||
|
||||
Once the SSH Key is created, you can test that it is valid as follows:
|
||||
|
||||
1. Create a **new** SSH session in one of the login nodes:
|
||||
|
||||
```bash
|
||||
ssh login001
|
||||
```
|
||||
|
||||
2. In the login node session, destroy any existing Kerberos ticket or active SSH Key:
|
||||
|
||||
```bash
|
||||
kdestroy
|
||||
ssh-add -D
|
||||
```
|
||||
|
||||
3. Add the new private key identity to the authentication agent. You will be prompted for the passphrase.
|
||||
|
||||
```bash
|
||||
ssh-add
|
||||
```
|
||||
|
||||
4. Check that your key is active in the SSH agent:
|
||||
|
||||
```bash
|
||||
ssh-add -l
|
||||
```
|
||||
|
||||
5. SSH to the second login node. No password should be requested:
|
||||
|
||||
```bash
|
||||
ssh -vvv login002
|
||||
```
|
||||
|
||||
If the last step succeeds, your SSH key is properly set up.
|
||||
186
docs/merlin7/02-How-To-Use-Merlin/storage.md
Normal file
@@ -0,0 +1,186 @@
|
||||
---
|
||||
title: Merlin7 Storage
|
||||
#tags:
|
||||
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
|
||||
#last_updated: 07 September 2022
|
||||
#summary: ""
|
||||
sidebar: merlin7_sidebar
|
||||
redirect_from: /merlin7/data-directories.html
|
||||
permalink: /merlin7/storage.html
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
This document describes the different directories of the Merlin7 cluster.
|
||||
|
||||
### Backup and data policies
|
||||
|
||||
* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares).
|
||||
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it off the cluster***: every few months, the storage space of former users who no longer have a valid PSI account will be recycled.
|
||||
|
||||
{{site.data.alerts.warning}}When a user leaves PSI and their account is removed, their storage space in Merlin may be recycled.
|
||||
Hence, <b>when a user leaves PSI</b>, they, their supervisor or team <b>must ensure that the data is backed up to external storage</b>.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
### How to check quotas
|
||||
|
||||
Some of the Merlin7 directories have quotas applied. You can check them with the `merlin_quotas` command.
|
||||
This command shows all quotas for the different user storage directories and partitions (including AFS). To check your quotas, please run:
|
||||
|
||||
```console
|
||||
$ merlin_quotas
|
||||
Path            SpaceUsed  SpaceQuota  Space %  FilesUsed  FilesQuota  Files %
--------------  ---------  ----------  -------  ---------  ----------  -------
/data/user         30.26G          1T      03%     367296     2097152      18%
 └─ <USERNAME>
/afs/psi.ch          3.4G        9.5G      36%          0           0       0%
 └─ user/<USERDIR>
/data/scratch      688.9M          2T      00%     368471           0      00%
 └─ shared
/data/project      3.373T         11T      31%     425644     2097152      20%
 └─ bio/shared
/data/project      4.142T         11T      38%     579596     2097152      28%
 └─ bio/hpce
|
||||
```
|
||||
|
||||
{{site.data.alerts.note}}On first use you will see a message about some configuration being generated; this is expected. Don't be
|
||||
surprised that it takes some time. After this using <code>merlin_quotas</code> should be faster.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
The output shows the quotas set and how much you are using of the quota, for each filesystem that has this set. Notice that some users will have
|
||||
one or more `/data/project/...` directories showing, depending on whether they are part of a specific PSI research group or project.
|
||||
|
||||
The general quota constraints for the different directories are shown in the [table below](#dir_classes). Further details on how to use `merlin_quotas`
|
||||
can be found on the [Tools page](/merlin7/tools.html).
|
||||
|
||||
{{site.data.alerts.tip}}If you're interested, you can retrieve the Lustre-based quota information by calling
|
||||
<code>lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data</code> directly. Using the <code>merlin_quotas</code> command is more
|
||||
convenient and shows all your relevant filesystem quotas.
|
||||
{{site.data.alerts.end}}
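
As a rough illustration of the formula used in the tip above, the Lustre project ID for a home directory can be derived from the numeric UID. The function name `user_project_id` is hypothetical; only the offset `100000000` and the `lfs quota` invocation come from the tip itself.

```bash
# Hypothetical helper: print the Lustre project ID for a user's home quota,
# following the formula from the tip above (100000000 + numeric UID).
# Takes an optional user name; defaults to the current user.
user_project_id() {
    local uid
    uid=$(id -u "${1:-$USER}") || return 1
    echo $(( 100000000 + uid ))
}

# Usage on a Merlin7 node where the Lustre client tools are available:
#   lfs quota -h -p "$(user_project_id)" /data
```

For day-to-day use, `merlin_quotas` remains the more convenient option, since it covers all relevant filesystems at once.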
|
||||
|
||||
## Merlin7 directories
|
||||
|
||||
Merlin7 offers the following directory classes for users:
|
||||
|
||||
* `/data/user/<username>`: Private user **home** directory
|
||||
* `/data/project/general/$projectname`: general Merlin project directory
|
||||
* `/data/project/bio/$projectname`: project directory for BIO
|
||||
* `/data/project/mu3e/$projectname`: project directory for Mu3e
|
||||
* `/data/project/meg/$projectname`: project directory for MEG
|
||||
* `/scratch`: Local *scratch* disk (only visible by the node running a job).
|
||||
* `/data/scratch/shared`: Shared *scratch* disk (visible from all nodes).
|
||||
|
||||
{{site.data.alerts.tip}}In Lustre there is a concept called <b>grace time</b>. Filesystems have a block (amount of data) and inode (number of files) quota.
|
||||
Each quota has a soft and a hard limit. Once the soft limit is reached, users can keep writing up to their hard limit during the <b>grace period</b>.
|
||||
Once the <b>grace time</b> expires or the hard limit is reached, users will be unable to write and will need to remove data until usage is below the soft limit (or ask for a quota increase
|
||||
when this is possible, see the table below).
|
||||
{{site.data.alerts.end}}
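
The soft/hard limit behaviour described above can be summarised in a few lines of shell. This is a simplified model for illustration only, not how Lustre is implemented: `quota_state` is a hypothetical function, it treats "over the soft limit" as strictly greater than the soft limit, and the numbers in the usage comments are illustrative values in GB.

```bash
# Simplified model of soft/hard quota limits with a grace period.
# Arguments: used, soft limit, hard limit (same unit), grace days remaining.
quota_state() {
    local used=$1 soft=$2 hard=$3 grace_left=$4
    if [ "$used" -ge "$hard" ]; then
        echo "blocked: hard limit reached"
    elif [ "$used" -gt "$soft" ] && [ "$grace_left" -le 0 ]; then
        echo "blocked: grace period expired"
    elif [ "$used" -gt "$soft" ]; then
        echo "over soft limit: writes allowed, grace period running"
    else
        echo "ok"
    fi
}

# Illustrative calls (values in GB, 7-day grace time):
#   quota_state 500  1000 1074 7   # below the soft limit
#   quota_state 1010 1000 1074 3   # over soft, 3 days of grace left
#   quota_state 1010 1000 1074 0   # over soft, grace expired
```

In both "blocked" states, writing becomes possible again only after usage drops below the soft limit.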
|
||||
|
||||
<a name="dir_classes"></a>Properties of the directory classes:
|
||||
|
||||
| Directory | Block Quota [Soft:Hard] | Inode Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block | Quota Change Policy: Inodes | Backup |
|
||||
| ---------------------------------- | ----------------------- | ----------------------- | :-------: | :--------------------------------- |:-------------------------------- | ------ |
|
||||
| /data/user/$username | PRJ [1TB:1.074TB] | PRJ [2M:2.1M] | 7d | Immutable. Need a project. | Changeable when justified. | no |
|
||||
| /data/project/bio/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
|
||||
| /data/project/general/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
|
||||
| /data/scratch/shared | USR [512GB:2TB] | | 7d | Up to x2 when strongly justified. | Changeable when justified. | no |
|
||||
| /scratch | *Undef* | *Undef* | N/A | N/A | N/A | no |
|
||||
|
||||
{{site.data.alerts.warning}}The use of <b>/scratch</b> and <b>/data/scratch/shared</b> areas as an extension of the quota <i>is forbidden</i>. The <b>/scratch</b> and
|
||||
<b>/data/scratch/shared</b> areas <i>must not contain</i> final data. Keep in mind that <br><b><i>auto cleanup policies</i></b> in the <b>/scratch</b> and
|
||||
<b>/data/scratch/shared</b> areas are applied.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
### User home directory
|
||||
|
||||
This is the default directory where users land when logging in to any Merlin7 machine.
|
||||
It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.
|
||||
|
||||
The home directories are mounted on the login and computing nodes under the directory:
|
||||
|
||||
```bash
|
||||
/data/user/$username
|
||||
```
|
||||
|
||||
Directory policies:
|
||||
|
||||
* Read **[Important: Code of Conduct](/merlin7/code-of-conduct.html)** for more information about Merlin7 policies.
|
||||
* It is **forbidden** to use the home directories for I/O-intensive tasks; use one of the **[scratch](/merlin7/storage.html#scratch-directories)** areas instead!
|
||||
* No backup policy is applied for the user home directories: **users are responsible for backing up their data**.
|
||||
|
||||
Home directory quotas are defined on a per-Lustre-project basis. The quota can be checked using the `merlin_quotas` command described
|
||||
[above](/merlin7/storage.html#how-to-check-quotas).
|
||||
|
||||
### Project data directory
|
||||
|
||||
This storage is intended for keeping large amounts of a project's data, where the data also can be
|
||||
shared by all members of the project (the project's corresponding UNIX group). We recommend keeping most data in
|
||||
project-related storage spaces, since this allows users to coordinate. Also, project spaces have more flexible policies
|
||||
regarding extending the available storage space.
|
||||
|
||||
Scientists can request a Merlin project space as described in **[[Accessing Merlin -> Requesting a Project]](/merlin7/request-project.html)**.
|
||||
By default, Merlin can offer **general** project space, centrally covered, as long as it does not exceed 10TB (otherwise, it has to be justified).
|
||||
General Merlin projects might need to be reviewed one year after their creation.
|
||||
|
||||
Once a Merlin project is created, the directory will be mounted in the login and computing nodes under the directory:
|
||||
|
||||
```bash
|
||||
/data/project/general/$projectname
|
||||
```
|
||||
|
||||
Project quotas are defined on a per-Lustre-project basis. Users can check the project quota by running the following command:
|
||||
|
||||
```bash
|
||||
lfs quota -h -p $projectid /data
|
||||
```
|
||||
|
||||
{{site.data.alerts.warning}}Checking <b>quotas</b> for the Merlin projects is not yet possible.
|
||||
In the future, a list of `projectid` will be provided, so users can check their quotas.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
Directory policies:
|
||||
|
||||
* Read **[Important: Code of Conduct](/merlin7/code-of-conduct.html)** for more information about Merlin7 policies.
|
||||
* It is **forbidden** to use the data directories as a scratch area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files.
|
||||
* Please use `/scratch` or `/data/scratch/shared` for this purpose.
|
||||
* No backups: users are responsible for managing the backups of their data directories.
|
||||
|
||||
#### Dedicated project directories
|
||||
|
||||
Some departments or divisions have bigger storage space requirements on Merlin7. At present, `bio`, `mu3e` and `meg` are the main ones.
|
||||
These are mounted under the following paths:
|
||||
|
||||
```bash
|
||||
/data/project/bio
|
||||
/data/project/mu3e
|
||||
/data/project/meg
|
||||
```
|
||||
|
||||
They follow the same rules as the general projects, except that they are assigned more space.
|
||||
|
||||
### Scratch directories
|
||||
|
||||
There are two different types of scratch storage: **local** (`/scratch`) and **shared** (`/data/scratch/shared`).
|
||||
|
||||
* **local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially
|
||||
true for all jobs running on a single node. Mount path:
|
||||
|
||||
```bash
|
||||
/scratch
|
||||
```
|
||||
|
||||
* **shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job where tasks are spread out over the cluster
|
||||
and all tasks need to do I/O on the same temporary files. Mount path:
|
||||
|
||||
```bash
|
||||
/data/scratch/shared
|
||||
```
|
||||
|
||||
Scratch directory policies:
|
||||
|
||||
* Read **[Important: Code of Conduct](/merlin7/code-of-conduct.html)** for more information about Merlin7 policies.
|
||||
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
|
||||
* Temporary files *must be deleted at the end of the job by the user*.
|
||||
* Remaining files will be deleted by the system if detected.
|
||||
* Files not accessed within 28 days will be automatically cleaned up by the system.
|
||||
* If for some reason the scratch areas get full, admins have the right to clean up the oldest data.
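
One possible way to follow the "delete temporary files at the end of the job" policy is to run the workload inside a private temporary directory that is always removed afterwards. The sketch below is not a Merlin-provided tool: `run_in_scratch` and the program name in the usage comment are hypothetical, and the scratch base directory is passed as an argument so that either `/scratch` or `/data/scratch/shared` can be used.

```bash
# Hypothetical helper: run a command inside a private directory under the
# given scratch base, then remove that directory whether or not the command
# succeeded. Returns the exit status of the command.
run_in_scratch() {
    local base=$1 workdir status=0
    shift
    workdir=$(mktemp -d "$base/${USER:-user}.XXXXXX") || return 1
    ( cd "$workdir" && "$@" ) || status=$?
    rm -rf "$workdir"
    return "$status"
}

# Usage inside a job script (my_simulation is a placeholder):
#   run_in_scratch /scratch /path/to/my_simulation --input /data/user/$USER/input.dat
```

Results that must be kept should be copied to a user or project directory by the command itself, since everything left in the temporary directory is deleted.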
|
||||
177
docs/merlin7/02-How-To-Use-Merlin/transfer-data.md
Normal file
@@ -0,0 +1,177 @@
|
||||
---
|
||||
title: Transferring Data
|
||||
#tags:
|
||||
keywords: transferring data, data transfer, rsync, winscp, copy data, copying, sftp, import, export, hop, vpn
|
||||
last_updated: 24 August 2023
|
||||
#summary: ""
|
||||
sidebar: merlin7_sidebar
|
||||
permalink: /merlin7/transfer-data.html
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Most data transfer methods support both sending and receiving, so you may initiate the transfer from either **Merlin** or the other system — depending on **network visibility**.
|
||||
- **From PSI Network to Merlin:** Merlin login nodes are visible from the PSI network, so direct transfers using `rsync` or **FTP** are generally preferable. Transfers **from Merlin7 to PSI may require special firewall rules**.
|
||||
- **From Merlin to the Internet:** Merlin login nodes can access the internet with a **limited set of protocols**:
|
||||
- HTTP-based protocols on ports `80` or `443` (e.g., HTTPS, WebDAV).
|
||||
- Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.
|
||||
- **From the Internet to PSI:** Systems outside PSI can access the [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) at `datatransfer.psi.ch` using SSH-based protocols or [Globus](https://www.globus.org/).
|
||||
|
||||
> SSH-based protocols using port `22` **to most PSI servers** are generally **not permitted**.
|
||||
> * However, **transfers from any PSI host to Merlin7 using port 22 are allowed**.
|
||||
>
|
||||
> Port `21` is also available for FTP transfers from PSI to Merlin7.
|
||||
|
||||
### Choosing the best transfer method
|
||||
|
||||
| **Scenario** | **Recommended Method** | **Reason** |
|
||||
| ------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------- |
|
||||
| Small dataset, Linux/macOS | `rsync` | Resume support, skips existing files, works over SSH |
|
||||
| Quick one-time small transfer | `scp` | Simple syntax, no need to install extra tools |
|
||||
| Large dataset, high speed needed (not sensitive) | FTP via `service03.merlin7.psi.ch` | Fastest transfer speed (unencrypted data channel) |
|
||||
| Large dataset, high speed needed (sensitive data) | FTP via `ftp-encrypted.merlin7.psi.ch` | Encrypted control & data channels for security, but slower than `service03` |
|
||||
| Windows interactive GUI transfer | WinSCP | User-friendly interface, PSI Software Kiosk, supports drag-and-drop |
|
||||
| Cross-platform interactive GUI transfer | FileZilla | User-friendly interface, works on Linux/macOS/Windows, supports drag-and-drop |
|
||||
| From the internet to PSI | [PSI Data Transfer Service](https://www.psi.ch/en/photon-science-data-services/data-transfer) | Supports SSH-based protocols and Globus |
|
||||
| Need for sharing large files | [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload) | Supports large files and expiration dates |
|
||||
| PSI -> Merlin7 over FTP | Any FTP-based client | Port 21 allowed from PSI to Merlin7 |
|
||||
| PSI -> Merlin7 over SSH | Any SSH-based method | Port 22 allowed from PSI to Merlin7 |
|
||||
|
||||
The next chapters contain detailed information about the different transfer methods available on Merlin7.
|
||||
|
||||
## Direct Transfer via Merlin7 Login Nodes
|
||||
|
||||
The following methods transfer data directly via the [login nodes](/merlin7/interactive.html#login-nodes-hardware-description). They are suitable for use from **within the PSI network**.
|
||||
|
||||
### Rsync (Recommended for Linux/macOS)
|
||||
|
||||
Rsync is the **preferred** method for small datasets from Linux/macOS systems. It supports **resuming interrupted transfers** and **skips already transferred files**. Syntax:
|
||||
```bash
|
||||
rsync -avAHXS <src> <dst>
|
||||
```
|
||||
|
||||
**An example** for transferring local files to a Merlin project directory:
|
||||
|
||||
```bash
|
||||
rsync -avAHXS ~/localdata $USER@login001.merlin7.psi.ch:/data/project/general/myproject/
|
||||
```
|
||||
{{site.data.alerts.tip}}
|
||||
If a transfer is interrupted, just rerun the command: <code>rsync</code> will skip existing files.
|
||||
{{site.data.alerts.end}}
|
||||
{{site.data.alerts.warning}}
|
||||
Rsync uses SSH (port 22). For large datasets, transfer speed might be limited.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
### SCP
|
||||
|
||||
SCP works similarly to `rsync` but **does not support resuming** interrupted transfers. It may be used for quick one-off transfers. Example:
|
||||
```bash
|
||||
scp ~/localfile.txt $USER@login001.merlin7.psi.ch:/data/project/general/myproject/
|
||||
```
|
||||
|
||||
### Secure FTP
|
||||
A `vsftpd` service is available on the login nodes, providing high-speed transfers. Choose the server based on your **speed vs. encryption** needs:
|
||||
* **`login001.merlin7.psi.ch`:** Encrypted control & data channels.
|
||||
**Use if your data is sensitive**. **Slower**, but secure.
|
||||
* **`service03.merlin7.psi.ch`**: Encrypted control channel only.
|
||||
Use if your data can be transferred unencrypted. **Fastest** method.
|
||||
|
||||
{{site.data.alerts.tip}}
|
||||
The <b>control channel</b> is always <b>encrypted</b>, so authentication is encrypted and secured.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
## UI-based Clients for Data Transfer
|
||||
### WinSCP (Windows)
|
||||
|
||||
Available in the **Software Kiosk** on PSI Windows machines.
|
||||
* Using your PSI credentials, connect as follows:
|
||||
* when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
|
||||
* when using port 21, connect to:
|
||||
* `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
|
||||
* `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
|
||||
* Drag and drop files between your PC and Merlin.
|
||||
|
||||
* FTP (port 21)
|
||||
|
||||
### FileZilla (Linux/macOS/Windows)
|
||||
|
||||
Download from [FileZilla Project](https://filezilla-project.org/), or install from your Linux software repositories if available.
|
||||
* Using your PSI credentials, connect as follows:
|
||||
* when using port 22, connect to `login001.merlin7.psi.ch` or `login002.merlin7.psi.ch`.
|
||||
* when using port 21, connect to:
|
||||
* `ftp-encrypted.merlin7.psi.ch`: **Fast** transfer rates. **Both** control and data **channels encrypted**.
|
||||
* `service03.merlin7.psi.ch`: **Fastest** transfer rates, but **data channel not encrypted**.
|
||||
* Supports drag-and-drop file transfers.
|
||||
|
||||
## Sharing Files with SWITCHfilesender
|
||||
|
||||
**[SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload)** is a Swiss-hosted installation of the [FileSender](https://filesender.org/) project — a web-based application that allows authenticated users to securely and easily send **arbitrarily large files** to other users. Features:
|
||||
- **Secure large file transfers:** Send files that exceed normal email attachment limits.
|
||||
- **Time-limited availability:** Files are automatically deleted after the chosen expiration date or number of downloads.
|
||||
- **Voucher system:** Authenticated users can send upload vouchers to external recipients without an account.
|
||||
- **Designed for research & education:** Developed to meet the needs of universities and research institutions.
|
||||
|
||||
About the authentication:
|
||||
- It uses **SimpleSAMLphp**, supporting multiple authentication mechanisms: SAML2, LDAP, RADIUS and more.
|
||||
- It's fully integrated with PSI's **Authentication and Authorization Infrastructure (AAI)**.
|
||||
- PSI employees can log in using their PSI account:
|
||||
1. Open [SWITCHfilesender](https://filesender.switch.ch/filesender2/?s=upload).
|
||||
2. Select **PSI** as the institution.
|
||||
3. Authenticate with your PSI credentials.
|
||||
|
||||
The service is designed to **send large files for temporary availability**, not as a permanent publishing platform. Typical use case:
|
||||
1. Upload a file.
|
||||
2. Share the download link with a recipient.
|
||||
3. File remains available until the specified **expiration date** is reached, or the **download limit** is reached.
|
||||
4. The file is **automatically deleted** after expiration.
|
||||
|
||||
{{site.data.alerts.warning}}
|
||||
SWITCHfilesender <b>is not</b> a long-term storage or archiving solution.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
## PSI Data Transfer
|
||||
|
||||
Since August 2024, Merlin has been connected to the **[PSI Data Transfer](https://www.psi.ch/en/photon-science-data-services/data-transfer)** service,
|
||||
`datatransfer.psi.ch`. This is a central service managed by the **[Linux team](https://linux.psi.ch/index.html)**. However, any problems or questions related to it can be directly
|
||||
[reported](/merlin7/contact.html) to the Merlin administrators, who will forward the request if necessary.
|
||||
|
||||
The PSI Data Transfer servers support the following protocols:
|
||||
* Data Transfer - SSH (scp / rsync)
|
||||
* Data Transfer - Globus
|
||||
|
||||
Note that `datatransfer.psi.ch` does not allow SSH login; only `rsync`, `scp` and [Globus](https://www.globus.org/) access is allowed.
|
||||
|
||||
Access to the PSI Data Transfer service uses ***multi-factor authentication*** (MFA).
|
||||
Therefore, having the Microsoft Authenticator App is required as explained [here](https://www.psi.ch/en/computing/change-to-mfa).
|
||||
|
||||
{{site.data.alerts.tip}}Please follow the
|
||||
<b><a href="https://www.psi.ch/en/photon-science-data-services/data-transfer">Official PSI Data Transfer</a></b> documentation for further instructions.
|
||||
{{site.data.alerts.end}}
|
||||
|
||||
## Connecting to Merlin7 from outside PSI
|
||||
|
||||
Merlin7 is fully accessible from within the PSI network. To connect from outside you can use:
|
||||
- [VPN](https://www.psi.ch/en/computing/vpn) ([alternate instructions](https://intranet.psi.ch/BIO/ComputingVPN))
|
||||
- [SSH hopx](https://www.psi.ch/en/computing/ssh-hop)
|
||||
* Please avoid transferring large amounts of data through **hop**
|
||||
- [No Machine](nomachine.md)
|
||||
* Remote Interactive Access through [**'nx.psi.ch'**](https://www.psi.ch/en/photon-science-data-services/remote-interactive-access)
|
||||
* Please avoid transferring large amounts of data through **NoMachine**
|
||||
|
||||
{% comment %}
|
||||
## Connecting from Merlin7 to outside file shares
|
||||
|
||||
### `merlin_rmount` command
|
||||
|
||||
Merlin provides a command for mounting remote file systems, called `merlin_rmount`. This
|
||||
provides a helpful wrapper over the Gnome storage utilities, and provides support for a wide range of remote file formats, including
|
||||
- SMB/CIFS (Windows shared folders)
|
||||
- WebDav
|
||||
- AFP
|
||||
- FTP, SFTP
|
||||
- [others](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_the_desktop_environment_in_rhel_8/managing-storage-volumes-in-gnome_using-the-desktop-environment-in-rhel-8#gvfs-back-ends_managing-storage-volumes-in-gnome)
|
||||
|
||||
|
||||
[More instruction on using `merlin_rmount`](/merlin7/merlin-rmount.html)
|
||||
{% endcomment %}
|
||||
|
||||