---
title: Merlin6 Storage
#tags:
#keywords:
last_updated: 28 June 2019
#summary: ""
sidebar: merlin6_sidebar
redirect_from: /merlin6/data-directories.html
permalink: /merlin6/storage.html
---

## Introduction

This document describes the different directories of the Merlin6 cluster.

### User and project data

* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
* **`/psi/home`**, as it only contains a small amount of data, is the only directory for which we can provide daily snapshots, kept for one week. These can be found in the directory **`/psi/home/.snapshot/`**.
* ***When a user leaves PSI, the user or their supervisor/team is responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of former users who no longer have a valid PSI account will be recycled.

{{site.data.alerts.warning}}When a user leaves PSI and their account has been removed, their storage space in Merlin may be recycled.
Hence, <b>when a user leaves PSI</b>, the user, their supervisor or team <b>must ensure that the data is backed up to an external storage system</b>.
{{site.data.alerts.end}}

### Checking user quota

For each directory class, a way of checking quotas is provided where applicable (see the sections below). In addition, the single command ``merlin_quotas`` shows all quotas for your filesystems at once (including AFS, which is not covered here).

To check your quotas, please run:

```bash
merlin_quotas
```

## Merlin6 directories

Merlin6 offers the following directory classes for users:

* ``/psi/home/<username>``: Private user **home** directory
* ``/data/user/<username>``: Private user **data** directory
* ``/data/project/general/<projectname>``: Shared **Project** directory
  * For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
* ``/scratch``: Local *scratch* disk (visible only to the node running the job).
* ``/shared-scratch``: Shared *scratch* disk (visible from all nodes).
* ``/export``: Export directory for data transfer, visible from `ra-merlin-01.psi.ch`, `ra-merlin-02.psi.ch` and the Merlin login nodes.
  * Refer to **[Transferring Data](/merlin6/transfer-data.html)** for more information about the export area and the data transfer service.

{{site.data.alerts.tip}}GPFS has a concept called <b>GraceTime</b>. Each filesystem has a block quota (amount of data) and a file quota (number of files), and each quota has a soft and a hard limit.
Once the soft limit is reached, users can keep writing up to the hard limit during the <b>grace period</b>.
Once the <b>GraceTime</b> expires or the hard limit is reached, users will be unable to write and will need to remove data until they are back below the soft limit (or ask for a quota increase where the table below allows it).
For example, on /data/user a user may exceed the 1TB soft limit and keep writing up to the 1.074TB hard limit for at most 7 days.
{{site.data.alerts.end}}
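
To see the current usage, the soft and hard limits, and any running grace period for a given area, use the quota commands listed in the sections below. As a minimal example for the user data area (the same GPFS command documented later on this page, with your own username filled in):

```bash
# Show block and file usage, the soft ("quota") and hard ("limit") limits,
# and any active grace period for your user on the user data filesystem
mmlsquota -u $USER --block-size auto merlin-user
```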

Properties of the directory classes:

| Directory                          | Block Quota [Soft:Hard] | File Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block               | Quota Change Policy: Files       | Backup | Backup Policy                  |
| ---------------------------------- | ----------------------- | ---------------------- | :-------: | :--------------------------------------- |:-------------------------------- | ------ | :----------------------------- |
| /psi/home/$username                | USR [10GB:11GB]         | *Undef*                | N/A       | Up to x2 when strongly justified.        | N/A                              | yes    | Daily snapshots for 1 week     |
| /data/user/$username               | USR [1TB:1.074TB]       | USR [1M:1.1M]          | 7d        | Immutable. Need a project.               | Changeable when justified.       | no     | Users responsible for backup   |
| /data/project/bio/$projectname     | GRP [1TB:1.074TB]       | GRP [1M:1.1M]          | 7d        | Subject to project requirements.         | Subject to project requirements. | no     | Project responsible for backup |
| /data/project/general/$projectname | GRP [1TB:1.074TB]       | GRP [1M:1.1M]          | 7d        | Subject to project requirements.         | Subject to project requirements. | no     | Project responsible for backup |
| /scratch                           | *Undef*                 | *Undef*                | N/A       | N/A                                      | N/A                              | no     | N/A                            |
| /shared-scratch                    | USR [512GB:2TB]         | USR [2M:2.5M]          | 7d        | Up to x2 when strongly justified.        | Changeable when justified.       | no     | N/A                            |
| /export                            | USR [10MB:20TB]         | USR [512K:5M]          | 10d       | Soft limit can be temporarily increased. | Changeable when justified.       | no     | N/A                            |

{{site.data.alerts.warning}}Using the <b>scratch</b> and <b>export</b> areas as an extension of your quota <i>is forbidden</i>: the <b>scratch</b> and <b>export</b> areas <i>must not contain</i> final data.
<br><b><i>Auto-cleanup policies</i></b> are applied to the <b>scratch</b> and <b>export</b> areas.
{{site.data.alerts.end}}

### User home directory

This is the default directory users land in when logging in to any Merlin6 machine.
It is intended for your scripts, documents, software development, and other files that
you want backed up. Do not use it for storing data or for I/O-hungry HPC tasks.

This directory is mounted on the login and computing nodes under the path:

```bash
/psi/home/$username
```

Home directories are part of the PSI NFS Central Home storage provided by AIT and
are managed by the Merlin6 administrators.

Users can check their quota by running the following command:

```bash
quota -s
```

#### Home directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the home directories for I/O-intensive tasks.
  * Use ``/scratch``, ``/shared-scratch``, ``/data/user`` or ``/data/project`` for this purpose.
* Users can retrieve up to 1 week of lost data thanks to the automatic **daily snapshots kept for 1 week**.
  Snapshots can be accessed at the following path:

```bash
/psi/home/.snapshot/$username
```
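
For example, a file deleted by mistake can usually be recovered by copying it back from the snapshot area. This is a minimal sketch: the layout and names under ``/psi/home/.snapshot/`` depend on the snapshot schedule, so list the directory first; ``<path_to_lost_file>`` is a placeholder.

```bash
# Inspect the snapshots available for your home directory
ls /psi/home/.snapshot/$USER

# Illustrative only: copy a lost file from a snapshot back into your home
cp /psi/home/.snapshot/$USER/<path_to_lost_file> ~/
```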

### User data directory

The user data directory is intended for *fast I/O access* and for keeping large amounts of private data.
This directory is mounted on the login and computing nodes under the path:

```bash
/data/user/$username
```

Users can check their quota by running the following command:

```bash
mmlsquota -u <username> --block-size auto merlin-user
```

#### User data directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime.
  * Use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backup policy is applied to user data directories: users are responsible for backing up their data.

### Project data directory

This storage is intended for *fast I/O access* and for keeping large amounts of a project's data, which can also be
shared by all members of the project (the project's corresponding Unix group). We recommend keeping most data in
project-related storage spaces, since this allows users to coordinate. Project spaces also have more flexible policies
for extending the available storage space.

Experiments can request a project space as described in **[Accessing Merlin -> Requesting a Project](/merlin6/request-project.html)**.

Once created, the project data directory will be mounted on the login and computing nodes under the directory:

```bash
/data/project/general/$projectname
```

Project quotas are defined on a per-*group* basis. Users can check the project quota by running the following command:

```bash
mmlsquota -j $projectname --block-size auto -C merlin.psi.ch merlin-proj
```

#### Project directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files. Please use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backups are made: users are responsible for managing the backups of their data directories.

### Scratch directories

There are two different types of scratch storage: **local** (``/scratch``) and **shared** (``/shared-scratch``).

**Local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially
true for all jobs running on a single node.
**Shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster
and all need to do I/O on the same temporary files.

**Local** scratch on the Merlin6 computing nodes provides a very large number of IOPS thanks to NVMe technology. **Shared** scratch is implemented on a distributed parallel filesystem (GPFS), which results in higher latency, since it involves remote storage resources and more complex I/O coordination.

``/shared-scratch`` is only mounted on the *Merlin6* computing nodes (i.e. not on the login nodes), and its current size is 50TB. This can be increased in the future.

The properties of the available scratch storage spaces are given in the following table:

| Cluster | Service        | Scratch      | Scratch Mountpoint | Shared Scratch | Shared Scratch Mountpoint | Comments                               |
| ------- | -------------- | ------------ | ------------------ | -------------- | ------------------------- | -------------------------------------- |
| merlin5 | computing node | 50GB / SAS   | ``/scratch``       | ``N/A``        | ``N/A``                   | ``merlin-c-[01-64]``                   |
| merlin6 | login node     | 100GB / SAS  | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-l-0[1,2]``                    |
| merlin6 | computing node | 1.3TB / NVMe | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-c-[001-024,101-124,201-224]`` |
| merlin6 | login node     | 2.0TB / NVMe | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-l-00[1,2]``                   |

#### Scratch directories policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* By default, *always* use **local** scratch first and only use **shared** scratch if your specific use case requires it.
* Temporary files *must be deleted at the end of the job by the user* (see the sketch after this list).
  * Remaining files will be deleted by the system if detected.
  * Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the scratch areas get full, administrators have the right to clean up the oldest data.
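
As an illustration of this policy, below is a minimal sketch of a batch job that uses the local scratch area and cleans up after itself. It assumes a Slurm batch job; the job directives, the application command and the result paths are placeholders, not an official template:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-cleanup-example   # illustrative Slurm directive only

# Create a private, job-specific directory on the local scratch disk
SCRATCHDIR="/scratch/$USER/${SLURM_JOB_ID:-$$}"
mkdir -p "$SCRATCHDIR"

# Ensure the temporary files are removed when the job ends, even on failure
trap 'rm -rf "$SCRATCHDIR"' EXIT

cd "$SCRATCHDIR"

# ... run your application here, writing its temporary files to $SCRATCHDIR ...
# my_application --tmpdir "$SCRATCHDIR"

# Copy the final results to your data or project directory before the job exits
cp results.out /data/user/$USER/
```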

### Export directory

The export directory is exclusively intended for transferring data from outside PSI to Merlin and vice versa. It is a temporary directory with an auto-cleanup policy.
Please read **[Transferring Data](/merlin6/transfer-data.html)** for more information about the export area and the data transfer service.
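
As a rough illustration only (the supported transfer methods, accounts and destination paths are described on the Transferring Data page; the paths below are placeholders), a dataset could be pushed from an external machine into the export area through one of the data transfer nodes:

```bash
# Illustrative only: copy a local dataset to the Merlin export area through a
# data transfer node (the destination path is a placeholder)
rsync -avz ./my_dataset/ <username>@ra-merlin-01.psi.ch:/export/<username>/
```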

#### Export directory policy

* Temporary files *must be deleted at the end of the transfer by the user*.
  * Remaining files will be deleted by the system if detected.
  * Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the export area gets full, administrators have the right to clean up the oldest data.

---