---
title: Merlin7 Storage
#tags:
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
#last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---

## Introduction

This document describes the different storage directories of the Merlin7 cluster.

### Backup and data policies

* ***Users are responsible for backing up their own data***. It is recommended to back up data to independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows shares).
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it off the cluster***: every few months, storage space belonging to users who no longer have a valid PSI account is recycled.

{{site.data.alerts.warning}}When a user leaves PSI and their account is removed, their storage space in Merlin may be recycled.
Hence, <b>when a user leaves PSI</b>, they, their supervisor or team <b>must ensure that the data is backed up to external storage</b>.
{{site.data.alerts.end}}
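
As an illustrative sketch only, standard tools such as `rsync` can be used to copy data to an external system; the destination host and path below are placeholders, not a PSI service:

```console
# Hypothetical destination: replace backup-host.example.org:/path/to/backup with your own archive location
$ rsync -avh /data/user/$USER/ backup-host.example.org:/path/to/backup/$USER/
```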

### How to check quotas

Some of the Merlin7 directories have quotas applied. Quotas can be checked with the `merlin_quotas` command,
which shows the quotas for all of your storage directories and partitions (including AFS). To check your quotas, run:

```console
$ merlin_quotas
Path           SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %
-------------- --------- ---------- ------- --------- ---------- -------
/data/user     30.26G    1T         03%     367296    2097152    18%
 └─ <USERNAME>
/afs/psi.ch    3.4G      9.5G       36%     0         0          0%
 └─ user/<USERDIR>
/data/project  2.457T    10T        25%     58        2097152    00%
 └─ bio/shared
/data/project  338.3G    10T        03%     199391    2097152    10%
 └─ bio/hpce
```

{{site.data.alerts.note}}On first use you will see a message about some configuration being generated; this is expected and may take
some time. Subsequent runs of <code>merlin_quotas</code> will be faster.
{{site.data.alerts.end}}

The output shows the configured quotas and how much of each quota you are using, for every filesystem where quotas are set. Notice that some users will have
one or more `/data/project/...` directories listed, depending on whether you are part of a specific PSI research group or project.

The general quota constraints for the different directories are shown in the [table below](#dir_classes). Further details on how to use `merlin_quotas`
can be found on the [Tools page](/merlin7/tools.html).

{{site.data.alerts.tip}}If you are interested, you can retrieve the Lustre-based quota information directly by running
<code>lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data</code>. However, the <code>merlin_quotas</code> command is more
convenient and shows all your relevant filesystem quotas.
{{site.data.alerts.end}}
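
For illustration, the per-user Lustre project ID used in the command above is simply your numeric UID plus 100000000 (the UID shown below is a made-up example):

```console
$ id -u $USER
12345
$ echo $(( 100000000 + $(id -u $USER) ))
100012345
$ lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data
```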

## Merlin7 directories

Merlin7 offers the following directory classes for users:

* `/data/user/<username>`: Private user **home** directory
* `/data/project/general`: project directory for Merlin
* `/data/project/bio/$projectname`: project directory for BIO
* `/data/project/mu3e/$projectname`: project directory for Mu3e
* `/data/project/meg/$projectname`: project directory for MEG
* `/scratch`: Local *scratch* disk (only visible to the node running a job).
* `/data/scratch/shared`: Shared *scratch* disk (visible from all nodes).

{{site.data.alerts.tip}}Lustre has a concept called <b>grace time</b>. Filesystems have a block (amount of data) quota and an inode (number of files) quota,
and each quota has a soft and a hard limit. Once the soft limit is reached, users can keep writing up to the hard limit during the <b>grace period</b>.
Once the <b>grace time</b> expires or the hard limit is reached, users can no longer write and must remove data until they are below the soft limit again (or ask for a quota increase
where this is possible, see the table below). For example, with a 1TB soft and 1.074TB hard block limit and a 7-day grace time, writes keep succeeding after 1TB is exceeded until
either usage reaches 1.074TB or the 7 days have passed.
{{site.data.alerts.end}}

<a name="dir_classes"></a>Properties of the directory classes:

| Directory                          | Block Quota [Soft:Hard] | Inode Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block         | Quota Change Policy: Inodes       | Backup |
| ---------------------------------- | ----------------------- | ----------------------- | :-------: | :--------------------------------- | :-------------------------------- | ------ |
| /data/user/$username               | PRJ [1TB:1.074TB]       | PRJ [2M:2.1M]           | 7d        | Immutable. Need a project.          | Changeable when justified.         | no     |
| /data/project/bio/$projectname     | PRJ [1TB:1.074TB]       | PRJ [1M:1.1M]           | 7d        | Subject to project requirements.    | Subject to project requirements.   | no     |
| /data/project/general/$projectname | PRJ [1TB:1.074TB]       | PRJ [1M:1.1M]           | 7d        | Subject to project requirements.    | Subject to project requirements.   | no     |
| /data/scratch/shared               | USR [512GB:2TB]         |                         | 7d        | Up to x2 when strongly justified.   | Changeable when justified.         | no     |
| /scratch                           | *Undef*                 | *Undef*                 | N/A       | N/A                                 | N/A                                | no     |

{{site.data.alerts.warning}}Using the <b>/scratch</b> and <b>/data/scratch/shared</b> areas as an extension of your quota <i>is forbidden</i>. The <b>/scratch</b> and
<b>/data/scratch/shared</b> areas <i>must not contain</i> final data. Keep in mind that <b><i>automatic cleanup policies</i></b> are applied to the <b>/scratch</b> and
<b>/data/scratch/shared</b> areas.
{{site.data.alerts.end}}

### User home directory

This is the default directory users land in when logging in to any Merlin7 machine.
It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.

The home directories are mounted on the login and computing nodes under the directory

```bash
/data/user/$username
```

Directory policies:

* Read **[Important: Code of Conduct](/merlin7/code-of-conduct.html)** for more information about Merlin7 policies.
* It is **forbidden** to use the home directories for I/O-intensive tasks; use one of the **[scratch](/merlin7/storage.html#scratch-directories)** areas instead!
* No backup policy is applied to the user home directories: **users are responsible for backing up their data**.

Home directory quotas are defined on a per-Lustre-project basis. The quota can be checked using the `merlin_quotas` command described
[above](/merlin7/storage.html#how-to-check-quotas).

### Project data directory

This storage is intended for keeping large amounts of a project's data, where the data can also be
shared by all members of the project (the project's corresponding UNIX group). We recommend keeping most data in
project-related storage spaces, since this allows users to coordinate. Also, project spaces have more flexible policies
for extending the available storage space.

Scientists can request a Merlin project space as described in **[[Accessing Merlin -> Requesting a Project]](/merlin7/request-project.html)**.
By default, Merlin offers **general** project space, centrally covered, as long as it does not exceed 10TB (larger requests have to be justified).
General Merlin projects may be reviewed one year after their creation.

Once a Merlin project is created, the directory will be mounted on the login and computing nodes under the directory:

```bash
/data/project/general/$projectname
```

Project quotas are defined on a per-Lustre-project basis. Users can check a project quota by running the following command:

```bash
lfs quota -h -p $projectid /data
```

{{site.data.alerts.warning}}Checking <b>quotas</b> for the Merlin projects is not yet possible.
In the future, a list of <code>projectid</code> values will be provided, so that users can check their quotas.
{{site.data.alerts.end}}

Directory policies:

* Read **[Important: Code of Conduct](/merlin7/code-of-conduct.html)** for more information about Merlin7 policies.
* It is **forbidden** to use the data directories as a `/scratch` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files.
  * Please use `/scratch` or `/data/scratch/shared` for this purpose.
* No backups: users are responsible for managing the backups of their data directories.

#### Dedicated project directories

Some departments or divisions have larger storage space requirements on Merlin7. At present, `bio`, `mu3e` and `meg` are the main ones.
These are mounted under the following paths:

```bash
/data/project/bio
/data/project/mu3e
/data/project/meg
```

They follow the same rules as the general projects, except that they have more space assigned.

### Scratch directories

There are two different types of scratch storage: **local** (`/scratch`) and **shared** (`/data/scratch/shared`).

* **local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially
  true for all jobs running on a single node. Mount path:

  ```bash
  /scratch
  ```

* **shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. an MPI job whose tasks are spread out over the cluster
  and all need to do I/O on the same temporary files. Mount path:

  ```bash
  /data/scratch/shared
  ```

Scratch directory policies:

* Read **[Important: Code of Conduct](/merlin7/code-of-conduct.html)** for more information about Merlin7 policies.
* By default, *always* use **local** scratch first, and only use **shared** scratch if your specific use case requires it.
* Temporary files *must be deleted at the end of the job by the user* (see the sketch below).
  * Remaining files will be deleted by the system if detected.
* Files not accessed within 28 days will be automatically cleaned up by the system.
* If the scratch areas fill up, administrators have the right to clean up the oldest data.
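
The following is a minimal sketch of how a job can use local scratch and clean it up before finishing; the application name, input path and per-job directory layout are illustrative assumptions, not fixed Merlin7 conventions:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-example

# Hypothetical per-job directory on the node-local scratch disk
TMP_DIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$TMP_DIR"

# Run the application, directing temporary files to local scratch
# ("myapp" and the input path are placeholders)
myapp --tmpdir "$TMP_DIR" --input /data/project/general/myproject/input.dat

# Delete the temporary files before the job ends
rm -rf "$TMP_DIR"
```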