---
title: Merlin7 Storage
#tags:
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
last_updated: 07 September 2022
#summary: ""
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---

## Introduction

This document describes the different directories of the Merlin7 cluster.

### Backup and data policies

* ***Users are responsible for backing up their own data***. It is recommended to back up data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
* ***When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of former users without an existing and valid PSI account will be recycled.

{{site.data.alerts.warning}}When a user leaves PSI and their account has been removed, their storage space on Merlin may be recycled.
Hence, <b>when a user leaves PSI</b>, they, their supervisor or their team <b>must ensure that the data is backed up to an external storage system</b>.
{{site.data.alerts.end}}

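As an illustration, such a backup could be driven by `rsync`. The helper below is only a sketch: the backup host and destination path are placeholders, not actual PSI systems, and it assumes `rsync` plus SSH access to the target.

```shell
#!/bin/bash
# Minimal backup sketch. Assumptions: rsync is available and you have
# SSH access to some external backup host; "backuphost" and the
# destination path below are placeholders, not real PSI systems.
backup_home() {
    local src="$1" dest="$2"
    # -a: archive mode (permissions, timestamps, symlinks)
    # -H: preserve hard links
    # --delete: remove files at the destination that no longer exist at
    #           the source, so the copy mirrors the current state
    rsync -aH --delete "${src%/}/" "${dest%/}/"
}

# Example invocation (run manually or from a cron job):
# backup_home "/data/user/$USER" "backuphost:/backup/$USER"
```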
### How to check quotas

Some of the Merlin7 directories have quotas applied. The `merlin_quotas` command provides a way of checking them:
it shows all quotas for the different user storage directories and partitions (including AFS). To check your quotas, please run:

```bash
merlin_quotas
```

{{site.data.alerts.warning}}Currently, <b>merlin_quotas</b> is not functional on the Merlin7 cluster.
We will notify users when it is ready to be used.
{{site.data.alerts.end}}

## Merlin7 directories

Merlin7 offers the following directory classes for users:

* ``/data/user/<username>``: private user **home** directory
* ``/data/project/general``: project directory for general Merlin projects
* ``/data/project/bio/$projectname``: project directory for BIO
* ``/data/project/mu3e/$projectname``: project directory for Mu3e
* ``/data/project/meg/$projectname``: project directory for MEG
* ``/scratch``: local *scratch* disk (only visible to the node running a job)
* ``/data/scratch/shared``: shared *scratch* disk (visible from all nodes)

{{site.data.alerts.tip}}Lustre has a concept called <b>grace time</b>. Filesystems have a block (amount of data) quota and an inode (number of files) quota.
Each quota has a soft and a hard limit. Once the soft limit is reached, users can keep writing up to their hard limit during the <b>grace period</b>.
Once the <b>grace time</b> expires or the hard limit is reached, users will be unable to write and will need to remove data to get below the soft limit (or ask for a quota increase
when this is possible, see the table below).
{{site.data.alerts.end}}

Properties of the directory classes:

| Directory                          | Block Quota [Soft:Hard] | Inode Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block         | Quota Change Policy: Inodes      | Backup |
| ---------------------------------- | ----------------------- | ----------------------- | :-------: | :--------------------------------- | :------------------------------- | ------ |
| /data/user/$username               | PRJ [1TB:1.074TB]       | PRJ [2M:2.1M]           | 7d        | Immutable. Needs a project.        | Changeable when justified.       | no     |
| /data/project/bio/$projectname     | PRJ [1TB:1.074TB]       | PRJ [1M:1.1M]           | 7d        | Subject to project requirements.   | Subject to project requirements. | no     |
| /data/project/general/$projectname | PRJ [1TB:1.074TB]       | PRJ [1M:1.1M]           | 7d        | Subject to project requirements.   | Subject to project requirements. | no     |
| /scratch                           | *Undef*                 | *Undef*                 | N/A       | N/A                                | N/A                              | no     |
| /data/scratch/shared               | USR [512GB:2TB]         |                         | 7d        | Up to x2 when strongly justified.  | Changeable when justified.       | no     |

{{site.data.alerts.warning}}Using the <b>/scratch</b> and <b>/data/scratch/shared</b> areas as an extension of your quota <i>is forbidden</i>: these areas <i>must not contain</i> final data. Keep in mind that <b><i>auto cleanup policies</i></b> are applied in the <b>/scratch</b> and <b>/data/scratch/shared</b> areas.
{{site.data.alerts.end}}

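As an illustration, you can check which of your files in a scratch area would be affected by the automatic cleanup (files not accessed within 28 days, per the scratch policies in this document). This is a sketch assuming GNU `find`; the helper name is ours, not a cluster command:

```shell
# Sketch: list your own files under a scratch area that have not been
# accessed for more than 28 days, i.e. candidates for the automatic
# cleanup. Assumes GNU find; the 28-day threshold matches the scratch
# cleanup policy.
list_stale_scratch() {
    local area="${1:-/data/scratch/shared}"
    find "$area" -user "${USER:-$(id -un)}" -type f -atime +28 2>/dev/null
}

# Examples:
# list_stale_scratch            # scan /data/scratch/shared
# list_stale_scratch /scratch   # scan local scratch on the current node
```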
### User home directory

This is the default directory users land in when logging in to any Merlin7 machine.
It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.

The home directories are mounted on the login and computing nodes under the directory

```bash
/data/user/$username
```

Directory policies:

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin7 policies.
* It is **forbidden** to use the home directories for I/O-intensive tasks.
  * Always use the local ``/scratch`` disk of the compute nodes first.
  * Use ``/data/scratch/shared`` only when necessary, for example for jobs requiring a fast shared storage area.
* No backup policy is applied to the user home directories: users are responsible for backing up their data.

Home directory quotas are defined on a per Lustre project basis. Users can check the project quota by running the following command:

```bash
lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data
```

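The numeric argument in the command above is simply the Lustre project ID derived from your numeric user ID (projid = 100000000 + uid). A small sketch making that explicit (the helper name is ours):

```shell
# The Lustre project ID of a home directory is derived from the numeric
# user ID: projid = 100000000 + uid. This reproduces the argument passed
# to `lfs quota -p` above.
home_projid() {
    echo $((100000000 + $(id -u "$1")))
}

# Example (on a node where the Lustre client tools are installed):
# lfs quota -h -p "$(home_projid "$USER")" /data
```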
### Project data directory

This storage is intended for keeping large amounts of a project's data, where the data can also be
shared by all members of the project (the project's corresponding Unix group). We recommend keeping most data in
project-related storage spaces, since this allows users to coordinate. Also, project spaces have more flexible policies
regarding extending the available storage space.

Scientists can request a Merlin project space as described in **[[Accessing Merlin -> Requesting a Project]](/merlin7/request-project.html)**.
By default, Merlin offers **general** project space, centrally covered, as long as it does not exceed 10TB (larger requests have to be justified).
General Merlin projects might need to be reviewed one year after their creation.

Once a Merlin project is created, the directory will be mounted on the login and computing nodes under the directory:

```bash
/data/project/general/$projectname
```

Project quotas are defined on a per Lustre project basis. Users can check the project quota by running the following command:

```bash
lfs quota -h -p $projectid /data
```

{{site.data.alerts.warning}}Checking <b>quotas</b> for the Merlin projects is not yet possible.
In the future, a list of <b>projectid</b> values will be provided, so users can check their quotas.
{{site.data.alerts.end}}

Directory policies:

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin7 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files.
  * Please use ``/scratch`` or ``/data/scratch/shared`` for this purpose.
* No backups: users are responsible for managing the backups of their data directories.

#### Dedicated project directories

Some departments or divisions have bigger storage space requirements on Merlin7. At present, `bio`, `mu3e` and `meg` are the main ones.
These are mounted under the following paths:

```bash
/data/project/bio
/data/project/mu3e
/data/project/meg
```

They follow the same rules as the general projects, except that they have more space assigned.

### Scratch directories

There are two different types of scratch storage: **local** (``/scratch``) and **shared** (``/data/scratch/shared``).

* **local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially
true for all jobs running on a single node. Mount path:
```bash
/scratch
```
* **shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster
and all tasks need to do I/O on the same temporary files. Mount path:
```bash
/data/scratch/shared
```

Scratch directories policies:

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin7 policies.
* By default, *always* use **local** scratch first and only use **shared** if your specific use case requires it.
* Temporary files *must be deleted at the end of the job by the user*.
  * Remaining files will be deleted by the system if detected.
* Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the scratch areas get full, admins have the right to clean up the oldest data.
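
The cleanup policy above can be honored with a small helper in the job script: create a private working directory under local scratch and register its removal for when the job exits. This is a sketch; `SLURM_JOB_ID` assumes the Slurm scheduler, with the shell PID as a fallback identifier, and the helper name is ours.

```shell
# Sketch: work in a private subdirectory of local /scratch and delete it
# when the script exits, even on failure or interruption.
# Assumption: SLURM_JOB_ID comes from the Slurm scheduler; the shell PID
# ($$) is used as a fallback when it is not set.
enter_scratch() {
    local base="${1:-/scratch}"
    WORKDIR="$base/${USER:-$(id -un)}-${SLURM_JOB_ID:-$$}"
    mkdir -p "$WORKDIR" || return 1
    trap 'rm -rf "$WORKDIR"' EXIT   # cleanup at the end of the job
    cd "$WORKDIR"
}

# Typical use inside a job script:
#   enter_scratch                    # work under local /scratch
#   ... computation, temporary files ...
#   cp results.dat "/data/project/general/$projectname/"   # keep results only
```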