reviewed directory documentation

feichtinger 2019-06-28 18:12:46 +02:00
parent bb27bb4caa
commit d0e6314246


@@ -2,7 +2,7 @@
title: Merlin6 Data Directories
#tags:
#keywords:
last_updated: 28 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/data-directories.html
@@ -10,16 +10,16 @@ permalink: /merlin6/data-directories.html
## Merlin6 directory structure
Merlin6 offers the following directory classes for users:
* ``/psi/home/<username>``: Private user **home** directory
* ``/data/user/<username>``: Private user **data** directory
* ``/data/project/general/<projectname>``: Shared **Project** directory
* For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
* ``/scratch``: Local *scratch* disk (visible only on the node running the job).
* ``/shared-scratch``: Shared *scratch* disk (visible from all nodes).
Properties of the directory classes:
| Directory | Block Quota [Soft:Hard] | Files Quota [Soft:Hard] | Quota Change Policy: Block | Quota Change Policy: Files | Backup | Backup Policy |
| ---------------------------------- | ----------------------- | ----------------------- |:--------------------------------- |:-------------------------------- | ------ | :----------------------------- |
@@ -32,16 +32,19 @@ A summary for each directory would be:
### User home directory
This is the default directory users will land in when logging in to any Merlin6 machine.
It is intended for your scripts, documents, software development, and other files which
you want to have backed up. Do not use it for large data or for I/O-intensive HPC tasks.
This directory is mounted in the login and computing nodes under the path:
```bash
/psi/home/$username
```
Home directories are part of the PSI NFS Central Home storage provided by AIT and
are managed by the Merlin6 administrators.
Users can check their quota by running the following command:
```bash
@@ -53,8 +56,8 @@ quota -s
* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the home directories for I/O-intensive tasks.
* Use ``/scratch``, ``/shared-scratch``, ``/data/user`` or ``/data/project`` for this purpose.
* Users can retrieve up to 1 week of lost data thanks to the automatic **daily snapshots** (kept for 1 week).
Snapshots can be accessed at this path:
```bash
/psi/home/.snapshot/$username
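# For example (hypothetical file name; assuming the snapshots mirror the
# normal home directory layout), a lost file can be copied back like this:
ls /psi/home/.snapshot/$username
cp /psi/home/.snapshot/$username/my_script.sh ~/my_script.sh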
@@ -62,9 +65,7 @@ Snapshots are found in the following directory:
### User data directory
User data directories are part of the Merlin6 storage cluster and are based on GPFS technology.
The user data directory is intended for *fast IO access* and keeping large amounts of private data.
This directory is mounted in the login and computing nodes under the path:
```bash
@@ -77,7 +78,7 @@ Users can check their quota by running the following command:
mmlsquota -u <username> --block-size auto merlin-user
```
#### User data directory policy
* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime.
@@ -86,16 +87,18 @@ mmlsquota -u <username> --block-size auto merlin-user
### Project data directory
Project data directories are part of the Merlin6 storage cluster and are based on GPFS technology.
This storage is intended for *fast IO access* and for keeping large amounts of a project's data, where the data can also be
shared by all members of the project (the project's corresponding unix group). We recommend keeping most data in
project-related storage spaces, since this allows users to coordinate. Project spaces also have more flexible policies
regarding extending the available storage space.
You can request a project space by submitting an incident request via **[PSI Service Now](https://psi.service-now.com/psisp)** using the subject line:
```
Subject: [Merlin6] Project Request for project name xxxxxx
```
Please state your desired project name and list the accounts that should be part of it. The project will receive a corresponding unix group.
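Access to the project directory is then controlled via that unix group. As a quick, hypothetical check (``myproject`` is only an example group name), you can verify group membership from any login node:
```bash
# List the unix groups your account belongs to:
id -Gn
# Show the members of the project's unix group (group name is an example):
getent group myproject
```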
The project data directory is mounted in the login and computing nodes under the path:
@@ -103,7 +106,7 @@ The project data directory is mounted in the login and computing nodes under the
/data/project/$projectname
```
Project quotas are defined on a per *group* basis. Users can check the project quota by running the following command:
```bash
mmrepquota merlin-proj:$projectname
@@ -112,24 +115,23 @@ mmrepquota merlin-proj:$projectname
#### Project Directory policy
* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files. Please use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backups: users are responsible for managing the backups of their data directories (a minimal sketch of copying data to a backed-up location is shown below).
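A minimal sketch of such a manual backup (the project name, paths, and use of ``rsync`` are only examples; keep the quota of the destination in mind):
```bash
# Hypothetical example: copy selected results from the (not backed-up)
# project space to the (backed-up) home directory.
rsync -av /data/project/general/$projectname/important-results/ \
      /psi/home/$username/important-results/
```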
### Scratch directories
There are two different types of scratch storage: **local** (``/scratch``) and **shared** (``/shared-scratch``).
**local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially
true for all jobs running on a single node.
**shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster
and all need to do I/O on the same temporary files.
**local** scratch in Merlin6 computing nodes provides a huge number of IOPS thanks to NVMe technology. **Shared** scratch is implemented as a distributed parallel filesystem (GPFS), resulting in higher latency, since it involves remote storage resources and more complex I/O coordination.
``/shared-scratch`` is only mounted in the *Merlin6* computing nodes (i.e. not on the login nodes), and its current size is 50TB. This can be increased in the future.
The properties of the available scratch storage spaces are given in the following table:
| Cluster | Service | Scratch | Scratch Mountpoint | Shared Scratch | Shared Scratch Mountpoint | Comments |
| ------- | -------------- | ------------ | ------------------ | -------------- | ------------------------- | ------------------------------------- |
@@ -141,7 +143,7 @@ A summary for the scratch directories is the following:
#### Scratch directories policy
* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
* Temporary files *must be deleted at the end of the job by the user* (see the sketch below).
* Remaining files will be deleted by the system if detected.
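A minimal sketch of this clean-up pattern, assuming the Slurm batch system and that users may create their own subdirectories under ``/scratch`` (all names and paths below are examples, not an official template):
```bash
#!/bin/bash
#SBATCH --job-name=scratch-example

# Create a job-specific directory on the local scratch disk.
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"
# Make sure the temporary files are removed when the job ends.
trap 'rm -rf "$SCRATCHDIR"' EXIT

cd "$SCRATCHDIR"
# ... run the application here, writing temporary files to $SCRATCHDIR ...

# Copy results worth keeping to the user data directory (path is an example)
# before the trap removes the scratch directory.
cp -r results /data/user/$USER/
```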