---
title: Merlin6 Storage
#tags:
#keywords:
last_updated: 28 June 2019
#summary: ""
sidebar: merlin6_sidebar
redirect_from: /merlin6/data-directories.html
permalink: /merlin6/storage.html
---

## Introduction

This document describes the different directories of the Merlin6 cluster.

### User and project data

* ***Users are responsible for backing up their own data***. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
* **`/psi/home`**: as it contains only a small amount of data, this is the only directory for which we can provide daily snapshots for one week. The snapshots can be found in the directory **`/psi/home/.snapshot/`**.
* ***When a user leaves PSI, she or her supervisor/team are responsible for backing up the data and moving it out of the cluster***: every few months, the storage space of former users who no longer have an existing and valid PSI account will be recycled.

{{site.data.alerts.warning}}When a user leaves PSI and her account has been removed, her storage space in Merlin may be recycled. Hence, when a user leaves PSI, she, her supervisor or her team must ensure that the data is backed up to an external storage system.
{{site.data.alerts.end}}

### Checking user quota

For each directory, we provide a way of checking quotas (when required). In addition, a single command, ``merlin_quotas``, is provided. It shows all quotas for your filesystems (including AFS, which is not covered here) in one go. To check your quotas, please run:

```bash
merlin_quotas
```

## Merlin6 directories

Merlin6 offers the following directory classes for users:

* ``/psi/home/``: Private user **home** directory
* ``/data/user/``: Private user **data** directory
* ``/data/project/general/``: Shared **project** directory
  * For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
* ``/scratch``: Local *scratch* disk (only visible by the node running a job).
* ``/shared-scratch``: Shared *scratch* disk (visible from all nodes).
* ``/export``: Export directory for data transfer, visible from `ra-merlin-01.psi.ch`, `ra-merlin-02.psi.ch` and the Merlin login nodes.
  * Refer to **[Transferring Data](/merlin6/transfer-data.html)** for more information about the export area and the data transfer service.

{{site.data.alerts.tip}}In GPFS there is a concept called GraceTime. Filesystems have a block (amount of data) and a file (number of files) quota, and each quota has a soft and a hard limit. Once the soft limit is reached, users can keep writing up to the hard limit during the grace period. Once the GraceTime or the hard limit is reached, users will be unable to write and will need to remove data until they are below the soft limit again (or ask for a quota increase where this is possible; see the table below).
{{site.data.alerts.end}}

Properties of the directory classes:

| Directory                          | Block Quota [Soft:Hard] | File Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block         | Quota Change Policy: Files       | Backup | Backup Policy                  |
| ---------------------------------- | ----------------------- | ---------------------- | :-------: | :--------------------------------- | :------------------------------- | ------ | :----------------------------- |
| /psi/home/$username                | USR [10GB:11GB]         | *Undef*                | N/A       | Up to x2 when strongly justified.  | N/A                              | yes    | Daily snapshots for 1 week     |
| /data/user/$username               | USR [1TB:1.074TB]       | USR [1M:1.1M]          | 7d        | Immutable. A project is needed.    | Changeable when justified.       | no     | Users responsible for backup   |
| /data/project/bio/$projectname     | GRP [1TB:1.074TB]       | GRP [1M:1.1M]          | 7d        | Subject to project requirements.   | Subject to project requirements. | no     | Project responsible for backup |
| /data/project/general/$projectname | GRP [1TB:1.074TB]       | GRP [1M:1.1M]          | 7d        | Subject to project requirements.   | Subject to project requirements. | no     | Project responsible for backup |
| /scratch                           | *Undef*                 | *Undef*                | N/A       | N/A                                | N/A                              | no     | N/A                            |
| /shared-scratch                    | USR [512GB:2TB]         | USR [2M:2.5M]          | 7d        | Up to x2 when strongly justified.  | Changeable when justified.       | no     | N/A                            |
| /export                            | USR [10MB:20TB]         | USR [512K:5M]          | 10d       | Soft can be temporarily increased. | Changeable when justified.       | no     | N/A                            |

{{site.data.alerts.warning}}The use of the scratch and export areas as an extension of the quota is forbidden: the scratch and export areas must not contain final data.
Auto-cleanup policies are applied in the scratch and export areas.
{{site.data.alerts.end}}

### User home directory

This is the default directory users land in when logging in to any Merlin6 machine. It is intended for your scripts, documents, software development, and other files which you want to have backed up. Do not use it for storing data or for I/O-hungry HPC tasks.

This directory is mounted on the login and computing nodes under the path:

```bash
/psi/home/$username
```

Home directories are part of the PSI NFS Central Home storage provided by AIT and are managed by the Merlin6 administrators. Users can check their quota by running the following command:

```bash
quota -s
```

#### Home directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the home directories for I/O-intensive tasks.
  * Use ``/scratch``, ``/shared-scratch``, ``/data/user`` or ``/data/project`` for this purpose.
* Users can retrieve up to 1 week of their lost data thanks to the automatic **daily snapshots for 1 week**. Snapshots can be accessed at this path:

```bash
/psi/home/.snapshot/$username
```
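For example, an accidentally deleted file could be restored from a snapshot along the following lines (a minimal sketch: the snapshot name ``daily.2019-06-27`` and the exact directory layout under ``/psi/home/.snapshot/`` are assumptions and may differ on the system):

```bash
# List the available snapshots (the layout under .snapshot/ is an
# assumption; check what is actually there)
ls /psi/home/.snapshot/

# Copy a lost file from a hypothetical daily snapshot back into the home
cp /psi/home/.snapshot/daily.2019-06-27/$USER/my_script.sh ~/my_script.sh
```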
### User data directory

The user data directory is intended for *fast I/O access* and for keeping large amounts of private data. This directory is mounted on the login and computing nodes under the path:

```bash
/data/user/$username
```

Users can check their quota by running the following command:

```bash
mmlsquota -u --block-size auto merlin-user
```

#### User data directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime.
  * Use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backup policy is applied to user data directories: users are responsible for backing up their data.

### Project data directory

This storage is intended for *fast I/O access* and for keeping large amounts of a project's data, where the data can also be shared by all members of the project, i.e. the project's corresponding unix group (a permissions sketch is given at the end of this subsection). We recommend keeping most data in project-related storage spaces, since this makes it easier for users to coordinate. In addition, project spaces have more flexible policies for extending the available storage space.

Experiments can request a project space as described in **[Accessing Merlin -> Requesting a Project](/merlin6/request-project.html)**.

Once created, the project data directory will be mounted on the login and computing nodes under the path:

```bash
/data/project/general/$projectname
```

Project quotas are defined on a per *group* basis. Users can check the project quota by running the following command:

```bash
mmlsquota -j $projectname --block-size auto -C merlin.psi.ch merlin-proj
```

#### Project directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files. Please use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backups: users are responsible for managing the backups of their project data directories.
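Since access to project data is controlled through the project's unix group, members occasionally need to adjust permissions so that collaborators can read and write shared files. A minimal sketch, assuming a hypothetical project ``myproject`` whose unix group has the same name (the real group name may differ):

```bash
# Hypothetical project name; the unix group is assumed to have the same
# name, which may not be the case for your project
PROJECT=myproject
PROJECT_DIR=/data/project/general/$PROJECT

# Create a shared subdirectory, hand it over to the project group and set
# the setgid bit so new files inherit the group ownership automatically
mkdir -p $PROJECT_DIR/shared
chgrp -R $PROJECT $PROJECT_DIR/shared
chmod -R g+rwX $PROJECT_DIR/shared
chmod g+s $PROJECT_DIR/shared
```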
### Scratch directories

There are two different types of scratch storage: **local** (``/scratch``) and **shared** (``/shared-scratch``).

**Local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially true for all jobs running on a single node. **Shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster and all need to do I/O on the same temporary files.

**Local** scratch on the Merlin6 computing nodes provides very high IOPS thanks to NVMe technology. **Shared** scratch is implemented on a distributed parallel filesystem (GPFS), resulting in higher latency, since it involves remote storage resources and more complex I/O coordination. ``/shared-scratch`` is only mounted on the *Merlin6* computing nodes (i.e. not on the login nodes), and its current size is 50TB. This can be increased in the future.

The properties of the available scratch storage spaces are given in the following table:

| Cluster | Service        | Scratch      | Scratch Mountpoint | Shared Scratch | Shared Scratch Mountpoint | Comments                               |
| ------- | -------------- | ------------ | ------------------ | -------------- | ------------------------- | -------------------------------------- |
| merlin5 | computing node | 50GB / SAS   | ``/scratch``       | ``N/A``        | ``N/A``                   | ``merlin-c-[01-64]``                   |
| merlin6 | login node     | 100GB / SAS  | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-l-0[1,2]``                    |
| merlin6 | computing node | 1.3TB / NVMe | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-c-[001-024,101-124,201-224]`` |
| merlin6 | login node     | 2.0TB / NVMe | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-l-00[1,2]``                   |

#### Scratch directories policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
* Temporary files *must be deleted at the end of the job by the user* (see the job script sketch below).
  * Remaining files will be deleted by the system if detected.
* Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the scratch areas get full, the administrators reserve the right to clean up the oldest data.
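To illustrate how the local scratch area can be used while respecting the cleanup policy above, here is a minimal batch-job sketch. The Slurm directives, the directory layout under ``/scratch`` and the ``my_simulation`` executable are assumptions; adapt them to your own workflow.

```bash
#!/bin/bash
#SBATCH --job-name=scratch-example
#SBATCH --time=01:00:00

# Private working directory on the node-local scratch disk; the
# $USER/$SLURM_JOB_ID layout is an assumption, not a cluster convention
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"

# Remove the temporary files when the job ends, as the policy requires
trap 'rm -rf "$SCRATCHDIR"' EXIT

# Stage input data in, run the (hypothetical) application, copy results back
cp /data/user/$USER/input.dat "$SCRATCHDIR"/
cd "$SCRATCHDIR"
/data/user/$USER/bin/my_simulation input.dat > output.dat
mkdir -p /data/user/$USER/results
cp output.dat /data/user/$USER/results/
```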
### Export directory

The export directory is exclusively intended for transferring data from outside PSI to Merlin and vice versa. It is a temporary directory with an auto-cleanup policy. Please read **[Transferring Data](/merlin6/transfer-data.html)** for more information about it.

#### Export directory policy

* Temporary files *must be deleted at the end of the transfer by the user*.
  * Remaining files will be deleted by the system if detected.
* Files not accessed within 28 days will be automatically cleaned up by the system.
* If for some reason the export area gets full, the administrators reserve the right to clean up the oldest data.

---
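As a final illustration of the transfer workflow, once a dataset has arrived in the export area it can be moved to its final location and cleaned up with standard tools. This is a minimal sketch only: the ``/export/$USER/`` layout and the dataset name are assumptions, and the **[Transferring Data](/merlin6/transfer-data.html)** page remains the authoritative reference.

```bash
# On a Merlin6 login node: move a staged dataset from the export area to
# its final location (the /export/$USER/ layout is an assumption)
rsync -av /export/$USER/incoming_dataset/ /data/user/$USER/incoming_dataset/

# Remove the staged copy instead of waiting for the auto-cleanup policy
rm -rf /export/$USER/incoming_dataset
```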