diff --git a/pages/merlin6/accessing-merlin6/merlin6-directories.md b/pages/merlin6/accessing-merlin6/merlin6-directories.md
index adb1ade..39a31be 100644
--- a/pages/merlin6/accessing-merlin6/merlin6-directories.md
+++ b/pages/merlin6/accessing-merlin6/merlin6-directories.md
@@ -2,7 +2,7 @@
 title: Merlin6 Data Directories
 #tags:
 #keywords:
-last_updated: 18 June 2019
+last_updated: 28 June 2019
 #summary: ""
 sidebar: merlin6_sidebar
 permalink: /merlin6/data-directories.html
@@ -10,16 +10,16 @@ permalink: /merlin6/data-directories.html
 
 ## Merlin6 directory structure
 
-Merlin6 contain the following directories available for users:
+Merlin6 offers the following directory classes for users:
 
-* ``/psi/home/``: private user **home** directory
-* ``/data/user/``: private user **home** directory
+* ``/psi/home/``: Private user **home** directory
+* ``/data/user/``: Private user **data** directory
 * ``/data/project/general/``: Shared **Project** directory
-  * For BIO experiments, a dedicate ``/data/project/bio/$projectname`` exists.
-* ``/scratch``: Local *scratch* disk.
-* ``/shared-scratch``: Shared *scratch* disk.
+  * For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
+* ``/scratch``: Local *scratch* disk (only visible to the node running a job).
+* ``/shared-scratch``: Shared *scratch* disk (visible from all nodes).
 
-A summary for each directory would be:
+Properties of the directory classes:
 
-| Directory | Block Quota [Soft:Hard] | Block Quota [Soft:Hard] | Quota Change Policy: Block | Quota Change Policy: Files | Backup | Backup Policy |
+| Directory | Block Quota [Soft:Hard] | Files Quota [Soft:Hard] | Quota Change Policy: Block | Quota Change Policy: Files | Backup | Backup Policy |
 | ---------------------------------- | ----------------------- | ----------------------- |:--------------------------------- |:-------------------------------- | ------ | :----------------------------- |
@@ -32,16 +32,19 @@ A summary for each directory would be:
 
 ### User home directory
 
-Home directories are part of the PSI NFS Central Home storage provided by AIT.
-However, administration for the Merlin6 NFS homes is delegated to Merlin6 administrators.
-
-This is the default directory users will land when login in to any Merlin6 machine.
-This directory is mounted in the login and computing nodes under the directory:
+This is the default directory users land in when logging in to any Merlin6 machine.
+It is intended for your scripts, documents, software development, and other files which
+you want to have backed up. Do not use it for data or I/O-hungry HPC tasks.
+
+This directory is mounted in the login and computing nodes under the path:
 
 ```bash
 /psi/home/$username
 ```
 
+Home directories are part of the PSI NFS Central Home storage provided by AIT and
+are managed by the Merlin6 administrators.
+
 Users can check their quota by running the following command:
 
 ```bash
 quota -s
 ```
 
@@ -53,8 +56,8 @@ quota -s
 
 * Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
-* Is **forbidden** to use the home directories for IO intensive tasks
+* It is **forbidden** to use the home directories for I/O-intensive tasks.
   * Use ``/scratch``, ``/shared-scratch``, ``/data/user`` or ``/data/project`` for this purpose.
-* Users can recover up to 1 week of their lost data thanks to the automatic **daily snapshorts for 1 week**.
-Snapshots are found in the following directory:
+* Users can retrieve up to 1 week of their lost data thanks to the automatic **daily snapshots kept for 1 week**.
+Snapshots can be accessed at this path:
 
 ```bash
-/psi/home/.snapshop/$username
+/psi/home/.snapshot/$username
@@ -62,9 +65,7 @@ Snapshots are found in the following directory:
 ```
 
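+For example, a file deleted by accident can be copied back from one of the daily
+snapshots. This is a minimal sketch: ``myscript.sh`` is a placeholder, and the exact
+layout below the snapshot path may differ, so inspect it first:
+
+```bash
+# Inspect the available snapshots of your home directory
+ls -l /psi/home/.snapshot/$username
+
+# Copy the lost file back into your home directory
+cp /psi/home/.snapshot/$username/myscript.sh /psi/home/$username/
+```
+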
 ### User data directory
 
-User data directories are part of the Merlin6 storage cluster and technology is based on GPFS.
-
-The user data directory is intended for *fast IO access* and keeping large amount of private data.
+The user data directory is intended for *fast IO access* and keeping large amounts of private data.
 
-This directory is mounted in the login and computing nodes under the directory
+This directory is mounted in the login and computing nodes under the path:
 
 ```bash
 /data/user/$username
 ```
 
@@ -77,7 +78,7 @@ Users can check their quota by running the following command:
 
 ```bash
 mmlsquota -u --block-size auto merlin-user
 ```
 
-#### User Directory policy
+#### User data directory policy
 
@@ -86,16 +87,18 @@ mmlsquota -u --block-size auto merlin-user
 
 * Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
-* Is **forbidden** to use the data directories as ``scratch`` area during a job runtime.
+* It is **forbidden** to use the data directories as a ``scratch`` area during a job runtime.
   * Use ``/scratch``, ``/shared-scratch`` for this purpose.
 
 ### Project data directory
 
-Project data directories are part of the Merlin6 storage cluster and technology is based on GPFS.
+This storage is intended for *fast IO access* and keeping large amounts of a project's data, where the data can also be
+shared by all members of the project (the project's corresponding unix group). We recommend keeping most data in
+project-related storage spaces, since this avoids duplicating data between users and allows project members to
+coordinate. Also, project spaces have more flexible policies regarding extending the available storage space.
 
-This storage is intended for *fast IO access* and keeping large amount of private data, but also for sharing data amogst
-different users sharing a project.
-Creating a project is the way in where users can expand his storage space and will optimize the usage of the storage
-(by avoiding for instance, duplicated data for different users).
+You can request a project space by submitting an incident request via **[PSI Service Now](https://psi.service-now.com/psisp)** using the subject line:
 
-Is **highly** recommended the use of a project when multiple persons are involved in the same project managing similar/common data.
-Quotas are defined in a *group* and *fileset* basis: Unix Group name must exist for a specific project or must be created for
-any new project. Contact the Merlin6 administrators for more information about that.
+  ```
+  Subject: [Merlin6] Project Request for project name xxxxxx
+  ```
+
+Please state your desired project name and list the accounts that should be part of it. The project will receive a corresponding unix group.
 
-The project data directory is mounted in the login and computing nodes under the dirctory:
+The project data directory is mounted in the login and computing nodes under the path:
 
@@ -103,7 +106,7 @@ The project data directory is mounted in the login and computing nodes under the
 ```bash
-/data/project/$username
+/data/project/$projectname
 ```
 
-Users can check the project quota by running the following command:
+Project quotas are defined on a per *group* basis. Users can check the project quota by running the following command:
 
 ```bash
 mmrepquota merlin-proj:$projectname
 ```
 
@@ -112,24 +115,23 @@ mmrepquota merlin-proj:$projectname
 
-#### Project Directory policy
+#### Project data directory policy
 
 * Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
-* Is **forbidden** to use the data directories as ``scratch`` area during a job runtime.
-  * Use ``/scratch``, ``/shared-scratch`` for this purpose.
+* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files. Please use ``/scratch`` or ``/shared-scratch`` for this purpose.
 * No backups: users are responsible for managing the backups of their data directories.
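+
+Since access to a project space is controlled through the project's unix group, you can
+check your group memberships and make files accessible to the other project members with
+standard tools. A short sketch, where ``myproject`` is a placeholder for the real group
+and project name:
+
+```bash
+# List the unix groups your account belongs to
+id -Gn
+
+# Give the project group read access to a dataset in the project directory
+chgrp -R myproject /data/project/general/myproject/dataset
+chmod -R g+rX /data/project/general/myproject/dataset
+```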
 
 ### Scratch directories
 
-There are two different types of scratch disk: **local** (``/scratch``) and **shared** (``/shared-scratch``).
-Specific details of each type is described below.
+There are two different types of scratch storage: **local** (``/scratch``) and **shared** (``/shared-scratch``).
 
-Usually **shared** scratch will be used for those jobs running on multiple nodes which need to access to a common shared space
-for creating temporary files, while **local** scratch should be used by those jobs needing a local space for creating temporary files.
+**Local** scratch should be used by all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially
+true for all jobs running on a single node.
+**Shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster
+and need to do I/O on the same temporary files.
 
-**local** scratch in Merlin6 computing nodes provides a huge number of IOPS thanks to the NVMe technology,
-while **shared** scratch, despite being also very fast, is an external GPFS storage with more latency.
+**Local** scratch in Merlin6 computing nodes provides a huge number of IOPS thanks to NVMe technology. **Shared** scratch is implemented on a distributed parallel filesystem (GPFS) and has higher latency, since it involves remote storage resources and more complex I/O coordination.
 
-``/shared-scratch`` is only mounted in the *Merlin6* computing nodes, and its current size is 50TB. Whenever necessary, it can be increased in the future.
+``/shared-scratch`` is only mounted in the *Merlin6* computing nodes (i.e. not on the login nodes), and its current size is 50TB. This can be increased in the future.
 
-A summary for the scratch directories is the following:
+The properties of the available scratch storage spaces are given in the following table:
 
 | Cluster | Service | Scratch | Scratch Mountpoint | Shared Scratch | Shared Scratch Mountpoint | Comments |
 | ------- | -------------- | ------------ | ------------------ | -------------- | ------------------------- | ------------------------------------- |
@@ -141,7 +143,7 @@ A summary for the scratch directories is the following:
 
 #### Scratch directories policy
 
 * Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
-* By default, *always* use **local** first and only use **shared** if you specific use case needs a shared scratch area.
+* By default, *always* use **local** first and only use **shared** if your specific use case requires it.
-* Temporary files *must be deleted at the end of the job by the user*.
+* Temporary files *must be deleted at the end of the job by the user* (see the sketch below for one way to automate this).
   * Remaining files will be deleted by the system if detected.
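+
+As an illustration of this policy, a job can stage its temporary files in a job-private
+directory on local scratch and remove that directory when the job ends. A minimal sketch,
+assuming the Slurm workload manager (``myapp`` and the ``#SBATCH`` options are
+placeholders):
+
+```bash
+#!/bin/bash
+#SBATCH --job-name=scratch-example
+#SBATCH --output=scratch-example.out
+
+# Job-private directory on the node-local scratch disk
+SCRATCHDIR="/scratch/$USER/$SLURM_JOB_ID"
+mkdir -p "$SCRATCHDIR"
+
+# Delete the temporary files when the job exits, as required by the policy
+trap 'rm -rf "$SCRATCHDIR"' EXIT
+
+# Run the application, keeping its temporary files on local scratch
+myapp --tmpdir "$SCRATCHDIR"
+```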