---
title: Merlin7 Storage
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---

## Introduction

This document describes the different directories of the Merlin7 cluster.

### Backup and data policies

- Users are responsible for backing up their own data. It is recommended to back up the data to independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares).
- When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it off the cluster: every few months, the storage space of former users without an existing, valid PSI account will be recycled.

{{site.data.alerts.warning}}When a user leaves PSI and their account is removed, their storage space in Merlin may be recycled. Hence, when a user leaves PSI, they, their supervisor or their team must ensure that the data is backed up to external storage. {{site.data.alerts.end}}
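For illustration, a minimal backup sketch using `rsync` over SSH; the destination host and path are hypothetical placeholders, so substitute the external system (LTS, Archive, Windows Share, SwitchDrive, etc.) you actually use:

```bash
# Copy your Merlin7 user directory to an external host over SSH.
# 'backupuser@external-host.psi.ch:/backups/$USER/' is a hypothetical
# destination; replace it with your actual backup target.
rsync -av --progress /data/user/$USER/ backupuser@external-host.psi.ch:/backups/$USER/
```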

## How to check quotas

Some of the Merlin7 directories have quotas applied. Quotas can be checked with the `merlin_quotas` command, which shows all quotas for your different storage directories and partitions (including AFS). To check your quotas, please run:

```
$ merlin_quotas
Path            SpaceUsed  SpaceQuota  Space %  FilesUsed  FilesQuota  Files %
--------------  ---------  ----------  -------  ---------  ----------  -------
/data/user      30.26G     1T          03%      367296     2097152     18%
 └─ <USERNAME>
/afs/psi.ch     3.4G       9.5G        36%      0          0           0%
 └─ user/<USERDIR>
/data/project   2.457T     10T         25%      58         2097152     00%
 └─ bio/shared
/data/project   338.3G     10T         03%      199391     2097152     10%
 └─ bio/hpce
```

{{site.data.alerts.note}}On first use you will see a message about some configuration being generated; this is expected and may take some time. Subsequent runs of `merlin_quotas` will be faster. {{site.data.alerts.end}}

The output shows the quotas set and how much of each quota you are using, for each filesystem that has quotas applied. Note that some users will see one or more /data/project/... directories, depending on whether they are part of a specific PSI research group or project.

The general quota constraints for the different directories are shown in the table below. Further details on how to use `merlin_quotas` can be found on the Tools page.

{{site.data.alerts.tip}}If you're interested, you can retrieve the Lustre-based quota information directly by calling `lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data`. However, the `merlin_quotas` command is more convenient and shows all your relevant filesystem quotas. {{site.data.alerts.end}}
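As a sketch of what the tip above is doing: the per-user Lustre project ID is derived as 100000000 plus your numeric UID, and that ID is then passed to `lfs quota`:

```bash
# Derive the per-user Lustre project ID (scheme from the tip above:
# 100000000 + numeric UID).
PROJID=$(( 100000000 + $(id -u $USER) ))
echo "Lustre project ID: $PROJID"

# Query block and inode usage/limits for that project on /data.
lfs quota -h -p $PROJID /data
```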

## Merlin7 directories

Merlin7 offers the following directory classes for users:

- `/data/user/<username>`: Private user home directory
- `/data/project/general`: Project directory for general Merlin projects
- `/data/project/bio/$projectname`: Project directory for BIO
- `/data/project/mu3e/$projectname`: Project directory for Mu3e
- `/data/project/meg/$projectname`: Project directory for MEG
- `/scratch`: Local scratch disk (only visible to the node running a job)
- `/data/scratch/shared`: Shared scratch disk (visible from all nodes)

{{site.data.alerts.tip}}Lustre has a concept called grace time. Filesystems have a block (amount of data) quota and an inode (number of files) quota, and each quota has a soft and a hard limit. Once the soft limit is reached, users can keep writing up to the hard limit for the duration of the grace period. Once the grace time expires or the hard limit is reached, writes fail and users must remove data until they are below the soft limit again (or ask for a quota increase where possible, see the table below). {{site.data.alerts.end}}
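As a rough illustration of the grace-time mechanics, using the /data/user limits from the table below (soft 1TB, hard 1.074TB, 7d grace):

```bash
# usage < 1TB               -> writes succeed, no grace timer running
# 1TB <= usage < 1.074TB    -> writes still succeed, the 7-day grace timer runs
# grace expired or 1.074TB  -> writes fail until usage drops below the soft limit
#
# The remaining grace time is shown in the 'grace' columns of the quota query:
lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data
```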

Properties of the directory classes:

| Directory | Block Quota [Soft:Hard] | Inode Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block | Quota Change Policy: Inodes | Backup |
| --------- | ----------------------- | ----------------------- | --------- | -------------------------- | --------------------------- | ------ |
| /data/user/$username | PRJ [1TB:1.074TB] | PRJ [2M:2.1M] | 7d | Immutable. Need a project. | Changeable when justified. | no |
| /data/project/bio/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
| /data/project/general/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
| /data/scratch/shared | USR [512GB:2TB] | | 7d | Up to x2 when strongly justified. | Changeable when justified. | no |
| /scratch | Undef | Undef | N/A | N/A | N/A | no |

{{site.data.alerts.warning}}Using the /scratch and /data/scratch/shared areas as an extension of your quota is forbidden. The /scratch and /data/scratch/shared areas must not contain final data. Keep in mind that auto-cleanup policies are applied in the /scratch and /data/scratch/shared areas. {{site.data.alerts.end}}

### User home directory

This is the default directory users land in when logging in to any Merlin7 machine. It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.

The home directories are mounted on the login and computing nodes under the directory

```
/data/user/$username
```

Directory policies:

- Read the Important: Code of Conduct page for more information about Merlin7 policies.
- It is forbidden to use the home directories for I/O-intensive tasks; use one of the scratch areas instead!
- No backup policy is applied to the user home directories: users are responsible for backing up their data.

Home directory quotas are defined on a per-Lustre-project basis. The quota can be checked using the `merlin_quotas` command described above.

### Project data directory

This storage is intended for keeping large amounts of a project's data, where the data can also be shared by all members of the project (the project's corresponding UNIX group). We recommend keeping most data in project-related storage spaces, since this allows users to coordinate. Project spaces also have more flexible policies for extending the available storage space.

Scientists can request a Merlin project space as described in [Accessing Merlin -> Requesting a Project]. By default, Merlin offers centrally covered general project space as long as it does not exceed 10TB (larger requests have to be justified). General Merlin projects might need to be reviewed one year after their creation.

Once a Merlin project is created, the directory will be mounted on the login and computing nodes under the directory:

```
/data/project/general/$projectname
```

Project quotas are defined on a per-Lustre-project basis. Users can check a project quota by running the following command:

```
lfs quota -h -p $projectid /data
```

{{site.data.alerts.warning}}Checking quotas for the Merlin projects is not yet possible. In the future, a list of project IDs will be provided so that users can check their quotas. {{site.data.alerts.end}}
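Until such a list is provided, you may be able to look up the project ID assigned to a project directory yourself with `lfs project` (assuming you have permission to query it; the project name below is a placeholder):

```bash
# Print the Lustre project ID and inherit flag of a project directory
# ('myproject' is a hypothetical name; replace it with your own project).
lfs project -d /data/project/general/myproject

# Then feed the printed ID back into the quota query shown above:
# lfs quota -h -p <projectid> /data
```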

Directory policies:

- Read the Important: Code of Conduct page for more information about Merlin7 policies.
- It is forbidden to use the data directories as a scratch area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files.
  - Please use /scratch or /data/scratch/shared for this purpose.
- No backups: users are responsible for managing the backups of their data directories.

### Dedicated project directories

Some departments or divisions have bigger storage space requirements on Merlin7. At present, bio, mu3e and meg are the main ones. These are mounted under the following paths:

```
/data/project/bio
/data/project/mu3e
/data/project/meg
```

They follow the same rules as the general projects, except that they have more space assigned.

### Scratch directories

There are two different types of scratch storage: local (`/scratch`) and shared (`/data/scratch/shared`).

- local scratch (`/scratch`) should be used by all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially true for all jobs running on a single node.
- shared scratch (`/data/scratch/shared`) is intended for files that need to be accessible from multiple nodes, e.g. by an MPI job whose tasks are spread across the cluster and all need to do I/O on the same temporary files.

Scratch directory policies:

- Read the Important: Code of Conduct page for more information about Merlin7 policies.
- By default, always use local scratch first, and only use shared scratch if your specific use case requires it.
- Temporary files must be deleted at the end of the job by the user (see the sketch after this list).
  - Remaining files will be deleted by the system if detected:
    - Files not accessed within 28 days are automatically cleaned up by the system.
    - If for some reason the scratch areas fill up, admins have the right to clean up the oldest data.
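To tie these policies together, a minimal sketch of a Slurm batch script that stages its work through local scratch and cleans up at the end; the job name, input/output paths and program are hypothetical placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-demo   # hypothetical job name
#SBATCH --ntasks=1

# Job-specific subdirectory on local scratch (visible only on this node).
WORKDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"

# Remove the scratch directory when the script exits, even on error.
trap 'rm -rf "$WORKDIR"' EXIT

# Stage input, run, and copy the final results back to quota-managed storage.
cp /data/user/$USER/input.dat "$WORKDIR"/      # hypothetical input file
cd "$WORKDIR"
./my_program input.dat > output.dat            # hypothetical program
cp output.dat /data/user/$USER/results/
```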