---
title: Merlin7 Storage
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---
## Introduction
This document describes the different directories of the Merlin7 cluster.
## Backup and data policies
- Users are responsible for backing up their own data. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).
- When a user leaves PSI, they or their supervisor/team are responsible for backing up the data and moving it out of the cluster: every few months, the storage space of old users who no longer have an existing and valid PSI account will be recycled.
{{site.data.alerts.warning}}When a user leaves PSI and their account is removed, their storage space in Merlin may be recycled. Hence, when a user leaves PSI, they, their supervisor or their team must ensure that the data is backed up to an external storage system.{{site.data.alerts.end}}
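As an illustration (not an official PSI backup procedure), here is a minimal sketch of copying data off the cluster with `rsync`; the destination host and path are placeholders you must replace with your own backup target:

```bash
# Hypothetical example: mirror your Merlin7 user directory to an external host.
# <backuphost> and the destination path are placeholders, not PSI endpoints.
rsync -avh --progress /data/user/$USER/ <backuphost>:/backup/merlin7/$USER/
```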
## How to check quotas
Some of the Merlin7 directories have quotas applied. A way of checking the quotas is provided with the `merlin_quotas` command.

This command shows all the quotas for the different user storage directories and partitions (including AFS). To check your quotas, please run:
```
$ merlin_quotas
Path               SpaceUsed  SpaceQuota  Space %  FilesUsed  FilesQuota  Files %
-----------------  ---------  ----------  -------  ---------  ----------  -------
/data/user            30.26G          1T      03%     367296     2097152      18%
└─ <USERNAME>
/afs/psi.ch             3.4G        9.5G      36%          0           0       0%
└─ user/<USERDIR>
/data/project         2.457T         10T      25%         58     2097152      00%
└─ bio/shared
/data/project         338.3G         10T      03%     199391     2097152      10%
└─ bio/hpce
```
{{site.data.alerts.note}}On first use you will see a message about some configuration being generated; this is expected. Don't be surprised that it takes some time. After this, running `merlin_quotas` should be faster.
{{site.data.alerts.end}}
The output shows the quotas set and how much of each quota you are using, for each filesystem that has quotas applied. Notice that some users will have one or more `/data/project/...` directories showing, depending on whether you are part of a specific PSI research group or project.

The general quota constraints for the different directories are shown in the table below. Further details on how to use `merlin_quotas` can be found on the Tools page.
{{site.data.alerts.tip}}If you're interested, you can retrieve the Lustre-based quota information directly by calling
`lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data`.
However, using the `merlin_quotas` command is more convenient and shows all your relevant filesystem quotas.
{{site.data.alerts.end}}
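To make the mapping in the command above explicit: the Lustre project ID used for a user's quota is derived from the numeric UID. A small sketch, assuming the `100000000 + UID` convention shown in the tip:

```bash
# Lustre project ID for /data/user quotas = 100000000 + numeric UID
uid=$(id -u "$USER")
projid=$((100000000 + uid))
echo "UID=$uid -> Lustre project ID=$projid"
lfs quota -h -p "$projid" /data   # equivalent to the one-liner in the tip
```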
## Merlin7 directories
Merlin7 offers the following directory classes for users:
/data/user/<username>
: Private user home directory

/data/project/general
: Project directory for Merlin

/data/project/bio/$projectname
: Project directory for BIO

/data/project/mu3e/$projectname
: Project directory for Mu3e

/data/project/meg/$projectname
: Project directory for MEG

/scratch
: Local scratch disk (only visible to the node running a job)

/data/scratch/shared
: Shared scratch disk (visible from all nodes)
{{site.data.alerts.tip}}In Lustre there is a concept called grace time. Filesystems have a block (amount of data) and an inode (number of files) quota, and each quota has a soft and a hard limit. Once the soft limit is reached, users can keep writing up to their hard limit during the grace period. Once the grace time expires or the hard limit is reached, users will be unable to write and will need to remove data to get below the soft limit (or ask for a quota increase where this is possible, see the table below). For example, with a 1TB soft limit, a 1.074TB hard limit and a 7-day grace time, you may exceed 1TB for at most 7 days, and you can never exceed 1.074TB.{{site.data.alerts.end}}
Properties of the directory classes:
| Directory | Block Quota [Soft:Hard] | Inode Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block | Quota Change Policy: Inodes | Backup |
|---|---|---|---|---|---|---|
| /data/user/$username | PRJ [1TB:1.074TB] | PRJ [2M:2.1M] | 7d | Immutable. Need a project. | Changeable when justified. | no |
| /data/project/bio/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
| /data/project/general/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
| /data/scratch/shared | USR [512GB:2TB] | | 7d | Up to x2 when strongly justified. | Changeable when justified. | no |
| /scratch | Undef | Undef | N/A | N/A | N/A | no |
{{site.data.alerts.warning}}The use of the /scratch and /data/scratch/shared areas as an extension of the quota is forbidden. The /scratch and /data/scratch/shared areas must not contain final data. Keep in mind that automatic cleanup policies are applied in the /scratch and /data/scratch/shared areas.
{{site.data.alerts.end}}
### User home directory
This is the default directory users will land in when logging in to any Merlin7 machine. It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.
The home directories are mounted on the login and computing nodes under the directory `/data/user/$username`.
Directory policies:
- Read Important: Code of Conduct for more information about Merlin7 policies.
- It is forbidden to use the home directories for I/O-intensive tasks; use one of the scratch areas instead!
- No backup policy is applied to the user home directories: users are responsible for backing up their data.
Home directory quotas are defined on a per Lustre project basis. The quota can be checked using the `merlin_quotas` command described above.
### Project data directory
This storage is intended for keeping large amounts of a project's data, where the data can also be shared by all members of the project (the project's corresponding UNIX group). We recommend keeping most data in project-related storage spaces, since this allows users to coordinate. Also, project spaces have more flexible policies regarding extending the available storage space.
Scientists can request a Merlin project space as described in [Accessing Merlin -> Requesting a Project]. By default, Merlin can offer general project space, centrally covered, as long as it does not exceed 10TB (larger requests have to be justified). General Merlin projects might need to be reviewed one year after their creation.
Once a Merlin project is created, the directory will be mounted on the login and computing nodes under the directory `/data/project/general/$projectname`.
Project quotas are defined on a per Lustre project basis. Users can check the project quota by running the following command:

```
lfs quota -h -p $projectid /data
```
{{site.data.alerts.warning}}Checking quotas for the Merlin projects is not yet possible. In the future, a list of `projectid` values will be provided, so users can check their quotas.
{{site.data.alerts.end}}
Directory policies:
- Read Important: Code of Conduct for more information about Merlin7 policies.
- It is forbidden to use the data directories as a `/scratch` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files.
  - Please use `/scratch` or `/data/scratch/shared` for this purpose.
- No backups: users are responsible for managing the backups of their data directories.
### Dedicated project directories
Some departments or divisions have bigger storage space requirements on Merlin7. At present, `bio`, `mu3e` and `meg` are the main ones.
These are mounted under the following paths:
- `/data/project/bio`
- `/data/project/mu3e`
- `/data/project/meg`
They follow the same rules as the general projects, except that they have been assigned more space.
### Scratch directories
There are two different types of scratch storage: local (`/scratch`) and shared (`/data/scratch/shared`).
- local scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially true for all jobs running on a single node. Mount path: `/scratch`
- shared scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster and all need to do I/O on the same temporary files. Mount path: `/data/scratch/shared`
Scratch directory policies:
- Read Important: Code of Conduct for more information about Merlin7 policies.
- By default, always use local scratch first and only use shared scratch if your specific use case requires it.
- Temporary files must be deleted at the end of the job by the user (see the sketch below).
  - Remaining files will be deleted by the system if detected.
- Files not accessed within 28 days will be automatically cleaned up by the system.
- If for some reason the scratch areas get full, admins have the right to clean up the oldest data.
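As an illustration of these policies, here is a minimal sketch of a Slurm batch job that does its temporary I/O in local scratch and cleans up on exit. The program name and the result paths are placeholders; adapt them to your own workload:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --time=01:00:00

# Create a private temporary directory in local scratch
TMPDIR=$(mktemp -d /scratch/${USER}.${SLURM_JOB_ID}.XXXXXX)

# Delete temporary files when the job ends, as the scratch policies require
trap 'rm -rf "$TMPDIR"' EXIT

cd "$TMPDIR"
./my_program --workdir "$TMPDIR"   # placeholder for your actual workload

# Copy final results to permanent storage before the trap cleans up
cp results.dat "/data/user/$USER/"
```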