---
title: Merlin7 Storage
keywords: storage, /data/user, /data/software, /data/project, /scratch, /data/scratch/shared, quota, export, user, project, scratch, data, data/scratch/shared, merlin_quotas
last_updated: 07 September 2022
sidebar: merlin7_sidebar
redirect_from: /merlin7/data-directories.html
permalink: /merlin7/storage.html
---
## Introduction
This document describes the different directories of the Merlin7 cluster.
## Backup and data policies
- Users are responsible for backing up their own data. It is recommended to back up the data on third-party, independent systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.; see the sketch below).
- When a user leaves PSI, the user or their supervisor/team is responsible for backing up the data and moving it out of the cluster: every few months, the storage space of old users who no longer have an existing and valid PSI account will be recycled.
{{site.data.alerts.warning}}When a user leaves PSI and their account has been removed, their storage space in Merlin may be recycled. Hence, when a user leaves PSI, the user, their supervisor, or their team must ensure that the data is backed up to an external storage system.{{site.data.alerts.end}}
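As an illustration only, data can be copied to an external system with a tool such as `rsync`. In the sketch below, `backuphost` and the destination path are placeholders; replace them with whatever backup target applies to you (e.g. an LTS or archive endpoint):

```bash
# Sketch: copy a Merlin7 user directory to an external backup host.
# "backuphost" and "/path/to/backup" are placeholders, not real endpoints.
rsync -av /data/user/$USER/ backuphost:/path/to/backup/$USER/
```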
## How to check quotas
Some of the Merlin7 directories have quotas applied. Quotas can be checked with the `merlin_quotas` command.
This command shows all quotas for the different user storage directories and partitions (including AFS). To check your quotas, please run:

```bash
merlin_quotas
```
{{site.data.alerts.warning}}Currently, merlin_quotas is not functional on the Merlin7 cluster. We will notify users when it is ready to be used.{{site.data.alerts.end}}
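Until `merlin_quotas` is available, quotas on the Lustre filesystem can be queried directly with `lfs quota`. For instance, for your home directory (the project ID derivation is explained in the *User home directory* section below):

```bash
# Query the Lustre project quota of your home directory
lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data
```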
## Merlin7 directories
Merlin7 offers the following directory classes for users:
`/data/user/<username>`
: Private user home directory

`/data/project/general`
: Project directory for Merlin

`/data/project/bio/$projectname`
: Project directory for BIO

`/data/project/mu3e/$projectname`
: Project directory for Mu3e

`/data/project/meg/$projectname`
: Project directory for MEG

`/scratch`
: Local scratch disk (only visible by the node running a job).

`/data/scratch/shared`
: Shared scratch disk (visible from all nodes).
{{site.data.alerts.tip}}In Lustre there is a concept called grace time. Filesystems have a block (amount of data) quota and an inode (number of files) quota. Each quota has a soft and a hard limit. Once the soft limit is reached, users can keep writing up to their hard limit during the grace period. Once the grace time expires or the hard limit is reached, users will be unable to write and will need to remove data until they are below the soft limit again (or ask for a quota increase where possible, see the table below). For example, with a block quota of [1TB:1.074TB] and a 7-day grace time, a user exceeding 1TB can keep writing up to 1.074TB for at most 7 days.{{site.data.alerts.end}}
Properties of the directory classes:
Directory | Block Quota [Soft:Hard] | Inode Quota [Soft:Hard] | GraceTime | Quota Change Policy: Block | Quota Change Policy: Inodes | Backup |
---|---|---|---|---|---|---|
/data/user/$username | PRJ [1TB:1.074TB] | PRJ [2M:2.1M] | 7d | Immutable. Need a project. | Changeable when justified. | no |
/data/project/bio/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
/data/project/general/$projectname | PRJ [1TB:1.074TB] | PRJ [1M:1.1M] | 7d | Subject to project requirements. | Subject to project requirements. | no |
/scratch | Undef | Undef | N/A | N/A | N/A | no |
/data/scratch/shared | USR [512GB:2TB] | | 7d | Up to x2 when strongly justified. | Changeable when justified. | no |
{{site.data.alerts.warning}}Using the /scratch and /data/scratch/shared areas as an extension of your quota is forbidden: these areas must not contain final data. Keep in mind that automatic cleanup policies are applied to the /scratch and /data/scratch/shared areas.{{site.data.alerts.end}}
## User home directory
This is the default directory users land in when logging in to any Merlin7 machine. It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.
The home directories are mounted on the login and computing nodes under the directory:

```
/data/user/$username
```
Directory policies:

- Read the *Important: Code of Conduct* documentation for more information about Merlin7 policies.
- It is forbidden to use the home directories for I/O-intensive tasks:
  - Always use the local `/scratch` disk of the compute nodes in the first place.
  - Use `/data/scratch/shared` only when necessary, for example for jobs requiring a fast shared storage area.
- No backup policy is applied to the user home directories: users are responsible for backing up their data.
Home directory quotas are defined on a per-Lustre-project basis. Users can check the project quota by running the following command:

```bash
lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data
```
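The numeric argument is the Lustre project ID of the home directory, derived from the numeric UID. To print it on its own (assuming the `100000000 + UID` convention used in the command above):

```bash
# Print the Lustre project ID associated with your home directory
echo $(( 100000000 + $(id -u) ))
```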
## Project data directory
This storage is intended for keeping large amounts of a project's data, where the data can also be shared by all members of the project (the project's corresponding Unix group). We recommend keeping most data in project-related storage spaces, since this allows the members of a project to coordinate. Project spaces also have more flexible policies for extending the available storage space.
Scientists can request a Merlin project space as described in [Accessing Merlin -> Requesting a Project]. By default, Merlin can offer general project space, centrally covered, as long as it does not exceed 10TB (larger allocations have to be justified). General Merlin projects might need to be reviewed one year after their creation.
Once a Merlin project is created, the directory will be mounted on the login and computing nodes under the directory:

```
/data/project/general/$projectname
```
Project quotas are defined on a per-Lustre-project basis. Users can check the project quota by running the following command:

```bash
lfs quota -h -p $projectid /data
```
{{site.data.alerts.warning}}Checking quotas for the Merlin projects is not yet possible. In the future, a list of `projectid` values will be provided, so users can check their quotas.{{site.data.alerts.end}}
Directory policies:

- Read the *Important: Code of Conduct* documentation for more information about Merlin7 policies.
- It is forbidden to use the data directories as a `scratch` area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files:
  - Please use `/scratch` or `/data/scratch/shared` for this purpose.
- No backups: users are responsible for managing the backups of their data directories.
## Dedicated project directories
Some departments or divisions have bigger storage space requirements on Merlin7. At present, `bio`, `mu3e` and `meg` are the main ones.
These are mounted under the following paths:

```
/data/project/bio
/data/project/mu3e
/data/project/meg
```
They follow the same rules as the general projects, except that they have been assigned more space.
## Scratch directories
There are two different types of scratch storage: local (`/scratch`) and shared (`/data/scratch/shared`).
- **Local** scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially true for all jobs running on a single node. Mount path: `/scratch`
- **Shared** scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job whose tasks are spread out over the cluster and all need to do I/O on the same temporary files. Mount path: `/data/scratch/shared`
Scratch directories policies:

- Read the *Important: Code of Conduct* documentation for more information about Merlin7 policies.
- By default, always use local scratch first and only use shared scratch if your specific use case requires it.
- Temporary files must be deleted at the end of the job by the user (see the sketch after this list):
  - Remaining files will be deleted by the system if detected.
- Files not accessed within 28 days will be automatically cleaned up by the system.
- If for some reason the scratch areas get full, admins have the right to clean up the oldest data.
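As a sketch of these policies in practice, the following minimal Slurm batch script stages temporary files on local scratch and removes them when the job ends. The job name, time limit, per-user subdirectory layout under `/scratch` and the application command are assumptions; adapt them to your use case:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-example   # placeholder job name
#SBATCH --time=01:00:00              # placeholder time limit

# Create a per-job directory on the local scratch disk
# (assumes per-user subdirectories under /scratch are allowed)
TMPDIR="/scratch/${USER}/${SLURM_JOB_ID}"
mkdir -p "$TMPDIR"

# Delete the temporary files when the job ends, as required
# by the Merlin7 scratch policies
trap 'rm -rf "$TMPDIR"' EXIT

# Run your application, pointing its temporary I/O at $TMPDIR.
# "my_application" is a placeholder.
# my_application --tmpdir "$TMPDIR"
```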