layout | title | parent | nav_order |
---|---|---|---|
default | Migration From Merlin5 | Merlin6 User Guide | 7 |
Migration From Merlin5
{: .no_toc }
Table of contents
{: .no_toc .text-delta }
- TOC {:toc}
Merlin5 vs Merlin6
Directories
Cluster | Home Directory | User Data Directory | Group / Project Directory |
---|---|---|---|
merlin5 | /gpfs/home/$username | /gpfs/data/$username | /gpfs/group/$laboratory |
merlin6 | /psi/home/$username | /data/user/$username | /data/project/[general |
USR/GRP quota limits in Merlin6
Directory | Quota_Type [Soft:Hard] (Block) | Quota_Type [Soft:Hard] (Files) | Quota Change Policy: Block | Quota Change Policy: Files |
---|---|---|---|---|
/psi/home/$username | USR [10GB:11GB] | Undef | Up to x2 when strictly justified. | N/A |
/data/user/$username | USR [1TB:1.074TB] | USR [1M:1.1M] | Immutable. Need a project. | Changeable when justified. |
/data/project/bio/$projectname | GRP [1TB:1.074TB] | GRP [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |
/data/project/general/$projectname | GRP [1TB:1.074TB] | GRP [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |
where:
- Block is capacity size in GB and TB
- Files is number of files + directories in Millions (M)
- The user data directory /data/user has a strict user block quota policy. If more disk space is required, a 'project' must be created.
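To compare a directory against the block and file limits in the table, standard tools are enough. The following is a minimal sketch, not an official Merlin tool: the function names `block_usage_kb` and `inode_count` are made up for illustration, `-mindepth` assumes GNU or BSD `find`, and the numbers only approximate what the filesystem quota accounting reports.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Merlin): estimate the usage of a directory.

# Disk usage in KiB (du -k = 1024-byte units, -s = one summary line).
block_usage_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Number of entries (files + directories) below the given directory.
# The directory itself is not counted.
inode_count() {
    find "$1" -mindepth 1 | wc -l | tr -d ' '
}

# Example usage (assumption: $USER matches your $username on the cluster):
# block_usage_kb "/data/user/$USER"
# inode_count "/data/user/$USER"
```

Running both helpers on sub-directories can help to spot which ones contribute most to the block or file quota.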
Project directory
Why would a 'project' be needed?
In Merlin5 the concept of a project did not exist. A similar concept (group) existed, but it was mostly focused on BIO experiments.
Quite often, different users work on similar projects or on the same project. Data was shared in different ways, such as by allowing other users to access private data, or by keeping duplicates in the directory of each user needing access to that data. This makes storage usage inefficient and insecure.
There is also a related problem: when a user leaves, we are left with plenty of data which needs to be kept and for which nobody is responsible. In addition, after several months the user is unregistered from PSI and we end up with orphaned data which needs to be kept, while we sometimes lose track of the user.
For these reasons, we want to restrict the usage of individual data and favor project (shared) data. Each project will have one main responsible person; if for some reason this person leaves, responsibility can pass to somebody else (a successor if one exists, the supervisor, or in the worst case, the admin).
Requesting a project
To request a project, users must:
- Define a 'project' directory name. This must be unique.
- Have an existing project Unix Group.
  - This can be requested through PSI Service Now.
  - The Unix group name must start with unx-
  - This Unix Group will be the default group for the 'project'.
- Define a main responsible person and a supervisor for the project.
- Define and justify quota requirements:
  - By default the GRP quota will be: Block Quota GRP [1TB:1.074TB] and File Quota GRP [1M:1.1M].
  - Individual USR quotas can be requested (by default they are not set).
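Before requesting a new Unix group, it can be worth checking whether you already belong to one with the required prefix. A minimal sketch, assuming group names contain no whitespace; `filter_project_groups` is a hypothetical helper name, not an existing command:

```shell
#!/bin/sh
# Hypothetical helper: keep only group names that carry the unx- prefix.
# Reads whitespace-separated group names on stdin, prints one match per line.
# Returns non-zero if no group matches.
filter_project_groups() {
    tr -s ' \t' '\n\n' | grep '^unx-'
}

# Example usage: list the groups of the current user that start with unx-
# id -Gn | filter_project_groups
```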
Migration Schedule
Phase 1 [June]: Pre-migration
- Users keep working on Merlin5
  - Merlin5 production directories: '/gpfs/home/', '/gpfs/data', '/gpfs/group'
- Users may report any problems (quota limits, inaccessible files, etc.) to merlin-admins@lists.psi.ch
- Users can start migrating data (see [Migration steps](#migration-steps))
- Users should copy their data from Merlin5 /gpfs/data to Merlin6 /data/user
- Users should copy their home from Merlin5 /gpfs/home to Merlin6 /psi/home
- Users should report when migration is done and which directories were migrated. Admins can then request the deletion of those directories.
Phase 2 [July-October]: Migration to Merlin6
- Merlin6 becomes the official cluster, and directories are switched to the new structure:
  - Merlin6 production directories: '/psi/home/', '/data/user', '/data/project'
  - Merlin5 directories available in RO (read-only): '/gpfs/home/', '/gpfs/data', '/gpfs/group'
- Users can keep migrating their data (see [Migration steps](#migration-steps))
- ALL data must be migrated
- Jobs are submitted to Merlin6 by default. Submission to Merlin5 computing nodes remains possible.
- Users should report when migration is done and which directories were migrated. Admins can then request the deletion of those directories.
Phase 3 [November]: Merlin5 Decommission
- Old Merlin5 storage unmounted.
- Migrated directories reported by users will be deleted.
- Remaining Merlin5 data will be archived.
- Merlin5 Slurm cluster removed from production.
Migration steps
Cleanup / Archive files
- Users must clean up and/or archive files, according to the quota limits in the storage.
- If extra space is needed, a 'project' is required.
- If extra files are needed, you can request an increase of the file quota.
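One way to reduce the number of files before migrating is to pack a directory of many small files into a single archive. The following is a minimal sketch under stated assumptions: `archive_dir` is a hypothetical helper, `tar` is assumed to support `-z` and `-C` (GNU and BSD tar do), and the original directory is deleted only after the archive has been read back successfully.

```shell
#!/bin/sh
# Hypothetical helper: replace a directory by a single compressed tar archive.
# Usage: archive_dir /path/to/old_results   ->  /path/to/old_results.tar.gz
# The body runs in a subshell, so "exit" only leaves the function.
archive_dir() (
    dir=${1%/}                          # strip a trailing slash, if any
    [ -d "$dir" ] || exit 1             # refuse anything that is not a directory
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    archive="$parent/$name.tar.gz"

    # Create the archive next to the directory.
    tar -czf "$archive" -C "$parent" "$name" || exit 1
    # Remove the original only if the archive can be listed completely.
    tar -tzf "$archive" >/dev/null || exit 1
    rm -rf "$dir"
)
```

Because the archive is written next to the original, block usage temporarily grows until the directory is removed, so this needs some free quota.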
File list
Step 1: Migrating
First migration:
rsync -avAHXS <source_merlin5> <destination_merlin6>
rsync -avAHXS /gpfs/data/$username/* /data/user/$username
This can take several hours or days:
- You can try to run multiple rsync commands in parallel on different sub-directories to increase the transfer rate.
- Please do not run too many transfers concurrently: no more than 10 at a time.
- Other users may be doing the same, which could cause storage / UI performance problems in the Merlin5 cluster.
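The parallel transfer described above can be sketched as a small shell function. This is an illustration rather than a supported tool: `parallel_rsync` and `RSYNC_FLAGS` are hypothetical names, the default of 4 concurrent jobs is an arbitrary value well below the limit of 10, and hidden sub-directories (names starting with a dot) are skipped by the shell glob and would need a separate transfer.

```shell
#!/bin/sh
# Hypothetical helper: copy each top-level sub-directory of SRC into DST with
# its own rsync process, running at most MAX_JOBS transfers at a time.
# Usage: parallel_rsync SRC DST [MAX_JOBS]
RSYNC_FLAGS=${RSYNC_FLAGS:--avAHXS}   # flags from this guide; override if needed

parallel_rsync() (
    src=${1%/}; dst=${2%/}; max_jobs=${3:-4}
    started=0
    for d in "$src"/*/; do
        [ -d "$d" ] || continue         # the glob matched nothing
        # No trailing slash on the source: the directory itself is recreated in DST.
        rsync $RSYNC_FLAGS "${d%/}" "$dst/" &
        started=$((started + 1))
        # After each batch of max_jobs transfers, wait for the whole batch.
        if [ $((started % max_jobs)) -eq 0 ]; then wait; fi
    done
    wait
    # Finally copy the files located directly in SRC (all directories excluded).
    # The exit status of the function is the status of this last rsync only.
    rsync $RSYNC_FLAGS --exclude='*/' "$src/" "$dst/"
)

# Example usage (assumption: $USER matches your $username on the cluster):
# parallel_rsync "/gpfs/data/$USER" "/data/user/$USER" 4
```

Batching is the simplest way to cap concurrency in plain sh; the trade-off is that each batch waits for its slowest transfer before the next batch starts.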
Step 2: Mirroring
Once the first migration is done, a second rsync should be run with the --delete option. With this option, rsync deletes from the destination all files that were removed from the source, and also propagates new files from the source to the destination.
rsync -avAHXS --delete <source_merlin5> <destination_merlin6>
rsync -avAHXS --delete /gpfs/data/$username/* /data/user/$username
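Since --delete removes files from the destination, it can be useful to preview the mirroring step with rsync's --dry-run option first. The sketch below is a variant of the commands above, not a replacement prescribed by the admins: `mirror_dir` and `RSYNC_FLAGS` are hypothetical names, and the source is given with a trailing slash instead of `/*`, so that top-level dotfiles are included and --delete also applies to entries directly under the destination.

```shell
#!/bin/sh
# Hypothetical helper: mirror the contents of SRC into DST, optionally as a dry run.
# Usage: mirror_dir SRC DST [--dry-run]
RSYNC_FLAGS=${RSYNC_FLAGS:--avAHXS}   # flags from this guide; override if needed

mirror_dir() {
    # Trailing slashes mean "the contents of SRC go into DST".
    # ${3:-} is left unquoted on purpose so that it vanishes when not given.
    rsync $RSYNC_FLAGS --delete ${3:-} "${1%/}/" "${2%/}/"
}

# Example usage (assumption: $USER matches your $username on the cluster):
# mirror_dir "/gpfs/data/$USER" "/data/user/$USER" --dry-run   # preview only
# mirror_dir "/gpfs/data/$USER" "/data/user/$USER"             # real run
```

In the dry run, entries that would be removed from the destination are listed with a `deleting` prefix, which makes unexpected deletions easy to spot.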
Step 3: Removing / Archiving old data
Removing migrated data
Once you have verified that everything has been migrated to the new storage, the data is ready to be deleted from the old storage. Users must report when migration is finished and which directories are ready to be removed.
Merlin administrators will remove the directories, after asking for a final confirmation.
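Before reporting a directory as ready for removal, the old and new copies can be compared. A minimal sketch using `diff -r`; `same_tree` is a hypothetical helper name. It compares file names and contents only (not ownership, permissions or ACLs), and it reads all data on both sides, so it can take a long time on large directories.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DST contains the same files with the
# same contents as SRC. Ownership, permissions and ACLs are not compared.
# Usage: same_tree SRC DST
same_tree() {
    diff -r "$1" "$2" >/dev/null 2>&1
}

# Example usage (assumption: $USER matches your $username on the cluster):
# same_tree "/gpfs/data/$USER" "/data/user/$USER" && echo "copies match"
```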
Archiving data
Once all migrated data has been removed from the old storage, the remaining data will be archived.