
---
title: Migration From Merlin5
keywords: merlin5, merlin6, migration, rsync, archive, archiving, lts, long-term storage
last_updated: 07 September 2022
sidebar: merlin6_sidebar
permalink: /merlin6/migrating.html
---

## Directories

### Merlin5 vs Merlin6

| Cluster | Home Directory       | User Data Directory  | Group/Project Data Directory              |
|---------|----------------------|----------------------|-------------------------------------------|
| merlin5 | /gpfs/home/$username | /gpfs/data/$username | /gpfs/group/$laboratory                   |
| merlin6 | /psi/home/$username  | /data/user/$username | /data/project/[general\|bio]/$projectname |

### Quota limits in Merlin6

| Directory | Quota Type [Soft:Hard] (Block) | Quota Type [Soft:Hard] (Files) | Quota Change Policy: Block | Quota Change Policy: Files |
|-----------|--------------------------------|--------------------------------|----------------------------|----------------------------|
| /psi/home/$username | USR [10GB:11GB] | Undef | Up to x2 when strictly justified. | N/A |
| /data/user/$username | USR [1TB:1.074TB] | USR [1M:1.1M] | Immutable. A project is needed for more space. | Changeable when justified. |
| /data/project/bio/$projectname | GRP+Fileset [1TB:1.074TB] | GRP+Fileset [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |
| /data/project/general/$projectname | GRP+Fileset [1TB:1.074TB] | GRP+Fileset [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |

where:

* **Block** is capacity, in GB and TB.
* **Files** is the number of files plus directories, in millions (M).
* Quota types are the following:
  * **USR**: quota is set up individually per user name.
  * **GRP**: quota is set up individually per Unix group name.
  * **Fileset**: quota is set up per project root directory.
* The user data directory `/data/user` has a strict per-user block quota limit policy. If more disk space is required, a project must be created.
* Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.
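Before migrating, it helps to know how your current usage compares with these limits. A minimal sketch (the path in `DIR` is a placeholder; point it at whichever directory you want to inspect):

```bash
# Check block usage and file count against the quota limits above.
# DIR is a placeholder; adjust it to the directory you want to inspect.
DIR="${DIR:-/data/user/$USER}"
if [ -d "$DIR" ]; then
    du -sh "$DIR"          # block usage (capacity)
    find "$DIR" | wc -l    # number of files + directories (quotas count both)
fi
```

Note that `find | wc -l` counts directories as well as files, matching how the Files quota is accounted.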

## Project directory

### Why is 'project' needed?

Merlin6 introduces the concept of a project directory. Project directories are the recommended location for all scientific data.

* `/data/user` is not suitable for sharing data between users.
* The Merlin5 group directories were a similar concept, but their association with a single organizational group made interdepartmental sharing difficult. Projects can be shared by any PSI user.
* Projects are shared by multiple users (at a minimum they should be shared with the supervisor/PI). This decreases the chance of data being orphaned by personnel changes.
* Shared projects are preferable to individual data for transparency and accountability in the event of future questions regarding the data.
* One project member is designated as responsible. Responsibility can be transferred if needed.

### Requesting a project

Refer to Requesting a project.


## Migration Schedule

### Phase 1 [June]: Pre-migration

* Users keep working on Merlin5.
  * Merlin5 production directories: `/gpfs/home`, `/gpfs/data`, `/gpfs/group`.
* Users may raise any problems (quota limits, inaccessible files, etc.) to merlin-admins@lists.psi.ch.
* Users can start migrating data (see Migration steps).
  * Users should copy their data from Merlin5 `/gpfs/data` to Merlin6 `/data/user`.
  * Users should copy their home from Merlin5 `/gpfs/home` to Merlin6 `/psi/home`.
* Users should report when their migration is done and which directories were migrated, so that admins can schedule deletion of those directories.

### Phase 2 [July-October]: Migration to Merlin6

* Merlin6 becomes the official cluster, and directories are switched to the new structure:
  * Merlin6 production directories: `/psi/home`, `/data/user`, `/data/project`.
  * Merlin5 directories remain available read-write on the login nodes: `/gpfs/home`, `/gpfs/data`, `/gpfs/group`.
    * On the Merlin5 computing nodes, Merlin5 directories are mounted read-write: `/gpfs/home`, `/gpfs/data`, `/gpfs/group`.
    * On the Merlin5 computing nodes, Merlin6 directories are mounted read-write: `/psi/home`, `/data/user`, `/data/project`.
* Users must migrate their data (see Migration steps).
  * ALL data must be migrated.
* Job submissions go to Merlin6 by default; submission to the Merlin5 computing nodes remains possible.
* Users should report when their migration is done and which directories were migrated, so that admins can schedule deletion of those directories.

### Phase 3 [November]: Merlin5 Decommission

* The old Merlin5 storage is unmounted.
* Migrated directories reported by users will be deleted.
* Remaining Merlin5 data will be archived.

## Migration steps

### Cleanup / Archive files

* Users must clean up and/or archive files according to the quota limits of the target storage.
* If extra space is needed, we advise users to request a project.
* If you need a higher limit on the number of files, you can request an increase of your user quota.
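For data that should be kept but is not needed day to day, packing it into a single compressed archive before cleanup also reduces the file count against the quota. A minimal sketch; `SRC` and `ARCHIVE` are placeholder paths:

```bash
# Pack a directory into one compressed archive, then verify that the
# archive lists cleanly before removing the original.
# SRC and ARCHIVE are placeholders; adjust them to your own paths.
SRC="${SRC:-/gpfs/data/$USER/old_results}"
ARCHIVE="${ARCHIVE:-$HOME/old_results.tar.gz}"
if [ -d "$SRC" ]; then
    tar -czf "$ARCHIVE" -C "$(dirname "$SRC")" "$(basename "$SRC")"
    tar -tzf "$ARCHIVE" > /dev/null && echo "archive verified"
fi
```

Only delete the original directory after the archive has been verified and, ideally, copied to its final destination.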

### File list

### Step 1: Migrating

First migration:

```bash
rsync -avAHXS <source_merlin5> <destination_merlin6>
rsync -avAHXS /gpfs/data/$username/* /data/user/$username
```

This can take several hours or days:

* You can run several rsync commands over sub-directories in parallel to increase the transfer rate.
* Please do not run too many concurrent transfers; as a rule of thumb, no more than 10 at once.
  * Other users may be doing the same, and too many parallel transfers can cause storage and UI performance problems on the Merlin5 cluster.
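One way to parallelize, sketched with `xargs` over the first-level sub-directories (`SRC` and `DEST` are placeholder paths; `-P 4` keeps the process count well under the limit of 10 mentioned above):

```bash
# Run one rsync per first-level sub-directory, at most 4 at a time.
# SRC and DEST are placeholders; adjust them to your own paths.
SRC="${SRC:-/gpfs/data/$USER}"
DEST="${DEST:-/data/user/$USER}"
if [ -d "$SRC" ] && command -v rsync > /dev/null; then
    find "$SRC" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -P 4 -I{} rsync -avAHXS {} "$DEST"/
fi
```

Each entry directly under `SRC` (without a trailing slash) is copied as a whole into `DEST`, so the directory layout is preserved.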

### Step 2: Mirroring

Once the first migration is done, a second rsync should be run with the `--delete` option. With this option rsync propagates new and changed files from the source to the destination, and also deletes from the destination any files that were removed from the source.

```bash
rsync -avAHXS --delete <source_merlin5> <destination_merlin6>
rsync -avAHXS --delete /gpfs/data/$username/* /data/user/$username
```
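Because `--delete` removes files on the destination side, it can be worth previewing its effect with `--dry-run` first. A minimal sketch with placeholder paths:

```bash
# Preview what the mirror pass would delete on the destination,
# without changing anything. SRC and DEST are placeholders.
SRC="${SRC:-/gpfs/data/$USER/}"
DEST="${DEST:-/data/user/$USER/}"
if [ -d "$SRC" ] && [ -d "$DEST" ] && command -v rsync > /dev/null; then
    rsync -avAHXS --delete --dry-run "$SRC" "$DEST" | grep '^deleting ' || true
fi
```

With `-v`, the dry run prints one `deleting <path>` line per file that the real pass would remove from the destination.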

### Step 3: Removing / Archiving old data

#### Removing migrated data

Once you have ensured that everything is migrated to the new storage, the data is ready to be deleted from the old storage. Users must report when their migration is finished, stating which directories are affected and ready to be removed.
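Before reporting a directory as fully migrated, a simple content comparison can catch files that never made it across. A minimal sketch using `diff`, with placeholder paths (this compares file contents only, not permissions or ACLs):

```bash
# Compare the old and new trees; diff -r prints nothing and exits 0
# when both contain identical files. SRC and DEST are placeholders.
SRC="${SRC:-/gpfs/data/$USER}"
DEST="${DEST:-/data/user/$USER}"
if [ -d "$SRC" ] && [ -d "$DEST" ]; then
    diff -r "$SRC" "$DEST" && echo "trees match"
fi
```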

Merlin administrators will remove the directories, always asking for a final confirmation first.

#### Archiving data

Once all migrated data has been removed from the old storage, the remaining data will be archived.