---
title: Migration From Merlin5
#tags:
keywords: merlin5, merlin6, migration, rsync, archive, archiving, lts, long-term storage
last_updated: 07 September 2022
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/migrating.html
---

## Directories

### Merlin5 vs Merlin6

| Cluster | User Home Directory    | User Data Directory    | Group/Project Directory                         |
| ------- |:---------------------- |:---------------------- |:----------------------------------------------- |
| merlin5 | /gpfs/home/_$username_ | /gpfs/data/_$username_ | /gpfs/group/_$laboratory_                       |
| merlin6 | /psi/home/_$username_  | /data/user/_$username_ | /data/project/_\[general\|bio\]_/_$projectname_ |

### Quota limits in Merlin6

| Directory                          | Quota Type [Soft:Hard] (Block) | Quota Type [Soft:Hard] (Files) | Quota Change Policy: Block                    | Quota Change Policy: Files                    |
| ---------------------------------- | ------------------------------ | ------------------------------ |:--------------------------------------------- |:--------------------------------------------- |
| /psi/home/$username                | USR [10GB:11GB]                | *Undef*                        | Up to x2 when strictly justified.             | N/A                                           |
| /data/user/$username               | USR [1TB:1.074TB]              | USR [1M:1.1M]                  | Immutable. Need a project.                    | Changeable when justified.                    |
| /data/project/bio/$projectname     | GRP+Fileset [1TB:1.074TB]      | GRP+Fileset [1M:1.1M]          | Changeable according to project requirements. | Changeable according to project requirements. |
| /data/project/general/$projectname | GRP+Fileset [1TB:1.074TB]      | GRP+Fileset [1M:1.1M]          | Changeable according to project requirements. | Changeable according to project requirements. |

where:

* **Block** is capacity, in GB and TB
* **Files** is the number of files + directories, in millions (M)
* **Quota types** are the following:
  * **USR**: quota is set up individually per user name
  * **GRP**: quota is set up individually per Unix group name
  * **Fileset**: quota is set up per project root directory.
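Before migrating, it can help to compare your current usage against these limits. A minimal sketch using standard tools (`du` and `find`); the path assumes the Merlin5 layout from the table above, with `$username` standing in for your login name:

```bash
# Block usage (capacity) of your Merlin5 user data directory
du -sh /gpfs/data/$username

# File count (the "Files" quota dimension): files + directories
find /gpfs/data/$username | wc -l
```

Note that `find` counts the top-level directory itself as well, so the result is an upper bound on the number of entries that will land in `/data/user`.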
* The user data directory ``/data/user`` has a strict per-user block quota limit policy. If more disk space is required, a *project* must be created.
* Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.

### Project directory

#### Why is 'project' needed?

Merlin6 introduces the concept of a *project* directory. These are the recommended location for all scientific data.

* `/data/user` is not suitable for sharing data between users.
* The Merlin5 *group* directories were a similar concept, but the association with a single organizational group made interdepartmental sharing difficult. Projects can be shared by any PSI user.
* Projects are shared by multiple users (at a minimum they should be shared with the supervisor/PI). This decreases the chance of data being orphaned by personnel changes.
* Shared projects are preferable to individual data for transparency and accountability in the event of future questions regarding the data.
* One project member is designated as responsible. Responsibility can be transferred if needed.

#### Requesting a *project*

Refer to [Requesting a project](/merlin6/request-project.html)

---

## Migration Schedule

### Phase 1 [June]: Pre-migration

* Users keep working on Merlin5.
* Merlin5 production directories: ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
* Users may raise any problems (quota limits, inaccessible files, etc.) to merlin-admins@lists.psi.ch
* Users can start migrating data (see [Migration steps](/merlin6/migrating.html#migration-steps)):
  * Users should copy their data from Merlin5 ``/gpfs/data`` to Merlin6 ``/data/user``
  * Users should copy their home from Merlin5 ``/gpfs/home`` to Merlin6 ``/psi/home``
* Users should report when their migration is done, and which directories were migrated, so that deletion of those directories can be requested from the admins.
### Phase 2 [July-October]: Migration to Merlin6

* Merlin6 becomes the official cluster, and directories are switched to the new structure:
  * Merlin6 production directories: ``'/psi/home/'``, ``'/data/user'``, ``'/data/project'``
  * Merlin5 directories available in RW on the login nodes: ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
  * On Merlin5 computing nodes, Merlin5 directories are mounted in RW: ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
  * On Merlin5 computing nodes, Merlin6 directories are mounted in RW: ``'/psi/home/'``, ``'/data/user'``, ``'/data/project'``
* Users must migrate their data (see [Migration steps](/merlin6/migrating.html#migration-steps)):
  * ALL data must be migrated.
* Job submissions go to Merlin6 by default. Submission to Merlin5 computing nodes remains possible.
* Users should report when their migration is done, and which directories were migrated, so that deletion of those directories can be requested from the admins.

### Phase 3 [November]: Merlin5 Decommission

* Old Merlin5 storage is unmounted.
* Migrated directories reported by users will be deleted.
* Remaining Merlin5 data will be archived.

---

## Migration steps

### Cleanup / Archive files

* Users must clean up and/or archive files, according to the quota limits of the target storage.
  * If extra space is needed, we advise users to request a [project](/merlin6/request-project.html).
  * If you need a higher limit on the maximum allowed number of files, you can request an increase of your user quota.

### Step 1: Migrating

First migration:

```bash
rsync -avAHXS /gpfs/data/$username/* /data/user/$username
```

This can take several hours or days:

* You can parallelize multiple ``rsync`` commands over sub-directories to increase the transfer rate.
* Please do not parallelize too many directories at once; do not run more than 10 together.
* Other users may be doing the same, and too many concurrent transfers could cause storage / UI performance problems on the Merlin5 cluster.
### Step 2: Mirroring

Once the first migration is done, a second ``rsync`` should be run, this time with ``--delete``. With this option, ``rsync`` deletes from the destination any files that were removed from the source, and also propagates new files from the source to the destination.

```bash
rsync -avAHXS --delete /gpfs/data/$username/* /data/user/$username
```

### Step 3: Removing / Archiving old data

#### Removing migrated data

Once you have ensured that everything is migrated to the new storage, the data is ready to be deleted from the old storage. Users must report when the migration is finished, and which directories are affected and ready to be removed. Merlin administrators will remove the directories, always asking for a final confirmation.

#### Archiving data

Once all migrated data has been removed from the old storage, the remaining data will be archived.