---
title: Migration From Merlin5
#tags:
#keywords:
last_updated: 18 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/migrating.html
---

## Directories

### Merlin5 vs Merlin6

| Cluster | User Home Directory | User Data Directory | Group/Project Data Directory |
| ------- |:---------------------- |:---------------------- |:----------------------------------------------- |
| merlin5 | /gpfs/home/_$username_ | /gpfs/data/_$username_ | /gpfs/group/_$laboratory_ |
| merlin6 | /psi/home/_$username_ | /data/user/_$username_ | /data/project/_\[general\|bio\]_/_$projectname_ |

### Quota limits in Merlin6

| Directory | Quota Type [Soft:Hard] (Block) | Quota Type [Soft:Hard] (Files) | Quota Change Policy: Block | Quota Change Policy: Files |
| ---------------------------------- | ------------------------------ | ------------------------------ |:--------------------------------------------- |:--------------------------------------------- |
| /psi/home/$username | USR [10GB:11GB] | *Undef* | Up to x2 when strictly justified. | N/A |
| /data/user/$username | USR [1TB:1.074TB] | USR [1M:1.1M] | Immutable. A project is needed instead. | Changeable when justified. |
| /data/project/bio/$projectname | GRP+Fileset [1TB:1.074TB] | GRP+Fileset [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |
| /data/project/general/$projectname | GRP+Fileset [1TB:1.074TB] | GRP+Fileset [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |

where:

* **Block** is capacity, in GB and TB.
* **Files** is the number of files + directories, in millions (M).
* **Quota types** are the following:
  * **USR**: quota is set individually per user name.
  * **GRP**: quota is set individually per Unix group name.
  * **Fileset**: quota is set per project root directory.
* The user data directory ``/data/user`` has a strict per-user block quota limit policy. If more disk space is required, a *project* must be created.
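As a quick self-check against these limits, a directory's block usage and file count can be inspected with standard tools. This is only a sketch: ``DIR`` is a placeholder (on Merlin6 you would point it at your ``/data/user`` directory), and the GPFS quota accounting itself remains authoritative.

```bash
# Sketch: compare a directory against the /data/user quota limits
# (1TB soft block quota, 1M soft file quota). DIR is an example placeholder.
DIR="${DIR:-.}"

# Total block usage of the tree (human-readable)
du -sh "$DIR"

# Number of files + directories, which both count against the file quota
find "$DIR" | wc -l
```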
* Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.

### Project directory

#### Why is 'project' needed?

Merlin6 introduces the concept of a *project* directory. These are the recommended location for all scientific data.

* `/data/user` is not suitable for sharing data between users.
* The Merlin5 *group* directories were a similar concept, but the association with a single organizational group made interdepartmental sharing difficult. Projects can be shared by any PSI user.
* Projects are shared by multiple users (at a minimum they should be shared with the supervisor/PI). This decreases the chance of data being orphaned by personnel changes.
* Shared projects are preferable to individual data for transparency and accountability in the event of future questions regarding the data.
* One project member is designated as responsible. Responsibility can be transferred if needed.

#### Requesting a *project*

To request a *project*, users must:

* Define a *project* directory name. This must be unique.
* Have an existing *project* **Unix group**:
  * This can be requested through [PSI Service Now](https://psi.service-now.com/psisp)
  * The Unix group name must start with *``unx-``*
  * This Unix group will be the default group for the *project*
* Define a project main responsible person and a supervisor.
* Define and justify quota requirements:
  * By default, the group quota will be: Block Quota GRP [1TB:1.074TB] and File Quota GRP [1M:1.1M]
  * Individual USR quotas can be requested (by default none are set).

---

## Migration Schedule

### Phase 1 [June]: Pre-migration

* Users keep working on Merlin5.
* Merlin5 production directories: ``/gpfs/home``, ``/gpfs/data``, ``/gpfs/group``
* Users may raise any problems (quota limits, inaccessible files, etc.)
  to merlin-admins@lists.psi.ch.
* Users can start migrating data (see [Migration steps](#migration-steps)):
  * Users should copy their data from Merlin5 ``/gpfs/data`` to Merlin6 ``/data/user``
  * Users should copy their home from Merlin5 ``/gpfs/home`` to Merlin6 ``/psi/home``
* Users should report when their migration is done, and which directories were migrated. Deletion of these directories can then be requested from the admins.

### Phase 2 [July-October]: Migration to Merlin6

* Merlin6 becomes the official cluster, and directories are switched to the new structure:
  * Merlin6 production directories: ``/psi/home``, ``/data/user``, ``/data/project``
  * Merlin5 directories available read-only (RO) on the login nodes: ``/gpfs/home``, ``/gpfs/data``, ``/gpfs/group``
  * On the Merlin5 computing nodes, the Merlin5 directories will be mounted read-write (RW): ``/gpfs/home``, ``/gpfs/data``, ``/gpfs/group``
* Users must migrate their data (see [Migration steps](#migration-steps)):
  * ALL data must be migrated.
* Job submissions go to Merlin6 by default. Submission to the Merlin5 computing nodes remains possible.
* Users should report when their migration is done, and which directories were migrated. Deletion of these directories can then be requested from the admins.

### Phase 3 [November]: Merlin5 Decommission

* The old Merlin5 storage is unmounted.
* Migrated directories reported by users will be deleted.
* Remaining Merlin5 data will be archived.
* The Merlin5 Slurm cluster is removed from production.

---

## Migration steps

### Cleanup / Archive files

* Users must clean up and/or archive files, according to the quota limits of the storage:
  * If extra space is needed, a *project* is required.
  * If a larger number of files is needed, you can request an increase of the file quota.

#### File list

### Step 1: Migrating

First migration:

```bash
rsync -avAHXS /gpfs/data/$username/* /data/user/$username
```

This can take several hours or days:

* You can try to parallelize multiple ``rsync`` commands on sub-directories to increase the transfer rate.
* Please do not parallelize over too many directories at once: as a rule of thumb, no more than 10 concurrent transfers.
  * Other users may be doing the same, and too many transfers could cause storage / UI performance problems on the Merlin5 cluster.

### Step 2: Mirroring

Once the first migration is done, a second ``rsync`` should be run, this time with the ``--delete`` option. With ``--delete``, ``rsync`` removes from the destination all files that were removed from the source, while still propagating new files from the source to the destination.

```bash
rsync -avAHXS --delete /gpfs/data/$username/* /data/user/$username
```

### Step 3: Removing / Archiving old data

#### Removing migrated data

Once you have ensured that everything is migrated to the new storage, the data is ready to be deleted from the old storage. Users must report when their migration is finished, stating which directories are affected and ready to be removed. The Merlin administrators will remove these directories, always asking for a final confirmation.

#### Archiving data

Once all migrated data has been removed from the old storage, the remaining data will be archived.