---
title: Migration From Merlin5
#tags:
#keywords:
last_updated: 18 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/migrating.html
---

## Directories

### Merlin5 vs Merlin6

| Cluster | Home Directory | User Data Directory | Group / Project Directory |
| ------- |:-------------------- |:-------------------- |:---------------------------------------- |
| merlin5 | /gpfs/home/_$username_ | /gpfs/data/_$username_ | /gpfs/group/_$laboratory_ |
| merlin6 | /psi/home/_$username_ | /data/user/_$username_ | /data/project/_\[general\|bio\]_/_$projectname_ |

### Quota limits in Merlin6

| Directory | Quota Type [Soft:Hard] (Block) | Quota Type [Soft:Hard] (Files) | Quota Change Policy: Block | Quota Change Policy: Files |
| ---------------------------------- | ------------------------------ | ------------------------------ |:--------------------------------------------- |:--------------------------------------------- |
| /psi/home/$username | USR [10GB:11GB] | *Undef* | Up to x2 when strictly justified. | N/A |
| /data/user/$username | USR [1TB:1.074TB] | USR [1M:1.1M] | Immutable. A project is needed. | Changeable when justified. |
| /data/project/bio/$projectname | GRP+Fileset [1TB:1.074TB] | GRP+Fileset [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |
| /data/project/general/$projectname | GRP+Fileset [1TB:1.074TB] | GRP+Fileset [1M:1.1M] | Changeable according to project requirements. | Changeable according to project requirements. |

where:
* **Block** is the storage capacity, in GB and TB
* **Files** is the number of files plus directories, in millions (M)
* **Quota types** are the following:
  * **USR**: Quota is set up individually per user name
  * **GRP**: Quota is set up individually per Unix group name
  * **Fileset**: Quota is set up per project root directory
* The user data directory ``/data/user`` has a strict user block quota policy. If more disk space is required, a *project* must be created.
* Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.

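To see roughly where a directory stands against these limits, block usage and the number of entries can be measured with standard tools. The following is a minimal sketch, not an official Merlin tool: the function names `count_entries` and `usage_report` are hypothetical. It counts files plus directories, matching the **Files** definition above, and the figures are only an approximation of what the storage system itself accounts for.

```bash
# Hypothetical helpers (not Merlin tools).

# Count files + directories below a directory, including the directory itself.
# Names containing newlines would be counted more than once.
count_entries() {
    find "$1" | wc -l
}

# Print block usage (human-readable) and the entry count for one directory.
usage_report() {
    local dir="$1"
    local size
    size=$(du -sh "$dir" | cut -f1)
    printf '%s: %s used, %s files+directories\n' "$dir" "$size" "$(count_entries "$dir")"
}
```

For example, `usage_report /data/user/$USER` prints one summary line for the user data directory.
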
### Project directory

#### Why is 'project' needed?

Merlin6 introduces the concept of a *project* directory. Project directories are the recommended location for all scientific data.

* `/data/user` is not suitable for sharing data between users
* The Merlin5 *group* directories were a similar concept, but the association with a single organizational group made
  interdepartmental sharing difficult. Projects can be shared by any PSI user.
* Projects are shared by multiple users (at a minimum they should be shared with the supervisor/PI). This decreases
  the chance of data being orphaned by personnel changes.
* Shared projects are preferable to individual data for transparency and accountability in the event of future questions
  regarding the data.
* One project member is designated as responsible. Responsibility can be transferred if needed.

#### Requesting a *project*

Refer to [Requesting a project](/merlin6/request-project.html)

---

## Migration Schedule

### Phase 1 [June]: Pre-migration

* Users keep working on Merlin5
  * Merlin5 production directories: ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
* Users may report any problems (quota limits, inaccessible files, etc.) to merlin-admins@lists.psi.ch
* Users can start migrating data (see [Migration steps](/merlin6/migrating.html#migration-steps))
  * Users should copy their data from Merlin5 ``/gpfs/data`` to Merlin6 ``/data/user``
  * Users should copy their home from Merlin5 ``/gpfs/home`` to Merlin6 ``/psi/home``
* Users should inform the administrators when migration is done, and which directories were migrated. Administrators may then request deletion of those directories.

### Phase 2 [July-October]: Migration to Merlin6

* Merlin6 becomes the official cluster, and directories are switched to the new structure:
  * Merlin6 production directories: ``'/psi/home/'``, ``'/data/user'``, ``'/data/project'``
  * Merlin5 directories are available in RW on login nodes: ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
  * On Merlin5 computing nodes, Merlin5 directories are mounted in RW: ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
  * On Merlin5 computing nodes, Merlin6 directories are mounted in RW: ``'/psi/home/'``, ``'/data/user'``, ``'/data/project'``
* Users must migrate their data (see [Migration steps](/merlin6/migrating.html#migration-steps))
  * ALL data must be migrated
* Jobs are submitted to Merlin6 by default. Submission to Merlin5 computing nodes remains possible.
* Users should inform the administrators when migration is done, and which directories were migrated. Administrators may then request deletion of those directories.

### Phase 3 [November]: Merlin5 Decommission

* Old Merlin5 storage is unmounted.
* Migrated directories reported by users will be deleted.
* Remaining Merlin5 data will be archived.

---

## Migration steps

### Cleanup / Archive files

* Users must clean up and/or archive files according to the quota limits of the target storage.
* If extra space is needed, we advise users to request a [project](/merlin6/request-project.html)
* If you need a higher limit on the maximum number of files, you can request an increase of your user quota.

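One way to reduce the file count before migrating is to bundle a directory of many small files into a single archive. The sketch below is an illustration under stated assumptions, not a prescribed procedure: the function name `archive_dir` is hypothetical, the archive is written next to the directory as `<dir>.tar.gz`, and the original directory is deliberately left in place so that it can be checked before anything is removed.

```bash
# Hypothetical helper (not a Merlin tool): pack one directory into <dir>.tar.gz.
archive_dir() {
    local dir="${1%/}"            # strip a trailing slash, if any
    local archive="$dir.tar.gz"
    # -C changes into the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # Listing the archive confirms that it is readable.
    tar -tzf "$archive" > /dev/null || return 1
    # The original directory is not deleted here.
    echo "$archive"
}
```

The function prints the path of the archive it created, so the result can be inspected with `tar -tzf` before the original directory is cleaned up by hand.
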
#### File list

### Step 1: Migrating

First migration:

```bash
rsync -avAHXS <source_merlin5> <destination_merlin6>
# Replace $username with your user name. The trailing slash on the source copies
# its contents, including hidden (dot) files, which a '*' glob would skip.
rsync -avAHXS /gpfs/data/$username/ /data/user/$username/
```

This can take several hours or days:
* You can try to parallelize multiple rsync commands over sub-directories to increase the transfer rate.
  * Please do not run too many transfers in parallel. As a guideline, run no more than 10 at the same time.
  * Other users may be doing the same, and this could cause storage / UI performance problems in the Merlin5 cluster.

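The parallel transfer described above can be sketched with `find` and `xargs`. This is a minimal sketch rather than a supported tool: the function name `parallel_rsync` and the `RSYNC_FLAGS` override are hypothetical, GNU `xargs` is assumed, and the job count is capped at 10 to follow the guideline above. One `rsync` runs per top-level sub-directory, and files located directly in the source directory are copied in a final pass.

```bash
# Hypothetical helper: one rsync per top-level sub-directory, a few at a time.
# Usage: parallel_rsync <source_dir> <destination_dir> [jobs]
parallel_rsync() {
    local src="${1%/}" dest="${2%/}" jobs="${3:-4}"
    # Same flags as the commands above unless RSYNC_FLAGS is set (assumption).
    local flags="${RSYNC_FLAGS:--aAHXS}"
    # Refuse more than 10 concurrent transfers.
    [ "$jobs" -le 10 ] || { echo "jobs must be <= 10" >&2; return 1; }
    mkdir -p "$dest"
    # NUL-separated names keep unusual directory names intact.
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -r -P "$jobs" -I{} rsync $flags {} "$dest/" || return 1
    # Copy entries that sit directly in $src, skipping top-level directories.
    rsync $flags --exclude='/*/' "$src/" "$dest/"
}
```

For example, `parallel_rsync /gpfs/data/$USER /data/user/$USER 4` would run at most four transfers at once. A small job count is the safer default, since the storage is shared with other users.
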
### Step 2: Mirroring

Once the first migration is done, a second ``rsync`` should be run, this time with ``--delete``. With this option, ``rsync``
deletes from the destination all files that were removed from the source, and also propagates
new files from the source to the destination.

```bash
rsync -avAHXS --delete <source_merlin5> <destination_merlin6>
# Replace $username with your user name. Use a trailing slash instead of a '*' glob:
# --delete only removes files inside directories that rsync itself transfers.
rsync -avAHXS --delete /gpfs/data/$username/ /data/user/$username/
```

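Because ``--delete`` removes files from the destination, it can be worth previewing the changes first. The sketch below uses rsync's ``--dry-run`` and ``--itemize-changes`` options. The function name `preview_mirror` and the `RSYNC_FLAGS` override are hypothetical, introduced only for illustration.

```bash
# Hypothetical helper: show what a mirroring run would change, without changing it.
# Usage: preview_mirror <source_dir> <destination_dir>
preview_mirror() {
    local src="${1%/}" dest="${2%/}"
    # Same flags as the commands above unless RSYNC_FLAGS is set (assumption).
    local flags="${RSYNC_FLAGS:--aAHXS}"
    # --dry-run: nothing is copied or deleted.
    # --itemize-changes: planned deletions are listed as "*deleting <path>".
    rsync $flags --delete --dry-run --itemize-changes "$src/" "$dest/"
}
```

Lines starting with `*deleting` show what the real run would remove from the destination. If the list looks wrong, the source and destination paths should be checked before running the actual command.
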
### Step 3: Removing / Archiving old data

#### Removing migrated data

Once you have verified that everything has been migrated to the new storage, the data is ready to be deleted from the old storage.
Users must report when migration is finished and which directories are affected and ready to be removed.

Merlin administrators will remove the directories, always asking for a final confirmation.

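Before reporting a directory as migrated, the old and new copies can be compared. The following sketch uses `diff -rq`, which checks file names and contents but not permissions, ACLs, or extended attributes, and which reads all data on both sides, so it can be slow on large trees. The function name `verify_migration` is hypothetical.

```bash
# Hypothetical helper: compare a migrated tree with its source.
# Usage: verify_migration <source_dir> <destination_dir>
verify_migration() {
    local src="${1%/}" dest="${2%/}"
    # -r recurses into sub-directories; -q reports only which files differ
    # or exist on one side only.
    if diff -rq "$src" "$dest"; then
        echo "OK: $dest matches $src"
    else
        echo "MISMATCH: review the differences listed above" >&2
        return 1
    fi
}
```

A zero exit status means no differences in names or contents were found. Any reported difference is a reason to repeat the mirroring step before asking for the old data to be removed.
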
#### Archiving data

Once all migrated data has been removed from the old storage, the remaining data will be archived.