---
title: Migration From Merlin5
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/migrating.html
---

## Merlin5 vs Merlin6

### Directories

| Cluster | Home Directory       | User Data Directory  | Group / Project Directory                 |
| ------- |:-------------------- |:-------------------- |:----------------------------------------- |
| merlin5 | /gpfs/home/$username | /gpfs/data/$username | /gpfs/group/$laboratory                   |
| merlin6 | /psi/home/$username  | /data/user/$username | /data/project/[general\|bio]/$projectname |

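Home and user data directories map one-to-one between the two clusters, so the correspondence in the table above can be written as a small shell function. This is a minimal sketch: `m5_to_m6` is a hypothetical helper, not a Merlin tool, and group directories are deliberately rejected because they have no fixed Merlin6 counterpart and need a *project* instead.

```bash
# Hypothetical helper (not a Merlin tool): print the Merlin6 path that
# corresponds to a Merlin5 home or user data path.
m5_to_m6() {
    case "$1" in
        /gpfs/home/*) printf '%s\n' "/psi/home/${1#/gpfs/home/}" ;;
        /gpfs/data/*) printf '%s\n' "/data/user/${1#/gpfs/data/}" ;;
        *)
            # e.g. /gpfs/group/...: this data must go to a project directory instead
            echo "no direct Merlin6 equivalent for: $1" >&2
            return 1
            ;;
    esac
}

# Usage: m5_to_m6 "/gpfs/data/$USER/results"   # prints /data/user/<login>/results
```
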
### USR/GRP quota limits in Merlin6

| Directory                          | Block quota [soft:hard] | File quota [soft:hard] | Block quota change policy                      | File quota change policy                       |
| ---------------------------------- | ----------------------- | ---------------------- |:---------------------------------------------- |:---------------------------------------------- |
| /psi/home/$username                | USR [10GB:11GB]         | *Undef*                | Up to x2 when strictly justified.              | N/A                                            |
| /data/user/$username               | USR [1TB:1.074TB]       | USR [1M:1.1M]          | Immutable. A project is needed.                | Changeable when justified.                     |
| /data/project/bio/$projectname     | GRP [1TB:1.074TB]       | GRP [1M:1.1M]          | Changeable according to project requirements. | Changeable according to project requirements. |
| /data/project/general/$projectname | GRP [1TB:1.074TB]       | GRP [1M:1.1M]          | Changeable according to project requirements. | Changeable according to project requirements. |

where:

* **Block** is the capacity size, in GB or TB.
* **Files** is the number of files plus directories, in millions (M).
* The user data directory ``/data/user`` has a strict user block quota policy. If more disk space is required, a *'project'* must be requested.

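Before copying a Merlin5 directory, it can help to check whether it fits within these limits. The sketch below uses `du` for disk usage and `find` for the number of files plus directories; this only approximates what the GPFS quota accounting reports. `check_quota` is a hypothetical helper, and reading "1TB" as 10^12 bytes in the usage line is an assumption.

```bash
# Hypothetical helper: measure a directory roughly the way the Merlin6 quotas
# count it (blocks = disk usage, files = number of files + directories) and
# compare with the given soft limits. Returns non-zero if a limit is exceeded.
check_quota() {
    dir=$1
    max_kib=$2      # block soft limit, in KiB
    max_entries=$3  # file soft limit (files + directories)
    kib=$(du -sk "$dir" | cut -f1)
    entries=$(find "$dir" -mindepth 1 | wc -l | tr -d ' ')
    echo "$dir: $kib KiB, $entries entries"
    [ "$kib" -le "$max_kib" ] && [ "$entries" -le "$max_entries" ]
}

# Usage, assuming the /data/user soft limits mean 10^12 bytes (976562500 KiB)
# and 1,000,000 files:
#   check_quota "/gpfs/data/$USER" 976562500 1000000 || echo "clean up, or request a project"
```
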
### Project directory

#### Why would a 'project' be needed?

In Merlin5 the concept of a *project* did not exist. A similar concept (*group*) existed and was mostly focused on BIO experiments.

Quite often, different users work on a similar or even the same project. Data used to be shared in different ways, such as allowing other users to access private data, or keeping duplicates in the directory of each user needing access to that data. This makes storage usage inefficient and insecure.

There is another related problem: when a user leaves, plenty of data that must be kept is left behind, and nobody becomes responsible for it. In addition, after several months the user is unregistered from PSI and we end up with orphaned data that must be kept, while sometimes losing track of its owner.

For these reasons, we want to restrict the usage of individual data and favour project (shared) data. Each project will have one main responsible person; if this person leaves for any reason, responsibility passes to somebody else (a successor if one exists, the supervisor, or in the worst case, the admin).

#### Requesting a *project*

To request a *project*, users must:

* Define a *'project'* directory name. This name must be unique.
* Have an existing *project* **Unix Group**.
  * This can be requested through [PSI Service Now](https://psi.service-now.com/psisp).
  * The Unix group name must start with *``unx-``*.
  * This Unix group will be the default group for the *'project'*.
* Define a main responsible person and a supervisor for the project.
* Define and justify the quota requirements:
  * By default, the GRP quota will be: block quota GRP [1TB:1.074TB] and file quota GRP [1M:1.1M].
  * Individual USR quotas can be requested (they are not set by default).

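The Unix group requirements above can be checked from a login node before filing the request. This is a minimal sketch under the assumption that the project groups are visible through the system group database; `check_project_group` and the group name `unx-myproject` are hypothetical.

```bash
# Hypothetical pre-check (not a Merlin tool) before requesting a project:
# the Unix group name must start with "unx-" and the group should already exist.
check_project_group() {
    group=$1
    case "$group" in
        unx-?*) ;;
        *)
            echo "group name must start with 'unx-': $group" >&2
            return 1
            ;;
    esac
    # getent queries the system group database (local files, LDAP, ...)
    if ! getent group "$group" >/dev/null 2>&1; then
        echo "group '$group' not found: request it through PSI Service Now" >&2
        return 2
    fi
    echo "group '$group' exists"
}

# Usage (hypothetical group name): check_project_group unx-myproject
```
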
---

## Migration Schedule

### Phase 1 [June]: Pre-migration

* Users keep working on Merlin5.
  * Merlin5 production directories: ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
* Users may report any problems (quota limits, inaccessible files, etc.) to merlin-admins@lists.psi.ch
* Users can start migrating data (see [Migration steps](#migration-steps)):
  * Users should copy their data from Merlin5 ``/gpfs/data`` to Merlin6 ``/data/user``.
  * Users should copy their home from Merlin5 ``/gpfs/home`` to Merlin6 ``/psi/home``.
* Users should report when the migration is done and which directories were migrated. Deletion of those directories can then be requested from the admins.

### Phase 2 [July-October]: Migration to Merlin6

* Merlin6 becomes the official cluster, and directories are switched to the new structure:
  * Merlin6 production directories: ``'/psi/home/'``, ``'/data/user'``, ``'/data/project'``
  * Merlin5 directories available read-only (RO): ``'/gpfs/home/'``, ``'/gpfs/data'``, ``'/gpfs/group'``
* Users can keep migrating their data (see [Migration steps](#migration-steps)).
  * ALL data must be migrated.
* Jobs are submitted to Merlin6 by default. Submission to Merlin5 computing nodes is still possible.
* Users should report when the migration is done and which directories were migrated. Deletion of those directories can then be requested from the admins.

### Phase 3 [November]: Merlin5 Decommission

* Old Merlin5 storage will be unmounted.
* Migrated directories reported by users will be deleted.
* Remaining Merlin5 data will be archived.
* The Merlin5 Slurm cluster will be removed from production.

---

## Migration steps

### Cleanup / Archive files

* Users must clean up and/or archive files according to the storage quota limits.
* If extra space is needed, a *'project'* is required.
* If more files are needed, you can request an increase of the file quota.

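A directory holding many small files can be archived into a single tarball, which then counts as one file against the "Files" quota. This is a minimal sketch using plain `tar`; `archive_dir` and the example path are hypothetical, and the original directory is intentionally left in place so that nothing is deleted before you have checked the archive.

```bash
# Hypothetical helper: pack a directory holding many small files into a single
# compressed tarball next to it. The original directory is left in place:
# remove it yourself once you have checked the archive.
archive_dir() {
    dir=${1%/}
    tarball="$dir.tar.gz"
    # -C changes to the parent so the archive stores relative paths (name/...)
    tar -czf "$tarball" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # Listing the whole archive checks that it can be read back
    tar -tzf "$tarball" >/dev/null || return 1
    echo "created $tarball ($(tar -tzf "$tarball" | wc -l | tr -d ' ') entries)"
}

# Usage (hypothetical directory): archive_dir "/gpfs/data/$USER/old_runs"
```
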
#### File list

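To decide what to clean up or archive, a per-directory listing of size and entry count shows where the space and the files are. This is a sketch under the assumption that a top-level breakdown is enough; `list_usage` is a hypothetical helper built on `du`, `find` and `sort`.

```bash
# Hypothetical helper: for each immediate sub-directory of a tree, print its
# disk usage in KiB, the number of entries (files + directories) below it, and
# its path, tab-separated and largest first.
list_usage() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r d; do
        kib=$(du -sk "$d" | cut -f1)
        n=$(find "$d" -mindepth 1 | wc -l | tr -d ' ')
        printf '%s\t%s\t%s\n' "$kib" "$n" "$d"
    done | sort -rn
}

# Usage: list_usage "/gpfs/data/$USER" | head -n 20
```

To rank by number of entries instead (the "Files" quota), pipe the output through `sort -k2,2rn`.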
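### Step 1: Migrating

First migration:

```bash
# Generic form: rsync -avAHXS <source_merlin5>/ <destination_merlin6>/
# -a archive, -v verbose, -A ACLs, -H hard links, -X xattrs, -S sparse files.
# Trailing "/" copies the contents incl. dotfiles ("*" skips them); $USER = your login.
rsync -avAHXS "/gpfs/data/$USER/" "/data/user/$USER/"
```
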
This can take several hours or even days:

* You can run multiple rsync commands in parallel on different sub-directories to increase the transfer rate.
* Please do not parallelize too much: do not run more than 10 transfers at the same time.
* Other users may be doing the same, and too many concurrent transfers can cause storage / UI performance problems in the Merlin5 cluster.

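One way to follow these rules is to start one `rsync` per top-level sub-directory and let `xargs -P` cap how many run at once. This is a sketch, not an official script: `parallel_migrate` is hypothetical, the default of 4 parallel transfers is an arbitrary choice, and a final single `rsync` over the whole tree picks up top-level files and anything the parallel runs missed.

```bash
# Sketch: copy each top-level sub-directory with its own rsync, running at most
# $3 (default 4, never more than 10) transfers at a time via "xargs -P", then
# run one final rsync over the whole tree. -v is omitted: parallel output interleaves.
parallel_migrate() {
    src=${1%/}
    dst=${2%/}
    jobs=${3:-4}
    if [ "$jobs" -gt 10 ]; then
        echo "please do not run more than 10 rsyncs at once" >&2
        return 1
    fi
    mkdir -p "$dst" || return 1
    # rsync SRC/subdir DST/ creates DST/subdir; -print0/-0 handle spaces in names
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$jobs" -I{} rsync -aAHXS {} "$dst/"
    # Final pass: mostly quick, since already-copied files are skipped
    rsync -aAHXS "$src/" "$dst/"
}

# Usage: parallel_migrate "/gpfs/data/$USER" "/data/user/$USER" 8
```

The exit status of the parallel runs is not propagated; the function returns the status of the final pass, which also retries anything that failed earlier.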
### Step 2: Mirroring

Once the first migration is done, a second ``rsync`` should be run, this time with ``--delete``. With this option, ``rsync`` deletes from the destination all files that were removed from the source, and also propagates new and changed files from the source to the destination.

```bash
# Generic form: rsync -avAHXS --delete <source_merlin5>/ <destination_merlin6>/
# With "dir/" as source, --delete also covers top-level files; with "dir/*" it would not.
rsync -avAHXS --delete "/gpfs/data/$USER/" "/data/user/$USER/"
```

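Because ``--delete`` removes data on the destination, it may be worth previewing its effect first. This sketch relies on the standard rsync options ``--dry-run`` (report only, change nothing) and ``--itemize-changes`` (which marks each deletion as `*deleting`); `preview_deletions` itself is a hypothetical helper.

```bash
# Sketch: preview what the mirroring pass would delete, without changing
# anything. --dry-run only reports; --itemize-changes prefixes every
# deletion with "*deleting".
preview_deletions() {
    rsync -aAHXS --delete --dry-run --itemize-changes "${1%/}/" "${2%/}/" |
        grep '^\*deleting' || true
}

# Usage: preview_deletions "/gpfs/data/$USER" "/data/user/$USER"
```
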
### Step 3: Removing / Archiving old data

#### Removing migrated data

Once you have verified that everything has been migrated to the new storage, the data is ready to be deleted from the old storage. Users must report when the migration is finished and which directories are affected and ready to be removed.

Merlin administrators will remove the directories, always asking for a final confirmation.

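Before reporting a directory as migrated, a quick check is to list every path that exists under the Merlin5 tree but not under the Merlin6 one. This minimal sketch compares names only, not file contents (re-running the Step 2 `rsync` with ``--dry-run --checksum`` would also compare contents); `missing_in_dest` is a hypothetical helper.

```bash
# Sketch: print the paths (relative to the tree root) that exist under the
# Merlin5 tree but not under the Merlin6 one. Only names are compared.
# comm -23 keeps lines found only in the first sorted list.
missing_in_dest() {
    [ -d "$1" ] && [ -d "$2" ] || return 1
    tmp=$(mktemp -d) || return 1
    (cd "$1" && find . -mindepth 1 | LC_ALL=C sort) > "$tmp/src.list"
    (cd "$2" && find . -mindepth 1 | LC_ALL=C sort) > "$tmp/dst.list"
    LC_ALL=C comm -23 "$tmp/src.list" "$tmp/dst.list"
    rm -rf "$tmp"
}

# Usage: missing_in_dest "/gpfs/data/$USER" "/data/user/$USER"   # no output: nothing missing
```
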
#### Archiving data

Once all migrated data has been removed from the old storage, the remaining (non-migrated) data will be archived.