Update Merlin6 documentation with latest changes, added new pages
151
pages/merlin6-user-guide/migration-from-merlin5.md
Normal file
@ -0,0 +1,151 @@
---
layout: default
title: Migration From Merlin5
parent: Merlin6 User Guide
nav_order: 7
---

# Migration From Merlin5
{: .no_toc }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Merlin5 vs Merlin6

### Directories

| Cluster | Home Directory       | User Data Directory  | Group / Project Directory                 |
| ------- |:-------------------- |:-------------------- |:----------------------------------------- |
| merlin5 | /gpfs/home/$username | /gpfs/data/$username | /gpfs/group/$laboratory                   |
| merlin6 | /psi/home/$username  | /data/user/$username | /data/project/[general\|bio]/$projectname |

### USR/GRP quota limits in Merlin6

| Directory                          | Quota_Type [Soft:Hard] (Block) | Quota_Type [Soft:Hard] (Files) | Quota Change Policy: Block                      | Quota Change Policy: Files                    |
| ---------------------------------- | ------------------------------ | ------------------------------ |:----------------------------------------------- |:--------------------------------------------- |
| /psi/home/$username                | USR [10GB:11GB]                | *Undef*                        | Up to x2 when strictly justified.               | N/A                                           |
| /data/user/$username               | USR [1TB:1.074TB]              | USR [1M:1.1M]                  | Immutable. A project is needed for more space.  | Changeable when justified.                    |
| /data/project/bio/$projectname     | GRP [1TB:1.074TB]              | GRP [1M:1.1M]                  | Changeable according to project requirements.   | Changeable according to project requirements. |
| /data/project/general/$projectname | GRP [1TB:1.074TB]              | GRP [1M:1.1M]                  | Changeable according to project requirements.   | Changeable according to project requirements. |

where:
* **Block** is the storage capacity, in GB and TB.
* **Files** is the number of files plus directories, in millions (M).
* The user data directory ``/data/user`` has a strict user block quota policy. If more disk space is required, a *project* must be requested.

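To see roughly how a directory compares with these limits before migrating, something like the following can be used. It is a minimal sketch, not an official tool: `usage_report` and its `max_entries` argument are hypothetical names, the default of 1000000 is the 1M soft file limit from the table, and the numbers are only an approximation of what the storage system actually accounts.

```bash
# Print block usage and the number of files + directories below a directory.
# Returns non-zero if the entry count exceeds max_entries (assumed default: 1M).
usage_report() {
    local dir=$1 max_entries=${2:-1000000}
    local kib entries
    # du -s gives one total for the tree; -k reports it in KiB.
    kib=$(du -sk "$dir" | cut -f1)
    # find lists the directory itself plus everything below it.
    entries=$(( $(find "$dir" | wc -l) ))
    echo "$dir: ${kib} KiB used, ${entries} files+directories"
    [ "$entries" -le "$max_entries" ]
}
```

For example, `usage_report /gpfs/data/$username` reports on the Merlin5 data directory that is about to be copied.
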
### Project directory

#### Why is a *project* needed?

In Merlin5 the concept of a *project* did not exist. A similar concept (*group*) existed, but it was mostly focused on BIO experiments.

Quite often, different users work on similar projects or on the same one. In Merlin5, data was shared in different ways,
such as by allowing other users to access private data, or by keeping a duplicate in the directory of each user who needed access.
This makes storage usage inefficient and insecure.

There is a related problem: when a user leaves, a lot of data usually has to be kept, but nobody takes
responsibility for it. In addition, after several months the user is unregistered from PSI and we end up with orphaned data that must
be kept, while we sometimes lose track of its owner.

For these reasons, we want to restrict the use of individual data and favour project (shared) data. Each project has one main responsible
person; if that person leaves, responsibility passes to somebody else (a successor if one exists, the supervisor, or in the
worst case, the admin).

#### Requesting a *project*

To request a *project*, users must:

* Define a *project* directory name. This must be unique.
* Have an existing *project* **Unix Group**.
  * This can be requested through [PSI Service Now](https://psi.service-now.com/psisp)
  * The Unix group name must start with *``unx-``*
  * This Unix Group will be the default group for the *project*
* Define a main responsible person and a supervisor for the project
* Define and justify quota requirements:
  * By default, the GRP quota will be: Block Quota GRP [1TB:1.074TB] and File Quota GRP [1M:1.1M]
  * Individual USR quotas can be requested (by default they are not set).

---

## Migration Schedule

### Phase 1 [June]: Pre-migration
* Users keep working on Merlin5
  * Merlin5 production directories: ``/gpfs/home``, ``/gpfs/data``, ``/gpfs/group``
* Users may report any problems (quota limits, inaccessible files, etc.) to merlin-admins@lists.psi.ch
* Users can start migrating data (see [Migration steps](#migration-steps))
  * Users should copy their data from Merlin5 ``/gpfs/data`` to Merlin6 ``/data/user``
  * Users should copy their home from Merlin5 ``/gpfs/home`` to Merlin6 ``/psi/home``
* Users should inform the admins when the migration is done, and which directories were migrated. Deletion of those directories can then be requested from the admins.

### Phase 2 [July-October]: Migration to Merlin6
* Merlin6 becomes the official cluster, and directories are switched to the new structure:
  * Merlin6 production directories: ``/psi/home``, ``/data/user``, ``/data/project``
  * Merlin5 directories available read-only (RO): ``/gpfs/home``, ``/gpfs/data``, ``/gpfs/group``
* Users can keep migrating their data (see [Migration steps](#migration-steps))
  * ALL data must be migrated
* Jobs are submitted to Merlin6 by default. Submission to Merlin5 computing nodes remains possible.
* Users should inform the admins when the migration is done, and which directories were migrated. Deletion of those directories can then be requested from the admins.

### Phase 3 [November]: Merlin5 Decommission
* Old Merlin5 storage is unmounted.
* Migrated directories reported by users will be deleted.
* Remaining Merlin5 data will be archived.
* The Merlin5 Slurm cluster is removed from production.

---

## Migration steps

### Cleanup / Archive files
* Users must clean up and/or archive files so that they fit within the quota limits of the new storage.
  * If extra space is needed, a *project* must be requested.
  * If more files are needed, you can request an increase of the file quota.

#### File list

### Step 1: Migrating

First migration:

```bash
# General form:
#   rsync -avAHXS <source_merlin5> <destination_merlin6>
# Trailing slashes copy the directory contents, including hidden files that '*' would skip.
rsync -avAHXS /gpfs/data/$username/ /data/user/$username/
```

This can take several hours or days:
* You can try to run several rsync commands in parallel, one per sub-directory, to increase the transfer rate.
  * Please do not run too many in parallel: no more than about 10 at the same time.
  * Other users may be doing the same, and this could cause storage / UI performance problems in the Merlin5 cluster.

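The parallel copy described above could be sketched as follows. This is only an illustration under stated assumptions: `parallel_rsync` and the `RSYNC` override variable are hypothetical names, not part of the Merlin tooling, and `xargs -r` assumes GNU xargs. The function starts one rsync per top-level sub-directory and caps the concurrency at the limit of 10 mentioned above. Files that sit directly in the top-level directory are not covered and still need a separate rsync pass.

```bash
# Copy every top-level sub-directory of $src into $dst, at most $jobs at a time.
parallel_rsync() {
    local src=$1 dst=$2 jobs=${3:-4}
    # Never exceed the limit of 10 concurrent transfers.
    [ "$jobs" -le 10 ] || jobs=10
    # RSYNC is a hypothetical override: RSYNC="echo rsync" prints the
    # commands instead of running them (left unquoted so it splits in two words).
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -r -P "$jobs" -I{} ${RSYNC:-rsync} -aAHXS {} "$dst"/
}
```

A cautious first call would be `RSYNC="echo rsync" parallel_rsync /gpfs/data/$username /data/user/$username 4`, which only lists the commands that would be started.
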
### Step 2: Mirroring

Once the first migration is done, a second ``rsync`` should be run, this time with ``--delete``. With this option, ``rsync``
deletes from the destination all files that were removed from the source, and also propagates
new files from the source to the destination.

```bash
# General form:
#   rsync -avAHXS --delete <source_merlin5> <destination_merlin6>
# --delete only acts on directories synced as a whole, so pass the directory itself, not a '*' wildcard.
rsync -avAHXS --delete /gpfs/data/$username/ /data/user/$username/
```

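Because ``--delete`` removes files from the destination, it can be worth previewing the mirroring pass first. The sketch below is an assumption-laden example rather than a prescribed procedure: `mirror_preview` and the `RSYNC` override are hypothetical names. It relies on two standard rsync options, `-n` (`--dry-run`) and `-i` (`--itemize-changes`).

```bash
# Show what the mirroring pass would change, without changing anything.
mirror_preview() {
    local src=$1 dst=$2
    # -n reports the actions without performing them;
    # -i prints one line per change, deletions are marked "*deleting".
    # RSYNC is a hypothetical override used only to swap or echo the command.
    ${RSYNC:-rsync} -aAHXS --delete -n -i "$src"/ "$dst"/
}
```

Piping the output through `grep '^\*deleting'` shows only the files that the real run would remove from the destination.
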
### Step 3: Removing / Archiving old data

#### Removing migrated data

Once you have verified that everything has been migrated to the new storage, the data is ready to be deleted from the old storage.
Users must report when the migration is finished and which directories are ready to be removed.

Merlin administrators will remove the directories, always asking for a final confirmation first.

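One possible way to check that a directory really was migrated before reporting it is a recursive comparison of the old and new copies. This is a minimal sketch with a hypothetical helper name, `verify_copy`. It uses plain `diff -rq`, which compares file names and contents only (not permissions or ACLs) and reads all data on both sides, so it can be slow on large trees.

```bash
# Return 0 only if both trees contain the same files with identical content.
verify_copy() {
    local src=$1 dst=$2
    # -r descends into sub-directories, -q only reports which files differ.
    if diff -rq "$src" "$dst"; then
        echo "OK: $dst matches $src"
    else
        echo "MISMATCH: do not request deletion yet" >&2
        return 1
    fi
}
```

For example, `verify_copy /gpfs/data/$username /data/user/$username` should print the `OK` line before the directory is reported as migrated.
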
#### Archiving data

Once all migrated data has been removed from the old storage, the remaining data will be archived.