---
#tags:
keywords: meg, merlin6, merlin7, migration, fpsync, rsync
#summary: ""
sidebar: meg_sidebar
last_updated: 28 May 2025
permalink: /meg/migrating.html
---

# Meg to Merlin7 Migration Guide

Welcome to the official documentation for migrating experiment data from **MEG** to **Merlin7**. Please follow the instructions carefully to ensure a smooth and secure transition.

---

## Directory Structure Changes

### Meg vs Merlin6 vs Merlin7

| Cluster | Home Directory     | User Data Directory | Experiment data       | Additional notes |
| ------- | :----------------- | :------------------ | --------------------- | ---------------- |
| merlin6 | /psi/home/`$USER`  | /data/user/`$USER`  | /data/experiments/meg | Symlink /meg     |
| meg     | /meg/home/`$USER`  | N/A                 | /meg                  |                  |
| merlin7 | /data/user/`$USER` | /data/user/`$USER`  | /data/project/meg     |                  |

* The **Merlin6 home and user data directories have been merged** into the single new home directory `/data/user/$USER` on Merlin7.
* The same applies to the home directory on the meg cluster, which must also be merged into `/data/user/$USER` on Merlin7.
* Users are responsible for moving their own data.
* The **experiment directory has been integrated into `/data/project/meg`**.

### Recommended Cleanup Actions

* Remove unused files and datasets.
* Archive large, inactive datasets.

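As a starting point for the cleanup, the sketch below lists files that are both large and untouched for a long time. The function name `list_cleanup_candidates` and its default thresholds (365 days, 1 GiB) are illustrative assumptions, not site policy. It only prints candidates and never deletes anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list regular files under DIR that were last modified
# more than MIN_DAYS days ago and are larger than MIN_SIZE.
# Usage: list_cleanup_candidates DIR [MIN_DAYS] [MIN_SIZE]
list_cleanup_candidates() {
    local dir="$1"
    local min_days="${2:-365}"   # assumed default: one year
    local min_size="${3:-1G}"    # assumed default: 1 GiB (find(1) size syntax)
    find "$dir" -type f -mtime "+${min_days}" -size "+${min_size}" -print
}

# Example (not executed here):
#   list_cleanup_candidates "/data/user/$USER" 365 1G
```

Review the resulting list manually before removing or archiving anything.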
### Mandatory Actions

* Stop all activity on Meg and Merlin6 before performing the final sync.

## Migration Instructions

### Preparation

The `experiment_migration.setup` script must be executed from **any MeG node** using the account that will perform the migration.

#### When using the local `root` account

- The script **must be executed after every reboot** of the destination nodes.
- **Reason:** On Merlin7, the home directory of the `root` user resides on ephemeral storage (no physical disk). After a reboot this directory is wiped, so the **SSH keys must be redeployed** before running the migration again.

#### When using a PSI Active Directory (AD) account

- Applicable accounts include, for example:
  - `gac-meg2_data`
  - `gac-meg2`
- The script only needs to be executed **once**, provided that:
  - The home directory of the AD account is located on a shared storage area.
  - This shared storage is accessible from the node executing the transfer.
- **Reason:** On Merlin7, these accounts have their home directories on persistent shared storage, so the SSH keys remain available across reboots.

To run it:

```bash
experiment_migration.setup
```

This script will:

* Check that you have an account on Merlin7.
* Configure your environment and check that it is ready for transferring files via a Slurm job.

If there are issues, the script will:

* Print clear diagnostic output.
* Give you some hints to resolve the issue.

If you are stuck, email [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists.psi.ch) or [meg-admins@lists.psi.ch](mailto:meg-admins@lists.psi.ch).

### Migration Procedure

1. **Run an initial sync**, ideally within a `tmux` session.
   * This copies the bulk of the data from MeG to Merlin7.
   * **IMPORTANT: Do not modify the destination directories.**
   * Before starting the transfer, ensure that:
     * The source and destination directories are correct.
     * The destination directories exist.
2. **Run additional syncs if needed.**
   * Subsequent syncs transfer the changes made since the previous sync.
   * Ensure that **only one sync for the same directory runs at a time**.
   * Multiple syncs are often required, since the first one may take several hours or even days.
3. Schedule a date for the final migration:
   * All activity on the source directory must be stopped.
   * Likewise, there must be no activity on the destination until the migration is complete.
4. **Perform a final sync with the `-E` option** (if it applies).
   * Use `-E` **only if you need to delete files on the destination that were removed from the source.**
   * This ensures the destination becomes an exact mirror of the source.
   * **Never use `-E` after the destination has gone into production**, as it will delete new data created there.
5. Disable access on the source folder.
6. Enable access on the destination folder.
   * At this point, **no further syncs should be performed.**

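The sync phases of this procedure can be sketched as a small dry-run helper that only prints the wrapper command for each phase. The helper `migration_cmd` and its phase names `sync` and `mirror` are hypothetical. Only the flags `-t`, `-p` and `-E` come from the wrapper's help text.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints (does not run) the wrapper command.
# Usage: migration_cmd PROJECT [sync|mirror] [THREADS]
#   sync   - initial or incremental sync (steps 1 and 2)
#   mirror - final sync with the destructive -E option (step 4, only if needed)
migration_cmd() {
    local project="$1"
    local phase="${2:-sync}"
    local threads="${3:-10}"     # wrapper default; 12 is the recommended maximum
    local cmd="experiment_migration.bash -t $threads -p $project"
    case "$phase" in
        sync)   ;;
        mirror) cmd="$cmd -E" ;;   # deletes destination files missing at the source
        *)      echo "unknown phase: $phase" >&2; return 1 ;;
    esac
    printf '%s\n' "$cmd"
}

# Typical sequence, started inside a tmux session (e.g. tmux new-session -s meg-migration):
migration_cmd online sync 12     # steps 1 and 2, repeat as needed
migration_cmd online mirror 12   # step 4, after all activity has stopped
```

Printing the command first makes it easy to double-check the project name and the presence of `-E` before anything is transferred.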
> ⚠️ **Important Notes**
>
> * The `-E` option is destructive; handle with care.
> * Always verify that the destination is ready before triggering the final sync.
> * For optimal performance, use up to 12 threads with the `-t` option.

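The rule that only one sync per directory may run at a time can be enforced mechanically. The sketch below uses `flock(1)` from util-linux. The wrapper function `run_exclusive` and the lock directory are assumptions for illustration and are not part of the migration script.

```bash
#!/usr/bin/env bash
# Hypothetical guard: run a command only if no other sync holds the lock
# for the same project. Lock location is an assumption; override via LOCK_DIR.
LOCK_DIR="${LOCK_DIR:-/tmp/meg-migration-locks}"

# Usage: run_exclusive PROJECT COMMAND [ARGS...]
run_exclusive() {
    local project="$1"
    shift
    mkdir -p "$LOCK_DIR" || return 1
    # One lock file per project; "/" in subdirectory names becomes "_".
    local lock_file
    lock_file="$LOCK_DIR/$(printf '%s' "$project" | tr '/' '_').lock"
    # -n: fail immediately instead of waiting when the lock is already held.
    # -E 75: exit code returned in that case.
    flock -n -E 75 "$lock_file" "$@"
}

# Example (not executed here):
#   run_exclusive online experiment_migration.bash -t 12 -p online
```

Note that this locks each project name separately, so it does not detect a parent directory and one of its subdirectories being synced at the same time. That case still has to be avoided manually.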
#### Running The Migration Script

The migration script is installed on the `meg-s-001` server at:
`/usr/local/bin/experiment_migration.bash`

This script is primarily a **wrapper** around `fpsync`, providing additional logic for synchronizing MeG experiment data.

```bash
[root@meg-s-001 ~]# experiment_migration.bash --help
Usage: /usr/local/bin/experiment_migration.bash [options] -p <project_name>

Options:
  -t | --threads N                   Number of parallel threads (default: 10). Recommended 12 as max.
  -b | --experiment-src-basedir DIR  Experiment base directory (default: /meg)
  -S | --space-source SPACE          Source project space name (default: data1)
  -B | --experiment-dst-basedir DIR  Experiment base directory (default: /data/project/meg)
  -D | --space-destination SPACE     Destination project space name (default: data1)
  -p | --project-name PRJ_NAME       Mantadory field. MeG project name. Examples:
                                       - 'online'
                                       - 'offline'
                                       - 'shared'
  -F | --force-destination-mkdir     Create the destination parent directory (default: false)
                                     Example: mkdir -p $(dirname /data/project/meg/data1/PROJECT_NAME)
                                     Result: mkdir -p /data/project/meg/data1
  -s | --split N                     Number of files per split (default: 20000)
  -f | --filesize SIZE               File size threshold (default: 100G)
  -r | --runid ID                    Reuse an existing runid session
  -l | --list-runids                 List available runid sessions and exit
  -x | --delete-runid                Delete runid. Requires: -r | --runid ID
  -E | --rsync-delete-option         [WARNING] Use this to delete files in the destination
                                     which are not present in the source any more.
                                     [WARNING] USE THIS OPTION CAREFULLY!
                                     Typically used in last rsync to have an exact
                                     mirror of the source directory.
                                     [WARNING] Some files in destination might be deleted!
                                     Use 'man fpsync' for more information.

  -h | --help                        Show this help message
  -v | --verbose                     Run fpsync with -v option
```

> Defaults can be updated if necessary.

#### Migration examples

##### Example: Migrating the Entire `online` Directory

The following example demonstrates how to migrate the **entire `online`** directory.

{{site.data.alerts.tip}}
You may also choose to migrate only specific subdirectories if needed.
However, migrating full directories is generally <b>simpler</b> and <b>less error-prone</b> compared to handling multiple subdirectory migrations.
{{site.data.alerts.end}}

```bash
[root@meg-s-001 ~]# experiment_migration.bash -S data1 -D data1 -p "online"
🔄 Transferring project:
From: /meg/data1/online
To: login001.merlin7.psi.ch:/data/project/meg/data1/online
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:

Please confirm to start (y/N):
❌ Transfer cancelled by user.
```

##### Example: Migrating a Specific Subdirectory

The following example demonstrates how to migrate **only a subdirectory**. In this case, the `-F` option is used to create the parent directory on the destination, ensuring that it exists before the transfer starts.

⚠️ **Important:**
- When migrating a subdirectory, **do not** run concurrent migrations on its parent directories.
  - For example, avoid running migrations with `-p "shared"` while simultaneously migrating `-p "shared/subprojects"`.

```bash
[root@meg-s-001 ~]# experiment_migration.bash -p "shared/subprojects/meg1" -F
🔄 Transferring project:
From: /meg/data1/shared/subprojects/meg1
To: login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:

Please confirm to start (y/N): N
❌ Transfer cancelled by user.
```

This command migrates the directory in two steps:

* Because of the `-F` option, it first creates the destination parent directory:

  ```bash
  ssh login002.merlin7.psi.ch mkdir -p /data/project/meg/data1/shared/subprojects
  ```

* It then runs `fpsync` with 10 threads, splitting the transfer into N parts of at most 20000 files or 100G each:
  * Source: `/meg/data1/shared/subprojects/meg1`
  * Destination: `login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1`
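
The way the source, destination and parent directory are derived in the examples can be reproduced with a short sketch. The function `resolve_migration_paths` is hypothetical and its logic is inferred from the example output and the `--help` defaults, not taken from the script's source.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of the path layout shown in the examples.
# Base directories default to the values listed in --help (-b and -B).
SRC_BASE="${SRC_BASE:-/meg}"
DST_BASE="${DST_BASE:-/data/project/meg}"

# Usage: resolve_migration_paths PROJECT [SRC_SPACE] [DST_SPACE]
resolve_migration_paths() {
    local project="$1"
    local src_space="${2:-data1}"   # -S default
    local dst_space="${3:-data1}"   # -D default
    local dst="$DST_BASE/$dst_space/$project"
    printf 'source=%s\n' "$SRC_BASE/$src_space/$project"
    printf 'destination=%s\n' "$dst"
    # With -F, the parent of the destination is what gets created.
    printf 'destination_parent=%s\n' "$(dirname "$dst")"
}

resolve_migration_paths "shared/subprojects/meg1"
```

Checking these three paths before confirming the transfer helps catch a mistyped project name or space.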