| keywords | sidebar | last_updated | permalink |
|---|---|---|---|
| meg, merlin6, merlin7, migration, fpsync, rsync | meg_sidebar | 28 May 2025 | /meg/migrating.html |
Meg to Merlin7 Migration Guide
Welcome to the official documentation for migrating experiment data from MEG to Merlin7. Please follow the instructions carefully to ensure a smooth and secure transition.
Directory Structure Changes
Meg vs Merlin6 vs Merlin7
| Cluster | Home Directory | User Data Directory | Experiment data | Additional notes |
|---|---|---|---|---|
| merlin6 | /psi/home/$USER | /data/user/$USER | /data/experiments/meg | Symlink /meg |
| meg | /meg/home/$USER | N/A | /meg | |
| merlin7 | /data/user/$USER | /data/user/$USER | /data/project/meg | |
- The Merlin6 home and user data directories have been merged into the single new home directory /data/user/$USER on Merlin7.
  - The same applies to the home directory in the meg cluster, which has to be merged into /data/user/$USER on Merlin7.
  - Users are responsible for moving the data.
- The experiment directory has been integrated into /data/project/meg.
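As a quick reference, the table above can be expressed as a path-mapping helper. This is a minimal sketch: `merlin7_path` is a hypothetical function, not part of the migration tooling, and it only rewrites path prefixes. It does not copy data or handle file-name collisions when two home directories are merged.

```bash
#!/usr/bin/env bash
# Map a legacy Merlin6/MeG path to its Merlin7 location, following the table above.
# merlin7_path is a hypothetical helper used for illustration only.
merlin7_path() {
    local old="$1"
    case "$old" in
        /psi/home/*)             printf '/data/user/%s\n' "${old#/psi/home/}" ;;
        /meg/home/*)             printf '/data/user/%s\n' "${old#/meg/home/}" ;;
        /data/experiments/meg/*) printf '/data/project/meg/%s\n' "${old#/data/experiments/meg/}" ;;
        /meg/*)                  printf '/data/project/meg/%s\n' "${old#/meg/}" ;;
        *)                       printf '%s\n' "$old" ;;  # unknown or already valid: unchanged
    esac
}

merlin7_path /psi/home/alice/.bashrc
merlin7_path /meg/data1/online
```

The /meg/home/ pattern is listed before /meg/ so that MeG home directories are not mistaken for experiment data.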
Recommended Cleanup Actions
- Remove unused files and datasets.
- Archive large, inactive data sets.
Mandatory Actions
- Stop activity on Meg and Merlin6 when performing the last rsync.
Migration Instructions
Preparation
The experiment_migration.setup script must be executed from any MeG node using the account that will perform the migration.
When using the local root account
- The script must be executed after every reboot of the destination nodes.
- Reason: On Merlin7, the home directory for the root user resides on ephemeral storage (no physical disk). After a reboot, this directory is cleaned, so SSH keys need to be redeployed before running the migration again.
When using a PSI Active Directory (AD) account
- Applicable accounts include, for example:
  - gac-meg2_data
  - gac-meg2
- The script only needs to be executed once, provided that:
  - The home directory for the AD account is located on a shared storage area.
  - This shared storage is accessible from the node executing the transfer.
- Reason: On Merlin7, these accounts have their home directories on persistent shared storage, so the SSH keys remain available across reboots.
To run it:
experiment_migration.setup
This script will:
- Check that you have an account on Merlin7.
- Configure your environment and check that it is ready for transferring files via a Slurm job.
If there are issues, the script will:
- Print clear diagnostic output
- Give you some hints to resolve the issue
If you are stuck, email: merlin-admins@lists.psi.ch/meg-admins@lists.psi.ch
Migration Procedure
- Run an initial sync, ideally within a tmux session
  - This copies the bulk of the data from MeG to Merlin7.
  - IMPORTANT: Do not modify the destination directories.
  - Before starting the transfer, ensure that:
    - The source and destination directories are correct.
    - The destination directories exist.
- Run additional syncs if needed
  - Subsequent syncs can be executed to transfer changes.
  - Ensure that only one sync for the same directory runs at a time.
  - Multiple syncs are often required since the first one may take several hours or even days.
- Schedule a date for the final migration:
  - Any activity must be stopped on the source directory.
  - Likewise, no activity is allowed on the destination until the migration is complete.
- Perform a final sync with the -E option (if it applies)
  - Use -E only if you need to delete files on the destination that were removed from the source.
  - This ensures the destination becomes an exact mirror of the source.
  - Never use -E after the destination has gone into production, as it will delete new data created there.
- Disable access on the source folder.
- Enable access on the destination folder.
  - At this point, no new syncs have to be performed.
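The requirement that only one sync per directory runs at a time can be enforced with a simple lock. The following is a minimal sketch, not a feature of the migration script: `with_project_lock` and `LOCK_ROOT` are hypothetical names, and the lock is a directory created with mkdir, which fails atomically if it already exists.

```bash
#!/usr/bin/env bash
# Run a command only if no other sync holds the lock for the same project.
# with_project_lock and LOCK_ROOT are hypothetical names used for illustration.
LOCK_ROOT="${LOCK_ROOT:-${TMPDIR:-/tmp}/meg-migration-locks}"

with_project_lock() {
    local project="$1"; shift
    # Replace slashes so that "shared/subprojects" becomes "shared_subprojects".
    local lock="$LOCK_ROOT/${project//\//_}.lock"
    mkdir -p "$LOCK_ROOT"
    # mkdir is atomic: it fails if the lock directory already exists.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "A sync for '$project' appears to be running (lock: $lock)" >&2
        return 1
    fi
    local status=0
    "$@" || status=$?
    rmdir "$lock"
    return "$status"
}
```

A call such as `with_project_lock online experiment_migration.bash -p online` would then refuse to start while another locked sync of the same project is in progress. The lock is only local to the node where it is created, and a killed process leaves a stale lock that has to be removed by hand.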
⚠️ Important Notes:
- The -E option is destructive; handle with care.
- Always verify that the destination is ready before triggering the final sync.
- For optimal performance, use up to 12 threads with the -t option.
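The phases of the procedure differ only in whether -E is passed. The sketch below prints the command for each phase without executing it. `print_sync_command` is a hypothetical helper; the -t, -p and -E flags are the ones listed in the script's --help output.

```bash
#!/usr/bin/env bash
# Print (do not execute) the sync command for a given phase of the procedure.
# print_sync_command is a hypothetical helper; flags come from the script's --help.
print_sync_command() {
    local phase="$1" project="$2" threads="${3:-10}"
    local cmd=(experiment_migration.bash -t "$threads" -p "$project")
    case "$phase" in
        initial|incremental) ;;                 # plain sync, nothing is deleted
        final) cmd+=(-E) ;;                     # destructive: mirrors deletions to the destination
        *) echo "unknown phase: $phase" >&2; return 2 ;;
    esac
    printf '%s\n' "${cmd[*]}"
}

print_sync_command initial online 12
print_sync_command final online 12
```

The printed commands are meant to be reviewed and then run by hand inside a tmux session (for example one started with `tmux new -s meg-online`), so that a dropped SSH connection does not interrupt a long transfer.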
Running The Migration Script
The migration script is installed on the meg-s-001 server at:
/usr/local/bin/experiment_migration.bash
This script is primarily a wrapper around fpsync, providing additional logic for synchronizing MeG experiment data.
[root@meg-s-001 ~]# experiment_migration.bash --help
Usage: /usr/local/bin/experiment_migration.bash [options] -p <project_name>
Options:
-t | --threads N Number of parallel threads (default: 10). Recommended 12 as max.
-b | --experiment-src-basedir DIR Experiment base directory (default: /meg)
-S | --space-source SPACE Source project space name (default: data1)
-B | --experiment-dst-basedir DIR Experiment base directory (default: /data/project/meg)
-D | --space-destination SPACE Destination project space name (default: data1)
-p | --project-name PRJ_NAME Mantadory field. MeG project name. Examples:
- 'online'
- 'offline'
- 'shared'
-F | --force-destination-mkdir Create the destination parent directory (default: false)
Example: mkdir -p $(dirname /data/project/meg/data1/PROJECT_NAME)
Result: mkdir -p /data/project/meg/data1
-s | --split N Number of files per split (default: 20000)
-f | --filesize SIZE File size threshold (default: 100G)
-r | --runid ID Reuse an existing runid session
-l | --list-runids List available runid sessions and exit
-x | --delete-runid Delete runid. Requires: -r | --runid ID
-E | --rsync-delete-option [WARNING] Use this to delete files in the destination
which are not present in the source any more.
[WARNING] USE THIS OPTION CAREFULLY!
Typically used in last rsync to have an exact
mirror of the source directory.
[WARNING] Some files in destination might be deleted!
Use 'man fpsync' for more information.
-h | --help Show this help message
-v | --verbose Run fpsync with -v option
Defaults can be updated if necessary.
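To make the relationship between the options concrete, the sketch below shows how the source path, destination path and the -F parent directory appear to be derived from the defaults in the help output. This is inferred from the help text and the examples on this page, not taken from the script itself, and `resolve_paths` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Derive source, destination and -F parent directory from a project name.
# resolve_paths is a hypothetical helper; defaults mirror the --help output.
resolve_paths() {
    local project="$1"
    local src_basedir="/meg"                 # -b default
    local space_src="data1"                  # -S default
    local dst_basedir="/data/project/meg"    # -B default
    local space_dst="data1"                  # -D default

    local src="$src_basedir/$space_src/$project"
    local dst="$dst_basedir/$space_dst/$project"

    echo "src=$src"
    echo "dst=$dst"
    # With -F, the parent of the destination is what gets created (mkdir -p).
    echo "dst_parent=$(dirname "$dst")"
}

resolve_paths shared/subprojects/meg1
```

For a top-level project such as online, the parent directory is simply /data/project/meg/data1, which matches the example given for -F in the help text.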
Migration examples
Example: Migrating the Entire online Directory
The following example demonstrates how to migrate the entire online directory.
{{site.data.alerts.tip}} You may also choose to migrate only specific subdirectories if needed. However, migrating full directories is generally simpler and less error-prone compared to handling multiple subdirectory migrations. {{site.data.alerts.end}}
[root@meg-s-001 ~]# experiment_migration.bash -S data1 -D data1 -p "online"
🔄 Transferring project:
From: /meg/data1/online
To: login001.merlin7.psi.ch:/data/project/meg/data1/online
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:
Please confirm to start (y/N):
❌ Transfer cancelled by user.
Example: Migrating a Specific Subdirectory
The following example demonstrates how to migrate only a subdirectory. In this case, we use the option -F to create the parent directory in the destination, to ensure that this exists before transferring:
⚠️ Important:
- When migrating a subdirectory, do not run concurrent migrations on its parent directories.
  - For example, avoid running migrations with -p "shared" while simultaneously migrating -p "shared/subprojects".
[root@meg-s-001 ~]# experiment_migration.bash -p "shared/subprojects/meg1" -F
🔄 Transferring project:
From: /meg/data1/shared/subprojects/meg1
To: login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:
Please confirm to start (y/N): N
❌ Transfer cancelled by user.
This command starts the migration of the directory and, because of the -F option, first creates the destination parent directory:
- Creates the destination parent directory as follows:
  ssh login002.merlin.psi.ch mkdir -p /data/project/meg/data1/shared/subprojects
- Runs fpsync with 10 threads and N parts of at most 20000 files or 100G each:
  - Source: /meg/data1/shared/subprojects/meg1
  - Destination: login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1
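The warning above about concurrent migrations of a directory and its parent can be checked before starting a second transfer. The following is a minimal sketch: `projects_overlap` is a hypothetical helper, not part of experiment_migration.bash, and it compares project names as given to -p.

```bash
#!/usr/bin/env bash
# Return 0 if two project names overlap (same path, or one contains the other).
# projects_overlap is a hypothetical helper used for illustration only.
projects_overlap() {
    # Normalise to a single trailing slash so "shared" does not match "shared2".
    local a="${1%/}/" b="${2%/}/"
    [[ "$a" == "$b"* || "$b" == "$a"* ]]
}

if projects_overlap "shared" "shared/subprojects"; then
    echo "overlap: do not run these migrations concurrently"
fi
```

Sibling projects such as online and offline do not overlap and can be migrated in parallel, subject to the thread limits recommended earlier.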