MeG to Merlin7 Migration Guide
Welcome to the official documentation for migrating experiment data from MeG to Merlin7. Please follow the instructions carefully to ensure a smooth and secure transition.
Directory Structure Changes
Meg vs Merlin6 vs Merlin7
| Cluster | Home Directory | User Data Directory | Experiment Data | Additional Notes |
|---|---|---|---|---|
| merlin6 | `/psi/home/$USER` | `/data/user/$USER` | `/data/experiments/meg` | Symlink `/meg` |
| meg | `/meg/home/$USER` | N/A | `/meg` | |
| merlin7 | `/data/user/$USER` | `/data/user/$USER` | `/data/project/meg` | |
- The Merlin6 home and user data directories have been merged into the single new home directory `/data/user/$USER` on Merlin7.
    - The same applies to the home directory in the meg cluster, which has to be merged into `/data/user/$USER` on Merlin7.
        - Users are responsible for moving the data.
- The experiment directory has been integrated into `/data/project/meg`.
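Because users move their own home data, one way to merge an old MeG home into the new Merlin7 home without overwriting files of the same name is to copy it into a dedicated subdirectory. The sketch below only prints the `rsync` command so it can be reviewed first. The function name, the `from_meg` subdirectory, and the use of `login001.merlin7.psi.ch` as the transfer host are assumptions made for illustration, not site policy.

```bash
# Print (not run) an rsync command that copies a MeG home directory into a
# subdirectory of the Merlin7 home. Host and subdirectory name are assumptions.
print_home_merge_cmd() {
    local user="$1"
    local host="${2:-login001.merlin7.psi.ch}"   # assumed transfer host
    local subdir="${3:-from_meg}"                # hypothetical target subdirectory
    # -a keeps permissions and timestamps, -H keeps hard links.
    # The trailing slash on the source copies its contents, not the directory itself.
    printf 'rsync -aH %s %s\n' \
        "/meg/home/${user}/" \
        "${host}:/data/user/${user}/${subdir}/"
}
```

Calling `print_home_merge_cmd "$USER"` prints the command for the current account. Run the printed command only after checking both paths.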
Recommended Cleanup Actions
- Remove unused files and datasets.
- Archive large, inactive data sets.
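To find candidates for cleanup or archiving, a search for files that are both large and long untouched can help. This is a minimal sketch using GNU `find`. The function name and the default thresholds (1G, 365 days) are illustrative assumptions.

```bash
# List files that are both large and unmodified for a long time.
# Thresholds are illustrative defaults, not site policy.
list_archive_candidates() {
    local dir="$1"
    local min_size="${2:-1G}"    # find(1) size with suffix, e.g. 100M or 1G
    local min_days="${3:-365}"   # minimum age in days since last modification
    find "$dir" -type f -size "+${min_size}" -mtime "+${min_days}" -print
}
```

For example, `list_archive_candidates /meg/data1/online 10G 730` would list files larger than 10G that have not been modified for more than two years. The function only lists files and never deletes anything.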
Mandatory Actions
- Stop all activity on MeG and Merlin6 before performing the final rsync.
Migration Instructions
Preparation
The experiment_migration.setup script must be executed from any MeG node using the account that will perform the migration.
When using the local root account
- The script must be executed after every reboot of the destination nodes.
- Reason: On Merlin7, the home directory for the `root` user resides on ephemeral storage (no physical disk). After a reboot, this directory is cleaned, so SSH keys need to be redeployed before running the migration again.
When using a PSI Active Directory (AD) account
- Applicable accounts include, for example:
    - `gac-meg2_data`
    - `gac-meg2`
- The script only needs to be executed once, provided that:
- The home directory for the AD account is located on a shared storage area.
- This shared storage is accessible from the node executing the transfer.
- Reason: On Merlin7, these accounts have their home directories on persistent shared storage, so the SSH keys remain available across reboots.
To run it:
experiment_migration.setup
This script will:
- Check that you have an account on Merlin7.
- Configure your environment and check that it is ready for transferring files via a Slurm job.
If there are issues, the script will:
- Print clear diagnostic output
- Give you some hints to resolve the issue
If you are stuck, email merlin-admins@lists.psi.ch or meg-admins@lists.psi.ch.
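Since the setup depends on SSH keys being available on the destination, a quick manual check that key-based login works can be useful, for example after a reboot of the destination nodes. The sketch below is not part of the setup script and does not replace it. The function name and the default host are assumptions, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```bash
# Return 0 if a non-interactive (key-based) SSH login to the given host works.
check_passwordless_ssh() {
    local host="${1:-login001.merlin7.psi.ch}"   # assumed default host
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout avoids hanging on an unreachable node.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "OK: key-based login to ${host} works"
    else
        echo "FAIL: cannot log in to ${host} without a password" >&2
        return 1
    fi
}
```

If the check fails, rerunning experiment_migration.setup is the first thing to try.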
Migration Procedure
- Run an initial sync, ideally within a `tmux` session.
    - This copies the bulk of the data from MeG to Merlin7.
    - IMPORTANT: Do not modify the destination directories.
    - Before starting the transfer, ensure that:
        - The source and destination directories are correct.
        - The destination directories exist.
- Run additional syncs if needed.
    - Subsequent syncs can be executed to transfer changes.
    - Ensure that only one sync for the same directory runs at a time.
    - Multiple syncs are often required since the first one may take several hours or even days.
- Schedule a date for the final migration:
    - All activity on the source directory must be stopped.
    - Likewise, there must be no activity on the destination until the migration is complete.
- Perform a final sync with the `-E` option (if it applies).
    - Use `-E` only if you need to delete files on the destination that were removed from the source.
    - This ensures the destination becomes an exact mirror of the source.
    - Never use `-E` after the destination has gone into production, as it will delete new data created there.
- Disable access on the source folder.
- Enable access on the destination folder.
    - At this point, no further syncs should be performed.
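The rule that only one sync per directory may run at a time can be enforced with a simple lock. The following is a minimal sketch, not a feature of the migration script. The function name and the lock location under `${TMPDIR:-/tmp}` are assumptions. It relies on `mkdir` being atomic, so only one caller can create the lock directory.

```bash
# Run a command only if no other sync holds the lock for the same project.
run_exclusive() {
    local project="$1"; shift
    # Hypothetical lock location; slashes in the project name are flattened.
    local lockdir="${TMPDIR:-/tmp}/meg_sync_$(printf '%s' "$project" | tr '/' '_').lock"
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "A sync for '${project}' is already running (lock: ${lockdir})" >&2
        return 1
    fi
    local rc=0
    "$@" || rc=$?          # run the wrapped command and keep its exit status
    rmdir "$lockdir"       # release the lock
    return "$rc"
}
```

A possible use is `run_exclusive online experiment_migration.bash -p "online" -t 12`. The lock is local to one node, is left behind if the process is killed, and does not detect a concurrent sync of a parent or child directory, so it complements rather than replaces manual coordination.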
!!! note "Important"
The `-E` option is destructive; handle with care.
Always verify that the destination is ready before triggering the final sync.
For optimal performance, use up to 12 threads with the -t option.
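Before a destructive final sync, it can help to preview which destination files would be removed. The sketch below uses a plain `rsync` dry run and is only an approximation: it assumes that `-E` behaves like rsync's `--delete`, which the help text suggests but this guide does not spell out. The function name is hypothetical.

```bash
# Show which destination files a mirroring sync would delete, without changing anything.
preview_deletions() {
    local src="$1" dst="$2"
    # -n: dry run; -i: itemize changes; --delete: report files missing from the source.
    # rsync marks such files with a leading "*deleting".
    rsync -a -n -i --delete "${src%/}/" "${dst%/}/" | grep '^\*deleting' || true
}
```

For a remote destination, pass something like `login001.merlin7.psi.ch:/data/project/meg/data1/online` as the second argument. On large trees the dry run still walks every file, so it may take a long time.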
Running The Migration Script
The migration script is installed on the meg-s-001 server at:
/usr/local/bin/experiment_migration.bash
This script is primarily a wrapper around fpsync, providing additional logic for synchronizing MeG experiment data.
[root@meg-s-001 ~]# experiment_migration.bash --help
Usage: /usr/local/bin/experiment_migration.bash [options] -p <project_name>
Options:
-t | --threads N Number of parallel threads (default: 10). Recommended 12 as max.
-b | --experiment-src-basedir DIR Experiment base directory (default: /meg)
-S | --space-source SPACE Source project space name (default: data1)
-B | --experiment-dst-basedir DIR Experiment base directory (default: /data/project/meg)
-D | --space-destination SPACE Destination project space name (default: data1)
-p | --project-name PRJ_NAME Mantadory field. MeG project name. Examples:
- 'online'
- 'offline'
- 'shared'
-F | --force-destination-mkdir Create the destination parent directory (default: false)
Example: mkdir -p $(dirname /data/project/meg/data1/PROJECT_NAME)
Result: mkdir -p /data/project/meg/data1
-s | --split N Number of files per split (default: 20000)
-f | --filesize SIZE File size threshold (default: 100G)
-r | --runid ID Reuse an existing runid session
-l | --list-runids List available runid sessions and exit
-x | --delete-runid Delete runid. Requires: -r | --runid ID
-E | --rsync-delete-option [WARNING] Use this to delete files in the destination
which are not present in the source any more.
[WARNING] USE THIS OPTION CAREFULLY!
Typically used in last rsync to have an exact
mirror of the source directory.
[WARNING] Some files in destination might be deleted!
Use 'man fpsync' for more information.
-h | --help Show this help message
-v | --verbose Run fpsync with -v option
!!! tip
Defaults can be updated if necessary.
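To show how the options combine across the migration phases, the sketch below prints the wrapper command for an initial or a final sync. It uses only options listed in the help output (`-p`, `-t`, `-E`). The function name and the "initial"/"final" phase labels are assumptions made for illustration.

```bash
# Print the wrapper command for a given phase ("initial" or "final").
build_migration_cmd() {
    local project="$1" phase="${2:-initial}" threads="${3:-10}"
    # The help text recommends 12 threads as the maximum, so cap the value there.
    if [ "$threads" -gt 12 ]; then threads=12; fi
    local cmd="experiment_migration.bash -p \"${project}\" -t ${threads}"
    # -E deletes destination files missing from the source: final sync only.
    if [ "$phase" = "final" ]; then cmd="${cmd} -E"; fi
    printf '%s\n' "$cmd"
}
```

Printing the command instead of running it leaves room to review it, which matters most for the final phase where `-E` is added.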
Migration examples
Example: Migrating the Entire online Directory
The following example demonstrates how to migrate the entire online directory.
!!! tip
You may also choose to migrate only specific subdirectories if needed.
However, migrating full directories is generally **simpler** and **less
error-prone** compared to handling multiple subdirectory migrations.
[root@meg-s-001 ~]# experiment_migration.bash -S data1 -D data1 -p "online"
🔄 Transferring project:
From: /meg/data1/online
To: login001.merlin7.psi.ch:/data/project/meg/data1/online
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:
Please confirm to start (y/N):
❌ Transfer cancelled by user.
Example: Migrating a Specific Subdirectory
The following example demonstrates how to migrate only a subdirectory. Here the -F option creates the parent directory on the destination, so that it exists before the transfer starts:
⚠️ Important:
- When migrating a subdirectory, do not run concurrent migrations on its parent directories.
- For example, avoid running migrations with `-p "shared"` while simultaneously migrating `-p "shared/subprojects"`.
[root@meg-s-001 ~]# experiment_migration.bash -p "shared/subprojects/meg1" -F
🔄 Transferring project:
From: /meg/data1/shared/subprojects/meg1
To: login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1
Threads: 10 | Split: 20000 files | Max size: 100G
RunID:
Please confirm to start (y/N): N
❌ Transfer cancelled by user.
This command initiates the migration of the directory and, because of the -F option, first creates the destination parent directory:
- Creates the destination directory as follows: `ssh login002.merlin.psi.ch mkdir -p /data/project/meg/data1/shared/subprojects`
- Runs FPSYNC with 10 threads and N parts of at most 20000 files or 100G each:
    - Source: `/meg/data1/shared/subprojects/meg1`
    - Destination: `login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1`
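After a sync finishes, a rough sanity check is to compare the number of regular files on both sides. This is a sketch under stated assumptions and not part of the migration script. The function names are hypothetical, and the remote count assumes key-based SSH access to the destination host.

```bash
# Count regular files below a directory, locally or on a remote host via ssh.
count_files() {
    local dir="$1" host="${2:-}"
    if [ -n "$host" ]; then
        ssh "$host" "find '$dir' -type f | wc -l" | tr -d ' '
    else
        find "$dir" -type f | wc -l | tr -d ' '
    fi
}

# Compare source and destination counts; return non-zero on mismatch.
compare_counts() {
    local src_count="$1" dst_count="$2"
    if [ "$src_count" -eq "$dst_count" ]; then
        echo "match: ${src_count} files"
    else
        echo "MISMATCH: source=${src_count} destination=${dst_count}" >&2
        return 1
    fi
}
```

For the example above, this could be called as `compare_counts "$(count_files /meg/data1/shared/subprojects/meg1)" "$(count_files /data/project/meg/data1/shared/subprojects/meg1 login002.merlin7.psi.ch)"`. Equal counts do not prove that the contents are identical, so treat a match as a quick plausibility check only.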