From e85ef5c7768f03555ad8ec08b33507f8476b48eb Mon Sep 17 00:00:00 2001
From: caubet_m
Date: Tue, 5 Aug 2025 17:02:09 +0200
Subject: [PATCH] Add MeG migration instructions

---
 _data/sidebars/meg_sidebar.yml               |  17 +
 .../meg/99-support/migration-from-merlin6.md | 328 ------------------
 pages/meg/99-support/migration-to-merlin7.md | 172 +++++++++
 3 files changed, 189 insertions(+), 328 deletions(-)
 create mode 100644 _data/sidebars/meg_sidebar.yml
 delete mode 100644 pages/meg/99-support/migration-from-merlin6.md
 create mode 100644 pages/meg/99-support/migration-to-merlin7.md

diff --git a/_data/sidebars/meg_sidebar.yml b/_data/sidebars/meg_sidebar.yml
new file mode 100644
index 0000000..2b54992
--- /dev/null
+++ b/_data/sidebars/meg_sidebar.yml
@@ -0,0 +1,17 @@
+# Follow the pattern here for the URLs -- no slash at the beginning, and include the .html. The link here is rendered exactly as is in the Markdown references.
+
+entries:
+  - product: MEG
+    version:
+    folders:
+      - title: Quick Start Guide
+        folderitems:
+          - title: Introduction
+            url: /meg/introduction.html
+      - title: Support
+        folderitems:
+          - title: MeG Migration Guide
+            url: /meg/migrating.html
+          - title: Contact
+            url: /meg/contact.html
+
diff --git a/pages/meg/99-support/migration-from-merlin6.md b/pages/meg/99-support/migration-from-merlin6.md
deleted file mode 100644
index fa51cd8..0000000
--- a/pages/meg/99-support/migration-from-merlin6.md
+++ /dev/null
@@ -1,328 +0,0 @@
----
-#tags:
-keywords: meg, merlin6, merlin7, migration, fpsync, rsync
-#summary: ""
-sidebar: meg_sidebar
-last_updated: 28 May 2025
-permalink: /meg/migrating.html
----
-
-# Merlin6 to Merlin7 Migration Guide
-
-Welcome to the official documentation for migrating your data from **Merlin6** to **Merlin7**. Please follow the instructions carefully to ensure a smooth and secure transition.
- -## ๐Ÿ“… Migration Schedule - -### Phase 1: Users without Projects โ€” **Deadline: July 11** - -If you **do not belong to any Merlin project**, i.e for - -* Users not in any group project (`/data/projects/general`) -* Users not in BIO, MEG, Mu3e -* Users not part of PSI-owned private Merlin nodes (ASA, MEG, Mu3e) - -You must complete your migration **before July 11**. You just need to migrate your personal */data/user/$USER* and */home/psi/$USER* directories. - -Users are responsible for initiating and completing the migration process as lined out below. -Contact the Merlin support team [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists.psi.ch) if you need help. - -> โš ๏ธ In this phase, **it's important that you don't belong to any project**. -> Once the migration is finished, **access to Merlin6 will be no longer possible.** - -Please refer to the [Phase 1: Step-by-Step Migration Instructions](/merlin7/migrating.html#phase-1-step-by-step-migration-instructions) section -for detailed information about user data migration. - -### Phase 2: Project Members and Owners โ€” **Start Before August 1** - -For users in active projects: - -* Project **owners and members will be contacted by the Merlin admins**. -* Migration will be **scheduled individually per project**. -* Expect contact **before August 1**. - -> โš ๏ธ In this phase, **data and home directories of group owners and members will be also requested to be migrated in parallel.** - -Please refer to the [Phase 2: Migration Instructions](/merlin7/migrating.html#phase-2-migration-instructions) section -for further information. 
-
----
-
-## Directory Structure Changes
-
-### Merlin6 vs Merlin7
-
-| Cluster | Home Directory     | User Data Directory | Projects       | Experiments       |
-| ------- | :----------------- | :------------------ | -------------- | ----------------- |
-| merlin6 | /psi/home/`$USER`  | /data/user/`$USER`  | /data/project/ | /data/experiments |
-| merlin7 | /data/user/`$USER` | /data/user/`$USER`  | /data/project/ | /data/project/    |
-
-* The **home directory and user data directory have been merged** into the single new home directory`/data/user/$USER`.
-* The **experiments directory has been integrated into `/data/project/`**:
-
-  * `/data/project/general` contains general Merlin7 projects.
-  * Other subdirectories are used for large-scale projects such as CLS division, Mu3e, and MeG.
-
----
-
-## 📋 Prerequisites and Preparation
-
-Before starting the migration, make sure you:
-
-* are **registered on Merlin7**.
-
-  * If not yet registered, please do so following [these instructions](../merlin7/request-account.html)
-
-* **have cleaned up your data to reduce migration time and space usage**.
-* **For the user data migration**, ensure your total usage on Merlin6 (`/psi/home`+`/data/user`) is **well below the 1 TB quota** (use the `merlin_quotas` command). Remember:
-
-  * **Merlin7 also has a 1 TB quota on your home directory**, and you might already have data there.
-  * If your usage exceeds this during the transfer, the process might fail.
-* No activity should be running / performed on Merlin6 when the transfer process is ongoing.
-
-### Recommended Cleanup Actions
-
-* Remove unused files and datasets.
-* Archive large, inactive data sets.
-* Delete or clean up unused `conda` or `virtualenv` Python environments:
-
-  * These are often large and may not work as-is on Merlin7.
-  * You can export your conda environment description to a file with:
-
-    ```bash
-    conda env export -n myenv > $HOME/myenv.yml
-    ```
-  * Then recreate them later on Merlin7 from these files.
- -> ๐Ÿงน For the **user data**, you can always remove more old data **after** migration โ€” it will be copied into `~/merlin6data` and `~/merlin6home` on Merlin7. - ---- - -## Phase 1: Step-by-Step Migration Instructions - -### Step 1: Run `merlin7_migration.setup` - -Log into any **Merlin6 login node** (`merlin-l-001.psi.ch`, `merlin-l-002.psi.ch`, `merlin-l-01.psi.ch`) and run: - -```bash -merlin7_migration.setup -``` - -This script will: - -* Check that you have an account on Merlin7. -* Configure and check that your environment is ready for transferring files via Slurm job. -* **Create two directories:** - - * `~/merlin6data` โ†’ copy of your old /data/user - * `~/merlin6home` โ†’ copy of your old home - -> โš ๏ธ **Important:** If `~/merlin6home` or `~/merlin6data` already exist on Merlin7, the script will exit. -> **Please remove them or contact support**. - -If there are issues, the script will: - -* Print clear diagnostic output -* Give you some hints to resolve the issue - -If you are stuck, email: [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists.psi.ch) - ---- - -### Step 2: Run `merlin7_migration.start` - -After setup completes, start the migration by running: - -```bash -merlin7_migration.start -``` - -This script will: - -* Check the status of your quota on Merlin6. -* Submit **SLURM batch jobs** to the **`xfer`** partition -* Queue two jobs: - - * `migrate_merlin6data.batch` (data dir) - * `migrate_merlin6home.batch` (home dir) - * This job will only start if `migrate_merlin6data.batch` has successfully - finished. -* Automatically track the job IDs -* Print log file locations for the different jobs - -> โš ๏ธ **Once both transfers succeed, your access to Merlin6 will be revoked.** -> Do **not** attempt to reconnect to Merlin6 after this. - -#### โ— If Something Goes Wrong - -If a problem occurs during the migration process: - -* ๐Ÿ” **Check the job log files** mentioned in the script output. 
-  They contain detailed messages that explain what failed and why.
-* 🛠️ **Fix the root cause** on the source system. Common issues include:
-
-  * Files with incorrect permissions
-  * Ownership mismatches
-  * Disk quota exceeded on Merlin7
-* 📚 Refer to the [⚠️ Common rsync/fpsync Migration Issues](/merlin7/migrating.html#%EF%B8%8F--common-rsyncfpsync-migration-issues) section below for detailed explanations and solutions.
-
-> ℹ️ **Important:** If `migrate_merlin6data.batch` fails, the migration process will automatically cancel `migrate_merlin6home.batch` to avoid ending in an inconsistent state.
-
-Once the problem is resolved, simply re-run the `merlin7_migration.start` script to resume the migration.
-
----
-
-### Step 3: Monitor Transfer Jobs
-
-To monitor your transfer jobs, run:
-
-```bash
-squeue -M merlin6 -u $USER -p xfer
-```
-
-Check the output to ensure your jobs are:
-
-* Running (`R`) or completed (`CG` or removed from queue)
-* Not failed (`F`, `TO`, or stuck)
-
-You can also check logs (as printed by the script) to verify job completion.
-
-> ✅ When `/data/user/$USER` and `/psi/home/$USER` on Merlin6 are no longer accessible, migration is complete.
-
----
-
-### Examples
-
-#### Setup the Migration
-
-```bash
-merlin7_migration.setup
-```
-
-*Expected output:*
-
-```bash
-✅ login002.merlin7.psi.ch
-✅ `$USER` is a member of svc-cluster_merlin7
-✅ Skipping key generation
-✅ SSH key already added to agent.
-✅ SSH ID successfully copied to login00[1|2].merlin7.psi.ch.
-✅ Test successful.
-✅ /data/software/xfer_logs/caubet_m created.
-✅ ~/merlin6data directory created.
-✅ ~/merlin6home directory created.
-```
-
-#### Start the Migration
-
-```bash
-merlin7_migration.start
-```
-
-*Expected output:*
-
-```bash
-(base) ❄ [caubet_m@merlin-l-001:/data/software/admin/scripts/merlin-user-tools/alps(master)]# ./merlin7_migration.start
-✅ Quota check passed.
-Used: 512 GB, 234001 files
-
-###################################################
-Submitting transfer jobs to Slurm
-
-Job logs can be found here:
-➡️ Directory '/data/user/caubet_m' does NOT have 000 permissions. Transfer pending, continuing...
-✅ Submitted DATA_MIGRATION job: 24688554. Sleeping 3 seconds...
-  - /data/user transfer logs:
-    - /data/software/xfer_logs/caubet_m/data-24688554.out
-    - /data/software/xfer_logs/caubet_m/data-24688554.err
-➡️ Directory '/psi/home/caubet_m' does NOT have 000 permissions. Transfer pending, continuing...
-✅ Submitted HOME_MIGRATION job with dependency on 24688554: 24688555. Sleeping 3 seconds...
-  - /psi/home transfer logs:
-    - /data/software/xfer_logs/caubet_m/home-24688555.out
-    - /data/software/xfer_logs/caubet_m/home-24688555.err
-
-✅ You can start manually a monitoring window with:
-  tmux new-session -d -s "xfersession" "watch 'squeue -M merlin6 -u caubet_m -p xfer'"
-  tmux attach -t "xfersession"
-
-✅ FINISHED - PLEASE CHECK JOB TRANSFER PROGRESS
-```
-
-#### Monitor Progress
-
-```bash
-squeue -M merlin6 -u $USER -p xfer
-```
-
-*Output:*
-
-```bash
-$ squeue -M merlin6 -u $USER -p xfer
-CLUSTER: merlin6
-             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
-          24688581      xfer HOME_MIG caubet_m PD       0:00      1 (Dependency)
-          24688580      xfer DATA_MIG caubet_m  R       0:22      1 merlin-c-017
-```
-
----
-
-## Phase 2: Migration Instructions
-
-Please refer to the [Prerequisites and Preparation](/merlin7/migrating.html#-prerequisites-and-preparation) section for initial setup steps.
-Further instructions will be sent via email once the owning team is contacted by the Merlin administrators.
-
----
-
-## ⚠️ Common `rsync`/`fpsync` Migration Issues
-
-### File Permission Denied
-
-* **Cause**: Files or directories are not readable by the user running the transfer.
-* **Solution**: Fix source-side permissions:
-
-  ```bash
-  chmod -R u+rX /path/to/file_or_dir
-  ```
-
-### Ownership Mismatches
-
-* **Cause**: Source files are owned by another user (e.g. root or a collaborator).
-* **Solution**:
-
-  * Change ownership before migration:
-
-    ```bash
-    chown -R $USER /path/to/file
-    ```
-
-### Special Files (e.g. device files, sockets)
-
-* **Cause**: `rsync` tries to copy UNIX sockets, device files, or FIFOs.
-* **Effect**: Errors or incomplete copies.
-* **Solution**: Avoid transferring such files entirely (by deleting them).
-
-### Exceeded Disk Quota
-
-* **Cause**: Combined size of existing + incoming data exceeds 1 TB quota on Merlin7.
-* **Effect**: Transfer stops abruptly.
-* **Solution**: Clean up or archive non-essential data before migration.
-
-### Very Small Files or Large Trees → Many Small rsync Calls
-
-* **Cause**: Directory with thousands/millions of small files.
-* **Effect**: Transfer is slow or hits process limits.
-* **Solution**: Consider archiving to `.tar.gz` before transferring:
-
-  ```bash
-  tar -czf myenv.tar.gz myenv/
-  ```
-
----
-
-## Need Help?
-
-If something doesn't work:
-
-* Re-run the scripts and check the logs carefully.
-* Use `less`, `cat`, or `tail -f` to view your job logs.
-* Contact the Merlin support team: 📧 [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists.psi.ch)
-
-> We are here to help you migrate safely and efficiently.
diff --git a/pages/meg/99-support/migration-to-merlin7.md b/pages/meg/99-support/migration-to-merlin7.md
new file mode 100644
index 0000000..edb22fb
--- /dev/null
+++ b/pages/meg/99-support/migration-to-merlin7.md
@@ -0,0 +1,172 @@
+---
+#tags:
+keywords: meg, merlin6, merlin7, migration, fpsync, rsync
+#summary: ""
+sidebar: meg_sidebar
+last_updated: 28 May 2025
+permalink: /meg/migrating.html
+---
+
+# MeG to Merlin7 Migration Guide
+
+Welcome to the official documentation for migrating experiment data from **MeG** to **Merlin7**.
+Please follow the instructions carefully to ensure a smooth and secure transition.
+
+---
+
+## Directory Structure Changes
+
+### MeG vs Merlin6 vs Merlin7
+
+| Cluster | Home Directory     | User Data Directory | Experiment Data       | Additional Notes |
+| ------- | :----------------- | :------------------ | --------------------- | ---------------- |
+| merlin6 | /psi/home/`$USER`  | /data/user/`$USER`  | /data/experiments/meg | Symlink `/meg`   |
+| meg     | /meg/home/`$USER`  | N/A                 | /meg                  |                  |
+| merlin7 | /data/user/`$USER` | /data/user/`$USER`  | /data/project/meg     |                  |
+
+* The **Merlin6 home and user data directories have been merged** into the single new home directory `/data/user/$USER` on Merlin7.
+  * The same applies to the home directory on the MeG cluster: its contents have to be merged into `/data/user/$USER` on Merlin7.
+  * Users are responsible for moving this data themselves.
+* The **experiment directory has been integrated into `/data/project/meg`**.
+
+### Recommended Cleanup Actions
+
+* Remove unused files and datasets.
+* Archive large, inactive data sets.
+
+### Mandatory Actions
+
+* Stop all activity on MeG and Merlin6 before running the final sync.
+
+## Migration Instructions
+
+### Preparation
+
+The `experiment_migration.setup` script must be executed from **any MeG node** using the account that will perform the migration.
+
+#### When using the local `root` account
+- The script **must be executed after every reboot** of the destination nodes.
+- **Reason:** On Merlin7, the home directory of the `root` user resides on ephemeral storage (no physical disk).
+  After a reboot, this directory is cleared, so **SSH keys need to be redeployed** before running the migration again.
+
+#### When using a PSI Active Directory (AD) account
+- This applies, for example, to the following accounts:
+  - `gac-meg2_data`
+  - `gac-meg2`
+- The script only needs to be executed **once**, provided that:
+  - The home directory of the AD account is located on a shared storage area.
+  - This shared storage is accessible from the node executing the transfer.
+- **Reason:** On Merlin7, these accounts have their home directories on persistent shared storage, so the SSH keys remain available across reboots.
+
+To run it:
+
+```bash
+experiment_migration.setup
+```
+
+This script will:
+
+* Check that you have an account on Merlin7.
+* Configure and check that your environment, including SSH key access to Merlin7, is ready for transferring files.
+
+If there are issues, the script will:
+
+* Print clear diagnostic output
+* Give you some hints to resolve the issue
+
+If you are stuck, email [merlin-admins@lists.psi.ch](mailto:merlin-admins@lists.psi.ch) or [meg-admins@lists.psi.ch](mailto:meg-admins@lists.psi.ch).
+
+### Migration Procedure
+
+1. **Run an initial sync**
+   * This copies the bulk of the data from MeG to Merlin7.
+   * **IMPORTANT: Do not modify the destination directories until the migration is complete.**
+   * Before starting the transfer, ensure that:
+     * The source and destination directories are correct.
+     * The destination directories exist.
+2. **Run additional syncs if needed**
+   * Subsequent syncs transfer only the changes made since the previous run.
+   * Ensure that **only one sync for the same directory runs at a time**.
+   * Multiple syncs are often required, since the first one may take several hours or even days.
+3. **Schedule a date for the final migration**
+   * All activity on the source directory must be stopped.
+   * Likewise, no activity may take place on the destination until the migration is complete.
+4. **Perform a final sync with the `-E` option** (if applicable)
+   * Use `-E` **only if you need to delete files on the destination that were removed from the source.**
+   * This makes the destination an exact mirror of the source.
+   * **Never use `-E` after the destination has gone into production**, as it will delete new data created there.
+5. **Disable access to the source folder.**
+6. **Enable access to the destination folder.**
+   * From this point on, **do not run any further syncs.**
+
+> ⚠️ **Important Notes**
+> - The `-E` option is destructive; use it with care.
+> - Always verify that the destination is ready before triggering the final sync.
+> - For optimal performance, use up to 12 threads with the `-t` option.
+
+#### Running the Migration Script
+
+The migration script is installed on the `meg-s-001` server at:
+`/usr/local/bin/experiment_migration.bash`
+
+This script is primarily a **wrapper** around `fpsync`, adding logic specific to synchronizing MeG experiment data.
+
+```bash
+[root@meg-s-001 ~]# experiment_migration.bash --help
+Usage: ./experiment_migration.bash [options] -p
+
+Options:
+  -t | --threads N                  Number of parallel threads (default: 10). Recommended 12 as max.
+  -b | --experiment-src-basedir DIR Experiment base directory (default: /data/experiment/meg)
+  -B | --experiment-dst-basedir DIR Experiment base directory (default: /data/experiment/meg)
+  -S | --space-source SPACE         Source project space name (default: data1)
+  -D | --space-destination SPACE    Destination project space name (default: data1)
+  -p | --project-name PRJ_NAME      Mantadory field. MeG project name. Examples:
+                                      - 'online'
+                                      - 'offline'
+                                      - 'shared'
+  -s | --split N                    Number of files per split (default: 20000)
+  -f | --filesize SIZE              File size threshold (default: 100G)
+  -r | --runid ID                   Reuse an existing runid session
+  -l | --list-runids                List available runid sessions and exit
+  -x | --delete-runid               Delete runid. Requires: -r | --runid ID
+  -E | --rsync-delete-option        [WARNING] Use this to delete files in the destination
+                                    which are not present in the source any more.
+                                    [WARNING] USE THIS OPTION CAREFULLY!
+                                    Typically used in last rsync to have an exact
+                                    mirror of the source directory.
+                                    [WARNING] Some files in destination might be deleted!
+                                    Use 'man fpsync' for more information.
+
+  -h | --help                       Show this help message
+  -v | --verbose                    Run fpsync with -v option
+```
+
+> Default values can be overridden with the corresponding options if necessary.
+
+#### Migration Examples
+
+The following example migrates the entire `online` project:
+
+```bash
+[root@meg-s-001 bin]# experiment_migration.bash -b /meg -B /data/project/meg -S data1 -D data1 -p "online"
+🔄 Transferring project:
+   From: /meg/data1/online
+   To:   login002.merlin7.psi.ch:/data/project/meg/data1/online
+   Threads: 10 | Split: 20000 files | Max size: 100G
+   RunID:
+
+Please confirm to start (y/N): N
+❌ Transfer cancelled by user.
+```
+
+The following example migrates a subdirectory:
+
+```bash
+[root@meg-s-001 bin]# experiment_migration.bash -b /meg -B /data/project/meg -S data1 -D data1 -p "shared/subprojects/meg1"
+🔄 Transferring project:
+   From: /meg/data1/shared/subprojects/meg1
+   To:   login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1
+   Threads: 10 | Split: 20000 files | Max size: 100G
+   RunID:
+
+Please confirm to start (y/N): N
+```
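The migration procedure above consists of repeatable syncs followed by one final mirror sync with `-E`. It can be wrapped in a small helper so the same options are reused on every run. The following is a hedged sketch, not part of the MeG tooling. The `sync_project` function, the `MIGRATE_CMD` variable and the `initial`/`update`/`final` phase names are hypothetical. `online` and `data1` are example values taken from this guide, and the sketch uses only flags listed in the `--help` output. By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper around experiment_migration.bash (sketch, not official tooling).
# Assumptions: experiment_migration.setup has already been run with this account,
# and SRC_BASE/DST_BASE/SPACE match the layout described in the migration guide.
set -euo pipefail

# Dry run by default: the command is printed instead of executed.
# Set MIGRATE_CMD=experiment_migration.bash to run it for real.
MIGRATE_CMD="${MIGRATE_CMD:-echo experiment_migration.bash}"

SRC_BASE="/meg"               # experiment data on MeG
DST_BASE="/data/project/meg"  # experiment data on Merlin7
SPACE="data1"                 # same space name on source and destination
THREADS=12                    # recommended maximum number of threads

# sync_project PHASE PROJECT
#   initial | update : regular sync, may be repeated (one at a time per directory)
#   final            : adds -E so the destination becomes an exact mirror;
#                      only after all activity has stopped on both sides
sync_project() {
  local phase="$1" project="$2"
  local -a args=(-t "$THREADS" -b "$SRC_BASE" -B "$DST_BASE"
                 -S "$SPACE" -D "$SPACE" -p "$project")
  case "$phase" in
    initial|update) ;;
    final) args+=(-E) ;;
    *) echo "unknown phase: $phase (expected initial, update or final)" >&2
       return 1 ;;
  esac
  # Unquoted on purpose so "echo experiment_migration.bash" splits into two words.
  $MIGRATE_CMD "${args[@]}"
}

# Run only when PHASE and PROJECT are given on the command line.
if [[ $# -ge 2 ]]; then
  sync_project "$1" "$2"
fi
```

For example, `bash meg_sync.sh initial online` only prints the command for the first bulk copy. `MIGRATE_CMD=experiment_migration.bash bash meg_sync.sh update online` would run a follow-up sync, and the wrapped script still asks for confirmation before starting. The file name `meg_sync.sh` is illustrative. `-E` is kept behind an explicit `final` phase because that option deletes data on the destination.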