# AareLC

AareLC is an annotation and training helper for YOLO-based loop centering workflows.

Main capabilities:

- interactive GUI for image review and annotation
- saving annotations for detection and segmentation
- syncing image/label data from DB to local folders
- generating a headless dataset preparation script (`train`/`val` split)
- generating a matching training launcher script for Ultralytics YOLO

## Quick Start (5 commands)

```bash
git clone <repo-url>/AareLC.git
cd AareLC
python3 -m venv .venv && source .venv/bin/activate
pip install -U pip && pip install -r requirements.txt
python3 ml_gui.py
```

On sls-gpu-003, `uv` is recommended for efficient package management; activate the existing environment with `source yolo/bin/activate`. After `uv pip install`, install the `tensorrt-cu13` libraries from the wheel files downloaded from NVIDIA (https://pypi.nvidia.com/) and saved in the tmp folder:

```bash
uv pip install --no-deps tensorrt-cu13
uv pip install --no-deps tensorrt-cu13-bindings
```

## Contributors

- J. Dawn Duan
- Filip Leonarski
- Martin Appleby
- Guillaume Gotthard
- Claude (Anthropic) — AI pair programmer

## Repository Structure

```
AareLC/
├── src/
│   ├── clip/            CLIP-related scripts
│   ├── core/            inference, DB, processing clients
│   ├── gui/             reusable GUI panels
│   ├── server/          server-related scripts
│   ├── inference/       inference-related scripts
│   ├── training/        training-related scripts
│   └── tools/           data/annotation utilities
├── scripts/             production runnable entry points
├── experiments/         exploratory, non-production
├── config/              YAML/config files
├── models/
│   ├── registry/        versioned trained models with metadata
│   ├── base/            pretrained upstream weights
│   └── archived/        old/retired versions
├── data/
│   ├── datasets/        curated dataset images (source of truth)
│   ├── splits/          txt files recording split info
│   └── training_runs/   per-run metrics/plots (gitignored)
├── tests/               unit tests (pytest)
├── notebooks/
├── environment.yml
├── pyproject.toml
├── requirements.txt
└── README.md
```

## Model Registry

The
`models/registry/` directory is the canonical place for trained model versions.

### Registry structure

Each model version lives in its own directory:

```
models/registry/<model_id>/
├── weights.pt          ← primary PyTorch checkpoint
├── exports/            ← derived export formats
│   ├── model.onnx
│   ├── model.engine    ← TensorRT (hardware-specific, gitignored)
│   └── pruned_weights.pt
└── metadata.json       ← version info and metrics
```

### Naming convention

`<family>-<task>-<variant>-<YYYY-MM-DD>`

Examples:

- `yolo26n-seg-overlap-false-2026-04-12` — yolo26n, segmentation, overlap_mask=False, trained 2026-04-12
- `yolo26n-seg-multiscale-2026-03-16` — yolo26n, segmentation, multi-scale training

### metadata.json fields

| Field | Required | Description |
|---|---|---|
| `model_id` | yes | Same as directory name |
| `family` | yes | Base architecture (e.g. `yolo26n`) |
| `task` | yes | `segmentation`, `detection`, etc. |
| `training_date` | yes | ISO date of training run |
| `export_formats` | yes | List of available formats |
| `metrics` | yes | mAP, precision, recall, etc. |
| `dataset_version` | no | Dataset used for training |
| `training_config` | no | Hyperparameters and settings |
| `git_commit` | no | Commit hash at training time |
| `framework_version` | no | Ultralytics/YOLO version |
| `notes` | no | Free-form notes |
| `status` | no | `active`, `deprecated`, or `archived` |

### Workflow

1. Train a model.
2. Create a new registry folder using the naming convention.
3. Save the main checkpoint as `weights.pt`.
4. Export derived formats into `exports/`.
5. Write `metadata.json`.
6. If production-ready, update `models/active/production.json` to point to this version.
7. Move outdated versions to `models/archived/` when no longer needed.

### Rules

- Do not overwrite an existing registry version — add a new one instead.
- Treat each registry folder as immutable once finalized.
- `models/active/production.json` is the source of truth for model selection.
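As an illustration of workflow step 5, a minimal `metadata.json` could be written like this. This is a sketch, not project code; the model ID and all metric values are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical registry entry following the naming convention above.
model_id = "yolo26n-seg-multiscale-2026-03-16"
entry = Path("models/registry") / model_id

metadata = {
    # required fields
    "model_id": model_id,              # must match the directory name
    "family": "yolo26n",
    "task": "segmentation",
    "training_date": "2026-03-16",
    "export_formats": ["pt", "onnx"],
    "metrics": {"mAP50": 0.91, "precision": 0.88, "recall": 0.85},
    # optional fields
    "status": "active",
    "notes": "multi-scale training",
}

entry.mkdir(parents=True, exist_ok=True)
(entry / "metadata.json").write_text(json.dumps(metadata, indent=2))
```

Keeping `model_id` identical to the folder name lets tools resolve a version from either the path or the file alone.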
### AI-friendly summary

- `models/registry/` = all versioned trained models; each subfolder = one release
- `metadata.json` = structured description of the model
- `models/active/` = current production pointer
- `models/archived/` = retired models
- Do not guess the correct model version; read the registry metadata instead

## Getting Started

### 1. Clone and enter project

```bash
git clone <repo-url>/AareLC.git
cd AareLC
```

### 2. Create environment

Python 3.11+ is recommended.

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

### 3. Optional environment variable

If DB access requires the shared password:

```bash
export AAREDB_SHARED_PASSWORD="<password>"
```

### 4. Start GUI

```bash
python3 ml_gui.py
```

## GUI Main Functions

### File / Review

- Load local image
- Fetch next image from DB
- Open review folder and navigate (`N`/`P` or arrow keys)
- Save edited review annotations

### Annotation

- Predict current image
- Edit detections/polygons with select/brush/SAM tools
- Save for training (stores local sets + DB annotation)

### Sync

- Retrieve missing local images/labels from DB
- Generate YOLO segmentation preparation script (headless workflow)

## Prepare Dataset for Training (Recommended Workflow)

Use this when training happens on a machine without a GUI.

### Step A: Generate prep script from GUI machine

In the GUI:

1. `Sync` -> `Generate YOLO Segmentation Prep Script...`
2. Select the source folder containing images/labels
3. Choose the destination dataset folder
4. Confirm split ratio (default `0.85`) and random seed (default `42`)
5. Choose input model:
   - local model file from `models/` (`.pt`/`.yaml`)
   - or Ultralytics model name (`yolo11n-seg.pt`, etc.)
6.
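The split performed by the generated prep script (step 4) can be sketched as a seeded shuffle with the default `0.85` ratio. This is an illustrative sketch of the behavior, not the actual script's internals; the file stems are made up.

```python
import random

def split_train_val(stems, ratio=0.85, seed=42):
    """Deterministically assign image stems to train/val.

    Same input files + same seed always yields the same assignment.
    """
    items = sorted(stems)       # sort first so input order doesn't matter
    rng = random.Random(seed)   # local RNG; leaves global random state untouched
    rng.shuffle(items)
    n_train = int(len(items) * ratio)
    return items[:n_train], items[n_train:]

# Hypothetical stems; the real script collects them from the source folder.
train, val = split_train_val([f"img_{i:03d}" for i in range(20)])
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions is what makes the split reproducible regardless of what else the program does with randomness.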
   Save the generated script (commit to Gitea if needed)

### Step B: Run prep script on training machine

```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py
```

Optional preview:

```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py --dry-run
```

The script creates:

- `<dest>/train/images`
- `<dest>/train/labels`
- `<dest>/val/images`
- `<dest>/val/labels`
- `<dest>/dataset.yaml`
- `<dest>/train_yolo_seg.py`
- `<dest>/logs/dataset_preparation_*.json`
- `<dest>/logs/dataset_preparation_*.txt`

> **Note:** In the future, training dataset folders will use links (symlinks or hardlinks) pointing to the raw images in `data/datasets/` instead of copying files. This avoids duplicating large image data on disk.

The log includes:

- preparation date/time
- source/destination paths
- train/val counts
- class names and class counts
- full file mapping list (source -> destination)
- missing labels and orphan labels
- selected model spec

## Train Model

Rules of thumb for data preparation:

- When splitting, create links in the dataset folder that point to the images in `data/datasets/` (don't duplicate files)
- When training is done, save the split into `data/splits/` by recording only the file names and paths

After dataset prep:

```bash
python3 <dest>/train_yolo_seg.py
```

Notes:

- if `--model` points to a local file, it is used directly
- if `--model` is an Ultralytics name, it is downloaded at runtime (internet required)

## Reproducibility Notes

- split reproducibility is controlled by `--seed`
- same input files + same seed => same train/val assignment
- changing the seed only changes the distribution, not file integrity

## Testing

Unit tests live in `tests/` and cover pure-logic modules that have no GUI or hardware dependency.
| Test file | Module under test |
|---|---|
| `tests/test_target_point.py` | `src/server/target_point.py` — target-point priority logic |
| `tests/test_generate_splits.py` | `src/tools/generate_splits.py` — split-file generation |
| `tests/test_image_utils.py` | `src/server/image_utils.py` — image decode helpers |

### Run tests locally

```bash
# install test dependencies (once)
pip install pytest pytest-cov

# run all unit tests
pytest

# with coverage report
pytest --cov=src --cov-report=term-missing
```

### CI/CD

A Gitea Actions workflow at `.gitea/workflows/ci.yml` runs the test suite automatically on every push and pull request to `master`/`main`.

## Troubleshooting

- `dataset.yaml not found`: run the prep script first, or pass `--data /path/to/dataset.yaml`
- no pairs found: check that image names match label names (`.txt`)
- missing classes in logs: verify label files contain valid class IDs in the first column
- model download fails: use a local `.pt` model from `models/`