# AareLC
AareLC is an annotation and training helper for YOLO-based loop centering workflows.
Main capabilities:
- interactive GUI for image review and annotation
- saving annotations for detection and segmentation
- syncing image/label data from DB to local folders
- generating a headless dataset preparation script (train/val split)
- generating a matching training launcher script for Ultralytics YOLO
## Quick Start (5 commands)

```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
python3 -m venv .venv && source .venv/bin/activate
pip install -U pip && pip install -r requirements.txt
python3 ml_gui.py
```

On sls-gpu-003, `uv` is recommended for efficient package management; activate the existing environment with `source yolo/bin/activate` instead of creating a new venv. After the `uv pip install` step, install `tensorrt-cu13-lib` from the wheel file that was downloaded from NVIDIA (https://pypi.nvidia.com/) and saved in the tmp folder, then run:

```bash
uv pip install --no-deps tensorrt-cu13
uv pip install --no-deps tensorrt-cu13-bindings
```
## Contributors
- J. Dawn Duan
- Filip Leonarski
- Martin Appleby
- Guillaume Gotthard
- Claude (Anthropic) — AI pair programmer
## Repository Structure

```
AareLC/
├── src/
│   ├── clip/            CLIP-related scripts
│   ├── core/            inference, DB, processing clients
│   ├── gui/             reusable GUI panels
│   ├── server/          server-related scripts
│   ├── inference/       inference-related scripts
│   ├── training/        training-related scripts
│   └── tools/           data/annotation utilities
├── scripts/             production runnable entry points
├── experiments/         exploratory, non-production
├── config/              YAML/config files
├── models/
│   ├── registry/        versioned trained models with metadata
│   ├── base/            pretrained upstream weights
│   └── archived/        old/retired versions
├── data/
│   ├── datasets/        curated dataset images (source of truth)
│   ├── splits/          txt files recording split info
│   └── training_runs/   per-run metrics/plots (gitignored)
├── tests/               unit tests (pytest)
├── notebooks/
├── environment.yml
├── pyproject.toml
├── requirements.txt
└── README.md
```
## Model Registry

The models/registry/ directory is the canonical place for trained model versions.

### Registry structure

Each model version lives in its own directory:

```
models/registry/<model-id>/
├── weights.pt        ← primary PyTorch checkpoint
├── exports/          ← derived export formats
│   ├── model.onnx
│   ├── model.engine  ← TensorRT (hardware-specific, gitignored)
│   └── pruned_weights.pt
└── metadata.json     ← version info and metrics
```
### Naming convention

```
<architecture>-<task>-<variant>-<YYYY-MM-DD>
```

Examples:

- `yolo26n-seg-overlap-false-2026-04-12` — yolo26n, segmentation, overlap_mask=False, trained 2026-04-12
- `yolo26n-seg-multiscale-2026-03-16` — yolo26n, segmentation, multi-scale training
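The naming convention above can be parsed mechanically, which is handy in scripts that enumerate the registry. The following is a minimal sketch; `parse_model_id` is a hypothetical helper name, not a function from the codebase:

```python
from datetime import date


def parse_model_id(model_id: str) -> dict:
    """Split a registry model-id of the form
    <architecture>-<task>-<variant>-<YYYY-MM-DD> into its parts.

    The variant may itself contain hyphens, so the date is peeled off
    the end (last three hyphen-separated fields) and the architecture
    and task off the front; everything in between is the variant.
    """
    parts = model_id.split("-")
    if len(parts) < 6:
        raise ValueError(f"not a valid model-id: {model_id!r}")
    trained = date(int(parts[-3]), int(parts[-2]), int(parts[-1]))
    return {
        "family": parts[0],
        "task": parts[1],
        "variant": "-".join(parts[2:-3]),
        "training_date": trained.isoformat(),
    }
```

For example, `parse_model_id("yolo26n-seg-overlap-false-2026-04-12")` yields family `yolo26n`, task `seg`, variant `overlap-false`, and training date `2026-04-12`.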
### metadata.json fields

| Field | Required | Description |
|---|---|---|
| `model_id` | yes | Same as directory name |
| `family` | yes | Base architecture (e.g. yolo26n) |
| `task` | yes | segmentation, detection, etc. |
| `training_date` | yes | ISO date of training run |
| `export_formats` | yes | List of available formats |
| `metrics` | yes | mAP, precision, recall, etc. |
| `dataset_version` | no | Dataset used for training |
| `training_config` | no | Hyperparameters and settings |
| `git_commit` | no | Commit hash at training time |
| `framework_version` | no | Ultralytics/YOLO version |
| `notes` | no | Free-form notes |
| `status` | no | active, deprecated, or archived |
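A minimal example of what a `metadata.json` might look like with the required fields filled in (all values below are placeholders, not from a real training run):

```json
{
  "model_id": "yolo26n-seg-overlap-false-2026-04-12",
  "family": "yolo26n",
  "task": "segmentation",
  "training_date": "2026-04-12",
  "export_formats": ["pt", "onnx"],
  "metrics": {"mAP50": 0.0, "precision": 0.0, "recall": 0.0},
  "status": "active"
}
```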
### Workflow

1. Train a model.
2. Create a new registry folder using the naming convention.
3. Save the main checkpoint as `weights.pt`.
4. Export derived formats into `exports/`.
5. Write `metadata.json`.
6. If production-ready, update `models/active/production.json` to point to this version.
7. Move outdated versions to `models/archived/` when no longer needed.

### Rules

- Do not overwrite an existing registry version — add a new one instead.
- Treat each registry folder as immutable once finalized.
- `models/active/production.json` is the source of truth for model selection.
### AI-friendly summary

- `models/registry/` = all versioned trained models; each subfolder = one release
- `metadata.json` = structured description of the model
- `models/active/` = current production pointer
- `models/archived/` = retired models
- Do not guess the correct model version; read the registry metadata instead
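Since `models/active/production.json` is the source of truth for model selection, consumers should resolve the production checkpoint by reading it rather than hard-coding a version. A minimal sketch, assuming the pointer file stores the registry model-id under a `"model_id"` key (a hypothetical schema; check the actual file):

```python
import json
from pathlib import Path


def resolve_production_weights(repo_root: str = ".") -> Path:
    """Read models/active/production.json and return the path to the
    production checkpoint inside models/registry/.

    Assumes the pointer file looks like {"model_id": "<model-id>"};
    adjust the key if the real schema differs.
    """
    root = Path(repo_root)
    pointer = json.loads(
        (root / "models" / "active" / "production.json").read_text()
    )
    return root / "models" / "registry" / pointer["model_id"] / "weights.pt"
```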
## Getting Started

### 1. Clone and enter the project

```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
```

### 2. Create an environment

Python 3.11+ is recommended.

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

### 3. Optional environment variable

If DB access requires the shared password:

```bash
export AAREDB_SHARED_PASSWORD="<your_password>"
```

### 4. Start the GUI

```bash
python3 ml_gui.py
```
## GUI Main Functions

### File / Review

- Load a local image
- Fetch the next image from the DB
- Open a review folder and navigate (`N`/`P` or arrow keys)
- Save edited review annotations

### Annotation

- Predict the current image
- Edit detections/polygons with the select/brush/SAM tools
- Save for training (stores local sets + DB annotation)

### Sync

- Retrieve missing local images/labels from the DB
- Generate the YOLO segmentation preparation script (headless workflow)
## Prepare Dataset For Training (Recommended Workflow)

Use this when training happens on a machine without a GUI.

### Step A: Generate the prep script on the GUI machine

In the GUI:

1. Sync -> Generate YOLO Segmentation Prep Script...
2. Select the source folder containing images/labels
3. Choose the destination dataset folder
4. Confirm the split ratio (default `0.85`) and random seed (default `42`)
5. Choose the input model:
   - a local model file from `models/` (`.pt`/`.yaml`)
   - or an Ultralytics model name (`yolo11n-seg.pt`, etc.)
6. Save the generated script (commit to Gitea if needed)

### Step B: Run the prep script on the training machine

```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py
```

Optional preview:

```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py --dry-run
```
The script creates:

- `<destination>/train/images`
- `<destination>/train/labels`
- `<destination>/val/images`
- `<destination>/val/labels`
- `<destination>/dataset.yaml`
- `<destination>/train_yolo_seg.py`
- `<destination>/logs/dataset_preparation_*.json`
- `<destination>/logs/dataset_preparation_*.txt`

Note: In the future, training dataset folders will use links (symlinks or hardlinks) pointing to the raw images in `data/datasets/` instead of copying files. This avoids duplicating large image data on disk.
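For orientation, the generated `dataset.yaml` follows the standard Ultralytics data-YAML layout. The sketch below is illustrative only; the real file is written by the prep script, and the class list (a single `loop` class is assumed here) comes from the labels it finds:

```yaml
# Illustrative dataset.yaml (Ultralytics data format); paths and
# class names are placeholders, not output of the actual prep script.
path: /path/to/destination   # dataset root
train: train/images
val: val/images
names:
  0: loop
```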
Log includes:
- preparation date/time
- source/destination paths
- train/val counts
- class names and class counts
- full file mapping list (source -> destination)
- missing labels and orphan labels
- selected model spec
## Train Model

Rules of thumb for data preparation:

- When splitting, always create links in the split folders that refer to the images in `data/datasets/` (don't duplicate)
- When done training, save the split into `data/splits/` by recording only the names and paths
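The linking rule above can be sketched with `pathlib` symlinks. This is a minimal illustration of the idea, not the prep script's actual implementation; `link_split` is a hypothetical helper name:

```python
from pathlib import Path


def link_split(split_dir: Path, image_paths: list[Path]) -> None:
    """Populate a train/ or val/ images folder with symlinks back to
    the raw files in data/datasets/ instead of copies, so large image
    data is never duplicated on disk."""
    split_dir.mkdir(parents=True, exist_ok=True)
    for src in image_paths:
        dst = split_dir / src.name
        if not dst.exists():
            # symlink_to stores an absolute target so the link stays
            # valid regardless of where split_dir lives
            dst.symlink_to(src.resolve())
```

Hardlinks (`os.link`) are an alternative when the split folder and `data/datasets/` live on the same filesystem.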
After dataset prep:

```bash
python3 <destination>/train_yolo_seg.py
```

Notes:

- if `--model` points to a local file, it will be used directly
- if `--model` is an Ultralytics name, it will be downloaded at runtime (internet required)
## Reproducibility Notes

- split reproducibility is controlled by `--seed`
- same input files + same seed => same train/val assignment
- changing the seed only changes the distribution, not file integrity
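The seed guarantee above comes from seeding the shuffle that precedes the split. A minimal sketch of the mechanism (illustrative; the generated prep script's exact shuffling code may differ):

```python
import random


def split_files(names: list[str], ratio: float = 0.85, seed: int = 42):
    """Deterministically split file names into (train, val).

    Sorting first removes any filesystem-ordering dependence; a
    seeded RNG then shuffles, so the same names + same seed always
    yield the same train/val assignment.
    """
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]
```

Calling `split_files` twice with the same inputs and seed returns identical splits; a different seed reassigns files between train and val but never adds, drops, or alters a file.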
## Testing

Unit tests live in tests/ and cover pure-logic modules that have no GUI or hardware dependency.

| Test file | Module under test |
|---|---|
| `tests/test_target_point.py` | `src/server/target_point.py` — target-point priority logic |
| `tests/test_generate_splits.py` | `src/tools/generate_splits.py` — split-file generation |
| `tests/test_image_utils.py` | `src/server/image_utils.py` — image decode helpers |

### Run tests locally

```bash
# install test dependencies (once)
pip install pytest pytest-cov

# run all unit tests
pytest

# with coverage report
pytest --cov=src --cov-report=term-missing
```
## CI/CD
A Gitea Actions workflow at .gitea/workflows/ci.yml runs the test suite automatically on every push and pull request to master/main.
## Troubleshooting

- `dataset.yaml` not found: run the prep script first, or pass `--data /path/to/dataset.yaml`
- no pairs found: check that image names match label names (`.txt`)
- missing classes in logs: verify label files contain valid class ids in the first column
- model download fails: use a local `.pt` model from `models/`