AareLC

AareLC is an annotation and training helper for YOLO-based loop centering workflows.

Main capabilities:

  • interactive GUI for image review and annotation
  • saving annotations for detection and segmentation
  • syncing image/label data from DB to local folders
  • generating a headless dataset preparation script (train/val split)
  • generating a matching training launcher script for Ultralytics YOLO

Quick Start (5 commands)

git clone <your-gitea-url>/AareLC.git
cd AareLC
python3 -m venv .venv && source .venv/bin/activate
(On sls-gpu-003, uv is recommended for efficient package management; activate the existing environment instead: source yolo/bin/activate)
pip install -U pip && pip install -r requirements.txt
python3 ml_gui.py

On sls-gpu-003: first uv pip install tensorrt-cu13-lib from the wheel file downloaded from NVIDIA (https://pypi.nvidia.com/) and saved in the tmp folder, then install the remaining TensorRT packages without dependency resolution:

uv pip install --no-deps tensorrt-cu13
uv pip install --no-deps tensorrt-cu13-bindings

Contributors

  • J. Dawn Duan
  • Filip Leonarski
  • Martin Appleby
  • Guillaume Gotthard
  • Claude (Anthropic) — AI pair programmer

Repository Structure

AareLC/
├── src/
│   ├── clip/                   clip related scripts
│   ├── core/                   inference, DB, processing clients
│   ├── gui/                    reusable GUI panels
│   ├── server/                 server related scripts
│   ├── inference/              inference related scripts
│   ├── training/               training related scripts
│   └── tools/                  data/annotation utilities
├── scripts/                    production runnable entry points
├── experiments/                exploratory, non-production
├── config/                     yaml/config files
├── models/
│   ├── registry/               versioned trained models with metadata
│   ├── base/                   pretrained upstream weights
│   └── archived/               old/retired versions
├── data/
│   ├── datasets/               curated dataset images (source of truth)
│   ├── splits/                 txt files recording split info
│   └── training_runs/          per-run metrics/plots (gitignored)
├── tests/                      unit tests (pytest)
├── notebooks/
├── environment.yml
├── pyproject.toml
├── requirements.txt
└── README.md

Model Registry

The models/registry/ directory is the canonical place for trained model versions.

Registry structure

Each model version lives in its own directory:

models/registry/<model-id>/
├── weights.pt           ← primary PyTorch checkpoint
├── exports/             ← derived export formats
│   ├── model.onnx
│   ├── model.engine     ← TensorRT (hardware-specific, gitignored)
│   └── pruned_weights.pt
└── metadata.json        ← version info and metrics

Naming convention

<architecture>-<task>-<variant>-<YYYY-MM-DD>

Examples:

  • yolo26n-seg-overlap-false-2026-04-12 — yolo26n, segmentation, overlap_mask=False, trained 2026-04-12
  • yolo26n-seg-multiscale-2026-03-16 — yolo26n, segmentation, multi-scale training

metadata.json fields

Field              Required  Description
model_id           yes       Same as directory name
family             yes       Base architecture (e.g. yolo26n)
task               yes       segmentation, detection, etc.
training_date      yes       ISO date of training run
export_formats     yes       List of available formats
metrics            yes       mAP, precision, recall, etc.
dataset_version    no        Dataset used for training
training_config    no        Hyperparameters and settings
git_commit         no        Commit hash at training time
framework_version  no        Ultralytics/YOLO version
notes              no        Free-form notes
status             no        active, deprecated, or archived

Workflow

  1. Train a model.
  2. Create a new registry folder using the naming convention.
  3. Save the main checkpoint as weights.pt.
  4. Export derived formats into exports/.
  5. Write metadata.json.
  6. If production-ready, update models/active/production.json to point to this version.
  7. Move outdated versions to models/archived/ when no longer needed.

Rules

  • Do not overwrite an existing registry version — add a new one instead.
  • Treat each registry folder as immutable once finalized.
  • models/active/production.json is the source of truth for model selection.

AI-friendly summary

  • models/registry/ = all versioned trained models; each subfolder = one release
  • metadata.json = structured description of the model
  • models/active/ = current production pointer
  • models/archived/ = retired models
  • Do not guess the correct model version; read the registry metadata instead

Getting Started

1. Clone and enter project

git clone <your-gitea-url>/AareLC.git
cd AareLC

2. Create environment

Python 3.11+ is recommended.

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

3. Optional environment variable

If DB access requires a shared password:

export AAREDB_SHARED_PASSWORD="<your_password>"
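Code that connects to the DB can fail fast with a clear message when the variable is unset. A small sketch (the helper name get_db_password is hypothetical):

```python
import os

def get_db_password() -> str:
    """Read the shared DB password, failing fast instead of raising a
    less obvious authentication error later."""
    password = os.environ.get("AAREDB_SHARED_PASSWORD")
    if not password:
        raise RuntimeError("Set AAREDB_SHARED_PASSWORD before connecting to the DB")
    return password
```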

4. Start GUI

python3 ml_gui.py

GUI Main Functions

File / Review

  • Load local image
  • Fetch next image from DB
  • Open review folder and navigate (N/P or arrow keys)
  • Save edited review annotations

Annotation

  • Predict current image
  • Edit detections/polygons with select/brush/SAM tools
  • Save for training (stores local sets + DB annotation)

Sync

  • Retrieve missing local images/labels from DB
  • Generate YOLO segmentation preparation script (headless workflow)

Use this when training happens on a machine without GUI.

Step A: Generate prep script from GUI machine

In GUI:

  1. Sync -> Generate YOLO Segmentation Prep Script...
  2. Select source folder containing images/labels
  3. Choose destination dataset folder
  4. Confirm split ratio (default 0.85) and random seed (default 42)
  5. Choose input model:
     • local model file from models/ (.pt/.yaml)
     • or Ultralytics model name (yolo11n-seg.pt, etc.)
  6. Save generated script (commit to Gitea if needed)

Step B: Run prep script on training machine

python3 prepare_yolo_seg_dataset_<timestamp>.py

Optional preview:

python3 prepare_yolo_seg_dataset_<timestamp>.py --dry-run

The script creates:

  • <destination>/train/images
  • <destination>/train/labels
  • <destination>/val/images
  • <destination>/val/labels

  • <destination>/dataset.yaml
  • <destination>/train_yolo_seg.py
  • <destination>/logs/dataset_preparation_*.json
  • <destination>/logs/dataset_preparation_*.txt

Note: In the future, training dataset folders will use links (symlinks or hardlinks) pointing to the raw images in data/datasets/ instead of copying files. This avoids duplicating large image data on disk.

Log includes:

  • preparation date/time
  • source/destination paths
  • train/val counts
  • class names and class counts
  • full file mapping list (source -> destination)
  • missing labels and orphan labels
  • selected model spec

Train Model

Rules of thumb for data preparation:

  • When splitting, always create links in the dataset folder that refer to the images in data/datasets/ (do not duplicate files)
  • When training is done, record the split in data/splits/ by saving only file names and paths

After dataset prep:

python3 <destination>/train_yolo_seg.py

Notes:

  • if --model points to a local file, it will be used directly
  • if --model is an Ultralytics name, it will be downloaded at runtime (internet required)

Reproducibility Notes

  • split reproducibility is controlled by --seed
  • same input files + same seed => same train/val assignment
  • changing seed only changes distribution, not file integrity
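The seed-controlled determinism above amounts to shuffling with a fixed random state. A minimal sketch (split_files is an illustration, not the generated script's code):

```python
import random

def split_files(names: list[str], ratio: float = 0.85, seed: int = 42):
    """Deterministic train/val assignment: same names + same seed
    always yield the same split, regardless of input order."""
    shuffled = sorted(names)            # normalize input order first
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]
```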

Testing

Unit tests live in tests/ and cover pure-logic modules that have no GUI or hardware dependency.

Test file                      Module under test
tests/test_target_point.py     src/server/target_point.py — target-point priority logic
tests/test_generate_splits.py  src/tools/generate_splits.py — split-file generation
tests/test_image_utils.py      src/server/image_utils.py — image decode helpers

Run tests locally

# install test dependency (once)
pip install pytest pytest-cov

# run all unit tests
pytest

# with coverage report
pytest --cov=src --cov-report=term-missing

CI/CD

A Gitea Actions workflow at .gitea/workflows/ci.yml runs the test suite automatically on every push and pull request to master/main.

Troubleshooting

  • dataset.yaml not found: run prep script first, or pass --data /path/to/dataset.yaml
  • no pairs found: check that image names match label names (.txt)
  • missing classes in logs: verify label files contain valid class ids in first column
  • model download fails: use a local .pt model from models/