migration and splitting AareLC

2026-04-14 16:07:31 +02:00
commit 391c357c84
3022 changed files with 23477 additions and 0 deletions
@@ -0,0 +1,275 @@
# AareLC
AareLC is an annotation and training helper for YOLO-based loop centering workflows.
Main capabilities:
- interactive GUI for image review and annotation
- saving annotations for detection and segmentation
- syncing image/label data from DB to local folders
- generating a headless dataset preparation script (`train/val` split)
- generating a matching training launcher script for Ultralytics YOLO
## Quick Start (5 commands)
```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
python3 -m venv .venv && source .venv/bin/activate
pip install -U pip && pip install -r requirements.txt
python3 ml_gui.py
```
On sls-gpu-003, `uv` is recommended for efficient package management; instead of creating a new venv, activate the existing environment with `source yolo/bin/activate`. After installing the requirements with `uv pip install`, also install the TensorRT packages from the wheel files that were downloaded from NVIDIA (https://pypi.nvidia.com/) and saved in the tmp folder:
```bash
uv pip install --no-deps tensorrt-cu13
uv pip install --no-deps tensorrt-cu13-bindings
```
## Contributors
- J. Dawn Duan
- Filip Leonarski
- Martin Appleby
- Guillaume Gotthard
- Claude (Anthropic) — AI pair programmer
## Repository Structure
```
AareLC/
├── src/
│ ├── clip/ clip related scripts
│ ├── core/ inference, DB, processing clients
│ ├── gui/ reusable GUI panels
│ ├── server/ server related scripts
│ ├── inference/ inference related scripts
│ ├── training/ training related scripts
│ └── tools/ data/annotation utilities
├── scripts/ production runnable entry points
├── experiments/ exploratory, non-production
├── config/ yaml/config files
├── models/
│ ├── registry/ versioned trained models with metadata
│ ├── base/ pretrained upstream weights
│ └── archived/ old/retired versions
├── data/
│ ├── datasets/ curated dataset images (source of truth)
│ ├── splits/ txt files recording split info
│ └── training_runs/ per-run metrics/plots (gitignored)
├── tests/ unit tests (pytest)
├── notebooks/
├── environment.yml
├── pyproject.toml
├── requirements.txt
└── README.md
```
## Model Registry
The `models/registry/` directory is the canonical place for trained model versions.
### Registry structure
Each model version lives in its own directory:
```
models/registry/<model-id>/
├── weights.pt ← primary PyTorch checkpoint
├── exports/ ← derived export formats
│ ├── model.onnx
│ ├── model.engine ← TensorRT (hardware-specific, gitignored)
│ └── pruned_weights.pt
└── metadata.json ← version info and metrics
```
### Naming convention
`<architecture>-<task>-<variant>-<YYYY-MM-DD>`
Examples:
- `yolo26n-seg-overlap-false-2026-04-12` — yolo26n, segmentation, overlap_mask=False, trained 2026-04-12
- `yolo26n-seg-multiscale-2026-03-16` — yolo26n, segmentation, multi-scale training
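As a sketch, the naming convention can also be enforced programmatically; the regex and helper below are illustrative, not part of the repo:

```python
import re
from datetime import date

# <architecture>-<task>-<variant>-<YYYY-MM-DD>, e.g. yolo26n-seg-multiscale-2026-03-16
MODEL_ID_RE = re.compile(
    r"^(?P<arch>[a-z0-9]+)-(?P<task>seg|det|pose)-(?P<variant>[a-z0-9-]+)-(?P<date>\d{4}-\d{2}-\d{2})$"
)

def make_model_id(arch: str, task: str, variant: str, trained: date) -> str:
    """Build a registry model-id following the naming convention and validate it."""
    model_id = f"{arch}-{task}-{variant}-{trained.isoformat()}"
    if not MODEL_ID_RE.match(model_id):
        raise ValueError(f"invalid model id: {model_id}")
    return model_id

print(make_model_id("yolo26n", "seg", "multiscale", date(2026, 3, 16)))
# yolo26n-seg-multiscale-2026-03-16
```

The hyphenated variant (`overlap-false`) still parses unambiguously because the trailing date pattern anchors the match.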
### metadata.json fields
| Field | Required | Description |
|---|---|---|
| `model_id` | yes | Same as directory name |
| `family` | yes | Base architecture (e.g. `yolo26n`) |
| `task` | yes | `segmentation`, `detection`, etc. |
| `training_date` | yes | ISO date of training run |
| `export_formats` | yes | List of available formats |
| `metrics` | yes | mAP, precision, recall, etc. |
| `dataset_version` | no | Dataset used for training |
| `training_config` | no | Hyperparameters and settings |
| `git_commit` | no | Commit hash at training time |
| `framework_version` | no | Ultralytics/YOLO version |
| `notes` | no | Free-form notes |
| `status` | no | `active`, `deprecated`, or `archived` |
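A minimal helper for writing `metadata.json` with the required fields could look like the sketch below; the helper name and example metric values are hypothetical:

```python
import json
import tempfile
from pathlib import Path

REQUIRED = {"model_id", "family", "task", "training_date", "export_formats", "metrics"}

def write_metadata(registry_dir: Path, **fields) -> Path:
    """Write metadata.json for one registry version, enforcing the required fields above."""
    missing = REQUIRED - fields.keys()
    if missing:
        raise ValueError(f"missing required metadata fields: {sorted(missing)}")
    registry_dir.mkdir(parents=True, exist_ok=True)
    path = registry_dir / "metadata.json"
    path.write_text(json.dumps(fields, indent=2, sort_keys=True))
    return path

# Example under a temporary directory (values are illustrative):
root = Path(tempfile.mkdtemp())
write_metadata(
    root / "yolo26n-seg-multiscale-2026-03-16",
    model_id="yolo26n-seg-multiscale-2026-03-16",
    family="yolo26n",
    task="segmentation",
    training_date="2026-03-16",
    export_formats=["pt", "onnx"],
    metrics={"mAP50": 0.92},
)
```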
### Workflow
1. Train a model.
2. Create a new registry folder using the naming convention.
3. Save the main checkpoint as `weights.pt`.
4. Export derived formats into `exports/`.
5. Write `metadata.json`.
6. If production-ready, update `models/active/production.json` to point to this version.
7. Move outdated versions to `models/archived/` when no longer needed.
### Rules
- Do not overwrite an existing registry version — add a new one instead.
- Treat each registry folder as immutable once finalized.
- `models/active/production.json` is the source of truth for model selection.
### AI-friendly summary
- `models/registry/` = all versioned trained models; each subfolder = one release
- `metadata.json` = structured description of the model
- `models/active/` = current production pointer
- `models/archived/` = retired models
- Do not guess the correct model version; read the registry metadata instead
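Resolving the production pointer could look like the sketch below; the `{"model_id": ...}` schema for `production.json` is an assumption, so adapt it to the real file:

```python
import json
from pathlib import Path

def resolve_production_model(repo_root: Path) -> dict:
    """Follow models/active/production.json to the active version's metadata.json.
    Assumes production.json contains {"model_id": "<registry folder name>"}."""
    pointer = json.loads((repo_root / "models" / "active" / "production.json").read_text())
    meta_path = repo_root / "models" / "registry" / pointer["model_id"] / "metadata.json"
    return json.loads(meta_path.read_text())
```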
## Getting Started
### 1. Clone and enter project
```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
```
### 2. Create environment
Python 3.11+ is recommended.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
### 3. Optional environment variable
If DB access requires the shared password:
```bash
export AAREDB_SHARED_PASSWORD="<your_password>"
```
### 4. Start GUI
```bash
python3 ml_gui.py
```
## GUI Main Functions
### File / Review
- Load local image
- Fetch next image from DB
- Open review folder and navigate (`N`/`P` or arrow keys)
- Save edited review annotations
### Annotation
- Predict current image
- Edit detections/polygons with select/brush/SAM tools
- Save for training (stores local sets + DB annotation)
### Sync
- Retrieve missing local images/labels from DB
- Generate YOLO segmentation preparation script (headless workflow)
## Prepare Dataset For Training (Recommended Workflow)
Use this when training happens on a machine without GUI.
### Step A: Generate prep script from GUI machine
In GUI:
1. `Sync` -> `Generate YOLO Segmentation Prep Script...`
2. Select source folder containing images/labels
3. Choose destination dataset folder
4. Confirm split ratio (default `0.85`) and random seed (default `42`)
5. Choose input model:
- local model file from `models/` (`.pt`/`.yaml`)
- or Ultralytics model name (`yolo11n-seg.pt`, etc.)
6. Save generated script (commit to Gitea if needed)
### Step B: Run prep script on training machine
```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py
```
Optional preview:
```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py --dry-run
```
The script creates:
- `<destination>/train/images`
- `<destination>/train/labels`
- `<destination>/val/images`
- `<destination>/val/labels`
- `<destination>/dataset.yaml`
- `<destination>/train_yolo_seg.py`
- `<destination>/logs/dataset_preparation_*.json`
- `<destination>/logs/dataset_preparation_*.txt`

> **Note:** In the future, training dataset folders will use links (symlinks or hardlinks) pointing to the raw images in `data/datasets/` instead of copying files. This avoids duplicating large image data on disk.

Log includes:
- preparation date/time
- source/destination paths
- train/val counts
- class names and class counts
- full file mapping list (source -> destination)
- missing labels and orphan labels
- selected model spec
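A quick way to inspect a preparation log is sketched below; the key names (`train_count`, `val_count`, `class_names`) are assumptions about the log schema, so check what the generated script actually writes:

```python
import json
from pathlib import Path

def summarize_prep_log(log_path: Path) -> str:
    """One-line summary of a dataset_preparation_*.json log.
    The keys used here are illustrative, not the confirmed schema."""
    log = json.loads(log_path.read_text())
    return (f"{log['train_count']} train / {log['val_count']} val, "
            f"classes: {', '.join(log['class_names'])}")
```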
## Train Model
Rules of thumb for data preparation:
- When splitting, create links in the dataset folder that refer to the images in `data/datasets/` (don't duplicate files)
- When training is done, save the split into `data/splits/` by recording only the file names and paths
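The linking rule above can be sketched like this (the helper name and exact folder layout are illustrative):

```python
from pathlib import Path

def link_split(names: list[str], datasets_dir: Path, dest_images: Path, split_file: Path) -> None:
    """Symlink images from data/datasets/ into a split folder and record only
    the file names in data/splits/ -- link, don't copy."""
    dest_images.mkdir(parents=True, exist_ok=True)
    for name in names:
        link = dest_images / name
        if not link.exists():
            link.symlink_to((datasets_dir / name).resolve())  # no duplicated image data
    split_file.parent.mkdir(parents=True, exist_ok=True)
    split_file.write_text("\n".join(names) + "\n")  # names only, enough to rebuild the split
```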
After dataset prep:
```bash
python3 <destination>/train_yolo_seg.py
```
Notes:
- if `--model` points to a local file, it will be used directly
- if `--model` is an Ultralytics name, it will be downloaded at runtime (internet required)
## Reproducibility Notes
- split reproducibility is controlled by `--seed`
- same input files + same seed => same train/val assignment
- changing the seed only changes which files are assigned to `train`/`val`; the files themselves are untouched
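The seed behaviour can be demonstrated with a minimal sketch (not the generated prep script itself):

```python
import random

def split_files(files: list[str], ratio: float = 0.85, seed: int = 42) -> tuple[list[str], list[str]]:
    """Deterministic split: same input files + same seed => same train/val assignment."""
    ordered = sorted(files)                  # normalize input order first
    random.Random(seed).shuffle(ordered)     # seeded, reproducible shuffle
    cut = int(len(ordered) * ratio)
    return ordered[:cut], ordered[cut:]

train, val = split_files([f"img_{i:03d}.png" for i in range(100)])
assert split_files([f"img_{i:03d}.png" for i in range(100)]) == (train, val)  # reproducible
print(len(train), len(val))  # 85 15
```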
## Testing
Unit tests live in `tests/` and cover pure-logic modules that have no GUI or hardware dependency.
| Test file | Module under test |
|---|---|
| `tests/test_target_point.py` | `src/server/target_point.py` — target-point priority logic |
| `tests/test_generate_splits.py` | `src/tools/generate_splits.py` — split-file generation |
| `tests/test_image_utils.py` | `src/server/image_utils.py` — image decode helpers |
### Run tests locally
```bash
# install test dependency (once)
pip install pytest pytest-cov
# run all unit tests
pytest
# with coverage report
pytest --cov=src --cov-report=term-missing
```
### CI/CD
A Gitea Actions workflow at `.gitea/workflows/ci.yml` runs the test suite automatically on every push and pull request to `master`/`main`.
## Troubleshooting
- `dataset.yaml not found`: run prep script first, or pass `--data /path/to/dataset.yaml`
- no pairs found: check that image names match label names (`.txt`)
- missing classes in logs: verify label files contain valid class ids in first column
- model download fails: use a local `.pt` model from `models/`
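For the "no pairs found" case, a quick pairing check can be sketched as below (the extension list is illustrative):

```python
from pathlib import Path

def find_unpaired(images_dir: Path, labels_dir: Path) -> tuple[list[str], list[str]]:
    """Report image stems with no matching .txt label, and label stems with no image.
    Pairing is by file stem, as in the 'no pairs found' check above."""
    image_stems = {p.stem for p in images_dir.iterdir()
                   if p.suffix.lower() in {".png", ".jpg", ".jpeg"}}
    label_stems = {p.stem for p in labels_dir.glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```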
@@ -0,0 +1,106 @@
task: detect
mode: train
model: yolov8n.yaml
data: /home/jungfrau/alc/yolo/dataset.yaml
epochs: 800
time: null
patience: 300
batch: 16
imgsz: 640
save: true
save_period: -1
cache: false
device: null
workers: 8
project: null
name: train8
exist_ok: false
pretrained: true
optimizer: auto
verbose: true
seed: 0
deterministic: true
single_cls: false
rect: false
cos_lr: false
close_mosaic: 10
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
save_hybrid: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: false
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: false
dynamic: false
simplify: true
opset: null
workspace: null
nms: false
lr0: 0.01
lrf: 0.01
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 7.5
cls: 0.5
dfl: 1.5
pose: 12.0
kobj: 1.0
nbs: 64
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 0.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.0
bgr: 0.0
mosaic: 1.0
mixup: 0.0
copy_paste: 0.0
copy_paste_mode: flip
auto_augment: randaugment
erasing: 0.4
crop_fraction: 1.0
cfg: null
tracker: botsort.yaml
save_dir: /home/jungfrau/alc/yolo/runs/detect/train8
@@ -0,0 +1,14 @@
# dataset.yaml
path: /Users/duan_j/Applications/alc/tests/yolo # Dataset root directory
train: train/images # Training images relative to 'path'
val: val/images # Validation images relative to 'path'
test: test/images # Test images relative to 'path' (optional)
# Class names
names:
0: loop_all
1: pin
2: crystal
3: loop_face
4: ice
5: needle
@@ -0,0 +1,52 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
# Ultralytics YOLO26-seg instance segmentation model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/segment
# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n-seg.yaml' will call yolo26-seg.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.50, 0.25, 1024] # summary: 309 layers, 3,126,280 parameters, 3,126,280 gradients, 10.5 GFLOPs
s: [0.50, 0.50, 1024] # summary: 309 layers, 11,505,800 parameters, 11,505,800 gradients, 37.4 GFLOPs
m: [0.50, 1.00, 512] # summary: 329 layers, 27,112,072 parameters, 27,112,072 gradients, 132.5 GFLOPs
l: [1.00, 1.00, 512] # summary: 441 layers, 31,515,528 parameters, 31,515,528 gradients, 150.9 GFLOPs
x: [1.00, 1.50, 512] # summary: 441 layers, 70,693,800 parameters, 70,693,800 gradients, 337.7 GFLOPs
# YOLO26n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2, [1024, True]]
- [-1, 1, SPPF, [1024, 5, 3, True]] # 9
- [-1, 2, C2PSA, [1024]] # 10
# YOLO26n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2, [512, True]] # 13
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Segment26, [nc, 32, 256]] # Segment26(P3, P4, P5)
@@ -0,0 +1 @@
datasets/** filter=lfs diff=lfs merge=lfs -text
Binary file not shown.
Some files were not shown because too many files have changed in this diff.