migration and splitting AareLC

2026-04-14 16:07:31 +02:00
commit 391c357c84
3022 changed files with 23477 additions and 0 deletions
@@ -0,0 +1,275 @@
# AareLC
AareLC is an annotation and training helper for YOLO-based loop centering workflows.
Main capabilities:
- interactive GUI for image review and annotation
- saving annotations for detection and segmentation
- syncing image/label data from DB to local folders
- generating a headless dataset preparation script (`train/val` split)
- generating a matching training launcher script for Ultralytics YOLO
## Quick Start (5 commands)
```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
python3 -m venv .venv && source .venv/bin/activate
pip install -U pip && pip install -r requirements.txt
python3 ml_gui.py
```
On sls-gpu-003, `uv` is recommended for efficient package management; instead of creating a new venv, activate the existing environment with `source yolo/bin/activate`. After installing the requirements with `uv pip install`, also install the TensorRT packages from the wheel files that were downloaded from NVIDIA (https://pypi.nvidia.com/) and saved in the tmp folder:
```bash
uv pip install --no-deps tensorrt-cu13
uv pip install --no-deps tensorrt-cu13-bindings
```
## Contributors
- J. Dawn Duan
- Filip Leonarski
- Martin Appleby
- Guillaume Gotthard
- Claude (Anthropic) — AI pair programmer
## Repository Structure
```
AareLC/
├── src/
│ ├── clip/ clip related scripts
│ ├── core/ inference, DB, processing clients
│ ├── gui/ reusable GUI panels
│ ├── server/ server related scripts
│ ├── inference/ inference related scripts
│ ├── training/ training related scripts
│ └── tools/ data/annotation utilities
├── scripts/ production runnable entry points
├── experiments/ exploratory, non-production
├── config/ yaml/config files
├── models/
│ ├── registry/ versioned trained models with metadata
│ ├── base/ pretrained upstream weights
│ └── archived/ old/retired versions
├── data/
│ ├── datasets/ curated dataset images (source of truth)
│ ├── splits/ txt files recording split info
│ └── training_runs/ per-run metrics/plots (gitignored)
├── tests/ unit tests (pytest)
├── notebooks/
├── environment.yml
├── pyproject.toml
├── requirements.txt
└── README.md
```
## Model Registry
The `models/registry/` directory is the canonical place for trained model versions.
### Registry structure
Each model version lives in its own directory:
```
models/registry/<model-id>/
├── weights.pt ← primary PyTorch checkpoint
├── exports/ ← derived export formats
│ ├── model.onnx
│ ├── model.engine ← TensorRT (hardware-specific, gitignored)
│ └── pruned_weights.pt
└── metadata.json ← version info and metrics
```
### Naming convention
`<architecture>-<task>-<variant>-<YYYY-MM-DD>`
Examples:
- `yolo26n-seg-overlap-false-2026-04-12` — yolo26n, segmentation, overlap_mask=False, trained 2026-04-12
- `yolo26n-seg-multiscale-2026-03-16` — yolo26n, segmentation, multi-scale training
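As a sketch, the naming convention can also be enforced programmatically; the regex and helper below are illustrative, not part of the repo:

```python
import re
from datetime import date

# <architecture>-<task>-<variant>-<YYYY-MM-DD>, e.g. yolo26n-seg-multiscale-2026-03-16
MODEL_ID_RE = re.compile(
    r"^(?P<arch>[a-z0-9]+)-(?P<task>seg|det|pose)-(?P<variant>[a-z0-9-]+)-(?P<date>\d{4}-\d{2}-\d{2})$"
)

def make_model_id(arch: str, task: str, variant: str, trained: date) -> str:
    """Build a registry model-id following the naming convention and validate it."""
    model_id = f"{arch}-{task}-{variant}-{trained.isoformat()}"
    if not MODEL_ID_RE.match(model_id):
        raise ValueError(f"invalid model id: {model_id}")
    return model_id

print(make_model_id("yolo26n", "seg", "multiscale", date(2026, 3, 16)))
# yolo26n-seg-multiscale-2026-03-16
```

The hyphenated variant (`overlap-false`) still parses unambiguously because the trailing date pattern anchors the match.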
### metadata.json fields
| Field | Required | Description |
|---|---|---|
| `model_id` | yes | Same as directory name |
| `family` | yes | Base architecture (e.g. `yolo26n`) |
| `task` | yes | `segmentation`, `detection`, etc. |
| `training_date` | yes | ISO date of training run |
| `export_formats` | yes | List of available formats |
| `metrics` | yes | mAP, precision, recall, etc. |
| `dataset_version` | no | Dataset used for training |
| `training_config` | no | Hyperparameters and settings |
| `git_commit` | no | Commit hash at training time |
| `framework_version` | no | Ultralytics/YOLO version |
| `notes` | no | Free-form notes |
| `status` | no | `active`, `deprecated`, or `archived` |
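A minimal helper for writing `metadata.json` with the required fields could look like the sketch below; the helper name and example metric values are hypothetical:

```python
import json
import tempfile
from pathlib import Path

REQUIRED = {"model_id", "family", "task", "training_date", "export_formats", "metrics"}

def write_metadata(registry_dir: Path, **fields) -> Path:
    """Write metadata.json for one registry version, enforcing the required fields above."""
    missing = REQUIRED - fields.keys()
    if missing:
        raise ValueError(f"missing required metadata fields: {sorted(missing)}")
    registry_dir.mkdir(parents=True, exist_ok=True)
    path = registry_dir / "metadata.json"
    path.write_text(json.dumps(fields, indent=2, sort_keys=True))
    return path

# Example under a temporary directory (values are illustrative):
root = Path(tempfile.mkdtemp())
write_metadata(
    root / "yolo26n-seg-multiscale-2026-03-16",
    model_id="yolo26n-seg-multiscale-2026-03-16",
    family="yolo26n",
    task="segmentation",
    training_date="2026-03-16",
    export_formats=["pt", "onnx"],
    metrics={"mAP50": 0.92},
)
```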
### Workflow
1. Train a model.
2. Create a new registry folder using the naming convention.
3. Save the main checkpoint as `weights.pt`.
4. Export derived formats into `exports/`.
5. Write `metadata.json`.
6. If production-ready, update `models/active/production.json` to point to this version.
7. Move outdated versions to `models/archived/` when no longer needed.
### Rules
- Do not overwrite an existing registry version — add a new one instead.
- Treat each registry folder as immutable once finalized.
- `models/active/production.json` is the source of truth for model selection.
### AI-friendly summary
- `models/registry/` = all versioned trained models; each subfolder = one release
- `metadata.json` = structured description of the model
- `models/active/` = current production pointer
- `models/archived/` = retired models
- Do not guess the correct model version; read the registry metadata instead
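Resolving the production pointer could look like the sketch below; the `{"model_id": ...}` schema for `production.json` is an assumption, so adapt it to the real file:

```python
import json
from pathlib import Path

def resolve_production_model(repo_root: Path) -> dict:
    """Follow models/active/production.json to the active version's metadata.json.
    Assumes production.json contains {"model_id": "<registry folder name>"}."""
    pointer = json.loads((repo_root / "models" / "active" / "production.json").read_text())
    meta_path = repo_root / "models" / "registry" / pointer["model_id"] / "metadata.json"
    return json.loads(meta_path.read_text())
```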
## Getting Started
### 1. Clone and enter project
```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
```
### 2. Create environment
Python 3.11+ is recommended.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
### 3. Optional environment variable
If DB access requires the shared password:
```bash
export AAREDB_SHARED_PASSWORD="<your_password>"
```
### 4. Start GUI
```bash
python3 ml_gui.py
```
## GUI Main Functions
### File / Review
- Load local image
- Fetch next image from DB
- Open review folder and navigate (`N`/`P` or arrow keys)
- Save edited review annotations
### Annotation
- Predict current image
- Edit detections/polygons with select/brush/SAM tools
- Save for training (stores local sets + DB annotation)
### Sync
- Retrieve missing local images/labels from DB
- Generate YOLO segmentation preparation script (headless workflow)
## Prepare Dataset For Training (Recommended Workflow)
Use this when training happens on a machine without GUI.
### Step A: Generate prep script from GUI machine
In GUI:
1. `Sync` -> `Generate YOLO Segmentation Prep Script...`
2. Select source folder containing images/labels
3. Choose destination dataset folder
4. Confirm split ratio (default `0.85`) and random seed (default `42`)
5. Choose input model:
- local model file from `models/` (`.pt`/`.yaml`)
- or Ultralytics model name (`yolo11n-seg.pt`, etc.)
6. Save generated script (commit to Gitea if needed)
### Step B: Run prep script on training machine
```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py
```
Optional preview:
```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py --dry-run
```
The script creates:
- `<destination>/train/images`
- `<destination>/train/labels`
- `<destination>/val/images`
- `<destination>/val/labels`
- `<destination>/dataset.yaml`
- `<destination>/train_yolo_seg.py`
- `<destination>/logs/dataset_preparation_*.json`
- `<destination>/logs/dataset_preparation_*.txt`

> **Note:** In the future, training dataset folders will use links (symlinks or hardlinks) pointing to the raw images in `data/datasets/` instead of copying files. This avoids duplicating large image data on disk.

Log includes:
- preparation date/time
- source/destination paths
- train/val counts
- class names and class counts
- full file mapping list (source -> destination)
- missing labels and orphan labels
- selected model spec
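A quick way to inspect a preparation log is sketched below; the key names (`train_count`, `val_count`, `class_names`) are assumptions about the log schema, so check what the generated script actually writes:

```python
import json
from pathlib import Path

def summarize_prep_log(log_path: Path) -> str:
    """One-line summary of a dataset_preparation_*.json log.
    The keys used here are illustrative, not the confirmed schema."""
    log = json.loads(log_path.read_text())
    return (f"{log['train_count']} train / {log['val_count']} val, "
            f"classes: {', '.join(log['class_names'])}")
```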
## Train Model
Rules of thumb for data preparation:
- When splitting, create links in the dataset folder that refer to the images in `data/datasets/` (don't duplicate files)
- When training is done, save the split into `data/splits/` by recording only the file names and paths
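The linking rule above can be sketched like this (the helper name and exact folder layout are illustrative):

```python
from pathlib import Path

def link_split(names: list[str], datasets_dir: Path, dest_images: Path, split_file: Path) -> None:
    """Symlink images from data/datasets/ into a split folder and record only
    the file names in data/splits/ -- link, don't copy."""
    dest_images.mkdir(parents=True, exist_ok=True)
    for name in names:
        link = dest_images / name
        if not link.exists():
            link.symlink_to((datasets_dir / name).resolve())  # no duplicated image data
    split_file.parent.mkdir(parents=True, exist_ok=True)
    split_file.write_text("\n".join(names) + "\n")  # names only, enough to rebuild the split
```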
After dataset prep:
```bash
python3 <destination>/train_yolo_seg.py
```
Notes:
- if `--model` points to a local file, it will be used directly
- if `--model` is an Ultralytics name, it will be downloaded at runtime (internet required)
## Reproducibility Notes
- split reproducibility is controlled by `--seed`
- same input files + same seed => same train/val assignment
- changing the seed only changes which files are assigned to `train`/`val`; the files themselves are untouched
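The seed behaviour can be demonstrated with a minimal sketch (not the generated prep script itself):

```python
import random

def split_files(files: list[str], ratio: float = 0.85, seed: int = 42) -> tuple[list[str], list[str]]:
    """Deterministic split: same input files + same seed => same train/val assignment."""
    ordered = sorted(files)                  # normalize input order first
    random.Random(seed).shuffle(ordered)     # seeded, reproducible shuffle
    cut = int(len(ordered) * ratio)
    return ordered[:cut], ordered[cut:]

train, val = split_files([f"img_{i:03d}.png" for i in range(100)])
assert split_files([f"img_{i:03d}.png" for i in range(100)]) == (train, val)  # reproducible
print(len(train), len(val))  # 85 15
```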
## Testing
Unit tests live in `tests/` and cover pure-logic modules that have no GUI or hardware dependency.
| Test file | Module under test |
|---|---|
| `tests/test_target_point.py` | `src/server/target_point.py` — target-point priority logic |
| `tests/test_generate_splits.py` | `src/tools/generate_splits.py` — split-file generation |
| `tests/test_image_utils.py` | `src/server/image_utils.py` — image decode helpers |
### Run tests locally
```bash
# install test dependency (once)
pip install pytest pytest-cov
# run all unit tests
pytest
# with coverage report
pytest --cov=src --cov-report=term-missing
```
### CI/CD
A Gitea Actions workflow at `.gitea/workflows/ci.yml` runs the test suite automatically on every push and pull request to `master`/`main`.
## Troubleshooting
- `dataset.yaml not found`: run prep script first, or pass `--data /path/to/dataset.yaml`
- no pairs found: check that image names match label names (`.txt`)
- missing classes in logs: verify label files contain valid class ids in first column
- model download fails: use a local `.pt` model from `models/`
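For the "no pairs found" case, a quick pairing check can be sketched as below (the extension list is illustrative):

```python
from pathlib import Path

def find_unpaired(images_dir: Path, labels_dir: Path) -> tuple[list[str], list[str]]:
    """Report image stems with no matching .txt label, and label stems with no image.
    Pairing is by file stem, as in the 'no pairs found' check above."""
    image_stems = {p.stem for p in images_dir.iterdir()
                   if p.suffix.lower() in {".png", ".jpg", ".jpeg"}}
    label_stems = {p.stem for p in labels_dir.glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```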
@@ -0,0 +1,106 @@
task: detect
mode: train
model: yolov8n.yaml
data: /home/jungfrau/alc/yolo/dataset.yaml
epochs: 800
time: null
patience: 300
batch: 16
imgsz: 640
save: true
save_period: -1
cache: false
device: null
workers: 8
project: null
name: train8
exist_ok: false
pretrained: true
optimizer: auto
verbose: true
seed: 0
deterministic: true
single_cls: false
rect: false
cos_lr: false
close_mosaic: 10
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
save_hybrid: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: false
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: false
dynamic: false
simplify: true
opset: null
workspace: null
nms: false
lr0: 0.01
lrf: 0.01
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 7.5
cls: 0.5
dfl: 1.5
pose: 12.0
kobj: 1.0
nbs: 64
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 0.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.0
bgr: 0.0
mosaic: 1.0
mixup: 0.0
copy_paste: 0.0
copy_paste_mode: flip
auto_augment: randaugment
erasing: 0.4
crop_fraction: 1.0
cfg: null
tracker: botsort.yaml
save_dir: /home/jungfrau/alc/yolo/runs/detect/train8
@@ -0,0 +1,14 @@
# dataset.yaml
path: /Users/duan_j/Applications/alc/tests/yolo # Dataset root directory
train: train/images # Training images relative to 'path'
val: val/images # Validation images relative to 'path'
test: test/images # Test images relative to 'path' (optional)
# Class names
names:
0: loop_all
1: pin
2: crystal
3: loop_face
4: ice
5: needle
@@ -0,0 +1,52 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
# Ultralytics YOLO26-seg instance segmentation model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/segment
# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n-seg.yaml' will call yolo26-seg.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.50, 0.25, 1024] # summary: 309 layers, 3,126,280 parameters, 3,126,280 gradients, 10.5 GFLOPs
s: [0.50, 0.50, 1024] # summary: 309 layers, 11,505,800 parameters, 11,505,800 gradients, 37.4 GFLOPs
m: [0.50, 1.00, 512] # summary: 329 layers, 27,112,072 parameters, 27,112,072 gradients, 132.5 GFLOPs
l: [1.00, 1.00, 512] # summary: 441 layers, 31,515,528 parameters, 31,515,528 gradients, 150.9 GFLOPs
x: [1.00, 1.50, 512] # summary: 441 layers, 70,693,800 parameters, 70,693,800 gradients, 337.7 GFLOPs
# YOLO26n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2, [1024, True]]
- [-1, 1, SPPF, [1024, 5, 3, True]] # 9
- [-1, 2, C2PSA, [1024]] # 10
# YOLO26n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2, [512, True]] # 13
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Segment26, [nc, 32, 256]] # Segment26(P3, P4, P5)
@@ -0,0 +1 @@
datasets/** filter=lfs diff=lfs merge=lfs -text
Binary file not shown.
Some files were not shown because too many files have changed in this diff.