migration and splitting AareLC
@@ -0,0 +1,275 @@
# AareLC

AareLC is an annotation and training helper for YOLO-based loop centering workflows.

Main capabilities:

- interactive GUI for image review and annotation
- saving annotations for detection and segmentation
- syncing image/label data from DB to local folders
- generating a headless dataset preparation script (`train/val` split)
- generating a matching training launcher script for Ultralytics YOLO

## Quick Start (5 commands)

```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
python3 -m venv .venv && source .venv/bin/activate
# on sls-gpu-003, uv is recommended for efficient package management: source yolo/bin/activate
pip install -U pip && pip install -r requirements.txt
python3 ml_gui.py
```

On sls-gpu-003, after the `uv pip install` step, install the tensorrt-cu13 libraries from the wheel files downloaded from NVIDIA (https://pypi.nvidia.com/) and saved in the tmp folder:

```bash
uv pip install --no-deps tensorrt-cu13
uv pip install --no-deps tensorrt-cu13-bindings
```

## Contributors

- J. Dawn Duan
- Filip Leonarski
- Martin Appleby
- Guillaume Gotthard
- Claude (Anthropic) — AI pair programmer

## Repository Structure

```
AareLC/
├── src/
│   ├── clip/            CLIP-related scripts
│   ├── core/            inference, DB, processing clients
│   ├── gui/             reusable GUI panels
│   ├── server/          server-related scripts
│   ├── inference/       inference-related scripts
│   ├── training/        training-related scripts
│   └── tools/           data/annotation utilities
├── scripts/             production runnable entry points
├── experiments/         exploratory, non-production
├── config/              YAML/config files
├── models/
│   ├── registry/        versioned trained models with metadata
│   ├── base/            pretrained upstream weights
│   └── archived/        old/retired versions
├── data/
│   ├── datasets/        curated dataset images (source of truth)
│   ├── splits/          txt files recording split info
│   └── training_runs/   per-run metrics/plots (gitignored)
├── tests/               unit tests (pytest)
├── notebooks/
├── environment.yml
├── pyproject.toml
├── requirements.txt
└── README.md
```

## Model Registry

The `models/registry/` directory is the canonical place for trained model versions.

### Registry structure

Each model version lives in its own directory:

```
models/registry/<model-id>/
├── weights.pt        ← primary PyTorch checkpoint
├── exports/          ← derived export formats
│   ├── model.onnx
│   ├── model.engine  ← TensorRT (hardware-specific, gitignored)
│   └── pruned_weights.pt
└── metadata.json     ← version info and metrics
```

### Naming convention

`<architecture>-<task>-<variant>-<YYYY-MM-DD>`

Examples:

- `yolo26n-seg-overlap-false-2026-04-12` — yolo26n, segmentation, overlap_mask=False, trained 2026-04-12
- `yolo26n-seg-multiscale-2026-03-16` — yolo26n, segmentation, multi-scale training

### metadata.json fields

| Field | Required | Description |
|---|---|---|
| `model_id` | yes | Same as directory name |
| `family` | yes | Base architecture (e.g. `yolo26n`) |
| `task` | yes | `segmentation`, `detection`, etc. |
| `training_date` | yes | ISO date of training run |
| `export_formats` | yes | List of available formats |
| `metrics` | yes | mAP, precision, recall, etc. |
| `dataset_version` | no | Dataset used for training |
| `training_config` | no | Hyperparameters and settings |
| `git_commit` | no | Commit hash at training time |
| `framework_version` | no | Ultralytics/YOLO version |
| `notes` | no | Free-form notes |
| `status` | no | `active`, `deprecated`, or `archived` |

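For illustration, a registry entry could be created with a small helper like the sketch below. The concrete field values are hypothetical and only mirror the table above; they are not taken from a real run.

```python
import json
from pathlib import Path

# Hypothetical example entry following the fields listed above.
metadata = {
    "model_id": "yolo26n-seg-multiscale-2026-03-16",
    "family": "yolo26n",
    "task": "segmentation",
    "training_date": "2026-03-16",
    "export_formats": ["pt", "onnx", "engine"],
    "metrics": {"mAP50": 0.0, "precision": 0.0, "recall": 0.0},  # fill in real values
    "status": "active",
}

registry_dir = Path("models/registry") / metadata["model_id"]
registry_dir.mkdir(parents=True, exist_ok=True)
with open(registry_dir / "metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```
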
### Workflow

1. Train a model.
2. Create a new registry folder using the naming convention.
3. Save the main checkpoint as `weights.pt`.
4. Export derived formats into `exports/`.
5. Write `metadata.json`.
6. If production-ready, update `models/active/production.json` to point to this version.
7. Move outdated versions to `models/archived/` when no longer needed.

### Rules

- Do not overwrite an existing registry version — add a new one instead.
- Treat each registry folder as immutable once finalized.
- `models/active/production.json` is the source of truth for model selection.

### AI-friendly summary

- `models/registry/` = all versioned trained models; each subfolder = one release
- `metadata.json` = structured description of the model
- `models/active/` = current production pointer
- `models/archived/` = retired models
- Do not guess the correct model version; read the registry metadata instead

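Since `models/active/production.json` is the pointer, consumer code can resolve the current weights roughly as in the minimal sketch below. The exact layout of `production.json` is an assumption here (only that it names a registry `model_id`).

```python
import json
from pathlib import Path

def resolve_active_weights(repo_root: str = ".") -> Path:
    """Resolve the active model's weights via the production pointer.
    Assumption: production.json contains at least {"model_id": "<registry folder name>"}."""
    root = Path(repo_root)
    pointer = json.loads((root / "models/active/production.json").read_text())
    model_dir = root / "models/registry" / pointer["model_id"]
    metadata = json.loads((model_dir / "metadata.json").read_text())
    print(f"Active model: {metadata['model_id']} ({metadata['task']})")
    return model_dir / "weights.pt"
```
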
## Getting Started

### 1. Clone and enter project

```bash
git clone <your-gitea-url>/AareLC.git
cd AareLC
```

### 2. Create environment

Python 3.11+ is recommended.

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

### 3. Optional environment variable

If DB access requires a shared password:

```bash
export AAREDB_SHARED_PASSWORD="<your_password>"
```

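For reference, code that needs the shared password can pick it up from the environment as in this illustrative snippet; how AareLC consumes the variable internally is not shown here.

```python
import os

# Illustrative only: read the shared DB password if the variable from the export above is set.
shared_password = os.environ.get("AAREDB_SHARED_PASSWORD")
if shared_password is None:
    print("AAREDB_SHARED_PASSWORD not set; DB features requiring it may be unavailable.")
```
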
### 4. Start GUI

```bash
python3 ml_gui.py
```

## GUI Main Functions

### File / Review

- Load local image
- Fetch next image from DB
- Open review folder and navigate (`N`/`P` or arrow keys)
- Save edited review annotations

### Annotation

- Predict current image
- Edit detections/polygons with select/brush/SAM tools
- Save for training (stores local sets + DB annotation)

### Sync

- Retrieve missing local images/labels from DB
- Generate YOLO segmentation preparation script (headless workflow)

## Prepare Dataset For Training (Recommended Workflow)

Use this when training happens on a machine without a GUI.

### Step A: Generate prep script from GUI machine

In the GUI:

1. `Sync` -> `Generate YOLO Segmentation Prep Script...`
2. Select the source folder containing images/labels
3. Choose the destination dataset folder
4. Confirm split ratio (default `0.85`) and random seed (default `42`)
5. Choose the input model:
   - a local model file from `models/` (`.pt`/`.yaml`)
   - or an Ultralytics model name (`yolo11n-seg.pt`, etc.)
6. Save the generated script (commit to Gitea if needed)

### Step B: Run prep script on training machine

```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py
```

Optional preview:

```bash
python3 prepare_yolo_seg_dataset_<timestamp>.py --dry-run
```

The script creates:

- `<destination>/train/images`
- `<destination>/train/labels`
- `<destination>/val/images`
- `<destination>/val/labels`
- `<destination>/dataset.yaml`
- `<destination>/train_yolo_seg.py`
- `<destination>/logs/dataset_preparation_*.json`
- `<destination>/logs/dataset_preparation_*.txt`

> **Note:** In the future, training dataset folders will use links (symlinks or hardlinks) pointing to the raw images in `data/datasets/` instead of copying files. This avoids duplicating large image data on disk.

The log includes:

- preparation date/time
- source/destination paths
- train/val counts
- class names and class counts
- full file mapping list (source -> destination)
- missing labels and orphan labels
- selected model spec

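The generated script's details depend on the options chosen in Step A. The following is a minimal, self-contained sketch of the seeded split-and-log idea it implements, not the generated code itself; the folder layout, image extension, and log keys are illustrative assumptions.

```python
import json
import random
import shutil
from datetime import datetime
from pathlib import Path

# Illustrative sketch of the prep logic; not the actual generated script.
source = Path("source_folder")   # assumed: images and YOLO .txt labels side by side
dest = Path("dataset_out")
ratio, seed = 0.85, 42

pairs = sorted(p for p in source.glob("*.png") if (source / f"{p.stem}.txt").exists())
random.Random(seed).shuffle(pairs)
n_train = int(len(pairs) * ratio)

mapping = {}
for split, items in (("train", pairs[:n_train]), ("val", pairs[n_train:])):
    for sub in ("images", "labels"):
        (dest / split / sub).mkdir(parents=True, exist_ok=True)
    for img in items:
        lbl = source / f"{img.stem}.txt"
        shutil.copy2(img, dest / split / "images" / img.name)
        shutil.copy2(lbl, dest / split / "labels" / lbl.name)
        mapping[str(img)] = f"{split}/images/{img.name}"

log = {"prepared_at": datetime.now().isoformat(), "source": str(source),
       "destination": str(dest), "train": n_train, "val": len(pairs) - n_train,
       "seed": seed, "mapping": mapping}
(dest / "logs").mkdir(parents=True, exist_ok=True)
(dest / "logs" / "dataset_preparation_example.json").write_text(json.dumps(log, indent=2))
```
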
## Train Model

Rule of thumb for data preparation (see the sketch after this list):

- When doing the splitting, always create links in the training folder that refer to the images in `data/datasets/` (do not duplicate the files)
- When done training, save the split into `data/splits/` by saving only the names and paths

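A minimal sketch of that rule, assuming images sit directly under `data/datasets/` and that split files are plain text lists of paths; both the layout and the split-file names are assumptions, not a fixed AareLC format.

```python
import random
from pathlib import Path

datasets = Path("data/datasets")
splits_dir = Path("data/splits")
run_dir = Path("data/training_runs/example_run")   # hypothetical destination
ratio, seed = 0.85, 42

images = sorted(datasets.glob("*.png"))             # assumed flat layout and extension
random.Random(seed).shuffle(images)
n_train = int(len(images) * ratio)
split = {"train": images[:n_train], "val": images[n_train:]}

for name, items in split.items():
    link_dir = run_dir / name / "images"
    link_dir.mkdir(parents=True, exist_ok=True)
    for img in items:
        link = link_dir / img.name
        if not link.exists():
            link.symlink_to(img.resolve())          # link instead of copying
    # persist the split as names/paths only
    splits_dir.mkdir(parents=True, exist_ok=True)
    (splits_dir / f"example_run_{name}.txt").write_text(
        "\n".join(str(p) for p in items) + "\n"
    )
```
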
After dataset prep:

```bash
python3 <destination>/train_yolo_seg.py
```

Notes:

- if `--model` points to a local file, it will be used directly
- if `--model` is an Ultralytics name, it will be downloaded at runtime (internet required)

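A sketch of how that `--model` handling can be implemented with the Ultralytics API; the helper name and the surrounding argument handling are illustrative only.

```python
from pathlib import Path
from ultralytics import YOLO

def load_model(model_spec: str) -> YOLO:
    """Use a local .pt/.yaml file if it exists; otherwise let Ultralytics
    resolve the name (e.g. 'yolo11n-seg.pt'), which downloads it at runtime."""
    spec = Path(model_spec)
    if spec.exists() and spec.suffix in {".pt", ".yaml"}:
        return YOLO(str(spec))
    return YOLO(model_spec)

# e.g. load_model("models/base/yolo11n-seg.pt") or load_model("yolo11n-seg.pt")
```
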
## Reproducibility Notes

- split reproducibility is controlled by `--seed`
- same input files + same seed => same train/val assignment
- changing the seed only changes which files land in train vs. val, not the files themselves

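A quick standalone illustration of that seed behaviour (not AareLC code):

```python
import random

files = sorted(["img_%03d.png" % i for i in range(10)])  # stand-in for the input file list

def assign(files, seed=42, ratio=0.85):
    shuffled = files[:]
    random.Random(seed).shuffle(shuffled)
    n = int(len(shuffled) * ratio)
    return set(shuffled[:n]), set(shuffled[n:])

# Same files + same seed -> identical train/val assignment.
assert assign(files, seed=42) == assign(files, seed=42)
# A different seed changes only which files go where, not the set of files.
train_a, val_a = assign(files, seed=42)
train_b, val_b = assign(files, seed=7)
assert train_a | val_a == train_b | val_b == set(files)
```
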
## Testing

Unit tests live in `tests/` and cover pure-logic modules that have no GUI or hardware dependency.

| Test file | Module under test |
|---|---|
| `tests/test_target_point.py` | `src/server/target_point.py` — target-point priority logic |
| `tests/test_generate_splits.py` | `src/tools/generate_splits.py` — split-file generation |
| `tests/test_image_utils.py` | `src/server/image_utils.py` — image decode helpers |

### Run tests locally

```bash
# install test dependencies (once)
pip install pytest pytest-cov

# run all unit tests
pytest

# with coverage report
pytest --cov=src --cov-report=term-missing
```

### CI/CD

A Gitea Actions workflow at `.gitea/workflows/ci.yml` runs the test suite automatically on every push and pull request to `master`/`main`.

## Troubleshooting

- `dataset.yaml not found`: run prep script first, or pass `--data /path/to/dataset.yaml`
- no pairs found: check that image names match label names (`.txt`)
- missing classes in logs: verify label files contain valid class ids in first column
- model download fails: use a local `.pt` model from `models/`

@@ -0,0 +1,106 @@
task: detect
mode: train
model: yolov8n.yaml
data: /home/jungfrau/alc/yolo/dataset.yaml
epochs: 800
time: null
patience: 300
batch: 16
imgsz: 640
save: true
save_period: -1
cache: false
device: null
workers: 8
project: null
name: train8
exist_ok: false
pretrained: true
optimizer: auto
verbose: true
seed: 0
deterministic: true
single_cls: false
rect: false
cos_lr: false
close_mosaic: 10
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
save_hybrid: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: false
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: false
dynamic: false
simplify: true
opset: null
workspace: null
nms: false
lr0: 0.01
lrf: 0.01
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 7.5
cls: 0.5
dfl: 1.5
pose: 12.0
kobj: 1.0
nbs: 64
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 0.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.0
bgr: 0.0
mosaic: 1.0
mixup: 0.0
copy_paste: 0.0
copy_paste_mode: flip
auto_augment: randaugment
erasing: 0.4
crop_fraction: 1.0
cfg: null
tracker: botsort.yaml
save_dir: /home/jungfrau/alc/yolo/runs/detect/train8

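The block above appears to be an Ultralytics run-arguments dump (the args file Ultralytics writes into a run directory such as `runs/detect/train8`). Under that reading, an approximately equivalent training launch in Python is sketched below; only the values that differ from Ultralytics defaults are passed, and the paths are the ones recorded above, so they may need adapting to the local machine.

```python
from ultralytics import YOLO

# Model built from the yolov8n.yaml architecture file named in the args above.
model = YOLO("yolov8n.yaml")
model.train(
    data="/home/jungfrau/alc/yolo/dataset.yaml",
    epochs=800,
    patience=300,
    batch=16,
    imgsz=640,
    name="train8",
)
```
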
@@ -0,0 +1,14 @@
# dataset.yaml
path: /Users/duan_j/Applications/alc/tests/yolo  # Dataset root directory
train: train/images  # Training images relative to 'path'
val: val/images  # Validation images relative to 'path'
test: test/images  # Test images relative to 'path' (optional)

# Class names
names:
  0: loop_all
  1: pin
  2: crystal
  3: loop_face
  4: ice
  5: needle

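A quick way to sanity-check a dataset.yaml like this before training is to load it and inspect the class map; this is an illustrative snippet, not AareLC code, and uses PyYAML (installed alongside Ultralytics).

```python
import yaml

# Inspect a dataset.yaml before launching training.
with open("dataset.yaml") as f:
    cfg = yaml.safe_load(f)

print("root:", cfg["path"])
print("classes:", cfg["names"])          # {0: 'loop_all', 1: 'pin', ...}
print("num classes:", len(cfg["names"]))
```
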
@@ -0,0 +1,52 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26-seg instance segmentation model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/segment

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n-seg.yaml' will call yolo26-seg.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 309 layers, 3,126,280 parameters, 3,126,280 gradients, 10.5 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 309 layers, 11,505,800 parameters, 11,505,800 gradients, 37.4 GFLOPs
  m: [0.50, 1.00, 512] # summary: 329 layers, 27,112,072 parameters, 27,112,072 gradients, 132.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 441 layers, 31,515,528 parameters, 31,515,528 gradients, 150.9 GFLOPs
  x: [1.00, 1.50, 512] # summary: 441 layers, 70,693,800 parameters, 70,693,800 gradients, 337.7 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Segment26, [nc, 32, 256]] # Segment26(P3, P4, P5)

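As the `scales:` comment in this file states, the variant is selected by the name passed to Ultralytics; a short hedged usage sketch follows, assuming an Ultralytics version recent enough to ship this yolo26-seg configuration.

```python
from ultralytics import YOLO

# Requesting 'yolo26n-seg.yaml' resolves to this yolo26-seg.yaml with scale 'n'
# (depth 0.50, width 0.25), per the scales table above.
model = YOLO("yolo26n-seg.yaml")
model.info()  # prints the layer/parameter summary for the selected scale
```
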
@@ -0,0 +1 @@
datasets/** filter=lfs diff=lfs merge=lfs -text