Refactor/cleanup

alexiondev
2026-05-08 17:18:58 -04:00
parent 799aa9fa3d
commit 1b904e04ea
18 changed files with 214 additions and 357 deletions

@@ -11,10 +11,10 @@ All commands must be run from the project root using the local venv:
.venv/bin/python identify.py <image_path>
# Train the model
-.venv/bin/python -m src.models.train --epochs 50 --batch_size 64 --lr 1e-4
+.venv/bin/python -m src.models.train --epochs 50
# Evaluate a trained model on val and aug_test sets
-.venv/bin/python -m src.models.evaluate [--backbone resnet18|resnet34] [--model_path <path>]
+.venv/bin/python -m src.models.evaluate [--backbone resnet34] [--model_path <path>]
# Run inference only (no registry lookup)
.venv/bin/python src/models/inference.py <image_path>
@@ -26,7 +26,7 @@ All commands must be run from the project root using the local venv:
.venv/bin/python -m src.data.generate_aug_test_set
# Generate a single sample image for visual inspection
-.venv/bin/python src/data/high_fidelity_generator.py
+.venv/bin/python src/data/renderer.py
# Lint
.venv/bin/ruff check src/
@@ -50,7 +50,7 @@ Image → Detector → Inference → Resolver → Registry
### 1. Detection (`src/utils/detector.py`)
-`SpindaDetector.detect_and_crop()` returns a **128×128 BGR** image, or `None`.
+`SpindaDetector.detect_and_crop()` returns a **128×128 BGR** numpy array, or `None`.
Two-tier strategy, tried in order:
- **Tier 1 (screenshots/sprites):** HSV-filter red pixels → find individual spot blobs → cluster to 4 spots → derive crop from cluster centroid + `_SPOT_CROP_RATIO=5.5` (= 128 / 24.5 px span) with a `_SPOT_CENTER_OFFSET=0.056` downward shift so the spot centroid lands at 44.4 % from the top of the crop (matching the training canvas).
@@ -58,15 +58,18 @@ Two-tier strategy, tried in order:
### 2. Model (`src/models/regression_model.py`)
-ResNet-18 backbone with the final FC replaced by `Linear(512, 8·16)`. Forward pass returns **(B, 8, 16)** — treating each of the 8 coordinates as a 16-class classification problem. Trained with `CrossEntropyLoss` on `view(-1, 16)` vs `view(-1)` targets; predictions use `argmax(dim=2)`.
+Configurable backbone (default: ResNet-34) with the final FC replaced by `Linear(feat_dim, 8·16)`. Forward pass returns **(B, 8, 16)** — treating each of the 8 coordinates as a 16-class classification problem. Trained with `CrossEntropyLoss` on `view(-1, 16)` vs `view(-1)` targets; predictions use `argmax(dim=2)`.
+Supported backbones: `resnet18` (512-d), `resnet34` (512-d), `convnext_tiny` (768-d).
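
To make the **(B, 8, 16)** output shape concrete, here is a minimal, dependency-free sketch of the decoding step for a single sample. It assumes the 128 flat outputs are laid out row-major (coordinate `i` owns logits `16*i` to `16*i + 15`), which is what a plain reshape to `(8, 16)` would give. The helper name `decode_logits` is hypothetical and plain lists stand in for tensors.

```python
NUM_COORDS = 8    # the 8 predicted coordinates
NUM_CLASSES = 16  # each coordinate is treated as a 16-class problem


def decode_logits(flat_logits):
    """Split 8*16 flat logits into 8 rows of 16 and return each row's argmax.

    Hypothetical helper: mirrors reshaping to (8, 16) followed by an argmax
    over the class axis, assuming a row-major layout.
    """
    if len(flat_logits) != NUM_COORDS * NUM_CLASSES:
        raise ValueError("expected 128 logits")
    coords = []
    for i in range(NUM_COORDS):
        row = flat_logits[i * NUM_CLASSES:(i + 1) * NUM_CLASSES]
        # Index of the largest logit in this row is the predicted class.
        coords.append(max(range(NUM_CLASSES), key=row.__getitem__))
    return coords
```

Each returned integer is a class index in `0..15`, one per coordinate, which is the same shape as the `target` list stored alongside each image.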
### 3. Training (`src/models/train.py`)
- `SpindaDataset` (200 k virtual samples/epoch): generates a fresh random 32-bit PID per `__getitem__`, renders the sprite with a random background colour, then applies the full augmentation pipeline.
-- `SpindaEvalDataset`: loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both `data/val/` (clean, seed=42) and `data/aug_test/` (augmented, seed=99).
+- `SpindaEvalDataset` (in `src/data/dataset.py`): loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both `data/val/` (clean, seed=42) and `data/aug_test/` (augmented, seed=99).
- `_worker_init_fn` re-seeds Python `random` and NumPy per worker so forked workers generate distinct PIDs.
- Weighted loss: BL_x ×1.5, BL_y ×2.5 — applied during training only; val loss is unweighted.
- Early stopping: patience = 10 epochs on clean-val exact-match rate.
-- Best model checkpoint: `models/best_spinda_model.pth`.
+- Checkpoints saved to `models/best_{backbone}_model.pth`.
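
The early-stopping and checkpoint-naming rules in the list above can be sketched as follows. This is an illustrative outline, not the code in `src/models/train.py`: the `EarlyStopper` class and `checkpoint_path` helper are hypothetical names, and the metric passed in is assumed to be the clean-val exact-match rate.

```python
class EarlyStopper:
    """Track the best exact-match rate; stop after `patience` epochs without a new best."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, exact_match):
        """Record one epoch. Returns True when this epoch is a new best."""
        if exact_match > self.best:
            self.best = exact_match
            self.stale_epochs = 0
            return True  # caller would save a checkpoint here
        self.stale_epochs += 1
        return False

    @property
    def should_stop(self):
        return self.stale_epochs >= self.patience


def checkpoint_path(backbone):
    # Follows the documented pattern models/best_{backbone}_model.pth.
    return f"models/best_{backbone}_model.pth"
```

In a training loop, one would call `update()` once per epoch, write the checkpoint whenever it returns `True`, and break out of the loop once `should_stop` is set.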
### 4. PID Encoding (domain invariant — must not be changed)
@@ -107,7 +110,8 @@ assets/ # Sprite assets used by the renderer
Spot_{TL,TR,BL,BR}.png
models/
-best_spinda_model.pth
+best_resnet34_model.pth # current best (default)
+best_convnext_tiny_model.pth # convnext experiment
```
`metadata.json` format: `[{"img_path": "...", "pid_hex": "...", "target": [int×8]}, ...]`
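
As a quick sanity check on that format, a loader along these lines could validate each entry. It is a sketch under stated assumptions: `load_metadata` is a hypothetical helper, and `pid_hex` is assumed to be a hexadecimal string that `int(..., 16)` can parse.

```python
import json


def load_metadata(text):
    """Parse metadata.json text and check each entry against the documented shape."""
    entries = json.loads(text)
    for entry in entries:
        target = entry["target"]
        # The format specifies exactly 8 integers per target.
        if len(target) != 8 or not all(isinstance(v, int) for v in target):
            raise ValueError(f"bad target for {entry['img_path']}")
        # Assumption: pid_hex parses as hexadecimal; int() raises ValueError otherwise.
        int(entry["pid_hex"], 16)
    return entries
```

Passing the file contents (for example `open(path).read()`) returns the parsed list, or raises `ValueError` on the first malformed entry.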
@@ -115,8 +119,8 @@ models/
## Key invariants
- **Visual collisions:** ~1.3 % of fingerprints are shared by multiple PIDs (many-to-one mapping). `SpindaRegistry` stores `(fingerprint, pid_hex)` pairs with a unique constraint so `lookup_by_fingerprint` can return *all* matching PIDs — this is intentional, not a bug.
- The **validation set** uses white backgrounds (no augmentation baked in) to give a stable epoch-comparable baseline. Do not add augmentation to `generate_val_set.py`.
- The **augmented test set** is pre-generated and fixed. Regenerating it changes the baseline; do so intentionally.
- The crop output size is always **128×128** regardless of tier. The model transform chain also resizes to 128×128, so the inference path is robust to re-size.
-- `generate_high_fidelity_spinda()` always takes `bg_color` as a `(R, G, B)` tuple in PIL order (not BGR).
+- `generate_high_fidelity_spinda()` in `src/data/renderer.py` always takes `bg_color` as a `(R, G, B)` tuple in PIL order (not BGR).
- `SpindaInference.predict()` accepts either a file path or a BGR numpy array directly (e.g. from the detector).
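
The visual-collision invariant above (one fingerprint, several PIDs) can be illustrated with a small in-memory table. This is a hedged sketch only: the storage backend of `SpindaRegistry` is not stated here, so SQLite, the `TinyRegistry` class, the table name, and the placeholder fingerprint strings are all assumptions made for illustration.

```python
import sqlite3


class TinyRegistry:
    """Hypothetical stand-in for SpindaRegistry; the SQLite backend is an assumption."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Uniqueness is on the PAIR, so one fingerprint may map to many PIDs.
        self.db.execute(
            "CREATE TABLE spinda ("
            "fingerprint TEXT NOT NULL, pid_hex TEXT NOT NULL, "
            "UNIQUE (fingerprint, pid_hex))"
        )

    def add(self, fingerprint, pid_hex):
        # INSERT OR IGNORE makes re-adding an existing pair a no-op.
        self.db.execute(
            "INSERT OR IGNORE INTO spinda VALUES (?, ?)", (fingerprint, pid_hex)
        )

    def lookup_by_fingerprint(self, fingerprint):
        """Return every PID stored for this fingerprint (possibly more than one)."""
        rows = self.db.execute(
            "SELECT pid_hex FROM spinda WHERE fingerprint = ? ORDER BY pid_hex",
            (fingerprint,),
        )
        return [row[0] for row in rows]
```

Putting the unique constraint on the pair rather than on the fingerprint alone is what lets a lookup return a list, so callers should treat a multi-element result as expected behaviour.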