Refactor/cleanup

2026-05-08 17:18:58 -04:00
parent 799aa9fa3d
commit 1b904e04ea
18 changed files with 214 additions and 357 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -11,10 +11,10 @@ All commands must be run from the project root using the local venv:
 .venv/bin/python identify.py <image_path>

 # Train the model
-.venv/bin/python -m src.models.train --epochs 50 --batch_size 64 --lr 1e-4
+.venv/bin/python -m src.models.train --epochs 50

 # Evaluate a trained model on val and aug_test sets
-.venv/bin/python -m src.models.evaluate [--backbone resnet18|resnet34] [--model_path <path>]
+.venv/bin/python -m src.models.evaluate [--backbone resnet34] [--model_path <path>]

 # Run inference only (no registry lookup)
 .venv/bin/python src/models/inference.py <image_path>
@@ -26,7 +26,7 @@ All commands must be run from the project root using the local venv:
 .venv/bin/python -m src.data.generate_aug_test_set

 # Generate a single sample image for visual inspection
-.venv/bin/python src/data/high_fidelity_generator.py
+.venv/bin/python src/data/renderer.py

 # Lint
 .venv/bin/ruff check src/
@@ -50,7 +50,7 @@ Image → Detector → Inference → Resolver → Registry

 ### 1. Detection (`src/utils/detector.py`)

-`SpindaDetector.detect_and_crop()` returns a **128×128 BGR** image, or `None`.
+`SpindaDetector.detect_and_crop()` returns a **128×128 BGR** numpy array, or `None`.

 Two-tier strategy, tried in order:
 - **Tier 1 (screenshots/sprites):** HSV-filter red pixels → find individual spot blobs → cluster to 4 spots → derive crop from cluster centroid + `_SPOT_CROP_RATIO=5.5` (= 128 / 24.5 px span) with a `_SPOT_CENTER_OFFSET=0.056` downward shift so the spot centroid lands at 44.4 % from the top of the crop (matching the training canvas).
@@ -58,15 +58,18 @@ Two-tier strategy, tried in order:

 ### 2. Model (`src/models/regression_model.py`)

-ResNet-18 backbone with the final FC replaced by `Linear(512, 8·16)`. Forward pass returns **(B, 8, 16)** — treating each of the 8 coordinates as a 16-class classification problem. Trained with `CrossEntropyLoss` on `view(-1, 16)` vs `view(-1)` targets; predictions use `argmax(dim=2)`.
+Configurable backbone (default: ResNet-34) with the final FC replaced by `Linear(feat_dim, 8·16)`. Forward pass returns **(B, 8, 16)** — treating each of the 8 coordinates as a 16-class classification problem. Trained with `CrossEntropyLoss` on `view(-1, 16)` vs `view(-1)` targets; predictions use `argmax(dim=2)`.
+
+Supported backbones: `resnet18` (512-d), `resnet34` (512-d), `convnext_tiny` (768-d).

 ### 3. Training (`src/models/train.py`)

 - `SpindaDataset` (200 k virtual samples/epoch): generates a fresh random 32-bit PID per `__getitem__`, renders the sprite with a random background colour, then applies the full augmentation pipeline.
- `SpindaEvalDataset`: loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both `data/val/` (clean, seed=42) and `data/aug_test/` (augmented, seed=99).
+- `SpindaEvalDataset` (in `src/data/dataset.py`): loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both `data/val/` (clean, seed=42) and `data/aug_test/` (augmented, seed=99).
 - `_worker_init_fn` re-seeds Python `random` and NumPy per worker so forked workers generate distinct PIDs.
+- Weighted loss: BL_x ×1.5, BL_y ×2.5 — applied during training only; val loss is unweighted.
 - Early stopping: patience = 10 epochs on clean-val exact-match rate.
- Best model checkpoint: `models/best_spinda_model.pth`.
+- Checkpoints saved to `models/best_{backbone}_model.pth`.

 ### 4. PID Encoding (domain invariant — must not be changed)

@@ -107,7 +110,8 @@ assets/            # Sprite assets used by the renderer
  Spot_{TL,TR,BL,BR}.png

 models/
-  best_spinda_model.pth
+  best_resnet34_model.pth      # current best (default)
+  best_convnext_tiny_model.pth # convnext experiment
 ```

 `metadata.json` format: `[{"img_path": "...", "pid_hex": "...", "target": [int×8]}, ...]`
@@ -115,8 +119,8 @@ models/
 ## Key invariants

 - **Visual collisions:** ~1.3 % of fingerprints are shared by multiple PIDs (many-to-one mapping). `SpindaRegistry` stores `(fingerprint, pid_hex)` pairs with a unique constraint so `lookup_by_fingerprint` can return *all* matching PIDs — this is intentional, not a bug.
-
 - The **validation set** uses white backgrounds (no augmentation baked in) to give a stable epoch-comparable baseline. Do not add augmentation to `generate_val_set.py`.
 - The **augmented test set** is pre-generated and fixed. Regenerating it changes the baseline; do so intentionally.
 - The crop output size is always **128×128** regardless of tier. The model transform chain also resizes to 128×128, so the inference path is robust to re-size.
- `generate_high_fidelity_spinda()` always takes `bg_color` as a `(R, G, B)` tuple in PIL order (not BGR).
+- `generate_high_fidelity_spinda()` in `src/data/renderer.py` always takes `bg_color` as a `(R, G, B)` tuple in PIL order (not BGR).
+- `SpindaInference.predict()` accepts either a file path or a BGR numpy array directly (e.g. from the detector).
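
The Tier 1 crop geometry described in the Detection section of the diff above (crop size derived from the spot cluster, with a downward shift so the centroid lands at 44.4 % from the top) can be sketched as follows. This is a minimal illustration, not the code in `src/utils/detector.py`: the function names, the `CropBox` type, and the reading of "span" as the pixel extent of the 4-spot cluster are assumptions; only the two constant values come from the text.

```python
from typing import NamedTuple

# Values quoted in CLAUDE.md; the names here mirror the doc but are re-declared locally.
SPOT_CROP_RATIO = 5.5       # crop side length divided by the spot-cluster span
SPOT_CENTER_OFFSET = 0.056  # downward shift of the crop centre, as a fraction of the crop side


class CropBox(NamedTuple):
    """Hypothetical square crop: top-left corner plus side length, in pixels."""
    left: float
    top: float
    side: float


def crop_from_spots(cx: float, cy: float, span: float) -> CropBox:
    """Derive a square crop from the spot-cluster centroid (cx, cy) and its span.

    Assumption: `span` is the pixel extent of the 4-spot cluster.
    """
    side = span * SPOT_CROP_RATIO
    # Shift the crop centre downward so the centroid sits above the middle of the crop.
    center_y = cy + SPOT_CENTER_OFFSET * side
    return CropBox(left=cx - side / 2, top=center_y - side / 2, side=side)


def centroid_fraction_from_top(box: CropBox, cy: float) -> float:
    """Vertical position of the centroid inside the crop, as a fraction of its height."""
    return (cy - box.top) / box.side


box = crop_from_spots(cx=100.0, cy=80.0, span=20.0)
fraction = centroid_fraction_from_top(box, cy=80.0)  # 0.5 - 0.056, i.e. about 0.444
```

Because the shift is expressed as a fraction of the crop side, the centroid's relative position is independent of the sprite's scale in the source image; the resulting square would then be resized to 128×128.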