vibe-spinda/CLAUDE.md
2026-05-08 09:22:50 -04:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Commands
All commands must be run from the project root using the local venv:
```bash
# End-to-end identification (the main entry point)
.venv/bin/python identify.py <image_path>
# Train the model
.venv/bin/python -m src.models.train --epochs 50 --batch_size 64 --lr 1e-4
# Run inference only (no registry lookup)
.venv/bin/python src/models/inference.py <image_path>
# Generate/regenerate the fixed validation set (seed=42, 1000 samples, white bg)
.venv/bin/python -m src.data.generate_val_set
# Generate/regenerate the fixed augmented test set (seed=99, 500 samples)
.venv/bin/python -m src.data.generate_aug_test_set
# Generate a single sample image for visual inspection
.venv/bin/python src/data/high_fidelity_generator.py
# Lint
.venv/bin/ruff check src/
# Type check
.venv/bin/mypy src/
```
The package is installed in editable mode (`pip install -e .`); imports use `src.*` paths.
**Important:** DataLoader workers use `multiprocessing`, so training must be invoked as a module (`python -m src.models.train`), not as a script piped via stdin — Python cannot resolve the worker `main_path` in that case.
## Architecture
The pipeline has five stages:
```
Image → Detector  → Inference → Resolver → Registry
        (cropped)   (logits)    (PIDs)     (SQLite)
```
### 1. Detection (`src/utils/detector.py`)
`SpindaDetector.detect_and_crop()` returns a **128×128 BGR** image, or `None`.
Two-tier strategy, tried in order:
- **Tier 1 (screenshots/sprites):** HSV-filter red pixels → find individual spot blobs → cluster to 4 spots → derive crop from cluster centroid + `_SPOT_CROP_RATIO=5.5` (= 128 / 24.5 px span) with a `_SPOT_CENTER_OFFSET=0.056` downward shift so the spot centroid lands at 44.4 % from the top of the crop (matching the training canvas).
- **Tier 2 (real photos, spots merged):** Find the full Spinda body blob; score = `circularity + 0.2·log(area/min_area) - 0.1·|aspect - 1.12|`; crop the face (top 43/58 of body height) using blob width as scale reference.
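The geometry behind both tiers can be sketched in a few lines. This is an illustration only, not the detector's code: the function names `tier1_crop_box` and `tier2_score` are hypothetical, and it assumes that the crop is square, that "span" means the spot cluster's extent in source pixels, that circularity is `4π·area / perimeter²`, that the log is natural, and that the aspect-ratio term is subtracted as a penalty.

```python
import math

SPOT_CROP_RATIO = 5.5       # crop side = ratio * spot-cluster span (value from this file)
SPOT_CENTER_OFFSET = 0.056  # downward shift of the crop, as a fraction of its side


def tier1_crop_box(cx, cy, span):
    """Return (left, top, side) of a square crop around the spot centroid.

    Assumption: `span` is the spot cluster's extent in source pixels.
    """
    side = SPOT_CROP_RATIO * span
    center_y = cy + SPOT_CENTER_OFFSET * side  # shift the crop window down
    return cx - side / 2, center_y - side / 2, side


def tier2_score(area, perimeter, aspect, min_area):
    """Score a candidate body blob; higher is better.

    Assumptions: circularity is 4*pi*area / perimeter**2, the log is natural,
    and the aspect-ratio term is a penalty.
    """
    circularity = 4 * math.pi * area / perimeter ** 2
    return circularity + 0.2 * math.log(area / min_area) - 0.1 * abs(aspect - 1.12)


left, top, side = tier1_crop_box(cx=100.0, cy=100.0, span=20.0)
# Where the spot centroid lands inside the crop, measured from the top edge.
centroid_fraction = (100.0 - top) / side
```

Shifting the crop down by 0.056 of its side puts the centroid at 0.5 - 0.056 = 0.444 of the crop height, which is the 44.4 % figure quoted for Tier 1.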
### 2. Model (`src/models/regression_model.py`)
ResNet-18 backbone with the final FC replaced by `Linear(512, 8·16)`. Forward pass returns **(B, 8, 16)** — treating each of the 8 coordinates as a 16-class classification problem. Trained with `CrossEntropyLoss` on `view(-1, 16)` vs `view(-1)` targets; predictions use `argmax(dim=2)`.
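To make the head's shape handling concrete, here is a dependency-free sketch of what happens to one sample's 128 outputs. It is not the model code: it assumes the 128 values are laid out coordinate-major (index = `coord * 16 + class`, as a plain reshape to `(8, 16)` would give), and the helper names are hypothetical.

```python
import math

NUM_COORDS, NUM_CLASSES = 8, 16


def reshape_logits(flat):
    """Split one sample's 128 FC outputs into 8 rows of 16 class logits."""
    if len(flat) != NUM_COORDS * NUM_CLASSES:
        raise ValueError("expected 128 logits")
    return [flat[i * NUM_CLASSES:(i + 1) * NUM_CLASSES] for i in range(NUM_COORDS)]


def decode(flat):
    """Argmax over the class axis: one nibble value (0-15) per coordinate."""
    return [max(range(NUM_CLASSES), key=row.__getitem__) for row in reshape_logits(flat)]


def cross_entropy(flat, target):
    """Mean cross-entropy over the 8 coordinates, each a 16-way softmax."""
    total = 0.0
    for row, t in zip(reshape_logits(flat), target):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum - row[t]
    return total / NUM_COORDS


target = [3, 0, 15, 7, 1, 2, 9, 4]
flat = [0.0] * (NUM_COORDS * NUM_CLASSES)
for coord, cls in enumerate(target):
    flat[coord * NUM_CLASSES + cls] = 10.0  # make the target class dominant
predicted = decode(flat)
```

With all-zero logits every class is equally likely, so the loss per coordinate is `log(16)`; that is a handy sanity check for the first training step.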
### 3. Training (`src/models/train.py`)
- `SpindaDataset` (200 k virtual samples/epoch): generates a fresh random 32-bit PID per `__getitem__`, renders the sprite with a random background colour, then applies the full augmentation pipeline.
- `SpindaEvalDataset`: loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both `data/val/` (clean, seed=42) and `data/aug_test/` (augmented, seed=99).
- `_worker_init_fn` re-seeds Python `random` and NumPy per worker so forked workers generate distinct PIDs.
- Early stopping: patience = 10 epochs on clean-val exact-match rate.
- Best model checkpoint: `models/best_spinda_model.pth`.
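The early-stopping rule above can be sketched without any framework. This is a hypothetical illustration, not the code in `train.py`: the names `exact_match_rate` and `EarlyStopper` are invented here, and it assumes that only a strict improvement resets the patience counter and that a sample counts as an exact match when all 8 coordinates are correct.

```python
def exact_match_rate(preds, targets):
    """Fraction of samples whose 8 predicted coordinates all match the target."""
    hits = sum(1 for p, t in zip(preds, targets) if list(p) == list(t))
    return hits / len(targets)


class EarlyStopper:
    """Stop after `patience` consecutive epochs without a new best metric."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch; return (is_new_best, should_stop)."""
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
            return True, False  # the caller would save the checkpoint here
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=10)
history = [0.50, 0.60] + [0.55] * 10  # two improving epochs, then ten stale ones
flags = [stopper.step(m) for m in history]
```

In this run the second epoch is the best one, and the stop signal fires on the tenth stale epoch after it.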
### 4. PID Encoding (domain invariant — must not be changed)
The 8 model outputs map directly to hex nibbles of the 32-bit PID via the **ProfessorRex** convention:
| Coord index | Nibble | Spot | Notes |
|-------------|--------|------|-------|
| 0 (TL_x) | `pid[-1]` | TL | no pixel offset |
| 1 (TL_y) | `pid[-2]` | TL | |
| 2 (TR_x) | `pid[-3]` | TR | +24 px |
| 3 (TR_y) | `pid[-4]` | TR | +1 px |
| 4 (BL_x) | `pid[3]` | BL | +6 px |
| 5 (BL_y) | `pid[2]` | BL | +18 px |
| 6 (BR_x) | `pid[1]` | BR | +18 px |
| 7 (BR_y) | `pid[0]` | BR | +19 px |
`SpindaResolver.coordinates_to_pid()` reconstructs each byte as `(Y << 4) | X`; BDSP reverses the byte order.
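The table and the `(Y << 4) | X` rule can be checked with a short round-trip sketch. The function names below are hypothetical and are not the `SpindaResolver` API; the sketch assumes `pid_hex` is an 8-character uppercase hex string and does not model the BDSP byte-order reversal.

```python
def coords_to_pid_hex(coords):
    """Pack 8 nibble coordinates [TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y]."""
    if len(coords) != 8 or any(not 0 <= c <= 15 for c in coords):
        raise ValueError("expected 8 values in 0..15")
    pid = 0
    for spot in range(4):  # TL, TR, BL, BR map to bytes 0..3, least significant first
        x, y = coords[2 * spot], coords[2 * spot + 1]
        pid |= ((y << 4) | x) << (8 * spot)
    return f"{pid:08X}"


def pid_hex_to_coords(pid_hex):
    """Inverse of coords_to_pid_hex: unpack the 8 nibbles in model-output order."""
    pid = int(pid_hex, 16)
    coords = []
    for spot in range(4):
        byte = (pid >> (8 * spot)) & 0xFF
        coords += [byte & 0xF, byte >> 4]  # low nibble is X, high nibble is Y
    return coords


pid_hex = coords_to_pid_hex([1, 2, 3, 4, 5, 6, 7, 8])
```

For the coordinates `[1, 2, 3, 4, 5, 6, 7, 8]` the string is `87654321`, so `pid_hex[-1]` is TL_x and `pid_hex[0]` is BR_y, matching the table row by row.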
### 5. Registry (`src/registry/database.py`)
SQLite at `data/spinda_registry.db`. Schema: `(fingerprint TEXT, pid_hex TEXT, UNIQUE(fingerprint, pid_hex))` with an index on `fingerprint`. `SpindaRegistry.add_entry()` is idempotent (ignores `IntegrityError`).
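A minimal sketch of this storage pattern with the standard `sqlite3` module is shown below. It is not the `SpindaRegistry` implementation: the table name `registry`, the index name, the `NOT NULL` constraints, and the free functions are assumptions made for illustration, and the fingerprint values are placeholder strings.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS registry (
    fingerprint TEXT NOT NULL,
    pid_hex     TEXT NOT NULL,
    UNIQUE (fingerprint, pid_hex)
);
CREATE INDEX IF NOT EXISTS idx_fingerprint ON registry (fingerprint);
"""


def open_registry(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, fingerprint, pid_hex):
    """Insert a pair; return False if it already exists (idempotent)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO registry (fingerprint, pid_hex) VALUES (?, ?)",
                (fingerprint, pid_hex),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def lookup_by_fingerprint(conn, fingerprint):
    """Return every PID stored for a fingerprint (visual collisions included)."""
    rows = conn.execute(
        "SELECT pid_hex FROM registry WHERE fingerprint = ? ORDER BY pid_hex",
        (fingerprint,),
    )
    return [row[0] for row in rows]


conn = open_registry()
first = add_entry(conn, "fp-a", "0000ABCD")
repeat = add_entry(conn, "fp-a", "0000ABCD")   # same pair again: ignored
collision = add_entry(conn, "fp-a", "1000ABCD")  # same fingerprint, different PID
matches = lookup_by_fingerprint(conn, "fp-a")
```

Because uniqueness is on the pair rather than on `fingerprint` alone, a repeated insert is a no-op while a second PID for the same fingerprint is stored alongside the first.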
## Data layout
```
data/
  val/                     # 1000 fixed clean sprites, white bg (seed=42), stable benchmark
    metadata.json
    sample_NNNN.png
  aug_test/                # 500 fixed augmented images (seed=99), domain-adaptation tracker
    metadata.json
    sample_NNNN.png
  spinda_registry.db
assets/                    # Sprite assets used by the renderer
  Spinda_Base_Top.png      # 52×43 face layer
  Spinda_Head.png          # colourisation source for spots
  Spot_{TL,TR,BL,BR}.png
models/
  best_spinda_model.pth
`metadata.json` format: `[{"img_path": "...", "pid_hex": "...", "target": [int×8]}, ...]`
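A quick structural check of a metadata file might look like the sketch below. The function `validate_metadata` is hypothetical, the sample entry is invented for illustration, and the sketch assumes that `pid_hex` parses as a 32-bit hexadecimal value and that each `target` element is a nibble in 0..15.

```python
import json

REQUIRED_KEYS = {"img_path", "pid_hex", "target"}


def validate_metadata(entries):
    """Raise ValueError on the first malformed entry; return the entry count."""
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {i}: missing keys {sorted(missing)}")
        if not 0 <= int(entry["pid_hex"], 16) <= 0xFFFFFFFF:
            raise ValueError(f"entry {i}: pid_hex is not a 32-bit value")
        target = entry["target"]
        if len(target) != 8 or any(not 0 <= v <= 15 for v in target):
            raise ValueError(f"entry {i}: target must be 8 ints in 0..15")
    return len(entries)


# Invented sample entry; real files would be read with json.load(open(path)).
sample = json.loads(
    '[{"img_path": "sample_0000.png", "pid_hex": "87654321",'
    ' "target": [1, 2, 3, 4, 5, 6, 7, 8]}]'
)
count = validate_metadata(sample)
```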
## Key invariants
- **Visual collisions:** ~1.3 % of fingerprints are shared by multiple PIDs (many-to-one mapping). `SpindaRegistry` stores `(fingerprint, pid_hex)` pairs with a unique constraint so `lookup_by_fingerprint` can return *all* matching PIDs — this is intentional, not a bug.
- The **validation set** uses white backgrounds (no augmentation baked in) to give a stable epoch-comparable baseline. Do not add augmentation to `generate_val_set.py`.
- The **augmented test set** is pre-generated and fixed. Regenerating it changes the baseline; do so intentionally.
- The crop output size is always **128×128** regardless of tier. The model transform chain also resizes to 128×128, so the inference path still works if it is handed an image of a different size.
- `generate_high_fidelity_spinda()` always takes `bg_color` as a `(R, G, B)` tuple in PIL order (not BGR).