CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Commands
All commands must be run from the project root using the local venv:
# End-to-end identification (the main entry point)
.venv/bin/python identify.py <image_path>
# Train the model
.venv/bin/python -m src.models.train --epochs 50
# Evaluate a trained model on val and aug_test sets
.venv/bin/python -m src.models.evaluate [--backbone resnet34] [--model_path <path>]
# Run inference only (no registry lookup)
.venv/bin/python src/models/inference.py <image_path>
# Generate/regenerate the fixed validation set (seed=42, 1000 samples, white bg)
.venv/bin/python -m src.data.generate_val_set
# Generate/regenerate the fixed augmented test set (seed=99, 500 samples)
.venv/bin/python -m src.data.generate_aug_test_set
# Generate a single sample image for visual inspection
.venv/bin/python src/data/renderer.py
# Lint
.venv/bin/ruff check src/
# Type check
.venv/bin/mypy src/
The package is installed in editable mode (pip install -e .); imports use src.* paths.
Important: DataLoader workers use multiprocessing, so training must be invoked as a module (python -m src.models.train), not as a script piped via stdin — Python cannot resolve the worker main_path in that case.
Architecture
The pipeline has five stages:
Image → Detector (cropped) → Inference (logits) → Resolver (PIDs) → Registry (SQLite)
1. Detection (src/utils/detector.py)
SpindaDetector.detect_and_crop() returns a 128×128 BGR numpy array, or None.
Two-tier strategy, tried in order:
- Tier 1 (screenshots/sprites): HSV-filter red pixels → find individual spot blobs → cluster to 4 spots → derive crop from cluster centroid + _SPOT_CROP_RATIO=5.5 (= 128 / 24.5 px span) with a _SPOT_CENTER_OFFSET=0.056 downward shift so the spot centroid lands at 44.4 % from the top of the crop (matching the training canvas).
- Tier 2 (real photos, spots merged): Find the full Spinda body blob; score = circularity + 0.2·log(area/min_area) − 0.1·|aspect − 1.12|; crop the face (top 43/58 of body height) using blob width as scale reference.
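The geometry of both tiers can be sketched in a few lines. This is an illustration only, not the code in src/utils/detector.py: the function names are hypothetical, the crop side is assumed to be the crop ratio times the measured spot span, circularity is assumed to be the usual 4πA/P² measure, and aspect is assumed to be width over height.

```python
import math

SPOT_CROP_RATIO = 5.5       # from the text: crop side relative to the spot span
SPOT_CENTER_OFFSET = 0.056  # from the text: downward shift as a fraction of the crop side


def tier1_crop_box(cx, cy, spot_span):
    """Return (left, top, side) of a square crop around the spot-cluster centroid.

    Assumption: side = SPOT_CROP_RATIO * spot_span.
    """
    side = SPOT_CROP_RATIO * spot_span
    crop_center_y = cy + SPOT_CENTER_OFFSET * side  # move the crop window down
    return cx - side / 2, crop_center_y - side / 2, side


def tier2_blob_score(area, perimeter, width, height, min_area):
    """Score a candidate body blob with the formula quoted above."""
    circularity = 4 * math.pi * area / perimeter ** 2  # assumption: 1.0 for a perfect circle
    aspect = width / height                            # assumption: width over height
    return circularity + 0.2 * math.log(area / min_area) - 0.1 * abs(aspect - 1.12)


left, top, side = tier1_crop_box(cx=100.0, cy=100.0, spot_span=20.0)
centroid_fraction = (100.0 - top) / side  # vertical position of the centroid inside the crop
```

Shifting the window down by 0.056 of its side places the centroid at 0.5 − 0.056 = 0.444 of the crop height, which is the 44.4 % figure quoted for the training canvas.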
2. Model (src/models/regression_model.py)
Configurable backbone (default: ResNet-34) with the final FC replaced by Linear(feat_dim, 8·16). Forward pass returns (B, 8, 16) — treating each of the 8 coordinates as a 16-class classification problem. Trained with CrossEntropyLoss on view(-1, 16) vs view(-1) targets; predictions use argmax(dim=2).
Supported backbones: resnet18 (512-d), resnet34 (512-d), convnext_tiny (768-d).
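The head and loss wiring described above can be sketched with plain PyTorch. The class name NibbleClassifier and the toy backbone are placeholders for illustration; the real model swaps the final FC of a ResNet or ConvNeXt backbone, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

NUM_COORDS = 8    # TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y
NUM_CLASSES = 16  # one class per hex nibble value


class NibbleClassifier(nn.Module):
    """Hypothetical wrapper: any feature extractor followed by an 8x16 head."""

    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, NUM_COORDS * NUM_CLASSES)

    def forward(self, x):
        features = self.backbone(x)   # (B, feat_dim)
        logits = self.head(features)  # (B, 128)
        return logits.view(-1, NUM_COORDS, NUM_CLASSES)


# Toy stand-in for a 512-d backbone, only so the sketch runs quickly.
toy_backbone = nn.Sequential(
    nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(3 * 4 * 4, 512), nn.ReLU()
)
model = NibbleClassifier(toy_backbone, feat_dim=512)

images = torch.randn(4, 3, 128, 128)
targets = torch.randint(0, NUM_CLASSES, (4, NUM_COORDS))

logits = model(images)                                   # (4, 8, 16)
loss = nn.CrossEntropyLoss()(logits.view(-1, NUM_CLASSES), targets.view(-1))
predictions = logits.argmax(dim=2)                       # (4, 8), one nibble per coordinate
```

Flattening to (B·8, 16) lets a single CrossEntropyLoss call treat every coordinate of every sample as an independent 16-way classification.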
3. Training (src/models/train.py)
- SpindaDataset (200 k virtual samples/epoch): generates a fresh random 32-bit PID per __getitem__, renders the sprite with a random background colour, then applies the full augmentation pipeline.
- SpindaEvalDataset (in src/data/dataset.py): loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both data/val/ (clean, seed=42) and data/aug_test/ (augmented, seed=99).
- _worker_init_fn re-seeds Python random and NumPy per worker so forked workers generate distinct PIDs.
- Weighted loss: BL_x ×1.5, BL_y ×2.5, applied during training only; val loss is unweighted.
- Early stopping: patience = 10 epochs on clean-val exact-match rate.
- Checkpoints saved to models/best_{backbone}_model.pth.
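One way the per-coordinate weighting and the exact-match metric could be written is sketched below. The function names and the normalisation by the weight sum are assumptions; the text gives only the weights (BL_x ×1.5, BL_y ×2.5, coordinate indices 4 and 5) and the fact that validation loss is unweighted.

```python
import torch
import torch.nn.functional as F

# Index order: TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y
TRAIN_WEIGHTS = torch.tensor([1.0, 1.0, 1.0, 1.0, 1.5, 2.5, 1.0, 1.0])


def weighted_nibble_loss(logits, targets, weights=TRAIN_WEIGHTS):
    """Cross-entropy per coordinate, then a weighted mean over coordinates.

    logits: (B, 8, 16), targets: (B, 8) with values in 0..15.
    Assumption: the result is normalised by the sum of the weights.
    """
    per_coord = F.cross_entropy(
        logits.reshape(-1, 16), targets.reshape(-1), reduction="none"
    ).view(-1, 8)
    return (per_coord * weights).sum() / (weights.sum() * per_coord.shape[0])


def exact_match_rate(logits, targets):
    """Fraction of samples whose 8 predicted nibbles are all correct."""
    return (logits.argmax(dim=2) == targets).all(dim=1).float().mean().item()
```

Passing torch.ones(8) as the weights reduces this to the plain mean cross-entropy, which matches the unweighted validation loss described above.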
4. PID Encoding (domain invariant — must not be changed)
The 8 model outputs map directly to hex nibbles of the 32-bit PID via the ProfessorRex convention:
| Coord index | Nibble | Spot | Notes |
|---|---|---|---|
| 0 (TL_x) | pid[-1] | TL | no pixel offset |
| 1 (TL_y) | pid[-2] | TL | |
| 2 (TR_x) | pid[-3] | TR | +24 px |
| 3 (TR_y) | pid[-4] | TR | +1 px |
| 4 (BL_x) | pid[3] | BL | +6 px |
| 5 (BL_y) | pid[2] | BL | +18 px |
| 6 (BR_x) | pid[1] | BR | +18 px |
| 7 (BR_y) | pid[0] | BR | +19 px |
SpindaResolver.coordinates_to_pid() reconstructs each byte as (Y << 4) | X; BDSP reverses the byte order.
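The mapping in the table can be checked with a short standalone sketch. This is not the SpindaResolver implementation: the function signatures, the uppercase hex output, and the reading of "BDSP reverses the byte order" as reversing the four spot bytes are all assumptions.

```python
def coordinates_to_pid(coords, bdsp=False):
    """Pack 8 nibbles (TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y) into a PID hex string."""
    if len(coords) != 8 or any(not 0 <= c <= 15 for c in coords):
        raise ValueError("expected 8 nibbles in the range 0..15")
    # One byte per spot, (Y << 4) | X, ordered TL, TR, BL, BR (least significant first).
    spot_bytes = [(coords[i + 1] << 4) | coords[i] for i in range(0, 8, 2)]
    if bdsp:
        spot_bytes.reverse()  # assumption: BDSP stores the same bytes in reverse order
    pid = sum(byte << (8 * k) for k, byte in enumerate(spot_bytes))
    return f"{pid:08X}"


def pid_to_coordinates(pid_hex):
    """Inverse for the non-BDSP layout: coordinate i is the nibble at pid_hex[7 - i]."""
    return [int(ch, 16) for ch in reversed(pid_hex)]


print(coordinates_to_pid([1, 2, 3, 4, 5, 6, 7, 8]))             # 87654321
print(coordinates_to_pid([1, 2, 3, 4, 5, 6, 7, 8], bdsp=True))  # 21436587
```

In the first output, the last character is TL_x (pid[-1]) and the first is BR_y (pid[0]), in agreement with the table.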
5. Registry (src/registry/database.py)
SQLite at data/spinda_registry.db. Schema: (fingerprint TEXT, pid_hex TEXT, UNIQUE) with an index on fingerprint. SpindaRegistry.add_entry() is idempotent (ignores IntegrityError).
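A minimal sketch of such a registry using the standard sqlite3 module follows. The class name RegistrySketch, the table name entries, and the index name are hypothetical, and the UNIQUE constraint is assumed to cover the (fingerprint, pid_hex) pair so that one fingerprint can map to several PIDs.

```python
import sqlite3


class RegistrySketch:
    """Illustrative stand-in for a fingerprint -> PID registry."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "fingerprint TEXT NOT NULL, "
            "pid_hex TEXT NOT NULL, "
            "UNIQUE (fingerprint, pid_hex))"
        )
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_fingerprint ON entries (fingerprint)"
        )
        self.conn.commit()

    def add_entry(self, fingerprint, pid_hex):
        """Insert a pair; inserting the same pair twice is a no-op."""
        try:
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute(
                    "INSERT INTO entries (fingerprint, pid_hex) VALUES (?, ?)",
                    (fingerprint, pid_hex),
                )
        except sqlite3.IntegrityError:
            pass  # duplicate pair: already registered

    def lookup_by_fingerprint(self, fingerprint):
        """Return every PID stored for this fingerprint (possibly several)."""
        rows = self.conn.execute(
            "SELECT pid_hex FROM entries WHERE fingerprint = ? ORDER BY pid_hex",
            (fingerprint,),
        )
        return [row[0] for row in rows]


registry = RegistrySketch()
registry.add_entry("fp-a", "87654321")
registry.add_entry("fp-a", "87654321")  # ignored, same pair
registry.add_entry("fp-a", "0000ABCD")  # visual collision: same fingerprint, other PID
```

Swallowing IntegrityError keeps add_entry idempotent, so re-running a registry build over the same PIDs does not fail or duplicate rows.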
Data layout
data/
  val/                          # 1000 fixed clean sprites, white bg (seed=42); stable benchmark
    metadata.json
    sample_NNNN.png
  aug_test/                     # 500 fixed augmented images (seed=99); domain-adaptation tracker
    metadata.json
    sample_NNNN.png
  spinda_registry.db
assets/                         # Sprite assets used by the renderer
  Spinda_Base_Top.png           # 52×43 face layer
  Spinda_Head.png               # colourisation source for spots
  Spot_{TL,TR,BL,BR}.png
models/
  best_resnet34_model.pth       # current best (default)
  best_convnext_tiny_model.pth  # convnext experiment
metadata.json format: [{"img_path": "...", "pid_hex": "...", "target": [int×8]}, ...]
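A small validator for this format might look like the sketch below. The function name is hypothetical, and the 0..15 range check is inferred from the 16-class nibble encoding rather than stated for the file itself.

```python
import json

REQUIRED_KEYS = {"img_path", "pid_hex", "target"}


def parse_metadata(text):
    """Parse metadata.json content and check each entry's basic shape."""
    entries = json.loads(text)
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
        target = entry["target"]
        if len(target) != 8 or not all(isinstance(v, int) and 0 <= v <= 15 for v in target):
            raise ValueError(f"entry {index} needs 8 integer targets in 0..15")
    return entries


example = '[{"img_path": "sample_0000.png", "pid_hex": "87654321", "target": [1, 2, 3, 4, 5, 6, 7, 8]}]'
entries = parse_metadata(example)
```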
Key invariants
- Visual collisions: ~1.3 % of fingerprints are shared by multiple PIDs (many-to-one mapping). SpindaRegistry stores (fingerprint, pid_hex) pairs with a unique constraint on the pair, so lookup_by_fingerprint can return all matching PIDs; this is intentional, not a bug.
- The validation set uses white backgrounds (no augmentation baked in) to give a stable epoch-comparable baseline. Do not add augmentation to generate_val_set.py.
- The augmented test set is pre-generated and fixed. Regenerating it changes the baseline; do so intentionally.
- The crop output size is always 128×128 regardless of tier. The model transform chain also resizes to 128×128, so the inference path is robust to inputs of a different size.
- generate_high_fidelity_spinda() in src/data/renderer.py always takes bg_color as a (R, G, B) tuple in PIL order (not BGR).
- SpindaInference.predict() accepts either a file path or a BGR numpy array directly (e.g. from the detector).