# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

All commands must be run from the project root using the local venv:

```bash
# End-to-end identification (the main entry point)
.venv/bin/python identify.py <image_path>

# Train the model
.venv/bin/python -m src.models.train --epochs 50 --batch_size 64 --lr 1e-4

# Run inference only (no registry lookup)
.venv/bin/python src/models/inference.py <image_path>

# Generate/regenerate the fixed validation set (seed=42, 1000 samples, white bg)
.venv/bin/python -m src.data.generate_val_set

# Generate/regenerate the fixed augmented test set (seed=99, 500 samples)
.venv/bin/python -m src.data.generate_aug_test_set

# Generate a single sample image for visual inspection
.venv/bin/python src/data/high_fidelity_generator.py

# Lint
.venv/bin/ruff check src/

# Type check
.venv/bin/mypy src/
```

The package is installed in editable mode (`pip install -e .`); imports use `src.*` paths.

**Important:** DataLoader workers use `multiprocessing`, so training must be invoked as a module (`python -m src.models.train`), not as a script piped via stdin, because Python cannot resolve the worker `main_path` in that case.

## Architecture

The pipeline has five stages:

```
Image → Detector → Inference → Resolver → Registry
        (cropped)  (logits)    (PIDs)     (SQLite)
```

### 1. Detection (`src/utils/detector.py`)

`SpindaDetector.detect_and_crop()` returns a **128×128 BGR** image, or `None`.

Two-tier strategy, tried in order:

- **Tier 1 (screenshots/sprites):** HSV-filter red pixels → find individual spot blobs → cluster to 4 spots → derive crop from cluster centroid + `_SPOT_CROP_RATIO=5.5` (= 128 / 24.5 px span) with a `_SPOT_CENTER_OFFSET=0.056` downward shift so the spot centroid lands at 44.4 % from the top of the crop (matching the training canvas).
- **Tier 2 (real photos, spots merged):** Find the full Spinda body blob; score = `circularity + 0.2·log(area/min_area) − 0.1·|aspect − 1.12|`; crop the face (top 43/58 of body height) using blob width as scale reference.
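
The two numeric rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `detector.py`: the function names are hypothetical, circularity is assumed to be the isoperimetric quotient `4πA / P²`, `log` is assumed to be the natural logarithm, `aspect` is assumed to be width divided by height, and the centre offset is assumed to be a fraction of the crop side.

```python
import math

SPOT_CENTER_OFFSET = 0.056  # value quoted above; treated here as a fraction of the crop side


def circularity(area: float, perimeter: float) -> float:
    """Isoperimetric quotient 4*pi*A / P**2 (assumed definition; 1.0 for a circle)."""
    return 4.0 * math.pi * area / (perimeter ** 2)


def tier2_score(area: float, perimeter: float, width: float, height: float,
                min_area: float) -> float:
    """Blob score: circularity + 0.2*log(area/min_area) - 0.1*|aspect - 1.12|."""
    aspect = width / height  # assumption: aspect means width / height
    return (circularity(area, perimeter)
            + 0.2 * math.log(area / min_area)  # assumption: natural log
            - 0.1 * abs(aspect - 1.12))


def crop_top_left(cx: float, cy: float, side: float) -> tuple[float, float]:
    """Top-left corner of a square crop whose centre sits below the spot centroid."""
    center_y = cy + SPOT_CENTER_OFFSET * side  # shift the crop centre downward
    return cx - side / 2.0, center_y - side / 2.0
```

With this interpretation the centroid sits at `0.5 − 0.056 = 0.444` of the crop height, which agrees with the 44.4 % figure quoted for Tier 1.
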

### 2. Model (`src/models/regression_model.py`)

ResNet-18 backbone with the final FC replaced by `Linear(512, 8·16)`. Forward pass returns **(B, 8, 16)**, treating each of the 8 coordinates as a 16-class classification problem. Trained with `CrossEntropyLoss` on `view(-1, 16)` vs `view(-1)` targets; predictions use `argmax(dim=2)`.
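
To make the tensor bookkeeping concrete, here is a framework-free sketch for a single sample. It is not the repository's PyTorch code: the helper names are hypothetical, and it assumes the 128 outputs are laid out row-major, so that coordinate `c` owns logits `c*16 … c*16+15`.

```python
import math

NUM_COORDS, NUM_CLASSES = 8, 16


def reshape_logits(flat: list[float]) -> list[list[float]]:
    """Split 8*16 flat logits into 8 rows of 16, one row per coordinate."""
    if len(flat) != NUM_COORDS * NUM_CLASSES:
        raise ValueError("expected 128 logits")
    return [flat[c * NUM_CLASSES:(c + 1) * NUM_CLASSES] for c in range(NUM_COORDS)]


def predict_coords(flat: list[float]) -> list[int]:
    """Per-coordinate argmax, the single-sample analogue of argmax(dim=2)."""
    return [max(range(NUM_CLASSES), key=row.__getitem__) for row in reshape_logits(flat)]


def cross_entropy(flat: list[float], target: list[int]) -> float:
    """Mean over the 8 coordinates of -log softmax(row)[target]."""
    total = 0.0
    for row, t in zip(reshape_logits(flat), target):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[t]
    return total / NUM_COORDS
```

With all-zero logits the loss equals `log(16)`, the value expected from a model that guesses uniformly over the 16 classes.
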

### 3. Training (`src/models/train.py`)

- `SpindaDataset` (200 k virtual samples/epoch): generates a fresh random 32-bit PID per `__getitem__`, renders the sprite with a random background colour, then applies the full augmentation pipeline.
- `SpindaEvalDataset`: loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both `data/val/` (clean, seed=42) and `data/aug_test/` (augmented, seed=99).
- `_worker_init_fn` re-seeds Python `random` and NumPy per worker so forked workers generate distinct PIDs.
- Early stopping: patience = 10 epochs on clean-val exact-match rate.
- Best model checkpoint: `models/best_spinda_model.pth`.
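
The early-stopping rule can be sketched as follows. This is a minimal illustration, not the logic in `train.py`: `exact_match_rate` and `EarlyStopper` are hypothetical names, and "exact match" is assumed to mean that all 8 predicted coordinates equal the target.

```python
from typing import Sequence


def exact_match_rate(preds: Sequence[Sequence[int]],
                     targets: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose 8 predicted coordinates all match the target."""
    if not preds:
        return 0.0
    hits = sum(1 for p, t in zip(preds, targets) if list(p) == list(t))
    return hits / len(preds)


class EarlyStopper:
    """Tracks the best metric and counts epochs without a strict improvement."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, metric: float) -> bool:
        """Record one epoch; return True when the metric improved."""
        if metric > self.best:
            self.best = metric
            self.stale_epochs = 0
            return True  # the caller would save the best checkpoint here
        self.stale_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.stale_epochs >= self.patience
```

A training loop would call `update()` once per epoch with the clean-val exact-match rate and break out when `should_stop` becomes true.
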

### 4. PID Encoding (domain invariant; must not be changed)

The 8 model outputs map directly to hex nibbles of the 32-bit PID via the **ProfessorRex** convention:

| Coord index | Nibble    | Spot | Notes           |
|-------------|-----------|------|-----------------|
| 0 (TL_x)    | `pid[-1]` | TL   | no pixel offset |
| 1 (TL_y)    | `pid[-2]` | TL   |                 |
| 2 (TR_x)    | `pid[-3]` | TR   | +24 px          |
| 3 (TR_y)    | `pid[-4]` | TR   | +1 px           |
| 4 (BL_x)    | `pid[3]`  | BL   | +6 px           |
| 5 (BL_y)    | `pid[2]`  | BL   | +18 px          |
| 6 (BR_x)    | `pid[1]`  | BR   | +18 px          |
| 7 (BR_y)    | `pid[0]`  | BR   | +19 px          |

`SpindaResolver.coordinates_to_pid()` reconstructs each byte as `(Y << 4) | X`; BDSP reverses the byte order.
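
The table amounts to placing coordinate `i` in the `i`-th nibble counted from the least-significant end. The sketch below illustrates that packing; `pack_pid`, `unpack_pid`, and `reverse_byte_order` are hypothetical helpers, not the `SpindaResolver` implementation. How and where the repository applies the BDSP reversal is not stated here, so the last helper only shows what a 4-byte swap looks like.

```python
def pack_pid(coords: list[int]) -> int:
    """Pack 8 coordinates (each 0..15) into a 32-bit PID, coords[0] lowest."""
    if len(coords) != 8 or any(not 0 <= c <= 15 for c in coords):
        raise ValueError("expected 8 integers in the range 0..15")
    pid = 0
    for i, c in enumerate(coords):
        pid |= c << (4 * i)  # each (x, y) pair forms one byte: (Y << 4) | X
    return pid


def unpack_pid(pid: int) -> list[int]:
    """Inverse of pack_pid: read the 8 nibbles from least to most significant."""
    return [(pid >> (4 * i)) & 0xF for i in range(8)]


def reverse_byte_order(pid: int) -> int:
    """Swap the 4 bytes of a 32-bit value (assumed meaning of the BDSP reversal)."""
    return int.from_bytes(pid.to_bytes(4, "big"), "little")


pid = pack_pid([1, 2, 3, 4, 5, 6, 7, 8])
pid_hex = format(pid, "08X")  # "87654321": pid_hex[-1] is TL_x, pid_hex[0] is BR_y
```

Reading `pid_hex` against the table: `pid_hex[-1] == "1"` is coordinate 0 (TL_x) and `pid_hex[3] == "5"` is coordinate 4 (BL_x).
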

### 5. Registry (`src/registry/database.py`)

SQLite at `data/spinda_registry.db`. Schema: `(fingerprint TEXT, pid_hex TEXT, UNIQUE)` with an index on `fingerprint`. `SpindaRegistry.add_entry()` is idempotent (ignores `IntegrityError`).
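
A minimal sketch of such a registry, using only the standard-library `sqlite3` module, might look like this. The class name `RegistrySketch`, the table name, the index name, and the `NOT NULL` constraints are assumptions for illustration; only the column names, the uniqueness of the pair, and the idempotent insert come from the description above.

```python
import sqlite3


class RegistrySketch:
    """Stores (fingerprint, pid_hex) pairs; duplicate pairs are silently ignored."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS registry ("
            " fingerprint TEXT NOT NULL,"
            " pid_hex TEXT NOT NULL,"
            " UNIQUE (fingerprint, pid_hex))"
        )
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_fingerprint ON registry (fingerprint)"
        )
        self.conn.commit()

    def add_entry(self, fingerprint: str, pid_hex: str) -> bool:
        """Insert a pair; return False (without raising) if it already exists."""
        try:
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute(
                    "INSERT INTO registry (fingerprint, pid_hex) VALUES (?, ?)",
                    (fingerprint, pid_hex),
                )
            return True
        except sqlite3.IntegrityError:
            return False

    def lookup_by_fingerprint(self, fingerprint: str) -> list[str]:
        """Return every PID stored for this fingerprint (possibly several)."""
        rows = self.conn.execute(
            "SELECT pid_hex FROM registry WHERE fingerprint = ? ORDER BY pid_hex",
            (fingerprint,),
        )
        return [row[0] for row in rows]
```

Because uniqueness is on the pair rather than on `fingerprint` alone, two different PIDs can share one fingerprint while re-adding the same pair remains a no-op.
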

## Data layout

```
data/
  val/                    # 1000 fixed clean sprites, white bg (seed=42); stable benchmark
    metadata.json
    sample_NNNN.png
  aug_test/               # 500 fixed augmented images (seed=99); domain-adaptation tracker
    metadata.json
    sample_NNNN.png
  spinda_registry.db

assets/                   # Sprite assets used by the renderer
  Spinda_Base_Top.png     # 52×43 face layer
  Spinda_Head.png         # colourisation source for spots
  Spot_{TL,TR,BL,BR}.png

models/
  best_spinda_model.pth
```

`metadata.json` format: `[{"img_path": "...", "pid_hex": "...", "target": [int×8]}, ...]`
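
A small validator for this format could look like the sketch below. `parse_metadata` and `load_metadata` are hypothetical helpers, not part of the repository; the `0..15` range check is an assumption that follows from each target being one of 16 classes.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("img_path", "pid_hex", "target")


def parse_metadata(text: str) -> list[dict]:
    """Parse metadata JSON and check every entry has 8 integer targets in 0..15."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("metadata must be a JSON list")
    for i, entry in enumerate(entries):
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing keys: {missing}")
        target = entry["target"]
        if len(target) != 8:
            raise ValueError(f"entry {i} has {len(target)} targets, expected 8")
        if any(not isinstance(v, int) or not 0 <= v <= 15 for v in target):
            raise ValueError(f"entry {i} has a target outside 0..15")
    return entries


def load_metadata(path: str) -> list[dict]:
    """Read and validate a metadata.json file from disk."""
    return parse_metadata(Path(path).read_text(encoding="utf-8"))
```

Running this against `data/val/metadata.json` after regenerating a set is one way to catch a truncated or malformed file before training starts.
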

## Key invariants

- **Visual collisions:** ~1.3 % of fingerprints are shared by multiple PIDs (many-to-one mapping). `SpindaRegistry` stores `(fingerprint, pid_hex)` pairs with a unique constraint so `lookup_by_fingerprint` can return *all* matching PIDs; this is intentional, not a bug.
- The **validation set** uses white backgrounds (no augmentation baked in) to give a stable epoch-comparable baseline. Do not add augmentation to `generate_val_set.py`.
- The **augmented test set** is pre-generated and fixed. Regenerating it changes the baseline; do so intentionally.
- The crop output size is always **128×128** regardless of tier. The model transform chain also resizes to 128×128, so the inference path is robust to re-size.
- `generate_high_fidelity_spinda()` always takes `bg_color` as a `(R, G, B)` tuple in PIL order (not BGR).