CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

All commands must be run from the project root using the local venv:

# End-to-end identification (the main entry point)
.venv/bin/python identify.py <image_path>

# Train the model
.venv/bin/python -m src.models.train --epochs 50

# Evaluate a trained model on val and aug_test sets
.venv/bin/python -m src.models.evaluate [--backbone resnet34] [--model_path <path>]

# Run inference only (no registry lookup)
.venv/bin/python src/models/inference.py <image_path>

# Generate/regenerate the fixed validation set (seed=42, 1000 samples, white bg)
.venv/bin/python -m src.data.generate_val_set

# Generate/regenerate the fixed augmented test set (seed=99, 500 samples)
.venv/bin/python -m src.data.generate_aug_test_set

# Generate a single sample image for visual inspection
.venv/bin/python src/data/renderer.py

# Lint
.venv/bin/ruff check src/

# Type check
.venv/bin/mypy src/

The package is installed in editable mode (pip install -e .); imports use src.* paths.

Important: DataLoader workers use multiprocessing, so training must be invoked as a module (python -m src.models.train), not as a script piped via stdin — Python cannot resolve the worker main_path in that case.

Architecture

The pipeline has five stages:

Image → Detector → Inference → Resolver → Registry
                  (cropped)   (logits)   (PIDs)     (SQLite)

1. Detection (`src/utils/detector.py`)

SpindaDetector.detect_and_crop() returns a 128×128 BGR numpy array, or None.

Two-tier strategy, tried in order:

Tier 1 (screenshots/sprites): HSV-filter red pixels → find individual spot blobs → cluster to 4 spots → derive crop from cluster centroid + _SPOT_CROP_RATIO=5.5 (= 128 / 24.5 px span) with a _SPOT_CENTER_OFFSET=0.056 downward shift so the spot centroid lands at 44.4 % from the top of the crop (matching the training canvas).
Tier 2 (real photos, spots merged): Find the full Spinda body blob; score = circularity + 0.2·log(area/min_area) − 0.1·|aspect − 1.12|; crop the face (top 43/58 of body height) using blob width as scale reference.
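
The arithmetic behind both tiers can be sketched in a few lines. This is an illustration rather than the code in detector.py: the function names are hypothetical, and several details are assumptions. The crop side is taken as the ratio times a caller-supplied spot span, circularity is taken as 4π·area/perimeter², the log is natural, and aspect is width/height.

```python
import math

# Constants quoted in the tier descriptions above.
SPOT_CROP_RATIO = 5.5
SPOT_CENTER_OFFSET = 0.056


def spot_crop_box(cx, cy, spot_span):
    """Return (x0, y0, side) of a square crop around a spot-cluster centroid.

    Assumption: side = SPOT_CROP_RATIO * spot_span, and the crop centre is
    moved down by SPOT_CENTER_OFFSET * side.
    """
    side = SPOT_CROP_RATIO * spot_span
    centre_y = cy + SPOT_CENTER_OFFSET * side  # shift the crop downward
    return cx - side / 2, centre_y - side / 2, side


def body_blob_score(area, perimeter, width, height, min_area):
    """Tier 2 score: circularity + 0.2*log(area/min_area) - 0.1*|aspect - 1.12|.

    Assumptions: circularity = 4*pi*area / perimeter**2, natural log,
    and aspect = width / height.
    """
    circularity = 4.0 * math.pi * area / perimeter ** 2
    aspect = width / height
    return circularity + 0.2 * math.log(area / min_area) - 0.1 * abs(aspect - 1.12)


x0, y0, side = spot_crop_box(cx=100.0, cy=80.0, spot_span=20.0)
# Vertical position of the centroid inside the crop, as a fraction of its height.
centroid_fraction = (80.0 - y0) / side
```

Under these assumptions the centroid sits at 0.5 − 0.056 = 0.444 of the crop height, whatever the span, which is where the 44.4 % figure comes from.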

2. Model (`src/models/regression_model.py`)

Configurable backbone (default: ResNet-34) with the final FC layer replaced by Linear(feat_dim, 8·16). The forward pass returns a (B, 8, 16) tensor, treating each of the 8 coordinates as a 16-class classification problem. Training uses CrossEntropyLoss on logits flattened with view(-1, 16) against targets flattened with view(-1); predictions use argmax(dim=2).

Supported backbones: resnet18 (512-d), resnet34 (512-d), convnext_tiny (768-d).
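
To make the output layout concrete, here is a dependency-free sketch of the decoding step for one sample. It mirrors what reshaping to (8, 16) and taking argmax over the last axis would do. decode_logits is a hypothetical name, and the row-major grouping (16 consecutive logits per coordinate) is an assumption based on how view behaves.

```python
NUM_COORDS = 8    # TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y
NUM_CLASSES = 16  # one class per hex nibble value


def decode_logits(flat_logits):
    """Reshape 128 logits to 8 rows of 16 and take the argmax of each row."""
    if len(flat_logits) != NUM_COORDS * NUM_CLASSES:
        raise ValueError("expected 128 logits")
    coords = []
    for i in range(NUM_COORDS):
        row = flat_logits[i * NUM_CLASSES:(i + 1) * NUM_CLASSES]
        coords.append(max(range(NUM_CLASSES), key=row.__getitem__))
    return coords


# Build logits whose largest entry in row i is at class (2 * i + 1).
example = [0.0] * (NUM_COORDS * NUM_CLASSES)
for i in range(NUM_COORDS):
    example[i * NUM_CLASSES + (2 * i + 1)] = 5.0
decoded = decode_logits(example)
```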

3. Training (`src/models/train.py`)

SpindaDataset (200 k virtual samples/epoch): generates a fresh random 32-bit PID per __getitem__, renders the sprite with a random background colour, then applies the full augmentation pipeline.
SpindaEvalDataset (in src/data/dataset.py): loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both data/val/ (clean, seed=42) and data/aug_test/ (augmented, seed=99).
_worker_init_fn re-seeds Python random and NumPy per worker so forked workers generate distinct PIDs.
Weighted loss: BL_x ×1.5, BL_y ×2.5 — applied during training only; val loss is unweighted.
Early stopping: patience = 10 epochs on clean-val exact-match rate.
Checkpoints saved to models/best_{backbone}_model.pth.
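
The stopping rule can be illustrated without any training code. The sketch below is not taken from train.py: exact_match_rate and EarlyStopper are hypothetical names, and it assumes that "exact match" means all 8 coordinates agree and that an epoch only resets the patience counter when it strictly improves on the best rate so far.

```python
def exact_match_rate(predictions, targets):
    """Fraction of samples whose 8 predicted coordinates all match the target."""
    hits = sum(1 for p, t in zip(predictions, targets) if list(p) == list(t))
    return hits / len(targets)


class EarlyStopper:
    """Stop after `patience` consecutive epochs without a new best metric."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, metric):
        """Record one epoch; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.stale_epochs = 0  # a checkpoint would be saved here
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


stopper = EarlyStopper(patience=3)  # short patience just for the demo
history = [0.50, 0.62, 0.61, 0.62, 0.60, 0.70]
stopped_at = None
for epoch, rate in enumerate(history):
    if stopper.update(rate):
        stopped_at = epoch
        break
```

With a patience of 3, the demo stops at epoch index 4 and never sees the later 0.70, which is the usual trade-off of a patience-based rule.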

4. PID Encoding (domain invariant — must not be changed)

The 8 model outputs map directly to hex nibbles of the 32-bit PID via the ProfessorRex convention:

| Coord index | Nibble | Spot | Notes |
|---|---|---|---|
| 0 (TL_x) | `pid[-1]` | TL | no pixel offset |
| 1 (TL_y) | `pid[-2]` | TL | |
| 2 (TR_x) | `pid[-3]` | TR | +24 px |
| 3 (TR_y) | `pid[-4]` | TR | +1 px |
| 4 (BL_x) | `pid[3]` | BL | +6 px |
| 5 (BL_y) | `pid[2]` | BL | +18 px |
| 6 (BR_x) | `pid[1]` | BR | +18 px |
| 7 (BR_y) | `pid[0]` | BR | +19 px |

SpindaResolver.coordinates_to_pid() reconstructs each byte as (Y << 4) | X; BDSP reverses the byte order.
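
Read as string indices into an 8-digit hex PID, the table above amounts to writing the 8 coordinates in reverse order. The sketch below shows that mapping and its inverse. These are standalone illustrative functions, not the SpindaResolver methods. swap_byte_order is a hypothetical helper for the BDSP case, and where it would be applied in the real pipeline is not specified here.

```python
def coordinates_to_pid_hex(coords):
    """Map 8 nibble values (TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y)
    to an 8-character hex string, following the table above."""
    if len(coords) != 8 or any(not 0 <= c <= 15 for c in coords):
        raise ValueError("expected 8 values in the range 0..15")
    # Coordinate i lands at string position 7 - i, so the byte for each spot
    # reads (Y << 4) | X and the spots run BR, BL, TR, TL from left to right.
    return "".join(f"{c:X}" for c in reversed(coords))


def pid_hex_to_coordinates(pid_hex):
    """Inverse mapping: hex string back to the 8 coordinate values."""
    if len(pid_hex) != 8:
        raise ValueError("expected 8 hex digits")
    return [int(ch, 16) for ch in reversed(pid_hex)]


def swap_byte_order(pid_hex):
    """Reverse the four bytes of a PID (hypothetical helper for the BDSP case)."""
    return bytes.fromhex(pid_hex)[::-1].hex().upper()


pid = coordinates_to_pid_hex([1, 2, 3, 4, 5, 6, 7, 8])
```

Here pid is "87654321": BL_x = 5 sits at pid[3] and TL_x = 1 at pid[-1], as in the table.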

5. Registry (`src/registry/database.py`)

SQLite database at data/spinda_registry.db. Schema: (fingerprint TEXT, pid_hex TEXT) with a UNIQUE constraint on the pair and an index on fingerprint. SpindaRegistry.add_entry() is idempotent (it ignores IntegrityError on duplicate inserts).
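
A minimal sketch of this behaviour using the standard sqlite3 module follows. The class, table, and index names are assumptions, as are the placeholder fingerprint strings. Only the column names, the unique pair, the fingerprint index, and the ignore-on-duplicate behaviour come from the description above.

```python
import sqlite3


class RegistrySketch:
    """Minimal stand-in for the registry; table and index names are assumed."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS spinda ("
            "fingerprint TEXT, pid_hex TEXT, UNIQUE (fingerprint, pid_hex))"
        )
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_fingerprint ON spinda (fingerprint)"
        )

    def add_entry(self, fingerprint, pid_hex):
        """Insert a pair; inserting the same pair twice is a no-op."""
        try:
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute(
                    "INSERT INTO spinda (fingerprint, pid_hex) VALUES (?, ?)",
                    (fingerprint, pid_hex),
                )
        except sqlite3.IntegrityError:
            pass  # duplicate (fingerprint, pid_hex) pair

    def lookup_by_fingerprint(self, fingerprint):
        """Return every PID stored for a fingerprint, in sorted order."""
        rows = self.conn.execute(
            "SELECT pid_hex FROM spinda WHERE fingerprint = ? ORDER BY pid_hex",
            (fingerprint,),
        )
        return [row[0] for row in rows]


registry = RegistrySketch()
registry.add_entry("fp-a", "0000000A")
registry.add_entry("fp-a", "0000000A")  # repeated insert is ignored
registry.add_entry("fp-a", "0000000B")  # a second PID with the same fingerprint
matches = registry.lookup_by_fingerprint("fp-a")
```

Because uniqueness covers the pair rather than the fingerprint alone, two PIDs that share a fingerprint are both kept and both returned by the lookup.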

Data layout

data/
  val/             # 1000 fixed clean sprites, white bg (seed=42) — stable benchmark
    metadata.json
    sample_NNNN.png
  aug_test/        # 500 fixed augmented images (seed=99) — domain-adaptation tracker
    metadata.json
    sample_NNNN.png
  spinda_registry.db

assets/            # Sprite assets used by the renderer
  Spinda_Base_Top.png   # 52×43 face layer
  Spinda_Head.png       # colourisation source for spots
  Spot_{TL,TR,BL,BR}.png

models/
  best_resnet34_model.pth      # current best (default)
  best_convnext_tiny_model.pth # convnext experiment

metadata.json format: [{"img_path": "...", "pid_hex": "...", "target": [int×8]}, ...]
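
Reading this format needs only the json module. The loader below is a hypothetical sketch, not SpindaEvalDataset. It assumes each target value is a nibble in the range 0..15 (matching the 16-class head), and the sample record is made up for illustration.

```python
import json


def load_metadata(text):
    """Parse metadata.json content and check each entry's shape."""
    entries = json.loads(text)
    for entry in entries:
        target = entry["target"]
        if len(target) != 8 or any(not 0 <= v <= 15 for v in target):
            raise ValueError(f"bad target for {entry['img_path']}")
    return entries


# Placeholder record; the path and PID here are made up for illustration.
sample_text = json.dumps(
    [{"img_path": "sample_0000.png", "pid_hex": "87654321",
      "target": [1, 2, 3, 4, 5, 6, 7, 8]}]
)
records = load_metadata(sample_text)
```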

Key invariants

Visual collisions: ~1.3 % of fingerprints are shared by multiple PIDs (many-to-one mapping). SpindaRegistry stores (fingerprint, pid_hex) pairs with a unique constraint so lookup_by_fingerprint can return all matching PIDs — this is intentional, not a bug.
The validation set uses white backgrounds (no augmentation baked in) to give a stable epoch-comparable baseline. Do not add augmentation to generate_val_set.py.
The augmented test set is pre-generated and fixed. Regenerating it changes the baseline; do so intentionally.
The crop output size is always 128×128 regardless of tier. The model transform chain also resizes to 128×128, so inference still works if an input arrives at a different size.
generate_high_fidelity_spinda() in src/data/renderer.py always takes bg_color as an (R, G, B) tuple in PIL order (not BGR).
SpindaInference.predict() accepts either a file path or a BGR numpy array directly (e.g. from the detector).
