vibe-spinda/CLAUDE.md
2026-05-08 16:42:35 -04:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

All commands must be run from the project root using the local venv:

# End-to-end identification (the main entry point)
.venv/bin/python identify.py <image_path>

# Train the model
.venv/bin/python -m src.models.train --epochs 50 --batch_size 64 --lr 1e-4

# Evaluate a trained model on val and aug_test sets
.venv/bin/python -m src.models.evaluate [--backbone resnet18|resnet34] [--model_path <path>]

# Run inference only (no registry lookup)
.venv/bin/python src/models/inference.py <image_path>

# Generate/regenerate the fixed validation set (seed=42, 1000 samples, white bg)
.venv/bin/python -m src.data.generate_val_set

# Generate/regenerate the fixed augmented test set (seed=99, 500 samples)
.venv/bin/python -m src.data.generate_aug_test_set

# Generate a single sample image for visual inspection
.venv/bin/python src/data/high_fidelity_generator.py

# Lint
.venv/bin/ruff check src/

# Type check
.venv/bin/mypy src/

The package is installed in editable mode (pip install -e .); imports use src.* paths.

Important: DataLoader workers use multiprocessing, so training must be invoked as a module (python -m src.models.train), not as a script piped via stdin — Python cannot resolve the worker main_path in that case.

Architecture

The inference pipeline passes an image through four stages:

Image → Detector → Inference → Resolver → Registry
                  (cropped)   (logits)   (PIDs)     (SQLite)

1. Detection (src/utils/detector.py)

SpindaDetector.detect_and_crop() returns a 128×128 BGR image, or None.

Two-tier strategy, tried in order:

  • Tier 1 (screenshots/sprites): HSV-filter red pixels → find individual spot blobs → cluster to 4 spots → derive crop from cluster centroid + _SPOT_CROP_RATIO=5.5 (= 128 / 24.5 px span) with a _SPOT_CENTER_OFFSET=0.056 downward shift so the spot centroid lands at 44.4 % from the top of the crop (matching the training canvas).
  • Tier 2 (real photos, spots merged): Find the full Spinda body blob; score = circularity + 0.2·log(area/min_area) − 0.1·|aspect − 1.12|; crop the face (top 43/58 of body height) using blob width as scale reference.
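
The Tier 1 crop geometry can be sketched as follows. This is a minimal illustration, not the code in src/utils/detector.py: the function name crop_box_from_spots and the definition of span (the pixel extent of the four-spot cluster) are assumptions, and only the two constants come from the text above.

```python
from typing import Tuple

SPOT_CROP_RATIO = 5.5       # crop side = ratio * spot-cluster span (value from the text)
SPOT_CENTER_OFFSET = 0.056  # downward shift of the crop centre, as a fraction of the side


def crop_box_from_spots(cx: float, cy: float, span: float) -> Tuple[float, float, float]:
    """Return (left, top, side) of a square crop around a spot cluster.

    cx, cy: centroid of the four spot blobs, in pixels.
    span:   extent of the spot cluster in pixels (assumed definition).
    """
    side = SPOT_CROP_RATIO * span
    # Move the crop centre below the spot centroid so the spots sit
    # above the middle of the crop.
    center_y = cy + SPOT_CENTER_OFFSET * side
    left = cx - side / 2
    top = center_y - side / 2
    return left, top, side


left, top, side = crop_box_from_spots(cx=100.0, cy=100.0, span=20.0)
# Fraction of the crop height that lies above the spot centroid:
# 0.5 - 0.056 = 0.444, i.e. the 44.4 % figure quoted above.
centroid_fraction = (100.0 - top) / side
```

The resulting square would then be resized to 128×128 before being handed to the model.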

2. Model (src/models/regression_model.py)

ResNet-18 backbone with the final FC replaced by Linear(512, 8·16). Forward pass returns (B, 8, 16) — treating each of the 8 coordinates as a 16-class classification problem. Trained with CrossEntropyLoss on view(-1, 16) vs view(-1) targets; predictions use argmax(dim=2).
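
For a single sample, the argmax step amounts to picking the best of 16 classes for each of the 8 coordinates. The sketch below is a pure-Python stand-in for argmax(dim=2) on one (8, 16) slice of the output; decode_logits is a hypothetical helper, not a function in the repository.

```python
from typing import List, Sequence

NUM_COORDS = 8    # TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y
NUM_CLASSES = 16  # one class per hex nibble value 0..15


def decode_logits(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring class for each of the 8 coordinates."""
    if len(logits) != NUM_COORDS or any(len(row) != NUM_CLASSES for row in logits):
        raise ValueError("expected an 8 x 16 grid of logits")
    return [max(range(NUM_CLASSES), key=row.__getitem__) for row in logits]


# Hypothetical logits: coordinate i peaks at class (2 * i + 1).
example_logits = [
    [5.0 if cls == 2 * i + 1 else 0.0 for cls in range(NUM_CLASSES)]
    for i in range(NUM_COORDS)
]
decoded = decode_logits(example_logits)
```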

3. Training (src/models/train.py)

  • SpindaDataset (200 k virtual samples/epoch): generates a fresh random 32-bit PID per __getitem__, renders the sprite with a random background colour, then applies the full augmentation pipeline.
  • SpindaEvalDataset: loads pre-generated images from disk (post-augmentation, pre-normalisation) and applies only the normalise step. Used for both data/val/ (clean, seed=42) and data/aug_test/ (augmented, seed=99).
  • _worker_init_fn re-seeds Python random and NumPy per worker so forked workers generate distinct PIDs.
  • Early stopping: patience = 10 epochs on clean-val exact-match rate.
  • Best model checkpoint: models/best_spinda_model.pth.
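
The early-stopping rule above can be sketched as a small counter. This is an illustration under assumptions, not the logic in src/models/train.py: the EarlyStopper class and the per-epoch rates are hypothetical, and only the patience of 10 and the monitored metric come from the text.

```python
class EarlyStopper:
    """Stop when the monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, exact_match_rate: float) -> bool:
        """Record one epoch. Returns True if this epoch set a new best."""
        if exact_match_rate > self.best:
            self.best = exact_match_rate
            self.stale_epochs = 0
            return True
        self.stale_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.stale_epochs >= self.patience


stopper = EarlyStopper(patience=10)
history = [0.10, 0.25, 0.40] + [0.39] * 12  # hypothetical clean-val rates
epochs_run = 0
for rate in history:
    epochs_run += 1
    improved = stopper.update(rate)  # a real loop would save the checkpoint when improved
    if stopper.should_stop:
        break
```

With these rates the loop ends after epoch 13: three improving epochs followed by ten without improvement.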

4. PID Encoding (domain invariant — must not be changed)

The 8 model outputs map directly to hex nibbles of the 32-bit PID via the ProfessorRex convention:

Coord index  Nibble   Spot  Notes
0 (TL_x)     pid[-1]  TL    no pixel offset
1 (TL_y)     pid[-2]  TL
2 (TR_x)     pid[-3]  TR    +24 px
3 (TR_y)     pid[-4]  TR    +1 px
4 (BL_x)     pid[3]   BL    +6 px
5 (BL_y)     pid[2]   BL    +18 px
6 (BR_x)     pid[1]   BR    +18 px
7 (BR_y)     pid[0]   BR    +19 px

SpindaResolver.coordinates_to_pid() reconstructs each byte as (Y << 4) | X; BDSP reverses the byte order.
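
The nibble layout can be checked with a standalone sketch. The two functions below are illustrative re-implementations of the mapping in the table, not the SpindaResolver methods themselves, and they leave out the BDSP byte-order variant.

```python
from typing import List, Sequence


def coordinates_to_pid(coords: Sequence[int]) -> int:
    """Pack [TL_x, TL_y, TR_x, TR_y, BL_x, BL_y, BR_x, BR_y] into a 32-bit PID."""
    if len(coords) != 8 or any(not 0 <= c <= 15 for c in coords):
        raise ValueError("expected 8 nibbles in the range 0..15")
    pid = 0
    for spot in range(4):  # 0 = TL (lowest byte) ... 3 = BR (highest byte)
        x, y = coords[2 * spot], coords[2 * spot + 1]
        pid |= ((y << 4) | x) << (8 * spot)
    return pid


def pid_to_coordinates(pid: int) -> List[int]:
    """Inverse of coordinates_to_pid: unpack a PID into 8 nibbles."""
    coords: List[int] = []
    for spot in range(4):
        byte = (pid >> (8 * spot)) & 0xFF
        coords += [byte & 0xF, byte >> 4]  # low nibble is X, high nibble is Y
    return coords


pid = coordinates_to_pid([1, 2, 3, 4, 5, 6, 7, 8])
pid_hex = f"{pid:08X}"
```

Here pid_hex is "87654321": the last hex digit is TL_x and the first is BR_y, so the hex string is the coordinate list read in reverse.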

5. Registry (src/registry/database.py)

SQLite at data/spinda_registry.db. Schema: (fingerprint TEXT, pid_hex TEXT) with a UNIQUE constraint on the (fingerprint, pid_hex) pair and an index on fingerprint. SpindaRegistry.add_entry() is idempotent (ignores IntegrityError).
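
A minimal sketch of this schema and the idempotent insert, using only the standard sqlite3 module. The table name registry, the index name, the module-level functions, and the fingerprint strings are all assumptions for illustration; the real SpindaRegistry class may be organised differently.

```python
import sqlite3
from typing import List


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a registry database with the schema described above."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS registry ("
        " fingerprint TEXT NOT NULL,"
        " pid_hex TEXT NOT NULL,"
        " UNIQUE (fingerprint, pid_hex))"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_fingerprint ON registry (fingerprint)"
    )
    return conn


def add_entry(conn: sqlite3.Connection, fingerprint: str, pid_hex: str) -> None:
    """Insert a pair; inserting the same pair again is a no-op."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO registry (fingerprint, pid_hex) VALUES (?, ?)",
                (fingerprint, pid_hex),
            )
    except sqlite3.IntegrityError:
        pass  # the pair is already stored


def lookup_by_fingerprint(conn: sqlite3.Connection, fingerprint: str) -> List[str]:
    """Return every PID stored for a fingerprint (more than one on a collision)."""
    rows = conn.execute(
        "SELECT pid_hex FROM registry WHERE fingerprint = ? ORDER BY pid_hex",
        (fingerprint,),
    )
    return [row[0] for row in rows]


conn = open_registry()
add_entry(conn, "fp-a", "87654321")
add_entry(conn, "fp-a", "87654321")  # duplicate pair, ignored
add_entry(conn, "fp-a", "0000ABCD")  # second PID for the same fingerprint
matches = lookup_by_fingerprint(conn, "fp-a")
```

Because the constraint covers the pair rather than the fingerprint alone, two PIDs that render to the same fingerprint can both be stored and returned.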

Data layout

data/
  val/             # 1000 fixed clean sprites, white bg (seed=42) — stable benchmark
    metadata.json
    sample_NNNN.png
  aug_test/        # 500 fixed augmented images (seed=99) — domain-adaptation tracker
    metadata.json
    sample_NNNN.png
  spinda_registry.db

assets/            # Sprite assets used by the renderer
  Spinda_Base_Top.png   # 52×43 face layer
  Spinda_Head.png       # colourisation source for spots
  Spot_{TL,TR,BL,BR}.png

models/
  best_spinda_model.pth

metadata.json format: [{"img_path": "...", "pid_hex": "...", "target": [int×8]}, ...]
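
A small loader for this format might look like the sketch below. load_metadata is a hypothetical helper, and the checks it performs (8 targets, each in 0..15, pid_hex parseable as hex) are assumptions drawn from the 16-class encoding rather than validation the repository is known to do. The sample entry is made up.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_metadata(path: str) -> List[Dict[str, Any]]:
    """Load metadata.json and sanity-check each entry."""
    with open(path, "r", encoding="utf-8") as fh:
        entries = json.load(fh)
    for i, entry in enumerate(entries):
        target = entry["target"]
        if len(target) != 8 or any(not 0 <= t <= 15 for t in target):
            raise ValueError(f"entry {i}: target must be 8 ints in 0..15")
        int(entry["pid_hex"], 16)  # raises ValueError if not valid hex
    return entries


# Round-trip a made-up entry through a temporary file.
sample = [
    {"img_path": "sample_0000.png", "pid_hex": "87654321",
     "target": [1, 2, 3, 4, 5, 6, 7, 8]}
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    entries = load_metadata(path)
```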

Key invariants

  • Visual collisions: ~1.3 % of fingerprints are shared by multiple PIDs (many-to-one mapping). SpindaRegistry stores (fingerprint, pid_hex) pairs with a unique constraint so lookup_by_fingerprint can return all matching PIDs — this is intentional, not a bug.

  • The validation set uses white backgrounds (no augmentation baked in) to give a stable epoch-comparable baseline. Do not add augmentation to generate_val_set.py.

  • The augmented test set is pre-generated and fixed. Regenerating it changes the baseline; do so intentionally.

  • The crop output size is always 128×128 regardless of tier. The model transform chain also resizes to 128×128, so the inference path still works if it is handed an image of a different size.

  • generate_high_fidelity_spinda() always takes bg_color as a (R, G, B) tuple in PIL order (not BGR).