vibe-spinda/design doc.md
2026-05-08 09:22:50 -04:00

Spinda Coordinate Regression & Global Registry (SCRGR)

Date Created: 2026-05-07 10:49:15

Tags: #MachineLearning #ComputerVision #Python #Pokemon #Spinda #Regression

The Problem

There are $2^{32}$ (over 4.2 billion) Spinda variations, but identifying a specific pattern from a user-submitted photo or screenshot is currently a manual, error-prone process. A 32-bit PID determines the exact coordinates of four facial spots on a discrete $16 \times 16$ grid, so what is needed is a system that automatically extracts these coordinates from an image and maps them to the corresponding game data, without requiring a massive, unsearchable database of raw images.

Context

Spinda's visual appearance is deterministic. The 32-bit PID is split into four bytes, and each byte provides the (x, y) coordinates of one of the four spots: 4 bits per coordinate, which is why each axis has 16 possible values.
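
The byte split described above can be sketched as follows. This is a minimal illustration rather than the games' actual routine: the function names `pid_to_spots` and `spots_to_pid` are hypothetical, and both the byte order (least significant byte first) and the nibble layout (low 4 bits as x, high 4 bits as y) are assumptions that should be checked against the target game.

```python
def pid_to_spots(pid: int) -> list[tuple[int, int]]:
    """Decode a 32-bit PID into four (x, y) grid coordinates.

    Assumptions (not confirmed by this document): spot i uses byte i
    counting from the least significant byte, with x in the low nibble
    and y in the high nibble.
    """
    if not 0 <= pid < 2**32:
        raise ValueError("PID must fit in 32 bits")
    spots = []
    for i in range(4):
        byte = (pid >> (8 * i)) & 0xFF
        spots.append((byte & 0x0F, byte >> 4))
    return spots


def spots_to_pid(spots: list[tuple[int, int]]) -> int:
    """Inverse of pid_to_spots under the same assumed layout."""
    if len(spots) != 4:
        raise ValueError("expected exactly four spots")
    pid = 0
    for i, (x, y) in enumerate(spots):
        if not (0 <= x <= 15 and 0 <= y <= 15):
            raise ValueError("coordinates must be in [0, 15]")
        pid |= ((y << 4) | x) << (8 * i)
    return pid
```

Under these assumptions, `pid_to_spots(0x12345678)` yields `[(8, 7), (6, 5), (4, 3), (2, 1)]`, and `spots_to_pid` reverses it.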

  • Current State: Existing tools can generate a pattern from a PID, but the inverse (Pattern → PID) is difficult due to "visual collisions" (multiple PIDs resulting in identical spot placements) and the noise inherent in real-world photography (glare, blur, and distortion).

  • Technical Shift: While initial discussions considered abstract image fingerprinting, the realization that the "identity" of a Spinda is mathematically defined by 8 discrete integers ($4 \text{ spots} \times 2 \text{ coordinates}$) allows for a more precise Regression-based approach.

Design

Summary

The proposed solution uses a Coordinate Regression Model to translate pixels into an 8-dimensional vector of spatial coordinates. This vector is rounded to the nearest integers to match the game's internal $16 \times 16$ grid, providing a "Visual Fingerprint" that can be looked up in a hash map in O(1) expected time to identify the associated PIDs.

Detailed Design

1. Synthetic Data Generation & Augmentation

To keep training in Python straightforward, we will build a synthetic data generator using a library such as OpenCV or PIL:

  • Perfect Sprites: Generate 2D Spinda faces with known ground-truth coordinates.

  • Augmentation Pipeline: Apply "Domain Randomization" to simulate real-world conditions:

    • Spatial Transforms: Slight rotations and tilts to mimic handheld photography.

    • Sensor Noise: Add Gaussian noise to simulate digital camera sensors, and Moiré patterns to simulate photographing a screen.

    • Grid Jitter: Ensure the model learns the center-of-mass for a spot even if it is partially obscured.
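
The generator idea above can be sketched in a few lines. To stay dependency-free, this sketch uses only the standard library and nested lists instead of OpenCV or PIL, draws plain discs as stand-ins for real Spinda spot sprites, and shows only the Gaussian-noise step of the augmentation pipeline. The names `render_face`, `add_gaussian_noise`, and `make_sample`, along with the `SCALE` and `SPOT_RADIUS` values, are hypothetical.

```python
import random

GRID = 16          # grid cells per axis, from the design
SCALE = 8          # hypothetical: pixels per grid cell (16 * 8 = 128 px)
SPOT_RADIUS = 10   # hypothetical: disc radius in pixels


def render_face(spots, size=GRID * SCALE):
    """Return a size x size grayscale image (rows of floats in [0, 1]).

    Background is white (1.0); each spot is a black disc (0.0) centred
    on the middle of its grid cell.
    """
    img = [[1.0] * size for _ in range(size)]
    for gx, gy in spots:
        cx, cy = (gx + 0.5) * SCALE, (gy + 0.5) * SCALE
        for py in range(size):
            for px in range(size):
                dx, dy = px + 0.5 - cx, py + 0.5 - cy
                if dx * dx + dy * dy <= SPOT_RADIUS ** 2:
                    img[py][px] = 0.0
    return img


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise per pixel and clip back to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def make_sample(rng, sigma=0.05):
    """Return (noisy image, label) where label is [x1, y1, ..., x4, y4]."""
    spots = [(rng.randrange(GRID), rng.randrange(GRID)) for _ in range(4)]
    image = add_gaussian_noise(render_face(spots), sigma, rng)
    label = [c for spot in spots for c in spot]
    return image, label
```

Passing a seeded `random.Random` instance as `rng` keeps the generated dataset reproducible between runs.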

2. ML Architecture: Coordinate Regression

Instead of a classification model, we will implement a Regression CNN (e.g., a modified ResNet or MobileNet backbone):

  • Input: A standardized $128 \times 128$ crop of the Spinda face.

  • Output Layer: A dense layer with 8 neurons using a linear activation function, representing $[\hat{x}_1, \hat{y}_1, \hat{x}_2, \hat{y}_2, \hat{x}_3, \hat{y}_3, \hat{x}_4, \hat{y}_4]$.

  • Loss Function: Mean Squared Error (MSE), which penalizes the squared distance between predicted and ground-truth grid coordinates.
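
Written out for the 8-value output above, the loss could take the following form. The batch size $N$ and the superscript $(n)$ indexing images in the batch are notation introduced here, and averaging over all 8 outputs is one common convention rather than something the design fixes.

```latex
\mathcal{L}_{\mathrm{MSE}}
  = \frac{1}{8N} \sum_{n=1}^{N} \sum_{k=1}^{4}
    \left[ \left(\hat{x}_k^{(n)} - x_k^{(n)}\right)^2
         + \left(\hat{y}_k^{(n)} - y_k^{(n)}\right)^2 \right]
```

For example, a single image whose prediction is exact except for one coordinate that is off by 0.5 contributes $0.5^2 / 8 = 0.03125$ to the loss.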

3. Deterministic "Snap-to-Grid" Matching

Post-inference, the model's float outputs are snapped to valid grid values:

  • Rounding: Each output is rounded to the nearest integer and clamped to the [0, 15] range.

  • Hashing: The 8 integers are concatenated into a unique string key (e.g., "12-04-08-09-02-01-15-14").

  • Collision Handling: The database maps this key to a list of all PIDs that produce that visual output, accounting for the BDSP "Endian flip" and other internal overlaps.
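
The three steps above can be sketched as follows. The helper names (`snap_to_grid`, `make_key`, `register`, `lookup`) are hypothetical, and a plain in-memory dictionary stands in for the database. The sketch rounds halves up with `math.floor(v + 0.5)` because Python's built-in `round` rounds halves to the nearest even integer, which may not be the intended behavior here.

```python
import math
from collections import defaultdict

GRID_MAX = 15  # coordinates live in [0, 15]


def snap_to_grid(outputs):
    """Round 8 float outputs half-up and clamp each to [0, GRID_MAX]."""
    if len(outputs) != 8:
        raise ValueError("expected 8 values (4 spots x 2 coordinates)")
    return [min(GRID_MAX, max(0, math.floor(v + 0.5))) for v in outputs]


def make_key(coords):
    """Join the 8 integers into a zero-padded key such as '12-04-...'."""
    return "-".join(f"{c:02d}" for c in coords)


def register(registry, key, pid):
    """Record that `pid` produces the visual fingerprint `key`."""
    if pid not in registry[key]:
        registry[key].append(pid)


def lookup(registry, outputs):
    """Return every PID stored under the fingerprint of `outputs`."""
    return list(registry.get(make_key(snap_to_grid(outputs)), []))


# Usage with placeholder PIDs (not real matches for this key).
registry = defaultdict(list)
raw = [11.6, 4.2, 7.9, 9.4, 1.5, 0.6, 15.8, 13.7]
key = make_key(snap_to_grid(raw))
register(registry, key, 0x12345678)
register(registry, key, 0x78563412)
```

Storing a list per key is what lets a single fingerprint return several candidate PIDs when collisions occur.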

4. The Global Registry & Audit Trail

  • Automated Documentation: Successfully matched Spinda are added to a community database.

  • Manual Review System: For entries with low model confidence (e.g., if the floats were far from an integer before rounding), the system logs the original image for administrator "Approve/Reject" review to maintain data integrity.
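
One way to turn "far from an integer before rounding" into a concrete check is to measure the largest gap between any raw output and the grid value it snaps to. This is a sketch under assumptions: the function names and the `0.25` threshold are hypothetical, and a real threshold would need tuning against validation data.

```python
import math


def rounding_residual(outputs, grid_max=15):
    """Largest distance between a raw output and its snapped grid value.

    Values outside [0, grid_max] are measured against the clamped value,
    so out-of-range predictions also produce a large residual.
    """
    worst = 0.0
    for v in outputs:
        snapped = min(grid_max, max(0, math.floor(v + 0.5)))
        worst = max(worst, abs(v - snapped))
    return worst


def needs_review(outputs, threshold=0.25):
    """Flag a prediction for manual Approve/Reject review."""
    return rounding_residual(outputs) > threshold
```

A prediction such as `12.02` sits close to a grid value and passes, while `12.45` is nearly halfway between two cells and would be queued for review.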