⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

Viboy Color Project Log


Step 0426: Triage 10 Failures + Clustering

📋 Executive Summary

Complete, systematic triage of the 10 tests still failing after Step 0425: exact capture of each failure, root-cause analysis by cluster, and selection of an atomic fix strategy. Critical decision: DO NOT touch code in this Step; rigorous diagnosis only.

Clustering Result:
  • Cluster A: 6 PPU failures (framebuffer swap bug) - 🔴 HIGH priority
  • Cluster B: 3 Registers failures (Post-Boot vs Zero-Init) - 🟡 MEDIUM priority
  • Cluster C: 1 CPU Control failure (incorrect EI delay test) - 🟢 LOW priority

Cluster selected for Step 0427: Cluster B (foundation first, smaller change surface)

🔧 Hardware Concept

1. Post-Boot State (Pan Docs - Power Up Sequence)

When the Game Boy is powered on, the Boot ROM runs an initialization sequence and leaves the registers in a specific state before jumping to the cartridge code (0x0100):

  • DMG: A=0x01, BC=0x0013, DE=0x00D8, HL=0x014D, SP=0xFFFE, PC=0x0100, F=0xB0
  • CGB: A=0x11, BC=0x0000, DE=0xFF56, HL=0x000D, SP=0xFFFE, PC=0x0100, F=0x80

The Viboy Color core implements a default Post-Boot State (skip-boot), which means that PyRegisters() starts with PC=0x0100, not PC=0x0000.
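The values above can be cross-checked with a short sketch. The names POST_BOOT, split_pair and decode_flags are illustrative and not part of the Viboy Color code base; the register values are the ones listed above, and the flag bit positions (Z=bit 7, N=bit 6, H=bit 5, C=bit 4) are the standard layout of the F register.

```python
# Post-boot values as listed in this log entry (16-bit pairs kept whole).
POST_BOOT = {
    "DMG": {"A": 0x01, "F": 0xB0, "BC": 0x0013, "DE": 0x00D8,
            "HL": 0x014D, "SP": 0xFFFE, "PC": 0x0100},
    "CGB": {"A": 0x11, "F": 0x80, "BC": 0x0000, "DE": 0xFF56,
            "HL": 0x000D, "SP": 0xFFFE, "PC": 0x0100},
}


def split_pair(value: int) -> tuple[int, int]:
    """Split a 16-bit register pair into its (high, low) 8-bit halves."""
    return (value >> 8) & 0xFF, value & 0xFF


def decode_flags(f: int) -> dict[str, bool]:
    """Decode the upper nibble of F into the four CPU flags."""
    return {
        "Z": bool(f & 0x80),
        "N": bool(f & 0x40),
        "H": bool(f & 0x20),
        "C": bool(f & 0x10),
    }


if __name__ == "__main__":
    for model, regs in POST_BOOT.items():
        b, c = split_pair(regs["BC"])
        print(model, f"B={b:#04x} C={c:#04x}", decode_flags(regs["F"]))
```

Splitting the DMG pairs this way gives the same 8-bit values that the core constructor in Registers.cpp uses (b=0x00, c=0x13, d=0x00, e=0xD8, h=0x01, l=0x4D).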

2. EI Delay (Pan Docs - CPU Instruction Set)

The EI (Enable Interrupts) instruction has one critical behavior: a 1-instruction delay. IME is not set immediately, but only after the instruction following EI has executed.

; Hardware-accurate example:
EI  ; IME is still False here
NOP ; This instruction executes with IME=False
; IME becomes True here, after the NOP
; Pending interrupts are then serviced

This is critical for patterns like EI + RETI used in interrupt handlers.
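The delay can be illustrated with a minimal sketch. TinyCPU and its step() method are hypothetical and model only IME; the real core in core.py is far larger, and only the ime and ime_scheduled names are taken from it.

```python
class TinyCPU:
    """Hypothetical CPU model that tracks only IME and the EI delay."""

    def __init__(self) -> None:
        self.ime = False
        self.ime_scheduled = False

    def op_ei(self) -> None:
        # EI only schedules the enable; IME itself is untouched here.
        self.ime_scheduled = True

    def op_di(self) -> None:
        # DI takes effect immediately and cancels any pending EI.
        self.ime = False
        self.ime_scheduled = False

    def op_nop(self) -> None:
        pass

    def step(self, op) -> None:
        # Was an EI already pending before this instruction started?
        pending_before = self.ime_scheduled
        op()
        # Promote the pending EI only after the following instruction ran,
        # and only if that instruction did not cancel it (DI).
        if pending_before and self.ime_scheduled:
            self.ime = True
            self.ime_scheduled = False


if __name__ == "__main__":
    cpu = TinyCPU()
    cpu.step(cpu.op_ei)
    print("after EI: ", cpu.ime)
    cpu.step(cpu.op_nop)
    print("after NOP:", cpu.ime)
```

A test written against this shape has to drive at least one more instruction after EI before asserting that IME is set.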

3. PPU Framebuffer Double-Buffering

The emulator uses double-buffering to avoid tearing:

  • Back buffer: where the PPU writes pixels during rendering (line by line)
  • Front buffer: the buffer exposed to Python/SDL for display
  • Swap: at the end of the frame (LY=144), the back buffer must be copied to the front buffer

The bug detected: the back buffer holds the correct pixels, but the front buffer remains blank (the swap does not work).
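A minimal sketch of the intended behavior follows. DoubleBuffer and on_ly_change are hypothetical names, not the renderer.py API; the 160x144 resolution is the Game Boy screen size, and the swap is triggered at LY=144 as described above.

```python
WIDTH, HEIGHT = 160, 144  # Game Boy LCD resolution


class DoubleBuffer:
    """Hypothetical framebuffer pair holding one palette index per pixel."""

    def __init__(self) -> None:
        self.back = bytearray(WIDTH * HEIGHT)   # written by the PPU
        self.front = bytearray(WIDTH * HEIGHT)  # read by the frontend

    def write_pixel(self, x: int, y: int, index: int) -> None:
        self.back[y * WIDTH + x] = index & 0x03

    def swap(self) -> None:
        # Copy in place so that any view the frontend already holds on
        # `front` keeps pointing at valid, updated data.
        self.front[:] = self.back

    def on_ly_change(self, ly: int) -> None:
        # LY=144 marks the end of the visible frame (start of VBlank).
        if ly == HEIGHT:
            self.swap()


if __name__ == "__main__":
    fb = DoubleBuffer()
    fb.write_pixel(0, 0, 3)
    print("front before swap:", fb.front[0])
    fb.on_ly_change(144)
    print("front after swap: ", fb.front[0])
```

If the swap is skipped, or copies into a different object than the one the frontend reads, the front buffer keeps its initial zeros, which matches the symptom described above.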

⚙️ Technical Implementation

Task T1: Exact Failure Capture

Commands executed:

pytest -q > /tmp/viboy_0426_pytest.log 2>&1
tail -n 140 /tmp/viboy_0426_pytest.log
grep -n "^FAILED " /tmp/viboy_0426_pytest.log

# Individual failures per cluster:
pytest -vv tests/test_core_ppu_rendering.py::TestCorePPURendering::test_bg_rendering_simple_tile --maxfail=1 -x > /tmp/viboy_0426_ppu_rendering_first.log 2>&1
pytest -vv tests/test_core_registers.py::TestPyRegistersPCSP::test_program_counter --maxfail=1 -x > /tmp/viboy_0426_registers_first.log 2>&1
pytest -vv tests/test_cpu_control.py::TestCPUControl::test_di_ei_sequence --maxfail=1 -x > /tmp/viboy_0426_cpu_control_first.log 2>&1
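As a first pass at clustering, the FAILED lines captured by the grep above can be grouped by test file. The helper below is a sketch: group_failures is a hypothetical name, and the sample text is illustrative (the test IDs come from this log entry, but the lines are not a copy of the real log). Grouping by file is only a starting point, since one root cause can span several files.

```python
import re
from collections import defaultdict

# Matches lines of pytest's short summary: "FAILED path::Class::test - message"
FAILED_RE = re.compile(r"^FAILED (?P<file>[^:\s]+)::(?P<test>\S+)")


def group_failures(log_text: str) -> dict[str, list[str]]:
    """Group failed test IDs by the test file they belong to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in log_text.splitlines():
        match = FAILED_RE.match(line)
        if match:
            groups[match.group("file")].append(match.group("test"))
    return dict(groups)


# Illustrative sample, not real log output.
SAMPLE = """\
FAILED tests/test_core_registers.py::TestPyRegistersPCSP::test_program_counter - AssertionError
FAILED tests/test_cpu_control.py::TestCPUControl::test_di_ei_sequence - AssertionError
"""

if __name__ == "__main__":
    for path, tests in group_failures(SAMPLE).items():
        print(f"{path}: {len(tests)} failure(s)")
```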

Root Cause Analysis

Cluster A: PPU Framebuffer Swap (6 failures)

Tests affected:

  • test_bg_rendering_simple_tile
  • test_signed_addressing_fix
  • test_sprite_rendering_simple
  • test_sprite_transparency
  • test_sprite_x_flip
  • test_sprite_palette_selection

Typical assertion:

AssertionError: First pixel must be black (0xFF000000), it is 0xFFFFFFFF (index=0)
assert 4294967295 == 4278190080

Log evidence:

[PPU-RENDER-WRITE] First 20 pixels: 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[PPU-FRAMEBUFFER-LINE] Pixel (0, 0): index=0 # ❌ Should be 3

Diagnosis: ✅ Core bug - The renderer writes correctly to the back buffer, but the front buffer exposed to Python remains blank. The problem is in renderer.py (buffer swap/copy).
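The two numbers in the assertion follow directly from the index-to-color mapping. The sketch below assumes a four-entry grayscale ARGB32 palette; only the endpoints (index 0 = 0xFFFFFFFF, index 3 = 0xFF000000) are taken from the assertion message, and the two middle shades as well as the names PALETTE_ARGB and indices_to_argb are assumptions.

```python
# Assumed ARGB32 palette. Only entries 0 and 3 are confirmed by the
# assertion message above; the middle shades are illustrative.
PALETTE_ARGB = (0xFFFFFFFF, 0xFFAAAAAA, 0xFF555555, 0xFF000000)


def indices_to_argb(indices: list[int]) -> list[int]:
    """Map 2-bit palette indices to ARGB32 pixel values."""
    return [PALETTE_ARGB[i & 0x03] for i in indices]


if __name__ == "__main__":
    back = [3] * 20   # what the renderer wrote (per the log)
    front = [0] * 20  # what the test actually read
    print(hex(indices_to_argb(back)[0]))
    print(hex(indices_to_argb(front)[0]))
```

An untouched front buffer full of index 0 therefore reads as 4294967295 (0xFFFFFFFF), while the expected index 3 would read as 4278190080 (0xFF000000).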

Cluster B: Registers Post-Boot vs Zero-Init (3 failures)

Tests affected:

  • test_program_counter
  • test_stack_pointer
  • test_initialization_by_default

Typical assertion:

def test_program_counter(self):
    reg = PyRegisters()
    assert reg.pc == 0 # ❌ Failure: pc=0x0100 (256)

Core code (Registers.cpp:33):

CoreRegisters::CoreRegisters() :
    a(0x01), b(0x00), c(0x13), d(0x00), e(0xD8),
    h(0x01), l(0x4D), f(0xB0),
    pc(0x0100), // Post-Boot State (Pan Docs)
    sp(0xFFFE)

Diagnosis: ❌ Poorly designed tests - The core correctly implements the Post-Boot State (PC=0x0100 according to Pan Docs). The tests assume zero initialization, which contradicts the state that real hardware leaves after the Boot ROM.

Cluster C: CPU Control EI Delay (1 failure)

Test affected:

  • test_di_ei_sequence

Test code:

cpu._op_ei()
assert cpu.ime is True # ❌ Failure: ime=False

Core code (core.py:2405):

def _op_ei(self) -> int:
    """EI is delayed by 1 instruction (Pan Docs)"""
    self.ime_scheduled = True # Do not activate immediately
    return 1

Diagnosis: ❌ Poorly designed test - The test calls _op_ei() directly and expects IME=True immediately. The core correctly implements the 1-instruction delay (hardware-accurate according to Pan Docs).

🎯 Strategic Decision

Cluster Selected for Step 0427: Cluster B (Post-Boot Registers)

Justification:

  1. Foundation first: resolve the register initialization mismatch before the PPU
  2. Smaller change surface: touches only tests, not the core (3 tests in 1 file)
  3. Clear design decision: document the Post-Boot vs Zero-Init policy
  4. Don't touch the PPU yet: Cluster A (PPU framebuffer) is more complex and should be tackled once the foundation is clean

Proposed strategy:

  • Add a reset_to_zero() method to PyRegisters for tests that need raw state
  • Update the 3 tests to use reset_to_zero() or accept Post-Boot values
  • Document the policy in the tests with comments citing Pan Docs
  • Verification: build + test_build + pytest target + pytest global (215 → 218 passing)
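The proposed strategy could look roughly like the sketch below. RegistersSketch is a pure-Python stand-in, since the real PyRegisters wraps the C++ core, and reset_to_zero() does not exist yet; its shape here is an assumption about how Step 0427 might implement it.

```python
class RegistersSketch:
    """Hypothetical stand-in for PyRegisters with DMG Post-Boot defaults."""

    _NAMES = ("a", "f", "b", "c", "d", "e", "h", "l", "pc", "sp")

    def __init__(self) -> None:
        # Post-Boot State (Pan Docs - Power Up Sequence), DMG values.
        self.a, self.f = 0x01, 0xB0
        self.b, self.c = 0x00, 0x13
        self.d, self.e = 0x00, 0xD8
        self.h, self.l = 0x01, 0x4D
        self.pc, self.sp = 0x0100, 0xFFFE

    def reset_to_zero(self) -> None:
        """Clear every register, for tests that need a raw state."""
        for name in self._NAMES:
            setattr(self, name, 0)


def test_program_counter_post_boot() -> None:
    # Policy: a fresh instance reflects the Post-Boot State.
    reg = RegistersSketch()
    assert reg.pc == 0x0100
    assert reg.sp == 0xFFFE


def test_program_counter_zero_init() -> None:
    # Tests that want zeroed registers ask for them explicitly.
    reg = RegistersSketch()
    reg.reset_to_zero()
    assert reg.pc == 0
    assert reg.sp == 0


if __name__ == "__main__":
    test_program_counter_post_boot()
    test_program_counter_zero_init()
    print("ok")
```

Keeping the Post-Boot defaults in the constructor and making zero-init an explicit opt-in leaves the core untouched, which is the point of choosing this cluster first.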

Cluster resolution order:

  1. Step 0427: Cluster B (Registers) - Foundation
  2. Step 0428: Cluster C (CPU Control) - Low-hanging fruit
  3. Step 0429+: Cluster A (PPU Framebuffer) - Requires deep investigation in renderer.py

✅ Tests and Verification

Command Executed

pytest -q

Result

======================== 10 failed, 215 passed in 0.88s ========================

Validation

  • ✅ Exact capture of the 10 failures (no new failures introduced)
  • ✅ Root cause clustering completed
  • ✅ First failure of each cluster analyzed in detail
  • ✅ Atomic fix strategy defined
  • ✅ NO code was modified (pure triage)

Failure Breakdown by Cluster

Cluster         | Failures | Cause                   | Type      | Priority
A (PPU)         | 6        | Framebuffer swap bug    | Core bug  | 🔴 HIGH
B (Registers)   | 3        | Post-Boot vs Zero-Init  | Bad tests | 🟡 MEDIUM
C (CPU Control) | 1        | Incorrect EI delay test | Bad test  | 🟢 LOW

📁 Files Analyzed

  • tests/test_core_ppu_rendering.py - 2 failures (Cluster A)
  • tests/test_core_ppu_sprites.py - 4 failures (Cluster A)
  • tests/test_core_registers.py - 3 failures (Cluster B)
  • tests/test_cpu_control.py - 1 failure (Cluster C)
  • src/core/cpp/Registers.cpp - Verified (Post-Boot correct)
  • src/cpu/core.py - Verified (EI delay correct)

📚 Lessons Learned

  • Disciplined triage: NOT modifying the code in the diagnostic step avoids introducing new faults
  • Root Cause Clustering: Grouping faults allows atomic fixes and minimizes change surface
  • Foundation first: Resolve register/CPU discrepancies before PPU simplifies subsequent debugging
  • Tests vs Spec: When a test contradicts Pan Docs, the test is wrong (not the core)
  • Post-Boot State: The emulator must choose between Zero-Init (for pure tests) and Post-Boot (for realism), and document that choice
  • Hardware-accurate delay: 1 instruction delay is critical for real games (do not simplify)

🔗 References

  • Pan Docs - Power Up Sequence: Post-Boot State (DMG/CGB)
  • Pan Docs - CPU Instruction Set (EI): Delay of 1 instruction
  • Pan Docs - Video Display Controller: Double-buffering and framebuffer swap