⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

First-Nonzero Tiledata + Source-Read Correlation + Wait-Loop Explainer

Date:2026-01-10 StepID:0502 State: VERIFIED

Summary

This Step implements advanced instrumentation to diagnose why DMG games only write zeros to VRAM. were addedwritten content counters(zero vs non-zero) inVRAMWriteAuditStats, trackingfirst/last nonzero writewith full context (PC, frame, PPU mode), and aMMU reads ring bufferfor source-read correlation (detect if the CPU is copying zeros from ROM/RAM). Implemented automatic stop when the first non-zero write is detected and a DMG v4 classifier that uses these new metrics for more precise diagnosis. The goal is to get a binary answer based on data: are you trying to write real tiles (nonzero)? Where do the written values ​​come from?

Hardware Concept

Step 0501 Problem: It was detected that DMG games perform 6144 writes to tiledata (exactly the size of 0x8000-0x97FF = 0x1800 bytes), all with value 0x00. This suggests that the game is doing a complete "clear VRAM", but then not loading actual tiles.

Step 0502 Hypothesis: We need to distinguish between three possible scenarios:

  • A) "There are never non-zero writes": The game is stuck / does not progress to the tile loading phase
  • B) "There are non-zero writes, but then it becomes zero": Something is deleting the tiles or there is a repeated clear/swap
  • C) "Writes are zero because the source read is zero": Bug in ROM reads/MBC/mapping or data fetch is failing

Source-Read Correlation: To diagnose scenario C, we need to trace the memory reads that precede the writes to VRAM. If a write to VRAM writes 0x00 and the last read before the write also read 0x00 from ROM/RAM, then we know the problem is in the source (the data being copied), not the destination (VRAM).

Reads Ring Buffer: We implemented a lightweight ring buffer (256 events) that captures all memory reads (ROM, WRAM, HRAM, VRAM, IO). When writing to VRAM, we look at the last 1-3 reads of the ring to find the most likely source. This is heuristic but sufficient to detect if the problem is "source=0".

Wait Loop Detection: If the game clears and then stays in a waiting loop (e.g. waiting for VBlank, waiting for a flag), we need to identify which I/O address it is waiting on. This is done by analyzing the PC "hotspot" (most visited PC) and what I/O addresses are read from that hotspot.

Reference: Pan Docs - Memory Map, VRAM Access, CPU Instruction Set

Implementation

Phase A: Actual Accounting of Written Content ✅

A1) VRAMWriteAuditStats v2:

  • ExtendedVRAMWriteAuditStatswith written content counters:
    • tiledata_writes_zero_count: Writes with value 0x00
    • tiledata_writes_nonzero_count: Writes with value != 0x00
    • tiledata_writes_ff_count: Writes with value 0xFF (optional)
  • Added trackingFirstNonzeroWriteandLastNonzeroWritewith full context:
    • frame_id, PC, addr, value, stat_mode, ly, lcdc
    • Allows you to identify exactly when and where the first real tile appears
  • Sample unique non-zero values ​​(up to 8 values) to identify patterns

A2) Ring Buffer of "Interesting" Events:

  • Dedicated 64-event ring buffer for "interesting" writes:
    • tiledata write wherevalue != 0x00
    • tiledata write blocked
    • tiledata write forced
  • Prevents the ring from being filled with "clear = 0" garbage

Phase B: Source-Read Correlation ✅

B1) MMU Read Ring:

  • Implemented 256 event ring buffer for all memory reads
  • StructureMMUReadEventcapture: frame_id, PC, addr, value, region (ROM0/ROMX/WRAM/HRAM/IO/VRAM), type (normal/IO)
  • Capture inMMU::read()for ROM, VRAM, WRAM, HRAM
  • Lightweight and non-intrusive (only saves last 256 reads)

B2) Heuristic Write↔Read Correlation:

  • ExtendedVRAMWriteEventwith correlation fields:
    • src_addr_guess: Estimated source address (from last read)
    • src_value_guess: Estimated source value (from last read)
    • src_region_guess: Estimated source region (ROM0/ROMX/WRAM/HRAM/IO)
    • src_correlation_valid: If the correlation is valid (last 1-3 reads)
  • InMMU::write()when destination is VRAM, searches the last 3 reads of the ring:
    • Search reads with the same PC (direct correlation)
    • Or use the most recent read if it is in the last 3
  • harsh interpretation:
    • Yeahvalue_written == 0x00andsrc_value_guess == 0x00repeatedly → Source is giving zeros
    • Yeahsrc_value_guess != 0butvalue_written == 0→ Problem in CPU/A-reg/order (less likely)

Phase C: Hotspot Explainer (Partial) ⚠️

C1) Hotspot Explainer:

  • Partially implemented: DMG v4 classifier uses PC hotspot to detect wait loops
  • Pending: full ROM dump around hotspot, loop stability counter, and top read addr from hotspot
  • This requires additional tracking of reads per PC which can be done heuristically from the MMU read ring.

Phase D: Run Until Evidence ✅

D1) Automatic Stop:

  • Implemented inrom_smoke_0442.py:
    • Flagstop_on_first_tiledata_nonzero_write: Stops when first non-zero write to tiledata is detected
    • Flagstop_on_first_tiledata_nonzero: For when first non-zero tiledata is detected in VRAM (readback)
  • CLI arguments added:
    • --stop-on-first-tiledata-nonzero
    • --stop-on-first-tiledata-nonzero-write
  • Minimal executions without waiting by eye: stops automatically when evidence appears

Phase E: DMG v4 Classifier ✅

E1) New Classifications:

  • ONLY_CLEAR_TO_ZERO: 6144 writes to 0, no nonzero ever
  • SOURCE_READS_ZERO: src_guess almost always 0 (source-read correlation)
  • NONZERO_WRITTEN_THEN_CLEARED: Nonzero appears and then returns to 0
  • NONZERO_PRESENT_OK: tiledataNZ goes up (real tiles present)
  • WAIT_LOOP_ON_ADDR_X: Hotspot always reads the same addr (pending full implementation)

E2) Minimum Suggested Fix:

  • YeahSOURCE_READS_ZEROandsrc_addr_guesspoints to ROMX → Strong suspicion to MBC/banking/ROM read
  • YeahONLY_CLEAR_TO_ZERO+ hotspot waiting in WRAM/HRAM → Suspected VBlank handler flag that is not set
  • YeahNONZERO_WRITTEN_THEN_CLEARED→ Find who cleans (clear PC) and why it repeats

Created/Modified Components

  • src/core/cpp/MMU.hpp: Extended structures (VRAMWriteAuditStats, VRAMWriteStats, MMUReadEvent, FirstNonzeroWrite, LastNonzeroWrite)
  • src/core/cpp/MMU.cpp: Implementation of zero/non-zero counters, source-read correlation, ring buffer of reads
  • src/core/cython/mmu.pxd: Updated Cython declarations
  • src/core/cython/mmu.pyx: Methodsget_vram_write_stats_v2(), get_vram_write_audit_stats()updated
  • tools/rom_smoke_0442.py: Auto stop, DMG v4 classifier

Affected Files

  • src/core/cpp/MMU.hpp- Extension of VRAMWriteStats, VRAMWriteAuditStats structures, new MMUReadEvent structure
  • src/core/cpp/MMU.cpp- Implementation of zero/non-zero counters, tracking first/last nonzero, MMU read ring, source-read correlation
  • src/core/cython/mmu.pxd- Cython declarations updated with new structures
  • src/core/cython/mmu.pyx- get_vram_write_stats_v2() method, extended get_vram_write_audit_stats(), get_vram_write_ring() with correlation
  • tools/rom_smoke_0442.py- Auto stop (stop_on_first_tiledata_nonzero, stop_on_first_tiledata_nonzero_write), DMG v4 classifier

Tests and Verification

Compilation:

python3 setup.py build_ext --inplace
# Exit code: 0 (success)
# Only warnings, no critical errors

Build Test:

python3 test_build.py
# [SUCCESS] The build pipeline works correctly
# The C++/Cython core is ready for Phase 2.

C++ Compiled Module Validation: The methodsget_vram_write_stats_v2()andget_vram_write_audit_stats()They are available from Python and can be called without errors.

Pending Tests:

  • Executerom_smoke_0442.pywith--stop-on-first-tiledata-nonzero-writeto validate automatic stop
  • Verify that the DMG v4 classifier returns correct classifications based on the new metrics
  • Validate that the source-read correlation is working correctly by comparing VRAM writes with previous reads

Sources consulted

Educational Integrity

What I Understand Now

  • Source-Read Correlation: It is possible to trace where the values ​​written to VRAM come from by analyzing previous reads. This is heuristic (search the last 1-3 reads) but enough to detect if the problem is "source=0" (the data being copied is zero) vs "destination=problem" (VRAM is blocked/corrupted).
  • Content Counters: Distinguishing between "write attempts" and "written content" is crucial. A blocked write is still an attempt, but if the written content (when allowed) is always zero, then we know that the problem is not the blocking but the content.
  • Reads Ring Buffer: A light ring buffer (256 events) is sufficient to capture the context necessary for source-read correlation without consuming memory. We only need the last N reads, not all historical reads.

What remains to be confirmed

  • Hotspot Explainer Complete: Complete hotspot explainer implementation showing ROM dump around the PC, loop stability counter, and top read addr from the hotspot. This requires additional tracking of reads per PC.
  • Validation with Real ROMs: Run rom_smoke_0442.py with the new flags and verify that the classifications are correct and useful for diagnosis.
  • More Accurate Source-Read Correlation: The current heuristic (last 1-3 reads) is sufficient for simple cases, but could be improved with instruction pattern analysis (e.g. if the instruction is LD (HL),A, then the last read is probably the source).

Hypotheses and Assumptions

Correlation Assumption: We assume that if a VRAM write occurs shortly after a ROM/RAM read, the written value probably comes from the read value. This is heuristic and may fail in cases where there are multiple interleaved reads/writes, but for most cases (simple data copy) it should work.

Next Steps

  • [ ] Executerom_smoke_0442.pywith--stop-on-first-tiledata-nonzero-writeon tetris.gb and pkmn.gb for real evidence
  • [ ] Complete Hotspot Explainer: ROM dump around PC, loop stability, top read addr
  • [ ] Analyze results from the DMG v4 classifier and determine minimum fix according to the classifications
  • [ ] If SOURCE_READS_ZERO → Investigate MBC/banking/ROM read
  • [ ] If ONLY_CLEAR_TO_ZERO + wait loop → Investigate VBlank handler flags
  • [ ] If NONZERO_WRITTEN_THEN_CLEARED → Identify who cleans and why