This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
PC Stuck Diagnosis and Why CGB Never Enables IE/IME
Summary
Diagnosing why CGB never enables IE/IME and why some ROMs have PC stuck. Minimal gated instrumentation was implemented to track writes to IE/IF, execution of EI/DI, and watch IO reads. Modified rom_smoke_0442.py for snapshots with PC hotspots and IO reads top 3. Actual validation with tetris_dx.gbc and mario.gbc revealed that both games write to IE multiple times but IE remains at 0x00, indicating that the writes are lost or overwritten. ✅ Dominant cause identified: IE writes lost or overwritten.
Hardware Concept
On the Game Boy, the IE register (0xFFFF - Interrupt Enable) controls which interrupts are enabled. If IE bit0 = 0, even if the PPU requests VBlank interrupt (IF bit0 = 1), the CPU will not serve it. The IE register is write-only from a hardware perspective (the game can write values, but some bits may be read-only or have special behavior in CGB).
IE Register (0xFFFF - Interrupt Enable): Indicates which interrupts are enabled. Bit 0 = VBlank interrupt, Bit 1 = LCD STAT interrupt, Bit 2 = Timer interrupt, Bit 3 = Serial interrupt, Bit 4 = Joypad interrupt. In CGB mode, some bits may have different behavior or be read-only.
EI/DI Instructions:
- EI (0xFB): Enables IME (Interrupt Master Enable) with a delay of 1 instruction. IME is activated AFTER the following instruction is executed. This allows the instruction following EI to be executed without interruption.
- DI (0xF3): Disable IME immediately. It is typically used at the beginning of critical routines.
Correct Framing of the Problem: "IE bit0=0" in CGB is not a cause, it is a symptom that the game is not getting to the point where it enables interrupts (or its writes to IE are lost). If the game tries to write IE but IE remains at 0x00, there are two possibilities:
- IE writes missing: The game never tries to write IE (no writes to 0xFFFF)
- IE writes lost: The game tries to write IE but the writes are lost or overwritten
Fountain: Pan Docs - Interrupts, Interrupt Enable Register (IE), EI/DI Instructions
Implementation
Implemented minimal gated instrumentation to diagnose why CGB never enables IE/IME. The instrumentation includes counters for writes to IE/IF, execution of EI/DI, and watch for IO reads.
Phase A – Minimum Instrumentation (Gated)
Added static file-level counters to track writes to IE/IF and execution of EI/DI:
- MMU.cpp: Counters `ie_write_count` and `if_write_count` (static uint32_t) that are incremented when written to 0xFFFF and 0xFF0F respectively. Log gated (only if VIBOY_DEBUG_PPU=1) with limit of 20 logs.
- MMU.cpp: Watch IO reads (JOYP, STAT, LY, IF, IE, KEY1, VBK, SVBK) using std::map to count reads by address.
- CPU.cpp: Counters `ei_count_global` and `di_count_global` (static uint32_t) that are incremented when EI and DI are executed.
- MMU.hpp/CPU.hpp: Public getters `get_ie_write_count()`, `get_if_write_count()`, `get_last_ie_written()`, `get_last_if_written()`, `get_io_read_count()`, `get_ei_count()`, `get_di_count()`
- Cython (.pxd/.pyx): Getters exposed to Python for access from diagnostic tools
Phase B - rom_smoke: Snapshots with "Hotspots"
Modified `rom_smoke_0442.py` so that in snapshots it prints PC hotspots and IO reads top 3:
- Hotspot counters: `pc_samples` (Dict: PC -> count) which is updated every 50 steps
- Improved Snapshots: Every 60 frames (or frames 0, 60, 120, 180, 240) are printed:
- IE/IF writes and EI/DI counters
- IO reads top 3 (most read addresses)
- PC hotspots top 3 (most frequent PCs)
- Existing metrics (TilemapNZ, VRAMNZ, LCDC, STAT, LY)
Phase C - Optional Test
Created `tests/test_ie_write_persists_0470.py` to verify that writes to IE persist correctly:
- test_ie_write_persists: Verify that write to IE (0xFFFF=0x01) persists (read 0xFFFF==0x01) and that the write counter is incremented
- test_ie_write_multiple_values: Verifies that multiple writes to IE persist correctly (0x01, 0x03, 0x07, 0x0F)
Phase D - Automatic Decision
`rom_smoke_0442.py` was executed for tetris_dx.gbc and mario.gbc (240 frames each) and an automatic decision was generated based on the collected data:
- tetris_dx.gbc: IEWrite=7, EI=2, DI=4, but IE=0x00 → IE writes lost or overwritten + EI timing bug
- mario.gbc: IEWrite=62, EI=0, DI=0, but IE=0x00 → IE writes lost or overwritten + EI never executed
Identified Dominant Cause: IE writes lost or overwritten. Both games write to IE multiple times but IE remains at 0x00, indicating that the writes are lost or overwritten immediately.
Affected Files
src/core/cpp/MMU.hpp- Getters for IE/IF writes and IO reads counterssrc/core/cpp/MMU.cpp- Counters for writes to IE/IF and watch for IO readssrc/core/cpp/CPU.hpp- Getters for EI/DI counterssrc/core/cpp/CPU.cpp- EI/DI execution counterssrc/core/cython/mmu.pxd- Cython declarations for MMU getterssrc/core/cython/mmu.pyx- Python wrappers for MMU getterssrc/core/cython/cpu.pxd- Cython declarations for CPU getterssrc/core/cython/cpu.pyx- Python wrappers for CPU getterstools/rom_smoke_0442.py- Snapshots with PC hotspots and IO reads top 3tests/test_ie_write_persists_0470.py- Test to verify that writes to IE persist
Tests and Verification
Implementation validation:
- Unit tests:
pytest tests/test_ie_write_persists_0470.py- 2 tests passing (0.25s)def test_ie_write_persists(self): """Test: Verify that write to IE (0xFFFF) persists.""" self.mmu.write(0xFFFF, 0x01) ie_read = self.mmu.read(0xFFFF) assert ie_read == 0x01, \ f"IE write does not persist: wrote 0x01, read 0x{ie_read:02X}" ie_write_count = self.mmu.get_ie_write_count() assert ie_write_count > 0, \ f"IE writes counter did not increment" - Native Validation: C++ compiled module validation
- test ROMs: tetris_dx.gbc and mario.gbc (240 frames each) with snapshots every 60 frames
- Automatic Decision: Analysis of collected data to identify dominant cause (IE writes lost or overwritten)
Automatic Decision
Based on data collected from tetris_dx.gbc and mario.gbc (frames 120 and 180):
tetris_dx.gbc
- Frame 120: IEWrite=1, EI=0, DI=1, IE=0x00, PC hotspots at 0x1304-0x1306
- Frame 180: IEWrite=7, EI=2, DI=4, IE=0x00, PC hotspots at 0x1308, 0x1302, 0x1303
- Cause: IE writes lost or overwritten + EI timing bug
- Evidence: IEWrite=7 but IE=0x00, EI=2 but IME=0, IOReads dominated by IF/IE (polling stuck)
mario.gbc
- Frame 120: IEWrite=42, EI=0, DI=0, IE=0x00, PC hotspots at 0x12A0, 0x129D, 0x12A2
- Frame 180: IEWrite=62, EI=0, DI=0, IE=0x00, PC hotspots at 0x12A0, 0x129D, 0x12A2
- Cause: IE writes lost or overwritten + EI never executed
- Evidence: IEWrite=62 but IE=0x00, EI=0, IOReads dominated by IF/IE (polling stuck)
Global Dominant Cause
IE writes lost or overwritten: Both games write to IE multiple times but IE remains at 0x00, indicating that the writes are lost or overwritten immediately. The game gets stuck in polling loops waiting for IE/IF to change.
Main Hypothesis: Some system component (possibly related to CGB or hardware mode) is overwriting IE (0xFFFF) after the game writes it, or the writes are not persisted correctly.
Sources consulted
- Pan Docs: Interrupts, Interrupt Enable Register (IE), EI/DI Instructions
- Pan Docs: CGB Registers (IE special behavior in CGB mode)
Educational Integrity
What I Understand Now
- IE writes tracking: IE/IF write counters allow you to determine if the game is trying to enable interrupts or is stuck early
- EI/DI tracking: EI/DI execution counters allow you to determine if the game attempts to enable IME
- IO polling detection: The IO readings watch allows you to identify which register the game is waiting for (LY/STAT, IF/IE, KEY1, etc.)
- PC hotspots: PC hotspots allow you to identify polling loops waiting for some record to change
What remains to be confirmed
- IE write persistence: Verify that writes to IE persist correctly in CGB mode (possible bug in IE mapping)
- The timing: Check EI timing in tetris_dx.gbc (EI executed but IME is still 0)
- Component that overrides IE: Identify which component overwrites IE after the game writes it
Hypotheses and Assumptions
Main Hypothesis: Some system component (possibly related to CGB or hardware mode) is overwriting IE (0xFFFF) after the game writes it. This could be:
- Bug in IE mapping in CGB mode (IE is mapped to another address or read from another source)
- Component that resets IE periodically (possibly related to boot ROM or initialization)
- Bug in persistence of writes to IE (writes are not saved correctly in memory)
Next Steps
- [ ] Step 0471: Add verbose writes logging to IE to identify which component overwrites IE
- [ ] Step 0471: Verify that writes to IE persist correctly in CGB mode
- [ ] Step 0471: Check EI timing in tetris_dx.gbc (EI executed but IME is still 0)