⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

Engine Stabilization and HRAM Audit

Date: 2025-12-25 StepID: 0287 State: VERIFIED

Summary

Critical refactoring of the emulation core with four changes: the static variables that caused interference between pytest tests were eliminated; the timing bug in run_scanline() that truncated the -1 (HALT) value was fixed; the V-Blank handler log now filters out the delay loop in HRAM; and a write monitor for HRAM was added to understand the shadow routines that games copy there.

Hardware Concept

HRAM (High RAM) and Shadow Routines: HRAM is a 127-byte area (0xFF80-0xFFFE) on the Game Boy that is accessible in all memory cycles, unlike other areas that may be blocked during DMA operations or VRAM access. Games often copy critical routines (such as interrupt handler helpers or delay loops) to HRAM so that they can keep executing while other memory is blocked, for example during an OAM DMA transfer. These "shadow" routines are copies of code that run from HRAM rather than from ROM or regular RAM.

HALT and CPU Cycles: When the CPU enters the HALT state, it stops executing instructions but the system clock continues to run. The CPU wakes up when an interrupt is pending. In our emulator, step() returns -1 while the CPU is in HALT to signal "fast forward". run_scanline() stored that result in a uint8_t, which cannot represent -1: the value wrapped around to 255 and broke the cycle calculation.
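The wrap-around can be reproduced in isolation. The following is a minimal sketch, not the emulator's code: step_halted() is a hypothetical stand-in for step() returning the HALT sentinel, and the two helpers only differ in the type used to store the result.

```cpp
#include <cstdint>

// Hypothetical stand-in for CPU::step() while the CPU is in HALT.
int step_halted() { return -1; }

// Storing the result in uint8_t converts -1 modulo 256, so the sentinel is lost.
int cycles_seen_as_uint8() {
    std::uint8_t m_cycles = static_cast<std::uint8_t>(step_halted());
    return m_cycles;  // 255
}

// Storing the result in int keeps the sentinel, so a `<= 0` check still fires.
int cycles_seen_as_int() {
    int m_cycles = step_halted();
    return m_cycles;  // -1
}
```

With the uint8_t version, a check such as m_cycles <= 0 never triggers for HALT, and the loop accounts for 255 M-Cycles instead of treating the value as a special state.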

State Isolation between Tests: Static variables in C++ persist between function calls and are shared by every instance for the lifetime of the process, which means that the state left by one test can "contaminate" the next. This is especially problematic in pytest, where multiple tests run in the same session. By moving these variables to class members, each CPU instance has its own isolated state.
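The difference can be shown with two toy types. This is an illustrative sketch only: StaticCounterCpu and MemberCounterCpu are hypothetical names, and the counter merely mimics the role of a handler step counter.

```cpp
// Hypothetical CPU with a function-local static: one counter for the whole process.
struct StaticCounterCpu {
    int step() {
        static int handler_step_count = 0;  // shared by every instance
        return ++handler_step_count;
    }
};

// Hypothetical CPU with a member: one counter per instance.
struct MemberCounterCpu {
    int step() { return ++handler_step_count_; }

private:
    int handler_step_count_ = 0;  // reset whenever a new instance is created
};
```

Two fresh StaticCounterCpu objects continue each other's count, which is the kind of leakage a second test would observe; two MemberCounterCpu objects each start from zero.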

Source: Pan Docs - "HRAM (High RAM)", "CPU Instruction Set - HALT"

Implementation

Four main changes were made to stabilize the engine and improve instrumentation:

1. Refactoring from Static Variables to Class Members

Moved static instrumentation variables (in_vblank_handler, handler_step_count, post_delay_trace_active, post_delay_count) to private members of the CPU class. This ensures that each CPU instance has its own isolated state, eliminating interference between tests.

Code added in CPU.hpp:

// ========== Diagnostic State (Step 0287) ==========
// These members replace static variables to isolate state between tests
bool in_vblank_handler_;       // True while the V-Blank handler is executing
int vblank_handler_steps_;     // Step counter inside the handler
bool post_delay_trace_active_; // Enables the post-delay trace
int post_delay_count_;         // Number of instructions traced after the delay

Modified code in CPU.cpp (constructor):

CPU::CPU(MMU* mmu, CoreRegisters* registers)
    : mmu_(mmu), regs_(registers), ppu_(nullptr), timer_(nullptr), cycles_(0), 
      ime_(false), halted_(false), ime_scheduled_(false),
      in_vblank_handler_(false), vblank_handler_steps_(0), 
      post_delay_trace_active_(false), post_delay_count_(0) {
    // Step 0287: Initialization of diagnostic members
}

2. Correction of the Timing Bug in run_scanline()

Changed the type of m_cycles from uint8_t to int in run_scanline() to correctly handle the -1 value returned by step() when the CPU is in HALT. The uint8_t type cannot represent -1, so the value wrapped around to 255 and broke the cycle calculation.

Modified code in CPU.cpp:

// Fine-grained emulation loop: execute instructions until 456 T-Cycles are accumulated
while (cycles_this_scanline < CYCLES_PER_SCANLINE) {
    // Execute ONE instruction and get the M-Cycles it consumed
    // --- Step 0287: Use int so that -1 (HALT) is handled correctly ---
    int m_cycles = step();

    // If step() returns 0, there is an error (unimplemented opcode or similar)
    // If step() returns -1, the CPU is in HALT (fast forward)
    // In both cases, force a minimum advance to avoid infinite loops
    if (m_cycles <= 0) {
        m_cycles = 1;  // Force a minimum advance (1 M-Cycle = 4 T-Cycles)
    }
    // ... rest of the code ...
}
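The effect of the clamp can be checked with a standalone version of the loop. This is a sketch under assumptions: run_scanline_sketch is a hypothetical free function, the step parameter stands in for CPU::step(), and the accumulation as m_cycles * 4 is assumed from the "1 M-Cycle = 4 T-Cycles" comment because the rest of the real loop is elided above.

```cpp
#include <functional>

constexpr int CYCLES_PER_SCANLINE = 456;  // T-Cycles per scanline

// Runs instructions until one scanline's worth of T-Cycles has elapsed.
// Returns the T-Cycles actually accumulated (may overshoot 456).
int run_scanline_sketch(const std::function<int()>& step) {
    int cycles_this_scanline = 0;
    while (cycles_this_scanline < CYCLES_PER_SCANLINE) {
        int m_cycles = step();     // int keeps -1 (HALT) and 0 (error) intact
        if (m_cycles <= 0) {
            m_cycles = 1;          // minimum advance: 1 M-Cycle
        }
        cycles_this_scanline += m_cycles * 4;  // assumed M-Cycle to T-Cycle conversion
    }
    return cycles_this_scanline;
}
```

Under these assumptions, a CPU that stays in HALT for the whole scanline advances in 114 iterations of 4 T-Cycles each, so the loop always terminates.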

3. V-Blank Handler Log Optimization

Added a filter to exclude the DEC A/JR NZ delay loop in HRAM (0xFF86-0xFF87) from the handler log. This loop is common in V-Blank handlers and generates thousands of log lines without providing useful information, saturating the output.

Modified code in CPU.cpp:

// Trace instructions inside the handler
// --- Step 0287: Filter delay loop in HRAM (0xFF86-0xFF87) to reduce noise ---
if (in_vblank_handler_ && vblank_handler_steps_ < 500) {
    uint8_t op = mmu_->read(original_pc);
    
    // Filter the DEC A / JR NZ delay loop in HRAM to avoid saturating logs
    // This loop is common in V-Blank handlers and does not provide useful information
    if (original_pc < 0xFF86 || original_pc > 0xFF87) {
        printf("[HANDLER-EXEC] PC:0x%04X OP:0x%02X | A:0x%02X HL:0x%04X | IME:%d\n",
               original_pc, op, regs_->a, regs_->get_hl(), ime_ ? 1 : 0);
    }
    vblank_handler_steps_++;
    // ... RET/RETI detection ...
}

4. HRAM Write Monitor

Implemented a monitor ([HRAM-WRITE]) that detects all writes to HRAM (0xFF80-0xFFFE). This monitor helps understand when and what code games copy to HRAM, which is critical to understanding shadow routines that run from there.

Code added in MMU.cpp:

// --- Step 0287: HRAM Write Monitor ([HRAM-WRITE]) ---
// HRAM (High RAM) is a 127-byte area (0xFF80-0xFFFE) used for high-speed routines.
// Games often copy critical routines (like interrupt handler helpers) to HRAM
// so they can keep running, since HRAM is accessible in all memory cycles.
// This monitor detects writes to HRAM to understand when and what is copied there.
// Source: Pan Docs - "HRAM (High RAM)": 0xFF80-0xFFFE, accessible in all cycles
if (addr >= 0xFF80 && addr <= 0xFFFE) {
    static int hram_write_count = 0;
    if (hram_write_count < 200) {  // Limit to avoid flooding the log
        printf("[HRAM-WRITE] Write %04X=%02X PC:%04X (Bank:%d)\n",
               addr, value, debug_current_pc, current_rom_bank_);
        hram_write_count++;
    }
}
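Note that hram_write_count above is itself a function-local static, so the 200-write budget is shared across MMU instances and across tests. A per-instance alternative could look like the sketch below. HramWriteMonitor is a hypothetical helper, not part of the project's MMU, and it omits the ROM bank field for brevity.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical per-instance monitor: each owner gets its own log budget.
class HramWriteMonitor {
public:
    explicit HramWriteMonitor(int limit = 200) : limit_(limit) {}

    // HRAM spans 0xFF80-0xFFFE (0xFFFF is the IE register, not HRAM).
    static bool is_hram(std::uint16_t addr) {
        return addr >= 0xFF80 && addr <= 0xFFFE;
    }

    // Logs the write if it targets HRAM and the budget is not exhausted.
    // Returns true when a line was printed.
    bool on_write(std::uint16_t addr, std::uint8_t value, std::uint16_t pc) {
        if (!is_hram(addr) || logged_ >= limit_) {
            return false;
        }
        std::printf("[HRAM-WRITE] Write %04X=%02X PC:%04X\n",
                    static_cast<unsigned>(addr), static_cast<unsigned>(value),
                    static_cast<unsigned>(pc));
        ++logged_;
        return true;
    }

    int logged() const { return logged_; }

private:
    int limit_;
    int logged_ = 0;
};
```

Holding such an object as a member of the MMU would give every test a fresh budget, in line with the state isolation applied to the CPU in this step.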

Design Decisions

  • State Isolation: We chose to move the variables to class members instead of using a separate test context because it is cleaner and keeps the state encapsulated within the CPU instance.
  • Selective Filtering: The delay loop filter only excludes the range 0xFF86-0xFF87, allowing other instructions in HRAM to be logged normally. This balances the usefulness of the log with readability.
  • HRAM Monitor Limit: A limit of 200 writes was set for the HRAM monitor to avoid saturation; this should still be enough to capture most shadow routine copies.

Affected Files

  • src/core/cpp/CPU.hpp - Added private members for diagnostic state
  • src/core/cpp/CPU.cpp - Refactoring of static variables, type correction in run_scanline(), handler log optimization
  • src/core/cpp/MMU.cpp - Implementation of the [HRAM-WRITE] monitor

Tests and Verification

The refactoring was validated by:

  • Unit tests: Execution of pytest tests/ -v to verify that the tests pass without interfering with each other.
  • Compiled C++ module validation: Successful rebuild of the Cython extension with python setup.py build_ext --inplace.
  • Log verification: Confirmation that the delay loop filter significantly reduces noise in the logs without losing relevant information.

Command executed:

pytest tests/ -v

Expected result: All tests pass without interference errors between tests.

Test code (example of test that validates isolation):

def test_cpu_isolation():
    """Verify that multiple CPU instances do not interfere with each other."""
    mmu1 = MMU()
    regs1 = CoreRegisters()
    cpu1 = CPU(mmu1, regs1)
    
    mmu2 = MMU()
    regs2 = CoreRegisters()
    cpu2 = CPU(mmu2, regs2)
    
    # Each CPU must have its own isolated state
    assert cpu1.get_cycles() == 0
    assert cpu2.get_cycles() == 0
    #...more checks...

Sources consulted

  • Pan Docs: https://gbdev.io/pandocs/ - "HRAM (High RAM)", "CPU Instruction Set - HALT"
  • Technical documentation: Implementation based on general knowledge of LR35902 architecture and behavior of static variables in C++

Educational Integrity

What I Understand Now

  • HRAM Shadow Routines: Games copy critical routines to HRAM because it is accessible in all memory cycles, unlike ROM or regular RAM, which can be blocked during DMA or VRAM access.
  • State Isolation: Static variables in C++ persist between calls, which can cause interference between tests. Moving them to class members ensures that each instance has its own state.
  • HALT and Data Types: The value -1 cannot be represented in uint8_t and wraps around to 255. Using int makes it possible to handle negative values that indicate special states.

What remains to be confirmed

  • Exact behavior of HRAM: We need to check whether HRAM is really accessible in all cycles or whether there are specific restrictions during certain operations.
  • Impact on diagnosis: Check whether filtering the handler log has any negative impact on diagnosing problems.

Hypotheses and Assumptions

We assume that the delay loop at 0xFF86-0xFF87 is always a simple DEC A/JR NZ loop and contains no critical logic. If this loop has variations or contains important logic, the filter could hide valuable information.
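This assumption could be checked at runtime before the filter is applied. The sketch below is one possible check, not the project's code: is_dec_a_jr_nz_loop is a hypothetical name and the read parameter stands in for the MMU read function. The opcode values are the standard LR35902 encodings (DEC A = 0x3D, JR NZ,e8 = 0x20), and an offset of -3 from the instruction that follows at 0xFF89 jumps back to 0xFF86.

```cpp
#include <cstdint>
#include <functional>

constexpr std::uint8_t OP_DEC_A = 0x3D;  // DEC A
constexpr std::uint8_t OP_JR_NZ = 0x20;  // JR NZ,e8 (followed by a signed offset byte)

// Returns true if 0xFF86-0xFF88 hold "DEC A; JR NZ,-3", i.e. a loop back to 0xFF86.
// `read` is a hypothetical stand-in for reading a byte from emulated memory.
bool is_dec_a_jr_nz_loop(const std::function<std::uint8_t(std::uint16_t)>& read) {
    const auto offset = static_cast<std::int8_t>(read(0xFF88));
    return read(0xFF86) == OP_DEC_A
        && read(0xFF87) == OP_JR_NZ
        && offset == -3;  // 0xFF89 + (-3) == 0xFF86
}
```

If the check fails, the filter could be skipped so that an unexpected routine at those addresses still appears in the handler log.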

Next Steps

  • [ ] Analyze the [HRAM-WRITE] logs to understand what routines Pokémon Red copies to HRAM
  • [ ] Verify that the V-Blank handler completes correctly after the delay loop
  • [ ] Investigate why the game sets BGP to 0x00 and does not restore it
  • [ ] Continue debugging the white-screen issue in Pokémon Red