⚠️ Clean-Room / Educational

This project is educational and open source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.

Migration of PPU (Timing and State) to C++

Date: 2025-12-19 StepID: 0111 State: Filled

Summary

The timing and state logic of the PPU (Pixel Processing Unit) was migrated to C++, implementing the state machine that manages the PPU modes (0-3), the LY register, and the V-Blank and STAT interrupts. This is Phase A of the PPU migration, focused on precise timing without pixel rendering (which will be Phase B). The implementation keeps all the critical timing logic from v0.0.1 but now runs as native code to avoid the Python-C++ context-switching bottleneck.

Hardware Concept

The Game Boy's PPU (Pixel Processing Unit) is responsible for generating the video signal and maintaining screen synchronization. In this first phase, we focus only on the timing engine, which is critical for accurate emulation at roughly 60 FPS.

Scanline Timing

The Game Boy screen has 144 visible lines (0-143) followed by 10 lines of V-Blank (144-153), for a total of 154 lines per frame. Each scanline takes exactly 456 T-Cycles (clock cycles), which gives a total of 154 × 456 = 70,224 T-Cycles per frame (~59.7 FPS).
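The arithmetic above can be written down as compile-time constants. This is a minimal sketch: the constant and function names are illustrative rather than taken from PPU.hpp, and the 4,194,304 Hz master clock is an assumption (the standard DMG figure, not stated in this entry).

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the text; names are illustrative, not the project's.
constexpr std::uint32_t kCyclesPerLine  = 456;
constexpr std::uint32_t kVisibleLines   = 144;
constexpr std::uint32_t kVBlankLines    = 10;
constexpr std::uint32_t kLinesPerFrame  = kVisibleLines + kVBlankLines;
constexpr std::uint32_t kCyclesPerFrame = kLinesPerFrame * kCyclesPerLine;

// Assumption: DMG master clock in Hz (not stated in this entry).
constexpr double kClockHz = 4194304.0;

static_assert(kLinesPerFrame == 154, "144 visible + 10 V-Blank lines");
static_assert(kCyclesPerFrame == 70224, "154 lines * 456 T-Cycles");

// Frame rate implied by the clock and the frame length (about 59.7).
constexpr double frames_per_second() {
    return kClockHz / kCyclesPerFrame;
}

// Scanline that a given cycle offset within one frame falls on.
inline std::uint32_t line_at(std::uint32_t cycle_in_frame) {
    assert(cycle_in_frame < kCyclesPerFrame);
    return cycle_in_frame / kCyclesPerLine;
}
```

For example, `line_at(456)` is the first cycle of line 1, and `line_at(70223)` is the last cycle of line 153.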

PPU modes

Each visible line (0-143) passes through three modes in sequence (2, 3, 0), each representing a phase of the rendering process. The V-Blank lines (144-153) stay in Mode 1 throughout:

  • Mode 2 (OAM Search): cycles 0-79 (80 cycles). The PPU searches OAM (Object Attribute Memory) for sprites. The CPU is locked out of OAM during this period.
  • Mode 3 (Pixel Transfer): cycles 80-251 (172 cycles). The PPU draws pixels by reading VRAM. The CPU is locked out of VRAM and OAM.
  • Mode 0 (H-Blank): cycles 252-455 (204 cycles). Horizontal blanking. The CPU can freely access VRAM.
  • Mode 1 (V-Blank): all of lines 144-153. Vertical blanking. The CPU can freely access VRAM during the entire V-Blank.
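The cycle ranges listed above map directly to a small lookup function. The following is a sketch under the assumption of the fixed boundaries used in this entry (80 and 252); the enum and function names are hypothetical and do not come from PPU.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Mode numbers as they appear in STAT bits 0-1.
enum class PpuMode : std::uint8_t {
    HBlank = 0,
    VBlank = 1,
    OamSearch = 2,
    PixelTransfer = 3
};

constexpr int kMode2End        = 80;   // Mode 2 covers cycles 0-79
constexpr int kMode3End        = 252;  // Mode 3 covers cycles 80-251
constexpr int kCyclesPerLine   = 456;  // Mode 0 covers cycles 252-455
constexpr int kVBlankStartLine = 144;
constexpr int kLastLine        = 153;

// Returns the mode for scanline `ly` at cycle offset `line_cycle` in that line.
inline PpuMode mode_for(int ly, int line_cycle) {
    assert(ly >= 0 && ly <= kLastLine);
    assert(line_cycle >= 0 && line_cycle < kCyclesPerLine);
    if (ly >= kVBlankStartLine) return PpuMode::VBlank;  // whole line is Mode 1
    if (line_cycle < kMode2End) return PpuMode::OamSearch;
    if (line_cycle < kMode3End) return PpuMode::PixelTransfer;
    return PpuMode::HBlank;
}
```

Checking the line number first keeps the V-Blank lines from ever reporting modes 2, 3, or 0, regardless of the cycle offset.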

Critical Registers

  • LY (0xFF44): Current line (0-153). Read only from software.
  • LYC (0xFF45): Line comparator. When LY == LYC, STAT bit 2 is set.
  • STAT (0xFF41): LCD status. Bits 0-1 = current mode, bit 2 = LYC match, bits 3-6 = interrupt enables.
  • LCDC (0xFF40): LCD control. Bit 7 = LCD enabled (if off, PPU stops and LY=0).
  • IF (0xFF0F): Interrupt flags. Bit 0 = V-Blank, bit 1 = STAT.
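As an illustration of the bit layout above, the sketch below rebuilds STAT from the current mode and the LYC comparison while leaving the interrupt-enable bits untouched. The helper names are hypothetical; only the addresses and bit positions come from the list.

```cpp
#include <cassert>
#include <cstdint>

// Register addresses from the list above.
constexpr std::uint16_t kRegLCDC = 0xFF40;
constexpr std::uint16_t kRegSTAT = 0xFF41;
constexpr std::uint16_t kRegLY   = 0xFF44;
constexpr std::uint16_t kRegLYC  = 0xFF45;
constexpr std::uint16_t kRegIF   = 0xFF0F;

constexpr std::uint8_t kLcdcEnable   = 0x80;  // LCDC bit 7
constexpr std::uint8_t kStatModeMask = 0x03;  // STAT bits 0-1
constexpr std::uint8_t kStatLycMatch = 0x04;  // STAT bit 2

inline bool lcd_enabled(std::uint8_t lcdc) {
    return (lcdc & kLcdcEnable) != 0;
}

// Replaces STAT bits 0-2 with the current mode and LYC result.
// Bits 3 and above (the interrupt enables) are preserved as written.
inline std::uint8_t compose_stat(std::uint8_t old_stat, std::uint8_t mode,
                                 bool lyc_match) {
    assert(mode <= 3);
    std::uint8_t stat = static_cast<std::uint8_t>(old_stat & 0xF8);
    stat = static_cast<std::uint8_t>(stat | (mode & kStatModeMask));
    if (lyc_match) {
        stat = static_cast<std::uint8_t>(stat | kStatLycMatch);
    }
    return stat;
}
```

For instance, with only the LYC enable bit set (0x40), Mode 3 and a matching LY give `compose_stat(0x40, 3, true) == 0x47`.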

Source: Pan Docs - LCD Timing, V-Blank, STAT Register, LCD Control Register

Implementation

A PPU class was created in C++ that replicates all the timing logic of the Python implementation but runs as native code. The PPU uses dependency injection (it receives a pointer to the MMU) to access I/O registers and request interrupts.

Components created/modified

  • PPU.hpp: Declaration of the PPU class with timing constants and public/private methods.
  • PPU.cpp: Complete implementation of the timing engine, plus mode and interrupt management.
  • ppu.pxd: Cython definitions for the C++ class.
  • ppu.pyx: Python wrapper that exposes PyPPU with properties and methods.
  • native_core.pyx: Includes ppu.pyx to make PyPPU available from viboy_core.
  • setup.py: Added PPU.cpp to the list of build sources.
  • tests/test_core_ppu_timing.py: Complete suite of 8 tests to validate the native implementation.

Design decisions

  • Dependency injection: The PPU receives a pointer to the MMU in its constructor. It does not own the MMU, it only uses it, which avoids ownership problems and allows the existing MMU to be shared with other components.
  • Cycle type: The step() method takes an int rather than a small type (uint8_t, uint16_t) to avoid overflow, because the tests advance thousands of cycles at a time to simulate complete frames.
  • Internal clock as uint32_t: The internal counter clock_ is a uint32_t so that it can accumulate the ~70K cycles of a frame without overflow. It was initially a uint16_t, which tops out at 65,535 (below the 70,224 cycles of one frame) and caused subtle bugs when multiple lines were processed at once.
  • STAT management: The PPU reads and writes STAT directly using mmu_->read/write(). The Python version had special methods (write_byte_internal) to avoid recursion, but in C++ the MMU is simple and does not have that complexity.
  • Rising-edge detection: STAT interrupts fire only on a rising edge (when the condition goes from false to true), tracked by the flag stat_interrupt_line_. This prevents multiple interrupts on the same line.
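The first three decisions can be seen together in a reduced sketch. This is not the project's PPU.cpp: the `Mmu` struct is a hypothetical flat-memory stand-in, the class name and `ly()` accessor are invented for illustration, and resetting the counters while the LCD is off is an assumption about how "LY stays at 0" is achieved.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the project's MMU: flat 64 KiB, no side effects.
struct Mmu {
    std::array<std::uint8_t, 0x10000> mem{};
    std::uint8_t read(std::uint16_t addr) const { return mem[addr]; }
    void write(std::uint16_t addr, std::uint8_t value) { mem[addr] = value; }
};

// A uint16_t cannot hold one frame of cycles: 70,224 wraps to 4,688.
static_assert(static_cast<std::uint16_t>(70224u) == 4688, "uint16_t wraps");

class PpuTimingSketch {
public:
    // Non-owning pointer: the caller keeps the MMU alive.
    explicit PpuTimingSketch(Mmu* mmu) : mmu_(mmu) { assert(mmu_ != nullptr); }

    // `cycles` is an int so callers can advance whole frames in one call.
    void step(int cycles) {
        assert(cycles >= 0);
        if ((mmu_->read(0xFF40) & 0x80) == 0) {  // LCDC bit 7 clear: LCD off
            clock_ = 0;
            ly_ = 0;
            return;
        }
        clock_ += static_cast<std::uint32_t>(cycles);
        while (clock_ >= 456) {                  // one scanline per 456 cycles
            clock_ -= 456;
            ++ly_;
            if (ly_ > 153) ly_ = 0;              // frame wrap-around
            if (ly_ == 144) {                    // entering V-Blank: IF bit 0
                const std::uint8_t flags = mmu_->read(0xFF0F);
                mmu_->write(0xFF0F, static_cast<std::uint8_t>(flags | 0x01));
            }
        }
    }

    std::uint8_t ly() const { return ly_; }

private:
    Mmu* mmu_;
    std::uint32_t clock_ = 0;  // wide enough for multi-frame steps
    std::uint8_t ly_ = 0;
};
```

A typical use is to construct an `Mmu`, write 0x80 to 0xFF40 to enable the LCD, and call `step(70224)` to advance exactly one frame, after which `ly()` is back where it started.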

Critical technical details

  • LCD enabled check: If bit 7 of LCDC is 0, the PPU stops completely: it does not accumulate cycles or advance lines. LY stays at 0.
  • Frame wrap-around: When LY would exceed 153, it resets to 0 and a new frame starts.
  • V-Blank interrupt: Requested when LY reaches 144, by setting bit 0 of IF. This ALWAYS happens, regardless of the IME state (which allows manual polling).
  • STAT interrupt: Requested according to 4 conditions (LYC match, and the Mode 0/1/2 enables), by setting bit 1 of IF. It fires only on a rising edge.
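The rising-edge rule for the STAT interrupt can be sketched as a small state holder. The class and constant names are hypothetical; the per-source enable bits (bit 3 for Mode 0, bit 4 for Mode 1, bit 5 for Mode 2, bit 6 for LYC) follow the Pan Docs STAT layout rather than anything spelled out in this entry.

```cpp
#include <cassert>
#include <cstdint>

// STAT interrupt-enable bits (Pan Docs layout).
constexpr std::uint8_t kStatHBlankEnable = 0x08;  // bit 3: Mode 0
constexpr std::uint8_t kStatVBlankEnable = 0x10;  // bit 4: Mode 1
constexpr std::uint8_t kStatOamEnable    = 0x20;  // bit 5: Mode 2
constexpr std::uint8_t kStatLycEnable    = 0x40;  // bit 6: LY == LYC

class StatLine {
public:
    // Recomputes the combined condition and returns true only when it goes
    // from false to true. The caller would then set bit 1 of IF.
    bool update(std::uint8_t stat, std::uint8_t mode, bool lyc_match) {
        assert(mode <= 3);
        const bool level =
            (lyc_match && (stat & kStatLycEnable) != 0) ||
            (mode == 0 && (stat & kStatHBlankEnable) != 0) ||
            (mode == 1 && (stat & kStatVBlankEnable) != 0) ||
            (mode == 2 && (stat & kStatOamEnable) != 0);
        const bool rising = level && !line_;
        line_ = level;  // remember the level for the next comparison
        return rising;
    }

private:
    bool line_ = false;  // plays the role of stat_interrupt_line_
};
```

Because the four sources are OR-ed into one level, a condition that stays true across several calls (for example an LYC match that lasts the whole line) produces a single request rather than one per call.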

Affected Files

  • src/core/cpp/PPU.hpp - PPU class declaration
  • src/core/cpp/PPU.cpp - Timing engine implementation
  • src/core/cython/ppu.pxd - Cython definitions
  • src/core/cython/ppu.pyx - Python wrapper
  • src/core/cython/native_core.pyx - Includes ppu.pyx
  • setup.py - Added PPU.cpp to build
  • tests/test_core_ppu_timing.py - Test suite (8 tests, all passing)

Tests and Verification

A complete test suite was created that validates all critical aspects of the timing:

  • test_ly_increment: Verifies that LY is incremented correctly after 456 T-Cycles.
  • test_ly_increment_partial: Verifies that LY does not change with less than 456 cycles.
  • test_vblank_trigger: Validates that the V-Blank interrupt is triggered when LY == 144.
  • test_frame_wrap: Verifies that LY is reset to 0 after line 153.
  • test_ppu_modes: Validates that PPU modes are updated correctly based on timing.
  • test_lyc_match_stat_interrupt: Checks that the STAT interrupt is requested when LY == LYC.
  • test_lcd_disabled: Validates that the PPU stops when the LCD is disabled.
  • test_multiple_frames: Verifies processing of multiple complete frames.

Result: ✅ All 8 tests pass after fixing the overflow of clock_ (changed from uint16_t to uint32_t).

Sources consulted

Educational Integrity

What I Understand Now

  • Accurate timing is critical: The PPU timing engine must be extremely accurate because games rely on timing to refresh graphics during V-Blank. An error of a single cycle can cause visual glitches or synchronization failures.
  • Subtle overflow: Small types (uint16_t) can cause subtle bugs when many cycles are processed at once. Changing the internal clock to uint32_t resolved an issue where LY would reset incorrectly after processing full frames.
  • Separation of responsibilities: The C++ PPU only manages timing and state. The rendering of pixels will come in Phase B. This separation allows the timing to be validated before adding complexity.

What remains to be confirmed

  • Actual performance: Although the code is now native, we have not yet measured the real impact on FPS when the PPU is integrated with the native CPU. The next phase will be to connect PPU and CPU in the main loop.
  • Python MMU support: The C++ PPU uses the C++ MMU directly. When we integrate it with the complete system, we will need to ensure that both MMUs (Python and C++) stay synchronized, or migrate completely to the C++ MMU.

Hypotheses and Assumptions

The timing logic migrated from Python is assumed to be correct, as it was extensively validated in v0.0.1. The tests confirmed that the behavior is identical. The only difference is that it now runs as native code, eliminating the Python overhead while keeping the same logic.

Next Steps

  • [ ] Phase B: Pixel Rendering - Implement the rendering engine in C++ that generates the framebuffer pixel by pixel. This will include tile decoding and rendering of the background, window, and sprites.
  • [ ] Integration with main loop - Connect the native PPU with the native CPU in the main emulation loop to measure the real impact on performance.
  • [ ] MMU Synchronization - Work out how to sync the Python MMU with the C++ MMU when both components need memory access, or decide to migrate completely to the C++ MMU.