⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

PPU Phase B: Framebuffer and Rendering in C++

Date: 2025-12-19 StepID: 0124 State: Verified

Summary

After getting the Pygame window to appear and refresh at 60 FPS (Step 0123), the next critical step was to implement the framebuffer and pixel rendering in C++. The window was blank because, although the PPU was advancing correctly (LY cycled from 0 to 153), there was no actual graphics data to display.

This step implements Phase B of the PPU migration: a framebuffer that stores color indices (0-3) and a simplified renderer that generates a test gradient pattern. This makes it possible to verify that the entire data pipeline works correctly: CPU C++ → PPU C++ → Framebuffer C++ → Cython MemoryView → Python Pygame.

Expected result: A screen with a diagonal gradient pattern that updates at 60 FPS, confirming that the framebuffer is being written and displayed correctly. Once confirmed, the next step will be to replace the test gradient with the actual rendering of tiles.

Hardware Concept

On the real Game Boy, the PPU renders each scanline in real time, generating 160 pixels per line and 144 visible lines per frame. Each pixel is represented as a color index (0-3), which is mapped to a final color using the BGP palette (Background Palette, register 0xFF47).

Pixel format: The Game Boy uses a 2bpp (2 bits per pixel) format, allowing for 4 possible colors (0, 1, 2, 3). These indices are not direct RGB colors, but references to a configurable 4-color palette. The BGP register contains 4 pairs of bits that define which shade of gray (on the original Game Boy) corresponds to each index:

  • Bits 0-1: Color for index 0 (typically white)
  • Bits 2-3: Color for index 1 (typically light gray)
  • Bits 4-5: Color for index 2 (typically dark gray)
  • Bits 6-7: Color for index 3 (typically black)
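
As a worked example of the bit layout above, the following minimal sketch extracts the four 2-bit fields from a BGP value. The function name decode_bgp is hypothetical and not part of the project; 0xE4 and 0x1B are just two sample register values.

```python
def decode_bgp(bgp: int) -> list[int]:
    """Return the shade number (0-3) assigned to each color index 0-3."""
    # Index n occupies bits 2n and 2n+1 of the register.
    return [(bgp >> (2 * index)) & 0x03 for index in range(4)]


identity = decode_bgp(0xE4)  # 0b11_10_01_00
inverted = decode_bgp(0x1B)  # 0b00_01_10_11
print(identity)  # [0, 1, 2, 3]
print(inverted)  # [3, 2, 1, 0]
```

With 0xE4 every index maps to the shade of the same number; with 0x1B the mapping is reversed, so index 0 becomes the darkest shade.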

Advantages of the index format: Storing color indices (uint8_t) instead of full RGB colors (uint32_t ARGB) has several advantages:

  • Lower memory usage: 1 byte per pixel vs 4 bytes (75% reduction)
  • Flexibility: Changing the BGP palette updates all pixels without re-rendering
  • Efficiency: Conversion to RGB happens in one place (the Python renderer, at presentation time) rather than inside the C++ scanline loop
  • Zero-Copy: Python can read the indices directly from C++ memory without copies
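
The memory figure in the first bullet can be checked with a few lines of arithmetic. This is only a sketch; the constant names mirror those used in the C++ code but are redefined here for illustration.

```python
SCREEN_WIDTH, VISIBLE_LINES = 160, 144
pixels = SCREEN_WIDTH * VISIBLE_LINES  # pixels in one frame

index_bytes = pixels * 1  # one uint8_t per pixel
argb_bytes = pixels * 4   # one uint32_t per pixel
saving = 1 - index_bytes / argb_bytes

print(pixels, index_bytes, argb_bytes, f"{saving:.0%}")
# 23040 23040 92160 75%
```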

Source: Pan Docs - Background Palette (BGP), Tile Data, 2bpp Format

Implementation

The implementation was divided into 4 main components:

1. Framebuffer in C++ (PPU.hpp / PPU.cpp)

Changed the framebuffer from std::vector<uint32_t> (ARGB32) to std::vector<uint8_t> (color indices):

// In PPU.hpp
std::vector<uint8_t> framebuffer_;  // Color indices (0-3)

// In PPU.cpp (constructor)
framebuffer_(FRAMEBUFFER_SIZE, 0) // Initialize to index 0 (blank by default)

// Method to get pointer
uint8_t* get_framebuffer_ptr() {
    return framebuffer_.data();
}

2. Simplified Renderer (PPU.cpp)

A simplified render_scanline() method was implemented that generates a diagonal gradient test pattern:

void PPU::render_scanline() {
    if (ly_ >= VISIBLE_LINES) {
        return;
    }
    
    // Diagonal gradient pattern: (ly_ + x) % 4
    int line_start_index = static_cast<int>(ly_) * SCREEN_WIDTH;
    for (int x = 0; x < SCREEN_WIDTH; ++x) {
        framebuffer_[line_start_index + x] = static_cast<uint8_t>((ly_ + x) % 4);
    }
}

This method is called automatically when the PPU enters Mode 0 (H-Blank) within a visible line, ensuring that each line is rendered exactly once.
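
To know what to look for on screen, the same formula can be modelled in a few lines of Python. This is a reference sketch only, not project code; expected_scanline is a hypothetical helper.

```python
SCREEN_WIDTH = 160


def expected_scanline(ly: int) -> bytes:
    """Python model of the test pattern: index = (ly + x) % 4."""
    return bytes((ly + x) % 4 for x in range(SCREEN_WIDTH))


for ly in range(3):
    print(ly, list(expected_scanline(ly)[:8]))
# 0 [0, 1, 2, 3, 0, 1, 2, 3]
# 1 [1, 2, 3, 0, 1, 2, 3, 0]
# 2 [2, 3, 0, 1, 2, 3, 0, 1]
```

Each row is the previous one shifted left by one pixel, which is what produces the diagonal stripes.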

3. Zero-Copy Exposure to Python (ppu.pyx)

Updated the Cython wrapper to expose the framebuffer as a 1D memoryview of uint8_t:

@property
def framebuffer(self):
    """
    Gets the framebuffer as a color index memoryview (Zero-Copy).
    The framebuffer is organized in rows: pixel (y, x) is at index [y * 160 + x].
    """
    cdef uint8_t* ptr = self._ppu.get_framebuffer_ptr()
    cdef unsigned char[:] view = <unsigned char[:144*160]>ptr
    return view

Important note: Cython memoryviews do not support reshape() directly, so a 1D array is returned and Python calculates the index manually using [y * 160 + x].
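
The sketch below illustrates the 1D indexing on a stand-in buffer. It also shows one possible way to get a 2D view without copying, using the standard library's memoryview.cast(). This assumes the object returned by the framebuffer property can be wrapped with Python's built-in memoryview(), which has not been checked against this project; a bytearray is used here in its place.

```python
GB_WIDTH, GB_HEIGHT = 160, 144

# Stand-in for the buffer exposed by the Cython `framebuffer` property.
backing = bytearray(GB_WIDTH * GB_HEIGHT)
flat = memoryview(backing)

# Manual 1D indexing: pixel (y, x) lives at y * GB_WIDTH + x.
y, x = 5, 7
backing[y * GB_WIDTH + x] = 3
assert flat[y * GB_WIDTH + x] == 3

# Optional 2D view over the same memory (no copy is made).
grid = flat.cast("B", shape=[GB_HEIGHT, GB_WIDTH])
print(grid.shape, grid[y, x])  # (144, 160) 3
```

Because both views share the same memory, a write through the backing buffer is visible through the 2D view as well.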

4. Python Renderer (renderer.py)

Updated the render_frame() method to read the indices from the C++ framebuffer, apply the BGP palette, and render in Pygame:

# Get framebuffer as memoryview (Zero-Copy)
frame_indices = self.cpp_ppu.framebuffer # 1D array of 23040 elements

# Read and decode BGP palette
bgp = self.mmu.read_byte(IO_BGP) & 0xFF
palette = [
    PALETTE_GREYSCALE[(bgp >> 0) & 0x03],
    PALETTE_GREYSCALE[(bgp >> 2) & 0x03],
    PALETTE_GREYSCALE[(bgp >> 4) & 0x03],
    PALETTE_GREYSCALE[(bgp >> 6) & 0x03],
]

# Create surface and apply palette
frame_surface = pygame.Surface((GB_WIDTH, GB_HEIGHT))
with pygame.PixelArray(frame_surface) as pixels:
    for y in range(GB_HEIGHT):
        for x in range(GB_WIDTH):
            idx = y * GB_WIDTH + x # Calculate 1D index
            color_index = frame_indices[idx] & 0x03
            rgb_color = palette[color_index]
            pixels[x, y] = rgb_color
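
The loop above runs 23040 Python-level iterations per frame. If profiling shows this to be a bottleneck, one possible alternative is to convert the whole buffer with lookup tables. The sketch below uses only the standard library; indices_to_rgb is a hypothetical helper and the RGB values in PALETTE_GREYSCALE are illustrative stand-ins, since the project's actual table is not shown here.

```python
# Assumed shades, lightest to darkest (stand-in for the project's table).
PALETTE_GREYSCALE = [(255, 255, 255), (170, 170, 170), (85, 85, 85), (0, 0, 0)]


def indices_to_rgb(frame_indices, bgp: int) -> bytearray:
    """Convert 2-bit color indices to packed RGB bytes using lookup tables."""
    palette = [PALETTE_GREYSCALE[(bgp >> (2 * i)) & 0x03] for i in range(4)]
    raw = bytes(frame_indices)  # one copy of the index buffer
    rgb = bytearray(len(raw) * 3)
    for channel in range(3):
        # 256-entry table: byte value v -> this channel of palette[v & 3]
        table = bytes(palette[v & 0x03][channel] for v in range(256))
        rgb[channel::3] = raw.translate(table)
    return rgb


rgb = indices_to_rgb(bytes([0, 1, 2, 3]), 0xE4)
print(list(rgb))
# [255, 255, 255, 170, 170, 170, 85, 85, 85, 0, 0, 0]
```

The resulting bytes are in packed RGB order, so they could in principle be handed to pygame.image.frombuffer() with the "RGB" format. That path is untested in this project and gives up the zero-copy property for one frame-sized copy.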

Design Decisions

  • Indices vs RGB: Color indices were chosen to reduce memory and allow palette changes without re-rendering. Conversion to RGB happens in a single place, the Python renderer.
  • 1D memoryview: Although a 2D array would be more intuitive, Cython memoryviews do not support reshape. Manual calculation of the index [y * 160 + x] is trivial and is not expected to affect performance.
  • Test pattern: A simple diagonal gradient was implemented to verify that the framebuffer works before implementing the actual rendering of tiles. This allows the entire data chain to be validated without the additional complexity of tile decoding.

Affected Files

  • src/core/cpp/PPU.hpp - Changed framebuffer from uint32_t to uint8_t
  • src/core/cpp/PPU.cpp - Simplified render_scanline() implementation
  • src/core/cython/ppu.pxd - Updated get_framebuffer_ptr() signature
  • src/core/cython/ppu.pyx - Expose framebuffer as a uint8_t memoryview
  • src/gpu/renderer.py - Updated render_frame() to use indices and apply the palette

Tests and Verification

Compilation: The C++ extension compiled successfully without errors (only minor warnings for unused variables in legacy methods).

Expected result when running: When running python main.py tu_rom.gbc, you should see:

  • A Pygame window that refreshes at 60 FPS
  • A visible diagonal gradient pattern (not white screen)
  • The console logs show that LY cycles from 0 to 153 correctly
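
For reference, the "60 FPS" figure follows from the LY range. The arithmetic below uses the commonly documented DMG timings (456 dots per scanline, 4.194304 MHz clock); these values come from Pan Docs rather than from this project's code.

```python
CLOCK_HZ = 4_194_304   # DMG master clock
DOTS_PER_LINE = 456    # dots (clock ticks) per scanline
LINES_PER_FRAME = 154  # LY 0..153: 144 visible + 10 V-Blank

dots_per_frame = DOTS_PER_LINE * LINES_PER_FRAME
fps = CLOCK_HZ / dots_per_frame
print(dots_per_frame, round(fps, 2))  # 70224 59.73
```

The hardware rate is therefore slightly below 60 Hz, so a window capped at 60 FPS is a close approximation rather than an exact match.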

Compiled C++ module validation: The framebuffer is correctly exposed as a memoryview and Python can read the indices without copies (Zero-Copy).

Next validation step: Once the gradient pattern is confirmed to display correctly, the test code will be replaced with the actual rendering of Background, Window and Sprites from VRAM.

Sources consulted

Educational Integrity

What I Understand Now

  • Framebuffer with indices: Storing color indices (0-3) instead of full RGB colors reduces memory and allows dynamic palette changes without re-rendering.
  • Zero-Copy with Cython: Cython's memoryviews allow Python to directly access C++ memory without copies, essential for achieving 60 FPS without bottlenecks.
  • Separation of responsibilities: C++ takes care of the heavy computation (rendering scanlines), Python takes care of the presentation (applying the palette and displaying in Pygame).
  • Test pattern: Implementing a simple (gradient) pattern first makes it possible to validate the entire data chain before adding the complexity of actual rendering.

What remains to be confirmed

  • Actual performance: Verify that accessing the framebuffer via memoryview does not cause significant overhead in the rendering loop.
  • Actual rendering: Once the framebuffer is confirmed to work, implement the actual rendering of Background, Window and Sprites from VRAM.
  • Synchronization: Ensure that scanline rendering occurs at the correct time in the PPU cycle (Mode 0 H-Blank).
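
For the synchronization point, a small Python model can help reason about when Mode 0 is entered. This sketch assumes the nominal per-line timings from Pan Docs (80 dots of Mode 2, then a minimum of 172 dots of Mode 3, then H-Blank until dot 456); it does not reflect how the C++ PPU in this project is actually structured, and real Mode 3 length varies.

```python
VISIBLE_LINES = 144
LINES_PER_FRAME = 154
DOTS_PER_LINE = 456
OAM_SCAN_DOTS = 80  # Mode 2
DRAW_DOTS = 172     # Mode 3 (minimum length, assumed fixed here)


def ppu_mode(ly: int, dot: int) -> int:
    """Return the PPU mode for a given scanline and dot within the line."""
    if ly >= VISIBLE_LINES:
        return 1  # V-Blank
    if dot < OAM_SCAN_DOTS:
        return 2  # OAM scan
    if dot < OAM_SCAN_DOTS + DRAW_DOTS:
        return 3  # Drawing
    return 0      # H-Blank


def count_hblank_entries() -> int:
    """Count transitions into Mode 0 over one full frame."""
    entries, previous = 0, ppu_mode(0, 0)
    for ly in range(LINES_PER_FRAME):
        for dot in range(DOTS_PER_LINE):
            mode = ppu_mode(ly, dot)
            if mode == 0 and previous != 0:
                entries += 1  # a render_scanline() call would go here
            previous = mode
    return entries


print(count_hblank_entries())  # 144
```

Under these assumptions Mode 0 is entered exactly once per visible line, which matches the "rendered exactly once" expectation.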

Hypotheses and Assumptions

Main assumption: The diagonal gradient pattern (ly_ + x) % 4 will produce a distinctive visual pattern that makes it possible to verify that LY is advancing correctly and that the framebuffer is being written and displayed. If this pattern is displayed correctly, the entire data chain works and we can proceed with the actual rendering.
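
Besides checking the pattern by eye, the assumption can be tested programmatically. The sketch below is a hypothetical checker that accepts any indexable buffer of 23040 entries (for example the framebuffer memoryview) and reports the first pixel that deviates from (y + x) % 4. It is exercised here on synthetic buffers, not on the real PPU.

```python
SCREEN_WIDTH, VISIBLE_LINES = 160, 144


def first_mismatch(frame):
    """Return (y, x, got, expected) for the first wrong pixel, or None."""
    for y in range(VISIBLE_LINES):
        for x in range(SCREEN_WIDTH):
            expected = (y + x) % 4
            got = frame[y * SCREEN_WIDTH + x]
            if got != expected:
                return (y, x, got, expected)
    return None


# Synthetic frame that follows the pattern, and a copy with one bad pixel.
good = bytearray(
    (i // SCREEN_WIDTH + i % SCREEN_WIDTH) % 4
    for i in range(SCREEN_WIDTH * VISIBLE_LINES)
)
bad = bytearray(good)
bad[2 * SCREEN_WIDTH + 5] = 0  # expected value there is (2 + 5) % 4 == 3

print(first_mismatch(good))  # None
print(first_mismatch(bad))   # (2, 5, 0, 3)
```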

Next Steps

  • [ ] Verify that the gradient pattern is displayed correctly in the window
  • [ ] Confirm that LY cycles from 0 to 153 and that the framebuffer is updated at 60 FPS
  • [ ] Replace test code with actual Background rendering from VRAM
  • [ ] Implement Window and Sprite rendering
  • [ ] Optimize framebuffer access if necessary (profiling)