This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
PPU Phase B – Scanline and Framebuffer Rendering
Summary
Line-by-line (scanline) rendering was implemented in the C++ PPU, adding the ability to generate real pixels for the Background and Window. The framebuffer is exposed to Python as a memoryview (usable as a NumPy array) for zero-copy transfer to Pygame, which allows potential frame rates in the thousands of FPS instead of the 30 FPS limit of the pure Python implementation. This is Phase B of the PPU migration, which turns the timing-only unit into a complete pixel processing unit.
Hardware Concept
On the Game Boy, the PPU renders the screen line by line during Mode 3 (Pixel Transfer). Instead of drawing the entire screen at the end of a frame, the emulated PPU draws each line of 160 pixels when it enters H-Blank (Mode 0), just after completing Mode 3.
Scanline Rendering
Scanline rendering is the technique of rendering a complete line of pixels at once. In this implementation, it happens on entry to H-Blank, when the PPU has all the information necessary for that line. This approach matters for performance, since each call processes only the 160 pixels of one line instead of the entire screen (23,040 pixels) at the end of the frame.
Tiles format 2bpp
Game Boy tiles are stored in 2bpp format (2 bits per pixel), which allows 4 colors per tile. Each tile is a block of 8x8 pixels that occupies 16 bytes in VRAM:
- 2 bytes per line (one line = 8 pixels)
- Byte 1: Low bits of each pixel (bit 7 = pixel 0, bit 6 = pixel 1, ..., bit 0 = pixel 7)
- Byte 2: High bits of each pixel (same order)
- Pixel color = (high_bit << 1) | low_bit (values 0-3)
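The byte layout above can be sketched as a small decoder. This is an illustrative Python version, not the project's C++ decode_tile_line(); the function signature is an assumption made for this example.

```python
def decode_tile_line(low_byte: int, high_byte: int) -> list[int]:
    """Decode one 8-pixel tile row from its two 2bpp bytes into color indices 0-3."""
    pixels = []
    for x in range(8):
        bit = 7 - x  # bit 7 holds the leftmost pixel (pixel 0)
        low_bit = (low_byte >> bit) & 1
        high_bit = (high_byte >> bit) & 1
        pixels.append((high_bit << 1) | low_bit)
    return pixels


# One tile row: first byte (low bits) 0x3C, second byte (high bits) 0x7E.
row = decode_tile_line(0x3C, 0x7E)
print(row)  # [0, 2, 3, 3, 3, 3, 2, 0]
```

A full tile is eight such rows, read from 16 consecutive bytes.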
Background and Window
The screen is made up of two main layers:
- Background: Base layer that can be scrolled using SCX/SCY (X/Y scroll).
- Window: Opaque layer that is drawn above the Background but below the Sprites. Used for HUDs and fixed menus.
Both layers use the same tile format and the same BGP palette (Background Palette), but they can use different tilemaps depending on the LCDC bits.
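As a rough sketch of how the LCDC bits select tilemaps and tile data, following the Pan Docs description of LCDC: the helper names below are hypothetical and are not taken from PPU.cpp. The window coverage check is simplified and assumes the window is enabled.

```python
LCDC_BG_TILEMAP = 0x08      # bit 3: 0 = 0x9800, 1 = 0x9C00
LCDC_TILE_DATA = 0x10       # bit 4: 0 = signed IDs around 0x9000, 1 = unsigned IDs from 0x8000
LCDC_WINDOW_TILEMAP = 0x40  # bit 6: 0 = 0x9800, 1 = 0x9C00


def bg_tilemap_base(lcdc: int) -> int:
    """Base address of the Background tilemap."""
    return 0x9C00 if lcdc & LCDC_BG_TILEMAP else 0x9800


def window_tilemap_base(lcdc: int) -> int:
    """Base address of the Window tilemap."""
    return 0x9C00 if lcdc & LCDC_WINDOW_TILEMAP else 0x9800


def tile_data_address(lcdc: int, tile_id: int) -> int:
    """Address of the first byte of a tile, shared by Background and Window."""
    if lcdc & LCDC_TILE_DATA:
        return 0x8000 + tile_id * 16
    signed_id = tile_id - 256 if tile_id >= 128 else tile_id
    return 0x9000 + signed_id * 16


def window_covers(x: int, ly: int, wx: int, wy: int) -> bool:
    """True if screen pixel (x, ly) lies inside the window area.

    The window's top-left corner is at screen position (WX - 7, WY).
    """
    return ly >= wy and x >= wx - 7


print(hex(bg_tilemap_base(0x91)), hex(tile_data_address(0x91, 1)))
```

With this convention, a window placed at screen position (0, 0) corresponds to WX = 7 and WY = 0.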
Framebuffer and Zero-Copy
The framebuffer is a flat array of 160 * 144 = 23,040 pixels in ARGB32 format (32 bits per pixel).
By exposing it as a Cython memoryview, Python can access the C++ memory directly without copying data. This allows pygame.surfarray.blit_array() to transfer the entire framebuffer to a Pygame surface in a single block operation, resulting in extremely high transfer performance.
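The zero-copy idea can be shown with the standard library alone. In this sketch a bytearray stands in for the C++ std::vector<uint32_t>; the variable names are assumptions for this example, and the real binding lives in ppu.pyx.

```python
import sys

WIDTH, HEIGHT = 160, 144

# Stand-in for the C++ buffer: 4 bytes per pixel, owned by a single object.
backing = bytearray(WIDTH * HEIGHT * 4)

# View the same bytes as unsigned 32-bit integers. No pixel data is copied.
# "I" is the native unsigned int, which is 4 bytes on common platforms.
framebuffer = memoryview(backing).cast("I")

# A write through the view is visible in the backing storage.
framebuffer[0] = 0xFF000000
first_pixel = int.from_bytes(backing[0:4], sys.byteorder)

print(len(framebuffer), framebuffer.nbytes, hex(first_pixel))
```

Because both names refer to the same memory, the consumer never needs a per-frame copy of the 23,040 pixels just to read them.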
Source: Pan Docs - Background, Window, Tile Data, 2bpp Format, LCD Control Register
Implementation
Added the framebuffer and rendering methods to the PPU class in C++. The framebuffer is initialized to blank and updated line by line when the PPU enters H-Blank after completing Mode 3 (Pixel Transfer).
Components created/modified
- PPU.hpp: Added framebuffer_ (std::vector<uint32_t>), VRAM/tilemaps constants, render_scanline(), render_bg(), render_window(), decode_tile_line(), get_framebuffer_ptr() methods.
- PPU.cpp: Full implementation of scanline rendering with 2bpp decoding, scrolling, palettes, and tile cache optimization.
- ppu.pxd: Added the get_framebuffer_ptr() declaration and the uint32_t import.
- ppu.pyx: Added framebuffer property that exposes the framebuffer as a memoryview for Zero-Copy.
- tests/test_core_ppu_rendering.py: Suite of 4 tests to validate rendering of Background, scroll, Window, and framebuffer memoryview.
Design decisions
Rendering in H-Blank: Rendering is executed when the PPU enters Mode 0 (H-Blank) after completing Mode 3. A flag, scanline_rendered_, is used to avoid rendering the same line more than once if step() is called several times during the same H-Blank.
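A minimal sketch of the render-once guard, written in Python for illustration. The class name is hypothetical, and resetting the flag whenever the PPU leaves Mode 0 is an assumption; the log does not say where the C++ code resets it.

```python
class ScanlineGate:
    """Runs the per-line render at most once per H-Blank period."""

    def __init__(self) -> None:
        self.scanline_rendered = False
        self.render_count = 0  # stand-in for calls to render_scanline()

    def step(self, mode: int) -> None:
        if mode == 0:  # H-Blank
            if not self.scanline_rendered:
                self.render_count += 1
                self.scanline_rendered = True
        else:
            # Assumption: the flag is cleared as soon as the PPU leaves H-Blank.
            self.scanline_rendered = False


gate = ScanlineGate()
for mode in (3, 0, 0, 0, 2, 3, 0):
    gate.step(mode)
print(gate.render_count)  # 2: one render for each of the two H-Blank periods
```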
Pixel-by-pixel rendering: Although less efficient than rendering by tiles, we render pixel by pixel to correctly handle sub-tile scrolling (when SCX is not a multiple of 8). A cache was added to avoid decoding the same tile row multiple times within the same line.
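The following Python sketch shows one way pixel-by-pixel background rendering with a one-entry tile-row cache could look. It is not the project's render_bg(): the function name, the flat 8 KiB vram argument (offset 0 = address 0x8000), and the fixed unsigned 0x8000 tile addressing are all assumptions made to keep the example short.

```python
def render_bg_line(vram, ly, scx, scy, tilemap_base=0x9800):
    """Return 160 color indices (0-3) for scanline ly."""
    map_y = (ly + scy) & 0xFF          # the 256x256 background wraps around
    tile_row, row_in_tile = divmod(map_y, 8)
    cached_col = -1                    # tile column currently held in the cache
    cached_pixels = []
    line = []
    for x in range(160):
        map_x = (x + scx) & 0xFF
        tile_col, col_in_tile = divmod(map_x, 8)
        if tile_col != cached_col:     # decode only when a new tile starts
            tile_id = vram[tilemap_base - 0x8000 + tile_row * 32 + tile_col]
            addr = tile_id * 16 + row_in_tile * 2
            low, high = vram[addr], vram[addr + 1]
            cached_pixels = [
                (((high >> (7 - b)) & 1) << 1) | ((low >> (7 - b)) & 1)
                for b in range(8)
            ]
            cached_col = tile_col
        line.append(cached_pixels[col_in_tile])
    return line


vram = bytearray(0x2000)
vram[16:32] = b"\xFF" * 16   # tile 1: every pixel has color index 3
vram[0x1800 + 1] = 1         # tilemap entry (row 0, column 1) uses tile 1
print(render_bg_line(vram, ly=0, scx=3, scy=0)[:16])
```

With SCX = 3, the solid tile starts at screen pixel 5 rather than on an 8-pixel boundary, which is the case that tile-aligned rendering would get wrong.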
Fixed gray palette: A palette of 4 shades of gray is used (White, Light Gray, Dark Gray, Black), corresponding to the BGP values. This is enough for the original Game Boy (DMG). Color (CGB) support will be implemented in a later phase.
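A short sketch of the BGP lookup, assuming the register layout described in Pan Docs (two bits per color index, index 0 in the lowest bits). White and black follow the ARGB values used elsewhere in this entry; the two intermediate gray values are assumptions, since the log does not list them.

```python
# Shade 0..3 as ARGB32. The two middle grays are illustrative values.
SHADES_ARGB = (0xFFFFFFFF, 0xFFAAAAAA, 0xFF555555, 0xFF000000)


def apply_bgp(color_index: int, bgp: int) -> int:
    """Map a 2bpp color index (0-3) through BGP to an ARGB32 pixel."""
    shade = (bgp >> (color_index * 2)) & 0x03
    return SHADES_ARGB[shade]


# 0xE4 = 0b11100100 is the identity mapping: index n selects shade n.
print([hex(apply_bgp(i, 0xE4)) for i in range(4)])
```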
Zero-copy memoryview: The framebuffer is exposed as a memoryview of uint32_t, allowing Python to access the C++ memory directly without copies. This is critical for performance, since all 23,040 pixels (about 92 KB) can be handed to Pygame in a single operation.
Affected Files
- src/core/cpp/PPU.hpp - Added framebuffer and rendering methods
- src/core/cpp/PPU.cpp - Implementation of render_bg(), render_window(), decode_tile_line()
- src/core/cython/ppu.pxd - Added get_framebuffer_ptr()
- src/core/cython/ppu.pyx - Added framebuffer (memoryview) property
- tests/test_core_ppu_rendering.py - Complete suite of rendering tests
Tests and Verification
A complete unit test suite was created that validates line-by-line rendering:
- Unit tests: pytest, with 4 tests passing
Test: Simple Background Rendering
def test_bg_rendering_simple_tile(self):
    # Write an all-black tile to VRAM at 0x8000
    # Set the tilemap to use tile ID 0
    # Advance the PPU to H-Blank
    # Verify that the first pixel is black (0xFF000000)
Result: ✅ PASSED
Test: Background Scroll
def test_bg_rendering_scroll(self):
    # Create two tiles (black and white)
    # Set SCX=8 to scroll the background
    # Verify that the first visible pixel comes from the correct tile
Result: ✅ PASSED
Test: Window Rendering
def test_window_rendering(self):
    # Set up the Background with a black tile
    # Set up the Window with a white tile at (0,0)
    # Verify that the Window overwrites the Background
Result: ✅ PASSED
Test: Framebuffer as Memoryview
def test_framebuffer_memoryview(self):
    # Verify that the framebuffer can be converted to a numpy array
    # without copying data (Zero-Copy)
Result: ✅ PASSED
Command executed:
pytest tests/test_core_ppu_rendering.py -v
Result: 4 passed in 0.11s
Validation: Compiled C++ module validated with native tests.
Sources consulted
- Pan Docs: LCD Timing, Background, Window, Tile Data, 2bpp Format
- Pan Docs: LCD Control Register (LCDC), Background Palette (BGP)
- Cython Documentation: Memoryviews and Zero-Copy
Educational Integrity
What I Understand Now
- Scanline Rendering: Rendering line by line instead of frame by frame helps performance, since only 160 pixels are processed at a time instead of 23,040.
- 2bpp format: Tiles are stored with 2 bits per pixel, where each line of 8 pixels occupies 2 consecutive bytes (one for the low bits, one for the high bits).
- Zero-Copy Transfer: By exposing the C++ framebuffer as a memoryview, Python can access the memory directly without copying data, allowing extremely fast transfers to Pygame.
- Sub-tile scroll: SCX/SCY can scroll the background in 1-pixel increments, not just in steps of 8 pixels (full tiles), which requires rendering pixel by pixel to handle scrolling properly.
What remains to be confirmed
- Sprite Rendering: The sprites (OBJ) will be rendered in a later phase, applying the same scanline rendering techniques but with priority and transparency logic.
- Tile cache optimization: The current optimization is basic. It could be improved by caching entire tiles instead of just lines, or even by pre-decoding tiles when they are written to VRAM.
- CGB support: The current palette is gray. Color (CGB) support will require additional RGB555 palettes and VRAM banks.
Hypotheses and Assumptions
Rendering on entry to H-Blank is assumed to be the correct moment, since that is when the PPU has completed Mode 3 (Pixel Transfer) and has all the information needed for the line. This is consistent with actual hardware behavior, where the PPU draws during the pixel transfer period and "rests" during H-Blank.
Next Steps
- [ ] Sprite Rendering (OBJ) - Priority, Transparency, and Special Attributes
- [ ] More aggressive tile cache optimization
- [ ] Framebuffer integration with Pygame for real visualization
- [ ] CGB palette support (RGB555) for Game Boy Color
- [ ] Validation with real ROMs to verify correct rendering