This project is educational and Open Source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.
PPU Phase C: Real Tiles Rendering from VRAM
Summary
After the success of Phase B, which confirmed that the framebuffer and rendering pipeline work correctly by displaying a test pattern at 60 FPS, this step implements the actual rendering of Background tiles from VRAM. To make this possible, the indirect memory write instructions LDI (HL), A (0x22) and LDD (HL), A (0x32) were also implemented, alongside the already existing LD (HL), A (0x77).
Tile rendering reads the graphics data that the game's code writes to VRAM during initialization, decodes the tiles from 2bpp (2 bits per pixel) format, and draws them into the framebuffer, applying scroll (SCX/SCY) and respecting the LCDC register settings (tilemap base, tile data base, signed/unsigned addressing).
This is the step that turns the test engine into a true visual emulator: the PPU can now display actual game graphics instead of test patterns.
Hardware Concept
Background rendering on the Game Boy works through a system of tilemaps and tile data. Games store their graphics as 8x8-pixel tiles in VRAM and then arrange these tiles in a 32x32 tilemap, which forms a 256x256-pixel background of which the 160x144 screen shows a portion.
VRAM structure
The VRAM (Video RAM) of the Game Boy is 8KB (0x8000-0x9FFF) and is organized into two main regions:
- Tile Data (0x8000-0x97FF): contains the graphic data of the tiles in 2bpp format (16 bytes per tile)
- Tile Maps (0x9800-0x9BFF and 0x9C00-0x9FFF): two 32x32 maps of tile indices that form the Background
Tile format (2bpp)
Each tile occupies 16 bytes (8 lines × 2 bytes per line). Each pixel is encoded in 2 bits:
- Byte 1: Low bits of each pixel (LSB)
- Byte 2: High bits of each pixel (MSB)
- Pixel color: (high_bit << 1) | low_bit → values 0-3 (palette index)
Pixels are read from left to right, with bit 7 being the leftmost pixel.
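The decoding rule above can be sketched as a small function that turns the two bytes of one tile row into eight color indices. This is a minimal illustration (the name `decode_tile_row` is hypothetical, not taken from PPU.cpp); the sample bytes 0x3C/0x7E are the first row of the example tile in Pan Docs.

```python
def decode_tile_row(low: int, high: int) -> list[int]:
    """Decode one 8-pixel tile row from its two 2bpp bytes.

    low  -- first byte of the row: bit 0 of each pixel's color index
    high -- second byte of the row: bit 1 of each pixel's color index
    Returns 8 color indices (0-3), leftmost pixel first.
    """
    pixels = []
    for bit in range(7, -1, -1):      # bit 7 is the leftmost pixel
        low_bit = (low >> bit) & 1
        high_bit = (high >> bit) & 1
        pixels.append((high_bit << 1) | low_bit)
    return pixels
```

For example, `decode_tile_row(0x3C, 0x7E)` yields `[0, 2, 3, 3, 3, 3, 2, 0]`: wherever both bytes have a 1 bit the pixel gets index 3, and where only the high byte does it gets index 2.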
Tiles Addressing
The LCDC register (0xFF40) controls how tiles are addressed:
- Bit 4 = 1: Tile Data at 0x8000 (unsigned addressing: tile IDs 0-255)
- Bit 4 = 0: Tile Data at 0x8800 (signed addressing: tile IDs -128 to 127, tile 0 at 0x9000)
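Both addressing modes reduce to one address computation. The sketch below (the function name is hypothetical) takes the raw tilemap byte and reinterprets it as a signed int8 only when LCDC bit 4 is 0, using 0x9000 as the base for tile 0 in that mode.

```python
def tile_data_address(tile_id: int, unsigned_mode: bool) -> int:
    """Return the VRAM address of the first byte of a tile.

    tile_id       -- raw byte read from the tilemap (0-255)
    unsigned_mode -- True when LCDC bit 4 = 1 (0x8000 addressing)
    """
    if unsigned_mode:
        # IDs 0-255 map to 0x8000-0x8FF0, 16 bytes per tile
        return 0x8000 + tile_id * 16
    # Signed mode: reinterpret the byte as int8 (-128..127); tile 0 is at 0x9000
    signed_id = tile_id - 256 if tile_id >= 128 else tile_id
    return 0x9000 + signed_id * 16
```

Worked values for signed mode: ID 0x00 → 0x9000, ID 0x7F → 0x97F0, ID 0x80 (-128) → 0x8800, ID 0xFF (-1) → 0x8FF0. The 0x8800-0x8FFF block is therefore shared by both modes.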
Indirect Writing Instructions
For the CPU to write tile data to VRAM, it needs instructions that write to memory indirectly, using HL as a pointer:
- LDI (HL), A (0x22): writes A to (HL) and then increments HL. Useful for sequential memory copies (equivalent to *HL++ = A in C).
- LDD (HL), A (0x32): writes A to (HL) and then decrements HL. Less common, but useful for copying in the reverse direction (equivalent to *HL-- = A in C).
- LD (HL), A (0x77): writes A to (HL) without modifying HL. The most common indirect write instruction (equivalent to *HL = A in C).
These instructions are essential during game initialization, when copying graphics data from ROM to VRAM.
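The semantics of the three instructions can be sketched over a flat 64 KB memory model. This is an illustration under stated assumptions, not the CPU.cpp code: the function names are hypothetical, the memory is a plain `bytearray` (no MMU routing of ROM, I/O or VRAM writes), and cycle counting is omitted. HL wraps at 16 bits, which is the behavior the wrap-around tests check.

```python
# Flat 64 KB address space (assumption); a real MMU routes writes per region.
MEMORY_SIZE = 0x10000


def ld_hl_a(mem: bytearray, hl: int, a: int) -> int:
    """LD (HL), A (0x77): store A at HL; HL is returned unchanged."""
    mem[hl] = a
    return hl


def ldi_hl_a(mem: bytearray, hl: int, a: int) -> int:
    """LDI (HL), A (0x22): store A at HL, then return HL + 1 (16-bit wrap)."""
    mem[hl] = a
    return (hl + 1) & 0xFFFF


def ldd_hl_a(mem: bytearray, hl: int, a: int) -> int:
    """LDD (HL), A (0x32): store A at HL, then return HL - 1 (16-bit wrap)."""
    mem[hl] = a
    return (hl - 1) & 0xFFFF


def copy_with_ldi(mem: bytearray, src: int, dst: int, length: int) -> int:
    """Copy `length` bytes from src to dst with one LDI per byte.

    Loading A from the source (and advancing the source pointer) is
    simplified to a direct read. Returns the final value of HL.
    """
    hl = dst
    for i in range(length):
        a = mem[(src + i) & 0xFFFF]
        hl = ldi_hl_a(mem, hl, a)
    return hl
```

For instance, copying one 16-byte tile from 0x4000 to 0x8000 with `copy_with_ldi(mem, 0x4000, 0x8000, 16)` leaves HL at 0x8010, already pointing at the next tile slot. This is why LDI suits initialization loops: the pointer update comes for free.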
Scanline Rendering Process
For each visible line (LY = 0-143), the PPU:
- Reads SCY and SCX (scroll registers) to determine which part of the tilemap to display
- For each pixel X on the screen (0-159):
  - Calculates the position on the tilemap: map_x = (X + SCX) % 256, map_y = (LY + SCY) % 256
  - Reads the tile ID from the tilemap at that position
  - Calculates the address of the tile in VRAM according to the addressing mode (signed/unsigned)
  - Decodes the specific pixel of the tile (reads the 2 bytes of the tile row and extracts the corresponding bit from each)
  - Writes the color index (0-3) to the framebuffer
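The per-pixel process above can be sketched end to end in Python. Assumptions: memory is a flat 64 KB `bytearray` with LCDC at 0xFF40, SCY at 0xFF42 and SCX at 0xFF43; only LCDC bits 7 (LCD on), 4 (tile data area) and 3 (BG tilemap: 0 → 0x9800, 1 → 0x9C00) are honored; window, sprites and LCDC bit 0 are ignored. The name `render_bg_scanline` is hypothetical and is not the C++ `PPU::render_scanline()`.

```python
SCREEN_WIDTH = 160


def render_bg_scanline(mem: bytearray, ly: int) -> list[int]:
    """Return the 160 Background color indices (0-3) for scanline `ly`."""
    lcdc = mem[0xFF40]
    scy = mem[0xFF42]
    scx = mem[0xFF43]
    if not lcdc & 0x80:                                 # bit 7: LCD off -> blank line
        return [0] * SCREEN_WIDTH
    tilemap_base = 0x9C00 if lcdc & 0x08 else 0x9800    # bit 3: BG tilemap
    unsigned_mode = bool(lcdc & 0x10)                   # bit 4: tile data area
    map_y = (ly + scy) & 0xFF                           # % 256: the map wraps around
    line = []
    for x in range(SCREEN_WIDTH):
        map_x = (x + scx) & 0xFF
        # 32 tiles per tilemap row, one byte per tile ID
        tile_id = mem[tilemap_base + (map_y // 8) * 32 + (map_x // 8)]
        if unsigned_mode:
            tile_addr = 0x8000 + tile_id * 16
        else:
            # (id ^ 0x80) - 0x80 reinterprets the byte as int8 (-128..127)
            tile_addr = 0x9000 + ((tile_id ^ 0x80) - 0x80) * 16
        row_addr = tile_addr + (map_y % 8) * 2           # 2 bytes per tile row
        low, high = mem[row_addr], mem[row_addr + 1]
        bit = 7 - (map_x % 8)                            # bit 7 = leftmost pixel
        line.append((((high >> bit) & 1) << 1) | ((low >> bit) & 1))
    return line
```

Masking with `& 0xFF` is the same as `% 256` for non-negative values; it makes the wrap-around of the 256x256 background explicit when SCX or SCY push the view past the map edge.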
Implementation
Two main components were implemented: the indirect write instructions on the CPU and the actual rendering of scanlines on the PPU.
CPU: Indirect Write Instructions
The three instructions are handled in the switch of CPU::step() in src/core/cpp/CPU.cpp:
- LDI (HL), A (0x22): writes regs_->a to the address regs_->get_hl(), then increments HL with 16-bit wrap-around (0xFFFF → 0x0000).
- LDD (HL), A (0x32): same as LDI, but decrements HL after the write (0x0000 → 0xFFFF on wrap-around).
- LD (HL), A (0x77): already implemented in the LD r, r' block, so it was not duplicated (only a comment was added).
All of these instructions consume 2 M-Cycles according to Pan Docs specifications.
PPU: Real Scanline Rendering
The PPU::render_scanline() method in src/core/cpp/PPU.cpp was completely replaced. The new code:
- Verifies that the LCD is enabled (LCDC bit 7)
- Reads the configuration registers: SCY, SCX, LCDC
- Determines the tilemap and tile data bases according to LCDC
- For each pixel X (0-159):
  - Calculates the position in the tilemap, applying scroll
  - Reads the tile ID from the tilemap
  - Calculates the tile address in VRAM (signed/unsigned)
  - Reads the 2 bytes of the tile row
  - Decodes the specific pixel by extracting the corresponding bits
  - Writes the color index (0-3) to the framebuffer
Design decisions:
- The PPU reads VRAM directly through the MMU (the PPU has full memory access)
- Each pixel is computed individually instead of decoding entire tiles (simpler for now; can be optimized later)
- The framebuffer stores color indices (0-3); palette application is left to the Python side
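Since the framebuffer keeps raw indices, the Python side needs a mapping through BGP (0xFF47). A possible shape for it is sketched below; the function names and the four RGB gray levels are assumptions (any four-step ramp from light to dark would do), while the BGP bit layout (two bits per color index, index 0 in bits 1-0) follows Pan Docs.

```python
# Assumed RGB values for the four DMG shades: 0 = lightest ... 3 = darkest.
SHADES_RGB = [(255, 255, 255), (170, 170, 170), (85, 85, 85), (0, 0, 0)]


def apply_bgp(color_indices: list[int], bgp: int) -> list[int]:
    """Map BG color indices (0-3) to shades (0-3) using the BGP register.

    BGP packs four 2-bit shades: bits 1-0 for index 0, bits 3-2 for index 1,
    bits 5-4 for index 2 and bits 7-6 for index 3.
    """
    return [(bgp >> (index * 2)) & 0b11 for index in color_indices]


def to_rgb(color_indices: list[int], bgp: int) -> list[tuple[int, int, int]]:
    """Convert one framebuffer line of color indices into RGB tuples."""
    return [SHADES_RGB[shade] for shade in apply_bgp(color_indices, bgp)]
```

With BGP = 0xE4 (binary 11 10 01 00) every index maps to the shade with the same number; BGP = 0x1B inverts the ramp. Keeping this step out of the C++ core means a palette change never requires re-rendering VRAM.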
Affected Files
- src/core/cpp/CPU.cpp: added the LDI (HL), A (0x22) and LDD (HL), A (0x32) instructions
- src/core/cpp/PPU.cpp: replaced render_scanline() with the actual tile rendering implementation
- tests/test_core_cpu_indirect_writes.py: new file with 6 tests validating the indirect write instructions
Tests and Verification
A new test file, test_core_cpu_indirect_writes.py, was created with 6 unit tests:
- test_ldi_hl_a: verifies that LDI (HL), A writes correctly and increments HL
- test_ldi_hl_a_wrap_around: verifies that LDI handles wrap-around correctly (0xFFFF → 0x0000)
- test_ldd_hl_a: verifies that LDD (HL), A writes correctly and decrements HL
- test_ldd_hl_a_wrap_around: verifies that LDD handles wrap-around correctly (0x0000 → 0xFFFF)
- test_ld_hl_a: verifies that LD (HL), A writes correctly without modifying HL
- test_ldi_sequence: verifies a sequence of multiple LDIs that simulates a copy loop
Execution result:
$ pytest tests/test_core_cpu_indirect_writes.py -v
============================= test session starts =============================
collected 6 items
tests/test_core_cpu_indirect_writes.py::TestLDIndirectWrites::test_ldi_hl_a PASSED
tests/test_core_cpu_indirect_writes.py::TestLDIndirectWrites::test_ldi_hl_a_wrap_around PASSED
tests/test_core_cpu_indirect_writes.py::TestLDIndirectWrites::test_ldd_hl_a PASSED
tests/test_core_cpu_indirect_writes.py::TestLDIndirectWrites::test_ldd_hl_a_wrap_around PASSED
tests/test_core_cpu_indirect_writes.py::TestLDIndirectWrites::test_ld_hl_a PASSED
tests/test_core_cpu_indirect_writes.py::TestLDIndirectWrites::test_ldi_sequence PASSED
============================== 6 passed in 0.06s ==============================
Compiled C++ module validation: all tests use the compiled native module (viboy_core) and verify that the instructions execute correctly with the expected timing (2 M-Cycles).
Key fragment of the test:
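```python
# Assumed import path for the compiled viboy_core module
from viboy_core import PyCPU, PyMMU, PyRegisters


class TestLDIndirectWrites:
    def test_ldi_hl_a(self):
        """Check LDI (HL), A (0x22)"""
        mmu = PyMMU()
        regs = PyRegisters()
        cpu = PyCPU(mmu, regs)
        regs.pc = 0x8000
        regs.a = 0xBE
        regs.hl = 0xC000
        mmu.write(0x8000, 0x22)  # LDI (HL), A at the current PC
        cycles = cpu.step()
        assert mmu.read(0xC000) == 0xBE  # A stored at the old HL
        assert regs.hl == 0xC001  # HL incremented after the write
        assert cycles == 2  # 2 M-Cycles
```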
Sources consulted
- Pan Docs - CPU Instruction Set: Specification of LDI (HL), A, LDD (HL), A and LD (HL), A instructions, including timing (M-Cycles)
- Pan Docs - Video Display: Explanation of the tile format (2bpp), VRAM structure, and Background rendering process
- Pan Docs - LCDC Register: Description of the bits that control tile addressing (bit 4) and tilemap selection
Educational Integrity
What I Understand Now
- 2bpp format: I understand how tiles are stored as 16 bytes (8 lines × 2 bytes), where each pixel is encoded in 2 bits distributed in two separate bytes.
- Tile addressing: The difference between signed and unsigned addressing and how it affects the calculation of the tile address in VRAM.
- Scroll: How SCX and SCY select which part of the tilemap is visible on the screen, creating the camera effect.
- Indirect Writing Instructions: How LDI/LDD are essential for efficient memory copy loops in game initialization code.
What remains to be confirmed
- Rendering with real ROMs: We need to test the emulator with real ROMs (like Tetris) to verify that the graphics are rendered correctly. This will validate that the entire chain (CPU writes VRAM → PPU reads VRAM → renders) works.
- Color palette: Although the framebuffer contains indices 0-3, we need to apply BGP (Background Palette) in Python to see the correct colors.
- Performance optimization: The current rendering decodes pixel by pixel. We could optimize by decoding entire lines of tiles and caching them.
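One way the optimization could look is sketched below, assuming unsigned (0x8000) addressing only and a flat `bytearray` memory; the names are hypothetical. Within one scanline every tile contributes the same row, so each visible tile row is decoded once into 8 pixels (and cached by tile ID), and the fine scroll (SCX % 8) is applied by slicing at the end instead of per pixel.

```python
def decode_row(low: int, high: int) -> list[int]:
    """8 color indices (0-3) of one 2bpp tile row, leftmost pixel first."""
    return [(((high >> b) & 1) << 1) | ((low >> b) & 1) for b in range(7, -1, -1)]


def render_bg_line_by_tile(mem: bytearray, ly: int, scx: int, scy: int,
                           tilemap_base: int = 0x9800) -> list[int]:
    """Per-tile variant of BG scanline rendering (unsigned 0x8000 mode only)."""
    map_y = (ly + scy) & 0xFF
    row_offset = (map_y % 8) * 2           # same row inside every tile on this line
    map_row = tilemap_base + (map_y // 8) * 32
    first_col = scx // 8
    cache = {}                             # tile ID -> decoded 8-pixel row
    pixels = []
    # 160 pixels plus up to 7 pixels of fine scroll never span more than 21 tiles
    for i in range(21):
        tile_id = mem[map_row + (first_col + i) % 32]   # tilemap rows wrap at 32
        if tile_id not in cache:
            addr = 0x8000 + tile_id * 16 + row_offset
            cache[tile_id] = decode_row(mem[addr], mem[addr + 1])
        pixels.extend(cache[tile_id])
    fine_x = scx % 8                       # drop the pixels scrolled off the left
    return pixels[fine_x:fine_x + 160]
```

The result is pixel-for-pixel the same as the per-pixel method for this mode, but VRAM is read at most twice per visible tile instead of twice per pixel; screens built from a few repeated tiles benefit further from the cache.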
Hypotheses and Assumptions
We assume that the signed addressing calculation is correct: when LCDC bit 4 = 0, tile 0 is at 0x9000, which means that the IDs are interpreted as int8_t and the address is calculated as tile_data_base + (signed_tile_id * 16), with tile_data_base = 0x9000. This needs validation with real ROMs.
Next Steps
- [ ] Test the emulator with real ROMs (Tetris, Mario) to verify that the graphics render correctly
- [ ] Implement BGP palette application in Python renderer to display correct colors (currently only indices 0-3)
- [ ] Optimize scanline rendering (decode entire lines of tiles instead of pixel by pixel)
- [ ] Implement Window rendering (opaque layer over Background)
- [ ] Implement Sprite rendering (OBJ - Objects) to display moving elements