This project is educational and Open Source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.
PPU Phase E: Scanline Architecture for CPU-PPU Synchronization
Summary
Polling deadlock analysis has revealed a fundamental flaw in our core loop architecture. Although the CPU and PPU are logically correct, they are not synchronized in time. The CPU runs its polling loop so fast that the PPU never gets enough cycles to change state, creating a timing deadlock. This step documents the complete re-architecture of the main loop (`run()`) around scanlines, forcing precise synchronization between CPU and PPU cycles and structurally breaking the deadlock.
Hardware Concept: Scanline-Based Time
The Game Boy's hardware is rigidly synchronized. The PPU takes exactly 456 T-Cycles to process one scanline. During those 456 cycles, the CPU executes instructions in parallel. An accurate emulator should replicate this 1:1 relationship.
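The frame-level numbers used later in this step all follow from two figures: 456 T-Cycles per scanline and 154 scanlines per frame. The sketch below derives the per-frame budget and the resulting refresh rate, assuming the DMG CPU clock of 4,194,304 Hz (2^22) given in Pan Docs.

```python
# Frame timing derived from the scanline budget (DMG values).
CPU_CLOCK_HZ = 4_194_304          # T-Cycles per second (2**22)
CYCLES_PER_SCANLINE = 456         # T-Cycles the PPU spends on one line
SCANLINES_PER_FRAME = 154         # 144 visible lines + 10 V-Blank lines

CYCLES_PER_FRAME = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME
M_CYCLES_PER_SCANLINE = CYCLES_PER_SCANLINE // 4   # 1 M-Cycle = 4 T-Cycles
FRAMES_PER_SECOND = CPU_CLOCK_HZ / CYCLES_PER_FRAME

print(CYCLES_PER_FRAME)               # 70224
print(M_CYCLES_PER_SCANLINE)          # 114
print(round(FRAMES_PER_SECOND, 2))    # 59.73
```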
The Polling Deadlock Problem
Imagine that the PPU is a car traveling from one city (Mode 2) to another (Mode 3), a trip that lasts 80 minutes (cycles). The CPU is an impatient kid in the back seat who, every 32 minutes (cycles), asks, "Are we in H-Blank City yet?" The car (PPU) has not even reached the next city (Mode 3) yet, but the CPU has already asked twice.
This is exactly what was happening:
- The PPU starts a line and enters Mode 2 (OAM Scan), a state that lasts 80 T-Cycles.
- The CPU enters its polling loop: `LDH A, (n)` -> `CP d8` -> `JR NZ, e`.
- This entire loop consumes 12 + 8 + 12 = 32 T-Cycles (`JR NZ, e` costs 12 T-Cycles when the jump is taken).
- The CPU runs the loop, reads STAT (which says "Mode 2"), the comparison fails, and it jumps back. 32 cycles have passed.
- The CPU executes the loop again. It reads STAT (which still says "Mode 2" because only 32 of the 80 cycles have passed). The comparison fails. Jump. 64 cycles have passed.
- The CPU is "idling" in Mode 2, not giving the PPU time to finish its work and change state.
The problem is not in the components but in the orchestrator: our main loop in `viboy.py`. Our current `while True` loop has no notion of "emulated time"; it simply runs the CPU once and then does other things. We need an architecture that forces time to pass in a synchronized manner.
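The deadlock can be reproduced with a toy model. In the sketch below, `ToyPPU` and `poll_until_hblank` are hypothetical names, and the mode lengths are assumptions: Mode 2 lasts 80 dots, Mode 3 uses its minimum of 172 dots, and Mode 0 fills the rest of the 456-dot line. If the PPU is never handed the 32 T-Cycles each polling iteration consumes, STAT reports Mode 2 forever; if it is, H-Blank is observed after a handful of reads.

```python
# Toy model of the polling deadlock. Mode lengths are assumptions for the
# sketch: Mode 2 = 80 dots, Mode 3 = 172 dots (its minimum), Mode 0 = rest.
CYCLES_PER_SCANLINE = 456
POLL_LOOP_T_CYCLES = 12 + 8 + 12   # LDH A,(n) + CP d8 + JR NZ,e (taken)


class ToyPPU:
    """Hypothetical PPU that only tracks its position within one scanline."""

    def __init__(self) -> None:
        self.dot = 0  # T-Cycles elapsed in the current scanline

    def mode(self) -> int:
        if self.dot < 80:
            return 2          # OAM scan
        if self.dot < 80 + 172:
            return 3          # pixel transfer
        return 0              # H-Blank

    def step(self, t_cycles: int) -> None:
        self.dot = (self.dot + t_cycles) % CYCLES_PER_SCANLINE


def poll_until_hblank(advance_ppu: bool, max_polls: int = 1000):
    """Return the number of STAT reads until H-Blank is seen, or None."""
    ppu = ToyPPU()
    for polls in range(1, max_polls + 1):
        if ppu.mode() == 0:       # LDH A,(STAT) + CP: "are we there yet?"
            return polls
        if advance_ppu:           # the fix: emulated time reaches the PPU
            ppu.step(POLL_LOOP_T_CYCLES)
    return None


print(poll_until_hblank(advance_ppu=False))  # None: stuck in Mode 2
print(poll_until_hblank(advance_ppu=True))   # 9
```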
The Solution: Architecture by Scanlines
The new architecture will work like this:
- External loop (per frame): still a `while self.running` loop.
- Middle loop (per scanline): inside it, a loop that repeats 154 times (the total number of lines in a frame).
- Internal loop (CPU): for each of those 154 lines, we run the CPU repeatedly until at least 456 T-Cycles have been consumed.
- PPU update: once those cycles have been consumed, we call `ppu.step(456)` once, passing exactly 456 cycles.
This design ensures that, for each "step" of the PPU (a scanline), the CPU has executed the correct number of "steps" (instructions). Deadlock becomes impossible, because the emulated time always advances.
Source: Pan Docs - LCD Timing, System Clock
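The nested loops can be exercised without a ROM. The sketch below assumes hypothetical `StubCPU` and `StubPPU` classes standing in for Viboy's real components; it runs one frame and checks that the PPU receives exactly 154 × 456 = 70224 T-Cycles and that LY wraps back to 0.

```python
# Minimal, self-contained sketch of the scanline-driven frame loop.
# StubCPU and StubPPU are hypothetical stand-ins, not the real Viboy classes.
CYCLES_PER_SCANLINE = 456
SCANLINES_PER_FRAME = 154


class StubCPU:
    def __init__(self, m_cycles_per_instruction: int = 3) -> None:
        self.halted = False
        self.m_cycles = m_cycles_per_instruction

    def step(self) -> int:
        return self.m_cycles  # pretend every instruction takes the same time


class StubPPU:
    def __init__(self) -> None:
        self.ly = 0
        self.total_cycles = 0

    def step(self, t_cycles: int) -> None:
        self.total_cycles += t_cycles
        # One call per scanline: advance LY and wrap after line 153.
        self.ly = (self.ly + t_cycles // CYCLES_PER_SCANLINE) % SCANLINES_PER_FRAME


def run_frame(cpu: StubCPU, ppu: StubPPU) -> int:
    """Run one frame; return the T-Cycles the CPU consumed."""
    cpu_cycles = 0
    for _ in range(SCANLINES_PER_FRAME):
        cycles_this_scanline = 0
        while cycles_this_scanline < CYCLES_PER_SCANLINE:
            if not cpu.halted:
                cycles_this_scanline += cpu.step() * 4  # M-Cycles -> T-Cycles
            else:
                cycles_this_scanline += 4               # idle one M-Cycle
        cpu_cycles += cycles_this_scanline
        ppu.step(CYCLES_PER_SCANLINE)                   # once per scanline
    return cpu_cycles


cpu, ppu = StubCPU(), StubPPU()
print(run_frame(cpu, ppu), ppu.total_cycles, ppu.ly)  # 70224 70224 0
```

With 12-T-Cycle instructions, 38 of them fill a scanline exactly, so the CPU and PPU budgets match; instruction lengths that do not divide 456 make the CPU side run slightly long.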
Implementation
The `run()` method in `src/viboy.py` was completely rewritten to implement the strict scanline architecture.
Modified Components
- Viboy::run(): Completely rewritten with strict scanline architecture.
- Timing constants, defined at the beginning of the method: `CYCLES_PER_SCANLINE = 456`, `SCANLINES_PER_FRAME = 154`, and `CYCLES_PER_FRAME = 70224` (456 × 154).
New Loop Structure
# Main emulator loop
while self.running:
    # --- Full frame loop (70224 cycles) ---
    for line in range(SCANLINES_PER_FRAME):
        # --- Scanline loop (456 cycles) ---
        cycles_this_scanline = 0
        while cycles_this_scanline < CYCLES_PER_SCANLINE:
            if not self._cpu.halted:
                # Execute one CPU instruction; step() returns M-Cycles
                m_cycles = self._cpu.step()
                # Convert to T-Cycles (1 M-Cycle = 4 T-Cycles)
                t_cycles = m_cycles * 4
                cycles_this_scanline += t_cycles
            else:
                # The CPU is in HALT: just advance time by the smallest
                # possible unit (one M-Cycle). cpu.step() is not called
                # here, so whatever clears `halted` must live elsewhere.
                cycles_this_scanline += 4
        # At the end of the scanline, update the PPU exactly once
        self._ppu.step(CYCLES_PER_SCANLINE)
    # --- End of frame ---
    # Rendering and synchronization...
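Because instructions take between 4 and 24 T-Cycles, the inner `while` loop often stops slightly past 456, and the surplus is discarded when `cycles_this_scanline` is reset for the next line. The hypothetical helper below (not part of `viboy.py`) measures that surplus for a CPU that repeats a single instruction length.

```python
CYCLES_PER_SCANLINE = 456


def scanline_overshoot(instruction_t_cycles: int) -> int:
    """T-Cycles past 456 at which the scanline loop stops, for a CPU that
    executes the same instruction length over and over (hypothetical helper)."""
    cycles_this_scanline = 0
    while cycles_this_scanline < CYCLES_PER_SCANLINE:
        cycles_this_scanline += instruction_t_cycles
    return cycles_this_scanline - CYCLES_PER_SCANLINE


for t in (4, 8, 12, 16, 20, 24):
    print(t, scanline_overshoot(t))   # 16 -> 8, 20 -> 4, all others -> 0
```

With 20-T-Cycle instructions, for example, each line gives the CPU 460 T-Cycles, so over a frame the CPU runs 154 × 4 = 616 T-Cycles ahead of the PPU. Carrying the remainder into the next scanline is one possible refinement; it is not part of the loop described here.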
Design Decisions
- Strict sync: The PPU is only updated once per scanline, with exactly 456 cycles. This ensures that emulated time always advances correctly.
- HALT management: If the CPU is in HALT, we advance time in minimum increments (4 T-Cycles) so that the PPU can keep advancing and generate interrupts.
- Timer: The Timer is updated after every instruction (only in Python mode for now) to preserve the accuracy of the RNG used by games like Tetris.
- Rendering: Rendering no longer depends on "is_frame_ready", because this loop guarantees that a full frame (154 scanlines) has been completed.
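Two of these decisions rely on hardware details the loop itself does not show. Since the HALT branch never calls `cpu.step()`, something has to clear `halted`; on hardware, HALT ends as soon as IE (0xFFFF) AND IF (0xFF0F) is non-zero, regardless of IME. Stepping the Timer per instruction matters because DIV ticks every 256 T-Cycles, more than once per 456-cycle scanline. The sketch below illustrates both; `should_exit_halt` and `DivCounter` are hypothetical names, not Viboy's actual API.

```python
def should_exit_halt(ie: int, if_flags: int) -> bool:
    """HALT ends once an enabled interrupt is pending (IE & IF != 0),
    whether or not IME is set. Only bits 0-4 are interrupt sources."""
    return (ie & if_flags & 0x1F) != 0


class DivCounter:
    """DIV (0xFF04) is the upper byte of a 16-bit counter that increments
    every T-Cycle, so DIV itself ticks once every 256 T-Cycles (16384 Hz)."""

    def __init__(self) -> None:
        self.internal = 0  # 16-bit system counter

    def step(self, t_cycles: int) -> None:
        self.internal = (self.internal + t_cycles) & 0xFFFF

    @property
    def div(self) -> int:
        return self.internal >> 8

    def write_div(self) -> None:
        self.internal = 0  # any write to DIV resets the whole counter


# V-Blank (bit 0) both enabled and requested: the CPU wakes up.
print(should_exit_halt(ie=0x01, if_flags=0x01))  # True
# V-Blank requested, but only Timer (bit 2) enabled: still halted.
print(should_exit_halt(ie=0x04, if_flags=0x01))  # False

div = DivCounter()
for _ in range(100):
    div.step(12)      # one 12-T-Cycle instruction per call
print(div.div)        # 1200 T-Cycles -> 4
```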
Affected Files
`src/viboy.py`: complete rewrite of the `run()` method with the strict scanline architecture.
Tests and Verification
This implementation is a re-architecture of the main loop. The CPU and PPU unit tests are still valid, but the main validation will be done by running the emulator with a real ROM.
Expected Result
When running the emulator with this new architecture:
- The deadlock will break. It will be structurally impossible for the CPU to idle without the PPU advancing.
- LY will increase. The heartbeat will finally show LY changing from 0 to 1, 2, 3... up to 153, and then back to 0. We will see the heartbeat of the video system for the first time!
- We'll see graphics! Once the deadlock is broken, the CPU will be able to continue its initialization routine and copy the tile data to VRAM, and our PPU, which already knows how to render the background, will finally have something to draw. We should see the Nintendo logo or the Tetris copyright screen appear.
Validation: run the emulator with a real ROM (Tetris, Mario, etc.) to confirm that the deadlock is broken and that LY progresses correctly.
Sources consulted
- Pan Docs: LCD Timing, System Clock
- GBEDG: Instruction Timing
Note: Scanline-based main loops are a common way to keep the CPU and PPU in step. Highly accurate emulators such as SameBoy synchronize at a finer granularity, so this design trades some accuracy for simplicity.
Educational Integrity
What I Understand Now
- Time synchronization: the difference between component correctness and system synchronization. The components may be correct individually, but if they are not synchronized in time, the system fails.
- Architecture by scanlines: a design that forces emulated time to pass synchronously, so the CPU and PPU never drift apart by more than one scanline of emulated time.
- Polling deadlock: a type of deadlock in which the CPU waits for a state change that never occurs because the PPU is not given enough time to advance.
What remains to be confirmed
- Running with a real ROM: verify that the deadlock is broken and that LY progresses correctly by running the emulator with a real ROM.
- Graphics rendering: confirm that once the deadlock is broken, the CPU can copy data to VRAM and the PPU can render the background correctly.
Hypotheses and Assumptions
This architecture assumes that the PPU correctly processes exactly 456 cycles per scanline. If a bug in the PPU implementation prevents it from processing those cycles correctly, the deadlock could persist or other synchronization problems could arise.
Next Steps
- [ ] Run the emulator with a real ROM (Tetris, Mario) to confirm that the deadlock is broken.
- [ ] Verify that LY advances correctly (0 → 153 → 0) in the heartbeat.
- [ ] Confirm that graphics render correctly once the deadlock is broken.
- [ ] If the deadlock persists, investigate possible problems in the PPU implementation or in the conversion from M-Cycles to T-Cycles.