⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

Visual Verification and Post-Correction Performance Analysis

Date:2025-12-29 StepID:0363 State: VERIFIED

Summary

A comprehensive performance diagnostic system was implemented to identify bottlenecks in the rendering pipeline. Added timing measurements at critical points: rendering scanlines in C++, reading framebuffer in Python, and final rendering in the renderer. This system will allow you to identify if the low FPS problem (0.1-10.8) is due to collars bottle in the rendering pipeline or other factors.

Hardware Concept

Performance in Emulators

A frame should complete in approximately 16.67ms to maintain a stable 60 FPS. The rendering pipeline must be efficient: C++ generates the framebuffer → Python reads the framebuffer → Python renders to the screen. Any bottleneck in this pipeline (>16ms at any stage) will cause low FPS.

To identify bottlenecks, it is necessary to measure the time each stage of the pipeline takes:

  • render_scanline(): Time it takes for C++ to render a complete line (expected:<1ms por línea)
  • get_frame_ready_and_reset(): Time it takes for C++ to mark the frame as ready (expected:<0.1ms)
  • Reading the framebuffer in Python: Time taken by Python to copy the C++ framebuffer (expected:<1ms)
  • Rendering in Python: Time it takes for Python to convert indices to RGB and render to screen (expected:<10ms)

Synchronization in Multi-Component Systems

When multiple components (C++ and Python) share data (framebuffer), synchronization is critical. The framebuffer must be maintained stable until the read component finishes. If the framebuffer is modified during reading, corrupted graphics may occur or white screens.

Double Buffering (Concept for Future Optimizations)

Double buffering uses two framebuffers: one for writing (C++) and one for reading (Python). When a frame is completed, they exchange the buffers. This eliminates race conditions but requires more memory. If the performance diagnostic shows that there are problems synchronization, double buffering could be a solution.

Implementation

Performance Diagnostics in C++ (PPU.cpp)

Added time measurement usingstd::chrono::high_resolution_clockin two critical functions:

  • render_scanline(): The total time it takes to render a line is measured. It is reported every 10 executions for the first 100 lines.
  • get_frame_ready_and_reset(): The time it takes to mark the frame as ready is measured. It is reported every 10 executions for the first 100 calls.
// --- Step 0363: Performance Diagnostics ---
// Measure render_scanline() time to identify bottlenecks
static int render_scanline_timing_count = 0;
auto start_time = std::chrono::high_resolution_clock::now();

// ... render_scanline() code ...

auto end_time = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);

if (render_scanline_timing_count< 100) {
    render_scanline_timing_count++;
    if (render_scanline_timing_count % 10 == 0) {
        printf("[PPU-PERF] render_scanline() (LY=%d) took %lld microseconds\n", 
               ly_, duration.count());
    }
}

Performance Diagnostics in Python (viboy.py)

The existing framebuffer read time measurement was improved so that it reports every 60 frames, allowing monitoring performance during execution without saturating the logs.

# --- Step 0363: Performance Diagnostics in Python ---
# Measure framebuffer reading time and report every 60 frames
if not hasattr(self, '_framebuffer_read_timing_count'):
    self._framebuffer_read_timing_count = 0
self._framebuffer_read_timing_count += 1

if self._framebuffer_read_timing_count % 60 == 0:
    logger.info(f"[Viboy-Perf] Frame {self._framebuffer_read_timing_count} | "
               f"Read: {read_duration:.2f}ms")
    print(f"[Viboy-Perf] Frame {self._framebuffer_read_timing_count} | "
          f"Read: {read_duration:.2f}ms", flush=True)

Performance Diagnosis in Renderer (renderer.py)

The existing rendering time measurement was improved so that it reports every 60 frames, allowing you to monitor the time that It takes Python to convert indices to RGB and render to screen.

# --- Step 0363: Renderer Performance Diagnostics ---
# Report rendering time every 60 frames
if not hasattr(self, '_render_perf_frame_count'):
    self._render_perf_frame_count = 0
self._render_perf_frame_count += 1

if self._render_perf_frame_count % 60 == 0:
    logger.info(f"[Viboy-Perf] Frame {self._render_perf_frame_count} | "
               f"Render: {render_time:.2f}ms")
    print(f"[Viboy-Perf] Frame {self._render_perf_frame_count} | "
          f"Render: {render_time:.2f}ms", flush=True)

Affected Files

  • src/core/cpp/PPU.cpp- Added performance diagnostics in render_scanline() and get_frame_ready_and_reset()
  • src/viboy.py- Improved performance diagnostics to report every 60 frames
  • src/gpu/renderer.py- Improved performance diagnostics to report every 60 frames

Tests and Verification

The C++ code was successfully compiled with the new performance measurements. A quick test was run with Tetris that confirmed that the code runs without errors, although no performance metrics were generated because the test only reached Frame 0.

Compile command:

python3 setup.py build_ext --inplace

Compilation:✅ Successful (with minor format warnings that do not affect functionality)

Quick test:Tetris was run for 30 seconds to verify that the code works. The game started correctly (Frame 0, VRAM empty at startup, normal behavior).

Next steps for full verification:Run longer tests with all 6 ROMs to generate performance metrics complete and analyze if there are bottlenecks in the rendering pipeline.

Sources consulted

  • Bread Docs:https://gbdev.io/pandocs/- Timing and synchronization
  • C++17 Documentation:std::chronofor high resolution time measurement

Educational Integrity

What I Understand Now

  • Performance measurement:It is critical to measure time at each stage of the pipeline to identify bottlenecks. Wearstd::chrono::high_resolution_clockprovides microsecond precision in C++.
  • Periodic report:To avoid saturating the logs, it is better to report performance metrics periodically (every 60 frames) instead of in each frame.
  • Rendering pipeline:The complete pipeline is: C++ render → C++ mark frame ready → Python read → Python render. Each stage must be efficient to maintain 60 FPS.

What remains to be confirmed

  • Actual metrics:We need to run longer tests to get real performance metrics and check for necks of bottle in the pipeline.
  • Visual problems:Whether the fixes in Step 0362 resolved the white screen issues, or whether visual issues persist.
  • Low SPF:If the low FPS (0.1-10.8) is due to bottlenecks in the rendering pipeline or other factors.

Hypotheses and Assumptions

Hypothesis:The low FPS could be due to bottlenecks in the rendering pipeline. If the measurement shows that any stage takes more than 16ms, that would be the cause of the problem. If all stages are fast (<16ms), el problema podría estar en otro lugar (sincronización, esperas innecesarias, etc.).

Test Log Analysis

Complete tests were run with the 6 ROMs (60 seconds each) and the generated logs were analyzed.

Performance Results

  • render_scanline(): 9-56 microseconds (average ~25 μs) - Excellent performance
  • get_frame_ready_and_reset(): 0 microseconds - Instant
  • Reading the framebuffer (Python): 0.02-0.05ms (average ~0.03ms) - Extremely fast
  • Observed FPS: 51-53 FPS (better than the 0.1-10.8 previously reported)

Critical Issue Identified

Race Condition in Framebuffer: Multiple warnings were found in all logs:

[PPU-FRAMEBUFFER-STABILITY] ⚠️ WARNING: Framebuffer changed while Python was reading it!

ROM warning statistics:

  • Mario: 24 warnings
  • Gold: 35 warnings
  • PKMN Yellow: 19 warnings
  • PKMN: 22 warnings
  • Tetris: 26 warnings
  • Zelda DX: 7291 warnings ⚠️ (VERY HIGH)

Problem Analysis

The current code protects againstcleaningfrom the framebuffer whenframebuffer_being_read_is active, butNOT write protectedof new data. This means that:

  1. Python marks the framebuffer as "being read" (framebuffer_being_read_ = true)
  2. While Python reads the framebuffer, the PPU continues rendering the next frame
  3. PPU writes new data to framebuffer while Python is reading it
  4. This causes race conditions and corrupted graphics

Root cause: The flagframebuffer_being_read_it only prevents clearing of the framebuffer, but it does not preventrender_scanline()write new data while Python is reading.

Empty Frames

Completely empty frames (0 non-white pixels) were found in several games, especially in Gold. This may be normal game behavior (black screens during transitions) or a timing issue.

Full report available at: docs/logs/entries/2025-12-29__0363__analisis-logs-reporte.md

Next Steps

  • [✅] Run full visual tests with all 6 ROMs
  • [✅] Analyze generated logs to confirm that the fixes work
  • [✅] Analyze performance metrics (conclusion: excellent performance, not the problem)
  • [✅] Identify critical issue: framebuffer race condition
  • [ ] Implement double buffering to eliminate race conditions (Step 0364)
  • [ ] Investigate why Zelda DX has so many warnings (7291 vs 19-35 in other games)
  • [ ] Check if empty frames are normal behavior or a problem