⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

Real Evidence UI + Profiling Presenter (Mario hangs / White Pokémon)

Date:2026-01-02 StepID:0446 State: VERIFIED

Summary

Implemented staged profiling in the UI presenter to identify bottlenecks (frombuffer/reshape, blit_array, scale/blit, flip). Application of optimizations: use of pygame.SCALED for automatic scaling by SDL (faster than manual transform.scale), conversion of permanent asserts into checks behind the VIBOY_DEBUG_UI flag, and nonwhite checking before/after the blit to detect pixel loss. Objective: obtain objective evidence in UI (Mario and Pokémon) with real metrics and timings to identify which stage consumes time and where pixels are lost.

Hardware Concept

Presentation Profiling: The UI rendering process consists of several stages that must be executed every frame (60 FPS = 16.67ms per frame). If any stage consumes too much time, the framerate drops or the application may freeze. The main stages are:

  • frombuffer/reshape/swap: Conversion of RGB888 memoryview to numpy array and preparation of the format (shape, contiguity, swapaxes). It should be zero-copy when possible.
  • blit_array: Copy data from numpy array to Pygame surface. Pygame should directly access the data without intermediate copies.
  • scale/blit: Scaling from the base surface (160x144) to the window (480x432) and blit to the screen. This stage can be very expensive if done manually withpygame.transform.scale().
  • flip: Screen buffer update. Generally fast, but can crash if the graphics driver is saturated.

pygame.SCALED (Pygame 2.0+): Flag that allows SDL (the underlying Pygame library) to do scaling automatically using hardware acceleration. It is much faster thanpygame.transform.scale()manual because SDL optimizes scaling using the GPU when available. Requires creating the window withpygame.SCALED | pygame.RESIZABLEand directly blitting the base surface (without manual scaling).

Asserts in Hot Path: Asserts in Python are expensive because they evaluate conditions and can throw exceptions. In the hot rendering path (60 FPS), asserts can consume significant time. It is better to use checks behind debug flags that are only activated when diagnostics are needed.

Nonwhite Verification: If the framebuffer arriving at the presenter has non-white pixels but the screen shows white, there is a bug in the presentation (pixel loss during blit). We check nonwhite before and after blit to detect where pixels are lost.

Fountain: Pygame documentation - "SCALED flag", SDL documentation - "Hardware Acceleration", Python documentation - "Assert Statement Performance"

Implementation

Phase 1: Profiling by Stages

Aim: Add profiling of 4 sections of the presenter to identify bottlenecks.

Implementation inrenderer.py: Added staged profiling that only activates if FPS< 30 o en frames loggeados (cada 120 frames). Cada etapa mide tiempo en milisegundos:

  • frombuffer_reshape_swap: Time for frombuffer, reshape and swapaxes
  • blit_array: Time for pygame.surfarray.blit_array
  • scale_blit: Time for scaling and blit to screen
  • flip: Time for pygame.display.flip

example output:

[UI-PROFILING] Frame 0 | frombuffer/reshape: 0.12ms | blit_array: 1.45ms | scale/blit: 12.34ms | flip: 0.23ms | TOTAL: 14.14ms

Phase 2: Optimization with pygame.SCALED

Implementation inrenderer.py: Modified__init__()to usepygame.SCALEDif available (Pygame 2.0+):

if hasattr(pygame, 'SCALED'):
    self.screen = pygame.display.set_mode((GB_WIDTH, GB_HEIGHT), pygame.SCALED | pygame.RESIZABLE)
    self._use_scaled = True
else:
    # Fallback for Pygame< 2.0
    self.screen = pygame.display.set_mode((self.window_width, self.window_height), pygame.RESIZABLE)
    self._use_scaled = False

Modification inrender_frame(): Removed manual scaling withpygame.transform.scale()when using SCALED. Blit direct to screen (SDL scales automatically):

if hasattr(self, '_use_scaled') and self._use_scaled:
    # Blit direct to screen (SDL automatically scales)
    self.screen.blit(self.surface, (0, 0))
else:
    # Fallback: manual scaling (for Pygame< 2.0)
    if self.scale != 1:
        scaled_surface = pygame.transform.scale(self.surface, (self.window_width, self.window_height))
        self.screen.blit(scaled_surface, (0, 0))
    else:
        self.screen.blit(self.surface, (0, 0))

Phase 3: Conversion from Asserts to Conditional Checks

Implementation inrenderer.py: Converted all permanent asserts into checks behind flagVIBOY_DEBUG_UI:

VIBOY_DEBUG_UI = os.environ.get('VIBOY_DEBUG_UI', '0') == '1'

# BEFORE (permanent assert):
# assert rgb_array.flags['OWNDATA'] == False, "np.frombuffer created copy"

# AFTER (conditional check):
if should_log or VIBOY_DEBUG_UI:
    if rgb_array.flags['OWNDATA']:
        logger.warning("[UI-DEBUG] np.frombuffer created copy (should be seen)")

Benefit: The checks are only executed in logged frames or when the debug flag is activated, avoiding overhead in the hot path.

Phase 4: Nonwhite Verification Before/After Blit

Implementation inrender_frame(): Added nonwhite check before and after blit to detect pixel loss:

  • Before the blit: Sampling the numpy array (every 8th pixel) to estimate nonwhite
  • After the blit: Sampling specific pixels of the surface usingget_at()
  • Bug detection: If nonwhite_before > 0 but nonwhite_after == 0, there is a presentation bug

example output:

[UI-DEBUG] Nonwhite before blit: 23040 (estimated)
[UI-DEBUG] Nonwhite after blit (sample): 3/3
[UI-DEBUG] ⚠️ Nonwhite is lost during blit! Presentation bug.

Affected Files

  • src/gpu/renderer.py- Added staged profiling, pygame.SCALED for automatic scaling, conversion of asserts to conditional checks, nonwhite verification before/after blit

Tests and Verification

Compilation:

python3 setup.py build_ext --inplace
BUILD_EXIT=0

Test Build:

python3 test_build.py
TEST_BUILD_EXIT=0

Unit Tests:

pytest tests/test_core_cpu.py -v
6 passed in 0.14s

C++ Compiled Module Validation: All tests pass, confirming that the modifications did not break existing functionality.

Profiling in UI: Profiling will be activated automatically if FPS< 30 o en frames loggeados (primeros 5 frames y cada 120 frames). Los logs mostrarán tiempo por etapa, permitiendo identificar cuellos de botella.

UI Execution: To capture actual logs, run:

# Mario (the one who hangs)
python main.py roms/mario.gbc 2>&1 | tee /tmp/viboy_0446_mario_ui.log

# Pokémon (the white one)
python main.py roms/pkmn.gb 2>&1 | tee /tmp/viboy_0446_pokemon_ui.log

Extract relevant logs:

grep "[UI-PATH]" /tmp/viboy_0446_mario_ui.log | head -n 6
grep "[UI-PROFILING]" /tmp/viboy_0446_mario_ui.log | head -n 6
grep "[UI-DEBUG]" /tmp/viboy_0446_pokemon_ui.log | head -n 6

Sources consulted

Educational Integrity

What I Understand Now

  • Profiling by Stages: It is critical to measure time per stage to identify bottlenecks. Profiling is only activated when necessary (low FPS or logged frames) to avoid overhead.
  • pygame.SCALED: Uses SDL hardware acceleration for scaling, much faster than manual transform.scale(). Requires Pygame 2.0+ and must be fallbacked to previous versions.
  • Asserts in Hot Path: Permanent asserts in the rendering hot path consume time. It is better to use conditional checks behind debug flags.
  • Nonwhite Verification: If the framebuffer has nonwhite but the screen is white, there is a bug in the presentation. We check before and after the blit to detect where pixels are lost.

What remains to be confirmed

  • Real Execution with ROMs: We need to run Mario and Pokémon in UI and capture [UI-PATH] and [UI-PROFILING] logs to see real metrics.
  • Real Performance: Measure actual FPS in UI with pygame.SCALED to confirm that auto-scaling improves performance.
  • Pokémon White: Check with logs [UI-DEBUG] if the white screen problem is detected correctly (nonwhite before but not after the blit).
  • Identified Bottleneck: Once we have profiling logs, identify which stage consumes the most time and apply specific fixes if necessary.

Hypotheses and Assumptions

Main Hypothesis: The bottleneck in Mario (crash/0.1 FPS) is in the scaling stage (manual transform.scale). With pygame.SCALED, SDL does the hardware scaling, which should solve the performance issue.

Secondary Hypothesis: Pokémon comes out white because the pixels are lost during the blit (nonwhite before but not after). The nonwhite check should detect this and allow us to identify the problem.

Next Steps

  • [ ] Run UI with Mario and capture [UI-PATH] and [UI-PROFILING] logs to see real metrics
  • [ ] Run UI with Pokémon and capture logs [UI-DEBUG] to verify nonwhite before/after blit
  • [ ] Analyze profiling logs to identify which stage consumes the most time
  • [ ] If the bottleneck is still scaling, consider other optimizations (e.g. pre-scaling surface only once)
  • [ ] If nonwhite is lost during blit, investigate further (check surface format, verify that it is not cleaned after blit)