⚠️ Clean-Room / Educational

This project is educational and open source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.

Correct Profiling Measurement + Headless/UI Comparison + Minimum Fix

Date: 2026-01-02 | Step ID: 0448 | State: VERIFIED

Summary

Fixed the profiling system so that it measures stage_sum_ms, frame_wall_ms and pacing_ms separately. Improved [UI-PATH] logging to include core metrics (PC, VRAM_nonzero, LCDC, BGP, LY) and a NonWhite estimate sampled on a 16×16 grid (256 points). Post-blit verification now samples 64 points (8×8 grid). Created a headless vs UI comparison script that produces a comparison table, so that numerical evidence can show whether the problem lies in the presenter/UI or in the core. Log spam is controlled with strict gating.

Hardware Concept

Correct Profiling: The previous profiling measured accumulated time incorrectly, generating inconsistent values (TOTAL: 421ms vs sum of stages: 5.88ms). Correct profiling must measure:

stage_sum_ms: Sum of the individual render stages (frombuffer/reshape, blit_array, scale/blit, flip)
frame_wall_ms: Total time of the entire frame (wall-clock time from start to finish)
pacing_ms: Waiting/synchronization time (difference between wall and stages: wall - stages)
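
The three measurements above can be sketched as follows. This is a minimal sketch, assuming each render stage can be wrapped as a callable; the names profile_frame, stages and pace are hypothetical and are not taken from src/gpu/renderer.py:

```python
import time

def profile_frame(stages, pace=None):
    """Time one frame. `stages` is a list of (name, callable) pairs; `pace` is an
    optional callable standing in for the wait/sync step (hypothetical)."""
    frame_wall_start = time.perf_counter()
    stage_ms = {}
    for name, fn in stages:
        t0 = time.perf_counter()
        fn()
        stage_ms[name] = (time.perf_counter() - t0) * 1000.0
    if pace is not None:
        pace()  # waiting is deliberately not counted as a stage
    frame_wall_ms = (time.perf_counter() - frame_wall_start) * 1000.0
    stage_sum_ms = sum(stage_ms.values())
    pacing_ms = frame_wall_ms - stage_sum_ms
    return stage_ms, stage_sum_ms, frame_wall_ms, pacing_ms
```

Measured this way, pacing_ms also absorbs any overhead between stages, so a small non-zero value is expected even when there is no explicit wait.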

NonWhite Sampling: Previous sampling was insufficient (only 3 pixels). To obtain reliable estimates:

16×16 grid: 256 sample points to calculate NonWhite before the blit
8×8 grid: 64 sample points to verify NonWhite after the blit
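
A grid sample of this kind could look like the sketch below. It assumes a flat RGB buffer of 160×144 pixels (the Game Boy screen size) and treats only pure (255, 255, 255) as white. The function name estimate_nonwhite and the choice of sampling at cell centres are assumptions, not the project's actual code:

```python
def estimate_nonwhite(rgb, width=160, height=144, grid=16):
    """Estimate the number of non-white pixels from grid x grid samples.

    `rgb` is a flat bytes-like buffer with 3 bytes per pixel (assumed layout).
    """
    hits = 0
    for gy in range(grid):
        # Sample roughly at the centre of each grid cell.
        y = (gy * height) // grid + height // (2 * grid)
        for gx in range(grid):
            x = (gx * width) // grid + width // (2 * grid)
            i = (y * width + x) * 3
            if rgb[i] < 255 or rgb[i + 1] < 255 or rgb[i + 2] < 255:
                hits += 1
    # Scale the hit count up to the full frame (multiply by density).
    return hits * (width * height) // (grid * grid)
```

With grid=16 this reads 256 points and with grid=8 it reads 64, matching the two densities listed above.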

Headless vs UI Comparison: To decide whether the problem is in the presenter/UI or in the core, the framebuffer generated by the core (headless) is compared with the framebuffer presented by the UI. If headless has NonWhite > 0 but the UI's pre-blit NonWhite is ~0, the problem is in how the UI obtains the framebuffer. If headless has NonWhite > 0 and the UI's pre-blit NonWhite is > 0 but its post-blit NonWhite is ~0, the problem is in the presenter/blit.

Implementation

Phase A: Fix Profiling

File: src/gpu/renderer.py

Changes:

Measure frame_wall_start at the beginning of the entire frame (before any stage)
Measure each individual stage: frombuffer_ms, blit_ms, scale_blit_ms, flip_ms
Calculate stage_sum_ms = sum of the individual stages
Calculate frame_wall_ms = total time from frame_wall_start until the end of the frame
Calculate pacing_ms = frame_wall_ms - stage_sum_ms
Formatted log: [UI-PROFILING] Frame N | stages=Xms (frombuf=A blit=B scale=C flip=D) | wall=Yms | pacing=Zms
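
As an illustration of the log format, the line could be assembled as in the sketch below. The function name format_profiling_log and the two-decimal precision are assumptions; the source only specifies the field layout:

```python
def format_profiling_log(frame_index, frombuf, blit, scale, flip, wall):
    """Build a [UI-PROFILING] line from per-stage and wall times (all in ms)."""
    stages = frombuf + blit + scale + flip  # stage_sum_ms
    pacing = wall - stages                  # pacing_ms
    return (
        f"[UI-PROFILING] Frame {frame_index} | stages={stages:.2f}ms "
        f"(frombuf={frombuf:.2f} blit={blit:.2f} scale={scale:.2f} flip={flip:.2f}) | "
        f"wall={wall:.2f}ms | pacing={pacing:.2f}ms"
    )
```

Deriving stages and pacing inside the formatter keeps the three numbers in one line consistent with each other, which is the property the earlier TOTAL vs. stage-sum mismatch lacked.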

Phase B: Improve Logging [UI-PATH]

Files: src/gpu/renderer.py, src/viboy.py

Changes:

Add an optional metrics parameter to render_frame()
Create the helper function _sample_vram_nonzero() in Viboy (sampling every 16th byte, same as the headless tool)
Collect metrics in viboy.py: PC, VRAM_nonzero, LCDC, BGP, LY
Pass metrics to render_frame() from all call sites
Improve NonWhite sampling: 16×16 grid = 256 points (previously every 64th pixel ≈ 3 points)
Formatted log: [UI-PATH] F4 | Path=cpp_rgb_view | PC=6152 | LCDC=E3 | BGP=FC | LY=90 | VRAMnz=2028 | NonWhite=23040 | Hash=abc12345 | wall=16.7ms
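
The VRAM sampling helper could be sketched as below. It assumes a read_byte(addr) callable for memory access and uses the VRAM range 0x8000-0x9FFF from Pan Docs. The standalone function name and signature are hypothetical, and the sketch returns the raw count of sampled non-zero bytes; whether the real _sample_vram_nonzero() scales this value is not stated here:

```python
def sample_vram_nonzero(read_byte, start=0x8000, end=0xA000, step=16):
    """Count non-zero bytes among every `step`-th VRAM address.

    `read_byte` is any callable mapping an address to a byte value (assumed).
    """
    return sum(1 for addr in range(start, end, step) if read_byte(addr) != 0)
```

With the defaults this reads 512 addresses (8192 bytes / 16), which keeps the per-frame cost low enough to collect alongside PC, LCDC, BGP and LY.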

Phase C: Enhanced Post-Blit Verification

File: src/gpu/renderer.py

Changes:

Replace the 3-pixel sampling with an 8×8 grid = 64 points
Sample from self.surface.get_at() after the blit
Calculate the estimated nonwhite_after_total (multiply by density)
Compare nonwhite_sample (before, 16×16 grid) vs nonwhite_after_total (after, 8×8 grid)
Detect significant loss: if before > 1000 and after < 100, issue a warning
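
The post-blit check can be sketched as follows. The get_at argument stands in for self.surface.get_at and is assumed to accept an (x, y) tuple and return a colour whose first three components are R, G, B. The function names are hypothetical, while the thresholds 1000 and 100 are the ones listed above:

```python
def nonwhite_after_blit(get_at, width, height, grid=8):
    """Estimate non-white pixels on the presented surface from grid x grid samples."""
    hits = 0
    for gy in range(grid):
        y = (gy * height) // grid + height // (2 * grid)
        for gx in range(grid):
            x = (gx * width) // grid + width // (2 * grid)
            r, g, b = tuple(get_at((x, y)))[:3]  # drop alpha if present
            if (r, g, b) != (255, 255, 255):
                hits += 1
    # Multiply by density: each sample represents total / grid^2 pixels.
    return hits * (width * height) // (grid * grid)

def significant_loss(before, after):
    """True when content existed before the blit but is mostly gone after it."""
    return before > 1000 and after < 100
```

Because the before and after estimates come from different grid densities, only a large gap such as this one is meaningful; small differences between the two numbers are sampling noise.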

Phase D: Headless vs UI Comparison Script

File: tools/compare_headless_vs_ui_0448.sh

Functionality:

Run headless tool for each ROM (Mario, Pokémon, Tetris, Zelda DX)
Run UI with 15s timeout for each ROM
Extract metrics from both (NonWhite, VRAM_nonzero, PC_end)
Generate comparison table: ROM | headless NonWhite | UI NonWhite_before | UI NonWhite_after | VRAMnz | PC_end
Allow an automatic decision based on the table:
- headless NonWhite > 0 and UI before > 0 but UI after ~0 → presenter/blit bug
- headless NonWhite > 0 and UI before ~0 → bug in how the UI gets the framebuffer
- Both 0 but VRAMnz high → PPU/palette bug
- Both 0 and VRAMnz ~0 and PC stuck → CPU/ROM exec bug
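
The decision rules above can be expressed as a small function. This is a sketch only: the source does not define "~0" or "high", so the eps and vram_high thresholds are placeholder assumptions, as are the function name diagnose, the pc_stuck flag and the returned labels:

```python
def diagnose(headless_nw, ui_before, ui_after, vram_nz, pc_stuck,
             eps=10, vram_high=200):
    """Map one row of the comparison table to a suspected bug location.

    eps and vram_high are assumed thresholds for "~0" and "high".
    """
    if headless_nw > 0:
        if ui_before <= eps:
            return "ui-framebuffer-acquisition"
        if ui_after <= eps:
            return "presenter/blit"
        return "no-loss-detected"
    # Headless produced nothing; check whether the UI agrees.
    if ui_before <= eps:
        if vram_nz >= vram_high:
            return "ppu/palette"
        if vram_nz <= eps and pc_stuck:
            return "cpu/rom-exec"
    return "inconclusive"
```

For example, a row with headless NonWhite 23040, UI before 23040 and UI after 0 maps to "presenter/blit". Rows that match none of the four rules are reported as inconclusive rather than forced into a category.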

Phase E: Log Spam Control

File: src/gpu/renderer.py

Verification:

Strict gating: should_log = (self._path_log_count < 5) or (self._path_log_count % 120 == 0)
VIBOY_DEBUG_UI default OFF (os.environ.get('VIBOY_DEBUG_UI', '0') == '1')
Logs [UI-PATH] and [UI-PROFILING] are emitted only when should_log is true or when FPS < 30
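
A sketch of the gating as pure functions, so that it can be checked in isolation. The names should_log_path and debug_ui_enabled are hypothetical. When the counter is incremented and how the VIBOY_DEBUG_UI flag combines with the gate are not specified above, so both are left to the caller:

```python
import os

def should_log_path(path_log_count, fps=None):
    """Log the first 5 frames, then every 120th, or whenever FPS drops below 30."""
    gated = (path_log_count < 5) or (path_log_count % 120 == 0)
    low_fps = fps is not None and fps < 30
    return gated or low_fps

def debug_ui_enabled(environ=os.environ):
    """VIBOY_DEBUG_UI is OFF unless explicitly set to '1'."""
    return environ.get('VIBOY_DEBUG_UI', '0') == '1'
```

At roughly 60 FPS, the modulo-120 rule yields about one log line every two seconds once the first five frames have been reported.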

Affected Files

src/gpu/renderer.py - Profiling fix, improved [UI-PATH] logging, improved post-blit verification
src/viboy.py - Helper function _sample_vram_nonzero(), pass metrics to render_frame()
tools/compare_headless_vs_ui_0448.sh - Headless vs UI comparison script (created)

Tests and Verification

Compilation:

python3 setup.py build_ext --inplace
# BUILD_EXIT=0 ✓

python3 test_build.py
# TEST_BUILD_EXIT=0 ✓

Native Validation: The C++ module compiled successfully and the Python-C++ interface is functional.

Comparison Script: Script created and executable. Requires manual execution with ROMs to generate comparative table.

Sources consulted

Pan Docs: Memory Map, I/O Registers (for VRAM, LCDC, BGP, LY metrics)
Step 0442: Headless tool (tools/rom_smoke_0442.py) - Reference for VRAM_nonzero sampling

Educational Integrity

What I Understand Now

Correct Profiling: To measure performance correctly, you must separate the real work time (stages) from the total time (wall), and the waiting/synchronization time (pacing). This allows us to identify whether the bottlenecks are in the render (high stages) or in the synchronization (high pacing).
Representative Sampling: A sampling of 3 pixels is not enough to estimate NonWhite. A 16×16 (256 points) or 8×8 (64 points) grid provides much more reliable estimates, especially when comparing before/after blit.
Headless vs UI Comparison: To diagnose presentation problems, it is crucial to compare what the core generates (headless) vs what the UI presents. If headless has data but UI doesn't, the problem is in the presenter. If headless has no data, the problem is in the core.

What remains to be confirmed

Running the Comparison Script: The script is created but requires manual execution with real ROMs to generate the comparison table and make decisions based on numerical evidence.
Final Diagnosis: Once the comparison script is executed, you can decide with certainty if the problem is presenter/UI (blit, format, surface) or core (CPU/VRAM/PPU does not produce an image for that ROM).

Next Steps

[ ] Run headless vs UI comparison script with real ROMs (Mario, Pokémon, Tetris, Zelda DX)
[ ] Analyze comparative table and make decision based on numerical evidence
[ ] If the problem is presenter/UI: investigate and correct bug in blit/format/surface
[ ] If the problem is core: investigate why PPU does not produce an image for that specific ROM