⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

HALT Architecture: "Fast Forward" to the Next Event

Date: 2025-12-20 Step ID: 0172 State: ✅ VERIFIED

Summary

The polling deadlock has been solved by the scanline architecture, but that fix revealed a subtler deadlock: the CPU executes the HALT instruction and our main loop does not advance time efficiently, leaving LY stuck at 0. This Step documents the implementation of intelligent HALT handling that "fast-forwards" time to the end of the current scanline, correctly simulating an idle CPU while the rest of the hardware (the PPU) keeps running.

Hardware Concept: HALT and Event Synchronization

The HALT instruction (opcode 0x76) puts the CPU into a low-power state. The CPU stops executing instructions and waits for an interrupt to occur. However, the rest of the hardware (such as the PPU) does not stop. The system clock keeps ticking.

HALT's Crawling Problem

Our previous simulation of HALT was too simplistic:

else: # If CPU is HALT
    cycles_this_scanline += 4

This is terribly inefficient. We are simulating a "sleeping" CPU by advancing time at a snail's pace (4 cycles at a time). With 456 cycles per scanline, it takes 114 iterations of our Python loop just to complete one scanline. Meanwhile, the Heartbeat fires, shows us LY=0, and makes us believe that the system is frozen. It is not frozen; it is crawling.

The hardware's HALT does not "crawl". The CPU stops, but the rest of the system (the PPU) keeps running at full speed until the next event. We must simulate this.

The Solution: "Fast Forward" to the Next Event

When the CPU enters HALT, we should not advance time 4 cycles at a time. We must calculate how many cycles are left until the next significant event (the end of the scanline) and advance time in a single jump. This is a critical optimization and a much more accurate simulation of hardware behavior.
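As a rough illustration of the difference, the sketch below compares the two strategies for a halted CPU. The function names are hypothetical and are not part of the project; only the 456-cycle scanline length and the 4-cycle step come from the text above.

```python
CYCLES_PER_SCANLINE = 456  # T-Cycles per scanline, as used by the main loop


def iterations_when_crawling(cycles_this_scanline: int, step: int = 4) -> int:
    """Loop iterations needed to finish the scanline in fixed-size steps."""
    remaining = CYCLES_PER_SCANLINE - cycles_this_scanline
    return -(-remaining // step)  # ceiling division


def fast_forward_cycles(cycles_this_scanline: int) -> int:
    """T-Cycles to add in one jump to land exactly on the scanline end."""
    return CYCLES_PER_SCANLINE - cycles_this_scanline


print(iterations_when_crawling(0))  # 114 iterations when crawling
print(fast_forward_cycles(0))       # 456 cycles in a single iteration
```

Both approaches consume the same emulated time; the jump simply replaces 114 trips through the Python loop with one.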

Source: Pan Docs - HALT behavior, Interrupts

Implementation

Two main components were modified to implement intelligent HALT handling:

A. Signal HALT from C++

First, we need CPU::step() to tell us that it has entered the HALT state. We will use a special (negative) return value for this.

In src/core/cpp/CPU.cpp, we modify case 0x76 (HALT) and PHASE 2 of HALT management:

// ========== PHASE 2: HALT Management ==========
// If the CPU is in HALT, do not execute instructions
// Return -1 to signal the orchestrator to "fast forward"
// until the next event (end of scanline)
if (halted_) {
    cycles_ += 1;
    return -1;  // Special code: signals HALT for fast forward
}

// ... inside the switch (opcode)
case 0x76:  // HALT
    halted_ = true;
    cycles_ += 1;  // HALT consumes 1 M-Cycle
    return -1;  // Special code: signals HALT for fast forward

B. Modify viboy.py to Handle the HALT Signal

Now, the orchestrator in Python reacts to this signal:

# In src/viboy.py, inside the run() method

while cycles_this_scanline < CYCLES_PER_SCANLINE:
    # Execute one CPU instruction; step() returns its M-Cycles.
    # m_cycles can be negative (-1) if the CPU is in HALT.
    m_cycles = self._cpu.step()
    
    if m_cycles == -1:
        # The CPU is in HALT.
        # "Fast forward": compute the cycles remaining to complete
        # the scanline and add them in a single jump.
        remaining_cycles_in_scanline = CYCLES_PER_SCANLINE - cycles_this_scanline
        t_cycles = remaining_cycles_in_scanline
        cycles_this_scanline += t_cycles
    else:
        # Normal instruction: convert M-Cycles to T-Cycles
        t_cycles = m_cycles * 4
        cycles_this_scanline += t_cycles
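The loop can be exercised in isolation without the compiled core. The following is a minimal sketch under stated assumptions: StubCPU, HALT_SIGNAL and run_scanline are hypothetical names invented for this example, and the stub only imitates the return values of step() described above (1 M-Cycle for a NOP, -1 once halted).

```python
from typing import Tuple

CYCLES_PER_SCANLINE = 456
HALT_SIGNAL = -1  # hypothetical name for the special return value


class StubCPU:
    """Hypothetical stand-in for the C++ CPU: runs some NOPs, then stays halted."""

    def __init__(self, nops_before_halt: int) -> None:
        self._nops_left = nops_before_halt

    def step(self) -> int:
        if self._nops_left > 0:
            self._nops_left -= 1
            return 1  # a NOP takes 1 M-Cycle
        return HALT_SIGNAL


def run_scanline(cpu: StubCPU) -> Tuple[int, int]:
    """Run one scanline and return (T-Cycles consumed, loop iterations)."""
    cycles_this_scanline = 0
    iterations = 0
    while cycles_this_scanline < CYCLES_PER_SCANLINE:
        iterations += 1
        m_cycles = cpu.step()
        if m_cycles == HALT_SIGNAL:
            # Fast forward: jump straight to the end of the scanline.
            cycles_this_scanline = CYCLES_PER_SCANLINE
        else:
            cycles_this_scanline += m_cycles * 4
    return cycles_this_scanline, iterations


print(run_scanline(StubCPU(nops_before_halt=3)))  # (456, 4)
```

With three NOPs followed by HALT, the scanline completes in four iterations: three for the NOPs (12 T-Cycles) and one jump covering the remaining 444.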

Design Decisions

  • Special Return Value: We use -1 as a special code to signal HALT. This is safe because no normal instruction returns a negative M-Cycles value.
  • Fast Forward: When we detect HALT, we calculate the remaining cycles in the current scanline and add them in one go. This correctly simulates that the CPU is asleep while the rest of the hardware keeps running.
  • Compatibility: The Cython wrapper already returns int, so we don't need to modify it.

Affected Files

  • src/core/cpp/CPU.cpp - Modified to return -1 when it enters HALT (case 0x76 and PHASE 2).
  • src/viboy.py - Modified the main loop to handle the special code -1 and perform the fast forward.
  • tests/test_core_cpu_interrupts.py - Updated test test_halt_stops_execution and added new test test_halt_instruction_signals_correctly.

Tests and Verification

The interrupt tests were run to validate the new HALT behavior:

Command Executed

pytest tests/test_core_cpu_interrupts.py::TestHALT -v

Result

tests/test_core_cpu_interrupts.py::TestHALT::test_halt_stops_execution PASSED
tests/test_core_cpu_interrupts.py::TestHALT::test_halt_instruction_signals_correctly PASSED
tests/test_core_cpu_interrupts.py::TestHALT::test_halt_wakeup_on_interrupt PASSED

============================== 3 passed in 0.05s ==============================

Test Code

The new test test_halt_instruction_signals_correctly validates the HALT signal:

def test_halt_instruction_signals_correctly(self):
    """
    Step 0172: Verify that HALT (0x76) sets the 'halted' flag and
    that step() returns -1 to signal it.
    """
    mmu = PyMMU()
    regs = PyRegisters()
    cpu = PyCPU(mmu, regs)
    
    # Configure
    mmu.write(0x0100, 0x76) # HALT
    regs.pc = 0x0100
    
    assert cpu.get_halted() == 0, "CPU must not be halted initially"
    
    # Run
    cycles = cpu.step()
    
    # Check
    assert cycles == -1, "step() must return -1 to signal HALT"
    assert cpu.get_halted() == 1, "The 'halted' flag must be set"
    assert regs.pc == 0x0101, "PC must have moved forward 1 byte"

Compiled C++ module validation: All tests pass, confirming that the compiled C++ module works as expected.

Sources consulted

Educational Integrity

What I Understand Now

  • HALT and Emulated Time: When the CPU goes into HALT, time does not stop. The rest of the hardware (PPU, Timer, etc.) continues to work. Our simulation should reflect this.
  • Fast Forward Optimization: Instead of advancing time 4 cycles at a time during HALT, we can calculate the remaining cycles until the next event and advance in a single jump. This is more efficient and more accurate.
  • Signaling between Components: Using special return values (such as -1) is an elegant way to communicate special states between the C++ core and the Python orchestrator.

What remains to be confirmed

  • Running with Real ROM: Verify that with this new architecture, when the game enters HALT waiting for V-Blank, time advances correctly and LY increases.
  • HALT Wake-Up: Confirm that when the PPU generates a V-Blank interrupt, the CPU correctly wakes up from HALT and continues execution.

Hypotheses and Assumptions

This implementation assumes that the next significant event after HALT is always the end of the current scanline. This holds for most cases, but there may be situations where we want to advance to a more specific event (such as a Timer interrupt). For now, advancing to the end of the scanline is sufficient.
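If a finer-grained event is ever needed, one possible generalization is to jump to whichever known event comes first. This is only a sketch of that idea, not something the project implements: cycles_to_next_event and its cycles_until_timer_overflow parameter are hypothetical, and how the Timer would report that value is left open.

```python
from typing import Optional

CYCLES_PER_SCANLINE = 456


def cycles_to_next_event(
    cycles_this_scanline: int,
    cycles_until_timer_overflow: Optional[int] = None,
) -> int:
    """T-Cycles to skip while halted: the distance to the nearest known event.

    cycles_until_timer_overflow is a hypothetical input; None means the
    timer is disabled or not modelled, which reproduces the current behavior.
    """
    to_scanline_end = CYCLES_PER_SCANLINE - cycles_this_scanline
    if cycles_until_timer_overflow is None:
        return to_scanline_end
    return min(to_scanline_end, cycles_until_timer_overflow)


print(cycles_to_next_event(100))      # 356: end of scanline is the only event
print(cycles_to_next_event(100, 64))  # 64: the timer would fire first
```

With no timer information the function reduces to the scanline-end jump documented in this Step, so it could be adopted incrementally.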

Next Steps

This is the moment of truth. With this new architecture:

  1. The game will enter HALT to wait for V-Blank.
  2. Our run() will detect it and advance time to the end of the current scanline.
  3. ppu.step(456) will be called, and LY will increase.
  4. This will be repeated for each line. We will see in the Heartbeat how LY cycles from 0 to 153.
  5. When LY reaches 144, the PPU will generate a V-Blank interrupt.
  6. handle_interrupts() on the C++ CPU will detect it and wake the CPU from HALT.
  7. The game will continue to run.
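The sequence above can be pictured with a toy model of one frame. This is a simplified sketch, not the project's PPU or CPU: simulate_halted_frame is a hypothetical function, each loop pass stands for one fast-forwarded scanline, and the interrupt enable and request registers are ignored.

```python
from typing import List, Optional, Tuple

LINES_PER_FRAME = 154    # LY runs from 0 to 153
VBLANK_START_LINE = 144  # V-Blank begins when LY reaches 144


def simulate_halted_frame() -> Tuple[List[int], Optional[int]]:
    """Step a halted CPU through one frame, one scanline per iteration.

    Returns the LY values observed at the start of each scanline and the
    line on which the (assumed always enabled) V-Blank interrupt woke the CPU.
    """
    ly = 0
    halted = True
    ly_seen: List[int] = []
    wake_line: Optional[int] = None
    for _ in range(LINES_PER_FRAME):
        ly_seen.append(ly)
        # Fast forward: the whole scanline elapses in one jump,
        # then the PPU moves on to the next line.
        ly = (ly + 1) % LINES_PER_FRAME
        if halted and ly == VBLANK_START_LINE:
            halted = False  # V-Blank interrupt wakes the CPU
            wake_line = ly
    return ly_seen, wake_line


ly_values, woke_at = simulate_halted_frame()
print(ly_values[0], ly_values[-1], woke_at)  # 0 153 144
```

In the real emulator the CPU would resume executing instructions after waking; the model keeps stepping whole scanlines only to show LY completing its 0 to 153 cycle.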

If everything goes well, we should see the Nintendo logo or the Tetris copyright screen for the first time.