⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

Operation Warp Drive: Decrement Monitor and Delay Loop Validation

Date: 2025-12-25 StepID: 0277 State: draft

Summary

This Step implements "Operation Warp Drive" to validate the delay loop identified in Step 0276. The previous analysis showed that the game is NOT polling hardware; it is running a software delay loop driven by the DE register pair. The loop decrements DE until it reaches 0 and then continues execution.

Specific instrumentation was added at three critical points: (1) capture of the initial DE load at PC:0x614A, (2) monitoring of the DE decrement every 1000 iterations at PC:0x6150, and (3) loop exit detection when the PC leaves the range 0x614A-0x6155. The goal is to confirm that DE is decreasing, to see how much of the loop remains at any point, and to validate that the DEC DE instruction works correctly.

Hardware Concept

On the Game Boy, software delay loops are a common technique for creating timed pauses without using timer hardware or interrupts. These loops work by decrementing a 16-bit register pair until it reaches 0, consuming a predictable number of CPU cycles.

1. Software Delay Loops

A typical delay loop on the Game Boy works like this:

  1. Initial load: A value is loaded into a register pair (e.g., DE = 0x2000)
  2. Loop: The register pair is decremented (DEC DE)
  3. Check: The pair is tested for 0, typically by OR-ing its two bytes and checking the Z flag (DEC DE itself sets no flags)
  4. Repeat: If the pair is not 0, the loop repeats

Code example:

    LD DE, 0x2000 ; Load initial value
.loop:
    DEC DE ; Decrement DE
    LD A, D ; Copy D into A
    OR E ; A = D | E (if D=0 and E=0, then A=0 and Z=1)
    JR NZ, .loop ; Jump back while Z=0 (DE != 0)
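
To make the iteration count of this loop concrete, here is a minimal C++ sketch that models only its control flow: a 16-bit decrement followed by the D | E zero test. The function name count_delay_iterations is hypothetical and does not exist in the project; it is an illustration, not emulator code.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the project): counts how many times the
// body of the assembly loop above runs for a given initial DE value.
std::uint32_t count_delay_iterations(std::uint16_t initial_de) {
    std::uint16_t de = initial_de;
    std::uint32_t iterations = 0;
    std::uint8_t a = 0;
    do {
        de = static_cast<std::uint16_t>(de - 1);               // DEC DE (0x0000 wraps to 0xFFFF)
        ++iterations;
        const std::uint8_t d = static_cast<std::uint8_t>(de >> 8);
        const std::uint8_t e = static_cast<std::uint8_t>(de & 0xFF);
        a = static_cast<std::uint8_t>(d | e);                  // LD A, D / OR E
    } while (a != 0);                                          // JR NZ, .loop
    return iterations;
}
```

Because the decrement happens before the test, an initial value of 0x2000 gives 8,192 iterations, 0xFFFF gives 65,535, and 0x0000 gives 65,536 (the first decrement wraps to 0xFFFF).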

2. Real Time Calculation

The time a delay loop takes depends on:

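  • Initial value: If DE is loaded with 0xFFFF, a decrement-then-test loop like the one above executes 65,535 iterations (65,536 if DE starts at 0x0000, because the first DEC DE wraps to 0xFFFF)
  • Cycles per iteration: Each iteration consumes several M-Cycles (1 M-Cycle = 4 T-Cycles)
  • CPU frequency: The Game Boy runs at ~4.19 MHz (4,194,304 Hz)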

Calculation example: If each iteration consumes 10 T-Cycles and DE is loaded with 0x2000 (8,192):

  • Total T-Cycles: 8,192 × 10 = 81,920 T-Cycles
  • Real time: 81,920 / 4,194,304 ≈ 19.5 ms

In an emulator, if DEC DE is implemented incorrectly or the 16-bit ALU has a bug, DE may never reach 0, causing an infinite loop. From the outside, a correct but very long loop looks the same (the "stuck illusion").

3. DEC DE Validation (Opcode 0x1B)

The DEC DE instruction (opcode 0x1B) decrements the DE register pair by 1. According to the LR35902 specification:

  • Cycles: 2 M-Cycles (8 T-Cycles)
  • Flags: It does NOT affect flags (unlike DEC r, which does affect Z, N, H)
  • Wrap-around: If DE = 0x0000, after DEC DE, DE = 0xFFFF (16-bit wrap-around)

It is critical that this instruction works correctly because many delay loops depend on it. If DEC DE does not decrement correctly, the loop becomes infinite and the game freezes.

Source: Pan Docs - CPU Instruction Set - DEC rr
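
The three properties listed above (cycle count, untouched flags, wrap-around) can be checked against a minimal model. The Regs struct and dec_de function below are hypothetical stand-ins written for this sketch and do not represent the project's actual register class.

```cpp
#include <cstdint>

// Hypothetical minimal register file (illustration only).
struct Regs {
    std::uint8_t d = 0;
    std::uint8_t e = 0;
    std::uint8_t f = 0;  // Flags register: Z, N, H, C in bits 7..4

    std::uint16_t de() const {
        return static_cast<std::uint16_t>((d << 8) | e);
    }
    void set_de(std::uint16_t value) {
        d = static_cast<std::uint8_t>(value >> 8);
        e = static_cast<std::uint8_t>(value & 0xFF);
    }
};

// DEC DE (0x1B): 16-bit decrement with wrap-around. The flags register is
// deliberately not touched. Returns the cost in M-cycles.
int dec_de(Regs& regs) {
    regs.set_de(static_cast<std::uint16_t>(regs.de() - 1));
    return 2;
}
```

Starting from DE = 0x0000 with any flag pattern in f, one call leaves DE = 0xFFFF and f unchanged.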

Implementation

Three instrumentation points were implemented in CPU.cpp to monitor the delay loop:

1. DE Initial Load Capture (PC:0x614A)

In case 0x11 (LD DE, nn), a check was added to detect when the original PC (before the fetch) was 0x614A. When this condition is met, the value being loaded into DE is printed along with the memory bytes it is read from (0x614B and 0x614C, in little-endian order).

case 0x11: // LD DE, d16
{
    uint16_t value = fetch_word();
    regs_->set_de(value);

    // Step 0277: Capture initial DE load at PC:0x614A
    if (saved_pc_for_instrumentation == 0x614A) {
        printf("[SNIPER-LOAD] PC:0x614A | Loading DE with value: 0x%04X ...\n", value);
    }
    ...
}
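
As a reminder of what little-endian means for the two operand bytes, the sketch below combines them into the 16-bit value that ends up in DE. The name make_word is hypothetical; the project's fetch_word() presumably does something equivalent, but its implementation is not shown here.

```cpp
#include <cstdint>

// Hypothetical helper: builds the 16-bit immediate of LD DE, d16.
// `lo` is the byte right after the opcode (address 0x614B in this case),
// `hi` is the following byte (address 0x614C).
std::uint16_t make_word(std::uint8_t lo, std::uint8_t hi) {
    return static_cast<std::uint16_t>((static_cast<unsigned>(hi) << 8) | lo);
}
```

For example, if memory held 0x00 at 0x614B and 0x20 at 0x614C, DE would be loaded with 0x2000.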

2. Decrement Monitor (PC:0x6150)

In case 0x1B (DEC DE), a monitor was added that prints the state of DE every 1000 iterations when the original PC was 0x6150. This makes it possible to verify that DE is decreasing without flooding the log.

case 0x1B: // DEC DE
{
    dec_16bit(1);  // 1 = DE

    // Step 0277: Monitor decrement every 1000 iterations
    if (saved_pc_for_instrumentation == 0x6150) {
        static uint32_t loop_counter = 0;
        loop_counter++;
        if (loop_counter % 1000 == 0) {
            printf("[SNIPER-DELAY] Iteration:%u | DE:0x%04X | LY:%d DIV:0x%02X\n", ...);
        }
    }
    ...
}

3. Loop Exit Trigger

At the beginning of step(), before processing interrupts, a check was added to detect when the PC leaves the range 0x614A-0x6155. This indicates that the delay loop has ended and the game has resumed normal execution.

// Step 0277: Loop exit trigger
static uint16_t last_pc_in_loop = 0;
if (last_pc_in_loop >= 0x614A && last_pc_in_loop <= 0x6155 &&
    !(regs_->pc >= 0x614A && regs_->pc <= 0x6155)) {
    printf("[SNIPER-EXIT] FREEDOM! The delay loop has ended...\n");
}
// Assumed update (not shown in the original snippet): remember this PC for the next step()
last_pc_in_loop = regs_->pc;

Components created/modified

  • CPU.cpp: The step() method was modified to add the three instrumentation points
  • CPU.cpp: Added a static variable, saved_pc_for_instrumentation, to track the original PC before the fetch

Design decisions

Using a static variable for the original PC: The static variable saved_pc_for_instrumentation is updated at the start of step() with the original PC (before the fetch). It is accessible from the switch cases, which makes it possible to check whether a specific instruction was executed at a critical address.

Sampling every 1000 iterations: The decrement monitor only prints every 1000 iterations to avoid flooding the log. If DE is loaded with 0xFFFF, this generates approximately 65 log lines, which is manageable.
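
The sampling idea can be isolated in a tiny counter object. This is only a sketch: SampledCounter is a hypothetical name, and the actual monitor uses a function-local static counter as shown in the Implementation section.

```cpp
#include <cstdint>

// Hypothetical helper: reports true once every `period` calls to tick().
struct SampledCounter {
    std::uint32_t period;
    std::uint32_t count = 0;

    explicit SampledCounter(std::uint32_t p) : period(p) {}

    bool tick() {
        ++count;
        return (count % period) == 0;
    }
};
```

Calling tick() 65,535 times on a SampledCounter with period 1000 returns true 65 times, which matches the estimate of about 65 log lines. Unlike a function-local static, an object like this can be reset if the game enters the loop more than once.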

Loop exit verification: The check compares the current PC with the PC of the previous step to detect when the PC leaves the range 0x614A-0x6155. This detects the moment the game exits the loop without having to instrument every instruction outside it.
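
The exit check is an edge detector: it fires only on the transition from "inside the range" to "outside the range". Here is a self-contained sketch of that idea. The RangeExitDetector class is hypothetical and is not how CPU.cpp structures the check.

```cpp
#include <cstdint>

// Hypothetical helper: detects the moment a PC leaves an address range.
class RangeExitDetector {
public:
    RangeExitDetector(std::uint16_t lo, std::uint16_t hi) : lo_(lo), hi_(hi) {}

    // Call once per instruction with the current PC. Returns true only on
    // the first PC outside [lo, hi] after at least one PC inside it.
    bool update(std::uint16_t pc) {
        const bool inside = (pc >= lo_) && (pc <= hi_);
        const bool exited = was_inside_ && !inside;
        was_inside_ = inside;
        return exited;
    }

private:
    std::uint16_t lo_;
    std::uint16_t hi_;
    bool was_inside_ = false;
};
```

One caveat under this design: any transfer of control out of the range fires the trigger, so an interrupt handler that runs while the loop is still counting would also be reported as an exit.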

Affected Files

  • src/core/cpp/CPU.cpp - Modified method step() to add specific monitors: load DE (0x614A), decrement (0x6150), and exit loop (0x614A-0x6155)
  • src/core/cpp/CPU.cpp - Modified case 0x11 (LD DE, nn) to capture the initial load
  • src/core/cpp/CPU.cpp - Modified case 0x1B (DEC DE) to monitor the decrement

Tests and Verification

The verification will be carried out by running Pokémon Red and analyzing the generated logs:

  • Command executed: python main.py roms/pkmn.gb
  • Expected result: You should see three types of messages in the log:
    • [SNIPER-LOAD]: Displays the initial value loaded in DE
    • [SNIPER-DELAY]: Displays the status of DE every 1000 iterations (if DE is loaded with 0xFFFF, there should be ~65 messages)
    • [SNIPER-EXIT]: Indicates that the loop has ended and the game continues

DEC DE Validation

Code inspection confirmed that the DEC DE instruction (opcode 0x1B) is correctly implemented:

case 0x1B: // DEC DE
{
    dec_16bit(1);  // 1 = DE
    cycles_ += 2;
    return 2;
}

The function dec_16bit(1) is correctly implemented and decrements DE with 16-bit wrap-around:

case 1: { // DE
    uint16_t de = regs_->get_de();
    de = (de - 1) & 0xFFFF;  // 16-bit wrap-around: 0x0000 -> 0xFFFF
    regs_->set_de(de);
    break;
}

Compiled C++ module validation: ✅ Successful build without linter errors. The instrumentation is ready to be tested at runtime.

Sources consulted

Note: Implementation based on general knowledge of LR35902 architecture and Pan Docs specifications.

Educational Integrity

What I Understand Now

  • Software delay loops: They are a common technique for creating timed pauses without using timer hardware or interrupts. They work by decrementing a 16-bit register pair until it reaches 0.
  • Real-time calculation: The time a delay loop takes depends on the initial value, the cycles per iteration, and the CPU frequency (~4.19 MHz). If DE is loaded with 0xFFFF, the loop can take well over 100 ms: even at 10 T-Cycles per iteration, 65,535 iterations come to about 156 ms.
  • The "stuck illusion": If a delay loop is loaded with a very large value (e.g., 0xFFFF), the game may appear frozen when in reality it is just waiting for the loop to finish. This is especially problematic in emulators if the 16-bit ALU has a bug.
  • DEC DE does not affect flags: Unlike DEC r (which affects Z, N, H), DEC DE (and the other DEC rr instructions) does NOT affect flags. This is why delay loops need a separate zero test after the decrement.

What remains to be confirmed

  • Initial DE value: What value is loaded into DE at PC:0x614A? If it is 0xFFFF, the loop will take ~65,535 iterations. If it is a smaller value, the loop will end sooner.
  • Correct decrement: Is DE decreasing correctly? If the [SNIPER-DELAY] messages show that DE does not change or stays stuck at a value, there is a bug in dec_16bit().
  • Loop exit: Does the game exit the loop when DE reaches 0? If we see [SNIPER-EXIT], the loop ended correctly. If we don't see it, the game could be stuck for another reason.

Hypotheses and Assumptions

Main hypothesis: The "freeze" is actually a software delay loop that should terminate when DE reaches 0. If the game is still stuck after DE reaches 0, the problem lies elsewhere (possibly in the logic that follows the loop, or in a PPU synchronization problem).

Assumption about initial value: We assume that DE is loaded with a reasonable value (not 0xFFFF), but this needs to be confirmed with the [SNIPER-LOAD] logs. If DE is loaded with 0xFFFF, the loop will take a considerable time to complete.

Next Steps

  • [ ] Run Pokémon Red and analyze the [SNIPER-LOAD] logs to see what value is loaded into DE
  • [ ] Verify that DE is decreasing correctly using the [SNIPER-DELAY] logs
  • [ ] Confirm that the loop ends when DE reaches 0 (look for [SNIPER-EXIT])
  • [ ] If DE is not decreasing, investigate and fix the bug in dec_16bit()
  • [ ] If the loop ends but the game is still stuck, investigate what happens after the loop
  • [ ] Calculate the actual time the loop takes based on the initial value of DE and cycles per iteration