This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
Stack Math Implementation (0xE8, 0xF8, 0xF9)
Summary
This Step implements the three critical Stack Math instructions that were missing from the CPU: ADD SP, e (0xE8), LD HL, SP+e (0xF8) and LD SP, HL (0xF9).
Diagnostics for Step 0267 revealed that the Stack Pointer was corrupted (SP:210A, pointing to ROM). The most likely cause was the lack of these instructions, which games use to manage local variables on the stack.
The implementation is surgical: the H and C flags are calculated from the low byte of SP (as if it were an 8-bit addition), not from the full 16-bit result. This specific behavior of the LR35902 hardware is critical for accuracy.
Hardware Concept
The Game Boy has some strange but vital instructions for the C language (and for Pokémon): they operate on the Stack Pointer (SP) as if it were a normal data register.
ADD SP, e (0xE8)
Adds a signed value (positive or negative) to the SP. It is used to reserve or free space for local variables on the stack.
The trap: the H and C flags are calculated from the low byte (as if it were an 8-bit addition), not from the 16-bit result! It is a very specific behavior of the LR35902 CPU.
- Z: Always 0 (reset).
- N: Always 0 (it is an addition).
- H: Carry from bit 3 (low nibble). Formula: ((sp & 0xF) + (offset & 0xF)) > 0xF.
- C: Carry from bit 7 (low byte). Formula: ((sp & 0xFF) + (offset & 0xFF)) > 0xFF.
Timing: 4 M-Cycles (16 T-Cycles).
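To make the flag formulas concrete, here is a minimal standalone sketch. The names SpAddResult and add_sp_signed are hypothetical helpers introduced only for illustration; they are not part of the project's CPU class.

```cpp
#include <cstdint>

// Hypothetical result type for illustration only.
struct SpAddResult {
    uint16_t value;   // SP + e, wrapped to 16 bits
    bool half_carry;  // carry out of bit 3 of the low byte
    bool carry;       // carry out of bit 7 of the low byte
};

// raw is the operand byte as fetched. It is sign-extended for the sum
// but treated as an unsigned byte for the flags.
inline SpAddResult add_sp_signed(uint16_t sp, uint8_t raw) {
    const int offset = (raw & 0x80) ? static_cast<int>(raw) - 0x100
                                    : static_cast<int>(raw);
    const uint16_t value = static_cast<uint16_t>(sp + offset);
    const bool h = ((sp & 0x0F) + (raw & 0x0F)) > 0x0F;
    const bool c = ((sp & 0xFF) + (raw & 0xFF)) > 0xFF;
    return SpAddResult{value, h, c};
}
```

With these formulas, SP = 0xDFF0 and e = -8 (raw byte 0xF8) gives 0xDFE8 with H = 0 and C = 1: the carry flag is set even though the stack pointer decreased, because 0xF0 + 0xF8 overflows a byte. SP = 0x0000 and e = -1 gives 0xFFFF with both flags clear.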
LD HL, SP+e (0xF8)
Calculates the address of a variable on the stack and puts it in HL. Uses the same weird flag logic as ADD SP, e.
Important: SP is NOT modified. It is only used for the calculation.
Timing: 3 M-Cycles (12 T-Cycles).
LD SP, HL (0xF9)
Copies HL into SP. Essential for restoring the stack after temporary operations.
Flags: Does not affect flags.
Timing: 2 M-Cycles (8 T-Cycles).
Why are they critical?
If these instructions are missing or poorly implemented, the SP ends up pointing to Narnia (or ROM 0x210A, as we saw in Step 0267), and the game explodes. C compilers generate code that uses these instructions constantly to:
- Reserve space for local variables: ADD SP, -8 (reserves 8 bytes).
- Access local variables: LD HL, SP+4 (addresses the variable at offset +4).
- Restore the stack: LD SP, HL (restores SP from a temporary register).
Source: Pan Docs - "CPU Instruction Set", "ADD SP, r8", "LD HL, SP+r8", "LD SP, HL"
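The three uses listed above can be traced with a small simulation. This is a sketch under stated assumptions: MiniRegs and the helper functions are hypothetical stand-ins, not the project's register class, and flags are left out to keep the focus on SP and HL.

```cpp
#include <cstdint>

// Hypothetical minimal register file for illustration only.
struct MiniRegs {
    uint16_t sp = 0;
    uint16_t hl = 0;
};

// Adds a signed 8-bit offset to base, wrapping at 16 bits.
inline uint16_t add_signed(uint16_t base, int8_t e) {
    return static_cast<uint16_t>(base + e);
}

inline void add_sp_e(MiniRegs& r, int8_t e)   { r.sp = add_signed(r.sp, e); }  // 0xE8
inline void ld_hl_sp_e(MiniRegs& r, int8_t e) { r.hl = add_signed(r.sp, e); }  // 0xF8, SP untouched
inline void ld_sp_hl(MiniRegs& r)             { r.sp = r.hl; }                 // 0xF9

// Walks through a typical frame: reserve 8 bytes, address a local,
// then release the frame through HL.
inline MiniRegs frame_demo() {
    MiniRegs r;
    r.sp = 0xDFF0;
    add_sp_e(r, -8);    // SP = 0xDFE8, eight bytes reserved
    ld_hl_sp_e(r, 4);   // HL = 0xDFEC, SP still 0xDFE8
    ld_hl_sp_e(r, 8);   // HL = 0xDFF0, the SP value before the reservation
    ld_sp_hl(r);        // SP = 0xDFF0, frame released
    return r;
}
```

If any of the three steps is missing, SP never returns to its starting value, and subsequent PUSH, POP, CALL and RET instructions operate on the wrong addresses.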
Implementation
The three opcodes were implemented in the step() method of CPU.cpp, just before the CB prefix (0xCB).
Case 0xE8: ADD SP, e
case 0xE8: // ADD SP, e
{
// Read signed offset
uint8_t offset_raw = fetch_byte();
int8_t offset = static_cast<int8_t>(offset_raw);
// Save original SP for flag calculation
uint16_t sp_old = regs_->sp;
uint8_t sp_low = sp_old & 0xFF;
// Calculate new SP
uint16_t sp_new = (sp_old + offset) & 0xFFFF;
regs_->sp = sp_new;
// Calculate flags (CRITICAL: based on low byte)
regs_->set_flag_z(false); // Z: always 0
regs_->set_flag_n(false); // N: always 0
// H: Half-carry from bit 3
uint8_t offset_unsigned = static_cast<uint8_t>(offset_raw);
uint8_t sp_low_nibble = sp_low & 0x0F;
uint8_t offset_low_nibble = offset_unsigned & 0x0F;
bool half_carry = (sp_low_nibble + offset_low_nibble) > 0x0F;
regs_->set_flag_h(half_carry);
// C: Carry from bit 7
bool carry = ((static_cast<uint16_t>(sp_low) + static_cast<uint16_t>(offset_unsigned)) & 0x100) != 0;
regs_->set_flag_c(carry);
cycles_ += 4;
return 4;
}
Case 0xF8: LD HL, SP+e
case 0xF8: // LD HL, SP+e
{
// Read signed offset
uint8_t offset_raw = fetch_byte();
int8_t offset = static_cast<int8_t>(offset_raw);
// Save SP for flag calculation (NOT modified)
uint16_t sp = regs_->sp;
uint8_t sp_low = sp & 0xFF;
// Calculate HL = SP + offset
uint16_t hl_new = (sp + offset) & 0xFFFF;
regs_->set_hl(hl_new);
// Calculate flags (identical to ADD SP, e)
regs_->set_flag_z(false);
regs_->set_flag_n(false);
uint8_t offset_unsigned = static_cast<uint8_t>(offset_raw);
uint8_t sp_low_nibble = sp_low & 0x0F;
uint8_t offset_low_nibble = offset_unsigned & 0x0F;
bool half_carry = (sp_low_nibble + offset_low_nibble) > 0x0F;
regs_->set_flag_h(half_carry);
bool carry = ((static_cast<uint16_t>(sp_low) + static_cast<uint16_t>(offset_unsigned)) & 0x100) != 0;
regs_->set_flag_c(carry);
cycles_ += 3;
return 3;
}
Case 0xF9: LD SP, HL
case 0xF9: // LD SP, HL
{
uint16_t hl = regs_->get_hl();
regs_->sp = hl;
cycles_ += 2;
return 2;
}
Design Decisions
- Flags Calculation: The low byte of SP and the offset are treated as unsigned for the flag calculation, but the offset is interpreted as signed for the actual sum. This replicates the behavior of the hardware.
- Location in the Switch: The cases were inserted just before the CB prefix to maintain the logical order of opcodes.
- Step 0267 Watchdog: It is kept active to verify that these instructions do not corrupt the SP.
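The signed/unsigned split in the first decision can look suspicious at first. One way to gain confidence is to compare it against a different formulation: XOR the operands with the signed 16-bit result and read the carries from bits 4 and 8. The sketch below checks both formulations against each other for every SP and operand byte. All function names are hypothetical, and this is a cross-check between two formulas, not a replacement for validation against test ROMs.

```cpp
#include <cstdint>

// Flags derived directly from the unsigned low byte, as in the cases above.
inline bool half_carry_direct(uint16_t sp, uint8_t raw) {
    return ((sp & 0x0F) + (raw & 0x0F)) > 0x0F;
}
inline bool carry_direct(uint16_t sp, uint8_t raw) {
    return ((sp & 0xFF) + raw) > 0xFF;
}

// Alternative: bit n of (a ^ b ^ (a + b)) is the carry into bit n,
// so bit 4 is the carry out of bit 3 and bit 8 the carry out of bit 7.
inline uint16_t carry_bits(uint16_t sp, uint8_t raw) {
    const uint16_t offset =
        static_cast<uint16_t>(static_cast<int8_t>(raw));  // sign-extended
    const uint16_t result = static_cast<uint16_t>(sp + offset);
    return static_cast<uint16_t>(sp ^ offset ^ result);
}

// Exhaustively compares both formulations; returns the number of mismatches.
inline unsigned count_mismatches() {
    unsigned mismatches = 0;
    for (unsigned s = 0; s <= 0xFFFF; ++s) {
        for (unsigned b = 0; b <= 0xFF; ++b) {
            const uint16_t sp = static_cast<uint16_t>(s);
            const uint8_t raw = static_cast<uint8_t>(b);
            const uint16_t bits = carry_bits(sp, raw);
            if (((bits & 0x0010) != 0) != half_carry_direct(sp, raw)) ++mismatches;
            if (((bits & 0x0100) != 0) != carry_direct(sp, raw)) ++mismatches;
        }
    }
    return mismatches;
}
```

The two approaches should agree because the carry into bit 8 depends only on the low bytes of both operands, regardless of how the upper byte of the offset is sign-extended.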
Affected Files
src/core/cpp/CPU.cpp - Added cases 0xE8, 0xF8 and 0xF9 in the step() method.
Tests and Verification
Compiled C++ module validation: The instructions were implemented directly in C++ and require recompilation.
Compile command:
.\rebuild_cpp.ps1
Test command:
python main.py roms/pkmn.gb
Expected verifications:
- The message [CRITICAL] SP CORRUPTION from Step 0267 should stop appearing (or appear much less frequently).
- The GPS should show "healthy" SP values like DFFX or FFFX (in WRAM or HRAM).
- The game should advance beyond the waiting loop and display new graphics on the screen.
Note: Complete unit tests can be implemented in a future Step, following the pattern of tests/test_cpu_sp_arithmetic.py from the Python version.
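The "healthy" check in the list above can be expressed as a range test on the standard Game Boy memory map (ROM at 0x0000-0x7FFF, WRAM at 0xC000-0xDFFF, HRAM at 0xFF80-0xFFFE). The sketch below is an assumption about how such a check might look; SpRegion, classify_sp and sp_looks_healthy are hypothetical names and do not describe how the Step 0267 watchdog is actually written.

```cpp
#include <cstdint>

// Hypothetical classification of where SP currently points.
enum class SpRegion { Rom, Wram, Hram, Other };

inline SpRegion classify_sp(uint16_t sp) {
    if (sp <= 0x7FFF) return SpRegion::Rom;                    // cartridge ROM
    if (sp >= 0xC000 && sp <= 0xDFFF) return SpRegion::Wram;   // work RAM
    if (sp >= 0xFF80 && sp <= 0xFFFE) return SpRegion::Hram;   // high RAM
    return SpRegion::Other;                                    // VRAM, external RAM, echo, OAM, I/O, IE
}

// True when SP points into one of the two regions games normally use for the stack.
inline bool sp_looks_healthy(uint16_t sp) {
    const SpRegion region = classify_sp(sp);
    return region == SpRegion::Wram || region == SpRegion::Hram;
}
```

With this classification, 0x210A is reported as ROM, while 0xDFF0 and 0xFFFE count as healthy. A stack that is completely empty may rest one byte past the end of its region, so a real watchdog may want a slightly more tolerant upper bound.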
Sources consulted
- Pan Docs: CPU Instruction Set - ADD SP, r8
- Pan Docs: CPU Instruction Set - LD HL, SP+r8
- Pan Docs: CPU Instruction Set - LD SP, HL
- Reference implementation in Python: src/cpu/core.py (Step 0068, v0.0.1)
Educational Integrity
What I Understand Now
- Flags in Stack Math: The H and C flags in ADD SP, e and LD HL, SP+e are calculated from the low byte of SP, not from the full 16-bit result. This differs from ADD HL, rr, where H is the carry out of bit 11 and C is the carry out of bit 15.
- Use in C Compilers: These instructions are fundamental to the code generated by C compilers, which use them constantly to manage the stack frame and local variables.
- SP Corruption: If these instructions are missing or poorly implemented, the SP can become corrupted and point to read-only memory (ROM), causing the game to crash.
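The contrast with ADD HL, rr in the first point is easiest to see side by side. The following sketch uses hypothetical helper names (CarryFlags, flags_add_hl_rr, flags_add_sp_e) and computes only H and C for each instruction family.

```cpp
#include <cstdint>

// Hypothetical pair of flags for illustration only.
struct CarryFlags {
    bool h;
    bool c;
};

// ADD HL, rr: H is the carry out of bit 11, C the carry out of bit 15.
inline CarryFlags flags_add_hl_rr(uint16_t hl, uint16_t rr) {
    const bool h = ((hl & 0x0FFF) + (rr & 0x0FFF)) > 0x0FFF;
    const bool c = (static_cast<uint32_t>(hl) + rr) > 0xFFFF;
    return CarryFlags{h, c};
}

// ADD SP, e and LD HL, SP+e: H is the carry out of bit 3,
// C the carry out of bit 7, both taken from the low byte only.
inline CarryFlags flags_add_sp_e(uint16_t sp, uint8_t raw) {
    const bool h = ((sp & 0x0F) + (raw & 0x0F)) > 0x0F;
    const bool c = ((sp & 0xFF) + raw) > 0xFF;
    return CarryFlags{h, c};
}
```

For the same numeric addition 0x00F8 + 0x0008, the ADD HL, rr rules leave both flags clear, while the SP rules set both, so reusing the 16-bit flag logic for 0xE8 and 0xF8 would give wrong results.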
What remains to be confirmed
- Validation with real ROMs: We need to run the emulator with Pokémon Red and verify that the SP is no longer corrupted and that the game progresses correctly.
- Unit tests: Implement complete unit tests in C++ or Python that validate the flag calculation in extreme cases (overflow, underflow, etc.).
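As a starting point for those unit tests, a table of edge cases could look like the sketch below. FlagCase, kCases, sp_plus_e and count_failures are hypothetical names, and sp_plus_e is a stand-in for the real CPU path; the expected values are derived from the flag formulas in this Step, not from hardware measurements.

```cpp
#include <cstddef>
#include <cstdint>

// One hypothetical test vector: inputs plus expected result and flags.
struct FlagCase {
    uint16_t sp;
    uint8_t raw;        // operand byte as fetched
    uint16_t expected;  // SP + e wrapped to 16 bits
    bool h;
    bool c;
};

static const FlagCase kCases[] = {
    {0xFFF8, 0x08, 0x0000, true,  true },  // wraps past 0xFFFF
    {0x0000, 0xFF, 0xFFFF, false, false},  // e = -1 wraps below 0x0000
    {0x0001, 0xFF, 0x0000, true,  true },  // e = -1, result zero (Z still 0)
    {0xDFF0, 0xF8, 0xDFE8, false, true },  // e = -8, typical frame reservation
    {0x000F, 0x01, 0x0010, true,  false},  // half-carry only
    {0x00F0, 0x10, 0x0100, false, true },  // carry only
    {0x1234, 0x00, 0x1234, false, false},  // zero offset
};

// Stand-in for the instruction under test.
inline uint16_t sp_plus_e(uint16_t sp, uint8_t raw, bool& h, bool& c) {
    h = ((sp & 0x0F) + (raw & 0x0F)) > 0x0F;
    c = ((sp & 0xFF) + raw) > 0xFF;
    return static_cast<uint16_t>(sp + static_cast<int8_t>(raw));
}

// Runs every vector and returns how many of them failed.
inline std::size_t count_failures() {
    std::size_t failures = 0;
    for (const FlagCase& tc : kCases) {
        bool h = false;
        bool c = false;
        const uint16_t got = sp_plus_e(tc.sp, tc.raw, h, c);
        if (got != tc.expected || h != tc.h || c != tc.c) ++failures;
    }
    return failures;
}
```

In a real test, sp_plus_e would be replaced by executing opcode 0xE8 or 0xF8 on the emulated CPU and reading back SP (or HL) and the flag register.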
Hypotheses and Assumptions
We assume that the flags implementation is correct based on the Pan Docs documentation and the reference implementation in Python (v0.0.1). However, the final validation requires running the emulator with real ROMs and verifying that the behavior is correct.
Next Steps
- [ ] Recompile the C++ module with .\rebuild_cpp.ps1
- [ ] Run the emulator with Pokémon Red and verify that the SP is no longer corrupted
- [ ] Verify that the game progresses past the waiting loop and displays graphics
- [ ] If the Step 0267 watchdog still detects corruption, analyze what other instructions may be causing the problem
- [ ] Implement complete unit tests for the three instructions (optional, may be a future Step)