This project is educational and Open Source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.
Advanced Stack Arithmetic (SP+r8)
Summary
Two critical stack-arithmetic opcodes that take an offset have been implemented: ADD SP, r8 (0xE8) and LD HL, SP+r8 (0xF8). These opcodes compute stack addresses from SP plus an 8-bit signed offset, a common operation in game code for accessing local variables or data structures on the stack. The emulator had run for more than 1 million cycles of Pokémon before hitting the unimplemented 0xF8 opcode at PC=0x1D5C, which indicates significant progress. In both opcodes the H and C flags are calculated from the low byte of SP, not from the low 12 bits as in ADD HL, rr.
Hardware Concept
The LR35902 CPU provides two instructions to perform offset stack arithmetic:
- ADD SP, r8 (0xE8): Adds an 8-bit signed integer to the Stack Pointer (SP). The offset is read as a signed byte using Two's Complement representation (0x00-0x7F are positive, 0x80-0xFF are negative). The result is stored in SP and consumes 4 M-Cycles.
- LD HL, SP+r8 (0xF8): Calculates SP + offset (same offset format) and stores the result in HL. SP is NOT modified. Consumes 3 M-Cycles.
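The Two's Complement interpretation of the offset byte can be sketched as follows. This is a minimal illustration, not the project's _read_signed_byte(); the function name to_signed8 is hypothetical.

```python
def to_signed8(byte: int) -> int:
    """Interpret an unsigned byte (0-255) as a Two's Complement value (-128..127).

    Hypothetical helper for illustration only.
    """
    byte &= 0xFF
    # Bit 7 set means the value is negative: subtract 256 to get the signed value.
    return byte - 0x100 if byte & 0x80 else byte


print(to_signed8(0x05))  # 5
print(to_signed8(0xFC))  # -4
print(to_signed8(0x80))  # -128
```

With this decoding, the byte 0xFC that follows the opcode in LD HL, SP-4 is read as -4.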
Special flags: Both instructions have unique behavior with flags:
- Z (Zero): Always reset to 0, regardless of the result.
- N (Subtract): Always reset to 0 (the operation is an addition).
- H (Half-Carry): Set if there is a carry from bit 3 to bit 4 (low nibble). It is calculated as: ((sp & 0xF) + (offset & 0xF)) > 0xF.
- C (Carry): Set if there is a carry from bit 7 to bit 8 (low byte). It is calculated as: ((sp & 0xFF) + (offset & 0xFF)) > 0xFF.
Critical difference with ADD HL, rr: In ADD HL, rr, H is set on a carry out of bit 11 (the low 12 bits) and C on a carry out of bit 15 (the full 16 bits). In ADD SP, r8 and LD HL, SP+r8, both flags are calculated only within the low byte (bits 0-7) of SP, because an 8-bit value is being added to a 16-bit value.
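To make the contrast concrete, here is a small sketch that evaluates both flag rules on the same operands. The function names flags_add_hl and flags_sp_r8 are hypothetical and exist only for this comparison; the formulas are the ones described above.

```python
def flags_add_hl(hl: int, rr: int) -> tuple[bool, bool]:
    """(H, C) for ADD HL, rr: carry out of bit 11 and out of bit 15."""
    h = ((hl & 0x0FFF) + (rr & 0x0FFF)) > 0x0FFF
    c = (hl & 0xFFFF) + (rr & 0xFFFF) > 0xFFFF
    return h, c


def flags_sp_r8(sp: int, offset_byte: int) -> tuple[bool, bool]:
    """(H, C) for ADD SP, r8 / LD HL, SP+r8: carry out of bit 3 and out of bit 7."""
    h = ((sp & 0x0F) + (offset_byte & 0x0F)) > 0x0F
    c = ((sp & 0xFF) + (offset_byte & 0xFF)) > 0xFF
    return h, c


# Same operands, different rules: 0x00FF + 0x01
print(flags_add_hl(0x00FF, 0x0001))  # (False, False)
print(flags_sp_r8(0x00FF, 0x01))     # (True, True)
```

Adding 1 to 0x00FF overflows the low nibble and the low byte, so the SP+r8 rule sets both flags, while the ADD HL, rr rule sets neither because nothing carries out of bit 11 or bit 15.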
Use in games: These instructions are essential for accessing local variables on the stack. For example, if a function has local variables on the stack, you can use LD HL, SP-4 to get a pointer to those variables without modifying SP.
Implementation
A generic helper, _add_sp_offset(), was implemented. It calculates SP + offset and returns the result along with the H and C flags. The helper is reused by both opcodes to maintain consistency and avoid code duplication.
Helper: _add_sp_offset()
The helper receives a signed offset (range [-128, 127]) and returns a tuple (result, h_flag, c_flag):
- Converts the offset to its unsigned representation for flag calculations.
- Calculates the result with 16-bit wrap-around: (sp + offset) & 0xFFFF.
- Calculates the H flag: ((sp_low & 0xF) + (offset_low & 0xF)) > 0xF.
- Calculates the C flag: ((sp_low + offset_low) & 0x100) != 0.
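The steps above can be sketched as a standalone function. This is an illustration under assumptions, not the code in src/cpu/core.py: the real helper is a CPU method that reads SP from the registers, whereas here sp is passed in as a parameter so the example is self-contained.

```python
def add_sp_offset(sp: int, offset: int) -> tuple[int, bool, bool]:
    """Return (result, h_flag, c_flag) for SP + signed offset.

    Standalone sketch; `sp` as a parameter is an assumption made for illustration.
    """
    offset_low = offset & 0xFF  # unsigned representation of the signed offset
    sp_low = sp & 0xFF          # only the low byte of SP takes part in the flags
    result = (sp + offset) & 0xFFFF  # 16-bit wrap-around
    h_flag = ((sp_low & 0x0F) + (offset_low & 0x0F)) > 0x0F
    c_flag = ((sp_low + offset_low) & 0x100) != 0
    return result, h_flag, c_flag


print(add_sp_offset(0x1000, 5))   # (4101, False, False) -> 0x1005
print(add_sp_offset(0x1008, -4))  # (4100, True, True)   -> 0x1004
print(add_sp_offset(0xFFFF, 1))   # (0, True, True)      -> 0x0000
```

Note the second call: a negative offset can still set H and C, because the flags come from the unsigned addition of the low bytes (0x08 + 0xFC), not from the signed arithmetic.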
Opcode 0xE8: ADD SP, r8
Implemented in _op_add_sp_r8():
- Reads the offset using _read_signed_byte() (already existing).
- Calls _add_sp_offset() to calculate the result and flags.
- Updates SP with the result.
- Updates flags: Z=0, N=0, H and C according to the calculation.
- Returns 4 M-Cycles.
Opcode 0xF8: LD HL, SP+r8
Implemented in _op_ld_hl_sp_r8():
- Reads the offset using _read_signed_byte().
- Calls _add_sp_offset() to calculate the result and flags.
- Updates HL with the result (SP is NOT modified).
- Updates flags: Z=0, N=0, H and C according to the calculation.
- Returns 3 M-Cycles.
Integration in dispatch table
Both opcodes were added to the dispatch table _opcode_table in __init__():
0xE8: self._op_add_sp_r8
0xF8: self._op_ld_hl_sp_r8
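As a rough sketch of how the two handlers and the dispatch table fit together, the following self-contained class mimics the structure described above. MiniCPU, its plain attributes (pc, sp, hl, flags), the bytearray memory, and _fetch_byte() are all assumptions made for illustration; the project's CPU uses its own MMU and register objects.

```python
class MiniCPU:
    """Toy CPU with only 0xE8 and 0xF8, for illustration (hypothetical class)."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.pc = 0
        self.sp = 0
        self.hl = 0
        self.flags = {"Z": False, "N": False, "H": False, "C": False}
        # Dispatch table: opcode -> bound handler method
        self._opcode_table = {
            0xE8: self._op_add_sp_r8,
            0xF8: self._op_ld_hl_sp_r8,
        }

    def _fetch_byte(self) -> int:
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def _read_signed_byte(self) -> int:
        byte = self._fetch_byte()
        return byte - 0x100 if byte & 0x80 else byte

    def _add_sp_offset(self, offset: int) -> tuple[int, bool, bool]:
        low = offset & 0xFF
        result = (self.sp + offset) & 0xFFFF
        h = ((self.sp & 0x0F) + (low & 0x0F)) > 0x0F
        c = ((self.sp & 0xFF) + low) > 0xFF
        return result, h, c

    def _op_add_sp_r8(self) -> int:
        result, h, c = self._add_sp_offset(self._read_signed_byte())
        self.sp = result
        self.flags.update(Z=False, N=False, H=h, C=c)
        return 4  # M-Cycles

    def _op_ld_hl_sp_r8(self) -> int:
        result, h, c = self._add_sp_offset(self._read_signed_byte())
        self.hl = result  # SP is left untouched
        self.flags.update(Z=False, N=False, H=h, C=c)
        return 3  # M-Cycles

    def step(self) -> int:
        opcode = self._fetch_byte()
        handler = self._opcode_table.get(opcode)
        if handler is None:
            raise NotImplementedError(f"Opcode 0x{opcode:02X} not implemented")
        return handler()


memory = bytearray(0x10000)
memory[0x0100] = 0xF8  # LD HL, SP+r8
memory[0x0101] = 0xFC  # offset -4
cpu = MiniCPU(memory)
cpu.pc, cpu.sp = 0x0100, 0x1008
cycles = cpu.step()
print(hex(cpu.hl), hex(cpu.sp), cycles)  # 0x1004 0x1008 3
```

Keeping the arithmetic in one helper means the two handlers differ only in the destination register and the cycle count, which is the consistency argument made in the Implementation section.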
Affected Files
- src/cpu/core.py - Added helper _add_sp_offset() and handlers _op_add_sp_r8() and _op_ld_hl_sp_r8(). Integrated into dispatch table.
- tests/test_cpu_sp_arithmetic.py - New file with 9 unit tests that cover both opcodes: positive/negative offsets, H and C flags, wrap-around, and verification that SP does not change in LD HL, SP+r8.
Tests and Verification
Nine unit tests were created, covering the main cases for both opcodes:
- ADD SP, r8 (5 tests): Positive offset, negative offset, half-carry, carry, wrap-around.
- LD HL, SP+r8 (4 tests): Positive offset, negative offset, H and C flags, verification that SP does not change.
Test Execution
Command executed:
python -m pytest tests/test_cpu_sp_arithmetic.py -v
Environment:
- OS: Windows 10
- Python: 3.13.5
Result:
============================= test session starts =============================
platform win32 -- Python 3.13.5, pytest-9.0.2, pluggy-1.6.0
collected 9 items
tests/test_cpu_sp_arithmetic.py::TestAddSpR8::test_add_sp_positive PASSED
tests/test_cpu_sp_arithmetic.py::TestAddSpR8::test_add_sp_negative PASSED
tests/test_cpu_sp_arithmetic.py::TestAddSpR8::test_add_sp_with_half_carry PASSED
tests/test_cpu_sp_arithmetic.py::TestAddSpR8::test_add_sp_with_carry PASSED
tests/test_cpu_sp_arithmetic.py::TestAddSpR8::test_add_sp_wraparound PASSED
tests/test_cpu_sp_arithmetic.py::TestLdHlSpR8::test_ld_hl_sp_r8_positive PASSED
tests/test_cpu_sp_arithmetic.py::TestLdHlSpR8::test_ld_hl_sp_r8_negative PASSED
tests/test_cpu_sp_arithmetic.py::TestLdHlSpR8::test_ld_hl_sp_r8_with_flags PASSED
tests/test_cpu_sp_arithmetic.py::TestLdHlSpR8::test_ld_hl_sp_r8_sp_unchanged PASSED
============================== 9 passed in 0.15s ==============================
What is validated:
- Arithmetic correctness: The tests verify that SP + offset is calculated correctly, including cases with wrap-around (0xFFFF + 1 = 0x0000).
- Flags H and C: The tests verify that the flags are calculated correctly based on the low byte of SP, not the low 12 bits as in ADD HL, rr.
- SP preservation: The tests verify that in LD HL, SP+r8 the Stack Pointer is not modified; it is only used to calculate HL.
- Flags Z and N: The tests verify that Z=0 and N=0 always, regardless of the result.
Test code (example):
def test_add_sp_positive(self):
    """Test: Verify that ADD SP, r8 adds a positive offset correctly."""
    mmu = MMU()
    cpu = CPU(mmu)
    cpu.registers.set_pc(0x0100)
    cpu.registers.set_sp(0x1000)

    # Write opcode and offset
    mmu.write_byte(0x0100, 0xE8)  # ADD SP, r8
    mmu.write_byte(0x0101, 0x05)  # +5

    cycles = cpu.step()

    # Check result
    assert cpu.registers.get_sp() == 0x1005, "SP must be 0x1005"

    # Check flags
    assert not cpu.registers.get_flag_z(), "Z must be 0"
    assert not cpu.registers.get_flag_n(), "N must be 0"
    assert not cpu.registers.get_flag_h(), "H must be 0 (no half-carry)"
    assert not cpu.registers.get_flag_c(), "C must be 0 (no carry)"

    # Check cycles
    assert cycles == 4, "ADD SP, r8 must consume 4 M-Cycles"
Why these tests reflect the hardware behavior: The tests verify that the calculation of the H and C flags is based on the low byte of SP (bits 0-7), not the low 12 bits as in ADD HL, rr. This is a specific feature of the LR35902 hardware that differentiates these instructions from other 16-bit arithmetic operations. The tests also verify that the offset is correctly interpreted as a signed integer (Two's Complement), allowing the negative offsets that are common in game code for accessing local variables on the stack.
Sources consulted
- Pan Docs: CPU Instruction Set - ADD SP, r8 / LD HL, SP+r8
- Pan Docs: Flags behavior for SP+r8 instructions (difference to ADD HL, rr)
Note: The implementation was based on Pan Docs documentation on the behavior of flags in these specific instructions, which differs from the standard behavior of ADD HL, rr.
Educational Integrity
What I Understand Now
- Special flags in SP+r8: The H and C flags in ADD SP, r8 and LD HL, SP+r8 are calculated based on the low byte of SP (bits 0-7), not the low 12 bits as in ADD HL, rr. This is because an 8-bit value is being added to a 16-bit value, and the hardware only checks for overflow of the low byte.
- Use in game code: These instructions are essential for accessing local variables on the stack. For example, LD HL, SP-4 obtains a pointer to local variables without modifying SP, allowing efficient access to data structures on the stack.
- Difference from ADD HL, rr: Although both are 16-bit arithmetic operations, the flag calculation is different. In ADD HL, rr, H is computed in the low 12 bits (bits 0-11), while in SP+r8, H is computed only in the low nibble (bits 0-3).
What remains to be confirmed
- Behavior in borderline cases: The tests cover normal and wrap-around cases, but extreme cases such as SP=0x0000 with a large negative offset or SP=0xFFFF with a large positive offset have not been tested explicitly. The existing wrap-around tests exercise the same 16-bit masking, so they cover these cases only implicitly.
- Validation with real ROMs: The implementation was validated with unit tests, but has not yet been tested by running a real ROM that uses these opcodes. The fact that the emulator reached 0xF8 in Pokémon indicates that the game needs them, but the final validation will come when the game progresses beyond that point.
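The untested borderline cases mentioned above could be written down as a table-driven check along these lines. This is only a sketch: sp_plus_r8 is a hypothetical standalone stand-in for the real helper, and the expected values are derived from the low-byte rule described in this entry, not measured on hardware.

```python
def sp_plus_r8(sp: int, offset: int) -> tuple[int, bool, bool]:
    """Hypothetical stand-in for the helper: (result, H, C) for SP + signed offset."""
    low = offset & 0xFF
    result = (sp + offset) & 0xFFFF
    h = ((sp & 0x0F) + (low & 0x0F)) > 0x0F
    c = ((sp & 0xFF) + low) > 0xFF
    return result, h, c


# (sp, offset, expected result, expected H, expected C)
BORDERLINE_CASES = [
    (0x0000, -128, 0xFF80, False, False),  # lowest SP, most negative offset
    (0x0000, -1, 0xFFFF, False, False),    # wraps below zero, no low-byte carry
    (0xFFFF, 127, 0x007E, True, True),     # highest SP, most positive offset
    (0xFFFF, 1, 0x0000, True, True),       # case already covered by the suite
]

for sp, offset, result, h, c in BORDERLINE_CASES:
    assert sp_plus_r8(sp, offset) == (result, h, c), (hex(sp), offset)
print("all borderline cases match the low-byte rule")
```

In the actual test suite these rows would drive the CPU through cpu.step() rather than call a free function.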
Hypotheses and Assumptions
Assumption about flags calculation: The implementation assumes that the calculation of the H and C flags is based on the low byte of SP (bits 0-7), not the low 12 bits. This assumption is supported by the Pan Docs documentation, but has not been verified with real hardware. Unit tests verify that the calculation is correct based on this assumption.
Next Steps
- [ ] Run Pokémon (pkmn.gb) and verify that it progresses past PC=0x1D5C (where it stopped on the unimplemented 0xF8).
- [ ] Verify that the game shows the intro (shooting star, copyright) after implementing these opcodes.
- [ ] If more unimplemented opcodes appear, implement them following the same pattern.
- [ ] Continue with the implementation of missing subsystems (APU, if necessary for the game).