This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
Native CPU: Basic I/O Implementation (LDH)
Summary
Implemented the high-memory I/O instructions LDH (n), A (0xE0) and LDH A, (n) (0xF0) on the native CPU (C++). These instructions are critical for communication between the CPU and the hardware registers (PPU, Timer, etc.). Diagnosis showed that opcode 0xE0 was the next missing instruction causing the Segmentation Fault when the emulator tried to run real ROMs.
Hardware Concept
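The Game Boy has a 64 KB memory-mapped address space (0x0000-0xFFFF). The range 0xFF00-0xFFFF is reserved for hardware: I/O registers at 0xFF00-0xFF7F, High RAM at 0xFF80-0xFFFE, and the Interrupt Enable register at 0xFFFF. The I/O registers control critical components such as: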
- 0xFF40 (LCDC): PPU control (LCD enabled, sprites, background, etc.)
- 0xFF41 (STAT): PPU status (current mode, interrupt sources, etc.)
- 0xFF47 (BGP): Background color palette
- 0xFF00 (JOYP): Joypad status
- 0xFF04-0xFF07: Timer Registers
The LDH (Load High) instruction is a hardware optimization that allows
these registers to be accessed more efficiently than with a standard LD instruction. LDH calculates
the address as 0xFF00 + n, where n is an immediate byte (0-255).
This reaches any address in the range 0xFF00-0xFFFF in 3 M-Cycles and 2 bytes,
instead of the 4 M-Cycles and 3 bytes that LD (nn), A with a full 16-bit address would require.
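The address calculation described above can be sketched as a small constexpr helper. This is only an illustration: the names kIoBase, kRegJoyp, kRegLcdc, kRegStat, kRegBgp and ldh_address are hypothetical and do not come from the project's code.

```cpp
#include <cstdint>

// I/O register addresses from the list above (values as given in Pan Docs).
constexpr std::uint16_t kIoBase  = 0xFF00;
constexpr std::uint16_t kRegJoyp = 0xFF00;
constexpr std::uint16_t kRegLcdc = 0xFF40;
constexpr std::uint16_t kRegStat = 0xFF41;
constexpr std::uint16_t kRegBgp  = 0xFF47;

// Hypothetical helper (not part of the project): the effective address
// of an LDH access for the immediate byte n.
constexpr std::uint16_t ldh_address(std::uint8_t n) {
    return static_cast<std::uint16_t>(kIoBase + n);
}

// A one-byte operand is enough to name each register in the list.
static_assert(ldh_address(0x00) == kRegJoyp, "n = 0x00 selects JOYP");
static_assert(ldh_address(0x40) == kRegLcdc, "n = 0x40 selects LCDC");
static_assert(ldh_address(0x41) == kRegStat, "n = 0x41 selects STAT");
static_assert(ldh_address(0x47) == kRegBgp,  "n = 0x47 selects BGP");
```

Because the high byte of the address is implied, the instruction only has to carry the low byte, which is where the saved byte and M-Cycle come from.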
Why is it critical? When a game starts, one of the first things it does is configure these hardware registers. Without LDH, the CPU cannot write to LCDC, BGP, or any other I/O register through these opcodes, which prevents the PPU from initializing correctly and causes the emulator to crash when it tries to execute invalid instructions.
Source: Pan Docs - CPU Instruction Set, section "LDH (n), A" and "LDH A, (n)"
Implementation
Two cases were added to the main switch in CPU::step() to handle opcodes
0xE0 and 0xF0. The implementation is straightforward: read the immediate offset, calculate the address 0xFF00 + offset, and perform the corresponding read or write.
Components created/modified
- src/core/cpp/CPU.cpp: Added cases 0xE0 and 0xF0 to the main switch
- tests/test_core_cpu_io.py: Complete test suite for LDH (new file)
Implemented Code
Implementation of LDH (n), A (0xE0):
case 0xE0: // LDH (n), A
{
    uint8_t offset = fetch_byte();                           // immediate byte n
    uint16_t addr = 0xFF00 + static_cast<uint16_t>(offset);  // 0xFF00-0xFFFF
    mmu_->write(addr, regs_->a);                             // store A at the I/O address
    cycles_ += 3;
    return 3;                                                // 3 M-Cycles
}
Implementation of LDH A, (n) (0xF0):
case 0xF0: // LDH A, (n)
{
    uint8_t offset = fetch_byte();                           // immediate byte n
    uint16_t addr = 0xFF00 + static_cast<uint16_t>(offset);  // 0xFF00-0xFFFF
    regs_->a = mmu_->read(addr);                             // load A from the I/O address
    cycles_ += 3;
    return 3;                                                // 3 M-Cycles
}
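To try the two cases outside the project, the following minimal sketch wraps them in a tiny stand-alone CPU. FlatMmu, Regs, MiniCpu and ldh_round_trip are hypothetical stand-ins invented for this example; the real MMU, register file and CPU::step() have more state and many more opcodes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the project's MMU and register file.
struct FlatMmu {
    std::array<std::uint8_t, 0x10000> mem{};  // flat 64 KB, zero-initialized
    std::uint8_t read(std::uint16_t addr) const { return mem[addr]; }
    void write(std::uint16_t addr, std::uint8_t value) { mem[addr] = value; }
};

struct Regs {
    std::uint8_t a = 0;
    std::uint16_t pc = 0;
};

struct MiniCpu {
    FlatMmu& mmu;
    Regs& regs;

    // Reads the byte at PC and advances PC.
    std::uint8_t fetch_byte() { return mmu.read(regs.pc++); }

    // Executes one instruction; returns M-Cycles, or 0 for unknown opcodes.
    int step() {
        const std::uint8_t opcode = fetch_byte();
        switch (opcode) {
            case 0xE0: {  // LDH (n), A
                const std::uint8_t offset = fetch_byte();
                mmu.write(static_cast<std::uint16_t>(0xFF00 + offset), regs.a);
                return 3;
            }
            case 0xF0: {  // LDH A, (n)
                const std::uint8_t offset = fetch_byte();
                regs.a = mmu.read(static_cast<std::uint16_t>(0xFF00 + offset));
                return 3;
            }
            default:
                return 0;
        }
    }
};

// Stores `value` to 0xFF47 with LDH (n), A, clears A, then loads it back
// with LDH A, (n). Returns the value that ends up in A.
inline std::uint8_t ldh_round_trip(std::uint8_t value) {
    FlatMmu mmu;
    Regs regs;
    MiniCpu cpu{mmu, regs};

    const std::uint8_t program[] = {0xE0, 0x47, 0xF0, 0x47};
    for (int i = 0; i < 4; ++i) {
        mmu.write(static_cast<std::uint16_t>(0xC000 + i), program[i]);
    }
    regs.pc = 0xC000;
    regs.a = value;

    const int store_cycles = cpu.step();  // LDH (0x47), A
    regs.a = 0;
    const int load_cycles = cpu.step();   // LDH A, (0x47)
    assert(store_cycles == 3 && load_cycles == 3);
    (void)store_cycles;
    (void)load_cycles;
    return regs.a;
}
```

Calling ldh_round_trip(0xE4) should hand back 0xE4, since the flat memory in this sketch has no register side effects.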
Design decisions
Timing: Both instructions consume 3 M-Cycles according to Pan Docs. This is consistent with what they do: fetch the opcode (1 M-Cycle), read the immediate byte (1 M-Cycle), and perform the memory access (1 M-Cycle).
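The cycle count can be illustrated with a simplified model in which every bus access costs one M-Cycle. That assumption happens to hold for LDH and for LD (nn), A (opcode 0xEA), but not for every instruction, so this is a sketch and not a general timing model. CountingBus, ldh_write_cycles and ld_abs_write_cycles are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical bus for illustration: every read or write costs one M-Cycle.
struct CountingBus {
    std::array<std::uint8_t, 0x10000> mem{};
    int m_cycles = 0;

    std::uint8_t read(std::uint16_t addr) { ++m_cycles; return mem[addr]; }
    void write(std::uint16_t addr, std::uint8_t value) { ++m_cycles; mem[addr] = value; }
};

// LDH (n), A at `pc`: opcode fetch, immediate fetch, write.
// Returns the M-Cycles spent on the bus.
inline int ldh_write_cycles(CountingBus& bus, std::uint16_t pc, std::uint8_t a) {
    const int start = bus.m_cycles;
    const std::uint8_t opcode = bus.read(pc);                              // M1
    assert(opcode == 0xE0);
    (void)opcode;
    const std::uint8_t n = bus.read(static_cast<std::uint16_t>(pc + 1));   // M2
    bus.write(static_cast<std::uint16_t>(0xFF00 + n), a);                  // M3
    return bus.m_cycles - start;
}

// LD (nn), A at `pc`: opcode fetch, two address bytes (low first), write.
inline int ld_abs_write_cycles(CountingBus& bus, std::uint16_t pc, std::uint8_t a) {
    const int start = bus.m_cycles;
    const std::uint8_t opcode = bus.read(pc);                              // M1
    assert(opcode == 0xEA);
    (void)opcode;
    const std::uint8_t lo = bus.read(static_cast<std::uint16_t>(pc + 1));  // M2
    const std::uint8_t hi = bus.read(static_cast<std::uint16_t>(pc + 2));  // M3
    bus.write(static_cast<std::uint16_t>((hi << 8) | lo), a);              // M4
    return bus.m_cycles - start;
}
```

Under this model the LDH form needs three bus accesses and the absolute form four, matching the 3 versus 4 M-Cycles listed in Pan Docs.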
Explicit cast: static_cast<uint16_t>(offset) is used to avoid
compiler warnings and make the type promotion explicit. The offset is a uint8_t (0-255),
while the resulting address is a uint16_t.
No range validation: The resulting address is not checked against the range 0xFF00-0xFFFF because it always falls inside it (0xFF00 + 0x00 = 0xFF00, 0xFF00 + 0xFF = 0xFFFF). The MMU is responsible for handling invalid accesses safely.
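The range argument can also be checked at compile time. The sketch below walks all 256 possible offsets; every_ldh_target_is_in_high_page is a hypothetical name, and the loop inside a constexpr function assumes C++14 or later.

```cpp
#include <cstdint>

// Checks every possible immediate offset (requires C++14 or later for the
// loop inside a constexpr function).
constexpr bool every_ldh_target_is_in_high_page() {
    for (int n = 0; n <= 0xFF; ++n) {
        const auto offset = static_cast<std::uint8_t>(n);
        const auto addr = static_cast<std::uint16_t>(
            0xFF00 + static_cast<std::uint16_t>(offset));
        // A uint16_t can never exceed 0xFFFF, so only a wrap below 0xFF00
        // could take the address out of range.
        if (addr < 0xFF00) {
            return false;
        }
    }
    return true;
}

static_assert(every_ldh_target_is_in_high_page(),
              "LDH cannot address anything outside 0xFF00-0xFFFF");
static_assert(0xFF00 + 0x00 == 0xFF00, "lowest target");
static_assert(0xFF00 + 0xFF == 0xFFFF, "highest target");
```

If the base or the operand width ever changed, the static_assert would fail the build, so no runtime range check is needed.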
Affected Files
- src/core/cpp/CPU.cpp - Added cases 0xE0 and 0xF0 to the main switch (lines ~906-930)
- tests/test_core_cpu_io.py - Complete test suite for LDH (new file, 5 tests)
Tests and Verification
Created a complete suite of unit tests in test_core_cpu_io.py, which validates:
- test_ldh_write: Verifies that LDH (n), A writes correctly to 0xFF00 + n
- test_ldh_read: Verifies that LDH A, (n) correctly reads from 0xFF00 + n
- test_ldh_write_lcdc: Specific case for writing to LCDC (0xFF40)
- test_ldh_read_stat: Specific case to read from STAT (0xFF41)
- test_ldh_offset_wraparound: Verifies that the largest offset (0xFF) works correctly
Command executed:
pytest tests/test_core_cpu_io.py -v
Expected result: 5 tests passed
Test code (example):
def test_ldh_write(self):
    """Test: LDH (n), A (0xE0) writes A to 0xFF00 + n."""
    mmu = PyMMU()
    regs = PyRegisters()
    cpu = PyCPU(mmu, regs)
    regs.pc = 0x8000
    regs.a = 0xAB
    mmu.write(0x8000, 0xE0)  # Opcode LDH (n), A
    mmu.write(0x8001, 0x40)  # offset 'n' (for 0xFF40 - LCDC)
    cycles = cpu.step()
    assert mmu.read(0xFF40) == 0xAB
    assert regs.pc == 0x8002
    assert cycles == 3
Native Validation: All tests validate the compiled C++ module through the Cython wrapper. There is no intermediate Python implementation; the native CPU executes the LDH instructions directly.
Sources consulted
- Pan Docs: CPU Instruction Set, section "LDH (n), A" and "LDH A, (n)" - Timing: 3 M-Cycles
- Pan Docs: Memory Map, "I/O Registers" section (0xFF00-0xFFFF)
Educational Integrity
What I Understand Now
- LDH is a hardware optimization: It allows access to I/O registers with fewer cycles than an LD instruction that carries a full 16-bit address. This matters because games access these registers constantly during execution.
- The range 0xFF00-0xFFFF is mapped to hardware: Most addresses in this range correspond to a hardware register (0xFF80-0xFFFE is High RAM). The CPU is not simply "writing to memory" here; writes to registers have side effects on the emulated hardware.
- Game initialization depends on LDH: Without this instruction, games cannot configure the PPU, Timer, or other hardware components through these opcodes, which causes the emulator to crash immediately.
What remains to be confirmed
- Behavior of read-only registers: Some I/O registers are read-only or contain read-only bits (e.g. STAT). The MMU should handle this, but we need to verify that games are not writing to these registers incorrectly.
- Write side effects: Some registers have side effects when they are written (e.g. writing to DIV resets the counter). This will be implemented when we migrate the Timer to C++.
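As a sketch of how read-only bits could be handled, the helper below merges a CPU write into STAT while preserving the bits that Pan Docs lists as read-only (bits 0-2: PPU mode and the LYC == LY flag). The names kStatReadOnlyMask, kStatWritableMask and apply_stat_write are hypothetical, and the unused bit 7 is not modeled here.

```cpp
#include <cstdint>

// STAT layout assumed from Pan Docs:
//   bits 0-1: PPU mode (read-only), bit 2: LYC == LY flag (read-only),
//   bits 3-6: interrupt source selects (read/write). Bit 7 is not modeled.
constexpr std::uint8_t kStatReadOnlyMask = 0x07;
constexpr std::uint8_t kStatWritableMask = 0x78;

// Hypothetical helper: the value STAT holds after the CPU writes `written`
// while the register currently contains `current`.
constexpr std::uint8_t apply_stat_write(std::uint8_t current, std::uint8_t written) {
    return static_cast<std::uint8_t>((current & kStatReadOnlyMask) |
                                     (written & kStatWritableMask));
}

// Writing 0xFF while the PPU is in mode 2 leaves the mode bits untouched.
static_assert(apply_stat_write(0x02, 0xFF) == 0x7A, "mode bits survive the write");
```

An MMU write handler for 0xFF41 could call a helper like this instead of storing the raw byte.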
Hypotheses and Assumptions
Assumption: We assume that the MMU handles I/O register accesses correctly. In the current implementation, the MMU simply reads from and writes to memory, but in the future we will need to implement specific handling for each hardware register (e.g. resetting the counter when DIV is written).
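One possible shape for that per-register handling is a dispatch inside the MMU's write path. The sketch below is a simplified assumption, not the project's MMU: IoAwareMmu, kRegDiv, tick_div and check_div_reset are hypothetical names, and DIV is modeled as a plain 8-bit counter.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kRegDiv = 0xFF04;

// Hypothetical MMU that intercepts writes to registers with side effects.
struct IoAwareMmu {
    std::array<std::uint8_t, 0x10000> mem{};

    std::uint8_t read(std::uint16_t addr) const { return mem[addr]; }

    void write(std::uint16_t addr, std::uint8_t value) {
        switch (addr) {
            case kRegDiv:
                mem[addr] = 0;      // any write resets DIV; the value is ignored
                break;
            default:
                mem[addr] = value;  // plain store for everything else
                break;
        }
    }

    // Hypothetical hook for the timer: advances DIV without going through
    // write(), so the reset side effect is not triggered.
    void tick_div() { ++mem[kRegDiv]; }
};

// Small self-check: DIV counts up, then a CPU write clears it.
inline void check_div_reset() {
    IoAwareMmu mmu;
    mmu.tick_div();
    mmu.tick_div();
    assert(mmu.read(kRegDiv) == 2);
    mmu.write(kRegDiv, 0xAB);
    assert(mmu.read(kRegDiv) == 0);
}
```

With a write path like this, LDH (0x04), A would reset the counter without any change to the CPU code, because the side effect lives in the MMU.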
Next Steps
- [ ] Run the emulator with a real ROM (e.g. Tetris) and verify that it advances beyond opcode 0xE0
- [ ] Identify the next unimplemented opcode that causes the next crash
- [ ] Implement more I/O instructions if necessary (e.g. LD (C), A and LD A, (C))
- [ ] Verify that the PPU can receive configurations correctly through LDH
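For the LD (C), A and LD A, (C) item in the list above, a minimal sketch could look like the following. It assumes the opcodes 0xE2 and 0xF2 and the 2 M-Cycle timing listed in Pan Docs; HighPageRegs, Memory, step_ld_high_c and check_ld_high_c are hypothetical names, and memory is a flat array with no register side effects.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, reduced register file: only what these opcodes touch.
struct HighPageRegs {
    std::uint8_t a = 0;
    std::uint8_t c = 0;
};

using Memory = std::array<std::uint8_t, 0x10000>;

// Executes LD (C), A (0xE2) or LD A, (C) (0xF2). The address is 0xFF00 + C,
// so no immediate byte is fetched. Returns M-Cycles, or 0 for other opcodes.
inline int step_ld_high_c(std::uint8_t opcode, HighPageRegs& regs, Memory& mem) {
    const auto addr = static_cast<std::uint16_t>(0xFF00 + regs.c);
    switch (opcode) {
        case 0xE2:  // LD (C), A
            mem[addr] = regs.a;
            return 2;
        case 0xF2:  // LD A, (C)
            regs.a = mem[addr];
            return 2;
        default:
            return 0;
    }
}

// Small self-check: store A at 0xFF00 + C and confirm the target address.
inline void check_ld_high_c() {
    Memory mem{};
    HighPageRegs regs;
    regs.a = 0x3C;
    regs.c = 0x47;
    const int write_cycles = step_ld_high_c(0xE2, regs, mem);
    assert(write_cycles == 2 && mem[0xFF47] == 0x3C);
    (void)write_cycles;
}
```

The only difference from LDH is where the low byte of the address comes from: register C instead of an immediate operand.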