This project is educational and open source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.
Dynamic I/O and Register Mapping
Summary
The CPU ISA (Instruction Set Architecture) is now 100% complete! The last two missing opcodes of the LR35902 CPU were implemented: LD (C), A (0xE2) and LD A, (C) (0xF2). These instructions allow dynamic access to hardware registers using the C register as the offset, which is especially useful in initialization loops. In addition, system visibility was significantly improved by adding constants for the main hardware registers (LCDC, STAT, BGP, etc.) and by improving MMU logging to display register names instead of hexadecimal addresses. With this, the emulator can execute the full code of real games, and the logs now show readable information such as "IO WRITE: LCDC = 0x91" instead of "Writing at 0xFF40".
Hardware Concept
LD (C), A and LD A, (C) - Dynamic I/O Access
The Game Boy controls its peripherals (screen, audio, timers, interrupts) through Memory Mapped I/O. This means that writing to certain memory addresses (range 0xFF00-0xFF7F) does not write to RAM but controls real hardware.
We already had the instructions LDH (n), A (0xE0) and LDH A, (n) (0xF0) implemented, which
read an immediate byte n and access 0xFF00 + n. However, these instructions require
the offset to be embedded in the code, which makes them static.
LD (C), A (0xE2) and LD A, (C) (0xF2) are optimized variants that use the C register as a dynamic offset. This allows:
- Initialization loops: A game can initialize multiple hardware registers in a loop, incrementing C in each iteration.
- Space saving: 1 byte less than LDH (there is no immediate byte to read).
- Flexibility: The offset can be calculated at run time.
Practical example: Tetris DX uses LD (C), A to write to LCDC (0xFF40), STAT (0xFF41),
BGP (0xFF47), etc. in a loop, incrementing C from 0x40 to 0x4F.
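The addressing rule and the loop pattern described above can be sketched in a few lines of Python. This is only an illustration, not the emulator's code: `io_address`, `init_registers` and the flat `bytearray` memory are hypothetical names introduced here, and the values written are arbitrary.

```python
IO_BASE = 0xFF00  # start of the high I/O page


def io_address(offset: int) -> int:
    """Address accessed by LDH (n), A or LD (C), A for an 8-bit offset."""
    return IO_BASE + (offset & 0xFF)


def init_registers(memory: bytearray, start_c: int, values: list) -> int:
    """Mimic an init loop: LD (C), A followed by INC C for each value.

    Returns the final value of C (8-bit, wraps like the real register).
    """
    c = start_c & 0xFF
    for a in values:
        memory[io_address(c)] = a & 0xFF  # LD (C), A
        c = (c + 1) & 0xFF                # INC C
    return c


memory = bytearray(0x10000)  # hypothetical flat 64 KiB address space
final_c = init_registers(memory, 0x40, [0x91, 0x85])  # writes LCDC, then STAT
```

With LDH the offset would be a constant baked into each instruction, so the same loop would need one instruction per register.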
Hardware Registers (Memory Mapped I/O)
The Game Boy has dozens of hardware registers mapped to the range 0xFF00-0xFF7F. The most important ones are:
- LCDC (0xFF40): LCD Control - Turns the screen on/off, background/sprite settings.
- STAT (0xFF41): LCD Status - Current status of the LCD (mode, interrupt flags).
- SCY/SCX (0xFF42/0xFF43): Scroll Y/X - Background position.
- LY (0xFF44): Current line being drawn (0-153, read only).
- BGP (0xFF47): Background Palette Data - Color palette for the background.
- IF (0xFF0F): Interrupt Flag - Pending interrupt flags.
- IE (0xFFFF): Interrupt Enable - Mask of enabled interrupts.
Source: Pan Docs - Memory Map / I/O Ports
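The addresses listed above can be collected into constants and a lookup table that turns an address into a readable name. The sketch below is an assumption-laden illustration: `IO_LCDC`, `IO_STAT`, `IO_BGP`, `IO_IF`, `IO_IE` and `IO_REGISTER_NAMES` are names this entry mentions for src/memory/mmu.py, while `IO_SCY`, `IO_SCX`, `IO_LY` and the helper `format_io_write` are hypothetical and only follow the same pattern.

```python
# Addresses taken from the register list above.
IO_IF = 0xFF0F
IO_LCDC = 0xFF40
IO_STAT = 0xFF41
IO_SCY = 0xFF42   # assumed name
IO_SCX = 0xFF43   # assumed name
IO_LY = 0xFF44    # assumed name
IO_BGP = 0xFF47
IO_IE = 0xFFFF

IO_REGISTER_NAMES = {
    IO_IF: "IF",
    IO_LCDC: "LCDC",
    IO_STAT: "STAT",
    IO_SCY: "SCY",
    IO_SCX: "SCX",
    IO_LY: "LY",
    IO_BGP: "BGP",
    IO_IE: "IE",
}


def format_io_write(addr: int, value: int) -> str:
    """Hypothetical helper: build a readable message for an I/O write."""
    # Unknown addresses fall back to a generic "IO_0xXXXX" label.
    name = IO_REGISTER_NAMES.get(addr, f"IO_0x{addr:04X}")
    return f"IO WRITE: {name} = 0x{value:02X}"


known = format_io_write(IO_LCDC, 0x91)
unknown = format_io_write(0xFF50, 0x42)
```

Keeping the names in a dictionary means that supporting a new register only requires one new entry.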
Implementation
The two missing opcodes were implemented in src/cpu/core.py, and logging in src/memory/mmu.py
was significantly improved to make visible which hardware registers are being accessed.
Components created/modified
- _op_ld_c_a() in src/cpu/core.py:
  - Implements LD (C), A (0xE2).
  - Calculates the I/O address: 0xFF00 + C.
  - Writes the value of A to that address.
  - Consumes 2 M-Cycles (1 fewer than LDH because it does not read an immediate byte).
- _op_ld_a_c() in src/cpu/core.py:
  - Implements LD A, (C) (0xF2).
  - Calculates the I/O address: 0xFF00 + C.
  - Reads the value from that address and loads it into A.
  - Consumes 2 M-Cycles.
- Hardware register constants in src/memory/mmu.py:
  - Added constants for the main registers: IO_LCDC, IO_STAT, IO_BGP, IO_IF, IO_IE, etc.
  - Dictionary IO_REGISTER_NAMES, which maps addresses to readable names.
- Improved logging in MMU.write_byte():
  - Detects writes in the I/O range (0xFF00-0xFF7F).
  - Emits an informative log with the register name: "IO WRITE: LCDC = 0x91".
  - If the register is not in the dictionary, it shows a generic format: "IO WRITE: IO_0xFF50 = 0x42".
Opcodes added to dispatch table
- 0xE2: _op_ld_c_a (LD (C), A)
- 0xF2: _op_ld_a_c (LD A, (C))
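How two handlers and a dispatch table fit together can be sketched as follows. This is a simplified stand-in, not the code in src/cpu/core.py: `TinyMMU`, `TinyCPU`, the plain `a`/`c`/`pc` attributes and the `opcodes` dictionary are hypothetical, and only the handler names and cycle counts come from this entry.

```python
class TinyMMU:
    """Hypothetical flat 64 KiB memory with no special I/O behavior."""

    def __init__(self):
        self.mem = bytearray(0x10000)

    def read_byte(self, addr):
        return self.mem[addr & 0xFFFF]

    def write_byte(self, addr, value):
        self.mem[addr & 0xFFFF] = value & 0xFF


class TinyCPU:
    """Hypothetical CPU that knows only opcodes 0xE2 and 0xF2."""

    def __init__(self, mmu):
        self.mmu = mmu
        self.a = 0
        self.c = 0
        self.pc = 0
        # Dispatch table: opcode byte -> bound handler method.
        self.opcodes = {0xE2: self._op_ld_c_a, 0xF2: self._op_ld_a_c}

    def _op_ld_c_a(self):
        """LD (C), A: write A to 0xFF00 + C."""
        self.mmu.write_byte(0xFF00 + self.c, self.a)
        return 2  # M-Cycles

    def _op_ld_a_c(self):
        """LD A, (C): load A from 0xFF00 + C."""
        self.a = self.mmu.read_byte(0xFF00 + self.c)
        return 2  # M-Cycles

    def step(self):
        """Fetch one opcode, advance PC, run the handler, return its cycles."""
        opcode = self.mmu.read_byte(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        handler = self.opcodes.get(opcode)
        if handler is None:
            raise NotImplementedError(f"Opcode 0x{opcode:02X} not implemented")
        return handler()


mmu = TinyMMU()
cpu = TinyCPU(mmu)
mmu.write_byte(0xC000, 0xE2)  # LD (C), A
mmu.write_byte(0xC001, 0xF2)  # LD A, (C)
cpu.pc, cpu.a, cpu.c = 0xC000, 0x91, 0x40
cycles_write = cpu.step()
cpu.a = 0x00                  # clear A so the read-back is observable
cycles_read = cpu.step()
```

Because neither instruction carries an immediate byte, each handler only touches memory once after the fetch, which is where the 2 M-Cycle cost comes from.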
Design decisions
- Timing: LD (C), A and LD A, (C) consume 2 M-Cycles (vs 3 for LDH) because they do not need to read an immediate byte. This matches the Pan Docs documentation.
- Logging: logger.info() is used for I/O writes (not debug) because it is valuable information for understanding what the game is doing. The log is "lazy" (the message is only formatted if the logging level is enabled).
- Constants: Constants were defined for the most common registers, but the system is extensible. If more registers are needed in the future, just add them to the IO_REGISTER_NAMES dictionary.
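The "lazy" behavior mentioned above comes from Python's standard logging module: when arguments are passed separately instead of being pre-formatted, the message is only built if a record is actually emitted. The sketch below demonstrates this with a hypothetical logger name, a hypothetical `CountingValue` probe and a hypothetical `ListHandler`; none of them are part of the emulator.

```python
import logging


class CountingValue:
    """Probe that counts how many times it is turned into a string."""

    def __init__(self, value):
        self.value = value
        self.format_calls = 0

    def __str__(self):
        self.format_calls += 1
        return f"0x{self.value:02X}"


class ListHandler(logging.Handler):
    """Collects formatted messages in memory instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("mmu_sketch")  # hypothetical logger name
logger.propagate = False                  # keep the demo isolated from root
handler = ListHandler()
logger.addHandler(handler)

probe = CountingValue(0x91)

# INFO disabled: the call returns early and never formats the arguments.
logger.setLevel(logging.WARNING)
logger.info("IO WRITE: %s = %s", "LCDC", probe)
calls_when_disabled = probe.format_calls

# INFO enabled: the message is formatted once, when the handler asks for it.
logger.setLevel(logging.INFO)
logger.info("IO WRITE: %s = %s", "LCDC", probe)
```

Building the string with an f-string before calling logger.info() would lose this benefit, since the formatting would then happen on every I/O write regardless of the log level.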
Affected Files
- src/cpu/core.py - Added methods _op_ld_c_a() and _op_ld_a_c(). Added opcodes 0xE2 and 0xF2 to the dispatch table.
- src/memory/mmu.py - Added hardware register constants (IO_LCDC, IO_STAT, etc.) and the IO_REGISTER_NAMES dictionary. Improved write_byte() to log I/O writes informatively.
- tests/test_cpu_io_c.py - New file with 6 unit tests:
  - 3 tests for LD (C), A (LCDC, STAT, BGP)
  - 2 tests for LD A, (C) (STAT, LCDC)
  - 1 test for I/O address wrap-around
Tests and Verification
The full suite of TDD tests was run to validate the two implemented opcodes.
Test Execution
Command executed:
python3 -m pytest tests/test_cpu_io_c.py -v
Environment:
- OS: macOS (darwin 21.6.0)
- Python: 3.9.6
- pytest: 8.4.2
Result:
============================== test session starts ==============================
platform darwin -- Python 3.9.6, pytest-8.4.2, pluggy-1.6.0
collected 6 items
tests/test_cpu_io_c.py::TestIOAccessViaC::test_ld_c_a_write PASSED [ 16%]
tests/test_cpu_io_c.py::TestIOAccessViaC::test_ld_c_a_write_stat PASSED [ 33%]
tests/test_cpu_io_c.py::TestIOAccessViaC::test_ld_c_a_write_bgp PASSED [ 50%]
tests/test_cpu_io_c.py::TestIOAccessViaC::test_ld_a_c_read PASSED [ 66%]
tests/test_cpu_io_c.py::TestIOAccessViaC::test_ld_a_c_read_lcdc PASSED [ 83%]
tests/test_cpu_io_c.py::TestIOAccessViaC::test_ld_c_a_wrap_around PASSED [100%]
============================== 6 passed in 0.19s ==============================
What is validated:
- LD (C), A: Verifies that writing to 0xFF00 + C works correctly for different C values (LCDC=0x40, STAT=0x41, BGP=0x47). Validates that C and A are not modified by the write. Confirms that it consumes 2 M-Cycles (correct according to the documentation).
- LD A, (C): Verifies that reading from 0xFF00 + C correctly loads the value into A. Validates that C is not modified. Confirms the timing of 2 M-Cycles.
- Wrap-around: Verifies that with C=0xFF the calculated address is 0xFFFF (IE), showing that the address calculation works correctly even at the upper limit.
Test Code (Essential Fragment)
Test example for LD (C), A:
def test_ld_c_a_write(self):
    """Test: LD (C), A writes correctly to 0xFF00 + C."""
    mmu = MMU()
    cpu = CPU(mmu)

    # Set initial state
    cpu.registers.set_c(0x40)  # LCDC
    cpu.registers.set_a(0x91)
    cpu.registers.set_pc(0x8000)

    # Write opcode to memory
    mmu.write_byte(0x8000, 0xE2)  # LD (C), A

    # Execute one instruction
    cycles = cpu.step()

    # Verify that 0xFF40 (LCDC) was written correctly
    assert mmu.read_byte(IO_LCDC) == 0x91, "LCDC must be 0x91"
    assert cpu.registers.get_c() == 0x40, "C must not change"
    assert cpu.registers.get_a() == 0x91, "A must not change"
    assert cycles == 2, "Must consume 2 M-Cycles"
Full path: tests/test_cpu_io_c.py
Validation with Real ROM (Tetris DX)
ROM: Tetris DX (user-contributed ROM, not distributed)
Execution mode: Headless, with logging enabled at the INFO level to view I/O writes.
Success criterion: The emulator should run opcode 0xE2 without "Opcode not implemented" errors and display informative I/O write logs with readable register names.
Observation: When running Tetris DX, the emulator now:
- Correctly executes the 0xE2 (LD (C), A) opcode that was previously causing the error.
- Shows informative logs such as:
  IO WRITE: LCDC = 0x91 (addr: 0xFF40)
  IO WRITE: STAT = 0x85 (addr: 0xFF41)
  IO WRITE: BGP = 0xE4 (addr: 0xFF47)
- The game progresses past initialization and enters a loop waiting for the LY register (0xFF44) to change, which is the expected behavior since the PPU (Graphics Processing Unit) is not implemented yet.
Result: Verified - The opcodes work correctly and the logging system displays valuable information for debugging.
Legal note: The Tetris DX ROM is the property of the user and is not distributed or included in the repository. It is used only for local emulator validation testing.
Sources consulted
- Pan Docs - CPU Instruction Set: https://gbdev.io/pandocs/CPU_Instruction_Set.html
  - LD (C), A (opcode 0xE2)
  - LD A, (C) (opcode 0xF2)
- Pan Docs - Memory Map: https://gbdev.io/pandocs/Memory_Map.html
  - I/O Ports (0xFF00-0xFF7F)
  - Hardware registers (LCDC, STAT, BGP, etc.)
Educational Integrity
What I Understand Now
- Memory Mapped I/O: The Game Boy uses memory addresses to control hardware. Writing to 0xFF40 does not write to RAM, but rather configures the LCD. This is more efficient than having special instructions for each peripheral.
- LD (C), A vs LDH (n), A: The key difference is that LD (C), A uses a register (C) as the offset, allowing dynamic loops. LDH (n), A uses an immediate byte, which is static but more direct. LD (C), A is 1 M-Cycle faster because it does not need to read the immediate byte.
- Hardware registers: Each register has a specific purpose. LCDC controls whether the screen is on, STAT indicates the current LCD mode (H-Blank, V-Blank, OAM, etc.), and BGP defines the background colors. Understanding these registers is crucial for implementing the PPU later.
- Informative logging: Showing register names instead of hexadecimal addresses makes the logs much more readable and useful for debugging. This is especially important when working with complex hardware like the Game Boy.
What remains to be confirmed
- Behavior of read-only registers: Some registers, like LY (0xFF44), are read-only. The MMU currently allows writing to them, but the actual hardware ignores writes. This should be implemented when the PPU is added.
- Registers with special behavior: Some registers have special behavior on writes. For example, writing to DMA (0xFF46) starts a transfer, and DIV (0xFF04) is reset when any value is written to it. These behaviors will be implemented when the corresponding subsystems (DMA, Timer) are added.
- Full range of registers: Constants were defined for the most common registers, but there are many more in the range 0xFF00-0xFF7F. As more subsystems (APU, Timer, etc.) are implemented, more constants will be added and logging will be improved.
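One possible way to model the pending write behaviors is to special-case them inside the MMU's write path. The sketch below is speculative and not part of the current emulator: `SketchMMU`, `IO_DIV` and `set_ly` are hypothetical names, the DMA transfer is not modeled, and the LY handling simply follows the description above.

```python
IO_DIV = 0xFF04  # assumed constant name
IO_LY = 0xFF44   # assumed constant name


class SketchMMU:
    """Hypothetical MMU with two special-cased I/O registers."""

    def __init__(self):
        self.mem = bytearray(0x10000)

    def read_byte(self, addr):
        return self.mem[addr & 0xFFFF]

    def write_byte(self, addr, value):
        addr &= 0xFFFF
        if addr == IO_LY:
            return              # read-only for the CPU: the write is dropped
        if addr == IO_DIV:
            self.mem[addr] = 0  # any written value resets the counter
            return
        self.mem[addr] = value & 0xFF

    def set_ly(self, line):
        """Hypothetical PPU-side setter that bypasses the CPU write path."""
        self.mem[IO_LY] = line % 154  # LY ranges over 0-153


mmu = SketchMMU()
mmu.set_ly(10)
mmu.write_byte(IO_LY, 0x55)   # ignored
mmu.mem[IO_DIV] = 0x7F        # pretend the timer has been counting
mmu.write_byte(IO_DIV, 0x12)  # resets DIV regardless of the value
mmu.write_byte(0xFF47, 0xE4)  # ordinary register: stored as written
```

A table of per-address write handlers would scale better than a chain of `if` statements once more subsystems exist; the inline checks are used here only to keep the sketch short.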
Hypotheses and Assumptions
Timing of 2 M-Cycles: We assume that LD (C), A and LD A, (C) consume 2 M-Cycles based on the Pan Docs documentation. This is consistent with the fact that they do not need to read an immediate byte (unlike LDH, which consumes 3 M-Cycles). However, we have not validated this on real hardware, only against the documentation.
Wrap-around behavior: We assume that if C=0xFF, the calculated address is 0xFFFF (IE), which is arithmetically correct (0xFF00 + 0xFF = 0xFFFF). The wrap-around test validates this, but we have not verified whether the real hardware has some special behavior in this case.
Next Steps
- [x] Implement LD (C), A (0xE2) and LD A, (C) (0xF2)
- [x] Add hardware register constants in MMU
- [x] Improve I/O write logging
- [x] Create TDD tests for the new opcodes
- [ ] Implement PPU (Graphics Processing Unit): The next big module. The PPU is responsible for rendering the screen, updating the LY register, generating V-Blank and H-Blank interrupts, and managing sprites and backgrounds. Without a PPU, games sit in infinite loops waiting for LY to change.
- [ ] Implement Timer (DIV, TIMA, TMA, TAC): Subsystem that generates timer interrupts and maintains the divider counter.
- [ ] Implement Interrupts: Complete interrupt handling system (V-Blank, LCD STAT, Timer, Serial, Joypad).
- [ ] Implement Joypad: Reading the buttons and the directional pad.
Note: With the implementation of these two opcodes, the LR35902 CPU is theoretically 100% complete. All opcodes in the instruction set are implemented. The next logical step is to implement the PPU so that games can render graphics and move beyond waiting loops.