This project is educational and Open Source. No code is copied from other emulators; the implementation is based solely on technical documentation and permitted tests.
Implementation of Loads and 16-bit Arithmetic in C++
Summary
Data transfer operations (loads) and 16-bit arithmetic were implemented in C++, covering approximately 40% of the Game Boy instruction set. Generic helpers were added to handle the entire 0x40-0x7F block of LD r, r', as well as immediate loads (8 and 16 bits) and register-pair arithmetic. 64+ new opcodes were implemented using register pointers and inline helper functions, with performance in mind. All 16 tests pass.
Hardware Concept
A large share of a CPU's instructions are data-movement operations (loads). On the Game Boy, the opcode block 0x40-0x7F forms a regular 8×8 matrix in which:
- Bits 3-5: Destination register code (0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A)
- Bits 0-2: Source register code (same encoding)
- Exception: 0x76 is HALT, not LD
This structure allows 63 LD instructions (64 opcodes minus HALT) to be implemented with a single generic helper function instead of 63 individual cases. In C++, pointers to registers (uint8_t*) make it possible to write generic functions such as ld_r_r(uint8_t* dest, uint8_t* src).
16-bit arithmetic: The Game Boy has an important peculiarity:
- INC/DEC rr: They DO NOT affect flags (they only increment/decrement the pair)
- ADD HL, rr: It DOES affect flags, except Z: N is reset, and H and C are updated (Z keeps its previous value). The half-carry comes from a carry out of bit 11 (bit 3 of the high byte), not out of bit 3 as in 8-bit operations.
This difference is critical to emulation accuracy. The 16-bit half-carry detects a carry out of bit 11, i.e. overflow of the low nibble of the high byte: ((hl & 0xFFF) + (value & 0xFFF)) > 0xFFF.
C++ optimization: Inline helper functions that take register pointers let the compiler remove unnecessary indirection and generate efficient machine code. The entire 0x40-0x7F block is handled by a single generic path in the switch rather than 63 separate cases. This reduces code size and may also help branch prediction.
Implementation
Generic helpers were implemented to handle the complete block of LD instructions and the 16-bit arithmetic operations. The architecture uses register pointers and inline helper functions for performance.
Components created/modified
- CPU.hpp: Added declarations for the load and 16-bit arithmetic helpers:
  - get_register_ptr(): Get a pointer to a register from its code
  - read_register_or_mem(): Read from a register or from memory at (HL)
  - write_register_or_mem(): Write to a register or to memory at (HL)
  - ld_r_r(): Copy a value between registers/memory
  - inc_16bit()/dec_16bit(): Increment/decrement a register pair
  - add_hl(): 16-bit addition to HL with flag calculation
- CPU.cpp: Implementation of the helpers and 64+ new opcodes:
  - Block 0x40-0x7F: LD r, r' (63 instructions, excluding 0x76 HALT)
  - LD r, n: 0x06, 0x0E, 0x16, 0x1E, 0x26, 0x2E, 0x3E (7 instructions)
  - LD (HL), n: 0x36 (1 instruction)
  - LD rr, nn: 0x01, 0x11, 0x21, 0x31 (4 instructions)
  - INC/DEC rr: 0x03, 0x0B, 0x13, 0x1B, 0x23, 0x2B, 0x33, 0x3B (8 instructions)
  - ADD HL, rr: 0x09, 0x19, 0x29, 0x39 (4 instructions)
- cpu.pxd: Corrected the import of bool for Cython compatibility
- tests/test_core_cpu_loads.py: Complete suite of 16 tests validating all operations
Design decisions
- Register pointers: Instead of a giant switch with 64 cases, a helper function maps register codes to pointers. This reduces code size and improves maintainability.
- Handling of (HL): Code 6 represents a memory access at the address held in HL. The functions read_register_or_mem() and write_register_or_mem() handle this special case, simplifying the LD logic.
- Precise timing: Instructions that access memory (destination or source = (HL)) take 2 M-cycles instead of 1. The cycle calculation checks both codes.
- Flags in ADD HL: Implemented the half-carry calculation at bit 11 and the full 16-bit carry (out of bit 15), leaving Z unaffected.
- INC/DEC rr without flags: These opcodes do not affect flags, which is critical for emulation accuracy (many games depend on this behavior).
Affected Files
- src/core/cpp/CPU.hpp - Added declarations for the load and 16-bit arithmetic helpers
- src/core/cpp/CPU.cpp - Implementation of the helpers and 64+ new opcodes
- src/core/cython/cpu.pxd - Fixed bool import for Cython compatibility
- tests/test_core_cpu_loads.py - Complete suite of 16 tests
Tests and Verification
A complete suite of 16 tests was created to validate all the implemented operations:
- TestLD_8bit_Register (5 tests): Verify LD r, r', LD (HL), r and LD r, (HL)
- TestLD_8bit_Immediate (2 tests): Check LD r, n and LD (HL), n
- TestLD_16bit (2 tests): Check LD rr, nn for BC and HL
- TestINC_DEC_16bit (3 tests): Check INC/DEC rr and that they DO NOT affect flags
- TestADD_HL (4 tests): Verify ADD HL, rr with different carry cases
Result: All 16 tests pass correctly. Validation includes:
- Verification of correct values in registers and memory
- Timing verification (M-Cycles consumed)
- Flag verification (especially that INC/DEC rr DO NOT affect flags)
- Checking half-carry and carry in 16-bit operations
- 16-bit wrap-around verification
Sources consulted
- Pan Docs: CPU Instruction Set - LD opcode structure and 16-bit arithmetic
- Pan Docs: CPU Registers and Flags - Behavior of flags in 16-bit operations
- GBEDG: Opcodes Table - Timing and behavior of each instruction
Note: The implementation strictly follows the technical documentation, without consulting source code from other emulators.
Educational Integrity
What I Understand Now
- Opcode matrix: The block 0x40-0x7F forms a regular matrix that allows 63 instructions to be implemented with a single generic helper function, significantly reducing code size and improving maintainability.
- Register pointers: In C++, pointers to registers make it possible to write generic functions that work with any register, eliminating boilerplate code.
- Half-carry in 16 bits: In 16-bit operations the half-carry comes from a carry out of bit 11 (bit 3 of the high byte), not bit 3 as in 8-bit operations. This is critical for emulation accuracy.
- INC/DEC rr without flags: These instructions DO NOT affect flags. This is documented CPU behavior that many games depend on.
- Memory timing: Instructions that access memory (destination or source = (HL)) take 2 M-cycles instead of 1, reflecting the extra cost of the memory access.
What remains to be confirmed
- CB prefix: Rotation, shift, and bit operations (CB prefix) are not implemented yet. Most bit-manipulation code (BIT, SET, RES, SWAP, shifts and rotations) depends on them.
- Remaining operations: Other common arithmetic and logic operations, such as ADC, SBC, and CP, are still missing.
Hypotheses and Assumptions
The implementation assumes that the flag behavior of ADD HL, rr described in Pan Docs is correct. The bit-11 half-carry calculation is based on technical documentation, but it should be validated with more exhaustive tests or with permitted test ROMs, if available.
Next Steps
- [ ] Implement remaining arithmetic operations (ADC, SBC, CP)
- [ ] Implement remaining logical operations (OR, CPL, SCF, CCF)
- [ ] Implement CB prefix (rotations and bit operations)
- [ ] Implement remaining stack operations (PUSH/POP for the other register pairs)
- [ ] Validate with allowed test ROMs if available