This project is educational and open source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.
Migration of Registers to C++ (CoreRegisters)
Summary
The migration of the CPU registers from Python to C++ is complete.
It introduces the class CoreRegisters, which provides fast access
to the 8-bit and 16-bit registers. This implementation is critical for performance,
since registers are accessed thousands of times per second during emulation.
With direct memory access instead of Python method calls, the main
CPU loop is expected to be significantly faster.
Hardware Concept
The Game Boy's LR35902 CPU uses a hybrid register architecture based on the Z80/8080. The registers are organized into:
- 8-bit registers: A, B, C, D, E, H, L, F
- 16-bit registers: PC (Program Counter), SP (Stack Pointer)
- 16-bit virtual pairs: AF, BC, DE, HL (combinations of two 8-bit registers)
The F register (Flags) has an important hardware peculiarity: its 4 low bits are always 0 on real hardware. Only bits 7, 6, 5 and 4 are valid, and they represent the flags Z (Zero), N (Subtract), H (Half Carry) and C (Carry) respectively.
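The flag layout described above can be sketched in C++ as follows. This is a minimal illustration rather than the project's actual code: the constant names (FLAG_Z and so on) and the helpers set_flag, clear_flag and get_flag are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; bit positions follow the text above.
constexpr std::uint8_t FLAG_Z = 0x80;  // bit 7: Zero
constexpr std::uint8_t FLAG_N = 0x40;  // bit 6: Subtract
constexpr std::uint8_t FLAG_H = 0x20;  // bit 5: Half Carry
constexpr std::uint8_t FLAG_C = 0x10;  // bit 4: Carry
constexpr std::uint8_t F_MASK = 0xF0;  // the low nibble of F is always 0

// Each helper takes the current F value and returns the new one,
// re-applying the mask so the low nibble can never become non-zero.
inline std::uint8_t set_flag(std::uint8_t f, std::uint8_t flag) {
    return static_cast<std::uint8_t>((f | flag) & F_MASK);
}

inline std::uint8_t clear_flag(std::uint8_t f, std::uint8_t flag) {
    return static_cast<std::uint8_t>(f & ~flag & F_MASK);
}

inline bool get_flag(std::uint8_t f, std::uint8_t flag) {
    return (f & flag) != 0;
}

inline void flags_example() {
    std::uint8_t f = 0x0F;                // invalid low bits set on purpose
    f = set_flag(f, FLAG_Z);
    assert(f == 0x80);                    // low nibble dropped, Z set
    assert(get_flag(f, FLAG_Z));
    assert(!get_flag(f, FLAG_C));
}
```

Returning the new F value instead of mutating a global keeps the sketch free of shared state; a real class would more likely offer these as member functions.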
To emulate this efficiently, we implement getters/setters for the 16-bit pairs
that combine and split the bytes using bitwise operations. For example, the AF pair
has A in the high byte and F in the low byte, but when reading or writing F we always apply
the mask 0xF0 to simulate the hardware behavior.
Source: Pan Docs - Game Boy CPU Manual, registers section.
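As a hedged sketch of the AF pair logic just described, the struct below models only A and F. The type name AfSketch is hypothetical; get_af and set_af are the method names mentioned in this entry, but their exact signatures here are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for the real class: only A and F are modeled.
struct AfSketch {
    std::uint8_t a = 0;
    std::uint8_t f = 0;

    // A is the high byte, F the low byte; F's low nibble is masked off on read.
    std::uint16_t get_af() const {
        return static_cast<std::uint16_t>((a << 8) | (f & 0xF0));
    }

    // Split the 16-bit value; F's low nibble is masked off on write as well.
    void set_af(std::uint16_t value) {
        a = static_cast<std::uint8_t>(value >> 8);
        f = static_cast<std::uint8_t>(value & 0xF0);
    }
};

inline void af_example() {
    AfSketch r;
    r.set_af(0x12FF);
    assert(r.a == 0x12);
    assert(r.f == 0xF0);           // 0xFF was masked to 0xF0
    assert(r.get_af() == 0x12F0);
}
```

Masking on both read and write means the pair stays consistent even if code writes to f directly with invalid low bits.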
Implementation
The class CoreRegisters was implemented in C++ with the following
design principles for maximum performance:
- Simple data structure: Registers are public members for direct access without method overhead. This allows the compiler to optimize memory access.
- Inline methods: Methods for virtual pairs (get_af, set_af, etc.) and flag helpers are marked as inline so that the compiler can expand them at the call site, eliminating the overhead of function calls.
- Cache-friendly: All registers are contiguous in memory, taking advantage of the spatial locality of the processor cache.
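The three principles above can be illustrated with a small sketch. It is not the real CoreRegisters declaration: the name RegistersSketch, the member order, and the choice to show only the HL pair are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout for illustration; the real CoreRegisters may differ.
struct RegistersSketch {
    // Public 8-bit registers: plain loads and stores, no accessor calls.
    std::uint8_t a = 0, b = 0, c = 0, d = 0, e = 0, h = 0, l = 0, f = 0;
    // Public 16-bit registers.
    std::uint16_t pc = 0, sp = 0;

    // Member functions defined in the class body are implicitly inline.
    std::uint16_t get_hl() const {
        return static_cast<std::uint16_t>((h << 8) | l);
    }

    void set_hl(std::uint16_t value) {
        h = static_cast<std::uint8_t>(value >> 8);
        l = static_cast<std::uint8_t>(value & 0xFF);
    }
};

// Eight 1-byte fields plus two 2-byte fields pack into 12 bytes on
// mainstream compilers, well under a typical 64-byte cache line.
static_assert(sizeof(RegistersSketch) == 12, "unexpected padding");

inline void layout_example() {
    RegistersSketch r;             // all registers start at zero
    r.b = 0x42;                    // direct member access
    r.set_hl(0xC0DE);
    assert(r.h == 0xC0);
    assert(r.l == 0xDE);
    assert(r.get_hl() == 0xC0DE);
}
```

Note that inline is only a hint: the compiler decides whether to expand a call, so any speedup should be confirmed by measurement.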
Created components
- src/core/cpp/Registers.hpp - CoreRegisters class declaration with all the registers, inline methods for virtual pairs and flag helpers.
- src/core/cpp/Registers.cpp - Implementation of the constructor that initializes all registers to zero.
- src/core/cython/registers.pxd - Cython definition of the C++ class for the binding layer.
- src/core/cython/registers.pyx - Cython wrapper PyRegisters, which exposes Python properties for intuitive access to the registers.
- tests/test_core_registers.py - Complete test suite (14 tests) that validates all aspects of the registers.
Design decisions
1. Public vs private members: We decided to use public members for the individual registers (a, b, c, etc.) because direct access is faster than getters/setters, and this data does not need additional validation. Inline methods are used only for virtual pairs and flags, where there is additional logic.
2. Wrap-around in Cython: The Cython wrapper accepts Python int values and applies the wrap-around before converting to C types (uint8_t, uint16_t).
This allows tests to write values like 256 or 0x10000 and
the system handles them correctly.
3. Python properties: Instead of explicit get/set methods, we use
Python properties (@property) so that Python code can access
the registers as attributes (e.g. reg.a = 0x12 rather than reg.set_a(0x12)).
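The wrap-around in decision 2 amounts to masking a wide integer before narrowing it. The real wrapper does this in Cython; the sketch below shows the same arithmetic in C++, with std::int64_t standing in for a Python int. The function names wrap8 and wrap16 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; std::int64_t stands in for an arbitrary Python int.
inline std::uint8_t wrap8(std::int64_t value) {
    // Keep only the low 8 bits, then narrow explicitly.
    return static_cast<std::uint8_t>(value & 0xFF);
}

inline std::uint16_t wrap16(std::int64_t value) {
    // Keep only the low 16 bits, then narrow explicitly.
    return static_cast<std::uint16_t>(value & 0xFFFF);
}

inline void wrap_example() {
    assert(wrap8(256) == 0);         // 0x100 wraps to 0x00
    assert(wrap8(0x1FF) == 0xFF);
    assert(wrap16(0x10000) == 0);    // 0x10000 wraps to 0x0000
}
```

Masking before the cast makes the intent explicit and avoids an overflow error when an out-of-range Python int would otherwise be converted directly to a fixed-width C type.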
Affected Files
- src/core/cpp/Registers.hpp - CoreRegisters class (new)
- src/core/cpp/Registers.cpp - Constructor implementation (new)
- src/core/cython/registers.pxd - Cython definition (new)
- src/core/cython/registers.pyx - Cython PyRegisters wrapper (new)
- src/core/cython/native_core.pyx - Updated to include registers.pyx
- setup.py - Added Registers.cpp to build sources
- tests/test_core_registers.py - Complete test suite (new)
Tests and Verification
A complete test suite was created that validates all aspects of the registers:
- Unit tests: 14 tests covering:
  - 8-bit registers and wrap-around
  - 16-bit virtual pairs (AF, BC, DE, HL)
  - Low-bit mask in register F
  - Individual flags (Z, N, H, C)
  - Program Counter and Stack Pointer
  - Default initialization
- Compilation: ✅ Successful with no errors (minor Cython warnings expected)
- Compiled C++ module validation: ✅ All tests pass (14/14)
- Execution time: ~0.05s for the full suite
Command to run: python -m pytest tests/test_core_registers.py -v
Sources consulted
- Pan Docs: Game Boy CPU Manual - Registers and flags section
- Implementation based on LR35902 hardware specifications
Note: The implementation follows the pattern established in the Python version, but is optimized for C++ with direct memory access and inline methods.
Educational Integrity
What I Understand Now
- Endianness and virtual pairs: The Game Boy stores 16-bit values in memory in little-endian order, but within a register pair the first-named register is the high byte: the AF pair has A in the high byte and F in the low byte. This means that (a << 8) | f gives the correct value of the 16-bit pair.
- Memory access optimization: Keeping all registers contiguous in memory (in a struct) allows the processor to load them into the cache together, reducing cache misses and improving performance.
- Inline functions: Methods marked as inline can be expanded by the compiler at the call site, eliminating the overhead of function calls. This is critical for small functions that are called thousands of times per second.
- Type management in Cython: To handle wrap-around correctly, we need to accept Python int values, apply the mask, and then cast explicitly to C types (uint8_t, uint16_t).
What remains to be confirmed
- Actual CPU loop performance: Although direct memory access should in theory be faster, we will need to measure actual performance when we integrate this into the main CPU loop.
- Behavior of flags in complex operations: When we implement the CPU instructions, we will verify that the flags are set correctly according to the hardware specifications.
Hypotheses and Assumptions
We assume that the C++ compiler will properly optimize inline methods and access
to public members. If there are performance issues, we might consider using a union for the virtual pairs, but this would complicate the code and is probably not necessary.
Next Steps
- [ ] Migrate CPU to C++ using CoreRegisters and CoreMMU
- [ ] Implement the instruction loop (Fetch-Decode-Execute) in C++
- [ ] Integrate CoreRegisters with the main emulation loop
- [ ] Measure performance compared to the Python version
- [ ] Implement critical CPU instructions in C++