This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
Migration of the CPU Skeleton to C++ (CoreCPU)
Summary
The migration of the basic CPU skeleton to C++ has been completed, establishing the pattern of dependency injection in native code. The CPU now runs the Fetch-Decode-Execute loop in pure C++, accessing the MMU and Registers through direct pointers. Two test opcodes were implemented (NOP and LD A, d8) to validate the architectural pattern before migrating the rest of the instructions.
Hardware Concept
The Game Boy's LR35902 CPU executes instructions in a continuous cycle called Fetch-Decode-Execute:
- Fetch: Read the byte pointed to by PC (Program Counter) from memory
- Increment: Advance PC to the next byte
- Decode/Execute: Identify the opcode and execute the corresponding operation
Each instruction consumes a specific number of M-Cycles (Machine Cycles). An M-Cycle typically corresponds to one memory operation. For example:
- NOP (0x00): 1 M-Cycle (does nothing, just consumes time)
- LD A, d8 (0x3E): 2 M-Cycles (read opcode + read immediate value)
The CPU does not own the MMU or the Registers; it only keeps references (pointers) to them. This allows multiple components to share the same state, following the dependency injection pattern.
Source: Pan Docs - CPU Instruction Set
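As a rough illustration of the cycle described above, the following C++ sketch fetches one opcode, advances PC, and returns the M-Cycles consumed for the two opcodes mentioned. The Memory and Registers types and the free functions fetch_byte and step are hypothetical stand-ins used only for this example; they are not the project's actual MMU or CoreRegisters classes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the MMU: a flat 64 KiB address space.
struct Memory {
    std::array<std::uint8_t, 0x10000> bytes{};
    std::uint8_t read(std::uint16_t addr) const { return bytes[addr]; }
};

// Hypothetical stand-in for the register file (only what this sketch needs).
struct Registers {
    std::uint8_t a = 0;    // accumulator
    std::uint16_t pc = 0;  // program counter
};

// Fetch + increment: read the byte at PC, then advance PC by one.
inline std::uint8_t fetch_byte(const Memory& mem, Registers& regs) {
    return mem.read(regs.pc++);
}

// Decode/execute one instruction.
// Returns the M-Cycles consumed, or 0 for an opcode this sketch does not know.
inline int step(const Memory& mem, Registers& regs) {
    const std::uint8_t opcode = fetch_byte(mem, regs);
    switch (opcode) {
        case 0x00:  // NOP: 1 M-Cycle (opcode fetch only)
            return 1;
        case 0x3E:  // LD A, d8: 2 M-Cycles (opcode fetch + immediate read)
            regs.a = fetch_byte(mem, regs);
            return 2;
        default:
            return 0;
    }
}

// Usage: a three-byte program, NOP followed by LD A, 0x42.
inline void example_fetch_execute() {
    Memory mem;
    Registers regs;
    mem.bytes[0] = 0x00;
    mem.bytes[1] = 0x3E;
    mem.bytes[2] = 0x42;
    assert(step(mem, regs) == 1 && regs.pc == 1);
    assert(step(mem, regs) == 2 && regs.a == 0x42 && regs.pc == 3);
}
```

Note that the immediate operand of LD A, d8 goes through the same fetch_byte helper as the opcode, which is why the instruction costs one extra M-Cycle and leaves PC two bytes further on.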
Implementation
A C++ class, CPU, was created to implement the basic instruction cycle.
The architecture uses pointers to MMU and CoreRegisters (it does not own them), following
the principle of dependency injection. This allows Python to keep ownership
of the memory, while C++ operates at maximum speed with direct pointers.
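The ownership arrangement can be sketched as follows. This is a minimal illustration, assuming simplified MMU and CoreRegisters types whose members are invented for the example; only the class names and fetch_byte() come from this entry, and the real classes will differ. The point is that the CPU stores raw, non-owning pointers, so whoever created the objects (Python, in this project) controls their lifetime.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified stand-ins; the real MMU and CoreRegisters hold more state.
struct MMU {
    std::array<std::uint8_t, 0x10000> bytes{};
    std::uint8_t read(std::uint16_t addr) const { return bytes[addr]; }
};

struct CoreRegisters {
    std::uint8_t a = 0;
    std::uint16_t pc = 0;
};

class CPU {
public:
    // The caller keeps ownership; both objects must outlive this CPU.
    CPU(MMU* mmu, CoreRegisters* regs) : mmu_(mmu), regs_(regs) {
        assert(mmu_ != nullptr && regs_ != nullptr);
    }

    // Read the byte at PC through the injected pointers, then advance PC.
    std::uint8_t fetch_byte() { return mmu_->read(regs_->pc++); }

private:
    MMU* mmu_;             // non-owning
    CoreRegisters* regs_;  // non-owning
};

// Usage: two CPU objects observing the same memory and registers.
inline void example_shared_state() {
    MMU mmu;
    CoreRegisters regs;
    mmu.bytes[0] = 0x11;
    mmu.bytes[1] = 0x22;
    CPU first(&mmu, &regs);
    CPU second(&mmu, &regs);
    assert(first.fetch_byte() == 0x11);
    assert(second.fetch_byte() == 0x22);  // sees the PC advanced by `first`
    assert(regs.pc == 2);
}
```

The trade-off of raw pointers is that nothing stops the owner from destroying the MMU or registers while a CPU still refers to them, so the wrapper layer has to keep those objects alive for as long as the CPU exists.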
Components created/modified
- CPU.hpp / CPU.cpp: C++ class that implements the Fetch-Decode-Execute loop
  - Members: pointers to MMU and CoreRegisters, cycle counter
  - Method step(): Executes one instruction cycle
  - Helper fetch_byte(): Reads a byte from memory and increments PC
  - Compiler-optimized switch for opcode decoding
- cpu.pxd / cpu.pyx: Cython wrapper that exposes CPU to Python
  - Class PyCPU: Python wrapper for CPU
  - Constructor receives PyMMU and PyRegisters
  - Extracts the underlying C++ pointers for dependency injection
- setup.py: Added CPU.cpp to build sources
- native_core.pyx: Included cpu.pyx in the main module
- test_core_cpu.py: Complete suite of integration tests (6 tests)
Design decisions
1. Dependency Injection in C++: The CPU receives pointers to the MMU and Registers instead of owning them. This allows Python to manage the lifecycle of the objects, while C++ operates with direct pointers (maximum performance).
2. Switch Statement for Decoding: A switch is used instead of a function table. The compiler can optimize this into a
jump table, providing O(1) opcode decoding.
3. Minimum Opcodes for Validation: Only 2 opcodes were implemented (NOP and LD A, d8) to validate the architectural pattern. The rest of the opcodes will be migrated in later steps.
4. Error Handling: Unknown opcodes return 0 (error) instead of throwing exceptions. This avoids overhead in the critical emulation loop.
5. Access to Private Members in Cython: Because mmu.pyx and registers.pyx are included in native_core.pyx, we can
directly access the private members _mmu and _regs from cpu.pyx (same compiled module).
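To illustrate the error-handling decision (item 4), here is a hedged sketch of how a caller might consume the "0 means unknown opcode" convention without exceptions. The RunResult type and the run function are hypothetical names introduced for this example, and how the project's main loop actually reacts to a 0 return is not covered in this entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch.
struct RunResult {
    std::uint64_t cycles = 0;               // total M-Cycles executed
    bool stopped_on_unknown_opcode = false;  // true if step() returned 0
};

// Calls step() up to max_steps times, accumulating M-Cycles.
// A return value of 0 is treated as "unknown opcode" and ends the loop;
// no exception is thrown on the hot path.
template <typename StepFn>
RunResult run(StepFn&& step, std::uint64_t max_steps) {
    RunResult result;
    for (std::uint64_t i = 0; i < max_steps; ++i) {
        const int cycles = step();
        if (cycles == 0) {
            result.stopped_on_unknown_opcode = true;
            break;
        }
        result.cycles += static_cast<std::uint64_t>(cycles);
    }
    return result;
}

// Usage: a fake step() that replays fixed cycle counts.
inline void example_run() {
    const int script[] = {1, 2, 0, 1};
    int next = 0;
    const RunResult result = run([&] { return script[next++]; }, 10);
    assert(result.cycles == 3);  // 1 + 2, then the 0 stops the loop
    assert(result.stopped_on_unknown_opcode);
}
```

Because no real instruction takes 0 M-Cycles, the value is free to act as a sentinel, and the check costs a single integer comparison per instruction.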
Affected Files
- src/core/cpp/CPU.hpp - Declaration of the CPU class in C++
- src/core/cpp/CPU.cpp - Implementation of the instruction cycle
- src/core/cython/cpu.pxd - Cython definition of the C++ class
- src/core/cython/cpu.pyx - Python wrapper for CPU
- src/core/cython/native_core.pyx - Included cpu.pyx
- src/core/cython/mmu.pyx - Comment on access to private members
- src/core/cython/registers.pyx - Comment on access to private members
- setup.py - Added CPU.cpp to sources
- tests/test_core_cpu.py - Integration test suite (6 tests)
Tests and Verification
A complete suite of integration tests was created that validates:
- Initialization: CPU is created successfully with MMU and Registers
- NOP (0x00): Consumes 1 M-Cycle, increments PC correctly
- LD A, d8 (0x3E): Reads the immediate value, stores it in A, consumes 2 M-Cycles
- Multiple executions: Instruction sequence works correctly
- Unknown opcodes: Return 0 (error) without crashing
- Dependency injection: Multiple CPUs can share MMU and Registers
Execution results:
============================= test session starts =============================
platform win32 -- Python 3.13.5, pytest-9.0.2
collected 6 items
tests/test_core_cpu.py::TestCoreCPU::test_cpu_initialization PASSED [ 16%]
tests/test_core_cpu.py::TestCoreCPU::test_nop_instruction PASSED [ 33%]
tests/test_core_cpu.py::TestCoreCPU::test_ld_a_d8_instruction PASSED [ 50%]
tests/test_core_cpu.py::TestCoreCPU::test_ld_a_d8_multiple_executions PASSED [ 66%]
tests/test_core_cpu.py::TestCoreCPU::test_unknown_opcode_returns_zero PASSED [ 83%]
tests/test_core_cpu.py::TestCoreCPU::test_cpu_with_shared_mmu_and_registers PASSED [100%]
============================= 6 passed in 0.06s =============================
✅ All tests pass (6/6)
Compilation: Successful with Visual Studio 2022 (MSVC 14.44.35207). Minor Cython warnings are expected (they do not affect functionality).
Sources consulted
- Pan Docs: CPU Instruction Set - Specification of opcodes and machine cycles
- LR35902 architecture: General knowledge of the Fetch-Decode-Execute cycle
- Cython Documentation: Access to private members in included modules
Educational Integrity
What I Understand Now
- Dependency Injection in C++: Python creates the objects (memory owner), C++ receives pointers to operate at maximum speed. This avoids the overhead of passing Python objects in each instruction cycle.
- Fetch-Decode-Execute Cycle: The basic cycle of the CPU is Fetch (read opcode), Decode (identify instruction), Execute (perform the operation). Each step consumes a specific number of machine cycles.
- Optimized Switch Statement: The C++ compiler can convert a switch with consecutive cases into a jump table, providing O(1) decoding.
- Access to Private Members in Cython: When Cython files are included in the same main file (native_core.pyx), their private members are accessible to one another because they are compiled into the same module.
What remains to be confirmed
- Real Performance: The performance impact relative to the Python implementation has not yet been measured. Profiling will be needed to validate the improvement.
- Complete Opcode Migration: Only 2 opcodes are implemented. The remaining entries of the two 256-entry opcode tables (base and CB-prefixed) still need to be migrated gradually.
- Interrupt Handling: The current skeleton does not handle interrupts. IME (Interrupt Master Enable) logic and interrupt handling will need to be added.
Hypotheses and Assumptions
Performance Assumption: We assume that the compiler will optimize the switch statement into a jump table. This should be validated with profiling and analysis of the generated code.
Private Member Access Assumption: We assume that direct access to _mmu and _regs from cpu.pyx is safe because they are in the
same compiled module. This was validated with a successful compilation and passing tests.
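For the performance assumption, a first rough measurement could look like the sketch below. It is an assumption-laden starting point, not a benchmark methodology: the Timing type and time_steps function are hypothetical names, and a trivially small step function may be optimized away by the compiler, so the numbers are only indicative. Whether the switch actually became a jump table is better checked in the generated assembly (for example with MSVC's /FA listing option, or -S on GCC/Clang).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type for this sketch.
struct Timing {
    std::uint64_t cycles = 0;             // M-Cycles reported by step()
    std::chrono::nanoseconds elapsed{0};  // wall-clock time for the loop
};

// Calls step() `iterations` times and measures the elapsed time with a
// monotonic clock. step() is expected to return the M-Cycles it consumed.
template <typename StepFn>
Timing time_steps(StepFn&& step, std::uint64_t iterations) {
    Timing timing;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        timing.cycles += static_cast<std::uint64_t>(step());
    }
    const auto stop = std::chrono::steady_clock::now();
    timing.elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return timing;
}

// Usage: time 1000 calls of a placeholder step that always costs 1 M-Cycle.
inline void example_timing() {
    const Timing timing = time_steps([] { return 1; }, 1000);
    assert(timing.cycles == 1000);
}
```

Running the same harness around the Python implementation's step and the native one, on the same program, would give a like-for-like comparison of time per instruction.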
Next Steps
- [ ] Migrate more basic opcodes (LD, ADD, SUB, etc.)
- [ ] Implement interrupt handling (IME, HALT)
- [ ] Add profiling to measure real performance vs Python
- [ ] Migrate CB opcodes (prefix 0xCB)
- [ ] Integrate native CPU with the main emulation loop