This project is educational and open source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.
CPU Instruction Cycle Implementation
Summary
The CPU class was implemented. It ties the Registers and the MMU together to drive the Fetch-Decode-Execute instruction cycle.
The first 3 basic opcodes were implemented: NOP (0x00), LD A,d8 (0x3E) and LD B,d8 (0x06).
A complete unit test suite with 6 tests was created, and all of them pass.
The CPU can now execute instructions sequentially, marking the first functional "heartbeat" of the emulator.
Hardware Concept
The Instruction Cycle is the fundamental process that makes a CPU work. Without it, the CPU is just a static data structure. It is the "heartbeat" that turns hardware into a running machine.
The Fetch-Decode-Execute Cycle
All Game Boy instructions follow the same basic cycle:
- Fetch: Reads the byte at the address pointed to by the Program Counter (PC). This byte is the opcode (operation code).
- Increment: Advances the PC to point to the next byte, so that the following read picks up either an operand or the next instruction.
- Decode: Identifies which operation the opcode represents (e.g. "load value into register A").
- Execute: Performs the identified operation, possibly reading additional operands from memory or modifying registers.
This cycle repeats for as long as the console is on. The CPU runs about one million machine cycles per second, which works out to hundreds of thousands of instructions per second.
Opcodes and Instructions
An opcode is a byte (0x00 to 0xFF) that identifies a specific operation. The Game Boy has approximately 500 different opcodes once the 0xCB-prefixed set is counted. Many of them are the same operation applied to different operands.
Instructions implemented in this step:
- 0x00 - NOP (No Operation): Does nothing except consume 1 machine cycle. Useful for timing alignment and padding.
- 0x3E - LD A, d8: Loads an 8-bit immediate value (d8) into register A. Reads the next byte from memory and stores it in A.
- 0x06 - LD B, d8: Same as LD A, d8, but loads the value into register B.
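The three instructions above can be summarized as a small lookup table, which is essentially what the decode step does. This is only a sketch: the names `OPCODE_INFO` and `describe` are hypothetical and not part of the project. The byte lengths and M-cycle counts are the ones stated in this entry (NOP: 1 byte, 1 cycle; LD r8,d8: 2 bytes, 2 cycles).

```python
# Hypothetical metadata table: opcode -> (mnemonic, length in bytes, M-cycles).
OPCODE_INFO = {
    0x00: ("NOP", 1, 1),
    0x06: ("LD B,d8", 2, 2),
    0x3E: ("LD A,d8", 2, 2),
}


def describe(program: bytes, address: int = 0) -> str:
    """Return a one-line description of the instruction at `address`."""
    opcode = program[address]
    if opcode not in OPCODE_INFO:
        return f"{address:04X}: unknown opcode 0x{opcode:02X}"
    mnemonic, length, cycles = OPCODE_INFO[opcode]
    # Any bytes after the opcode are the instruction's immediate operands.
    operands = program[address + 1 : address + length]
    text = mnemonic
    if operands:
        text = mnemonic.replace("d8", f"0x{operands[0]:02X}")
    return f"{address:04X}: {text} (length={length}, m_cycles={cycles})"


program = bytes([0x3E, 0x42, 0x00])
print(describe(program, 0))  # 0000: LD A,0x42 (length=2, m_cycles=2)
print(describe(program, 2))  # 0002: NOP (length=1, m_cycles=1)
```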
M-Cycles vs T-Cycles (Machine Cycles vs Clock Cycles)
The Game Boy uses two types of cycles to measure time:
- M-Cycle (Machine Cycle): A machine cycle corresponds to one memory operation (read or write). It is the most useful unit for measuring how long an instruction takes.
- T-Cycle (Clock Cycle): A clock cycle is the basic unit of hardware time. On the Game Boy, typically 1 M-Cycle = 4 T-Cycles.
For now we count M-Cycles because it is simpler and sufficient to validate that the instructions execute correctly. Later we will need T-Cycles for precise synchronization with other components (PPU, APU, timers).
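The relationship between the two units is plain arithmetic, sketched below. The constant and function names are hypothetical. The clock value of 4,194,304 Hz is the original Game Boy's master clock; it does not come from this entry.

```python
T_CYCLES_PER_M_CYCLE = 4  # from the text: 1 M-cycle = 4 T-cycles
CLOCK_HZ = 4_194_304      # master clock in T-cycles per second (not from this entry)


def m_to_t(m_cycles: int) -> int:
    """Convert a count of machine cycles to clock cycles."""
    return m_cycles * T_CYCLES_PER_M_CYCLE


# Cost of "LD A,d8" (2 M-cycles) followed by "NOP" (1 M-cycle).
program_cost_m = 2 + 1
print(m_to_t(program_cost_m))  # 12

# Machine cycles available per second.
M_CYCLES_PER_SECOND = CLOCK_HZ // T_CYCLES_PER_M_CYCLE
print(M_CYCLES_PER_SECOND)  # 1048576
```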
Program Counter (PC) and Sequential Execution
The Program Counter (PC) is a 16-bit register that points to the next instruction to execute. After each instruction, the PC advances automatically. This allows sequential execution of instructions in memory.
Example: Suppose memory contains the following instructions:
- 0x0100: 0x3E (LD A, d8)
- 0x0101: 0x42 (operand: value to load)
- 0x0102: 0x00 (NOP)
The execution is:
- PC = 0x0100, read 0x3E, execute LD A, d8 (read 0x42 from 0x0101), PC = 0x0102
- PC = 0x0102, read 0x00, execute NOP, PC = 0x0103
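The trace above can be reproduced with a few lines of Python. This is a minimal sketch and not the project's CPU class: memory is a plain `bytearray`, the registers are a dictionary, and the function name `step` only mirrors the method described later.

```python
def step(memory: bytearray, regs: dict) -> int:
    """Execute one instruction and return the M-cycles it consumed."""
    opcode = memory[regs["PC"]]              # fetch
    regs["PC"] = (regs["PC"] + 1) & 0xFFFF   # increment (PC is 16 bits wide)
    if opcode == 0x00:                       # NOP
        return 1
    if opcode == 0x3E:                       # LD A, d8
        regs["A"] = memory[regs["PC"]]       # read the immediate operand
        regs["PC"] = (regs["PC"] + 1) & 0xFFFF
        return 2
    raise NotImplementedError(f"opcode 0x{opcode:02X}")


memory = bytearray(0x10000)
memory[0x0100:0x0103] = bytes([0x3E, 0x42, 0x00])
regs = {"PC": 0x0100, "A": 0x00}

first = step(memory, regs)
print(f"PC={regs['PC']:04X} A={regs['A']:02X} cycles={first}")   # PC=0102 A=42 cycles=2
second = step(memory, regs)
print(f"PC={regs['PC']:04X} A={regs['A']:02X} cycles={second}")  # PC=0103 A=42 cycles=1
```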
Source: Pan Docs - CPU Instruction Set and LR35902 architecture
Implementation
The CPU class was implemented. It ties the previous components (Registers and MMU) together to create a functional instruction cycle. The implementation is modular and extensible, ready for the remaining opcodes (roughly 500) to be added.
Components created/modified
- CPU class: Manages the Fetch-Decode-Execute cycle and holds references to the Registers and the MMU
- step() method: Executes a single instruction and returns the cycles consumed
- fetch_byte() method: Helper that reads the byte at the PC address and advances the PC automatically
- _execute_opcode() method: Dispatches opcodes using if/elif (compatible with Python 3.9, and straightforward to convert to match/case after migrating to 3.10+)
- Implemented opcodes: NOP (0x00), LD A,d8 (0x3E), LD B,d8 (0x06)
Design decisions
- Dependency injection: The CPU receives the MMU in its constructor, which allows testing with mocks and improves modularity.
- Helper fetch_byte(): Makes it easy to read immediate operands without repeating the code that advances the PC.
- Handling of unimplemented opcodes: Raises NotImplementedError with useful information (opcode and PC) to make debugging easier.
- Logging: The logging module is used instead of print() for debug traces (DEBUG level), which can be enabled or disabled.
- Python compatibility: if/elif is used instead of match/case to stay compatible with Python 3.9 (the current environment uses 3.9.6). A TODO documents the migration to match/case once the project moves to Python 3.10+.
Code structure
The CPU class follows a clear pattern:
CPU
├── __init__(mmu) # Initialize registers and save reference to MMU
├── step() # Main loop: fetch → decode → execute
├── fetch_byte() # Helper to read operands and increment PC
└── _execute_opcode() # Opcode dispatch
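The following sketch shows how a class with this shape might look. It is an illustration under stated assumptions, not the contents of src/cpu/core.py: `Registers` and `FakeMMU` are hypothetical stand-ins, and the `read_byte` method name and the attribute names `a`, `b`, `pc` are assumptions, since the entry does not show the real interfaces.

```python
import logging

logger = logging.getLogger(__name__)


class Registers:
    """Hypothetical stand-in holding only the registers used here."""

    def __init__(self) -> None:
        self.a = 0
        self.b = 0
        self.pc = 0


class FakeMMU:
    """Hypothetical stand-in for the MMU: a flat 64 KiB byte array."""

    def __init__(self) -> None:
        self.data = bytearray(0x10000)

    def read_byte(self, address: int) -> int:  # assumed method name
        return self.data[address & 0xFFFF]


class CPU:
    def __init__(self, mmu) -> None:
        self.mmu = mmu  # injected, so tests can pass a fake or a mock
        self.registers = Registers()

    def fetch_byte(self) -> int:
        """Read the byte at PC and advance PC by one (wrapping at 16 bits)."""
        value = self.mmu.read_byte(self.registers.pc)
        self.registers.pc = (self.registers.pc + 1) & 0xFFFF
        return value

    def step(self) -> int:
        """Run one instruction and return the M-cycles consumed."""
        opcode_pc = self.registers.pc
        opcode = self.fetch_byte()
        logger.debug("PC=0x%04X opcode=0x%02X", opcode_pc, opcode)
        return self._execute_opcode(opcode, opcode_pc)

    def _execute_opcode(self, opcode: int, opcode_pc: int) -> int:
        if opcode == 0x00:    # NOP
            return 1
        elif opcode == 0x06:  # LD B, d8
            self.registers.b = self.fetch_byte()
            return 2
        elif opcode == 0x3E:  # LD A, d8
            self.registers.a = self.fetch_byte()
            return 2
        raise NotImplementedError(
            f"Opcode 0x{opcode:02X} at PC=0x{opcode_pc:04X} is not implemented"
        )


mmu = FakeMMU()
mmu.data[0x0100:0x0102] = bytes([0x06, 0x99])
cpu = CPU(mmu)
cpu.registers.pc = 0x0100
cycles = cpu.step()
print(cycles, hex(cpu.registers.b), hex(cpu.registers.pc))  # 2 0x99 0x102
```

Because the MMU is passed in rather than created inside the class, the same CPU code runs unchanged against the fake memory used here.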
Affected Files
- src/cpu/core.py - New file with the CPU class and the implementation of the instruction cycle (170 lines)
- src/cpu/__init__.py - Updated to export the CPU class
- tests/test_cpu_core.py - New file with the complete test suite (6 tests, 204 lines)
Tests and Verification
A complete suite of unit tests was created that validates:
- Test 1 (test_nop): Verifies that NOP advances the PC by 1 byte and consumes 1 cycle
- Test 2 (test_ld_a_d8): Verifies that LD A, d8 loads the correct value, advances the PC by 2 bytes and consumes 2 cycles
- Test 3 (test_ld_b_d8): Verifies that LD B, d8 works the same as LD A, d8 but targets register B
- Test 4 (test_unimplemented_opcode_raises): Verifies that unimplemented opcodes raise NotImplementedError
- Test 5 (test_fetch_byte_helper): Verifies that fetch_byte() reads correctly and advances the PC
- Test 6 (test_multiple_instructions_sequential): Verifies sequential execution of multiple instructions
Result: ✅ 6 tests passed (run with pytest)
Additional validation:
- Verified that the PC advances correctly after each instruction
- Verified that registers are updated correctly with immediate values
- Verified that cycles are counted correctly
- No linting errors (verified with read_lints)
Sources consulted
- Pan Docs - CPU Instruction Set: Reference for opcodes and machine cycles
- Pan Docs - LR35902 Architecture: General Processor Architecture
Note: Clean-room implementation based solely on public technical documentation. Code from other emulators (mGBA, Gambatte, SameBoy, etc.) was not consulted.
Educational Integrity
What I Understand Now
- Fetch-Decode-Execute cycle: It is the fundamental loop that makes a CPU work. Without this loop, registers and memory are just static data structures.
- Program Counter (PC): It must advance automatically after each instruction to allow sequential execution. The fetch_byte() helper makes this easier.
- Opcodes: They are bytes that identify operations. Some opcodes, such as LD A, d8, take operands that follow immediately after them in memory.
- M-Cycles: For now we count M-Cycles (machine cycles) because it is simpler. Later we will need T-Cycles for precise timing.
- Modularity: The CPU depends on the MMU but not vice versa. This allows independent testing and a cleaner architecture.
What remains to be confirmed
- Precise timing: Some instructions may take a different number of cycles depending on conditions. This will be validated with test ROMs when we implement more opcodes.
- Interrupts: It must be possible to interrupt the instruction cycle. This will be implemented later.
- CB opcodes (prefix): The Game Boy has a special prefix, 0xCB, that changes the meaning of the byte that follows it, selecting a second table of 256 opcodes. This will be implemented later.
- Conditional opcodes: Many instructions have conditional versions that depend on flags. We will need branching logic.
Hypotheses and Assumptions
Assumed (to be validated with ROM tests):
- The timing of M-Cycles is correct according to Pan Docs (NOP=1, LD r8,d8=2). It will be validated with the execution of real programs.
- The behavior of unimplemented opcodes (NotImplementedError) is correct for development. In a full emulator, all opcodes must be implemented.
Next Steps
- [ ] Implement more load opcodes (LD): LD C,d8, LD D,d8, LD E,d8, LD H,d8, LD L,d8
- [ ] Implement register-to-register load opcodes (LD r8, r8)
- [ ] Implement load opcodes that read from or write to memory (LD A,(HL), LD (HL),A, etc.)
- [ ] Implement basic arithmetic opcodes (ADD, SUB, INC, DEC)
- [ ] Implement jump opcodes (JP, JR) to change the execution flow
- [ ] Organize opcodes in a scalable way (dispatch table, separate modules)
- [ ] Migrate to Python 3.10+ and use match/case to dispatch opcodes
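One possible shape for the dispatch table mentioned in the list above is a dictionary that maps each opcode to a handler function. This is a sketch of the idea, not a committed design: `State`, the `op_*` handlers and `DISPATCH` are hypothetical names. It runs on Python 3.9, so it does not depend on the match/case migration.

```python
from typing import Callable, Dict


class State:
    """Hypothetical minimal CPU state used only for this sketch."""

    def __init__(self, memory: bytearray, pc: int = 0) -> None:
        self.memory = memory
        self.pc = pc
        self.a = 0
        self.b = 0

    def fetch_byte(self) -> int:
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value


def op_nop(state: State) -> int:
    return 1


def op_ld_b_d8(state: State) -> int:
    state.b = state.fetch_byte()
    return 2


def op_ld_a_d8(state: State) -> int:
    state.a = state.fetch_byte()
    return 2


# Each handler performs its operation and returns the M-cycles consumed.
DISPATCH: Dict[int, Callable[[State], int]] = {
    0x00: op_nop,
    0x06: op_ld_b_d8,
    0x3E: op_ld_a_d8,
}


def step(state: State) -> int:
    opcode_pc = state.pc
    opcode = state.fetch_byte()
    handler = DISPATCH.get(opcode)
    if handler is None:
        raise NotImplementedError(
            f"Opcode 0x{opcode:02X} at PC=0x{opcode_pc:04X} is not implemented"
        )
    return handler(state)


# LD B,0x10 ; LD A,0x20 ; NOP
state = State(bytearray([0x06, 0x10, 0x3E, 0x20, 0x00]))
total_cycles = sum(step(state) for _ in range(3))
print(state.a, state.b, state.pc, total_cycles)  # 32 16 5 5
```

Adding an opcode then means writing one handler and one table entry, and the handlers can live in separate modules.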