This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
ALU with Immediate Operands (d8)
Summary
Completed the set of immediate ALU operations (8-bit operands embedded in the code) by implementing ADC A, d8 (0xCE), SBC A, d8 (0xDE), AND d8 (0xE6), XOR d8 (0xEE) and OR d8 (0xF6). These instructions matter because they operate on constants directly from the code, without loading values into registers first. The implementation reuses the existing generic helpers (_adc, _sbc, _and, _xor, _or), following the DRY (Don't Repeat Yourself) principle. With this, the CPU now has the full set of immediate 8-bit ALU operations, which allows games like Tetris DX to progress beyond initialization.
Hardware Concept
Immediate addressing is an addressing mode where the operand (the value to operate on) is embedded directly in the instruction code, just after the opcode.
On the LR35902 architecture, 8-bit immediate instructions follow this format:
- Byte 1: Opcode (e.g. 0xE6 for AND d8)
- Byte 2: Immediate operand (d8 = "data 8-bit")
When the CPU executes an immediate instruction:
- Read the opcode from the address pointed to by PC
- Increment PC
- Read the immediate operand from the new PC address
- Increment PC again
- Execute the operation with the immediate value
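The fetch sequence above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class `MiniCPU`, its attributes and the 64 KiB `bytearray` memory are assumptions made for the example; only the name `fetch_byte` comes from the implementation notes.

```python
class MiniCPU:
    """Hypothetical stand-in for the CPU; only PC and memory are modelled."""

    def __init__(self, memory: bytearray, pc: int = 0) -> None:
        self.memory = memory
        self.pc = pc

    def fetch_byte(self) -> int:
        """Read the byte at PC, then advance PC (wrapping at 16 bits)."""
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value


# Place "AND 0x0F" (opcode 0xE6 followed by its d8 operand) at 0x0100.
memory = bytearray(0x10000)
memory[0x0100:0x0102] = bytes([0xE6, 0x0F])

cpu = MiniCPU(memory, pc=0x0100)
opcode = cpu.fetch_byte()   # reads the opcode, PC becomes 0x0101
operand = cpu.fetch_byte()  # reads the d8 operand, PC becomes 0x0102
```

After both fetches PC points at the next instruction, so the handler can apply the operation to `operand` without touching PC again.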
The advantage of immediate addressing is that it lets you operate on constants without loading values into registers first. For example, to do `AND A, 0x0F`, you don't need:
LD B, 0x0F ; Load 0x0F into B
AND A, B ; AND A with B
You can simply do:
AND 0x0F ; AND A with 0x0F directly
This saves one byte of code and one M-Cycle (2 bytes and 2 M-Cycles instead of 3 bytes and 3 M-Cycles), which matters on resource-constrained systems like the Game Boy.
Logic reuse: The internal logic of the operations (calculation of flags Z, N, H, C) is identical between the register versions and the immediate versions. The only difference is where the operand is obtained: from a register or from the code. Therefore, the implementation reuses the same generic helpers (_adc, _sbc, _and, _xor, _or) that already existed for the register versions.
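To make the shared flag logic concrete, here is a sketch of three of these operations written as pure functions. The function names (`alu_and`, `alu_adc`, `alu_sbc`) and the tuple return style are assumptions for illustration and will differ from the project's `_and`, `_adc` and `_sbc` helpers; the flag bit positions are those of the LR35902 F register.

```python
# Flag bits in the F register: Z (zero), N (subtract), H (half-carry), C (carry).
FLAG_Z, FLAG_N, FLAG_H, FLAG_C = 0x80, 0x40, 0x20, 0x10


def alu_and(a: int, value: int) -> tuple[int, int]:
    """AND: H is always set, N and C are always cleared."""
    result = a & value & 0xFF
    flags = FLAG_H
    if result == 0:
        flags |= FLAG_Z
    return result, flags


def alu_adc(a: int, value: int, carry_in: int) -> tuple[int, int]:
    """ADC: A + value + carry, with half-carry out of bit 3 and carry out of bit 7."""
    total = a + value + carry_in
    result = total & 0xFF
    flags = 0
    if result == 0:
        flags |= FLAG_Z
    if (a & 0x0F) + (value & 0x0F) + carry_in > 0x0F:
        flags |= FLAG_H
    if total > 0xFF:
        flags |= FLAG_C
    return result, flags


def alu_sbc(a: int, value: int, carry_in: int) -> tuple[int, int]:
    """SBC: A - value - carry, N always set, H and C signal borrows."""
    diff = a - value - carry_in
    result = diff & 0xFF
    flags = FLAG_N
    if result == 0:
        flags |= FLAG_Z
    if (a & 0x0F) - (value & 0x0F) - carry_in < 0:
        flags |= FLAG_H
    if diff < 0:
        flags |= FLAG_C
    return result, flags


# The operand source is irrelevant to the helper: a register value and an
# immediate byte go through exactly the same code path.
register_b = 0x0F
immediate = 0x0F
same = alu_and(0xFF, register_b) == alu_and(0xFF, immediate)
```

Because the functions only see a value, a register handler and an immediate handler differ only in how they obtain that value before calling the helper.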
Implementation
Implemented 5 new immediate opcodes following the same pattern as the existing immediate opcodes (ADD A, d8 and SUB d8). Each method follows this structure:
- Reads the immediate operand using self.fetch_byte()
- Calls the corresponding generic helper (e.g. self._and(operand))
- Records the operation in the debug log
- Returns 2 M-Cycles (fetch opcode + fetch operand)
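The four-step structure above might look like this for AND d8. This is a sketch under assumptions: the names `fetch_byte`, `_and`, `_op_and_d8` and `_opcode_table` come from these notes, but the class `SketchCPU`, the attributes `a`, `f`, `pc`, `memory`, the `step` method and the logger name are hypothetical, and the real code in src/cpu/core.py may be organized differently.

```python
import logging

logger = logging.getLogger("cpu_sketch")  # hypothetical logger name


class SketchCPU:
    """Hypothetical, stripped-down CPU with a single immediate opcode."""

    FLAG_Z = 0x80
    FLAG_H = 0x20

    def __init__(self, memory: bytearray, pc: int = 0) -> None:
        self.memory = memory
        self.pc = pc
        self.a = 0  # accumulator
        self.f = 0  # flag register
        # Dispatch table: opcode byte -> bound handler method.
        self._opcode_table = {0xE6: self._op_and_d8}

    def fetch_byte(self) -> int:
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def _and(self, value: int) -> None:
        """Generic helper shared by register and immediate variants."""
        self.a &= value & 0xFF
        self.f = self.FLAG_H | (self.FLAG_Z if self.a == 0 else 0)

    def _op_and_d8(self) -> int:
        """AND d8 (0xE6): A = A & d8. Flags: Z if zero, N=0, H=1, C=0."""
        operand = self.fetch_byte()      # read the immediate operand
        self._and(operand)               # delegate to the generic helper
        logger.debug("AND 0x%02X -> A=0x%02X F=0x%02X", operand, self.a, self.f)
        return 2                         # M-Cycles: fetch opcode + fetch operand

    def step(self) -> int:
        """Fetch one opcode and run its handler; returns M-Cycles consumed."""
        opcode = self.fetch_byte()
        return self._opcode_table[opcode]()


cpu = SketchCPU(bytearray([0xE6, 0x0F]))
cpu.a = 0xFF
cycles = cpu.step()
```

The other four handlers would follow the same shape, swapping only the helper call and the log message.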
Components created/modified
- src/cpu/core.py: Added 5 new handler methods:
  - _op_adc_a_d8() - ADC A, d8 (0xCE)
  - _op_sbc_a_d8() - SBC A, d8 (0xDE)
  - _op_and_d8() - AND d8 (0xE6)
  - _op_xor_d8() - XOR d8 (0xEE)
  - _op_or_d8() - OR d8 (0xF6)
- src/cpu/core.py: Updated the dispatch table (_opcode_table) to include the 5 new opcodes.
- tests/test_cpu_alu_immediate.py: Created new file with complete test suite (5 tests) validating all immediate operations.
Design decisions
Helper reuse: The existing generic helpers (_adc, _sbc, _and, _xor, _or) are reused instead of duplicating logic. This follows the DRY principle and guarantees that flag behavior is identical between the register and immediate versions.
Consistency with existing opcodes: The new methods follow exactly the same pattern as _op_add_a_d8 and _op_sub_d8, keeping the code consistent and easing future maintenance.
Comprehensive documentation: Each method includes a detailed docstring explaining what the instruction does, when it is useful, which flags it updates and how many cycles it consumes. This is critical for an educational project where understanding is as important as functionality.
Affected Files
- src/cpu/core.py - Added 5 new handler methods and updated the dispatch table
- tests/test_cpu_alu_immediate.py - Created new file with complete test suite (5 tests)
Tests and Verification
Description of how the implementation was validated:
- Unit tests: pytest with 5 tests passing:
  - test_and_immediate: Checks AND d8 with a bitmask (0xFF AND 0x0F = 0x0F) and the hardware quirk where H is always 1.
  - test_xor_immediate: Checks XOR d8 with a zero result (0xFF XOR 0xFF = 0x00, Z=1).
  - test_adc_immediate: Checks ADC A, d8 with carry active (0x00 + 0x00 + 1 = 0x01).
  - test_or_immediate: Checks basic OR d8 (0x00 OR 0x55 = 0x55).
  - test_sbc_immediate: Checks SBC A, d8 with borrow active (0x00 - 0x00 - 1 = 0xFF).
- Real ROM (Tetris DX): Ran python3 main.py tetris_dx.gbc --debug:
  - The CPU successfully executed the initialization loop around 0x1383-0x1390, which uses combinations of DEC, LD and OR between registers.
  - Opcode 0xE6 (AND d8) now runs without problems at 0x12CA, masking the value read from memory with an immediate constant.
  - The emulator advances to PC=0x12CF after ~70,082 M-Cycles and stops at the unimplemented opcode 0x0E (LD C, d8), confirming that the next bottleneck is no longer the immediate ALU but an 8-bit immediate load into C.
- Logs: The methods include debug logging showing the operand, the result and the updated flags. Viboy's --debug mode records PC, opcode, registers and cycles, allowing you to follow the exact flow that leads to 0x12CF.
- Documentation: Implementation based on Pan Docs - Instruction Set.
Sources consulted
- Pan Docs: Instruction Set - Reference for immediate opcodes
Note: The implementation follows the same pattern as existing immediate opcodes (ADD A, d8, SUB d8, CP d8), guaranteeing consistency in the code.
Educational Integrity
What I Understand Now
- Immediate addressing: I understand that it is an addressing mode where the operand is embedded in the code, just after the opcode. This allows operating on constants without loading values into registers first.
- Logic reuse: I understand that the internal logic of the operations (flag calculation) is identical between the register and immediate versions. The only difference is where the operand comes from.
- Timing: I understand that all 8-bit immediate instructions consume 2 M-Cycles: one for fetching the opcode and another for fetching the operand.
- Completeness of the ALU set: With these 5 opcodes, we now have the complete set of immediate 8-bit ALU operations.
What remains to be confirmed
- Exact timing: Although I assume that all 8-bit immediate instructions consume 2 M-Cycles, I have not verified this exhaustively against detailed technical documentation. It should be confirmed with Pan Docs or with timing tests if it becomes necessary.
- Behavior in edge cases: The tests cover basic cases, but I have not exhaustively tested all edge cases (overflow, underflow, etc.). The generic helpers are already tested, so the behavior should be correct, but it is something to keep in mind.
Hypotheses and Assumptions
Main assumption: I assume that the timing (2 M-Cycles) is correct for all 8-bit immediate instructions, because ADD A, d8 and SUB d8 (which were already implemented) also use 2 M-Cycles. This assumption seems reasonable, but it has not been explicitly verified against detailed technical documentation.
Completeness assumption: I assume that with these 5 opcodes we now have the full set of immediate 8-bit ALU operations. However, I have not thoroughly verified whether other immediate operations are missing. This assumption is based on general knowledge of the LR35902 architecture.
Next Steps
- [x] Test Tetris DX to see if it now progresses past opcode 0xE6
- [x] If Tetris advances, identify the next unimplemented opcode causing failure (0x0E - LD C, d8 at PC=0x12CF).
- [ ] Implement opcode 0x0E (LD C, d8) reusing the 8-bit immediate load pattern.
- [ ] If Tetris tries to access hardware registers (0xFF40, 0xFF44, etc.), implement the basic PPU (Pixel Processing Unit) subsystem.
- [ ] If Tetris attempts to write to VRAM (0x8000-0x9FFF), implement VRAM mapping on the MMU.
- [ ] Continue to implement missing opcodes according to the needs of the game.