This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
Implementation of the CB Prefix (Extended Instructions) in C++
Summary
The complete CB prefix (256 extended instructions) was implemented in C++,
including rotations, shifts, BIT, RES and SET. This is the "jewel in the crown"
of the Game Boy's CPU, allowing native and extremely fast bit manipulation.
Added the method handle_cb(), which decodes the CB opcode
using efficient bitwise logic, and all operations were implemented according to
Pan Docs. All tests pass (11/11), validating correct behavior
of flags, timing and indirect memory access.
Hardware Concept
The Game Boy has more instructions than can fit in 1 byte (256 opcodes).
When the CPU reads the opcode 0xCB, it knows that the next byte must
be interpreted with a different table of instructions. The CB prefix gives
access to 256 additional instructions, organized in a very orderly way:
- 0x00-0x3F: Rotations and Shifts (RLC, RRC, RL, RR, SLA, SRA, SWAP, SRL)
  - RLC/RRC: Circular rotation (bit 7/0 leaves one end, re-enters at the other, and is also copied to C)
  - RL/RR: Rotation through Carry (bit 7/0 goes to C, old C goes in)
  - SLA: Shift Left Arithmetic (multiply by 2, bit 7 → C, bit 0 ← 0)
  - SRA: Shift Right Arithmetic (signed divide by 2, bit 7 preserved, bit 0 → C)
  - SRL: Shift Right Logical (unsigned divide by 2, bit 7 ← 0, bit 0 → C)
  - SWAP: Swap high and low nibbles (e.g. 0xF0 → 0x0F)
- 0x40-0x7F: BIT b, r (Test bit) - Tests whether a bit is set
  - Z = !bit (if the bit is off, Z=1)
  - H = 1 (always, hardware quirk)
  - N = 0 (always)
  - C = Preserved (does not change)
- 0x80-0xBF: RES b, r (Reset bit) - Turns off a specific bit
  - Does not affect flags (preserves all)
- 0xC0-0xFF: SET b, r (Set bit) - Turns on a specific bit
  - Does not affect flags (preserves all)
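The eight rotation/shift operations listed above can be sketched as one small function. This is a minimal illustration, not the project's code: the name cb_shift, the ShiftResult struct and the parameter layout are assumptions made here. The op argument follows the order of bits 3-5 (0=RLC, 1=RRC, 2=RL, 3=RR, 4=SLA, 5=SRA, 6=SWAP, 7=SRL). For all of them Z is set when the result is zero and N and H are cleared; SWAP also clears C.

```cpp
#include <cassert>
#include <cstdint>

// New 8-bit value plus the bit that goes into the C flag.
struct ShiftResult {
    uint8_t value;
    bool carry;
};

// Hypothetical helper: op is bits 3-5 of the CB opcode (0=RLC ... 7=SRL).
inline ShiftResult cb_shift(uint8_t op, uint8_t v, bool carry_in) {
    assert(op < 8);
    const bool bit7 = (v & 0x80) != 0;
    const bool bit0 = (v & 0x01) != 0;
    switch (op) {
        case 0:  // RLC: bit 7 wraps around to bit 0
            return {static_cast<uint8_t>((v << 1) | (v >> 7)), bit7};
        case 1:  // RRC: bit 0 wraps around to bit 7
            return {static_cast<uint8_t>((v >> 1) | (v << 7)), bit0};
        case 2:  // RL: old carry enters at bit 0
            return {static_cast<uint8_t>((v << 1) | (carry_in ? 0x01 : 0x00)), bit7};
        case 3:  // RR: old carry enters at bit 7
            return {static_cast<uint8_t>((v >> 1) | (carry_in ? 0x80 : 0x00)), bit0};
        case 4:  // SLA: bit 0 becomes 0
            return {static_cast<uint8_t>(v << 1), bit7};
        case 5:  // SRA: bit 7 (sign) is preserved
            return {static_cast<uint8_t>((v >> 1) | (v & 0x80)), bit0};
        case 6:  // SWAP: exchange nibbles, carry cleared
            return {static_cast<uint8_t>((v << 4) | (v >> 4)), false};
        default: // SRL: bit 7 becomes 0
            return {static_cast<uint8_t>(v >> 1), bit0};
    }
}
```

For example, cb_shift(2, 0x80, false) models RL on 0x80 with carry clear: the result is 0x00 and the carry-out is set.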
Mathematical Structure of the CB Opcode: The CB opcode is perfectly structured for efficient decoding:
- Bits 0-2: Register (0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A)
- Bits 3-5: Bit index (0-7) for BIT/SET/RES, or operation type for rotations
- Bits 6-7: Operation group (00=Rotations/Shifts, 01=BIT, 10=RES, 11=SET)
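As a quick illustration of this layout, the three fields can be pulled out with two shifts and three masks. The struct CbFields and the function decode_cb are hypothetical names used only for this sketch.

```cpp
#include <cstdint>

// The three fields packed into a CB opcode.
struct CbFields {
    uint8_t reg;    // bits 0-2: 0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A
    uint8_t index;  // bits 3-5: bit index, or rotation/shift type
    uint8_t group;  // bits 6-7: 0=rotations/shifts, 1=BIT, 2=RES, 3=SET
};

// Hypothetical helper: split a CB opcode into its fields.
inline CbFields decode_cb(uint8_t opcode) {
    return {static_cast<uint8_t>(opcode & 0x07),
            static_cast<uint8_t>((opcode >> 3) & 0x07),
            static_cast<uint8_t>((opcode >> 6) & 0x03)};
}
```

For instance, 0x7C is 01 111 100 in binary: group 1 (BIT), index 7, register 4 (H), i.e. BIT 7,H. Likewise 0xC6 decodes to SET 0,(HL) and 0x37 to SWAP A.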
Critical Difference with Fast Rotations: CB-prefixed rotations (e.g. RLC) compute the Z flag from the result, while the fast rotations (e.g. RLCA) always set Z=0. This difference is critical for game compatibility.
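The Z-flag difference is easiest to see side by side. The sketch below is an illustration under assumptions: rlc_cb, rlca and RotResult are names invented here, and the flag masks follow the usual layout of the F register (Z in bit 7, C in bit 4).

```cpp
#include <cstdint>

constexpr uint8_t FLAG_Z = 0x80;  // zero flag, bit 7 of F
constexpr uint8_t FLAG_C = 0x10;  // carry flag, bit 4 of F

struct RotResult {
    uint8_t value;
    uint8_t flags;
};

// CB-prefixed RLC r: Z reflects the result.
inline RotResult rlc_cb(uint8_t v) {
    const uint8_t r = static_cast<uint8_t>((v << 1) | (v >> 7));
    uint8_t f = 0;
    if (r == 0) f |= FLAG_Z;
    if (v & 0x80) f |= FLAG_C;
    return {r, f};
}

// Unprefixed RLCA: same rotation on A, but Z is always cleared.
inline RotResult rlca(uint8_t a) {
    const uint8_t r = static_cast<uint8_t>((a << 1) | (a >> 7));
    uint8_t f = 0;
    if (a & 0x80) f |= FLAG_C;
    return {r, f};
}
```

Rotating 0x00 gives 0x00 in both cases, but only rlc_cb reports Z set; code that branches on Z after the rotation behaves differently if the two are confused.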
C++ Optimization: C++'s native bitwise operations (&, |, ~, <<, >>)
typically compile to single host machine instructions, so decoding and
executing a CB opcode is cheap. On the emulated CPU, read-modify-write operations
on (HL) take 4 M-Cycles (read, modify, write), while register operations only
take 2 M-Cycles. BIT b,(HL) only reads memory and takes 3 M-Cycles on hardware.
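The timing rule can be written as a function of the opcode alone. This is a sketch, not the project's code; cb_m_cycles is a hypothetical name. It also encodes one refinement from the published opcode timings: BIT b,(HL) never writes back, so it costs 3 M-Cycles rather than the 4 of the read-modify-write forms.

```cpp
#include <cstdint>

// Hypothetical helper: M-Cycles for a CB-prefixed instruction,
// counting the 0xCB prefix fetch and the opcode fetch.
inline int cb_m_cycles(uint8_t cb_opcode) {
    const bool is_memory = (cb_opcode & 0x07) == 6;       // operand is (HL)
    if (!is_memory) return 2;                             // register operand
    const bool is_bit = ((cb_opcode >> 6) & 0x03) == 1;   // BIT group
    return is_bit ? 3 : 4;                                // read only vs read-modify-write
}
```

With this, SET 0,(HL) (0xC6) and RLC (HL) (0x06) report 4, BIT 0,(HL) (0x46) reports 3, and any register form such as SWAP A (0x37) reports 2.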
Implementation
The method handle_cb() was implemented in CPU.cpp; it decodes
the CB opcode using efficient bitwise logic. The method extracts the components of the opcode
(register, bit index, operation group) and executes the corresponding operation using
a nested switch for performance.
Components created/modified
- CPU.hpp: Added declaration of handle_cb() with complete documentation.
- CPU.cpp:
  - Implementation of handle_cb() with bitwise decoding
  - Nested switch for rotations/shifts (8 operations: RLC, RRC, RL, RR, SLA, SRA, SWAP, SRL)
  - Logic for BIT (test bits with correct flags)
  - Logic for RES and SET (reset and set bits without affecting flags)
  - Handling indirect memory access (HL) with correct timing
- CPU.cpp (step()): Added case 0xCB that calls handle_cb().
- tests/test_core_cpu_cb.py: Complete suite of 11 tests validating:
  - BIT with bits on/off and C preservation
  - RL with and without prior carry
  - SET/RES in indirect memory (HL)
  - SWAP with different values
  - RLC with the critical Z flag difference vs RLCA
Design decisions
- Bitwise Decoding: Bit extraction (&, >>) is used to decode the CB opcode instead of a giant 256-case switch. This reduces code size and may help branch prediction on the host processor.
- Nested Switch: An outer switch handles the operation group (bits 6-7) and an inner switch handles the rotation/shift type (bits 3-5). This gives the compiler two small, dense switches to optimize instead of a flat 256-case switch.
- Flag Preservation: RES and SET do not modify flags, while BIT always sets H=1 and N=0 but preserves C. This logic is implemented explicitly to ensure compatibility with real hardware.
- Precise Timing: 2 M-Cycles are returned for registers and 4 M-Cycles for read-modify-write operations on (HL), reflecting the actual cost of memory access (read, modify, write).
- Early Return in BIT: BIT does not modify the register/memory, so the method returns immediately after updating flags, avoiding an unnecessary write.
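The flag rules for BIT, RES and SET described above can be sketched as three small functions. The names bit_flags, res_bit and set_bit are assumptions for this illustration, as is passing the F register in and out by value; the flag masks follow the standard F layout (Z, N, H, C in bits 7 to 4).

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t FLAG_Z = 0x80;
constexpr uint8_t FLAG_N = 0x40;
constexpr uint8_t FLAG_H = 0x20;
constexpr uint8_t FLAG_C = 0x10;

// BIT b, r: returns the new F register. The operand is not modified,
// which is why the caller can return without writing anything back.
inline uint8_t bit_flags(uint8_t value, uint8_t bit, uint8_t old_flags) {
    assert(bit < 8);
    // Keep only C from the old flags, force H=1, leave N=0.
    uint8_t f = static_cast<uint8_t>((old_flags & FLAG_C) | FLAG_H);
    if ((value & (1u << bit)) == 0) f |= FLAG_Z;  // Z = !bit
    return f;
}

// RES b, r: clear one bit; flags are untouched by design.
inline uint8_t res_bit(uint8_t value, uint8_t bit) {
    assert(bit < 8);
    return static_cast<uint8_t>(value & ~(1u << bit));
}

// SET b, r: set one bit; flags are untouched by design.
inline uint8_t set_bit(uint8_t value, uint8_t bit) {
    assert(bit < 8);
    return static_cast<uint8_t>(value | (1u << bit));
}
```

Calling bit_flags(0x80, 7, FLAG_C) yields H and C set with Z clear, which matches the "bit on, carry preserved" case covered by the test suite.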
Key Code
int CPU::handle_cb() {
    uint8_t cb_opcode = fetch_byte();
    uint8_t reg_code = cb_opcode & 0x07;          // Bits 0-2: register
    uint8_t bit_index = (cb_opcode >> 3) & 0x07;  // Bits 3-5: bit index or operation type
    uint8_t op_group = (cb_opcode >> 6) & 0x03;   // Bits 6-7: operation group
    bool is_memory = (reg_code == 6);             // 6 selects (HL)
    uint8_t value = read_register_or_mem(reg_code);
    uint8_t result = value;
    // Switch according to operation group
    switch (op_group) {
        case 0x00:  // Rotations/Shifts
            // Internal switch for operation type (sets result and flags)
            break;
        case 0x01:  // BIT
            // Test bit, update flags, then return early: nothing is written back.
            // BIT b,(HL) only reads memory, hence 3 M-Cycles instead of 4.
            return is_memory ? 3 : 2;
        case 0x02:  // RES
            result = static_cast<uint8_t>(value & ~(1 << bit_index));
            break;
        case 0x03:  // SET
            result = static_cast<uint8_t>(value | (1 << bit_index));
            break;
    }
    write_register_or_mem(reg_code, result);
    return is_memory ? 4 : 2;
}
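The excerpt relies on read_register_or_mem() and write_register_or_mem(), whose bodies are not shown. One plausible shape for them is sketched below; the MiniCpu struct, its public register fields and the flat 64 KiB memory array are assumptions made for this illustration and do not reflect the project's actual CPU class or memory bus.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the CPU state touched by CB instructions.
struct MiniCpu {
    uint8_t b = 0, c = 0, d = 0, e = 0, h = 0, l = 0, a = 0;
    std::array<uint8_t, 0x10000> memory{};  // flat address space, assumption

    uint16_t hl() const { return static_cast<uint16_t>((h << 8) | l); }

    // reg_code uses the CB encoding: 0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A.
    uint8_t read_register_or_mem(uint8_t reg_code) const {
        switch (reg_code & 0x07) {
            case 0: return b;
            case 1: return c;
            case 2: return d;
            case 3: return e;
            case 4: return h;
            case 5: return l;
            case 6: return memory[hl()];  // indirect access through HL
            default: return a;
        }
    }

    void write_register_or_mem(uint8_t reg_code, uint8_t value) {
        switch (reg_code & 0x07) {
            case 0: b = value; break;
            case 1: c = value; break;
            case 2: d = value; break;
            case 3: e = value; break;
            case 4: h = value; break;
            case 5: l = value; break;
            case 6: memory[hl()] = value; break;
            default: a = value; break;
        }
    }
};
```

Routing every operand through these two functions is what lets the decoding logic treat (HL) as just another register code, leaving only the cycle count to differ.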
Affected Files
- src/core/cpp/CPU.hpp - Added declaration of handle_cb()
- src/core/cpp/CPU.cpp - Complete CB prefix implementation (200+ lines)
- tests/test_core_cpu_cb.py - Suite of 11 tests to validate CB operations
Tests and Verification
A complete test suite was created in test_core_cpu_cb.py, which validates:
- BIT: Tests with bits on/off, flag verification (Z is the inverse of the tested bit, H=1 always, C preserved)
- RL: Rotation through carry with and without a previous carry, flag verification
- SET/RES: Operations on indirect memory (HL), timing verification (4 M-Cycles)
- SWAP: Swapping nibbles with different values, checking the Z flag
- RLC: Critical Z flag difference vs fast rotations (RLCA)
Result: All tests pass (11/11) ✅
$ python -m pytest tests/test_core_cpu_cb.py -v
============================= test session starts =============================
tests/test_core_cpu_cb.py::TestCBBit::test_cb_bit_7_h_set PASSED
tests/test_core_cpu_cb.py::TestCBBit::test_cb_bit_7_h_clear PASSED
tests/test_core_cpu_cb.py::TestCBBit::test_cb_bit_preserves_carry PASSED
tests/test_core_cpu_cb.py::TestCBRot::test_cb_rl_c PASSED
tests/test_core_cpu_cb.py::TestCBRot::test_cb_rl_with_carry PASSED
tests/test_core_cpu_cb.py::TestCBHL::test_cb_set_3_hl PASSED
tests/test_core_cpu_cb.py::TestCBHL::test_cb_res_0_hl PASSED
tests/test_core_cpu_cb.py::TestCBSwap::test_cb_swap_a PASSED
tests/test_core_cpu_cb.py::TestCBSwap::test_cb_swap_zero_result PASSED
tests/test_core_cpu_cb.py::TestCBRLC::test_cb_rlc_z_flag PASSED
tests/test_core_cpu_cb.py::TestCBRLC::test_cb_rlc_nonzero_result PASSED
============================= 11 passed in 0.07s =============================
Sources consulted
- Pan Docs: CB Prefix Instructions
- Pan Docs: BIT b, r Instruction
- Pan Docs: RES b, r Instruction
- Pan Docs: SET b, r Instruction
- Pan Docs: Rotations and Shifts
Educational Integrity
What I Understand Now
- Mathematical Structure of the CB Opcode: The CB opcode is perfectly organized for efficient decoding using bitwise operations. Bits 6-7 determine the operation group, bits 3-5 determine the operation type or bit index, and bits 0-2 determine the target register.
- Critical Flag Difference: CB-prefixed rotations (RLC, RRC, RL, RR) compute the Z flag from the result, while the fast rotations (RLCA, RRCA, RLA, RRA) always set Z=0. This difference is critical for game compatibility.
- Hardware Quirks: BIT always sets H=1, regardless of the value of the bit. RES and SET do not affect flags, preserving all previous flags. These peculiarities are implemented explicitly to ensure compatibility.
- Precise Timing: Read-modify-write operations on (HL) require 4 M-Cycles because they involve reading, modifying and writing, while registers only require 2 M-Cycles.
- C++ Optimization: Native C++ bitwise operations typically compile to single machine instructions on the host. The nested switch gives the compiler two small, dense switches to optimize instead of a flat 256-case switch.
What remains to be confirmed
- Behavior with Real Games: Although the tests pass, it would be valuable to run permitted test ROMs to verify that CB operations work correctly in real game contexts.
- Edge Cases: Verify behavior with boundary values (0x00, 0xFF) and with all bits on/off in different combinations.
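Because the operand space is only 256 values by 8 bit positions, the edge cases can be swept exhaustively rather than sampled. The sketch below checks a few invariants on standalone helper functions; swap_nibbles, set_bit, res_bit and cb_invariants_hold are hypothetical names for this illustration, and the check exercises the bit arithmetic only, not the project's CPU class or its flags.

```cpp
#include <cstdint>

inline uint8_t swap_nibbles(uint8_t v) {
    return static_cast<uint8_t>((v << 4) | (v >> 4));
}
inline uint8_t set_bit(uint8_t v, uint8_t bit) {
    return static_cast<uint8_t>(v | (1u << bit));
}
inline uint8_t res_bit(uint8_t v, uint8_t bit) {
    return static_cast<uint8_t>(v & ~(1u << bit));
}

// Returns true if the invariants hold for every value and bit index.
inline bool cb_invariants_hold() {
    for (int v = 0; v <= 0xFF; ++v) {
        const uint8_t x = static_cast<uint8_t>(v);
        // SWAP applied twice must restore the original value.
        if (swap_nibbles(swap_nibbles(x)) != x) return false;
        for (uint8_t bit = 0; bit < 8; ++bit) {
            const unsigned mask = 1u << bit;
            // SET turns the target bit on, RES turns it off.
            if ((set_bit(x, bit) & mask) == 0) return false;
            if ((res_bit(x, bit) & mask) != 0) return false;
            // Neither operation may disturb the other seven bits.
            if ((set_bit(x, bit) & ~mask) != (x & ~mask)) return false;
            if ((res_bit(x, bit) & ~mask) != (x & ~mask)) return false;
        }
    }
    return true;
}
```

The same loop structure could drive the real CPU through its test bindings, with 0x00 and 0xFF covered automatically as the first and last iterations.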
Hypotheses and Assumptions
The implementation is strictly based on Pan Docs. No additional assumptions have been made, and all design decisions are supported by official technical documentation.
Next Steps
- [ ] Implement remaining CPU instructions (STOP, DAA, etc.) if necessary
- [ ] Validate with allowed test ROMs to verify complete compatibility
- [ ] Optimize the main emulation loop for maximum performance
- [ ] Start implementation of APU (Audio Processing Unit) - Phase 2 main objective