⚠️ Clean-Room / Educational

This project is educational and open source. No code is copied from other emulators; the implementation is based solely on technical documentation and permitted tests.

Implementation of the CB Prefix (Extended Instructions) in C++

Date: 2025-12-19 StepID: 0110 State: Filled

Summary

The complete CB prefix (256 extended instructions) was implemented in C++, including rotations, shifts, BIT, RES and SET. This is the "jewel in the crown" of the Game Boy's CPU, and the C++ version makes this bit manipulation native and fast. A handle_cb() method was added that decodes the CB opcode using bitwise logic, and all operations were implemented according to Pan Docs. All tests pass (11/11), validating the behavior of flags, timing and indirect memory access.

Hardware Concept

The Game Boy has more instructions than fit in a single opcode byte (256 values). When the CPU reads opcode 0xCB, it knows that the next byte must be interpreted using a different instruction table. The CB prefix gives access to 256 additional instructions, organized in a very regular way:

  • 0x00-0x3F: Rotations and Shifts (RLC, RRC, RL, RR, SLA, SRA, SWAP, SRL)
    • RLC/RRC: Circular rotation (the bit shifted out, bit 7 or bit 0, goes to C and re-enters at the other end)
    • RL/RR: Rotation through Carry (bit 7/0 goes to C, the old C comes in)
    • SLA: Shift Left Arithmetic (multiply by 2, bit 7 → C, bit 0 ← 0)
    • SRA: Shift Right Arithmetic (signed divide by 2, bit 7 preserved, bit 0 → C)
    • SRL: Shift Right Logical (unsigned divide by 2, bit 7 ← 0, bit 0 → C)
    • SWAP: Swap high and low nibbles (0xF0 → 0x0F)
    • Flags for the whole group: Z from the result, N = 0, H = 0, C = bit shifted out (SWAP always clears C)
  • 0x40-0x7F: BIT b, r (Test bit) - Tests whether a bit is set
    • Z = !bit (if the bit is off, Z=1)
    • H = 1 (always, hardware quirk)
    • N = 0 (always)
    • C = Preserved (does not change)
  • 0x80-0xBF: RES b, r (Reset bit) - Clears a specific bit
    • Does not affect flags (preserves all)
  • 0xC0-0xFF: SET b, r (Set bit) - Sets a specific bit
    • Does not affect flags (preserves all)
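The eight rotation/shift operations share a single flag rule, which makes them easy to express as small pure functions. The sketch below is a minimal illustration, assuming a hypothetical `Flags` struct and hypothetical free functions (`rlc`, `rrc`, `rl`, `rr`, `sla`, `sra`, `swap_nibbles`, `srl`); it is not the project's actual CPU API.

```cpp
#include <cstdint>

// Hypothetical flag container for this sketch (not the project's real type).
struct Flags {
    bool z = false, n = false, h = false, c = false;
};

// Shared flag rule for every 0x00-0x3F operation:
// Z from the result, N = 0, H = 0, C = the bit shifted out.
static uint8_t finish(uint8_t result, bool carry, Flags& f) {
    f.z = (result == 0);
    f.n = false;
    f.h = false;
    f.c = carry;
    return result;
}

uint8_t rlc(uint8_t v, Flags& f) {  // bit 7 -> C and -> bit 0
    return finish(static_cast<uint8_t>((v << 1) | (v >> 7)), (v & 0x80) != 0, f);
}
uint8_t rrc(uint8_t v, Flags& f) {  // bit 0 -> C and -> bit 7
    return finish(static_cast<uint8_t>((v >> 1) | (v << 7)), (v & 0x01) != 0, f);
}
uint8_t rl(uint8_t v, Flags& f) {   // bit 7 -> C, old C -> bit 0
    const uint8_t old_c = f.c ? 0x01 : 0x00;  // read before finish() overwrites C
    return finish(static_cast<uint8_t>((v << 1) | old_c), (v & 0x80) != 0, f);
}
uint8_t rr(uint8_t v, Flags& f) {   // bit 0 -> C, old C -> bit 7
    const uint8_t old_c = f.c ? 0x80 : 0x00;
    return finish(static_cast<uint8_t>((v >> 1) | old_c), (v & 0x01) != 0, f);
}
uint8_t sla(uint8_t v, Flags& f) {  // bit 7 -> C, bit 0 <- 0
    return finish(static_cast<uint8_t>(v << 1), (v & 0x80) != 0, f);
}
uint8_t sra(uint8_t v, Flags& f) {  // bit 0 -> C, bit 7 keeps its value
    return finish(static_cast<uint8_t>((v >> 1) | (v & 0x80)), (v & 0x01) != 0, f);
}
uint8_t swap_nibbles(uint8_t v, Flags& f) {  // C is always cleared
    return finish(static_cast<uint8_t>((v << 4) | (v >> 4)), false, f);
}
uint8_t srl(uint8_t v, Flags& f) {  // bit 0 -> C, bit 7 <- 0
    return finish(static_cast<uint8_t>(v >> 1), (v & 0x01) != 0, f);
}
```

Each function returns the new value and updates the flags in place, so for example `swap_nibbles(0xF0, f)` returns 0x0F with C cleared, and `sra(0x81, f)` returns 0xC0 with C set.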

Mathematical Structure of the CB Opcode: The CB opcode is perfectly structured for efficient decoding:

  • Bits 0-2: Register (0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A)
  • Bits 3-5: Bit index (0-7) for BIT/SET/RES, or operation type for rotations
  • Bits 6-7: Operation group (00=Rotations/Shifts, 01=BIT, 10=RES, 11=SET)
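To make the field layout concrete, here is a small decoder that splits a CB opcode into its three fields and builds a readable mnemonic. `CbFields`, `decode_cb()` and `cb_mnemonic()` are hypothetical names chosen for illustration, not part of the project.

```cpp
#include <cstdint>
#include <string>

// Fields of a CB-prefixed opcode (hypothetical helper type).
struct CbFields {
    uint8_t reg;    // bits 0-2: 0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A
    uint8_t index;  // bits 3-5: bit index, or rotation/shift type in group 0
    uint8_t group;  // bits 6-7: 0=rot/shift, 1=BIT, 2=RES, 3=SET
};

constexpr CbFields decode_cb(uint8_t op) {
    return CbFields{static_cast<uint8_t>(op & 0x07),
                    static_cast<uint8_t>((op >> 3) & 0x07),
                    static_cast<uint8_t>((op >> 6) & 0x03)};
}

// Builds a mnemonic such as "BIT 7,H" or "SWAP A".
std::string cb_mnemonic(uint8_t op) {
    static const char* const regs[] = {"B", "C", "D", "E", "H", "L", "(HL)", "A"};
    static const char* const rot[] = {"RLC", "RRC", "RL", "RR", "SLA", "SRA", "SWAP", "SRL"};
    static const char* const grp[] = {"", "BIT", "RES", "SET"};
    const CbFields f = decode_cb(op);
    if (f.group == 0) {
        // Group 0: bits 3-5 select the operation, not a bit index
        return std::string(rot[f.index]) + " " + regs[f.reg];
    }
    return std::string(grp[f.group]) + " " + std::to_string(static_cast<int>(f.index)) +
           "," + regs[f.reg];
}
```

For example, 0x7C (0b01'111'100) decodes to group 1, index 7, register 4, which is `BIT 7,H`, and 0x37 decodes to `SWAP A`.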

Critical Difference from the Fast Rotations: CB-prefix rotations (e.g. RLC) set the Z flag according to the result, while the fast accumulator rotations (e.g. RLCA) always clear Z (Z=0). This difference is critical for game compatibility.
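The difference is easiest to see side by side. In this sketch, `RotResult`, `rlc_a_cb()` and `rlca()` are hypothetical names; N and H are left out because both instructions clear them. The two functions perform the same rotation, and only the Z rule differs.

```cpp
#include <cstdint>

// Hypothetical result type: rotated value plus the two flags that matter here.
struct RotResult {
    uint8_t value;
    bool z;
    bool c;
};

// CB 0x07 (RLC A): Z reflects the result.
RotResult rlc_a_cb(uint8_t a) {
    const uint8_t r = static_cast<uint8_t>((a << 1) | (a >> 7));
    return {r, r == 0, (a & 0x80) != 0};
}

// Unprefixed 0x07 (RLCA): same rotation, but Z is always cleared.
RotResult rlca(uint8_t a) {
    const uint8_t r = static_cast<uint8_t>((a << 1) | (a >> 7));
    return {r, false, (a & 0x80) != 0};
}
```

With A = 0x00 both produce 0x00, but only the CB form reports Z=1; an emulator that reused one routine for both would make that difference disappear.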

C++ Optimization: C++'s native bitwise operators (&, |, ~, <<, >>) typically compile to single host machine instructions, which keeps this hot path cheap. On the Game Boy side, register forms take 2 M-Cycles, while read-modify-write forms on (HL) take 4 M-Cycles because of the extra memory read and write; BIT b,(HL) only reads memory and takes 3.
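The cycle counts can be captured in a small helper. This is a sketch with a hypothetical `cb_m_cycles()` function; it assumes the usual per-M-Cycle breakdown (fetch 0xCB, fetch the opcode, then read and write (HL) for memory forms) and follows Pan Docs in charging BIT b,(HL) 3 M-Cycles because it never writes back.

```cpp
#include <cstdint>

// M-Cycles for a CB-prefixed instruction, including the 0xCB prefix fetch.
//   Register forms:          fetch 0xCB + fetch opcode           = 2
//   BIT b,(HL):              + read (HL)                         = 3
//   RLC/RES/SET/... on (HL): + read (HL) + write (HL)            = 4
constexpr int cb_m_cycles(uint8_t op) {
    return ((op & 0x07) != 6)          ? 2   // bits 0-2 != 6: register operand
         : (((op >> 6) & 0x03) == 1)   ? 3   // group 01: BIT, read-only
         : 4;                                // read-modify-write on (HL)
}

static_assert(cb_m_cycles(0x46) == 3, "BIT 0,(HL) has no write-back");
```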

Implementation

The handle_cb() method was implemented in CPU.cpp; it decodes the CB opcode using bitwise logic. The method extracts the opcode's components (register, bit index, operation group) and executes the corresponding operation through a nested switch.

Components created/modified

  • CPU.hpp: Added the declaration of handle_cb() with complete documentation.
  • CPU.cpp:
    • Implementation of handle_cb() with bitwise decoding
    • Nested switch for rotations/shifts (8 operations: RLC, RRC, RL, RR, SLA, SRA, SWAP, SRL)
    • Logic for BIT (tests bits with the correct flags)
    • Logic for RES and SET (reset and set bits without affecting flags)
    • Handling of indirect memory access (HL) with correct timing
  • CPU.cpp (step()): Added case 0xCB, which calls handle_cb().
  • tests/test_core_cpu_cb.py: Complete suite of 11 tests validating:
    • BIT with bits on/off and C preservation
    • RL with and without prior carry
    • SET/RES in indirect memory (HL)
    • SWAP with different values
    • RLC and its Z-flag difference vs RLCA
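The 0xCB case added to step() can be illustrated with a stripped-down fetch loop. `TinyCpu`, `Decoded` and their members are hypothetical, and execution is omitted: the sketch only shows how the prefix switches the table used for the following byte and advances PC by 2.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Which table a fetched byte was decoded with (hypothetical helper type).
struct Decoded {
    bool cb_table;   // true if looked up in the CB table
    uint8_t opcode;  // index into whichever table was used
};

// Hypothetical stripped-down CPU; the real step()/handle_cb() live in CPU.cpp.
class TinyCpu {
public:
    explicit TinyCpu(std::vector<uint8_t> program) : mem_(std::move(program)) {}
    uint16_t pc() const { return pc_; }

    Decoded step() {
        const uint8_t opcode = fetch_byte();
        switch (opcode) {
            case 0xCB:
                return handle_cb();             // prefix: next byte uses the CB table
            default:
                return Decoded{false, opcode};  // base table (execution omitted)
        }
    }

private:
    std::vector<uint8_t> mem_;
    uint16_t pc_ = 0;

    uint8_t fetch_byte() { return mem_[pc_++]; }

    Decoded handle_cb() {
        const uint8_t cb_opcode = fetch_byte();  // second byte: PC has advanced by 2 in total
        return Decoded{true, cb_opcode};         // CB execution omitted in this sketch
    }
};
```

Feeding it the bytes 0x07, 0xCB, 0x07 shows why the prefix matters: the same byte 0x07 is RLCA in the base table but RLC A in the CB table.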

Design decisions

  • Bitwise Decoding: The CB opcode is decoded by bit extraction (&, >>) instead of a giant 256-case switch. This keeps the code compact and may help the host processor's branch prediction.
  • Nested Switch: An outer switch handles the operation group (bits 6-7) and an inner switch handles the rotation/shift type (bits 3-5). This structure can give the compiler more room to optimize than a flat 256-case switch.
  • Flag Preservation: RES and SET do not modify flags, while BIT always sets H=1 and N=0 but preserves C. This logic is implemented explicitly to ensure compatibility with real hardware.
  • Precise Timing: 2 M-Cycles are returned for registers and 4 M-Cycles for (HL), reflecting the extra memory read and write. Per Pan Docs, BIT b,(HL) is the exception at 3 M-Cycles, since it only reads memory.
  • Early Return in BIT: BIT does not modify the register or memory, so the method returns immediately after updating the flags, skipping the write-back.
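The design decisions above (nested switch, explicit flag handling, early return in BIT) can be sketched on a toy CPU. `MiniCpu` and `execute_cb()` are hypothetical: a register array indexed by the 3-bit register code, a flat 64 KiB memory and individual flag booleans stand in for the project's real CPU, MMU and register classes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical toy CPU for illustration only.
struct MiniCpu {
    std::array<uint8_t, 8> r{};          // indexed by 3-bit reg code (slot 6 unused)
    std::array<uint8_t, 0x10000> mem{};  // flat 64 KiB address space
    uint16_t hl = 0;
    bool z = false, n = false, h = false, c = false;

    uint8_t read(uint8_t reg) const { return reg == 6 ? mem[hl] : r[reg]; }
    void write(uint8_t reg, uint8_t v) {
        if (reg == 6) mem[hl] = v; else r[reg] = v;
    }

    // Executes the byte that follows 0xCB and returns its M-Cycles.
    int execute_cb(uint8_t op) {
        const uint8_t reg = op & 0x07, idx = (op >> 3) & 0x07, group = op >> 6;
        const bool is_mem = (reg == 6);
        const uint8_t v = read(reg);
        uint8_t res = v;

        switch (group) {                 // outer switch: operation group
            case 0: {                    // rotations/shifts: inner switch on bits 3-5
                bool carry = false;
                switch (idx) {
                    case 0: res = (v << 1) | (v >> 7);       carry = v & 0x80; break;  // RLC
                    case 1: res = (v >> 1) | (v << 7);       carry = v & 0x01; break;  // RRC
                    case 2: res = (v << 1) | (c ? 0x01 : 0); carry = v & 0x80; break;  // RL
                    case 3: res = (v >> 1) | (c ? 0x80 : 0); carry = v & 0x01; break;  // RR
                    case 4: res = v << 1;                    carry = v & 0x80; break;  // SLA
                    case 5: res = (v >> 1) | (v & 0x80);     carry = v & 0x01; break;  // SRA
                    case 6: res = (v << 4) | (v >> 4);       carry = false;    break;  // SWAP
                    default: res = v >> 1;                   carry = v & 0x01; break;  // SRL
                }
                z = (res == 0); n = false; h = false; c = carry;
                break;
            }
            case 1:  // BIT: Z = !bit, N = 0, H = 1, C preserved; no write-back
                z = ((v >> idx) & 0x01) == 0; n = false; h = true;
                return is_mem ? 3 : 2;
            case 2: res = v & ~(1 << idx); break;   // RES: flags untouched
            default: res = v | (1 << idx); break;   // SET (group 3): flags untouched
        }

        write(reg, res);
        return is_mem ? 4 : 2;
    }
};
```

Because flags are only assigned inside group 0 and in BIT, RES and SET leave them untouched by construction, and BIT's early return skips both the write-back and the 4-cycle path.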

Key Code

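int CPU::handle_cb() {
    uint8_t cb_opcode = fetch_byte();
    uint8_t reg_code  = cb_opcode & 0x07;         // Bits 0-2: register (6 = (HL))
    uint8_t bit_index = (cb_opcode >> 3) & 0x07;  // Bits 3-5: bit index or rotation/shift type
    uint8_t op_group  = (cb_opcode >> 6) & 0x03;  // Bits 6-7: operation group

    bool is_memory = (reg_code == 6);
    uint8_t value  = read_register_or_mem(reg_code);
    uint8_t result = value;  // declared before the switch so every branch can assign it

    // Outer switch: operation group
    switch (op_group) {
        case 0x00: // Rotations/Shifts
            // Inner switch on bit_index (RLC, RRC, RL, RR, SLA, SRA, SWAP, SRL)
            // computes result and sets Z/N/H/C
            break;
        case 0x01: { // BIT: Z = !bit, N = 0, H = 1, C preserved
            bool bit_is_set = ((value >> bit_index) & 0x01) != 0;
            // Update Z from bit_is_set, clear N, set H (flag helpers not shown)
            return is_memory ? 3 : 2;  // no write-back; BIT b,(HL) is 3 M-Cycles
        }
        case 0x02: // RES: clear the bit, flags untouched
            result = value & static_cast<uint8_t>(~(1u << bit_index));
            break;
        case 0x03: // SET: set the bit, flags untouched
            result = value | static_cast<uint8_t>(1u << bit_index);
            break;
    }

    write_register_or_mem(reg_code, result);
    return is_memory ? 4 : 2;  // read-modify-write on (HL) costs 4 M-Cycles
}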

Affected Files

  • src/core/cpp/CPU.hpp - Added declaration of handle_cb()
  • src/core/cpp/CPU.cpp - Complete CB prefix implementation (200+ lines)
  • tests/test_core_cpu_cb.py - Suite of 11 tests to validate CB operations

Tests and Verification

A complete test suite was created in test_core_cpu_cb.py, which validates:

  • BIT: Tests with bits on/off, flag verification (Z inverted, H=1 always, C preserved)
  • RL: Rotation through carry with and without previous carry, flag verification
  • SET/RES: Operations on indirect memory (HL), timing verification (4 M-Cycles)
  • SWAP: Swapping nibbles with different values, checking the Z flag
  • RLC: Z-flag difference vs the fast rotation (RLCA)

Result: All tests pass (11/11) ✅

$ python -m pytest tests/test_core_cpu_cb.py -v
============================= test session starts =============================
tests/test_core_cpu_cb.py::TestCBBit::test_cb_bit_7_h_set PASSED
tests/test_core_cpu_cb.py::TestCBBit::test_cb_bit_7_h_clear PASSED
tests/test_core_cpu_cb.py::TestCBBit::test_cb_bit_preserves_carry PASSED
tests/test_core_cpu_cb.py::TestCBRot::test_cb_rl_c PASSED
tests/test_core_cpu_cb.py::TestCBRot::test_cb_rl_with_carry PASSED
tests/test_core_cpu_cb.py::TestCBHL::test_cb_set_3_hl PASSED
tests/test_core_cpu_cb.py::TestCBHL::test_cb_res_0_hl PASSED
tests/test_core_cpu_cb.py::TestCBSwap::test_cb_swap_a PASSED
tests/test_core_cpu_cb.py::TestCBSwap::test_cb_swap_zero_result PASSED
tests/test_core_cpu_cb.py::TestCBRLC::test_cb_rlc_z_flag PASSED
tests/test_core_cpu_cb.py::TestCBRLC::test_cb_rlc_nonzero_result PASSED
============================= 11 passed in 0.07s =============================

Sources consulted

Educational Integrity

What I Understand Now

  • Mathematical Structure of the CB Opcode: The CB opcode is organized for efficient decoding with bitwise operations. Bits 6-7 select the operation group, bits 3-5 select the operation type or bit index, and bits 0-2 select the target register.
  • Critical Flag Difference: CB-prefix rotations (RLC, RRC, RL, RR) set the Z flag according to the result, while the fast rotations (RLCA, RRCA, RLA, RRA) always clear Z. This difference is critical for game compatibility.
  • Hardware Quirks: BIT always sets H=1, regardless of the bit's value. RES and SET do not affect flags, preserving all previous flags. These peculiarities are implemented explicitly to ensure compatibility.
  • Precise Timing: Read-modify-write operations on (HL) take 4 M-Cycles because they read and then write memory, while register forms take 2 M-Cycles. BIT b,(HL) takes 3, since it only reads.
  • C++ Optimization: Native C++ bitwise operators typically compile to single host instructions, and the nested switch may let the compiler optimize better than a flat 256-case switch.

What remains to be confirmed

  • Behavior with Real Games: Although the tests pass, it would be valuable to run permitted test ROMs to verify that CB operations behave correctly in real game contexts.
  • Edge Cases: Verify behavior with boundary values (0x00, 0xFF) and with different combinations of bits on and off.
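One way to cover these edge cases exhaustively is to loop over all 256 input values and all 8 bit positions and check invariants that must hold for the CB semantics described above. The helpers here (`rlc8`, `swap8`, `set8`, `res8`, `bit_z`, `count_cb_invariant_violations`) are hypothetical stand-ins; to test the emulator itself, they would be replaced by calls into the CPU core.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the CB operations (value only, no flags except BIT's Z).
inline uint8_t rlc8(uint8_t v) { return static_cast<uint8_t>((v << 1) | (v >> 7)); }
inline uint8_t swap8(uint8_t v) { return static_cast<uint8_t>((v << 4) | (v >> 4)); }
inline uint8_t set8(uint8_t v, int b) { return static_cast<uint8_t>(v | (1u << b)); }
inline uint8_t res8(uint8_t v, int b) { return static_cast<uint8_t>(v & ~(1u << b)); }
inline bool bit_z(uint8_t v, int b) { return ((v >> b) & 1u) == 0; }  // Z after BIT b

// Checks invariants over every value and bit position; returns the number of
// violations (0 means every case passed).
int count_cb_invariant_violations() {
    int failures = 0;
    for (int x = 0; x <= 0xFF; ++x) {
        const uint8_t v = static_cast<uint8_t>(x);

        uint8_t r = v;
        for (int i = 0; i < 8; ++i) r = rlc8(r);
        if (r != v) ++failures;                      // 8 x RLC is the identity
        if (swap8(swap8(v)) != v) ++failures;        // SWAP twice is the identity

        for (int b = 0; b < 8; ++b) {
            if (bit_z(set8(v, b), b)) ++failures;    // after SET b, BIT b gives Z=0
            if (!bit_z(res8(v, b), b)) ++failures;   // after RES b, BIT b gives Z=1
            // SET b must leave every other bit unchanged
            if ((set8(v, b) & ~(1u << b)) != (v & ~(1u << b))) ++failures;
        }
    }
    return failures;
}
```

The loop includes 0x00 and 0xFF by construction, so the boundary values are covered without special-casing them.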

Hypotheses and Assumptions

The implementation is based strictly on Pan Docs. No additional assumptions have been made, and all design decisions are supported by that technical documentation.

Next Steps

  • [ ] Implement remaining CPU instructions (STOP, DAA, etc.) if necessary
  • [ ] Validate with allowed test ROMs to verify complete compatibility
  • [ ] Optimize the main emulation loop for maximum performance
  • [ ] Start implementation of APU (Audio Processing Unit) - Phase 2 main objective