⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.

Implementation of ALU and Flags in C++

Date: 2025-12-19 StepID: 0105 State: Filled

Summary

The ALU (Arithmetic Logic Unit) and flag management were implemented in C++, adding basic arithmetic (ADD, SUB) and logical operations (AND, XOR) to the native core. 5 new opcodes were implemented: INC A, DEC A, ADD A, d8, SUB d8 and XOR A. All tests pass, validating accurate management of the flags (Z, N, H, C) and the efficient half-carry calculation in C++.

Hardware Concept

The ALU (Arithmetic Logic Unit) is the component of the CPU that performs mathematical and logical operations. On the Game Boy (LR35902), the ALU operates mainly on register A (Accumulator) and updates 4 critical flags:

  • Z (Zero): Activated when the result of an operation is 0.
  • N (Subtract): Activated in subtraction operations, deactivated in addition.
  • H (Half-Carry): Activated when there is low nibble overflow (bit 3 → 4) in addition, or borrow in subtraction.
  • C (Carry): Activated when there is a complete overflow of 8 bits (overflow/underflow).
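
As a rough illustration of where these four flags live, the sketch below models the F register: the flags occupy the upper nibble (bits 7 to 4) and the lower nibble always reads as zero. The names FlagsSketch, set and get are hypothetical and are not the project's CoreRegisters API.

```cpp
#include <cstdint>

// Bit positions of the four flags inside the F register (upper nibble only).
constexpr std::uint8_t FLAG_Z = 0x80;  // bit 7: Zero
constexpr std::uint8_t FLAG_N = 0x40;  // bit 6: Subtract
constexpr std::uint8_t FLAG_H = 0x20;  // bit 5: Half-Carry
constexpr std::uint8_t FLAG_C = 0x10;  // bit 4: Carry

// Hypothetical stand-in for the flag part of CoreRegisters.
struct FlagsSketch {
    std::uint8_t f = 0;

    // Set or clear one flag; the 0xF0 mask keeps the unused low nibble at zero.
    void set(std::uint8_t flag, bool on) {
        f = static_cast<std::uint8_t>((on ? (f | flag) : (f & ~flag)) & 0xF0);
    }

    // Report whether the given flag bit is currently set.
    bool get(std::uint8_t flag) const { return (f & flag) != 0; }
};
```

Keeping the mask inside the setter means no caller can leave stray bits in the low nibble of F.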

The Half-Carry calculation is critical for the DAA (Decimal Adjust Accumulator) instruction, which adjusts binary results to BCD (Binary Coded Decimal). In C++, these bitwise operations compile to very few machine instructions, so the flag updates add little overhead.

C++ Optimization: The half-carry formula ((a & 0xF) + (b & 0xF)) > 0xF compiles directly to register operations, eliminating the overhead of Python objects and function calls.
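
To make the formula concrete, here is a small sketch that wraps it in a constexpr function and checks a few values at compile time. The name half_carry_add is an assumption for illustration; the project computes the same expression inside its ALU helpers.

```cpp
#include <cstdint>

// Half-carry for 8-bit addition: did the sum of the low nibbles overflow past 0xF?
constexpr bool half_carry_add(std::uint8_t a, std::uint8_t b) {
    return ((a & 0x0F) + (b & 0x0F)) > 0x0F;
}

// Evaluated by the compiler, so these checks cost nothing at run time.
static_assert(half_carry_add(0x0F, 0x01), "0x0F + 0x01 carries out of bit 3");
static_assert(!half_carry_add(0x10, 0x01), "0x10 + 0x01 stays inside the low nibble");
static_assert(half_carry_add(0x08, 0x08), "0x08 + 0x08 carries out of bit 3");
```

Note that 0xF0 + 0x10 overflows 8 bits but does not set half-carry, because both low nibbles are zero; H and C are computed independently.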

Implementation

Added 4 private inline methods to the CPU class for ALU operations: alu_add(), alu_sub(), alu_and() and alu_xor(). Each method updates the A register and the flags together in a single call, using the inline methods of CoreRegisters for maximum performance.

Components created/modified

  • CPU.hpp: Added inline ALU method declarations.
  • CPU.cpp: Implementation of ALU methods and 5 new opcodes (0x3C, 0x3D, 0xC6, 0xD6, 0xAF).
  • tests/test_core_cpu_alu.py: Complete suite of 7 tests to validate native ALU.

Design decisions

  • Inline methods: ALU helpers are private inline methods so that the compiler can embed them directly into the opcode switch, eliminating function call costs.
  • Reusing CoreRegisters: The existing set_flag_*() methods of CoreRegisters are used; they are inline and automatically apply the F-register mask.
  • Half-Carry Calculation: The original value of A is saved before the operation so that half-carry/half-borrow can be calculated correctly.
  • Opcodes implemented:
    • 0x3C: INC A (Increment A) - 1 M-Cycle
    • 0x3D: DEC A (Decrement A) - 1 M-Cycle
    • 0xC6: ADD A, d8 (Add immediate) - 2 M-Cycles
    • 0xD6: SUB d8 (Subtract immediate) - 2 M-Cycles
    • 0xAF: XOR A (XOR A with A, optimization for A=0) - 1 M-Cycle
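
The relationship between opcode fetch, immediate operands and M-cycle counts can be sketched as follows. MiniCpu, fetch and step are hypothetical names for a stripped-down model covering only two of the opcodes above; it is not the project's CPU class, and the flag bits are written directly instead of going through CoreRegisters.

```cpp
#include <array>
#include <cstdint>

// Hypothetical miniature CPU used only for this sketch.
struct MiniCpu {
    std::uint8_t a = 0;
    std::uint8_t f = 0;                   // Z=0x80, N=0x40, H=0x20, C=0x10
    std::uint16_t pc = 0;
    std::array<std::uint8_t, 16> mem{};   // tiny program memory

    // Read the byte at PC and advance PC (wrapping inside the tiny memory).
    std::uint8_t fetch() { return mem[pc++ % mem.size()]; }

    // Executes one instruction and returns its cost in M-cycles (0 = not handled).
    int step() {
        const std::uint8_t opcode = fetch();
        switch (opcode) {
            case 0xAF:  // XOR A: A ^ A is always 0, so only Z ends up set
                a = 0;
                f = 0x80;
                return 1;
            case 0xC6: {  // ADD A, d8: fetching the immediate is the second M-cycle
                const std::uint8_t value = fetch();
                const unsigned sum = a + value;
                const bool half = ((a & 0x0F) + (value & 0x0F)) > 0x0F;
                a = static_cast<std::uint8_t>(sum);
                f = static_cast<std::uint8_t>((a == 0 ? 0x80 : 0) |
                                              (half ? 0x20 : 0) |
                                              (sum > 0xFF ? 0x10 : 0));
                return 2;
            }
            default:
                return 0;  // opcode not covered by this sketch
        }
    }
};
```

Returning the cycle count from step() lets the caller advance timers and video hardware by the right amount after each instruction.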

Key code

// Example: alu_add() in CPU.cpp
void CPU::alu_add(uint8_t value) {
    uint8_t a_old = regs_->a;  // Save to calculate flags
    uint16_t result = static_cast<uint16_t>(a_old) + static_cast<uint16_t>(value);
    regs_->a = static_cast<uint8_t>(result);
    
    // Flags
    regs_->set_flag_z(regs_->a == 0);
    regs_->set_flag_n(false);
    
    // Half-carry: ((a_old & 0xF) + (value & 0xF)) > 0xF
    uint8_t a_low = a_old & 0x0F;
    uint8_t value_low = value & 0x0F;
    regs_->set_flag_h((a_low + value_low) > 0x0F);
    
    regs_->set_flag_c(result > 0xFF);
}
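
The subtraction helper is not shown above. A minimal sketch of how it could mirror alu_add() follows, assuming a plain struct (AluState, a hypothetical name) in place of CoreRegisters; the actual alu_sub() in CPU.cpp may differ.

```cpp
#include <cstdint>

// Hypothetical minimal state; the real code goes through CoreRegisters.
struct AluState {
    std::uint8_t a = 0;
    bool z = false, n = false, h = false, c = false;
};

// Sketch of a subtraction helper: A = A - value, with borrow-based H and C.
inline void alu_sub_sketch(AluState& s, std::uint8_t value) {
    const std::uint8_t a_old = s.a;                  // keep the original A for the flag checks
    s.a = static_cast<std::uint8_t>(a_old - value);  // wraps modulo 256

    s.z = (s.a == 0);
    s.n = true;                              // always set for subtraction
    s.h = (a_old & 0x0F) < (value & 0x0F);   // borrow from bit 4
    s.c = a_old < value;                     // borrow out of bit 7
}
```

For example, starting from A = 0x10 and subtracting 0x01 leaves A = 0x0F with H set and C clear, while 0x00 minus 0x01 wraps to 0xFF with both H and C set.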

Affected Files

  • src/core/cpp/CPU.hpp - Added inline ALU method declarations
  • src/core/cpp/CPU.cpp - ALU implementation and 5 new opcodes
  • tests/test_core_cpu_alu.py - Suite of 7 tests to validate native ALU

Tests and Verification

A complete test suite was created in Python that validates the native ALU:

  • test_add_immediate_basic: Basic addition (10 + 2 = 12), check flags Z, N, H, C.
  • test_sub_immediate_zero_flag: Subtraction that activates Flag Z (10 - 10 = 0).
  • test_add_half_carry: Half-carry detection (0x0F + 0x01 = 0x10).
  • test_xor_a_optimization: XOR A clears A to 0 and activates Z.
  • test_inc_a: Increment of A with update of flags.
  • test_dec_a: Decrement of A with half-borrow.
  • test_add_full_carry: Full carry detection (0xFF + 0x01 = 0x00).

Result: ✅ 7/7 tests passed (100% success).

C++ Compiled Module Validation: The Cython extension compiled without errors. The tests execute native C++ code through the Python wrapper, validating full interoperability.

Sources consulted

Educational Integrity

What I Understand Now

  • Half-Carry in C++: The formula ((a & 0xF) + (b & 0xF)) > 0xF compiles to very few machine instructions (AND, ADD, CMP), which is far cheaper than in Python, where each operation creates int objects.
  • Flags and DAA: The H (Half-Carry) flag is critical for DAA, which adjusts binary results to BCD. Without a correct H flag, DAA produces wrong results and games that use BCD misbehave.
  • XOR A Optimization: XOR A (0xAF) is a common idiom in Game Boy code for clearing A to 0 in a single M-cycle and a single byte, cheaper than LD A, 0, which takes two of each.
  • Inline in C++: Inline methods can be embedded by the compiler directly at the call site, eliminating the function call overhead. In the critical emulation loop, this is essential for performance.

What remains to be confirmed

  • ADC/SBC: Operations with previous carry/borrow (ADC A, d8 and SBC A, d8) not yet implemented. They require reading the C flag before the operation.
  • Operations with registers: ADD A, r (where r is B, C, D, E, H, L) not yet implemented. They require mapping from opcodes to registers.
  • OR and CP: Logical OR operations and CP comparison not yet implemented.
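
For the pending ADC/SBC work, one possible shape is sketched below. The key point is that the incoming C flag is read before any flag is overwritten and then takes part in both the half-carry and the full-carry checks. CarryState, alu_adc_sketch and alu_sbc_sketch are hypothetical names; nothing here exists in the project yet.

```cpp
#include <cstdint>

// Hypothetical minimal state; the real code would go through CoreRegisters.
struct CarryState {
    std::uint8_t a = 0;
    bool z = false, n = false, h = false, c = false;
};

// ADC sketch: A = A + value + C.
inline void alu_adc_sketch(CarryState& s, std::uint8_t value) {
    const unsigned carry_in = s.c ? 1u : 0u;  // read C before it is overwritten
    const std::uint8_t a_old = s.a;
    const unsigned result = a_old + value + carry_in;
    s.a = static_cast<std::uint8_t>(result);
    s.z = (s.a == 0);
    s.n = false;
    s.h = ((a_old & 0x0F) + (value & 0x0F) + carry_in) > 0x0F;
    s.c = result > 0xFF;
}

// SBC sketch: A = A - value - C, using signed arithmetic to detect borrows.
inline void alu_sbc_sketch(CarryState& s, std::uint8_t value) {
    const int carry_in = s.c ? 1 : 0;         // read C before it is overwritten
    const std::uint8_t a_old = s.a;
    const int result = a_old - value - carry_in;
    s.a = static_cast<std::uint8_t>(result);  // conversion wraps modulo 256
    s.z = (s.a == 0);
    s.n = true;
    s.h = ((a_old & 0x0F) - (value & 0x0F) - carry_in) < 0;
    s.c = result < 0;
}
```

Including carry_in in the half-carry check matters: with A = 0x0F, value = 0x00 and C set, the low nibble overflows only because of the incoming carry.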

Hypotheses and Assumptions

Half-Borrow in DEC: The current implementation calculates half-borrow as (old_a & 0x0F) == 0x00, which detects when the low nibble is 0 before the decrement. This is correct according to Pan Docs, and it was also validated with tests to ensure accuracy.
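
The half-borrow rule for DEC can be sketched in isolation as below. DecState and dec_a_sketch are hypothetical names, not the project's opcode handler. On the hardware, DEC leaves the C flag unchanged; the sketch follows that behaviour, though this entry does not state how the project's handler treats C.

```cpp
#include <cstdint>

// Hypothetical minimal state; the real code goes through CoreRegisters.
struct DecState {
    std::uint8_t a = 0;
    bool z = false, n = false, h = false, c = false;
};

// DEC A sketch: A = A - 1, updating Z, N and H only.
inline void dec_a_sketch(DecState& s) {
    const std::uint8_t old_a = s.a;
    s.a = static_cast<std::uint8_t>(old_a - 1);  // 0x00 wraps to 0xFF
    s.z = (s.a == 0);
    s.n = true;
    s.h = (old_a & 0x0F) == 0x00;  // a zero low nibble has to borrow from bit 4
    // s.c is deliberately left untouched.
}
```

Decrementing 0x10 gives 0x0F with H set, while decrementing 0x01 gives 0x00 with Z set and H clear.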

Next Steps

  • [ ] Implement ADC A, d8 (0xCE) and SBC A, d8 (0xDE) - carry/borrow operations
  • [ ] Implement ALU operations with registers (ADD A, r where r = B, C, D, E, H, L)
  • [ ] Implement remaining logical operations (OR, CP)
  • [ ] Implement 16-bit operations (ADD HL, rr, INC rr, DEC rr)
  • [ ] Optimize the opcode switch with lookup tables or jump tables