⚠️ Clean-Room / Educational

This project is educational and Open Source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.

Implementation of ALU and Flags in C++

Date: 2025-12-19 | StepID: 0105 | State: Filled

Summary

The ALU (Arithmetic Logic Unit) and flag management were implemented in C++, adding basic arithmetic (ADD, SUB) and logical operations (AND, XOR) to the native core. Five new opcodes were implemented: INC A, DEC A, ADD A, d8, SUB d8 and XOR A. All tests pass, validating accurate flag handling (Z, N, H, C) and an efficient half-carry calculation in C++.

Hardware Concept

The ALU (Arithmetic Logic Unit) is the component of the CPU that performs mathematical and logical operations. On the Game Boy (LR35902), the ALU operates mainly on register A (Accumulator) and updates 4 critical flags:

Z (Zero): Set when the result of an operation is 0.
N (Subtract): Set by subtraction operations, cleared by additions.
H (Half-Carry): Set on a carry from bit 3 into bit 4 in an addition, or on a borrow from bit 4 in a subtraction.
C (Carry): Set when an addition overflows 8 bits (carry out of bit 7) or a subtraction underflows (borrow).
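On the LR35902 these four flags live in the upper nibble of the F register (Z = bit 7, N = bit 6, H = bit 5, C = bit 4), and the lower nibble always reads as 0. The sketch below illustrates that layout and the mask; it is not the project's CoreRegisters class, and the struct `FlagRegister` and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the F-register handling in CoreRegisters.
struct FlagRegister {
    static constexpr uint8_t kZ = 0x80;     // bit 7: Zero
    static constexpr uint8_t kN = 0x40;     // bit 6: Subtract
    static constexpr uint8_t kH = 0x20;     // bit 5: Half-Carry
    static constexpr uint8_t kC = 0x10;     // bit 4: Carry
    static constexpr uint8_t kMask = 0xF0;  // bits 3-0 always read as 0

    uint8_t f = 0;

    // Writing F as a whole (e.g. POP AF) must drop the lower nibble.
    void write(uint8_t value) { f = value & kMask; }

    // Set or clear one flag bit.
    void set(uint8_t bit, bool on) {
        f = static_cast<uint8_t>(on ? (f | bit) : (f & ~bit));
        f &= kMask;  // guards against a caller passing a low-nibble bit
    }

    bool get(uint8_t bit) const { return (f & bit) != 0; }
};
```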

The Half-Carry calculation is critical for the DAA (Decimal Adjust Accumulator) instruction, which converts binary results to BCD (Binary Coded Decimal). In C++, these bitwise operations compile to a handful of machine instructions, so they add almost no cost to each emulated instruction.
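To make the role of H concrete, here is a sketch of DAA's adjustment logic as it is commonly described in Game Boy documentation. DAA is not one of the opcodes implemented in this step; the function `daa` and the `DaaResult` struct are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct DaaResult {
    uint8_t a;
    bool z, h, c;  // N is left unchanged by DAA, so it is not returned
};

// a, n, h, c: accumulator and flags as left by the previous ADD/SUB.
DaaResult daa(uint8_t a, bool n, bool h, bool c) {
    if (!n) {
        // After an addition: fix a high digit > 9 (or a carry) and a
        // low digit > 9 (or a half-carry).
        if (c || a > 0x99) { a = static_cast<uint8_t>(a + 0x60); c = true; }
        if (h || (a & 0x0F) > 0x09) { a = static_cast<uint8_t>(a + 0x06); }
    } else {
        // After a subtraction: undo only the borrows recorded in C and H.
        if (c) { a = static_cast<uint8_t>(a - 0x60); }
        if (h) { a = static_cast<uint8_t>(a - 0x06); }
    }
    return DaaResult{a, a == 0, false, c};  // H is always cleared
}
```

For example, BCD 08 + 08 produces binary 0x10 with H = 1. The low nibble 0 looks like a valid BCD digit, so only the H flag tells DAA to add 0x06 and produce the correct BCD result 0x16.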

C++ Optimization: The half-carry formula ((a & 0xF) + (b & 0xF)) > 0xF compiles directly to register operations, avoiding the overhead of Python int objects and interpreted function calls.
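The addition rule and its subtraction counterpart (half-borrow) fit in two tiny constexpr predicates. This is a sketch; the names `half_carry_add` and `half_borrow_sub` are hypothetical, and the `static_assert` lines are worked examples checked at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Carry out of bit 3: the sum of the low nibbles no longer fits in 4 bits.
constexpr bool half_carry_add(uint8_t a, uint8_t b) {
    return ((a & 0x0F) + (b & 0x0F)) > 0x0F;
}

// Borrow from bit 4: the subtrahend's low nibble is larger than A's.
constexpr bool half_borrow_sub(uint8_t a, uint8_t b) {
    return (a & 0x0F) < (b & 0x0F);
}

static_assert(half_carry_add(0x0F, 0x01), "0xF + 0x1 = 0x10 overflows the nibble");
static_assert(!half_carry_add(0x0A, 0x02), "0xA + 0x2 = 0xC fits in the nibble");
static_assert(half_borrow_sub(0x10, 0x01), "0x10 - 0x01 borrows from bit 4");
static_assert(!half_borrow_sub(0x0A, 0x0A), "0x0A - 0x0A needs no borrow");
```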

Implementation

Four private inline methods were added to the CPU class for ALU operations: alu_add(), alu_sub(), alu_and() and alu_xor(). Each method updates the A register and the flags in a single call, using the inline flag setters of CoreRegisters to keep overhead low.
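Only alu_add() is shown in this report. The other three helpers can follow the same pattern; the sketch below uses a hypothetical `Regs` struct in place of CoreRegisters and free functions in place of the private CPU methods. The flag rules are the LR35902 ones: SUB sets N, AND always sets H, and XOR clears N, H and C.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CoreRegisters: A plus the four flags as bools.
struct Regs {
    uint8_t a = 0;
    bool z = false, n = false, h = false, c = false;
};

void alu_sub(Regs& r, uint8_t value) {
    const uint8_t a_old = r.a;                  // keep A for the flag math
    r.a = static_cast<uint8_t>(a_old - value);  // wraps modulo 256
    r.z = (r.a == 0);
    r.n = true;                                 // subtraction
    r.h = (a_old & 0x0F) < (value & 0x0F);      // borrow from bit 4
    r.c = a_old < value;                        // borrow out of the byte
}

void alu_and(Regs& r, uint8_t value) {
    r.a &= value;
    r.z = (r.a == 0);
    r.n = false;
    r.h = true;  // AND always sets H on the LR35902
    r.c = false;
}

void alu_xor(Regs& r, uint8_t value) {
    r.a ^= value;
    r.z = (r.a == 0);
    r.n = false;
    r.h = false;
    r.c = false;
}
```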

Components created/modified

CPU.hpp: Added inline ALU method declarations.
CPU.cpp: Implementation of ALU methods and 5 new opcodes (0x3C, 0x3D, 0xC6, 0xD6, 0xAF).
tests/test_core_cpu_alu.py: Complete suite of 7 tests to validate native ALU.

Design decisions

Inline methods: The ALU helpers are private inline methods so that the compiler can embed them directly into the opcode switch, avoiding function-call overhead (inline is only a hint; the compiler makes the final decision).
Reusing CoreRegisters: The existing set_flag_*() methods of CoreRegisters are reused; they are inline and automatically apply the F-register mask.
Half-Carry Calculation: The original value of A is saved before the operation so that half-carry/half-borrow can be calculated correctly.
Opcodes implemented:
- 0x3C: INC A (Increment A) - 1 M-Cycle
- 0x3D: DEC A (Decrement A) - 1 M-Cycle
- 0xC6: ADD A, d8 (Add immediate) - 2 M-Cycles
- 0xD6: SUB d8 (Subtract immediate) - 2 M-Cycles
- 0xAF: XOR A (XOR A with itself, an idiom for setting A = 0) - 1 M-Cycle
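A minimal sketch of how these five opcodes could be dispatched from a switch, fetching the d8 operand from memory and returning the M-cycle count. `MiniCpu`, its flat 64 KiB `mem` array and `step()` are hypothetical; the project's CPU class and memory interface are not shown in this report. Note that INC A and DEC A leave the C flag untouched on the LR35902, which is why they are not expressed through the add/sub rules here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct MiniCpu {
    uint8_t a = 0;
    uint16_t pc = 0;
    bool z = false, n = false, h = false, c = false;
    std::array<uint8_t, 0x10000> mem{};  // flat 64 KiB memory, no MMU

    uint8_t fetch() { return mem[pc++]; }  // pc wraps at 0xFFFF

    // Executes one instruction; returns its M-cycles, or -1 if unsupported.
    int step() {
        const uint8_t op = fetch();
        switch (op) {
        case 0x3C:  // INC A: H if the low nibble was 0xF, C unchanged
            h = (a & 0x0F) == 0x0F;
            a = static_cast<uint8_t>(a + 1);
            z = (a == 0); n = false;
            return 1;
        case 0x3D:  // DEC A: H if the low nibble was 0x0, C unchanged
            h = (a & 0x0F) == 0x00;
            a = static_cast<uint8_t>(a - 1);
            z = (a == 0); n = true;
            return 1;
        case 0xC6: {  // ADD A, d8
            const uint8_t v = fetch();
            const unsigned sum = a + v;
            h = ((a & 0x0F) + (v & 0x0F)) > 0x0F;
            c = sum > 0xFF;
            a = static_cast<uint8_t>(sum);
            z = (a == 0); n = false;
            return 2;
        }
        case 0xD6: {  // SUB d8
            const uint8_t v = fetch();
            h = (a & 0x0F) < (v & 0x0F);
            c = a < v;
            a = static_cast<uint8_t>(a - v);
            z = (a == 0); n = true;
            return 2;
        }
        case 0xAF:  // XOR A: A = 0, Z = 1, N = H = C = 0
            a = 0;
            z = true; n = h = c = false;
            return 1;
        default:
            return -1;  // not covered by this sketch
        }
    }
};
```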

Key code

```cpp
// Example: alu_add() in CPU.cpp
void CPU::alu_add(uint8_t value) {
    uint8_t a_old = regs_->a;  // Save to calculate flags
    uint16_t result = static_cast<uint16_t>(a_old) + static_cast<uint16_t>(value);
    regs_->a = static_cast<uint8_t>(result);

    // Flags
    regs_->set_flag_z(regs_->a == 0);
    regs_->set_flag_n(false);

    // Half-carry: ((a_old & 0xF) + (value & 0xF)) > 0xF
    uint8_t a_low = a_old & 0x0F;
    uint8_t value_low = value & 0x0F;
    regs_->set_flag_h((a_low + value_low) > 0x0F);

    regs_->set_flag_c(result > 0xFF);
}
```

Affected Files

src/core/cpp/CPU.hpp - Added inline ALU method declarations
src/core/cpp/CPU.cpp - ALU implementation and 5 new opcodes
tests/test_core_cpu_alu.py - Suite of 7 tests to validate the native ALU

Tests and Verification

A Python test suite was created to validate the native ALU:

test_add_immediate_basic: Basic addition (10 + 2 = 12), checking flags Z, N, H and C.
test_sub_immediate_zero_flag: Subtraction that activates Flag Z (10 - 10 = 0).
test_add_half_carry: Half-carry detection (0x0F + 0x01 = 0x10).
test_xor_a_optimization: XOR A clears A to 0 and activates Z.
test_inc_a: Increment of A with flag updates.
test_dec_a: Decrement of A with half-borrow.
test_add_full_carry: Full carry detection (0xFF + 0x01 = 0x00).

Result: ✅ 7/7 tests passed (100% success).

C++ Compiled Module Validation: The Cython extension compiled without errors. The tests execute native C++ code through the Python wrapper, validating interoperability between Python and the native core.

Sources consulted

Pan Docs: CPU Instruction Set - Arithmetic operations and flags section
Pan Docs: CPU Registers and Flags - Specification of flags Z, N, H, C
GBEDG: Game Boy Opcodes - Opcode reference for 0x3C, 0x3D, 0xC6, 0xD6, 0xAF

Educational Integrity

What I Understand Now

Half-Carry in C++: The formula ((a & 0xF) + (b & 0xF)) > 0xF compiles to a few machine instructions (AND, ADD, CMP), far cheaper than in Python, where each operation goes through the interpreter and works on boxed int objects.
Flags and DAA: The H (Half-Carry) flag is critical for DAA, which adjusts binary results to BCD. Without a correct H flag, DAA produces wrong digits, so games that use BCD (for example, for scores) misbehave.
XOR A Optimization: XOR A (0xAF) is a common Game Boy idiom to clear A to 0 in one byte and one M-cycle, cheaper than LD A, 0 (two bytes, two M-cycles). Unlike LD, it also changes the flags (Z = 1, N = H = C = 0).
Inline in C++: Inline methods can be embedded directly at the call site, eliminating function-call overhead. In the critical emulation loop, this has a measurable impact on performance.

What remains to be confirmed

ADC/SBC: Operations with an incoming carry/borrow (ADC A, d8 and SBC A, d8) are not yet implemented. They require reading the C flag before the operation.
Operations with registers: ADD A, r (where r is B, C, D, E, H, L) is not yet implemented. It requires mapping opcodes to registers.
OR and CP: The logical OR operation and the CP comparison are not yet implemented.
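For the pending ADC/SBC work, the only change from ADD/SUB is that the incoming carry is folded into both the nibble test and the 8-bit test. A sketch, assuming a hypothetical `AluFlags` struct and free functions `adc` and `sbc` (not the project's API):

```cpp
#include <cassert>
#include <cstdint>

struct AluFlags { bool z, n, h, c; };

// ADC A, v: A + v + carry_in. Returns the new A and writes the flags to f.
uint8_t adc(uint8_t a, uint8_t v, AluFlags& f) {
    const unsigned carry_in = f.c ? 1u : 0u;  // read C before overwriting it
    const unsigned sum = a + v + carry_in;
    const uint8_t result = static_cast<uint8_t>(sum);
    f.h = ((a & 0x0F) + (v & 0x0F) + carry_in) > 0x0F;
    f.c = sum > 0xFF;
    f.z = (result == 0);
    f.n = false;
    return result;
}

// SBC A, v: A - v - carry_in. Signed math makes the borrow tests direct.
uint8_t sbc(uint8_t a, uint8_t v, AluFlags& f) {
    const int carry_in = f.c ? 1 : 0;
    const int diff = a - v - carry_in;                 // may be negative
    const uint8_t result = static_cast<uint8_t>(diff); // wraps modulo 256
    f.h = ((a & 0x0F) - (v & 0x0F) - carry_in) < 0;
    f.c = diff < 0;
    f.z = (result == 0);
    f.n = true;
    return result;
}
```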

Hypotheses and Assumptions

Half-Borrow in DEC: The current implementation calculates half-borrow as (old_a & 0x0F) == 0x00, which detects when the low nibble is 0 before the decrement (so the decrement must borrow from bit 4). This matches Pan Docs and was additionally validated with tests.
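The hypothesis can also be checked exhaustively: DEC A is a subtraction of 1, so its shortcut should agree with the general half-borrow rule (a & 0x0F) < (b & 0x0F) with b = 1 for every value of A. A small sketch with hypothetical function names; it checks the two formulas against each other, not against real hardware.

```cpp
#include <cassert>
#include <cstdint>

// Shortcut used for DEC: borrow iff the low nibble is 0 before decrementing.
bool dec_half_borrow(uint8_t old_a) { return (old_a & 0x0F) == 0x00; }

// General SUB rule with subtrahend b.
bool sub_half_borrow(uint8_t a, uint8_t b) { return (a & 0x0F) < (b & 0x0F); }

// True if both rules agree for all 256 possible values of A.
bool dec_rule_matches_sub_rule() {
    for (unsigned v = 0; v <= 0xFF; ++v) {
        const uint8_t a = static_cast<uint8_t>(v);
        if (dec_half_borrow(a) != sub_half_borrow(a, 1)) return false;
    }
    return true;
}
```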

Next Steps

[ ] Implement ADC A, d8 (0xCE) and SBC A, d8 (0xDE) - carry/borrow operations
[ ] Implement ALU operations with registers (ADD A, r where r = B, C, D, E, H, L)
[ ] Implement remaining logical operations (OR, CP)
[ ] Implement 16-bit operations (ADD HL, rr, INC rr, DEC rr)
[ ] Optimize the opcode switch with lookup tables or jump tables