This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
Implementation of ALU and Flags in C++
Summary
The ALU (Arithmetic Logic Unit) and flag management were implemented in C++, adding basic arithmetic (ADD, SUB) and logical (AND, XOR) operations to the native core. Five new opcodes were implemented: INC A, DEC A, ADD A, d8, SUB d8 and XOR A. All tests pass, validating accurate management of the flags (Z, N, H, C) and an efficient half-carry calculation in C++.
Hardware Concept
The ALU (Arithmetic Logic Unit) is the component of the CPU that performs mathematical and logical operations. On the Game Boy (LR35902), the ALU operates mainly on register A (Accumulator) and updates 4 critical flags:
- Z (Zero): Activated when the result of an operation is 0.
- N (Subtract): Activated in subtraction operations, deactivated in addition.
- H (Half-Carry): Activated when there is low nibble overflow (bit 3 → 4) in addition, or borrow in subtraction.
- C (Carry): Activated when there is a complete overflow of 8 bits (overflow/underflow).
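On the Game Boy, these four flags occupy the upper nibble of the F register (Z in bit 7, N in bit 6, H in bit 5, C in bit 4), and the lower nibble always reads as zero. A minimal sketch of that layout follows; the constant names and the `set_flag` helper are hypothetical and are not taken from the project's CoreRegisters class.

```cpp
#include <cstdint>

// Hypothetical names for illustration; the project's CoreRegisters may differ.
constexpr std::uint8_t FLAG_Z = 0x80;     // bit 7: Zero
constexpr std::uint8_t FLAG_N = 0x40;     // bit 6: Subtract
constexpr std::uint8_t FLAG_H = 0x20;     // bit 5: Half-Carry
constexpr std::uint8_t FLAG_C = 0x10;     // bit 4: Carry
constexpr std::uint8_t FLAG_MASK = 0xF0;  // the low nibble of F always reads 0

// Returns F with `flag` set or cleared, keeping the low nibble at zero.
constexpr std::uint8_t set_flag(std::uint8_t f, std::uint8_t flag, bool on) {
    return static_cast<std::uint8_t>((on ? (f | flag) : (f & ~flag)) & FLAG_MASK);
}
```

Applying the mask on every write means a stray value such as 0x0F can never leak into the low nibble of F.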
The Half-Carry calculation is critical for the DAA (Decimal Adjust Accumulator) instruction, which converts binary results to BCD (Binary Coded Decimal). In C++, these bitwise operations compile to very few machine instructions, so the flag calculation stays cheap.
C++ Optimization: The formula ((a & 0xF) + (b & 0xF)) > 0xF for half-carry compiles directly to register operations, eliminating the overhead of Python objects and function calls.
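The half-carry formula can be written as a pair of small helpers. This is only a sketch: `half_carry_add` and `half_borrow_sub` are hypothetical free functions, not methods of the project's CPU class, and the subtraction variant is the usual mirror of the addition formula.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration (not part of the project's CPU class).

// True when adding the low nibbles carries from bit 3 into bit 4.
constexpr bool half_carry_add(std::uint8_t a, std::uint8_t b) {
    return ((a & 0x0F) + (b & 0x0F)) > 0x0F;
}

// True when subtracting b from a needs a borrow from bit 4.
constexpr bool half_borrow_sub(std::uint8_t a, std::uint8_t b) {
    return (a & 0x0F) < (b & 0x0F);
}

// Compile-time checks using the same values as the test suite.
static_assert(half_carry_add(0x0F, 0x01), "0x0F + 0x01 carries out of bit 3");
static_assert(!half_carry_add(0x0A, 0x02), "10 + 2 stays inside the low nibble");
static_assert(half_borrow_sub(0x10, 0x01), "0x10 - 0x01 borrows from bit 4");
```

Because both helpers are `constexpr`, the checks run at compile time and add no cost to the emulation loop.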
Implementation
Added 4 private inline methods to the CPU class for ALU operations: alu_add(), alu_sub(), alu_and() and alu_xor(). Each method updates the A register and the flags in a single step, using the inline methods of CoreRegisters to avoid call overhead.
Components created/modified
- CPU.hpp: Added inline ALU method declarations.
- CPU.cpp: Implementation of ALU methods and 5 new opcodes (0x3C, 0x3D, 0xC6, 0xD6, 0xAF).
- tests/test_core_cpu_alu.py: Complete suite of 7 tests to validate native ALU.
Design decisions
- Inline methods: ALU helpers are private inline methods so that the compiler can embed them directly into the opcode switch, eliminating function call costs.
- Reusing CoreRegisters: The existing set_flag_*() methods in CoreRegisters are used; they are inline and automatically apply the F-register mask.
- Half-Carry Calculation: The original value of A is saved before the operation so that half-carry/half-borrow can be calculated correctly.
- Opcodes implemented:
  - 0x3C: INC A (Increment A) - 1 M-Cycle
  - 0x3D: DEC A (Decrement A) - 1 M-Cycle
  - 0xC6: ADD A, d8 (Add immediate) - 2 M-Cycles
  - 0xD6: SUB d8 (Subtract immediate) - 2 M-Cycles
  - 0xAF: XOR A (XOR A with itself, an optimization to set A = 0) - 1 M-Cycle
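The opcode list above can be sketched as a switch that executes one instruction and returns its M-cycle count. This is a simplified stand-in, not the project's code: `MiniCpu`, its fields, the 16-byte memory and the `fetch()` helper are all hypothetical, and flags are plain booleans instead of going through CoreRegisters. Note that INC A and DEC A leave the C flag unchanged on the real hardware.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the project's CPU; names and layout are assumptions.
struct MiniCpu {
    std::uint8_t a = 0;
    bool z = false, n = false, h = false, c = false;
    std::uint16_t pc = 0;
    std::array<std::uint8_t, 16> mem{};  // tiny program memory for the demo

    // Reads the byte at PC and advances PC (wraps inside the demo memory).
    std::uint8_t fetch() { return mem[pc++ % mem.size()]; }

    // Executes one instruction and returns the M-cycles it consumed.
    int step() {
        const std::uint8_t opcode = fetch();
        switch (opcode) {
            case 0x3C: {  // INC A: C is left untouched
                h = (a & 0x0F) == 0x0F;
                ++a;
                z = (a == 0);
                n = false;
                return 1;
            }
            case 0x3D: {  // DEC A: C is left untouched
                h = (a & 0x0F) == 0x00;
                --a;
                z = (a == 0);
                n = true;
                return 1;
            }
            case 0xC6: {  // ADD A, d8
                const std::uint8_t v = fetch();
                const unsigned sum = static_cast<unsigned>(a) + v;
                h = ((a & 0x0F) + (v & 0x0F)) > 0x0F;
                c = sum > 0xFF;
                a = static_cast<std::uint8_t>(sum);
                z = (a == 0);
                n = false;
                return 2;
            }
            case 0xD6: {  // SUB d8
                const std::uint8_t v = fetch();
                h = (a & 0x0F) < (v & 0x0F);
                c = a < v;
                a = static_cast<std::uint8_t>(a - v);
                z = (a == 0);
                n = true;
                return 2;
            }
            case 0xAF: {  // XOR A: the result is always 0
                a = 0;
                z = true;
                n = h = c = false;
                return 1;
            }
            default:
                return 0;  // opcode not covered by this sketch
        }
    }
};
```

Returning the cycle count from `step()` keeps timing next to the instruction it belongs to, which is one simple way to keep the two in sync.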
Key code
// Example: alu_add() in CPU.cpp
void CPU::alu_add(uint8_t value) {
    uint8_t a_old = regs_->a;  // Save the original A to calculate flags
    uint16_t result = static_cast<uint16_t>(a_old) + static_cast<uint16_t>(value);
    regs_->a = static_cast<uint8_t>(result);

    // Flags
    regs_->set_flag_z(regs_->a == 0);
    regs_->set_flag_n(false);

    // Half-carry: ((a_old & 0xF) + (value & 0xF)) > 0xF
    uint8_t a_low = a_old & 0x0F;
    uint8_t value_low = value & 0x0F;
    regs_->set_flag_h((a_low + value_low) > 0x0F);

    // Carry: the 16-bit result does not fit in 8 bits
    regs_->set_flag_c(result > 0xFF);
}
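The subtraction helper is named in this entry but not shown. A counterpart to alu_add() might look like the sketch below, which is an assumption rather than the project's actual alu_sub(): it is written as a free function over a hypothetical `Regs` struct so that it compiles on its own.

```cpp
#include <cstdint>

// Hypothetical register file standing in for CoreRegisters.
struct Regs {
    std::uint8_t a = 0;
    bool z = false, n = false, h = false, c = false;
};

// Sketch of a SUB counterpart to alu_add(): A = A - value, flags updated.
inline void alu_sub(Regs& regs, std::uint8_t value) {
    const std::uint8_t a_old = regs.a;  // save the original A to calculate flags
    regs.a = static_cast<std::uint8_t>(a_old - value);

    regs.z = (regs.a == 0);
    regs.n = true;                             // subtraction always sets N
    regs.h = (a_old & 0x0F) < (value & 0x0F);  // borrow from bit 4
    regs.c = a_old < value;                    // full borrow (underflow)
}
```

For example, starting from A = 0x10 and subtracting 0x01 leaves A = 0x0F with H set and C clear, while 0x00 minus 0x01 wraps to 0xFF with both H and C set.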
Affected Files
- src/core/cpp/CPU.hpp - Added inline ALU method declarations
- src/core/cpp/CPU.cpp - ALU implementation and 5 new opcodes
- tests/test_core_cpu_alu.py - Suite of 7 tests to validate native ALU
Tests and Verification
A complete test suite was created in Python that validates the native ALU:
- test_add_immediate_basic: Basic addition (10 + 2 = 12), check flags Z, N, H, C.
- test_sub_immediate_zero_flag: Subtraction that activates Flag Z (10 - 10 = 0).
- test_add_half_carry: Half-carry detection (0x0F + 0x01 = 0x10).
- test_xor_a_optimization: XOR A clears A to 0 and activates Z.
- test_inc_a: Increment of A with update of flags.
- test_dec_a: Decrement of A with half-borrow.
- test_add_full_carry: Full carry detection (0xFF + 0x01 = 0x00).
Result: ✅ 7/7 tests passed (100% success).
C++ Compiled Module Validation: The Cython extension compiled without errors. The tests execute native C++ code through the Python wrapper, validating full interoperability.
Sources consulted
- Pan Docs: CPU Instruction Set - Arithmetic operations and flags section
- Pan Docs: CPU Registers and Flags - Specification of flags Z, N, H, C
- GBEDG: Game Boy Opcodes - Opcode reference 0x3C, 0x3D, 0xC6, 0xD6, 0xAF
Educational Integrity
What I Understand Now
- Half-Carry in C++: The formula ((a & 0xF) + (b & 0xF)) > 0xF compiles to very few machine instructions (AND, ADD, CMP), which is much cheaper than in Python, where each operation creates int objects.
- Flags and DAA: The H (Half-Carry) flag is critical for DAA, which adjusts binary results to BCD. Without a correct H, DAA produces wrong results and games that rely on BCD can misbehave.
- XOR A Optimization: XOR A (0xAF) is a common optimization in Game Boy code to clear A to 0 in a single M-cycle and a single byte, cheaper than LD A, 0 (2 bytes, 2 M-cycles).
- Inline in C++: Inline methods are embedded directly at the call site, eliminating the function call overhead. In the critical emulation loop, this is essential for performance.
What remains to be confirmed
- ADC/SBC: Operations with a previous carry/borrow (ADC A, d8 and SBC A, d8) are not yet implemented. They require reading the C flag before the operation.
- Operations with registers: ADD A, r (where r is B, C, D, E, H, L) is not yet implemented. It requires mapping from opcodes to registers.
- OR and CP: The logical OR operation and the CP comparison are not yet implemented.
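For the pending ADC/SBC work, the carry read before the operation has to take part in both the result and the half-carry/half-borrow test. The sketch below shows one way this could be done; `AluResult`, `adc8` and `sbc8` are hypothetical names and are not part of the current code base.

```cpp
#include <cstdint>

// Hypothetical result bundle for illustration.
struct AluResult {
    std::uint8_t value;
    bool z, n, h, c;
};

// ADC sketch: a + b + carry_in, with the carry included in the H and C tests.
inline AluResult adc8(std::uint8_t a, std::uint8_t b, bool carry_in) {
    const unsigned cin = carry_in ? 1u : 0u;
    const unsigned sum = static_cast<unsigned>(a) + b + cin;
    const std::uint8_t value = static_cast<std::uint8_t>(sum);
    const bool half = ((a & 0x0Fu) + (b & 0x0Fu) + cin) > 0x0Fu;
    return {value, value == 0, false, half, sum > 0xFFu};
}

// SBC sketch: a - b - carry_in, with the borrow included in the H and C tests.
inline AluResult sbc8(std::uint8_t a, std::uint8_t b, bool carry_in) {
    const int cin = carry_in ? 1 : 0;
    const int diff = static_cast<int>(a) - b - cin;
    const int low = (a & 0x0F) - (b & 0x0F) - cin;
    const std::uint8_t value = static_cast<std::uint8_t>(diff);
    return {value, value == 0, true, low < 0, diff < 0};
}
```

Working in a wider integer type and truncating at the end avoids losing the carry or borrow that an 8-bit intermediate would discard.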
Hypotheses and Assumptions
Half-Borrow in DEC: The current implementation calculates half-borrow as (old_a & 0x0F) == 0x00, which detects when the low nibble is 0 before the decrement. This agrees with Pan Docs and was also validated with tests to ensure accuracy.
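One way to gain extra confidence in this assumption is to compare the DEC shortcut against the general half-borrow rule for a subtrahend of 1 over all 256 values of A. The helper names below are hypothetical and the check is only a sketch, separate from the project's Python tests.

```cpp
#include <cstdint>

// General half-borrow test for a - b (hypothetical helper).
constexpr bool half_borrow_sub(std::uint8_t a, std::uint8_t b) {
    return (a & 0x0F) < (b & 0x0F);
}

// Shortcut quoted above for DEC, where the subtrahend is always 1.
constexpr bool half_borrow_dec(std::uint8_t old_a) {
    return (old_a & 0x0F) == 0x00;
}

// Returns true if both formulas agree for every possible value of A.
inline bool dec_shortcut_matches_general_rule() {
    for (int a = 0; a <= 0xFF; ++a) {
        const std::uint8_t v = static_cast<std::uint8_t>(a);
        if (half_borrow_dec(v) != half_borrow_sub(v, 1)) {
            return false;
        }
    }
    return true;
}
```

The two agree because a low nibble is smaller than 1 only when it is exactly 0.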
Next Steps
- [ ] Implement ADC A, d8 (0xCE) and SBC A, d8 (0xDE) - carry/borrow operations
- [ ] Implement ALU operations with registers (ADD A, r where r = B, C, D, E, H, L)
- [ ] Implement remaining logical operations (OR, CP)
- [ ] Implement 16-bit operations (ADD HL, rr, INC rr, DEC rr)
- [ ] Optimize the opcode switch with lookup tables or jump tables