This project is educational and open source. No code is copied from other emulators; the implementation is based solely on technical documentation and permitted tests.
Implementation of Stack and Subroutines in C++
Summary
Implemented stack and subroutine operations in C++, adding the stack helpers (push_byte, pop_byte, push_word, pop_word) and 4 critical opcodes: PUSH BC (0xC5), POP BC (0xC1), CALL nn (0xCD) and RET (0xC9). The implementation respects the downward growth of the stack (SP decrements on PUSH) and the correct Little-Endian byte order. All tests pass, validating basic operations, nested CALL/RET and correct stack behavior.
Hardware Concept
The stack is a LIFO (Last In, First Out) data structure that lets the CPU "remember" return addresses when executing subroutines. On the Game Boy, the stack grows downward (toward lower memory addresses), which means that the Stack Pointer (SP) is decremented on PUSH and incremented on POP.
Stack Growth: The stack grows downwards because the stack memory space is typically in the high region of RAM (0xFFFE is the typical initial value). As SP decreases, the stack expands towards lower addresses, avoiding collisions with program code and data.
Little-Endian in PUSH/POP: When PUSHing a 16-bit word (e.g. PC), the high byte (MSB) is written first to SP-1 and then the low byte (LSB) to SP-2, so the low byte ends up at the lower address. On POP, the low byte is read first from SP and then the high byte from SP+1, and the two are combined into a 16-bit value. This order is critical for correct address restoration.
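The byte layout described above can be checked with a small standalone sketch. MiniStack and its flat 64 KiB memory array are hypothetical stand-ins for illustration, not the project's real CPU or MMU classes; only the helper names mirror the ones used in this entry.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emulator's CPU + memory (illustration only).
struct MiniStack {
    std::array<uint8_t, 0x10000> mem{};  // flat 64 KiB address space, zeroed
    uint16_t sp = 0xFFFE;                // typical initial SP value

    void push_byte(uint8_t val) {
        sp = static_cast<uint16_t>((sp - 1) & 0xFFFF);  // decrement first...
        mem[sp] = val;                                   // ...then write
    }

    uint8_t pop_byte() {
        uint8_t val = mem[sp];                           // read first...
        sp = static_cast<uint16_t>((sp + 1) & 0xFFFF);  // ...then increment
        return val;
    }

    void push_word(uint16_t val) {
        push_byte(static_cast<uint8_t>(val >> 8));    // high byte lands at SP-1
        push_byte(static_cast<uint8_t>(val & 0xFF));  // low byte lands at SP-2
    }

    uint16_t pop_word() {
        uint8_t low = pop_byte();   // low byte sits at the lower address
        uint8_t high = pop_byte();  // high byte sits one address above
        return static_cast<uint16_t>((high << 8) | low);
    }
};
```

Pushing 0x1234 with SP = 0xFFFE leaves 0x12 at 0xFFFD and 0x34 at 0xFFFC, with SP = 0xFFFC; popping returns 0x1234 and restores SP to 0xFFFE.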
CALL and RET: CALL nn saves the return address on the stack (the PC value after the whole instruction has been fetched, i.e. the address of the next instruction) and then jumps to address nn. RET retrieves the return address from the stack and restores PC. Without these operations, the CPU cannot execute structured code with subroutines and is limited to "spaghetti jumps" with no way to return.
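To make the CALL/RET flow concrete, here is a minimal sketch of a nested call sequence. MiniCpu, step() and make_nested_call_demo() are hypothetical names invented for this example, and the memory is a plain array rather than the project's real memory interface; only the opcodes 0xCD and 0xC9 are taken from this entry.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical minimal CPU (illustration only, not the project's CPU class).
struct MiniCpu {
    std::array<uint8_t, 0x10000> mem{};
    uint16_t pc = 0x0100;
    uint16_t sp = 0xFFFE;

    uint8_t fetch_byte() { return mem[pc++]; }

    uint16_t fetch_word() {  // operands are stored low byte first
        uint8_t low = fetch_byte();
        uint8_t high = fetch_byte();
        return static_cast<uint16_t>((high << 8) | low);
    }

    void push_word(uint16_t val) {
        mem[--sp] = static_cast<uint8_t>(val >> 8);    // high byte first
        mem[--sp] = static_cast<uint8_t>(val & 0xFF);  // low byte second
    }

    uint16_t pop_word() {
        uint8_t low = mem[sp++];
        uint8_t high = mem[sp++];
        return static_cast<uint16_t>((high << 8) | low);
    }

    // Executes one instruction; returns false for any opcode not handled here.
    bool step() {
        uint8_t opcode = fetch_byte();
        switch (opcode) {
            case 0xCD: {  // CALL nn
                uint16_t target = fetch_word();
                push_word(pc);  // pc already points at the next instruction
                pc = target;
                return true;
            }
            case 0xC9:  // RET
                pc = pop_word();
                return true;
            default:
                return false;
        }
    }
};

// Builds a tiny program: 0x0100 calls 0x0200, which calls 0x0300.
inline MiniCpu make_nested_call_demo() {
    MiniCpu cpu;
    cpu.mem[0x0100] = 0xCD; cpu.mem[0x0101] = 0x00; cpu.mem[0x0102] = 0x02;  // CALL 0x0200
    cpu.mem[0x0200] = 0xCD; cpu.mem[0x0201] = 0x00; cpu.mem[0x0202] = 0x03;  // CALL 0x0300
    cpu.mem[0x0203] = 0xC9;                                                  // RET
    cpu.mem[0x0300] = 0xC9;                                                  // RET
    return cpu;
}
```

Stepping the demo four times runs CALL, CALL, RET, RET: PC goes 0x0200, 0x0300, 0x0203, 0x0103, and SP returns to 0xFFFE once both return addresses have been popped.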
C++ Optimization: Stack operations are extremely frequent in Game Boy code (every CALL/RET, every interrupt). In C++ they compile down to simple pointer arithmetic and direct memory accesses, which is much faster than in Python, where each operation involves multiple function calls and object management.
Implementation
Added 4 private inline methods in the CPU class for stack operations: push_byte(), pop_byte(), push_word() and pop_word(). These methods handle Stack Pointer arithmetic and the correct byte order (Little-Endian).
Components created/modified
- CPU.hpp: Added inline stack method declarations.
- CPU.cpp: Implementation of stack helpers and 4 new opcodes (0xC5, 0xC1, 0xCD, 0xC9).
- tests/test_core_cpu_stack.py: Complete suite of 4 tests to validate native stack.
Design decisions
- Inline methods: Stack helpers are private inline methods for maximum performance. The compiler can expand them directly into the opcode handlers, eliminating the function-call overhead.
- Wrap-around in 16-bit: All SP operations use & 0xFFFF to ensure SP always stays in the valid 16-bit range, even in edge cases (which should not occur in practice on real hardware).
- Byte order in PUSH/POP: PUSH writes the high byte first (SP-1), then the low byte (SP-2). POP reads the low byte first (SP), then the high byte (SP+1). This order is consistent with the Little-Endian format of the Game Boy.
- CALL saves PC after fetch: CALL saves the PC value after reading the entire instruction (including the destination address), which is the address of the next instruction. This allows RET to correctly return to the code following the CALL.
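The wrap-around decision in the list above can be illustrated with two tiny helpers. sp_after_push and sp_after_pop are hypothetical names used only for this sketch; they model the SP arithmetic of a single-byte push or pop, not the project's actual helper signatures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling SP arithmetic for one pushed or popped byte.
// The subtraction/addition happens in int, so the mask keeps the result in 16 bits.
constexpr uint16_t sp_after_push(uint16_t sp) {
    return static_cast<uint16_t>((sp - 1) & 0xFFFF);
}

constexpr uint16_t sp_after_pop(uint16_t sp) {
    return static_cast<uint16_t>((sp + 1) & 0xFFFF);
}

// Compile-time checks of the normal case and both edge cases.
static_assert(sp_after_push(0xFFFE) == 0xFFFD, "normal push decrements SP");
static_assert(sp_after_push(0x0000) == 0xFFFF, "push at 0x0000 wraps to 0xFFFF");
static_assert(sp_after_pop(0xFFFF) == 0x0000, "pop at 0xFFFF wraps to 0x0000");
```

Storing the result in a uint16_t would already truncate it, so the mask mainly documents the intent and keeps intermediate int values in range.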
Key code
// Helper for word PUSH (16 bits)
void CPU::push_word(uint16_t val) {
    push_byte((val >> 8) & 0xFF);  // High byte first
    push_byte(val & 0xFF);         // Low byte second
}

// Helper for word POP (16 bits)
uint16_t CPU::pop_word() {
    uint8_t low = pop_byte();   // Low byte first
    uint8_t high = pop_byte();  // High byte second
    return (static_cast<uint16_t>(high) << 8) | static_cast<uint16_t>(low);
}

// CALL nn: Save return address and jump
case 0xCD: {
    uint16_t target = fetch_word();
    uint16_t return_addr = regs_->pc;  // PC already points past the operand
    push_word(return_addr);
    regs_->pc = target;
    cycles_ += 6;
    return 6;
}
Affected Files
- src/core/cpp/CPU.hpp - Added stack method declarations (push_byte, pop_byte, push_word, pop_word)
- src/core/cpp/CPU.cpp - Implementation of stack helpers and opcodes (0xC5, 0xC1, 0xCD, 0xC9)
- tests/test_core_cpu_stack.py - Suite of 4 tests to validate stack operations
Tests and Verification
A complete test suite was created in test_core_cpu_stack.py, which validates:
- test_push_pop_bc: Verifies basic PUSH BC and POP BC, validating that the data is saved and restored correctly and that SP is decremented and incremented appropriately.
- test_stack_grows_downwards: Critical test that verifies that the stack grows downwards (SP decreases on PUSH). If the stack grew upwards, games would corrupt their own memory.
- test_call_ret_basic: Verifies basic CALL nn and RET, validating that the return address is correctly saved on the stack and that RET restores PC correctly.
- test_call_nested: Checks nested CALL (subroutine that calls another subroutine), validating that multiple levels of calls are working correctly.
Result: All 4 tests pass (0.06 s execution time). The C++ implementation behaves correctly for the covered cases and is ready for use in emulation.
Sources consulted
- Pan Docs: CPU Instruction Set - Sections on PUSH, POP, CALL and RET
- Pan Docs: Memory Map - Stack Pointer and stack region
Note: The implementation strictly follows the Pan Docs specification on the order of bytes in PUSH/POP and the behavior of the Stack Pointer.
Educational Integrity
What I Understand Now
- Stack Growth: The stack grows downward (SP decreases) because the stack space is in the high RAM region. This avoids collisions with code and data.
- Little-Endian in PUSH/POP: PUSH writes the high byte first (SP-1), then the low byte (SP-2). POP reads the low byte first (SP), then the high byte (SP+1). This order is critical for correct address restoration.
- CALL/RET: CALL saves PC (the return address) on the stack and jumps to the subroutine. RET recovers PC from the stack and resumes execution after the CALL. Without this, there is no structured code.
- C++ Performance: Stack operations are extremely frequent, and in C++ they compile down to simple pointer arithmetic, which is much faster than in Python.
What remains to be confirmed
- PUSH/POP for other register pairs: Currently only PUSH/POP BC is implemented. PUSH/POP DE, HL and AF still need to be implemented (the latter requires a special mask for F, whose low 4 bits always read as zero).
- Conditional CALL/RET: The conditional variants (CALL NZ, CALL Z, RET NZ, RET Z, etc.), which check flags before executing, still need to be implemented.
- Interrupts: Interrupts also use the stack to save PC. It remains to be validated that the behavior is correct once interrupts are implemented.
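As a possible starting point for the pending POP AF work, the F mask could look like the sketch below. AfPair and split_popped_af are hypothetical names that do not exist in the project yet; the only hardware fact assumed is that the low nibble of F always reads as zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for a future POP AF implementation (illustration only).
struct AfPair {
    uint8_t a;
    uint8_t f;
};

// Splits a 16-bit value popped from the stack into A and F.
// The low nibble of F is forced to zero with the 0xF0 mask.
inline AfPair split_popped_af(uint16_t popped) {
    return AfPair{
        static_cast<uint8_t>(popped >> 8),    // A comes from the high byte
        static_cast<uint8_t>(popped & 0xF0),  // F keeps only its upper 4 bits
    };
}
```

For example, a popped value of 0x12FF would yield A = 0x12 and F = 0xF0 under this sketch.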
Hypotheses and Assumptions
Byte order in PUSH/POP is assumed to be consistent with the Little-Endian format of the Game Boy. This is supported by Pan Docs and tests confirm that it works correctly.
Next Steps
- [ ] Implement PUSH/POP for other register pairs (DE, HL, AF)
- [ ] Implement conditional CALL/RET (CALL NZ, CALL Z, RET NZ, RET Z, etc.)
- [ ] Implement more load and store (LD) opcodes
- [ ] Continue migrating more CPU opcodes to C++