This project is educational and Open Source. No code is copied from other emulators. Implementation based solely on technical documentation and permitted tests.
Complete Stack and Accumulator Rotations
Summary
Stack management is now complete: PUSH/POP are implemented for the remaining register pairs (AF, DE, HL), and the fast accumulator rotations (RLCA, RRCA, RLA, RRA) were added. The POP AF implementation applies the critical 0xF0 mask so that the low bits of the F register stay zero, matching the behavior of real hardware. The fast rotations treat flags in a special way: Z is always 0, even if the result is zero. This is a key difference from the CB-prefixed rotations. All tests pass (17 tests in total).
Hardware Concept
Complete Stack
The stack on the Game Boy is a region of memory that grows downward (from high to low addresses). It allows the state of registers to be saved and restored during subroutine calls and interrupt handling. PUSH/POP BC was already implemented; this step adds AF, DE and HL.
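The downward growth and byte order described above can be sketched as follows. This is a minimal illustration, not the project's code: the `Stack` class, its `memory` buffer and the method names `push_word`/`pop_word` are hypothetical stand-ins for the emulator's own helpers.

```python
class Stack:
    """Hypothetical minimal model: 64 KiB of memory plus a stack pointer."""

    def __init__(self, sp: int = 0xFFFE) -> None:
        self.memory = bytearray(0x10000)
        self.sp = sp

    def push_word(self, value: int) -> None:
        # The stack grows downward: decrement SP before each write.
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = (value >> 8) & 0xFF  # high byte at the higher address
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = value & 0xFF         # low byte at the lower address

    def pop_word(self) -> int:
        # Read back in the opposite order: low byte first, then high byte.
        low = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        high = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return (high << 8) | low


stack = Stack()
stack.push_word(0x1234)   # SP moves from 0xFFFE down to 0xFFFC
value = stack.pop_word()  # SP returns to 0xFFFE
```

Because the low byte ends up at the lower address, the word sits in memory in little-endian order.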
CRITICAL - POP AF and Mask 0xF0:
When we POP AF, we retrieve the Flags (F) register from the stack. On real Game Boy hardware, the low 4 bits of the F register (bits 0-3) are ALWAYS zero. This is a physical property of the hardware, not a software convention.
If we do not apply the 0xF0 mask to the value retrieved from the stack, the low bits may contain "garbage" that affects flag comparisons. Games like Tetris fail when checking flags if these bits are not clear, because the conditional instructions (JR NZ, RET Z, etc.) then behave unpredictably.
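A minimal sketch of the masking, assuming a register file where every write to F goes through one setter. The `Registers` class here is hypothetical; only the names `set_af()` and `set_f()` come from this project, and their real bodies may differ.

```python
class Registers:
    """Hypothetical register file holding only A and F."""

    def __init__(self) -> None:
        self.a = 0
        self.f = 0

    def set_f(self, value: int) -> None:
        # Bits 0-3 of F do not exist on hardware, so they are always cleared.
        self.f = value & 0xF0

    def set_af(self, value: int) -> None:
        self.a = (value >> 8) & 0xFF
        self.set_f(value & 0xFF)  # the mask is applied in one place only

    def get_af(self) -> int:
        return (self.a << 8) | self.f


regs = Registers()
regs.set_af(0xFFFF)  # value popped from the stack
# regs.a is 0xFF and regs.f is 0xF0: the low nibble of F was discarded
```

Routing every write through `set_f()` means a POP AF handler needs no masking code of its own.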
Fast Accumulator Rotations
Fast rotations (0x07, 0x0F, 0x17, 0x1F) are optimized instructions that rotate register A in different ways. They are "fast" because they only operate on A and consume 1 M-cycle, unlike the CB-prefixed rotations, which can operate on any register.
Rotation types:
- RLCA (0x07): Rotate Left Circular Accumulator. Bit 7 leaves and re-enters through bit 0. It is also copied to the C flag.
- RRCA (0x0F): Rotate Right Circular Accumulator. Bit 0 leaves and re-enters through bit 7. It is also copied to the C flag.
- RLA (0x17): Rotate Left Accumulator through Carry. Bit 7 goes to the C flag, and the *old* C flag goes in at bit 0. It is a 9-bit rotation (8 bits of A + 1 bit of C).
- RRA (0x1F): Rotate Right Accumulator through Carry. Bit 0 goes to the C flag, and the *old* C flag goes in at bit 7. It is a 9-bit rotation.
CRITICAL - Flags in Fast Rotations:
These instructions ALWAYS set Z=0, N=0, H=0; only C depends on the operand. This is a key difference from the CB-prefixed rotations (0xCB), where Z is computed from the result. If the result of a fast rotation is 0, Z is still 0 (hardware quirk).
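The four rotations and their flag rule can be sketched as pure functions. This is an illustration under assumptions, not the handlers in src/cpu/core.py: the function signatures and the `FLAG_C` constant are hypothetical, and each function returns the new A together with the new F byte.

```python
FLAG_C = 0x10  # carry is bit 4 of F; Z (bit 7), N (bit 6) and H (bit 5) stay 0


def rlca(a: int) -> tuple[int, int]:
    bit7 = (a >> 7) & 1
    result = ((a << 1) | bit7) & 0xFF      # bit 7 wraps around into bit 0
    return result, (FLAG_C if bit7 else 0)


def rrca(a: int) -> tuple[int, int]:
    bit0 = a & 1
    result = (a >> 1) | (bit0 << 7)        # bit 0 wraps around into bit 7
    return result, (FLAG_C if bit0 else 0)


def rla(a: int, carry_in: int) -> tuple[int, int]:
    bit7 = (a >> 7) & 1
    result = ((a << 1) | carry_in) & 0xFF  # old carry enters at bit 0
    return result, (FLAG_C if bit7 else 0)


def rra(a: int, carry_in: int) -> tuple[int, int]:
    bit0 = a & 1
    result = (a >> 1) | (carry_in << 7)    # old carry enters at bit 7
    return result, (FLAG_C if bit0 else 0)


# The quirk: the result is zero, yet the Z bit (0x80) is not set in F.
a, f = rla(0x80, 0)  # a == 0x00, f == 0x10
```

Returning a freshly built F byte, rather than modifying the old one, makes it explicit that Z, N and H are cleared no matter what they held before.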
Use in Games:
Rotations through carry (RLA, RRA) are essential for pseudo-random number generators. Games like Tetris use RLA intensively to generate random sequences of pieces. Without these instructions, the game would hang waiting for a valid random number.
Implementation
Implemented 10 new opcodes: 6 for the stack (PUSH/POP AF, DE, HL) and 4 for the fast rotations (RLCA, RRCA, RLA, RRA).
Components created/modified
- src/cpu/core.py: Added handlers for PUSH/POP AF, DE, HL and the fast rotations. POP AF applies the 0xF0 mask through set_af(), which internally calls set_f(), which already applies the mask.
- tests/test_cpu_stack.py: Added 3 new tests for PUSH/POP DE, HL and AF (including the critical 0xF0 mask test).
- tests/test_cpu_rotations.py: New file with 9 tests for all rotations, including flag quirk tests (Z always 0) and RLA chains that simulate random generators.
Design decisions
0xF0 mask in POP AF:
The handler relies on the fact that set_af() already calls set_f() internally, which applies the 0xF0 mask automatically. This ensures that the low bits of F are always clean, with no need for additional code in the handler.
Rotations: explicit implementation of flags:
Although fast rotations always set Z=0, N=0, H=0, the flags are set explicitly in each handler for clarity and to avoid errors if the behavior of the generic helpers changes in the future. This also makes the code more self-documenting.
Reusing stack helpers:
All PUSH/POP handlers reuse the existing helpers _push_word() and _pop_word(), which guarantees consistent byte order (little-endian) and Stack Pointer management.
Affected Files
- src/cpu/core.py - Added 10 new opcode handlers (PUSH/POP AF, DE, HL and fast rotations)
- tests/test_cpu_stack.py - Added 3 new tests (PUSH/POP DE, HL, AF with mask)
- tests/test_cpu_rotations.py - New file with 9 tests for the fast rotations
Tests and Verification
12 new tests were created (3 for the stack + 9 for rotations) and they all pass:
- Unit tests: pytest with 17 tests passing (5 existing stack tests + 3 new + 9 rotations)
- POP AF critical test: Verifies that when 0xFFFF is retrieved from the stack, F becomes 0xF0 (low bits cleared)
- Rotation tests: They validate circular rotations, rotations through carry, the flag quirk (Z always 0), and RLA chains for random generators
- Documentation: Pan Docs - CPU Instruction Set (PUSH/POP, fast rotations, flags behavior)
Sources consulted
- Pan Docs: CPU Instruction Set - Stack Operations
- Pan Docs: CPU Instruction Set - Rotations (RLCA, RRCA, RLA, RRA)
- Pan Docs: Hardware quirks - F register mask (low bits always 0)
- Pan Docs: Flags behavior - Fast rotations vs CB rotations
Note: Implementation based on Pan Docs technical documentation on the behavior of the LR35902 hardware.
Educational Integrity
What I Understand Now
- Mask 0xF0 in F: The low 4 bits of the F register are always zero on real hardware. This is not a software convention but a physical hardware limitation. If we do not apply the mask in POP AF, the flags can hold invalid values that break conditional logic.
- Fast rotations vs CB: Fast rotations (0x07, 0x0F, 0x17, 0x1F) treat flags in a special way: Z is always 0, even if the result is zero. The CB rotations compute Z normally from the result. This difference is critical to the accuracy of the emulator.
- Rotations through carry: RLA and RRA are 9-bit rotations (8 bits of A + 1 bit of C). The old carry goes into the register, and the bit that comes out goes to the carry. This makes efficient pseudo-random number generators possible.
- Use in games: Rotations through carry are essential for random generators. Without them, games like Tetris cannot generate random pieces and crash.
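One way to see the 9-bit nature of RLA is to count how many applications restore the starting state. The sketch below uses a hypothetical `rla_step` helper that tracks only A and the carry bit (flags other than C are omitted): a bit shifted out of A needs nine steps, not eight, to return to its original position.

```python
def rla_step(a: int, carry: int) -> tuple[int, int]:
    """One RLA: bit 7 of A moves to carry, the old carry enters at bit 0."""
    new_carry = (a >> 7) & 1
    return ((a << 1) | carry) & 0xFF, new_carry


def repeat_rla(a: int, carry: int, times: int) -> tuple[int, int]:
    for _ in range(times):
        a, carry = rla_step(a, carry)
    return a, carry


# After 8 steps the single set bit sits in the carry and A is empty.
after_eight = repeat_rla(0x01, 0, 8)  # (0x00, 1)
# The 9th step brings it back to bit 0 of A.
after_nine = repeat_rla(0x01, 0, 9)   # (0x01, 0)
```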
What remains to be confirmed
- Exact timing: For now we assume that all fast rotations consume 1 M-cycle. It remains to be verified whether there are subtle timing differences between the circular rotations and the rotations through carry.
- Behavior in edge cases: The rotations are tested with common values, but more tests with extreme values (0x00, 0xFF, etc.) may be needed to ensure that the wrap-around works correctly in all cases.
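Since A has only 256 values and the carry only 2, the edge cases can be covered exhaustively instead of by hand-picked samples. The following is a sketch of that idea using a hypothetical generic model `rotate_a`, not the emulator's handlers. It checks that each right rotation undoes the matching left rotation for every possible state.

```python
def rotate_a(a: int, carry: int, *, left: bool, through_carry: bool) -> tuple[int, int]:
    """Hypothetical model of RLCA/RRCA/RLA/RRA; returns (new A, new carry)."""
    if left:
        out = (a >> 7) & 1
        incoming = carry if through_carry else out
        result = ((a << 1) | incoming) & 0xFF
    else:
        out = a & 1
        incoming = carry if through_carry else out
        result = (a >> 1) | (incoming << 7)
    return result, out


def check_all_states() -> int:
    """Rotate left then right for every (A, carry, kind) and count the checks."""
    checked = 0
    for a in range(0x100):
        for carry in (0, 1):
            for through_carry in (False, True):
                r, c = rotate_a(a, carry, left=True, through_carry=through_carry)
                assert 0 <= r <= 0xFF  # the result never leaves 8 bits
                back, c_back = rotate_a(r, c, left=False, through_carry=through_carry)
                assert back == a       # the right rotation is the inverse
                if through_carry:
                    assert c_back == carry  # the 9th bit is restored too
                checked += 1
    return checked


total = check_all_states()  # 256 values * 2 carries * 2 kinds of rotation
```

The same loop could drive the real CPU handlers in a pytest test, with the model acting as the expected value.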
Hypotheses and Assumptions
We assume that the behavior of flags in fast rotations (Z always 0) is consistent across Game Boy hardware. This assumption is supported by Pan Docs, but we have not verified it against real hardware or other emulators (under the clean-room rule, we cannot consult the code of other emulators).
Next Steps
- [ ] Implement more CB prefix opcodes (rotations, shifts, BIT, SET, RES)
- [ ] Implement conditional CALLs (CALL NZ, CALL Z, CALL NC, CALL C)
- [ ] Implement conditional JPs (JP NZ, JP Z, JP NC, JP C)
- [ ] Verify that the emulator can execute more Tetris DX code (goal: exceed 100,000 cycles)
- [ ] Implement complete interrupt system (IF, IE, interrupt handling)