⚠️ Clean-Room / Educational

This project is educational and open source. No code is copied from other emulators. The implementation is based solely on technical documentation and permitted tests.

Step 0251: DMA Implementation (OAM Transfer)

Date: 2025-12-23  Step ID: 0251  State: draft

Summary

This step implements the DMA (Direct Memory Access) transfer that copies data to OAM (Object Attribute Memory). When a game writes a value to register 0xFF46, the hardware automatically copies 160 bytes from address XX00 (where XX is the written value) to OAM (0xFE00-0xFE9F). This functionality is critical for games to update sprites, and many games (such as Tetris) depend on it for their boot sequence.

Hardware Concept

The Game Boy includes a DMA (Direct Memory Access) mechanism that copies data to OAM without direct CPU intervention. This mechanism is essential because the CPU can access OAM only during certain periods of the PPU rendering cycle.

DMA operation:

  • DMA register (0xFF46): Writing a value XX to this register starts a DMA transfer.
  • Source address: The written value forms the high byte of the source address, XX00 (e.g. writing 0xC0 → source at 0xC000).
  • Destination: Always OAM, at 0xFE00-0xFE9F (160 bytes, 40 sprites × 4 bytes).
  • Duration: On real hardware, the transfer takes approximately 160 microseconds (640 CPU clock cycles).
  • Restriction during DMA: During the transfer, the CPU can only access HRAM (0xFF80-0xFFFE). Trying to access other memory regions during DMA can cause unpredictable behavior.
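
The address rules in the list above can be sketched as a few small helpers. This is only an illustration: the constant and function names (OAM_START, dma_source_base, is_hram) are hypothetical and do not come from the project's MMU.

```cpp
#include <cstdint>

// Hypothetical names for illustration; the project's MMU may organize this differently.
constexpr uint16_t OAM_START  = 0xFE00;
constexpr uint16_t OAM_SIZE   = 0xA0;    // 160 bytes = 40 sprites x 4 bytes
constexpr uint16_t HRAM_START = 0xFF80;
constexpr uint16_t HRAM_END   = 0xFFFE;

// The value written to 0xFF46 becomes the high byte of the source address.
constexpr uint16_t dma_source_base(uint8_t value) {
    return static_cast<uint16_t>(value << 8);
}

// True if the address lies in HRAM, the region the CPU can still use during DMA.
constexpr bool is_hram(uint16_t addr) {
    return addr >= HRAM_START && addr <= HRAM_END;
}

// Compile-time checks of the figures quoted in the list.
static_assert(dma_source_base(0xC0) == 0xC000, "writing 0xC0 selects 0xC000");
static_assert(OAM_START + OAM_SIZE - 1 == 0xFE9F, "OAM ends at 0xFE9F");
```

The static_assert lines only restate the numbers from the list, so a typo in a constant would fail at compile time.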

Why is it important?

Many games use DMA to copy sprite data to OAM because it is faster and more efficient than copying byte by byte with the CPU. Additionally, some games (such as Tetris) use DMA as part of their initialization sequence or as a synchronization mechanism. If DMA is not implemented, these games can get stuck in infinite loops waiting for the transfer to complete.

Source: Pan Docs - "DMA Transfer"

Implementation

The DMA transfer is implemented in the write() method of MMU.cpp. When a write to 0xFF46 occurs, the source address is calculated and 160 bytes are copied to OAM.

Modified components

  • src/core/cpp/MMU.cpp: Added DMA transfer logic in the write() method.

Design decisions

Instant DMA: For simplicity, we implement an instantaneous copy of the 160 bytes. On real hardware, the transfer takes ~640 cycles, but for this first implementation we assume that the copy is immediate. A more precise implementation would require:

  • Count cycles during transfer (640 cycles).
  • Block memory access (except HRAM) during the transfer.
  • Synchronize with the PPU render cycle.

Address Validation: The code validates that the destination address (0xFE00 + i) is within memory limits before writing. This prevents out-of-range access.

Use of read(): The MMU's read() method is used to read from the source address, ensuring that all memory mapping rules are respected (e.g. Echo RAM, special registers). This is important because the DMA can copy from any memory region (ROM, RAM, VRAM, etc.).

Implemented code

// --- Step 0251: DMA IMPLEMENTATION (OAM TRANSFER) ---
if (addr == 0xFF46) {
    // 1. Calculate source address: value * 0x100
    uint16_t source_base = static_cast<uint16_t>(value) << 8;
    
    // 2. Copy 160 bytes (0xA0) to OAM (0xFE00-0xFE9F)
    for (int i = 0; i < 160; i++) {
        uint16_t source_addr = source_base + i;
        uint8_t data = read(source_addr);
        if ((0xFE00 + i) < MEMORY_SIZE) {
            memory_[0xFE00 + i] = data;
        }
    }
    
    printf("[DMA] Transfer completed: %04X -> FE00 (160 bytes)\n", source_base);
}

Affected Files

  • src/core/cpp/MMU.cpp - Added DMA transfer logic in the write() method (lines 302-323).

Tests and Verification

The implementation will be validated by running games that depend on DMA:

  • Tetris: The log for Step 0250 showed a DMA attempt (Write DMA[FF46] = 00). With this implementation, Tetris should be able to complete its boot sequence.
  • Mario Deluxe and Pokémon Red: These games already show graphical activity, but DMA should allow the sprites to render correctly.
  • DMA log: The code includes a log message ([DMA] Transfer completed...) which will confirm when a DMA transfer is executed.

Test command:

python main.py roms/tetris.gb

Expected validation:

  • See the message [DMA] Transfer completed... on the console.
  • Tetris exits the infinite loop (the PC changes from 0x2Bxx).
  • The sprites (pieces) appear on the screen or the logo is displayed correctly.
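
Besides running a ROM, the copy itself could be checked in isolation. The sketch below is a hypothetical stand-in, not the project's MMU: MiniMMU and dma_copies_160_bytes are invented names, and the class keeps only the instant-copy behavior described in this step.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the project's MMU, reduced to what the DMA path needs.
class MiniMMU {
public:
    MiniMMU() : memory_(0x10000, 0) {}

    uint8_t read(uint16_t addr) const { return memory_[addr]; }

    void write(uint16_t addr, uint8_t value) {
        memory_[addr] = value;
        if (addr == 0xFF46) {
            // Instant DMA: copy 160 bytes from XX00 to OAM.
            const uint16_t base = static_cast<uint16_t>(value << 8);
            for (uint16_t i = 0; i < 0xA0; ++i) {
                memory_[0xFE00 + i] = read(static_cast<uint16_t>(base + i));
            }
        }
    }

private:
    std::vector<uint8_t> memory_;
};

// Returns true if a DMA from 0xC000 reproduces the source pattern in OAM
// and leaves the byte just past OAM untouched.
bool dma_copies_160_bytes() {
    MiniMMU mmu;
    for (uint16_t i = 0; i < 0xA0; ++i) {
        mmu.write(static_cast<uint16_t>(0xC000 + i), static_cast<uint8_t>(i ^ 0x5A));
    }
    mmu.write(0xFEA0, 0xEE);   // sentinel immediately after OAM
    mmu.write(0xFF46, 0xC0);   // trigger the transfer
    for (uint16_t i = 0; i < 0xA0; ++i) {
        if (mmu.read(static_cast<uint16_t>(0xFE00 + i)) != static_cast<uint8_t>(i ^ 0x5A)) {
            return false;
        }
    }
    return mmu.read(0xFEA0) == 0xEE;
}
```

A check like this catches off-by-one errors in the 160-byte loop without depending on a game's boot sequence.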

Sources consulted

  • Pan Docs - "DMA Transfer"

Educational Integrity

What I Understand Now

  • DMA as a fast copy mechanism: DMA allows memory blocks to be copied without direct CPU intervention, which is more efficient than copying byte by byte.
  • Restrictions during DMA: In real hardware, during DMA transfer, the CPU can only access HRAM. This restriction is not implemented in this initial version, but is important for a precise emulation.
  • Using DMA in games: Games use DMA not only to copy sprites, but also as a synchronization mechanism or as part of their initialization sequence.

What remains to be confirmed

  • Precise timing: The current implementation is instantaneous, but on real hardware it takes 640 cycles. We need to verify if the games depend on this timing to work correctly.
  • Memory lock: We do not implement memory access blocking (except HRAM) during DMA. Some games may depend on this behavior.
  • Interaction with PPU: The OAM is only accessible during certain periods of the render cycle. We need to check if the DMA respects these periods or if it can write at any time.

Hypotheses and Assumptions

Instant DMA: We assume that an instantaneous copy is enough for the games to work. If some games crash, it may be necessary to implement precise timing of 640 cycles.

Memory Access during DMA: For now, we allow the CPU to access any memory region during the DMA. If we encounter problems, we will need to implement access blocking (except HRAM).
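
If access blocking does turn out to be needed, one possible shape is a small guard consulted on every CPU read. This is a hypothetical sketch following the simplified rule stated in this document (HRAM only): DmaBusGuard and its members are invented names, and returning 0xFF for a blocked read is an assumed placeholder, not verified hardware behavior.

```cpp
#include <cstdint>

// Hypothetical guard; `dma_active` would be set by whatever code tracks the transfer.
struct DmaBusGuard {
    bool dma_active = false;

    // True if the CPU may access `addr` right now.
    bool cpu_can_access(uint16_t addr) const {
        if (!dma_active) return true;
        return addr >= 0xFF80 && addr <= 0xFFFE;   // HRAM only while DMA runs
    }

    // CPU-side read: a blocked read returns 0xFF (assumed placeholder value).
    template <typename ReadFn>
    uint8_t cpu_read(uint16_t addr, ReadFn raw_read) const {
        return cpu_can_access(addr) ? raw_read(addr) : 0xFF;
    }
};
```

Keeping the guard separate from the raw read path means the DMA engine itself can keep reading from any region while only CPU accesses are restricted.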

Next Steps

  • [ ] Run Tetris and check whether it exits the infinite loop.
  • [ ] Verify that sprites appear correctly in Mario and Pokémon.
  • [ ] If necessary, implement precise DMA timing (640 cycles).
  • [ ] If necessary, implement memory access blocking during DMA (except HRAM).
  • [ ] Investigate if there are other games that critically depend on DMA for their startup.