I'm just guessing here, but it looks like the two processors share SRAM and the beefy ARM processor draws the scene and writes it as tiles to SRAM. The Z80 reads the tiles from SRAM and blits them to the screen.
Basically you have an ARM processor doing a whole lot of work, and a Z80 in charge of moving it around and drawing supporting UI.
Yes, that is essentially what I'm doing.
The Arm does most of the heavy lifting for the actual game and the Z80 does input, sound, HUD, palette fading, hands+gun, main game loop, and of course it spends a lot of time just shuffling data to vram.
RAM is limited so the KE04 internally renders to a 2 bitplane framebuffer (the bit-banding feature of KE04 greatly helps).
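To make the 2 bitplane idea concrete, here is a minimal sketch of a planar framebuffer with a pixel setter. The resolution (128x96), the names `fb`, `fb_put` and `fb_get`, and the bit order (leftmost pixel in bit 7) are all assumptions for illustration, not details from the actual project. The read-modify-write is done in software here; on the real chip the bit-banding mentioned above could presumably replace the mask logic with a single store to an alias address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed render size; the real project's resolution is not stated. */
enum { FB_W = 128, FB_H = 96, PLANE_BYTES = FB_W * FB_H / 8 };

/* Two 1-bit planes: plane 0 holds the low colour bit, plane 1 the high bit. */
static uint8_t fb[2][PLANE_BYTES];

/* Write a 2-bit colour (0..3) at pixel (x, y). */
static void fb_put(unsigned x, unsigned y, unsigned color)
{
    assert(x < FB_W && y < FB_H && color < 4);
    unsigned idx = (y * FB_W + x) / 8;            /* byte holding this pixel */
    uint8_t mask = (uint8_t)(0x80u >> (x % 8));   /* leftmost pixel = bit 7 */
    for (unsigned p = 0; p < 2; p++) {
        if ((color >> p) & 1u)
            fb[p][idx] |= mask;
        else
            fb[p][idx] &= (uint8_t)~mask;
    }
}

/* Read back the 2-bit colour at pixel (x, y). */
static unsigned fb_get(unsigned x, unsigned y)
{
    assert(x < FB_W && y < FB_H);
    unsigned idx = (y * FB_W + x) / 8;
    unsigned bit = 7u - (x % 8);
    return ((fb[0][idx] >> bit) & 1u) | (((fb[1][idx] >> bit) & 1u) << 1);
}
```

At this assumed size the two planes take 3072 bytes in total, which fits comfortably in a 16 KB RAM budget.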
When the Z80 needs the next frame it triggers an interrupt on the KE04 to wake it from sleep mode. The KE04 then converts the framebuffer into GB vram ready tile + map attribute data and stores it on the dp-sram.
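A rough sketch of what the framebuffer-to-tile step might look like. Game Boy tiles are 8x8 pixels and 16 bytes each: every pixel row is stored as two bytes, the low bitplane byte followed by the high bitplane byte, with the leftmost pixel in bit 7. If the framebuffer is kept as two row-major 1-bit planes with the same bit order (an assumption, as is the function name `planes_to_tiles`), the conversion is just a byte interleave. Map attribute data is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert two row-major 1-bit planes (lo = low colour bit, hi = high colour
 * bit) of size w x h pixels into Game Boy tile data. Tiles are emitted left
 * to right, top to bottom. `out` must hold (w / 8) * (h / 8) * 16 bytes.
 */
static void planes_to_tiles(const uint8_t *lo, const uint8_t *hi,
                            unsigned w, unsigned h, uint8_t *out)
{
    assert(w % 8 == 0 && h % 8 == 0);
    unsigned stride = w / 8;                      /* bytes per pixel row */
    for (unsigned ty = 0; ty < h / 8; ty++) {
        for (unsigned tx = 0; tx < stride; tx++) {
            for (unsigned row = 0; row < 8; row++) {
                unsigned src = (ty * 8 + row) * stride + tx;
                *out++ = lo[src];                 /* low bitplane byte first */
                *out++ = hi[src];                 /* then high bitplane byte */
            }
        }
    }
}
```

This is one reason a planar framebuffer is attractive here: no per-pixel bit shuffling is needed when producing the tile bytes.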
Z80 can then DMA directly from dp-sram into vram.
Some ranges in the dp-sram are for command buffers used for Z80<->KE04 communication.
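The actual command buffer layout isn't described, but a single-slot mailbox is one simple way such a shared-memory channel could work. Everything in this sketch (`mailbox_t`, `mbox_post`, `mbox_take`, the 13-byte payload) is hypothetical. The sender writes the payload first and sets the `ready` flag last; the receiver clears the flag once it has copied the command out. On real hardware the struct would sit at a fixed dp-sram address and ordering or simultaneous-access rules of the part would need checking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte mailbox living in shared memory. */
typedef struct {
    volatile uint8_t ready;        /* 0 = slot free, 1 = command pending */
    volatile uint8_t opcode;
    volatile uint8_t len;
    volatile uint8_t payload[13];
} mailbox_t;

/* Sender side: returns 1 on success, 0 if the slot is busy or len too big. */
static int mbox_post(mailbox_t *m, uint8_t opcode,
                     const uint8_t *data, uint8_t len)
{
    assert(m != 0);
    if (m->ready || len > sizeof m->payload)
        return 0;
    for (uint8_t i = 0; i < len; i++)
        m->payload[i] = data[i];
    m->opcode = opcode;
    m->len = len;
    m->ready = 1;                  /* publish last, after the data is in place */
    return 1;
}

/* Receiver side: returns 1 and fills the outputs if a command was pending. */
static int mbox_take(mailbox_t *m, uint8_t *opcode,
                     uint8_t *data, uint8_t *len)
{
    assert(m != 0);
    if (!m->ready)
        return 0;
    *opcode = m->opcode;
    *len = m->len;
    for (uint8_t i = 0; i < *len; i++)
        data[i] = m->payload[i];
    m->ready = 0;                  /* hand the slot back to the sender */
    return 1;
}
```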
The KE04 I am using has 128 KB ROM and 16 KB RAM, and lacks hardware division. This presents some interesting challenges in terms of memory and ROM space usage, and juggling speed vs RAM/ROM usage.
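For a sense of what "no hardware division" costs, here is a generic shift-and-subtract unsigned divide, the textbook technique a software fallback can use. This is not the author's code, and `udiv32` is a made-up name; it only shows that each divide becomes a 32-iteration loop, which is why a renderer on such a chip would tend to avoid divides or trade ROM for precomputed tables.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Restoring (shift-and-subtract) division: returns n / d and, if `rem` is
 * non-null, stores n % d there. One quotient bit is produced per iteration.
 */
static uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit of n */
        if (r >= d) {                     /* divisor fits: subtract, set bit */
            r -= d;
            q |= (uint32_t)1u << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```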
You could of course put something much beefier in there but I think that would take too much of the fun away from the project.
All in all it's great fun and I'm learning a lot as I go along.
If I were to make another hardware revision I would use a CPLD instead of the MBC1 chip, and try to lose the dp-sram in favour of a normal sram.
They're utilizing a dual-port SRAM, meaning that the co-processor can read and write to the RAM at the same time as the Gameboy CPU can read and write to it. Those pins along the cartridge edge are actually just the address and data lines of the Gameboy CPU.
They've written a program for the Gameboy CPU whose job is to DMA data from the RAM to video RAM (it's a bit more complicated because the Gameboy GPU isn't designed to have video streamed to it).
The game itself is running on the ARM co-processor, writing data to a known location in the DP-SRAM and the Gameboy CPU is streaming it to the display.
That's very similar to what people are doing with the BeagleBone Black: there's a pair of PRUs in the AM3XXX processor that have direct access to memory. So you do the hard work on the ARM, but let the PRUs push pixels (or other data) that need to be real-time, jitter-free, etc.
That's basically right. He does some fancy stuff like DMA from the cartridge to VRAM to make it fast enough, but basically he just copies each frame into memory on the CGB and then swaps the background being displayed to make it appear. It takes two V-Blanks to copy an entire frame, so it runs at about 30 FPS (while the CGB itself refreshes at about 60 FPS).
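The two-V-Blanks-per-frame arithmetic can be sketched as a tiny state machine. How the copy is really split between V-Blanks isn't stated, so `presenter_t`, `on_vblank` and the even split are assumptions; the point is only that the visible buffer flips on every second V-Blank, which halves the frame rate.

```c
#include <assert.h>
#include <stdint.h>

enum { VBLANKS_PER_FRAME = 2 };   /* from the description above */

typedef struct {
    unsigned phase;        /* which part of the frame gets copied next */
    unsigned shown;        /* index (0 or 1) of the buffer on screen */
    unsigned frames_done;  /* number of complete frames presented */
} presenter_t;

/* Call once per V-Blank. Returns 1 when a new frame just became visible. */
static int on_vblank(presenter_t *p)
{
    assert(p->shown < 2 && p->phase < VBLANKS_PER_FRAME);
    /* Here the real code would copy part `phase` into buffer (shown ^ 1). */
    if (++p->phase < VBLANKS_PER_FRAME)
        return 0;                  /* frame only partly copied so far */
    p->phase = 0;
    p->shown ^= 1u;                /* flip to the freshly filled buffer */
    p->frames_done++;
    return 1;
}
```

Feeding it 60 V-Blanks yields 30 presented frames, matching the roughly 30 FPS figure.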
I've invited the author to this thread since it seems several people are guessing/assuming what he is doing, plus I'm sure they'd like to know the great reaction they got here :)
That kind of trick was very common back in the day.
The Amiga used a very similar setup between the chipset and the CPU.
And cartridge based consoles have often included coprocessors on the carts (but nothing as potent as this ARM). At the tail end of the SNES years there was even a simple "GPU" in some of its carts.
I always wondered how all those 3D polygon games worked on the Sega Genesis tile engine too. I guess the games just rendered a screen buffer in RAM and created tiles on the fly.
Most games had their graphics tiles stored in ROM, but some games had 8KB of RAM instead of bank-switched ROM. Elite is in that category. So is Legend of Zelda, although Zelda's tiles seem to be stored verbatim in the program ROM, while Elite's must be algorithmically generated.