> Since threads are processed in parallel, tiny-gpu assumes that all threads "converge" to the same program counter after each instruction - which is a naive assumption for the sake of simplicity.
> In real GPUs, individual threads can branch to different PCs, causing branch divergence where a group of threads initially being processed together has to split out into separate execution.
Whoops. Maybe this person should try programming for a GPU before attempting to build one out of silicon.
Not to mention the whole SIMD that... isn't.
(This is the same person who stapled together other people's circuits to blink an LED and claimed to have built a CPU)
No, that effectively syncs all warps in a thread group. This implementation isn't doing any synchronization: it does PC/decode independently per thread and just assumes the threads won't diverge. That's... a baffling combination of decisions; why give each thread its own PC and decode if they're never going to diverge? It reads as a basic failure to understand the fundamental value of a GPU. And this isn't secret GPU architecture lore. Here's a slide deck from 2009 going over the actual high-level architecture of a GPU. Notice how fetch/decode are shared between threads.
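For anyone who hasn't run into it: divergence isn't exotic, a single data- or lane-dependent branch in a kernel triggers it. A minimal sketch in CUDA (kernel name and the even/odd split are mine, purely illustrative, nothing to do with tiny-gpu):

```cuda
// Minimal example of warp divergence.
__global__ void divergent_kernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Adjacent lanes of the same warp take different paths here. Because the
    // warp shares one fetch/decode unit, the hardware serializes the two
    // paths and masks off the inactive lanes; it cannot just assume every
    // thread "converges" to the same PC after each instruction.
    if (threadIdx.x % 2 == 0) {
        out[i] = i * 2;   // executed first, odd lanes masked off
    } else {
        out[i] = i + 1;   // executed second, even lanes masked off
    }
}
```

That masking/serialization is exactly the case the shared fetch/decode in those slides has to handle, and exactly the case tiny-gpu waves away.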