You might want alignment there as well.


Both alignment and unalignment actually.

GPUs are weird. Using prime numbers to 'unalign' data so that you minimize bank conflicts is a common optimization trick.

GPUs don't have one memory load/store unit. They are incredibly parallel and have like 32 load/store units that try to operate in parallel.

If all your data is aligned, then bank#0 gets more requests than bank#31. (Thread#0 accesses memory 800. Thread#1 accesses memory 832. Thread#2 accesses 864... Whoops, you just hammered one bank, and now 31 of your memory banks are sitting around doing nothing while bank#0 does all the work sequentially.)

Unalignment means more reads/writes are sent to bank#31, and fewer to bank#0, better balancing the load across your parallel load/store units.
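
If you want to see the effect without a GPU, you can model the bank mapping on the CPU. This is only a rough sketch under some assumptions: 32 banks, each one 4-byte word wide, and the addresses above (800, 832, 864...) read as word indices. Real hardware may map addresses differently. `bank_of` and `bank_histogram` are made-up helper names, not part of any GPU API.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, each one 4-byte word wide


def bank_of(word_index):
    """Bank that serves a given word index in this simplified model."""
    return word_index % NUM_BANKS


def bank_histogram(base, stride, threads=32):
    """Count how many of the threads' requests land on each bank."""
    return Counter(bank_of(base + t * stride) for t in range(threads))


# Aligned: thread t reads word 800 + 32*t, so every request hits one bank.
aligned = bank_histogram(base=800, stride=32)

# 'Unaligned': pad the stride by one word so it no longer divides evenly.
unaligned = bank_histogram(base=800, stride=33)

print(max(aligned.values()))    # 32 requests queued on a single bank
print(max(unaligned.values()))  # 1 request per bank
```

In this model any odd stride does the job, since it shares no factor with 32. Odd primes are just an easy way to guarantee that.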


Here be dragons, and this person tames them. This is insane, actually. I’m guessing it would be cool if the GPU automagically scrambled memory so you didn't have to manually unalign it?


When you're doing "uint32_t array[thread_idx.x]" sorts of things, you'll notice that your threads are all lined up with the array. So you're in perfect bank-alignment.

With "array[thread_idx.x]" kind of access, Thread#0 accesses array[0], Thread#1 accesses array[1]... etc. etc.

array[0] might map to memory location #0x8001200, which will probably be bank#0. array[1] might map to #0x8001204, which would be bank#1. Etc. etc. (I forget exactly how many bytes per bank, but... you get the gist).

At the end of the day, all your array[] accesses from Thread#0 through Thread#1023 of your workgroup/block will be perfectly balanced and perfectly spread out between all banks.
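
Here is the same claim as a small CPU-side model, in case it helps. It assumes 4 bytes per bank and 32 banks (I said above I don't remember the exact width, so treat both numbers as guesses), and it reuses the example address 0x8001200 as the array base. The helper names are hypothetical.

```python
BANK_WIDTH_BYTES = 4   # assumption: each bank is one 4-byte word wide
NUM_BANKS = 32         # assumption: 32 banks
BASE = 0x8001200       # example address of array[0] from the text


def bank_of_address(byte_address):
    """Bank that serves a byte address in this simplified model."""
    return (byte_address // BANK_WIDTH_BYTES) % NUM_BANKS


def element_address(thread_idx, element_size=4):
    """Byte address of array[thread_idx] for a uint32_t-sized element."""
    return BASE + thread_idx * element_size


# One access per thread for a 1024-thread workgroup/block.
banks = [bank_of_address(element_address(t)) for t in range(1024)]
per_bank = [banks.count(b) for b in range(NUM_BANKS)]

print(banks[:4])                       # [0, 1, 2, 3]
print(per_bank == [32] * NUM_BANKS)    # True: every bank gets 32 accesses
```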

--------

So really, the "lesson" is to just organize your data in arrays as much as possible. GPUs are really, really good at simple array reads/writes.

That's not always possible of course. You should only "shuffle" the banks if you know for certain that one bank is going to be hit more than the other banks.

--------

It really comes down to the size of the object you made an array out of. If you have a large object for some reason, maybe array[0], array[1], array[2], etc. will all map to bank#0.
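
To make the object-size point concrete, here is a sketch that counts how many distinct banks 32 threads touch when each one reads the first word of its own array element. Same caveats as before: 32 banks of 4 bytes each is an assumption, element sizes are assumed to be multiples of 4 bytes, and `banks_touched` is a made-up name.

```python
NUM_BANKS = 32         # assumption: 32 banks
BANK_WIDTH_BYTES = 4   # assumption: each bank is one 4-byte word wide


def banks_touched(element_size_bytes, threads=32):
    """Distinct banks hit when thread t reads the first word of array[t]."""
    words_per_element = element_size_bytes // BANK_WIDTH_BYTES
    return len({(t * words_per_element) % NUM_BANKS for t in range(threads)})


for size in (4, 8, 64, 128, 132):
    print(size, banks_touched(size))
# 4 32    -> plain uint32_t array, perfectly spread out
# 8 16
# 64 2
# 128 1   -> every element starts in the same bank
# 132 32  -> one word of padding spreads the load again
```

The 132-byte row is the "unalign" trick: pad the 128-byte object by one word and the accesses fan out over all banks again, at the cost of some wasted memory.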



