
Why is (1025, 1025, 1025) so much faster than (1024, 1024, 1024)?


My guess is that it's mostly due to cache conflicts. With 1024 and a simplified 32 KB L1, you can fit exactly 8 rows of the inner dimension in the cache (assuming 4-byte elements), which means that (0, 8, 0) would map to the same cache location as (0, 0, 0), which is bad for tiling.
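To make the aliasing concrete, here is a rough sketch of that simplified cache model. It assumes a direct-mapped 32 KB cache, 64-byte cache lines, 4-byte elements, and a row-major array starting at offset 0. None of that is stated in the thread beyond the 32 KB figure, and real L1 caches are set-associative, so treat the numbers as illustrative only. The helper names are made up for this example.

```python
CACHE_BYTES = 32 * 1024  # simplified L1 size from the comment above
LINE_BYTES = 64          # assumed cache-line size
ELEM_BYTES = 4           # assumed element size (e.g. float32)


def cache_slot(index, shape):
    """Line slot that element `index` of a row-major array maps to
    in a direct-mapped cache (hypothetical simplified model)."""
    i, j, k = index
    _, n1, n2 = shape
    byte_offset = ((i * n1 + j) * n2 + k) * ELEM_BYTES
    return (byte_offset % CACHE_BYTES) // LINE_BYTES


def distinct_slots(shape, rows=512):
    """Count the distinct slots hit by the first element of `rows`
    consecutive rows, i.e. (0, 0, 0), (0, 1, 0), (0, 2, 0), ..."""
    return len({cache_slot((0, j, 0), shape) for j in range(rows)})


if __name__ == "__main__":
    for n in (1024, 1025):
        shape = (n, n, n)
        print(n, distinct_slots(shape))
```

Under these assumptions, walking down 512 rows touches only 8 distinct slots with 1024 (every 8th row lands on the same slot), versus 256 with 1025, because the extra 4 bytes per row gradually shift each row's start across cache lines.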



