Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I believe this happens because branch prediction lets you pipeline multiple simultaneous comparisons, and rewind whenever the branch predictor is wrong (about half the time for truly random data and inputs). The CMOV approach blocks after each comparison function due to the data dependency. On average, you're doing two comparisons at a time with branches, and one with CMOV, so when comparison time is greater than branch prediction penalty, you would expect a crossover to occur.


Switching to an N-way search (for N>2) could help extract the lost parallelism. At the cost of touching more cachelines though, so it is not necessarily a win.


You're also wasting a lot of work doing that. If you go two levels deep, you now do 2 comparisons per iteration (caching the result from the previous level), but you are guaranteed to waste one of those. The CPU with branch predictor is doing about 1.5 comparisons per iteration (worst case) thanks to the 50% mispredict rate, although this depends a lot on how long the comparison takes: a very long comparison function will converge to 1 comparison per iteration. Only if you replicate the speculation for yourself - which will necessarily have to be a lot worse than a CPU branch predictor to be fast and branchless - do you have any hope of an efficiency gain.


You do additional comparisons at each recursion depth, but you have a shallower recursion depth though. I think you break-even already at 4-way search.


I think that only helps if you assume exactly 1.5 comparisons per level with a branch predictor, and I think that will be hard to hit: you basically need the comparison function to be very short to have that (shorter than the mispredict penalty). Then, you are doing 3 comparisons per two levels vs 3 comparisons for 2 levels. If you were to extend this logic to do 3 levels at a time, you would be doing 7 comparisons to do the same amount of work that the branch predictor does in expected <4.5 comparisons.


Ok, I have been nerd sniped. I need to get near a compiler and a profiler ASAP.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: