Yeah. When your timing results are a single digit multiple of your timing precis... | Hacker News


gizmo686 80 days ago | on: The Weird Concept of Branchless Programming

Yeah. When your timing results are a single-digit multiple of your timing precision, that is a good indication that you need either a longer test or a more precise clock. At a 5 ms baseline with millisecond precision, the smallest change you can measure is 20% (1 ms / 5 ms). And you cannot distinguish a 20% speedup from a 20% slowdown that happened to get lucky with clock ticks.

For what it's worth, I ran the provided test code on my machine with 100x the iterations and got the following:

```
== Benchmarking ABS ==
ABS (branch): 0.260 sec
ABS (branchless): 0.264 sec
== Benchmarking CLAMP ==
CLAMP (branch): 0.332 sec
CLAMP (branchless): 0.538 sec
== Benchmarking PARTITION ==
PARTITION (branch): 0.043 sec
PARTITION (branchless): 0.091 sec
```

That is not exactly encouraging (gcc 13.3.0, -ffast-math -march=native; I did not use the -fomit-this-entire-function flag, which my compiler does not understand). I had to drop down to -O0 to see the branchless version win in any case, and even then only for CLAMP:

```
== Benchmarking ABS ==
ABS (branch): 0.743 sec
ABS (branchless): 0.948 sec
== Benchmarking CLAMP ==
CLAMP (branch): 4.275 sec
CLAMP (branchless): 1.429 sec
== Benchmarking PARTITION ==
PARTITION (branch): 0.156 sec
PARTITION (branchless): 0.164 sec
```

Roxxik 80 days ago

I also tried it myself, on different array sizes and with more iterations. The branchy version is not strictly worse.

https://gist.github.com/Stefan-JLU/3925c6a73836ce841860b55c8...

Someone 79 days ago

> I had to drop down to O0 to see branchless be faster in any case

Did you check whether your branchy code was actually still branchy after the compiler processed it at higher optimization levels?
