I don't get it. The problem with binary search and branches is not the branches themselves; it's that until you have done the comparison, you don't know which memory location in the array to fetch next. It doesn't matter whether you use branches or anything else; the question is what you want the processor to do.
There is a data dependency: until I have read the middle element, I can't tell whether I need to search the upper half or the lower half. I could speculate and issue reads to both. That would break the dependency, but create more memory traffic. Is that the right trade-off?
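One way to express that speculation, as a sketch: prefetch the midpoints of both possible next halves before the current comparison resolves. This assumes GCC/Clang's `__builtin_prefetch`; the function name and semantics (index of the last element `<= key`, assuming `key >= a[0]`) are mine, not from the discussion, and whether the extra traffic pays off depends entirely on the machine.

```c
#include <stddef.h>

/* Sketch: binary search that prefetches the midpoints of BOTH possible
 * next halves before the comparison resolves. This hides the data
 * dependency at the cost of extra memory traffic; it may or may not win. */
size_t search_prefetch(const long *a, size_t n, long key)
{
    size_t lo = 0, hi = n;               /* half-open range [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        /* Issue loads for both possible next midpoints now, so whichever
         * one we actually need is (hopefully) in cache when we get there. */
        __builtin_prefetch(&a[lo + (mid - lo) / 2]);   /* lower half */
        __builtin_prefetch(&a[mid + (hi - mid) / 2]);  /* upper half */
        if (a[mid] <= key)
            lo = mid;
        else
            hi = mid;
    }
    return lo;   /* index of last element <= key (given key >= a[0]) */
}
```

Note that `__builtin_prefetch` is only a hint, so this version is correct even when the prefetched addresses turn out to be the wrong guess; the cost is purely the wasted bandwidth the comment above worries about.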
Binary search shrinks the search space exponentially as it proceeds, so quite a lot of the total comparisons can actually hit L1d cache: the top levels of the search touch only a handful of distinct elements, which stay hot across repeated searches. (Maybe half of the comparisons for a ~250GB dataset.)
Of course, you could also keep a small, cacheable partial index of your huge dataset to accelerate the early part of the search.
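A minimal sketch of what such a partial index could look like: sample every STEP-th key into a tiny top-level array that fits in L1d, search that first, then binary-search only the narrowed slab of the big dataset. Everything here (the `partial_index` struct, the helper names, the toy STEP value) is a hypothetical illustration, not an established design.

```c
#include <stddef.h>

/* Toy value for the sketch; in practice pick STEP so that the
 * n/STEP samples fit comfortably in L1d. */
#define STEP 4

typedef struct {
    const long *samples;   /* samples[i] == data[i * STEP] */
    size_t nsamples;
} partial_index;

/* Last index in [0, n) with a[i] <= key, or 0 if none. */
static size_t bsearch_le(const long *a, size_t n, long key)
{
    size_t lo = 0, hi = n;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= key) lo = mid; else hi = mid;
    }
    return lo;
}

size_t indexed_search(const partial_index *idx,
                      const long *data, size_t n, long key)
{
    /* Phase 1: cheap search over the cache-resident samples. */
    size_t s = bsearch_le(idx->samples, idx->nsamples, key);
    /* Phase 2: search only the slab that sample points into. */
    size_t lo = s * STEP;
    size_t hi = (lo + STEP + 1 < n) ? lo + STEP + 1 : n;
    return lo + bsearch_le(data + lo, hi - lo, key);
}
```

The point is that phase 1 replaces the coldest, most cache-hostile levels of the big search with lookups in an array that stays resident, while phase 2 does the remaining levels over a slab of only STEP elements.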
What do you want? Removing branches is not it.