Nothing in that link shows how a compiler could make use of a fine-grained probability estimate in a practical way to guide optimizations. I'm perfectly aware of the general concept of branch prediction and the annotations that certain architectures have in their instruction sets.
TL;DR: you can generate different, more efficient code if you know how often a branch is going to be mispredicted.
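To make that concrete, here is a rough sketch of the kind of decision a fine-grained estimate enables: picking between a branchy loop and a branchless (select/mask) one. Everything here is an assumption for illustration. The `cost_model` struct, the cycle counts you would feed it, and the "predictor always guesses the likelier direction" rule are hypothetical and not taken from any real compiler or CPU.

```c
#include <stddef.h>

/* Hypothetical cost model. All fields are illustrative assumptions,
   not measurements of any real CPU. */
typedef struct {
    double mispredict_penalty; /* cycles lost per mispredicted branch */
    double branchy_cost;       /* cycles per iteration when predicted correctly */
    double branchless_cost;    /* cycles per iteration for the select/mask form */
} cost_model;

/* Assumption: the predictor always guesses the more likely direction,
   so the misprediction rate is min(p, 1 - p). */
double mispredict_rate(double p_taken) {
    return p_taken < 0.5 ? p_taken : 1.0 - p_taken;
}

/* Returns 1 if the branchless form has the lower expected cost. */
int prefer_branchless(const cost_model *m, double p_taken) {
    double branchy = m->branchy_cost
                   + mispredict_rate(p_taken) * m->mispredict_penalty;
    return m->branchless_cost < branchy;
}

/* Shape 1: conditional branch. Cheap when the branch is predictable. */
long sum_above_branchy(const int *a, size_t n, int t) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > t) {
            s += a[i];
        }
    }
    return s;
}

/* Shape 2: no data-dependent branch. Always does the work, never mispredicts. */
long sum_above_branchless(const int *a, size_t n, int t) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        long mask = -(long)(a[i] > t); /* all ones if taken, else zero */
        s += (long)a[i] & mask;
    }
    return s;
}
```

With made-up numbers such as a 15-cycle penalty, 1 cycle for the branchy body and 2 for the branchless one, a branch taken 50% of the time favours shape 2, while one taken 99% of the time favours shape 1. A plain likely/unlikely hint cannot tell those two cases apart if both are tagged "likely"; a probability can.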