> We have achieved this by incorporating hundreds of micro-optimizations. Each micro-optimization might improve the performance by as little as 0.05%. If we get one that improves performance by 0.25%, that is considered a huge win. Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up.
Of course, since processors are out-of-order and superscalar, reducing instruction count isn't always a worthy goal either. You might end up with fewer instructions but less happening in parallel. And then you have to balance this against the cache efficiency of executing fewer instructions to complete a task.
Basically, optimization at this level is really hard.
And yet, in practice, I have found that optimizing for instruction counts works really well. I've gotten way more mileage out of Cachegrind's instruction counts than I ever have out of its cache or branch prediction simulations.
I was tempted to respond after your first comment, but this follow-up provoked me to action. When you wrote your post several years ago, modelling with Cachegrind may still have been a defensible approach. But the gap between its generic processor simulation and real-world performance has only continued to widen. As you note, this divergence first became apparent with the cache and branch-prediction simulations. Today, I'd argue your time is almost always better spent reading the CPU's built-in performance counters than running Cachegrind. For the cases where Cachegrind used to be useful, 'perf' is a joy!
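For instance, a sketch of the equivalent measurement with perf (`./myprog` is a stand-in for whatever you're benchmarking, and the event list is just a starting point):

```shell
# Read the real hardware counters instead of simulating them.
# -r 5 repeats the run and reports the mean and run-to-run variance,
# making the noise that Cachegrind sidesteps visible rather than hidden.
perf stat -r 5 -e instructions,cycles,branches,branch-misses ./myprog
```

The `instructions` counter gives you roughly the same signal as Cachegrind's Ir, but from the actual machine and at a fraction of the runtime cost.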
I'm not sure that's a strong argument for your case. Similarly to the way that Usain Bolt could probably beat me in the 100m even if he was in a wheelchair with flat tires, I'm sure Richard Hipp could probably do a decent job of optimizing SQLite with just a stub of pencil and a scrap of napkin. I just think he'd be more effective with better tools.
Interesting! That reminds me of my own experience in a different context (from https://blog.mozilla.org/nnethercote/2011/07/01/faster-javas...):
> Cachegrind does event-based profiling, i.e. it counts instructions, memory accesses, etc, rather than time. When making a lot of very small improvements, noise variations often swamp the effects of the improvements, so being able to see that instruction counts are going down by 0.2% here, 0.3% there, is very helpful.