
> We have achieved this by incorporating hundreds of micro-optimizations. Each micro-optimization might improve the performance by as little as 0.05%. If we get one that improves performance by 0.25%, that is considered a huge win. Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up.
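
A rough sketch of how such tiny wins compound multiplicatively. The counts here are assumptions for illustration (the quote says "hundreds" of optimizations averaging around 0.05% each), not figures from the SQLite project:

```python
# Hypothetical sketch: how hundreds of tiny speedups compound.
# The gain size and count below are assumptions, not SQLite's numbers.

def compound_speedup(per_opt_gain: float, n_opts: int) -> float:
    """Fraction of the original run time remaining after n_opts
    optimizations, each shaving off per_opt_gain (0.0005 = 0.05%)."""
    return (1.0 - per_opt_gain) ** n_opts

# 500 wins of 0.05% each: run time drops to ~78% of the original,
# i.e. a ~22% overall improvement from individually unmeasurable changes.
remaining = compound_speedup(0.0005, 500)
print(f"run time shrinks to {remaining:.1%} of the original")
```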

Interesting! That reminds me of my own experience in a different context (from https://blog.mozilla.org/nnethercote/2011/07/01/faster-javas...):

> Cachegrind does event-based profiling, i.e. it counts instructions, memory accesses, etc., rather than time. When making a lot of very small improvements, noise variations often swamp the effects of the improvements, so being able to see that instruction counts are going down by 0.2% here, 0.3% there, is very helpful.
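
To make the workflow concrete, here is a minimal sketch of comparing total instruction counts (Ir) between two Cachegrind runs, assuming the standard `events:`/`summary:` lines of the cachegrind.out format. The two inlined "files" are invented sample fragments; a real run would read `cachegrind.out.<pid>`:

```python
# Hedged sketch: compare instruction counts (Ir) from two Cachegrind
# output files. The sample data below is invented for illustration.

def ir_total(cachegrind_out: str) -> int:
    """Extract the total instruction count from cachegrind.out text,
    assuming the standard 'events:' and 'summary:' lines."""
    events = summary = None
    for line in cachegrind_out.splitlines():
        if line.startswith("events:"):
            events = line.split()[1:]
        elif line.startswith("summary:"):
            summary = [int(x) for x in line.split()[1:]]
    return summary[events.index("Ir")]

before = "events: Ir Dr Dw\nsummary: 1000000000 400000000 200000000\n"
after = "events: Ir Dr Dw\nsummary: 998000000 400000000 200000000\n"

# A 0.20% reduction: repeatable under Cachegrind, but typically
# invisible in wall-clock measurements on a real machine.
delta = (ir_total(before) - ir_total(after)) / ir_total(before)
print(f"instructions down by {delta:.2%}")
```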



Of course, since processors are out-of-order and superscalar, reducing instruction counts isn't always a worthy goal either. You might end up with fewer instructions but less happening in parallel. And then you have to balance this against the cache efficiency of executing fewer instructions to complete a task.

Basically, optimization at this level is really hard.


And yet, in practice, I have found that optimizing for instruction counts works really well. I've gotten way more mileage out of Cachegrind's instruction counts than I ever have out of its cache or branch prediction simulations.


I was tempted to respond after your first comment, but this followup provoked me to action. When you wrote your post several years ago, modelling with Cachegrind may still have been a defensible approach. But its generic processor simulation has continued to diverge from actual real-world performance. As you note, this divergence first became very apparent with the cache and branch prediction simulations. Currently, I'd argue that your time will almost always be better spent checking the CPU's built-in performance monitors rather than using Cachegrind. For the cases where Cachegrind used to be useful, 'perf' is a joy!


I don't have any experience with Cachegrind, so I don't have any reason to doubt what you're saying.

However, if it's true, what do you say about the real-world performance gains the SQLite folks have achieved by using Cachegrind?


Well, the SQLite folks are clearly still getting good usage from Cachegrind.


I'm not sure that's a strong argument for your case. Similarly to the way that Usain Bolt could probably beat me in the 100m even if he was in a wheelchair with flat tires, I'm sure Richard Hipp could probably do a decent job of optimizing SQLite with just a stub of pencil and a scrap of napkin. I just think he'd be more effective with better tools.

But you inspired me to test out some real numbers, which I posted in a new thread: https://news.ycombinator.com/item?id=8426302



