The cycle-counts returned by cachegrind are repeatable to 7 or 8 significant figures. That means I can make a small change, rerun the test, and know whether the change helped or hurt, even if the difference is only 0.01%. I don't think perf is quite so repeatable, is it?
Also, the cg_annotate utility gives me a complete program listing showing me the cycle counts spent on each line of code, which is invaluable in tracking down hotspots in need of work. If perf provides such a tool, I am unaware of it.
Remember that I'm not trying to optimize for a specific CPU. SQLite is cross-platform. I want to do optimizations that help on all CPUs using all compilers. I'm measuring the performance on the "cachegrind virtual CPU" of a binary prepared using GCC and -Os because that combination gives repeatable measurements that are easy to map into specific lines of source code. But the optimizations themselves should usually apply across all CPUs and all compilers and all compiler optimization settings.
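The workflow described above can be sketched as follows (the benchmark name `speedtest1` comes from this thread; the exact source files and linker flags are illustrative):

```shell
# Build with -Os and debug info so counts map cleanly back to source lines
gcc -Os -g -o speedtest1 speedtest1.c sqlite3.c -lpthread -ldl

# Run on cachegrind's simulated CPU; the event counts are deterministic
valgrind --tool=cachegrind --cachegrind-out-file=cg.out ./speedtest1

# Produce a source listing with per-line instruction and cache-event counts
cg_annotate cg.out > annotated.txt
```

The determinism comes from cachegrind simulating a fixed CPU model rather than sampling real hardware, which is why two runs of the same binary produce essentially identical counts.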
Nkruz is, of course, welcome to use any tool he likes to optimize his projects. But, at least for the moment, I'm finding cachegrind to be a better tool to help with implementing micro-optimizations.
> The cycle-counts returned by cachegrind are repeatable, to 7 or 8 significant figures. ... I don't think perf is quite so repeatable
It can come close. I just tried, and for 'speedtest1' I seemed to be getting 3 significant digits for the cycle counts, and 4 for the instruction count. You'd probably gain another one or two if you were to measure computation only and remove the printf() and other I/O statements. The underlying performance counters are pretty much cycle accurate.
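For reference, this is roughly the invocation that produces those totals, with a repeat count added so perf itself reports how repeatable the numbers are (the repeat count of 5 is arbitrary):

```shell
# Run the benchmark 5 times; perf prints the mean and +/- spread
# for each counter, which directly shows the measurement noise
perf stat -r 5 -e cycles,instructions ./speedtest1
```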
> cg_annotate utility gives me a complete program listing showing me the cycle counts spent on each line of code... If perf provides such a tool, I am unaware of it.
Yes, that record/report combination I quoted above does just this, with insignificant runtime overhead. Unlike the total counts, this one is sampled, so you might get a little more variation. It's definitely good enough for quickly finding hotspots, and there are other (harder to use) tools that can use the precise "PEBS" events if you need complete counts. These even allow you to do nifty things like track the number of times each branch statement is mispredicted.
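Concretely, that looks something like the following; the `:pp` suffix is perf's standard modifier requesting precise (PEBS-backed) sampling, and the branch-misprediction example at the end is the nifty trick mentioned above:

```shell
# Record cycle samples, then browse a per-line annotated source view
perf record -e cycles ./speedtest1
perf report            # interactive; select a symbol and annotate it per line

# Precise sampling: attribute mispredicted branches to source lines
perf record -e branch-misses:pp ./speedtest1
perf annotate
```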
> I'm measuring the performance on the "cachegrind virtual CPU" of a binary prepared using GCC and -Os because that combination gives repeatable measurements that are easy to map into specific lines of source code.
Absolutely, this is the right way to view cachegrind. My question would be whether it is the right generic CPU to be using, and whether optimizations made on it translate well to other modern CPUs. Many of them will, but I think you'd have faster turnaround time and even better success with an approach that uses a real CPU and its performance counters.
> I'm finding cachegrind to be a better tool to help with implementing micro-optimizations.
Please realize I have the utmost respect for your work on SQLite. It's my most frequent answer when asked for an example of C code to study, learn from, and pattern after. I'm certain you will manage to optimize it with any tool you choose, but having spent many hours with gprof and cachegrind myself, I (with the zeal of a recent convert) think you'll be amazed by some of the things that are now possible with performance counters.