Impressive. Easy to get going, low overhead, powerful one-liners.
I like the filter syntax - would be nice for perf_events to pick this up. Although, if it did, I hope that the stable filter fields API can be extended with unstable arbitrary expressions as needed, for when dynamic probes are used.
What perf_events really lacks is a way to do custom processing of data in kernel context, to reduce the overheads of enablings. E.g., let's say I want a histogram of disk I/O latency. sysdig has chisels, which look like they do what I want, but from the Chisels User Guide: "Usually, with dtrace-like tools you write your scripts using a domain-specific language that gets compiled into bytecode and injected in the kernel. Draios uses a different approach: events are efficiently brought to user-level, enriched with context, and then scripts can be applied to them." Oh no, not user-level!
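For concreteness, here's a minimal Python sketch of what user-level aggregation into a latency histogram looks like: per-event data arrives in user space and a script folds it into power-of-two buckets. The latencies here are synthetic and this is not sysdig's actual chisel API; the point is that every event crosses into user-level before it's reduced, which is exactly what in-kernel aggregation avoids.

```python
import math
import random

# Minimal sketch of user-level aggregation (synthetic latencies; a real
# chisel would consume trace events delivered from the kernel).
def log2_bucket(latency_us):
    # Round the latency up to the next power of two, quantize-style.
    return 1 << max(0, math.ceil(math.log2(max(latency_us, 1))))

random.seed(1)
hist = {}
for _ in range(10000):
    lat_us = random.expovariate(1 / 200.0)  # synthetic I/O latency, ~200 us mean
    bucket = log2_bucket(lat_us)
    hist[bucket] = hist.get(bucket, 0) + 1

for bucket in sorted(hist):
    print(f"<= {bucket:6d} us: {hist[bucket]}")
```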
I tested this quickly, expecting DTrace's approach (which is the same as SystemTap and ktap) to blow sysdig out of the water. But the results were surprising (take these quick tests with a grain of salt). Here's my target command, along with sysdig and DTrace enablings, and strace for comparison:
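The exact commands aren't reproduced here; a plausible reconstruction of the kind of worst-case target (flags are my guess, chosen to maximize syscall rate rather than match the original test) would be:

```shell
# Hypothetical reconstruction (not the exact flags from the test): a no-I/O
# dd that maximizes syscall rate by copying one byte per read/write pair.
dd if=/dev/zero of=/dev/null bs=1 count=100k 2>&1 | tail -1

# The tracing enablings would be along these lines (illustrative only):
#   strace -c dd if=/dev/zero of=/dev/null bs=1 count=100k
#   sysdig proc.name=dd
#   dtrace -n 'syscall::write:entry /execname == "dd"/ { @ = count(); }'
```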
sysdig slowed the target by about 4x. DTrace, between 2.5 and 2.7x. strace (for comparison), over 200x. This is a worst-case test, and if I'm willing to slow a target by 2x then taking that to 4x doesn't make much difference. With what I normally trace, the overheads are 1/100th of that, so DTrace is negligible. The take-away here is that the overheads are closer to the "negligible" end of the spectrum than strace's "violent" end. Which I found surprising for user-level aggregation.
The Sysdig Examples could do with some sanity checking. E.g.:
"See the top processes in terms of disk bandwidth usage
sysdig -c topprocs_file"
Brendan, thanks for the feedback. It's really cool to hear comments like this from someone like you. We really respect your work in the field.
Good catch on topprocs_file, we'll have to find a better name for it.
In terms of overhead, we put a lot of effort into it and, as you pointed out, we're already extremely optimized. But we think we can do even better. For example, we don't have any kind of kernel-level filtering yet. Coming soon! :)
A 1 gigabyte no-IO dd is an unusual microbenchmark: it stresses only syscall rates while wasting time with memcpy. (A loop on getpid() or equivalent would have worked just as well.)
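The suggested equivalent is easy to sketch. This is a rough illustration, not the commenter's benchmark, and note that some libc versions cache getpid(), which would change the numbers:

```python
import os
import time

# A syscall-rate microbenchmark in the spirit of the suggestion above:
# a tight loop of cheap getpid() calls, no memcpy, no I/O.
N = 100_000
start = time.perf_counter()
for _ in range(N):
    os.getpid()
elapsed = time.perf_counter() - start
print(f"{N} getpid() calls in {elapsed * 1e3:.1f} ms "
      f"({N / elapsed / 1e6:.2f} M calls/s)")
```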
On my workstation the plain dd runs in 40 ms; a systemtap one-liner instrumenting and aggregating those 1 million write(2) syscalls in the kernel extended that tiny runtime to about 50 ms, a 1.25x slowdown. But such small numbers are hardly meaningful.
I'm curious to what extent userspace perf script postprocessing is deemed a technological equivalent to this; or why a new kernel module was deemed necessary versus the perf_event_open(2) ring-buffer abi.
"See the top processes in terms of disk bandwidth usage
sysdig -c topprocs_file"
I saw:
That's while my dd between /dev/zero and /dev/null was running. No "disk bandwidth"! :)