Impressive. Easy to get going, low overhead, powerful one-liners.
I like the filter syntax - would be nice for perf_events to pick this up. Although, if it did, I hope that the stable filter fields API can be extended with unstable arbitrary expressions as needed, for when dynamic probes are used.
What perf_events really lacks is a way to do custom processing of data in kernel context, to reduce the overheads of enablings. E.g., let's say I want a histogram of disk I/O latency. sysdig has chisels, which look like they do what I want, but from the Chisels User Guide: "Usually, with dtrace-like tools you write your scripts using a domain-specific language that gets compiled into bytecode and injected in the kernel. Draios uses a different approach: events are efficiently brought to user-level, enriched with context, and then scripts can be applied to them." Oh no, not user-level!
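For concreteness, here's a minimal Python sketch of what user-level aggregation into a latency histogram looks like: per-event data arrives in user space and a script folds it into power-of-two buckets. The latencies here are synthetic and this is not sysdig's actual chisel API; the point is that every event crosses into user-level before it's reduced, which is exactly what in-kernel aggregation avoids.

```python
import math
import random

# Minimal sketch of user-level aggregation (synthetic latencies; a real
# chisel would consume trace events delivered from the kernel).
def log2_bucket(latency_us):
    # Round the latency up to the next power of two, quantize-style.
    return 1 << max(0, math.ceil(math.log2(max(latency_us, 1))))

random.seed(1)
hist = {}
for _ in range(10000):
    lat_us = random.expovariate(1 / 200.0)  # synthetic I/O latency, ~200 us mean
    bucket = log2_bucket(lat_us)
    hist[bucket] = hist.get(bucket, 0) + 1

for bucket in sorted(hist):
    print(f"<= {bucket:6d} us: {hist[bucket]}")
```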
I tested this quickly, expecting DTrace's approach (which is the same as SystemTap and ktap) to blow sysdig out of the water. But the results were surprising (take these quick tests with a grain of salt). Here's my target command, along with sysdig and DTrace enablings, and strace for comparison:
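The exact commands aren't reproduced here; a plausible reconstruction of the kind of worst-case target (flags are my guess, chosen to maximize syscall rate rather than match the original test) would be:

```shell
# Hypothetical reconstruction (not the exact flags from the test): a no-I/O
# dd that maximizes syscall rate by copying one byte per read/write pair.
dd if=/dev/zero of=/dev/null bs=1 count=100k 2>&1 | tail -1

# The tracing enablings would be along these lines (illustrative only):
#   strace -c dd if=/dev/zero of=/dev/null bs=1 count=100k
#   sysdig proc.name=dd
#   dtrace -n 'syscall::write:entry /execname == "dd"/ { @ = count(); }'
```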
sysdig slowed the target by about 4x. DTrace, between 2.5 and 2.7x. strace (for comparison), over 200x. This is a worst-case test, and if I'm willing to slow a target by 2x then taking that to 4x doesn't make much difference. With what I normally trace, the overheads are 1/100th of that, so DTrace is negligible. The take-away here is that the overheads are closer to the "negligible" end of the spectrum than strace's "violent" end. Which I found surprising for user-level aggregation.
The Sysdig Examples could do with some sanity checking. E.g.:
"See the top processes in terms of disk bandwidth usage
sysdig -c topprocs_file"
Brendan, thanks for the feedback. It's really cool to hear comments like this from someone like you. We really respect your work in the field.
Good catch on topprocs_file, we'll have to find a better name for it.
In terms of overhead, we put a lot of effort into it and, as you pointed out, we're already extremely optimized. But we think we can do even better. For example, we don't have any kind of kernel-level filtering yet. Coming soon! :)
A 1 gigabyte no-IO dd is an unusual microbenchmark: it stresses only syscall rates while wasting time with memcpy. (A loop on getpid() or equivalent would have worked just as well.)
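The suggested equivalent is easy to sketch. This is a rough illustration, not the commenter's benchmark, and note that some libc versions cache getpid(), which would change the numbers:

```python
import os
import time

# A syscall-rate microbenchmark in the spirit of the suggestion above:
# a tight loop of cheap getpid() calls, no memcpy, no I/O.
N = 100_000
start = time.perf_counter()
for _ in range(N):
    os.getpid()
elapsed = time.perf_counter() - start
print(f"{N} getpid() calls in {elapsed * 1e3:.1f} ms "
      f"({N / elapsed / 1e6:.2f} M calls/s)")
```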
On my workstation the plain dd runs in 40 ms; a systemtap one-liner instrumenting and aggregating those 1 million write(2) syscalls in the kernel extended that tiny runtime to about 50 ms, a 1.25x slowdown. But such small numbers are hardly meaningful.
I'm curious to what extent userspace perf script postprocessing is deemed a technological equivalent to this; or why a new kernel module was deemed necessary versus the perf_event_open(2) ring-buffer abi.
"See the top processes in terms of disk bandwidth usage
sysdig -c topprocs_file"
I saw:
That's while my dd between /dev/zero and /dev/null was running. No "disk bandwidth"! :)