Oh no, I hoped it was about speeding up perf record. That is actually a big thorn in my side, one I wrote a specific tool for... Depending on the number of probes you use, perf record can induce large latency hits or reduced throughput. Batching/buffering disk writes solves the problem for me, but I had to redevelop a perf parser to record/compress smarter, and to support network streaming (not touching the disk at all is even better...).
It'd be nice if perf record had a fundamentally faster way of working. I found a nice description of how it works in the README.md for cargo-trace: "perf relies on perf_event_open_sys to sample the stack. Every time a sample is taken, the entire stack is copied into user space. Stack unwinding is performed in user space as a post processing step. This wastes bandwidth and is a security concern as it may dump secrets like private keys."
cargo-trace is apparently dormant now, but I found it really interesting. It does the unwinding via eBPF instead, which should be quicker while recording, not generate as much (sensitive) data, and not require as much post-processing. (Symbolization would still happen in post-processing.)
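To put a rough number on the "wastes bandwidth" point, here is a back-of-the-envelope sketch. All three inputs are assumptions, not measurements: a 4000 Hz sample rate, an 8192-byte stack dump per sample (which I believe is the default for `perf record --call-graph dwarf`), and an upper bound of 127 frames of 8 bytes each for a stack that has already been unwound in the kernel/eBPF.

```python
def stack_copy_bandwidth(sample_hz, bytes_per_sample, busy_cpus=1):
    """Bytes per second of stack data copied to user space (rough upper bound)."""
    return sample_hz * bytes_per_sample * busy_cpus


# Assumed figures, see the paragraph above; adjust to your own setup.
SAMPLE_HZ = 4000
RAW_STACK_DUMP_BYTES = 8192      # raw stack copy per sample
UNWOUND_STACK_BYTES = 127 * 8    # at most 127 return addresses, 8 bytes each

raw_bw = stack_copy_bandwidth(SAMPLE_HZ, RAW_STACK_DUMP_BYTES)
unwound_bw = stack_copy_bandwidth(SAMPLE_HZ, UNWOUND_STACK_BYTES)

print(f"raw stack copy: {raw_bw / 1e6:.1f} MB/s per busy CPU")
print(f"unwound frames: {unwound_bw / 1e6:.1f} MB/s per busy CPU (upper bound)")
```

With those assumed numbers the raw copy is about 8x the data of the worst-case unwound stack, and real stacks are usually much shallower than 127 frames.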
I'll ask about open-sourcing the tool. But just in case, the recipe is to use pipe mode and pre-parse all frames, stream them as messages, sometimes to several targets (pub/sub) with some streaming zstd, and also to split the PMU/probes/Intel PT streams and treat them separately. Stack traces are analysed (precomputed cfg optimised structure so unwinding is faster) before being stored in an ad hoc in-house format with all other system traces. The only annoying thing is that changing perf record settings (pid changes, need event X, new probe) means a restart, and I ran out of interns before we had no-loss switchover...
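For anyone who wants to try the pipe-mode part, a minimal sketch of the framing step might look like the following. This is not the tool described above, just my reading of the perf.data pipe format: a 16-byte pipe header (`PERFILE2` magic plus a size) followed by records that each start with an 8-byte header (`u32 type`, `u16 misc`, `u16 size`, where size includes the header). It assumes a little-endian stream. It also assumes every record is fully covered by its header size, which I believe is not true for records that carry a trailing payload (AUXTRACE data for Intel PT, tracing data), so those would need extra handling. The names `iter_records` and `split_by_type` are made up for the example.

```python
import struct
from collections import defaultdict

PIPE_MAGIC = b"PERFILE2"
PIPE_HEADER = struct.Struct("<8sQ")    # magic, size of this pipe header
EVENT_HEADER = struct.Struct("<IHH")   # type, misc, size (size includes header)


def iter_records(stream):
    """Yield (type, misc, raw_record_bytes) from a perf pipe-mode byte stream.

    Assumption: no record has payload beyond its header size.
    """
    raw = stream.read(PIPE_HEADER.size)
    if len(raw) != PIPE_HEADER.size:
        raise ValueError("stream too short for a pipe header")
    magic, header_size = PIPE_HEADER.unpack(raw)
    if magic != PIPE_MAGIC:
        raise ValueError("not a little-endian perf pipe-mode stream")
    if header_size > PIPE_HEADER.size:
        stream.read(header_size - PIPE_HEADER.size)  # skip unknown extra bytes

    while True:
        raw = stream.read(EVENT_HEADER.size)
        if not raw:
            return  # clean end of stream
        if len(raw) != EVENT_HEADER.size:
            raise ValueError("truncated record header")
        rtype, misc, size = EVENT_HEADER.unpack(raw)
        if size < EVENT_HEADER.size:
            raise ValueError("record size smaller than its header")
        payload = stream.read(size - EVENT_HEADER.size)
        if len(payload) != size - EVENT_HEADER.size:
            raise ValueError("truncated record payload")
        yield rtype, misc, raw + payload


def split_by_type(stream):
    """Group raw records by record type so each group can be handled separately."""
    buckets = defaultdict(list)
    for rtype, _misc, record in iter_records(stream):
        buckets[rtype].append(record)
    return buckets
```

You would feed it something like `perf record -o - <your options> | python3 framer.py` with `split_by_type(sys.stdin.buffer)`, where `framer.py` is a hypothetical script name. A real version would forward each record as a message instead of collecting lists in memory.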
Sounds more specialized than I was imagining but a cool system.
The idea of a more efficient compressed encoding seems generally applicable. I imagine just piping through zstd would be an improvement over plain perf record directly to a file, but it sounds like your tool's splitting makes zstd more effective. It'd be handy to be able to just do perf record ... | fancy-recompress > out, and even better to upstream the format improvement into perf itself. I feel you on "ran out of interns"; there's always more to do...
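A rough sketch of what the "split, then compress each stream separately" idea could look like. To keep it runnable with only the standard library, it uses `zlib` as a stand-in for zstd, so it says nothing about real zstd ratios. The class name `SplitCompressor` and the string keys are made up for the example, and whether splitting actually helps will depend on the data.

```python
import zlib


class SplitCompressor:
    """Keep one streaming compressor per stream key (zlib standing in for zstd)."""

    def __init__(self, level=6):
        self.level = level
        self._compressors = {}
        self._chunks = {}

    def write(self, key, data):
        """Append data to the compressed stream identified by key."""
        comp = self._compressors.get(key)
        if comp is None:
            comp = self._compressors[key] = zlib.compressobj(self.level)
            self._chunks[key] = []
        out = comp.compress(data)
        if out:  # the compressor may buffer and return nothing yet
            self._chunks[key].append(out)

    def finish(self):
        """Flush every compressor and return {key: compressed_bytes}."""
        result = {}
        for key, comp in self._compressors.items():
            self._chunks[key].append(comp.flush())
            result[key] = b"".join(self._chunks[key])
        return result


# Interleaved input, as it would arrive from a single pipe.
sc = SplitCompressor()
for i in range(100):
    sc.write("samples", b"sample-record-%04d" % i)
    sc.write("probes", b"probe-hit-%04d" % i)
compressed = sc.finish()
```

A real tool would write each compressed chunk to a file or socket as it is produced rather than accumulating it in memory, which is what makes the streaming API worth using.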
Perf report is indeed slow AF, especially on large files; you're right to want to speed it up! Thanks for sharing! This is an interesting tidbit that has thrown me down a rabbit hole of 'profiling the profiler'...