Yes, there can be good reasons to fork (especially after making a fair effort to have things fixed), and bad reasons.
But also yes: there are two particular issues with bpf tooling forks.

1) They look deceptively simple, but those that are kprobes-based are really kernel-specific and brittle, and need ongoing maintenance to match the latest changes in the kernel. One ftrace(/kprobe) tool I wrote has already been ported a bunch of times, and I know it doesn't always work and one day I'll go fix it -- but how do I get all the ports updated? No one porting it has noticed the problem, so the same breakage just gets duplicated over and over, which leads to issue 2.

2) Unlike lots of other software, when observability tools break it may not be obvious at all! Imagine a tool that prints a throughput, but it now captures only 90% of the activity instead of 100% (because there's now a fast path taking the other 10%). The numbers for some deep kernel activity are off by 10%. That's hard to spot, and it increases the risk that people keep deploying their old broken ports without realizing there's a problem.
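As a minimal sketch of that brittleness (not one of the real bcc tools): a kprobe-based counter using the bcc Python API, pinned to the internal kernel function vfs_read, which is used here purely as an illustrative target. If a later kernel adds a fast path that bypasses the traced function, this keeps running and keeps printing numbers, it just silently undercounts.

    #!/usr/bin/env python3
    # Hedged sketch: count calls to a kernel-internal function via a kprobe.
    # The symbol (vfs_read) has no stability guarantee across kernel versions.
    from time import sleep
    from bcc import BPF

    prog = """
    BPF_HASH(counts, u32, u64);

    int trace_entry(struct pt_regs *ctx) {
        u32 key = 0;
        counts.increment(key);
        return 0;
    }
    """

    b = BPF(text=prog)
    # Attach to an internal symbol; a new code path that skips it would not be counted.
    b.attach_kprobe(event="vfs_read", fn_name="trace_entry")

    sleep(5)
    for k, v in b["counts"].items():
        print("vfs_read calls in 5s: %d" % v.value)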
> They look deceptively simple, but those that are kprobes-based are really kernel-specific and brittle, and need ongoing maintenance to match the latest changes in the kernel
It seems like there is a missing formal interface here if this is so brittle, no? If it’s hitting a bunch of internal kernel stuff shouldn’t this stuff just live with the kernel itself?
The formal interface is tracepoints. Tracepoints aren't brittle in theory (they're best-effort stable) and don't need much expert maintenance, and that's mostly true in practice. Someone could port tracepoint-based tools and almost never need to touch them again.
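For contrast with the kprobe sketch above, a minimal sketch of the tracepoint approach, again with the bcc Python API; block:block_rq_issue is chosen here just as an example of a tracepoint in the best-effort-stable tracing interface:

    #!/usr/bin/env python3
    # Hedged sketch: count block I/O request issues via a tracepoint, which is
    # far less likely to break across kernel versions than kprobes on
    # internal block-layer functions.
    from time import sleep
    from bcc import BPF

    prog = """
    BPF_HASH(counts, u32, u64);

    TRACEPOINT_PROBE(block, block_rq_issue) {
        u32 key = 0;
        counts.increment(key);
        return 0;
    }
    """

    b = BPF(text=prog)  # TRACEPOINT_PROBE handlers are attached automatically
    sleep(5)
    for k, v in b["counts"].items():
        print("block I/O requests issued in 5s: %d" % v.value)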
But kprobes basically expose raw kernel code that the kernel engineers bashed out with no idea that anyone might trace it. And they can change it from one minor release to another. And change it in unobvious ways: add a new codepath somewhere that takes some of the traffic, so gee, it seems like my tool still works, but the numbers are a bit lower. Or maybe I measured queue latency and now there are two queues but the tool is only tracing the first one, or now there are no queues at all, so my tool blows up because it can't find the functions to trace (that's actually preferable, since it's obvious that something needs fixing!).
I really don't like using kprobes if it can be avoided (instead use tracepoints, /proc, netlink, etc.). But sometimes it's either solve the problem with kprobes or don't solve it at all.
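A rough sketch of that preference order follows. The check is a hand-rolled look at tracefs rather than any particular bcc helper, and the probe targets (syscalls:sys_enter_read for the tracepoint, vfs_read for the kprobe fallback) are just illustrative stand-ins:

    #!/usr/bin/env python3
    # Hedged sketch of "avoid kprobes if you can": use a tracepoint when the
    # kernel provides one, and fall back to a kprobe on an internal function
    # only when it doesn't.
    import os
    from bcc import BPF

    def tracepoint_available(category, event):
        # tracefs may be mounted at either path depending on the distro
        for base in ("/sys/kernel/tracing/events",
                     "/sys/kernel/debug/tracing/events"):
            if os.path.isdir(os.path.join(base, category, event)):
                return True
        return False

    prog_tracepoint = """
    BPF_HASH(counts, u32, u64);
    TRACEPOINT_PROBE(syscalls, sys_enter_read) {
        u32 key = 0;
        counts.increment(key);
        return 0;
    }
    """

    prog_kprobe = """
    BPF_HASH(counts, u32, u64);
    int trace_entry(struct pt_regs *ctx) {
        u32 key = 0;
        counts.increment(key);
        return 0;
    }
    """

    if tracepoint_available("syscalls", "sys_enter_read"):
        b = BPF(text=prog_tracepoint)   # preferred: best-effort stable interface
    else:
        b = BPF(text=prog_kprobe)       # brittle fallback on a kernel-internal symbol
        b.attach_kprobe(event="vfs_read", fn_name="trace_entry")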
Now, normally such code-specific, brittle things should indeed live with the code like you say, so normally I'd think about putting the tools in the kernel tree. But we don't want to add that much user-space code to the kernel, and it also opens the question of whether these should actually be tracepoints instead (which starts long discussions: maintainers don't want to be on the hook for maintaining stable tracepoints if they aren't truly needed).
Another scenario where the tools should ship with the code base is user-space applications. E.g., if someone wrote a bunch of low-level tracing tools for the Cassandra database that used uprobes and were code-specific, they would be too niche for bcc, and would probably be best living in the Cassandra code base itself.
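As a rough illustration of what such a code-specific user-space tool looks like (bash's readline() is used here only as a stand-in for some application-internal function; a real Cassandra tool would target symbols in that code base):

    #!/usr/bin/env python3
    # Hedged sketch: a uprobe on one function inside one particular binary.
    # A tool like this is tied to that application's internals, which is why
    # it belongs with the application's code base rather than in bcc.
    from time import sleep
    from bcc import BPF

    prog = """
    BPF_HASH(counts, u32, u64);

    int count_call(struct pt_regs *ctx) {
        u32 key = 0;
        counts.increment(key);
        return 0;
    }
    """

    b = BPF(text=prog)
    b.attach_uprobe(name="/bin/bash", sym="readline", fn_name="count_call")

    sleep(10)
    for k, v in b["counts"].items():
        print("readline() calls in 10s: %d" % v.value)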
Thanks Brendan for creating the bpfcc-tools! I’m using it in magicmake [1], which is a tool to automatically find missing packages when compiling, based on file path accesses.