Files
linux/tools/include/uapi
Peter Zijlstra c69993ecdd perf: Support deferred user unwind
Add support for deferred userspace unwind to perf.

Where perf currently relies on in-place stack unwinding; from NMI
context and all that. This moves the userspace part of the unwind to
right before the return-to-userspace.

This has two distinct benefits, the biggest is that it moves the
unwind to a faultable context. It becomes possible to fault in debug
info (.eh_frame, SFrame etc.) that might not otherwise be readily
available. And secondly, it de-duplicates the user callchain where
multiple samples happen during the same kernel entry.

To facilitate this the perf interface is extended with a new record
type:

  PERF_RECORD_CALLCHAIN_DEFERRED

and two new attribute flags:

  perf_event_attr::defer_callchain - to request the user unwind be deferred
  perf_event_attr::defer_output    - to request PERF_RECORD_CALLCHAIN_DEFERRED records

The existing PERF_RECORD_SAMPLE callchain section gets a new
context type:

  PERF_CONTEXT_USER_DEFERRED

After which will come a single entry, denoting the 'cookie' of the
deferred callchain that should be attached here, matching the 'cookie'
field of the above mentioned PERF_RECORD_CALLCHAIN_DEFERRED.

The 'defer_callchain' flag is expected on all events with
PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expect on the event
responsible for collecting side-band events (like mmap, comm etc.).
Setting 'defer_output' on multiple events will get you duplicated
PERF_RECORD_CALLCHAIN_DEFERRED records.

Based on earlier patches by Josh and Steven.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net
2025-10-29 10:29:58 +01:00
..
2025-10-29 10:29:58 +01:00

Why we want a copy of kernel headers in tools?
==============================================

There used to be no copies, with tools/ code using kernel headers
directly. From time to time tools/perf/ broke due to legitimate kernel
hacking. At some point Linus complained about such direct usage. Then we
adopted the current model.

The way these headers are used in perf are not restricted to just
including them to compile something.

There are sometimes used in scripts that convert defines into string
tables, etc, so some change may break one of these scripts, or new MSRs
may use some different #define pattern, etc.

E.g.:

  $ ls -1 tools/perf/trace/beauty/*.sh | head -5
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/fsmount.sh
  $
  $ tools/perf/trace/beauty/fadvise.sh
  static const char *fadvise_advices[] = {
        [0] = "NORMAL",
        [1] = "RANDOM",
        [2] = "SEQUENTIAL",
        [3] = "WILLNEED",
        [4] = "DONTNEED",
        [5] = "NOREUSE",
  };
  $

The tools/perf/check-headers.sh script, part of the tools/ build
process, points out changes in the original files.

So its important not to touch the copies in tools/ when doing changes in
the original kernel headers, that will be done later, when
check-headers.sh inform about the change to the perf tools hackers.

Another explanation from Ingo Molnar:
It's better than all the alternatives we tried so far:

 - Symbolic links and direct #includes: this was the original approach but
   was pushed back on from the kernel side, when tooling modified the
   headers and broke them accidentally for kernel builds.

 - Duplicate self-defined ABI headers like glibc: double the maintenance
   burden, double the chance for mistakes, plus there's no tech-driven
   notification mechanism to look at new kernel side changes.

What we are doing now is a third option:

 - A software-enforced copy-on-write mechanism of kernel headers to
   tooling, driven by non-fatal warnings on the tooling side build when
   kernel headers get modified:

    Warning: Kernel ABI header differences:
      diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
      diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
      diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
      ...

   The tooling policy is to always pick up the kernel side headers as-is,
   and integate them into the tooling build. The warnings above serve as a
   notification to tooling maintainers that there's changes on the kernel
   side.

We've been using this for many years now, and it might seem hacky, but
works surprisingly well.