The control knobs set before loading BPF programs should be declared as
'const volatile' so that it can be optimized by the BPF core.
Committer testing:
root@x1:~# perf kwork report --use-bpf
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
--------------------------------------------------------------------------------------------------------------------------------
(w)intel_atomic_commit_work [ | 0009 | 18.680 ms | 2 | 18.553 ms | 362410.681580 s | 362410.700133 s |
(w)pm_runtime_work | 0007 | 13.300 ms | 1 | 13.300 ms | 362410.254996 s | 362410.268295 s |
(w)intel_atomic_commit_work [ | 0009 | 9.846 ms | 2 | 9.717 ms | 362410.172352 s | 362410.182069 s |
(w)acpi_ec_event_processor | 0002 | 8.106 ms | 1 | 8.106 ms | 362410.463187 s | 362410.471293 s |
(s)SCHED:7 | 0000 | 1.351 ms | 106 | 0.063 ms | 362410.658017 s | 362410.658080 s |
i915:157 | 0008 | 0.994 ms | 13 | 0.361 ms | 362411.222125 s | 362411.222486 s |
(s)SCHED:7 | 0001 | 0.703 ms | 98 | 0.047 ms | 362410.245004 s | 362410.245051 s |
(s)SCHED:7 | 0005 | 0.674 ms | 42 | 0.074 ms | 362411.483039 s | 362411.483113 s |
(s)NET_RX:3 | 0001 | 0.556 ms | 10 | 0.079 ms | 362411.066388 s | 362411.066467 s |
<SNIP>
root@x1:~# perf trace -e bpf --max-events 5 perf kwork report --use-bpf
0.000 ( 0.016 ms): perf/2948007 bpf(cmd: 36, uattr: 0x7ffededa6660, size: 8) = -1 EOPNOTSUPP (Operation not supported)
0.026 ( 0.106 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6390, size: 148) = 12
0.152 ( 0.032 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6450, size: 148) = 12
26.247 ( 0.138 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6300, size: 148) = 12
26.396 ( 0.012 ms): perf/2948007 bpf(uattr: 0x7ffededa64b0, size: 80) = 12
Starting trace, Hit <Ctrl+C> to stop and report
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240902200515.2103769-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Support using LLVM as a disassembler method, allowing helperless
annotation in non-distro builds. (It is also much faster than
using libbfd or bfd objdump on binaries with a lot of debug
information.)
This is nearly identical to the output of llvm-objdump; there are
some very rare whitespace differences, some minor changes to demangling
(since we use perf's regular demangling and not LLVM's own) and
the occasional case where llvm-objdump makes a different choice
when multiple symbols share the same address.
It should work across all of LLVM's supported architectures, although
I've only tested 64-bit x86, and finding the right triple from perf's
idea of machine architecture can sometimes be a bit tricky. Ideally, we
should have some way of finding the triplet just from the file itself.
Committer notes:
Address this on 32-bit systems by using PRIu64 from inttypes.h
3 17.58 almalinux:9-i386 : FAIL gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
util/llvm-c-helpers.cpp: In function ‘char* make_symbol_relative_string(dso*, const char*, u64, u64)’:
util/llvm-c-helpers.cpp:150:52: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘u64’ {aka
+‘long long unsigned int’} [-Werror=format=]
150 | snprintf(buf, sizeof(buf), "%s+0x%lx",
| ~~^
| |
| long unsigned int
| %llx
151 | demangled ? demangled : sym_name, addr - base_addr);
| ~~~~~~~~~~~~~~~~
| |
| u64 {aka long long unsigned int}
cc1plus: all warnings being treated as errors
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-3-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In addition to the existing support for libbfd and calling out to
an external addr2line command, add support for using libllvm directly.
This is both faster than libbfd, and can be enabled in distro builds
(the LLVM license has an explicit provision for GPLv2 compatibility).
Thus, it is set as the primary choice if available.
As an example, running 'perf report' on a medium-size profile with
DWARF-based backtraces took 58 seconds with LLVM, 78 seconds with
libbfd, 153 seconds with external llvm-addr2line, and I got tired and
aborted the test after waiting for 55 minutes with external bfd
addr2line (which is the default for perf as compiled by distributions
today).
Evidently, for this case, the bfd addr2line process needs 18 seconds (on
a 5.2 GHz Zen 3) to load the .debug ELF in question, hits the 1-second
timeout and gets killed during initialization, getting restarted anew
every time. Having an in-process addr2line makes this much more robust.
As future extensions, libllvm can be used in many other places where
we currently use libbfd or other libraries:
- Symbol enumeration (in particular, for PE binaries).
- Demangling (including non-Itanium demangling, e.g. Microsoft
or Rust).
- Disassembling (perf annotate).
However, these are much less pressing; most people don't profile PE
binaries, and perf has non-bfd paths for ELF. The same with demangling;
the default _cxa_demangle path works fine for most users, and while bfd
objdump can be slow on large binaries, it is possible to use
--objdump=llvm-objdump to get the speed benefits. (It appears
LLVM-based demangling is very simple, should we want that.)
Tested with LLVM 14, 15, 16, 18 and 19. For some reason, LLVM 12 was not
correctly detected using feature_check, and thus was not tested.
Committer notes:
Added the name and a __maybe_unused to address:
1 13.50 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-22) (GCC)
util/srcline.c: In function 'dso__free_a2l':
util/srcline.c:184:20: error: parameter name omitted
void dso__free_a2l(struct dso *)
^~~~~~~~~~~~
make[3]: *** [/git/perf-6.11.0-rc3/tools/build/Makefile.build:158: util] Error 2
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If sched_in event for current task is not recorded, sched_in timestamp
will be set to end_time of time window interest, causing an error in
timestamp show. In this case, we choose to ignore this event.
Test scenario:
perf[1229608] does not record the first sched_in event, run time and sch delay are both 0
# perf sched timehist
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
2090450.763231 [0000] perf[1229608] 0.000 0.000 0.000
2090450.763235 [0000] migration/0[15] 0.000 0.001 0.003
2090450.763263 [0001] perf[1229608] 0.000 0.000 0.000
2090450.763268 [0001] migration/1[21] 0.000 0.001 0.004
2090450.763302 [0002] perf[1229608] 0.000 0.000 0.000
2090450.763309 [0002] migration/2[27] 0.000 0.001 0.007
2090450.763338 [0003] perf[1229608] 0.000 0.000 0.000
2090450.763343 [0003] migration/3[33] 0.000 0.001 0.004
Before:
arbitrarily specify a time window of interest, timestamp will be set to an incorrect value
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
200.000000 [0000] perf[1229608] 0.000 0.000 0.000
200.000000 [0001] perf[1229608] 0.000 0.000 0.000
200.000000 [0002] perf[1229608] 0.000 0.000 0.000
200.000000 [0003] perf[1229608] 0.000 0.000 0.000
200.000000 [0004] perf[1229608] 0.000 0.000 0.000
200.000000 [0005] perf[1229608] 0.000 0.000 0.000
200.000000 [0006] perf[1229608] 0.000 0.000 0.000
200.000000 [0007] perf[1229608] 0.000 0.000 0.000
After:
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819024720.2405244-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The spinlock and rwlock use a single-element per-cpu array to track
current locks due to performance reason. But this means the key is
always available and it cannot simply account lock stats in the array
because some of them are invalid.
In fact, the contention_end() program in the BPF invalidates the entry
by setting the 'lock' value to 0 instead of deleting the entry for the
hashmap. So it should skip entries with the lock value of 0 in the
account_end_timestamp().
Otherwise, it'd have spurious high contention on an idle machine:
$ sudo perf lock con -ab -Y spinlock sleep 3
contended total wait max wait avg wait type caller
8 4.72 s 1.84 s 590.46 ms spinlock rcu_core+0xc7
8 1.87 s 1.87 s 233.48 ms spinlock process_one_work+0x1b5
2 1.87 s 1.87 s 933.92 ms spinlock worker_thread+0x1a2
3 1.81 s 1.81 s 603.93 ms spinlock tmigr_update_events+0x13c
2 1.72 s 1.72 s 861.98 ms spinlock tick_do_update_jiffies64+0x25
6 42.48 us 13.02 us 7.08 us spinlock futex_q_lock+0x2a
1 13.03 us 13.03 us 13.03 us spinlock futex_wake+0xce
1 11.61 us 11.61 us 11.61 us spinlock rcu_core+0xc7
I don't believe it has contention on a spinlock longer than 1 second.
After this change, it only reports some small contentions.
$ sudo perf lock con -ab -Y spinlock sleep 3
contended total wait max wait avg wait type caller
4 133.51 us 43.29 us 33.38 us spinlock tick_do_update_jiffies64+0x25
4 69.06 us 31.82 us 17.27 us spinlock process_one_work+0x1b5
2 50.66 us 25.77 us 25.33 us spinlock rcu_core+0xc7
1 28.45 us 28.45 us 28.45 us spinlock rcu_core+0xc7
1 24.77 us 24.77 us 24.77 us spinlock tmigr_update_events+0x13c
1 23.34 us 23.34 us 23.34 us spinlock raw_spin_rq_lock_nested+0x15
Fixes: b5711042a1 ("perf lock contention: Use per-cpu array map for spinlocks")
Reported-by: Xi Wang <xii@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240828052953.1445862-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously inject->is_pipe was set if the input or output were a
pipe. Determining the input was a pipe had to be done prior to
starting the session and opening the file. This was done by comparing
the input file name with '-' but it fails if the pipe file is written
to disk.
Opening a pipe file from disk will correctly set perf_data.is_pipe, but
this is too late for 'perf inject' and results in a broken file. A
workaround is 'cat pipe_perf|perf inject -i - ...'.
This change removes inject->is_pipe and changes the dependent
conditions to use the is_pipe flag on the input
(inject->session->data) and output files (inject->output). This
ensures the is_pipe condition reflects things like the header being
read.
The change removes the use of perf file header repiping, that is
writing the file header out while reading it in. The case of input
pipe and output file cannot repipe as the attributes for the file are
unknown. To resolve this, write the file header when writing to disk
and as the attributes may be unknown, write them after the data.
Update sessions repipe variable to be trace_event_repipe as those are
the only events now impacted by it. Update __perf_session__new as the
repipe_fd no longer needs passing. Fully removing repipe from session
header reading will be done in a later change.
Committer testing:
root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 | perf report -i -
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB - ]
#
# Total Lost Samples: 0
#
# Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. ...............................
#
100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5
|
---__libc_start_main@@GLIBC_2.34
__libc_start_call_main
0x562fc2560a9f
clock_nanosleep@GLIBC_2.2.5
#
# (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
#
root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 > pipe.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB - ]
root@number:~# perf report --stdio -i pipe.data
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. ...............................
#
100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5
|
---__libc_start_main@@GLIBC_2.34
__libc_start_call_main
0x55f775975a9f
clock_nanosleep@GLIBC_2.2.5
#
# (Tip: To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...)
#
root@number:~#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Buggy perf.data files can have the attributes and data
overlapping.
For example, when processing pipe data the attributes aren't known and
so file offset header calculations can consider them not present.
Later this can cause the attributes to overwrite the data. This can be
seen in:
$ perf record -o - true > a.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.059 MB - ]
$ perf inject -i a.data -o b.data
$ perf report --stats -i b.data
0x68 [0]: failed to process type: 510379 [Invalid argument]
Error:
failed to process sample
$
This change makes reading the corrupt file fail:
$ perf report --stats -i b.data
Perf file header corrupt: Attributes and data overlap
incompatible file format (rerun with -v to learn more)
$
Which is more informative.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian pointed out that the libcap feature test is also used by bpftool, so
we can't remove it just because perf stopped using it, revert the
removal of the feature test.
Since both perf and libcap uses the fast path feature detection
(tools/build/feature/test-all.c), probably the best thing is to keep
libcap-devel when building perf even it not being used there.
This reverts commit 47b3b6435e.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I noticed that the fast path feature detection was failing:
$ cat /tmp/build/perf-tools-next/feature/test-all.make.output
/usr/bin/ld: cannot find -lcap: No such file or directory
collect2: error: ld returned 1 exit status
$
The patch removing the dependency (Fixes tag below) didn't remove the
detection of libcap, and as the fast path feature detection (test-all.c)
had -lcap in its Makefile link list of libraries to link, it was failing
when libcap-devel is not available, fix it by removing those leftover
files.
Fixes: e25ebda78e ("perf cap: Tidy up and improve capability testing")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zs-gjOGFWtAvIZit@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
$ sudo ./perf test filtering -vv
96: perf record sample filtering (by BPF) tests:
--- start ---
test child forked, pid 2966908
Checking BPF-filter privilege
Basic bpf-filter test
Basic bpf-filter test [Success]
Failing bpf-filter test
Failing bpf-filter test [Success]
Group bpf-filter test
Group bpf-filter test [Success]
Multiple bpf-filter test
Multiple bpf-filter test [Success]
Cgroup bpf-filter test
Cgroup bpf-filter test [Success]
---- end(0) ----
96: perf record sample filtering (by BPF) tests : Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.
So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage. So it shouldn't access it
unconditionally.
I got a segfault, when I wanted to see cgroup profiles.
$ sudo perf record -a --all-cgroups --synth=cgroup true
$ sudo perf report -s cgroup
Program received signal SIGSEGV, Segmentation fault.
0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
48 return RC_CHK_ACCESS(map)->dso;
(gdb) bt
#0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
#1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
#2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
#3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
at util/hist.c:644
#4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
#5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
#6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
#7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
at util/hist.c:1260
#8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
machine=0x5555560388e8) at builtin-report.c:334
#9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
#10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
#11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
#12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
#13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
#14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
#15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
#16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
at util/session.c:780
#17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
file_path=0x555556038ff0 "perf.data") at util/session.c:1406
As you can see the entry->ms.map was NULL even if he->ms.map has a
value. This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same. I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).
Fixes: ac01c8c424 ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf trace -p <PID>' work on a syscall that is unaugmented, but doesn't
work on a syscall that's augmented (when it calls perf_event_output() in
BPF).
Let's take open() as an example. open() is augmented in perf trace.
Before:
$ perf trace -e open -p 3792392
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
We can see there's no output.
After:
$ perf trace -e open -p 3792392
0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
Reason:
bpf_perf_event_output() will fail when you specify a pid in 'perf trace' (EOPNOTSUPP).
When using 'perf trace -p 114', before perf_event_open(), we'll have PID
= 114, and CPU = -1.
This is bad for bpf-output event, because the ring buffer won't accept
output from BPF's perf_event_output(), making it fail. I'm still trying
to find out why.
If we open bpf-output for every cpu, instead of setting it to -1, like
this:
PID = <PID>, CPU = 0
PID = <PID>, CPU = 1
PID = <PID>, CPU = 2
PID = <PID>, CPU = 3
Everything works.
You can test it with this script (open.c):
#include <unistd.h>
#include <sys/syscall.h>
int main()
{
int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
char s1[] = "DINGZHEN", s2[] = "XUEBAO";
while (1) {
syscall(SYS_open, s1, i1, i2);
sleep(1);
}
return 0;
}
save, compile:
make open
perf trace:
perf trace -e open <path-to-the-executable>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>