When parsing a sample with a sample ID, copy machine_pid and vcpu from
perf_sample_id to perf_sample.
Note, machine_pid will be zero when unused, so only a non-zero value
represents a guest machine. vcpu should be ignored if machine_pid is zero.
Note also, machine_pid is used with events that have come from injecting a
guest perf.data file, however guest events recorded on the host (i.e. using
perf kvm) have the (QEMU) hypervisor process pid to identify them - refer
machines__find_for_cpumode().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220711093218.10967-14-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When injecting events from a guest perf.data file, the events will have
separate sample ID numbers. These ID numbers can then be used to determine
which machine an event belongs to. To facilitate that, add machine_pid and
vcpu to id_index records. For backward compatibility, these are added at
the end of the record, and the length of the record is used to determine
if they are present or not.
Note, this is needed because the events from a guest perf.data file contain
the pid/tid of the process running at that time inside the VM not the
pid/tid of the (QEMU) hypervisor thread. So a way is needed to relate
guest events back to the guest machine and VCPU, and using sample ID
numbers for that is relatively simple and convenient.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220711093218.10967-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This new option displays all of the information needed to do external
BuildID-based symbolization of kernel stack traces, such as those collected
by bpf_get_stackid().
For each kernel module plus the main kernel, it displays the BuildID,
the start and end virtual addresses of that module's text range (rounded
out to page boundaries), and the pathname of the module.
When run as a non-privileged user, the actual addresses of the modules'
text ranges are not available, so the tools displays "0, <text length>" for
kernel modules and "0, 0xffffffffffffffff" for the kernel itself.
Sample output:
root# perf buildid-list -m
cf6df852fd4da122d616153353cc8f560fd12fe0 ffffffffa5400000 ffffffffa6001e27 [kernel.kallsyms]
1aa7209aa2acb067d66ed6cf7676d65066384d61 ffffffffc0087000 ffffffffc008b000 /lib/modules/5.15.15-1rodete2-amd64/kernel/crypto/sha512_generic.ko
3857815b5bf0183697b68f8fe0ea06121644041e ffffffffc008c000 ffffffffc0098000 /lib/modules/5.15.15-1rodete2-amd64/kernel/arch/x86/crypto/sha512-ssse3.ko
4081fde0bca2bc097cb3e9d1efcb836047d485f1 ffffffffc0099000 ffffffffc009f000 /lib/modules/5.15.15-1rodete2-amd64/kernel/drivers/acpi/button.ko
1ef81ba4890552ea6b0314f9635fc43fc8cef568 ffffffffc00a4000 ffffffffc00aa000 /lib/modules/5.15.15-1rodete2-amd64/kernel/crypto/cryptd.ko
cc5c985506cb240d7d082b55ed260cbb851f983e ffffffffc00af000 ffffffffc00b6000 /lib/modules/5.15.15-1rodete2-amd64/kernel/drivers/i2c/busses/i2c-piix4.ko
[...]
Committer notes:
u64 formatter should be PRIx64 for printing as hex numbers, fix this:
28 5.28 debian:experimental-x-mips : FAIL gcc version 11.2.0 (Debian 11.2.0-18)
builtin-buildid-list.c: In function 'buildid__map_cb':
builtin-buildid-list.c:32:24: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
32 | printf("%s %16lx %16lx", bid_buf, map->start, map->end);
| ~~~~^ ~~~~~~~~~~
| | |
| long unsigned int u64 {aka long long unsigned int}
| %16llx
builtin-buildid-list.c:32:30: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
32 | printf("%s %16lx %16lx", bid_buf, map->start, map->end);
| ~~~~^ ~~~~~~~~
| | |
| long unsigned int u64 {aka long long unsigned int}
| %16llx
cc1: all warnings being treated as errors
Signed-off-by: Blake Jones <blakejones@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220629213632.3899212-1-blakejones@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To update the perf/core codebase.
Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end
of evsel__config(), after what was added in:
49c692b7df ("perf offcpu: Accept allowed sample types only")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked
objects") uncovered the following issue on aarch64:
util/unwind-libunwind-local.c: In function 'find_proc_info':
util/unwind-libunwind-local.c:386:28: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
386 | if (ofs > 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:371:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
371 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:363:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
363 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
In file included from util/libunwind/arm64.c:37:
Fixes: dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked objects")
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: kernel-team@cloudflare.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701182046.12589-1-ivan@cloudflare.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Build ID events associate a file name with a build ID. However, when
using perf inject, there is no guarantee that the file on the current
machine at the current time has that build ID. Fix by comparing the
build IDs and skip adding to the cache if they are different.
Example:
$ echo "int main() {return 0;}" > prog.c
$ gcc -o prog prog.c
$ perf record --buildid-all ./prog
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data ]
$ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; }
$ file-buildid prog
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ echo "int main() {return 1;}" > prog.c
$ gcc -o prog prog.c
$ file-buildid prog
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
Before:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
$
After:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
$
Fixes: 454c407ec1 ("perf: add perf-inject builtin")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20220621125144.5623-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the -D option is used, the details of thread-map, cpu-map and
event-update events are not currently dumped. Add prints so that they are.
Example:
# perf record --kcore sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ]
# perf script -D | grep 'THREAD\|CPU'
0 0x4950 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 35116
0 0x4978 [0x20]: PERF_RECORD_CPU_MAP: 0-7
# perf script -D | grep -A4 'UPDATE'
0 0x4920 [0x30]: PERF_RECORD_EVENT_UPDATE
... id: 147
... 0-7
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220610113316.6682-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In preparation for recording sideband events in a virtual machine guest so
that they can be injected into a host perf.data file.
This is needed to enable injecting events after the initial synthesized
user events (that have an all zero id sample) but before regular events.
Committer notes:
Add entry about PERF_RECORD_FINISHED_INIT to
tools/perf/Documentation/perf.data-file-format.txt.
Committer testing:
Before:
# perf report -D | grep FINISHED
0 0x5910 [0x8]: PERF_RECORD_FINISHED_ROUND
FINISHED_ROUND events: 1 ( 0.5%)
#
After:
# perf record -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ]
# perf report -D | grep FINISHED
0 0x5068 [0x8]: PERF_RECORD_FINISHED_INIT: unhandled!
0 0x5390 [0x8]: PERF_RECORD_FINISHED_ROUND
FINISHED_ROUND events: 1 ( 0.5%)
FINISHED_INIT events: 1 ( 0.5%)
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220610113316.6682-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
segbase is the address of .eh_frame_hdr and table_data is segbase plus
the header size. find_proc_info computes segbase as `map->start +
segbase - map->pgoff` which is wrong when
* .eh_frame_hdr and .text are in different PT_LOAD program headers
* and their p_vaddr difference does not equal their p_offset difference
Since 10.0, ld.lld's default --rosegment -z noseparate-code layout has
such R and RX PT_LOAD program headers.
ld.lld (default) => perf report fails to unwind `perf record
--call-graph dwarf` recorded data
ld.lld --no-rosegment => ok (trivial, no R PT_LOAD)
ld.lld -z separate-code => ok but by luck: there are two PT_LOAD but
their p_vaddr difference equals p_offset difference
ld.bfd -z noseparate-code => ok (trivial, no R PT_LOAD)
ld.bfd -z separate-code (default for Linux/x86) => ok but by luck:
there are two PT_LOAD but their p_vaddr difference equals p_offset
difference
To fix the issue, compute segbase as dso's base address plus
PT_GNU_EH_FRAME's p_vaddr. The base address is computed by iterating
over all dso-associated maps and then subtract the first PT_LOAD p_vaddr
(the minimum guaranteed by generic ABI) from the minimum address.
In libunwind, find_proc_info transitively called by unw_step is cached,
so the iteration overhead is acceptable.
Reported-by: Sebastian Ullrich <sebasti@nullri.ch>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: llvm@lists.linux.dev
Link: https://github.com/ClangBuiltLinux/linux/issues/1646
Link: https://lore.kernel.org/r/20220527182039.673248-1-maskray@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Architectures can detect availability of extra registers at runtime so
use this more complete set for unwinding. This will include the VG
register on arm64 in a later commit.
If the function isn't implemented then PERF_REGS_MASK is returned and
there is no change.
Committer notes:
Added util/perf_regs.c to tools/perf/util/python-ext-sources so that
'perf test python' passes, i.e. the perf python binding has all the
symbols it needs, addressing:
$ perf test -v python
19: 'import perf' in python :
--- start ---
test child forked, pid 2037817
python usage test: "echo "import sys ; sys.path.append('/tmp/build/perf/python'); import perf" | '/usr/bin/python3' "
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /tmp/build/perf/python/perf.cpython-310-x86_64-linux-gnu.so: undefined symbol: arch__user_reg_mask
test child finished with -1
---- end ----
'import perf' in python: FAILED!
$
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220525154114.718321-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This covers two different use cases. The first one is cgroup
filtering given by -G/--cgroup option which controls the off-cpu
profiling for tasks in the given cgroups only.
The other use case is cgroup sampling which is enabled by
--all-cgroups option and it adds PERF_SAMPLE_CGROUP to the sample_type
to set the cgroup id of the task in the sample data.
Example output.
$ sudo perf record -a --off-cpu --all-cgroups sleep 1
$ sudo perf report --stdio -s comm,cgroup --call-graph=no
...
# Samples: 144 of event 'offcpu-time'
# Event count (approx.): 48452045427
#
# Children Self Command Cgroup
# ........ ........ ............... ..........................................
#
61.57% 5.60% Chrome_ChildIOT /user.slice/user-657345.slice/user@657345.service/app.slice/...
29.51% 7.38% Web Content /user.slice/user-657345.slice/user@657345.service/app.slice/...
17.48% 1.59% Chrome_IOThread /user.slice/user-657345.slice/user@657345.service/app.slice/...
16.48% 4.12% pipewire-pulse /user.slice/user-657345.slice/user@657345.service/session.slice/...
14.48% 2.07% perf /user.slice/user-657345.slice/user@657345.service/app.slice/...
14.30% 7.15% CompositorTileW /user.slice/user-657345.slice/user@657345.service/app.slice/...
13.33% 6.67% Timer /user.slice/user-657345.slice/user@657345.service/app.slice/...
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220518224725.742882-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add --off-cpu option to enable the off-cpu profiling with BPF. It'd
use a bpf_output event and rename it to "offcpu-time". Samples will
be synthesized at the end of the record session using data from a BPF
map which contains the aggregated off-cpu time at context switches.
So it needs root privilege to get the off-cpu profiling.
Each sample will have a separate user stacktrace so it will skip
kernel threads. The sample ip will be set from the stacktrace and
other sample data will be updated accordingly. Currently it only
handles some basic sample types.
The sample timestamp is set to a dummy value just not to bother with
other events during the sorting. So it has a very big initial value
and increase it on processing each samples.
Good thing is that it can be used together with regular profiling like
cpu cycles. If you don't want to that, you can use a dummy event to
enable off-cpu profiling only.
Example output:
$ sudo perf record --off-cpu perf bench sched messaging -l 1000
$ sudo perf report --stdio --call-graph=no
# Total Lost Samples: 0
#
# Samples: 41K of event 'cycles'
# Event count (approx.): 42137343851
...
# Samples: 1K of event 'offcpu-time'
# Event count (approx.): 587990831640
#
# Children Self Command Shared Object Symbol
# ........ ........ ............... .................. .........................
#
81.66% 0.00% sched-messaging libc-2.33.so [.] __libc_start_main
81.66% 0.00% sched-messaging perf [.] cmd_bench
81.66% 0.00% sched-messaging perf [.] main
81.66% 0.00% sched-messaging perf [.] run_builtin
81.43% 0.00% sched-messaging perf [.] bench_sched_messaging
40.86% 40.86% sched-messaging libpthread-2.33.so [.] __read
37.66% 37.66% sched-messaging libpthread-2.33.so [.] __write
2.91% 2.91% sched-messaging libc-2.33.so [.] __poll
...
As you can see it spent most of off-cpu time in read and write in
bench_sched_messaging(). The --call-graph=no was added just to make
the output concise here.
It uses perf hooks facility to control BPF program during the record
session rather than adding new BPF/off-cpu specific calls.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220518224725.742882-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>