Perf probe on vfs_fstatat fails as below on a powerpc system
$ ./perf probe -nf --max-probes=512 -a 'vfs_fstatat $params'
Segmentation fault (core dumped)
This is observed while running perftool-testsuite_probe testcase.
While running with verbose, its observed that segfault happens
at:
synthesize_probe_trace_arg ()
synthesize_probe_trace_command ()
probe_file.add_event ()
apply_perf_probe_events ()
__cmd_probe ()
cmd_probe ()
run_builtin ()
handle_internal_command ()
main ()
Code in synthesize_probe_trace_arg() access a null value and results in
segfault. Data structure which is null:
struct probe_trace_arg arg->value
We are hitting a case where arg->value is null in probe point:
"vfs_fstatat $params". This is happening since 'commit e896474fe4
("getname_maybe_null() - the third variant of pathname copy-in")'
Before the commit, probe point for vfs_fstatat was getting added only
for one location:
Writing event: p:probe/vfs_fstatat _text+6345404 dfd=%gpr3:s32 filename=%gpr4:x64 stat=%gpr5:x64 flags=%gpr6:s32
With this change, vfs_fstatat code is inlined for other locations in the
code:
Probe point found: __do_sys_lstat64+48
Probe point found: __do_sys_stat64+48
Probe point found: __do_sys_newlstat+48
Probe point found: __do_sys_newstat+48
Probe point found: vfs_fstatat+0
When trying to find matching dwarf information entry (DIE)
from the debuginfo, the code incorrectly picks DIE which is
not referring to vfs_fstatat. Snippet from dwarf entry in vmlinux
debuginfo file.
The main abstract die is:
<1><4214883>: Abbrev Number: 147 (DW_TAG_subprogram)
<4214885> DW_AT_external : 1
<4214885> DW_AT_name : (indirect string, offset: 0x17b9f3): vfs_fstatat
With formal parameters:
<2><4214896>: Abbrev Number: 51 (DW_TAG_formal_parameter)
<4214897> DW_AT_name : dfd
<2><42148a3>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148a4> DW_AT_name : (indirect string, offset: 0x8fda9): filename
<2><42148b0>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148b1> DW_AT_name : (indirect string, offset: 0x16bd9c): stat
<2><42148bd>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148be> DW_AT_name : (indirect string, offset: 0x39832b): flags
While collecting variables/parameters for a probe point, the function
copy_variables_cb() also looks at dwarf debug entries based on the
instruction address. Snippet
if (dwarf_haspc(die_mem, vf->pf->addr))
return DIE_FIND_CB_CONTINUE;
else
return DIE_FIND_CB_SIBLING;
But incase of inlined function instance for vfs_fstatat, there are two
entries which has the instruction address entry point as same.
Instance 1: which is for vfs_fstatat and DW_AT_abstract_origin points to
0x4214883 (reference above for main abstract die)
<3><42131fa>: Abbrev Number: 59 (DW_TAG_inlined_subroutine)
<42131fb> DW_AT_abstract_origin: <0x4214883>
<42131ff> DW_AT_entry_pc : 0xc00000000062b1e0
Instance 2: which is not for vfs_fstatat but for getname
<5><4213270>: Abbrev Number: 39 (DW_TAG_inlined_subroutine)
<4213271> DW_AT_abstract_origin: <0x4215b6b>
<4213275> DW_AT_entry_pc : 0xc00000000062b1e0
But the copy_variables_cb() continues to add parameters from second
instance also based on the dwarf_haspc() check. This results in
formal parameters for getname also appended to params. But while
filling in the args->value for these parameters, since these args
are not part of dwarf with offset "42131fa". Hence value will be
null. This incorrect args results in segfault when value field is
accessed.
Save the dwarf dieoffset of the actual DW_TAG_subprogram as part of
"struct probe_finder". In copy_variables_cb(), include check to make
sure the DW_AT_abstract_origin points to the correct entry if the
dwarf_haspc() matches the instruction address.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250225123042.37263-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The max-latency value can make the histogram smaller, but not larger, we
have a maximum of 22 buckets and specifying a max-latency that would
require more buckets has no effect.
Dynamically allocate the buckets and compute the bucket number from the
max latency as (max-min) / range + 2
If the maximum is not specified, we still set the bucket number to 22
and compute the maximum accordingly.
Fail if the maximum is smaller than min+range, this way we make sure we
always have 3 buckets: those below min, those above max and one in the
middle.
Since max-latency is not available in log2 mode, always use 22 buckets.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250207080446.77630-1-gmonaco@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
glibc's opendir allocates a minimum of 32kb, when called recursively
for a directory tree the memory consumption can add up - nearly 300kb
during perf start-up when processing modules. Add a stack allocated
variant of readdir sized a little more than 1kb.
As getdents64 may be missing from libc, add support using syscall. As
the system call number maybe missing, add #defines for those.
Note, an earlier version of this patch had a feature test for
getdents64 but there were problems on certains distros where
getdents64 would be #define renamed to getdents breaking the code. The
syscall use was made uncondtional to work around this. There is
context in:
https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250222061015.303622-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Prior to commit 70c90e4a6b ("perf parse-events: Avoid scanning PMUs
before parsing") names (generally event names) excluded hyphen (minus)
symbols as the formation of legacy names with hyphens was handled in
the yacc code. That commit allowed hyphens supposedly making
name_minus unnecessary. However, changing name_minus to name has
issues in the term config tokens as then name ends up having priority
over numbers and name allows matching numbers since commit
5ceb57990b ("perf parse: Allow tracepoint names to start with digits
"). It is also permissable for a name to match with a colon (':') in
it when its in a config term list. To address this rename name_minus
to term_name, make the pattern match name's except for the colon, add
number matching into the config term region with a higher priority
than name matching. This addresses an inconsistency and allows greater
matching for names inside of term lists, for example, they may start
with a number.
Rename name_tag to quoted_name and update comments and helper
functions to avoid str detecting quoted strings which was already done
by the lexer.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250109175401.161340-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When testing perf trace on NixOS, I noticed significant startup delays:
- `ls`: ~2ms
- `strace ls`: ~10ms
- `perf trace ls`: ~550ms
Profiling showed that 51% of the time is spent reading files,
26% in loading BPF programs, and 11% in `newfstatat`.
This patch optimizes module path exploration by avoiding `stat()` calls
unless necessary. For filesystems that do not implement `d_type`
(DT_UNKNOWN), it falls back to the old behavior.
See `readdir(3)` for details.
This reduces `perf trace ls` time to ~500ms.
A more thorough startup optimization based on command parameters would
be ideal, but that is a larger effort.
Signed-off-by: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Currently the code checks that there is no "ipc" in the sort order
and add an ipc string. This will always error out on the second pass
after input reload/switch, since the sort order already contains "ipc".
Do the ipc check/fixup only on the first pass.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20250108063628.215577-1-dvyukov@google.com
Fixes: ec6ae74fe8 ("perf report: Display average IPC and IPC coverage per symbol")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The symbol_conf.use_callchain should be reset when switching to new data
file, otherwise report__setup_sample_type() will show an error message
that it enabled callchains but no callchain data. The function also
will turn on the callchains if the data has PERF_SAMPLE_CALLCHAIN so I
think it's ok to reset symbol_conf.use_callchain here.
Link: https://lore.kernel.org/r/20250211060745.294289-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In sysfs, the perf events are all located in
/sys/bus/event_source/devices/ but some places ended up hard-coding the
location to be at the root of /sys/devices/ which could be very risky as
you do not exactly know what type of device you are accessing in sysfs
at that location.
So fix this all up by properly pointing everything at the bus device
list instead of the root of the sysfs devices/ tree.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/2025021955-implant-excavator-179d@gregkh
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When listing in verbose mode, the long description is used but the PMU
name isn't appended. There doesn't seem to be a reason to exclude it
when asking for more information, so use the same print block for both
long and short descriptions.
Before:
$ perf list -v
...
inst_retired
[Instruction architecturally executed]
After:
$ perf list -v
...
inst_retired
[Instruction architecturally executed. Unit: armv8_cortex_a57]
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250219151622.1097289-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
A recent change in the flamegraph script fixed an issue with live mode
but it created another for offline mode. It needs to pass "-" to -i
option to read from stdin in the live mode. Actually there's a logic
to pass the option in the perf script code, but the script was written
with "-- $@" which prevented the option to go to the perf script. So
the previous commit added the hard-coded "-i -" to the report command.
But it's a problem for the offline mode which expects input from a file
and now it's stuck on reading from stdin. Let's remove the "-i - --"
part and let it pass the options properly to perf script.
Closes: https://lore.kernel.org/linux-perf-users/c41e4b04-e1fd-45ab-80b0-ec2ac6e94310@linux.ibm.com
Fixes: 23e0a63c6d ("perf script: force stdin for flamegraph in live mode")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Anubhav Shelat <ashelat@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Since the commit dc6d2bc2d8 ("perf sample: Make user_regs and
intr_regs optional"), the building for Arm64 reports error:
arch/arm64/util/unwind-libdw.c: In function ‘libdw__arch_set_initial_registers’:
arch/arm64/util/unwind-libdw.c:11:32: error: initialization of ‘struct regs_dump *’ from incompatible pointer type ‘struct regs_dump **’ [-Werror=incompatible-pointer-types]
11 | struct regs_dump *user_regs = &ui->sample->user_regs;
| ^
cc1: all warnings being treated as errors
make[6]: *** [/home/niayan01/linux/tools/build/Makefile.build:85: arch/arm64/util/unwind-libdw.o] Error 1
make[5]: *** [/home/niayan01/linux/tools/build/Makefile.build:138: util] Error 2
arch/arm64/tests/dwarf-unwind.c: In function ‘test__arch_unwind_sample’:
arch/arm64/tests/dwarf-unwind.c:48:27: error: initialization of ‘struct regs_dump *’ from incompatible pointer type ‘struct regs_dump **’ [-Werror=incompatible-pointer-types]
48 | struct regs_dump *regs = &sample->user_regs;
| ^
To fix the issue, use the helper perf_sample__user_regs() to retrieve
the user_regs.
Fixes: dc6d2bc2d8 ("perf sample: Make user_regs and intr_regs optional")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250214111025.14478-1-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The struct dump_regs contains 512 bytes of cache_regs, meaning the two
values in perf_sample contribute 1088 bytes of its total 1384 bytes
size. Initializing this much memory has a cost reported by Tavian
Barnes <tavianator@tavianator.com> as about 2.5% when running `perf
script --itrace=i0`:
https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/
Adrian Hunter <adrian.hunter@intel.com> replied that the zero
initialization was necessary and couldn't simply be removed.
This patch aims to strike a middle ground of still zeroing the
perf_sample, but removing 79% of its size by make user_regs and
intr_regs optional pointers to zalloc-ed memory. To support the
allocation accessors are created for user_regs and intr_regs. To
support correct cleanup perf_sample__init and perf_sample__exit
functions are created and added throughout the code base.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>