Remove code that tested the "unit" as in KB/sec for certain hard coded
metric values and did workarounds.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The logic to skip output of a default metric line was firing on
Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the
need_full_name check as it is equivalent to the different PMU test in
the cases we care about, merge the 'if's and flip the evsel of the PMU
test. The 'if' is now basically saying, if the output matches the last
printed output then skip the output.
Before:
```
TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation
# 24.3 % tma_frontend_bound
TopdownL1 (cpu_core) # 33.9 % tma_backend_bound
# 30.6 % tma_retiring
# 42.2 % tma_backend_bound
# 25.0 % tma_frontend_bound (49.81%)
# 12.8 % tma_bad_speculation
# 20.0 % tma_retiring (59.46%)
```
After:
```
TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation
# 43.7 % tma_frontend_bound
# 30.7 % tma_backend_bound
# 17.2 % tma_retiring
TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound
# 37.6 % tma_frontend_bound (49.66%)
# 18.0 % tma_bad_speculation
# 12.6 % tma_retiring (59.58%)
```
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now that the metrics are encoded in common json the hard coded
printing means the metrics are shown twice. Remove the hard coded
version.
This means that when specifying events, and those events correspond to
a hard coded metric, the metric will no longer be displayed. The
metric will be displayed if the metric is requested. Due to the adhoc
printing in the previous approach it was often found frustrating, the
new approach avoids this.
The default perf stat output on an alderlake now looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
19,697 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 10.7 % tma_bad_speculation
# 24.9 % tma_frontend_bound
TopdownL1 (cpu_core) # 34.3 % tma_backend_bound
# 30.1 % tma_retiring
6,593 page-faults # nan faults/sec page_faults_per_second
729,065,658 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.79%)
1,605,131,101 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
# 19.7 % tma_bad_speculation
# 14.2 % tma_retiring (50.14%)
# 37.3 % tma_frontend_bound (50.31%)
87,302,268 cpu_atom/branches/ # nan M/sec branch_frequency (60.27%)
512,046,956 cpu_core/branches/ # nan M/sec branch_frequency
1,111 cpu-migrations # nan migrations/sec migrations_per_second
# 28.8 % tma_backend_bound (60.26%)
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
392,509,323 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (60.19%)
2,990,369,310 cpu_core/instructions/ # 1.9 instructions insn_per_cycle
3,493,478 cpu_atom/branch-misses/ # 5.9 % branch_miss_rate (49.69%)
7,297,531 cpu_core/branch-misses/ # 1.4 % branch_miss_rate
1.006621701 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Some Default group metrics require their events showing for
consistency with perf's previous behavior. Add a flag to indicate when
this is the case and use it in stat-display.
As events are coming from Default metrics remove that default hardware
and software events from perf stat.
Following this change the default perf stat output on an alderlake looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
20,550 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 9.0 % tma_bad_speculation
# 28.1 % tma_frontend_bound
TopdownL1 (cpu_core) # 29.2 % tma_backend_bound
# 33.7 % tma_retiring
6,685 page-faults # nan faults/sec page_faults_per_second
790,091,064 cpu_atom/cpu-cycles/
# nan GHz cycles_frequency (49.83%)
2,563,918,366 cpu_core/cpu-cycles/
# nan GHz cycles_frequency
# 12.3 % tma_bad_speculation
# 14.5 % tma_retiring (50.20%)
# 33.8 % tma_frontend_bound (50.24%)
76,390,322 cpu_atom/branches/ # nan M/sec branch_frequency (60.20%)
1,015,173,047 cpu_core/branches/ # nan M/sec branch_frequency
1,325 cpu-migrations # nan migrations/sec migrations_per_second
# 39.3 % tma_backend_bound (60.17%)
0.00 msec cpu-clock # 0.000 CPUs utilized
# 0.0 CPUs CPUs_utilized
554,347,072 cpu_atom/instructions/ # 0.64 insn per cycle
# 0.6 instructions insn_per_cycle (60.14%)
5,228,931,991 cpu_core/instructions/ # 2.04 insn per cycle
# 2.0 instructions insn_per_cycle
4,308,874 cpu_atom/branch-misses/ # 5.65% of all branches
# 5.6 % branch_miss_rate (49.76%)
9,890,606 cpu_core/branch-misses/ # 0.97% of all branches
# 1.0 % branch_miss_rate
1.005477803 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
For CPU nanoseconds a lot of the stat-shadow metrics use either
task-clock or cpu-clock, the latter being used when
target__has_cpu. Add a #target_cpu literal so that json metrics can
perform the same test.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Rather than using the first evsel in the matched events, try to find
the least shared non-tool evsel. The aim is to pick the first evsel
that typifies the metric within the list of metrics.
This addresses an issue where Default metric group metrics may lose
their counter value due to how the stat displaying hides counters for
default event/metric output.
For a metricgroup like TopdownL1 on an Intel Alderlake the change is,
before there are 4 events with metrics:
```
$ perf stat -M topdownL1 -a sleep 1
Performance counter stats for 'system wide':
7,782,334,296 cpu_core/TOPDOWN.SLOTS/ # 10.4 % tma_bad_speculation
# 19.7 % tma_frontend_bound
2,668,927,977 cpu_core/topdown-retiring/ # 35.7 % tma_backend_bound
# 34.1 % tma_retiring
803,623,987 cpu_core/topdown-bad-spec/
167,514,386 cpu_core/topdown-heavy-ops/
1,555,265,776 cpu_core/topdown-fe-bound/
2,792,733,013 cpu_core/topdown-be-bound/
279,769,310 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.2 % tma_retiring
# 15.1 % tma_bad_speculation
457,917,232 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 38.4 % tma_backend_bound
# 34.2 % tma_frontend_bound
783,519,226 cpu_atom/TOPDOWN_FE_BOUND.ALL/
10,790,192 cpu_core/INT_MISC.UOP_DROPPING/
879,845,633 cpu_atom/TOPDOWN_BE_BOUND.ALL/
```
After there are 6 events with metrics:
```
$ perf stat -M topdownL1 -a sleep 1
Performance counter stats for 'system wide':
2,377,551,258 cpu_core/TOPDOWN.SLOTS/ # 7.9 % tma_bad_speculation
# 36.4 % tma_frontend_bound
480,791,142 cpu_core/topdown-retiring/ # 35.5 % tma_backend_bound
186,323,991 cpu_core/topdown-bad-spec/
65,070,590 cpu_core/topdown-heavy-ops/ # 20.1 % tma_retiring
871,733,444 cpu_core/topdown-fe-bound/
848,286,598 cpu_core/topdown-be-bound/
260,936,456 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.4 % tma_retiring
# 17.6 % tma_bad_speculation
419,576,513 cpu_atom/CPU_CLK_UNHALTED.CORE/
797,132,597 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.0 % tma_frontend_bound
3,055,447 cpu_core/INT_MISC.UOP_DROPPING/
671,014,164 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 32.0 % tma_backend_bound
```
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
It should also have PERF_SAMPLE_TID to enable inherit and PERF_SAMPLE_READ
on recent kernels. Not having _TID makes the feature check wrongly detect
the inherit and _READ support.
It was reported that the following command failed due to the error in
the missing feature check on Intel SPR machines.
$ perf record -e '{cpu/mem-loads-aux/S,cpu/mem-loads,ldlat=3/PS}' -- ls
Error:
Failure to open event 'cpu/mem-loads,ldlat=3/PS' on PMU 'cpu' which will be removed.
Invalid event (cpu/mem-loads,ldlat=3/PS) in per-thread mode, enable system wide with '-a'.
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 3b193a57ba ("perf tools: Detect missing kernel features properly")
Reported-and-tested-by: Chen, Zide <zide.chen@intel.com>
Closes: https://lore.kernel.org/lkml/20251022220802.1335131-1-zide.chen@intel.com/
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Remove unnecessary semicolons reported by Coccinelle/coccicheck and the
semantic patch at scripts/coccinelle/misc/semicolon.cocci.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add an ability to be able to compose perf_tools, by having one perform
an action and then calling a delegate. Currently the perf_tools have
if-then-elses setting the callback and then if-then-elses within the
callback. Understanding the behavior is complex as it is in two places
and logic for numerous operations, within things like perf inject, is
interwoven. By chaining perf_tools together based on command line
options this kind of code can be avoided.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Getting context for what a tool is doing, such as the perf_inject
instance, using container_of the tool is a common pattern in the
code. This isn't possible event_op2, event_op3 and event_op4 callbacks
as the tool isn't passed. Add the argument and then fix function
signatures to match. As tools maybe reading a tool from somewhere
else, change that code to use the passed in tool.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Recent change on enabling --buildid-mmap by default brought an issue
with build-id handling. With build-ID in MMAP2 records, we don't need
to save the build-ID table in the header of a perf data file.
But the actual file contents still need to be cached in the debug
directory for annotation etc. Split the build-ID header processing and
caching and make sure perf record to save hit DSOs in the build-ID cache
by moving perf_session__cache_build_ids() to the end of the record__
finish_output().
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When copy metrics into a group also copy default information from the
original metrics.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If an out-of-memory occurs the expr also needs freeing.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Update comment as the stat_config no longer holds all metrics.
Signed-off-by: Ian Rogers <irogers@google.com>
Fixes: faebee18d7 ("perf stat: Move metric list from config to evlist")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The metric_events exist in the metric_expr list and so this variable
has been unused for a while.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The syscalls_sys_{enter,exit} map in augmented_raw_syscalls.bpf.c has
max entries of 512. Usually syscall numbers are smaller than this but
x86 has x32 ABI where syscalls start from 512.
That makes trace__init_syscalls_bpf_prog_array_maps() fail in the middle
of the loop when it accesses those keys. As the loop iteration is not
ordered by syscall numbers anymore, the failure can affect non-x32
syscalls.
Let's increase the map size to 1024 so that it can handle those ABIs
too. While most systems won't need this, increasing the size will be
safer for potential future changes.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
On some machines, it caused troubles when it tried to find kernel
symbols. I think it's because kernel modules and kallsyms are messed
up during load and split.
Basically we want to make sure the kernel map is loaded and the code has
it in the lock_contention_read(). But recently we added more lookups in
the lock_contention_prepare() which is called before _read().
Also the kernel map (kallsyms) may not be the first one in the group
like on ARM. Let's use machine__kernel_map() rather than just loading
the first map.
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 688d2e8de2 ("perf lock contention: Add -l/--lock-addr option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The line_len is only set on success. Check the return value instead.
util/hwmon_pmu.c: In function ‘perf_pmus__read_hwmon_pmus’:
util/hwmon_pmu.c:742:20: warning: ‘line_len’ may be used uninitialized [-Wmaybe-uninitialized]
742 | if (line_len > 0 && line[line_len - 1] == '\n')
| ^
util/hwmon_pmu.c:719:24: note: ‘line_len’ was declared here
719 | size_t line_len;
Fixes: 53cc0b351e ("perf hwmon_pmu: Add a tool PMU exposing events from hwmon in sysfs")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To avoid hardcoding the offset value for synthetic event IDs
in multiple auxtrace modules (arm-spe, cs-etm, intel-pt, etc.),
and to improve code reusability, this patch unifies
the handling of the ID offset via a dedicated helper function.
Signed-off-by: tanze <tanze@kylinos.cn>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Commit b8308511f6 bumped the max events to 1024 but this results in
BPF verifier issues if the number of command line events is too
large. Workaround this by:
1) moving the constants to a header file to share between BPF and perf
C code,
2) testing that the maximum number of events doesn't cause BPF
verifier issues in debug builds,
3) lower the max events from 1024 to 128,
4) in perf stat, if there are more events than the BPF counters can
support then disable BPF counter usage.
The rodata setup is factored into its own function to avoid
duplicating it in the testing code.
Signed-off-by: Ian Rogers <irogers@google.com>
Fixes: b8308511f6 ("perf stat bperf cgroup: Increase MAX_EVENTS from 32 to 1024")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When the OpenCSD library introduces a new enumeration value (for example,
in the v1.7.1 release), the perf build fails with an error:
util/cs-etm-decoder/cs-etm-decoder.c:600:10: error: enumeration value 'OCSD_GEN_TRC_ELEM_ITMTRACE' not explicitly handled in switch [-Werror, -Wswitch-enum]
600 | switch (elem->elem_type) {
| ^~~~~~~~~~~~~~~
1 error generated.
Convert to if-else sentences to mute the enumeration value warning,
which can avoid build failures whenever the lib is updated.
No functional change.
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The tracked pointer offset was not being preserved in the stack state,
which could lead to incorrect type analysis. This change adds a
ptr_offset field to the type_state_stack struct and passes it to
set_stack_state and findnew_stack_state to ensure the offset is
preserved after the pointer is loaded from a stack location. It improves
the type annotation coverage and quality.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Track the arithmetic operations on registers with pointer types. We
handle only add, sub and lea instructions. The original pointer
information needs to be preserved for getting outermost struct types.
For example, reg0 points to a struct cfs_rq, when we add 0x10 to reg0,
it should preserve the information of struct cfs_rq + 0x10 in the
register instead of a pointer type to the child field at 0x10.
Details:
1. struct type_state_reg now includes an offset, indicating if the
register points to the start or an internal part of its associated
type. This offset is used in mem to reg and reg to stack mem
transfers, and also applied to the final type offset.
2. lea offset(%sp/%fp), reg is now treated as taking the address of a
stack variable. It worked fine in most cases, but an issue with this
approach is the pointer type may not exist.
3. lea offset(%base), reg is handled by moving the type from %base and
adding an offset, similar to an add operation followed by a mov reg
to reg.
4. Non-stack variables from DWARF with non-zero offsets in their
location expressions are now accepted with register offset tracking.
Multi-register addressing modes in LEA are not supported.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Introduce TSR_KIND_POINTER to improve the data type profiler's ability
to track pointer-based memory accesses and address register variables.
TSR_KIND_POINTER represents that the location holds a pointer type to
the type in the type state. The semantics match the `breg` registers
that describe a memory location.
This change implements handling for this new kind in mov instructions
and in the check_matching_type() function. When a TSR_KIND_POINTER is
moved to the stack, the stack state size is set to the architecture's
pointer size.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Introduce a helper function is_address_gen_insn() to check
arch-dependent address generation instructions like lea in x86. Remove
type annotation on these instructions since they are not accessing
memory. It should be counted as `no_mem_ops`.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Check the error code of evsel__get_arch() in the symbol__annotate().
Previously it checked non-zero value but after the refactoring it does
only for negative values.
Fixes: 0669729eb0 ("perf annotate: Factor out evsel__get_arch()")
Suggested-by: James Clark <james.clark@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.
The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.
Stack trace as below:
Perf: Segmentation fault
-------- backtrace --------
#0 0x55d365 in ui__signal_backtrace setup.c:0
#1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
#2 0x570f08 in arch__is perf[570f08]
#3 0x562186 in annotate_get_insn_location perf[562186]
#4 0x562626 in __hist_entry__get_data_type annotate.c:0
#5 0x56476d in annotation_line__write perf[56476d]
#6 0x54e2db in annotate_browser__write annotate.c:0
#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
#8 0x54dc9e in annotate_browser__refresh annotate.c:0
#9 0x54c03d in __ui_browser__refresh browser.c:0
#10 0x54ccf8 in ui_browser__run perf[54ccf8]
#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
#12 0x552293 in do_annotate hists.c:0
#13 0x55941c in evsel__hists_browse hists.c:0
#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
#15 0x42ff02 in cmd_report perf[42ff02]
#16 0x494008 in run_builtin perf.c:0
#17 0x494305 in handle_internal_command perf.c:0
#18 0x410547 in main perf[410547]
#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
#21 0x410b75 in _start perf[410b75]
Fixes: 1d4374afd0 ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The recent change for perf c2c annotate broke build without slang
support like below.
builtin-annotate.c: In function 'hists__find_annotations':
builtin-annotate.c:522:73: error: 'NO_ADDR' undeclared (first use in this function); did you mean 'NR_ADDR'?
522 | key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR);
| ^~~~~~~
| NR_ADDR
builtin-annotate.c:522:73: note: each undeclared identifier is reported only once for each function it appears in
builtin-annotate.c:522:31: error: too many arguments to function 'hist_entry__tui_annotate'
522 | key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR);
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from util/sort.h:6,
from builtin-annotate.c:28:
util/hist.h:756:19: note: declared here
756 | static inline int hist_entry__tui_annotate(struct hist_entry *he __maybe_unused,
| ^~~~~~~~~~~~~~~~~~~~~~~~
And I noticed that it missed to update the other side of #ifdef
HAVE_SLANG_SUPPORT. Let's fix it.
Cc: Tianyou Li <tianyou.li@intel.com>
Fixes: cd3466cd26 ("perf c2c: Add annotation support to perf c2c report")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Events with an X modifier were reordered within a group, for example
slots was made the leader in:
```
$ perf record -e '{cpu/mem-stores/ppu,cpu/slots/uX}' -- sleep 1
```
Fix by making `dont_regroup` evsels always use their index for
sorting. Make the cur_leader, when fixing the groups, be that of
`dont_regroup` evsel so that the `dont_regroup` evsel doesn't become a
leader.
On a tigerlake this patch corrects this and meets expectations in:
```
$ perf stat -e '{cpu/mem-stores/,cpu/slots/uX}' -a -- sleep 0.1
Performance counter stats for 'system wide':
83,458,652 cpu/mem-stores/
2,720,854,880 cpu/slots/uX
0.103780587 seconds time elapsed
$ perf stat -e 'slots,slots:X' -a -- sleep 0.1
Performance counter stats for 'system wide':
732,042,247 slots (48.96%)
643,288,155 slots:X (51.04%)
0.102731018 seconds time elapsed
```
Closes: https://lore.kernel.org/lkml/18f20d38-070c-4e17-bc90-cf7102e1e53d@linux.intel.com/
Fixes: 035c178930 ("perf parse-events: Add 'X' modifier to exclude an event from being regrouped")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The MAX_EVENTS value ensured a counted loop presumably to satisfy the
BPF verifier. It is possible to go past 32 events when gathering
uncore events. Increase the amount to 1024 as that should provide some
amount of headroom.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add an optional PMU argument to parse_metrics to allow restriction of
the particular metrics to be opened. If no argument is provided then
all metrics with the given name/group are opened
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ensure both the perf_event_attr and alternate_hw_config are checked in
the match. Don't mask the config if the perf_event_attr isn't a
HARDWARE or HW_CACHE event. Add common early exit cases.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Rather than wildcard matching the cycles event specify only the core
PMUs. This avoids potentially loading unnecessary uncore PMUs.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now that legacy hardware and cache events are in json, having the
lexer match the specific event is no longer necessary and generic PMU
parsing can take place. Because of this remove the specific term
parsing, event adding, and passing of alternate_hw_config which was
now always PERF_COUNT_HW_MAX.
This mirrors a similar change for software events in commit 6e9fa4131a
("perf parse-events: Remove non-json software events").
With no hard coded legacy hardware or cache events the wild card, case
insensitivity, etc. is consistent for events. This does, however, mean
events like cycles will wild card against all PMUs. A change does the
same was originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
and reverted by Linus in commit 4f1b067359 ("Revert "perf
parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
his dislike for the cycles behavior on ARM. Earlier patches in this
series make perf record event opening failures non-fatal and hide the
cycles event's failure to open on ARM in perf record, so it is
expected the behavior will now be transparent in perf record. perf
stat with a cycles event will wildcard open the event on all PMUs. As
cycles is a "default event", the perf stat behavior for default events
was updated to only open them on core/software PMUs.
The change to support legacy events with PMUs was done to clean up
Intel's hybrid PMU implementation. Having sysfs/json events with
increased priority to legacy was requested by Mark Rutland
<mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
events on that PMU. It was requested that RISC-V be able to add events
to the perf tool json so the PMU driver didn't need to map legacy
events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
A previous series of patches decreasing legacy hardware event
priorities was posted in:
https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
software events can be implemented similarly:
https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now legacy hardware events are in json there's no need for a specific
printing routine that previously served for both hardware and software
events. The associated event_symbols_hw is also removed. To support
the previous filtered version use an event glob of "legacy hardware"
which matches the topic of the json events.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now legacy cache events are in json there's no need for a specific
printing routine. To support the previous filtered version use an
event glob of "legacy cache" which matches the topic of the json
events.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add support to finding/adding events from the default_core event
table. If an event already exists from sysfs/json then the
default_core configuration is saved in the legacy_terms string. Lazily
use the legacy_terms string to set a legacy hardware or cache event as
deprecated if the core PMU doesn't support it. Use the legacy terms
string to set the alternate_hw_config, avoiding the value needing to
be passed from the parse_events parser.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add the PMU terms legacy-hardware-config and
legacy-cache-config. These terms are similar to the config term in
that their values are assigned to the perf_event_attr config
value. They differ in that the PMU type is switched to be either
PERF_TYPE_HARDWARE or PERF_TYPE_HW_CACHE, and the PMU type is moved
into the extended type information of the config value. This will
allow later patches to add legacy events to json.
An example use of the terms is in the following:
```
$ perf stat -vv -e 'cpu/legacy-hardware-config=1/,cpu/legacy-cache-config=0x10001/' true
Using CPUID GenuineIntel-6-8D-1
Attempt to add: cpu/legacy-hardware-config=0x1/
..after resolving event: cpu/legacy-hardware-config=0x1/
Attempt to add: cpu/legacy-cache-config=0x10001/
..after resolving event: cpu/legacy-cache-config=0x10001/
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x1 (PERF_COUNT_HW_INSTRUCTIONS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
type 3 (PERF_TYPE_HW_CACHE)
size 136
config 0x10001 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_L1I)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 4
cpu/legacy-hardware-config=1/: -1: 1364046 414756 414756
cpu/legacy-cache-config=0x10001/: -1: 57453 414756 414756
cpu/legacy-hardware-config=1/: 1364046 414756 414756
cpu/legacy-cache-config=0x10001/: 57453 414756 414756
Performance counter stats for 'true':
1,364,046 cpu/legacy-hardware-config=1/
57,453 cpu/legacy-cache-config=0x10001/
0.001988593 seconds time elapsed
0.002194000 seconds user
0.000000000 seconds sys
```
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The FILE argument was necessary for the scanner but now that
functionality is not being used we can switch to just using
io__getline which should cut down on stdio buffer usage.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now the events file isn't directly parsed from a FILE but stored in a
string prior to parsing, remove the FILE argument to the associated
scanner functions as they only ever pass NULL.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When an event/alias is created for a PMU the terms are eagerly parsed
using parse_events_terms. For a command like perf stat or perf record,
the particular event/alias will be found, the terms parsed, the
terms cloned for use in the event parsing, and then the terms used to
configure the perf_event_attr. Events/aliases may be eagerly loaded,
such as from sysfs or in perf list, in which case the aliases terms
will be little or never used. To avoid redundant work, to avoid
cloning, and to reduce memory overhead, hold the terms for an event as
a string until they need handling as a term list. This may introduce
duplicate parsing if an event is repeated in a list, but this
situation is expected to be uncommon.
Measuring the number of instructions before and after with a sysfs
event and perf stat, there is a minor reduction in the number of
instructions executed by 0.3%.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Scan the software PMU first rather than last as it is the least likely
to fail the probe. Specifying the software PMU by name was enabled by
commit 9957d8c801 ("perf jevents: Add common software event
json"). For hardware events, add core PMU names when getting events to
probe so that not all PMUs are scanned. For example, when legacy
events support wildcards and for the event "cycles:u" on x86, we want
to only scan the "cpu" PMU and not all uncore PMUs for the event too.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The term list when adding an event to a PMU is expected to have the
event name for the alias lookup. Also, set found_supported so that
-EINVAL isn't returned.
Fixes: 62593394f6 ("perf parse-events: Legacy cache names on all
PMUs and lower priority")
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The bperf BPF counter code doesn't handle "any"(-1) CPU events, always
wanting to aggregate a count against a CPU, which avoids the need for
atomics so let's not change that. Force evsels used for BPF counters
to require a CPU when not in system-wide mode so that the "any"(-1)
value isn't used during map propagation and evsel's CPU map matches
that of the PMU.
Fixes: b91917c0c6 ("perf bpf_counter: Fix handling of cpumap fixing hybrid")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>