linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-15 13:14:59 -05:00

Author	SHA1	Message	Date
Leo Yan	c8bf2a05df	perf arm_spe: Rename SPE_OP_PKT_IS_OTHER_SVE_OP macro Rename the macro to SPE_OP_PKT_OTHER_SUBCLASS_SVE to unify naming. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	b4eaece3d9	perf arm_spe: Decode GCS operation Decode a load or store from a GCS operation and the associated "common" field. After: . 00000000: 49 44 LD GCS COMM . 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18 . 0000000b: 9a 00 00 LAT 0 XLAT . 0000000e: 43 00 DATA-SOURCE 0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	b61ca7219d	perf arm_spe: Unify operation naming Rename extended subclass and SVE/SME register access subclass, so that the naming can be consistent cross all sub classes. Add an log "SVE-SME-REG" for the SVE/SME register access, this is easier for parsing. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	33e1fffea4	perf arm_spe: Fix memset subclass in operation The operation subclass is extracted from bits [7..1] of the payload. Since bit [0] is not parsed, there is no chance to match the memset type (0x25). As a result, the memset payload is never parsed successfully. Instead of extracting a unified bit field, change to extract the specific bits for each operation subclass. Fixes: `34fb60400e` ("perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Ian Rogers	d8d8a0b360	perf tool_pmu: More accurately set the cpus for tool events The user and system time events can record on different CPUs, but for all other events a single CPU map of just CPU 0 makes sense. In parse-events detect a tool PMU and then pass the perf_event_attr so that the tool_pmu can return CPUs specific for the event. This avoids a CPU map of all online CPUs being used for events like duration_time. Avoiding this avoids the evlist CPUs containing CPUs for which duration_time just gives 0. Minimizing the evlist CPUs can remove unnecessary sched_setaffinity syscalls that delay metric calculations. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:09 -08:00
Ian Rogers	d702c0f4af	perf stat: Reduce scope of walltime_nsecs_stats walltime_nsecs_stats is no longer used for counter values, move into that stat_config where it controls certain things like noise measurement. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:09 -08:00
Ian Rogers	557c34435b	perf stat: Reduce scope of ru_stats The ru_stats are used to capture user and system time stats when a process exits. These are then applied to user and system time tool events if their reads fail due to the process terminating. Reduce the scope now the metric code no longer reads these values. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:09 -08:00
Ian Rogers	3d65f6445f	perf stat-shadow: Read tool events directly When reading time values for metrics don't use the globals updated in builtin-stat, just read the events as regular events. The only exception is for time events where nanoseconds need converting to seconds as metrics assume time metrics are in seconds. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:08 -08:00
Ian Rogers	bdf96c4ecd	perf tool_pmu: Use old_count when computing count values for time events When running in interval mode every third count of a time event isn't showing properly: ``` $ perf stat -e duration_time -a -I 1000 1.001082862 1,002,290,425 duration_time 2.004264262 1,003,183,516 duration_time 3.007381401 <not counted> duration_time 4.011160141 1,003,705,631 duration_time 5.014515385 1,003,290,110 duration_time 6.018539680 <not counted> duration_time 7.022065321 1,003,591,720 duration_time ``` The regression came in with a different fix, found through bisection, commit `68cb156743` ("perf tool_pmu: Fix aggregation on duration_time"). The issue is caused by the enabled and running time of the event matching the old_count's and creating a delta of 0, which is indicative of an error. Fixes: `68cb156743` ("perf tool_pmu: Fix aggregation on duration_time") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:08 -08:00
Ian Rogers	f69d34e8f2	perf pmu: perf_cpu_map__new_int to avoid parsing a string Prefer perf_cpu_map__new_int(0) to perf_cpu_map__new("0") as it avoids strings parsing. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:08 -08:00
Ian Rogers	af9e8d12b1	libperf cpumap: Reduce allocations and sorting in intersect On hybrid platforms the CPU maps are often disjoint. Rather than copy CPUs and trim, compute the number of common CPUs, if none early exit, otherwise copy in an sorted order. This avoids memory allocation in the disjoint case and avoids a second malloc and useless sort in the previous trim cases. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:08 -08:00
Ian Rogers	289815011c	perf stat: Display metric-only for 0 counters 0 counters may occur in hypervisor settings but metric-only output is always expected. This resolves an issue in the "perf stat STD output linter" test. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 17:31:59 -08:00
Ian Rogers	efee18981a	perf test: Don't fail if user rdpmc returns 0 when disabled In certain hypervisor set ups the value 0 may be returned but this is only erroneous if the user rdpmc isn't disabled. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-16 23:24:00 -08:00
Ian Rogers	d3726d4e5b	perf parse-events: Add debug logging to perf_event If verbose is enabled and parse_event is called, typically by tests, log failures. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-16 23:24:00 -08:00
Ian Rogers	c335b7a960	perf test: Be tolerant of missing json metric none value print_metric_only_json and print_metric_end in stat-display.c may create a metric value of "none" which fails validation as isfloat. Add a helper to properly validate metric numeric values. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-16 23:24:00 -08:00
liujing	38367a22ab	perf sample: Fix the wrong format specifier In the file tools/perf/util/cs-etm.c, queue_nr is of type unsigned int and should be printed with %u. Signed-off-by: liujing <liujing@cmss.chinamobile.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-16 23:12:19 -08:00
James Clark	86ce2a29dd	perf script: Fix build by removing unused evsel_script() The evsel_script() function is unused since the linked commit. Fix the build by removing it. Fixes the following compilation error: static inline struct evsel_script evsel_script(struct evsel evsel) ^ builtin-script.c:347:36: error: unused function 'evsel_script' [-Werror,-Wunused-function] Fixes: `3622990efa` ("perf script: Change metric format to use json metrics") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-15 09:44:04 -08:00
Ian Rogers	c1932fb85a	perf vendor metrics s390: Avoid has_event(INSTRUCTIONS) The instructions event is now provided in json meaning the has_event test always succeeds. Switch to using non-legacy event names in the affected metrics. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Closes: https://lore.kernel.org/linux-perf-users/3e80f453-f015-4f4f-93d3-8df6bb6b3c95@linux.ibm.com/ Fixes: `0012e0fa22` ("perf jevents: Add legacy-hardware and legacy-cache json") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-13 23:22:34 -08:00
Ian Rogers	ca016b6527	perf auxtrace: Remove errno.h from auxtrace.h and fix transitive dependencies errno.h isn't used in auxtrace.h so remove it and fix build failures caused by transitive dependencies through auxtrace.h on errno.h. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-13 23:03:11 -08:00
Ian Rogers	754187ad73	perf build: Remove NO_AUXTRACE build option The NO_AUXTRACE build option was used when the __get_cpuid feature test failed or if it was provided on the command line. The option no longer avoids a dependency on a library and so having the option is just adding complexity to the code base. Remove the option CONFIG_AUXTRACE from Build files and HAVE_AUXTRACE_SUPPORT by assuming it is always defined. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-13 23:03:11 -08:00
Ian Rogers	c819bfdc4a	tool build: Remove __get_cpuid feature test This feature test is no longer used so remove. The function tested by the feature test is used in: tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c however, the Makefile just assumes the presence of the function and doesn't perform a build feature test for it. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-13 23:03:11 -08:00
Ian Rogers	2566bbfc0a	perf build: Don't add NO_AUXTRACE if missing feature-get_cpuid The intel-pt code dependent on __get_cpuid is no longer present so remove the feature test in the Makefile.config. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-13 23:03:11 -08:00
Ian Rogers	8933c624d9	perf intel-pt: Use the perf provided "cpuid.h" Rather than having a feature test and include of <cpuid.h> for the __get_cpuid function, use the cpuid function provided by tools/perf/arch/x86/util/cpuid.h. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-13 23:03:11 -08:00
Zide Chen	fc9c17b223	perf test: Add a perf event fallback test This adds test cases to verify the precise ip fallback logic: - If the system supports precise ip, for an event given with the maximum precision level, it should be able to decrease precise_ip to find a supported level. - The same fallback behavior should also work in more complex scenarios, such as event groups or when PEBS is involved Additional fallback tests, such as those covering missing feature cases, can be added in the future. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Reviewed-by: Ian Rogers <irogers!@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-12 10:41:54 -08:00
Namhyung Kim	da8fcfba08	perf stat: Align metric output without events One of my concern in the perf stat output was the alignment in the metrics and shadow stats. I think it missed to calculate the basic output length using COUNTS_LEN and EVNAME_LEN but missed to add the unit length like "msec" and surround 2 spaces. I'm not sure why it's not printed below though. But anyway, now it shows correctly aligned metric output. $ perf stat true Performance counter stats for 'true': 859,772 task-clock # 0.395 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 56 page-faults # 65.134 K/sec 1,075,022 instructions # 0.86 insn per cycle 1,255,911 cycles # 1.461 GHz 220,573 branches # 256.548 M/sec 7,381 branch-misses # 3.35% of all branches TopdownL1 # 19.2 % tma_retiring # 28.6 % tma_backend_bound # 9.5 % tma_bad_speculation # 42.6 % tma_frontend_bound 0.002174871 seconds time elapsed ^ \| 0.002154000 seconds user \| 0.000000000 seconds sys here Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 17:00:31 -08:00
Ian Rogers	68cc6ec3ac	perf tool_pmu: Make core_wide and target_cpu json events For the sake of better documentation, add core_wide and target_cpu to the tool.json. When the values of system_wide and user_requested_cpu_list are unknown, use the values from the global stat_config. Example output showing how '-a' modifies the values in `perf stat`: ``` $ perf stat -e core_wide,target_cpu true Performance counter stats for 'true': 0 core_wide 0 target_cpu 0.000993787 seconds time elapsed 0.001128000 seconds user 0.000000000 seconds sys $ perf stat -e core_wide,target_cpu -a true Performance counter stats for 'system wide': 1 core_wide 1 target_cpu 0.002271723 seconds time elapsed $ perf list ... tool: core_wide [1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool] ... target_cpu [1 if CPUs being analyzed,0 if threads/processes. Unit: tool] ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:36 -08:00
Ian Rogers	02432d920e	perf test stat csv: Update test expectations and events Explicitly use a metric rather than implicitly expecting '-e instructions,cycles' to produce a metric. Use a metric with software events to make it more compatible. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:36 -08:00
Ian Rogers	a48cd551d7	perf test stat: Update test expectations and events test_stat_record_report and test_stat_record_script used default output which triggers a bug when sending metrics. As this isn't relevant to the test switch to using named software events. Update the match in test_hybrid as the cycles event is now cpu-cycles to workaround potential ARM issues. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	6b76f0678b	perf test stat: Update shadow test to use metrics Previously '-e cycles,instructions' would implicitly create an IPC metric. This now has to be explicit with '-M insn_per_cycle'. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	91c1949d76	perf test metrics: Update all metrics for possibly failing default metrics Default metrics may use unsupported events and be ignored. These metrics shouldn't cause metric testing to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	083ae6c1fb	perf test stat: Update std_output testing metric expectations Make the expectations match json metrics rather than the previous hard coded ones. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	b1cb2b76bd	perf test stat: Ignore failures in Default[234] metricgroups The Default[234] metric groups may contain unsupported legacy events. Allow those metric groups to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	2c240484cf	perf test stat+json: Improve metric-only testing When testing metric-only, pass a metric to perf rather than expecting a hard coded metric value to be generated. Remove keys that were really metric-only units and instead don't expect metric only to have a matching json key as it encodes metrics as {"metric_name", "metric_value"}. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	1bcd627165	perf stat: Remove "unit" workarounds for metric-only Remove code that tested the "unit" as in KB/sec for certain hard coded metric values and did workarounds. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	a745c0831c	perf stat: Sort default events/metrics To improve the readability of default events/metrics, sort the evsels after the Default metric groups have be parsed. Before: ``` $ perf stat -a sleep 1 Performance counter stats for 'system wide': 22,087 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 10.3 % tma_bad_speculation # 25.8 % tma_frontend_bound # 34.5 % tma_backend_bound # 29.3 % tma_retiring 7,829 page-faults # nan faults/sec page_faults_per_second 880,144,270 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.10%) 1,693,081,235 cpu_core/cpu-cycles/ # nan GHz cycles_frequency TopdownL1 (cpu_atom) # 20.5 % tma_bad_speculation # 13.8 % tma_retiring (50.26%) # 34.6 % tma_frontend_bound (50.23%) 89,326,916 cpu_atom/branches/ # nan M/sec branch_frequency (60.19%) 538,123,088 cpu_core/branches/ # nan M/sec branch_frequency 1,368 cpu-migrations # nan migrations/sec migrations_per_second # 31.1 % tma_backend_bound (60.19%) 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 485,744,856 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (59.87%) 3,093,112,283 cpu_core/instructions/ # 1.8 instructions insn_per_cycle 4,939,427 cpu_atom/branch-misses/ # 5.0 % branch_miss_rate (49.77%) 7,632,248 cpu_core/branch-misses/ # 1.4 % branch_miss_rate 1.005084693 seconds time elapsed ``` After: ``` $ perf stat -a sleep 1 Performance counter stats for 'system wide': 22,165 context-switches # nan cs/sec cs_per_second 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 2,260 cpu-migrations # nan migrations/sec migrations_per_second 20,476 page-faults # nan faults/sec page_faults_per_second 17,052,357 cpu_core/branch-misses/ # 1.5 % branch_miss_rate 1,120,090,590 cpu_core/branches/ # nan M/sec branch_frequency 3,402,892,275 cpu_core/cpu-cycles/ # nan GHz cycles_frequency 6,129,236,701 cpu_core/instructions/ # 1.8 instructions insn_per_cycle 6,159,523 cpu_atom/branch-misses/ # 3.1 % branch_miss_rate (49.86%) 222,158,812 cpu_atom/branches/ # nan M/sec branch_frequency (50.25%) 1,547,610,244 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.40%) 1,304,901,260 cpu_atom/instructions/ # 0.8 instructions insn_per_cycle (50.41%) TopdownL1 (cpu_core) # 13.7 % tma_bad_speculation # 23.5 % tma_frontend_bound # 33.3 % tma_backend_bound # 29.6 % tma_retiring TopdownL1 (cpu_atom) # 32.1 % tma_backend_bound (59.65%) # 30.1 % tma_frontend_bound (59.51%) # 22.3 % tma_bad_speculation # 15.5 % tma_retiring (59.53%) 1.008405429 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	19df87d9ed	perf stat: Fix default metricgroup display on hybrid The logic to skip output of a default metric line was firing on Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the need_full_name check as it is equivalent to the different PMU test in the cases we care about, merge the 'if's and flip the evsel of the PMU test. The 'if' is now basically saying, if the output matches the last printed output then skip the output. Before: ``` TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation # 24.3 % tma_frontend_bound TopdownL1 (cpu_core) # 33.9 % tma_backend_bound # 30.6 % tma_retiring # 42.2 % tma_backend_bound # 25.0 % tma_frontend_bound (49.81%) # 12.8 % tma_bad_speculation # 20.0 % tma_retiring (59.46%) ``` After: ``` TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation # 43.7 % tma_frontend_bound # 30.7 % tma_backend_bound # 17.2 % tma_retiring TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound # 37.6 % tma_frontend_bound (49.66%) # 18.0 % tma_bad_speculation # 12.6 % tma_retiring (59.58%) ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	b71f46a6a7	perf stat: Remove hard coded shadow metrics Now that the metrics are encoded in common json the hard coded printing means the metrics are shown twice. Remove the hard coded version. This means that when specifying events, and those events correspond to a hard coded metric, the metric will no longer be displayed. The metric will be displayed if the metric is requested. Due to the adhoc printing in the previous approach it was often found frustrating, the new approach avoids this. The default perf stat output on an alderlake now looks like: ``` $ perf stat -a -- sleep 1 Performance counter stats for 'system wide': 19,697 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 10.7 % tma_bad_speculation # 24.9 % tma_frontend_bound TopdownL1 (cpu_core) # 34.3 % tma_backend_bound # 30.1 % tma_retiring 6,593 page-faults # nan faults/sec page_faults_per_second 729,065,658 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.79%) 1,605,131,101 cpu_core/cpu-cycles/ # nan GHz cycles_frequency # 19.7 % tma_bad_speculation # 14.2 % tma_retiring (50.14%) # 37.3 % tma_frontend_bound (50.31%) 87,302,268 cpu_atom/branches/ # nan M/sec branch_frequency (60.27%) 512,046,956 cpu_core/branches/ # nan M/sec branch_frequency 1,111 cpu-migrations # nan migrations/sec migrations_per_second # 28.8 % tma_backend_bound (60.26%) 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 392,509,323 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (60.19%) 2,990,369,310 cpu_core/instructions/ # 1.9 instructions insn_per_cycle 3,493,478 cpu_atom/branch-misses/ # 5.9 % branch_miss_rate (49.69%) 7,297,531 cpu_core/branch-misses/ # 1.4 % branch_miss_rate 1.006621701 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	3622990efa	perf script: Change metric format to use json metrics The metric format option isn't properly supported. This change improves that by making the sample events update the counts of an evsel, where the shadow metric code expects to read the values. To support printing metrics, metrics need to be found. This is done on the first attempt to print a metric. Every metric is parsed and then the evsels in the metric's evlist compared to those in perf script using the perf_event_attr type and config. If the metric matches then it is added for printing. As an event in the perf script's evlist may have >1 metric id, or different leader for aggregation, the first metric matched will be displayed in those cases. An example use is: ``` $ perf record -a -e '{instructions,cpu-cycles}:S' -a -- sleep 1 $ perf script -F period,metric ... 867817 metric: 0.30 insn per cycle 125394 metric: 0.04 insn per cycle 313516 metric: 0.11 insn per cycle metric: 1.00 insn per cycle ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	2dfc0cab3d	perf stat: Add detail -d,-dd,-ddd metrics Add metrics for the stat-shadow -d, -dd and -ddd events and hard coded metrics. Remove the events as these now come from the metrics. Following this change a detailed perf stat output looks like: ``` $ perf stat -a -ddd -- sleep 1 Performance counter stats for 'system wide': 21,089 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 14.1 % tma_bad_speculation # 27.3 % tma_frontend_bound (30.56%) TopdownL1 (cpu_core) # 31.5 % tma_backend_bound # 27.2 % tma_retiring (30.56%) 6,302 page-faults # nan faults/sec page_faults_per_second 928,495,163 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (28.41%) 1,841,409,834 cpu_core/cpu-cycles/ # nan GHz cycles_frequency (38.51%) # 14.5 % tma_bad_speculation # 16.0 % tma_retiring (28.41%) # 36.8 % tma_frontend_bound (35.57%) 100,859,118 cpu_atom/branches/ # nan M/sec branch_frequency (42.73%) 572,657,734 cpu_core/branches/ # nan M/sec branch_frequency (54.43%) 1,527 cpu-migrations # nan migrations/sec migrations_per_second # 32.7 % tma_backend_bound (42.73%) 0.00 msec cpu-clock # 0.000 CPUs utilized # 0.0 CPUs CPUs_utilized 498,668,509 cpu_atom/instructions/ # 0.57 insn per cycle # 0.6 instructions insn_per_cycle (42.97%) 3,281,762,225 cpu_core/instructions/ # 1.84 insn per cycle # 1.8 instructions insn_per_cycle (62.20%) 4,919,511 cpu_atom/branch-misses/ # 5.43% of all branches # 5.4 % branch_miss_rate (35.80%) 7,431,776 cpu_core/branch-misses/ # 1.39% of all branches # 1.4 % branch_miss_rate (62.20%) 2,517,007 cpu_atom/LLC-loads/ # 0.1 % llc_miss_rate (28.62%) 3,931,318 cpu_core/LLC-loads/ # 40.4 % llc_miss_rate (45.98%) 14,918,674 cpu_core/L1-dcache-load-misses/ # 2.25% of all L1-dcache accesses # nan % l1d_miss_rate (37.80%) 27,067,264 cpu_atom/L1-icache-load-misses/ # 15.92% of all L1-icache accesses # 15.9 % l1i_miss_rate (21.47%) 116,848,994 cpu_atom/dTLB-loads/ # 0.8 % dtlb_miss_rate (21.47%) 764,870,407 cpu_core/dTLB-loads/ # 0.1 % dtlb_miss_rate (15.12%) 1.006181526 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	a3248b5b54	perf jevents: Add metric DefaultShowEvents Some Default group metrics require their events showing for consistency with perf's previous behavior. Add a flag to indicate when this is the case and use it in stat-display. As events are coming from Default metrics remove that default hardware and software events from perf stat. Following this change the default perf stat output on an alderlake looks like: ``` $ perf stat -a -- sleep 1 Performance counter stats for 'system wide': 20,550 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 9.0 % tma_bad_speculation # 28.1 % tma_frontend_bound TopdownL1 (cpu_core) # 29.2 % tma_backend_bound # 33.7 % tma_retiring 6,685 page-faults # nan faults/sec page_faults_per_second 790,091,064 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.83%) 2,563,918,366 cpu_core/cpu-cycles/ # nan GHz cycles_frequency # 12.3 % tma_bad_speculation # 14.5 % tma_retiring (50.20%) # 33.8 % tma_frontend_bound (50.24%) 76,390,322 cpu_atom/branches/ # nan M/sec branch_frequency (60.20%) 1,015,173,047 cpu_core/branches/ # nan M/sec branch_frequency 1,325 cpu-migrations # nan migrations/sec migrations_per_second # 39.3 % tma_backend_bound (60.17%) 0.00 msec cpu-clock # 0.000 CPUs utilized # 0.0 CPUs CPUs_utilized 554,347,072 cpu_atom/instructions/ # 0.64 insn per cycle # 0.6 instructions insn_per_cycle (60.14%) 5,228,931,991 cpu_core/instructions/ # 2.04 insn per cycle # 2.0 instructions insn_per_cycle 4,308,874 cpu_atom/branch-misses/ # 5.65% of all branches # 5.6 % branch_miss_rate (49.76%) 9,890,606 cpu_core/branch-misses/ # 0.97% of all branches # 1.0 % branch_miss_rate 1.005477803 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	c7adeb0974	perf jevents: Add set of common metrics based on default ones Add support to getting a common set of metrics from a default table. It simplifies the generation to add json metrics at the same time. The metrics added are CPUs_utilized, cs_per_second, migrations_per_second, page_faults_per_second, insn_per_cycle, stalled_cycles_per_instruction, frontend_cycles_idle, backend_cycles_idle, cycles_frequency, branch_frequency and branch_miss_rate based on the shadow metric definitions. Following this change the default perf stat output on an alderlake looks like: ``` $ perf stat -a -- sleep 2 Performance counter stats for 'system wide': 0.00 msec cpu-clock # 0.000 CPUs utilized 77,739 context-switches 15,033 cpu-migrations 321,313 page-faults 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%) 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%) 10,263,836,145 cpu_atom/cycles/ (35.42%) 39,138,632,894 cpu_core/cycles/ (57.60%) 2,989,658,777 cpu_atom/branches/ (42.60%) 32,170,570,388 cpu_core/branches/ (57.39%) 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%) 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%) (software) # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation # 19.6 % tma_frontend_bound (63.97%) TopdownL1 (cpu_core) # 18.8 % tma_backend_bound # 49.7 % tma_retiring (63.97%) (software) # nan faults/sec page_faults_per_second # nan GHz cycles_frequency (42.88%) # nan GHz cycles_frequency (69.88%) TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation # 29.9 % tma_retiring (50.07%) TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%) (cpu_atom) # nan M/sec branch_frequency (43.09%) # nan M/sec branch_frequency (70.07%) # nan migrations/sec migrations_per_second TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%) (software) # 0.0 CPUs CPUs_utilized # 1.4 instructions insn_per_cycle (43.04%) # 3.5 instructions insn_per_cycle (69.99%) # 1.0 % branch_miss_rate (35.46%) # 0.5 % branch_miss_rate (65.02%) 2.005626564 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:34 -08:00
Ian Rogers	2e5140849b	perf expr: Add #target_cpu literal For CPU nanoseconds a lot of the stat-shadow metrics use either task-clock or cpu-clock, the latter being used when target__has_cpu. Add a #target_cpu literal so that json metrics can perform the same test. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:34 -08:00
Ian Rogers	c8035a4961	perf metricgroup: Add care to picking the evsel for displaying a metric Rather than using the first evsel in the matched events, try to find the least shared non-tool evsel. The aim is to pick the first evsel that typifies the metric within the list of metrics. This addresses an issue where Default metric group metrics may lose their counter value due to how the stat displaying hides counters for default event/metric output. For a metricgroup like TopdownL1 on an Intel Alderlake the change is, before there are 4 events with metrics: ``` $ perf stat -M topdownL1 -a sleep 1 Performance counter stats for 'system wide': 7,782,334,296 cpu_core/TOPDOWN.SLOTS/ # 10.4 % tma_bad_speculation # 19.7 % tma_frontend_bound 2,668,927,977 cpu_core/topdown-retiring/ # 35.7 % tma_backend_bound # 34.1 % tma_retiring 803,623,987 cpu_core/topdown-bad-spec/ 167,514,386 cpu_core/topdown-heavy-ops/ 1,555,265,776 cpu_core/topdown-fe-bound/ 2,792,733,013 cpu_core/topdown-be-bound/ 279,769,310 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.2 % tma_retiring # 15.1 % tma_bad_speculation 457,917,232 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 38.4 % tma_backend_bound # 34.2 % tma_frontend_bound 783,519,226 cpu_atom/TOPDOWN_FE_BOUND.ALL/ 10,790,192 cpu_core/INT_MISC.UOP_DROPPING/ 879,845,633 cpu_atom/TOPDOWN_BE_BOUND.ALL/ ``` After there are 6 events with metrics: ``` $ perf stat -M topdownL1 -a sleep 1 Performance counter stats for 'system wide': 2,377,551,258 cpu_core/TOPDOWN.SLOTS/ # 7.9 % tma_bad_speculation # 36.4 % tma_frontend_bound 480,791,142 cpu_core/topdown-retiring/ # 35.5 % tma_backend_bound 186,323,991 cpu_core/topdown-bad-spec/ 65,070,590 cpu_core/topdown-heavy-ops/ # 20.1 % tma_retiring 871,733,444 cpu_core/topdown-fe-bound/ 848,286,598 cpu_core/topdown-be-bound/ 260,936,456 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.4 % tma_retiring # 17.6 % tma_bad_speculation 419,576,513 cpu_atom/CPU_CLK_UNHALTED.CORE/ 797,132,597 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.0 % tma_frontend_bound 3,055,447 cpu_core/INT_MISC.UOP_DROPPING/ 671,014,164 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 32.0 % tma_backend_bound ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:34 -08:00
Namhyung Kim	367377f45c	perf tools: Fix missing feature check for inherit + SAMPLE_READ It should also have PERF_SAMPLE_TID to enable inherit and PERF_SAMPLE_READ on recent kernels. Not having _TID makes the feature check wrongly detect the inherit and _READ support. It was reported that the following command failed due to the error in the missing feature check on Intel SPR machines. $ perf record -e '{cpu/mem-loads-aux/S,cpu/mem-loads,ldlat=3/PS}' -- ls Error: Failure to open event 'cpu/mem-loads,ldlat=3/PS' on PMU 'cpu' which will be removed. Invalid event (cpu/mem-loads,ldlat=3/PS) in per-thread mode, enable system wide with '-a'. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: `3b193a57ba` ("perf tools: Detect missing kernel features properly") Reported-and-tested-by: Chen, Zide <zide.chen@intel.com> Closes: https://lore.kernel.org/lkml/20251022220802.1335131-1-zide.chen@intel.com/ Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:43:37 -08:00
Chen Ni	e279039c3e	perf symbol: Remove unneeded semicolon Remove unnecessary semicolons reported by Coccinelle/coccicheck and the semantic patch at scripts/coccinelle/misc/semicolon.cocci. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-10 22:22:16 -08:00
Ian Rogers	081006b7c8	perf test: Add test that command line period overrides sysfs/json values The behavior of weak terms is subtle, add a test that they aren't accidentally broken. The test finds an event with a weak 'period' and then overrides it. In no such event is present then the test skips. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-09 23:07:57 -08:00
Ian Rogers	0e9b51a432	perf pmu: Make pmu_alias_terms weak again The terms for a json event should be weak so they don't override command line options. Before: ``` $ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 \|grep "{ sample_period, sample_freq }" { sample_period, sample_freq } 200003 { sample_period, sample_freq } 2000003 { sample_period, sample_freq } 1000 ``` After: ``` $ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 \|grep "{ sample_period, sample_freq }" { sample_period, sample_freq } 1000 { sample_period, sample_freq } 1000 { sample_period, sample_freq } 1000 ``` Fixes: `84bae3af20` ("perf pmu: Don't eagerly parse event terms") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-09 23:07:57 -08:00
Ian Rogers	6331b26693	perf tool: Add a delegate_tool that just delegates actions to another tool Add an ability to be able to compose perf_tools, by having one perform an action and then calling a delegate. Currently the perf_tools have if-then-elses setting the callback and then if-then-elses within the callback. Understanding the behavior is complex as it is in two places and logic for numerous operations, within things like perf inject, is interwoven. By chaining perf_tools together based on command line options this kind of code can be avoided. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-07 13:25:05 -08:00
Ian Rogers	71062e282d	perf tool: Add the perf_tool argument to all callbacks Getting context for what a tool is doing, such as the perf_inject instance, using container_of the tool is a common pattern in the code. This isn't possible event_op2, event_op3 and event_op4 callbacks as the tool isn't passed. Add the argument and then fix function signatures to match. As tools maybe reading a tool from somewhere else, change that code to use the passed in tool. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-07 13:25:05 -08:00
Xu Yang	fa4a527af5	perf vendor events arm64:: Add i.MX94 DDR Performance Monitor metrics Add JSON metrics for i.MX94 DDR Performance Monitor. Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-06 17:46:16 -08:00

1 2 3 4 5 ...

49381 Commits