Commit Graph

49381 Commits

Author SHA1 Message Date
Leo Yan
c8bf2a05df perf arm_spe: Rename SPE_OP_PKT_IS_OTHER_SVE_OP macro
Rename the macro to SPE_OP_PKT_OTHER_SUBCLASS_SVE to unify naming.

Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18 20:31:29 -08:00
Leo Yan
b4eaece3d9 perf arm_spe: Decode GCS operation
Decode a load or store from a GCS operation and the associated "common"
field.

After:

  .  00000000:  49 44                                           LD GCS COMM
  .  00000002:  b2 18 3c d7 83 00 80 ff ff                      VA 0xffff800083d73c18
  .  0000000b:  9a 00 00                                        LAT 0 XLAT
  .  0000000e:  43 00                                           DATA-SOURCE 0

Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18 20:31:29 -08:00
Leo Yan
b61ca7219d perf arm_spe: Unify operation naming
Rename the extended subclass and the SVE/SME register access subclass so
that the naming is consistent across all subclasses.

Add a log "SVE-SME-REG" for the SVE/SME register access, which is easier
to parse.

Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18 20:31:29 -08:00
Leo Yan
33e1fffea4 perf arm_spe: Fix memset subclass in operation
The operation subclass is extracted from bits [7..1] of the payload.
Since bit [0] is not parsed, there is no chance to match the memset type
(0x25). As a result, the memset payload is never parsed successfully.

Instead of extracting a unified bit field, change to extract the
specific bits for each operation subclass.

Fixes: 34fb60400e ("perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18 20:31:29 -08:00
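As a minimal sketch of why the memset subclass could never match, assume the old decoder masked the payload with bits [7..1] kept in place (no shift). The macro and function names below are hypothetical; only the value 0x25 and the bit range come from the commit message, and comparing the whole byte is just one way to illustrate a subclass-specific match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; 0x25 and bits [7..1] are taken from the commit message. */
#define OP_SUBCLASS_MASK_7_1	0xfeu	/* bits [7..1], bit [0] dropped */
#define OP_SUBCLASS_MEMSET	0x25u	/* bit [0] is set in this value */

/* Old behavior (assumed): one unified field for every subclass. */
static bool is_memset_unified(uint8_t payload)
{
	return (payload & OP_SUBCLASS_MASK_7_1) == OP_SUBCLASS_MEMSET;
}

/* Illustrative fix: compare all the bits that the memset value occupies. */
static bool is_memset_specific(uint8_t payload)
{
	return payload == OP_SUBCLASS_MEMSET;
}
```

Because the masked value always has bit [0] clear and 0x25 has it set, `is_memset_unified()` is false for every possible payload byte.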
Ian Rogers
d8d8a0b360 perf tool_pmu: More accurately set the cpus for tool events
The user and system time events can record on different CPUs, but for
all other events a single CPU map of just CPU 0 makes sense. In
parse-events detect a tool PMU and then pass the perf_event_attr so
that the tool_pmu can return CPUs specific for the event. This avoids
a CPU map of all online CPUs being used for events like
duration_time, which keeps the evlist CPUs from containing CPUs
for which duration_time just gives 0. Minimizing the evlist CPUs can
remove unnecessary sched_setaffinity syscalls that delay metric
calculations.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 18:43:09 -08:00
Ian Rogers
d702c0f4af perf stat: Reduce scope of walltime_nsecs_stats
walltime_nsecs_stats is no longer used for counter values, so move it
into stat_config, where it controls certain things like noise
measurement.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 18:43:09 -08:00
Ian Rogers
557c34435b perf stat: Reduce scope of ru_stats
The ru_stats are used to capture user and system time stats when a
process exits. These are then applied to user and system time tool
events if their reads fail due to the process terminating. Reduce the
scope now the metric code no longer reads these values.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 18:43:09 -08:00
Ian Rogers
3d65f6445f perf stat-shadow: Read tool events directly
When reading time values for metrics don't use the globals updated in
builtin-stat, just read the events as regular events. The only
exception is for time events where nanoseconds need converting to
seconds as metrics assume time metrics are in seconds.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 18:43:08 -08:00
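The unit conversion described here can be sketched as below. This is an illustration only: `metric_input()` and its `is_time_event` flag are hypothetical names, not functions from the perf sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000.0

/*
 * Hypothetical helper: turn a raw event count into the value a metric
 * expression sees. Time events count nanoseconds, metrics expect seconds.
 */
static double metric_input(uint64_t count, bool is_time_event)
{
	if (is_time_event)
		return (double)count / NSEC_PER_SEC;
	return (double)count;
}
```

With this shape, a duration_time reading of 1,002,290,425 ns reaches the metric as roughly 1.0023 seconds, while other counts pass through unchanged.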
Ian Rogers
bdf96c4ecd perf tool_pmu: Use old_count when computing count values for time events
When running in interval mode every third count of a time event isn't
showing properly:
```
$ perf stat -e duration_time -a -I 1000
     1.001082862      1,002,290,425      duration_time
     2.004264262      1,003,183,516      duration_time
     3.007381401      <not counted>      duration_time
     4.011160141      1,003,705,631      duration_time
     5.014515385      1,003,290,110      duration_time
     6.018539680      <not counted>      duration_time
     7.022065321      1,003,591,720      duration_time
```
The regression came in with a different fix, found through bisection,
commit 68cb156743 ("perf tool_pmu: Fix aggregation on
duration_time"). The issue is caused by the enabled and running time
of the event matching the old_count's and creating a delta of 0, which
is indicative of an error.

Fixes: 68cb156743 ("perf tool_pmu: Fix aggregation on duration_time")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 18:43:08 -08:00
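The symptom can be sketched with a toy model of interval-mode deltas. The struct and function names are hypothetical, and the code shows only the failure mode the message describes (a running-time delta of 0 being treated as an error), not the actual fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of a counter reading: value, enabled and running time. */
struct counts {
	uint64_t val, ena, run;
};

/* Interval-mode delta between the current and the previous reading. */
static struct counts counts_delta(struct counts cur, struct counts old)
{
	struct counts d = {
		.val = cur.val - old.val,
		.ena = cur.ena - old.ena,
		.run = cur.run - old.run,
	};
	return d;
}

/* A running-time delta of 0 is what ends up shown as "<not counted>". */
static bool is_not_counted(struct counts delta)
{
	return delta.run == 0;
}
```

If the enabled and running times of the new reading equal those stored in the old count, the delta is 0 even though the value advanced, which matches the `<not counted>` lines in the output above.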
Ian Rogers
f69d34e8f2 perf pmu: perf_cpu_map__new_int to avoid parsing a string
Prefer perf_cpu_map__new_int(0) to perf_cpu_map__new("0") as it avoids
string parsing.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 18:43:08 -08:00
Ian Rogers
af9e8d12b1 libperf cpumap: Reduce allocations and sorting in intersect
On hybrid platforms the CPU maps are often disjoint. Rather than copy
CPUs and trim, compute the number of common CPUs; if there are none,
exit early, otherwise copy them in sorted order. This avoids memory
allocation in the disjoint case and avoids a second malloc and a
useless sort in the previous trim cases.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 18:43:08 -08:00
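The count-first approach can be sketched on plain integer arrays. This is not the libperf implementation: it assumes both inputs are ascending and duplicate-free, and `count_common()` and `intersect_sorted()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count the elements common to two ascending, duplicate-free arrays. */
static size_t count_common(const int *a, size_t na, const int *b, size_t nb)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na && j < nb) {
		if (a[i] < b[j]) {
			i++;
		} else if (a[i] > b[j]) {
			j++;
		} else {
			n++;
			i++;
			j++;
		}
	}
	return n;
}

/*
 * Return a malloc'd, ascending intersection and store its length in *nout.
 * Returns NULL with *nout == 0 when the inputs are disjoint (no allocation
 * is made) or when malloc fails.
 */
static int *intersect_sorted(const int *a, size_t na,
			     const int *b, size_t nb, size_t *nout)
{
	size_t n = count_common(a, na, b, nb);
	size_t i = 0, j = 0, k = 0;
	int *out;

	*nout = 0;
	if (n == 0)
		return NULL;	/* disjoint: early exit */

	out = malloc(n * sizeof(*out));	/* single, exactly sized allocation */
	if (!out)
		return NULL;

	while (i < na && j < nb) {
		if (a[i] < b[j]) {
			i++;
		} else if (a[i] > b[j]) {
			j++;
		} else {
			out[k++] = a[i];	/* already in sorted order */
			i++;
			j++;
		}
	}
	*nout = k;
	return out;
}
```

The extra counting pass costs a second walk over the inputs, but it removes the allocation entirely for disjoint maps and makes a later trim or sort unnecessary.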
Ian Rogers
289815011c perf stat: Display metric-only for 0 counters
0 counters may occur in hypervisor settings but metric-only output is
always expected. This resolves an issue in the "perf stat STD output
linter" test.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17 17:31:59 -08:00
Ian Rogers
efee18981a perf test: Don't fail if user rdpmc returns 0 when disabled
In certain hypervisor setups the value 0 may be returned, but this is
only erroneous if the user rdpmc isn't disabled.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16 23:24:00 -08:00
Ian Rogers
d3726d4e5b perf parse-events: Add debug logging to perf_event
If verbose is enabled and parse_event is called, typically by tests,
log failures.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16 23:24:00 -08:00
Ian Rogers
c335b7a960 perf test: Be tolerant of missing json metric none value
print_metric_only_json and print_metric_end in stat-display.c may
create a metric value of "none" which fails validation as isfloat. Add
a helper to properly validate metric numeric values.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16 23:24:00 -08:00
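One way such a validation helper could look is sketched below in C. This is an assumption about its behavior (accept a float or the literal "none"), not a copy of the helper added to the test, and `is_metric_value()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: accept a float, or the literal "none". */
static bool is_metric_value(const char *s)
{
	char *end;

	if (strcmp(s, "none") == 0)
		return true;

	(void)strtod(s, &end);
	/* Valid only if strtod consumed something and reached the end. */
	return end != s && *end == '\0';
}
```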
liujing
38367a22ab perf sample: Fix the wrong format specifier
In the file tools/perf/util/cs-etm.c, queue_nr is of type unsigned
int and should be printed with %u.

Signed-off-by: liujing <liujing@cmss.chinamobile.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16 23:12:19 -08:00
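A small sketch of the specifier choice, assuming a 32-bit unsigned int. `format_queue_nr()` and the message text are hypothetical; only the variable name queue_nr and the %u specifier come from the commit message.

```c
#include <stdio.h>

/* Hypothetical helper: format an unsigned queue number into buf. */
static int format_queue_nr(char *buf, size_t len, unsigned int queue_nr)
{
	/* %u matches unsigned int; %d would expect a signed int. */
	return snprintf(buf, len, "queue_nr: %u", queue_nr);
}
```

Values above INT_MAX, such as 3000000000, are printed correctly with %u, whereas passing them to %d is a type mismatch.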
James Clark
86ce2a29dd perf script: Fix build by removing unused evsel_script()
The evsel_script() function is unused since the linked commit. Fix the
build by removing it.

Fixes the following compilation error:

  static inline struct evsel_script *evsel_script(struct evsel *evsel)
                                     ^

builtin-script.c:347:36: error: unused function 'evsel_script' [-Werror,-Wunused-function]
Fixes: 3622990efa ("perf script: Change metric format to use json metrics")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-15 09:44:04 -08:00
Ian Rogers
c1932fb85a perf vendor metrics s390: Avoid has_event(INSTRUCTIONS)
The instructions event is now provided in json meaning the has_event
test always succeeds. Switch to using non-legacy event names in the
affected metrics.

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Closes: https://lore.kernel.org/linux-perf-users/3e80f453-f015-4f4f-93d3-8df6bb6b3c95@linux.ibm.com/
Fixes: 0012e0fa22 ("perf jevents: Add legacy-hardware and legacy-cache json")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13 23:22:34 -08:00
Ian Rogers
ca016b6527 perf auxtrace: Remove errno.h from auxtrace.h and fix transitive dependencies
errno.h isn't used in auxtrace.h so remove it and fix build failures
caused by transitive dependencies through auxtrace.h on errno.h.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13 23:03:11 -08:00
Ian Rogers
754187ad73 perf build: Remove NO_AUXTRACE build option
The NO_AUXTRACE build option was used when the __get_cpuid feature
test failed or if it was provided on the command line. The option no
longer avoids a dependency on a library and so having the option is
just adding complexity to the code base. Remove the option
CONFIG_AUXTRACE from Build files and HAVE_AUXTRACE_SUPPORT by assuming
it is always defined.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13 23:03:11 -08:00
Ian Rogers
c819bfdc4a tool build: Remove __get_cpuid feature test
This feature test is no longer used, so remove it.

The function tested by the feature test is used in:
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
however, the Makefile just assumes the presence of the function and
doesn't perform a build feature test for it.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13 23:03:11 -08:00
Ian Rogers
2566bbfc0a perf build: Don't add NO_AUXTRACE if missing feature-get_cpuid
The intel-pt code dependent on __get_cpuid is no longer present so
remove the feature test in the Makefile.config.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13 23:03:11 -08:00
Ian Rogers
8933c624d9 perf intel-pt: Use the perf provided "cpuid.h"
Rather than having a feature test and include of <cpuid.h> for the
__get_cpuid function, use the cpuid function provided by
tools/perf/arch/x86/util/cpuid.h.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13 23:03:11 -08:00
Zide Chen
fc9c17b223 perf test: Add a perf event fallback test
This adds test cases to verify the precise ip fallback logic:

- If the system supports precise ip, for an event given with the maximum
  precision level, it should be able to decrease precise_ip to find a
  supported level.
- The same fallback behavior should also work in more complex scenarios,
  such as event groups or when PEBS is involved.

Additional fallback tests, such as those covering missing feature cases,
can be added in the future.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-12 10:41:54 -08:00
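The fallback behavior under test can be sketched as a simple step-down loop. The callback type and function names are hypothetical; a real implementation would attempt to open the event at each level instead of calling a stub.

```c
#include <stdbool.h>

/* Hypothetical callback: returns true if the event opens at this precise_ip. */
typedef bool (*try_open_fn)(int precise_ip);

/*
 * Start at the requested precision and step down until the event opens.
 * Returns the level that worked, or -1 if even level 0 failed.
 */
static int fallback_precise_ip(int requested, try_open_fn try_open)
{
	for (int level = requested; level >= 0; level--) {
		if (try_open(level))
			return level;
	}
	return -1;
}

/* Example back end that supports precise_ip levels 0 to 2 only. */
static bool supports_up_to_2(int precise_ip)
{
	return precise_ip <= 2;
}
```

Requesting level 3 against `supports_up_to_2()` settles on level 2, which is the kind of outcome the new tests verify.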
Namhyung Kim
da8fcfba08 perf stat: Align metric output without events
One of my concerns with the perf stat output was the alignment of the
metrics and shadow stats.  I think it calculated the basic output
length using COUNTS_LEN and EVNAME_LEN but missed adding the unit
length, like "msec", and the 2 surrounding spaces.  I'm not sure why
it's not printed below though.

But anyway, now it shows correctly aligned metric output.

  $ perf stat true

   Performance counter stats for 'true':

             859,772      task-clock                       #    0.395 CPUs utilized
                   0      context-switches                 #    0.000 /sec
                   0      cpu-migrations                   #    0.000 /sec
                  56      page-faults                      #   65.134 K/sec
           1,075,022      instructions                     #    0.86  insn per cycle
           1,255,911      cycles                           #    1.461 GHz
             220,573      branches                         #  256.548 M/sec
               7,381      branch-misses                    #    3.35% of all branches
                          TopdownL1                        #     19.2 %  tma_retiring
                                                           #     28.6 %  tma_backend_bound
                                                           #      9.5 %  tma_bad_speculation
                                                           #     42.6 %  tma_frontend_bound

         0.002174871 seconds time elapsed                  ^
                                                           |
         0.002154000 seconds user                          |
         0.000000000 seconds sys                          here

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 17:00:31 -08:00
Ian Rogers
68cc6ec3ac perf tool_pmu: Make core_wide and target_cpu json events
For the sake of better documentation, add core_wide and target_cpu to
the tool.json. When the values of system_wide and
user_requested_cpu_list are unknown, use the values from the global
stat_config.

Example output showing how '-a' modifies the values in `perf stat`:
```
$ perf stat -e core_wide,target_cpu true

 Performance counter stats for 'true':

                 0      core_wide
                 0      target_cpu

       0.000993787 seconds time elapsed

       0.001128000 seconds user
       0.000000000 seconds sys

$ perf stat -e core_wide,target_cpu -a true

 Performance counter stats for 'system wide':

                 1      core_wide
                 1      target_cpu

       0.002271723 seconds time elapsed

$ perf list
...
tool:
  core_wide
       [1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool]
...
  target_cpu
       [1 if CPUs being analyzed,0 if threads/processes. Unit: tool]
...
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:36 -08:00
Ian Rogers
02432d920e perf test stat csv: Update test expectations and events
Explicitly use a metric rather than implicitly expecting '-e
instructions,cycles' to produce a metric. Use a metric with software
events to make it more compatible.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:36 -08:00
Ian Rogers
a48cd551d7 perf test stat: Update test expectations and events
test_stat_record_report and test_stat_record_script used default
output which triggers a bug when sending metrics. As this isn't
relevant to the test switch to using named software events.

Update the match in test_hybrid as the cycles event is now cpu-cycles
to work around potential ARM issues.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
6b76f0678b perf test stat: Update shadow test to use metrics
Previously '-e cycles,instructions' would implicitly create an IPC
metric. This now has to be explicit with '-M insn_per_cycle'.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
91c1949d76 perf test metrics: Update all metrics for possibly failing default metrics
Default metrics may use unsupported events and be ignored. These
metrics shouldn't cause metric testing to fail.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
083ae6c1fb perf test stat: Update std_output testing metric expectations
Make the expectations match json metrics rather than the previous hard
coded ones.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
b1cb2b76bd perf test stat: Ignore failures in Default[234] metricgroups
The Default[234] metric groups may contain unsupported legacy
events. Allow those metric groups to fail.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
2c240484cf perf test stat+json: Improve metric-only testing
When testing metric-only, pass a metric to perf rather than expecting
a hard coded metric value to be generated.

Remove keys that were really metric-only units and instead don't
expect metric only to have a matching json key as it encodes metrics
as {"metric_name", "metric_value"}.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
1bcd627165 perf stat: Remove "unit" workarounds for metric-only
Remove code that tested the "unit" as in KB/sec for certain hard coded
metric values and did workarounds.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
a745c0831c perf stat: Sort default events/metrics
To improve the readability of default events/metrics, sort the evsels
after the Default metric groups have been parsed.

Before:
```
$ perf stat -a sleep 1
 Performance counter stats for 'system wide':

            22,087      context-switches                 #      nan cs/sec  cs_per_second
             TopdownL1 (cpu_core)                 #     10.3 %  tma_bad_speculation
                                                  #     25.8 %  tma_frontend_bound
                                                  #     34.5 %  tma_backend_bound
                                                  #     29.3 %  tma_retiring
             7,829      page-faults                      #      nan faults/sec  page_faults_per_second
       880,144,270      cpu_atom/cpu-cycles/             #      nan GHz  cycles_frequency       (50.10%)
     1,693,081,235      cpu_core/cpu-cycles/             #      nan GHz  cycles_frequency
             TopdownL1 (cpu_atom)                 #     20.5 %  tma_bad_speculation
                                                  #     13.8 %  tma_retiring             (50.26%)
                                                  #     34.6 %  tma_frontend_bound       (50.23%)
        89,326,916      cpu_atom/branches/               #      nan M/sec  branch_frequency     (60.19%)
       538,123,088      cpu_core/branches/               #      nan M/sec  branch_frequency
             1,368      cpu-migrations                   #      nan migrations/sec  migrations_per_second
                                                  #     31.1 %  tma_backend_bound        (60.19%)
              0.00 msec cpu-clock                        #      0.0 CPUs  CPUs_utilized
       485,744,856      cpu_atom/instructions/           #      0.6 instructions  insn_per_cycle  (59.87%)
     3,093,112,283      cpu_core/instructions/           #      1.8 instructions  insn_per_cycle
         4,939,427      cpu_atom/branch-misses/          #      5.0 %  branch_miss_rate         (49.77%)
         7,632,248      cpu_core/branch-misses/          #      1.4 %  branch_miss_rate

       1.005084693 seconds time elapsed
```
After:
```
$ perf stat -a sleep 1
 Performance counter stats for 'system wide':

            22,165      context-switches                 #      nan cs/sec  cs_per_second
              0.00 msec cpu-clock                        #      0.0 CPUs  CPUs_utilized
             2,260      cpu-migrations                   #      nan migrations/sec  migrations_per_second
            20,476      page-faults                      #      nan faults/sec  page_faults_per_second
        17,052,357      cpu_core/branch-misses/          #      1.5 %  branch_miss_rate
     1,120,090,590      cpu_core/branches/               #      nan M/sec  branch_frequency
     3,402,892,275      cpu_core/cpu-cycles/             #      nan GHz  cycles_frequency
     6,129,236,701      cpu_core/instructions/           #      1.8 instructions  insn_per_cycle
         6,159,523      cpu_atom/branch-misses/          #      3.1 %  branch_miss_rate         (49.86%)
       222,158,812      cpu_atom/branches/               #      nan M/sec  branch_frequency     (50.25%)
     1,547,610,244      cpu_atom/cpu-cycles/             #      nan GHz  cycles_frequency       (50.40%)
     1,304,901,260      cpu_atom/instructions/           #      0.8 instructions  insn_per_cycle  (50.41%)
             TopdownL1 (cpu_core)                 #     13.7 %  tma_bad_speculation
                                                  #     23.5 %  tma_frontend_bound
                                                  #     33.3 %  tma_backend_bound
                                                  #     29.6 %  tma_retiring
             TopdownL1 (cpu_atom)                 #     32.1 %  tma_backend_bound        (59.65%)
                                                  #     30.1 %  tma_frontend_bound       (59.51%)
                                                  #     22.3 %  tma_bad_speculation
                                                  #     15.5 %  tma_retiring             (59.53%)

       1.008405429 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
19df87d9ed perf stat: Fix default metricgroup display on hybrid
The logic to skip output of a default metric line was firing on
Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the
need_full_name check as it is equivalent to the different PMU test in
the cases we care about, merge the 'if's and flip the evsel of the PMU
test. The 'if' is now basically saying, if the output matches the last
printed output then skip the output.

Before:
```
             TopdownL1 (cpu_core)                 #     11.3 %  tma_bad_speculation
                                                  #     24.3 %  tma_frontend_bound
             TopdownL1 (cpu_core)                 #     33.9 %  tma_backend_bound
                                                  #     30.6 %  tma_retiring
                                                  #     42.2 %  tma_backend_bound
                                                  #     25.0 %  tma_frontend_bound       (49.81%)
                                                  #     12.8 %  tma_bad_speculation
                                                  #     20.0 %  tma_retiring             (59.46%)
```
After:
```
             TopdownL1 (cpu_core)                 #      8.3 %  tma_bad_speculation
                                                  #     43.7 %  tma_frontend_bound
                                                  #     30.7 %  tma_backend_bound
                                                  #     17.2 %  tma_retiring
             TopdownL1 (cpu_atom)                 #     31.9 %  tma_backend_bound
                                                  #     37.6 %  tma_frontend_bound       (49.66%)
                                                  #     18.0 %  tma_bad_speculation
                                                  #     12.6 %  tma_retiring             (59.58%)
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
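The merged condition can be sketched as "remember the last printed header and skip an identical one". The struct and function below are hypothetical and only model the rule stated in the message, not the stat-display code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical record of the header that was printed last. */
struct last_header {
	const char *group;	/* e.g. "TopdownL1" */
	const char *pmu;	/* e.g. "cpu_core" */
};

/* Returns true when the header must be printed, and remembers it. */
static bool should_print_header(struct last_header *last,
				const char *group, const char *pmu)
{
	if (last->group && last->pmu &&
	    strcmp(last->group, group) == 0 && strcmp(last->pmu, pmu) == 0)
		return false;	/* same as the last printed header: skip */

	last->group = group;
	last->pmu = pmu;
	return true;
}
```

Under this rule a second 'TopdownL1 (cpu_core)' line is suppressed, while 'TopdownL1 (cpu_atom)' is printed because its PMU differs.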
Ian Rogers
b71f46a6a7 perf stat: Remove hard coded shadow metrics
Now that the metrics are encoded in common json the hard coded
printing means the metrics are shown twice. Remove the hard coded
version.

This means that when specifying events, and those events correspond to
a hard coded metric, the metric will no longer be displayed. The
metric will be displayed if the metric is requested. Due to the adhoc
printing in the previous approach it was often found frustrating, the
new approach avoids this.

The default perf stat output on an alderlake now looks like:
```
$ perf stat -a -- sleep 1

 Performance counter stats for 'system wide':

            19,697      context-switches                 #      nan cs/sec  cs_per_second
             TopdownL1 (cpu_core)                 #     10.7 %  tma_bad_speculation
                                                  #     24.9 %  tma_frontend_bound
             TopdownL1 (cpu_core)                 #     34.3 %  tma_backend_bound
                                                  #     30.1 %  tma_retiring
             6,593      page-faults                      #      nan faults/sec  page_faults_per_second
       729,065,658      cpu_atom/cpu-cycles/             #      nan GHz  cycles_frequency       (49.79%)
     1,605,131,101      cpu_core/cpu-cycles/             #      nan GHz  cycles_frequency
                                                  #     19.7 %  tma_bad_speculation
                                                  #     14.2 %  tma_retiring             (50.14%)
                                                  #     37.3 %  tma_frontend_bound       (50.31%)
        87,302,268      cpu_atom/branches/               #      nan M/sec  branch_frequency     (60.27%)
       512,046,956      cpu_core/branches/               #      nan M/sec  branch_frequency
             1,111      cpu-migrations                   #      nan migrations/sec  migrations_per_second
                                                  #     28.8 %  tma_backend_bound        (60.26%)
              0.00 msec cpu-clock                        #      0.0 CPUs  CPUs_utilized
       392,509,323      cpu_atom/instructions/           #      0.6 instructions  insn_per_cycle  (60.19%)
     2,990,369,310      cpu_core/instructions/           #      1.9 instructions  insn_per_cycle
         3,493,478      cpu_atom/branch-misses/          #      5.9 %  branch_miss_rate         (49.69%)
         7,297,531      cpu_core/branch-misses/          #      1.4 %  branch_miss_rate

       1.006621701 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
3622990efa perf script: Change metric format to use json metrics
The metric format option isn't properly supported. This change
improves that by making the sample events update the counts of an
evsel, where the shadow metric code expects to read the values.  To
support printing metrics, metrics need to be found. This is done on
the first attempt to print a metric. Every metric is parsed and then
the evsels in the metric's evlist compared to those in perf script
using the perf_event_attr type and config. If the metric matches then
it is added for printing. As an event in the perf script's evlist may
have >1 metric id, or different leader for aggregation, the first
metric matched will be displayed in those cases.

An example use is:
```
$ perf record -a -e '{instructions,cpu-cycles}:S' -- sleep 1
$ perf script -F period,metric
...
     867817
         metric:    0.30  insn per cycle
     125394
         metric:    0.04  insn per cycle
     313516
         metric:    0.11  insn per cycle
         metric:    1.00  insn per cycle
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
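The matching step can be sketched as a comparison of (type, config) pairs. The struct is a hypothetical subset of perf_event_attr, and requiring every metric event to be present is an assumption made for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of perf_event_attr used for matching. */
struct ev_key {
	uint32_t type;
	uint64_t config;
};

/* Is an event with the same type and config in the list? */
static bool has_event(const struct ev_key *evs, size_t n, struct ev_key want)
{
	for (size_t i = 0; i < n; i++) {
		if (evs[i].type == want.type && evs[i].config == want.config)
			return true;
	}
	return false;
}

/* Assumed rule: a metric matches when all of its events are present. */
static bool metric_matches(const struct ev_key *metric, size_t nmetric,
			   const struct ev_key *script, size_t nscript)
{
	for (size_t i = 0; i < nmetric; i++) {
		if (!has_event(script, nscript, metric[i]))
			return false;
	}
	return true;
}
```

Matching on type and config rather than on event names means an event recorded as cpu-cycles can still be paired with a metric that spells the same event differently.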
Ian Rogers
2dfc0cab3d perf stat: Add detail -d,-dd,-ddd metrics
Add metrics for the stat-shadow -d, -dd and -ddd events and hard coded
metrics. Remove the events as these now come from the metrics.

Following this change a detailed perf stat output looks like:
```
$ perf stat -a -ddd -- sleep 1
 Performance counter stats for 'system wide':

            21,089      context-switches                 #      nan cs/sec  cs_per_second
             TopdownL1 (cpu_core)                 #     14.1 %  tma_bad_speculation
                                                  #     27.3 %  tma_frontend_bound       (30.56%)
             TopdownL1 (cpu_core)                 #     31.5 %  tma_backend_bound
                                                  #     27.2 %  tma_retiring             (30.56%)
             6,302      page-faults                      #      nan faults/sec  page_faults_per_second
       928,495,163      cpu_atom/cpu-cycles/
                                                  #      nan GHz  cycles_frequency       (28.41%)
     1,841,409,834      cpu_core/cpu-cycles/
                                                  #      nan GHz  cycles_frequency       (38.51%)
                                                  #     14.5 %  tma_bad_speculation
                                                  #     16.0 %  tma_retiring             (28.41%)
                                                  #     36.8 %  tma_frontend_bound       (35.57%)
       100,859,118      cpu_atom/branches/               #      nan M/sec  branch_frequency     (42.73%)
       572,657,734      cpu_core/branches/               #      nan M/sec  branch_frequency     (54.43%)
             1,527      cpu-migrations                   #      nan migrations/sec  migrations_per_second
                                                  #     32.7 %  tma_backend_bound        (42.73%)
              0.00 msec cpu-clock                        #    0.000 CPUs utilized
                                                  #      0.0 CPUs  CPUs_utilized
       498,668,509      cpu_atom/instructions/           #    0.57  insn per cycle
                                                  #      0.6 instructions  insn_per_cycle  (42.97%)
     3,281,762,225      cpu_core/instructions/           #    1.84  insn per cycle
                                                  #      1.8 instructions  insn_per_cycle  (62.20%)
         4,919,511      cpu_atom/branch-misses/          #    5.43% of all branches
                                                  #      5.4 %  branch_miss_rate         (35.80%)
         7,431,776      cpu_core/branch-misses/          #    1.39% of all branches
                                                  #      1.4 %  branch_miss_rate         (62.20%)
         2,517,007      cpu_atom/LLC-loads/              #      0.1 %  llc_miss_rate            (28.62%)
         3,931,318      cpu_core/LLC-loads/              #     40.4 %  llc_miss_rate            (45.98%)
        14,918,674      cpu_core/L1-dcache-load-misses/  #    2.25% of all L1-dcache accesses
                                                  #      nan %  l1d_miss_rate            (37.80%)
        27,067,264      cpu_atom/L1-icache-load-misses/  #   15.92% of all L1-icache accesses
                                                  #     15.9 %  l1i_miss_rate            (21.47%)
       116,848,994      cpu_atom/dTLB-loads/             #      0.8 %  dtlb_miss_rate           (21.47%)
       764,870,407      cpu_core/dTLB-loads/             #      0.1 %  dtlb_miss_rate           (15.12%)

       1.006181526 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
a3248b5b54 perf jevents: Add metric DefaultShowEvents
Some Default group metrics require their events to be shown for
consistency with perf's previous behavior. Add a flag to indicate when
this is the case and use it in stat-display.

As the events now come from the Default metrics, remove the default
hardware and software events from perf stat.

Following this change the default perf stat output on an alderlake looks like:
```
$ perf stat -a -- sleep 1

 Performance counter stats for 'system wide':

            20,550      context-switches                 #      nan cs/sec  cs_per_second
             TopdownL1 (cpu_core)                 #      9.0 %  tma_bad_speculation
                                                  #     28.1 %  tma_frontend_bound
             TopdownL1 (cpu_core)                 #     29.2 %  tma_backend_bound
                                                  #     33.7 %  tma_retiring
             6,685      page-faults                      #      nan faults/sec  page_faults_per_second
       790,091,064      cpu_atom/cpu-cycles/
                                                  #      nan GHz  cycles_frequency       (49.83%)
     2,563,918,366      cpu_core/cpu-cycles/
                                                  #      nan GHz  cycles_frequency
                                                  #     12.3 %  tma_bad_speculation
                                                  #     14.5 %  tma_retiring             (50.20%)
                                                  #     33.8 %  tma_frontend_bound       (50.24%)
        76,390,322      cpu_atom/branches/               #      nan M/sec  branch_frequency     (60.20%)
     1,015,173,047      cpu_core/branches/               #      nan M/sec  branch_frequency
             1,325      cpu-migrations                   #      nan migrations/sec  migrations_per_second
                                                  #     39.3 %  tma_backend_bound        (60.17%)
              0.00 msec cpu-clock                        #    0.000 CPUs utilized
                                                  #      0.0 CPUs  CPUs_utilized
       554,347,072      cpu_atom/instructions/           #    0.64  insn per cycle
                                                  #      0.6 instructions  insn_per_cycle  (60.14%)
     5,228,931,991      cpu_core/instructions/           #    2.04  insn per cycle
                                                  #      2.0 instructions  insn_per_cycle
         4,308,874      cpu_atom/branch-misses/          #    5.65% of all branches
                                                  #      5.6 %  branch_miss_rate         (49.76%)
         9,890,606      cpu_core/branch-misses/          #    0.97% of all branches
                                                  #      1.0 %  branch_miss_rate

       1.005477803 seconds time elapsed
```

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:35 -08:00
Ian Rogers
c7adeb0974 perf jevents: Add set of common metrics based on default ones
Add support for getting a common set of metrics from a default
table. Adding the json metrics at the same time simplifies the
generation. The metrics added are CPUs_utilized, cs_per_second,
migrations_per_second, page_faults_per_second, insn_per_cycle,
stalled_cycles_per_instruction, frontend_cycles_idle,
backend_cycles_idle, cycles_frequency, branch_frequency and
branch_miss_rate based on the shadow metric definitions.

Following this change the default perf stat output on an alderlake
looks like:
```
$ perf stat -a -- sleep 2

 Performance counter stats for 'system wide':

              0.00 msec cpu-clock                        #    0.000 CPUs utilized
            77,739      context-switches
            15,033      cpu-migrations
           321,313      page-faults
    14,355,634,225      cpu_atom/instructions/           #    1.40  insn per cycle              (35.37%)
   134,561,560,583      cpu_core/instructions/           #    3.44  insn per cycle              (57.85%)
    10,263,836,145      cpu_atom/cycles/                                                        (35.42%)
    39,138,632,894      cpu_core/cycles/                                                        (57.60%)
     2,989,658,777      cpu_atom/branches/                                                      (42.60%)
    32,170,570,388      cpu_core/branches/                                                      (57.39%)
        29,789,870      cpu_atom/branch-misses/          #    1.00% of all branches             (42.69%)
       165,991,152      cpu_core/branch-misses/          #    0.52% of all branches             (57.19%)
                       (software)                 #      nan cs/sec  cs_per_second
             TopdownL1 (cpu_core)                 #     11.9 %  tma_bad_speculation
                                                  #     19.6 %  tma_frontend_bound       (63.97%)
             TopdownL1 (cpu_core)                 #     18.8 %  tma_backend_bound
                                                  #     49.7 %  tma_retiring             (63.97%)
                       (software)                 #      nan faults/sec  page_faults_per_second
                                                  #      nan GHz  cycles_frequency       (42.88%)
                                                  #      nan GHz  cycles_frequency       (69.88%)
             TopdownL1 (cpu_atom)                 #     11.7 %  tma_bad_speculation
                                                  #     29.9 %  tma_retiring             (50.07%)
             TopdownL1 (cpu_atom)                 #     31.3 %  tma_frontend_bound       (43.09%)
                       (cpu_atom)                 #      nan M/sec  branch_frequency     (43.09%)
                                                  #      nan M/sec  branch_frequency     (70.07%)
                                                  #      nan migrations/sec  migrations_per_second
             TopdownL1 (cpu_atom)                 #     27.1 %  tma_backend_bound        (43.08%)
                       (software)                 #      0.0 CPUs  CPUs_utilized
                                                  #      1.4 instructions  insn_per_cycle  (43.04%)
                                                  #      3.5 instructions  insn_per_cycle  (69.99%)
                                                  #      1.0 %  branch_miss_rate         (35.46%)
                                                  #      0.5 %  branch_miss_rate         (65.02%)

       2.005626564 seconds time elapsed
```
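
As a cross-check on the legacy shadow-stat columns in the output above, the ratios can be recomputed from the raw counts. This is a minimal Python sketch, not perf code: the helper names (`ratio`, `insn_per_cycle`, `branch_miss_rate`) are hypothetical, and returning NaN for a zero denominator is a choice made for this sketch rather than a claim about how perf evaluates metrics.

```python
def ratio(numerator, denominator):
    """Divide, returning NaN for a zero denominator (sketch-only choice)."""
    return numerator / denominator if denominator else float("nan")


# Raw counts copied from the cpu_atom and cpu_core lines of the output above.
COUNTS = {
    "cpu_atom": {
        "instructions": 14_355_634_225,
        "cycles": 10_263_836_145,
        "branches": 2_989_658_777,
        "branch-misses": 29_789_870,
    },
    "cpu_core": {
        "instructions": 134_561_560_583,
        "cycles": 39_138_632_894,
        "branches": 32_170_570_388,
        "branch-misses": 165_991_152,
    },
}


def insn_per_cycle(counts):
    """Instructions retired per cycle."""
    return ratio(counts["instructions"], counts["cycles"])


def branch_miss_rate(counts):
    """Branch misses as a percentage of all branches."""
    return 100.0 * ratio(counts["branch-misses"], counts["branches"])


if __name__ == "__main__":
    for pmu, counts in COUNTS.items():
        print(f"{pmu}: {insn_per_cycle(counts):.2f} insn per cycle, "
              f"{branch_miss_rate(counts):.2f}% of all branches")
```

The recomputed values agree with the `insn per cycle` and `% of all branches` columns on the event lines. The separately listed metric lines can differ slightly because their percentages of running time differ.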

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:34 -08:00
Ian Rogers
2e5140849b perf expr: Add #target_cpu literal
For CPU nanoseconds a lot of the stat-shadow metrics use either
task-clock or cpu-clock, the latter being used when
target__has_cpu. Add a #target_cpu literal so that json metrics can
perform the same test.
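
The selection that the literal exposes can be pictured with a small Python sketch. It only models the choice described above; `cpu_nanoseconds` and the `counters` dictionary are hypothetical names for illustration, and the actual json metric expression syntax is not shown here.

```python
def cpu_nanoseconds(counters, target_has_cpu):
    """Pick the clock counter the way the text describes.

    cpu-clock is used when the target has CPUs (for example system-wide
    mode), task-clock otherwise. `counters` maps event names to values in
    nanoseconds and is a stand-in for real counter data.
    """
    key = "cpu-clock" if target_has_cpu else "task-clock"
    return counters[key]


if __name__ == "__main__":
    sample = {"cpu-clock": 8_000_000_000, "task-clock": 950_000_000}
    print(cpu_nanoseconds(sample, target_has_cpu=True))
    print(cpu_nanoseconds(sample, target_has_cpu=False))
```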

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:34 -08:00
Ian Rogers
c8035a4961 perf metricgroup: Add care to picking the evsel for displaying a metric
Rather than using the first evsel in the matched events, try to find
the least shared non-tool evsel. The aim is to pick the first evsel
that typifies the metric within the list of metrics.

This addresses an issue where Default metric group metrics may lose
their counter value due to how the stat displaying hides counters for
default event/metric output.

For a metricgroup like TopdownL1 on an Intel Alderlake, before the
change there are 4 events with metrics:
```
$ perf stat -M topdownL1 -a sleep 1

 Performance counter stats for 'system wide':

     7,782,334,296      cpu_core/TOPDOWN.SLOTS/          #     10.4 %  tma_bad_speculation
                                                  #     19.7 %  tma_frontend_bound
     2,668,927,977      cpu_core/topdown-retiring/       #     35.7 %  tma_backend_bound
                                                  #     34.1 %  tma_retiring
       803,623,987      cpu_core/topdown-bad-spec/
       167,514,386      cpu_core/topdown-heavy-ops/
     1,555,265,776      cpu_core/topdown-fe-bound/
     2,792,733,013      cpu_core/topdown-be-bound/
       279,769,310      cpu_atom/TOPDOWN_RETIRING.ALL/   #     12.2 %  tma_retiring
                                                  #     15.1 %  tma_bad_speculation
       457,917,232      cpu_atom/CPU_CLK_UNHALTED.CORE/  #     38.4 %  tma_backend_bound
                                                  #     34.2 %  tma_frontend_bound
       783,519,226      cpu_atom/TOPDOWN_FE_BOUND.ALL/
        10,790,192      cpu_core/INT_MISC.UOP_DROPPING/
       879,845,633      cpu_atom/TOPDOWN_BE_BOUND.ALL/
```

After the change there are 6 events with metrics:
```
$ perf stat -M topdownL1 -a sleep 1

 Performance counter stats for 'system wide':

     2,377,551,258      cpu_core/TOPDOWN.SLOTS/          #      7.9 %  tma_bad_speculation
                                                  #     36.4 %  tma_frontend_bound
       480,791,142      cpu_core/topdown-retiring/       #     35.5 %  tma_backend_bound
       186,323,991      cpu_core/topdown-bad-spec/
        65,070,590      cpu_core/topdown-heavy-ops/      #     20.1 %  tma_retiring
       871,733,444      cpu_core/topdown-fe-bound/
       848,286,598      cpu_core/topdown-be-bound/
       260,936,456      cpu_atom/TOPDOWN_RETIRING.ALL/   #     12.4 %  tma_retiring
                                                  #     17.6 %  tma_bad_speculation
       419,576,513      cpu_atom/CPU_CLK_UNHALTED.CORE/
       797,132,597      cpu_atom/TOPDOWN_FE_BOUND.ALL/   #     38.0 %  tma_frontend_bound
         3,055,447      cpu_core/INT_MISC.UOP_DROPPING/
       671,014,164      cpu_atom/TOPDOWN_BE_BOUND.ALL/   #     32.0 %  tma_backend_bound
```
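
The selection rule described above ("least shared non-tool evsel, first on ties") can be sketched in Python. This is an illustrative model under stated assumptions, not the metricgroup implementation: events are plain strings, `pick_display_event` is a hypothetical helper, and falling back to the first event when every candidate is a tool event is an assumption of this sketch.

```python
from collections import Counter


def pick_display_event(metric_events, all_metrics, tool_events=frozenset()):
    """Return the event a metric should be displayed next to.

    metric_events: ordered event names used by the metric of interest.
    all_metrics:   dict mapping metric name -> list of event names.
    tool_events:   names treated as tool events (never preferred).
    """
    # Count how many metrics use each event (once per metric).
    share = Counter(ev for events in all_metrics.values() for ev in set(events))
    candidates = [ev for ev in metric_events if ev not in tool_events]
    if not candidates:
        # Sketch-only fallback: keep the old "first event" behavior.
        return metric_events[0]
    # min() keeps the first candidate among equally shared ones.
    return min(candidates, key=lambda ev: share[ev])


if __name__ == "__main__":
    # Hypothetical, simplified metric definitions for illustration only.
    metrics = {
        "tma_retiring": ["slots", "topdown-retiring", "duration_time"],
        "tma_frontend_bound": ["slots", "topdown-fe-bound"],
    }
    for name, events in metrics.items():
        print(name, "->", pick_display_event(events, metrics, {"duration_time"}))
```

With the first-evsel rule both metrics would attach to the shared `slots` event. With the least-shared rule each metric lands on the event that is specific to it.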

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:48:34 -08:00
Namhyung Kim
367377f45c perf tools: Fix missing feature check for inherit + SAMPLE_READ
The check should also set PERF_SAMPLE_TID to enable inherit and
PERF_SAMPLE_READ on recent kernels.  Not having _TID makes the feature
check misdetect the inherit and _READ support.

It was reported that the following command failed due to the error in
the missing feature check on Intel SPR machines.

  $ perf record -e '{cpu/mem-loads-aux/S,cpu/mem-loads,ldlat=3/PS}' -- ls
  Error:
  Failure to open event 'cpu/mem-loads,ldlat=3/PS' on PMU 'cpu' which will be removed.
  Invalid event (cpu/mem-loads,ldlat=3/PS) in per-thread mode, enable system wide with '-a'.
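
The flag combination involved can be modelled with a short Python sketch. The two bit values are the ones from the perf_event uapi header; `inherit_read_accepted` is a hypothetical helper that only paraphrases the rule stated in this commit message, and it is neither the kernel's validation code nor perf's probe.

```python
# Bit values as defined in the perf_event uapi header.
PERF_SAMPLE_TID = 1 << 1
PERF_SAMPLE_READ = 1 << 4


def inherit_read_accepted(inherit, sample_type):
    """Model of the rule described above.

    inherit combined with PERF_SAMPLE_READ is only accepted when
    PERF_SAMPLE_TID is set as well; other combinations are not
    restricted by this sketch.
    """
    if inherit and (sample_type & PERF_SAMPLE_READ):
        return bool(sample_type & PERF_SAMPLE_TID)
    return True


if __name__ == "__main__":
    # A probe without _TID is rejected even if the feature exists.
    print(inherit_read_accepted(True, PERF_SAMPLE_READ))
    # A probe with _TID exercises the feature itself.
    print(inherit_read_accepted(True, PERF_SAMPLE_READ | PERF_SAMPLE_TID))
```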

Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 3b193a57ba ("perf tools: Detect missing kernel features properly")
Reported-and-tested-by: Chen, Zide <zide.chen@intel.com>
Closes: https://lore.kernel.org/lkml/20251022220802.1335131-1-zide.chen@intel.com/
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11 16:43:37 -08:00
Chen Ni
e279039c3e perf symbol: Remove unneeded semicolon
Remove unnecessary semicolons reported by Coccinelle/coccicheck and the
semantic patch at scripts/coccinelle/misc/semicolon.cocci.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-10 22:22:16 -08:00
Ian Rogers
081006b7c8 perf test: Add test that command line period overrides sysfs/json values
The behavior of weak terms is subtle, so add a test to ensure they
aren't accidentally broken. The test finds an event with a weak 'period'
and then overrides it. If no such event is present then the test skips.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-09 23:07:57 -08:00
Ian Rogers
0e9b51a432 perf pmu: Make pmu_alias_terms weak again
The terms for a json event should be weak so they don't override
command line options.

Before:
```
$ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 \
  | grep "{ sample_period, sample_freq }"
 { sample_period, sample_freq }   200003
 { sample_period, sample_freq }   2000003
 { sample_period, sample_freq }   1000
```

After:
```
$ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 \
  | grep "{ sample_period, sample_freq }"
 { sample_period, sample_freq }   1000
 { sample_period, sample_freq }   1000
 { sample_period, sample_freq }   1000
```
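
The precedence that weak terms are meant to give can be sketched in Python. This is a simplified model, not the perf implementation: `effective_period` and its parameters are hypothetical, and the ordering it encodes (explicit term, then command line, then weak term, then default) is an assumption drawn from the behavior shown above.

```python
def effective_period(default_period, cmdline_period=None,
                     term_period=None, term_weak=False):
    """Resolve the sample period for one event.

    default_period: period used when nothing else is specified.
    cmdline_period: value of -c, or None when -c was not given.
    term_period:    'period' term attached to the event, or None.
    term_weak:      True when the term came from a json/sysfs alias.
    """
    period = cmdline_period if cmdline_period is not None else default_period
    # A weak term must not override an explicit command line value.
    weak_and_overridden = term_weak and cmdline_period is not None
    if term_period is not None and not weak_and_overridden:
        period = term_period
    return period


if __name__ == "__main__":
    # json alias carries period=200003, user passes -c 1000.
    print(effective_period(4000, cmdline_period=1000,
                           term_period=200003, term_weak=True))
    # Same term treated as non-weak, as in the "Before" output.
    print(effective_period(4000, cmdline_period=1000,
                           term_period=200003, term_weak=False))
```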

Fixes: 84bae3af20 ("perf pmu: Don't eagerly parse event terms")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-09 23:07:57 -08:00
Ian Rogers
6331b26693 perf tool: Add a delegate_tool that just delegates actions to another tool
Add the ability to compose perf_tools, by having one perform an
action and then call a delegate. Currently the perf_tools have
if-then-elses setting the callback and then if-then-elses within the
callback. Understanding the behavior is complex as it is in two places
and logic for numerous operations, within things like perf inject, is
interwoven. By chaining perf_tools together based on command line
options this kind of code can be avoided.
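
The composition idea can be illustrated with a small Python sketch. It mirrors the shape of the design only: the class names, the two callbacks and `build_chain` are hypothetical, and the real perf_tool is a C struct of function pointers with many more callbacks.

```python
class Tool:
    """Base tool: every callback is a no-op that reports success (0)."""

    def sample(self, event):
        return 0

    def comm(self, event):
        return 0


class DelegateTool(Tool):
    """Forwards every callback, unchanged, to another tool."""

    def __init__(self, delegate):
        self.delegate = delegate

    def sample(self, event):
        return self.delegate.sample(event)

    def comm(self, event):
        return self.delegate.comm(event)


class CountSamples(DelegateTool):
    """Performs one action (counting), then calls the delegate."""

    def __init__(self, delegate):
        super().__init__(delegate)
        self.count = 0

    def sample(self, event):
        self.count += 1
        return super().sample(event)


class Recorder(Tool):
    """Terminal tool that records what reaches the end of the chain."""

    def __init__(self):
        self.seen = []

    def sample(self, event):
        self.seen.append(("sample", event))
        return 0

    def comm(self, event):
        self.seen.append(("comm", event))
        return 0


def build_chain(sink, count_samples):
    """Wrap the sink according to (hypothetical) command line options."""
    return CountSamples(sink) if count_samples else sink
```

Each wrapper holds one concern, so an option adds or removes a link in the chain instead of adding a branch inside every callback.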

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-07 13:25:05 -08:00
Ian Rogers
71062e282d perf tool: Add the perf_tool argument to all callbacks
Getting context for what a tool is doing, such as the perf_inject
instance, using container_of the tool is a common pattern in the
code. This isn't possible for the event_op2, event_op3 and event_op4
callbacks as the tool isn't passed. Add the argument and then fix
function signatures to match. As tools may be reading a tool from
somewhere else, change that code to use the passed in tool.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-07 13:25:05 -08:00
Xu Yang
fa4a527af5 perf vendor events arm64:: Add i.MX94 DDR Performance Monitor metrics
Add JSON metrics for i.MX94 DDR Performance Monitor.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-06 17:46:16 -08:00