Commit Graph

1352792 Commits

Author SHA1 Message Date
James Clark
aea3496bbc perf tools: Fix arm64 source package build
Syscall tables are generated from rules in the kernel tree. Add the
related files to the MANIFEST to fix the Perf source package build.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: bfb713ea53 ("perf tools: Fix arm64 build by generating unistd_64.h")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250513-james-perf-src-pkg-fix-v1-1-bcfd0486dbd6@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-13 17:26:35 -03:00
Ian Rogers
3eb5c49f71 perf test: Hybrid improvements for metric value validation test
On my alderlake I currently see for the "perf metrics value validation" test:

```
Total Test Count:  142
Passed Test Count:  139
[
Metric Relationship Error:      The collected value of metric ['tma_fetch_latency', 'tma_fetch_bandwidth', 'tma_frontend_bound']
                        is [31.137028] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [tma_frontend_bound, tma_frontend_bound]
                        Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
Metric Relationship Error:      The collected value of metric ['tma_memory_bound', 'tma_core_bound', 'tma_backend_bound']
                        is [6.564442] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [tma_backend_bound, tma_backend_bound]
                        Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
Metric Relationship Error:      The collected value of metric ['tma_light_operations', 'tma_heavy_operations', 'tma_retiring']
                        is [57.806179] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [tma_retiring, tma_retiring]
                        Relationship rule description: 'Sum of the level 2 children should equal level 1 parent']
Metric validation return with erros. Please check metrics reported with errors.
```

I suspect it is due to two metrics for different CPU types being
enabled. Add a -cputype option to avoid this. The test still fails with:

```
Total Test Count:  115
Passed Test Count:  114
[
Wrong Metric Value Error:       The collected value of metric ['tma_l2_hit_latency']
                        is [117.947088] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [0, 100]]
Metric validation return with errors. Please check metrics reported with errors.
```

which is a reproducible genuine error and likely requires a metric fix.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250512184700.11691-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-13 17:24:42 -03:00
Ian Rogers
7f84f67418 perf list: Display the PMU name associated with a perf metric in JSON
The 'perf stat --cputype' option can be used to filter which metrics
will be applied, for this reason the JSON metrics have an associated
PMU.

List this PMU name in the 'perf list' output in JSON mode so that
tooling may access it.

An example of the new field is:
```
{
        "MetricGroup": "Backend",
        "MetricName": "tma_core_bound",
        "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)",
        "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
        "ScaleUnit": "100%",
        "BriefDescription": "This metric represents fraction of slots where ...
        "PublicDescription": "This metric represents fraction of slots where ...
        "Unit": "cpu_core"
},
```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250512184700.11691-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-13 17:06:14 -03:00
Ian Rogers
4102ff8b1f perf metricgroup: Binary search when resolving referred to metrics
Unlike with events, metrics can be matched by name or a list of metric
groups.

However, when a metric refers to another metric it isn't referring to a
group but the singular metric in question.

Prior to this change every "id" in a metric expression is checked to see
if it is a metric by scanning all the metrics in the metrics table.

As the table is sorted my metric name we can speed the search in the
resolution case by binary searching for the metric.

Rename some of the metricgroup functions to make it clearer whether
they match a metric by name or by both name and group.

Before:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m15.972s
user    0m13.176s
sys     0m3.001s
```

After:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m5.343s
user    0m1.871s
sys     0m2.128s
```

Committer testing:

  root@number:~# grep -m1 'model name' /proc/cpuinfo
  model name	: AMD Ryzen 9 9950X3D 16-Core Processor
  root@number:~#

Before:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real	0m9.286s
  user	0m9.354s
  sys	0m0.062s
  root@number:~#

After:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real	0m0.689s
  user	0m0.766s
  sys	0m0.042s
  root@number:~# time perf test 10
   10: PMU JSON event tests                                            :
   10.1: PMU event table sanity                                        : Ok
   10.2: PMU event map aliases                                         : Ok
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
   10.5: Parsing of metric thresholds with fake PMUs                   : Ok

  real	0m0.696s
  user	0m0.807s
  sys	0m0.064s
  root@number:~#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250512194622.33258-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-13 16:36:51 -03:00
Ian Rogers
754baf426e perf pmu: Change aliases from list to hashmap
Finding an alias for things like perf_pmu__have_event() would need to
search the aliases list, whilst this happens relatively infrequently it
can be a significant overhead in testing.

Switch to using a hashmap. Move common initialization code to
perf_pmu__init(). Refactor the test 'struct perf_pmu_test_pmu' to not
have perf pmu within it to better support the perf_pmu__init() function.

Before:
```
$ time perf test "Parsing of PMU event table metrics"
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

real    0m13.287s
user    0m13.026s
sys     0m0.532s
```

After:
```
$ time perf test "Parsing of PMU event table metrics"
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

real    0m13.011s
user    0m12.885s
sys     0m0.485s
```

Committer testing:

  root@number:~# grep -m1 'model name' /proc/cpuinfo
  model name	: AMD Ryzen 9 9950X3D 16-Core Processor
  root@number:~#

Before:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real	0m9.296s
  user	0m9.361s
  sys	0m0.063s
  root@number:~#

After:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real	0m9.286s
  user	0m9.354s
  sys	0m0.062s
  root@number:~#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250512194622.33258-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-13 16:36:43 -03:00
Ian Rogers
375368a961 perf fncache: Switch to using hashmap
The existing fncache can get large in testing situations. As the
bucket array is a fixed size this leads to it degrading to O(n)
performance. Use a regular hashmap that can dynamically reallocate its
array.

Before:
```
$ time perf test "Parsing of PMU event table metrics"
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

real    0m14.132s
user    0m17.806s
sys     0m0.557s
```

After:
```
$ time perf test "Parsing of PMU event table metrics"
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

real    0m13.287s
user    0m13.026s
sys     0m0.532s
```

Committer notes:

  root@number:~# grep -m1 'model name' /proc/cpuinfo
  model name	: AMD Ryzen 9 9950X3D 16-Core Processor
  root@number:~#

Before:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real	0m9.277s
  user	0m9.979s
  sys	0m0.055s
  root@number:~#

After:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real	0m9.296s
  user	0m9.361s
  sys	0m0.063s
  root@number:~#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250512194622.33258-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-13 16:36:22 -03:00
Ian Rogers
f3061d5267 perf tests: Harden branch stack sampling test
On continuous testing the perf script output can be empty, or nearly
empty, causing tr/grep to exit and due to "set -e" the test traps and
fails.

Add some empty file handling that sets the test to skip and make grep
and other text rewriting failures non-fatal by adding "|| true".

Committer testing:

  root@number:~# grep -m1 "model name" /proc/cpuinfo
  model name	: AMD Ryzen 9 9950X3D 16-Core Processor
  root@number:~# perf test "Check branch stack sampling"
  104: Check branch stack sampling                                     : Ok
  root@number:~#
  root@number:~# perf test -vvvvvvv "Check branch stack sampling"
  104: Check branch stack sampling:
  --- start ---
  test child forked, pid 396047
   142d22-142da0 l brstack_bench
  perf does have symbol 'brstack_bench'
  Testing user branch stack sampling
  Testing branch stack filtering permutation (any_call,CALL|IND_CALL|COND_CALL|SYSCALL|IRQ)
  Testing branch stack filtering permutation (call,CALL|SYSCALL)
  Testing branch stack filtering permutation (cond,COND)
  Testing branch stack filtering permutation (any_ret,RET|COND_RET|SYSRET|ERET)
  Testing branch stack filtering permutation (call,cond,CALL|SYSCALL|COND)
  Testing branch stack filtering permutation (any_call,cond,CALL|IND_CALL|COND_CALL|IRQ|SYSCALL|COND)
  Testing branch stack filtering permutation (cond,any_call,any_ret,COND|CALL|IND_CALL|COND_CALL|SYSCALL|IRQ|RET|COND_RET|SYSRET|ERET)
  ---- end(0) ----
  104: Check branch stack sampling                                     : Ok
  root@number:~#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250318161639.34446-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:55:15 -03:00
Ian Rogers
255f5b6d06 perf parse-events: Add "cpu" term to set the CPU an event is recorded on
The -C option allows the CPUs for a list of events to be specified but
its not possible to set the CPU for a single event. Add a term to
allow this. The term isn't a general CPU list due to ',' already being
a special character in event parsing instead multiple cpu= terms may
be provided and they will be merged/unioned together.

An example of mixing different types of events counted on different CPUs:
```
$ perf stat -A -C 0,4-5,8 -e "instructions/cpu=0/,l1d-misses/cpu=4,cpu=5/,inst_retired.any/cpu=8/,cycles" -a sleep 0.1

 Performance counter stats for 'system wide':

CPU0            6,979,225      instructions/cpu=0/              #    0.89  insn per cycle
CPU4               75,138      cpu/l1d-misses/
CPU5            1,418,939      cpu/l1d-misses/
CPU8              797,553      cpu/inst_retired.any,cpu=8/
CPU0            7,845,302      cycles
CPU4            6,546,859      cycles
CPU5          185,915,438      cycles
CPU8            2,065,668      cycles

       0.112449242 seconds time elapsed
```

Committer testing:

  root@number:~# grep -m1 "model name" /proc/cpuinfo
  model name	: AMD Ryzen 9 9950X3D 16-Core Processor
  root@number:~# perf stat -A -e "instructions/cpu=0/,instructions,l1d-misses/cpu=4,cpu=5/,cycles" -a sleep 0.1

   Performance counter stats for 'system wide':

  CPU0    2,398,351   instructions/cpu=0/    #  0.44  insn per cycle
  CPU0    2,398,152   instructions           #  0.44  insn per cycle
  CPU1    1,265,634   instructions           #  0.49  insn per cycle
  CPU2      606,087   instructions           #  0.50  insn per cycle
  CPU3    4,025,752   instructions           #  0.52  insn per cycle
  CPU4    4,236,810   instructions           #  0.53  insn per cycle
  CPU5    3,984,832   instructions           #  0.66  insn per cycle
  CPU6      434,132   instructions           #  0.44  insn per cycle
  CPU7       65,752   instructions           #  0.41  insn per cycle
  CPU8      459,083   instructions           #  0.48  insn per cycle
  CPU9    6,464,161   instructions           #  1.31  insn per cycle
  <SNIP>
  root@number:~# perf stat -e "instructions/cpu=0/,instructions,l1d-misses/cpu=4,cpu=5/,cycles" -a sleep 0.

   Performance counter stats for 'system wide':

             144,822      instructions/cpu=0/              #    0.03  insn per cycle
           4,666,114      instructions                     #    0.93  insn per cycle
               2,583      l1d-misses
           4,993,633      cycles

         0.000868512 seconds time elapsed

  root@number:~#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250403194337.40202-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:23:19 -03:00
Ian Rogers
168c7b5091 perf parse-events: Set is_pmu_core for legacy hardware events
Also set the CPU map to all online CPU maps.

This is done so the behavior of legacy hardware and hardware cache
events better matches that of sysfs and JSON events during
__perf_evlist__propagate_maps().

Fix missing cpumap put in "Synthesize attr update" test.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250403194337.40202-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:16 -03:00
Ian Rogers
f60c3f4468 perf stat: Use counter cpumask to skip zero values
When a counter is 0 it may or may not be skipped.

For uncore counters it is common they are only valid on 1 logical CPU
and all other CPUs should be skipped.

The PMU's cpumask was used for the skip calculation, but that cpumask
may not reflect user overrides.

Similarly a counter on a core PMU may explicitly not request a CPU be
gathered.

If the counter on this CPU's value is 0 then the counter should be
skipped as it wasn't requested.

Switch from using the PMU cpumask to that associated with the evsel to
support these cases.

Avoid potential crash with --per-thread mode where config->aggr_get_id
is NULL. Add some examples for the tool event 0 counter skipping.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250403194337.40202-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:16 -03:00
Ian Rogers
2e7a2f7f3c libperf cpumap: Add ability to create CPU from a single CPU number
Add perf_cpu_map__new_int() so that a CPU map can be created from a
single integer.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250403194337.40202-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:16 -03:00
Ian Rogers
365e02ddb6 perf tests metrics: Permission related fixes
When permissions are limited running sleep without system wide isn't a
good benchmark to run to achieve samples, switch to running noploop.

Remove indent for non-success cases.

Allow skip for the not counted case.

Minor debug changes.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250412004704.2297939-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:16 -03:00
Ian Rogers
f0869f3156 perf evsel: Add per-thread warning for EOPNOTSUPP open failues
The mrvl_ddr_pmu will return EOPNOTSUPP if opened in per-thread
mode. Give a warning for this similar to EINVAL.

Doing this better supports metric testing with limited permissions when
the mrvl_ddr_pmu is present, as the failure to open causes the test to
skip and not fail.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250412004704.2297939-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:16 -03:00
Adrian Hunter
17e548405a perf scripts python: exported-sql-viewer.py: Fix pattern matching with Python 3
The script allows the user to enter patterns to find symbols.

The pattern matching characters are converted for use in SQL.

For PostgreSQL the conversion involves using the Python maketrans()
method which is slightly different in Python 3 compared with Python 2.

Fix to work in Python 3.

Fixes: beda0e725e ("perf script python: Add Python3 support to exported-sql-viewer.py")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tony Jones <tonyj@suse.de>
Link: https://lore.kernel.org/r/20250512093932.79854-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:16 -03:00
Adrian Hunter
352b088164 perf intel-pt: Do not default to recording all switch events
On systems with many CPUs, recording extra context switch events can be
excessive and unnecessary. Add perf config intel-pt.all-switch-events=false
to control the behaviour.

Example:

 # perf config intel-pt.all-switch-events=false
 # perf record -eintel_pt//u uname
 Linux
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.082 MB perf.data ]
 # perf script -D | grep PERF_RECORD_SWITCH | awk '{print $5}' | uniq -c
       5 PERF_RECORD_SWITCH
 # perf config intel-pt.all-switch-events=true
 # perf record -eintel_pt//u uname
 Linux
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.102 MB perf.data ]
 # perf script -D | grep PERF_RECORD_SWITCH | awk '{print $5}' | uniq -c
     180 PERF_RECORD_SWITCH_CPU_WIDE

Committer testing:

While doing a make -j28 allmodconfig:

  root@five:~# grep "model name" -m1 /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-14700K
  root@five:~#
  root@five:~# perf config intel-pt.all-switch-events=false
  root@five:~# perf record -e intel_pt//u uname
  Linux
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.019 MB perf.data ]
  root@five:~# perf report --stats | grep SWITCH_CPU_WIDE
  root@five:~#
  root@five:~# perf config intel-pt.all-switch-events=true
  root@five:~# perf record -e intel_pt//u uname
  Linux
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.047 MB perf.data ]
  root@five:~# perf report --stats | grep SWITCH_CPU_WIDE
       SWITCH_CPU_WIDE events:        542  (96.4%)
  root@five:~#

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250512093932.79854-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:16 -03:00
Adrian Hunter
e00eac6b5b perf intel-pt: Fix PEBS-via-PT data_src
The Fixes commit did not add support for decoding PEBS-via-PT data_src.
Fix by adding support.

PEBS-via-PT is a feature of some E-core processors, starting with
processors based on Tremont microarchitecture. Because the kernel only
supports Intel PT features that are on all processors, there is no support
for PEBS-via-PT on hybrids.

Currently that leaves processors based on Tremont, Gracemont and Crestmont,
however there are no events on Tremont that produce data_src information,
and for Gracemont and Crestmont there are only:

	mem-loads	event=0xd0,umask=0x5,ldlat=3
	mem-stores	event=0xd0,umask=0x6

Affected processors include Alder Lake N (Gracemont), Sierra Forest
(Crestmont) and Grand Ridge (Crestmont).

Example:

 # perf record -d -e intel_pt/branch=0/ -e mem-loads/aux-output/pp uname

 Before:

  # perf.before script --itrace=o -Fdata_src
            0 |OP No|LVL N/A|SNP N/A|TLB N/A|LCK No|BLK  N/A
            0 |OP No|LVL N/A|SNP N/A|TLB N/A|LCK No|BLK  N/A

 After:

  # perf script --itrace=o -Fdata_src
  10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A
  10450100442 |OP LOAD|LVL L2 hit|SNP None|TLB L2 miss|LCK No|BLK  N/A

Fixes: 975846eddf ("perf intel-pt: Add memory information to synthesized PEBS sample")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250512093932.79854-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12 14:18:09 -03:00
Ian Rogers
cd17a9b1a7 perf test demangle-ocaml: Switch to using dso__demangle_sym()
The use of the demangle-ocaml APIs means we don't detect if a different
demangler is used before the OCaml one for the case that matters to
perf.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Bill Wendling <morbo@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20250430004128.474388-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 17:03:09 -03:00
Ian Rogers
41fcc9a343 perf test demangle-java: Switch to using dso__demangle_sym()
The use of the demangle-java APIs means we don't detect if a different
demangler is used before the Java one for the case that matters to
perf.

Remove the return types from the demangled names as dso__demangle_sym()
removes those.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Bill Wendling <morbo@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20250430004128.474388-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 17:02:23 -03:00
Ian Rogers
bdf05ccd18 perf test demangle-rust: Add Rust demangling test
The test cases are listed examples in:

https://doc.rust-lang.org/rustc/symbol-mangling/v0.html

This test was previously part of a different Rust v0 demangler:

https://lore.kernel.org/lkml/20250129193037.573431-1-irogers@google.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Bill Wendling <morbo@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20250430004128.474388-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 17:01:57 -03:00
Ian Rogers
ac292ea7c3 perf demangle-rust: Remove previous legacy rust decoder
Code is unused since the introduction of rustc-demangle demangler.

Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Bill Wendling <morbo@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20250430004128.474388-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 17:01:39 -03:00
Ian Rogers
e20848c317 perf symbol-elf: Integrate rust-v0 demangling
Use the demangle-rust-v0 APIs to see if symbol is Rust mangled and
demangle if so.

The API requires a pre-allocated output buffer, some estimation and
retrying are added for this.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Bill Wendling <morbo@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20250430004128.474388-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 17:00:59 -03:00
Ian Rogers
60869b22af perf demangle-rust: Add rustc-demangle C demangler
Imported at commit 80e40f57d99f ("add comment about finding latest
version of code") from:

https://github.com/rust-lang/rustc-demangle/blob/main/crates/native-c/src/demangle.c
https://github.com/rust-lang/rustc-demangle/blob/main/crates/native-c/include/demangle.h

There is discussion of this issue motivating the import in:

https://github.com/rust-lang/rust/issues/60705
https://lore.kernel.org/lkml/20250129193037.573431-1-irogers@google.com/

The SPDX lines reflect the dual license Apache-2 or MIT in:

https://github.com/rust-lang/rustc-demangle/blob/main/README.md

Following Migual Ojeda's suggestion comments were added on copyright and
keeping the code in sync with upstream.

The files are renamed as perf supports multiple demanglers and so
demangle as a name would be overloaded.

The work here was done by Ariel Ben-Yehuda <ariel.byd@gmail.com> and I
am merely importing it as discussed in the rust-lang issue.

Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Bill Wendling <morbo@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20250430004128.474388-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 17:00:05 -03:00
Colin Ian King
4f1a19b8bc perf test amd ibs: Fix spelling mistake "Asssuming" -> "Assuming"
There is a spelling mistake ina pr_debug message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250507082421.188848-1-colin.i.king@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 14:47:19 -03:00
Namhyung Kim
c60b7d6f50 perf pmu: Use available core PMU for raw events
When it finds a matching PMU for a legacy event, it should look for
core PMUs.  The raw events also refers to core events so it should be
handled similarly.

On x86, PERF_TYPE_RAW should match with the existing cpu PMU.  But on
ARM, there's no PMU with the matching type so it'll pick the first core
PMU for it.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250507215939.54399-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 14:44:44 -03:00
Namhyung Kim
c42e219942 perf lock contention: Add -J/--inject-delay option
This is to slow down lock acquistion (on contention locks) deliberately.

A possible use case is to estimate impact on application performance by
optimization of kernel locking behavior.  By delaying the lock it can
simulate the worse condition as a control group, and then compare with
the current behavior as a optimized condition.

The syntax is 'time@function' and the time can have unit suffix like
"us" and "ms".  For example, I ran a simple test like below.

  $ sudo perf lock con -abl -L tasklist_lock -- \
    sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
   contended   total wait     max wait     avg wait            address   symbol

          92      1.18 ms    199.54 us     12.79 us   ffffffff8a806080   tasklist_lock (rwlock)

The contention count was 92 and the average wait time was around 10 us.
But if I add 100 usec of delay to the tasklist_lock,

  $ sudo perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- \
    sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
   contended   total wait     max wait     avg wait            address   symbol

         190     15.67 ms    230.10 us     82.46 us   ffffffff8a806080   tasklist_lock (rwlock)

The contention count increased and the average wait time was up closed
to 100 usec.  If I increase the delay even more,

  $ sudo perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- \
    sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
   contended   total wait     max wait     avg wait            address   symbol

        1002      2.80 s       3.01 ms      2.80 ms   ffffffff8a806080   tasklist_lock (rwlock)

Now every sleep process had contention and the wait time was more than 1
msec.  This is on my 4 CPU laptop so I guess one CPU has the lock while
other 3 are waiting for it mostly.

For simplicity, it only supports global locks for now.

Committer testing:

  root@number:~# grep -m1 'model name' /proc/cpuinfo
  model name : AMD Ryzen 9 9950X3D 16-Core Processor
  root@number:~# perf lock con -abl -L tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
   contended  total wait   max wait  avg wait           address  symbol

         142   453.85 us   25.39 us   3.20 us  ffffffffae808080  tasklist_lock (rwlock)
  root@number:~# perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
   contended  total wait   max wait  avg wait           address  symbol

        1040     2.39 s     3.11 ms   2.30 ms  ffffffffae808080  tasklist_lock (rwlock)
  root@number:~# perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
   contended  total wait   max wait  avg wait           address  symbol

        1025    24.72 s    31.01 ms  24.12 ms  ffffffffae808080  tasklist_lock (rwlock)
  root@number:~#

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250509171950.183591-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 14:32:15 -03:00
Michael Petlan
4bfe27140e perf tests: Fix 'perf report' tests installation
There was a copy-paste mistake in the installation commands.

Also, we need to install stderr-whitelist.txt file, which contains
allowed messages that are printed on stderr and should not cause test
fail.

Fixes: 097fe67df1 ("perf testsuite: Install perf-report tests in the 'make install-tests -C tools/perf' target")
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250113182605.130719-6-vmolnaro@redhat.com
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09 14:10:08 -03:00
Namhyung Kim
30d20fb1f8 perf trace: Fix leaks of 'struct thread' in set_filter_loop_pids()
I've found some leaks from 'perf trace -a'.

It seems there are more leaks but this is what I can find for now.

Fixes: 082ab9a18e ("perf trace: Filter out 'sshd' in the tracer ancestry in syswide tracing")
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org
[ split from a larget patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 17:11:50 -03:00
Namhyung Kim
bb3de7fa98 perf trace: Fix leaks of 'struct thread' in fprintf_sys_enter()
I've found some leaks from 'perf trace -a'.

It seems there are more leaks but this is what I can find for now.

Fixes: 70351029b5 ("perf thread: Add support for reading the e_machine type for a thread")
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org
[ split from a larget patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 17:10:09 -03:00
Ian Rogers
70e21ac8b0 perf parse-events: Add debug dump of evlist if reordered
Add debug verbose output to show how evsels were reordered by
parse_events__sort_events_and_fix_groups(). For example:

```
$ perf record -v -e '{instructions,cycles}' true
Using CPUID GenuineIntel-6-B7-1
WARNING: events were regrouped to match PMUs
evlist after sorting/fixing: '{cpu_atom/instructions/,cpu_atom/cycles/},{cpu_core/instructions/,cpu_core/cycles/}'
```

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250402201549.4090305-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 12:53:51 -03:00
Ian Rogers
583dc500d1 perf evlist: Make groups visible in evlist__format_evsels() output
Make groups visible in output:

Before:

{cycles,instructions} ->
cpu_atom/cycles/,cpu_atom/instructions/,cpu_core/cycles/,cpu_core/instructions/

After:

{cycles,instructions} ->
{cpu_atom/cycles/,cpu_atom/instructions/},{cpu_core/cycles/,cpu_core/instructions/}

Committer testing:

Before:

  root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla
  Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied
  root@number:~#

After:

  root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla
  Failed to collect '{cycles,instructions,cache-misses}' for the '/tmp/bla' workload: Permission denied
  root@number:~#

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250402201549.4090305-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 12:51:40 -03:00
Ian Rogers
f0f245eaa2 perf evlist: Refactor evlist__scnprintf_evsels()
Switch output to using a strbuf so the storage can be resized.

Add a maximum size argument to avoid too much output that may happen for
uncore events.

Rename as scnprintf is no longer used.

Committer testing:

  With the patch applied:

  root@number:~# perf probe -x ~/bin/perf evlist__format_evsels
  Added new event:
    probe_perf:evlist_format_evsels (on evlist__format_evsels in /home/acme/bin/perf)

  You can now use it in all perf tools, such as:

  	perf record -e probe_perf:evlist_format_evsels -aR sleep 1

  root@number:~# perf probe -l
    probe_perf:evlist_format_evsels (on evlist__format_evsels@util/evlist.c in /home/acme/bin/perf)
  root@number:~# perf trace -e probe_perf:*/max-stack=10/ perf record -e cycles,instructions,cache-misses /tmp/bla
  Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied
       0.000 perf/3893011 probe_perf:evlist_format_evsels(__probe_ip: 6183397)
                                         evlist__format_evsels (/home/acme/bin/perf)
                                         __cmd_record (/home/acme/bin/perf)
                                         cmd_record (/home/acme/bin/perf)
                                         run_builtin (/home/acme/bin/perf)
                                         handle_internal_command (/home/acme/bin/perf)
                                         run_argv (/home/acme/bin/perf)
                                         main (/home/acme/bin/perf)
                                         __libc_start_call_main (/usr/lib64/libc.so.6)
                                         __libc_start_main@@GLIBC_2.34 (/usr/lib64/libc.so.6)
                                         _start (/home/acme/bin/perf)
  root@number:~#

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250402201549.4090305-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 12:47:46 -03:00
Ian Rogers
a5efaf9008 perf stat: Remove print_mixed_hw_group_error
print_mixed_hw_group_error will print a warning when a group of events
uses different PMUs.

This isn't possible to happen as parse_events__sort_events_and_fix_groups()
will break groups when this happens, adding the warning at the start
of perf of:

  WARNING: events were regrouped to match PMUs

As the previous mixed group warning can never happen, remove the
associated code.

Committer testing:

Before/after:

  acme@five:~$ perf stat -e '{cpu_core/cycles/,cpu_atom/cycles/}' sleep 1
  WARNING: events were regrouped to match PMUs

   Performance counter stats for 'sleep 1':

             424,895      cpu_atom/cycles/u
       <not counted>      cpu_core/cycles/u         (0.00%)

         1.011862314 seconds time elapsed

         0.000000000 seconds user
         0.003166000 seconds sys

  acme@five:~$

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250402201549.4090305-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 12:46:23 -03:00
Ian Rogers
4b53137721 perf stat: Better hybrid support for the NMI watchdog warning
Prior to this patch evlist__has_hybrid would return false if the
processor wasn't hybrid or the evlist didn't contain any core
events. If the only PMU used by events was cpu_core then it would
true even though there are no cpu_atom events. For example:

```
$ perf stat --cputype=cpu_core -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}' true

 Performance counter stats for 'true':

     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)
     <not counted>      cpu_core/cycles/                                                        (0.00%)

       0.001981900 seconds time elapsed

       0.002311000 seconds user
       0.000000000 seconds sys
```

This patch changes evlist__has_hybrid to return true only if the
evlist contains events from >1 core PMU. This means the NMI watchdog
warning is shown for the case above.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250402201549.4090305-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 12:21:40 -03:00
Ian Rogers
7900938850 perf trace: Add missing thread__put() in thread__e_machine()
Add missing thread__put() of the found parent thread in
thread__e_machine().

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 11:50:44 -03:00
Ian Rogers
8830091383 perf trace: Free the files.max entry in files->table
The files.max is the maximum valid fd in the files array and so
freeing the values needs to be inclusive of the max value.

Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08 11:49:40 -03:00
Arnaldo Carvalho de Melo
8330d092f7 Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes from the latest perf-tools pull request from Namhyung
and get perf-tools-next in line with thinngs in other areas it uses,
like tools/lib/bpf, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-07 12:51:38 -03:00
Linus Torvalds
707df33751 Merge tag 'media/v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "Some Kconfig dependency fixes"

* tag 'media/v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: cec: tda9950: add back i2c dependency
  media: i2c: lt6911uxe: add two selects to Kconfig
  media: platform: synopsys: VIDEO_SYNOPSYS_HDMIRX should depend on ARCH_ROCKCHIP
  media: i2c: lt6911uxe: Fix Kconfig dependencies:
  media: vivid: fix FB dependency
2025-05-07 07:00:15 -07:00
Linus Torvalds
0d8d44db29 Merge tag 'for-6.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - revert device path canonicalization, this does not work as intended
   with namespaces and is not reliable in all setups

 - fix crash in scrub when checksum tree is not valid, e.g. when mounted
   with rescue=ignoredatacsums

 - fix crash when tracepoint btrfs_prelim_ref_insert is enabled

 - other minor fixups:
     - open code folio_index(), meant to be used in MM code
     - use matching type for sizeof in compression allocation

* tag 'for-6.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: open code folio_index() in btree_clear_folio_dirty_tag()
  Revert "btrfs: canonicalize the device path before adding it"
  btrfs: avoid NULL pointer dereference if no valid csum tree
  btrfs: handle empty eb->folios in num_extent_folios()
  btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref
  btrfs: compression: adjust cb->compressed_folios allocation type
2025-05-06 08:19:09 -07:00
Linus Torvalds
cccd033714 Merge tag 'for-6.15/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:

 - fix reading past the end of allocated memory

 - fix missing dm_put_live_table() in dm_keyslot_evict()

* tag 'for-6.15/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix copying after src array boundaries
  dm: add missing unlock on in dm_keyslot_evict()
2025-05-06 08:14:20 -07:00
Tudor Ambarus
f1aff4bc19 dm: fix copying after src array boundaries
The blammed commit copied to argv the size of the reallocated argv,
instead of the size of the old_argv, thus reading and copying from
past the old_argv allocated memory.

Following BUG_ON was hit:
[    3.038929][    T1] kernel BUG at lib/string_helpers.c:1040!
[    3.039147][    T1] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
...
[    3.056489][    T1] Call trace:
[    3.056591][    T1]  __fortify_panic+0x10/0x18 (P)
[    3.056773][    T1]  dm_split_args+0x20c/0x210
[    3.056942][    T1]  dm_table_add_target+0x13c/0x360
[    3.057132][    T1]  table_load+0x110/0x3ac
[    3.057292][    T1]  dm_ctl_ioctl+0x424/0x56c
[    3.057457][    T1]  __arm64_sys_ioctl+0xa8/0xec
[    3.057634][    T1]  invoke_syscall+0x58/0x10c
[    3.057804][    T1]  el0_svc_common+0xa8/0xdc
[    3.057970][    T1]  do_el0_svc+0x1c/0x28
[    3.058123][    T1]  el0_svc+0x50/0xac
[    3.058266][    T1]  el0t_64_sync_handler+0x60/0xc4
[    3.058452][    T1]  el0t_64_sync+0x1b0/0x1b4
[    3.058620][    T1] Code: f800865e a9bf7bfd 910003fd 941f48aa (d4210000)
[    3.058897][    T1] ---[ end trace 0000000000000000 ]---
[    3.059083][    T1] Kernel panic - not syncing: Oops - BUG: Fatal exception

Fix it by copying the size of src, and not the size of dst, as it was.

Fixes: 5a2a6c4281 ("dm: always update the array size in realloc_argv on success")
Cc: stable@vger.kernel.org
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-06 14:06:59 +02:00
Howard Chu
8feafba59c perf test: Add direct off-cpu tests
Since we added --off-cpu-thresh, add tests for when a sample's off-cpu
time is above the threshold, and when it's below the threshold.

Note that the basic test performed in test_offcpu_basic() collects a
direct sample now, since sleep 1 has duration of 1000ms, higher than the
default value of --off-cpu-thresh of 500ms, resulting in a direct
sample.

An example:

  $ sudo perf test offcpu
  124: perf record offcpu profiling tests                              : Ok
  $

Committer testing:

  root@number:~# perf test offcpu
  126: perf record offcpu profiling tests                              : Ok
  root@number:~# perf test -v offcpu
  126: perf record offcpu profiling tests                              : Ok
  root@number:~# perf test -vv offcpu
  126: perf record offcpu profiling tests:
  --- start ---
  test child forked, pid 1410791
  Checking off-cpu privilege
  Basic off-cpu test
  Basic off-cpu test [Success]
  Child task off-cpu test
  Child task off-cpu test [Success]
  Threshold test (above threshold)
  Threshold test (above threshold) [Success]
  Threshold test (below threshold)
  Threshold test (below threshold) [Success]
  ---- end(0) ----
  126: perf record offcpu profiling tests                              : Ok
  root@number:~#

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250501022809.449767-11-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:52:08 -03:00
Howard Chu
9557c00076 perf record --off-cpu: Add --off-cpu-thresh option
Specify the threshold for dumping offcpu samples with --off-cpu-thresh,
the unit is milliseconds. Default value is 500ms.

Example:

  perf record --off-cpu --off-cpu-thresh 824

The example above collects direct off-cpu samples where the off-cpu time
is longer than 824ms.

Committer testing:

After commenting out the end off-cpu dump to have just the ones that are
added right after the task is scheduled back, and using a threshould of
1000ms, we see some periods (the 5th column, just before "offcpu-time"
in the 'perf script' output) that are over 1000.000.000 nanoseconds:

  root@number:~# perf record --off-cpu --off-cpu-thresh 10000
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 3.902 MB perf.data (34335 samples) ]
  root@number:~# perf script
<SNIP>
  Isolated Web Co   59932 [028] 63839.594437: 1000049427 offcpu-time:
             7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6)
             7fe63c78c04c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6)
             7fe63c78e928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6)
             5599974a9fe7 mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)+0xe7 (/usr/lib64/fir>
                100000000 [unknown] ([unknown])

          swapper       0 [025] 63839.594459:     195724    cycles:P:  ffffffffac328270 read_tsc+0x0 ([kernel.kallsyms])
  Isolated Web Co   59932 [010] 63839.594466: 1000055278 offcpu-time:
             7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6)
             7fe63c78ba24 __syscall_cancel+0x14 (/usr/lib64/libc.so.6)
             7fe63c804c4e __poll+0x1e (/usr/lib64/libc.so.6)
             7fe633b0d1b8 PollWrapper(_GPollFD*, unsigned int, int) [clone .lto_priv.0]+0xf8 (/usr/lib64/firefox/libxul.so)
                10000002c [unknown] ([unknown])

          swapper       0 [027] 63839.594475:     134433    cycles:P:  ffffffffad4c45d9 irqentry_enter+0x19 ([kernel.kallsyms])
          swapper       0 [028] 63839.594499:     215838    cycles:P:  ffffffffac39199a switch_mm_irqs_off+0x10a ([kernel.kallsyms])
  MediaPD~oder #1 1407676 [027] 63839.594514:     134433    cycles:P:      7f982ef5e69f dct_IV(int*, int, int*)+0x24f (/usr/lib64/libfdk-aac.so.2.0.0)
          swapper       0 [024] 63839.594524:     267411    cycles:P:  ffffffffad4c6ee6 poll_idle+0x56 ([kernel.kallsyms])
  MediaSu~sor #75 1093827 [026] 63839.594555:     332652    cycles:P:      55be753ad030 moz_xmalloc+0x200 (/usr/lib64/firefox/firefox)
          swapper       0 [027] 63839.594616:     160548    cycles:P:  ffffffffad144840 menu_select+0x570 ([kernel.kallsyms])
  Isolated Web Co   14019 [027] 63839.595120: 1000050178 offcpu-time:
             7fc9537cc6c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6)
             7fc9537c104c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6)
             7fc9537c3928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6)
             7fc95372a3c8 pt_TimedWait+0xb8 (/usr/lib64/libnspr4.so)
             7fc95372a8d8 PR_WaitCondVar+0x68 (/usr/lib64/libnspr4.so)
             7fc94afb1f7c WatchdogMain(void*)+0xac (/usr/lib64/firefox/libxul.so)
             7fc947498660 [unknown] ([unknown])
             7fc9535fce88 [unknown] ([unknown])
             7fc94b620e60 WatchdogManager::~WatchdogManager()+0x0 (/usr/lib64/firefox/libxul.so)
          fff8548387f8b48 [unknown] ([unknown])

          swapper       0 [003] 63839.595712:     212948    cycles:P:  ffffffffacd5b865 acpi_os_read_port+0x55 ([kernel.kallsyms])
<SNIP>

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-2-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-10-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:51:54 -03:00
Howard Chu
74069a0160 perf record --off-cpu: Dump the remaining PERF_SAMPLE_ in sample_type from BPF's stack trace map
Dump the remaining PERF_SAMPLE_ data, as if it is dumping a direct
sample.

Put the stack trace, tid, off-cpu time and cgroup id into the raw_data
section, just like a direct off-cpu sample coming from BPF's
bpf_perf_event_output().

This ensures that evsel__parse_sample() correctly parses both direct
samples and accumulated samples.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-10-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-9-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:51:43 -03:00
Howard Chu
8ae7a5769b perf script: Display off-cpu samples correctly
No PERF_SAMPLE_CALLCHAIN in sample_type, but 'perf script' needs to
display a callchain, have to specify manually.

Also, prefer displaying a callchain:

 gvfs-afc-volume    2267 [001] 3829232.955656: 1001115340 offcpu-time:
            77f05292603f __pselect+0xbf (/usr/lib/x86_64-linux-gnu/libc.so.6)
            77f052a1801c [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
            77f052a18d45 [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
            77f05289ca94 start_thread+0x384 (/usr/lib/x86_64-linux-gnu/libc.so.6)
            77f052929c3c clone3+0x2c (/usr/lib/x86_64-linux-gnu/libc.so.6)

to a raw binary BPF output:

  BPF output: 0000: dd 08 00 00 db 08 00 00  <DD>...<DB>...
	  0008: cc ce ab 3b 00 00 00 00  <CC>Ϋ;....
	  0010: 06 00 00 00 00 00 00 00  ........
	  0018: 00 fe ff ff ff ff ff ff  .<FE><FF><FF><FF><FF><FF><FF>
	  0020: 3f 60 92 52 f0 77 00 00  ?`.R<F0>w..
	  0028: 1c 80 a1 52 f0 77 00 00  ..<A1>R<F0>w..
	  0030: 45 8d a1 52 f0 77 00 00  E.<A1>R<F0>w..
	  0038: 94 ca 89 52 f0 77 00 00  .<CA>.R<F0>w..
	  0040: 3c 9c 92 52 f0 77 00 00  <..R<F0>w..
	  0048: 00 00 00 00 00 00 00 00  ........
	  0050: 00 00 00 00              ....

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-9-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-8-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:51:31 -03:00
Howard Chu
7de1a87f1e perf record --off-cpu: Disable perf_event's callchain collection
There is a check in evsel.c that does this:

if (evsel__is_offcpu_event(evsel))
	evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES;

This along with:

 #define OFFCPU_SAMPLE_TYPES  (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
			      PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
			      PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
			      PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
			      PERF_SAMPLE_CGROUP)

will tell perf_event to collect callchain.

We don't need the callchain from perf_event when collecting off-cpu
samples, because it's prev's callchain, not next's callchain.

   (perf_event)     (task_storage) (needed)
   prev             next
   |                  |
   ---sched_switch---->

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-8-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:51:12 -03:00
Howard Chu
7f8f56475d perf evsel: Assemble off-cpu samples
Use the data in bpf-output samples, to assemble off-cpu samples.

In evsel__is_offcpu_event(), check if sample_type is PERF_SAMPLE_RAW to
support off-cpu sample data created by an older version of perf.

Testing compatibility on off-cpu samples collected by perf before this patch series:

See below, the sample_type still uses PERF_SAMPLE_CALLCHAIN

$ perf script --header -i ./perf.data.ptn | grep "event : name = offcpu-time"
 # event : name = offcpu-time, , id = { 237917, 237918, 237919, 237920 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1

The output is correct.

  $ perf script -i ./perf.data.ptn | grep offcpu-time
  gmain    2173 [000] 18446744069.414584:  100102015 offcpu-time:
  NetworkManager     901 [000] 18446744069.414584:    5603579 offcpu-time:
  Web Content 1183550 [000] 18446744069.414584:      46278 offcpu-time:
  gnome-control-c 2200559 [000] 18446744069.414584: 11998247014 offcpu-time:
  <SNIP>
  $

And after this patch series:

  $ perf script --header -i ./perf.data.off-cpu-v9 | grep "event : name = offcpu-time"
   # event : name = offcpu-time, , id = { 237959, 237960, 237961, 237962 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1

  $ ./perf script -i ./perf.data.off-cpu-v9 | grep offcpu-time
  gnome-shell    1875 [001] 4789616.361225:  100097057 offcpu-time:
  gnome-shell    1875 [001] 4789616.461419:  100107463 offcpu-time:
      firefox 2206821 [002] 4789616.475690:  255257245 offcpu-time:
  $

Committer testing:

The command to record those samples:

  root@number:~# perf record --off-cpu -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.092 MB perf.data (1552 samples) ]
  root@number:~#

Then, before this patch series, the sample_type for the "offcpu-time" event is:

  root@number:~# perf evlist -v | grep offcpu-time
  offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, freq: 1, sample_id_all: 1
  root@number:~#

And after it, after recording it again:

  root@number:~# perf record --off-cpu -a sleep 1 ; perf evlist -v | grep offcpu-time
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.151 MB perf.data (2843 samples) ]
  offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, sample_id_all: 1
  root@number:~#

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-7-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-6-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:49:08 -03:00
Howard Chu
d6948f2af2 perf record --off-cpu: Dump off-cpu samples in BPF
Collect tid, period, callchain, and cgroup id and dump them when off-cpu
time threshold is reached.

We don't collect the off-cpu time twice (the delta), it's either in
direct samples, or accumulated samples that are dumped at the end of
perf.data.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-6-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-5-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:48:44 -03:00
Howard Chu
282c195906 perf record --off-cpu: Preparation of off-cpu BPF program
Set the perf_event map in BPF for dumping off-cpu samples, and set the
offcpu_thresh to specify the threshold.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-5-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-4-howardchu95@gmail.com
[ Added some missing iteration variables to off_cpu_config() and fixed up
  a manually edited patch hunk line boundary line ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:48:20 -03:00
Howard Chu
0f72027bb9 perf record --off-cpu: Parse off-cpu event
Parse the off-cpu event using parse_event(), as bpf-output.

Call evlist__enable_evsel() on off-cpu event. This fixes the inability
to collect direct off-cpu samples on a workload, as reported by Arnaldo
Carvalho de Melo <acme@redhat.com>.

The reason being, workload sets enable_on_exec instead of calling
evlist__enable(), but off-cpu event does not attach to an executable and
execve won't be called, so the fds from perf_event_open() are not
enabled.

no-inherit should be set to 1, here's the reason:

We update the BPF perf_event map for direct off-cpu sample dumping (in
following patches), it executes as follows:

bpf_map_update_value()
 bpf_fd_array_map_update_elem()
  perf_event_fd_array_get_ptr()
   perf_event_read_local()

In perf_event_read_local(), there is:

int perf_event_read_local(struct perf_event *event, u64 *value,
			  u64 *enabled, u64 *running)
{
...
	/*
	 * It must not be an event with inherit set, we cannot read
	 * all child counters from atomic context.
	 */
	if (event->attr.inherit) {
		ret = -EOPNOTSUPP;
		goto out;
	}

Which means no-inherit has to be true for updating the BPF perf_event
map.

Moreover, for bpf-output events, we primarily want a system-wide event
instead of a per-task event.

The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU
index to retrieve the perf_event file descriptor it outputs to.

Making a bpf-output event system-wide naturally satisfies this
requirement by mapping CPU appropriately.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:48:02 -03:00
Howard Chu
671e943452 perf evsel: Expose evsel__is_offcpu_event() for future use
Expose evsel__is_offcpu_event() so it can be used in off_cpu_config(),
evsel__parse_sample() and 'perf script'.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-3-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05 21:47:39 -03:00