Leo Yan
d120cb34c9
perf arm_spe: Allow parsing both data source and events
...
Current code skips to parse events after generating data source. The
reason is the data source packets have cache and snooping related info,
the afterwards event packets might contain duplicate info.
This commit changes to continue parsing the events after data source
analysis. If data source does not give out memory level and snooping
types, then the event info is used to synthesize the related fields.
As a result, both the peer snoop option ('-d peer') and hitm options
('-d tot/lcl/rmt') are supported by Arm SPE in 'perf c2c'.
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:29 -03:00
Leo Yan
d510568970
perf arm_spe: Set HITM flag
...
Since FEAT_SPEv1p4, Arm SPE provides two extra events: "Cache data
modified" and "Data snooped".
Set the snoop mode as:
- If both the "Cache data modified" event and the "Data snooped" event
are set, which indicates a load operation that snooped from a outside
cache and hit a modified copy, set the HITM flag to inspect false
sharing.
- If the snooped event bit is not set, and the snooped event has been
supported by the hardware, set as NONE mode (no snoop operation).
- If the snooped event bit is not set, and the event is not supported or
absent the events info in the meta data, set as NA mode (not
available).
Don't set any mode for only "Cache data modified" event, as it hits a
local modified copy.
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:29 -03:00
Leo Yan
04abd5c065
perf arm_spe: Refactor arm_spe__get_metadata_by_cpu()
...
Handle "CPU=-1" (per-thread mode) in the arm_spe__get_metadata_by_cpu()
function. As a result, the function is more general and will be invoked
by a sequential change.
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:28 -03:00
Leo Yan
786e7e7a50
perf arm_spe: Fill memory levels for FEAT_SPEv1p4
...
Starting with FEAT_SPEv1p4, Arm SPE provides information on Level 2 data
cache and recently fetched events. This patch fills in the memory levels
for these new events.
The recently fetched events are matched to line-fill buffer (LFB). In
general, the latency for accessing LFB is higher than accessing L1 cache
but lower than accessing L2 cache. Thus, it locates in the memory
hierarchy information between L1 cache and L2 cache.
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:28 -03:00
Leo Yan
14d4ecb15e
perf arm_spe: Separate setting of memory levels for loads and stores
...
For a load hit, the lowest-level cache reflects the latency of fetching
a data. Otherwise, the highest-level cache involved in refilling
indicates the overhead caused by a load.
Store operations remain unchanged to keep the descending order when
iterating through cache levels.
Split into two functions: one is for setting memory levels for loads and
another for stores.
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:28 -03:00
Leo Yan
98f993ae6f
perf arm_spe: Refine memory level filling
...
This commit introduces macros for detecting cache level and cache miss.
Populates the 'mem_lvl_num' field which is a later added attribute for
representing memory level. Set NA ("not available") to memory levels if
memory hierarchy info is absent.
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:28 -03:00
Leo Yan
99940fd9e1
perf arm_spe: Add "event_filter" entry in meta data
...
Add a new "event_filter" entry in the meta data and dump it in raw data
mode.
After:
# perf script -D
...
0 0 0x470 [0x1f0]: PERF_RECORD_AUXTRACE_INFO type: 4
Header version :2
Header size :4
PMU type v2 :11
CPU number :8
Magic :0x1010101010101010
CPU # :0
Num of params :4
MIDR :0x410fd0f0
PMU Type :11
Min Interval :256
Event Filter :0x3fe08fe
...
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:28 -03:00
James Clark
7203a22492
perf arm_spe: Use full type for data_src
...
data_src has an actual type rather than just being a u64. To help
readers, delay decomposing it to a u64 until it's finally assigned to
the sample.
Signed-off-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:27 -03:00
Leo Yan
cb300e3515
perf arm_spe: Correct memory level for remote access
...
For remote accesses, the data source packet does not contain information
about the memory level. To avoid misinformation, set the memory level to
NA (Not Available).
Fixes: 4e6430cbb1 ("perf arm-spe: Use SPE data source for neoverse cores")
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:27 -03:00
Leo Yan
039fd0634a
perf arm_spe: Correct setting remote access
...
Set the mem_remote field for a remote access to appropriately represent
the event.
Fixes: a89dbc9b98 ("perf arm-spe: Set sample's data source field")
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-19 12:14:27 -03:00
James Clark
9574a44747
perf arm-spe: Display --itrace period warnings for all sample types
...
Currently we only display the warning when the instructions group is
requested. Instructions are on by default, and the period applies to all
sample types anyway so always check the options and show the warning.
Reword the messages to be more explicit about which flags the warnings
apply to.
Signed-off-by: James Clark <james.clark@linaro.org >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ben Gainey <Ben.Gainey@arm.com >
Cc: George Wort <George.Wort@arm.com >
Cc: Graham Woodward <Graham.Woodward@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Michael Williams <Michael.Williams@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-09 14:51:07 -03:00
James Clark
bf1af4f6e6
perf arm-spe: Downsample all sample types equally
...
The various sample types that are generated are based on the same SPE
sample, just placed into different sample type bins.
The same sample can be in multiple bins if it has flags set that cause
it to be.
Currently we're only applying the --itrace interval downsampling to the
instruction bin, which means that the sample would appear in one bin but
not another if it was skipped due to downsampling.
I don't thing anyone would want or expect this, so make this behave
consistently by applying the downsampling before generating any sample.
You might argue that the "instructions" interval type doesn't make sense
to apply to "memory" sample types because it would be skipping every n
memory samples, rather than every n instructions.
ut the downsampling was already not an instruction interval even for the
instruction samples. SPE has a hardware based sampling interval, and the
instruction interval was just a convenient way to specify further
downsampling.
This is hinted at in the warning message shown for intervals greater
than 1.
This makes SPE diverge from trace technologies like Intel PT and Arm
Coresight.
In those cases instruction samples can be reduced but all branches are
still emitted. This makes sense there, because branches form a complete
execution history, and asking to skip branches every n instructions
doesn't really make sense.
But for SPE, as mentioned above, downsampling the instruction samples
already wasn't consistent with trace technologies so we ended up with
some middle ground that had no benefit.
Now it's possible to reduce the volume of samples in all groups and
samples won't be missing from one group but present in another.
Reviewed-by: Leo Yan <leo.yan@arm.com >
Signed-off-by: James Clark <james.clark@linaro.org >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ben Gainey <Ben.Gainey@arm.com >
Cc: George Wort <George.Wort@arm.com >
Cc: Graham Woodward <Graham.Woodward@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Michael Williams <Michael.Williams@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-09 14:49:38 -03:00
James Clark
80a2d7ea48
perf arm-spe: Show instruction sample types by default
...
Instruction sample types are enabled in the default itrace options in
perf, but this never applied to SPE because the default nanoseconds
period isn't supported.
This meant that instructions ended up being opt-in by the user only when
they requested an instruction based period.
Change the default period type to instructions so that instruction
samples are generated by default. This can overridden by specifying any
--itrace option.
This solves a common complaint from users that the unfiltered SPE
samples appear to be missing, and only the samples that have memory
flags set appear in the various memory groups.
Signed-off-by: James Clark <james.clark@linaro.org >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ben Gainey <Ben.Gainey@arm.com >
Cc: George Wort <George.Wort@arm.com >
Cc: Graham Woodward <Graham.Woodward@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Michael Williams <Michael.Williams@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-09-09 14:48:52 -03:00
Ian Rogers
57ddb9cbb5
perf evlist: Change env variable to session
...
The session holds a perf_env pointer env. In UI code container_of is
used to turn the env to a session, but this assumes the session
header's env is in use. Rather than a dubious container_of, hold the
session in the evlist and derive the env from the session with
evsel__env, perf_session__env, etc.
Signed-off-by: Ian Rogers <irogers@google.com >
Link: https://lore.kernel.org/r/20250724163302.596743-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2025-07-25 10:37:56 -07:00
Yicong Yang
846b62b343
perf arm-spe: Add support for SPE Data Source packet on HiSilicon HIP12
...
Add data source encoding for HiSilicon HIP12 and coresponding mapping
to the perf's memory data source. This will help to synthesize the data
and support upper layer tools like perf-mem and perf-c2c.
Reviewed-by: Leo Yan <leo.yan@arm.com >
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com >
Cc: CaiJingtao <caijingtao@huawei.com >
Cc: Catalin Marinas <catalin.marinas@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@linaro.org >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Jonathan Cameron <jonathan.cameron@huawei.com >
Cc: Junhao He <hejunhao3@huawei.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Cc: Yushan Wang <wangyushan12@huawei.com >
Cc: Zeng Tao <prime.zeng@hisilicon.com >
Cc: xueshan2@huawei.com
Link: https://lore.kernel.org/r/20250425033845.57671-3-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2025-05-27 17:57:58 -03:00
Leo Yan
2cc2f258a9
perf arm-spe: Support previous branch target (PBT) address
...
When FEAT_SPE_PBT is implemented, the previous branch target address
(named as PBT) before the sampled operation, will be recorded.
This commit first introduces a 'prev_br_tgt' field in the record for
saving the PBT address in the decoder.
If the current operation is a branch instruction, by combining with PBT,
it can create a chain with two consecutive branches. As the branch
stack stores branches in descending order, meaning a newer branch is
stored in a lower entry in the stack. Arm SPE stores the latest branch
in the first entry of branch stack, and the previous branch coming from
PBT is stored into the second entry.
Otherwise, if current operation is not a branch, the last branch will be
saved for PBT only. PBT lacks associated information such as branch
source address, branch type, and events. The branch entry fills zeros
for the corresponding fields and only set its target address.
After:
perf script -f --itrace=bl -F flags,addr,brstack
jcc ffff800080187914 0xffff8000801878fc/0xffff800080187914/P/-/-/1/COND/- 0x0/0xffff8000801878f8/-/-/-/0//-
jcc ffff8000802d12d8 0xffff8000802d12f8/0xffff8000802d12d8/P/-/-/1/COND/- 0x0/0xffff8000802d12ec/-/-/-/0//-
jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//-
jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//-
jmp ffff800081410980 0xffff800081419108/0xffff800081410980/P/-/-/1//- 0x0/0xffff800081419104/-/-/-/0//-
return ffff80008036e064 0xffff80008141ba84/0xffff80008036e064/P/-/-/1/RET/- 0x0/0xffff80008141ba60/-/-/-/0//-
jcc ffff8000803d54f0 0xffff8000803d54e8/0xffff8000803d54f0/P/-/-/1/COND/- 0x0/0xffff8000803d54e0/-/-/-/0//-
jmp ffff80008015e468 0xffff8000803d46dc/0xffff80008015e468/P/-/-/1//- 0x0/0xffff8000803d46c8/-/-/-/0//-
jmp ffff8000806e2d50 0xffff80008040f710/0xffff8000806e2d50/P/-/-/1//- 0x0/0xffff80008040f6e8/-/-/-/0//-
jcc ffff800080721704 0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//-
Reviewed-by: Ian Rogers <irogers@google.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Link: https://lore.kernel.org/r/20250304111240.3378214-13-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2025-03-05 09:13:20 -08:00
Leo Yan
73cb57f56f
perf arm-spe: Add branch stack
...
Although Arm SPE cannot generate continuous branch records, this commit
creates a branch stack with only one branch entry. A single branch info
can be used for performance optimization.
A branch stack structure is dynamically allocated in the decode queue.
The branch stack and stack flags are synthesized based on branch types
and associated events.
After:
# perf script --itrace=bl1 -F flags,addr,brstack
jcc ffffc0fad9c6b214 0xffffc0fad9c6b234/0xffffc0fad9c6b214/P/-/-/7/COND/-
jcc/miss,not_taken/ ffffc0fadaaebb30 0xffffc0fadaaebb2c/0xffffc0fadaaebb30/MN/-/-/7/COND/-
jmp ffffc0fadaaea358 0xffffc0fadaaea5ec/0xffffc0fadaaea358/P/-/-/5//-
jcc/not_taken/ ffffc0fadaae6494 0xffffc0fadaae6490/0xffffc0fadaae6494/PN/-/-/11/COND/-
jcc/not_taken/ ffff7f83ab54 0xffff7f83ab50/0xffff7f83ab54/PN/-/-/13/COND/-
jcc/not_taken/ ffff7f83ab08 0xffff7f83ab04/0xffff7f83ab08/PN/-/-/8/COND/-
jcc ffff7f83aa80 0xffff7f83aa58/0xffff7f83aa80/P/-/-/10/COND/-
jcc ffff7f9a45d0 0xffff7f9a43f0/0xffff7f9a45d0/P/-/-/29/COND/-
jcc/not_taken/ ffffc0fad9ba6db4 0xffffc0fad9ba6db0/0xffffc0fad9ba6db4/PN/-/-/44/COND/-
jcc ffffc0fadaac2964 0xffffc0fadaac2970/0xffffc0fadaac2964/P/-/-/6/COND/-
jcc ffffc0fad99ddc10 0xffffc0fad99ddc04/0xffffc0fad99ddc10/P/-/-/72/COND/-
jcc/not_taken/ ffffc0fad9b3f21c 0xffffc0fad9b3f218/0xffffc0fad9b3f21c/PN/-/-/64/COND/-
jcc ffffc0fad9c3b604 0xffffc0fad9c3b5f8/0xffffc0fad9c3b604/P/-/-/13/COND/-
jcc ffffc0fadaad6048 0xffffc0fadaad5f8c/0xffffc0fadaad6048/P/-/-/5/COND/-
return/miss/ ffff7f84e614 0xffffc0fad98a2274/0xffff7f84e614/M/-/-/13/RET/-
jcc/not_taken/ ffffc0fadaac4eb4 0xffffc0fadaac4eb0/0xffffc0fadaac4eb4/PN/-/-/5/COND/-
jmp ffff7f8e3130 0xffff7f87555c/0xffff7f8e3130/P/-/-/5//-
jcc/not_taken/ ffffc0fad9b3d9b0 0xffffc0fad9b3d9ac/0xffffc0fad9b3d9b0/PN/-/-/14/COND/-
return ffffc0fad9b91950 0xffffc0fad98c3e28/0xffffc0fad9b91950/P/-/-/12/RET/-
Reviewed-by: Ian Rogers <irogers@google.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Link: https://lore.kernel.org/r/20250304111240.3378214-12-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2025-03-05 09:13:20 -08:00
Leo Yan
4a53a67e0e
perf arm-spe: Set sample flags with supplement info
...
Based on the supplement information in the record, this commit sets the
sample flags for conditional branch, function call, return. It also
sets events in flags, such as mispredict, not taken, and in transaction.
Reviewed-by: Ian Rogers <irogers@google.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Signed-off-by: Leo Yan <leo.yan@arm.com >
Link: https://lore.kernel.org/r/20250304111240.3378214-11-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2025-03-05 09:13:20 -08:00
Leo Yan
e1d47850bb
perf arm-spe: Fix load-store operation checking
...
The ARM_SPE_OP_LD and ARM_SPE_OP_ST operations are secondary operation
type, they are overlapping with other second level's operation types
belonging to SVE and branch operations. As a result, a non load-store
operation can be parsed for data source and memory sample.
To fix the issue, this commit introduces a is_ldst_op() macro for
checking LDST operation, and apply the checking when synthesize data
source and memory samples.
Fixes: a89dbc9b98 ("perf arm-spe: Set sample's data source field")
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20250304111240.3378214-7-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2025-03-05 09:13:19 -08:00
Ian Rogers
dc6d2bc2d8
perf sample: Make user_regs and intr_regs optional
...
The struct dump_regs contains 512 bytes of cache_regs, meaning the two
values in perf_sample contribute 1088 bytes of its total 1384 bytes
size. Initializing this much memory has a cost reported by Tavian
Barnes <tavianator@tavianator.com > as about 2.5% when running `perf
script --itrace=i0`:
https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/
Adrian Hunter <adrian.hunter@intel.com > replied that the zero
initialization was necessary and couldn't simply be removed.
This patch aims to strike a middle ground of still zeroing the
perf_sample, but removing 79% of its size by make user_regs and
intr_regs optional pointers to zalloc-ed memory. To support the
allocation accessors are created for user_regs and intr_regs. To
support correct cleanup perf_sample__init and perf_sample__exit
functions are created and added throughout the code base.
Signed-off-by: Ian Rogers <irogers@google.com >
Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2025-02-12 20:06:11 -08:00
Ilkka Koskinen
9e7a00ec6a
perf arm-spe: Add support for SPE Data Source packet on AmpereOne
...
Decode SPE Data Source packets on AmpereOne. The field is IMPDEF.
Reviewed-by: Leo Yan <leo.yan@arm.com >
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Graham Woodward <graham.woodward@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@linaro.org >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241108202946.16835-3-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-12-09 17:52:41 -03:00
Ilkka Koskinen
ccdc9e9c5e
perf arm-spe: Prepare for adding data source packet implementations for other cores
...
Split Data Source Packet handling to prepare adding support for
other implementations.
Reviewed-by: Leo Yan <leo.yan@arm.com >
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Graham Woodward <graham.woodward@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@linaro.org >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241108202946.16835-2-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-12-09 17:52:41 -03:00
James Clark
ba993e5ada
perf arm-spe: Use old behavior when opening old SPE files
...
Since the linked commit, we stopped interpreting data source if the
perf.data file doesn't have the new metadata version. This means that
perf c2c will show no samples in this case.
Keep the old behavior so old files can be opened, but also still show
the new warning that updating might improve the decoding.
Also re-write the warning to be more concise and specific to a user.
Fixes: ba5e7169e5 ("perf arm-spe: Use metadata to decide the data source feature")
Signed-off-by: James Clark <james.clark@linaro.org >
Reviewed-by: Leo Yan <leo.yan@arm.com >
Cc: Julio.Suarez@arm.com
Cc: Kiel.Friedt@arm.com
Cc: Ryan.Roberts@arm.com
Cc: Will Deacon <will@kernel.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com >
Cc: John Garry <john.g.garry@oracle.com >
Link: https://lore.kernel.org/r/20241029143734.291638-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-30 23:50:47 -07:00
Graham Woodward
edff8dad3f
perf arm-spe: Correctly set sample flags
...
Set flags on all synthesized instruction and branch samples.
Signed-off-by: Graham Woodward <graham.woodward@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: nd@arm.com
Cc: mike.leach@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241025143009.25419-4-graham.woodward@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-29 16:10:14 -07:00
Graham Woodward
c1b67c8510
perf arm-spe: Use ARM_SPE_OP_BRANCH_ERET when synthesizing branches
...
Instead of checking the type for just branch misses, we can instead
check for the OP_BRANCH_ERET and synthesise branches as well as
branch misses.
Signed-off-by: Graham Woodward <graham.woodward@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: nd@arm.com
Cc: mike.leach@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241025143009.25419-3-graham.woodward@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-29 16:10:10 -07:00
Graham Woodward
19966d792b
perf arm-spe: Set sample.addr to target address for instruction sample
...
For an instruction sample, assign the target address to the field
'to_ip'.
If it is a non-branch record, to_ip will be 0, presenting a non-valid
target address.
Signed-off-by: Graham Woodward <graham.woodward@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: nd@arm.com
Cc: mike.leach@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241025143009.25419-2-graham.woodward@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-29 16:10:05 -07:00
Ian Rogers
58fc358a3e
perf color: Add printf format checking and resolve issues
...
Add printf format checking to vararg printf routines in
color.h. Resolve build errors/bugs that are found through this
checking.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Cc: Weilin Wang <weilin.wang@intel.com >
Cc: Will Deacon <will@kernel.org >
Cc: James Clark <james.clark@linaro.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com >
Cc: Thomas Richter <tmricht@linux.ibm.com >
Cc: Tim Chen <tim.c.chen@linux.intel.com >
Cc: John Garry <john.g.garry@oracle.com >
Link: https://lore.kernel.org/r/20241017175356.783793-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-17 12:44:26 -07:00
Leo Yan
ea2ead4224
perf arm-spe: Add Cortex CPUs to common data source encoding list
...
Add Cortex-A720, Cortex-A725, Cortex-X1C, Cortex-X3 and Cortex-X925 into
the common data source encoding list. For everyone of these CPUs, it
technical reference manual defines the data source packet as the common
encoding format.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20241003185322.192357-8-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-14 12:04:32 -07:00
Besar Wicaksono
041c0e5715
perf arm-spe: Add Neoverse-V2 to common data source encoding list
...
Add Neoverse-V2 MIDR to the common data source encoding range list.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com >
Reviewed-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: Leo Yan <leo.yan@linaro.org >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20241003185322.192357-7-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-14 12:04:32 -07:00
Leo Yan
6bcf54c89b
perf arm-spe: Remove the unused 'midr' field
...
The 'midr' field is replaced by the MIDR values stored in metadata (per
CPU wise). Remove the 'midr' field as it is no longer used.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20241003185322.192357-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-14 12:04:31 -07:00
Leo Yan
ba5e7169e5
perf arm-spe: Use metadata to decide the data source feature
...
Use the info in the metadata to decide if the data source feature is
supported. The CPU MIDR must be in the CPU list for the common data
source encoding.
For the metadata version 1, it doesn't include info for MIDR. In this
case, due to absent info for making decision, print out warning to
remind users to upgrade tool and returns false.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20241003185322.192357-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-14 12:04:31 -07:00
Leo Yan
56ae663e76
perf arm-spe: Introduce arm_spe__is_homogeneous()
...
Introduce the arm_spe__is_homogeneous() function, it uses to check if
Arm SPE is homogeneous cross all CPUs.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20241003185322.192357-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-14 12:04:31 -07:00
Leo Yan
50b8f1d5bf
perf arm-spe: Rename the common data source encoding
...
The Neoverse CPUs follow the common data source encoding, and other
CPU variants can share the same format.
Rename the CPU list and data source definitions as common data source
names. This change prepares for appending more CPU variants.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20241003185322.192357-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-14 12:04:31 -07:00
Leo Yan
fb98fa3bf8
perf arm-spe: Rename arm_spe__synth_data_source_generic()
...
The arm_spe__synth_data_source_generic() function is invoked when the
tool detects that CPUs do not support data source packets and falls back
to synthesizing only the memory level.
Rename it to arm_spe__synth_memory_level() for better reflecting its
purpose.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Link: https://lore.kernel.org/r/20241003185322.192357-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-14 12:04:31 -07:00
Leo Yan
e52abceb4b
perf arm-spe: Dump metadata with version 2
...
This commit dumps metadata with version 2. It dumps metadata for header
and per CPU data respectively in the arm_spe_print_info() function to
support metadata version 2 format.
After:
0 0 0x3c0 [0x1b0]: PERF_RECORD_AUXTRACE_INFO type: 4
Header version :2
Header size :4
PMU type v2 :13
CPU number :8
Magic :0x1010101010101010
CPU # :0
Num of params :3
MIDR :0x410fd801
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :1
Num of params :3
MIDR :0x410fd801
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :2
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :3
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :4
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :5
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :6
Num of params :3
MIDR :0x410fd850
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :7
Num of params :3
MIDR :0x410fd850
PMU Type :-1
Min Interval :0
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Cc: Will Deacon <will@kernel.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com >
Cc: John Garry <john.g.garry@oracle.com >
Link: https://lore.kernel.org/r/20241003184302.190806-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-03 15:23:31 -07:00
Leo Yan
7842a4b6ff
perf arm-spe: Support metadata version 2
...
This commit is to support metadata version 2 and at the meantime it is
backward compatible for version 1's format.
The metadata version 1 doesn't include the ARM_SPE_HEADER_VERSION field.
As version 1 is fixed with two u64 fields, by checking the metadata
size, it distinguishes the metadata is version 1 or version 2 (and any
new versions if later will have). For version 2, it reads out CPU number
and retrieves the metadata info for every CPU.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Cc: Will Deacon <will@kernel.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com >
Cc: John Garry <john.g.garry@oracle.com >
Link: https://lore.kernel.org/r/20241003184302.190806-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-03 15:23:27 -07:00
Leo Yan
0ca2c45404
perf arm-spe: Define metadata header version 2
...
The first version's metadata header structure doesn't include a field to
indicate a header version, which is not friendly for extension.
Define the metadata version 2 format with a new header structure and
extend per CPU's metadata. In the meantime, the old metadata header will
still be supported for backward compatibility.
Signed-off-by: Leo Yan <leo.yan@arm.com >
Reviewed-by: James Clark <james.clark@linaro.org >
Cc: Will Deacon <will@kernel.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com >
Cc: John Garry <john.g.garry@oracle.com >
Link: https://lore.kernel.org/r/20241003184302.190806-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-10-03 15:23:09 -07:00
Ian Rogers
30f29bae91
perf tool: Constify tool pointers
...
The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to lower the possibilities of what could
happen with a tool.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Tested-by: Adrian Hunter <adrian.hunter@intel.com >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Anshuman Khandual <anshuman.khandual@arm.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Huacai Chen <chenhuacai@kernel.org >
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Jonathan Cameron <jonathan.cameron@huawei.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Nick Terrell <terrelln@fb.com >
Cc: Oliver Upton <oliver.upton@linux.dev >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Song Liu <song@kernel.org >
Cc: Sun Haiyong <sunhaiyong@loongson.cn >
Cc: Suzuki Poulouse <suzuki.poulose@arm.com >
Cc: Will Deacon <will@kernel.org >
Cc: Yanteng Si <siyanteng@loongson.cn >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-08-12 18:05:14 -03:00
Ian Rogers
4e322c7855
perf auxtrace: Remove dummy tools
...
Add perf_session__deliver_synth_attr_event that synthesizes a
perf_record_header_attr event with one id. Remove use of
perf_event__synthesize_attr that necessitates the use of the dummy
tool in order to pass the session.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Tested-by: Adrian Hunter <adrian.hunter@intel.com >
Tested-by: Leo Yan <leo.yan@arm.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Anshuman Khandual <anshuman.khandual@arm.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Huacai Chen <chenhuacai@kernel.org >
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Jonathan Cameron <jonathan.cameron@huawei.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Nick Terrell <terrelln@fb.com >
Cc: Oliver Upton <oliver.upton@linux.dev >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Song Liu <song@kernel.org >
Cc: Sun Haiyong <sunhaiyong@loongson.cn >
Cc: Suzuki Poulouse <suzuki.poulose@arm.com >
Cc: Will Deacon <will@kernel.org >
Cc: Yanteng Si <siyanteng@loongson.cn >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-08-12 18:04:36 -03:00
Ian Rogers
ee84a3032b
perf thread: Add accessor functions for thread
...
Using accessors will make it easier to add reference count checking in
later patches.
Committer notes:
thread->nsinfo wasn't wrapped as it is used together with
nsinfo__zput(), where does a trick to set the field with a refcount
being dropped to NULL, and that doesn't work well with using
thread__nsinfo(thread), that loses the &thread->nsinfo pointer.
When refcount checking is added to 'struct thread', later in this
series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to
check the thread pointer.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Brian Robbins <brianrob@linux.microsoft.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Dmitrii Dolgov <9erthalion6@gmail.com >
Cc: Fangrui Song <maskray@google.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Ivan Babrou <ivan@cloudflare.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jing Zhang <renyu.zj@linux.alibaba.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Miguel Ojeda <ojeda@kernel.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Sean Christopherson <seanjc@google.com >
Cc: Steinar H. Gunderson <sesse@google.com >
Cc: Suzuki Poulouse <suzuki.poulose@arm.com >
Cc: Wenyu Liu <liuwenyu7@huawei.com >
Cc: Will Deacon <will@kernel.org >
Cc: Yang Jihong <yangjihong1@huawei.com >
Cc: Ye Xingchen <ye.xingchen@zte.com.cn >
Cc: Yuan Can <yuancan@huawei.com >
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230608232823.4027869-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2023-06-12 15:57:53 -03:00
German Gomez
03a6c16ebf
perf arm-spe: Add SVE flags to the SPE samples
...
Add flags from the Scalable Vector Extension (SVE) to the SPE samples
which are available from Armv8.3 (FEAT_SPEv1p1).
These will be displayed in a new SIMD sort field in a later commit.
Signed-off-by: German Gomez <german.gomez@arm.com >
Signed-off-by: James Clark <james.clark@arm.com >
Acked-by: Ian Rogers <irogers@google.com >
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Cc: Anshuman.Khandual@arm.com
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Arnaldo Carvalho de Melo <acme@kernel.org >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: linux-arm-kernel@lists.infradead.org
Cc: John Garry <john.g.garry@oracle.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2023-03-20 19:28:11 -03:00
German Gomez
0066015a3d
perf arm-spe: Refactor arm-spe to support operation packet type
...
Extend the decoder of Arm SPE records to support more fields from the
operation packet type.
Not all fields are being decoded by this commit. Only those needed to
support the use-case SVE load/store/other operations.
Suggested-by: Leo Yan <leo.yan@linaro.org >
Signed-off-by: German Gomez <german.gomez@arm.com >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2023-03-20 19:27:52 -03:00
Jing Zhang
74a61d53a6
perf arm-spe: augment the data source type with neoverse_spe list
...
When synthesizing event with SPE data source, commit 4e6430cbb1a9("perf
arm-spe: Use SPE data source for neoverse cores") augment the type with
source information by MIDR. However, is_midr_in_range only compares the
first entry in neoverse_spe.
Change is_midr_in_range to is_midr_in_range_list to traverse the
neoverse_spe array so that all neoverse cores synthesize event with data
source packet.
Fixes: 4e6430cbb1 ("perf arm-spe: Use SPE data source for neoverse cores")
Reviewed-by: Ali Saidi <alisaidi@amazon.com >
Reviewed-by: Leo Yan <leo.yan@linaro.org >
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ali Saidi <alisaidi@amazon.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.garry@huawei.com >
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Shuai Xue <xueshuai@linux.alibaba.com >
Cc: Timothy Hayes <timothy.hayes@arm.com >
Cc: Will Deacon <will@kernel.org >
Cc: Zhuo Song <zhuo.song@linux.alibaba.com >
Link: https://lore.kernel.org/r/1664197396-42672-1-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-09-28 11:26:33 -03:00
Ali Saidi
4e6430cbb1
perf arm-spe: Use SPE data source for neoverse cores
...
When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
the same encoding. I can't find encoding information for any other SPE
implementations to unify their choices with Arm's thus that is left for
future work.
This change populates the mem_lvl_num for Neoverse cores as well as the
deprecated mem_lvl namespace.
Reviewed-by: German Gomez <german.gomez@arm.com >
Reviewed-by: Leo Yan <leo.yan@linaro.org >
Signed-off-by: Ali Saidi <alisaidi@amazon.com >
Tested-by: Leo Yan <leo.yan@linaro.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Anshuman Khandual <anshuman.khandual@arm.com >
Cc: Gustavo A. R. Silva <gustavoars@kernel.org >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.garry@huawei.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Like Xu <likexu@tencent.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Timothy Hayes <timothy.hayes@arm.com >
Cc: Will Deacon <will@kernel.org >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220811062451.435810-4-leo.yan@linaro.org
Signed-off-by: Leo Yan <leo.yan@linaro.org >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-08-11 19:12:01 -03:00
Leo Yan
51ba539f5b
perf arm-spe: Don't set data source if it's not a memory operation
...
Except for memory load and store operations, ARM SPE records also can
support other operation types, bug when set the data source field the
current code assumes a record is a either load operation or store
operation, this leads to wrongly synthesize memory samples.
This patch strictly checks the record operation type, it only sets data
source only for the operation types ARM_SPE_LD and ARM_SPE_ST,
otherwise, returns zero for data source. Therefore, we can synthesize
memory samples only when data source is a non-zero value, the function
arm_spe__is_memory_event() is useless and removed.
Fixes: e55ed3423c ("perf arm-spe: Synthesize memory event")
Reviewed-by: Ali Saidi <alisaidi@amazon.com >
Reviewed-by: German Gomez <german.gomez@arm.com >
Signed-off-by: Leo Yan <leo.yan@linaro.org >
Tested-by: Ali Saidi <alisaidi@amazon.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: alisaidi@amazon.com
Cc: Andrew Kilroy <andrew.kilroy@arm.com >
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.garry@huawei.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Li Huafei <lihuafei1@huawei.com >
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Nick Forrington <nick.forrington@arm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Will Deacon <will@kernel.org >
Link: http://lore.kernel.org/lkml/20220517020326.18580-5-alisaidi@amazon.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-06-19 10:41:43 -03:00
Timothy Hayes
7599b70a3c
perf arm-spe: Fix SPE events with phys addresses
...
This patch corrects a bug whereby SPE collection is invoked with
pa_enable=1 but synthesized events fail to show physical addresses.
Reviewed-by: Leo Yan <leo.yan@linaro.org >
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Fastabend <john.fastabend@gmail.com >
Cc: John Garry <john.garry@huawei.com >
Cc: KP Singh <kpsingh@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Martin KaFai Lau <kafai@fb.com >
Cc: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Song Liu <songliubraving@fb.com >
Cc: Will Deacon <will@kernel.org >
Cc: Yonghong Song <yhs@fb.com >
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220421165205.117662-3-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-04-28 10:39:28 -03:00
Timothy Hayes
4e13f6706d
perf arm-spe: Fix addresses of synthesized SPE events
...
This patch corrects a bug whereby synthesized events from SPE
samples are missing virtual addresses.
Fixes: 54f7815efe ("perf arm-spe: Fill address info for samples")
Reviewed-by: Leo Yan <leo.yan@linaro.org >
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: bpf@vger.kernel.org
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Fastabend <john.fastabend@gmail.com >
Cc: John Garry <john.garry@huawei.com >
Cc: KP Singh <kpsingh@kernel.org >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Martin KaFai Lau <kafai@fb.com >
Cc: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: netdev@vger.kernel.org
Cc: Song Liu <songliubraving@fb.com >
Cc: Will Deacon <will@kernel.org >
Cc: Yonghong Song <yhs@fb.com >
Link: https://lore.kernel.org/r/20220421165205.117662-2-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-04-28 10:39:14 -03:00
German Gomez
ff8752d761
perf arm-spe: Synthesize SPE instruction events
...
Synthesize instruction events for every ARM SPE record.
Arm SPE implements a hardware-based sample period, and perf implements a
software-based one. Add a warning message to inform the user of this.
Signed-off-by: German Gomez <german.gomez@arm.com >
Tested-by: Leo Yan <leo.yan@linaro.org >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: John Garry <john.garry@huawei.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Will Deacon <will@kernel.org >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211216152404.52474-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2021-12-17 22:44:10 -03:00
Namhyung Kim
b0fde9c6e2
perf arm-spe: Add SPE total latency as PERF_SAMPLE_WEIGHT
...
Use total latency info in the SPE counter packet as sample weight so
that we can see it in local_weight and (global) weight sort keys.
Maybe we can use PERF_SAMPLE_WEIGHT_STRUCT to support ins_lat as well
but I'm not sure which latency it matches. So just adding total latency
first.
Reviewed-by: Leo Yan <leo.yan@linaro.org >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: German Gomez <german.gomez@arm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Stephane Eranian <eranian@google.com >
Link: http://lore.kernel.org/lkml/20211201220855.1260688-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2021-12-16 12:18:11 -03:00
German Gomez
9e1a8d9f68
perf inject: Fix ARM SPE handling
...
'perf inject' is currently not working for Arm SPE. When you try to run
'perf inject' and 'perf report' with a perf.data file that contains SPE
traces, the tool reports a "Bad address" error:
# ./perf record -e arm_spe_0/ts_enable=1,store_filter=1,branch_filter=1,load_filter=1/ -a -- sleep 1
# ./perf inject -i perf.data -o perf.inject.data --itrace
# ./perf report -i perf.inject.data --stdio
0x42c00 [0x8]: failed to process type: 9 [Bad address]
Error:
failed to process sample
As far as I know, the issue was first spotted in [1], but 'perf inject'
was not yet injecting the samples. This patch does something similar to
what cs_etm does for injecting the samples [2], but for SPE.
[1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210412091006.468557-1-leo.yan@linaro.org/#24117339
[2] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/cs-etm.c?h=perf/core&id=133fe2e617e48ca0948983329f43877064ffda3e#n1196
Reviewed-by: James Clark <james.clark@arm.com >
Signed-off-by: German Gomez <german.gomez@arm.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: John Garry <john.garry@huawei.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Will Deacon <will@kernel.org >
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211105104130.28186-2-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2021-11-18 10:08:07 -03:00