'perf trace -p <PID>' work on a syscall that is unaugmented, but doesn't
work on a syscall that's augmented (when it calls perf_event_output() in
BPF).
Let's take open() as an example. open() is augmented in perf trace.
Before:
$ perf trace -e open -p 3792392
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
We can see there's no output.
After:
$ perf trace -e open -p 3792392
0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
Reason:
bpf_perf_event_output() will fail when you specify a pid in 'perf trace' (EOPNOTSUPP).
When using 'perf trace -p 114', before perf_event_open(), we'll have PID
= 114, and CPU = -1.
This is bad for bpf-output event, because the ring buffer won't accept
output from BPF's perf_event_output(), making it fail. I'm still trying
to find out why.
If we open bpf-output for every cpu, instead of setting it to -1, like
this:
PID = <PID>, CPU = 0
PID = <PID>, CPU = 1
PID = <PID>, CPU = 2
PID = <PID>, CPU = 3
Everything works.
You can test it with this script (open.c):
#include <unistd.h>
#include <sys/syscall.h>
int main()
{
int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
char s1[] = "DINGZHEN", s2[] = "XUEBAO";
while (1) {
syscall(SYS_open, s1, i1, i2);
sleep(1);
}
return 0;
}
save, compile:
make open
perf trace:
perf trace -e open <path-to-the-executable>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As a form of validation, it is a common practice to check the outputs
of commands whether they contain expected patterns or match a certain
regular expression.
This output checking helper is designed to allow checking stderr output
of perf commands for unexpected messages, while ignoring messages that
are known to be harmless, e.g.:
"Lowering default frequency rate to \d+\."
"\d+ out of order events recorded."
etc.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-10-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test scripts in base_* directories currently have their own drivers
that run them. Before this patch, the shell test-suite generator causes
them to run twice. Fix that by skipping them in the generator.
A cleaner solution (for future) will be to use the directory structure
idea (introduced by Carsten Haitzler in 7391db6459 ("perf test:
Refactor shell tests allowing subdirs")) to generate test entries with
subtests, like:
$ perf test list
[...]
97: perf probe shell tests
97:1: perf probe basic functionality
97:2: perf probe tests with arguments
97:3: perf probe invalid options handling
[...]
There is already a lot of shell test scripts and many are about to come,
so there is a need for some hierarchy.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-3-vmolnaro@redhat.com
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The getname_flags() routine changed recently and thus the place where we
were getting the pathname is not probeable anymore, albeit still
present, so use the next line for that, before:
root@number:/home/acme/git/perf-tools-next# perf test vfs_getname
91: Add vfs_getname probe to get syscall args filenames : FAILED!
93: Use vfs_getname probe to get syscall args filenames : FAILED!
126: Check open filename arg using perf trace + vfs_getname : FAILED!
root@number:/home/acme/git/perf-tools-next#
Now tests 91 and 126 are passing, some more investigation is needed for
test 93, that continues to fail.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add Multiple bpf-filter test for two or more events with filters.
It uses task-clock and page-faults events with different filter
expressions and check the perf script output
$ sudo ./perf test filtering -vv
96: perf record sample filtering (by BPF) tests:
--- start ---
test child forked, pid 2804025
Checking BPF-filter privilege
Basic bpf-filter test
Basic bpf-filter test [Success]
Failing bpf-filter test
Error: task-clock event does not have PERF_SAMPLE_CPU
Failing bpf-filter test [Success]
Group bpf-filter test
Error: task-clock event does not have PERF_SAMPLE_CPU
Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE
Group bpf-filter test [Success]
Multiple bpf-filter test
Multiple bpf-filter test [Success]
---- end(0) ----
96: perf record sample filtering (by BPF) tests : Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240820154504.128923-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So far it used tgid as a key to get the filter expressions in the
pinned filters map for regular users but it won't work well if the has
more than one filters at the same time. Let's add the event id to the
key of the filter hash map so that it can identify the right filter
expression in the BPF program.
As the event can be inherited to child tasks, it should use the primary
id which belongs to the parent (original) event. Since evsel opens the
event for multiple CPUs and tasks, it needs to maintain a separate hash
map for the event id.
In the user space, it keeps a list for the multiple evsel and release
the entries in the both hash map when it closes the event.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240820154504.128923-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf crashes as below when applying --no-group
# perf record -e "{cache-misses,branches"} -b sleep 1
# perf report --stdio --no-group
free(): invalid next size (fast)
Aborted (core dumped)
#
In the __hpp__fmt(), only 1 hpp_fmt_value is allocated for the current
event when --no-group is applied.
However, the current implementation tries to assign the hists from all
members to the hpp_fmt_value, which exceeds the allocated memory.
Fixes: 8f6071a3dc ("perf hist: Simplify __hpp_fmt() using hpp_fmt_data")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240820183202.3174323-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In some cases, compilers don't set the location expression in DWARF
precisely. For instance, it may assign a variable to a register after
copying it from a different register. Then it should use the register
for the new type but still uses the old register. This makes hard to
track the type information properly.
This is an example I found in __tcp_transmit_skb(). The first argument
(sk) of this function is a pointer to sock and there's a variable (tp)
for tcp_sock.
static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
int clone_it, gfp_t gfp_mask, u32 rcv_nxt)
{
...
struct tcp_sock *tp;
BUG_ON(!skb || !tcp_skb_pcount(skb));
tp = tcp_sk(sk);
prior_wstamp = tp->tcp_wstamp_ns;
tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
...
So it basically calls tcp_sk(sk) to get the tcp_sock pointer from sk.
But it turned out to be the same value because tcp_sock embeds sock as
the first member. The sk is located in reg5 (RDI) and tp is in reg3
(RBX). The offset of tcp_wstamp_ns is 0x748 and tcp_clock_cache is
0x750. So you need to use RBX (reg3) to access the fields in the
tcp_sock. But the code used RDI (reg5) as it has the same value.
$ pahole --hex -C tcp_sock vmlinux | grep -e 748 -e 750
u64 tcp_wstamp_ns; /* 0x748 0x8 */
u64 tcp_clock_cache; /* 0x750 0x8 */
And this is the disassembly of the part of the function.
<__tcp_transmit_skb>:
...
44: mov %rdi, %rbx
47: mov 0x748(%rdi), %rsi
4e: mov 0x750(%rdi), %rax
55: cmp %rax, %rsi
Because compiler put the debug info to RBX, it only knows RDI is a
pointer to sock and accessing those two fields resulted in error
due to offset being beyond the type size.
-----------------------------------------------------------
find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 30]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c) <<<--- the first argument ('sk' at %RDI)
mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
mov [20] stack canary -> reg0
mov [29] reg0 -> -0x30(stack) stack canary
bb: [36 - 3e]
mov [36] reg4 -> reg15 type='struct sk_buff*' size=0x8 (die:0x8181360)
bb: [44 - 63]
mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c) <<<--- calling tcp_sk()
var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead) <<<--- new variable ('tp' at %RBX)
var [4e] reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
chk [63] reg5 offset=0x748 ok=1 kind=1 (struct sock*) : offset bigger than size <<<--- access with old variable
final result: offset bigger than size
While it's a fault in the compiler, we could work around this issue by
using the type of new variable when it's copied directly. So I've added
copied_from field in the register state to track those direct register
to register copies. After that new register gets a new type and the old
register still has the same type, it'll update (copy it back) the type
of the old register.
For example, if we can update type of reg5 at __tcp_transmit_skb+0x47,
we can find the target type of the instruction at 0x63 like below:
-----------------------------------------------------------
find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
...
bb: [44 - 63]
mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c)
var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead)
var [47] copyback reg5 type='struct tcp_sock*' size=0x8 (die:0x819eead) <<<--- here
mov [47] 0x748(reg5) -> reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [4e] 0x750(reg5) -> reg0 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
chk [63] reg5 offset=0x748 ok=1 kind=1 (struct tcp_sock*) : Good! <<<--- new type
found by insn track: 0x748(reg5) type-offset=0x748
final result: type='struct tcp_sock' size=0xa98 (die:0x819eeb2)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When checking the match variable at the target instruction, it might not
have any information if it's a first write to a stack slot. In this
case it could spill a register value into the stack so the type info is
in the source operand.
But currently it's hard to get the operand from the checking function.
Let's process the instruction and retry to get the type info from the
stack if there's no information already.
This is an example of __tcp_transmit_skb(). The instructions are
<__tcp_transmit_skb>:
0: nopl 0x0(%rax, %rax, 1)
5: push %rbp
6: mov %rsp, %rbp
9: push %r15
b: push %r14
d: push %r13
f: push %r12
11: push %rbx
12: sub $0x98, %rsp
19: mov %r8d, -0xa8(%rbp)
...
It cannot find any variable at -0xa8(%rbp) at this point.
-----------------------------------------------------------
find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 19]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : no type information
no type information
And it was able to find the type after processing the 'mov' instruction.
-----------------------------------------------------------
find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 19]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : retry <<<--- here
mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : Good!
found by insn track: -0xa8(reg6) type-offset=0
final result: type='unsigned int' size=0x4 (die:0x8180ed6)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In check_matching_type(), it'd be easier to display the typename in
question if it's available.
For example, check out the line starts with 'chk'.
-----------------------------------------------------------
find data type for 0x10(reg0) at cpuacct_charge+0x13
CU for kernel/sched/build_utility.c (die:0x137ee0b)
frame base: cfa=1 fbreg=7
scope: [3/3] (die:13d9632)
bb: [c - 13]
var [c] reg5 type='struct task_struct*' size=0x8 (die:0x1381230)
mov [c] 0xdf8(reg5) -> reg0 type='struct css_set*' size=0x8 (die:0x1385c56)
chk [13] reg0 offset=0x10 ok=1 kind=1 (struct css_set*) : Good! <<<--- here
found by insn track: 0x10(reg0) type-offset=0x10
final result: type='struct css_set' size=0x250 (die:0x1385b0e)
Another example:
-----------------------------------------------------------
find data type for 0x8(reg0) at menu_select+0x279
CU for drivers/cpuidle/governors/menu.c (die:0x7b0fe79)
frame base: cfa=1 fbreg=7
scope: [2/2] (die:7b11010)
bb: [273 - 277]
bb: [279 - 279]
chk [279] reg0 offset=0x8 ok=0 kind=0 cfa : no type information
scope: [1/2] (die:7b10cbc)
bb: [0 - 64]
...
mov [26a] imm=0xffffffff -> reg15
bb: [273 - 277]
bb: [279 - 279]
chk [279] reg0 offset=0x8 ok=1 kind=1 (long long unsigned int) : no/void pointer <<<--- here
final result: no/void pointer
Also change some places to print negative offsets properly.
Before:
-----------------------------------------------------------
find data type for 0xffffff40(reg6) at __tcp_transmit_skb+0x58
After:
-----------------------------------------------------------
find data type for -0xc0(reg6) at __tcp_transmit_skb+0x58
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes it's useful to organize member fields in cache-line boundary.
The 'typecln' sort key is short for type-cacheline and to show samples
in each cacheline. The cacheline size is fixed to 64 for now, but it
can read the actual size once it saves the value from sysfs.
For example, you maybe want to which cacheline in a target is hot or
cold. The following shows members in the cfs_rq's first cache line.
$ perf report -s type,typecln,typeoff -H
...
- 2.67% struct cfs_rq
+ 1.23% struct cfs_rq: cache-line 2
+ 0.57% struct cfs_rq: cache-line 4
+ 0.46% struct cfs_rq: cache-line 6
- 0.41% struct cfs_rq: cache-line 0
0.39% struct cfs_rq +0x14 (h_nr_running)
0.02% struct cfs_rq +0x38 (tasks_timeline.rb_leftmost)
...
Committer testing:
# root@number:~# perf report -s type,typecln,typeoff -H --stdio
# Total Lost Samples: 0
#
# Samples: 5K of event 'cpu_atom/mem-loads,ldlat=5/P'
# Event count (approx.): 312251
#
# Overhead Data Type / Data Type Cacheline / Data Type Offset
# .............. ..................................................
#
<SNIP>
0.07% struct sigaction
0.05% struct sigaction: cache-line 1
0.02% struct sigaction +0x58 (sa_mask)
0.02% struct sigaction +0x78 (sa_mask)
0.03% struct sigaction: cache-line 0
0.02% struct sigaction +0x38 (sa_mask)
0.01% struct sigaction +0x8 (sa_mask)
<SNIP>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819233603.54941-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It'd be better to have them in hex to check cacheline alignment.
Percent offset size field
100.00 0 0x1c0 struct cfs_rq {
0.00 0 0x10 struct load_weight load {
0.00 0 0x8 long unsigned int weight;
0.00 0x8 0x4 u32 inv_weight;
};
0.00 0x10 0x4 unsigned int nr_running;
14.56 0x14 0x4 unsigned int h_nr_running;
0.00 0x18 0x4 unsigned int idle_nr_running;
0.00 0x1c 0x4 unsigned int idle_h_nr_running;
...
Committer notes:
Justification from Namhyung when asked about why it would be "better":
Cache line sizes are power of 2 so it'd be natural to use hex and
check whether an offset is in the same boundary. Also 'perf annotate'
shows instruction offsets in hex.
>
> Maybe this should be selectable?
I can add an option and/or a config if you want.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819233603.54941-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In check_matching_type(), it checks the type state of the register in a
wrong order. When it's the percpu pointer, it should check the type for
the pointer, but it checks the CFA bit first and thought it has no type
in the stack slot. This resulted in no type info.
-----------------------------------------------------------
find data type for 0x28(reg1) at hrtimer_reprogram+0x88
CU for kernel/time/hrtimer.c (die:0x18f219f)
frame base: cfa=1 fbreg=7
...
add [72] percpu 0x24500 -> reg1 pointer type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
bb: [7a - 7e]
bb: [80 - 86] (here)
bb: [88 - 88] vvv
chk [88] reg1 offset=0x28 ok=1 kind=4 cfa : no type information
no type information
Here, instruction at 0x72 found reg1 has a (percpu) pointer and got the
correct type. But when it checks the final result, it wrongly thought
it was stack variable because it checks the cfa bit first.
After changing the order of state check:
-----------------------------------------------------------
find data type for 0x28(reg1) at hrtimer_reprogram+0x88
CU for kernel/time/hrtimer.c (die:0x18f219f)
frame base: cfa=1 fbreg=7
... (here)
vvvvvvvvvv
chk [88] reg1 offset=0x28 ok=1 kind=4 percpu ptr : Good!
found by insn track: 0x28(reg1) type-offset=0x28
final type: type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The previous attempt fixed the build on debian:experimental-x-mipsel,
but when building on a larger set of containers I noticed it broke the
build on some other 32-bit architectures such as:
42 7.87 ubuntu:18.04-x-arm : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
builtin-daemon.c: In function 'cmd_session_list':
builtin-daemon.c:692:16: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=]
fprintf(out, "%c%" PRIu64,
^~~~~
builtin-daemon.c:694:13:
csv_sep, (curr - daemon->start) / 60);
~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from builtin-daemon.c:3:0:
/usr/arm-linux-gnueabihf/include/inttypes.h:105:34: note: format string is defined here
# define PRIu64 __PRI64_PREFIX "u"
So lets cast that time_t (32-bit/64-bit) to uint64_t to make sure it
builds everywhere.
Fixes: 4bbe600293 ("perf daemon: Fix the build on 32-bit architectures")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZsPmldtJ0D9Cua9_@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add it to the record.sh shell test to verify if it tracks cgroup
information correctly. It records with --all-cgroups option can check
if it has PERF_RECORD_CGROUP and the names are not "unknown".
$ sudo ./perf test -vv 95
95: perf record tests:
--- start ---
test child forked, pid 2871922
169c90-169cd0 g test_loop
perf does have symbol 'test_loop'
Basic --per-thread mode test
Basic --per-thread mode test [Success]
Register capture test
Register capture test [Success]
Basic --system-wide mode test
Basic --system-wide mode test [Success]
Basic target workload test
Basic target workload test [Success]
Branch counter test
branch counter feature not supported on all core PMUs (/sys/bus/event_source/devices/cpu) [Skipped]
Cgroup sampling test
Cgroup sampling test [Success]
---- end(0) ----
95: perf record tests : Ok
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240818212948.2873156-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes it matches a variable in the inner scope but it fails because
the actual access can be on a different type. Let's try variables in
every scope and choose the best one using is_better_type().
I have an example with update_blocked_averages(), at first it found a
variable (__mptr) but it's a void pointer. So it moved on to the upper
scope and found another variable (cfs_rq).
$ perf --debug type-profile annotate --data-type --stdio
...
-----------------------------------------------------------
find data type for 0x140(reg14) at update_blocked_averages+0x2db
CU for kernel/sched/fair.c (die:0x12dd892)
frame base: cfa=1 fbreg=7
found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer
variable location: base=reg14, offset=0x140
type='void*' size=0x8 (die:0x12dd8f9)
found "cfs_rq" (die: 0x1301721) in scope=3/4 (die: 0x130171c) type_offset=0x140
variable location: reg14
type='struct cfs_rq' size=0x1c0 (die:0x12e37e5)
final type: type='struct cfs_rq' size=0x1c0 (die:0x12e37e5)
IIUC the scope is like below:
1: update_blocked_averages
2: __update_blocked_fair
3: for_each_leaf_cfs_rq_safe
4: list_entry -> (container_of)
The container_of is implemented like:
#define container_of(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
static_assert(__same_type(*(ptr), ((type *)0)->member) || \
__same_type(*(ptr), void), \
"pointer type mismatch in container_of()"); \
((type *)(__mptr - offsetof(type, member))); })
That's why we see the __mptr variable first but it failed since it has
no type information.
Then for_each_leaf_cfs_rq_safe() is defined as
#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \
list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list, \
leaf_cfs_rq_list)
Note that the access was 0x140(r14). And the cfs_rq has
leaf_cfs_rq_list at the 0x140. So it converts the list_head pointer to
a pointer to struct cfs_rq here.
$ pahole --hex -C cfs_rq vmlinux | grep 140
struct cfs_rq struct list_head leaf_cfs_rq_list; /* 0x140 0x10 */
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes more than one variables are located in the same register or a
stack slot. Or it can overwrite existing information with others. I
found this is not helpful in some cases so it needs to update the type
information from the variable only if it's better.
But it's hard to know which one is better, so we needs heuristics. :)
As it deals with memory accesses, the location should have a pointer or
something similar (like array or reference). So if it had an integer
type and a variable is a pointer, we can take the variable's type to
resolve the target of the access.
If it has a pointer type and a variable with the same location has a
different pointer type, it'll take one with bigger target type. This
can be useful when the target type embeds a smaller type (like list
header or RB-tree node) at the beginning so their location is same.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>