errno.h isn't used in auxtrace.h so remove it and fix build failures
caused by transitive dependencies through auxtrace.h on errno.h.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The tracked pointer offset was not being preserved in the stack state,
which could lead to incorrect type analysis. This change adds a
ptr_offset field to the type_state_stack struct and passes it to
set_stack_state and findnew_stack_state to ensure the offset is
preserved after the pointer is loaded from a stack location. It improves
the type annotation coverage and quality.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Track the arithmetic operations on registers with pointer types. We
handle only add, sub and lea instructions. The original pointer
information needs to be preserved for getting outermost struct types.
For example, reg0 points to a struct cfs_rq, when we add 0x10 to reg0,
it should preserve the information of struct cfs_rq + 0x10 in the
register instead of a pointer type to the child field at 0x10.
Details:
1. struct type_state_reg now includes an offset, indicating if the
register points to the start or an internal part of its associated
type. This offset is used in mem to reg and reg to stack mem
transfers, and also applied to the final type offset.
2. lea offset(%sp/%fp), reg is now treated as taking the address of a
stack variable. It worked fine in most cases, but an issue with this
approach is the pointer type may not exist.
3. lea offset(%base), reg is handled by moving the type from %base and
adding an offset, similar to an add operation followed by a mov reg
to reg.
4. Non-stack variables from DWARF with non-zero offsets in their
location expressions are now accepted with register offset tracking.
Multi-register addressing modes in LEA are not supported.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Introduce TSR_KIND_POINTER to improve the data type profiler's ability
to track pointer-based memory accesses and address register variables.
TSR_KIND_POINTER represents that the location holds a pointer type to
the type in the type state. The semantics match the `breg` registers
that describe a memory location.
This change implements handling for this new kind in mov instructions
and in the check_matching_type() function. When a TSR_KIND_POINTER is
moved to the stack, the stack state size is set to the architecture's
pointer size.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Utilizes the previous is_breg_access_indirect function to determine if
the register + offset stores the variable itself or the struct it points
to, save the information in die_var_type.is_reg_var_addr.
Since we are storing the real types in the stack state, we need to do a
type dereference when is_reg_var_addr is set to false for stack/frame
registers.
For other gp registers, skip the variable when the register is a pointer
to the type. If we want to accept these variables, we might also utilize
is_reg_var_addr in a different way, we need to mark that register as a
pointer to the type.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Zecheng Li <zecheng@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xu Liu <xliuprof@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Factor out a function to get the name of member field at the given
offset. This will be used in other places.
Also update the output of typeoff sort key a little bit. As we know
that some special types like (stack operation), (stack canary) and
(unknown) won't have fields, skip printing the offset and field.
For example, the following change is expected.
"(stack operation) +0 (no field)" ==> "(stack operation)"
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The pr_debug_scope() is to print more information about the scope DIE
during the instruction tracking so that it can help finding relevant
debug info and the source code like inlined functions more easily.
$ perf --debug type-profile annotate --data-type
...
-----------------------------------------------------------
find data type for 0(reg0, reg12) at set_task_cpu+0xdd
CU for kernel/sched/core.c (die:0x1268dae)
frame base: cfa=1 fbreg=7
scope: [3/3] (die:12b6d28) [inlined] set_task_rq <<<--- (here)
bb: [9f - dd]
var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0)
var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In some cases, compilers don't set the location expression in DWARF
precisely. For instance, it may assign a variable to a register after
copying it from a different register. Then it should use the register
for the new type but still uses the old register. This makes hard to
track the type information properly.
This is an example I found in __tcp_transmit_skb(). The first argument
(sk) of this function is a pointer to sock and there's a variable (tp)
for tcp_sock.
static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
int clone_it, gfp_t gfp_mask, u32 rcv_nxt)
{
...
struct tcp_sock *tp;
BUG_ON(!skb || !tcp_skb_pcount(skb));
tp = tcp_sk(sk);
prior_wstamp = tp->tcp_wstamp_ns;
tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
...
So it basically calls tcp_sk(sk) to get the tcp_sock pointer from sk.
But it turned out to be the same value because tcp_sock embeds sock as
the first member. The sk is located in reg5 (RDI) and tp is in reg3
(RBX). The offset of tcp_wstamp_ns is 0x748 and tcp_clock_cache is
0x750. So you need to use RBX (reg3) to access the fields in the
tcp_sock. But the code used RDI (reg5) as it has the same value.
$ pahole --hex -C tcp_sock vmlinux | grep -e 748 -e 750
u64 tcp_wstamp_ns; /* 0x748 0x8 */
u64 tcp_clock_cache; /* 0x750 0x8 */
And this is the disassembly of the part of the function.
<__tcp_transmit_skb>:
...
44: mov %rdi, %rbx
47: mov 0x748(%rdi), %rsi
4e: mov 0x750(%rdi), %rax
55: cmp %rax, %rsi
Because compiler put the debug info to RBX, it only knows RDI is a
pointer to sock and accessing those two fields resulted in error
due to offset being beyond the type size.
-----------------------------------------------------------
find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 30]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c) <<<--- the first argument ('sk' at %RDI)
mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
mov [20] stack canary -> reg0
mov [29] reg0 -> -0x30(stack) stack canary
bb: [36 - 3e]
mov [36] reg4 -> reg15 type='struct sk_buff*' size=0x8 (die:0x8181360)
bb: [44 - 63]
mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c) <<<--- calling tcp_sk()
var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead) <<<--- new variable ('tp' at %RBX)
var [4e] reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
chk [63] reg5 offset=0x748 ok=1 kind=1 (struct sock*) : offset bigger than size <<<--- access with old variable
final result: offset bigger than size
While it's a fault in the compiler, we could work around this issue by
using the type of new variable when it's copied directly. So I've added
copied_from field in the register state to track those direct register
to register copies. After that new register gets a new type and the old
register still has the same type, it'll update (copy it back) the type
of the old register.
For example, if we can update type of reg5 at __tcp_transmit_skb+0x47,
we can find the target type of the instruction at 0x63 like below:
-----------------------------------------------------------
find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
...
bb: [44 - 63]
mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c)
var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead)
var [47] copyback reg5 type='struct tcp_sock*' size=0x8 (die:0x819eead) <<<--- here
mov [47] 0x748(reg5) -> reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [4e] 0x750(reg5) -> reg0 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
chk [63] reg5 offset=0x748 ok=1 kind=1 (struct tcp_sock*) : Good! <<<--- new type
found by insn track: 0x748(reg5) type-offset=0x748
final result: type='struct tcp_sock' size=0xa98 (die:0x819eeb2)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When checking the match variable at the target instruction, it might not
have any information if it's a first write to a stack slot. In this
case it could spill a register value into the stack so the type info is
in the source operand.
But currently it's hard to get the operand from the checking function.
Let's process the instruction and retry to get the type info from the
stack if there's no information already.
This is an example of __tcp_transmit_skb(). The instructions are
<__tcp_transmit_skb>:
0: nopl 0x0(%rax, %rax, 1)
5: push %rbp
6: mov %rsp, %rbp
9: push %r15
b: push %r14
d: push %r13
f: push %r12
11: push %rbx
12: sub $0x98, %rsp
19: mov %r8d, -0xa8(%rbp)
...
It cannot find any variable at -0xa8(%rbp) at this point.
-----------------------------------------------------------
find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 19]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : no type information
no type information
And it was able to find the type after processing the 'mov' instruction.
-----------------------------------------------------------
find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 19]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : retry <<<--- here
mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : Good!
found by insn track: -0xa8(reg6) type-offset=0
final result: type='unsigned int' size=0x4 (die:0x8180ed6)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In check_matching_type(), it'd be easier to display the typename in
question if it's available.
For example, check out the line starts with 'chk'.
-----------------------------------------------------------
find data type for 0x10(reg0) at cpuacct_charge+0x13
CU for kernel/sched/build_utility.c (die:0x137ee0b)
frame base: cfa=1 fbreg=7
scope: [3/3] (die:13d9632)
bb: [c - 13]
var [c] reg5 type='struct task_struct*' size=0x8 (die:0x1381230)
mov [c] 0xdf8(reg5) -> reg0 type='struct css_set*' size=0x8 (die:0x1385c56)
chk [13] reg0 offset=0x10 ok=1 kind=1 (struct css_set*) : Good! <<<--- here
found by insn track: 0x10(reg0) type-offset=0x10
final result: type='struct css_set' size=0x250 (die:0x1385b0e)
Another example:
-----------------------------------------------------------
find data type for 0x8(reg0) at menu_select+0x279
CU for drivers/cpuidle/governors/menu.c (die:0x7b0fe79)
frame base: cfa=1 fbreg=7
scope: [2/2] (die:7b11010)
bb: [273 - 277]
bb: [279 - 279]
chk [279] reg0 offset=0x8 ok=0 kind=0 cfa : no type information
scope: [1/2] (die:7b10cbc)
bb: [0 - 64]
...
mov [26a] imm=0xffffffff -> reg15
bb: [273 - 277]
bb: [279 - 279]
chk [279] reg0 offset=0x8 ok=1 kind=1 (long long unsigned int) : no/void pointer <<<--- here
final result: no/void pointer
Also change some places to print negative offsets properly.
Before:
-----------------------------------------------------------
find data type for 0xffffff40(reg6) at __tcp_transmit_skb+0x58
After:
-----------------------------------------------------------
find data type for -0xc0(reg6) at __tcp_transmit_skb+0x58
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It'd be better to have them in hex to check cacheline alignment.
Percent offset size field
100.00 0 0x1c0 struct cfs_rq {
0.00 0 0x10 struct load_weight load {
0.00 0 0x8 long unsigned int weight;
0.00 0x8 0x4 u32 inv_weight;
};
0.00 0x10 0x4 unsigned int nr_running;
14.56 0x14 0x4 unsigned int h_nr_running;
0.00 0x18 0x4 unsigned int idle_nr_running;
0.00 0x1c 0x4 unsigned int idle_h_nr_running;
...
Committer notes:
Justification from Namhyung when asked about why it would be "better":
Cache line sizes are power of 2 so it'd be natural to use hex and
check whether an offset is in the same boundary. Also 'perf annotate'
shows instruction offsets in hex.
>
> Maybe this should be selectable?
I can add an option and/or a config if you want.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819233603.54941-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In check_matching_type(), it checks the type state of the register in a
wrong order. When it's the percpu pointer, it should check the type for
the pointer, but it checks the CFA bit first and thought it has no type
in the stack slot. This resulted in no type info.
-----------------------------------------------------------
find data type for 0x28(reg1) at hrtimer_reprogram+0x88
CU for kernel/time/hrtimer.c (die:0x18f219f)
frame base: cfa=1 fbreg=7
...
add [72] percpu 0x24500 -> reg1 pointer type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
bb: [7a - 7e]
bb: [80 - 86] (here)
bb: [88 - 88] vvv
chk [88] reg1 offset=0x28 ok=1 kind=4 cfa : no type information
no type information
Here, instruction at 0x72 found reg1 has a (percpu) pointer and got the
correct type. But when it checks the final result, it wrongly thought
it was stack variable because it checks the cfa bit first.
After changing the order of state check:
-----------------------------------------------------------
find data type for 0x28(reg1) at hrtimer_reprogram+0x88
CU for kernel/time/hrtimer.c (die:0x18f219f)
frame base: cfa=1 fbreg=7
... (here)
vvvvvvvvvv
chk [88] reg1 offset=0x28 ok=1 kind=4 percpu ptr : Good!
found by insn track: 0x28(reg1) type-offset=0x28
final type: type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes it matches a variable in the inner scope but it fails because
the actual access can be on a different type. Let's try variables in
every scope and choose the best one using is_better_type().
I have an example with update_blocked_averages(), at first it found a
variable (__mptr) but it's a void pointer. So it moved on to the upper
scope and found another variable (cfs_rq).
$ perf --debug type-profile annotate --data-type --stdio
...
-----------------------------------------------------------
find data type for 0x140(reg14) at update_blocked_averages+0x2db
CU for kernel/sched/fair.c (die:0x12dd892)
frame base: cfa=1 fbreg=7
found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer
variable location: base=reg14, offset=0x140
type='void*' size=0x8 (die:0x12dd8f9)
found "cfs_rq" (die: 0x1301721) in scope=3/4 (die: 0x130171c) type_offset=0x140
variable location: reg14
type='struct cfs_rq' size=0x1c0 (die:0x12e37e5)
final type: type='struct cfs_rq' size=0x1c0 (die:0x12e37e5)
IIUC the scope is like below:
1: update_blocked_averages
2: __update_blocked_fair
3: for_each_leaf_cfs_rq_safe
4: list_entry -> (container_of)
The container_of is implemented like:
#define container_of(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
static_assert(__same_type(*(ptr), ((type *)0)->member) || \
__same_type(*(ptr), void), \
"pointer type mismatch in container_of()"); \
((type *)(__mptr - offsetof(type, member))); })
That's why we see the __mptr variable first but it failed since it has
no type information.
Then for_each_leaf_cfs_rq_safe() is defined as
#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \
list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list, \
leaf_cfs_rq_list)
Note that the access was 0x140(r14). And the cfs_rq has
leaf_cfs_rq_list at the 0x140. So it converts the list_head pointer to
a pointer to struct cfs_rq here.
$ pahole --hex -C cfs_rq vmlinux | grep 140
struct cfs_rq struct list_head leaf_cfs_rq_list; /* 0x140 0x10 */
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes more than one variables are located in the same register or a
stack slot. Or it can overwrite existing information with others. I
found this is not helpful in some cases so it needs to update the type
information from the variable only if it's better.
But it's hard to know which one is better, so we needs heuristics. :)
As it deals with memory accesses, the location should have a pointer or
something similar (like array or reference). So if it had an integer
type and a variable is a pointer, we can take the variable's type to
resolve the target of the access.
If it has a pointer type and a variable with the same location has a
different pointer type, it'll take one with bigger target type. This
can be useful when the target type embeds a smaller type (like list
header or RB-tree node) at the beginning so their location is same.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can show a proper debug message in the right place. The
check_variable() is used in other places which don't want to print the
message.
$ perf --debug type-profile annotate --data-type
Before:
-----------------------------------------------------------
find data type for 0x140(reg14) at update_blocked_averages+0x2db
CU for kernel/sched/fair.c (die:0x12dd892)
frame base: cfa=1 fbreg=7
no pointer or no type <<<--- removed
check variable "__mptr" failed (die: 0x13022f1)
variable location: base=reg14, offset=0x140
type='void*' size=0x8 (die:0x12dd8f9)
After:
-----------------------------------------------------------
find data type for 0x140(reg14) at update_blocked_averages+0x2db
CU for kernel/sched/fair.c (die:0x12dd892)
frame base: cfa=1 fbreg=7
found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer <<<--- here
variable location: base=reg14, offset=0x140
type='void*' size=0x8 (die:0x12dd8f9)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The location list will have entries with half-open addressing like
[start, end) which means it doesn't include the end address. So it
should skip entries at the end address and match to the next entry.
An example location list looks like this (from readelf -wo):
00237876 ffffffff8110d32b (base address)
0023787f v000000000000000 v000000000000002 views at 00237868 for:
ffffffff8110d32b ffffffff8110d4eb (DW_OP_reg3 (rbx)) <<<--- 1
00237885 v000000000000002 v000000000000000 views at 0023786a for:
ffffffff8110d4eb ffffffff8110d50b (DW_OP_reg14 (r14)) <<<--- 2
0023788c v000000000000000 v000000000000001 views at 0023786c for:
ffffffff8110d50b ffffffff8110d7c4 (DW_OP_reg3 (rbx))
00237893 v000000000000000 v000000000000000 views at 0023786e for:
ffffffff8110d806 ffffffff8110d854 (DW_OP_reg3 (rbx))
0023789a v000000000000000 v000000000000000 views at 00237870 for:
ffffffff8110d876 ffffffff8110d88e (DW_OP_reg3 (rbx))
The first entry at 0023787f has [8110d32b, 8110d4eb) (omitting the
ffffffff at the beginning), and the second one has [8110d4eb, 8110d50b).
Fixes: 2bc3cf575a ("perf annotate-data: Improve debug message with location info")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The die_get_typename() would resolve typedef and get to the original
type. But sometimes the original type is a struct without name and it
makes the output confusing and hard to read.
This is a diff of perf report -s type before and after the change.
New types such as atomic{,64}_t and sigset_t appeared and the portion
of unnamed struct was reduced. Also u32, u64 and size_t were splitted
from the base types.
--- b 2024-08-01 17:02:34.307809952 -0700
+++ a 2024-08-07 14:17:05.245853999 -0700
- 2.40% long unsigned int
+ 2.26% long unsigned int
- 1.56% unsigned int
+ 1.27% unsigned int
- 0.98% struct
- 0.79% long long unsigned int
+ 0.58% long long unsigned int
+ 0.36% struct
+ 0.27% atomic64_t
+ 0.22% u32
+ 0.21% u64
+ 0.19% atomic_t
+ 0.13% size_t
- 0.08% struct seqcount_spinlock
+ 0.08% seqcount_spinlock_t
+ 0.08% sigset_t
+ 0.08% __poll_t
Let's use the typedef name directly and the resolved to get the size of
the type.
Committer testing:
root@x1:~# diff -u before after | head -30
--- before 2024-08-08 09:35:13.917325041 -0300
+++ after 2024-08-08 09:37:35.312257905 -0300
@@ -10,25 +10,27 @@
# ........ .........
#
79.40% (unknown)
- 2.28% union
1.96% (stack operation)
- 1.24% struct
+ 1.87% pthread_mutex_t
0.99% u32[]
- 0.92% unsigned int
0.77% struct task_struct
+ 0.75% U32
0.75% struct pcpu_hot
0.63% struct qspinlock
+ 0.61% atomic_t
0.59% struct list_head
- 0.58% int
0.53% struct cfs_rq
0.51% BYTE*
- 0.48% unsigned char
+ 0.48% BYTE
0.48% long unsigned int
0.46% struct rq
0.41% struct worker
0.41% struct memcg_vmstats_percpu
+ 0.41% pthread_cond_t
0.37% _Bool
+ 0.36% int
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In find_data_type(), it creates and deletes a debug info whenver it
tries to find data type for a sample. This is inefficient and it most
likely accesses the same binary again and again.
Let's add a single entry cache the debug info structure for the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.
Note that this will introduce a little difference in the output due to
the order of checking stack operations. It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check. So it'll report stack operations in DSOs without debug
info as unknown. But I think it's ok and better to have the checking
near the caching logic.
Committer testing:
root@x1:~# perf mem record -a sleep 5s
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~# diff -u before after
--- before 2024-08-08 09:33:53.880780784 -0300
+++ after 2024-08-08 09:35:13.917325041 -0300
@@ -81,8 +81,8 @@
# Overhead Data Type
# ........ .........
#
- 55.43% (unknown)
- 11.61% (stack operation)
+ 55.56% (unknown)
+ 11.48% (stack operation)
4.93% struct pcpu_hot
3.26% unsigned int
2.48% struct
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I sometimes see ("unknown type") in the result and it was because it
didn't check the type of stack variables properly during the instruction
tracking. The stack can carry constant values (without type info) and
if the target instruction is accessing the stack location, it resulted
in the "unknown type".
Maybe we could pick one of integer types for the constant, but it
doesn't really mean anything useful. Let's just drop the stack slot if
it doesn't have a valid type info.
Here's an example how it got the unknown type.
Note that 0xffffff48 = -0xb8.
-----------------------------------------------------------
find data type for 0xffffff48(reg6) at ...
CU for ...
frame base: cfa=0 fbreg=6
scope: [2/2] (die:11cb97f)
bb: [37 - 3a]
var [37] reg15 type='int' size=0x4 (die:0x1180633)
bb: [40 - 4b]
mov [40] imm=0x1 -> reg13
var [45] reg8 type='sigset_t*' size=0x8 (die:0x11a39ee)
mov [45] imm=0x1 -> reg2 <--- here reg2 has a constant
bb: [215 - 237]
mov [218] reg2 -> -0xb8(stack) constant <--- and save it to the stack
mov [225] reg13 -> -0xc4(stack) constant
call [22f] find_task_by_vgpid
call [22f] return -> reg0 type='struct task_struct*' size=0x8 (die:0x11881e8)
bb: [5c8 - 5cf]
bb: [2fb - 302]
mov [2fb] -0xc4(stack) -> reg13 constant
bb: [13b - 14d]
mov [143] 0xd50(reg3) -> reg5 type='struct task_struct*' size=0x8 (die:0xa31f3c)
bb: [153 - 153]
chk [153] reg6 offset=0xffffff48 ok=0 kind=0 fbreg <--- access here
found by insn track: 0xffffff48(reg6) type-offset=0
type='G<EF>^K<F6><AF>U' size=0 (die:0xffffffffffffffff)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following instruction pattern is used to access a global variable.
mov $0x231c0, %rax
movsql %edi, %rcx
mov -0x7dc94ae0(,%rcx,8), %rcx
cmpl $0x0, 0xa60(%rcx,%rax,1) <<<--- here
The first instruction set the address of the per-cpu variable (here, it
is 'runqueues' of type 'struct rq'). The second instruction seems like
a cpu number of the per-cpu base. The third instruction get the base
offset of per-cpu area for that cpu. The last instruction compares the
value of the per-cpu variable at the offset of 0xa60.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Like per-cpu base offset array, sometimes it accesses the global
variable directly using the offset. Allow this type of instructions as
long as it finds a global variable for the address.
movslq %edi, %rcx
mov -0x7dc94ae0(,%rcx,8), %rcx <<<--- here
As %rcx has a valid type (i.e. array index) from the first instruction,
it will be checked by the first case in check_matching_type(). But as
it's not a pointer type, the match will fail. But in this case, it
should check if it accesses the kernel global array variable.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In some cases, the stack pointer on x86 (rsp = reg7) is used to point
variables on stack but it's not the frame base register. Then it
should handle the register like normal registers (IOW not to access
the other stack variables using offset calculation) but it should not
assume it would have a pointer.
Before:
-----------------------------------------------------------
find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
CU for net/ipv4/tcp.c (die:0x7b5f516)
frame base: cfa=0 fbreg=6
no pointer or no type
check variable "zc" failed (die: 0x7b9580a)
variable location: base=reg7, offset=0x40
type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)
After:
-----------------------------------------------------------
find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
CU for net/ipv4/tcp.c (die:0x7b5f516)
frame base: cfa=0 fbreg=6
found "zc" in scope=3/3 (die: 0x7b957fc) type_offset=0x3c
variable location: base=reg7, offset=0x40
type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)
Note that the type-offset was properly calculated to 0x3c as the
variable starts at 0x40.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To verify it found the correct variable, let's add the location
expression to the debug message.
$ perf --debug type-profile annotate --data-type
...
-----------------------------------------------------------
find data type for 0xaf0(reg15) at schedule+0xeb
CU for kernel/sched/core.c (die:0x1180523)
frame base: cfa=0 fbreg=6
found "rq" in scope=3/4 (die: 0x11b6a00) type_offset=0xaf0
variable location: reg15
type='struct rq' size=0xfc0 (die:0x11892e2)
-----------------------------------------------------------
find data type for 0x7bc(reg3) at tcp_get_info+0x62
CU for net/ipv4/tcp.c (die:0x7b5f516)
frame base: cfa=0 fbreg=6
offset: 1980 is bigger than size: 760
check variable "sk" failed (die: 0x7b92b2c)
variable location: reg3
type='struct sock' size=0x2f8 (die:0x7b63c3a)
-----------------------------------------------------------
...
The first case is fine. It looked up a data type in r15 with offset of
0xaf0 at schedule+0xeb. It found the CU die and the frame base info and
the variable "rq" was found in the scope 3/4. Its location is the r15
register and the type size is 0xfc0 which includes 0xaf0.
But the second case is not good. It looked up a data type in rbx (reg3)
with offset 0x7bc. It found a CU and the frame base which is good so
far. And it also found a variable "sk" but the access offset is bigger
than the type size (1980 vs. 760 or 0x7bc vs. 0x2f8). The variable has
the right location (reg3) but I need to figure out why it accesses
beyond what it's supposed to.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-2-namhyung@kernel.org
[ Fix the build on 32-bit by casting Dwarf_Word to (long) in pr_debug_location() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Support data type profiling output on TUI.
Testing from Arnaldo:
First make sure that the debug information for your workload binaries
in embedded in them by building it with '-g' or install the debuginfo
packages, since our workload is 'find':
root@number:~# type find
find is hashed (/usr/bin/find)
root@number:~# rpm -qf /usr/bin/find
findutils-4.9.0-5.fc39.x86_64
root@number:~# dnf debuginfo-install findutils
<SNIP>
root@number:~#
Then collect some data:
root@number:~# echo 1 > /proc/sys/vm/drop_caches
root@number:~# perf mem record find / > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ]
root@number:~#
Finally do data-type annotation with the following command, that will
default, as 'perf report' to the --tui mode, with lines colored to
highlight the hotspots, etc.
root@number:~# perf annotate --data-type
Annotate type: 'struct predicate' (58 samples)
Percent Offset Size Field
100.00 0 312 struct predicate {
0.00 0 8 PRED_FUNC pred_func;
0.00 8 8 char* p_name;
0.00 16 4 enum predicate_type p_type;
0.00 20 4 enum predicate_precedence p_prec;
0.00 24 1 _Bool side_effects;
0.00 25 1 _Bool no_default_print;
0.00 26 1 _Bool need_stat;
0.00 27 1 _Bool need_type;
0.00 28 1 _Bool need_inum;
0.00 32 4 enum EvaluationCost p_cost;
0.00 36 4 float est_success_rate;
0.00 40 1 _Bool literal_control_chars;
0.00 41 1 _Bool artificial;
0.00 48 8 char* arg_text;
<SNIP>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For data type profiling, it removed non-instruction lines from the list
of annotation lines. It was to simplify the implementation dealing with
instructions like to calculate the PC-relative address and to search the
shortest path to the target instruction or basic block.
But it means that it removes all the comments and debug information in
the annotate output like source file name and line numbers. To support
both code annotation and data type annotation, it'd be better to keep
the non-instruction lines as well.
So this change is to skip those lines during the data type profiling
and to display them in the normal perf annotate output.
No function changes intended (other than having more lines).
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are different patterns for percpu variable access using a constant
value added to the base.
2aeb: mov -0x7da0f7e0(,%rax,8),%r14 # r14 = __per_cpu_offset[cpu]
2af3: mov $0x34740,%rax # rax = address of runqueues
* 2afa: add %rax,%r14 # r14 = &per_cpu(runqueues, cpu)
2bfd: cmpl $0x0,0x10(%r14) # cpu_rq(cpu)->has_blocked_load
2b03: je 0x2b36
At the first instruction, r14 has the __per_cpu_offset. And then rax
has an immediate value and then added to r14 to calculate the address of
a per-cpu variable. So it needs to track the immediate values and ADD
instructions.
Similar but a little different case is to use "this_cpu_off" instead of
"__per_cpu_offset" for the current CPU. This time the variable address
comes with PC-rel addressing.
89: mov $0x34740,%rax # rax = address of runqueues
* 90: add %gs:0x7f015f60(%rip),%rax # 19a78 <this_cpu_off>
98: incl 0xd8c(%rax) # cpu_rq(cpu)->sched_count
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240319055115.4063940-21-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is to support per-cpu variable access often without a matching
DWARF entry. For some reason, I cannot find debug info of per-cpu
variables sometimes. They have more complex pattern to calculate the
address of per-cpu variables like below.
2b7d: mov -0x1e0(%rbp),%rax ; rax = cpu
2b84: mov -0x7da0f7e0(,%rax,8),%rcx ; rcx = __per_cpu_offset[cpu]
* 2b8c: mov 0x34870(%rcx),%rax ; *(__per_cpu_offset[cpu] + 0x34870)
Let's assume the rax register has a number for a CPU at 2b7d. The next
instruction is to get the per-cpu offset' for that cpu. The offset
-0x7da0f7e0 is 0xffffffff825f0820 in u64 which is the address of the
'__per_cpu_offset' array in my system. So it'd get the actual offset
of that CPU's per-cpu region and save it to the rcx register.
Then, at 2b8c, accesses using rcx can be handled same as the global
variable access. To handle this case, it should check if the offset
of the instruction matches to the address of '__per_cpu_offset'.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240319055115.4063940-20-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>