In final testing of:
1fc502712c.1680954589.git.pengdonglin@sangfor.com.cn/
"function_graph: Support recording and printing the return value of function"
The test failed due to a new warning found in the build:
kernel/trace/fgraph.c:243:56: warning: ‘struct fgraph_ret_regs’ declared inside parameter list will not be visible outside of this definition or declaration
Instead of asking to send another patch series, just add it and then apply
the updates.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull tracing fixes from Steven Rostedt:
- Fix MAINTAINERS file to point to proper mailing list for rtla and rv
The mailing list pointed to linux-trace-devel instead of
linux-trace-kernel. The former is for the tracing libraries and the
latter is for anything in the Linux kernel tree. The wrong mailing
list was used because linux-trace-kernel did not exist when rtla and
rv were created.
- User events:
- Fix matching of dynamic events to their user events
When user writes to dynamic_events file, a lookup of the
registered dynamic events is made, but there were some cases that
a match could be incorrectly made.
- Add auto cleanup of user events
Have the user events automatically get removed when the last
reference (file descriptor) is closed. This was asked for to
prevent leaks of user events hanging around needing admins to
clean them up.
- Add persistent logic (but not let user space use it yet)
In some cases, having a persistent user event (one that does not
get cleaned up automatically) is useful. But there's still debates
about how to expose this to user space. The infrastructure is
added, but the API is not.
- Update the selftests
Update the user event selftests to reflect the above changes"
* tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/user_events: Document auto-cleanup and remove dyn_event refs
selftests/user_events: Adapt dyn_test to non-persist events
selftests/user_events: Ensure auto cleanup works as expected
tracing/user_events: Add auto cleanup and future persist flag
tracing/user_events: Track refcount consistently via put/get
tracing/user_events: Store register flags on events
tracing/user_events: Remove user_ns walk for groups
selftests/user_events: Add perf self-test for empty arguments events
selftests/user_events: Clear the events after perf self-test
selftests/user_events: Add ftrace self-test for empty arguments events
tracing/user_events: Fix the incorrect trace record for empty arguments events
tracing: Modify print_fields() for fields output order
tracing/user_events: Handle matching arguments that is null from dyn_events
tracing/user_events: Prevent same name but different args event
tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
Currently user events need to be manually deleted via the delete IOCTL
call or via the dynamic_events file. Most operators and processes wish
to have these events auto cleanup when they are no longer used by
anything to prevent them piling without manual maintenance. However,
some operators may not want this, such as pre-registering events via the
dynamic_events tracefs file.
Update user_event_put() to attempt an auto delete of the event if it's
the last reference. The auto delete must run in a work queue to ensure
proper behavior of class->reg() invocations that don't expect the call
to go away from underneath them during the unregister. Add work_struct
to user_event struct to ensure we can do this reliably.
Add a persist flag, that is not yet exposed, to ensure we can toggle
between auto-cleanup and leaving the events existing in the future. When
a non-zero flag is seen during register, return -EINVAL to ensure ABI
is clear for the user processes while we work out the best approach for
persistent events.
Link: https://lkml.kernel.org/r/20230614163336.5797-4-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/20230518093600.3f119d68@rorschach.local.home/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Various parts of the code today track user_event's refcnt field directly
via a refcount_add/dec. This makes it hard to modify the behavior of the
last reference decrement in all code paths consistently. For example, in
the future we will auto-delete events upon the last reference going
away. This last reference could happen in many places, but we want it to
be consistently handled.
Add user_event_get() and user_event_put() for the add/dec. Update all
places where direct refcounts are being used to utilize these new
functions. In each location pass if event_mutex is locked or not. This
allows us to drop events automatically in future patches clearly. Ensure
when caller states the lock is held, it really is (or is not) held.
Link: https://lkml.kernel.org/r/20230614163336.5797-3-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently we don't have any available flags for user processes to use to
indicate options for user_events. We will soon have a flag to indicate
the event should or should not auto-delete once it's not being used by
anyone.
Add a reg_flags field to user_events and parameters to existing
functions to allow for this in future patches.
Link: https://lkml.kernel.org/r/20230614163336.5797-2-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/sch_taprio.c
d636fc5dd6 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
dced11ef84 ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()")
net/ipv4/sysctl_net_ipv4.c
e209fee411 ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294")
ccce324dab ("tcp: make the first N SYN RTO backoffs linear")
https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-07
We've added 7 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 112 insertions(+), 7 deletions(-).
The main changes are:
1) Fix a use-after-free in BPF's task local storage, from KP Singh.
2) Make struct path handling more robust in bpf_d_path, from Jiri Olsa.
3) Fix a syzbot NULL-pointer dereference in sockmap, from Eric Dumazet.
4) UAPI fix for BPF_NETFILTER before final kernel ships,
from Florian Westphal.
5) Fix map-in-map array_map_gen_lookup code generation where elem_size was
not being set for inner maps, from Rhys Rustad-Elliott.
6) Fix sockopt_sk selftest's NETLINK_LIST_MEMBERSHIPS assertion,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add extra path pointer check to d_path helper
selftests/bpf: Fix sockopt_sk selftest
bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
selftests/bpf: Add access_inner_map selftest
bpf: Fix elem_size not being set for inner maps
bpf: Fix UAF in task local storage
bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
====================
Link: https://lore.kernel.org/r/20230607220514.29698-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Anastasios reported crash on stable 5.15 kernel with following
BPF attached to lsm hook:
SEC("lsm.s/bprm_creds_for_exec")
int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
{
struct path *path = &bprm->executable->f_path;
char p[128] = { 0 };
bpf_d_path(path, p, 128);
return 0;
}
But bprm->executable can be NULL, so bpf_d_path call will crash:
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
...
RIP: 0010:d_path+0x22/0x280
...
Call Trace:
<TASK>
bpf_d_path+0x21/0x60
bpf_prog_db9cf176e84498d9_bprm_creds_for_exec+0x94/0x99
bpf_trampoline_6442506293_0+0x55/0x1000
bpf_lsm_bprm_creds_for_exec+0x5/0x10
security_bprm_creds_for_exec+0x29/0x40
bprm_execve+0x1c1/0x900
do_execveat_common.isra.0+0x1af/0x260
__x64_sys_execve+0x32/0x40
It's problem for all stable trees with bpf_d_path helper, which was
added in 5.9.
This issue is fixed in current bpf code, where we identify and mark
trusted pointers, so the above code would fail even to load.
For the sake of the stable trees and to workaround potentially broken
verifier in the future, adding the code that reads the path object from
the passed pointer and verifies it's valid in kernel space.
Fixes: 6e22ab9da7 ("bpf: Add d_path helper")
Reported-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230606181714.532998-1-jolsa@kernel.org
Check the target function has non-void retval type and set the correct
fetch type if user doesn't specify it.
If the function returns void, $retval is rejected as below;
# echo 'f unregister_kprobes%return $retval' >> dynamic_events
sh: write error: No such file or directory
# cat error_log
[ 37.488397] trace_fprobe: error: This function returns 'void' type
Command: f unregister_kprobes%return $retval
^
Link: https://lore.kernel.org/all/168507476195.913472.16290308831790216609.stgit@mhiramat.roam.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add the '$arg*' meta fetch argument for function-entry probe events. This
will be expanded to the all arguments of the function and the tracepoint
using BTF function argument information.
e.g.
# echo 'p vfs_read $arg*' >> dynamic_events
# echo 'f vfs_write $arg*' >> dynamic_events
# echo 't sched_overutilized_tp $arg*' >> dynamic_events
# cat dynamic_events
p:kprobes/p_vfs_read_0 vfs_read file=file buf=buf count=count pos=pos
f:fprobes/vfs_write__entry vfs_write file=file buf=buf count=count pos=pos
t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized
Also, single '$arg[0-9]*' will be converted to the BTF function argument.
NOTE: This seems like a wildcard, but a fake one at this moment. This
is just for telling user that this can be expanded to several arguments.
And it is not like other $-vars, you can not use this $arg* as a part of
fetch args, e.g. specifying name "foo=$arg*" and using it in dereferences
"+0($arg*)" will lead a parse error.
Link: https://lore.kernel.org/all/168507475126.913472.18329684401466211816.stgit@mhiramat.roam.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Support function or tracepoint parameters by name if BTF support is enabled
and the event is for function entry (this feature can be used with kprobe-
events, fprobe-events and tracepoint probe events.)
Note that the BTF variable syntax does not require a prefix. If it starts
with an alphabetic character or an underscore ('_') without a prefix like
'$' and '%', it is considered as a BTF variable.
If you specify only the BTF variable name, the argument name will also
be the same name instead of 'arg*'.
# echo 'p vfs_read count pos' >> dynamic_events
# echo 'f vfs_write count pos' >> dynamic_events
# echo 't sched_overutilized_tp rd overutilized' >> dynamic_events
# cat dynamic_events
p:kprobes/p_vfs_read_0 vfs_read count=count pos=pos
f:fprobes/vfs_write__entry vfs_write count=count pos=pos
t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized
Link: https://lore.kernel.org/all/168507474014.913472.16963996883278039183.stgit@mhiramat.roam.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Allow fprobe_events to trace raw tracepoints so that user can trace
tracepoints which don't have traceevent wrappers. This new event is
always available if the fprobe_events is enabled (thus no kconfig),
because the fprobe_events depends on the trace-event and traceporint.
e.g.
# echo 't sched_overutilized_tp' >> dynamic_events
# echo 't 9p_client_req' >> dynamic_events
# cat dynamic_events
t:tracepoints/sched_overutilized_tp sched_overutilized_tp
t:tracepoints/_9p_client_req 9p_client_req
The event name is based on the tracepoint name, but if it is started
with digit character, an underscore '_' will be added.
NOTE: to avoid further confusion, this renames TPARG_FL_TPOINT to
TPARG_FL_TEVENT because this flag is used for eprobe (trace-event probe).
And reuse TPARG_FL_TPOINT for this raw tracepoint probe.
Link: https://lore.kernel.org/all/168507471874.913472.17214624519622959593.stgit@mhiramat.roam.corp.google.com/
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305020453.afTJ3VVp-lkp@intel.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add fprobe events for tracing function entry and exit instead of kprobe
events. With this change, we can continue to trace function entry/exit
even if the CONFIG_KPROBES_ON_FTRACE is not available. Since
CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS,
it is not available if the architecture only supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS. And that means kprobe events can not
probe function entry/exit effectively on such architecture.
But this can be solved if the dynamic events supports fprobe events.
The fprobe event is a new dynamic events which is only for the function
(symbol) entry and exit. This event accepts non register fetch arguments
so that user can trace the function arguments and return values.
The fprobe events syntax is here;
f[:[GRP/][EVENT]] FUNCTION [FETCHARGS]
f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS]
E.g.
# echo 'f vfs_read $arg1' >> dynamic_events
# echo 'f vfs_read%return $retval' >> dynamic_events
# cat dynamic_events
f:fprobes/vfs_read__entry vfs_read arg1=$arg1
f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
# echo 1 > events/fprobes/enable
# head -n 20 trace | tail
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
sh-142 [005] ...1. 448.386420: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.386436: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
sh-142 [005] ...1. 448.386451: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.386458: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
sh-142 [005] ...1. 448.386469: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.386476: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
sh-142 [005] ...1. 448.602073: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.602089: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
Link: https://lore.kernel.org/all/168507469754.913472.6112857614708350210.stgit@mhiramat.roam.corp.google.com/
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202302011530.7vm4O8Ro-lkp@intel.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Pull probes fixes from Masami Hiramatsu:
- Return NULL if the trace_probe list on trace_probe_event is empty
- selftests/ftrace: Choose testing symbol name for filtering feature
from sample data instead of fixed symbol
* tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/ftrace: Choose target function for filter test from samples
tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
When all kernel debugging is enabled (lockdep, KSAN, etc), the function
graph enabling and disabling can take several seconds to complete. The
function_graph selftest enables and disables function graph tracing
several times. With full debugging enabled, the soft lockup watchdog was
triggering because the selftest was running without ever scheduling.
Add cond_resched() throughout the test to make sure it does not trigger
the soft lockup detector.
Link: https://lkml.kernel.org/r/20230528051742.1325503-6-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The tracing_selftest_running and tracing_selftest_disabled variables were
to keep trace_printk() and other writes from affecting the tracing
selftests, as the tracing selftests would examine the ring buffer to see
if it contained what it expected or not. trace_printk() and friends could
add to the ring buffer and cause the selftests to fail (and then disable
the tracer that was being tested). To keep that from happening, these
variables were added and would keep trace_printk() and friends from
writing to the ring buffer while the tests were going on.
But this was only the top level ring buffer (owned by the global_trace
instance). There is no reason to prevent writing into ring buffers of
other instances via the trace_array_printk() and friends. For the
functions that could be used by other instances, check if the global_trace
is the tracer instance that is being written to before deciding to not
allow the write.
Link: https://lkml.kernel.org/r/20230528051742.1325503-5-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
As there are more and more internal selftests being added to the Linux
kernel (KSAN, lockdep, etc) the selftests are taking longer to run when
these are enabled. Add a cond_resched() to the calling of
do_run_tracer_selftest() to force a schedule if NEED_RESCHED is set,
otherwise the soft lockup watchdog may trigger on boot up.
Link: https://lkml.kernel.org/r/20230528051742.1325503-3-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The variables tracing_selftest_running and tracing_selftest_disabled are
only used for when CONFIG_FTRACE_STARTUP_TEST is enabled. Make them only
visible within the selftest code. The setting of those variables are in
the register_tracer() call, and set in a location where they do not need
to be. Create a wrapper around run_tracer_selftest() called
do_run_tracer_selftest() which sets those variables, and have
register_tracer() call that instead.
Having those variables only set within the CONFIG_FTRACE_STARTUP_TEST
scope gets rid of them (and also the ability to remove testing against
them) when the startup tests are not enabled (most cases).
Link: https://lkml.kernel.org/r/20230528051742.1325503-2-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The histogram and synthetic events can use a pseudo event called
"stacktrace" that will create a stacktrace at the time of the event and
use it just like it was a normal field. We have other pseudo events such
as "common_cpu" and "common_timestamp". To stay consistent with that,
convert "stacktrace" to "common_stacktrace". As this was used in older
kernels, to keep backward compatibility, this will act just like
"common_cpu" did with "cpu". That is, "cpu" will be the same as
"common_cpu" unless the event has a "cpu" field. In which case, the
event's field is used. The same is true with "stacktrace".
Also update the documentation to reflect this change.
Link: https://lore.kernel.org/linux-trace-kernel/20230523230913.6860e28d@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Modifiers are used to change the behavior of keys. For instance, they
can grouped into buckets, converted to syscall names (from the syscall
identifier), show task->comm of the current pid, be an array of longs
that represent a stacktrace, and more.
It was found that nothing stopped a value from taking a modifier. As
values are simple counters. If this happened, it would call code that
was not expecting a modifier and crash the kernel. This was fixed by
having the ___create_val_field() function test if a modifier was present
and fail if one was. This fixed the crash.
Now there's a problem with variables. Variables are used to pass fields
from one event to another. Variables are allowed to have some modifiers,
as the processing may need to happen at the time of the event (like
stacktraces and comm names of the current pid). The issue is that it too
uses __create_val_field(). Now that fails on modifiers, variables can no
longer use them (this is a regression).
As not all modifiers are for variables, have them use a separate check.
Link: https://lore.kernel.org/linux-trace-kernel/20230523221108.064a5d82@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: e0213434fe ("tracing: Do not let histogram values have some modifiers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
While testing rtla timerlat auto analysis, I reach a condition where
the interface was not receiving tracing data. I was able to manually
reproduce the problem with these steps:
# echo 0 > tracing_on # disable trace
# echo 1 > osnoise/stop_tracing_us # stop trace if timerlat irq > 1 us
# echo timerlat > current_tracer # enable timerlat tracer
# sleep 1 # wait... that is the time when rtla
# apply configs like prio or cgroup
# echo 1 > tracing_on # start tracing
# cat trace
# tracer: timerlat
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# ||||| ACTIVATION
# TASK-PID CPU# ||||| TIMESTAMP ID CONTEXT LATENCY
# | | | ||||| | | | |
NOTHING!
Then, trying to enable tracing again with echo 1 > tracing_on resulted
in no change: the trace was still not tracing.
This problem happens because the timerlat IRQ hits the stop tracing
condition while tracing is off, and do not wake up the timerlat thread,
so the timerlat threads are kept sleeping forever, resulting in no
trace, even after re-enabling the tracer.
Avoid this condition by always waking up the threads, even after stopping
tracing, allowing the tracer to return to its normal operating after
a new tracing on.
Link: https://lore.kernel.org/linux-trace-kernel/1ed8f830638b20a39d535d27d908e319a9a3c4e2.1683822622.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: stable@vger.kernel.org
Fixes: a955d7eac1 ("trace: Add timerlat tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
fprobe_hander and fprobe_kprobe_handler has guarded ftrace recursion
detection but fprobe_exit_handler has not, which possibly introduce
recursive calls if the fprobe exit callback calls any traceable
functions. Checking in fprobe_hander or fprobe_kprobe_handler
is not enough and misses this case.
So add recursion free guard the same way as fprobe_hander. Since
ftrace recursion check does not employ ip(s), so here use entry_ip and
entry_parent_ip the same as fprobe_handler.
Link: https://lore.kernel.org/all/20230517034510.15639-4-zegao@tencent.com/
Fixes: 5b0ab78998 ("fprobe: Add exit_handler support")
Signed-off-by: Ze Gao <zegao@tencent.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Current implementation calls kprobe related functions before doing
ftrace recursion check in fprobe_kprobe_handler, which opens door
to kernel crash due to stack recursion if preempt_count_{add, sub}
is traceable in kprobe_busy_{begin, end}.
Things goes like this without this patch quoted from Steven:
"
fprobe_kprobe_handler() {
kprobe_busy_begin() {
preempt_disable() {
preempt_count_add() { <-- trace
fprobe_kprobe_handler() {
[ wash, rinse, repeat, CRASH!!! ]
"
By refactoring the common part out of fprobe_kprobe_handler and
fprobe_handler and call ftrace recursion detection at the very beginning,
the whole fprobe_kprobe_handler is free from recursion.
[ Fix the indentation of __fprobe_handler() parameters. ]
Link: https://lore.kernel.org/all/20230517034510.15639-3-zegao@tencent.com/
Fixes: ab51e15d53 ("fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobe")
Signed-off-by: Ze Gao <zegao@tencent.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
The commit 39d954200b ("fprobe: Skip exit_handler if entry_handler returns
!0") introduced a hidden dependency of 'ret' local variable in the
fprobe_handler(), Smatch warns the `ret` can be accessed without
initialization.
kernel/trace/fprobe.c:59 fprobe_handler()
error: uninitialized symbol 'ret'.
kernel/trace/fprobe.c
49 fpr->entry_ip = ip;
50 if (fp->entry_data_size)
51 entry_data = fpr->data;
52 }
53
54 if (fp->entry_handler)
55 ret = fp->entry_handler(fp, ip, ftrace_get_regs(fregs), entry_data);
ret is only initialized if there is an ->entry_handler
56
57 /* If entry_handler returns !0, nmissed is not counted. */
58 if (rh) {
rh is only true if there is an ->exit_handler. Presumably if you have
and ->exit_handler that means you also have a ->entry_handler but Smatch
is not smart enough to figure it out.
--> 59 if (ret)
^^^
Warning here.
60 rethook_recycle(rh);
61 else
62 rethook_hook(rh, ftrace_get_regs(fregs), true);
63 }
64 out:
65 ftrace_test_recursion_unlock(bit);
66 }
Link: https://lore.kernel.org/all/168100731160.79534.374827110083836722.stgit@devnote2/
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/85429a5c-a4b9-499e-b6c0-cbd313291c49@kili.mountain
Fixes: 39d954200b ("fprobe: Skip exit_handler if entry_handler returns !0")
Acked-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-05-16
We've added 57 non-merge commits during the last 19 day(s) which contain
a total of 63 files changed, 3293 insertions(+), 690 deletions(-).
The main changes are:
1) Add precision propagation to verifier for subprogs and callbacks,
from Andrii Nakryiko.
2) Improve BPF's {g,s}setsockopt() handling with wrong option lengths,
from Stanislav Fomichev.
3) Utilize pahole v1.25 for the kernel's BTF generation to filter out
inconsistent function prototypes, from Alan Maguire.
4) Various dyn-pointer verifier improvements to relax restrictions,
from Daniel Rosenberg.
5) Add a new bpf_task_under_cgroup() kfunc for designated task,
from Feng Zhou.
6) Unblock tests for arm64 BPF CI after ftrace supporting direct call,
from Florent Revest.
7) Add XDP hint kfunc metadata for RX hash/timestamp for igc,
from Jesper Dangaard Brouer.
8) Add several new dyn-pointer kfuncs to ease their usability,
from Joanne Koong.
9) Add in-depth LRU internals description and dot function graph,
from Joe Stringer.
10) Fix KCSAN report on bpf_lru_list when accessing node->ref,
from Martin KaFai Lau.
11) Only dump unprivileged_bpf_disabled log warning upon write,
from Kui-Feng Lee.
12) Extend test_progs to directly passing allow/denylist file,
from Stephen Veiss.
13) Fix BPF trampoline memleak upon failure attaching to fentry,
from Yafang Shao.
14) Fix emitting struct bpf_tcp_sock type in vmlinux BTF,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (57 commits)
bpf: Fix memleak due to fentry attach failure
bpf: Remove bpf trampoline selector
bpf, arm64: Support struct arguments in the BPF trampoline
bpftool: JIT limited misreported as negative value on aarch64
bpf: fix calculation of subseq_idx during precision backtracking
bpf: Remove anonymous union in bpf_kfunc_call_arg_meta
bpf: Document EFAULT changes for sockopt
selftests/bpf: Correctly handle optlen > 4096
selftests/bpf: Update EFAULT {g,s}etsockopt selftests
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
libbpf: fix offsetof() and container_of() to work with CO-RE
bpf: Address KCSAN report on bpf_lru_list
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
selftests/bpf: Accept mem from dynptr in helper funcs
bpf: verifier: Accept dynptr mem as mem in helpers
selftests/bpf: Check overflow in optional buffer
selftests/bpf: Test allowing NULL buffer in dynptr slice
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Add testcase for bpf_task_under_cgroup
bpf: Add bpf_task_under_cgroup() kfunc
...
====================
Link: https://lore.kernel.org/r/20230515225603.27027-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull more tracing updates from Steven Rostedt:
- Make buffer_percent read/write.
The buffer_percent file is how users can state how long to block on
the tracing buffer depending on how much is in the buffer. When it
hits the "buffer_percent" it will wake the task waiting on the
buffer. For some reason it was set to read-only.
This was not noticed because testing was done as root without
SELinux, but with SELinux it will prevent even root to write to it
without having CAP_DAC_OVERRIDE.
- The "touched_functions" was added this merge window, but one of the
reasons for adding it was not implemented.
That was to show what functions were not only touched, but had either
a direct trampoline attached to it, or a kprobe or live kernel
patching that can "hijack" the function to run a different function.
The point is to know if there's functions in the kernel that may not
be behaving as the kernel code shows. This can be used for debugging.
TODO: Add this information to kernel oops too.
* tag 'trace-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attached
tracing: Fix permissions for the buffer_percent file
If a function had ever had IPMODIFY or DIRECT attached to it, where this
is how live kernel patching and BPF overrides work, mark them and display
an "M" in the enabled_functions and touched_functions files. This can be
used for debugging. If a function had been modified and later there's a bug
in the code related to that function, this can be used to know if the cause
is possibly from a live kernel patch or a BPF program that changed the
behavior of the code.
Also update the documentation on the enabled_functions and
touched_functions output, as it was missing direct callers and CALL_OPS.
And include this new modify attribute.
Link: https://lore.kernel.org/linux-trace-kernel/20230502213233.004e3ae4@gandalf.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>