linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-04 21:42:31 -04:00

Author	SHA1	Message	Date
pengdonglin	f83ac7544f	function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously Currently, the funcgraph-args and funcgraph-retaddr features are mutually exclusive. This patch resolves this limitation by allowing funcgraph-retaddr to have an args array. To verify the change, use perf to trace vfs_write with both options enabled: Before: # perf ftrace -G vfs_write --graph-opts args,retaddr ...... down_read() { /* <-n_tty_write+0xa3/0x540 / __cond_resched(); / <-down_read+0x12/0x160 / preempt_count_add(); / <-down_read+0x3b/0x160 / preempt_count_sub(); / <-down_read+0x8b/0x160 / } After: # perf ftrace -G vfs_write --graph-opts args,retaddr ...... down_read(sem=0xffff8880100bea78) { / <-n_tty_write+0xa3/0x540 / __cond_resched(); / <-down_read+0x12/0x160 / preempt_count_add(val=1); / <-down_read+0x3b/0x160 / preempt_count_sub(val=1); / <-down_read+0x8b/0x160 */ } Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Xiaoqin Zhang <zhangxiaoqin@xiaomi.com> Link: https://patch.msgid.link/20251125093425.2563849-1-dolinux.peng@gmail.com Signed-off-by: pengdonglin <pengdonglin@xiaomi.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:30 -05:00
Masami Hiramatsu (Google)	20e7168326	tracing: Add boot-time backup of persistent ring buffer Currently, the persistent ring buffer instance needs to be read before using it. This means we have to wait for boot up user space and dump the persistent ring buffer. However, in that case we can not start tracing on it from the kernel cmdline. To solve this limitation, this adds an option which allows to create a trace instance as a backup of the persistent ring buffer at boot. If user specifies trace_instance=<BACKUP>=<PERSIST_RB> then the <BACKUP> instance is made as a copy of the <PERSIST_RB> instance. For example, the below kernel cmdline records all syscalls, scheduler and interrupt events on the persistent ring buffer `boot_map` but before starting the tracing, it makes a `backup` instance from the `boot_map`. Thus, the `backup` instance has the previous boot events. 'reserve_mem=12M:4M:trace trace_instance=boot_map@trace,syscalls:,sched:,irq:* trace_instance=backup=boot_map' As you can see, this just make a copy of entire reserved area and make a backup instance on it. So you can release (or shrink) the backup instance after use it to save the memory usage. /sys/kernel/tracing/instances # free total used free shared buff/cache available Mem: 1999284 55704 1930520 10132 13060 `1914628` Swap: 0 0 0 /sys/kernel/tracing/instances # rmdir backup/ /sys/kernel/tracing/instances # free total used free shared buff/cache available Mem: 1999284 40640 `1945584` 10132 13060 1929692 Swap: 0 0 0 Note: since there is no reason to make a copy of empty buffer, this backup only accepts a persistent ring buffer as the original instance. Also, since this backup is based on vmalloc(), it does not support user-space mmap(). Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/176377150002.219692.9425536150438129267.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:30 -05:00
Steven Rostedt	f93a7d0cac	ftrace: Allow tracing of some of the tracing code There is times when tracing the tracing infrastructure can be useful for debugging the tracing code. Currently all files in the tracing directory are set to "notrace" the functions. Add a new config option FUNCTION_SELF_TRACING that will allow some of the files in the tracing infrastructure to be traced. It requires a config to enable because it will add noise to the function tracer if events and other tracing features are enabled. Tracing functions and events together is quite common, so not tracing the event code should be the default. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251120181514.736f2d5f@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:30 -05:00
Steven Rostedt	400ddf1dbe	tracing: Use strim() in trigger_process_regex() instead of skip_spaces() The function trigger_process_regex() is called by a few functions, where only one calls strim() on the buffer passed to it. That leaves the other functions not trimming the end of the buffer passed in and making it a little inconsistent. Remove the strim() from event_trigger_regex_write() and have trigger_process_regex() use strim() instead of skip_spaces(). The buff variable is not passed in as const, so it can be modified. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251125214032.323747707@kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:30 -05:00
Steven Rostedt	61d445af0a	tracing: Add bulk garbage collection of freeing event_trigger_data The event trigger data requires a full tracepoint_synchronize_unregister() call before freeing. That call can take 100s of milliseconds to complete. In order to allow for bulk freeing of the trigger data, it can not call the tracepoint_synchronize_unregister() for every individual trigger data being free. Create a kthread that gets created the first time a trigger data is freed, and have it use the lockless llist to get the list of data to free, run the tracepoint_synchronize_unregister() then free everything in the list. By freeing hundreds of event_trigger_data elements together, it only requires two runs of the synchronization function, and not hundreds of runs. This speeds up the operation by orders of magnitude (milliseconds instead of several seconds). Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251125214032.151674992@kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:30 -05:00
Steven Rostedt	78c7051394	tracing: Remove unneeded event_mutex lock in event_trigger_regex_release() In event_trigger_regex_release(), the only code is: mutex_lock(&event_mutex); if (file->f_mode & FMODE_READ) seq_release(inode, file); mutex_unlock(&event_mutex); return 0; There's nothing special about the file->f_mode or the seq_release() that requires any locking. Remove the unnecessary locks. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251125214031.975879283@kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:29 -05:00
Steven Rostedt	b052d70f7c	tracing: Merge struct event_trigger_ops into struct event_command Now that there's pretty much a one to one mapping between the struct event_trigger_ops and struct event_command, there's no reason to have two different structures. Merge the function pointers of event_trigger_ops into event_command. There's one exception in trace_events_hist.c for the event_hist_trigger_named_ops. This has special logic for the init and free function pointers for "named histograms". In this case, allocate the cmd_ops of the event_trigger_data and set it to the proper init and free functions, which are used to initialize and free the event_trigger_data respectively. Have the free function and the init function (on failure) free the cmd_ops of the data element. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251125200932.446322765@kernel.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:29 -05:00
Steven Rostedt	bdafb4d4cb	tracing: Remove get_trigger_ops() and add count_func() from trigger ops The struct event_command has a callback function called get_trigger_ops(). This callback returns the "trigger_ops" to use for the trigger. These ops define the trigger function, how to init the trigger, how to print the trigger and how to free it. The only reason there's a callback function to get these ops is because some triggers have two types of operations. One is an "always on" operation, and the other is a "count down" operation. If a user passes in a parameter to say how many times the trigger should execute. For example: echo stacktrace:5 > events/kmem/kmem_cache_alloc/trigger It will trigger the stacktrace for the first 5 times the kmem_cache_alloc event is hit. Instead of having two different trigger_ops since the only difference between them is the tigger itself (the print, init and free functions are all the same), just use a single ops that the event_command points to and add a function field to the trigger_ops to have a count_func. When a trigger is added to an event, if there's a count attached to it and the trigger ops has the count_func field, the data allocated to represent this trigger will have a new flag set called COUNT. Then when the trigger executes, it will check if the COUNT data flag is set, and if so, it will call the ops count_func(). If that returns false, it returns without executing the trigger. This removes the need for duplicate event_trigger_ops structures. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251125200932.274566147@kernel.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:29 -05:00
Masami Hiramatsu (Google)	23c0e9cc76	tracing: Show the tracer options in boot-time created instance Since tracer_init_tracefs_work_func() only updates the tracer options for the global_trace, the instances created by the kernel cmdline do not have those options. Fix to update tracer options for those boot-time created instances to show those options. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/176354112555.2356172.3989277078358802353.stgit@mhiramat.tok.corp.google.com Fixes: `428add559b` ("tracing: Have tracer option be instance specific") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:29 -05:00
Menglong Dong	7a6735cc9b	ftrace: Avoid redundant initialization in register_ftrace_direct The FTRACE_OPS_FL_INITIALIZED flag is cleared in register_ftrace_direct, which can make it initialized by ftrace_ops_init() even if it is already initialized. It seems that there is no big deal here, but let's still fix it. Link: https://patch.msgid.link/20251110121808.1559240-1-dongml2@chinatelecom.cn Fixes: `f64dd4627e` ("ftrace: Add multi direct register/unregister interface") Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:28 -05:00
Steven Rostedt	49c1364c7c	tracing: Remove unused variable in tracing_trace_options_show() The flags and opts used in tracing_trace_options_show() now come directly from the trace array "current_trace_flags" and not the current_trace. The variable "trace" was still being assigned to tr->current_trace but never used. This caused a warning in clang. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251117120637.43ef995d@gandalf.local.home Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com> Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Closes: https://lore.kernel.org/all/aRtHWXzYa8ijUIDa@black.igk.intel.com/ Fixes: `428add559b` ("tracing: Have tracer option be instance specific") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:28 -05:00
Steven Rostedt	ac87b220a6	fgraph: Make fgraph_no_sleep_time signed The variable fgraph_no_sleep_time changed from being a boolean to being a counter. A check is made to make sure that it never goes below zero. But the variable being unsigned makes the check always fail even if it does go below zero. Make the variable a signed int so that checking it going below zero still works. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251125104751.4c9c7f28@gandalf.local.home Fixes: `5abb6ccb58` ("tracing: Have function graph tracer option sleep-time be per instance") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/aR1yRQxDmlfLZzoo@stanley.mountain/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-26 15:13:28 -05:00
Steven Rostedt	bc089c4725	tracing: Convert function graph set_flags() to use a switch() statement Currently the set_flags() of the function graph tracer has a bunch of: if (bit == FLAG1) { [..] } if (bit == FLAG2) { [..] } To clean it up a bit, convert it over to a switch statement. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251114192319.117123664@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-14 14:30:55 -05:00
Steven Rostedt	5abb6ccb58	tracing: Have function graph tracer option sleep-time be per instance Currently the option to have function graph tracer to ignore time spent when a task is sleeping is global when the interface is per-instance. Changing the value in one instance will affect the results of another instance that is also running the function graph tracer. This can lead to confusing results. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251114192318.950255167@kernel.org Fixes: `c132be2c4f` ("function_graph: Have the instances use their own ftrace_ops for filtering") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-14 14:30:55 -05:00
Steven Rostedt	4132886e1b	tracing: Move graph-time out of function graph options The option "graph-time" affects the function profiler when it is using the function graph infrastructure. It has nothing to do with the function graph tracer itself. The option only affects the global function profiler and does nothing to the function graph tracer. Move it out of the function graph tracer options and make it a global option that is only available at the top level instance. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251114192318.781711154@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-14 14:30:55 -05:00
Steven Rostedt	6479325eca	tracing: Have function graph tracer option funcgraph-irqs be per instance Currently the option to trace interrupts in the function graph tracer is global when the interface is per-instance. Changing the value in one instance will affect the results of another instance that is also running the function graph tracer. This can lead to confusing results. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251114192318.613867934@kernel.org Fixes: `c132be2c4f` ("function_graph: Have the instances use their own ftrace_ops for filtering") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-14 14:30:54 -05:00
Yongliang Gao	97e047f44d	trace/pid_list: optimize pid_list->lock contention When the system has many cores and task switching is frequent, setting set_ftrace_pid can cause frequent pid_list->lock contention and high system sys usage. For example, in a 288-core VM environment, we observed 267 CPUs experiencing contention on pid_list->lock, with stack traces showing: #4 [ffffa6226fb4bc70] native_queued_spin_lock_slowpath at ffffffff99cd4b7e #5 [ffffa6226fb4bc90] _raw_spin_lock_irqsave at ffffffff99cd3e36 #6 [ffffa6226fb4bca0] trace_pid_list_is_set at ffffffff99267554 #7 [ffffa6226fb4bcc0] trace_ignore_this_task at ffffffff9925c288 #8 [ffffa6226fb4bcd8] ftrace_filter_pid_sched_switch_probe at ffffffff99246efe #9 [ffffa6226fb4bcf0] __schedule at ffffffff99ccd161 Replaces the existing spinlock with a seqlock to allow concurrent readers, while maintaining write exclusivity. Link: https://patch.msgid.link/20251113000252.1058144-1-leonylgao@gmail.com Reviewed-by: Huang Cun <cunhuang@tencent.com> Signed-off-by: Yongliang Gao <leonylgao@tencent.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-13 15:15:54 -05:00
Steven Rostedt	e29aa918a9	tracing: Have function graph tracer define options per instance Currently the function graph tracer's options are saved via a global mask when it should be per instance. Use the new infrastructure to define a "default_flags" field in the tracer structure that is used for the top level instance as well as new ones. Currently the global mask causes confusion: # cd /sys/kernel/tracing # mkdir instances/foo # echo function_graph > instances/foo/current_tracer # echo 1 > options/funcgraph-args # echo function_graph > current_tracer # cat trace [..] 2) \| _raw_spin_lock_irq(lock=0xffff96b97dea16c0) { 2) 0.422 us \| do_raw_spin_lock(lock=0xffff96b97dea16c0); 7) \| rcu_sched_clock_irq(user=0) { 2) 1.478 us \| } 7) 0.758 us \| rcu_is_cpu_rrupt_from_idle(); 2) 0.647 us \| enqueue_hrtimer(timer=0xffff96b97dea2058, base=0xffff96b97dea1740, mode=0); # cat instances/foo/options/funcgraph-args 1 # cat instances/foo/trace [..] 4) \| __x64_sys_read() { 4) \| ksys_read() { 4) 0.755 us \| fdget_pos(); 4) \| vfs_read() { 4) \| rw_verify_area() { 4) \| security_file_permission() { 4) \| apparmor_file_permission() { 4) \| common_file_perm() { 4) \| aa_file_perm() { 4) \| rcu_read_lock_held() { [..] The above shows that updating the "funcgraph-args" option at the top level instance also updates the "funcgraph-args" option in the instance but because the update is only done by the instance that gets changed (as it should), it's confusing to see that the option is already set in the other instance. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251111232429.641030027@kernel.org Fixes: `c132be2c4f` ("function_graph: Have the instances use their own ftrace_ops for filtering") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-13 15:08:17 -05:00
Steven Rostedt	76680d0d28	tracing: Have function tracer define options per instance Currently the function tracer's options are saved via a global mask when it should be per instance. Use the new infrastructure to define a "default_flags" field in the tracer structure that is used for the top level instance as well as new ones. Currently the global mask causes confusion: # cd /sys/kernel/tracing # mkdir instances/foo # echo function > instances/foo/current_tracer # echo 1 > options/func-args # echo function > current_tracer # cat trace [..] <idle>-0 [005] d..3. 1050.656187: rcu_needs_cpu() <-tick_nohz_next_event <idle>-0 [005] d..3. 1050.656188: get_next_timer_interrupt(basej=0x10002dbad, basem=0xf45fd7d300) <-tick_nohz_next_event <idle>-0 [005] d..3. 1050.656189: _raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt <idle>-0 [005] d..4. 1050.656190: do_raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt <idle>-0 [005] d..4. 1050.656191: _raw_spin_lock_nested(lock=0xffff8944bdf5f140, subclass=1) <-__get_next_timer_interrupt # cat instances/foo/options/func-args 1 # cat instances/foo/trace [..] kworker/4:1-88 [004] ...1. 298.127735: next_zone <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127736: first_online_pgdat <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127738: next_online_pgdat <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127739: fold_diff <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127741: round_jiffies_relative <-vmstat_update [..] The above shows that updating the "func-args" option at the top level instance also updates the "func-args" option in the instance but because the update is only done by the instance that gets changed (as it should), it's confusing to see that the option is already set in the other instance. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251111232429.470883736@kernel.org Fixes: `f20a580627` ("ftrace: Allow instances to use function tracing") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-12 09:59:54 -05:00
Steven Rostedt	428add559b	tracing: Have tracer option be instance specific Tracers can add specify options to modify them. This logic was added before instances were created and the tracer flags were global variables. After instances were created where a tracer may exist in more than one instance, the flags were not updated from being global into instance specific. This causes confusion with these options. For example, the function tracer has an option to enable function arguments: # cd /sys/kernel/tracing # mkdir instances/foo # echo function > instances/foo/current_tracer # echo 1 > options/func-args # echo function > current_tracer # cat trace [..] <idle>-0 [005] d..3. 1050.656187: rcu_needs_cpu() <-tick_nohz_next_event <idle>-0 [005] d..3. 1050.656188: get_next_timer_interrupt(basej=0x10002dbad, basem=0xf45fd7d300) <-tick_nohz_next_event <idle>-0 [005] d..3. 1050.656189: _raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt <idle>-0 [005] d..4. 1050.656190: do_raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt <idle>-0 [005] d..4. 1050.656191: _raw_spin_lock_nested(lock=0xffff8944bdf5f140, subclass=1) <-__get_next_timer_interrupt # cat instances/foo/options/func-args 1 # cat instances/foo/trace [..] kworker/4:1-88 [004] ...1. 298.127735: next_zone <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127736: first_online_pgdat <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127738: next_online_pgdat <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127739: fold_diff <-refresh_cpu_vm_stats kworker/4:1-88 [004] ...1. 298.127741: round_jiffies_relative <-vmstat_update [..] The above shows that setting "func-args" in the top level instance also set it in the instance "foo", but since the interface of the trace flags are per instance, the update didn't take affect in the "foo" instance. Update the infrastructure to allow tracers to add a "default_flags" field in the tracer structure that can be set instead of "flags" which will make the flags per instance. If a tracer needs to keep the flags global (like blktrace), keeping the "flags" field set will keep the old behavior. This does not update function or the function graph tracers. That will be handled later. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251111232429.305317942@kernel.org Fixes: `f20a580627` ("ftrace: Allow instances to use function tracing") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-12 09:59:54 -05:00
Masami Hiramatsu (Google)	7157062bb4	tracing: Report wrong dynamic event command Report wrong dynamic event type in the command via error_log. ----- # echo "z hoge" > /sys/kernel/tracing/dynamic_events sh: write error: Invalid argument # cat /sys/kernel/tracing/error_log [ 22.977022] dynevent: error: No matching dynamic event type Command: z hoge ^ ----- Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/176278970056.343441.10528135217342926645.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-10 19:26:14 -05:00
Steven Rostedt	3a0d5bc76f	tracing: Use switch statement instead of ifs in set_tracer_flag() The "mask" passed in to set_trace_flag() has a single bit set. The function then checks if the mask is equal to one of the option masks and performs the appropriate function associated to that option. Instead of having a bunch of "if ()" statement, use a "switch ()" statement instead to make it cleaner and a bit more optimal. No function changes. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251106003501.890298562@kernel.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-10 14:33:51 -05:00
Steven Rostedt	9c5053083e	tracing: Exit out immediately after update_marker_trace() The call to update_marker_trace() in set_tracer_flag() performs the update to the tr->trace_flags. There's no reason to perform it again after it is called. Return immediately instead. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251106003501.726406870@kernel.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-10 14:33:43 -05:00
Steven Rostedt	5aa0d18df0	tracing: Have add_tracer_options() error pass up to callers The function add_tracer_options() can fail, but currently it is ignored. Pass the status of add_tracer_options() up to adding a new tracer as well as when an instance is created. Have the instance creation fail if the add_tracer_options() fail. Only print a warning for the top level instance, like it does with other failures. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251105161935.375299297@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-10 14:28:31 -05:00
Steven Rostedt	c7bed15ccf	tracing: Remove dummy options and flags When a tracer does not define their own flags, dummy options and flags are used so that the values are always valid. There's not that many locations that reference these values so having dummy versions just complicates the code. Remove the dummy values and just check for NULL when appropriate. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20251105161935.206093132@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-10 14:27:04 -05:00
Steven Rostedt	a10e6e6818	tracing: Hide __NR_utimensat and _NR_mq_timedsend when not defined Some architectures (riscv-32) do not define __NR_utimensat and _NR_mq_timedsend, and fails to build when they are used. Hide them in "ifdef"s. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251104205310.00a1db9a@batman.local.home Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511031239.ZigDcWzY-lkp@intel.com/ Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-10 14:23:53 -05:00
Steven Rostedt	2f294c35c0	Merge branch 'topic/func-profiler-offset' of git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux into trace/trace/core Updates to the function profiler adds new options to tracefs. The options are currently defined by an enum as flags. The added options brings the number of options over 32, which means they can no longer be held in a 32 bit enum. The TRACE_ITER_* flags are converted to a macro TRACE_ITER() to allow the creation of options to still be done by macros. This change is intrusive, as it affects all TRACE_ITER options throughout the trace code. Merge the branch that added these options and converted the TRACE_ITER_* enum into a TRACE_ITER(*) macro, to allow the topic branches to still be developed without conflict. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-04 10:12:32 -05:00
Masami Hiramatsu (Google)	1149fcf759	tracing: Add an option to show symbols in _text+offset for function profiler Function profiler shows the hit count of each function using its symbol name. However, there are some same-name local symbols, which we can not distinguish. To solve this issue, this introduces an option to show the symbols in "_text+OFFSET" format. This can avoid exposing the random shift of KASLR. The functions in modules are shown as "MODNAME+OFFSET" where the offset is from ".text". E.g. for the kernel text symbols, specify vmlinux and the output to addr2line, you can find the actual function and source info; $ addr2line -fie vmlinux _text+3078208 __balance_callbacks kernel/sched/core.c:5064 for modules, specify the module file and .text+OFFSET; $ addr2line -fie samples/trace_events/trace-events-sample.ko .text+8224 do_simple_thread_func samples/trace_events/trace-events-sample.c:23 Link: https://lore.kernel.org/all/176187878064.994619.8878296550240416558.stgit@devnote2/ Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-11-04 21:44:18 +09:00
Masami Hiramatsu (Google)	bbec8e28ca	tracing: Allow tracer to add more than 32 options Since enum trace_iterator_flags is 32bit, the max number of the option flags is limited to 32 and it is fully used now. To add a new option, we need to expand it. So replace the TRACE_ITER_##flag with TRACE_ITER(flag) macro which is 64bit bitmask. Link: https://lore.kernel.org/all/176187877103.994619.166076000668757232.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-11-04 21:44:00 +09:00
Steven Rostedt	25bd47a592	tracing: Have persistent ring buffer print syscalls normally The persistent ring buffer from a previous boot has to be careful printing events as the print formats of random events can have pointers to strings and such that are not available. Ftrace static events (like the function tracer event) are stable and are printed normally. System call event formats are also stable. Allow them to be printed normally as well: Instead of: <...>-1 [005] ...1. 57.240405: sys_enter_waitid: __syscall_nr=0xf7 (247) which=0x1 (1) upid=0x499 (1177) infop=0x7ffd5294d690 (140725988939408) options=0x5 (5) ru=0x0 (0) <...>-1 [005] ...1. 57.240433: sys_exit_waitid: __syscall_nr=0xf7 (247) ret=0x0 (0) <...>-1 [005] ...1. 57.240437: sys_enter_rt_sigprocmask: __syscall_nr=0xe (14) how=0x2 (2) nset=0x7ffd5294d7c0 (140725988939712) oset=0x0 (0) sigsetsize=0x8 (8) <...>-1 [005] ...1. 57.240438: sys_exit_rt_sigprocmask: __syscall_nr=0xe (14) ret=0x0 (0) <...>-1 [005] ...1. 57.240442: sys_enter_close: __syscall_nr=0x3 (3) fd=0x4 (4) <...>-1 [005] ...1. 57.240463: sys_exit_close: __syscall_nr=0x3 (3) ret=0x0 (0) <...>-1 [005] ...1. 57.240485: sys_enter_openat: __syscall_nr=0x101 (257) dfd=0xffffffffffdfff9c (-2097252) filename=(0xffff8b81639ca01c) flags=0x80000 (524288) mode=0x0 (0) __filename_val=/run/systemd/reboot-param <...>-1 [005] ...1. 57.240555: sys_exit_openat: __syscall_nr=0x101 (257) ret=0xffffffffffdffffe (-2097154) <...>-1 [005] ...1. 57.240571: sys_enter_openat: __syscall_nr=0x101 (257) dfd=0xffffffffffdfff9c (-2097252) filename=(0xffff8b81639ca01c) flags=0x80000 (524288) mode=0x0 (0) __filename_val=/run/systemd/reboot-param <...>-1 [005] ...1. 57.240620: sys_exit_openat: __syscall_nr=0x101 (257) ret=0xffffffffffdffffe (-2097154) <...>-1 [005] ...1. 57.240629: sys_enter_writev: __syscall_nr=0x14 (20) fd=0x3 (3) vec=0x7ffd5294ce50 (140725988937296) vlen=0x7 (7) <...>-1 [005] ...1. 57.242281: sys_exit_writev: __syscall_nr=0x14 (20) ret=0x24 (36) <...>-1 [005] ...1. 57.242286: sys_enter_reboot: __syscall_nr=0xa9 (169) magic1=0xfee1dead (4276215469) magic2=0x28121969 (672274793) cmd=0x1234567 (19088743) arg=0x0 (0) Have: <...>-1 [000] ...1. 91.446011: sys_waitid(which: 1, upid: 0x4d2, infop: 0x7ffdccdadfd0, options: 5, ru: 0) <...>-1 [000] ...1. 91.446042: sys_waitid -> 0x0 <...>-1 [000] ...1. 91.446045: sys_rt_sigprocmask(how: 2, nset: 0x7ffdccdae100, oset: 0, sigsetsize: 8) <...>-1 [000] ...1. 91.446047: sys_rt_sigprocmask -> 0x0 <...>-1 [000] ...1. 91.446051: sys_close(fd: 4) <...>-1 [000] ...1. 91.446073: sys_close -> 0x0 <...>-1 [000] ...1. 91.446095: sys_openat(dfd: 18446744073709551516, filename: 139732544945794 "/run/systemd/reboot-param", flags: O_RDONLY\|O_CLOEXEC) <...>-1 [000] ...1. 91.446165: sys_openat -> 0xfffffffffffffffe <...>-1 [000] ...1. 91.446182: sys_openat(dfd: 18446744073709551516, filename: 139732544945794 "/run/systemd/reboot-param", flags: O_RDONLY\|O_CLOEXEC) <...>-1 [000] ...1. 91.446233: sys_openat -> 0xfffffffffffffffe <...>-1 [000] ...1. 91.446242: sys_writev(fd: 3, vec: 0x7ffdccdad790, vlen: 7) <...>-1 [000] ...1. 91.447877: sys_writev -> 0x24 <...>-1 [000] ...1. 91.447883: sys_reboot(magic1: 0xfee1dead, magic2: 0x28121969, cmd: 0x1234567, arg: 0) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231149.097404581@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:11:00 -04:00
Steven Rostedt	b6e5d971fc	tracing: Check for printable characters when printing field dyn strings When the "fields" option is enabled, it prints each trace event field based on its type. But a dynamic array and a dynamic string can both have a "char *" type. Printing it as a string can cause escape characters to be printed and mess up the output of the trace. For dynamic strings, test if there are any non-printable characters, and if so, print both the string with the non printable characters as '.', and the print the hex value of the array. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231148.929243047@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:59 -04:00
Steven Rostedt	64b627c8da	tracing: Add parsing of flags to the sys_enter_openat trace event Add some logic to give the openat system call trace event a bit more human readable information: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7f0053dc121c "/etc/ld.so.cache", flags: O_RDONLY\|O_CLOEXEC, mode: 0000 The above is output from "perf script" and now shows the flags used by the openat system call. Since the output from tracing is in the kernel, it can also remove the mode field when not used (when flags does not contain O_CREATE\|O_TMPFILE) touch-1185 [002] ...1. 1291.690154: sys_openat(dfd: 4294967196, filename: 139785545139344 "/usr/lib/locale/locale-archive", flags: O_RDONLY\|O_CLOEXEC) touch-1185 [002] ...1. 1291.690504: sys_openat(dfd: 18446744073709551516, filename: 140733603151330 "/tmp/x", flags: O_WRONLY\|O_CREAT\|O_NOCTTY\|O_NONBLOCK, mode: 0666) As system calls have a fixed ABI, their trace events can be extended. This currently only updates the openat system call, but others may be extended in the future. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231148.763161484@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:59 -04:00
Steven Rostedt	32e0f607ac	tracing: Add trace_seq_pop() and seq_buf_pop() In order to allow an interface to remove an added character from the trace_seq and seq_buf descriptors, add helper functions trace_seq_pop() and seq_buf_pop(). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231148.594898736@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:59 -04:00
Steven Rostedt	e77ad6da90	tracing: Show printable characters in syscall arrays When displaying the contents of the user space data passed to the kernel, instead of just showing the array values, also print any printable content. Instead of just: bash-1113 [003] ..... 3433.290654: sys_write(fd: 2, buf: 0x555a8deeddb0 (72:6f:6f:74:40:64:65:62:69:61:6e:2d:78:38:36:2d:36:34:3a:7e:23:20), count: 0x16) Display: bash-1113 [003] ..... 3433.290654: sys_write(fd: 2, buf: 0x555a8deeddb0 (72:6f:6f:74:40:64:65:62:69:61:6e:2d:78:38:36:2d:36:34:3a:7e:23:20) "root@debian-x86-64:~# ", count: 0x16) This only affects tracing and does not affect perf, as this only updates the output from the kernel. The output from perf is via user space. This may change by an update to libtraceevent that will then update perf to have this as well. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231148.429422865@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:59 -04:00
Steven Rostedt	299ea67e6a	tracing: Add a config and syscall_user_buf_size file to limit amount written When a system call that can copy user space addresses into the ring buffer, it can copy up to 511 bytes of data. This can waste precious ring buffer space if the user isn't interested in the output. Add a new file "syscall_user_buf_size" that gets initialized to a new config CONFIG_SYSCALL_BUF_SIZE_DEFAULT that defaults to 63. The config also is used to limit how much perf can read from user space. Also lower the max down to 165, as this isn't to record everything that a system call may be passing through to the kernel. 165 is more than enough. The reason for 165 is because adding one for the nul terminating byte, as well as possibly needing to append the "..." string turns it into 170 bytes. As this needs to save up to 3 arguments and 3 * 170 is 510 which fits nicely in 512 bytes (a power of 2). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231148.260068913@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:59 -04:00
Steven Rostedt	baa031b7bd	tracing: Allow syscall trace events to read more than one user parameter Allow more than one field of a syscall trace event to read user space. Build on top of the user_mask by allowing more than one bit to be set that corresponds to the @args array of the syscall metadata. For each argument in the @args array that is to be read, it will have a dynamic array/string field associated to it. Note that multiple fields to be read from user space is not supported if the user_arg_size field is set in the syscall metada. That field can only be used if only one field is being read from user space as that field is a number representing the size field of the syscall event that holds the size of the data to read from user space. It becomes ambiguous if the system call reads more than one field. Currently this is not an issue. If a syscall event happens to enable two events to read user space and sets the user_arg_size field, it will trigger a warning at boot and the user_arg_size field will be cleared. The per CPU buffer that is used to read the user space addresses is now broken up into 3 sections, each of 168 bytes. The reason for 168 is that it is the biggest portion of 512 bytes divided by 3 that is 8 byte aligned. The max amount copied into the ring buffer from user space is now only 128 bytes, which is plenty. When reading user space, it still reads 167 (168-1) bytes and uses the remaining to know if it should append the extra "..." to the end or not. This will allow the event to look like this: sys_renameat2(olddfd: 0xffffff9c, oldname: 0x7ffe02facdff "/tmp/x", newdfd: 0xffffff9c, newname: 0x7ffe02face06 "/tmp/y", flags: 1) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231148.095789277@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:58 -04:00
Steven Rostedt	011ea0501d	tracing: Display some syscall arrays as strings Some of the system calls that read a fixed length of memory from the user space address are not arrays but strings. Take a bit away from the nb_args field in the syscall meta data to use as a flag to denote that the system call's user_arg_size is being used as a string. The nb_args should never be more than 6, so 7 bits is plenty to hold that number. When the user_arg_is_str flag that, when set, will display the data array from the user space address as a string and not an array. This will allow the output to look like this: sys_sethostname(name: 0x5584310eb2a0 "debian", len: 6) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.930550359@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:58 -04:00
Steven Rostedt	b4f7624cfc	tracing: Have system call events record user array data For system call events that have a length field, add a "user_arg_size" parameter to the system call meta data that denotes the index of the args array that holds the size of arg that the user_mask field has a bit set for. The "user_mask" has a bit set that denotes the arg that points to an array in the user space address space and if a system call event has the user_mask field set and the user_arg_size set, it will then record the content of that address into the trace event, up to the size defined by SYSCALL_FAULT_BUF_SZ - 1. This allows the output to look like: sys_write(fd: 0xa, buf: 0x5646978d13c0 (01:00:05:00:00:00:00:00:01:87:55:89:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00), count: 0x20) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.763528474@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:58 -04:00
Steven Rostedt	2e82e256df	perf: tracing: Have perf system calls read user space Allow some of the system call events to read user space buffers. Instead of just showing the pointer into user space, allow perf events to also record the content of those pointers. For example: # perf record -e syscalls:sys_enter_openat ls /usr/bin [..] # perf script ls 1024 [005] 52.902721: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbae321c "/etc/ld.so.cache", flags: 0x00080000, mode: 0x00000000 ls 1024 [005] 52.902899: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaae140 "/lib/x86_64-linux-gnu/libselinux.so.1", flags: 0x00080000, mode: 0x00000000 ls 1024 [005] 52.903471: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaae690 "/lib/x86_64-linux-gnu/libcap.so.2", flags: 0x00080000, mode: 0x00000000 ls 1024 [005] 52.903946: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaaebe0 "/lib/x86_64-linux-gnu/libc.so.6", flags: 0x00080000, mode: 0x00000000 ls 1024 [005] 52.904629: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaaf110 "/lib/x86_64-linux-gnu/libpcre2-8.so.0", flags: 0x00080000, mode: 0x00000000 ls 1024 [005] 52.906985: syscalls:sys_enter_openat: dfd: 0xffffffffffffff9c, filename: 0x7fc1dba92904 "/proc/filesystems", flags: 0x00080000, mode: 0x00000000 ls 1024 [005] 52.907323: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dba19490 "/usr/lib/locale/locale-archive", flags: 0x00080000, mode: 0x00000000 ls 1024 [005] 52.907746: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x556fb888dcd0 "/usr/bin", flags: 0x00090800, mode: 0x00000000 Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.593925979@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:58 -04:00
Steven Rostedt	bd1b80fba7	perf: tracing: Simplify perf_sysenter_enable/disable() with guards Use guard(mutex)(&syscall_trace_lock) for perf_sysenter_enable() and perf_sysenter_disable() as well as for the perf_sysexit_enable() and perf_sysexit_disable(). This will make it easier to update these functions with other code that has early exit handling. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.429583335@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:58 -04:00
Steven Rostedt	a544d9a66b	tracing: Have syscall trace events read user space string As of commit `654ced4a13` ("tracing: Introduce tracepoint_is_faultable()") system call trace events allow faulting in user space memory. Have some of the system call trace events take advantage of this. Use the trace_user_fault_read() logic to read the user space buffer from user space and instead of just saving the pointer to the buffer in the system call event, also save the string that is passed in. The syscall event has its nb_args shorten from an int to a short (where even u8 is plenty big enough) and the freed two bytes are used for "user_mask". The new "user_mask" field is used to store the index of the "args" field array that has the address to read from user space. This value is set to 0 if the system call event does not need to read user space for a field. This mask can be used to know if the event may fault or not. Only one bit set in user_mask is supported at this time. This allows the output to look like this: sys_access(filename: 0x7f8c55368470 "/etc/ld.so.preload", mode: 4) sys_execve(filename: 0x564ebcf5a6b8 "/usr/bin/emacs", argv: 0x7fff357c0300, envp: 0x564ebc4a4820) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.261867956@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:58 -04:00
Steven Rostedt	a9f1687264	tracing: Make trace_user_fault_read() exposed to rest of tracing The write to the trace_marker file is a critical section where it cannot take locks nor allocate memory. To read from user space, it allocates a per CPU buffer when the trace_marker file is opened, and then when the write system call is performed, it uses the following method to read from user space: preempt_disable(); buffer = per_cpu_ptr(cpu_buffers, cpu); do { cnt = nr_context_switches_cpu(); migrate_disable(); preempt_enable(); ret = copy_from_user(buffer, ptr, len); preempt_disable(); migrate_enable(); } while (!ret && cnt != nr_context_switches_cpu()); if (!ret) ring_buffer_write(buffer); preempt_enable(); It records the number of context switches for the current CPU, enables preemption, copies from user space, disable preemption and then checks if the number of context switches changed. If it did not, then the buffer is valid, otherwise the buffer may have been corrupted and the read from user space must be tried again. The system call trace events are now faultable and have the same restrictions as the trace_marker write. For system calls to read the user space buffer (for example to read the file of the openat system call), it needs the same logic. Instead of copying the code over to the system call trace events, make the code generic to allow the system call trace events to use the same code. The following API is added internally to the tracing sub system (these are only exposed within the tracing subsystem and not to be used outside of it): trace_user_fault_init() - initializes a trace_user_buf_info descriptor that will allocate the per CPU buffers to copy from user space into. trace_user_fault_destroy() - used to free the allocations made by trace_user_fault_init(). trace_user_fault_get() - update the ref count of the info descriptor to allow more than one user to use the same descriptor. trace_user_fault_put() - decrement the ref count. trace_user_fault_read() - performs the above action to read user space into the per CPU buffer. The preempt_disable() is expected before calling this function and preemption must remain disabled while the buffer returned is in use. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231147.096570057@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-28 20:10:57 -04:00
Linus Torvalds	dcb6fa37fd	Linux 6.18-rc3 v6.18-rc3	2025-10-26 15:59:49 -07:00
Linus Torvalds	4bb1f7e19c	Merge tag 'char-misc-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc/android driver fixes for 6.18-rc3 for reported issues. Included in here are: - rust binder fixes for reported issues - mei device id addition - mei driver fixes - comedi bugfix - most usb driver bugfixes - fastrpc memory leak fix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: most: usb: hdm_probe: Fix calling put_device() before device initialization most: usb: Fix use-after-free in hdm_disconnect binder: remove "invalid inc weak" check mei: txe: fix initialization order comedi: fix divide-by-zero in comedi_buf_munge() mei: late_bind: Fix -Wincompatible-function-pointer-types-strict misc: fastrpc: Fix dma_buf object leak in fastrpc_map_lookup mei: me: add wildcat lake P DID misc: amd-sbi: Clarify that this is a BMC driver nvmem: rcar-efuse: add missing MODULE_DEVICE_TABLE binder: Fix missing kernel-doc entries in binder.c rust_binder: report freeze notification only when fully frozen rust_binder: don't delete FreezeListener if there are pending duplicates rust_binder: freeze_notif_done should resend if wrong state rust_binder: remove warning about orphan mappings rust_binder: clean `clippy::mem_replace_with_default` warning	2025-10-26 10:33:46 -07:00
Linus Torvalds	40282418e1	Merge tag 'staging-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are some small staging driver fixes for the gpib subsystem to resolve some reported issues. Included in here are: - memory leak fixes - error code fixes - proper protocol fixes All of these have been in linux-next for almost 2 weeks now with no reported issues" * tag 'staging-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: gpib: Fix device reference leak in fmh_gpib driver staging: gpib: Return -EINTR on device clear staging: gpib: Fix sending clear and trigger events staging: gpib: Fix no EOI on 1 and 2 byte writes	2025-10-26 10:29:45 -07:00
Linus Torvalds	aa6085a067	Merge tag 'tty-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for reported issues. Included in here are: - sh-sci serial driver fixes - 8250_dw and _mtk driver fixes - sc16is7xx driver bugfix - new 8250_exar device ids added All of these have been in linux-next this past week with no reported issues" * tag 'tty-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250_mtk: Enable baud clock and manage in runtime PM serial: 8250_dw: handle reset control deassert error dt-bindings: serial: sh-sci: Fix r8a78000 interrupts serial: sc16is7xx: remove useless enable of enhanced features serial: 8250_exar: add support for Advantech 2 port card with Device ID 0x0018 tty: serial: sh-sci: fix RSCI FIFO overrun handling	2025-10-26 10:24:39 -07:00
Linus Torvalds	6190d0fa18	Merge tag 'usb-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB driver fixes and new device ids for 6.18-rc3. Included in here are: - new option serial driver device ids added - dt bindings fixes for numerous platforms - xhci bugfixes for many reported regressions - usbio dependency bugfix - dwc3 driver fix - raw-gadget bugfix All of these have been in linux-next this week with no reported issues" * tag 'usb-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: option: add Telit FN920C04 ECM compositions USB: serial: option: add Quectel RG255C tcpm: switch check for role_sw device with fw_node usb/core/quirks: Add Huawei ME906S to wakeup quirk usb: raw-gadget: do not limit transfer length USB: serial: option: add UNISOC UIS7720 xhci: dbc: enable back DbC in resume if it was enabled before suspend xhci: dbc: fix bogus 1024 byte prefix if ttyDBC read races with stall event usb: xhci-pci: Fix USB2-only root hub registration dt-bindings: usb: qcom,snps-dwc3: Fix bindings for X1E80100 usb: misc: Add x86 dependency for Intel USBIO driver dt-bindings: usb: switch: split out ports definition usb: dwc3: Don't call clk_bulk_disable_unprepare() twice dt-bindings: usb: dwc3-imx8mp: dma-range is required only for imx8mp	2025-10-26 10:21:13 -07:00
Linus Torvalds	dbfc6422a3	Merge tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Remove dead code leftovers after a recent mitigations cleanup which fail a Clang build - Make sure a Retbleed mitigation message is printed only when necessary - Correct the last Zen1 microcode revision for which Entrysign sha256 check is needed - Fix a NULL ptr deref when mounting the resctrl fs on a system which supports assignable counters but where L3 total and local bandwidth monitoring has been disabled at boot * tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Remove dead code which might prevent from building x86/bugs: Qualify RETBLEED_INTEL_MSG x86/microcode: Fix Entrysign revision check for Zen1/Naples x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode	2025-10-26 09:57:18 -07:00
Linus Torvalds	5fee0dafba	Merge tag 'irq_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Restore the original buslock locking in a couple of places in the irq core subsystem after a rework * tag 'irq_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/manage: Add buslock back in to enable_irq() genirq/manage: Add buslock back in to __disable_irq_nosync() genirq/chip: Add buslock back in to irq_set_handler()	2025-10-26 09:54:36 -07:00
Linus Torvalds	af8159515f	Merge tag 'objtool_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Fix x32 build due to wrong format specifier on that sub-arch - Add one more Rust noreturn function to objtool's list * tag 'objtool_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix failure when being compiled on x32 system objtool/rust: add one more `noreturn` Rust function	2025-10-26 09:44:36 -07:00

1 2 3 4 5 ...

1397382 Commits