linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-21 07:55:15 -04:00

Author	SHA1	Message	Date
Jiri Olsa	ec46350fe1	uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode The uprobe_write has special path to restore the original page when we write original instruction back. This happens when uprobe_write detects that we want to write anything else but breakpoint instruction. Moving the detection away and passing it to uprobe_write as argument, so it's possible to write different instructions (other than just breakpoint and rest). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250720112133.244369-7-jolsa@kernel.org	2025-08-21 20:09:19 +02:00
Jiri Olsa	f8b7c528b4	uprobes: Add nbytes argument to uprobe_write Adding nbytes argument to uprobe_write and related functions as preparation for writing whole instructions in following changes. Also renaming opcode arguments to insn, which seems to fit better. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250720112133.244369-6-jolsa@kernel.org	2025-08-21 20:09:19 +02:00
Jiri Olsa	33d7b2beaf	uprobes: Add uprobe_write function Adding uprobe_write function that does what uprobe_write_opcode did so far, but allows to pass verify callback function that checks the memory location before writing the opcode. It will be used in following changes to implement specific checking logic for instruction update. The uprobe_write_opcode now calls uprobe_write with verify_opcode as the verify callback. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250720112133.244369-5-jolsa@kernel.org	2025-08-21 20:09:19 +02:00
Jiri Olsa	82afdd05a1	uprobes: Make copy_from_page global Making copy_from_page global and adding uprobe prefix. Adding the uprobe prefix to copy_to_page as well for symmetry. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250720112133.244369-4-jolsa@kernel.org	2025-08-21 20:09:18 +02:00
Jiri Olsa	0f07b7919d	uprobes: Rename arch_uretprobe_trampoline function We are about to add uprobe trampoline, so cleaning up the namespace. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250720112133.244369-3-jolsa@kernel.org	2025-08-21 20:09:18 +02:00
Jiri Olsa	7769cb177b	uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock Currently unapply_uprobe takes mmap_read_lock, but it might call remove_breakpoint which eventually changes user pages. Current code writes either breakpoint or original instruction, so it can go away with read lock as explained in here [1]. But with the upcoming change that writes multiple instructions on the probed address we need to ensure that any update to mm's pages is exclusive. [1] https://lore.kernel.org/all/20240710140045.GA1084@redhat.com/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250720112133.244369-2-jolsa@kernel.org	2025-08-21 20:09:18 +02:00
Linus Torvalds	068a56e56f	Merge tag 'probes-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: "Sanitize wildcard for fprobe event name Fprobe event accepts wildcards for the target functions, but unless the user specifies its event name, it makes an event with the wildcards. Replace the wildcard '' with the underscore '_'" tag 'probes-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: fprobe-event: Sanitize wildcard for fprobe event name	2025-08-20 16:29:30 -07:00
Masami Hiramatsu (Google)	ec879e1a0b	tracing: fprobe-event: Sanitize wildcard for fprobe event name Fprobe event accepts wildcards for the target functions, but unless user specifies its event name, it makes an event with the wildcards. /sys/kernel/tracing # echo 'f mutex' >> dynamic_events /sys/kernel/tracing # cat dynamic_events f:fprobes/mutex__entry mutex* /sys/kernel/tracing # ls events/fprobes/ enable filter mutex__entry To fix this, replace the wildcard ('') with an underscore. Link: https://lore.kernel.org/all/175535345114.282990.12294108192847938710.stgit@devnote2/ Fixes: `334e5519c3` ("tracing/probes: Add fprobe events for tracing function entry and exit.") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org	2025-08-20 23:41:58 +09:00
Ye Weihua	edede7a6dc	trace/fgraph: Fix the warning caused by missing unregister notifier This warning was triggered during testing on v6.16: notifier callback ftrace_suspend_notifier_call already registered WARNING: CPU: 2 PID: 86 at kernel/notifier.c:23 notifier_chain_register+0x44/0xb0 ... Call Trace: <TASK> blocking_notifier_chain_register+0x34/0x60 register_ftrace_graph+0x330/0x410 ftrace_profile_write+0x1e9/0x340 vfs_write+0xf8/0x420 ? filp_flush+0x8a/0xa0 ? filp_close+0x1f/0x30 ? do_dup2+0xaf/0x160 ksys_write+0x65/0xe0 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x77/0x7f When writing to the function_profile_enabled interface, the notifier was not unregistered after start_graph_tracing failed, causing a warning the next time function_profile_enabled was written. Fixed by adding unregister_pm_notifier in the exception path. Link: https://lore.kernel.org/20250818073332.3890629-1-yeweihua4@huawei.com Fixes: `4a2b8dda3f` ("tracing/function-graph-tracer: fix a regression while suspend to disk") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ye Weihua <yeweihua4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-08-20 09:21:03 -04:00
Liao Yuanhong	cd6e4faba9	ring-buffer: Remove redundant semicolons Remove unnecessary semicolons. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250813095114.559530-1-liaoyuanhong@vivo.com Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-08-20 09:20:30 -04:00
Pu Lehui	6a909ea83f	tracing: Limit access to parser->buffer when trace_get_user failed When the length of the string written to set_ftrace_filter exceeds FTRACE_BUFF_MAX, the following KASAN alarm will be triggered: BUG: KASAN: slab-out-of-bounds in strsep+0x18c/0x1b0 Read of size 1 at addr ffff0000d00bd5ba by task ash/165 CPU: 1 UID: 0 PID: 165 Comm: ash Not tainted 6.16.0-g6bcdbd62bd56-dirty Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x34/0x50 (C) dump_stack_lvl+0xa0/0x158 print_address_description.constprop.0+0x88/0x398 print_report+0xb0/0x280 kasan_report+0xa4/0xf0 __asan_report_load1_noabort+0x20/0x30 strsep+0x18c/0x1b0 ftrace_process_regex.isra.0+0x100/0x2d8 ftrace_regex_release+0x484/0x618 __fput+0x364/0xa58 ____fput+0x28/0x40 task_work_run+0x154/0x278 do_notify_resume+0x1f0/0x220 el0_svc+0xec/0xf0 el0t_64_sync_handler+0xa0/0xe8 el0t_64_sync+0x1ac/0x1b0 The reason is that trace_get_user will fail when processing a string longer than FTRACE_BUFF_MAX, but not set the end of parser->buffer to 0. Then an OOB access will be triggered in ftrace_regex_release-> ftrace_process_regex->strsep->strpbrk. We can solve this problem by limiting access to parser->buffer when trace_get_user failed. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250813040232.1344527-1-pulehui@huaweicloud.com Fixes: `8c9af478c0` ("ftrace: Handle commands when closing set_ftrace_filter file") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-08-20 09:20:30 -04:00
Pasha Tatashin	44958f2025	kho: warn if KHO is disabled due to an error During boot scratch area is allocated based on command line parameters or auto calculated. However, scratch area may fail to allocate, and in that case KHO is disabled. Currently, no warning is printed that KHO is disabled, which makes it confusing for the end user to figure out why KHO is not available. Add the missing warning message. Link: https://lkml.kernel.org/r/20250808201804.772010-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-19 16:35:53 -07:00
Pasha Tatashin	8b66ed2c3f	kho: mm: don't allow deferred struct page with KHO KHO uses struct pages for the preserved memory early in boot, however, with deferred struct page initialization, only a small portion of memory has properly initialized struct pages. This problem was detected where vmemmap is poisoned, and illegal flag combinations are detected. Don't allow them to be enabled together, and later we will have to teach KHO to work properly with deferred struct page init kernel feature. Link: https://lkml.kernel.org/r/20250808201804.772010-3-pasha.tatashin@soleen.com Fixes: `4e1d010e3b` ("kexec: add config option for KHO") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-19 16:35:53 -07:00
Pasha Tatashin	63b17b653d	kho: init new_physxa->phys_bits to fix lockdep Patch series "Several KHO Hotfixes". Three unrelated fixes for Kexec Handover. This patch (of 3): Lockdep shows the following warning: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. [<ffffffff810133a6>] dump_stack_lvl+0x66/0xa0 [<ffffffff8136012c>] assign_lock_key+0x10c/0x120 [<ffffffff81358bb4>] register_lock_class+0xf4/0x2f0 [<ffffffff813597ff>] __lock_acquire+0x7f/0x2c40 [<ffffffff81360cb0>] ? __pfx_hlock_conflict+0x10/0x10 [<ffffffff811707be>] ? native_flush_tlb_global+0x8e/0xa0 [<ffffffff8117096e>] ? __flush_tlb_all+0x4e/0xa0 [<ffffffff81172fc2>] ? __kernel_map_pages+0x112/0x140 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff81359556>] lock_acquire+0xe6/0x280 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff8100b9e0>] _raw_spin_lock+0x30/0x40 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff813ec327>] xa_load_or_alloc+0x67/0xe0 [<ffffffff813eb4c0>] kho_preserve_folio+0x90/0x100 [<ffffffff813ebb7f>] __kho_finalize+0xcf/0x400 [<ffffffff813ebef4>] kho_finalize+0x34/0x70 This is becase xa has its own lock, that is not initialized in xa_load_or_alloc. Modifiy __kho_preserve_order(), to properly call xa_init(&new_physxa->phys_bits); Link: https://lkml.kernel.org/r/20250808201804.772010-2-pasha.tatashin@soleen.com Fixes: `fc33e4b44b` ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-19 16:35:53 -07:00
Linus Torvalds	055f213075	Merge tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix two memory leaks in pidfs - Prevent changing the idmapping of an already idmapped mount without OPEN_TREE_CLONE through open_tree_attr() - Don't fail listing extended attributes in kernfs when no extended attributes are set - Fix the return value in coredump_parse() - Fix the error handling for unbuffered writes in netfs - Fix broken data integrity guarantees for O_SYNC writes via iomap - Fix UAF in __mark_inode_dirty() - Keep inode->i_blkbits constant in fuse - Fix coredump selftests - Fix get_unused_fd_flags() usage in do_handle_open() - Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES - Fix use-after-free in bh_read() - Fix incorrect lflags value in the move_mount() syscall * tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signal: Fix memory leak for PIDFD_SELF* sentinels kernfs: don't fail listing extended attributes coredump: Fix return value in coredump_parse() fs/buffer: fix use-after-free when call bh_read() helper pidfs: Fix memory leak in pidfd_info() netfs: Fix unbuffered write error handling fhandle: do_handle_open() should get FD with user flags module: Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES fs: fix incorrect lflags value in the move_mount syscall selftests/coredump: Remove the read() that fails the test fuse: keep inode->i_blkbits constant iomap: Fix broken data integrity guarantees for O_SYNC writes selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE fs: writeback: fix use-after-free in __mark_inode_dirty()	2025-08-19 09:54:47 -07:00
Adrian Huang (Lenovo)	a2c1f82618	signal: Fix memory leak for PIDFD_SELF* sentinels Commit `f08d0c3a71` ("pidfd: add PIDFD_SELF* sentinels to refer to own thread/process") introduced a leak by acquiring a pid reference through get_task_pid(), which increments pid->count but never drops it with put_pid(). As a result, kmemleak reports unreferenced pid objects after running tools/testing/selftests/pidfd/pidfd_test, for example: unreferenced object 0xff1100206757a940 (size 160): comm "pidfd_test", pid 16965, jiffies 4294853028 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 fd 57 50 04 .............WP. 5e 44 00 00 00 00 00 00 18 de 34 17 01 00 11 ff ^D........4..... backtrace (crc cd8844d4): kmem_cache_alloc_noprof+0x2f4/0x3f0 alloc_pid+0x54/0x3d0 copy_process+0xd58/0x1740 kernel_clone+0x99/0x3b0 __do_sys_clone3+0xbe/0x100 do_syscall_64+0x7b/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fix this by calling put_pid() after do_pidfd_send_signal() returns. Fixes: `f08d0c3a71` ("pidfd: add PIDFD_SELF* sentinels to refer to own thread/process") Signed-off-by: Adrian Huang (Lenovo) <adrianhuang0701@gmail.com> Link: https://lore.kernel.org/20250818134310.12273-1-adrianhuang0701@gmail.com Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-19 13:51:28 +02:00
Oleg Nesterov	b1afcaddd6	pid: change bacct_add_tsk() to use task_ppid_nr_ns() to simplify the code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/20250810173615.GA20000@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-19 13:38:20 +02:00
Oleg Nesterov	abdfd4948e	pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers task_pid_vnr(another_task) will crash if the caller was already reaped. The pid_alive(current) check can't really help, the parent/debugger can call release_task() right after this check. This also means that even task_ppid_nr_ns(current, NULL) is not safe, pid_alive() only ensures that it is safe to dereference ->real_parent. Change __task_pid_nr_ns() to ensure ns != NULL. Originally-by: 高翔 <gaoxiang17@xiaomi.com> Link: https://lore.kernel.org/all/20250802022123.3536934-1-gxxa03070307@gmail.com/ Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/20250810173604.GA19991@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-19 13:38:20 +02:00
gaoxiang17	006568ab4c	pid: Add a judgment for ns null in pid_nr_ns __task_pid_nr_ns ns = task_active_pid_ns(current); pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns); if (pid && ns->level <= pid->level) { Sometimes null is returned for task_active_pid_ns. Then it will trigger kernel panic in pid_nr_ns. For example: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 Mem abort info: ESR = 0x0000000096000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 39-bit VAs, pgdp=00000002175aa000 [0000000000000058] pgd=08000002175ab003, p4d=08000002175ab003, pud=08000002175ab003, pmd=08000002175be003, pte=0000000000000000 pstate: 834000c5 (Nzcv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : __task_pid_nr_ns+0x74/0xd0 lr : __task_pid_nr_ns+0x24/0xd0 sp : ffffffc08001bd10 x29: ffffffc08001bd10 x28: ffffffd4422b2000 x27: 0000000000000001 x26: ffffffd442821168 x25: ffffffd442821000 x24: 00000f89492eab31 x23: 00000000000000c0 x22: ffffff806f5693c0 x21: ffffff806f5693c0 x20: 0000000000000001 x19: 0000000000000000 x18: 0000000000000000 x17: 00000000529c6ef0 x16: 00000000529c6ef0 x15: 00000000023a1adc x14: 0000000000000003 x13: 00000000007ef6d8 x12: 001167c391c78800 x11: 00ffffffffffffff x10: 0000000000000000 x9 : 0000000000000001 x8 : ffffff80816fa3c0 x7 : 0000000000000000 x6 : 49534d702d535449 x5 : ffffffc080c4c2c0 x4 : ffffffd43ee128c8 x3 : ffffffd43ee124dc x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffffff806f5693c0 Call trace: __task_pid_nr_ns+0x74/0xd0 ... __handle_irq_event_percpu+0xd4/0x284 handle_irq_event+0x48/0xb0 handle_fasteoi_irq+0x160/0x2d8 generic_handle_domain_irq+0x44/0x60 gic_handle_irq+0x4c/0x114 call_on_irq_stack+0x3c/0x74 do_interrupt_handler+0x4c/0x84 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c account_kernel_stack+0x60/0x144 exit_task_stack_account+0x1c/0x80 do_exit+0x7e4/0xaf8 ... get_signal+0x7bc/0x8d8 do_notify_resume+0x128/0x828 el0_svc+0x6c/0x70 el0t_64_sync_handler+0x68/0xbc el0t_64_sync+0x1a8/0x1ac Code: 35fffe54 911a02a8 f9400108 b4000128 (b9405a69) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Signed-off-by: gaoxiang17 <gaoxiang17@xiaomi.com> Link: https://lore.kernel.org/20250802022123.3536934-1-gxxa03070307@gmail.com Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-19 13:38:20 +02:00
Thorsten Blum	800348aa34	kcsan: test: Replace deprecated strcpy() with strscpy() strcpy() is deprecated; use strscpy() instead. Link: https://github.com/KSPP/linux/issues/88 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Marco Elver <elver@google.com>	2025-08-19 12:52:12 +02:00
Martin KaFai Lau	5c42715e63	Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master' Merge 'skb-meta-dynptr' branch into 'master' branch. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-08-18 17:59:26 -07:00
Martin KaFai Lau	7e1371023a	Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/net' Merge 'skb-meta-dynptr' branch into 'net' branch. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-08-18 17:58:21 -07:00
Jakub Sitnicki	6877cd392b	bpf: Enable read/write access to skb metadata through a dynptr Now that we can create a dynptr to skb metadata, make reads to the metadata area possible with bpf_dynptr_read() or through a bpf_dynptr_slice(), and make writes to the metadata area possible with bpf_dynptr_write() or through a bpf_dynptr_slice_rdwr(). Note that for cloned skbs which share data with the original, we limit the skb metadata dynptr to be read-only since we don't unclone on a bpf_dynptr_write to metadata. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-2-8a39e636e0fb@cloudflare.com	2025-08-18 10:29:42 -07:00
Jakub Sitnicki	89d912e494	bpf: Add dynptr type for skb metadata Add a dynptr type, similar to skb dynptr, but for the skb metadata access. The dynptr provides an alternative to __sk_buff->data_meta for accessing the custom metadata area allocated using the bpf_xdp_adjust_meta() helper. More importantly, it abstracts away the fact where the storage for the custom metadata lives, which opens up the way to persist the metadata by relocating it as the skb travels through the network stack layers. Writes to skb metadata invalidate any existing skb payload and metadata slices. While this is more restrictive that needed at the moment, it leaves the door open to reallocating the metadata on writes, and should be only a minor inconvenience to the users. Only the program types which can access __sk_buff->data_meta today are allowed to create a dynptr for skb metadata at the moment. We need to modify the network stack to persist the metadata across layers before opening up access to other BPF hooks. Once more BPF hooks gain access to skb_meta dynptr, we will also need to add a read-only variant of the helper similar to bpf_dynptr_from_skb_rdonly. skb_meta dynptr ops are stubbed out and implemented by subsequent changes. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-1-8a39e636e0fb@cloudflare.com	2025-08-18 10:29:42 -07:00
Anton Protopopov	dbe99ea541	bpf: Add a verbose message when the BTF limit is reached When a BPF program which is being loaded reaches the map limit (MAX_USED_MAPS) or the BTF limit (MAX_USED_BTFS) the -E2BIG is returned. However, in the former case there is an accompanying verifier verbose message, and in the latter case there is not. Add a verbose message to make the behaviour symmetrical. Reported-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250816151554.902995-1-a.s.protopopov@gmail.com	2025-08-18 17:27:01 +02:00
Fushuai Wang	d87fdb1f27	bpf: Replace get_next_cpu() with cpumask_next_wrap() The get_next_cpu() function was only used in one place to find the next possible CPU, which can be replaced by cpumask_next_wrap(). Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250818032344.23229-1-wangfushuai@baidu.com	2025-08-18 15:11:02 +02:00
Linus Torvalds	0a9ee9ce49	Merge tag 'locking_urgent_for_v6.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Make sure sanity checks down in the mutex lock path happen on the correct type of task so that they don't trigger falsely - Use the write unsafe user access pairs when writing a futex value to prevent an error on PowerPC which does user read and write accesses differently * tag 'locking_urgent_for_v6.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: Fix __clear_task_blocked_on() warning from __ww_mutex_wound() path futex: Use user_write_access_begin/_end() in futex_put_value()	2025-08-17 05:57:47 -07:00
Thorsten Blum	5eb4b9a4cd	params: Replace deprecated strcpy() with strscpy() and memcpy() strcpy() is deprecated; use strscpy() and memcpy() instead. In param_set_copystring(), we can safely use memcpy() because we already know the length of the source string 'val' and that it is guaranteed to be NUL-terminated within the first 'kps->maxlen' bytes. Link: https://github.com/KSPP/linux/issues/88 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Link: https://lore.kernel.org/r/20250813132200.184064-2-thorsten.blum@linux.dev Signed-off-by: Daniel Gomez <da.gomez@samsung.com>	2025-08-16 21:47:25 +02:00
Tao Chen	abdaf49be5	bpf: Remove migrate_disable in kprobe_multi_link_prog_run Graph tracer framework ensures we won't migrate, kprobe_multi_link_prog_run called all the way from graph tracer, which disables preemption in function_graph_enter_regs, as Jiri and Yonghong suggested, there is no need to use migrate_disable. As a result, some overhead may will be reduced. And add cant_sleep check for __this_cpu_inc_return. Fixes: `0dcac27254` ("bpf: Add multi kprobe link") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250814121430.2347454-1-chen.dylane@linux.dev	2025-08-15 16:49:31 -07:00
Thomas Gleixner	448f97fba9	perf: Convert mmap() refcounts to refcount_t The recently fixed reference count leaks could have been detected by using refcount_t and refcount_t would have mitigated the potential overflow at least. Now that the code is properly structured, convert the mmap() related mmap_count variants over to refcount_t. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104020.071507932@infradead.org	2025-08-15 13:13:02 +02:00
Peter Zijlstra	59741451b4	perf: Identify the 0->1 transition for event::mmap_count Needed because refcount_inc() doesn't allow the 0->1 transition. Specifically, this is the case where we've created the RB, this means there was no RB, and as such there could not have been an mmap. Additionally we hold mmap_mutex to serialize everything. This must be the first. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250812104019.956479989@infradead.org	2025-08-15 13:13:02 +02:00
Peter Zijlstra	d23a6dbc0a	perf: Use scoped_guard() for mmap_mutex in perf_mmap() Mostly just re-indent noise. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.838047976@infradead.org	2025-08-15 13:13:01 +02:00
Peter Zijlstra	5d299897f1	perf: Split out the RB allocation Move the RB buffer allocation branch into its own function. Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.722214699@infradead.org	2025-08-15 13:13:01 +02:00
Peter Zijlstra	191759e5ea	perf: Make RB allocation branch self sufficient Ensure @rb usage doesn't extend out of the branch block. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.605285302@infradead.org	2025-08-15 13:13:01 +02:00
Peter Zijlstra	2aee376823	perf: Split out the AUX buffer allocation Move the AUX buffer allocation branch into its own function. Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.494205648@infradead.org	2025-08-15 13:13:00 +02:00
Peter Zijlstra	8558dca9fb	perf: Reflow to get rid of aux_success label Mostly re-indent noise needed to get rid of that label. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.362581570@infradead.org	2025-08-15 13:13:00 +02:00
Peter Zijlstra	b33a51564e	perf: Use guard() for aux_mutex in perf_mmap() After duplicating the common code into the rb/aux branches is it possible to use a simple guard() for the aux_mutex. Making the aux branch self-contained. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.246250452@infradead.org	2025-08-15 13:13:00 +02:00
Peter Zijlstra	41b80e1d74	perf: Remove redundant aux_unlock label unlock and aux_unlock are now identical, remove the aux_unlock one. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.131293512@infradead.org	2025-08-15 13:13:00 +02:00
Peter Zijlstra	4118994b33	perf: Move common code into both rb and aux branches if (cond) { A; } else { B; } C; into if (cond) { A; C; } else { B; C; } Notably C has a success branch and both A and B have two places for success. For A (rb case), duplicate the success case because later patches will result in them no longer being identical. For B (aux case), share using goto (cleaned up later). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104019.016252852@infradead.org	2025-08-15 13:12:59 +02:00
Peter Zijlstra	3821f25868	perf: Merge consecutive conditionals in perf_mmap() if (cond) { A; } else { B; } if (cond) { C; } else { D; } into: if (cond) { A; C; } else { B; D; } Notably the conditions are not identical in form, but are equivalent. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.900078502@infradead.org	2025-08-15 13:12:59 +02:00
Peter Zijlstra	86a0a7c598	perf: Move perf_mmap_calc_limits() into both rb and aux branches if (cond) { A; } else { B; } C; into if (cond) { A; C; } else { B; C; } Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.781244099@infradead.org	2025-08-15 13:12:59 +02:00
Thomas Gleixner	1ea3e3b0da	perf: Split out VM accounting Similarly to the mlock limit calculation the VM accounting is required for both the ringbuffer and the AUX buffer allocations. To prepare for splitting them out into separate functions, move the accounting into a helper function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.660347811@infradead.org	2025-08-15 13:12:59 +02:00
Thomas Gleixner	81e026ca47	perf: Split out mlock limit handling To prepare for splitting the buffer allocation out into separate functions for the ring buffer and the AUX buffer, split out mlock limit handling into a helper function, which can be called from both. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.541975109@infradead.org	2025-08-15 13:12:58 +02:00
Thomas Gleixner	e8c4f6ee8e	perf: Remove redundant condition for AUX buffer size It is already checked whether the VMA size is the same as nr_pages * PAGE_SIZE, so later checking both: aux_size == vma_size && aux_size == nr_pages * PAGE_SIZE is redundant. Remove the vma_size check as nr_pages is what is actually used in the allocation function. That prepares for splitting out the buffer allocation into separate functions, so that only nr_pages needs to be handed in. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20250812104018.424519320@infradead.org	2025-08-15 13:12:58 +02:00
Yunseong Kim	b64fdd422a	perf: Avoid undefined behavior from stopping/starting inactive events Calling pmu->start()/stop() on perf events in PERF_EVENT_STATE_OFF can leave event->hw.idx at -1. When PMU drivers later attempt to use this negative index as a shift exponent in bitwise operations, it leads to UBSAN shift-out-of-bounds reports. The issue is a logical flaw in how event groups handle throttling when some members are intentionally disabled. Based on the analysis and the reproducer provided by Mark Rutland (this issue on both arm64 and x86-64). The scenario unfolds as follows: 1. A group leader event is configured with a very aggressive sampling period (e.g., sample_period = 1). This causes frequent interrupts and triggers the throttling mechanism. 2. A child event in the same group is created in a disabled state (.disabled = 1). This event remains in PERF_EVENT_STATE_OFF. Since it hasn't been scheduled onto the PMU, its event->hw.idx remains initialized at -1. 3. When throttling occurs, perf_event_throttle_group() and later perf_event_unthrottle_group() iterate through all siblings, including the disabled child event. 4. perf_event_throttle()/unthrottle() are called on this inactive child event, which then call event->pmu->start()/stop(). 5. The PMU driver receives the event with hw.idx == -1 and attempts to use it as a shift exponent. e.g., in macros like PMCNTENSET(idx), leading to the UBSAN report. The throttling mechanism attempts to start/stop events that are not actively scheduled on the hardware. Move the state check into perf_event_throttle()/perf_event_unthrottle() so that inactive events are skipped entirely. This ensures only active events with a valid hw.idx are processed, preventing undefined behavior and silencing UBSAN warnings. The corrected check ensures true before proceeding with PMU operations. The problem can be reproduced with the syzkaller reproducer: Fixes: `9734e25fbf` ("perf: Fix the throttle logic for a group") Signed-off-by: Yunseong Kim <ysk@kzalloc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250812181046.292382-2-ysk@kzalloc.com	2025-08-15 13:12:56 +02:00
Jiri Olsa	e4414b01c1	bpf: Check the helper function is valid in get_helper_proto kernel test robot reported verifier bug [1] where the helper func pointer could be NULL due to disabled config option. As Alexei suggested we could check on that in get_helper_proto directly. Marking tail_call helper func with BPF_PTR_POISON, because it is unused by design. [1] https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: syzbot+a9ed3d9132939852d0df@syzkaller.appspotmail.com Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250814200655.945632-1-jolsa@kernel.org Closes: https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com	2025-08-15 11:16:56 +02:00
Jesper Dangaard Brouer	2b986b9e91	bpf, cpumap: Disable page_pool direct xdp_return need larger scope When running an XDP bpf_prog on the remote CPU in cpumap code then we must disable the direct return optimization that xdp_return can perform for mem_type page_pool. This optimization assumes code is still executing under RX-NAPI of the original receiving CPU, which isn't true on this remote CPU. The cpumap code already disabled this via helpers xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(), but the scope didn't include xdp_do_flush(). When doing XDP_REDIRECT towards e.g devmap this causes the function bq_xmit_all() to run with direct return optimization enabled. This can lead to hard to find bugs. The issue only happens when bq_xmit_all() cannot ndo_xdp_xmit all frames and them frees them via xdp_return_frame_rx_napi(). Fix by expanding scope to include xdp_do_flush(). This was found by Dragos Tatulea. Fixes: `11941f8a85` ("bpf: cpumap: Implement generic cpumap") Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Reported-by: Chris Arges <carges@cloudflare.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Chris Arges <carges@cloudflare.com> Link: https://patch.msgid.link/175519587755.3008742.1088294435150406835.stgit@firesoul	2025-08-15 11:08:08 +02:00
Paul E. McKenney	51c285baa3	rcutorture: Delay forward-progress testing until boot completes Forward-progress testing can hog CPUs, which is not a great thing to do before boot has completed. This commit therefore makes the CPU-hotplug operations hold off until boot has completed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00
Paul E. McKenney	6e9c48b3e3	torture: Delay CPU-hotplug operations until boot completes CPU-hotplug operations invoke stop-machine, which can hog CPUs, which is not a great thing to do before boot has completed. This commit therefore makes the CPU-hotplug operations hold off until boot has completed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00
Paul E. McKenney	9a316fe3ad	rcutorture: Delay rcutorture readers and writers until boot completes The rcutorture writers and (especially) readers are the biggest CPU hogs of the bunch, so this commit therefore makes them wait until boot has completed. This makes the current setting of the boot_ended local variable dead code, so while in the area, this commit removes that as well. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2025-08-14 15:26:30 -07:00

... 10 11 12 13 14 ...

49620 Commits