linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-21 09:05:20 -04:00

Author	SHA1	Message	Date
Ihor Solodrai	5fc5d8fded	bpf: Add bpf_dynptr_memset() kfunc Currently there is no straightforward way to fill dynptr memory with a value (most commonly zero). One can do it with bpf_dynptr_write(), but a temporary buffer is necessary for that. Implement bpf_dynptr_memset() - an analogue of memset() from libc. Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250702210309.3115903-2-isolodrai@meta.com	2025-07-03 15:21:20 -07:00
tuhaowen	4266e8fa56	PM: sleep: console: Fix the black screen issue When the computer enters sleep status without a monitor connected, the system switches the console to the virtual terminal tty63(SUSPEND_CONSOLE). If a monitor is subsequently connected before waking up, the system skips the required VT restoration process during wake-up, leaving the console on tty63 instead of switching back to tty1. To fix this issue, a global flag vt_switch_done is introduced to record whether the system has successfully switched to the suspend console via vt_move_to_console() during suspend. If the switch was completed, vt_switch_done is set to 1. Later during resume, this flag is checked to ensure that the original console is restored properly by calling vt_move_to_console(orig_fgconsole, 0). This prevents scenarios where the resume logic skips console restoration due to incorrect detection of the console state, especially when a monitor is reconnected before waking up. Signed-off-by: tuhaowen <tuhaowen@uniontech.com> Link: https://patch.msgid.link/20250611032345.29962-1-tuhaowen@uniontech.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-03 15:58:33 +02:00
Thomas Gleixner	858e65af91	irqdomain: Add device pointer to irq_domain_info and msi_domain_info Add device pointer to irq_domain_info and msi_domain_info, so that the device can be specified at domain creation time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/943e52403b20cf13c320d55bd4446b4562466aab.1750860131.git.namcao@linutronix.de	2025-07-03 15:49:24 +02:00
Paolo Abeni	4f38a6db7b	Merge tag 'ktime-get-clock-ts64-for-ptp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Base implementation for PTP with a temporary CLOCK_AUX* workaround to allow integration of depending changes into the networking tree. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 15:35:07 +02:00
Thomas Gleixner	a6d9638d4d	Merge tag 'ktime-get-clock-ts64-for-ptp' into timers/ptp Pull the base implementation of ktime_get_clock_ts64() for PTP, which contains a temporary CLOCK_AUX* workaround. That was created to allow integration of depending changes into the networking tree. The workaround is going to be removed in a subsequent change in the timekeeping tree. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-07-03 14:35:53 +02:00
Thomas Gleixner	5b605dbee0	timekeeping: Provide ktime_get_clock_ts64() PTP implements an inline switch case for taking timestamps from various POSIX clock IDs, which already consumes quite some text space. Expanding it for auxiliary clocks really becomes too big for inlining. Provide a out of line version. The function invalidates the timestamp in case the clock is invalid. The invalidation allows to implement a validation check without the need to propagate a return value through deep existing call chains. Due to merge logistics this temporarily defines CLOCK_AUX[_LAST] if undefined, so that the plain branch, which does not contain any of the core timekeeper changes, can be pulled into the networking tree as prerequisite for the PTP side changes. These temporary defines are removed after that branch is merged into the tip::timers/ptp branch. That way the result in -next or upstream in the next merge window has zero dependencies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250701132628.357686408@linutronix.de	2025-07-03 14:18:06 +02:00
Peter Zijlstra	ba677dbe77	perf: Revert to requiring CAP_SYS_ADMIN for uprobes Jann reports that uprobes can be used destructively when used in the middle of an instruction. The kernel only verifies there is a valid instruction at the requested offset, but due to variable instruction length cannot determine if this is an instruction as seen by the intended execution stream. Additionally, Mark Rutland notes that on architectures that mix data in the text segment (like arm64), a similar things can be done if the data word is 'mistaken' for an instruction. As such, require CAP_SYS_ADMIN for uprobes. Fixes: `c9e0924e5c` ("perf/core: open access to probes for CAP_PERFMON privileged process") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/CAG48ez1n4520sq0XrWYDHKiKxE_+WCfAK+qt9qkY4ZiBGmL-5g@mail.gmail.com	2025-07-03 10:33:55 +02:00
Paul Chaignon	65fdafd676	bpf: Avoid warning on multiple referenced args in call The description of full helper calls in syzkaller [1] and the addition of kernel warnings in commit `0df1a55afa` ("bpf: Warn on internal verifier errors") allowed syzbot to reach a verifier state that was thought to indicate a verifier bug [2]: 12: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 verifier bug: more than one arg with ref_obj_id R2 2 2 This error can be reproduced with the program from the previous commit: 0: (b7) r2 = 20 1: (b7) r3 = 0 2: (18) r1 = 0xffff92cee3cbc600 4: (85) call bpf_ringbuf_reserve#131 5: (55) if r0 == 0x0 goto pc+3 6: (bf) r1 = r0 7: (bf) r2 = r0 8: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 9: (95) exit bpf_tcp_raw_gen_syncookie_ipv4 expects R1 and R2 to be ARG_PTR_TO_FIXED_SIZE_MEM (with a size of at least sizeof(struct iphdr) for R1). R0 is a ring buffer payload of 20B and therefore matches this requirement. The verifier reaches the check on ref_obj_id while verifying R2 and rejects the program because the helper isn't supposed to take two referenced arguments. This case is a legitimate rejection and doesn't indicate a kernel bug, so we shouldn't log it as such and shouldn't emit a kernel warning. Link: https://github.com/google/syzkaller/pull/4313 [1] Link: https://lore.kernel.org/all/686491d6.a70a0220.3b7e22.20ea.GAE@google.com/T/ [2] Fixes: `457f44363a` ("bpf: Implement BPF ring buffer and verifier support for it") Fixes: `0df1a55afa` ("bpf: Warn on internal verifier errors") Reported-by: syzbot+69014a227f8edad4d8c6@syzkaller.appspotmail.com Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/cd09afbfd7bef10bbc432d72693f78ffdc1e8ee5.1751463262.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-07-02 10:43:38 -07:00
Eduard Zingerman	c3b9faac9b	bpf: avoid jump misprediction for PTR_TO_MEM \| PTR_UNTRUSTED Commit `f2362a57ae` ("bpf: allow void* cast using bpf_rdonly_cast()") added a notion of PTR_TO_MEM \| MEM_RDONLY \| PTR_UNTRUSTED type. This simultaneously introduced a bug in jump prediction logic for situations like below: p = bpf_rdonly_cast(..., 0); if (p) a(); else b(); Here verifier would wrongly predict that else branch cannot be taken. This happens because: - Function reg_not_null() assumes that PTR_TO_MEM w/o PTR_MAYBE_NULL flag cannot be zero. - The call to bpf_rdonly_cast() returns a rdonly_untrusted_mem value w/o PTR_MAYBE_NULL flag. Tracking of PTR_MAYBE_NULL flag for untrusted PTR_TO_MEM does not make sense, as the main purpose of the flag is to catch null pointer access errors. Such errors are not possible on load of PTR_UNTRUSTED values and verifier makes sure that PTR_UNTRUSTED can't be passed to helpers or kfuncs. Hence, modify reg_not_null() to assume that nullness of untrusted PTR_TO_MEM is not known. Fixes: `f2362a57ae` ("bpf: allow void* cast using bpf_rdonly_cast()") Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250702073620.897517-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-07-02 10:43:10 -07:00
Yury Norov [NVIDIA]	e0e9506523	smp: Defer check for local execution in smp_call_function_many_cond() Defer check for local execution to the actual place where it is needed, which removes the extra local variable. Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250623000010.10124-5-yury.norov@gmail.com	2025-07-02 19:13:14 +02:00
Song Liu	5bc9557c9f	bpf: Mark cgroup_subsys_state->cgroup RCU safe Mark struct cgroup_subsys_state->cgroup as safe under RCU read lock. This will enable accessing css->cgroup from a bpf css iterator. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/20250623063854.1896364-4-song@kernel.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-02 14:18:20 +02:00
Song Liu	b95ee9049c	bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node BPF programs, such as LSM and sched_ext, would benefit from tags on cgroups. One common practice to apply such tags is to set xattrs on cgroupfs folders. Introduce kfunc bpf_cgroup_read_xattr, which allows reading cgroup's xattr. Note that, we already have bpf_get_[file\|dentry]_xattr. However, these two APIs are not ideal for reading cgroupfs xattrs, because: 1) These two APIs only works in sleepable contexts; 2) There is no kfunc that matches current cgroup to cgroupfs dentry. bpf_cgroup_read_xattr is generic and can be useful for many program types. It is also safe, because it requires trusted or rcu protected argument (KF_RCU). Therefore, we make it available to all program types. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/20250623063854.1896364-3-song@kernel.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-02 14:18:20 +02:00
Masami Hiramatsu (Google)	2867495dea	tracing: tprobe-events: Register tracepoint when enable tprobe event As same as fprobe, register tracepoint stub function only when enabling tprobe events. The major changes are introducing a list of tracepoint_user and its lock, and tprobe_event_module_nb, which is another module notifier for module loading/unloading. By spliting the lock from event_mutex and a module notifier for trace_fprobe, it solved AB-BA lock dependency issue between event_mutex and tracepoint_module_list_mutex. Link: https://lore.kernel.org/all/174343538901.843280.423773753642677941.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-07-02 09:45:35 +09:00
Masami Hiramatsu (Google)	2db832ec90	tracing: fprobe-events: Register fprobe-events only when it is enabled Currently fprobe events are registered when it is defined. Thus it will give some overhead even if it is disabled. This changes it to register the fprobe only when it is enabled. Link: https://lore.kernel.org/all/174343537128.843280.16131300052837035043.stgit@devnote2/ Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-07-02 09:45:09 +09:00
Masami Hiramatsu (Google)	e3d6e1b9a3	tracing: tprobe-events: Support multiple tprobes on the same tracepoint Allow user to set multiple tracepoint-probe events on the same tracepoint. After the last tprobe-event is removed, the tracepoint callback is unregistered. Link: https://lore.kernel.org/all/174343536245.843280.6548776576601537671.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-07-02 09:44:55 +09:00
Masami Hiramatsu (Google)	c135ab4a96	tracing: tprobe-events: Remove mod field from tprobe-event Remove unneeded 'mod' struct module pointer field from trace_fprobe because we don't need to save this info. Link: https://lore.kernel.org/all/174343535351.843280.5868426549023332120.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-07-02 09:44:36 +09:00
Masami Hiramatsu (Google)	ddb017ec9c	tracing: probe-events: Cleanup entry-arg storing code Cleanup __store_entry_arg() so that it is easier to understand. The main complexity may come from combining the loops for finding stored-entry-arg and max-offset and appending new entry. This split those different loops into 3 parts, lookup the same entry-arg, find the max offset and append new entry. Link: https://lore.kernel.org/all/174323039929.348535.4705349977127704120.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-07-02 09:44:12 +09:00
Paul Chaignon	f824274587	bpf: Reject %p% format string in bprintf-like helpers static const char fmt[] = "%p%"; bpf_trace_printk(fmt, sizeof(fmt)); The above BPF program isn't rejected and causes a kernel warning at runtime: Please remove unsupported %\x00 in format string WARNING: CPU: 1 PID: 7244 at lib/vsprintf.c:2680 format_decode+0x49c/0x5d0 This happens because bpf_bprintf_prepare skips over the second %, detected as punctuation, while processing %p. This patch fixes it by not skipping over punctuation. %\x00 is then processed in the next iteration and rejected. Reported-by: syzbot+e2c932aec5c8a6e1d31c@syzkaller.appspotmail.com Fixes: `48cac3f4a9` ("bpf: Implement formatted output helpers with bstr_printf") Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/a0e06cc479faec9e802ae51ba5d66420523251ee.1751395489.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-01 15:22:46 -07:00
Paul Chaignon	0df1a55afa	bpf: Warn on internal verifier errors This patch is a follow up to commit `1cb0f56d96` ("bpf: WARN_ONCE on verifier bugs"). It generalizes the use of verifier_error throughout the verifier, in particular for logs previously marked "verifier internal error". As a consequence, all of those verifier bugs will now come with a kernel warning (under CONFIG_DBEUG_KERNEL) detectable by fuzzers. While at it, some error messages that were too generic (ex., "bpf verifier is misconfigured") have been reworded. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/aGQqnzMyeagzgkCK@Tunnel Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-01 12:38:30 -07:00
Daniel Wagner	b6139a6abf	lib/group_cpus: Let group_cpu_evenly() return the number of initialized masks group_cpu_evenly() might have allocated less groups then requested: group_cpu_evenly() __group_cpus_evenly() alloc_nodes_groups() # allocated total groups may be less than numgrps when # active total CPU number is less then numgrps In this case, the caller will do an out of bound access because the caller assumes the masks returned has numgrps. Return the number of groups created so the caller can limit the access range accordingly. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250617-isolcpus-queue-counters-v1-1-13923686b54b@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-01 10:24:11 -06:00
Greg Kroah-Hartman	e78f70bad2	time/timecounter: Fix the lie that struct cyclecounter is const In both the read callback for struct cyclecounter, and in struct timecounter, struct cyclecounter is declared as a const pointer. Unfortunatly, a number of users of this pointer treat it as a non-const pointer as it is burried in a larger structure that is heavily modified by the callback function when accessed. This lie had been hidden by the fact that container_of() "casts away" a const attribute of a pointer without any compiler warning happening at all. Fix this all up by removing the const attribute in the needed places so that everyone can see that the structure really isn't const, but can, and is, modified by the users of it. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/2025070124-backyard-hurt-783a@gregkh	2025-07-01 15:38:25 +02:00
Peter Zijlstra	009836b4fa	sched/core: Fix migrate_swap() vs. hotplug On Mon, Jun 02, 2025 at 03:22:13PM +0800, Kuyo Chang wrote: > So, the potential race scenario is: > > CPU0 CPU1 > // doing migrate_swap(cpu0/cpu1) > stop_two_cpus() > ... > // doing _cpu_down() > sched_cpu_deactivate() > set_cpu_active(cpu, false); > balance_push_set(cpu, true); > cpu_stop_queue_two_works > __cpu_stop_queue_work(stopper1,...); > __cpu_stop_queue_work(stopper2,..); > stop_cpus_in_progress -> true > preempt_enable(); > ... > 1st balance_push > stop_one_cpu_nowait > cpu_stop_queue_work > __cpu_stop_queue_work > list_add_tail -> 1st add push_work > wake_up_q(&wakeq); -> "wakeq is empty. > This implies that the stopper is at wakeq@migrate_swap." > preempt_disable > wake_up_q(&wakeq); > wake_up_process // wakeup migrate/0 > try_to_wake_up > ttwu_queue > ttwu_queue_cond ->meet below case > if (cpu == smp_processor_id()) > return false; > ttwu_do_activate > //migrate/0 wakeup done > wake_up_process // wakeup migrate/1 > try_to_wake_up > ttwu_queue > ttwu_queue_cond > ttwu_queue_wakelist > __ttwu_queue_wakelist > __smp_call_single_queue > preempt_enable(); > > 2nd balance_push > stop_one_cpu_nowait > cpu_stop_queue_work > __cpu_stop_queue_work > list_add_tail -> 2nd add push_work, so the double list add is detected > ... > ... > cpu1 get ipi, do sched_ttwu_pending, wakeup migrate/1 > So this balance_push() is part of schedule(), and schedule() is supposed to switch to stopper task, but because of this race condition, stopper task is stuck in WAKING state and not actually visible to be picked. Therefore CPU1 can do another schedule() and end up doing another balance_push() even though the last one hasn't been done yet. This is a confluence of fail, where both wake_q and ttwu_wakelist can cause crucial wakeups to be delayed, resulting in the malfunction of balance_push. Since there is only a single stopper thread to be woken, the wake_q doesn't really add anything here, and can be removed in favour of direct wakeups of the stopper thread. Then add a clause to ttwu_queue_cond() to ensure the stopper threads are never queued / delayed. Of all 3 moving parts, the last addition was the balance_push() machinery, so pick that as the point the bug was introduced. Fixes: `2558aacff8` ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug") Reported-by: Kuyo Chang <kuyo.chang@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kuyo Chang <kuyo.chang@mediatek.com> Link: https://lkml.kernel.org/r/20250605100009.GO39944@noisy.programming.kicks-ass.net	2025-07-01 15:02:03 +02:00
Thomas Weißschuh	3ebb1b6522	sched: Fix preemption string of preempt_dynamic_none Zero is a valid value for "preempt_dynamic_mode", namely "preempt_dynamic_none". Fix the off-by-one in preempt_model_str(), so that "preempty_dynamic_none" is correctly formatted as PREEMPT(none) instead of PREEMPT(undef). Fixes: `8bdc5daaa0` ("sched: Add a generic function to return the preemption string") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250626-preempt-str-none-v2-1-526213b70a89@linutronix.de	2025-07-01 15:02:02 +02:00
Thomas Gleixner	5173ac2dc8	Merge tag 'entry-split-for-arm' into core/entry Prerequisite for ARM[64] generic entry conversion Merge it into the entry branch so further changes can be based on it.	2025-06-30 19:54:19 +02:00
Jinjie Ruan	a70e9f647f	entry: Split generic entry into generic exception and syscall entry Currently CONFIG_GENERIC_ENTRY enables both the generic exception entry logic and the generic syscall entry logic, which are otherwise loosely coupled. Introduce separate config options for these so that architectures can select the two independently. This will make it easier for architectures to migrate to generic entry code. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/20250213130007.1418890-2-ruanjinjie@huawei.com Link: https://lore.kernel.org/all/20250624-generic-entry-split-v1-1-53d5ef4f94df@linaro.org [Linus Walleij: rebase onto v6.16-rc1]	2025-06-30 19:52:55 +02:00
Luo Gengkun	7b4c5a3754	perf/core: Fix the WARN_ON_ONCE is out of lock protected region commit `3172fb9866` ("perf/core: Fix WARN in perf_cgroup_switch()") try to fix a concurrency problem between perf_cgroup_switch and perf_cgroup_event_disable. But it does not to move the WARN_ON_ONCE into lock-protected region, so the warning is still be triggered. Fixes: `3172fb9866` ("perf/core: Fix WARN in perf_cgroup_switch()") Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250626135403.2454105-1-luogengkun@huaweicloud.com	2025-06-30 09:32:49 +02:00
Linus Torvalds	2fc18d0b89	Merge tag 'perf_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Make sure an AUX perf event is really disabled when it overruns * tag 'perf_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/aux: Fix pending disable flow when the AUX ring buffer overruns	2025-06-29 08:16:02 -07:00
Linus Torvalds	ded779017a	Merge tag 'trace-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: - Fix possible UAF on error path in filter_free_subsystem_filters() When freeing a subsystem filter, the filter for the subsystem is passed in to be freed and all the events within the subsystem will have their filter freed too. In order to free without waiting for RCU synchronization, list items are allocated to hold what is going to be freed to free it via a call_rcu(). If the allocation of these items fails, it will call the synchronization directly and free after that (causing a bit of delay for the user). The subsystem filter is first added to this list and then the filters for all the events under the subsystem. The bug is if one of the allocations of the list items for the event filters fail to allocate, it jumps to the "free_now" label which will free the subsystem filter, then all the items on the allocated list, and then the event filters that were not added to the list yet. But because the subsystem filter was added first, it gets freed twice. The solution is to add the subsystem filter after the events, and then if any of the allocations fail it will not try to free any of them twice * tag 'trace-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix filter logic error	2025-06-28 11:39:24 -07:00
Linus Torvalds	0fd39af24e	Merge tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 6 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 5 are for MM" * tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add Lorenzo as THP co-maintainer mailmap: update Duje Mihanović's email address selftests/mm: fix validate_addr() helper crashdump: add CONFIG_KEYS dependency mailmap: correct name for a historical account of Zijun Hu mailmap: add entries for Zijun Hu fuse: fix runtime warning on truncate_folio_batch_exceptionals() scripts/gdb: fix dentry_name() lookup mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write mm/alloc_tag: fix the kmemleak false positive issue in the allocation of the percpu variable tag->counters lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly() mm/hugetlb: remove unnecessary holding of hugetlb_lock MAINTAINERS: add missing files to mm page alloc section MAINTAINERS: add tree entry to mm init block mm: add OOM killer maintainer structure fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folio	2025-06-27 20:34:10 -07:00
Edward Adam Davis	6921d1e07c	tracing: Fix filter logic error If the processing of the tr->events loop fails, the filter that has been added to filter_head will be released twice in free_filter_list(&head->rcu) and __free_filter(filter). After adding the filter of tr->events, add the filter to the filter_head process to avoid triggering uaf. Link: https://lore.kernel.org/tencent_4EF87A626D702F816CD0951CE956EC32CD0A@qq.com Fixes: `a9d0aab5eb` ("tracing: Fix regression of filter waiting a long time on RCU synchronization") Reported-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=daba72c4af9915e9c894 Tested-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-06-27 15:51:36 -04:00
Eduard Zingerman	a5a7b25d75	bpf: guard BTF_ID_FLAGS(bpf_cgroup_read_xattr) with CONFIG_BPF_LSM Function bpf_cgroup_read_xattr is defined in fs/bpf_fs_kfuncs.c, which is compiled only when CONFIG_BPF_LSM is set. Add CONFIG_BPF_LSM check to bpf_cgroup_read_xattr spec in common_btf_ids in kernel/bpf/helpers.c to avoid build failures for configs w/o CONFIG_BPF_LSM. Build failure example: BTF .tmp_vmlinux1.btf.o btf_encoder__tag_kfunc: failed to find kfunc 'bpf_cgroup_read_xattr' in BTF ... WARN: resolve_btfids: unresolved symbol bpf_cgroup_read_xattr make[2]: *** [scripts/Makefile.vmlinux:91: vmlinux.unstripped] Error 255 Fixes: `535b070f4a` ("bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node") Reported-by: Jake Hillion <jakehillion@meta.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250627175309.2710973-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-27 12:18:42 -07:00
Thomas Gleixner	7b95663a3d	timekeeping: Provide interface to control auxiliary clocks Auxiliary clocks are disabled by default and attempts to access them fail. Provide an interface to enable/disable them at run-time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.444626478@linutronix.de	2025-06-27 20:13:13 +02:00
Thomas Gleixner	e6d4c00719	timekeeping: Provide update for auxiliary timekeepers Update the auxiliary timekeepers periodically. For now this is tied to the system timekeeper update from the tick. This might be revisited and moved out of the tick. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.382451331@linutronix.de	2025-06-27 20:13:13 +02:00
Thomas Gleixner	ecf3e70304	timekeeping: Provide adjtimex() for auxiliary clocks The behaviour is close to clock_adtime(CLOCK_REALTIME) with the following differences: 1) ADJ_SETOFFSET adjusts the auxiliary clock offset 2) ADJ_TAI is not supported 3) Leap seconds are not supported 4) PPS is not supported Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.317946543@linutronix.de	2025-06-27 20:13:13 +02:00
Thomas Gleixner	4eca49d0b6	timekeeping: Prepare do_adtimex() for auxiliary clocks Exclude ADJ_TAI, leap seconds and PPS functionality as they make no sense in the context of auxiliary clocks and provide a time stamp based on the actual clock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.253203783@linutronix.de	2025-06-27 20:13:13 +02:00
Thomas Gleixner	775f71ebed	timekeeping: Make do_adjtimex() reusable Split out the actual functionality of adjtimex() and make do_adjtimex() a wrapper which feeds the core timekeeper into it and handles the result including audit at the call site. This allows to reuse the actual functionality for auxiliary clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.187322876@linutronix.de	2025-06-27 20:13:13 +02:00
Thomas Gleixner	2c8aea59c2	timekeeping: Add auxiliary clock support to __timekeeping_inject_offset() Redirect the relative offset adjustment to the auxiliary clock offset instead of modifying CLOCK_REALTIME, which has no meaning in context of these clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.124057787@linutronix.de	2025-06-27 20:13:13 +02:00
Thomas Gleixner	e8db3a5579	timekeeping: Make timekeeping_inject_offset() reusable Split out the inner workings for auxiliary clock support and feed the core time keeper into it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.059934561@linutronix.de	2025-06-27 20:13:13 +02:00
Thomas Gleixner	60ecc26ec5	timekeeping: Provide time setter for auxiliary clocks Add clock_settime(2) support for auxiliary clocks. The function affects the AUX offset which is added to the "monotonic" clock readout of these clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.995688714@linutronix.de	2025-06-27 20:13:12 +02:00
Thomas Gleixner	606424bf4f	timekeeping: Add minimal posix-timers support for auxiliary clocks Provide clock_getres(2) and clock_gettime(2) for auxiliary clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.932220594@linutronix.de	2025-06-27 20:13:12 +02:00
Thomas Gleixner	05bc6e6290	timekeeping: Provide time getters for auxiliary clocks Provide interfaces similar to the ktime_get*() family which provide access to the auxiliary clocks. These interfaces have a boolean return value, which indicates whether the accessed clock is valid or not. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.868342628@linutronix.de	2025-06-27 20:13:12 +02:00
Thomas Gleixner	9f7729480a	timekeeping: Update auxiliary timekeepers on clocksource change Propagate a system clocksource change to the auxiliary timekeepers so that they can pick up the new clocksource. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.803890875@linutronix.de	2025-06-27 20:13:12 +02:00
Viktor Malik	5272b51367	bpf: Fix string kfuncs names in doc comments Documentation comments for bpf_strnlen and bpf_strcspn contained incorrect function names. Fixes: `e91370550f` ("bpf: Add kfuncs for read-only string operations") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/bpf/20250627174759.3a435f86@canb.auug.org.au/T/#u Signed-off-by: Viktor Malik <vmalik@redhat.com> Link: https://lore.kernel.org/r/20250627082001.237606-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-27 08:48:06 -07:00
Alexei Starovoitov	48d998af99	Merge branch 'vfs-6.17.bpf' of https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Merge branch 'vfs-6.17.bpf' from vfs tree into bpf-next/master and resolve conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-26 19:01:04 -07:00
Yury Norov [NVIDIA]	976e0e3103	smp: Use cpumask_any_but() in smp_call_function_many_cond() smp_call_function_many_cond() opencodes cpumask_any_but(). Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250623000010.10124-3-yury.norov@gmail.com	2025-06-26 23:46:34 +02:00
Yury Norov [NVIDIA]	5f295519b4	smp: Improve locality in smp_call_function_any() smp_call_function_any() tries to make a local call as it's the cheapest option, or switches to a CPU in the same node. If it's not possible, the algorithm gives up and searches for any CPU, in a numerical order. Instead, it can search for the best CPU based on NUMA locality, including the 2nd nearest hop (a set of equidistant nodes), and higher. sched_numa_find_nth_cpu() does exactly that, and also helps to drop most of the housekeeping code. Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250623000010.10124-2-yury.norov@gmail.com	2025-06-26 23:46:34 +02:00
Mario Limonciello	12ffc3b151	PM: Restrict swap use to later in the suspend sequence Currently swap is restricted before drivers have had a chance to do their prepare() PM callbacks. Restricting swap this early means that if a driver needs to evict some content from memory into sawp in it's prepare callback, it won't be able to. On AMD dGPUs this can lead to failed suspends under memory pressure situations as all VRAM must be evicted to system memory or swap. Move the swap restriction to right after all devices have had a chance to do the prepare() callback. If there is any problem with the sequence, restore swap in the appropriate dpm resume callbacks or error handling paths. Closes: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Nat Wittstock <nat@fardog.io> Tested-by: Lucian Langa <lucilanga@7pot.org> Link: https://patch.msgid.link/20250613214413.4127087-1-superm1@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-26 20:39:34 +02:00
Alexei Starovoitov	886178a33a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3 Cross-merge BPF, perf and other fixes after downstream PRs. It restores BPF CI to green after critical fix commit `bc4394e5e7` ("perf: Fix the throttle error of some clock events") No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-26 09:49:39 -07:00
Viktor Malik	e91370550f	bpf: Add kfuncs for read-only string operations String operations are commonly used so this exposes the most common ones to BPF programs. For now, we limit ourselves to operations which do not copy memory around. Unfortunately, most in-kernel implementations assume that strings are %NUL-terminated, which is not necessarily true, and therefore we cannot use them directly in the BPF context. Instead, we open-code them using __get_kernel_nofault instead of plain dereference to make them safe and limit the strings length to XATTR_SIZE_MAX to make sure the functions terminate. When __get_kernel_nofault fails, functions return -EFAULT. Similarly, when the size bound is reached, the functions return -E2BIG. In addition, we return -ERANGE when the passed strings are outside of the kernel address space. Note that thanks to these dynamic safety checks, no other constraints are put on the kfunc args (they are marked with the "__ign" suffix to skip any verifier checks for them). All of the functions return integers, including functions which normally (in kernel or libc) return pointers to the strings. The reason is that since the strings are generally treated as unsafe, the pointers couldn't be dereferenced anyways. So, instead, we return an index to the string and let user decide what to do with it. This also nicely fits with returning various error codes when necessary (see above). Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Viktor Malik <vmalik@redhat.com> Link: https://lore.kernel.org/r/4b008a6212852c1b056a413f86e3efddac73551c.1750917800.git.vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-26 09:44:45 -07:00
Leo Yan	1476b21832	perf/aux: Fix pending disable flow when the AUX ring buffer overruns If an AUX event overruns, the event core layer intends to disable the event by setting the 'pending_disable' flag. Unfortunately, the event is not actually disabled afterwards. In commit: `ca6c21327c` ("perf: Fix missing SIGTRAPs") the 'pending_disable' flag was changed to a boolean. However, the AUX event code was not updated accordingly. The flag ends up holding a CPU number. If this number is zero, the flag is taken as false and the IRQ work is never triggered. Later, with commit: `2b84def990` ("perf: Split __perf_pending_irq() out of perf_pending_irq()") a new IRQ work 'pending_disable_irq' was introduced to handle event disabling. The AUX event path was not updated to kick off the work queue. To fix this bug, when an AUX ring buffer overrun is detected, call perf_event_disable_inatomic() to initiate the pending disable flow. Also update the outdated comment for setting the flag, to reflect the boolean values (0 or 1). Fixes: `2b84def990` ("perf: Split __perf_pending_irq() out of perf_pending_irq()") Fixes: `ca6c21327c` ("perf: Fix missing SIGTRAPs") Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Liang Kan <kan.liang@linux.intel.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20250625170737.2918295-1-leo.yan@arm.com	2025-06-26 10:50:37 +02:00

... 19 20 21 22 23 ...

49605 Commits