linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-22 00:14:52 -04:00

Author	SHA1	Message	Date
Rafael J. Wysocki	75b63ce2c9	PM: suspend: Drop a misplaced pm_restore_gfp_mask() call The pm_restore_gfp_mask() call added by commit `12ffc3b151` ("PM: Restrict swap use to later in the suspend sequence") to suspend_devices_and_enter() is done too early because it takes place before calling dpm_resume() in dpm_resume_end() and some swap-backing devices may not be ready at that point. Moreover, dpm_resume_end() called subsequently in the same code path invokes pm_restore_gfp_mask() again and calling it twice in a row is pointless. Drop the misplaced pm_restore_gfp_mask() call from suspend_devices_and_enter() to address this issue. Fixes: `12ffc3b151` ("PM: Restrict swap use to later in the suspend sequence") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/2810409.mvXUDI8C0e@rjwysocki.net [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-15 14:54:26 +02:00
Peng Jiang	e1876fb015	kprobes: Add missing kerneldoc for __get_insn_slot Add kerneldoc for '__get_insn_slot' function to fix W=1 warnings: kernel/kprobes.c:141 function parameter 'c' not described in '__get_insn_slot' Link: https://lore.kernel.org/all/20250704143817707TOCcfTRWsO5OAbQ2eYoU9@zte.com.cn/ Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-07-15 18:45:34 +09:00
Breno Leitao	7a3cedafcc	lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization lockdep_unregister_key() is called from critical code paths, including sections where rtnl_lock() is held. For example, when replacing a qdisc in a network device, network egress traffic is disabled while __qdisc_destroy() is called for every network queue. If lockdep is enabled, __qdisc_destroy() calls lockdep_unregister_key(), which gets blocked waiting for synchronize_rcu() to complete. For example, a simple tc command to replace a qdisc could take 13 seconds: # time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq real 0m13.195s user 0m0.001s sys 0m2.746s During this time, network egress is completely frozen while waiting for RCU synchronization. Use synchronize_rcu_expedited() instead to minimize the impact on critical operations like network connectivity changes. This improves 10x the function call to tc, when replacing the qdisc for a network card. # time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq real 0m1.789s user 0m0.000s sys 0m1.613s [boqun: Fixed the comment and add more information for the temporary workaround, and add TODO information for hazptr] Reported-by: Erik Lundgren <elundgren@meta.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20250321-lockdep-v1-1-78b732d195fb@debian.org	2025-07-14 21:57:29 -07:00
Ran Xiaokai	1dfe5ea6db	locking/mutex: Remove redundant #ifdefs hung_task_{set,clear}_blocker() is already guarded by CONFIG_DETECT_HUNG_TASK_BLOCKER in hung_task.h, So remove the redudant check of #ifdef. Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20250704015218.359754-1-ranxiaokai627@163.com	2025-07-14 21:57:29 -07:00
Arnd Bergmann	bd27cfb58c	locking/lockdep: Change 'static const' variables to enum values gcc warns about 'static const' variables even in headers when building with -Wunused-const-variables enabled: In file included from kernel/locking/lockdep_proc.c:25: kernel/locking/lockdep_internals.h:69:28: error: 'LOCKF_USED_IN_IRQ_READ' defined but not used [-Werror=unused-const-variable=] 69 \| static const unsigned long LOCKF_USED_IN_IRQ_READ = \| ^~~~~~~~~~~~~~~~~~~~~~ kernel/locking/lockdep_internals.h:63:28: error: 'LOCKF_ENABLED_IRQ_READ' defined but not used [-Werror=unused-const-variable=] 63 \| static const unsigned long LOCKF_ENABLED_IRQ_READ = \| ^~~~~~~~~~~~~~~~~~~~~~ kernel/locking/lockdep_internals.h:57:28: error: 'LOCKF_USED_IN_IRQ' defined but not used [-Werror=unused-const-variable=] 57 \| static const unsigned long LOCKF_USED_IN_IRQ = \| ^~~~~~~~~~~~~~~~~ kernel/locking/lockdep_internals.h:51:28: error: 'LOCKF_ENABLED_IRQ' defined but not used [-Werror=unused-const-variable=] 51 \| static const unsigned long LOCKF_ENABLED_IRQ = \| ^~~~~~~~~~~~~~~~~ This one is easy to avoid by changing the generated constant definition into an equivalent enum. Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20250409122314.2848028-6-arnd@kernel.org	2025-07-14 21:57:29 -07:00
Arnd Bergmann	d7c36d6350	locking/lockdep: Avoid struct return in lock_stats() Returning a large structure from the lock_stats() function causes clang to have multiple copies of it on the stack and copy between them, which can end up exceeding the frame size warning limit: kernel/locking/lockdep.c:300:25: error: stack frame size (1464) exceeds limit (1280) in 'lock_stats' [-Werror,-Wframe-larger-than] 300 \| struct lock_class_stats lock_stats(struct lock_class *class) Change the calling conventions to directly operate on the caller's copy, which apparently is what gcc does already. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20250610092941.2642847-1-arnd@kernel.org	2025-07-14 21:57:20 -07:00
Eric Biggers	9503ca2cca	lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() Rename the existing sha1_init() to sha1_init_raw(), since it conflicts with the upcoming library function. This will later be removed, but this keeps the kernel building for the introduction of the library. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-14 08:22:31 -07:00
Peter Zijlstra	7de9d4f946	sched: Start blocked_on chain processing in find_proxy_task() Start to flesh out the real find_proxy_task() implementation, but avoid the migration cases for now, in those cases just deactivate the donor task and pick again. To ensure the donor task or other blocked tasks in the chain aren't migrated away while we're running the proxy, also tweak the fair class logic to avoid migrating donor or mutex blocked tasks. [jstultz: This change was split out from the larger proxy patch] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-9-jstultz@google.com	2025-07-14 17:16:33 +02:00
Valentin Schneider	be39617e38	sched: Fix proxy/current (push,pull)ability Proxy execution forms atomic pairs of tasks: The waiting donor task (scheduling context) and a proxy (execution context). The donor task, along with the rest of the blocked chain, follows the proxy wrt CPU placement. They can be the same task, in which case push/pull doesn't need any modification. When they are different, however, FIFO1 & FIFO42: ,-> RT42 \| \| blocked-on \| v blocked_donor \| mutex \| \| owner \| v `-- RT1 RT1 RT42 CPU0 CPU1 ^ ^ \| \| overloaded !overloaded rq prio = 42 rq prio = 0 RT1 is eligible to be pushed to CPU1, but should that happen it will "carry" RT42 along. Clearly here neither RT1 nor RT42 must be seen as push/pullable. Unfortunately, only the donor task is usually dequeued from the rq, and the proxy'ed execution context (rq->curr) remains on the rq. This can cause RT1 to be selected for migration from logic like the rt pushable_list. Thus, adda a dequeue/enqueue cycle on the proxy task before __schedule returns, which allows the sched class logic to avoid adding the now current task to the pushable_list. Furthermore, tasks becoming blocked on a mutex don't need an explicit dequeue/enqueue cycle to be made (push/pull)able: they have to be running to block on a mutex, thus they will eventually hit put_prev_task(). Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-8-jstultz@google.com	2025-07-14 17:16:33 +02:00
John Stultz	be41bde4c3	sched: Add an initial sketch of the find_proxy_task() function Add a find_proxy_task() function which doesn't do much. When we select a blocked task to run, we will just deactivate it and pick again. The exception being if it has become unblocked after find_proxy_task() was called. This allows us to validate keeping blocked tasks on the runqueue and later deactivating them is working ok, stressing the failure cases for when a proxy isn't found. Greatly simplified from patch by: Peter Zijlstra (Intel) <peterz@infradead.org> Juri Lelli <juri.lelli@redhat.com> Valentin Schneider <valentin.schneider@arm.com> Connor O'Brien <connoro@google.com> [jstultz: Split out from larger proxy patch and simplified for review and testing.] Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-7-jstultz@google.com	2025-07-14 17:16:32 +02:00
John Stultz	aa4f74dfd4	sched: Fix runtime accounting w/ split exec & sched contexts Without proxy-exec, we normally charge the "current" task for both its vruntime as well as its sum_exec_runtime. With proxy, however, we have two "current" contexts: the scheduler context and the execution context. We want to charge the execution context rq->curr (ie: proxy/lock holder) execution time to its sum_exec_runtime (so it's clear to userland the rq->curr task is running), as well as its thread group. However the rest of the time accounting (such a vruntime and cgroup accounting), we charge against the scheduler context (rq->donor) task, because it is from that task that the time is being "donated". If the donor and curr tasks are the same, then it's the same as without proxy. Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-6-jstultz@google.com	2025-07-14 17:16:32 +02:00
John Stultz	865d8cfb16	sched: Move update_curr_task logic into update_curr_se Absorb update_curr_task() into update_curr_se(), and in the process simplify update_curr_common(). This will make the next step a bit easier. Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-5-jstultz@google.com	2025-07-14 17:16:32 +02:00
Valentin Schneider	a4f0b6fef4	locking/mutex: Add p->blocked_on wrappers for correctness checks This lets us assert mutex::wait_lock is held whenever we access p->blocked_on, as well as warn us for unexpected state changes. [fix conflicts, call in more places] [jstultz: tweaked commit subject, reworked a good bit] Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-4-jstultz@google.com	2025-07-14 17:16:32 +02:00
Peter Zijlstra	44e4e0297c	locking/mutex: Rework task_struct::blocked_on Track the blocked-on relation for mutexes, to allow following this relation at schedule time. task \| blocked-on v mutex \| owner v task This all will be used for tracking blocked-task/mutex chains with the prox-execution patch in a similar fashion to how priority inheritance is done with rt_mutexes. For serialization, blocked-on is only set by the task itself (current). And both when setting or clearing (potentially by others), is done while holding the mutex::wait_lock. [minor changes while rebasing] [jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-3-jstultz@google.com	2025-07-14 17:16:31 +02:00
John Stultz	25c411fce7	sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument sched_proxy_exec= that can be used to disable the feature at boot time if CONFIG_SCHED_PROXY_EXEC was enabled. Also uses this option to allow the rq->donor to be different from rq->curr. Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com	2025-07-14 17:16:31 +02:00
Peter Zijlstra	8f2146159b	Merge branch 'tip/sched/urgent' Avoid merge conflicts Signed-off-by: Peter Zijlstra <peterz@infradead.org>	2025-07-14 17:16:28 +02:00
K Prateek Nayak	1eec89a671	sched/topology: Remove sched_domain_topology_level::flags Support for overlapping domains added in commit `e3589f6c81` ("sched: Allow for overlapping sched_domain spans") also allowed forcefully setting SD_OVERLAP for !NUMA domains via FORCE_SD_OVERLAP sched_feat(). Since NUMA domains had to be presumed overlapping to ensure correct behavior, "sched_domain_topology_level::flags" was introduced. NUMA domains added the SDTL_OVERLAP flag would ensure SD_OVERLAP was always added during build_sched_domains() for these domains, even when FORCE_SD_OVERLAP was off. Condition for adding the SD_OVERLAP flag at the aforementioned commit was as follows: if (tl->flags & SDTL_OVERLAP \|\| sched_feat(FORCE_SD_OVERLAP)) sd->flags \|= SD_OVERLAP; The FORCE_SD_OVERLAP debug feature was removed in commit `af85596c74` ("sched/topology: Remove FORCE_SD_OVERLAP") which left the NUMA domains as the exclusive users of SDTL_OVERLAP, SD_OVERLAP, and SD_NUMA flags. Get rid of SDTL_OVERLAP and SD_OVERLAP as they have become redundant and instead rely on SD_NUMA to detect the only overlapping domain currently supported. Since SDTL_OVERLAP was the only user of "tl->flags", get rid of "sched_domain_topology_level::flags" too. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/ba4dbdf8-bc37-493d-b2e0-2efb00ea3e19@amd.com	2025-07-14 10:59:35 +02:00
Li Chen	e075f43609	smpboot: introduce SDTL_INIT() helper to tidy sched topology setup Define a small SDTL_INIT(maskfn, flagsfn, name) macro and use it to build the sched_domain_topology_level array. Purely a cleanup; behaviour is unchanged. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250710105715.66594-2-me@linux.beauty	2025-07-14 10:59:34 +02:00
Juri Lelli	440989c10f	sched/deadline: Fix accounting after global limits change A global limits change (sched_rt_handler() logic) currently leaves stale and/or incorrect values in variables related to accounting (e.g. extra_bw). Properly clean up per runqueue variables before implementing the change and rebuild scheduling domains (so that accounting is also properly restored) after such a change is complete. Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b Link: https://lore.kernel.org/r/20250627115118.438797-4-juri.lelli@redhat.com	2025-07-14 10:59:33 +02:00
Juri Lelli	fcc9276c4d	sched/deadline: Reset extra_bw to max_bw when clearing root domains dl_clear_root_domain() doesn't take into account the fact that per-rq extra_bw variables retain values computed before root domain changes, resulting in broken accounting. Fix it by resetting extra_bw to max_bw before restoring back dl-servers contributions. Fixes: `2ff899e351` ("sched/deadline: Rebuild root domain accounting after every update") Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b Link: https://lore.kernel.org/r/20250627115118.438797-3-juri.lelli@redhat.com	2025-07-14 10:59:32 +02:00
Juri Lelli	9f239df555	sched/deadline: Initialize dl_servers after SMP dl-servers are currently initialized too early at boot when CPUs are not fully up (only boot CPU is). This results in miscalculation of per runqueue DEADLINE variables like extra_bw (which needs a stable CPU count). Move initialization of dl-servers later on after SMP has been initialized and CPUs are all online, so that CPU count is stable and DEADLINE variables can be computed correctly. Fixes: `d741f297bc` ("sched/fair: Fair server interface") Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b Link: https://lore.kernel.org/r/20250627115118.438797-2-juri.lelli@redhat.com	2025-07-14 10:59:32 +02:00
Aruna Ramakrishna	36569780b0	sched: Change nr_uninterruptible type to unsigned long The commit `e6fe3f422b` ("sched: Make multiple runqueue task counters 32-bit") changed nr_uninterruptible to an unsigned int. But the nr_uninterruptible values for each of the CPU runqueues can grow to large numbers, sometimes exceeding INT_MAX. This is valid, if, over time, a large number of tasks are migrated off of one CPU after going into an uninterruptible state. Only the sum of all nr_interruptible values across all CPUs yields the correct result, as explained in a comment in kernel/sched/loadavg.c. Change the type of nr_uninterruptible back to unsigned long to prevent overflows, and thus the miscalculation of load average. Fixes: `e6fe3f422b` ("sched: Make multiple runqueue task counters 32-bit") Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250709173328.606794-1-aruna.ramakrishna@oracle.com	2025-07-14 10:59:31 +02:00
Zi Yan	1bc3587a88	mm/page_alloc: add support for initializing pageblock as isolated MIGRATE_ISOLATE is a standalone bit, so a pageblock cannot be initialized to just MIGRATE_ISOLATE. Add init_pageblock_migratetype() to enable initialize a pageblock with a migratetype and isolated. Link: https://lkml.kernel.org/r/20250617021115.2331563-4-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Richard Chang <richardycc@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-13 16:38:17 -07:00
Oscar Salvador	8e1bf051c5	kernel,cpuset: use node-notifier instead of memory-notifier cpuset is only concerned when a numa node changes its memory state, as it needs to know the current numa nodes with memory to keep an updated mems_allowed mask. So stop using the memory notifier and use the new numa node notifer instead. Link: https://lkml.kernel.org/r/20250616135158.450136-9-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-13 16:38:16 -07:00
Vlastimil Babka	6b233784b1	mm, madvise: extract mm code from prctl_set_vma() to mm/madvise.c Setting anon_name is done via madvise_set_anon_name() and behaves a lot of like other madvise operations. However, apparently because madvise() has lacked the 4th argument and prctl() not, the userspace entry point has been implemented via prctl(PR_SET_VMA, ...) and handled first by prctl_set_vma(). Currently prctl_set_vma() lives in kernel/sys.c but setting the vma->anon_name is mm-specific code so extract it to a new set_anon_vma_name() function under mm. mm/madvise.c seems to be the most straightforward place as that's where madvise_set_anon_name() lives. Stop declaring the latter in mm.h and instead declare set_anon_vma_name(). Link: https://lkml.kernel.org/r/20250624-anon_name_cleanup-v2-2-600075462a11@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Colin Cross <ccross@google.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-13 16:38:13 -07:00
Linus Torvalds	0a197b7576	Merge tag 'perf_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Prevent perf_sigtrap() from observing an exiting task and warning about it * tag 'perf_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix WARN in perf_sigtrap()	2025-07-13 10:34:47 -07:00
Andrew Morton	cac3d177c0	Merge branch 'mm-hotfixes-stable' into mm-stable to pick up changes which are required for a merge of the series "mm: folio_pte_batch() improvements".	2025-07-12 14:48:26 -07:00
Linus Torvalds	3f31a806a6	Merge tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. A whopping 16 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 14 are for MM. Three gdb-script fixes and a kallsyms build fix" * tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "sched/numa: add statistics of numa balance task" mm: fix the inaccurate memory statistics issue for users mm/damon: fix divide by zero in damon_get_intervals_score() samples/damon: fix damon sample mtier for start failure samples/damon: fix damon sample wsse for start failure samples/damon: fix damon sample prcl for start failure kasan: remove kasan_find_vm_area() to prevent possible deadlock scripts: gdb: vfs: support external dentry names mm/migrate: fix do_pages_stat in compat mode mm/damon/core: handle damon_call_control as normal under kdmond deactivation mm/rmap: fix potential out-of-bounds page table access during batched unmap mm/hugetlb: don't crash when allocating a folio if there are no resv scripts/gdb: de-reference per-CPU MCE interrupts scripts/gdb: fix interrupts.py after maple tree conversion maple_tree: fix mt_destroy_walk() on root leaf node mm/vmalloc: leave lazy MMU mode on PTE mapping error scripts/gdb: fix interrupts display after MCP on x86 lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() kallsyms: fix build without execinfo	2025-07-12 10:30:47 -07:00
Jinliang Zheng	f84a15b90d	locking/rwsem: Use OWNER_NONSPINNABLE directly instead of OWNER_SPINNABLE After commit `7d43f1ce9d` ("locking/rwsem: Enable time-based spinning on reader-owned rwsem"), OWNER_SPINNABLE contains all possible values except OWNER_NONSPINNABLE, namely OWNER_NULL \| OWNER_WRITER \| OWNER_READER. Therefore, it is better to use OWNER_NONSPINNABLE directly to determine whether to exit optimistic spin. And, remove useless OWNER_SPINNABLE to simplify the code. Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20250610130158.4876-1-alexjlzheng@tencent.com	2025-07-11 15:11:54 -07:00
Jakub Kicinski	0cad34fb7c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc6-2). No conflicts. Adjacent changes: drivers/net/wireless/mediatek/mt76/mt7925/mcu.c `c701574c54` ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan") `b3a431fe2e` ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()") drivers/net/wireless/mediatek/mt76/mt7996/mac.c `62da647a2b` ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()") `dc66a129ad` ("wifi: mt76: add a wrapper for wcid access with validation") drivers/net/wireless/mediatek/mt76/mt7996/main.c `3dd6f67c66` ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()") `8989d8e90f` ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()") net/mac80211/cfg.c `58fcb1b428` ("wifi: mac80211: reject VHT opmode for unsupported channel widths") `037dc18ac3` ("wifi: mac80211: add support for storing station S1G capabilities") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-11 11:42:38 -07:00
Nam Cao	9a425da913	panic: Fix up description of vpanic() The description above vpanic() has the wrong function name. Fix it up. Cc: John Ogness <john.ogness@linutronix.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/23a7e8add6546b155371b7e0fbb37bb1def13d6e.1752232374.git.namcao@linutronix.de Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20250711183802.2d8c124d@canb.auug.org.au/ Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-11 14:25:39 -04:00
Tao Chen	0eeeebdcc5	bpf: Remove attach_type in bpf_tracing_link Use attach_type in bpf_link, and remove it in bpf_tracing_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-7-chen.dylane@linux.dev	2025-07-11 11:01:08 -07:00
Tao Chen	2a76a80c7f	bpf: Remove attach_type in bpf_netns_link Use attach_type in bpf_link, and remove it in bpf_netns_link. And move netns_type field to the end to fill the byte hole. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20250710032038.888700-6-chen.dylane@linux.dev	2025-07-11 11:01:04 -07:00
Tao Chen	6e816e1c05	bpf: Remove location field in tcx_link Use attach_type in bpf_link to replace the location filed, and remove location field in tcx_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20250710032038.888700-5-chen.dylane@linux.dev	2025-07-11 11:00:57 -07:00
Tao Chen	9b8d543dc2	bpf: Remove attach_type in bpf_cgroup_link Use attach_type in bpf_link, and remove it in bpf_cgroup_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-3-chen.dylane@linux.dev	2025-07-11 10:51:55 -07:00
Tao Chen	b725441f02	bpf: Add attach_type field to bpf_link Attach_type will be set when a link is created by user. It is better to record attach_type in bpf_link generically and have it available universally for all link types. So add the attach_type field in bpf_link and move the sleepable field to avoid unnecessary gap padding. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-2-chen.dylane@linux.dev	2025-07-11 10:51:55 -07:00
Paul Chaignon	6279846b9b	bpf: Forget ranges when refining tnum after JSET Syzbot reported a kernel warning due to a range invariant violation on the following BPF program. 0: call bpf_get_netns_cookie 1: if r0 == 0 goto <exit> 2: if r0 & Oxffffffff goto <exit> The issue is on the path where we fall through both jumps. That path is unreachable at runtime: after insn 1, we know r0 != 0, but with the sign extension on the jset, we would only fallthrough insn 2 if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to figure this out, so the verifier walks all branches. The verifier then refines the register bounds using the second condition and we end up with inconsistent bounds on this unreachable path: 1: if r0 == 0 goto <exit> r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff) 2: if r0 & 0xffffffff goto <exit> r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0) r0 after reg_bounds_sync: u64=[0x1, 0] var_off=(0, 0) Improving the range refinement for JSET to cover all cases is tricky. We also don't expect many users to rely on JSET given LLVM doesn't generate those instructions. So instead of improving the range refinement for JSETs, Eduard suggested we forget the ranges whenever we're narrowing tnums after a JSET. This patch implements that approach. Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/9d4fd6432a095d281f815770608fdcd16028ce0b.1752171365.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-11 10:45:25 -07:00
Emil Tsalapatis	8fc3d2d8b5	bpf/arena: add bpf_arena_reserve_pages kfunc Add a new BPF arena kfunc for reserving a range of arena virtual addresses without backing them with pages. This prevents the range from being populated using bpf_arena_alloc_pages(). Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250709191312.29840-2-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-11 10:43:54 -07:00
David Kaplan	19c24f7ee3	cpu: Define attack vectors Define 4 new attack vectors that are used for controlling CPU speculation mitigations. These may be individually disabled as part of the mitigations= command line. Attack vector controls are combined with global options like 'auto' or 'auto,nosmt' like 'mitigations=auto,no_user_kernel'. The global options come first in the mitigations= string. Cross-thread mitigations can either remain enabled fully, including potentially disabling SMT ('auto,nosmt'), remain enabled except for disabling SMT ('auto'), or entirely disabled through the new 'no_cross_thread' attack vector option. The default settings for these attack vectors are consistent with existing kernel defaults, other than the automatic disabling of VM-based attack vectors if KVM support is not present. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250707183316.1349127-3-david.kaplan@amd.com	2025-07-11 17:55:16 +02:00
Linus Torvalds	a0f8361c3c	Merge tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fix from Marek Szyprowski: - small fix relevant to arm64 server and custom CMA configuration (Feng Tang) * tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-contiguous: hornor the cma address limit setup by user	2025-07-11 08:49:25 -07:00
Sebastian Andrzej Siewior	760e6f7bef	futex: Remove support for IMMUTABLE The FH_FLAG_IMMUTABLE flag was meant to avoid the reference counting on the private hash and so to avoid the performance regression on big machines. With the switch to per-CPU counter this is no longer needed. That flag was never useable on any released kernel. Remove any support for IMMUTABLE while preserve the flags argument and enforce it to be zero. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250710110011.384614-5-bigeasy@linutronix.de	2025-07-11 16:02:01 +02:00
Sebastian Andrzej Siewior	fb3c553da7	futex: Make futex_private_hash_get() static futex_private_hash_get() is not used outside if its compilation unit. Make it static. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250710110011.384614-4-bigeasy@linutronix.de	2025-07-11 16:02:00 +02:00
Peter Zijlstra	56180dd20c	futex: Use RCU-based per-CPU reference counting instead of rcuref_t The use of rcuref_t for reference counting introduces a performance bottleneck when accessed concurrently by multiple threads during futex operations. Replace rcuref_t with special crafted per-CPU reference counters. The lifetime logic remains the same. The newly allocate private hash starts in FR_PERCPU state. In this state, each futex operation that requires the private hash uses a per-CPU counter (an unsigned int) for incrementing or decrementing the reference count. When the private hash is about to be replaced, the per-CPU counters are migrated to a atomic_t counter mm_struct::futex_atomic. The migration process: - Waiting for one RCU grace period to ensure all users observe the current private hash. This can be skipped if a grace period elapsed since the private hash was assigned. - futex_private_hash::state is set to FR_ATOMIC, forcing all users to use mm_struct::futex_atomic for reference counting. - After a RCU grace period, all users are guaranteed to be using the atomic counter. The per-CPU counters can now be summed up and added to the atomic_t counter. If the resulting count is zero, the hash can be safely replaced. Otherwise, active users still hold a valid reference. - Once the atomic reference count drops to zero, the next futex operation will switch to the new private hash. call_rcu_hurry() is used to speed up transition which otherwise might be delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The side effects would be that on auto scaling the new hash is used later and the SET_SLOTS prctl() will block longer. [bigeasy: commit description + mm get/ put_async] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de	2025-07-11 16:02:00 +02:00
Nam Cao	a6b0465bd2	irqdomain: Export irq_domain_free_irqs_top() Export irq_domain_free_irqs_top(), making it usable for drivers compiled as modules. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-07-11 12:57:00 +01:00
Rafael J. Wysocki	fd4716a151	Merge back earlier changes related to system suspend and hibernation	2025-07-11 11:39:59 +02:00
Jakub Kicinski	3321e97eab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml `0a12c435a1` ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") `b3603c0466` ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 10:10:49 -07:00
Samuel Zhang	2640e81947	PM: hibernate: shrink shmem pages after dev_pm_ops.prepare() When hibernate with data center dGPUs, huge number of VRAM data will be moved to shmem during dev_pm_ops.prepare(). These shmem pages take a lot of system memory so that there's no enough free memory for creating the hibernation image. This will cause hibernation fail and abort. After dev_pm_ops.prepare(), call shrink_all_memory() to force move shmem pages to swap disk and reclaim the pages, so that there's enough system memory for hibernation image and less pages needed to copy to the image. This patch can only flush and free about half shmem pages. It will be better to flush and free more pages, even all of shmem pages, so that there're less pages to be copied to the hibernation image and the overall hibernation time can be reduced. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20250710062313.3226149-4-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>	2025-07-10 10:49:45 -05:00
Tudor Ambarus	f747cde5e7	PM: sleep: add kernel parameter to disable asynchronous suspend/resume On some platforms, device dependencies are not properly represented by device links, which can cause issues when asynchronous power management is enabled. While it is possible to disable this via sysfs, doing so at runtime can race with the first system suspend event. This patch introduces a kernel command-line parameter, "pm_async", which can be set to "off" to globally disable asynchronous suspend and resume operations from early boot. It effectively provides a way to set the initial value of the existing pm_async sysfs knob at boot time. This offers a robust method to fall back to synchronous (sequential) operation, which can stabilize platforms with problematic dependencies and also serve as a useful debugging tool. The default behavior remains unchanged (asynchronous enabled). To disable it, boot the kernel with the "pm_async=off" parameter. Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20250709-pm-async-off-v3-1-cb69a6fc8d04@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-10 14:40:07 +02:00
Jiazi Li	fed307b67c	kthread: update comment for __to_kthread With commit `343f4c49f2` ("kthread: Don't allocate kthread_struct for init and umh") and commit `753550eb0c` ("fork: Explicitly set PF_KTHREAD"), umh task no longer have struct kthread and PF_KTHREAD flag. Update the comment to describe what the current rules are to detect is something is a kthread. Link: https://lkml.kernel.org/r/20250620100801.23185-1-jqqlijiazi@gmail.com Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com> Signed-off-by: mingzhu.wang <mingzhu.wang@transsion.com> Suggested-by Eric W . Biederman <ebiederm@xmission.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-09 22:57:55 -07:00
Pasha Tatashin	64960497ea	fork: clean up ifdef logic around stack allocation There is an unneeded OR in the ifdef functions that are used to allocate and free kernel stacks based on direct map or vmap. Adding dynamic stack support would complicate this logic even further. Therefore, clean up by changing the order so OR is no longer needed. Link: https://lkml.kernel.org/r/20250618-fork-fixes-v4-1-2e05a2e1f5fc@linaro.org Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/20240311164638.2015063-3-pasha.tatashin@soleen.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-09 22:57:54 -07:00

... 6 7 8 9 10 ...

49087 Commits