linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 05:31:37 -04:00

Author	SHA1	Message	Date
JP Kobryn (Meta)	e4f4fc7aa8	mm: move pgscan, pgsteal, pgrefill to node stats There are situations where reclaim kicks in on a system with free memory. One possible cause is a NUMA imbalance scenario where one or more nodes are under pressure. It would help if we could easily identify such nodes. Move the pgscan, pgsteal, and pgrefill counters from vm_event_item to node_stat_item to provide per-node reclaim visibility. With these counters as node stats, the values are now displayed in the per-node section of /proc/zoneinfo, which allows for quick identification of the affected nodes. /proc/vmstat continues to report the same counters, aggregated across all nodes. But the ordering of these items within the readout changes as they move from the vm events section to the node stats section. Memcg accounting of these counters is preserved. The relocated counters remain visible in memory.stat alongside the existing aggregate pgscan and pgsteal counters. However, this change affects how the global counters are accumulated. Previously, the global event count update was gated on !cgroup_reclaim(), excluding memcg-based reclaim from /proc/vmstat. Now that mod_lruvec_state() is being used to update the counters, the global counters will include all reclaim. This is consistent with how pgdemote counters are already tracked. Finally, the virtio_balloon driver is updated to use global_node_page_state() to fetch the counters, as they are no longer accessible through the vm_events array. Link: https://lkml.kernel.org/r/20260219235846.161910-1-jp.kobryn@linux.dev Signed-off-by: JP Kobryn <jp.kobryn@linux.dev> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Byungchul Park <byungchul@sk.com> Cc: David Hildenbrand <david@kernel.org> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Wei Xu <weixugc@google.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:58 -07:00
AnishMulay	54218f10df	selftests/mm: skip migration tests if NUMA is unavailable Currently, the migration test asserts that numa_available() returns 0. On systems where NUMA is not available (returning -1), such as certain ARM64 configurations or single-node systems, this assertion fails and crashes the test. Update the test to check the return value of numa_available(). If it is less than 0, skip the test gracefully instead of failing. This aligns the behavior with other MM selftests (like rmap) that skip when NUMA support is missing. Link: https://lkml.kernel.org/r/20260218163941.13499-1-anishm7030@gmail.com Fixes: `0c2d087284` ("mm: add selftests for migration entries") Signed-off-by: AnishMulay <anishm7030@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: Sayali Patil <sayalip@linux.ibm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:57 -07:00
Seongsu Park	3d443691ed	mm/pkeys: remove unused tsk parameter from arch_set_user_pkey_access() The tsk parameter in arch_set_user_pkey_access() is never used in the function implementations across all architectures (arm64, powerpc, x86). Link: https://lkml.kernel.org/r/20260219063506.545148-1-sgsu.park@samsung.com Signed-off-by: Seongsu Park <sgsu.park@samsung.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:57 -07:00
Liam R. Howlett	0e8cf9a31a	maple_tree: clean up mas_wr_node_store() The new_end does not need to be passed in as the data is already being checked. This allows for other areas to skip getting the node new_end in the calling function. The type was incorrectly void * instead of void __rcu *, which isn't an issue but is technically incorrect. Move the variable assignment to after the declarations to clean up the initial setup. Ensure there is something to copy before calling memcpy(). Link: https://lkml.kernel.org/r/20260130205935.2559335-31-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:57 -07:00
Liam R. Howlett	b82f4c811e	maple_tree: don't pass end to mas_wr_append() Figure out the end internally. This is necessary for future cleanups. Link: https://lkml.kernel.org/r/20260130205935.2559335-30-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:57 -07:00
Liam R. Howlett	2969241fa2	maple_tree: pass maple copy node to mas_wmb_replace() mas_wmb_replace() is called in three places with the same setup, move the setup into the function itself. The function needs to be relocated as it calls mtree_range_walk(). Link: https://lkml.kernel.org/r/20260130205935.2559335-29-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:57 -07:00
Liam R. Howlett	b8852ef30c	maple_tree: remove maple big node and subtree structs Now that no one uses the structures and functions, drop the dead code. Link: https://lkml.kernel.org/r/20260130205935.2559335-28-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:57 -07:00
Liam R. Howlett	280b792cac	maple_tree: use maple copy node for mas_wr_split() Instead of using the maple big node, use the maple copy node for reduced stack usage and aligning with mas_wr_rebalance() and mas_wr_spanning_store(). Splitting a node is similar to rebalancing, but a new evaluation of when to ascend is needed. The only other difference is that the data is pushed and never rebalanced at each level. The testing must also align with the changes to this commit to ensure the test suite continues to pass. Link: https://lkml.kernel.org/r/20260130205935.2559335-27-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:56 -07:00
Liam R. Howlett	11e7f22f5e	maple_tree: add cp_converged() helper When the maple copy node converges into a single entry, then certain operations can stop ascending the tree. This is used more later. Link: https://lkml.kernel.org/r/20260130205935.2559335-26-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:56 -07:00
Liam R. Howlett	0abff20819	maple_tree: add copy_tree_location() helper Extract the copying of the tree location from one maple state to another into its own function. This is used more later. Link: https://lkml.kernel.org/r/20260130205935.2559335-25-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:56 -07:00
Liam R. Howlett	ebfee00c0b	maple_tree: add test for rebalance calculation off-by-one During the big node removal, an incorrect rebalance step went too far up the tree causing insufficient nodes. Test the faulty condition by recreating the scenario in the userspace testing. Link: https://lkml.kernel.org/r/20260130205935.2559335-24-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:56 -07:00
Liam R. Howlett	971f0db159	maple_tree: use maple copy node for mas_wr_rebalance() operation Stop using the maple big node for rebalance operations by changing to more align with spanning store. The rebalance operation needs its own data calculation in rebalance_data(). In the event of too much data, the rebalance tries to push the data using push_data_sib(). If there is insufficient data, the rebalance operation will rebalance against a sibling (found with rebalance_sib()). The rebalance starts at the leaf and works its way upward in the tree using rebalance_ascend(). Most of the code is shared with spanning store such as the copy node having a new root, but is fundamentally different in that the data must come from a sibling. A parent maple state is used to track the parent location to avoid multiple mas_ascend() calls. The maple state tree location is copied from the parent to the mas (child) in the ascend step. Ascending itself is done in the main loop. Link: https://lkml.kernel.org/r/20260130205935.2559335-23-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:56 -07:00
Liam R. Howlett	b00a1804e6	maple_tree: add cp_is_new_root() helper Add a helper to do what is needed when the maple copy node contains a new root node. This is useful for future commits and is self-documenting code. [Liam.Howlett@oracle.com: remove warnings on older compilers] Link: https://lkml.kernel.org/r/malwmirqnpuxqkqrobcmzfkmmxipoyzwfs2nwc5fbpxlt2r2ej@wchmjtaljvw3 [akpm@linux-foundation.org: s/cp->slot[0]/&cp->slot[0]/, per Liam] Link: https://lkml.kernel.org/r/20260130205935.2559335-22-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:56 -07:00
Liam R. Howlett	62e9d349af	maple_tree: separate wr_split_store and wr_rebalance store type code path The split and rebalance store types both go through the same function that uses the big node. Separate the code paths so that each can be updated independently. No functional change intended. Link: https://lkml.kernel.org/r/20260130205935.2559335-21-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:56 -07:00
Liam R. Howlett	448ec8c0a4	maple_tree: remove unnecessary return statements Functions do not need to state return at the end, unless skipping unwind. These can safely be dropped. Link: https://lkml.kernel.org/r/20260130205935.2559335-20-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:55 -07:00
Liam R. Howlett	3578d61c1c	maple_tree: inline mas_wr_spanning_rebalance() Now that the spanning rebalance is small, fully inline it in mas_wr_spanning_store(). No functional change. Link: https://lkml.kernel.org/r/20260130205935.2559335-19-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:55 -07:00
Liam R. Howlett	a9c6716e08	maple_tree: start using maple copy node for destination Stop using the maple subtree state and big node in favour of using three destinations in the maple copy node. That is, expand the way leaves were handled to all levels of the tree and use the maple copy node to track the new nodes. Extract out the sibling init into the data calculation since this is where the insufficient data can be detected. The remainder of the sibling code to shift the next iteration is moved to the spanning_ascend() function, since it is not always needed. Next introduce the dst_setup() function which will decide how many nodes are needed to contain the data at this level. Using the destination count, populate the copy node's dst array with the new nodes and set d_count to the correct value. Note that this can be tricky in the case of a leaf node with exactly enough room because of the rule against NULLs at the end of leaves. Once the destinations are ready, copy the data by altering the cp_data_write() function to copy from the sources to the destinations directly. This eliminates the use of the big node in this code path. On node completion, node_finalise() will zero out the remaining area and set the metadata, if necessary. spanning_ascend() is used to decide if the operation is complete. It may create a new root, converge into one destination, or continue upwards by ascending the left and right write maple states. One test case setup needed to be tweaked so that the targeted node was surrounded by full nodes. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20260130205935.2559335-18-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:55 -07:00
Liam R. Howlett	20b20162e1	maple_tree: add gap support, slot and pivot sizes for maple copy Add plumbing work for using maple copy as a normal node for a source of copy operations. This is needed later. Link: https://lkml.kernel.org/r/20260130205935.2559335-17-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:55 -07:00
Liam R. Howlett	de7f3ed37c	maple_tree: introduce ma_leaf_max_gap() This is the same as mas_leaf_max_gap(), but the information necessary is known without a maple state in future code. Adding this function now simplifies the review for a subsequent patch. Link: https://lkml.kernel.org/r/20260130205935.2559335-16-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:55 -07:00
Liam R. Howlett	6953038cab	maple_tree: change initial big node setup in mas_wr_spanning_rebalance() Instead of copying the data into the big node and finding out that the data may need to be moved or appended to, calculate the data space up front (in the maple copy node) and set up another source for the copy. The additional copy source is tracked in the maple state sib (short for sibling), and is put into the maple write states for future operations after the data is in the big node. To facilitate the newly moved node, some initial setup of the maple subtree state are relocated after the potential shift caused by the new way of rebalancing against a sibling. Link: https://lkml.kernel.org/r/20260130205935.2559335-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:55 -07:00
Liam R. Howlett	f141d56643	maple_tree: inline mas_spanning_rebalance_loop() into mas_wr_spanning_rebalance() Just copy the code and replace count with height. This is done to avoid affecting other code paths into mas_spanning_rebalance_loop() for the next change. No functional change intended. Link: https://lkml.kernel.org/r/20260130205935.2559335-14-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:55 -07:00
Liam R. Howlett	b14ffd2c6c	maple_tree: testing update for spanning store Spanning store had some corner cases which showed up during rcu stress testing. Add explicit tests for those cases. At the same time add some locking for easier visibility of the rcu stress testing. Only a single dump of the tree will happen on the first detected issue instead of flooding the console with output. Link: https://lkml.kernel.org/r/20260130205935.2559335-13-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:54 -07:00
Liam R. Howlett	9ec1e972c3	maple_tree: introduce maple_copy node and use it in mas_spanning_rebalance() Introduce an internal-memory only node type called maple_copy to facilitate internal copy operations. Use it in mas_spanning_rebalance() for just the leaf nodes. Initially, the maple_copy node is used to configure the source nodes and copy the data into the big_node. The maple_copy contains a list of source entries with start and end offsets. One of the maple_copy entries can be itself with an offset of 0 to 2, representing the data where the store partially overwrites entries, or fully overwrites the entry. The side effect is that the source nodes no longer have to worry about partially copying the existing offset if it is not fully overwritten. This is in preparation of removal of the maple big_node, but for the time being the data is copied to the big node to limit the change size. Link: https://lkml.kernel.org/r/20260130205935.2559335-12-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:54 -07:00
Liam R. Howlett	6b74d44b62	maple_tree: correct right ma_wr_state end pivot in mas_wr_spanning_store() The end_piv will be needed in the next patch set and has not been set correctly in this code path. Correct the oversight before using it. Link: https://lkml.kernel.org/r/20260130205935.2559335-11-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:54 -07:00
Liam R. Howlett	2fce1c3c47	maple_tree: move maple_subtree_state from mas_wr_spanning_store to mas_wr_spanning_rebalance Moving the maple_subtree_state is necessary for future cleanups and is only set up in mas_wr_spanning_rebalance() but never used. Link: https://lkml.kernel.org/r/20260130205935.2559335-10-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:54 -07:00
Liam R. Howlett	3680159527	maple_tree: don't pass through height in mas_wr_spanning_store Height is not used locally in the function, so call the height argument closer to where it is passed in the next level. Link: https://lkml.kernel.org/r/20260130205935.2559335-9-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:54 -07:00
Liam R. Howlett	41bcc348f2	maple_tree: remove l_wr_mas from mas_wr_spanning_rebalance Use the wr_mas instead of creating another variable on the stack. Take the opportunity to remove l_mas from being used anywhere but in the maple_subtree_state. Link: https://lkml.kernel.org/r/20260130205935.2559335-8-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:54 -07:00
Liam R. Howlett	3dd3dbaac1	maple_tree: make ma_wr_states reliable for reuse in spanning store mas_extend_spanning_null() was not modifying the range min and range max of the resulting store operation. The result was that the maple write state no longer matched what the write was doing. This was not an issue as the values were previously not used, but to make the ma_wr_state usable in future changes, the range min/max stored in the ma_wr_state for left and right need to be consistent with the operation. Link: https://lkml.kernel.org/r/20260130205935.2559335-7-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:54 -07:00
Liam R. Howlett	6f2e522186	maple_tree: inline mas_spanning_rebalance() into mas_wr_spanning_rebalance() Copy the contents of mas_spanning_rebalance() into mas_wr_spanning_rebalance(), in preparation of removing initial big node use. No functional changes intended. Link: https://lkml.kernel.org/r/20260130205935.2559335-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:53 -07:00
Liam R. Howlett	a2ac9935d3	maple_tree: remove unnecessary assignment of orig_l index The index value is already a copy of the maple state so there is no need to set it again. Link: https://lkml.kernel.org/r/20260130205935.2559335-5-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:53 -07:00
Liam R. Howlett	df11f9ee8f	maple_tree: extract use of big node from mas_wr_spanning_store() Isolate big node to use in its own function. No functional changes intended. Link: https://lkml.kernel.org/r/20260130205935.2559335-4-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:53 -07:00
Liam R. Howlett	3e302560b9	maple_tree: move mas_spanning_rebalance loop to function Move the loop over the tree levels to its own function. No intended functional changes. Link: https://lkml.kernel.org/r/20260130205935.2559335-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:53 -07:00
Liam R. Howlett	6884832472	maple_tree: fix mas_dup_alloc() sparse warning Patch series "maple_tree: Replace big node with maple copy", v3. The big node struct was created for simplicity of splitting, rebalancing, and spanning store operations by using a copy buffer to create the data necessary prior to breaking it up into 256B nodes. Certain operations were rather tricky due to the restriction of keeping NULL entries together and never at the end of a node (except the right-most node). The big node struct is incompatible with future features that are currently in development. Specifically different node types and different data type sizes for pivots. The big node struct was also a stack variable, which caused issues with certain configurations of kernel build. This series removes big node by introducing another node type which will never be written to the tree: maple_copy. The maple copy node operates more like a scatter/gather operation with a number of sources and destinations of allocated nodes. The sources are copied to the destinations, in turn, until the sources are exhausted. The destination is changed if it is filled or the split location is reached prior to the source data end. New data is inserted by using the maple copy node itself as a source with up to 3 slots and pivots. The data in the maple copy node is the data being written to the tree along with any fragment of the range(s) being overwritten. As with all nodes, the maple copy node is of size 256B. Using a node type allows for the copy operation to treat the new data stored in the maple copy node the same as any other source node. Analysis of the runtime shows no regression or benefit of removing the larger stack structure. The motivation is the ground work to use new node types and to help those with odd configurations that have had issues. The change was tested by myself using mm_tests on amd64 and by Suren on android (arm64). 
Limited testing on s390 qemu was also performed using stress-ng on the virtual memory, which should cover many corner cases. This patch (of 30): Use RCU_INIT_POINTER to initialize an rcu pointer to an initial value since there are no readers within the tree being created during duplication. There is no risk of readers seeing the initialized or uninitialized value until after the synchronization call in mas_dup_build(). Link: https://lkml.kernel.org/r/20260130205935.2559335-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20260130205935.2559335-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrew Ballance <andrewjballance@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:53 -07:00
Kevin Lourenco	0fd66c343c	mm/fadvise: validate offset in generic_fadvise When converted to (u64) for page calculations, a negative offset can produce extremely large page indices. This may lead to issues in certain advice modes (excessive readahead or cache invalidation). Reject negative offsets with -EINVAL for consistent argument validation and to avoid silent misbehavior. POSIX and the man page do not clearly define behavior for negative offset/len. FreeBSD rejects negative offsets as well, so failing with -EINVAL is consistent with existing practice. The man page can be updated separately to document the Linux behavior. Link: https://lkml.kernel.org/r/20260208135738.18992-1-klourencodev@gmail.com Link: https://lkml.kernel.org/r/20251222141817.13335-1-klourencodev@gmail.com Signed-off-by: Kevin Lourenco <k.lourenco@criteo.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:53 -07:00
xu xin	318d87b8fa	ksm: initialize the addr only once in rmap_walk_ksm This is a minor performance optimization, especially when there are many for-loop iterations, because the addr variable doesn't change across iterations. Therefore, it only needs to be initialized once before the loop. Link: https://lkml.kernel.org/r/20260212192820223O_r2NQzSEPG_C56cs-z4l@zte.com.cn Link: https://lkml.kernel.org/r/20260212192932941MSsJEAyoRW4YdLBN7_myn@zte.com.cn Signed-off-by: xu xin <xu.xin16@zte.com.cn> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Hugh Dickins <hughd@google.com> Cc: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:52:35 -07:00
Jiaqi Yan	34ca46cc6f	fs: hugetlb: simplify remove_inode_hugepages() return type When remove_inode_hugepages() was introduced in commit `c86272287b` ("hugetlb: create remove_inode_single_folio to remove single file folio") it used to return a boolean to indicate if it bailed out due to race with page faults. However, since the race is already solved by [1], remove_inode_hugepages() doesn't have any path to return false anymore. Simplify remove_inode_hugepages() return type to void, remove the unnecessary ret variable, and adjust the call site in remove_inode_hugepages(). No functional change in this commit. Link: https://lkml.kernel.org/r/20260204214741.3161520-1-jiaqiyan@google.com Link: https://lore.kernel.org/all/20220914221810.95771-10-mike.kravetz@oracle.com [1] Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Suggested-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Acked-by: David Hildenbrand (arm) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-24 14:38:26 -07:00
Altan Hacigumus	260d70819c	mm/shrinker: fix refcount leak in shrink_slab_memcg() When kmem is disabled for memcg, slab-backed shrinkers are skipped. However, shrink_slab_memcg() doesn't drop the reference acquired via shrinker_try_get() before continuing. Add the missing shrinker_put(). Also, since memcg_kmem_online() and shrinker flags cannot change dynamically, remove the shrinker from the bitmap to avoid unnecessary future scans. Link: https://lkml.kernel.org/r/20260204033553.50039-1-ahacigu.linux@gmail.com Fixes: `50d09da8e1` ("mm: shrinker: make memcg slab shrink lockless") Signed-off-by: Altan Hacigumus <ahacigu.linux@gmail.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Link: https://lore.kernel.org/r/20260203073757.135088-1-ahacigu.linux@gmail.com Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Dave Chinner <david@fromorbit.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-24 14:38:26 -07:00
qinyu	b8a4b08838	mm/damon/ops-common: remove redundant mmu notifier call in pmdp mkold Currently, mmu_notifier_clear_young() is called immediately after pmdp_clear_young_notify(), which already calls mmu_notifier_clear_young() internally. This results in a redundant notifier call. Replace pmdp_clear_young_notify() with the non-notify variant to avoid the duplicate call and make the pmdp path consistent with the corresponding ptep_mkold() code. Link: https://lkml.kernel.org/r/20260203095400.2465255-1-qin.yuA@h3c.com Signed-off-by: qinyu <qin.yuA@h3c.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-24 14:38:25 -07:00
Shengming Hu	7e74dd0316	mm/page_alloc: avoid overcounting bulk alloc in watermark check alloc_pages_bulk_noprof() only fills NULL slots and already tracks how many entries are pre-populated via nr_populated. The fast watermark check was adding nr_pages unconditionally, which can overestimate the demand. Use (nr_pages - nr_populated) instead, as an upper bound on the remaining pages this call can still allocate without scanning the whole array. Link: https://lkml.kernel.org/r/tencent_F36C5B5FB4DED98C79D9BDEE1210CD338C06@qq.com Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-24 14:38:25 -07:00
Kairui Song	396f57b572	mm, swap: speed up hibernation allocation and writeout Since commit `0ff67f990b` ("mm, swap: remove swap slot cache"), hibernation has been using the swap slot slow allocation path for simplification, which turns out might cause regression for some devices because the allocator now rotates clusters too often, leading to slower allocation and more random distribution of data. Fast allocation is not complex, so implement hibernation support as well. Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the performance is several times better [1]: 6.19: 324 seconds After this series: 35 seconds Link: https://lkml.kernel.org/r/20260216-hibernate-perf-v4-1-1ba9f0bf1ec9@tencent.com Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1] Signed-off-by: Kairui Song <kasong@tencent.com> Fixes: `0ff67f990b` ("mm, swap: remove swap slot cache") Reported-by: Carsten Grohmann <mail@carstengrohmann.de> Closes: https://lore.kernel.org/linux-mm/20260206121151.dea3633d1f0ded7bbf49c22e@linux-foundation.org/ Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-24 14:38:25 -07:00
Linus Torvalds	24f9515de8	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Clear the pending exception state from a vcpu coming out of reset, as it could otherwise affect the first instruction executed in the guest - Fix pointer arithmetic in address translation emulation, so that the Hardware Access bit is set on the correct PTE instead of some other location s390: - Fix deadlock in new memory management - Properly handle kernel faults on donated memory - Fix bounds checking for irq routing, with selftest - Fix invalid machine checks and log all of them" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm64: Fix the descriptor address in __kvm_at_swap_desc() KVM: s390: vsie: Avoid injecting machine check on signal KVM: s390: log machine checks more aggressively KVM: s390: selftests: Add IRQ routing address offset tests KVM: s390: Limit adapter indicator access to mapped page s390/mm: Add missing secure storage access fixups for donated memory KVM: arm64: Discard PC update state on vcpu reset KVM: s390: Fix a deadlock	2026-03-24 13:11:26 -07:00
Linus Torvalds	45f667ebb0	Merge tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) fixes from Dave Jiang: - Adjust the startup priority of cxl_pmem to be higher than that of cxl_acpi - Use proper endpoint validity check upon sanitize - Avoid incorrect DVSEC fallback when HDM decoders are enabled - Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch - Fix leakage in __construct_region() - Fix use after free of parent_port in cxl_detach_ep() * tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl: Adjust the startup priority of cxl_pmem to be higher than that of cxl_acpi cxl/mbox: Use proper endpoint validity check upon sanitize cxl/hdm: Avoid incorrect DVSEC fallback when HDM decoders are enabled cxl/acpi: Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch cxl/region: Fix leakage in __construct_region() cxl/port: Fix use after free of parent_port in cxl_detach_ep()	2026-03-24 12:41:29 -07:00
Paolo Bonzini	52dad81e4b	Merge tag 'kvmarm-fixes-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.0, take #4 - Clear the pending exception state from a vcpu coming out of reset, as it could otherwise affect the first instruction executed in the guest. - Fix the address translation emulation code to set the Hardware Access bit on the correct PTE instead of some other location.	2026-03-24 17:32:30 +01:00
Paolo Bonzini	12fd965871	Merge tag 'kvm-s390-master-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for 7.0 - fix deadlock in new memory management - handle kernel faults on donated memory properly - fix bounds checking for irq routing + selftest - fix invalid machine checks + logging	2026-03-24 17:32:13 +01:00
Linus Torvalds	e3c33bc767	Merge tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "6 hotfixes. 2 are cc:stable. All are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/damon/stat: monitor all System RAM resources mm/zswap: add missing kunmap_local() mailmap: update email address for Muhammad Usama Anjum zram: do not slot_free() written-back slots mm/damon/core: avoid use of half-online-committed context mm/rmap: clear vma->anon_vma on error	2026-03-24 09:12:45 -07:00
Linus Torvalds	26a01984dd	Merge tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix parsing 'overwrite' in command line event definitions in big-endian machines by writing correct union member - Fix finding default metric in 'perf stat' - Fix relative paths for including headers in 'perf kvm stat' - Sync header copies with the kernel sources: msr-index.h, kvm, build_bug.h * tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: tools headers: Synchronize linux/build_bug.h with the kernel sources tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources tools headers UAPI: Sync linux/kvm.h with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources perf kvm stat: Fix relative paths for including headers perf parse-events: Fix big-endian 'overwrite' by writing correct union member perf metricgroup: Fix metricgroup__has_metric_or_groups() tools headers: Skip arm64 cputype.h check	2026-03-24 08:58:38 -07:00
Linus Torvalds	97a48d1aab	Merge tag 'media/v7.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - rkvdec: fix stack usage with clang and improve handling missing short/long term RPS - synopsys: fix a Kconfig issue and an out-of-bounds check - verisilicon: Fix kernel panic due to __initconst misuse - media core: serialize REINIT and REQBUFS with req_queue_mutex * tag 'media/v7.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: verisilicon: Fix kernel panic due to __initconst misuse media: rkvdec: reduce stack usage in rkvdec_init_v4l2_vp9_count_tbl() media: rkvdec: reduce excessive stack usage in assemble_hw_pps() media: rkvdec: Improve handling missing short/long term RPS media: mc, v4l2: serialize REINIT and REQBUFS with req_queue_mutex media: synopsys: csi2rx: add missing kconfig dependency media: synopsys: csi2rx: fix out-of-bounds check for formats array	2026-03-24 08:56:36 -07:00
Linus Torvalds	a0124352d5	Merge tag 'xsa482-7.0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Restrict the xen privcmd driver in unprivileged domU to only allow hypercalls to target domain when using secure boot" * tag 'xsa482-7.0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/privcmd: add boot control for restricted usage in domU xen/privcmd: restrict usage in unprivileged domU	2026-03-23 21:30:14 -07:00
SeongJae Park	84481e705a	mm/damon/stat: monitor all System RAM resources DAMON_STAT usage document (Documentation/admin-guide/mm/damon/stat.rst) says it monitors the system's entire physical memory. But, it is monitoring only the biggest System RAM resource of the system. When there are multiple System RAM resources, this results in monitoring only an unexpectedly small fraction of the physical memory. For example, suppose the system has a 500 GiB System RAM, 10 MiB non-System RAM, and 500 GiB System RAM resources in order on the physical address space. DAMON_STAT will monitor only the first 500 GiB System RAM. This situation is particularly common on NUMA systems. Select a physical address range that covers all System RAM areas of the system, to fix this issue and make it work as documented. [sj@kernel.org: return error if monitoring target region is invalid] Link: https://lkml.kernel.org/r/20260317053631.87907-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260316235118.873-1-sj@kernel.org Fixes: `369c415e60` ("mm/damon: introduce DAMON_STAT module") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-23 09:35:05 -07:00
Lorenzo Stoakes (Oracle)	631c111150	mm/zswap: add missing kunmap_local() Commit `e2c3b6b21c` ("mm: zswap: use SG list decompression APIs from zsmalloc") updated zswap_decompress() to use the scatterwalk API to copy data for uncompressed pages. In doing so, it mapped kernel memory locally for 32-bit kernels using kmap_local_folio(), however it never unmapped this memory. This resulted in the linked syzbot report where a BUG_ON() is triggered due to leaking the kmap slot. This patch fixes the issue by explicitly unmapping the established kmap. Also, add flush_dcache_folio() after the kunmap_local() call. I had assumed that a new folio here combined with the flush that is done at the point of setting the PTE would suffice, but it doesn't seem that's actually the case, as update_mmu_cache() will in many architectures only actually flush entries where a dcache flush was done on a range previously. I had also wondered whether kunmap_local() might suffice, but it doesn't seem to be the case. Some arches do seem to actually dcache flush on unmap, parisc does it if CONFIG_HIGHMEM is not set by setting ARCH_HAS_FLUSH_ON_KUNMAP and calling kunmap_flush_on_unmap() from __kunmap_local(), otherwise non-CONFIG_HIGHMEM callers do nothing here. Otherwise arch_kmap_local_pre_unmap() is called which does: * sparc - flush_cache_all() * arm - if VIVT, __cpuc_flush_dcache_area() * otherwise - nothing Also arch_kmap_local_post_unmap() is called which does: * arm - local_flush_tlb_kernel_page() * csky - kmap_flush_tlb() * microblaze, ppc - local_flush_tlb_page() * mips - local_flush_tlb_one() * sparc - flush_tlb_all() (again) * x86 - arch_flush_lazy_mmu_mode() * otherwise - nothing But this is only if it's high memory, and doesn't cover all architectures, so is presumably intended to handle other cache consistency concerns. 
In any case, VIPT is problematic here whether low or high memory (in spite of what the documentation claims, see [0] - 'the kernel did write to a page that is in the page cache page and / or in high memory'), because dirty cache lines may exist at the set indexed by the kernel direct mapping, which won't exist in the set indexed by any subsequent userland mapping, meaning userland might read stale data from L2 cache. Even if the documentation is correct and low memory is fine not to be flushed here, we can't be sure as to whether the memory is low or high (kmap_local_folio() will be a no-op if low), and this call should be harmless if it is low. VIVT would require more work if the memory were shared and already mapped, but this isn't the case here, and would anyway be handled by the dcache flush call. In any case, we definitely need this flush as far as I can tell. And we should probably consider updating the documentation unless it turns out there's somehow dcache synchronisation that happens for low memory/64-bit kernels elsewhere? [ljs@kernel.org: add flush_dcache_folio() after the kunmap_local() call] Link: https://lkml.kernel.org/r/13e09a99-181f-45ac-a18d-057faf94bccb@lucifer.local Link: https://lkml.kernel.org/r/20260316140122.339697-1-ljs@kernel.org Link: https://docs.kernel.org/core-api/cachetlb.html [0] Fixes: `e2c3b6b21c` ("mm: zswap: use SG list decompression APIs from zsmalloc") Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reported-by: syzbot+fe426bef95363177631d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69b75e2c.050a0220.12d28.015a.GAE@google.com Acked-by: Yosry Ahmed <yosry@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: Yosry Ahmed <yosry@kernel.org> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-23 09:35:05 -07:00

1428889 Commits