linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 16:21:45 -04:00

Author	SHA1	Message	Date
Mike Rapoport (Microsoft)	ab674b6871	execmem: drop writable parameter from execmem_fill_trapping_insns() After update of execmem_cache_free() that made memory writable before updating it, there is no need to update read only memory, so the writable parameter to execmem_fill_trapping_insns() is not needed. Drop it. Link: https://lkml.kernel.org/r/20250713071730.4117334-7-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:12 -07:00
Mike Rapoport (Microsoft)	3bd4e0ac61	execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP) When execmem populates ROX cache it uses vmalloc(VM_ALLOW_HUGE_VMAP). Although vmalloc falls back to allocating base pages if high order allocation fails, it may happen that it still cannot allocate enough memory. Right now ROX cache is only used by modules and in the majority of cases the allocations happen at boot time when there's plenty of free memory, but upcoming enabling ROX cache for ftrace and kprobes would mean that execmem allocations can happen when the system is under memory pressure and a failure to allocate a large page's worth of memory becomes more likely. Fall back to regular vmalloc() if vmalloc(VM_ALLOW_HUGE_VMAP) fails. Link: https://lkml.kernel.org/r/20250713071730.4117334-6-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Mike Rapoport (Microsoft)	888b5a847b	execmem: move execmem_force_rw() and execmem_restore_rox() before use to avoid static declarations. Link: https://lkml.kernel.org/r/20250713071730.4117334-5-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Mike Rapoport (Microsoft)	187fd8521d	execmem: rework execmem_cache_free() Currently execmem_cache_free() ignores potential allocation failures that may happen in execmem_cache_add(). Besides, it uses text poking to fill the memory with trapping instructions before returning it to cache although it would be more efficient to make that memory writable, update it using memcpy and then restore ROX protection. Rework execmem_cache_free() so that in case of an error it will defer freeing of the memory to a delayed work. With this the happy fast path will now change permissions to RW, fill the memory with trapping instructions using memcpy, restore ROX permissions, add the memory back to the free cache and clear the relevant entry in busy_areas. If any step in the fast path fails, the entry in busy_areas will be marked as pending_free. These entries will be handled by a delayed work and freed asynchronously. To make the fast path faster, use __GFP_NORETRY for memory allocations and let asynchronous handler try harder with GFP_KERNEL. Link: https://lkml.kernel.org/r/20250713071730.4117334-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Mike Rapoport (Microsoft)	838955f64a	execmem: introduce execmem_alloc_rw() Some callers of execmem_alloc() require the memory to be temporarily writable even when it is allocated from ROX cache. These callers use execmem_make_temp_rw() right after the call to execmem_alloc(). Wrap this sequence in execmem_alloc_rw() API. Link: https://lkml.kernel.org/r/20250713071730.4117334-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Mike Rapoport (Microsoft)	fcd90ad31e	execmem: drop unused execmem_update_copy() Patch series "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes", v3. These patches enable use of EXECMEM_ROX_CACHE for ftrace and kprobes allocations on x86. They also include some ground work in execmem. Since the execmem model for caching large ROX pages changed from the initial assumption that the memory that is allocated from ROX cache is always ROX to the current state where memory can be temporarily made RW and then restored to ROX, we can stop using text poking to update it. This also saves the hassle of trying to lock text_mutex in execmem_cache_free() when kprobes already hold that mutex. This patch (of 8): The execmem_update_copy() that used text poking was required when memory allocated from ROX cache was always read-only. Since now its permissions can be switched to read-write there is no need for a function that updates memory with text poking. Remove it. Link: https://lkml.kernel.org/r/20250713071730.4117334-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20250713071730.4117334-2-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Suren Baghdasaryan	9bbffee67f	mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped By inducing delays in the right places, Jann Horn created a reproducer for a hard to hit UAF issue that became possible after VMAs were allowed to be recycled by adding SLAB_TYPESAFE_BY_RCU to their cache. Race description is borrowed from Jann's discovery report: lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under rcu_read_lock(). At that point, the VMA may be concurrently freed, and it can be recycled by another process. vma_start_read() then increments the vma->vm_refcnt (if it is in an acceptable range), and if this succeeds, vma_start_read() can return a recycled VMA. In this scenario where the VMA has been recycled, lock_vma_under_rcu() will then detect the mismatching ->vm_mm pointer and drop the VMA through vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm. This is wrong: It implicitly assumes that the caller is keeping the VMA's mm alive, but in this scenario the caller has no relation to the VMA's mm, so the rcuwait_wake_up() can cause UAF. The diagram depicting the race: T1 T2 T3 == == == lock_vma_under_rcu mas_walk <VMA gets removed from mm> mmap <the same VMA is reallocated> vma_start_read __refcount_inc_not_zero_limited_acquire munmap __vma_enter_locked refcount_add_not_zero vma_end_read vma_refcount_put __refcount_dec_and_test rcuwait_wait_event <finish operation> rcuwait_wake_up [UAF] Note that rcuwait_wait_event() in T3 does not block because refcount was already dropped by T1. At this point T3 can exit and free the mm causing UAF in T1. To avoid this we move vma->vm_mm verification into vma_start_read() and grab vma->vm_mm to stabilize it before vma_refcount_put() operation. 
[surenb@google.com: v3] Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com Fixes: `3104138517` ("mm: make vma cache SLAB_TYPESAFE_BY_RCU") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: Jann Horn <jannh@google.com> Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+m2tA@mail.gmail.com/ Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Jann Horn	a222439e1e	mm/rmap: add anon_vma lifetime debug check If an anon folio is mapped into userspace, its anon_vma must be alive, otherwise rmap walks can hit UAF. There have been syzkaller reports a few months ago[1][2] of UAF in rmap walks that seems to indicate that there can be pages with elevated mapcount whose anon_vma has already been freed, but I think we never figured out what the cause is; and syzkaller only hit these UAFs when memory pressure randomly caused reclaim to rmap-walk the affected pages, so it of course didn't manage to create a reproducer. Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous folios to hopefully catch such issues more reliably. [1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com [2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com Link: https://lkml.kernel.org/r/20250725-anonvma-uaf-debug-v2-1-bc3c7e5ba5b1@google.com Signed-off-by: Jann Horn <jannh@google.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Harry Yoo <harry.yoo@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:11 -07:00
Lorenzo Stoakes	9a4f90e246	mm: remove mm/io-mapping.c This is dead code, which was used from commit `b739f125e4` ("i915: use io_mapping_map_user") but reverted a month later by commit `0e4fe0c9f2` ("Revert "i915: use io_mapping_map_user"") back in 2021. Since then nobody has used it, so remove it. [akpm@linux-foundation.org: update Documentation/core-api/mm-api.rst, per Vlastimil] Link: https://lkml.kernel.org/r/20250725142901.81502-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:10 -07:00
Dev Jain	22d0229093	khugepaged: optimize collapse_pte_mapped_thp() by PTE batching Use PTE batching to batch process PTEs mapping the same large folio. An improvement is expected due to batching mapcount manipulation on the folios, and for arm64 which supports contig mappings, the number of TLB flushes is also reduced. Note that we do not need to make a change to the check "if (folio_page(folio, i) != page)"; if i'th page of the folio is equal to the first page of our batch, then i + 1, .... i + nr_batch_ptes - 1 pages of the folio will be equal to the corresponding pages of our batch mapping consecutive pages. Link: https://lkml.kernel.org/r/20250724052301.23844-4-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:10 -07:00
Dev Jain	4ea3594a47	khugepaged: optimize __collapse_huge_page_copy_succeeded() by PTE batching Use PTE batching to batch process PTEs mapping the same large folio. An improvement is expected due to batching refcount-mapcount manipulation on the folios, and for arm64 which supports contig mappings, the number of TLB flushes is also reduced. Link: https://lkml.kernel.org/r/20250724052301.23844-3-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:10 -07:00
David Hildenbrand	3dfde97800	mm: add get_and_clear_ptes() and clear_ptes() Patch series "Optimizations for khugepaged", v4. If the underlying folio mapped by the ptes is large, we can process those ptes in a batch using folio_pte_batch(). For arm64 specifically, this results in a 16x reduction in the number of ptep_get() calls, since on a contig block, ptep_get() on arm64 will iterate through all 16 entries to collect a/d bits. Next, ptep_clear() will cause a TLBI for every contig block in the range via contpte_try_unfold(). Instead, use clear_ptes() to only do the TLBI at the first and last contig block of the range. For split folios, there will be no pte batching; the batch size returned by folio_pte_batch() will be 1. For pagetable split folios, the ptes will still point to the same large folio; for arm64, this results in the optimization described above, and for other arches, a minor improvement is expected due to a reduction in the number of function calls and batching atomic operations. This patch (of 3): Let's add variants to be used where "full" does not apply -- which will be the majority of cases in the future. "full" really only applies if we are about to tear down a full MM. Use get_and_clear_ptes() in existing code, clear_ptes() users will be added next. Link: https://lkml.kernel.org/r/20250724052301.23844-2-dev.jain@arm.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:10 -07:00
Jinjiang Tu	1623717b05	mm/mincore: hold PTL in mincore_hugetlb Hold PTL in mincore_hugetlb() to avoid operating on a stale page, as mincore_pte_range() has done. Link: https://lkml.kernel.org/r/20250724090958.455887-4-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brahmajit Das <brahmajit.xyz@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Rientjes <rientjes@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joern Engel <joern@logfs.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:10 -07:00
Jinjiang Tu	9109bd5255	mm/memory-failure: hold PTL in hwpoison_hugetlb_range Hold PTL in hwpoison_hugetlb_range() to avoid operating on a stale page, as hwpoison_pte_range() has done. This change is not known to address any issues which users have experienced. Link: https://lkml.kernel.org/r/20250725033112.2690158-1-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brahmajit Das <brahmajit.xyz@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Rientjes <rientjes@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joern Engel <joern@logfs.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:10 -07:00
Lorenzo Stoakes	6c2da14ae1	mm/mseal: rework mseal apply logic The logic can be simplified - firstly by renaming the inconsistently named apply_mm_seal() to mseal_apply(). We then wrap mseal_fixup() into the main loop as the logic is simple enough to not require it, equally it isn't a hugely pleasant pattern in mprotect() etc. so it's not something we want to perpetuate. We eliminate the need for invoking vma_iter_end() on each loop by directly determining if the VMA was merged - the only thing we need concern ourselves with is whether the start/end of the (gapless) range are offset into VMAs. This refactoring also avoids the rather horrid 'pass pointer to prev around' pattern used in mprotect() et al. No functional change intended. Link: https://lkml.kernel.org/r/ddfa4376ce29f19a589d7dc8c92cb7d4f7605a4c.1753431105.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Xu <jeffxu@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:09 -07:00
Lorenzo Stoakes	530e090964	mm/mseal: simplify and rename VMA gap check The check_mm_seal() function is doing something general - checking whether a range contains only VMAs (or rather that it does NOT contain any unmapped regions). So rename this function to range_contains_unmapped(). Additionally simplify the logic, we are simply checking whether the last vma->vm_end has either a VMA starting after it or ends before the end parameter. This check is rather dubious, so it is sensible to keep it local to mm/mseal.c as at a later stage it may be removed, and we don't want any other mm code to perform such a check. No functional change intended. [lorenzo.stoakes@oracle.com: add comment explaining why we disallow gaps on mseal()] Link: https://lkml.kernel.org/r/d85b3d55-09dc-43ba-8204-b48267a96751@lucifer.local Link: https://lkml.kernel.org/r/dd50984eff1e242b5f7f0f070a3360ef760e06b8.1753431105.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:09 -07:00
Lorenzo Stoakes	8b2914162a	mm/mseal: small cleanups Drop the wholly unnecessary set_vma_sealed() helper, which is used only once, and place VMA_ITERATOR() declarations in the correct place. Retain vma_is_sealed(), and use it instead of the confusingly named can_modify_vma(), so it's abundantly clear what's being tested, rather than a nebulous sense of 'can the VMA be modified'. No functional change intended. Link: https://lkml.kernel.org/r/98cf28d04583d632a6eb698e9ad23733bb6af26b.1753431105.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Xu <jeffxu@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:09 -07:00
Lorenzo Stoakes	d0b47a6866	mm/mseal: update madvise() logic The madvise() logic is inexplicably performed in mm/mseal.c - this ought to be located in mm/madvise.c. Additionally can_modify_vma_madv() is inconsistently named and, in combination with is_ro_anon(), is very confusing logic. Put a static function in mm/madvise.c instead - can_madvise_modify() - that spells out exactly what's happening. Also explicitly check for an anon VMA. Also add commentary to explain what's going on. Essentially - we disallow discarding of data in mseal()'d mappings in instances where the user couldn't otherwise write to that data. We retain the existing behaviour here regarding MAP_PRIVATE mappings of file-backed mappings, which entails some complexity - while this, strictly speaking - appears to violate mseal() semantics, it may interact badly with users which expect to be able to madvise(MADV_DONTNEED) .text mappings for instance. We may revisit this at a later date. No functional change intended. Link: https://lkml.kernel.org/r/492a98d9189646e92c8f23f4cce41ed323fe01df.1753431105.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:09 -07:00
Lorenzo Stoakes	f225b34f1e	mm/mseal: always define VM_SEALED Patch series "mseal cleanups", v4. Perform a number of cleanups to the mseal logic. Firstly, VM_SEALED is treated differently from every other VMA flag, it really doesn't make sense to do this, so we start by making this consistent with everything else. Next we place the madvise logic where it belongs - in mm/madvise.c. It really makes no sense to abstract this elsewhere. In doing so, we go to great lengths to explain very clearly the previously very confusing logic as to what sealed mappings are impacted here. In doing so, we retain existing logic regarding treatment of madvise() discard operations for a sealed, read-only MAP_PRIVATE file-backed mapping. This is something we likely need to revisit. We then abstract out and explain the 'are there any gaps in this range in the mm?' check being performed as a prerequisite to mseal being performed. Finally, we simplify the actual mseal logic which is really quite straightforward. No functional change is intended. This patch (of 4): There is no reason to treat VM_SEALED in a special way, in each other case in which a VMA flag is unavailable due to configuration, we simply assign that flag to VM_NONE, so make VM_SEALED consistent with all other VMA flags in this respect. Additionally, use the next available bit for VM_SEALED, 42, rather than arbitrarily putting it at 63 and update the declaration to match all other VMA flags. No functional change intended. Link: https://lkml.kernel.org/r/cover.1753431105.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/aeb398a77029b6e7377cd944328bc9bbc3c90537.1753431105.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:09 -07:00
Bijan Tabatabai	dee3ab621f	mm/damon/vaddr: skip isolating folios already in destination nid damos_va_migrate_dests_add() determines the node a folio should be in based on the struct damos_migrate_dests associated with the migration scheme and adds the folio to the linked list corresponding to that node so it can be migrated later. Currently, folios are isolated and added to the list even if they are already in the node they should be in. In using damon weighted interleave more, I've found that the overhead of needlessly adding these folios to the migration lists can be quite high. The overhead comes from isolating folios and placing them in the migration lists inside of damos_va_migrate_dests_add(), as well as the cost of handling those folios in damon_migrate_pages(). This patch eliminates that overhead by simply avoiding the addition of folios that are already in their intended location to the migration list. To show the benefit of this patch, we start the test workload and start a DAMON instance attached to that workload with a migrate_hot scheme that has one dest field sending data to the local node. This way, we are only measuring the overheads of the scheme, and not the cost of migrating pages, since data will be allocated to the local node by default. I tested with two workloads: the embedding reduction workload used in [1] and a microbenchmark that allocates 20GB of data then sleeps, which is similar to the memory usage of the embedding reduction workload. The time taken in damos_va_migrate_dests_add() and damon_migrate_pages() each aggregation interval is shown below. Before this patch: damos_va_migrate_dests_add damon_migrate_pages microbenchmark ~2ms ~3ms embedding reduction ~1s ~3s After this patch: damos_va_migrate_dests_add damon_migrate_pages microbenchmark 0us ~40us embedding reduction 0us ~100us I did not do an in depth analysis for why things are much slower in the embedding reduction workload than the microbenchmark. 
However, I assume it's because the embedding reduction workload oversaturates the bandwidth of the local memory node, increasing the memory access latency, and in turn making the pointer chasing involved in iterating through a linked list much slower. Regardless of that, this patch results in a significant speedup. [1] https://lore.kernel.org/damon/20250709005952.17776-1-bijan311@gmail.com/ Link: https://lkml.kernel.org/r/20250725163300.4602-1-bijan311@gmail.com Fixes: `19c1dc15c8` ("mm/damon/vaddr: use damos->migrate_dests in migrate_{hot,cold}") Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:09 -07:00
Suresh K C	d6a511dea4	selftests: cachestat: add tests for mmap, refactor and enhance mmap test for cachestat validation Add a cohesive test case that verifies cachestat behavior with memory-mapped files using mmap(). Also refactor the test logic to reduce redundancy, improve error reporting, and clarify failure messages for both shmem and mmap file types. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20250709174657.6916-1-suresh.k.chandrappa@gmail.com Signed-off-by: Suresh K C <suresh.k.chandrappa@gmail.com> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Tested-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:09 -07:00
Xuanye Liu	881388f343	mm: add process info to bad rss-counter warning Enhance the debugging information in check_mm() by including the process name and PID when reporting bad rss-counter states. This helps identify which process is associated with the memory accounting issue. Link: https://lkml.kernel.org/r/20250723100901.1909683-1-liuqiye2025@163.com Signed-off-by: Xuanye Liu <liuqiye2025@163.com> Acked-by: SeongJae Park <sj@kernel.org> Cc: Ben Segall <bsegall@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:08 -07:00
Jann Horn	56bdf83de7	kasan: skip quarantine if object is still accessible under RCU Currently, enabling KASAN masks bugs where a lockless lookup path gets a pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be recycled and is insufficiently careful about handling recycled objects: KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine queues, even when it can't actually detect UAF in these objects, and the quarantine prevents fast recycling. When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed after an RCU grace period and put them on the quarantine, while disabling CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately; but that hasn't actually been working. I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I could only trigger this bug in a KASAN build by disabling CONFIG_SLUB_RCU_DEBUG and applying this patch. Link: https://lkml.kernel.org/r/20250723-kasan-tsbrcu-noquarantine-v1-1-846c8645976c@google.com Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:08 -07:00
Joanne Koong	d171b10b2d	mm/page-flags: remove folio_start_writeback_keepwrite() Commit `cd57b77197` ("ext4: Convert ext4_bio_write_page() to use a folio") removed set_page_writeback_keepwrite(), which was the last and only caller of folio_start_writeback_keepwrite(). Link: https://lkml.kernel.org/r/20250722182230.2114587-1-joannelkoong@gmail.com Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:08 -07:00
wang lian	b50e37889f	selftests/mm: add process_madvise() tests Add tests for process_madvise(), focusing on verifying behavior under various conditions including valid usage and error cases. [lianux.mm@gmail.com: v7] Link: https://lkml.kernel.org/r/20250729113109.12272-1-lianux.mm@gmail.com Link: https://lkml.kernel.org/r/20250721114614.40996-1-lianux.mm@gmail.com Signed-off-by: wang lian <lianux.mm@gmail.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Zi Yan <ziy@nvidia.com> Suggested-by: Mark Brown <broonie@kernel.org> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Zi Yan <ziy@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:06:08 -07:00
Baolin Wang	8d58d65621	mm: shmem: fix the shmem large folio allocation for the i915 driver After commit `acd7ccb284` ("mm: shmem: add large folio support for tmpfs"), we extend the 'huge=' option to allow any sized large folios for tmpfs, which means tmpfs will allow getting a highest order hint based on the size of write() and fallocate() paths, and then will try each allowable large order. However, when the i915 driver allocates shmem memory, it doesn't provide hint information about the size of the large folio to be allocated, resulting in the inability to allocate PMD-sized shmem, which in turn affects GPU performance. Patryk added: : In my tests, the performance drop ranges from a few percent up to 13% : in Unigine Superposition under heavy memory usage on the CPU Core Ultra : 155H with the Xe 128 EU GPU. Other users have reported performance : impact up to 30% on certain workloads. Please find more in the : regressions reports: : https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14645 : https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845 : : I believe the change should be backported to all active kernel branches : after version 6.12. To fix this issue, we can use the inode's size as a write size hint in shmem_read_folio_gfp() to help allocate PMD-sized large folios. Link: https://lkml.kernel.org/r/f7e64e99a3a87a8144cc6b2f1dddf7a89c12ce44.1753926601.git.baolin.wang@linux.alibaba.com Fixes: `acd7ccb284` ("mm: shmem: add large folio support for tmpfs") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws> Suggested-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:05:51 -07:00
Kairui Song	5c241ed8d0	mm/shmem, swap: improve cached mTHP handling and fix potential hang The current swap-in code assumes that, when a swap entry in shmem mapping is order 0, its cached folios (if present) must be order 0 too, which turns out not always correct. The problem is shmem_split_large_entry is called before verifying the folio will eventually be swapped in, one possible race is:

    CPU1                                    CPU2
    shmem_swapin_folio
    /* swap in of order > 0 swap entry S1 */
      folio = swap_cache_get_folio
      /* folio = NULL */
      order = xa_get_order
      /* order > 0 */
      folio = shmem_swap_alloc_folio
      /* mTHP alloc failure, folio = NULL */
     <... Interrupted ...>
                                            shmem_swapin_folio
                                            /* S1 is swapped in */
                                            shmem_writeout
                                            /* S1 is swapped out, folio cached */
      shmem_split_large_entry(..., S1)
      /* S1 is split, but the folio covering it has order > 0 now */

Now any following swapin of S1 will hang: `xa_get_order` returns 0, and folio lookup will return a folio with order > 0. The `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will always return false causing swap-in to return -EEXIST. And this looks fragile. So fix this up by allowing seeing a larger folio in swap cache, and check the whole shmem mapping range covered by the swapin have the right swap value upon inserting the folio. And drop the redundant tree walks before the insertion. This will actually improve performance, as it avoids two redundant Xarray tree walks in the hot path, and the only side effect is that in the failure path, shmem may redundantly reallocate a few folios causing temporary slight memory pressure. And worth noting, it may seems the order and value check before inserting might help reducing the lock contention, which is not true. The swap cache layer ensures raced swapin will either see a swap cache folio or failed to do a swapin (we have SWAP_HAS_CACHE bit even if swap cache is bypassed), so holding the folio lock and checking the folio flag is already good enough for avoiding the lock contention. 
The chance that a folio passes the swap entry value check but the shmem mapping slot has changed should be very low. Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20250728075306.12704-2-ryncsn@gmail.com Fixes: `809bc86517` ("mm: shmem: support large folio swap out") Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 11:53:19 -07:00
Joshua Hahn	af915c3c13	MAINTAINERS: add missing headers to mempory policy & migration section These two files currently do not belong to any section. The memory policy & migration section seems to be a good home for them! Link: https://lkml.kernel.org/r/20250725175616.2397031-1-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:24 -07:00
Lorenzo Stoakes	1729003f28	MAINTAINERS: add missing file to cgroup section The page_counter files seems most appropriately placed here. Link: https://lkml.kernel.org/r/20250724135421.54510-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:24 -07:00
Lorenzo Stoakes	e23210425c	MAINTAINERS: add MM MISC section, add missing files to MISC and CORE Add a MEMORY MANAGEMENT - MISC section to contain files that are not described by other sections, moving all but the catch-all mm/ and tools/mm/ from MEMORY MANAGEMENT to MEMORY MANAGEMENT - CORE and MEMORY MANAGEMENT - MISC as appropriate. In both sections add remaining missing files. At this point, with the other recent MAINTAINERS changes, this should now mean that every memory management-related file has a section and assigned maintainers/reviewers. Finally, we copy across the maintainers/reviewers from MEMORY MANAGEMENT - CORE to MEMORY MANAGEMENT - MISC, as it seems the two are sufficiently related for this to be sensible. Link: https://lkml.kernel.org/r/20250724133356.49487-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:24 -07:00
Lorenzo Stoakes	a5c9fcb18c	MAINTAINERS: add missing zsmalloc file The mm/zpdesc.h file is only included by mm/zsmalloc.c so the zsmalloc section seems the most appropriate place for this file. Link: https://lkml.kernel.org/r/20250722181827.156035-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:24 -07:00
Lorenzo Stoakes	2656a75ca1	MAINTAINERS: add missing files to page alloc section There are a couple of mm/-specific header files that were accidentally missed previously, and some page ref debug code also that ought to live here. Link: https://lkml.kernel.org/r/20250722174143.147143-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:23 -07:00
Lorenzo Stoakes	c3ef2cc695	MAINTAINERS: add missing shrinker files The mm/list_lru.[ch] files implement a shrinker-specific data structure so seem most suited to the SHRINKER section. Link: https://lkml.kernel.org/r/20250722173436.145526-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:23 -07:00
Lorenzo Stoakes	2011011ad6	MAINTAINERS: move memremap.[ch] to hotplug section This seems to be the most appropriate place for these files. Link: https://lkml.kernel.org/r/20250722172258.143488-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:23 -07:00
Lorenzo Stoakes	651ad43d56	MAINTAINERS: add missing mm_slot.h file THP section This seems to be the most appropriate place for this file. [lorenzo.stoakes@oracle.com: also add mm_slot.h to KSM section] Link: https://lkml.kernel.org/r/685747e2-a8cb-4620-a0c0-5cd9048d69b8@lucifer.local Link: https://lkml.kernel.org/r/20250722171904.142306-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Nico Pache <npache@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dev Jain <dev.jain@arm.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:23 -07:00
Lorenzo Stoakes	85c16ee6fa	MAINTAINERS: add missing interval_tree.c to memory mapping section This seems to be the best place for this file. Link: https://lkml.kernel.org/r/20250722171528.141083-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:22 -07:00
Lorenzo Stoakes	44d10df200	MAINTAINERS: add missing percpu-internal.h file to per-cpu section This file seems to most appropriately belong to the PER-CPU MEMORY ALLOCATOR section, so place it there. Link: https://lkml.kernel.org/r/20250722171023.139777-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:22 -07:00
Zi Yan	48e6561b66	mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() The trace event has not recorded the right data since it was introduced at commit `c8b3600312` ("mm: add alloc_contig_migrate_range allocation statistics"). Remove it. Link: https://lkml.kernel.org/r/20250722194649.4135191-1-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507220742.P3SaKlI6-lkp@intel.com/ Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Chang <richardycc@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:22 -07:00
Enze Li	511914506d	selftests/damon: introduce _common.sh to host shared function The current test scripts contain duplicated root permission checks in multiple locations. This patch consolidates these checks into _common.sh to eliminate code redundancy. Link: https://lkml.kernel.org/r/20250718064217.299300-1-lienze@kylinos.cn Signed-off-by: Enze Li <lienze@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	da5973a0b8	selftests/damon/sysfs.py: test runtime reduction of DAMON parameters sysfs.py is testing if non-default additional parameters can be committed. Add a test case for further reducing the parameters to the default set. Link: https://lkml.kernel.org/r/20250720171652.92309-23-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	62b7b1ffa2	selftests/damon/sysfs.py: test non-default parameters runtime commit sysfs.py is testing only the default and minimum DAMON parameters. Add another test case for more non-default additional DAMON parameters commitment on runtime. Link: https://lkml.kernel.org/r/20250720171652.92309-22-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	16797a55aa	selftests/damon/sysfs.py: generalize DAMON context commit assertion DAMON context commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-21-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	a4027b5f24	selftests/damon/sysfs.py: generalize monitoring attributes commit assertion DAMON monitoring attributes commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-20-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	771d7754ab	selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion DAMOS schemes commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-19-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	53f800581f	selftests/damon/sysfs.py: test DAMOS filters commitment Current DAMOS scheme commitment assertion is not testing DAMOS filters. Add the test. Link: https://lkml.kernel.org/r/20250720171652.92309-18-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	f22ff7b5a5	selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion DAMOS scheme commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-17-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	bd0487a774	selftests/damon/sysfs.py: test DAMOS destinations commitment Current DAMOS commitment assertion is not testing quota destinations commitment. Add the test. Link: https://lkml.kernel.org/r/20250720171652.92309-16-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	84dc442bd5	selftests/damon/sysfs.py: test quota goal commitment Current DAMOS quota commitment assertion is not testing quota goal commitment. Add the test. Link: https://lkml.kernel.org/r/20250720171652.92309-15-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:19 -07:00
SeongJae Park	f797e709f7	selftests/damon/sysfs.py: generalize DamosQuota commit assertion DamosQuota commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-14-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:19 -07:00
SeongJae Park	b50c48de61	selftests/damon/sysfs.py: generalize DAMOS Watermarks commit assertion DamosWatermarks commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:19 -07:00

1369019 Commits