linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-26 00:31:10 -05:00

Author	SHA1	Message	Date
Wander Lairson Costa	dfd04add59	kmem/tracing: add kmem name to kmem_cache_alloc tracepoint The kmem_cache_free tracepoint includes a "name" field, which allows for easy identification and filtering of specific kmem's. However, the kmem_cache_alloc tracepoint lacks this field, making it difficult to pair corresponding alloc and free events for analysis. Add the "name" field to kmem_cache_alloc to enable consistent tracking and correlation of kmem alloc and free events. Link: https://lkml.kernel.org/r/20250825125927.59816-1-wander@redhat.com Signed-off-by: Wander Lairson Costa <wander@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:18 -07:00
Kairui Song	46afff4599	mm/page-writeback: drop usage of folio_index folio_index is only needed for mixed usage of page cache and swap cache. The remaining three callers in page-writeback are for page cache tag marking. Swap cache space doesn't use tags (it explicitly sets mapping_set_no_writeback_tags), so use folio->index here directly. Link: https://lkml.kernel.org/r/20250825163721.17734-1-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
I Viswanath	79dfed0976	selftests/mm: use calloc instead of malloc in pagemap_ioctl.c As per Documentation/process/deprecated.rst, dynamic size calculations should not be performed in memory allocator arguments due to possible overflows. Replace malloc with calloc to avoid open-ended arithmetic and prevent possible overflows. Link: https://lkml.kernel.org/r/20250825170643.63174-1-viswanathiyyappan@gmail.com Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed by: Donet Tom <donettom@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Donet Tom	786eb990cf	drivers/base/node: handle error properly in register_one_node() If register_node() returns an error, it is not handled correctly. The function will proceed further and try to register CPUs under the node, which is not correct. So, in this patch, if register_node() returns an error, we return immediately from the function. Link: https://lkml.kernel.org/r/20250822084845.19219-1-donettom@linux.ibm.com Fixes: `76b67ed9dc` ("[PATCH] node hotplug: register cpu: remove node struct") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hiroyuki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Wei Yang	3615e106e0	mm/khugepaged: use list_xxx() helper to improve readability In general, khugepaged_scan_mm_slot() iterates khugepaged_scan.mm_head list to get a mm_struct for collapse memory. Use list_xxx() helper would be more obvious to the list iteration operation. No functional change. Link: https://lkml.kernel.org/r/20250822025732.9025-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Bala-Vignesh-Reddy	a7498388b0	selftests: centralise maybe-unused definition in kselftest.h Several selftests subdirectories duplicated the define __maybe_unused, leading to redundant code. Move to kselftest.h header and remove other definitions. This addresses the duplication noted in the proc-pid-vm warning fix. Link: https://lkml.kernel.org/r/20250821101159.2238-1-reddybalavignesh9979@gmail.com Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/ Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Mickaël Salaün <mic@digikod.net> [landlock] Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
ally heev	940b1be225	kselftest: mm: fix typos in test_vmalloc.sh Fix simple typos in function name and console message. Link: https://lkml.kernel.org/r/20250823170208.184149-1-allyheev@gmail.com Signed-off-by: ally heev <allyheev@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Usama Arif	32960f7503	mm/huge_memory: remove enforce_sysfs from __thp_vma_allowable_orders Using forced_collapse directly is clearer and enforce_sysfs is not really needed. Link: https://lkml.kernel.org/r/20250821150038.2025521-1-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Brendan Jackman	ce32123b9b	mm: remove is_migrate_highatomic() There are 3 potential reasons for is_migrate_() helpers: 1. They represent higher-level attributes of migratetypes, like is_migrate_movable() 2. They are ifdef'd, like is_migrate_isolate(). 3. For consistency with an is_migrate__page() helper, also like is_migrate_isolate(). It looks like is_migrate_highatomic() was for case 3, but that was removed in commit `e0932b6c1f` ("mm: page_alloc: consolidate free page accounting"). So remove the indirection and go back to a simple comparison. Link: https://lkml.kernel.org/r/20250821-is-migrate-highatomic-v1-1-ddb6e5d7c566@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Shankari Anand	9907e1df31	rust: mm: update ARef and AlwaysRefCounted imports from sync::aref Update call sites in the mm subsystem to import `ARef` and `AlwaysRefCounted` from `sync::aref` instead of `types`. This aligns with the ongoing effort to move `ARef` and `AlwaysRefCounted` to sync. Link: https://lkml.kernel.org/r/20250716091158.812860-1-shankari.ak0208@gmail.com Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com> Suggested-by: Benno Lossin <lossin@kernel.org> Link: https://github.com/Rust-for-Linux/linux/issues/1173 Acked-by: Alice Ryhl <aliceryhl@google.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Trevor Gross <tmgross@umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00
Nhat Pham	0b1bf60c32	mm/zswap: reduce the size of the compression buffer to a single page Reduce the compression buffer size from 2 * PAGE_SIZE to only one page, as the compression output (in the success case) should not exceed the length of the input. In the past, Chengming tried to reduce the compression buffer size, but ran into issues with the LZO algorithm (see [2]). Herbert Xu reported that the issue has been fixed (see [3]). Now we should have the guarantee that compressors' output should not exceed one page in the success case, and the algorithm will just report failure otherwise. With this patch, we save one page per cpu (per compression algorithm). Link: https://lkml.kernel.org/r/20250820181547.3794167-1-nphamcs@gmail.com Link: https://lore.kernel.org/linux-mm/20231213-zswap-dstmem-v4-1-f228b059dd89@bytedance.com/ [1] Link: https://lore.kernel.org/lkml/0000000000000b05cd060d6b5511@google.com/ [2] Link: https://lore.kernel.org/linux-mm/aKUmyl5gUFCdXGn-@gondor.apana.org.au/ [3] Co-developed-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Nhat Pham <nphamcs@gmail.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00
gaoxiang17	0cd01c4a5c	mm/cma: add 'available count' and 'total count' to trace_cma_alloc_start This makes cma info more intuitive during debugging. Show up in the trace as: 279.814717: cma_alloc_start: name=reserved request_count=4 available_count=8096 total_count=8192 align=0 309.790580: cma_alloc_start: name=reserved request_count=4 available_count=8092 total_count=8192 align=0 317.046609: cma_alloc_start: name=reserved request_count=4 available_count=8088 total_count=8192 align=0 Link: https://lkml.kernel.org/r/8a79284879c529f467478552825154b018076e95.1755729178.git.gaoxiang17@xiaomi.com Signed-off-by: gaoxiang17 <gaoxiang17@xiaomi.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00
Wei Yang	5d5d75ff64	mm/rmap: use folio_large_nr_pages() when we are sure it is a large folio Non-large folio is handled at the beginning, so it is a large folio for sure. Use folio_large_nr_pages() here like elsewhere. Link: https://lkml.kernel.org/r/20250817032647.29147-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00
Wei Yang	e5e758922d	mm/rmap: not necessary to mask off FOLIO_PAGES_MAPPED At this point, we are in an if branch conditional on (nr < ENTIRELY_MAPPED), and FOLIO_PAGES_MAPPED is equal to (ENTIRELY_MAPPED - 1). This means the upper bits are already cleared. It is not necessary to mask it off. Link: https://lkml.kernel.org/r/20250817032647.29147-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:14 -07:00
Steven Rostedt	658fa653b4	mm, x86/mm: move creating the tlb_flush event back to x86 code Commit `e73ad5ff2f` ("mm, x86/mm: Make the batched unmap TLB flush API more generic") moved the trace_tlb_flush out of mm/rmap.c and back into x86 specific architecture, but it kept the include to the events/tlb.h file, even though it didn't use that event. Then another commit came in and added more events to the mm/rmap.c file and moved the #define CREATE_TRACE_POINTS define from the x86 specific architecture to the generic mm/rmap.h file to create both the tlb_flush tracepoint and the new tracepoints. But since the tlb_flush tracepoint is only x86 specific, it now creates that tracepoint for all other architectures and this wastes approximately 5K of text and meta data that will not be used. Remove the events/tlb.h from mm/rmap.c and add the define CREATE_TRACE_POINTS back in the x86 code. Link: https://lkml.kernel.org/r/20250612100313.3b9a8b80@batman.local.home Fixes: `e73ad5ff2f` ("mm, x86/mm: Make the batched unmap TLB flush API more generic") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:14 -07:00
Christoph Hellwig	7bebb41b96	mm: remove write_cache_pages No users left. Link: https://lkml.kernel.org/r/20250818061017.1526853-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:14 -07:00
Christoph Hellwig	e34b21ba15	bcachefs: stop using write_cache_pages Stop using the obsolete write_cache_pages and use writeback_iter directly. This basically just open codes write_cache_pages without the indirect call, but there's probably ways to structure the code even nicer as a follow on. Link: https://lkml.kernel.org/r/20250818061017.1526853-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:14 -07:00
Christoph Hellwig	8d4bb46ba7	ntfs3: stop using write_cache_pages Patch series "remove write_cache_pages()". Kill off write_cache_pages() after converting the last two users to the iterator. This patch (of 3): Stop using the obsolete write_cache_pages and use writeback_iter directly. Link: https://lkml.kernel.org/r/20250818061017.1526853-1-hch@lst.de Link: https://lkml.kernel.org/r/20250818061017.1526853-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:13 -07:00
Xichao Zhao	1aca4021f8	lib/test_hmm: drop redundant conversion to bool The result of integer comparison already evaluates to bool. No need for explicit conversion. No functional impact. Link: https://lkml.kernel.org/r/20250819070457.486348-1-zhao.xichao@vivo.com Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:13 -07:00
Wei Yang	c9615059ca	selftests/mm: test that rmap behaves as expected As David suggested, currently we don't have a high level test case to verify the behavior of rmap. This patch introduce the verification on rmap by migration. The general idea is if migrate one shared page between processes, this would be reflected in all related processes. Otherwise, we have problem in rmap. Currently it covers following four scenarios: * anonymous page * shmem page * pagecache page * ksm page Link: https://lkml.kernel.org/r/20250819080047.10063-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Rik van Riel <riel@surriel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:13 -07:00
Wei Yang	b27f292de6	selftests/mm: put general ksm operation into vm_util Patch series "test that rmap behaves as expected", v4. As David suggested, currently we don't have a high level test case to verify the behavior of rmap. This patch set introduce the verification on rmap by migration. Patch 1 is a preparation to move ksm related operations into vm_util. Patch 2 is the new test case for rmap. Currently it covers following four scenarios: * anonymous page * shmem page * pagecache page * ksm page This patch (of 2): There are some general ksm operations could be used by other related test cases. Put them into vm_util for common use. This is a preparation patch for later use. Link: https://lkml.kernel.org/r/20250819080047.10063-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250819080047.10063-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Rik van Riel <riel@surriel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:13 -07:00
Baokun Li	63ec0c26b6	tmpfs: preserve SB_I_VERSION on remount Now tmpfs enables i_version by default and tmpfs does not modify it. But SB_I_VERSION can also be modified via sb_flags, and reconfigure_super() always overwrites the existing flags with the latest ones. This means that if tmpfs is remounted without specifying iversion, the default i_version will be unexpectedly disabled. To ensure iversion remains enabled, SB_I_VERSION is now always set for fc->sb_flags in shmem_init_fs_context(), instead of for sb->s_flags in shmem_fill_super(). Link: https://lkml.kernel.org/r/20250819061803.1496443-1-libaokun@huaweicloud.com Fixes: `36f05cab0a` ("tmpfs: add support for an i_version counter") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:12 -07:00
Zi Yan	c55ed758e0	selftests/mm: check after-split folio orders in split_huge_page_test Instead of just checking the existence of PMD folios before and after folio split tests, use check_folio_orders() to check after-split folio orders. The split ranges in split_thp_in_pagecache_to_order_at() are changed to [addr, addr + pagesize) for every pmd_pagesize. It prevents folios within the range being split multiple times due to debugfs split function always perform splits with a pagesize step for a given range. The following tests are not changed: 1. split_pte_mapped_thp: the test already uses kpageflags to check; 2. split_file_backed_thp: no vaddr available. Link: https://lkml.kernel.org/r/20250818184622.1521620-6-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: wang lian <lianux.mm@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:12 -07:00
Zi Yan	fca418e59a	selftests/mm: add check_after_split_folio_orders() helper The helper gathers folio order statistics for folios within a virtual address range and checks them against a given order list. It aims to provide a more precise folio order check instead of just checking the existence of PMD folios. The helper will be used in the upcoming commit. Link: https://lkml.kernel.org/r/20250818184622.1521620-5-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: wang lian <lianux.mm@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:12 -07:00
Zi Yan	bd66448f2a	selftests/mm: reimplement is_backed_by_thp() with more precise check and rename it to is_backed_by_folio(). is_backed_by_folio() checks if the given vaddr is backed a folio with a given order. It does so by: 1. getting the pfn of the vaddr; 2. checking kpageflags of the pfn; if order is greater than 0: 3. checking kpageflags of the head pfn; 4. checking kpageflags of all tail pfns. pmd_order is added to split_huge_page_test.c and replaces max_order. [ziy@nvidia.com: reduce code duplication, per David] Link: https://lkml.kernel.org/r/F54782D6-65A3-4D35-AE03-8ADE636EE258@nvidia.com Link: https://lkml.kernel.org/r/20250818184622.1521620-4-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: wang lian <lianux.mm@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:12 -07:00
Zi Yan	72a07c0390	selftests/mm: mark all functions static in split_huge_page_test.c All functions are only used within the file. Link: https://lkml.kernel.org/r/20250818184622.1521620-3-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: wang lian <lianux.mm@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:11 -07:00
Zi Yan	9eff16bd3a	mm/huge_memory: add new_order and offset to split_huge_pages*() pr_debug Patch series "Better split_huge_page_test result check", v5. This patchset uses kpageflags to get after-split folio orders for a better split_huge_page_test result check[1]. The added gather_after_split_folio_orders() scans through a VPN range and collects the numbers of folios at different orders. check_after_split_folio_orders() compares the result of gather_after_split_folio_orders() to a given list of numbers of different orders. This patchset also adds new order and in folio offset to the split huge page debugfs's pr_debug()s; This patch (of 5): They are useful information for debugging split huge page tests. Link: https://lkml.kernel.org/r/20250818184622.1521620-1-ziy@nvidia.com Link: https://lkml.kernel.org/r/20250818184622.1521620-2-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: wang lian <lianux.mm@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:11 -07:00
Li RongQing	b322e88b3d	mm/hugetlb: early exit from hugetlb_pages_alloc_boot() when max_huge_pages=0 Optimize hugetlb_pages_alloc_boot() to return immediately when max_huge_pages is 0, avoiding unnecessary CPU cycles and the below log message when hugepages aren't configured in the kernel command line. [ 3.702280] HugeTLB: allocation took 0ms with hugepage_allocation_threads=32 Link: https://lkml.kernel.org/r/20250814102333.4428-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Tested-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:11 -07:00
Chi Zhiling	35224da7e3	mm/filemap: skip non-uptodate folio if there are available folios When reading data exceeding the maximum IO size, the operation is split into multiple IO requests, but the data isn't immediately copied to userspace after each IO completion. For example, when reading 2560k data from a device with 1280k maximum IO size, the following sequence occurs: 1. read 1280k 2. copy 41 pages and issue read ahead for next 1280k 3. copy 31 pages to user buffer 4. wait the next 1280k 5. copy 8 pages to user buffer 6. copy 20 folios(64k) to user buffer The 8 pages in step 5 are copied after the second 1280k completes(step 4) due to waiting for a non-uptodate folio in filemap_update_page. We can copy the 8 pages before the second 1280k completes(step 4) to reduce the latency of this read operation. After applying the patch, these 8 pages will be copied before the next IO completes: 1. read 1280k 2. copy 41 pages and issue read ahead for next 1280k 3. copy 31 pages to user buffer 4. copy 8 pages to user buffer 5. wait the next 1280k 6. copy 20 folios(64k) to user buffer This patch drops a setting of IOCB_NOWAIT for AIO, which is fine because filemap_read will set it again for AIO. The final solution provided by Matthew Wilcox: Link: https://lore.kernel.org/linux-fsdevel/aIDy076Sxt544qja@casper.infradead.org/ Link: https://lkml.kernel.org/r/20250728083952.75518-3-chizhiling@163.com Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:11 -07:00
Chi Zhiling	c4408277c0	mm/filemap: do not use is_partially_uptodate for entire folio Patch series "Tiny optimization for large read operations". This series contains two patches, 1. Skip calling is_partially_uptodate for entire folio to save time, I have reviewed the mpage and iomap implementations and didn't spot any issues, but this change likely needs more thorough review. 2. Skip calling filemap_uptodate if there are ready folios in the batch, This might save a few milliseconds in practice, but I didn't observe measurable improvements in my tests. This patch (of 2): When a folio is marked as non-uptodate, it means the folio contains some non-uptodate data. Therefore, calling is_partially_uptodate() to recheck the entire folio is redundant. If all data in a folio is actually up-to-date but the folio lacks the uptodate flag, it will still be treated as non-uptodate in many other places. Thus, there should be no special case handling for filemap. Link: https://lkml.kernel.org/r/20250728083952.75518-1-chizhiling@163.com Link: https://lkml.kernel.org/r/20250728083952.75518-2-chizhiling@163.com Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:10 -07:00
Sang-Heon Jeon	f6a4a150f1	mm/damon/tests/core-kunit: add damos_commit_filter test Add a unit test to verify that damos_commit_filter() changes the dest value correctly. Link: https://lkml.kernel.org/r/20250817021348.570692-1-ekffu200098@gmail.com Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Honggyu Kim <honggyu.kim@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:10 -07:00
liuqiqi	4bd22a7ae5	mm: fix duplicate accounting of free pages in should_reclaim_retry() In the zone_reclaimable_pages() function, if the page counts for NR_ZONE_INACTIVE_FILE, NR_ZONE_ACTIVE_FILE, NR_ZONE_INACTIVE_ANON, and NR_ZONE_ACTIVE_ANON are all zero, the function returns the number of free pages as the result. In this case, when should_reclaim_retry() calculates reclaimable pages, it will inadvertently double-count the free pages in its accounting. static inline bool should_reclaim_retry(gfp_t gfp_mask, unsigned order, struct alloc_context ac, int alloc_flags, bool did_some_progress, int no_progress_loops) { ... available = reclaimable = zone_reclaimable_pages(zone); available += zone_page_state_snapshot(zone, NR_FREE_PAGES); This may result in an increase in the number of retries of __alloc_pages_slowpath(), causing increased kswapd load. Link: https://lkml.kernel.org/r/20250812070210.1624218-1-liuqiqi@kylinos.cn Fixes: `6aaced5abd` ("mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()") Signed-off-by: liuqiqi <liuqiqi@kylinos.cn> Reviewed-by: Ye Liu <liuye@kylinos.cn> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:10 -07:00
Matthew Wilcox (Oracle)	88df6ab2f3	mm: add folio_is_pci_p2pdma() Reimplement is_pci_p2pdma_page() in terms of folio_is_pci_p2pdma(). Moves the page_folio() call from inside page_pgmap() to is_pci_p2pdma_page(). This removes a page_folio() call from try_grab_folio() which already has a folio and can pass it in. Link: https://lkml.kernel.org/r/20250805172307.1302730-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:10 -07:00
Matthew Wilcox (Oracle)	c995ac3aa3	mm: reimplement folio_is_fsdax() For callers of folio_is_fsdax(), we save a folio->page->folio conversion. Callers of is_fsdax_page() simply move the conversion of page->folio from the implementation of page_pgmap() to is_fsdax_page(). Link: https://lkml.kernel.org/r/20250805172307.1302730-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:09 -07:00
Matthew Wilcox (Oracle)	bd0dbbb3fd	mm: reimplement folio_is_device_coherent() For callers of folio_is_device_coherent(), we save a folio->page->folio conversion. Callers of is_device_coherent_page() simply move the conversion of page->folio from the implementation of page_pgmap() to is_device_coherent_page(). Link: https://lkml.kernel.org/r/20250805172307.1302730-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:09 -07:00
Matthew Wilcox (Oracle)	7cfe9cafb6	mm: reimplement folio_is_device_private() For callers of folio_is_device_private(), we save a folio->page->folio conversion. Callers of is_device_private_page() simply move the conversion of page->folio from the implementation of page_pgmap() to is_device_private_page(). Link: https://lkml.kernel.org/r/20250805172307.1302730-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:09 -07:00
Matthew Wilcox (Oracle)	89ef6ad6fa	mm: introduce memdesc_is_zone_device() Remove the conversion from folio to page in folio_is_zone_device() by introducing memdesc_is_zone_device() which takes a memdesc_flags_t from either a page or a folio. Link: https://lkml.kernel.org/r/20250805172307.1302730-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:08 -07:00
Matthew Wilcox (Oracle)	11afccce2a	slab: use memdesc_nid() We no longer need to convert from slab to folio to get the nid, we can ask memdesc_nid() for the nid directly. Link: https://lkml.kernel.org/r/20250805172307.1302730-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:08 -07:00
Matthew Wilcox (Oracle)	87479378ac	slab: use memdesc_flags_t The slab flags are memdesc flags and contain the same information in the upper bits as the other memdescs (like node ID). Link: https://lkml.kernel.org/r/20250805172307.1302730-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:08 -07:00
Matthew Wilcox (Oracle)	4aff03fbe5	mm: introduce memdesc_zonenum() Remove a conversion from folio to page by passing the folio->flags (which are a copy of the page->flags) to the new memdesc_zonenum() function. Link: https://lkml.kernel.org/r/20250805172307.1302730-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:08 -07:00
Matthew Wilcox (Oracle)	eb00fdd84d	mm: introduce memdesc_nid() Remove a conversion from folio to page by passing the folio->flags (which are a copy of the page->flags) to the new memdesc_nid() function. Link: https://lkml.kernel.org/r/20250805172307.1302730-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:07 -07:00
Matthew Wilcox (Oracle)	56d578c130	mm: convert page_to_section() to memdesc_section() Pass in the memdesc_flags_t instead of a pointer to the page. This will allow us to remove a few conversions to struct page in upcoming patches. Link: https://lkml.kernel.org/r/20250805172307.1302730-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:07 -07:00
Matthew Wilcox (Oracle)	53fbef56e0	mm: introduce memdesc_flags_t Patch series "Add and use memdesc_flags_t". At some point struct page will be separated from struct slab and struct folio. This is a step towards that by introducing a type for the 'flags' word of all three structures. This gives us a certain amount of type safety by establishing that some of these unsigned longs are different from other unsigned longs in that they contain things like node ID, section number and zone number in the upper bits. That lets us have functions that can be easily called by anyone who has a slab, folio or page (but not easily by anyone else) to get the node or zone. There's going to be some unusual merge problems with this as some odd bits of the kernel decide they want to print out the flags value or something similar by writing page->flags and now they'll need to write page->flags.f instead. That's most of the churn here. Maybe we should be removing these things from the debug output? This patch (of 11): Wrap the unsigned long flags in a typedef. In upcoming patches, this will provide a strong hint that you can't just pass a random unsigned long to functions which take this as an argument. 
[willy@infradead.org: s/flags/flags.f/ in several architectures] Link: https://lkml.kernel.org/r/aKMgPRLD-WnkPxYm@casper.infradead.org [nicola.vetrini@gmail.com: mips: fix compilation error] Link: https://lore.kernel.org/lkml/CA+G9fYvkpmqGr6wjBNHY=dRp71PLCoi2341JxOudi60yqaeUdg@mail.gmail.com/ Link: https://lkml.kernel.org/r/20250825214245.1838158-1-nicola.vetrini@gmail.com Link: https://lkml.kernel.org/r/20250805172307.1302730-1-willy@infradead.org Link: https://lkml.kernel.org/r/20250805172307.1302730-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:07 -07:00
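The idea of wrapping the flags word in a typedef can be sketched in plain C. The member name `f` follows the `page->flags.f` spelling mentioned in the commit message; everything prefixed `toy_` or `TOY_` (the field width, the shift and the helper functions) is a hypothetical illustration and does not reflect the kernel's real bit layout.

```c
#include <limits.h>

/* A single unsigned long wrapped in a struct, so it is a distinct type. */
typedef struct {
	unsigned long f;
} memdesc_flags_t;

/* Hypothetical layout: the node ID lives in the top 6 bits of the word. */
#define TOY_NODES_WIDTH   6
#define TOY_NODES_PGSHIFT (sizeof(unsigned long) * CHAR_BIT - TOY_NODES_WIDTH)
#define TOY_NODES_MASK    ((1UL << TOY_NODES_WIDTH) - 1)

/* Build a flags word from a node ID and some low flag bits. */
memdesc_flags_t toy_make_flags(unsigned int nid, unsigned long low_bits)
{
	memdesc_flags_t mdf;

	mdf.f = ((unsigned long)(nid & TOY_NODES_MASK) << TOY_NODES_PGSHIFT) |
		(low_bits & ((1UL << TOY_NODES_PGSHIFT) - 1));
	return mdf;
}

/*
 * Takes the wrapped type rather than a bare unsigned long, so passing
 * an arbitrary integer is a compile error instead of a silent bug.
 */
int toy_memdesc_nid(memdesc_flags_t mdf)
{
	return (int)((mdf.f >> TOY_NODES_PGSHIFT) & TOY_NODES_MASK);
}
```

A caller holding any descriptor with such a flags member can pass it straight to `toy_memdesc_nid()`, while `toy_memdesc_nid(42UL)` fails to compile. That is the type-safety hint the series describes.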
Enze Li	4e915656a3	mm/damon/Kconfig: make DAMON_STAT_ENABLED_DEFAULT depend on DAMON_STAT The DAMON_STAT_ENABLED_DEFAULT option is strongly tied to DAMON_STAT option -- enabling it alone is meaningless. This patch makes DAMON_STAT_ENABLED_DEFAULT depend on DAMON_STAT, ensuring functional consistency. Link: https://lkml.kernel.org/r/20250815092110.811757-1-lienze@kylinos.cn Fixes: `369c415e60` ("mm/damon: introduce DAMON_STAT module") Signed-off-by: Enze Li <lienze@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:07 -07:00
Usama Arif	6bb9614484	selftests: prctl: introduce tests for disabling THPs except for madvise The test will set the global system THP setting to never, madvise or always depending on the fixture variant and the 2M setting to inherit before it starts (and reset to original at teardown). The fixture setup will also test if PR_SET_THP_DISABLE prctl call can be made with PR_THP_DISABLE_EXCEPT_ADVISED and skip if it fails. This tests if the process can: - successfully get the policy to disable THPs except for madvise. - get hugepages only on MADV_HUGE and MADV_COLLAPSE if the global policy is madvise/always and only with MADV_COLLAPSE if the global policy is never. - successfully reset the policy of the process. - after reset, only get hugepages with: - MADV_COLLAPSE when policy is set to never. - MADV_HUGE and MADV_COLLAPSE when policy is set to madvise. - always when policy is set to "always". - never get a THP with MADV_NOHUGEPAGE. - repeat the above tests in a forked process to make sure the policy is carried across forks. 
Test results: ./prctl_thp_disable TAP version 13 1..12 ok 1 prctl_thp_disable_completely.never.nofork ok 2 prctl_thp_disable_completely.never.fork ok 3 prctl_thp_disable_completely.madvise.nofork ok 4 prctl_thp_disable_completely.madvise.fork ok 5 prctl_thp_disable_completely.always.nofork ok 6 prctl_thp_disable_completely.always.fork ok 7 prctl_thp_disable_except_madvise.never.nofork ok 8 prctl_thp_disable_except_madvise.never.fork ok 9 prctl_thp_disable_except_madvise.madvise.nofork ok 10 prctl_thp_disable_except_madvise.madvise.fork ok 11 prctl_thp_disable_except_madvise.always.nofork ok 12 prctl_thp_disable_except_madvise.always.fork [usamaarif642@gmail.com: return after executing test in child process] Link: https://lkml.kernel.org/r/3dca2de4-9a6a-4efe-a86c-83f9509831fc@gmail.com Link: https://lkml.kernel.org/r/20250815135549.130506-8-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:06 -07:00
Usama Arif	681f45deca	selftests: prctl: introduce tests for disabling THPs completely The test will set the global system THP setting to never, madvise or always depending on the fixture variant and the 2M setting to inherit before it starts (and reset to original at teardown). The fixture setup will also test if PR_SET_THP_DISABLE prctl call can be made to disable all THPs and skip if it fails. This tests if the process can: - successfully get the policy to disable THPs completely. - never get a hugepage when the THPs are completely disabled with the prctl, including with MADV_HUGE and MADV_COLLAPSE. - successfully reset the policy of the process. - after reset, only get hugepages with: - MADV_COLLAPSE when policy is set to never. - MADV_HUGE and MADV_COLLAPSE when policy is set to madvise. - always when policy is set to "always". - never get a THP with MADV_NOHUGEPAGE. - repeat the above tests in a forked process to make sure the policy is carried across forks. [usamaarif642@gmail.com: return after executing test in child process] Link: https://lkml.kernel.org/r/2d0ea708-ecba-4021-b6ca-e93f1413d60a@gmail.com [usamaarif642@gmail.com: include linux/mman.h for prctl_thp_disable] Link: https://lkml.kernel.org/r/20250910204609.1720498-1-usamaarif642@gmail.com Link: https://lore.kernel.org/all/c8249725-e91d-4c51-b9bb-40305e61e20d@sirena.org.uk/ Link: https://lkml.kernel.org/r/20250815135549.130506-7-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport 
<rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:06 -07:00
Usama Arif	49850bd026	selftest/mm: extract sz2ord function into vm_util.h The function already has 2 uses and will have a 3rd one in prctl selftests. The pagesize argument is added into the function, as it's not a global variable anymore. No functional change intended with this patch. Link: https://lkml.kernel.org/r/20250815135549.130506-6-usamaarif642@gmail.com Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:06 -07:00
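For readers unfamiliar with the helper, a size-to-order conversion with an explicit page size argument might look like the following. This is a sketch under the assumption that the order is log2(size / pagesize); `toy_sz2ord` is a hypothetical name and the body is not copied from vm_util.h.

```c
#include <stddef.h>

/*
 * Convert a size in bytes to a page order, assuming size is a
 * power-of-two multiple of pagesize. The page size is passed in
 * explicitly instead of being read from a global variable.
 */
int toy_sz2ord(size_t size, size_t pagesize)
{
	size_t pages = size / pagesize;
	int order = 0;

	while (pages > 1) {
		pages >>= 1;
		order++;
	}
	return order;
}
```

Under that assumption, a 2 MiB size with 4 KiB pages gives order 9, and a single page gives order 0.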
Usama Arif	7de854910b	docs: transhuge: document process level THP controls This includes the PR_SET_THP_DISABLE/PR_GET_THP_DISABLE pair of prctl calls as well as the newly introduced PR_THP_DISABLE_EXCEPT_ADVISED flag for the PR_SET_THP_DISABLE prctl call. Link: https://lkml.kernel.org/r/20250815135549.130506-5-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:05 -07:00
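From user space, the process-level controls named above might be exercised roughly as follows. This is a sketch, not a reference: the wrapper names are hypothetical, the fallback value of PR_THP_DISABLE_EXCEPT_ADVISED is an assumption for systems whose headers predate the flag, and passing the flag as the third prctl() argument is likewise assumed. On kernels without support the call is expected to fail with -1.

```c
#include <sys/prctl.h>

/* Assumed value, only used if the installed headers lack the flag. */
#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
#define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)
#endif

/*
 * Ask the kernel to disable THPs for this process except where
 * explicitly advised. Returns 0 on success, -1 on failure
 * (for example on a kernel that does not know the flag).
 */
int thp_disable_except_advised(void)
{
	return prctl(PR_SET_THP_DISABLE, 1UL,
		     (unsigned long)PR_THP_DISABLE_EXCEPT_ADVISED, 0UL, 0UL);
}

/* Query the current per-process THP policy; -1 on failure. */
int thp_get_policy(void)
{
	return prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
}
```

A caller would typically treat a -1 return from `thp_disable_except_advised()` as "feature unavailable" and skip, which matches what the selftest fixtures in this series are described as doing.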
David Hildenbrand	8cdc4d2701	mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED Let's allow for making MADV_COLLAPSE succeed on areas that neither have VM_HUGEPAGE nor VM_NOHUGEPAGE when we have THP disabled unless explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED). MADV_COLLAPSE is a clear advice that we want to collapse. Note that we still respect the VM_NOHUGEPAGE flag, just like MADV_COLLAPSE always does. So consequently, MADV_COLLAPSE is now only refused on VM_NOHUGEPAGE with PR_THP_DISABLE_EXCEPT_ADVISED, including for shmem. Link: https://lkml.kernel.org/r/20250815135549.130506-4-usamaarif642@gmail.com Co-developed-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:05 -07:00
David Hildenbrand	1f1c061089	mm/huge_memory: convert "tva_flags" to "enum tva_type" When determining which THP orders are eligible for a VMA mapping, we have previously specified tva_flags, however it turns out it is really not necessary to treat these as flags. Rather, we distinguish between distinct modes. The only case where we previously combined flags was with TVA_ENFORCE_SYSFS, but we can avoid this by observing that this is the default, except for MADV_COLLAPSE or edge cases in collapse_pte_mapped_thp() and hugepage_vma_revalidate(), and adding a mode specifically for this case - TVA_FORCED_COLLAPSE. We have: * smaps handling for showing "THPeligible" * Pagefault handling * khugepaged handling * Forced collapse handling: primarily MADV_COLLAPSE, but also for an edge case in collapse_pte_mapped_thp() Disregarding the edge cases, we want to ignore sysfs settings only when we are forcing a collapse through MADV_COLLAPSE, otherwise we want to enforce them, hence this patch does the following flag to enum conversions: * TVA_SMAPS | TVA_ENFORCE_SYSFS -> TVA_SMAPS * TVA_IN_PF | TVA_ENFORCE_SYSFS -> TVA_PAGEFAULT * TVA_ENFORCE_SYSFS -> TVA_KHUGEPAGED * 0 -> TVA_FORCED_COLLAPSE With this change, we immediately know if we are in the forced collapse case, which will be valuable next. 
Link: https://lkml.kernel.org/r/20250815135549.130506-3-usamaarif642@gmail.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:05 -07:00
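The flag-to-enum conversion listed in the commit above can be sketched as a small C model. The four enumerator names come from the commit message; their order, and the helper `toy_enforces_sysfs()`, are assumptions made for illustration and are not taken from the kernel headers.

```c
#include <stdbool.h>

/* One distinct mode per caller type, replacing combinable flags. */
enum tva_type {
	TVA_SMAPS,		/* smaps: showing "THPeligible" */
	TVA_PAGEFAULT,		/* page fault handling */
	TVA_KHUGEPAGED,		/* khugepaged handling */
	TVA_FORCED_COLLAPSE,	/* forced collapse, primarily MADV_COLLAPSE */
};

/*
 * Hypothetical helper: sysfs settings are enforced in every mode
 * except the forced-collapse one, so no separate
 * TVA_ENFORCE_SYSFS bit is needed any more.
 */
bool toy_enforces_sysfs(enum tva_type type)
{
	return type != TVA_FORCED_COLLAPSE;
}
```

Because the mode is a single value, a check such as `type == TVA_FORCED_COLLAPSE` identifies the forced-collapse case directly, which is the property the commit message says will be valuable for the follow-up patch.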


1382849 Commits