linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-21 20:45:27 -04:00

Author	SHA1	Message	Date
Andrew Morton	bc9950b56f	Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up changes required by mm-stable material: hugetlb and damon.	2025-09-21 14:19:36 -07:00
Jan Kara	e1b849cfa6	writeback: Avoid contention on wb->list_lock when switching inodes There can be multiple inode switch works that are trying to switch inodes to / from the same wb. This can happen in particular if some cgroup exits which owns many (thousands) inodes and we need to switch them all. In this case several inode_switch_wbs_work_fn() instances will be just spinning on the same wb->list_lock while only one of them makes forward progress. This wastes CPU cycles and quickly leads to softlockup reports and unusable system. Instead of running several inode_switch_wbs_work_fn() instances in parallel switching to the same wb and contending on wb->list_lock, run just one work item per wb and manage a queue of isw items switching to this wb. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>	2025-09-19 13:11:00 +02:00
Dev Jain	a660194dd1	arm64: Enable permission change on arm64 kernel block mappings This patch paves the path to enable huge mappings in vmalloc space and linear map space by default on arm64. For this we must ensure that we can handle any permission games on the kernel (init_mm) pagetable. Previously, __change_memory_common() used apply_to_page_range() which does not support changing permissions for block mappings. We move away from this by using the pagewalk API, similar to what riscv does right now. It is the responsibility of the caller to ensure that the range over which permissions are being changed falls on leaf mapping boundaries. For systems with BBML2, this will be handled in future patches by dyanmically splitting the mappings when required. Unlike apply_to_page_range(), the pagewalk API currently enforces the init_mm.mmap_lock to be held. To avoid the unnecessary bottleneck of the mmap_lock for our usecase, this patch extends this generic API to be used locklessly, so as to retain the existing behaviour for changing permissions. Apart from this reason, it is noted at [1] that KFENCE can manipulate kernel pgtable entries during softirqs. It does this by calling set_memory_valid() -> __change_memory_common(). This being a non-sleepable context, we cannot take the init_mm mmap lock. Add comments to highlight the conditions under which we can use the lockless variant - no underlying VMA, and the user having exclusive control over the range, thus guaranteeing no concurrent access. We require that the start and end of a given range do not partially overlap block mappings, or cont mappings. Return -EINVAL in case a partial block mapping is detected in any of the PGD/P4D/PUD/PMD levels; add a corresponding comment in update_range_prot() to warn that eliminating such a condition is the responsibility of the caller. Note that, the pte level callback may change permissions for a whole contpte block, and that will be done one pte at a time, as opposed to an atomic operation for the block mappings. This is fine as any access will decode either the old or the new permission until the TLBI. apply_to_page_range() currently performs all pte level callbacks while in lazy mmu mode. Since arm64 can optimize performance by batching barriers when modifying kernel pgtables in lazy mmu mode, we would like to continue to benefit from this optimisation. Unfortunately walk_kernel_page_table_range() does not use lazy mmu mode. However, since the pagewalk framework is not allocating any memory, we can safely bracket the whole operation inside lazy mmu mode ourselves. Therefore, wrap the call to walk_kernel_page_table_range() with the lazy MMU helpers. Link: https://lore.kernel.org/linux-arm-kernel/89d0ad18-4772-4d8f-ae8a-7c48d26a927e@arm.com/ [1] Signed-off-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Yang Shi <yshi@os.amperecomputing.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:36:37 +01:00
Jakub Kicinski	f2cdc4c22b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc7). No conflicts. Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en/fs.h `9536fbe10c` ("net/mlx5e: Add PSP steering in local NIC RX") `7601a0a462` ("net/mlx5e: Add a miss level for ipsec crypto offload") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-18 11:26:06 -07:00
Linus Torvalds	8b789f2b76	Merge tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "15 hotfixes. 11 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 13 of these fixes are for MM. The usual shower of singletons, plus - fixes from Hugh to address various misbehaviors in get_user_pages() - patches from SeongJae to address a quite severe issue in DAMON - another series also from SeongJae which completes some fixes for a DAMON startup issue" * tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: zram: fix slot write race condition nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/* samples/damon/mtier: avoid starting DAMON before initialization samples/damon/prcl: avoid starting DAMON before initialization samples/damon/wsse: avoid starting DAMON before initialization MAINTAINERS: add Lance Yang as a THP reviewer MAINTAINERS: add Jann Horn as rmap reviewer mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control mm/damon/core: introduce damon_call_control->dealloc_on_cancel mm: folio_may_be_lru_cached() unless folio_test_large() mm: revert "mm: vmscan.c: fix OOM on swap stress test" mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch" mm/gup: local lru_add_drain() to avoid lru_add_drain_all() mm/gup: check ref_count instead of lru before migration	2025-09-17 21:34:26 -07:00
Leon Romanovsky	ef3d979b3e	kmsan: fix missed kmsan_handle_dma() signature conversion kmsan_handle_dma_sg() has call to kmsan_handle_dma() function which was missed during conversion to physical addresses. Update that caller too and fix the following compilation error: mm/kmsan/hooks.c:372:6: error: too many arguments to function call, expected 3, have 4 371 \| kmsan_handle_dma(sg_page(item), item->offset, item->length, \| ~~~~~~~~~~~~~~~~ 372 \| dir); \| ^~~ mm/kmsan/hooks.c:362:19: note: 'kmsan_handle_dma' declared here 362 \| EXPORT_SYMBOL_GPL(kmsan_handle_dma); Fixes: `6eb1e769b2` ("kmsan: convert kmsan_handle_dma to use physical addresses") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509170638.AMGNCMEE-lkp@intel.com/ Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/4b2d7d0175b30177733bbbd42bf979d77eb73c29.1758090947.git.leon@kernel.org	2025-09-17 14:42:36 +02:00
Suren Baghdasaryan	f7381b9116	slab: mark slab->obj_exts allocation failures unconditionally alloc_slab_obj_exts() should mark failed obj_exts vector allocations independent on whether the vector is being allocated for a new or an existing slab. Current implementation skips doing this for existing slabs. Fix this by marking failed allocations unconditionally. Fixes: `09c46563ff` ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations") Reported-by: Shakeel Butt <shakeel.butt@linux.dev> Closes: https://lore.kernel.org/all/avhakjldsgczmq356gkwmvfilyvf7o6temvcmtt5lqd4fhp5rk@47gp2ropyixg/ Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org # v6.10+ Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-16 14:32:51 +02:00
Suren Baghdasaryan	4038016397	slab: prevent warnings when slab obj_exts vector allocation fails When object extension vector allocation fails, we set slab->obj_exts to OBJEXTS_ALLOC_FAIL to indicate the failure. Later, once the vector is successfully allocated, we will use this flag to mark codetag references stored in that vector as empty to avoid codetag warnings. slab_obj_exts() used to retrieve the slab->obj_exts vector pointer checks slab->obj_exts for being either NULL or a pointer with MEMCG_DATA_OBJEXTS bit set. However it does not handle the case when slab->obj_exts equals OBJEXTS_ALLOC_FAIL. Add the missing condition to avoid extra warning. Fixes: `09c46563ff` ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations") Reported-by: Shakeel Butt <shakeel.butt@linux.dev> Closes: https://lore.kernel.org/all/jftidhymri2af5u3xtcqry3cfu6aqzte3uzlznhlaylgrdztsi@5vpjnzpsemf5/ Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org # v6.10+ Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-16 14:32:51 +02:00
Vlastimil Babka	3864e4d5a5	slab: don't validate slab pointer in free_debug_processing() The struct slab pointer has been obtained from the object being freed on all the paths that lead to this function. In all cases this already includes the test for slab type of the struct page which struct slab is overlaying. Thus we would not reach this function if it was not a valid slab pointer in the first place. One less obvious case is that kmem_cache_free() trusts virt_to_slab() blindly so it may be NULL if the slab type check is false. But with SLAB_CONSISTENCY_CHECKS, cache_from_obj() called also from kmem_cache_free() catches this and returns NULL, which terminates freeing immediately. Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-15 16:48:03 +02:00
Vlastimil Babka	a21fe7b010	slab: validate slab before using it in alloc_single_from_partial() We touch slab->freelist and slab->inuse before checking the slab pointer is actually sane. Do that validation first, which will be safer. We can thus also remove the check from alloc_debug_processing(). This adds a new "s->flags & SLAB_CONSISTENCY_CHECKS" test but alloc_single_from_partial() is only called for caches with debugging enabled so it's acceptable. In alloc_single_from_new_slab() we just created the struct slab and call alloc_debug_processing() to mainly set up redzones, tracking etc, while not really expecting the consistency checks to fail. Thus don't validate it there. Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-15 16:47:36 +02:00
Vlastimil Babka	40522db59b	slab: move validate_slab_ptr() from alloc_consistency_checks() to its caller In alloc_debug_processing() we can call validate_slab_ptr() upfront and then don't need to recheck when alloc_consistency_checks() fails for other reasons. Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-15 16:10:10 +02:00
Vlastimil Babka	6f6fcd4634	slab: move validate_slab_ptr() from check_slab() to its callers We will want to do the validation earlier in some callers or remove it completely, so extract it from check_slab() first. No functional change. Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-15 16:10:10 +02:00
Vlastimil Babka	86169b00f8	slab: wrap debug slab validation in validate_slab_ptr() This will make it clear where we currently cast struct slab to folio only to check the slab type, and allow to change the implementation later with memdesc conversion. For now use a struct page based implementation instead of struct folio to be compatible with further upcoming changes. Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-15 16:10:10 +02:00
Matthew Wilcox (Oracle)	f4930de03d	slab: Remove dead code in free_consistency_checks() We already know that slab is a valid slab as that's checked by the caller. In the future, we won't be able to get to a slab pointer from a non-slab page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-09-15 16:10:10 +02:00
Mateusz Guzik	f99b391778	fs: rename generic_delete_inode() and generic_drop_inode() generic_delete_inode() is rather misleading for what the routine is doing. inode_just_drop() should be much clearer. The new naming is inconsistent with generic_drop_inode(), so rename that one as well with inode_ as the suffix. No functional changes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-15 16:09:42 +02:00
Mike Rapoport (Microsoft)	e68f150bc1	memblock: drop for_each_free_mem_pfn_range_in_zone_from() for_each_free_mem_pfn_range_in_zone_from() and its "backend" implementation __next_mem_pfn_range_in_zone() were only used by deferred initialization of the memory map. Remove them as they are not used anymore. Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>	2025-09-14 08:49:03 +03:00
Mike Rapoport (Microsoft)	219f624d06	mm/mm_init: drop deferred_init_maxorder() deferred_init_memmap_chunk() calls deferred_init_maxorder() to initialize struct pages in MAX_ORDER_NR_PAGES because according to commit `0e56acae4b` ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") this provides better cache locality than initializing the memory map in larger sections. The looping through free memory ranges is quite cumbersome in the current implementation as it is divided between deferred_init_memmap_chunk() and deferred_init_maxorder(). Besides, the latter has two loops, one that initializes struct pages and another one that frees them. There is no need in two loops because it is safe to free pages in groups smaller than MAX_ORDER_NR_PAGES. Even if lookup for a buddy page will access a struct page ahead of the pages being initialized, that page is guaranteed to be initialized either by memmap_init_reserved_pages() or by init_unavailable_range(). Simplify the code by moving initialization and freeing of the pages into deferred_init_memmap_chunk() and dropping deferred_init_maxorder(). Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>	2025-09-14 08:48:59 +03:00
Mike Rapoport (Microsoft)	f1f86187fd	mm/mm_init: deferred_init_memmap: use a job per zone deferred_init_memmap() loops over free memory ranges and creates a padata_mt_job for every free range that intersects with the zone being initialized. padata_do_multithreaded() then splits every such range to several chunks and runs a thread that initializes struct pages in that chunk using deferred_init_memmap_chunk(). The number of threads is limited by amount of the CPUs on the node (or 1 for memoryless nodes). Looping through free memory ranges is then repeated in deferred_init_memmap_chunk() first to find the first range that should be initialized and then to traverse the ranges until the end of the chunk is reached. Remove the loop over free memory regions in deferred_init_memmap() and pass the entire zone to padata_do_multithreaded() so that it will be divided to several chunks by the parallelization code. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>	2025-09-14 08:48:55 +03:00
Mike Rapoport (Microsoft)	3acb913c9d	mm/mm_init: use deferred_init_memmap_chunk() in deferred_grow_zone() deferred_grow_zone() initializes one or more sections in the memory map if buddy runs out of initialized struct pages when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled. It loops through memblock regions and initializes and frees pages in MAX_ORDER_NR_PAGES chunks. Essentially the same loop is implemented in deferred_init_memmap_chunk(), the only actual difference is that deferred_init_memmap_chunk() does not count initialized pages. Make deferred_init_memmap_chunk() count the initialized pages and return their number, wrap it with deferred_init_memmap_job() for multithreaded initialization with padata_do_multithreaded() and replace open-coded initialization of struct pages in deferred_grow_zone() with a call to deferred_init_memmap_chunk(). Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>	2025-09-14 08:48:48 +03:00
Wei Yang	204dfefe03	mm/page_alloc: find_large_buddy() from start_pfn aligned order We iterate pfn from order 0 to MAX_PAGE_ORDER aligned to find large buddy. While if the order is less than start_pfn aligned order, we would get the same pfn and do the same check again. Iterate from start_pfn aligned order to reduce duplicated work. [richard.weiyang@gmail.com: add comment on assignment of order] Link: https://lkml.kernel.org/r/20250828091618.7869-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250902025807.11467-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250828091618.7869-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250902025807.11467-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:25 -07:00
Quanmin Yan	d8f867fa08	mm/damon: add damon_ctx->min_sz_region Adopting addr_unit would make DAMON_MINREGION 'addr_unit * 4096' bytes and cause data alignment issues[1]. Add damon_ctx->min_sz_region to change DAMON_MIN_REGION from a global macro value to per-context variable. Link: https://lkml.kernel.org/r/20250828171242.59810-12-sj@kernel.org Link: https://lore.kernel.org/all/527714dd-0e33-43ab-bbbd-d89670ba79e7@huawei.com [1] Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:24 -07:00
SeongJae Park	540a2aebc6	mm/damon/sysfs: implement addr_unit file under context dir Only DAMON kernel API callers can use addr_unit parameter. Implement a sysfs file to let DAMON sysfs ABI users use it. Additionally, addr_unit must be set to a non-zero value. Link: https://lkml.kernel.org/r/20250828171242.59810-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:23 -07:00
SeongJae Park	01e7ee33a0	mm/damon/paddr: support addr_unit for DAMOS_STAT Add support of addr_unit for DAMOS_STAT action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:23 -07:00
SeongJae Park	ec1d5bab06	mm/damon/paddr: support addr_unit for MIGRATE_{HOT,COLD} Add support of addr_unit for DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	51a1ebd3a2	mm/damon/paddr: support addr_unit for DAMOS_LRU_[DE]PRIO Add support of addr_unit for DAMOS_LRU_PRIO and DAMOS_LRU_DEPRIO action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	85246435b2	mm/damon/paddr: support addr_unit for DAMOS_PAGEOUT Add support of addr_unit for DAMOS_PAGEOUT action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	d8096848e7	mm/damon/paddr: support addr_unit for access monitoring Add support of addr_unit paramer for access monitoing operations of paddr. Link: https://lkml.kernel.org/r/20250828171242.59810-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	09a616cbb3	mm/damon/core: add damon_ctx->addr_unit Patch series "mm/damon: support ARM32 with LPAE", v3. Previously, DAMON's physical address space monitoring only supported memory ranges below 4GB on LPAE-enabled systems. This was due to the use of 'unsigned long' in 'struct damon_addr_range', which is 32-bit on ARM32 even with LPAE enabled[1]. To add DAMON support for ARM32 with LPAE enabled, a new core layer parameter called 'addr_unit' was introduced[2]. Operations set layer can translate a core layer address to the real address by multiplying the parameter value to the core layer address. Support of the parameter is up to each operations layer implementation, though. For example, operations set implementations for virtual address space can simply ignore the parameter. Add the support on paddr, which is the DAMON operations set implementation for the physical address space, as we have a clear use case for that. This patch (of 11): In some cases, some of the real address that handled by the underlying operations set cannot be handled by DAMON since it uses only 'unsinged long' as the address type. Using DAMON for physical address space monitoring of 32 bit ARM devices with large physical address extension (LPAE) is one example[1]. Add a parameter name 'addr_unit' to core layer to help such cases. DAMON core API callers can set it as the scale factor that will be used by the operations set for translating the core layer's addresses to the real address by multiplying the parameter value to the core layer address. Support of the parameter is up to each operations set layer. The support from the physical address space operations set (paddr) will be added with following commits. Link: https://lkml.kernel.org/r/20250828171242.59810-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250828171242.59810-2-sj@kernel.org Link: https://lore.kernel.org/20250408075553.959388-1-zuoze1@huawei.com [1] Link: https://lore.kernel.org/all/20250416042551.158131-1-sj@kernel.org/ [2] Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:21 -07:00
Wei Yang	98c94f1035	mm/pageblock-flags: remove PB_migratetype_bits/PB_migrate_end enum pageblock_bits defines the meaning of pageblock bits. Currently PB_migratetype_bits says the lowest 3 bits represents migratetype and PB_migrate_end/MIGRATETYPE_MASK's definition rely on it with magical computation. Remove the definition of PB_migratetype_bits/PB_migrate_end. Use PB_migrate_[0\|1\|2] to represent lowest bits for migratetype. Then we can simplify related definition. Also, MIGRATETYPE_AND_ISO_MASK is MIGRATETYPE_MASK add isolation bit. Use MIGRATETYPE_MASK in the definition of MIGRATETYPE_AND_ISO_MASK looks cleaner. No functional change intended. Link: https://lkml.kernel.org/r/20250827070105.16864-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:21 -07:00
Wei Yang	dd3b304b94	mm/page_alloc: use xxx_pageblock_isolate() for better reading Patch series "mm/pageblock: improve readability of some pageblock handling", v3. During code reading, found two possible points to improve the readability of pageblock handling. Patch 1: isolate bit is standalone and there are dedicated helpers. Instead of check the bit directly, we could use the helper to do it. Patch 2: remove PB_migratetype_bits and PB_migrate_end to reduce magical computation. This patch (of 2): Since commit `e904bce2d9` ("mm/page_isolation: make page isolation a standalone bit"), it provides dedicated helper to handle isolation. Change to use these helpers to be better reading. No functional change intended. Link: https://lkml.kernel.org/r/20250827070105.16864-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250827070105.16864-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:21 -07:00
Boris Burkov	e3a9ac4e86	mm: add vmstat for kernel_file pages Kernel file pages are tricky to track because they are indistinguishable from files whose usage is accounted to the root cgroup. To maintain good accounting, introduce a vmstat counter tracking kernel file pages. Confirmed that these work as expected at a high level by mounting a btrfs using AS_KERNEL_FILE for metadata pages, and seeing the counter rise with fs usage then go back to a minimal level after drop_caches and finally down to 0 after unmounting the fs. Link: https://lkml.kernel.org/r/08ff633e3a005ed5f7691bfd9f58a5df8e474339.1755812945.git.boris@bur.io Signed-off-by: Boris Burkov <boris@bur.io> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Tested-by: syzbot@syzkaller.appspotmail.com Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qu Wenruo <wqu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:20 -07:00
Boris Burkov	cf1dec76ba	mm/filemap: add AS_KERNEL_FILE Patch series "introduce kernel file mapped folios", v4. Btrfs currently tracks its metadata pages in the page cache, using a fake inode (fs_info->btree_inode) with offsets corresponding to where the metadata is stored in the filesystem's full logical address space. A consequence of this is that when btrfs uses filemap_add_folio(), this usage is charged to the cgroup of whichever task happens to be running at the time. These folios don't belong to any particular user cgroup, so I don't think it makes much sense for them to be charged in that way. Some negative consequences as a result: - A task can be holding some important btrfs locks, then need to lookup some metadata and go into reclaim, extending the duration it holds that lock for, and unfairly pushing its own reclaim pain onto other cgroups. - If that cgroup goes into reclaim, it might reclaim these folios a different non-reclaiming cgroup might need soon. This is naturally offset by LRU reclaim, but still. We have two options for how to manage such file pages: 1. charge them to the root cgroup. 2. don't charge them to any cgroup at all. 2. breaks the invariant that every mapped page has a cgroup. This is workable, but unnecessarily risky. Therefore, go with 1. A very similar proposal to use the root cgroup was previously made by Qu, where he eventually proposed the idea of setting it per address_space. This makes good sense for the btrfs use case, as the behavior should apply to all use of the address_space, not select allocations. I.e., if someone adds another filemap_add_folio() call using btrfs's btree_inode, we would almost certainly want to account that to the root cgroup as well. This patch (of 3): Add the flag AS_KERNEL_FILE to the address_space to indicate that this mapping's memory is exempt from the usual memcg accounting. [boris@bur.io: fix CONFIG_MEMCG build for AS_KERNEL_FILE] Link: https://lkml.kernel.org/r/6de59ddeec81b5c294d337c001ba0061631d4ec6.1755816635.git.boris@bur.io Link: https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/ Link: https://lkml.kernel.org/r/f09c4e2c90351d4cb30a1969f7a863b9238bd291.1755812945.git.boris@bur.io Signed-off-by: Boris Burkov <boris@bur.io> Suggested-by: Qu Wenruo <wqu@suse.com> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:20 -07:00
Dev Jain	060b6c72ce	selftests/mm/uffd-stress: make test operate on less hugetlb memory Patch series "selftests/mm: uffd-stress fixes", v2. This patchset ensures that the number of hugepages is correctly set in the system so that the uffd-stress test does not fail due to the racy nature of the test. Patch 1 changes the hugepage constraint in the run_vmtests.sh script, whereas patch 2 changes the constraint in the test itself. This patch (of 2): We observed uffd-stress selftest failure on arm64 and intermittent failures on x86 too: running ./uffd-stress hugetlb-private 128 32 bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:617) [FAIL] not ok 18 uffd-stress hugetlb-private 128 32 # exit=1 For this particular case, the number of free hugepages from run_vmtests.sh will be 128, and the test will allocate 64 hugepages in the source location. The stress() function will start spawning threads which will operate on the destination location, triggering uffd-operations like UFFDIO_COPY from src to dst, which means that we will require 64 more hugepages for the dst location. Let us observe the locking_thread() function. It will lock the mutex kept at dst, triggering uffd-copy. Suppose that 127 (64 for src and 63 for dst) hugepages have been reserved. In case of BOUNCE_RANDOM, it may happen that two threads trying to lock the mutex at dst, try to do so at the same hugepage number. If one thread succeeds in reserving the last hugepage, then the other thread may fail in alloc_hugetlb_folio(), returning -ENOMEM. I can confirm that this is indeed the case by this hacky patch: :--- a/mm/hugetlb.c ; +++ b/mm/hugetlb.c ; @@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t dst_pte, ; ; folio = alloc_hugetlb_folio(dst_vma, dst_addr, false); ; if (IS_ERR(folio)) { ; + pte_t actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE); ; + if (actual_pte) { ; + ret = -EEXIST; ; + goto out; ; + } ; ret = -ENOMEM; ; goto out; ; } This code path gets triggered indicating that the PMD at which one thread is trying to map a hugepage, gets filled by a racing thread. Therefore, instead of using freepgs to compute the amount of memory, use freepgs - (min(32, nr_cpus) - 1), so that the test still has some extra hugepages to use. The adjustment is a function of min(32, nr_cpus) - the value of nr_parallel in the test - because in the worst case, nr_parallel number of threads will try to map a hugepage on the same PMD, one will win the allocation race, and the other nr_parallel - 1 threads will fail, so we need extra nr_parallel - 1 hugepages to satisfy this request. Note that, in case the adjusted value underflows, there is a check for the number of free hugepages in the test itself, which will fail: get_free_hugepages() < bytes / page_size A negative value will be passed on to bytes which is of type size_t, thus the RHS will become a large value and the check will fail, so we are safe. Link: https://lkml.kernel.org/r/20250909061531.57272-1-dev.jain@arm.com Link: https://lkml.kernel.org/r/20250909061531.57272-2-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:19 -07:00
Baolin Wang	6d11dec130	mm: shmem: drop the unnecessary folio_nr_pages() We've got the number of pages in the folio earlier, thus remove the redundant folio_nr_pages() call. Link: https://lkml.kernel.org/r/67c80182ebd949e3894908e01e224697c143aabb.1756200587.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:19 -07:00
Baolin Wang	ab1c34c834	mm: shmem: use 'folio' for shmem_partial_swap_usage() It is more straightforward to use the term `folio'. No functional changes. Link: https://lkml.kernel.org/r/a2d39608d99cba1130cacd9cffbafc6949193c08.1756200587.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:19 -07:00
Brendan Jackman	6c3826173e	mm/page_alloc: harmonize should_compact_retry() type Currently order is signed in one version of the function and unsigned in the other. Tidy that up. In page_alloc.c, order is unsigned in the vast majority of cases. But, there is a cluster of exceptions in compaction-related code (probably stemming from the fact that compact_control.order is signed). So, prefer local consistency and make this one signed too. Link: https://lkml.kernel.org/r/20250826-cleanup-should_compact_retry-v1-1-d2ca89727fcf@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:18 -07:00
Kairui Song	46afff4599	mm/page-writeback: drop usage of folio_index folio_index is only needed for mixed usage of page cache and swap cache. The remaining three caller in page-writeback are for page cache tag marking. Swap cache space doesn't use tag (explicitly sets mapping_set_no_writeback_tags), so use folio->index here directly. Link: https://lkml.kernel.org/r/20250825163721.17734-1-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Wei Yang	3615e106e0	mm/khugepaged: use list_xxx() helper to improve readability In general, khugepaged_scan_mm_slot() iterates khugepaged_scan.mm_head list to get a mm_struct for collapse memory. Use list_xxx() helper would be more obvious to the list iteration operation. No functional change. Link: https://lkml.kernel.org/r/20250822025732.9025-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Usama Arif	32960f7503	mm/huge_memory: remove enforce_sysfs from __thp_vma_allowable_orders Using forced_collapse directly is clearer and enforce_sysfs is not really needed. Link: https://lkml.kernel.org/r/20250821150038.2025521-1-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Brendan Jackman	ce32123b9b	mm: remove is_migrate_highatomic() There are 3 potential reasons for is_migrate_() helpers: 1. They represent higher-level attributes of migratetypes, like is_migrate_movable() 2. They are ifdef'd, like is_migrate_isolate(). 3. For consistency with an is_migrate__page() helper, also like is_migrate_isolate(). It looks like is_migrate_highatomic() was for case 3, but that was removed in commit `e0932b6c1f` ("mm: page_alloc: consolidate free page accounting"). So remove the indirection and go back to a simple comparison. Link: https://lkml.kernel.org/r/20250821-is-migrate-highatomic-v1-1-ddb6e5d7c566@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Nhat Pham	0b1bf60c32	mm/zswap: reduce the size of the compression buffer to a single page Reduce the compression buffer size from 2 * PAGE_SIZE to only one page, as the compression output (in the success case) should not exceed the length of the input. In the past, Chengming tried to reduce the compression buffer size, but ran into issues with the LZO algorithm (see [2]). Herbert Xu reported that the issue has been fixed (see [3]). Now we should have the guarantee that compressors' output should not exceed one page in the success case, and the algorithm will just report failure otherwise. With this patch, we save one page per cpu (per compression algorithm). Link: https://lkml.kernel.org/r/20250820181547.3794167-1-nphamcs@gmail.com Link: https://lore.kernel.org/linux-mm/20231213-zswap-dstmem-v4-1-f228b059dd89@bytedance.com/ [1] Link: https://lore.kernel.org/lkml/0000000000000b05cd060d6b5511@google.com/ [2] Link: https://lore.kernel.org/linux-mm/aKUmyl5gUFCdXGn-@gondor.apana.org.au/ [3] Co-developed-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Nhat Pham <nphamcs@gmail.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00
gaoxiang17	0cd01c4a5c	mm/cma: add 'available count' and 'total count' to trace_cma_alloc_start This makes cma info more intuitive during debugging. Show up in the trace as: 279.814717: cma_alloc_start: name=reserved request_count=4 available_count=8096 total_count=8192 align=0 309.790580: cma_alloc_start: name=reserved request_count=4 available_count=8092 total_count=8192 align=0 317.046609: cma_alloc_start: name=reserved request_count=4 available_count=8088 total_count=8192 align=0 Link: https://lkml.kernel.org/r/8a79284879c529f467478552825154b018076e95.1755729178.git.gaoxiang17@xiaomi.com Signed-off-by: gaoxiang17 <gaoxiang17@xiaomi.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00
Wei Yang	5d5d75ff64	mm/rmap: use folio_large_nr_pages() when we are sure it is a large folio Non-large folio is handled at the beginning, so it is a large folio for sure. Use folio_large_nr_pages() here like elsewhere. Link: https://lkml.kernel.org/r/20250817032647.29147-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00
Wei Yang	e5e758922d	mm/rmap: not necessary to mask off FOLIO_PAGES_MAPPED At this point, we are in an if branch conditional on (nr < ENTIRELY_MAPPED), and FOLIO_PAGES_MAPPED is equal to (ENTIRELY_MAPPED - 1). This means the upper bits are already cleared. It is not necessary to mask it off. Link: https://lkml.kernel.org/r/20250817032647.29147-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:14 -07:00
Steven Rostedt	658fa653b4	mm, x86/mm: move creating the tlb_flush event back to x86 code Commit `e73ad5ff2f` ("mm, x86/mm: Make the batched unmap TLB flush API more generic") moved the trace_tlb_flush out of mm/rmap.c and back into x86 specific architecture, but it kept the include to the events/tlb.h file, even though it didn't use that event. Then another commit came in and added more events to the mm/rmap.c file and moved the #define CREATE_TRACE_POINTS define from the x86 specific architecture to the generic mm/rmap.h file to create both the tlb_flush tracepoint and the new tracepoints. But since the tlb_flush tracepoint is only x86 specific, it now creates that tracepoint for all other architectures and this wastes approximately 5K of text and meta data that will not be used. Remove the events/tlb.h from mm/rmap.c and add the define CREATE_TRACE_POINTS back in the x86 code. Link: https://lkml.kernel.org/r/20250612100313.3b9a8b80@batman.local.home Fixes: `e73ad5ff2f` ("mm, x86/mm: Make the batched unmap TLB flush API more generic") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:14 -07:00
Christoph Hellwig	7bebb41b96	mm: remove write_cache_pages No users left. Link: https://lkml.kernel.org/r/20250818061017.1526853-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:14 -07:00
Baokun Li	63ec0c26b6	tmpfs: preserve SB_I_VERSION on remount Now tmpfs enables i_version by default and tmpfs does not modify it. But SB_I_VERSION can also be modified via sb_flags, and reconfigure_super() always overwrites the existing flags with the latest ones. This means that if tmpfs is remounted without specifying iversion, the default i_version will be unexpectedly disabled. To ensure iversion remains enabled, SB_I_VERSION is now always set for fc->sb_flags in shmem_init_fs_context(), instead of for sb->s_flags in shmem_fill_super(). Link: https://lkml.kernel.org/r/20250819061803.1496443-1-libaokun@huaweicloud.com Fixes: `36f05cab0a` ("tmpfs: add support for an i_version counter") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:12 -07:00
Zi Yan	9eff16bd3a	mm/huge_memory: add new_order and offset to split_huge_pages*() pr_debug Patch series "Better split_huge_page_test result check", v5. This patchset uses kpageflags to get after-split folio orders for a better split_huge_page_test result check[1]. The added gather_after_split_folio_orders() scans through a VPN range and collects the numbers of folios at different orders. check_after_split_folio_orders() compares the result of gather_after_split_folio_orders() to a given list of numbers of different orders. This patchset also adds new order and in folio offset to the split huge page debugfs's pr_debug()s; This patch (of 5): They are useful information for debugging split huge page tests. Link: https://lkml.kernel.org/r/20250818184622.1521620-1-ziy@nvidia.com Link: https://lkml.kernel.org/r/20250818184622.1521620-2-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: wang lian <lianux.mm@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:11 -07:00
Li RongQing	b322e88b3d	mm/hugetlb: early exit from hugetlb_pages_alloc_boot() when max_huge_pages=0 Optimize hugetlb_pages_alloc_boot() to return immediately when max_huge_pages is 0, avoiding unnecessary CPU cycles and the below log message when hugepages aren't configured in the kernel command line. [ 3.702280] HugeTLB: allocation took 0ms with hugepage_allocation_threads=32 Link: https://lkml.kernel.org/r/20250814102333.4428-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Tested-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:11 -07:00
Chi Zhiling	35224da7e3	mm/filemap: skip non-uptodate folio if there are available folios When reading data exceeding the maximum IO size, the operation is split into multiple IO requests, but the data isn't immediately copied to userspace after each IO completion. For example, when reading 2560k data from a device with 1280k maximum IO size, the following sequence occurs: 1. read 1280k 2. copy 41 pages and issue read ahead for next 1280k 3. copy 31 pages to user buffer 4. wait the next 1280k 5. copy 8 pages to user buffer 6. copy 20 folios(64k) to user buffer The 8 pages in step 5 are copied after the second 1280k completes(step 4) due to waiting for a non-uptodate folio in filemap_update_page. We can copy the 8 pages before the second 1280k completes(step 4) to reduce the latency of this read operation. After applying the patch, these 8 pages will be copied before the next IO completes: 1. read 1280k 2. copy 41 pages and issue read ahead for next 1280k 3. copy 31 pages to user buffer 4. copy 8 pages to user buffer 5. wait the next 1280k 6. copy 20 folios(64k) to user buffer This patch drops a setting of IOCB_NOWAIT for AIO, which is fine because filemap_read will set it again for AIO. The final solution provided by Matthew Wilcox: Link: https://lore.kernel.org/linux-fsdevel/aIDy076Sxt544qja@casper.infradead.org/ Link: https://lkml.kernel.org/r/20250728083952.75518-3-chizhiling@163.com Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:11 -07:00

... 3 4 5 6 7 ...

25197 Commits