linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 11:40:19 -04:00

Author	SHA1	Message	Date
Anthony Yznaga	d0813985a2	sparc64: remove hugetlb_free_pgd_range() Patch series "drop hugetlb_free_pgd_range()". For all architectures that support hugetlb except for sparc, hugetlb_free_pgd_range() just calls free_pgd_range(). It turns out the sparc implementation is essentially identical to free_pgd_range() and can be removed. Remove it and update free_pgtables() to treat hugetlb VMAs the same as others. This patch (of 3): The sparc implementation of hugetlb_free_pgd_range() is identical to free_pgd_range() with the exception of checking for and skipping possible leaf entries at the PUD and PMD levels. These checks are unnecessary because any huge pages have been freed and their PTEs cleared by the time page tables needed to map them are freed. While some huge page sizes do populate the page table with multiple PTEs, they are correctly cleared by huge_ptep_get_and_clear(). To verify this, libhugetlbfs tests were run for 64K, 8M, and 256M page sizes with an instrumented kernel on a qemu guest modified to support the 256M page size. The same tests were used to verify no regressions after applying this patch and were also run on x86 for both 2M and 1G page sizes. Link: https://lkml.kernel.org/r/20250716012611.10369-1-anthony.yznaga@oracle.com Link: https://lkml.kernel.org/r/20250716012611.10369-2-anthony.yznaga@oracle.com Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:31 -07:00
Hugh Dickins	6344a6d9ce	mm/shmem: writeout free swap if swap_writeout() reactivates If swap_writeout() returns AOP_WRITEPAGE_ACTIVATE (for example, because zswap cannot compress and memcg disables writeback), there is no virtue in keeping that folio in swap cache and holding the swap allocation: shmem_writeout() switch it back to shmem page cache before returning. Folio lock is held, and folio->memcg_data remains set throughout, so there is no need to get into any memcg or memsw charge complications: swap_free_nr() and delete_from_swap_cache() do as much as is needed (but beware the race with shmem_free_swap() when inode truncated or evicted). Doing the same for an anonymous folio is harder, since it will usually have been unmapped, with references to the swap left in the page tables. Adding a function to remap the folio would be fun, but not worthwhile unless it has other uses, or an urgent bug with anon is demonstrated. [hughd@google.com: use shmem_recalc_inode() rather than open coding, per Baolin] Link: https://lkml.kernel.org/r/101a7d89-290c-545d-8a6d-b1174ed8b1e5@google.com Link: https://lkml.kernel.org/r/5c911f7a-af7a-5029-1dd4-2e00b66d565c@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: David Rientjes <rientjes@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Kairui Song <ryncsn@gmail.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:31 -07:00
Hugh Dickins	ea693aaa5c	mm/shmem: hold shmem_swaplist spinlock (not mutex) much less A flamegraph (from an MGLRU load) showed shmem_writeout()'s use of the global shmem_swaplist_mutex worryingly hot: improvement is long overdue. 3.1 commit `6922c0c7ab` ("tmpfs: convert shmem_writepage and enable swap") apologized for extending shmem_swaplist_mutex across add_to_swap_cache(), and hoped to find another way: yes, there may be lots of work to allocate radix tree nodes in there. Then 6.15 commit `b487a2da35` ("mm, swap: simplify folio swap allocation") will have made it worse, by moving shmem_writeout()'s swap allocation under that mutex too (but the worrying flamegraph was observed even before that change). There's a useful comment about pagelock no longer protecting from eviction once moved to swap cache: but it's good till shmem_delete_from_page_cache() replaces page pointer by swap entry, so move the swaplist add between them. We would much prefer to take the global lock once per inode than once per page: given the possible races with shmem_unuse() pruning when !swapped (and other tasks racing to swap other pages out or in), try the swaplist add whenever swapped was incremented from 0 (but inode may already be on the list - only unuse and evict bother to remove it). This technique is more subtle than it looks (we're avoiding the very lock which would make it easy), but works: whereas an unlocked list_empty() check runs a risk of the inode being unqueued and left off the swaplist forever, swapoff only completing when the page is faulted in or removed. The need for a sleepable mutex went away in 5.1 commit `b56a2d8af9` ("mm: rid swapoff of quadratic complexity"): a spinlock works better now. This commit is certain to take shmem_swaplist_mutex out of contention, and has been seen to make a practical improvement (but there is likely to have been an underlying issue which made its contention so visible). Link: https://lkml.kernel.org/r/87beaec6-a3b0-ce7a-c892-1e1e5bd57aa3@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Kairui Song <kasong@tencent.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:31 -07:00
Lorenzo Stoakes	d53f248258	tools/testing/selftests: extend mremap_test to test multi-VMA mremap Now that we have added the ability to move multiple VMAs at once, assert that this functions correctly, both overwriting VMAs and moving backwards and forwards with merge and VMA invalidation. Additionally assert that page tables are correctly propagated by setting random data and reading it back. Link: https://lkml.kernel.org/r/139074a24a011ca4ed52498a7fa2080024b43917.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:31 -07:00
Lorenzo Stoakes	d23cb648e3	mm/mremap: permit mremap() move of multiple VMAs Historically we've made it a uAPI requirement that mremap() may only operate on a single VMA at a time. For instances where VMAs need to be resized, this makes sense, as it becomes very difficult to determine what a user actually wants should they indicate a desire to expand or shrink the size of multiple VMAs (truncate? Adjust sizes individually? Some other strategy?). However, in instances where a user is moving VMAs, it is restrictive to disallow this. This is especially the case when anonymous mapping remap may or may not be mergeable depending on whether VMAs have or have not been faulted due to anon_vma assignment and folio index alignment with vma->vm_pgoff. Often this can result in surprising impact where a moved region is faulted, then moved back and a user fails to observe a merge from otherwise compatible, adjacent VMAs. This change allows such cases to work without the user having to be cognizant of whether a prior mremap() move or other VMA operations has resulted in VMA fragmentation. We only permit this for mremap() operations that do NOT change the size of the VMA and DO specify MREMAP_MAYMOVE \| MREMAP_FIXED. Should no VMA exist in the range, -EFAULT is returned as usual. If a VMA move spans a single VMA - then there is no functional change. Otherwise, we place additional requirements upon VMAs: * They must not have a userfaultfd context associated with them - this requires dropping the lock to notify users, and we want to perform the operation with the mmap write lock held throughout. * If file-backed, they cannot have a custom get_unmapped_area handler - this might result in MREMAP_FIXED not being honoured, which could result in unexpected positioning of VMAs in the moved region. There may be gaps in the range of VMAs that are moved: X Y X Y <---> <-> <---> <-> \|-------\| \|-----\| \|-----\| \|-------\| \|-----\| \|-----\| \| A \| \| B \| \| C \| ---> \| A' \| \| B' \| \| C' \| \|-------\| \|-----\| \|-----\| \|-------\| \|-----\| \|-----\| addr new_addr The move will preserve the gaps between each VMA. Note that any failures encountered will result in a partial move. Since an mremap() can fail at any time, this might result in only some of the VMAs being moved. Note that failures are very rare and typically require an out of a memory condition or a mapping limit condition to be hit, assuming the VMAs being moved are valid. We don't try to assess ahead of time whether VMAs are valid according to the multi VMA rules, as it would be rather unusual for a user to mix uffd-enabled VMAs and/or VMAs which map unusual driver mappings that specify custom get_unmapped_area() handlers in an aggregate operation. So we optimise for the far, far more likely case of the operation being entirely permissible. In the case of the move of a single VMA, the above conditions are permitted. This makes the behaviour identical for a single VMA as before. Link: https://lkml.kernel.org/r/8cab2f2c202c4208bdfdb562635748bea6eb37bf.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:30 -07:00
Lorenzo Stoakes	2cf442d742	mm/mremap: clean up mlock populate behaviour When an mlock()'d VMA is expanded, we need to populate the expanded region to maintain the contract that all mlock()'d memory is present (albeit - with some period after mmap unlock where the expanded part of the mapping remains unfaulted). The current implementation is very unclear, so make it absolutely explicit under what circumstances we do this. Link: https://lkml.kernel.org/r/2358b0006baa9cab83db4259817794f16fe1992e.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:30 -07:00
Lorenzo Stoakes	9b2301bf8d	mm/mremap: move remap_is_valid() into check_prep_vma() Group parameter check logic together, moving check_mremap_params() next to it. This puts all such checks into a single place, and invokes them early so we can simply bail out as soon as we are aware that a condition is not met. No functional change intended. Link: https://lkml.kernel.org/r/4d0669c23531629d8ead42aa701c6237bd6bf012.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:30 -07:00
Lorenzo Stoakes	a85dc37186	mm/mremap: check remap conditions earlier When we expand or move a VMA, this requires a number of additional checks to be performed. Make it really obvious under what circumstances these checks must be performed and aggregate all the checks in one place by invoking this in check_prep_vma(). We have to adjust the checks to account for shrink + move operations by checking new_len <= old_len rather than new_len == old_len. No functional change intended. [lorenzo.stoakes@oracle.com: allow undocumented mremap() shrink behaviour] Link: https://lkml.kernel.org/r/8fc92a38-c636-465e-9a2f-2c6ac9cb49b8@lucifer.local Link: https://lkml.kernel.org/r/8b4161ce074901e00602a446d81f182db92b0430.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:30 -07:00
Lorenzo Stoakes	f9f11398d4	mm/mremap: use an explicit uffd failure path for mremap Right now it appears that the code is relying upon the returned destination address having bits outside PAGE_MASK to indicate whether an error value is specified, and decrementing the increased refcount on the uffd ctx if so. This is not a safe means of determining an error value, so instead, be specific. It makes far more sense to do so in a dedicated error path, so add mremap_userfaultfd_fail() for this purpose and use this when an error arises. A vm_userfaultfd_ctx is not established until we are at the point where mremap_userfaultfd_prep() is invoked in copy_vma_and_data(), so this is a no-op until this happens. That is - uffd remap notification only occurs if the VMA is actually moved - at which point a UFFD_EVENT_REMAP event is raised. No errors can occur after this point currently, though it's certainly not guaranteed this will always remain the case, and we mustn't rely on this. However, the reason for needing to handle this case is that, when an error arises on a VMA move at the point of adjusting page tables, we revert this operation, and propagate the error. At this point, it is not correct to raise a uffd remap event, and we must handle it. This refactoring makes it abundantly clear what we are doing. We assume vrm->new_addr is always valid, which a prior change made the case even for mremap() invocations which don't move the VMA, however given no uffd context would be set up in this case it's immaterial to this change anyway. No functional change intended. Link: https://lkml.kernel.org/r/a70e8a1f7bce9f43d1431065b414e0f212297297.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:29 -07:00
Lorenzo Stoakes	e49e76c20b	mm/mremap: cleanup post-processing stage of mremap Separate out the uffd bits so it clear's what's happening. Don't bother setting vrm->mmap_locked after unlocking, because after this we are done anyway. The only time we drop the mmap lock is on VMA shrink, at which point vrm->new_len will be < vrm->old_len and the operation will not be performed anyway, so move this code out of the if (vrm->mmap_locked) block. All addresses returned by mremap() are page-aligned, so the offset_in_page() check on ret seems only to be incorrectly trying to detect whether an error occurred - explicitly check for this. No functional change intended. Link: https://lkml.kernel.org/r/ebb8f29650b8e343fe98fefc67b3a61a24d1e0f1.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:29 -07:00
Lorenzo Stoakes	f256a7a4ca	mm/mremap: put VMA check and prep logic into helper function Rather than lumping everything together in do_mremap(), add a new helper function, check_prep_vma(), to do the work relating to each VMA. This further lays groundwork for subsequent patches which will allow for batched VMA mremap(). Additionally, if we set vrm->new_addr == vrm->addr when prepping the VMA, this avoids us needing to do so in the expand VMA mlocked case. No functional change intended. Link: https://lkml.kernel.org/r/15efa3c57935f7f8894094b94c1803c2f322c511.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:29 -07:00
Lorenzo Stoakes	3215eaceca	mm/mremap: refactor initial parameter sanity checks We are currently checking some things later, and some things immediately. Aggregate the checks and avoid ones that need not be made. Simplify things by aligning lengths immediately. Defer setting the delta parameter until later, which removes some duplicate code in the hugetlb case. We can safely perform the checks moved from mremap_to() to check_mremap_params() because: * If we set a new address via vrm_set_new_addr(), then this is guaranteed to not overlap nor to position the new VMA past TASK_SIZE, so there's no need to check these later. * We can simply page align lengths immediately. We do not need to check for overlap nor TASK_SIZE sanity after hugetlb alignment as this asserts addresses are huge-aligned, then huge-aligns lengths, rounding down. This means any existing overlap would have already been caught. Moving things around like this lays the groundwork for subsequent changes to permit operations on batches of VMAs. No functional change intended. Link: https://lkml.kernel.org/r/c862d625c98b1abd861c406f2bfad8baf3287f83.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:29 -07:00
Lorenzo Stoakes	000c0691ec	mm/mremap: perform some simple cleanups Patch series "mm/mremap: permit mremap() move of multiple VMAs", v4. Historically we've made it a uAPI requirement that mremap() may only operate on a single VMA at a time. For instances where VMAs need to be resized, this makes sense, as it becomes very difficult to determine what a user actually wants should they indicate a desire to expand or shrink the size of multiple VMAs (truncate? Adjust sizes individually? Some other strategy?). However, in instances where a user is moving VMAs, it is restrictive to disallow this. This is especially the case when anonymous mapping remap may or may not be mergeable depending on whether VMAs have or have not been faulted due to anon_vma assignment and folio index alignment with vma->vm_pgoff. Often this can result in surprising impact where a moved region is faulted, then moved back and a user fails to observe a merge from otherwise compatible, adjacent VMAs. This change allows such cases to work without the user having to be cognizant of whether a prior mremap() move or other VMA operations has resulted in VMA fragmentation. In order to do this, this series performs a large amount of refactoring, most pertinently - grouping sanity checks together, separately those that check input parameters and those relating to VMAs. we also simplify the post-mmap lock drop processing for uffd and mlock()'d VMAs. With this done, we can then fairly straightforwardly implement this functionality. This works exclusively for mremap() invocations which specify MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the notification of the userland fault handler would require us to drop the mmap lock. It is also not compatible with file-backed mappings with customised get_unmapped_area() handlers as these may not honour MREMAP_FIXED. The input and output addresses ranges must not overlap. We carefully account for moves which would result in VMA iterator invalidation. While there can be gaps between VMAs in the input range, there can be no gap before the first VMA in the range. This patch (of 10): We const-ify the vrm flags parameter to indicate this will never change. We rename resize_is_valid() to remap_is_valid(), as this function does not only apply to cases where we resize, so it's simply confusing to refer to that here. We remove the BUG() from mremap_at(), as we should not BUG() unless we are certain it'll result in system instability. We rename vrm_charge() to vrm_calc_charge() to make it clear this simply calculates the charged number of pages rather than actually adjusting any state. We update the comment for vrm_implies_new_addr() to explain that MREMAP_DONTUNMAP does not require a set address, but will always be moved. Additionally consistently use 'res' rather than 'ret' for result values. No functional change intended. Link: https://lkml.kernel.org/r/cover.1752770784.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d35ad8ce6b2c33b2f2f4ef7ec415f04a35cba34f.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:28 -07:00
Lorenzo Stoakes	cfea89210a	mm/vma: refactor vma_modify_flags_name() to vma_modify_name() The single instance in which we use this function doesn't actually need to change VMA flags, so remove this parameter and update the caller accordingly. [lorenzo.stoakes@oracle.com: correct comment] Link: https://lkml.kernel.org/r/77f45b2e-a748-4635-9381-a5051091087f@lucifer.local Link: https://lkml.kernel.org/r/20250714135839.178032-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:28 -07:00
Hugh Dickins	3865301dc5	mm: optimize lru_note_cost() by adding lru_note_cost_unlock_irq() Dropping a lock, just to demand it again for an afterthought, cannot be good if contended: convert lru_note_cost() to lru_note_cost_unlock_irq(). [hughd@google.com: delete unneeded comment] Link: https://lkml.kernel.org/r/dbf9352a-1ed9-a021-c0c7-9309ac73e174@google.com Link: https://lkml.kernel.org/r/21100102-51b6-79d5-03db-1bb7f97fa94c@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Tested-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:28 -07:00
Hao Jia	526660b950	mm/mglru: stop try_to_inc_min_seq() if min_seq[type] has not increased In try_to_inc_min_seq(), if min_seq[type] has not increased. In other words, min_seq[type] == lrugen->min_seq[type]. Then we should return directly to avoid unnecessary overhead later. Corollary: If min_seq[type] of both anonymous and file is not increased, try_to_inc_min_seq() will fail. Proof: It is known that min_seq[type] has not increased, that is, min_seq[type] is equal to lrugen->min_seq[type], then the following: case 1: min_seq[type] has not been reassigned and changed before judgment min_seq[type] <= lrugen->min_seq[type]. Then the subsequent min_seq[type] <= lrugen->min_seq[type] judgment will always be true. case 2: min_seq[type] is reassigned to seq, before judgment min_seq[type] <= lrugen->min_seq[type]. Then at least the condition of min_seq[type] > seq must be met before min_seq[type] will be reassigned to seq. That is to say, before the reassignment, lrugen->min_seq[type] > seq is met, and then min_seq[type] = seq. Then the following min_seq[type](seq) <= lrugen->min_seq[type] judgment is always true. Therefore, in try_to_inc_min_seq(), If min_seq[type] of both anonymous and file is not increased, we can return false directly to avoid unnecessary overhead. Link: https://lkml.kernel.org/r/20250703023946.65315-1-jiahao.kernel@gmail.com Signed-off-by: Hao Jia <jiahao1@lixiang.com> Suggested-by: Yuanchu Xie <yuanchu@google.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kinsey Ho <kinseyho@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:28 -07:00
Sidhartha Kumar	9989db9f23	mm/page_owner: convert set_page_owner_migrate_reason() to folios Both callers of set_page_owner_migrate_reason() use folios. Convert the function to take a folio directly and move the &folio->page conversion inside __set_page_owner_migrate_reason(). Link: https://lkml.kernel.org/r/20250711145910.90135-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:57 -07:00
Thorsten Blum	5bd88fef6a	mm/memfd: replace deprecated strcpy() with memcpy() in alloc_name() strcpy() is deprecated; use memcpy() instead. Not copying the NUL terminator is safe because strncpy_from_user() would overwrite it anyway by appending uname to the destination buffer at index MFD_NAME_PREFIX_LEN. No functional changes intended. Link: https://github.com/KSPP/linux/issues/88 Link: https://lkml.kernel.org/r/20250712174516.64243-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:57 -07:00
SeongJae Park	5add26c0a1	mm/damon/core: remove damon_callback All damon_callback usages are replicated by damon_call() and damos_walk(). Time to say goodbye. Remove damon_callback. Link: https://lkml.kernel.org/r/20250712195016.151108-15-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:57 -07:00
SeongJae Park	0c96decca5	mm/damon/sysfs: remove damon_sysfs_before_terminate() DAMON core layer does target cleanup on its own. Remove duplicated and unnecessarily selective cleanup attempts in DAMON sysfs interface. Link: https://lkml.kernel.org/r/20250712195016.151108-14-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:57 -07:00
SeongJae Park	3a69f16357	mm/damon/core: destroy targets when kdamond_fn() finish When kdamond_fn() completes, the targets are kept. Those are kept to let callers do additional cleanups if they need. There are no such additional cleanups though. DAMON sysfs interface deallocates those in before_terminate() callback, to reduce unnecessary memory usage, for [f]vaddr use case. Just destroy the targets for every case in the core layer. This saves more memory and simplifies the logic. Link: https://lkml.kernel.org/r/20250712195016.151108-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:56 -07:00
SeongJae Park	f59ae147ab	mm/damon/sysfs: remove damon_sysfs_destroy_targets() The function was introduced for putting pids and deallocating unnecessary targets. Hence it is called before damon_destroy_ctx(). Now vaddr puts pid for each target destruction (cleanup_target()). damon_destroy_ctx() deallocates the targets anyway. So damon_sysfs_destroy_targets() has no reason to exist. Remove it. Link: https://lkml.kernel.org/r/20250712195016.151108-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:56 -07:00
SeongJae Park	ff01aba6e4	mm/damon/vaddr: put pid in cleanup_target() Implement cleanup_target() callback for [f]vaddr, which calls put_pid() for each target that will be destroyed. Also remove redundant put_pid() calls in core, sysfs and sample modules, which were required to be done redundantly due to the lack of such self cleanup in vaddr. Link: https://lkml.kernel.org/r/20250712195016.151108-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:56 -07:00
SeongJae Park	7114bc5e01	mm/damon/core: add cleanup_target() ops callback Some DAMON operation sets may need additional cleanup per target. For example, [f]vaddr need to put pids of each target. Each user and core logic is doing that redundantly. Add another DAMON ops callback that will be used for doing such cleanups in operations set layer. [sj@kernel.org: add kernel-doc comment for damon_operations->cleanup_target] Link: https://lkml.kernel.org/r/20250715185239.89152-2-sj@kernel.org [sj@kernel.org: remove damon_ctx->callback kernel-doc comment] Link: https://lkml.kernel.org/r/20250715185239.89152-3-sj@kernel.org Link: https://lkml.kernel.org/r/20250712195016.151108-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:56 -07:00
SeongJae Park	d4614161fb	mm/damon/core: do not call ops.cleanup() when destroying targets damon_operations.cleanup() is documented to be called for kdamond termination, but also being called for targets destruction, which is done for any damon_ctx destruction. Nobody is using the callback for now, though. Remove the cleanup() call under the destruction. Link: https://lkml.kernel.org/r/20250712195016.151108-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:55 -07:00
SeongJae Park	cc9c1b8c20	samples/damon/wsse: use damon_call() repeat mode instead of damon_callback wsse uses damon_callback for periodically reading DAMON internal data. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:55 -07:00
SeongJae Park	a6c33f1054	samples/damon/prcl: use damon_call() repeat mode instead of damon_callback prcl uses damon_callback for periodically reading DAMON internal data. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:55 -07:00
SeongJae Park	9cc8f00e52	mm/damon/lru_sort: use damon_call() repeat mode instead of damon_callback DAMON_LRU_SORT uses damon_callback for periodically reading and writing DAMON internal data and parameters. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:55 -07:00
SeongJae Park	5da7c70318	mm/damon/reclaim: use damon_call() repeat mode instead of damon_callback DAMON_RECLAIM uses damon_callback for periodically reading and writing DAMON internal data and parameters. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:54 -07:00
SeongJae Park	405f61996d	mm/damon/stat: use damon_call() repeat mode instead of damon_callback DAMON_STAT uses damon_callback for periodically reading DAMON internal data. Use its alternative, damon_call() repeat mode. Link: https://lkml.kernel.org/r/20250712195016.151108-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:54 -07:00
SeongJae Park	43df7676e5	mm/damon/core: introduce repeat mode damon_call() damon_call() can be useful for reading or writing DAMON internal data for one time. A common pattern of DAMON core usage from DAMON modules is doing such reads and writes repeatedly, for example, to periodically update the DAMOS stats. To do that with damon_call(), callers should call damon_call() repeatedly, with their own delay loop. Each caller doing that is repetitive. Introduce a repeat mode damon_call(). Callers can use the mode by setting a new field in damon_call_control. If the mode is turned on, damon_call() returns success immediately, and DAMON repeats invoking the callback function inside the kdamond main loop. Link: https://lkml.kernel.org/r/20250712195016.151108-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:54 -07:00
SeongJae Park	004ded6bee	mm/damon: accept parallel damon_call() requests Patch series "mm/damon: remove damon_callback". damon_callback was the only way for communicating with DAMON for contexts running on its worker thread. The interface is flexible and simple. But as DAMON evolves with more features, damon_callback has become somewhat too old. With runtime parameters update, for example, its lack of synchronization support was found to be inconvenient. Arguably it is also not easy to use correctly since the callers should understand when each callback is called, and implication of the return values from the callbacks. To replace it, damon_call() and damos_walk() are introduced. And those replaced a few damon_callback use cases. Some use cases of damon_callback such as parallel or repetitive DAMON internal data reading and additional cleanups cannot simply be replaced by damon_call() and damos_walk(), though. To allow those replaceable, extend damon_call() for parallel and/or repeated callbacks and modify the core/ops layers for additional resources cleanup. With the updates, replace the remaining damon_callback usages and finally say goodbye to damon_callback. This patch (of 14): Calling damon_call() while it is serving for another parallel thread immediately fails with -EBUSY. The caller should call it again, later. Each caller implementing such retry logic would be redundant. Accept parallel damon_call() requests and do the wait instead of the caller. Link: https://lkml.kernel.org/r/20250712195016.151108-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250712195016.151108-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:54 -07:00
Xuanye Liu	e001ef9652	mm: simplify min_brk handling in brk() Set min_brk to mm->start_brk by default, and override it with mm->end_data only when CONFIG_COMPAT_BRK is enabled and brk_randomized is false. This makes the logic clearer with no functional change. Link: https://lkml.kernel.org/r/20250710025859.926355-1-liuqiye2025@163.com Signed-off-by: Xuanye Liu <liuqiye2025@163.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:53 -07:00
Chi Zhiling	c809579374	readahead: use folio_nr_pages() instead of shift operation folio_nr_pages() is faster helper function to get the number of pages when NR_PAGES_IN_LARGE_FOLIO is enabled. Link: https://lkml.kernel.org/r/20250710060451.3535957-1-chizhiling@163.com Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:53 -07:00
Andy Shevchenko	188cb385bb	mm/hmm: move pmd_to_hmm_pfn_flags() to the respective #ifdeffery When pmd_to_hmm_pfn_flags() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_TRANSPARENT_HUGEPAGE=n: mm/hmm.c:186:29: warning: unused function 'pmd_to_hmm_pfn_flags' [-Wunused-function] Fix this by moving the function to the respective existing ifdeffery for its the only user. See also: `6863f5643d` ("kbuild: allow Clang to find unused static inline functions for W=1 build") Link: https://lkml.kernel.org/r/20250710082403.664093-1-andriy.shevchenko@linux.intel.com Fixes: `992de9a8b7` ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:53 -07:00
Davidlohr Bueso	b980077899	mm: introduce per-node proactive reclaim interface This adds support for allowing proactive reclaim in general on a NUMA system. A per-node interface extends support for beyond a memcg-specific interface, respecting the current semantics of memory.reclaim: respecting aging LRU and not supporting artificially triggering eviction on nodes belonging to non-bottom tiers. This patch allows userspace to do: echo "512M swappiness=10" > /sys/devices/system/node/nodeX/reclaim One of the premises for this is to semantically align as best as possible with memory.reclaim. During a brief time memcg did support nodemask until `55ab834a86` (Revert "mm: add nodes= arg to memory.reclaim"), for which semantics around reclaim (eviction) vs demotion were not clear, rendering charging expectations to be broken. With this approach: 1. Users who do not use memcg can benefit from proactive reclaim. The memcg interface is not NUMA aware and there are usecases that are focusing on NUMA balancing rather than workload memory footprint. 2. Proactive reclaim on top tiers will trigger demotion, for which memory is still byte-addressable. Reclaiming on the bottom nodes will trigger evicting to swap (the traditional sense of reclaim). This follows the semantics of what is today part of the aging process on tiered memory, mirroring what every other form of reclaim does (reactive and memcg proactive reclaim). Furthermore per-node proactive reclaim is not as susceptible to the memcg charging problem mentioned above. 3. Unlike the nodes= arg, this interface avoids confusing semantics, such as what exactly the user wants when mixing top-tier and low-tier nodes in the nodemask. Further per-node interface is less exposed to "free up memory in my container" usecases, where eviction is intended. 4. Users that really want to free up memory can use proactive reclaim on nodes knowingly to be on the bottom tiers to force eviction in a natural way - higher access latencies are still better than swap. If compelled, while no guarantees and perhaps not worth the effort, users could also also potentially follow a ladder-like approach to eventually free up the memory. Alternatively, perhaps an 'evict' option could be added to the parameters for both memory.reclaim and per-node interfaces to force this action unconditionally. [akpm@linux-foundation.org: user_proactive_reclaim(): return -EBUSY on PGDAT_RECLAIM_LOCKED contention, per Roman] [dave@stgolabs.net: memcg && node is also a bogus case, per Shakeel] Link: https://lkml.kernel.org/r/20250717235604.2atyx2aobwowpge3@offworld Link: https://lkml.kernel.org/r/20250623185851.830632-5-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:53 -07:00
Davidlohr Bueso	57972c78e6	mm/vmscan: make __node_reclaim() more generic As this will be called from non page allocator paths for proactive reclaim, allow users to pass the sc and nr of pages, and adjust the return value as well. No change in semantics. Link: https://lkml.kernel.org/r/20250623185851.830632-4-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:52 -07:00
Davidlohr Bueso	2b7226af73	mm/memcg: make memory.reclaim interface generic This adds a general call for both parsing as well as the common reclaim semantics. memcg is still the only user and no change in semantics. [akpm@linux-foundation.org: fix CONFIG_NUMA=n build] Link: https://lkml.kernel.org/r/20250623185851.830632-3-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:52 -07:00
Davidlohr Bueso	7a92f4f591	mm/vmscan: respect psi_memstall region in node reclaim Patch series "mm: per-node proactive reclaim", v2. This adds support for allowing proactive reclaim in general on a NUMA system. A per-node interface extends support for beyond a memcg-specific interface, respecting the current semantics of memory.reclaim: respecting aging LRU and not supporting artificially triggering eviction on nodes belonging to non-bottom tiers. This patch allows userspace to do: echo 512M swappiness=10 > /sys/devices/system/node/nodeX/reclaim One of the premises for this is to semantically align as best as possible with memory.reclaim. During a brief time memcg did support nodemask until `55ab834a86` (Revert "mm: add nodes= arg to memory.reclaim"), for which semantics around reclaim (eviction) vs demotion were not clear, rendering charging expectations to be broken. With this approach: 1. Users who do not use memcg can benefit from proactive reclaim. 2. Proactive reclaim on top tiers will trigger demotion, for which memory is still byte-addressable. Reclaiming on the bottom nodes will trigger evicting to swap (the traditional sense of reclaim). This follows the semantics of what is today part of the aging process on tiered memory, mirroring what every other form of reclaim does (reactive and memcg proactive reclaim). Furthermore per-node proactive reclaim is not as susceptible to the memcg charging problem mentioned above. 3. Unlike memcg, there should be no surprises of callers expecting reclaim but instead got a demotion. Essentially relying on behavior of shrink_folio_list() after `6b426d0714` ("mm: disable top-tier fallback to reclaim on proactive reclaim"), without the expectations of try_to_free_mem_cgroup_pages(). 4. Unlike the nodes= arg, this interface avoids confusing semantics, such as what exactly the user wants when mixing top-tier and low-tier nodes in the nodemask. Further per-node interface is less exposed to "free up memory in my container" usecases, where eviction is intended. 5. Users that really want to free up memory can use proactive reclaim on nodes knowingly to be on the bottom tiers to force eviction in a natural way - higher access latencies are still better than swap. If compelled, while no guarantees and perhaps not worth the effort, users could also also potentially follow a ladder-like approach to eventually free up the memory. Alternatively, perhaps an 'evict' option could be added to the parameters for both memory.reclaim and per-node interfaces to force this action unconditionally. This patch (of 4): ... rather benign but keep proper ending order. Link: https://lkml.kernel.org/r/20250623185851.830632-1-dave@stgolabs.net Link: https://lkml.kernel.org/r/20250623185851.830632-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:52 -07:00
Vishal Moola (Oracle)	e05d3a6014	mm: remove unmap_and_put_page() There are no callers of unmap_and_put_page() left. Remove it. Link: https://lkml.kernel.org/r/20250709194017.927978-6-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jordan Rome <linux@jordanrome.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:51 -07:00
Vishal Moola (Oracle)	0c092fef53	mm/memory.c: use folios in __access_remote_vm() Use kmap_local_folio() instead of kmap_local_page(). Replaces 2 calls to compound_head() with one. This prepares us for the removal of unmap_and_put_page(). Link: https://lkml.kernel.org/r/20250709194017.927978-5-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jordan Rome <linux@jordanrome.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:51 -07:00
Vishal Moola (Oracle)	0ec5eea208	mm/memory.c: use folios in __copy_remote_vm_str() Patch series "Remove unmap_and_put_page()". This patchset uses folios in both the callers of unmap_and_put_page(), saving a couple calls to compound_head() wrappers. This patch (of 3): Use kmap_local_folio() instead of kmap_local_page(). Replaces 2 calls to compound_head() from unmap_and_put_page() with one. This prepares us for the removal of unmap_and_put_page(). Link: https://lkml.kernel.org/r/20250709194017.927978-3-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20250709194017.927978-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jordan Rome <linux@jordanrome.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:51 -07:00
Bijan Tabatabai	db87a4e236	mm/damon/vaddr: apply filters in migrate_{hot/cold} The paddr versions of migrate_{hot/cold} filter out folios from migration based on the scheme's filters. This patch does the same for the vaddr versions of those schemes. The filtering code is mostly the same for the paddr and vaddr versions. The exception is the young filter. paddr determines if a page is young by doing a folio rmap walk to find the page table entries corresponding to the folio. However, vaddr schemes have easier access to the page tables, so we add some logic to avoid the extra work. Link: https://lkml.kernel.org/r/20250709005952.17776-14-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:51 -07:00
Bijan Tabatabai	0a707d6b04	mm/damon: move folio filtering from paddr to ops-common This patch moves damos_pa_filter_match and the functions it calls to ops-common, renaming it to damos_folio_filter_match. Doing so allows us to share the filtering logic for the vaddr version of the migrate_{hot,cold} schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-13-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:50 -07:00
Bijan Tabatabai	19c1dc15c8	mm/damon/vaddr: use damos->migrate_dests in migrate_{hot,cold} damos->migrate_dests provides a list of nodes the migrate_{hot,cold} actions should migrate to, as well as the weights which specify the ratio pages should be migrated to each destination node. This patch interleaves pages in the migrate_{hot,cold} actions according to the information provided in damos->migrate_dests if it is used. The interleaving algorithm used is similar to the one used in weighted_interleave_nid(). If damos->migration_dests is not provided, the actions migrate pages to the node specified in damos->target_nid as before. Link: https://lkml.kernel.org/r/20250709005952.17776-12-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:50 -07:00
Bijan Tabatabai	0af934b131	Docs/mm/damon/design: document vaddr support for migrate_{hot,cold} Document that the migrate_{hot,cold} schemes can be used by the vaddr operations set. Link: https://lkml.kernel.org/r/20250709005952.17776-11-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:50 -07:00
Bijan Tabatabai	256b0c7faa	mm/damon/vaddr: add vaddr versions of migrate_{hot,cold} migrate_{hot,cold} are paddr schemes that are used to migrate hot/cold data to a specified node. However, these schemes are only available when doing physical address monitoring. This patch adds an implementation for them virtual address monitoring as well. Link: https://lkml.kernel.org/r/20250709005952.17776-10-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:50 -07:00
Bijan Tabatabai	13dde31db7	mm/damon: move migration helpers from paddr to ops-common This patch moves the damon_pa_migrate_pages function along with its corresponding helper functions from paddr to ops-common. The function prefix of "damon_pa_" was also changed to just "damon_" accordingly. This patch will allow page migration to be available to vaddr schemes as well as paddr schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-9-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:49 -07:00
Bijan Tabatabai	cbc4eea4ff	mm/damon/core: commit damos->migrate_dests When committing new scheme parameters from the sysfs, copy the migrate_dests struct of the source schemes into the destination schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-8-bijan311@gmail.com Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:49 -07:00
SeongJae Park	3a4785c2d3	Docs/admin-guide/mm/damon/usage: document dests directory Document the newly added DAMOS action destination directory of the DAMON sysfs interface on the usage document. Link: https://lkml.kernel.org/r/20250709005952.17776-7-bijan311@gmail.com Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Bijan Tabatabai <bijantabatab@micron.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-19 18:59:49 -07:00

1 2 3 4 5 ...

1368910 Commits