linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 13:41:48 -04:00

Author	SHA1	Message	Date
Shivank Garg	398556570e	mm/khugepaged: retry with sync writeback for MADV_COLLAPSE When MADV_COLLAPSE is called on file-backed mappings (e.g., executable text sections), the pages may still be dirty from recent writes. collapse_file() will trigger async writeback and fail with SCAN_PAGE_DIRTY_OR_WRITEBACK (-EAGAIN). MADV_COLLAPSE is a synchronous operation where userspace expects immediate results. If the collapse fails due to dirty pages, perform synchronous writeback on the specific range and retry once. This avoids spurious failures for freshly written executables while avoiding unnecessary synchronous I/O for mappings that are already clean. Link: https://lkml.kernel.org/r/20260118190939.8986-7-shivankg@amd.com Signed-off-by: Shivank Garg <shivankg@amd.com> Reported-by: Branden Moore <Branden.Moore@amd.com> Closes: https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@amd.com Fixes: `34488399fa` ("mm/madvise: add file and shmem support to MADV_COLLAPSE") Suggested-by: David Hildenbrand <david@kernel.org> Tested-by: Lance Yang <lance.yang@linux.dev> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: wang lian <lianux.mm@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:12 -08:00
Shivank Garg	5173ae0a06	mm/khugepaged: map dirty/writeback pages failures to EAGAIN Patch series "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE", v5. MADV_COLLAPSE on file-backed mappings fails with -EINVAL when TEXT pages are dirty. This affects scenarios like package/container updates or executing binaries immediately after writing them, etc. The issue is that collapse_file() triggers async writeback and returns SCAN_FAIL (maps to -EINVAL), expecting khugepaged to revisit later. But MADV_COLLAPSE is synchronous and userspace expects immediate success or a clear retry signal. Reproduction: - Compile or copy 2MB-aligned executable to XFS/ext4 FS - Call MADV_COLLAPSE on .text section - First call fails with -EINVAL (text pages dirty from copy) - Second call succeeds (async writeback completed) Issue Report: https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@amd.com This patch (of 2): When collapse_file encounters dirty or writeback pages in file-backed mappings, it currently returns SCAN_FAIL which maps to -EINVAL. This is misleading as EINVAL suggests invalid arguments, whereas dirty/writeback pages represent transient conditions that may resolve on retry. Introduce SCAN_PAGE_DIRTY_OR_WRITEBACK to cover both dirty and writeback states, mapping it to -EAGAIN. For MADV_COLLAPSE, this provides userspace with a clear signal that retry may succeed after writeback completes. For khugepaged, this is harmless as it will naturally revisit the range during periodic scans after async writeback completes. 
Link: https://lkml.kernel.org/r/20260118190939.8986-2-shivankg@amd.com Link: https://lkml.kernel.org/r/20260118190939.8986-4-shivankg@amd.com Fixes: `34488399fa` ("mm/madvise: add file and shmem support to MADV_COLLAPSE") Signed-off-by: Shivank Garg <shivankg@amd.com> Reported-by: Branden Moore <Branden.Moore@amd.com> Closes: https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@amd.com Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: wang lian <lianux.mm@gmail.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:12 -08:00
Wei Yang	f9b74c13b7	mm/mmu_gather: remove @delay_remap of __tlb_remove_page_size() __tlb_remove_page_size() is only used in tlb_remove_page_size() with @delay_remap set to false and it is passed directly to __tlb_remove_folio_pages_size(). Remove @delay_remap of __tlb_remove_page_size() and call __tlb_remove_folio_pages_size() with false @delay_remap. Link: https://lkml.kernel.org/r/20251231030026.15938-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:54 -08:00
Dipendra Khadka	29ec27805f	mm/oom_kill: remove unnecessary integer promotion in format string The 'h' length modifier in '%hd' is unnecessary as short integers are promoted to int in variadic functions. Use '%d' instead. Checkpatch flags the 'h' modifier as unnecessary for this reason, and many other subsystems have moved to using %d for promoted types. Hence, I think this patch aligns with kernel coding practices. Link: https://lkml.kernel.org/r/20251228154456.2386-1-kdipendra88@gmail.com Signed-off-by: Dipendra Khadka <kdipendra88@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:54 -08:00
Shu Anzai	860996495f	mm/damon/tests/core-kunit: remove a redundant test case and add a new test case in damos_test_commit_quota_goal() Remove a redundant test case from damos_test_commit_quota_goal() as it is already covered. Instead, add a new test for DAMOS_QUOTA_SOME_MEM_PSI_US, which was previously not tested. Link: https://lkml.kernel.org/r/20251224042200.2061847-6-shu17az@gmail.com Signed-off-by: Shu Anzai <shu17az@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:54 -08:00
Shu Anzai	2caf45764a	mm/damon/tests/core-kunit: add test cases for multiple regions in damon_test_split_regions_of() Extend damon_test_split_regions_of() to verify that it correctly handles multiple regions with various 'min_sz_region'. [sj@kernel.org: remove braces in damon_test_split_regions_of()] Link: https://lkml.kernel.org/r/20251224153125.69194-1-sj@kernel.org Link: https://lkml.kernel.org/r/20251224042200.2061847-5-shu17az@gmail.com Signed-off-by: Shu Anzai <shu17az@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:53 -08:00
Shu Anzai	65a17a3e60	mm/damon/tests/core-kunit: add a test case for region merge size limit in damon_test_merge_regions_of() Add a test case in damon_test_merge_regions_of() to verify that two adjacent regions are not merged if the resulting region would exceed the specified size limit. Link: https://lkml.kernel.org/r/20251224042200.2061847-4-shu17az@gmail.com Signed-off-by: Shu Anzai <shu17az@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:53 -08:00
Shu Anzai	738dae96b2	mm/damon/tests/core-kunit: verify the 'age' and 'nr_accesses_bp' fields in damon_test_merge_two() Extend damon_test_merge_two() to verify the 'age' and 'nr_accesses_bp' fields. Link: https://lkml.kernel.org/r/20251224042200.2061847-3-shu17az@gmail.com Signed-off-by: Shu Anzai <shu17az@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:53 -08:00
Shu Anzai	6c59085fc0	mm/damon/tests/core-kunit: verify the 'age' field in damon_test_split_at() Patch series "mm/damon/tests/core-kunit: extend existing test scenarios", v2. Improve the KUnit test coverage for DAMON. The five patches in this series respectively extend damon_test_split_at(), damon_test_merge_two(), damon_test_merge_regions_of(), damon_test_split_regions_of(), and damos_test_commit_quota_goal(). This patch (of 5): Extend damon_test_split_at() to verify the 'age' field. Link: https://lkml.kernel.org/r/20251224042200.2061847-1-shu17az@gmail.com Link: https://lkml.kernel.org/r/20251224042200.2061847-2-shu17az@gmail.com Signed-off-by: Shu Anzai <shu17az@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:53 -08:00
Wei Yang	a8d933dc33	mm/vmstat: remove unused node and zone state helpers Several helper functions for managing node and zone states have become obsolete and no longer have any callers within the kernel. inc_node_state() inc_zone_state() dec_zone_state() This commit removes the dead code. Link: https://lkml.kernel.org/r/20251225210213.2553-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:52 -08:00
Chunyu Hu	6319c4f442	selftests/mm: fix comment for check_test_requirements The test supports arm64 as well so the comment is incorrect. And there's a check for arm64 in va_high_addr_switch.c. Link: https://lkml.kernel.org/r/20251221040025.3159990-5-chuhu@redhat.com Fixes: `983e760bcd` ("selftest/mm: va_high_addr_switch: add ppc64 support check") Fixes: `f556acc2fa` ("selftests/mm: skip test for non-LPA2 and non-LVA systems") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:52 -08:00
Chunyu Hu	dd0202a0bd	selftests/mm: va_high_addr_switch return fail when either test failed When the first test failed, and the hugetlb test passed, the result would be pass, but we expect a fail. Fix this issue by returning fail if either is not KSFT_PASS. Link: https://lkml.kernel.org/r/20251221040025.3159990-4-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:52 -08:00
Chunyu Hu	7544d7969d	selftests/mm: remove arm64 nr_hugepages setup for va_high_addr_switch test arm64 and x86_64 have the same nr_hugepages requirement for running the va_high_addr_switch test. Since commit `d9d957bd7b` ("selftests/mm: alloc hugepages in va_high_addr_switch test"), the setup can be done in va_high_addr_switch.sh. So remove the duplicated setup. Link: https://lkml.kernel.org/r/20251221040025.3159990-3-chuhu@redhat.com Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Cc: Luiz Capitulino <luizcap@redhat.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:52 -08:00
Chunyu Hu	b1f031e33c	selftests/mm: allocate 6 hugepages in va_high_addr_switch.sh The va_high_addr_switch test requires 6 hugepages, not 5. When the test is run directly via ./va_high_addr_switch.sh, it hits an mmap 'FAIL' caused by too few hugepages: mmap(addr_switch_hint - hugepagesize, 2*hugepagesize, MAP_HUGETLB): 0x7f330f800000 - OK mmap(addr_switch_hint , 2*hugepagesize, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED The failure is not hit when the tests are run via 'run_vmtests.sh -t hugevm', because nr_hugepages is set to 128 at the beginning of run_vmtests.sh and va_high_addr_switch.sh then skips the nr_hugepages setup since enough pages are already available. Link: https://lkml.kernel.org/r/20251221040025.3159990-2-chuhu@redhat.com Fixes: `d9d957bd7b` ("selftests/mm: alloc hugepages in va_high_addr_switch test") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:51 -08:00
Chunyu Hu	b47beff129	selftests/mm: fix va_high_addr_switch.sh return value Patch series "Fix va_high_addr_switch.sh test failure - again", v2. The series addresses several issues that exist in the va_high_addr_switch test: 1) the test return value is ignored in va_high_addr_switch.sh. 2) the va_high_addr_switch test requires 6 hugepages not 5. 3) the return value of the first test in va_high_addr_switch.c can be overridden by the second test. 4) the nr_hugepages setup in run_vmtests.sh for arm64 can be done in va_high_addr_switch.sh too. 5) update a comment for check_test_requirements. This patch (of 5): The return value should be the return value of va_high_addr_switch, otherwise a test failure would be silently ignored. Link: https://lkml.kernel.org/r/20251221040025.3159990-1-chuhu@redhat.com Fixes: `d9d957bd7b` ("selftests/mm: alloc hugepages in va_high_addr_switch test") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Cc: Luiz Capitulino <luizcap@redhat.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:51 -08:00
Li Wang	b618876f2e	selftests/mm/charge_reserved_hugetlb.sh: add waits with timeout helper The hugetlb cgroup usage wait loops in charge_reserved_hugetlb.sh were unbounded and could hang forever if the expected cgroup file value never appears (e.g. due to write_to_hugetlbfs in Error mapping). === Error log === # uname -r 6.12.0-xxx.el10.aarch64+64k # ls /sys/kernel/mm/hugepages/hugepages-* hugepages-16777216kB/ hugepages-2048kB/ hugepages-524288kB/ #./charge_reserved_hugetlb.sh -cgroup-v2 # ----------------------------------------- ... # nr hugepages = 10 # writing cgroup limit: 5368709120 # writing reseravation limit: 5368709120 ... # write_to_hugetlbfs: Error mapping the file: Cannot allocate memory # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 ... Introduce a small helper, wait_for_file_value(), and use it for: - waiting for reservation usage to drop to 0, - waiting for reservation usage to reach a given size, - waiting for fault usage to reach a given size. This makes the waits consistent and adds a hard timeout (60 tries with 1s sleep) so the test fails instead of stalling indefinitely. Link: https://lkml.kernel.org/r/20251221122639.3168038-4-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:51 -08:00
Li Wang	1aa1dd9cc5	selftests/mm/charge_reserved_hugetlb: drop mount size for hugetlbfs charge_reserved_hugetlb.sh mounts a hugetlbfs instance at /mnt/huge with a fixed size of 256M. On systems with large base hugepages (e.g. 512MB), this is smaller than a single hugepage, so the hugetlbfs mount ends up with zero capacity (often visible as size=0 in mount output). As a result, write_to_hugetlbfs fails with ENOMEM and the test can hang waiting for progress. === Error log === # uname -r 6.12.0-xxx.el10.aarch64+64k #./charge_reserved_hugetlb.sh -cgroup-v2 # ----------------------------------------- ... # nr hugepages = 10 # writing cgroup limit: 5368709120 # writing reseravation limit: 5368709120 ... # write_to_hugetlbfs: Error mapping the file: Cannot allocate memory # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 ... # mount |grep /mnt/huge none on /mnt/huge type hugetlbfs (rw,relatime,seclabel,pagesize=512M,size=0) # grep -i huge /proc/meminfo ... HugePages_Total: 10 HugePages_Free: 10 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 524288 kB Hugetlb: 5242880 kB Drop the mount args with 'size=256M', so the filesystem capacity is sufficient regardless of HugeTLB page size. Link: https://lkml.kernel.org/r/20251221122639.3168038-3-liwang@redhat.com Fixes: `29750f71a9` ("hugetlb_cgroup: add hugetlb_cgroup reservation tests") Signed-off-by: Li Wang <liwang@redhat.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:51 -08:00
Li Wang	8e46adb62f	selftests/mm/write_to_hugetlbfs: parse -s as size_t Patch series "selftests/mm: hugetlb cgroup charging: robustness fixes", v3. This series fixes a few issues in the hugetlb cgroup charging selftests (write_to_hugetlbfs.c + charge_reserved_hugetlb.sh) that show up on systems with large hugepages (e.g. 512MB) and when failures cause the test to wait indefinitely. On an aarch64 64k page kernel with 512MB hugepages, the test consistently fails in write_to_hugetlbfs with ENOMEM and then hangs waiting for the expected usage values. The root cause is that charge_reserved_hugetlb.sh mounts hugetlbfs with a fixed size=256M, which is smaller than a single hugepage, resulting in a mount with size=0 capacity. In addition, write_to_hugetlbfs previously parsed -s via atoi() into an int, which can overflow and print negative sizes. Reproducer / environment: - Kernel: 6.12.0-xxx.el10.aarch64+64k - Hugepagesize: 524288 kB (512MB) - ./charge_reserved_hugetlb.sh -cgroup-v2 - Observed mount: pagesize=512M,size=0 before this series After applying the series, the test completes successfully on the above setup. This patch (of 3): write_to_hugetlbfs currently parses the -s size argument with atoi() into an int. This silently accepts malformed input, cannot report overflow, and can truncate large sizes. === Error log === # uname -r 6.12.0-xxx.el10.aarch64+64k # ls /sys/kernel/mm/hugepages/hugepages-* hugepages-16777216kB/ hugepages-2048kB/ hugepages-524288kB/ #./charge_reserved_hugetlb.sh -cgroup-v2 # ----------------------------------------- ... # nr hugepages = 10 # writing cgroup limit: 5368709120 # writing reseravation limit: 5368709120 ... # Writing to this path: /mnt/huge/test # Writing this size: -1610612736 <-------- Switch the size variable to size_t and parse -s with sscanf("%zu", ...). Also print the size using %zu. This avoids incorrect behavior with large -s values and makes the utility more robust. 
Link: https://lkml.kernel.org/r/20251221122639.3168038-1-liwang@redhat.com Link: https://lkml.kernel.org/r/20251221122639.3168038-2-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: David Hildenbrand <david@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:50 -08:00
Gregory Price	3bb64898f0	page_alloc: allow migration of smaller hugepages during contig_alloc We presently skip regions with hugepages entirely when trying to do contiguous page allocation. This will cause otherwise-movable 2MB HugeTLB pages to be considered unmovable, and makes 1GB gigantic page allocation less reliable on systems utilizing both. Commit `4d73ba5fa7` ("mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages") skipped all HugePage containing regions because it can cause significant delays in 1G allocation (as HugeTLB migrations may fail for a number of reasons). Instead, if hugepage migration is enabled, consider regions with hugepages smaller than the target contiguous allocation request as valid targets for allocation. We optimize for the existing behavior by searching for non-hugetlb regions in a first pass, then retrying the search to include hugetlb only on failure. This allows the existing fast-path to remain the default case with a slow-path fallback to increase reliability. We only fallback to the slow path if a hugetlb region was detected, and we do a full re-scan because the zones/blocks may have changed during the first pass (and it's not worth further complexity). isolate_migrate_pages_block() has similar hugetlb filter logic, and the hugetlb code does a migratable check in folio_isolate_hugetlb() during isolation. The code servicing the allocation and migration already supports this exact use case. To test, allocate a bunch of 2MB HugeTLB pages (in this case 48GB) and then attempt to allocate some 1G HugeTLB pages (in this case 4GB) (Scale to your machine's memory capacity). echo 24576 > .../hugepages-2048kB/nr_hugepages echo 4 > .../hugepages-1048576kB/nr_hugepages Prior to this patch, the 1GB page reservation can fail if no contiguous 1GB pages remain. After this patch, the kernel will try to move 2MB pages and successfully allocate the 1GB pages (assuming overall sufficient memory is available). 
Also tested this while a program had the 2MB reservations mapped, and the 1GB reservation still succeeds. folio_alloc_gigantic() is the primary user of alloc_contig_pages(), other users are debug or init-time allocations and largely unaffected. - ppc/memtrace is a debugfs interface - x86/tdx memory allocation occurs once on module-init - kfence/core happens once on module (late) init - THP uses it in debug_vm_pgtable_alloc_huge_page at __init time Link: https://lkml.kernel.org/r/20251221124656.2362540-1-gourry@gourry.net Signed-off-by: Gregory Price <gourry@gourry.net> Suggested-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/linux-mm/6fe3562d-49b2-4975-aa86-e139c535ad00@redhat.com/ Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:50 -08:00
Gregory Price	9e80e66dda	mm, hugetlb: implement movable_gigantic_pages sysctl This reintroduces a concept removed by: commit `d6cb41cc44` ("mm, hugetlb: remove hugepages_treat_as_movable sysctl") This sysctl provides flexibility between ZONE_MOVABLE use cases: 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility 2) onlining memory in ZONE_MOVABLE to make hugepage allocation reliable When ZONE_MOVABLE is used to make huge page allocation more reliable, disallowing gigantic pages memory in this region is pointless. If hotplug is not a requirement, we can loosen the restrictions to allow 1GB gigantic pages in ZONE_MOVABLE. Since 1GB can be difficult to migrate / has impacts on compaction / defragmentation, we don't enable this by default. Notably, 1GB pages can only be migrated if another 1GB page is available - so hot-unplug will fail if such a page cannot be found. However, since there are scenarios where gigantic pages are migratable, we should allow use of these on movable regions. When no valid 1GB page is available for migration, hot-unplug will retry indefinitely (or until interrupted). For example: echo 0 > node0/hugepages/..-1GB/nr_hugepages # clear node0 1GB pages echo 1 > node1/hugepages/..-1GB/nr_hugepages # reserve node1 1GB page ./alloc_huge_node1 & # Allocate a 1GB page on node1 ./node1_offline & # attempt to offline all node1 memory echo 1 > node0/hugepages/..-1GB/nr_hugepages # reserve node0 1GB page In this example, node1_offline will block indefinitely until the final step, when a node0 1GB page is made available. Note: Boot-time CMA is not possible for driver-managed hotplug memory, as CMA requires the memory to be registered as SystemRAM at boot time. Additionally, 1GB huge pages are not supported by THP. 
Link: https://lkml.kernel.org/r/20251221125603.2364174-1-gourry@gourry.net Signed-off-by: Gregory Price <gourry@gourry.net> Suggested-by: David Rientjes <rientjes@google.com> Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/ Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Gregory Price <gourry@gourry.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:50 -08:00
Wentao Guan	7db0787000	mm: cleanup vma_iter_bulk_alloc commit `d240629148` ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()"), removed the only user and mas_expected_entries has been removed, since commit `e3852a1213` ("maple_tree: Drop bulk insert support"). Also cleanup the mas_expected_entries in maple_tree.h. No functional change. Link: https://lkml.kernel.org/r/20251106110929.3522073-1-guanwentao@uniontech.com Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cheng Nie <niecheng1@uniontech.com> Cc: Guan Wentao <guanwentao@uniontech.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Pedro Falcato <pfalcato@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:50 -08:00
Brendan Jackman	241b3a0963	mm: clarify GFP_ATOMIC/GFP_NOWAIT doc-comment The current description of contexts where it's invalid to make GFP_ATOMIC and GFP_NOWAIT calls is rather vague. Replace this with a direct description of the actual contexts of concern and refer to the RT docs where this is explained more discursively. While rejigging this prose, also move the documentation of GFP_NOWAIT to the GFP_NOWAIT section. Link: https://lore.kernel.org/all/d912480a-5229-4efe-9336-b31acded30f5@suse.cz/ Link: https://lkml.kernel.org/r/20251219-b4-gfp_atomic-comment-v2-1-4c4ce274c2b6@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:49 -08:00
Kairui Song	7969f30594	mm/gup: remove no longer used gup_fast_undo_dev_pagemap This helper is no longer used after commit `fd2825b076` ("mm/gup: remove pXX_devmap usage from get_user_pages()"). Link: https://lkml.kernel.org/r/20251219-gup-cleanup-v1-1-348a70d9eecb@tencent.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:49 -08:00
Vlastimil Babka	9c9828d3ea	mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations Since commit `cc638f329e` ("mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations"), THP page fault allocations have settled on the following scheme (from the commit log): 1. local node only THP allocation with no reclaim, just compaction. 2. for madvised VMA's or when synchronous compaction is enabled always - THP allocation from any node with effort determined by global defrag setting and VMA madvise 3. fallback to base pages on any node Recent customer reports however revealed we have a gap in step 1 above. What we have seen is excessive reclaim due to THP page faults on a NUMA node that's close to its high watermark, while other nodes have plenty of free memory. The problem with step 1 is that it promises no reclaim after the compaction attempt, however reclaim is only avoided for certain compaction outcomes (deferred, or skipped due to insufficient free base pages), and not e.g. when compaction is actually performed but fails (we did see compact_fail vmstat counter increasing). THP page faults can therefore exhibit a zone_reclaim_mode-like behavior, which is not the intention. Thus add a check for __GFP_THISNODE that corresponds to this exact situation and prevents continuing with reclaim/compaction once the initial compaction attempt isn't successful in allocating the page. Note that commit `cc638f329e` has not introduced this over-reclaim possibility; it appears to exist in some form since commit `2f0799a0ff` ("mm, thp: restore node-local hugepage allocations"). Followup commits `b39d0ee263` ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") and `cc638f329e` have moved in the right direction, but left the abovementioned gap. 
Link: https://lkml.kernel.org/r/20251219-costly-noretry-thisnode-fix-v1-1-e1085a4a0c34@suse.cz Fixes: `2f0799a0ff` ("mm, thp: restore node-local hugepage allocations") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:49 -08:00
Xiu Jianfeng	ed60c8e280	mm/hugetlb_cgroup: fix -Wformat-truncation warning A false-positive compile warning with -Wformat-truncation was introduced by commit `47179fe035` ("mm/hugetlb_cgroup: prepare cftypes based on template") on arch s390. Suppress it by replacing snprintf() with scnprintf(). mm/hugetlb_cgroup.c: In function 'hugetlb_cgroup_file_init': mm/hugetlb_cgroup.c:829:44: warning: '%s' directive output may be truncated writing up to 1623 bytes into a region of size between 32 and 63 [-Wformat-truncation=] 829 \| snprintf(cft->name, MAX_CFTYPE_NAME, "%s.%s", buf, tmpl->name); \| ^~ Link: https://lkml.kernel.org/r/20251222072359.3626182-1-xiujianfeng@huaweicloud.com Fixes: `47179fe035` ("mm/hugetlb_cgroup: prepare cftypes based on template") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512212332.9lFRbgdS-lkp@intel.com/ Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:48 -08:00
Kevin Lourenco	62451ae347	mm: fix minor spelling mistakes in comments Correct several typos in comments across files in mm/ [akpm@linux-foundation.org: also fix comment grammar, per SeongJae] Link: https://lkml.kernel.org/r/20251218150906.25042-1-klourencodev@gmail.com Signed-off-by: Kevin Lourenco <klourencodev@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:48 -08:00
Kevin Lourenco	5ec9bb6de4	mm/damon: fix typos in comments Correct minor spelling mistakes in several files under mm/damon. No functional changes. Link: https://lkml.kernel.org/r/20251217181216.47576-1-klourencodev@gmail.com Signed-off-by: Kevin Lourenco <klourencodev@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:48 -08:00
Heiko Carstens	a9853ac1c3	zram: remove KMSG_COMPONENT macro The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message catalog" from 2008 [1] which never made it upstream. The macro was added to s390 code to allow for an out-of-tree patch which used this to generate unique message ids. Also, this out-of-tree patch doesn't exist anymore. The pattern of how the KMSG_COMPONENT is used was partially also used for non s390 specific code, for whatever reasons. Remove the macro in order to get rid of a pointless indirection. Link: https://lkml.kernel.org/r/20251126143602.2207435-1-hca@linux.ibm.com Link: https://lwn.net/Articles/292650/ [1] Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:48 -08:00
Thorsten Blum	84355caa27	mm/mm_init: replace simple_strtoul with kstrtobool in set_hashdist Use bool for 'hashdist' and replace simple_strtoul() with kstrtobool() for parsing the 'hashdist=' boot parameter. Unlike simple_strtoul(), which returns an unsigned long, kstrtobool() converts the string directly to bool and avoids implicit casting. Check the return value of kstrtobool() and reject invalid values. This adds error handling while preserving behavior for existing values, and removes use of the deprecated simple_strtoul() helper. The current code silently sets 'hashdist = 0' if parsing fails, instead of leaving the default value (HASHDIST_DEFAULT) unchanged. Additionally, kstrtobool() accepts common boolean strings such as "on" and "off". Link: https://lkml.kernel.org/r/20251217110214.50807-1-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:47 -08:00
Audra Mitchell	a98ec863fd	lib/test_vmalloc.c: minor fixes to test_vmalloc.c If PAGE_SIZE is larger than 4k and if you have a system with a large number of CPUs, this test can require a very large amount of memory leading to oom-killer firing. Given the type of allocation, the kernel won't have anything to kill, causing the system to stall. Add a parameter to the test_vmalloc driver to represent the number of times a percpu object will be allocated. Calculate this in test_vmalloc.sh to be 90% of available memory or the current default of 35000, whichever is smaller. Link: https://lkml.kernel.org/r/20251201181848.1216197-1-audra@redhat.com Signed-off-by: Audra Mitchell <audra@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rafael Aquini <raquini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:47 -08:00
Sidhartha Kumar	bd4526e64b	maple_tree: remove struct maple_alloc struct maple_alloc is deprecated after the maple tree conversion to sheaves, remove the references from the header file. Link: https://lkml.kernel.org/r/20251203224511.469978-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:47 -08:00
Johannes Weiner	64dd89ae01	mm/block/fs: remove laptop_mode Laptop mode was introduced to save battery, by delaying and consolidating writes and thereby maximize the time rotating hard drives wouldn't have to spin. Luckily, rotating hard drives, with their high spin-up times and power draw, are a thing of the past for battery-powered devices. Reclaim has also since changed to not write single filesystem pages anymore, and regular filesystem writeback is lumpy by design. The juice doesn't appear worth the squeeze anymore. The footprint of the feature is small, but nevertheless it's a complicating factor in mm, block, filesystems. Developers don't think about it, and it likely hasn't been tested with new reclaim and writeback changes in years. Let's sunset it. Keep the sysctl with a deprecation warning around for a few more cycles, but remove all functionality behind it. [akpm@linux-foundation.org: fix Documentation/admin-guide/laptops/index.rst] Link: https://lkml.kernel.org/r/20251216185201.GH905277@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:47 -08:00
Sergey Senozhatsky	657a81fe3b	zram: drop pp_in_progress pp_in_progress makes sure that only one post-processing (writeback or recompression) is active at any given time. Functionality-wise it basically shadows zram init_lock, when init_lock is acquired in writer mode. Switch recompress_store() and writeback_store() to take zram init_lock in writer mode, like all store() sysfs handlers should do, so that we can drop pp_in_progress. Recompression and writeback can be somewhat slow, so holding init_lock in writer mode can block zram attrs reads, but in reality the only zram attrs reads that take place are mm_stat reads, and usually it's the same process that reads mm_stat and does recompression or writeback. Link: https://lkml.kernel.org/r/20251216071342.687993-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:46 -08:00
JaeJoon Jung	9082f24bd3	mm/damon/stat: deduplicate intervals_goal setup in damon_stat_build_ctx() The damon_stat_build_ctx() function sets the values of intervals_goal structure members. These values are applied to damon_ctx in damon_set_attrs(). However, it is resetting the values that were already applied previously to the same values. I suggest removing this code as it constitutes duplicate execution. Link: https://patch.msgid.link/20251206011716.7185-1-rgbi3307@gmail.com Link: https://lkml.kernel.org/r/20251216073440.40891-1-sj@kernel.org Signed-off-by: JaeJoon Jung <rgbi3307@gmail.com> Reviewed-by: Enze Li <lienze@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:46 -08:00
SeongJae Park	804c26b961	mm/damon/core: add trace point for damos stat per apply interval DAMON users can read DAMOS stats via DAMON sysfs interface. It enables efficient, simple and flexible usages of the stats. Especially for systems not having advanced tools like perf or bpftrace, that can be useful. But if the advanced tools are available, exposing the stats via tracepoint can reduce unnecessary reimplementation of the wheels. Add a new tracepoint for DAMOS stats, namely damos_stat_after_apply_interval. The tracepoint is triggered for each scheme's apply interval and exposes the whole stat values. If the user needs sub-apply interval information for any chance, damos_before_apply tracepoint could be used. Link: https://lkml.kernel.org/r/20251216080128.42991-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:46 -08:00
SeongJae Park	dcecf9e58b	Docs/ABI/damon: update for max_nr_snapshots Update DAMON ABI document for the newly added DAMON sysfs interface file, max_nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:46 -08:00
SeongJae Park	2584dd7496	Docs/admin-guide/mm/damon/usage: update for max_nr_snapshots Update DAMON usage document for the newly added DAMON sysfs interface file, max_nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:45 -08:00
SeongJae Park	64aa87f03d	Docs/mm/damon/design: update for max_nr_snapshots Update DAMON design document for the newly added snapshot level DAMOS deactivation feature, max_nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:45 -08:00
SeongJae Park	204ab9ab93	mm/damon/sysfs-schemes: implement max_nr_snapshots file Add a new DAMON sysfs file for setting and getting the newly introduced per-DAMON-snapshot level DAMOS deactivation control parameter, max_nr_snapshots. The file has the same name as the parameter and is placed under the damos stat directory. Link: https://lkml.kernel.org/r/20251216080128.42991-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:45 -08:00
SeongJae Park	84e425c68e	mm/damon/core: implement max_nr_snapshots There are DAMOS use cases that require user-space centric control of its activation and deactivation. Having the control plane in user space, or using DAMOS as a way to collect monitoring results, are such examples. DAMON parameters online commit, DAMOS quotas and watermarks can be useful for this purpose. However, those features work only at the sub-DAMON-snapshot level. In some use cases, the DAMON-snapshot level control is required. For example, in the DAMOS-based monitoring results collection use case, the user online-installs a DAMOS scheme with DAMOS_STAT action, waits for it to be applied to whole regions of a single DAMON-snapshot, retrieves the stats and tried regions information, and online-uninstalls the scheme. It is efficient to limit the lifetime of the scheme to exactly one snapshot's consumption. To support such use cases, introduce a new DAMOS core API per-scheme parameter, namely max_nr_snapshots. As the name implies, it is the upper limit of nr_snapshots, which is a DAMOS stat that represents the number of DAMON-snapshots that the scheme has fully applied. If the limit is set with a non-zero value and nr_snapshots reaches or exceeds the limit, the scheme is deactivated. Link: https://lkml.kernel.org/r/20251216080128.42991-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:45 -08:00
SeongJae Park	ccaa2d062a	mm/damon: update damos kerneldoc for stat field Commit `0e92c2ee9f` ("mm/damon/schemes: account scheme actions that successfully applied") has replaced ->stat_count and ->stat_sz of 'struct damos' with ->stat. The commit mistakenly did not update the related kernel doc comment, though. Update the comment. Link: https://lkml.kernel.org/r/20251216080128.42991-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:44 -08:00
SeongJae Park	55221e53f7	Docs/ABI/damon: update for nr_snapshots damos stat Update DAMON ABI document for the newly added damos stat, nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:44 -08:00
SeongJae Park	0b43f89e2d	Docs/admin-guide/mm/damon/usage: update for nr_snapshots damos stat Update DAMON usage document for the newly added damos stat, nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:44 -08:00
SeongJae Park	ee7f5d1933	Docs/mm/damon/design: update for nr_snapshots damos stat Update DAMON design document for the newly added damos stat, nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:43 -08:00
SeongJae Park	83a741b974	mm/damon/sysfs-schemes: introduce nr_snapshots damos stat file Introduce a new DAMON sysfs interface file for exposing the newly added DAMOS stat, nr_snapshots. The file has the same name as the stat (nr_snapshots) and is placed under the damos stat sysfs directory. Link: https://lkml.kernel.org/r/20251216080128.42991-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:43 -08:00
SeongJae Park	4a6ceb7c97	mm/damon/core: introduce nr_snapshots damos stat Patch series "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats". Introduce three changes for improving DAMOS stat's provided information, deterministic control, and reading usability. DAMOS provides stats that are important for understanding its behavior. It lacks information about how many DAMON-generated monitoring output snapshots it has worked on. Add a new stat, nr_snapshots, to show the information. Users can control DAMOS schemes in multiple ways. Using the online parameters commit feature, they can install and uninstall DAMOS schemes whenever they want while keeping DAMON running. DAMOS quotas and watermarks can be used for manually or automatically turning on/off or adjusting the aggressiveness of the scheme. DAMOS filters can be used for applying the scheme to specific memory entities based on their types and locations. Some users want their DAMOS scheme to be applied to only a specific number of DAMON snapshots, for more deterministic control. One example use case is tracepoint based snapshot reading. Add a new knob, max_nr_snapshots, to support this. If nr_snapshots becomes equal to or greater than the value of this parameter, the scheme is deactivated. Users can read DAMOS stats via DAMON's sysfs interface. For deep level investigations on environments having advanced tools like perf and bpftrace, exposing the stats via a tracepoint can be useful. Implement a new tracepoint, namely damon:damos_stat_after_apply_interval. First five patches (patches 1-5) of this series implement the new stat, nr_snapshots, on the core layer (patch 1), expose on DAMON sysfs user interface (patch 2), and update documents (patches 3-5). Following six patches (patches 6-11) are for the new stat based DAMOS deactivation (max_nr_snapshots). The first one (patch 6) of this group updates a kernel-doc comment before making further changes. 
Then an implementation of it on the core layer (patch 7), an introduction of a new DAMON sysfs interface file for users of the feature (patch 8), and three updates of the documents (patches 9-11) follow. The final one (patch 12) introduces the new tracepoint that exposes the DAMOS stat values for each scheme apply interval. This patch (of 12): DAMON generates monitoring results snapshots for every sampling interval. DAMOS applies given schemes on the regions of the snapshots, for every apply interval of the scheme. DAMOS stats report how many memory entities a given scheme has tried and been applied to, at the region and byte level. In some use cases, including user-space oriented tuning and investigations, it is useful to know that at the DAMON-snapshot level. Introduce a new stat, namely nr_snapshots for DAMON core API callers. [sj@kernel.org: fix wrong list_is_last() call in damons_is_last_region()] Link: https://lkml.kernel.org/r/20260114152049.99727-1-sj@kernel.org Link: https://lkml.kernel.org/r/20251216080128.42991-1-sj@kernel.org Link: https://lkml.kernel.org/r/20251216080128.42991-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:43 -08:00
Kaushlendra Kumar	8b8017d7c4	tools/mm/slabinfo: fix --partial long option mapping The long option "--partial" was incorrectly mapped to lowercase 'p' in the opts[] array, but the getopt string and switch case handle uppercase 'P'. This mismatch caused --partial to be rejected. Fix the long_options mapping to use 'P' so --partial works correctly alongside the existing -P short option. Link: https://lkml.kernel.org/r/20251208105240.2719773-1-kaushlendra.kumar@intel.com Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Reviewed-by: SeongJae Park <sj@kernel.org> Tested-by: SeongJae Park <sj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:43 -08:00
Kaushlendra Kumar	9f5edd785d	tools/mm/thp_swap_allocator_test: fix small folio alignment Use ALIGNMENT_SMALLFOLIO instead of ALIGNMENT_MTHP when allocating small folios to ensure correct memory alignment for the test case. Before: test allocates small folios with 64KB alignment (ALIGNMENT_MTHP) when only 4KB alignment (ALIGNMENT_SMALLFOLIO) is needed. This wastes address space and may cause allocation failures on systems with fragmented memory. Worst-case impact: this only affects thp_swap_allocator_test tool behavior. Link: https://lkml.kernel.org/r/20251209031745.2723120-1-kaushlendra.kumar@intel.com Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:42 -08:00
Enze Li	6e4930e333	mm/damon/core: fix wasteful CPU calls by skipping non-existent targets Currently, DAMON does not proactively clean up invalid monitoring targets during its runtime. When some monitored processes exit, DAMON continues to make the following unnecessary function calls: --damon_for_each_target-- --damon_for_each_region-- damon_do_apply_schemes damos_apply_scheme damon_va_apply_scheme damos_madvise damon_get_mm. It is only in the damon_get_mm() function that it may finally discover the target no longer exists, which wastes CPU resources. A simple idea is to check for the existence of monitoring targets within the kdamond_need_stop() function and promptly clean up non-existent targets. However, SJ pointed out that this approach is problematic because the online commit logic incorrectly uses list indices to update the monitoring state. This can lead to data loss if the target list is changed concurrently. Meanwhile, SJ suggests checking for target existence at the damon_for_each_target level, and if a target does not exist, simply skip it and proceed to the next one. Link: https://lkml.kernel.org/r/20251210052508.264433-1-lienze@kylinos.cn Signed-off-by: Enze Li <lienze@kylinos.cn> Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:42 -08:00
Johannes Weiner	16cc8b9396	mm: memcontrol: rename mem_cgroup_from_slab_obj() In addition to slab objects, this function is used for resolving non-slab kernel pointers. This has caused confusion in recent refactoring work. Rename it to mem_cgroup_from_virt(), sticking with terminology established by the virt_to_<foo>() converters. Link: https://lore.kernel.org/linux-mm/20251113161424.GB3465062@cmpxchg.org/ Link: https://lkml.kernel.org/r/20251210154301.720133-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:42 -08:00
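
The tools/mm/slabinfo entry above (8b8017d7c4) describes a long option whose value did not match the character handled by the getopt string and switch. As a rough illustration of that class of bug and its fix, here is a minimal user-space sketch using getopt_long(). The helper name parse_partial() and the reduced option table are hypothetical and are not taken from the slabinfo source; only the idea of mapping "--partial" to 'P' comes from the commit message.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the slabinfo code: returns 1 if -P or
 * --partial was given, 0 if it was not, and -1 on an unknown option.
 */
static int parse_partial(int argc, char **argv)
{
	/*
	 * The val field of the long option ('P') must match the short
	 * option character listed in the getopt string and handled in
	 * the switch. Using 'p' here would send --partial to the
	 * default branch instead.
	 */
	static const struct option opts[] = {
		{ "partial", no_argument, NULL, 'P' },
		{ NULL, 0, NULL, 0 },
	};
	int partial = 0;
	int c;

	optind = 1;	/* restart scanning so the helper can be called again */
	while ((c = getopt_long(argc, argv, "P", opts, NULL)) != -1) {
		switch (c) {
		case 'P':
			partial = 1;
			break;
		default:
			return -1;
		}
	}
	return partial;
}
```

Calling parse_partial() with an argv containing either "-P" or "--partial" takes the same switch branch, which is the behavior the commit message says the fix restores.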

1413655 Commits