Commit Graph

SeongJae Park
c2b0cb96e7 selftests/damon/drgn_dump_damon_status: support quota goal_tuner dumping
Update drgn_dump_damon_status.py, which is used to dump the in-kernel
DAMON status for tests, to dump the goal_tuner setup status.

Link: https://lkml.kernel.org/r/20260310010529.91162-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
c00863bc7c selftests/damon/_damon_sysfs: support goal_tuner setup
Add support for goal_tuner setup to the test-purpose DAMON sysfs
interface control helper, _damon_sysfs.py.

Link: https://lkml.kernel.org/r/20260310010529.91162-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
d972d68d50 mm/damon/tests/core-kunit: test goal_tuner commit
Extend the damos_commit_quota() kunit test for the newly added
goal_tuner parameter.

Link: https://lkml.kernel.org/r/20260310010529.91162-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
3eda936f2a Docs/ABI/damon: update for goal_tuner
Update the ABI document for the newly added goal_tuner sysfs file.

Link: https://lkml.kernel.org/r/20260310010529.91162-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
d9cfe515d3 Docs/admin-guide/mm/damon/usage: document goal_tuner sysfs file
Update the DAMON usage document for the new sysfs file for goal-based
quota auto-tuning algorithm selection.

Link: https://lkml.kernel.org/r/20260310010529.91162-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
5a242f9daf Docs/mm/damon/design: document the goal-based quota tuner selections
Update the design document for the newly added goal-based quota tuner
selection feature.

Link: https://lkml.kernel.org/r/20260310010529.91162-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
e9a19cc85d mm/damon/sysfs-schemes: implement quotas->goal_tuner file
Add a new DAMON sysfs interface file, namely 'goal_tuner', under the DAMOS
quotas directory.  It is connected to the damos_quota->goal_tuner field.
Users can therefore select their preferred goal-based quota tuning
algorithm by writing the name of the tuner to the file.  Reading the file
returns the name of the currently selected tuner.

Link: https://lkml.kernel.org/r/20260310010529.91162-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
SeongJae Park
af738a6a00 mm/damon/core: introduce DAMOS_QUOTA_GOAL_TUNER_TEMPORAL
Introduce a new goal-based DAMOS quota auto-tuning algorithm, namely
DAMOS_QUOTA_GOAL_TUNER_TEMPORAL ('temporal' for short).  The algorithm
aims to trigger the DAMOS action only for a limited time, to achieve the
goal as soon as possible.  During that period, it uses as much quota as
allowed.  Once the goal is achieved, it sets the quota to zero,
effectively deactivating the scheme.

Link: https://lkml.kernel.org/r/20260310010529.91162-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
SeongJae Park
54419bbd0e mm/damon/core: allow quota goals set zero effective size quota
User-set explicit quotas (the size and time quotas) having a zero value
means the quotas are unset.  The effective size quota is set to the
minimum of the explicit quotas.  When quota goals are set, the goal-based
quota tuner can lower it further.  But the existing, single tuner never
sets the effective size quota to zero.  Relying on that fact, DAMON core
assumes a zero effective quota means the user has set no quota.

Multiple tuners are now allowed, though.  In the future, some tuners might
want to set a zero effective size quota, and there is no reason to
restrict that.  Under the current implementation, however, a zero
effective quota would be treated as no quota at all, making the scheme
work at its full speed.

Introduce a dedicated function for checking whether no quota is set.  The
function checks whether the user-set explicit quotas are zero and no goal
is installed.  It is decoupled from the effective quota value, and hence
allows future tuners to set a zero effective quota to deactivate the
scheme on purpose.

Link: https://lkml.kernel.org/r/20260310010529.91162-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
SeongJae Park
8719c59c4b mm/damon/core: introduce damos_quota_goal_tuner
Patch series "mm/damon: support multiple goal-based quota tuning
algorithms".

Aim-oriented DAMOS quota auto-tuning uses a single tuning algorithm.  The
algorithm is designed to find a quota value that should be consistently
kept for achieving the aimed goal in the long term.  It is useful and
reliable for automatically operating systems in dynamic environments over
the long term.

As always, however, no single algorithm fits all.  When the environment
has static characteristics, or there are control towers not only in the
kernel space but also in the user space, the algorithm shows some
limitations.  In such environments, users want the kernel to work in a
more short-term, deterministic way.  There have been at least two reports
[1,2] of such cases.

Extend the DAMOS quota goal to support multiple quota tuning algorithms
that users can select from.  Keep the current algorithm as the default,
so as not to break existing users.  Also give it a name, "consist", as it
is designed to "consistently" apply the DAMOS action.  And introduce a
new tuning algorithm, namely "temporal".  It is designed to apply the
DAMOS action only temporarily, in a deterministic way.  In more detail,
as long as the goal is under-achieved, it uses the maximum quota
available.  Once the goal is over-achieved, it sets the quota to zero.

Tests
=====

I confirmed the feature is working as expected using the latest version of
the DAMON user-space tool, as below.

    $ # start DAMOS for reclaiming memory aiming 30% free memory
    $ sudo ./damo/damo start --damos_action pageout \
            --damos_quota_goal_tuner temporal \
            --damos_quota_goal node_mem_free_bp 30% 0 \
            --damos_quota_interval 1s \
            --damos_quota_space 100M

Note that versions >= 3.1.8 of the DAMON user-space tool support this
feature (--damos_quota_goal_tuner).  As expected, DAMOS stops reclaiming
memory as soon as the goal amount of free memory is reached.  When the
'consist' tuner is used, reclamation continues even after the goal amount
of free memory is reached, resulting in more than the goal amount of free
memory, again as expected.

Patch Sequence
==============

The first four patches implement the feature.  Patch 1 extends the core
API to allow multiple tuners and makes the current tuner the default and
only available tuner, namely 'consist'.  Patch 2 allows future tuners to
set a zero effective quota.  Patch 3 introduces the second tuner, namely
'temporal'.  Patch 4 further extends the DAMON sysfs API to let users use
it.

Three following patches (patches 5-7) update design, usage, and ABI
documents, respectively.

The final four patches (patches 8-11) add tests.  Patch 8 extends the
kunit test for online parameters commit to validate goal_tuner.  Patches
9-10 extend the testing-purpose DAMON sysfs control helper and the DAMON
status dumping tool to support the newly added feature.  Patch 11 extends
the existing online commit selftest to cover the new feature.

This patch (of 11):

The DAMOS quota goal feature uses a single feedback-loop based algorithm
for automatic tuning of the effective quota.  It is useful in dynamic
environments where systems are operated by the kernel alone over the long
term.  But no single algorithm fits all.  It is not easy to control in
environments having more static characteristics and user-space control
towers.  We have actually received multiple reports [1,2] of use cases
for which the algorithm is not optimal.

Introduce a new field of 'struct damos_quota', namely 'goal_tuner'.  It
specifies which tuning algorithm the given scheme should use, and allows
DAMON API callers to set it as they want.  Nonetheless, this commit
introduces no new tuning algorithm, but only the interface.  This commit
hence makes no behavioral change.  A new algorithm will be added by a
following commit.

Link: https://lkml.kernel.org/r/20260310010529.91162-2-sj@kernel.org
Link: https://lore.kernel.org/CALa+Y17__d=ZsM1yX+MXx0ozVdsXnFqF4p0g+kATEitrWyZFfg@mail.gmail.com [1]
Link: https://lore.kernel.org/20260204022537.814-1-yunjeong.mun@sk.com [2]
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
Hui Zhu
86e69c020b mm/swap: strengthen locking assertions and invariants in cluster allocation
swap_cluster_alloc_table() requires several locks to be held by its
callers: ci->lock, the per-CPU swap_cluster lock, and, for non-solid-state
devices (non-SWP_SOLIDSTATE), the si->global_cluster_lock.

While most call paths (e.g., via cluster_alloc_swap_entry() or
alloc_swap_scan_list()) correctly acquire these locks before invocation,
the path through swap_reclaim_work() -> swap_reclaim_full_clusters() ->
isolate_lock_cluster() is distinct.  This path operates exclusively on
si->full_clusters, where the swap allocation tables are guaranteed to be
already allocated.  Consequently, isolate_lock_cluster() should never
trigger a call to swap_cluster_alloc_table() for these clusters.

Strengthen the locking and state assertions to formalize these invariants:

1. Add a lockdep_assert_held() for si->global_cluster_lock in
   swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
2. Reorder existing lockdep assertions in swap_cluster_alloc_table() to
   match the actual lock acquisition order (per-CPU lock, then global lock,
   then cluster lock).
3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that table
   allocations are only attempted for clusters being isolated from the
   free list. Attempting to allocate a table for a cluster from other
   lists (like the full list during reclaim) indicates a violation of
   subsystem invariants.

These changes ensure locking consistency and help catch potential
synchronization or logic issues during development.

[zhuhui@kylinos.cn: remove redundant comment, per Barry]
  Link: https://lkml.kernel.org/r/20260311022241.177801-1-hui.zhu@linux.dev
[zhuhui@kylinos.cn: initialize `flags', per Chris]
  Link: https://lkml.kernel.org/r/20260312023024.903143-1-hui.zhu@linux.dev
Link: https://lkml.kernel.org/r/20260310015657.42395-1-hui.zhu@linux.dev
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Reviewed-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
Anthony Yznaga
d239462787 mm: prevent droppable mappings from being locked
Droppable mappings must not be lockable.  There is a check for VMAs with
VM_DROPPABLE set in mlock_fixup() along with checks for other types of
unlockable VMAs which ensures this when calling mlock()/mlock2().

For mlockall(MCL_FUTURE), the check for unlockable VMAs is different.  In
apply_mlockall_flags(), if the flags parameter has MCL_FUTURE set, the
current task's mm's default VMA flag field mm->def_flags has VM_LOCKED
applied to it.  VM_LOCKONFAULT is also applied if MCL_ONFAULT is also set.
When these flags are set as default in this manner they are cleared in
__mmap_complete() for new mappings that do not support mlock.  A check for
VM_DROPPABLE in __mmap_complete() is missing resulting in droppable
mappings created with VM_LOCKED set.  To fix this and reduce that chance
of similar bugs in the future, introduce and use vma_supports_mlock().

Link: https://lkml.kernel.org/r/20260310155821.17869-1-anthony.yznaga@oracle.com
Fixes: 9651fcedf7 ("mm: add MAP_DROPPABLE for designating always lazily freeable mappings")
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Suggested-by: David Hildenbrand <david@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Tested-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
Sergey Senozhatsky
301f392200 zram: unify and harden algo/priority params handling
We have two functions that accept algo= and priority= params -
algorithm_params_store() and recompress_store().  This patch unifies and
hardens handling of those parameters.

There are 4 possible cases:

- only priority= provided [recommended]
  We need to verify that provided priority value is
  within permitted range for each particular function.

- both algo= and priority= provided
  We cannot prioritize one over the other.  All we
  should do is verify that zram is configured in the
  way that user-space expects it to be.  Namely, that
  zram indeed has compressor algo= set up at the given
  priority=.

- only algo= provided [not recommended]
  We should lookup priority in compressors list.

- none provided [not recommended]
  Just use function's defaults.

Link: https://lkml.kernel.org/r/20260311084312.1766036-7-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Minchan Kim <minchan@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: gao xu <gaoxu2@honor.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
Sergey Senozhatsky
cedfa028b5 zram: remove chained recompression
Chained recompression has unpredictable behavior and is not useful in
practice.

First, systems usually configure just one alternative recompression
algorithm, which has slower compression/decompression but better
compression ratio.  A single alternative algorithm doesn't need chaining.

Second, even with multiple recompression algorithms, chained recompression
is suboptimal.  If a lower priority algorithm succeeds, the page is never
attempted with a higher priority algorithm, leading to worse memory
savings.  If a lower priority algorithm fails, the page is still attempted
with a higher priority algorithm, wasting resources on the failed lower
priority attempt.

In either case, the system would be better off targeting a specific
priority directly.

Chained recompression also significantly complicates the code.  Remove it.

Link: https://lkml.kernel.org/r/20260311084312.1766036-6-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: gao xu <gaoxu2@honor.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:24 -07:00
Sergey Senozhatsky
be5f13d948 zram: update recompression documentation
Emphasize usage of the `priority` parameter for recompression and explain
why `algo` parameter can lead to unexpected behavior and thus is not
recommended.

Link: https://lkml.kernel.org/r/20260311084312.1766036-5-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: gao xu <gaoxu2@honor.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:24 -07:00
Sergey Senozhatsky
5004a27edb zram: drop ->num_active_comps
It's not entirely correct to use ->num_active_comps for the max-prio
limit, as ->num_active_comps only gives the number of configured
algorithms, not the max configured priority.  For instance, in the
following theoretical example:

    [lz4] [nil] [nil] [deflate]

->num_active_comps is 2, while the actual max-prio is 3.

Drop ->num_active_comps and use ZRAM_MAX_COMPS instead.

Link: https://lkml.kernel.org/r/20260311084312.1766036-4-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Minchan Kim <minchan@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: gao xu <gaoxu2@honor.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:24 -07:00
Sergey Senozhatsky
ed19b9d550 zram: do not autocorrect bad recompression parameters
Do not silently autocorrect bad recompression priority parameter value and
just error out.

Link: https://lkml.kernel.org/r/20260311084312.1766036-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Minchan Kim <minchan@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: gao xu <gaoxu2@honor.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:24 -07:00
Sergey Senozhatsky
241f9005b1 zram: do not permit params change after init
Patch series "zram: recompression cleanups and tweaks", v2.

This series is a somewhat random mix of fixups, recompression cleanups
and improvements, partly based on internal conversations.  A few patches
in the series remove unexpected or confusing behaviour, e.g.
auto-correction of a bad priority= param for recompression, which should
have always been just an error.  It also removes "chained recompression",
which has tricky, unexpected and at times confusing behaviour.  We also
unify and harden the handling of the algo/priority params.  There is also
the addition of a missing device lock in algorithm_params_store(), which
previously permitted modification of algo params while the device is
active.


This patch (of 6):

First, algorithm_params_store(), like any sysfs handler, should grab
device lock.

Second, like any write() sysfs handler, it should grab device lock in
exclusive mode.

Third, it should not permit changing algos' parameters after device init,
as this doesn't make sense - we cannot compress with one C/D dict and
then just change the C/D dict to a different one, for example.

Another thing to notice is that algorithm_params_store() accesses the
device's ->comp_algs for algo priority lookup, which in general should be
protected by the device lock in exclusive mode.

Link: https://lkml.kernel.org/r/20260311084312.1766036-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20260311084312.1766036-2-senozhatsky@chromium.org
Fixes: 4eac932103 ("zram: introduce algorithm_params device attribute")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Brian Geffon <bgeffon@google.com>
Cc: gao xu <gaoxu2@honor.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:24 -07:00
Pratyush Yadav
22bdab8e98 kho: drop restriction on maximum page order
KHO currently restricts the maximum order of a restored page to the
maximum order supported by the buddy allocator.  While this works fine
for much of the data passed across kexec, it is possible to have pages of
a larger order than MAX_PAGE_ORDER.

For one, kho_preserve_pages() can produce a larger order if the number of
pages is large enough, since it tries to combine multiple aligned 0-order
preservations into one higher-order preservation.

For another, upcoming support for hugepages can have gigantic hugepages
being preserved over KHO.

There is no real reason for this limit.  The KHO preservation machinery
can handle any page order.  Remove this artificial restriction on max page
order.

Link: https://lkml.kernel.org/r/20260309123410.382308-2-pratyush@kernel.org
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:24 -07:00
Pratyush Yadav (Google)
91e74fa8b1 kho: make sure preservations do not span multiple NUMA nodes
The KHO restoration machinery is not capable of dealing with preservations
that span multiple NUMA nodes.  kho_preserve_folio() guarantees the
preservation will only span one NUMA node since folios can't span multiple
nodes.

This leaves kho_preserve_pages().  Semantically, kho_preserve_pages()
only deals with 0-order pages, so all preservations should be single
pages.  In practice, however, it combines preservations into higher
orders for efficiency.  This can result in a preservation spanning
multiple nodes.  Break up such a preservation into smaller orders when
that happens.

Link: https://lkml.kernel.org/r/20260309123410.382308-1-pratyush@kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:24 -07:00
David Hildenbrand (Arm)
396042fb2b KVM: PPC: remove hugetlb.h inclusion
hugetlb.h is no longer required now that we moved vma_kernel_pagesize() to
mm.h.

Link: https://lkml.kernel.org/r/20260309151901.123947-5-david@kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
David Hildenbrand (Arm)
e8301b6adc KVM: remove hugetlb.h inclusion
hugetlb.h is no longer required now that we moved vma_kernel_pagesize() to
mm.h.

Link: https://lkml.kernel.org/r/20260309151901.123947-4-david@kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
David Hildenbrand (Arm)
a9496e9e4b mm: move vma_mmu_pagesize() from hugetlb to vma.c
vma_mmu_pagesize() is also queried on non-hugetlb VMAs and does not really
belong into hugetlb.c.

PPC64 provides a custom override with CONFIG_HUGETLB_PAGE, see
arch/powerpc/mm/book3s64/slice.c, so we cannot easily make this a static
inline function.

So let's move it to vma.c and add some proper kerneldoc.

To make vma tests happy, add a simple vma_kernel_pagesize() stub in
tools/testing/vma/include/custom.h.

Link: https://lkml.kernel.org/r/20260309151901.123947-3-david@kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
David Hildenbrand (Arm)
341ffe82a7 mm: move vma_kernel_pagesize() from hugetlb to mm.h
Patch series "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c", v2.

Looking into vma_(kernel|mmu)_pagesize(), I realized that there is one
scenario where DAX would not do the right thing when the kernel is not
compiled with hugetlb support.

Without hugetlb support, vma_(kernel|mmu)_pagesize() will always return
PAGE_SIZE instead of using the ->pagesize() result provided by dax-device
code.

Fix that by moving vma_kernel_pagesize() to core MM code, where it
belongs.  I don't think this is stable material, but am not 100% sure.

Also, move vma_mmu_pagesize() while at it.  Remove the unnecessary
hugetlb.h inclusion from KVM code.


This patch (of 4):

In the past, only hugetlb had special "vma_kernel_pagesize()"
requirements, so it provided its own implementation.

In commit 05ea88608d ("mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct") we generalized that approach by providing a
vm_ops->pagesize() callback to be used by device-dax.

Once device-dax started using that callback in commit c1d53b92b9
("device-dax: implement ->pagesize() for smaps to report MMUPageSize") it
was missed that CONFIG_DEV_DAX does not depend on hugetlb support.

So building a kernel with CONFIG_DEV_DAX but without CONFIG_HUGETLBFS
would not pick up that value.

Fix it by moving vma_kernel_pagesize() to mm.h, providing only a single
implementation.  While at it, improve the kerneldoc a bit.

Ideally, we'd move vma_mmu_pagesize() to the header as well.  However,
its __weak symbol might be overridden by a PPC variant in hugetlb code.
So let's leave it there for now, as it really only matters for some
hugetlb oddities.

This was found by code inspection.

Link: https://lkml.kernel.org/r/20260309151901.123947-1-david@kernel.org
Link: https://lkml.kernel.org/r/20260309151901.123947-2-david@kernel.org
Fixes: c1d53b92b9 ("device-dax: implement ->pagesize() for smaps to report MMUPageSize")
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
Akinobu Mita
1eba4c9599 docs: mm: fix typo in numa_memory_policy.rst
Fix a typo: MPOL_INTERLEAVED -> MPOL_INTERLEAVE.

Link: https://lkml.kernel.org/r/20260310151837.5888-1-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
SeongJae Park
a4e82de81f Docs/mm/damon/index: fix typo: autoamted -> automated
There is an obvious typo.  Fix it (s/autoamted/automated/).

Link: https://lkml.kernel.org/r/20260307195356.203753-8-sj@kernel.org
Fixes: 32d11b3208 ("Docs/mm/damon/index: simplify the intro")
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: wang lian <lianux.mm@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
SeongJae Park
20675fc8c0 Docs/mm/damon/maintainer-profile: use flexible review cadence
The document mentions that the maintainer is working in the usual 9-5
fashion.  The maintainer nowadays prefers working in a more flexible way.
Update the document so that contributors do not get wrong expectations
about response times.

Link: https://lkml.kernel.org/r/20260307195356.203753-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: wang lian <lianux.mm@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
SeongJae Park
d7f00084f6 Docs/admin-guide/mm/damon/lru_sort: fix intervals autotune parameter name
The section name should be the same as the parameter name.  Fix it.

Link: https://lkml.kernel.org/r/20260307195356.203753-6-sj@kernel.org
Fixes: ed581147a4 ("Docs/admin-guide/mm/damon/lru_sort: document intervals autotuning")
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: wang lian <lianux.mm@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:22 -07:00
SeongJae Park
3802e1d98e mm/damon: document non-zero length damon_region assumption
DAMON regions are assumed to always be non-zero length.  There was
confusion [1] about this, probably due to the lack of documentation.
Document it.

Link: https://lkml.kernel.org/r/20260307195356.203753-5-sj@kernel.org
Link: https://lore.kernel.org/20251231070029.79682-1-sj@kernel.org/ [1]
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: wang lian <lianux.mm@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:22 -07:00
SeongJae Park
2a5f4454e0 mm/damon/core: clarify damon_set_attrs() usages
damon_set_attrs() is called for multiple purposes from multiple places.
Calling it in an unsafe context can pollute DAMON's internal state and
result in unexpected behaviors.  Clarify when it is safe to call, and
where it is being called.

Link: https://lkml.kernel.org/r/20260307195356.203753-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: wang lian <lianux.mm@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:22 -07:00
SeongJae Park
fd83b0d1c4 mm/damon/tests/core-kunit: add a test for damon_is_last_region()
There was a bug [1] in damon_is_last_region().  Add a kunit test to avoid
reintroducing it.

Link: https://lkml.kernel.org/r/20260307195356.203753-3-sj@kernel.org
Link: https://lore.kernel.org/20260114152049.99727-1-sj@kernel.org/ [1]
Signed-off-by: SeongJae Park <sj@kernel.org>
Tested-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:22 -07:00
SeongJae Park
5d6a520aff mm/damon/core: use mult_frac()
Patch series "mm/damon: improve/fixup/update ratio calculation, test and
documentation".

Yet another batch of misc/minor improvements and fixups.  Use mult_frac()
instead of open-coded rate calculations (patch 1).  Add a test for a
previously found and fixed bug (patch 2).  Improve and update comments and
documentation for easier code review and up-to-date information (patches
3-6).  Finally, fix an obvious typo (patch 7).


This patch (of 7):

There are multiple places in the core code that open-code rate
calculations.  Use mult_frac(), which exists for exactly this purpose and
is safer against overflow and precision loss.

Link: https://lkml.kernel.org/r/20260307195356.203753-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260307195356.203753-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: wang lian <lianux.mm@gmail.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:22 -07:00
SeongJae Park
23754a36cd mm/damon/core: use time_after_eq() in kdamond_fn()
damon_ctx->passed_sample_intervals and damon_ctx->next_*_sis are unsigned
long.  Those are compared in kdamond_fn() using normal comparison
operators, which is not overflow-safe.  Use time_after_eq(), which is
safe against overflows when correctly used, instead.

Link: https://lkml.kernel.org/r/20260307194915.203169-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:22 -07:00
SeongJae Park
f05e253637 mm/damon/core: use time_before() for next_apply_sis
damon_ctx->passed_sample_intervals and damos->next_apply_sis are unsigned
long, and compared via normal comparison operators, which is not
overflow-safe.  Use time_before(), which is safe against overflows when
correctly used, instead.

Link: https://lkml.kernel.org/r/20260307194915.203169-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:22 -07:00
SeongJae Park
7e6c650fdb mm/damon/core: remove damos_set_next_apply_sis() duplicates
Patch series "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe".

DAMON accounts time using its own jiffies-like time counter, namely
damon_ctx->passed_sample_intervals.  The counter is incremented on each
iteration of kdamond_fn() main loop, which sleeps at least one sample
interval.  Hence the name is like that.

DAMON has time-periodic operations including monitoring results
aggregation and DAMOS action application.  DAMON sets the next time to do
each of such operations in the passed_sample_intervals unit.  It does the
operation when the counter becomes equal to or larger than the pre-set
value, and then updates the next time for the operation.  Note that the
operation is done not only when the values exactly match but also when the
time has passed, because the values can be updated for online-committed
DAMON parameters.

The counter is of 'unsigned long' type, and the comparison is done using
normal comparison operators, which is not overflow-safe.  This can cause
rare and limited but odd situations.

Let's suppose there is an operation that should be executed every 20
sampling intervals, and the passed_sample_intervals value for the next
execution of the operation is ULONG_MAX - 3.  Once
passed_sample_intervals reaches ULONG_MAX - 3, the operation will be
executed, and the next time value for doing the operation becomes 16
(ULONG_MAX - 3 + 20, wrapped around), since overflow happens.  In the
next iteration of the kdamond_fn() main loop, passed_sample_intervals is
larger than the next operation time value, so the operation will be
executed again.  It will continue executing the operation on every
iteration, until passed_sample_intervals also overflows.

Note that this will not be common or problematic in the real world.  The
sampling interval, which each passed_sample_intervals increment takes, is
5 ms by default, and is usually [auto-]tuned to hundreds of milliseconds.
That means it takes about 248 days or 4,971 days to overflow on 32 bit
machines when the sampling interval is 5 ms or 100 ms, respectively
(1<<32 * sampling_interval_in_seconds / 3600 / 24).  On 64 bit machines,
the numbers become about 2.9 billion and 58.5 billion years.  So the real
user impact is negligible.  But it is still better to fix this, as long as
the fix is simple and efficient.

Fix this by simply replacing the overflow-unsafe native comparison
operators with the existing overflow-safe time comparison helpers.

The first patch only cleans up the next DAMOS action application time
setup for consistency and reduced code.  The second and the third patches
update DAMOS action application time setup and rest, respectively.


This patch (of 3):

There is a function for damos->next_apply_sis setup, but some places
open-code it.  Use the helper consistently.

Link: https://lkml.kernel.org/r/20260307194915.203169-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:21 -07:00
SeongJae Park
bfb1523cde Docs/mm/damon/design: document the power-of-two limitation for addr_unit
The min_region_sz is set as max(DAMON_MIN_REGION_SZ / addr_unit, 1).
DAMON_MIN_REGION_SZ is the same as PAGE_SIZE, and addr_unit is what the
user can arbitrarily set.  Commit c80f46ac22 ("mm/damon/core: disallow
non-power of two min_region_sz") made min_region_sz always a power of
two.  Hence, addr_unit should be a power of two when it is smaller than
PAGE_SIZE.  Although 'addr_unit' is a user-exposed parameter, the rule is
not documented.  This can confuse users.  Specifically, if the user sets
addr_unit to a value that is smaller than PAGE_SIZE and not a power of
two, the setup will explicitly fail.

Document the rule in the design document.  Usage documents reference the
design document for details, so updating only the design document should
suffice.

Link: https://lkml.kernel.org/r/20260307194222.202075-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:21 -07:00
SeongJae Park
a260de7d45 mm/damon/tests/core-kunit: add a test for damon_commit_ctx()
Patch series "mm/damon: test and document power-of-2 min_region_sz
requirement".

Since commit c80f46ac22 ("mm/damon/core: disallow non-power of two
min_region_sz"), min_region_sz is always restricted to be a power of two.
Add a kunit test to confirm the functionality.  Also, the change adds a
restriction to the addr_unit parameter.  Clarify it in the document.


This patch (of 2):

Add a kunit test confirming that the change made in commit c80f46ac22
("mm/damon/core: disallow non-power of two min_region_sz") functions as
expected.

Link: https://lkml.kernel.org/r/20260307194222.202075-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:21 -07:00
SeongJae Park
300252ebb1 selftests/damon/config: enable DAMON_DEBUG_SANITY
CONFIG_DAMON_DEBUG_SANITY is recommended for DAMON development and test
setups.  Enable it on the build config for DAMON selftests.

Link: https://lkml.kernel.org/r/20260306152914.86303-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:21 -07:00
SeongJae Park
09cbdf7dbe mm/damon/tests/.kunitconfig: enable DAMON_DEBUG_SANITY
CONFIG_DAMON_DEBUG_SANITY is recommended for DAMON development and test
setups.  Enable it on the default configurations for DAMON kunit test run.

Link: https://lkml.kernel.org/r/20260306152914.86303-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:21 -07:00
SeongJae Park
c556187b6e mm/damon/core: add damon_reset_aggregated() debug_sanity check
At the time of damon_reset_aggregated(), aggregation for the interval
should be completed, and hence nr_accesses and nr_accesses_bp should
match.  A few past bugs broke this assumption, caused by online parameter
updates and complicated nr_accesses handling changes.  Add a sanity check
for that under CONFIG_DAMON_DEBUG_SANITY.

Link: https://lkml.kernel.org/r/20260306152914.86303-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:21 -07:00
SeongJae Park
6aa1f78354 mm/damon/core: add damon_split_region_at() debug_sanity check
damon_split_region_at() should be called with the correct address to split
on.  Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY.

Link: https://lkml.kernel.org/r/20260306152914.86303-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:20 -07:00
SeongJae Park
c070da2391 mm/damon/core: add damon_merge_regions_of() debug_sanity check
damon_merge_regions_of() should be called only after aggregation is
finished and therefore each region's nr_accesses and nr_accesses_bp match.
There were bugs that broke this assumption during the development of
online DAMON parameter updates and monitoring results handling changes.
Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY.

Link: https://lkml.kernel.org/r/20260306152914.86303-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:20 -07:00
SeongJae Park
0bb7682fdb mm/damon/core: add damon_merge_two_regions() debug_sanity check
A data corruption could cause damon_merge_two_regions() to create
zero-length DAMON regions.  Add a sanity check for that under
CONFIG_DAMON_DEBUG_SANITY.

Link: https://lkml.kernel.org/r/20260306152914.86303-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:20 -07:00
SeongJae Park
242a764abe mm/damon/core: add damon_nr_regions() debug_sanity check
damon_target->nr_regions is introduced to get the number of regions
quickly, without having to always iterate the regions list.  Add a sanity
check that it stays consistent with the list, under
CONFIG_DAMON_DEBUG_SANITY.

Link: https://lkml.kernel.org/r/20260306152914.86303-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:20 -07:00
SeongJae Park
9a647920d0 mm/damon/core: add damon_del_region() debug_sanity check
damon_del_region() should be called for targets that have one or more
regions.  Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY.

Link: https://lkml.kernel.org/r/20260306152914.86303-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:20 -07:00
SeongJae Park
b0264a951c mm/damon/core: add damon_new_region() debug_sanity check
damon_new_region() is supposed to be called with only valid address range
arguments.  Do the check under DAMON_DEBUG_SANITY.

Link: https://lkml.kernel.org/r/20260306152914.86303-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:20 -07:00
SeongJae Park
62f0582875 mm/damon: add CONFIG_DAMON_DEBUG_SANITY
Patch series "mm/damon: add optional debugging-purpose sanity checks".

DAMON code has a few assumptions that can be critical if violated.
Validating the assumptions in code can be useful for finding such critical
bugs.  I was actually adding some such additional sanity checks in my
personal tree, and those were useful for finding bugs that I made during
the development of new patches.  We also found [1] that the assumptions
are sometimes misunderstood.  The validation can work as good
documentation for such cases.

Add some of such debugging-purpose sanity checks.  Because those
additional checks can impose more overhead, make them optional via a new
config, CONFIG_DAMON_DEBUG_SANITY, that is recommended for only
development and test setups.  As recommended, enable it for DAMON kunit
tests and selftests.

Note that the verification only does WARN_ON() for each insanity.  The
developer or tester may want to set panic_on_oops together with it, like
damon-tests/corr does [2].


This patch (of 10):

Add a new build config that enables additional DAMON sanity checks.  It
is recommended to be enabled only on development and test setups, since it
can impose additional overhead.

Link: https://lkml.kernel.org/r/20260306152914.86303-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260306152914.86303-2-sj@kernel.org
Link: https://lore.kernel.org/20251231070029.79682-1-sj@kernel.org [1]
Link: a80fbee55e [2]
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:20 -07:00
Usama Arif
5a14198ec6 mm/migrate_device: document folio_get requirement before frozen PMD split
split_huge_pmd_address() with freeze=true splits a PMD migration entry
into PTE migration entries, consuming one folio reference in the process. 
The folio_get() before it provides this reference.

Add a comment explaining this relationship.  The expected folio refcount
at the start of migrate_vma_split_unmapped_folio() is 1.

Link: https://lkml.kernel.org/r/20260309212502.3922825-1-usama.arif@linux.dev
Signed-off-by: Usama Arif <usama.arif@linux.dev>
Suggested-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:19 -07:00
Arnd Bergmann
d765108993 ubsan: turn off kmsan inside of ubsan instrumentation
The structure initialization in the two type mismatch handling functions
causes a call to __msan_memset() to be generated inside of a UACCESS
block, which in turn leads to an objtool warning about possibly leaking
uaccess-enabled state:

lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch+0xda: call to __msan_memset() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1+0xf4: call to __msan_memset() with UACCESS enabled

Most likely __msan_memset() is safe to be called here and could be added
to the uaccess_safe_builtin[] list of safe functions, but seeing that the
ubsan file already has kasan, ubsan, and kcsan instrumentation disabled,
it is probably a good idea to also turn off kmsan here.  In particular,
this also avoids the risk of recursion between ubsan and kmsan checks in
other functions of this file.

I saw this happen while testing randconfig builds with clang-22, but did
not try older versions, or attempt to see which kernel change introduced
the warning.

Link: https://lkml.kernel.org/r/20260306150613.350029-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:19 -07:00
Byungchul Park
db359fccf2 mm: introduce a new page type for page pool in page type
Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
determine if a page belongs to a page pool.  However, with the planned
removal of @pp_magic, we should instead leverage the page_type in struct
page, such as PGTY_netpp, for this purpose.

Introduce and use the page type APIs e.g.  PageNetpp(), __SetPageNetpp(),
and __ClearPageNetpp() instead, and remove the existing APIs accessing
@pp_magic e.g.  page_pool_page_is_pp(), netmem_or_pp_magic(), and
netmem_clear_pp_magic().

Plus, add @page_type to struct net_iov at the same offset as in struct
page, so as to use the page_type APIs for struct net_iov as well.  While
at it, reorder @type and @owner in struct net_iov to avoid a hole and an
increase in the struct size.

This work was inspired by the following link:

  https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/

While at it, move the sanity check for page pool to the free path.

[byungchul@sk.com: gate the sanity check, per Johannes]
  Link: https://lkml.kernel.org/r/20260316223113.20097-1-byungchul@sk.com
Link: https://lkml.kernel.org/r/20260224051347.19621-1-byungchul@sk.com
Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Wei <dw@davidwei.uk>
Cc: Dragos Tatulea <dtatulea@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mark Bloch <mbloch@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:19 -07:00