Kemeng Shi
a5cdbe9f37
mm: shmem: only remove inode from swaplist when its swapped page count is 0
...
Even if we fail to allocate a swap entry, the inode might have a previously
allocated entry, and we might take an inode still containing swap entries
off the swaplist. As a result, try_to_unuse() may enter a potential dead
loop, repeatedly looking for the inode to clean its swap entries. Only take
the inode off the swaplist when its swapped page count is 0 to fix the issue.
Link: https://lkml.kernel.org/r/20250516170939.965736-5-shikemeng@huaweicloud.com
Fixes: b487a2da35 ("mm, swap: simplify folio swap allocation")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com >
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com >
Reviewed-by: Kairui Song <kasong@tencent.com >
Reported-by: kernel test robot <oliver.sang@intel.com >
Closes: https://lore.kernel.org/oe-lkp/202505161438.9009cf47-lkp@intel.com
Cc: Hugh Dickins <hughd@google.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:10 -07:00
Kemeng Shi
3f778ab1b5
mm/shmem: fix potential dead loop in shmem_unuse()
...
If multiple shmem_unuse() calls for different swap types run concurrently,
a dead loop could occur as follows:
shmem_unuse(typeA)               shmem_unuse(typeB)
 mutex_lock(&shmem_swaplist_mutex)
 list_for_each_entry_safe(info, next, ...)
 ...
 mutex_unlock(&shmem_swaplist_mutex)
 /* info->swapped may drop to 0 */
 shmem_unuse_inode(&info->vfs_inode, type)
                                  mutex_lock(&shmem_swaplist_mutex)
                                  list_for_each_entry(info, next, ...)
                                    if (!info->swapped)
                                      list_del_init(&info->swaplist)
                                  ...
                                  mutex_unlock(&shmem_swaplist_mutex)
 mutex_lock(&shmem_swaplist_mutex)
 /* iterate with offlist entry and encounter a dead loop */
 next = list_next_entry(info, swaplist);
 ...
Restart the iteration if the inode is already off the shmem_swaplist to
fix the issue.
Link: https://lkml.kernel.org/r/20250516170939.965736-4-shikemeng@huaweicloud.com
Fixes: b56a2d8af9 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com >
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com >
Cc: Hugh Dickins <hughd@google.com >
Cc: Kairui Song <kasong@tencent.com >
Cc: kernel test robot <oliver.sang@intel.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:10 -07:00
Kemeng Shi
594ec2ab38
mm: shmem: add missing shmem_unacct_size() in __shmem_file_setup()
...
shmem_unacct_size() is skipped when is_idmapped_mnt() returns a failure.
Move the is_idmapped_mnt() check before shmem_acct_size() to fix the issue.
Link: https://lkml.kernel.org/r/20250516170939.965736-3-shikemeng@huaweicloud.com
Fixes: 7a80e5b8c6 ("shmem: support idmapped mounts for tmpfs")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com >
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com >
Cc: Hugh Dickins <hughd@google.com >
Cc: Kairui Song <kasong@tencent.com >
Cc: kernel test robot <oliver.sang@intel.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:10 -07:00
Kemeng Shi
e08d5f5156
mm: shmem: avoid unpaired folio_unlock() in shmem_swapin_folio()
...
Patch series "Some random fixes and cleanup to shmem", v3.
This series contains some simple fixes and cleanup which are made during
learning shmem. More details can be found in respective patches.
This patch (of 5):
If we get a folio from swap_cache_get_folio() successfully but encounter a
failure before the folio is locked, we will unlock a folio that was never
locked.
Put the folio and set it to NULL when a failure occurs before the folio is
locked to fix the issue.
Link: https://lkml.kernel.org/r/20250516170939.965736-1-shikemeng@huaweicloud.com
Link: https://lkml.kernel.org/r/20250516170939.965736-2-shikemeng@huaweicloud.com
Fixes: 058313515d ("mm: shmem: fix potential data corruption during shmem swapin")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com >
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com >
Reviewed-by: Kairui Song <kasong@tencent.com >
Cc: Hugh Dickins <hughd@google.com >
Cc: kernel test robot <oliver.sang@intel.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:10 -07:00
Akinobu Mita
8e1c4961f4
mm/damon/core: avoid destroyed target reference from DAMOS quota
...
When the number of monitoring targets in running contexts is reduced,
there may be DAMOS quotas still referencing targets that will be
destroyed. Applying the scheme action for such a DAMOS scheme will be
skipped forever, as it keeps looking for the starting part of a region of
the destroyed monitoring target.
To fix this issue, when a monitoring target is destroyed, reset the
starting part for all DAMOS quotas that reference the target.
Link: https://lkml.kernel.org/r/20250517141852.142802-1-akinobu.mita@gmail.com
Fixes: da87878010 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com >
Reviewed-by: SeongJae Park <sj@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:09 -07:00
Shakeel Butt
3ac4638a73
memcg: make memcg_rstat_updated nmi safe
...
Currently the kernel maintains memory-related stats updates per-cgroup to
optimize stats flushing. stats_updates is defined as atomic64_t, which is
not nmi-safe on some archs. We don't really need a 64-bit atomic, as the
max value stats_updates can reach should be less than nr_cpus *
MEMCG_CHARGE_BATCH; a normal atomic_t suffices.
Also, the function cgroup_rstat_updated() is still not nmi-safe, but there
is a parallel effort to make it nmi-safe, so until then let's ignore it in
nmi context.
Link: https://lkml.kernel.org/r/20250519063142.111219-6-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Cc: Tejun Heo <tj@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:09 -07:00
Shakeel Butt
15ca4fa904
memcg: nmi-safe slab stats updates
...
The objcg based kmem [un]charging can be called in nmi context and it may
need to update NR_SLAB_[UN]RECLAIMABLE_B stats. So, let's correctly
handle the updates of these stats in the nmi context.
Link: https://lkml.kernel.org/r/20250519063142.111219-5-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Cc: Tejun Heo <tj@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:09 -07:00
Shakeel Butt
9d3edf96ce
memcg: add nmi-safe update for MEMCG_KMEM
...
The objcg based kmem charging and uncharging code path needs to update
MEMCG_KMEM appropriately. Let's add support to update MEMCG_KMEM in
nmi-safe way for those code paths.
Link: https://lkml.kernel.org/r/20250519063142.111219-4-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Cc: Tejun Heo <tj@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:09 -07:00
Shakeel Butt
940b01fc8d
memcg: nmi safe memcg stats for specific archs
...
There are archs which have NMI but do not support this_cpu_* ops safely
in nmi context, while they do support safe atomic ops in nmi context. For
such archs, let's add infra to use atomic ops for the memcg stats which
can be updated in nmi.
At the moment, the memcg stats which get updated in the objcg charging
path are MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B.
Rather than adding support for all memcg stats to be nmi safe, let's just
add infra to make these three stats nmi safe, which is what this patch does.
Link: https://lkml.kernel.org/r/20250519063142.111219-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Cc: Tejun Heo <tj@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:08 -07:00
Shakeel Butt
25352d2f2d
memcg: disable kmem charging in nmi for unsupported arch
...
Patch series "memcg: nmi-safe kmem charging", v4.
Users can attach their BPF programs at arbitrary execution points in the
kernel, and such BPF programs may run in nmi context. In addition, these
programs can trigger memcg-charged kernel allocations in nmi context.
However, the memcg charging infra for kernel memory is not equipped to
handle nmi context on all architectures.
This series removes the hurdles to enable kmem charging in nmi context
for most archs. For archs without CONFIG_HAVE_NMI, this series is a noop.
For archs with NMI support that have
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS, the previous work to make memcg
stats re-entrant is sufficient to allow kmem charging in nmi context.
For archs with NMI support but without
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and with ARCH_HAVE_NMI_SAFE_CMPXCHG,
this series adds infra to support kmem charging in nmi context. Lastly,
on archs with NMI support but without either
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS or ARCH_HAVE_NMI_SAFE_CMPXCHG, kmem
charging in nmi context is not supported at all.
Most commonly used archs have support for
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS, and this series should be almost a
noop (other than making memcg_rstat_updated nmi safe) for such archs.
This patch (of 5):
The memcg accounting and stats code uses this_cpu_* and atomic* ops.
There are archs which define CONFIG_HAVE_NMI but define neither
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS nor ARCH_HAVE_NMI_SAFE_CMPXCHG, so
memcg accounting in nmi context cannot be supported on them.
Let's just disable memcg accounting in nmi context for such archs.
Link: https://lkml.kernel.org/r/20250519063142.111219-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20250519063142.111219-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Cc: Tejun Heo <tj@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:08 -07:00
Mark Brown
9abb8c208f
selftests/mm: deduplicate default page size test results in thuge-gen
...
The thuge-gen test program runs mmap() and shmget() tests for both every
available page size and the default page size, resulting in two tests for
the default size. These tests are distinct, since the flags in the default
case do not specify an explicit size; add the flags to the logged test
name to deduplicate.
Link: https://lkml.kernel.org/r/20250515-selfests-mm-thuge-gen-dup-v1-1-057d2836553f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org >
Acked-by: Dev Jain <dev.jain@arm.com >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:08 -07:00
Mark Brown
62973e3867
selftests/mm: deduplicate test logging in test_mlock_lock()
...
The mlock2-tests test_mlock_lock() test reports two test results with an
identical string: one reporting whether it successfully locked a block of
memory, and another reporting whether the lock is still present after an
unlock (following a similar pattern to other tests in the same program).
This confuses test automation, since the test string is used to
deduplicate tests; change the post-unlock test to report "Unlocked"
instead, like the other tests, to fix this.
Link: https://lkml.kernel.org/r/20250515-selftest-mm-mlock2-dup-v1-1-963d5d7d243a@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org >
Acked-by: Dev Jain <dev.jain@arm.com >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:08 -07:00
Sergey Senozhatsky
dc75a0d93b
zram: support deflate-specific params
...
Introduce support for algorithm-specific parameters in the
algorithm_params device attribute. The expected format is
algorithm.param=value.
For starters, add support for deflate.winbits parameter.
Link: https://lkml.kernel.org/r/20250514024825.1745489-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org >
Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com >
Cc: Minchan Kim <minchan@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:07 -07:00
Sergey Senozhatsky
a5ade2e9fa
zram: rename ZCOMP_PARAM_NO_LEVEL
...
Patch series "zram: support algorithm-specific parameters".
This patchset adds support for algorithm-specific parameters. For now,
only deflate-specific winbits can be configured, which fixes deflate
support on some s390 setups.
This patch (of 2):
Use a more generic name because this will be the default "un-set"
value for more params in the future.
Link: https://lkml.kernel.org/r/20250514024825.1745489-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20250514024825.1745489-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org >
Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com >
Cc: Minchan Kim <minchan@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:07 -07:00
Matthew Wilcox (Oracle)
d973692944
iov: remove copy_page_from_iter_atomic()
...
All callers now use copy_folio_from_iter_atomic(), so convert
copy_page_from_iter_atomic(). While I'm in there, use kmap_local_folio()
and pagefault_disable() instead of kmap_atomic(). That allows preemption
and/or task migration to happen during the copy_from_user(). Also use the
new folio_test_partial_kmap() predicate instead of open-coding it.
Link: https://lkml.kernel.org/r/20250514170607.3000994-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Alexander Viro <viro@zeniv.linux.org.uk >
Cc: Hugh Dickins <hughd@google.com >
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:07 -07:00
Matthew Wilcox (Oracle)
80ae99c572
ntfs3: use folios more in ntfs_compress_write()
...
Remove the local 'page' variable and do everything in terms of folios.
Removes the last user of copy_page_from_iter_atomic() and a hidden call to
compound_head() in ClearPageDirty().
Link: https://lkml.kernel.org/r/20250514170607.3000994-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com >
Cc: Alexander Viro <viro@zeniv.linux.org.uk >
Cc: Hugh Dickins <hughd@google.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:07 -07:00
Matthew Wilcox (Oracle)
acc53a0b4c
mm: rename page->index to page->__folio_index
...
All users of page->index have been converted to not refer to it any more.
Update a few pieces of documentation that were missed and prevent new
users from appearing (or at least make them easy to grep for).
Link: https://lkml.kernel.org/r/20250514181508.3019795-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org >
Acked-by: David Hildenbrand <david@redhat.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:06 -07:00
Matthew Wilcox (Oracle)
e94715982c
m68k: remove use of page->index
...
Switch to using struct ptdesc to store the markbits which will allow us to
remove index from struct page.
Link: https://lkml.kernel.org/r/20250516151332.3705351-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Geert Uytterhoeven <geert@linux-m68k.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-31 22:46:06 -07:00
Nikhil Dhama
c544a952ba
mm: pcp: increase pcp->free_count threshold to trigger free_high
...
In the old pcp design, pcp->free_factor was incremented in nr_pcp_free(),
which is invoked by free_pcppages_bulk(). So it used to increase
free_factor by 1 only when we tried to reduce the size of the pcp list,
and free_high used to trigger only for order > 0, order < costly_order,
and pcp->free_factor > 0.
For iperf3 I noticed that with the older design in kernel v6.6, the pcp
list was drained mostly when pcp->count > high (more often when count
goes above 530), and most of the time pcp->free_factor was 0, triggering
very few high order flushes.
But this changed in the current design, introduced in commit
6ccdcb6d3a ("mm, pcp: reduce detecting time of consecutive high order
page freeing"), where pcp->free_factor became pcp->free_count to keep
track of the number of pages freed contiguously. In this design,
pcp->free_count is incremented on every deallocation, irrespective of
whether the pcp list was reduced or not. And the logic to trigger
free_high is: pcp->free_count goes above batch (which is 63) and there
are two contiguous page frees without any allocation.
With this design, for iperf3, the pcp list is flushed more frequently
because the free_high heuristic triggers more often now. I observed that
the high order pcp list is drained as soon as both count and free_count
go above 63.
Due to this more aggressive high order flushing, applications doing
contiguous high order allocations need to go to the global list more
frequently.
On a 2-node AMD machine with 384 vCPUs on each node, connected via
Mellanox ConnectX-7, I am seeing a ~30% performance reduction if we
scale the number of iperf3 client/server pairs from 32 to 64.
Though this new design reduced the time to detect high order flushes,
for applications which allocate high order pages more frequently it may
flush the high order list prematurely. This motivates tuning how late
or early we should flush high order lists.
So, in this patch, we increase the pcp->free_count threshold to trigger
free_high from "batch" to "batch + pcp->high_min / 2", as suggested by
Ying [1]. In the original pcp->free_factor solution, free_high was
triggered for contiguous freeing with size ranging from "batch" to
"pcp->high + batch", so the average value is "batch + pcp->high / 2".
In the pcp->free_count solution, free_high is triggered for contiguous
freeing with size "batch". So, to restore the original behavior, we can
use the threshold "batch + pcp->high_min / 2".
This new threshold keeps high order pages on the pcp list for a longer
duration, which can help applications that do high order allocations
frequently.
With this patch, performance of iperf3 is restored, and scores for other
benchmarks on the same machine are as follows:
                      iperf3   lmbench3        netperf        kbuild
                               (AF_UNIX)  (SCTP_STREAM_MANY)
                      ------   ---------  -----------------   ------
v6.6  vanilla (base)    100       100            100            100
v6.12 vanilla            69       113             98.5           98.8
v6.12 + this patch      100       110.3          100.2           99.3
netperf-tcp:
6.12 6.12
vanilla this_patch
Hmean 64 732.14 ( 0.00%) 730.45 ( -0.23%)
Hmean 128 1417.46 ( 0.00%) 1419.44 ( 0.14%)
Hmean 256 2679.67 ( 0.00%) 2676.45 ( -0.12%)
Hmean 1024 8328.52 ( 0.00%) 8339.34 ( 0.13%)
Hmean 2048 12716.98 ( 0.00%) 12743.68 ( 0.21%)
Hmean 3312 15787.79 ( 0.00%) 15887.25 ( 0.63%)
Hmean 4096 17311.91 ( 0.00%) 17332.68 ( 0.12%)
Hmean 8192 20310.73 ( 0.00%) 20465.09 ( 0.76%)
Link: https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/ [1]
Link: https://lkml.kernel.org/r/20250407105219.55351-1-nikhil.dhama@amd.com
Fixes: 6ccdcb6d3a ("mm, pcp: reduce detecting time of consecutive high order page freeing")
Signed-off-by: Nikhil Dhama <nikhil.dhama@amd.com >
Suggested-by: Huang Ying <ying.huang@linux.alibaba.com >
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com >
Cc: Raghavendra K T <raghavendra.kt@amd.com >
Cc: Mel Gorman <mgorman@techsingularity.net >
Cc: Bharata B Rao <bharata@amd.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-27 19:38:27 -07:00
Fan Ni
05275594a3
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
...
In __unmap_hugepage_range(), the "page" pointer always points to the first
page of a huge page, which guarantees there is a folio associated with it.
Convert the "page" pointer to use folio.
Link: https://lkml.kernel.org/r/20250505182345.506888-6-nifan.cxl@gmail.com
Signed-off-by: Fan Ni <fan.ni@samsung.com >
Reviewed-by: Oscar Salvador <osalvador@suse.de >
Acked-by: David Hildenbrand <david@redhat.com >
Cc: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com >
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-27 19:38:26 -07:00
Fan Ni
7f4b6065d9
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
...
The function __unmap_hugepage_range() has two kinds of users:
1) unmap_hugepage_range(), which passes in the head page of a folio.
Since unmap_hugepage_range() already takes folio and there are no other
uses of the folio struct in the function, it is natural for
__unmap_hugepage_range() to take folio also.
2) All other uses, which pass in NULL pointer.
In both cases, we can pass in folio. Refactor __unmap_hugepage_range() to
take folio.
Link: https://lkml.kernel.org/r/20250505182345.506888-5-nifan.cxl@gmail.com
Signed-off-by: Fan Ni <fan.ni@samsung.com >
Acked-by: David Hildenbrand <david@redhat.com >
Reviewed-by: Oscar Salvador <osalvador@suse.de >
Cc: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com >
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-27 19:38:26 -07:00
Fan Ni
81edb1ba32
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
...
The function unmap_hugepage_range() has two kinds of users:
1) unmap_ref_private(), which passes in the head page of a folio. Since
unmap_ref_private() already takes folio and there are no other uses
of the folio struct in the function, it is natural for
unmap_hugepage_range() to take folio also.
2) All other uses, which pass in NULL pointer.
In both cases, we can pass in folio. Refactor unmap_hugepage_range() to
take folio.
Link: https://lkml.kernel.org/r/20250505182345.506888-4-nifan.cxl@gmail.com
Signed-off-by: Fan Ni <fan.ni@samsung.com >
Reviewed-by: Muchun Song <muchun.song@linux.dev >
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com >
Reviewed-by: Oscar Salvador <osalvador@suse.de >
Acked-by: David Hildenbrand <david@redhat.com >
Cc: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-27 19:38:26 -07:00
Fan Ni
b0752f1a70
mm/hugetlb: pass folio instead of page to unmap_ref_private()
...
Patch series "Let unmap_hugepage_range() and several related functions
take folio instead of page", v4.
This patch (of 4):
unmap_ref_private() has only a single user, which passes in &folio->page.
Let it take the folio directly.
Link: https://lkml.kernel.org/r/20250505182345.506888-2-nifan.cxl@gmail.com
Link: https://lkml.kernel.org/r/20250505182345.506888-3-nifan.cxl@gmail.com
Signed-off-by: Fan Ni <fan.ni@samsung.com >
Reviewed-by: Muchun Song <muchun.song@linux.dev >
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com >
Reviewed-by: Oscar Salvador <osalvador@suse.de >
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org >
Acked-by: David Hildenbrand <david@redhat.com >
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-27 19:38:26 -07:00
Shakeel Butt
200577f69f
memcg: objcg stock trylock without irq disabling
...
There is no need to disable irqs to use the objcg per-cpu stock, so let's
not do that; consume_obj_stock() and refill_obj_stock() will instead need
to use trylock to avoid deadlock against irq. One consequence of this
change is that a charge request from irq context may take the slowpath
more often, but that should be rare.
Link: https://lkml.kernel.org/r/20250514184158.3471331-8-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:39 -07:00
Shakeel Butt
0ccf1806d4
memcg: no stock lock for cpu hot-unplug
...
Previously, on cpu hot-unplug the kernel would call drain_obj_stock()
with the objcg local lock held. The local lock was not needed, as the
stock being accessed belongs to a dead cpu, but we kept it there to
disable irqs, since drain_obj_stock() may call mod_objcg_mlstate(), which
required irqs disabled. There is no need to disable irqs for
mod_objcg_mlstate() now, so we can remove the local lock altogether from
the cpu hot-unplug path.
Link: https://lkml.kernel.org/r/20250514184158.3471331-7-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Shakeel Butt
eee8a1778c
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
...
Let's make __mod_memcg_lruvec_state re-entrant safe and name it
mod_memcg_lruvec_state(). The only thing needed is to convert the usage
of __this_cpu_add() to this_cpu_add(). There are two callers of
mod_memcg_lruvec_state(), and one of them, __mod_objcg_mlstate(), becomes
re-entrant safe as well, so rename it mod_objcg_mlstate(). The last
caller, __mod_lruvec_state(), still calls __mod_node_page_state(), which
is not re-entrant safe yet, so keep it as is.
Link: https://lkml.kernel.org/r/20250514184158.3471331-6-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Shakeel Butt
e52401e724
memcg: make count_memcg_events re-entrant safe against irqs
...
Let's make count_memcg_events re-entrant safe against irqs. The only
thing needed is to convert the usage of __this_cpu_add() to
this_cpu_add(). In addition, with re-entrant safety, there is no need to
disable irqs. Also add warnings for in_nmi() as it is not safe against
nmi context.
Link: https://lkml.kernel.org/r/20250514184158.3471331-5-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Shakeel Butt
8814e3b869
memcg: make mod_memcg_state re-entrant safe against irqs
...
Let's make mod_memcg_state re-entrant safe against irqs. The only thing
needed is to convert the usage of __this_cpu_add() to this_cpu_add(). In
addition, with re-entrant safety, there is no need to disable irqs.
mod_memcg_state() is not safe against nmi, so let's add a warning if
someone tries to call it in nmi context.
Link: https://lkml.kernel.org/r/20250514184158.3471331-4-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Shakeel Butt
c7163535cd
memcg: move preempt disable to callers of memcg_rstat_updated
...
Let's move the explicit preempt disable code to the callers of
memcg_rstat_updated, and remove memcg_stats_lock and the related
functions which ensured that callers of the stats update functions had
disabled preemption, because now the stats update functions disable
preemption explicitly themselves.
Link: https://lkml.kernel.org/r/20250514184158.3471331-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Shakeel Butt
8a4b42b955
memcg: memcg_rstat_updated re-entrant safe against irqs
...
Patch series "memcg: make memcg stats irq safe", v2.
This series converts memcg stats to be irq safe, i.e. memcg stats can be
updated in any context (task, softirq or hardirq) without disabling
irqs. This is still not nmi-safe on all architectures, but after this
series, making memcg charging and stats nmi-safe will be easier.
This patch (of 7):
memcg_rstat_updated() is used to track the memcg stats updates for
optimizing the flushes. At the moment, it is not re-entrant safe and the
callers disabled irqs before calling. However to achieve the goal of
updating memcg stats without irqs, memcg_rstat_updated() needs to be
re-entrant safe against irqs.
This patch makes memcg_rstat_updated() re-entrant safe using this_cpu_*
ops. On archs with CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS, this patch is
also making memcg_rstat_updated() nmi safe.
[lorenzo.stoakes@oracle.com: fix build]
Link: https://lkml.kernel.org/r/22f69e6e-7908-4e92-96ca-5c70d535c439@lucifer.local
Link: https://lkml.kernel.org/r/20250514184158.3471331-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20250514184158.3471331-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev >
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Reviewed-by: Vlastimil Babka <vbabka@suse.cz >
Tested-by: Alexei Starovoitov <ast@kernel.org >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Michal Hocko <mhocko@kernel.org >
Cc: Muchun Song <muchun.song@linux.dev >
Cc: Roman Gushchin <roman.gushchin@linux.dev >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Baolin Wang
cc79061b8f
mm: khugepaged: decouple SHMEM and file folios' collapse
...
Originally, the file pages collapse was intended for tmpfs/shmem to merge
into THP in the background. However, now not only tmpfs/shmem can support
large folios, but some other file systems (such as XFS, erofs ...) also
support large folios. Therefore, it is time to decouple the support of
file folios collapse from SHMEM.
Link: https://lkml.kernel.org/r/ce5c2314e0368cf34bda26f9bacf01c982d4da17.1747119309.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com >
Acked-by: David Hildenbrand <david@redhat.com >
Acked-by: Zi Yan <ziy@nvidia.com >
Cc: Dev Jain <dev.jain@arm.com >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Cc: Mariano Pache <npache@redhat.com >
Cc: Michal Hocko <mhocko@suse.com >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Ryan Roberts <ryan.roberts@arm.com >
Cc: Suren Baghdasaryan <surenb@google.com >
Cc: Vlastimil Babka <vbabka@suse.cz >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Ryan Chung
19e0713bbe
selftests/eventfd: correct test name and improve messages
...
- Rename test from eventfd_chek_flag_cloexec_and_nonblock to
eventfd_check_flag_cloexec_and_nonblock.
- Make the RDWR-flag comment declarative:
“The kernel automatically adds the O_RDWR flag.”
- Update semaphore-flag failure message to:
“eventfd semaphore flag check failed: …”
Link: https://lkml.kernel.org/r/20250513074411.6965-1-seokwoo.chung130@gmail.com
Signed-off-by: Ryan Chung <seokwoo.chung130@gmail.com >
Reviewed-by: Wen Yang <wen.yang@linux.dev >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Casey Chen
780138b123
alloc_tag: check mem_profiling_support in alloc_tag_init
...
If mem_profiling_support is false, for example by
sysctl.vm.mem_profiling=never, alloc_tag_init should skip module tags
allocation, codetag type registration and procfs init.
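As a sketch, the fix amounts to an early bail-out guard. This is a hypothetical user-space model, not the kernel function: `alloc_tag_init_model()` and `init_work_done` stand in for the real symbols and setup steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the guard this patch adds: when profiling support is off
 * (e.g. sysctl.vm.mem_profiling=never), the init function returns
 * before doing any of the setup work. */
static bool mem_profiling_support;  /* defaults to false */
static int init_work_done;  /* stands in for module-tag allocation,
                             * codetag registration, and procfs init */

static int alloc_tag_init_model(void)
{
    if (!mem_profiling_support)
        return 0;            /* skip all setup entirely */
    init_work_done++;        /* the real initialization would run here */
    return 0;
}
```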
Link: https://lkml.kernel.org/r/20250513182602.121843-1-cachen@purestorage.com
Signed-off-by: Casey Chen <cachen@purestorage.com >
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com >
Acked-by: Suren Baghdasaryan <surenb@google.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
SeongJae Park
6a4b3551ba
Docs/damon: update titles and brief introductions to explain DAMOS
...
DAMON was initially developed only for data access monitoring, and then
extended for not only access monitoring but also access-aware system
operations (DAMOS). But the documents have old titles and brief
introductions for only the monitoring part. Update the titles and the
brief introductions to explain DAMOS part together.
Link: https://lkml.kernel.org/r/20250513002715.40126-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org >
Cc: Brendan Higgins <brendan.higgins@linux.dev >
Cc: David Gow <davidgow@google.com >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
SeongJae Park
03f83209e8
selftests/damon/_damon_sysfs: read tried regions directories in order
...
Kdamond.update_schemes_tried_regions() reads and stores tried regions
information out of address order. This makes debugging a test failure
difficult. Change the behavior to do the reading and storing in
address order.
Link: https://lkml.kernel.org/r/20250513002715.40126-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org >
Cc: Brendan Higgins <brendan.higgins@linux.dev >
Cc: David Gow <davidgow@google.com >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
SeongJae Park
094fb14913
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
...
DAMOS filters' default reject behavior is not entirely straightforward;
indeed, there was a mistake[1] during development. Add a kunit test to
validate the behavior.
Link: https://lkml.kernel.org/r/20250513002715.40126-5-sj@kernel.org
Link: https://lore.kernel.org/20250227002913.19359-1-sj@kernel.org [1]
Signed-off-by: SeongJae Park <sj@kernel.org >
Cc: Brendan Higgins <brendan.higgins@linux.dev >
Cc: David Gow <davidgow@google.com >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
SeongJae Park
a82cf30010
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
...
Commit c0cb9d91bf ("mm/damon/paddr: report filter-passed bytes back for
DAMOS_STAT action") added an unused variable in damon_pa_stat(), due to a
copy-and-paste error. Remove it.
Link: https://lkml.kernel.org/r/20250513002715.40126-4-sj@kernel.org
Fixes: c0cb9d91bf ("mm/damon/paddr: report filter-passed bytes back for DAMOS_STAT action")
Signed-off-by: SeongJae Park <sj@kernel.org >
Cc: Brendan Higgins <brendan.higgins@linux.dev >
Cc: David Gow <davidgow@google.com >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
SeongJae Park
0bac6b1a11
mm/damon/sysfs-schemes: fix wrong comment on damos_sysfs_quota_goal_metric_strs
...
A comment on damos_sysfs_quota_goal_metric_strs is simply wrong, due to a
copy-and-paste error. Fix it.
Link: https://lkml.kernel.org/r/20250513002715.40126-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org >
Cc: Brendan Higgins <brendan.higgins@linux.dev >
Cc: David Gow <davidgow@google.com >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
SeongJae Park
591c4c78be
mm/damon/core: warn and fix nr_accesses[_bp] corruption
...
Patch series "mm/damon: minor fixups and improvements for code, tests, and
documents".
Yet another batch of miscellaneous DAMON changes. Fix and improve minor
problems in code, tests and documents.
This patch (of 6):
For a bug such as double aggregation reset[1], ->nr_accesses and/or
->nr_accesses_bp of damon_region could be corrupted. Such corruption can
make monitoring results quite inaccurate, so the root-cause bug should be
investigated. Meanwhile, the corruption itself can easily be fixed, but
silently fixing it would hide the bug.
Fix the corruption as soon as found, but WARN_ONCE() so that we can be
aware of the existence of the bug while keeping the system running in a
more sane way.
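The "warn once, then repair" pattern can be sketched in user space. The struct and helper below are illustrative stand-ins, not the DAMON code, and the sketch assumes nr_accesses_bp holds nr_accesses scaled by 10000 (basis points).

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative model: if the two access counters disagree, report the
 * bug once -- like WARN_ONCE() -- and then repair the value in place so
 * monitoring keeps running sanely instead of silently drifting. */
struct region_model { unsigned int nr_accesses; unsigned int nr_accesses_bp; };

static int corruption_hits;

static void check_and_fix_region(struct region_model *r)
{
    if (r->nr_accesses_bp != r->nr_accesses * 10000) {
        if (corruption_hits == 0)   /* WARN_ONCE()-like: report first hit only */
            fprintf(stderr, "nr_accesses[_bp] corruption detected\n");
        corruption_hits++;
        r->nr_accesses_bp = r->nr_accesses * 10000;  /* fix, keep running */
    }
}
```

The trade-off is exactly the one the message describes: the repair keeps results usable, while the one-time warning preserves the evidence that a root-cause bug exists.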
Link: https://lkml.kernel.org/r/20250513002715.40126-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250513002715.40126-2-sj@kernel.org
Link: https://lore.kernel.org/20250302214145.356806-1-sj@kernel.org [1]
Signed-off-by: SeongJae Park <sj@kernel.org >
Cc: Brendan Higgins <brendan.higgins@linux.dev >
Cc: David Gow <davidgow@google.com >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:38 -07:00
Alexei Starovoitov
2aad4edf6e
mm: rename try_alloc_pages() to alloc_pages_nolock()
...
The "try_" prefix is confusing, since it makes people believe that
try_alloc_pages() is analogous to spin_trylock() and that a NULL return
means EAGAIN. This is not the case: if it returns NULL, there is no
reason to call it again, as it will most likely return NULL again. Hence
rename it to
alloc_pages_nolock() to make it symmetrical to free_pages_nolock() and
document that NULL means ENOMEM.
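A caller-side sketch of the contract, using a user-space stand-in (malloc plays the allocator; `get_pages_nolock_model()` is a hypothetical name, not the kernel API): NULL means the allocation failed for good, so the caller propagates ENOMEM instead of looping as it would on a trylock-style EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in illustrating the alloc_pages_nolock() contract:
 * a NULL return means ENOMEM, not "busy, try again" -- an immediate
 * retry would most likely just fail the same way. */
static void *get_pages_nolock_model(size_t size, int *err)
{
    void *p = malloc(size);   /* models the lock-free allocation attempt */
    if (!p)
        *err = ENOMEM;        /* fail over; do not spin retrying */
    return p;
}
```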
Link: https://lkml.kernel.org/r/20250517003446.60260-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Acked-by: Johannes Weiner <hannes@cmpxchg.org >
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev >
Acked-by: Harry Yoo <harry.yoo@oracle.com >
Cc: Andrii Nakryiko <andrii@kernel.org >
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com >
Cc: Michal Hocko <mhocko@suse.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Cc: Steven Rostedt <rostedt@goodmis.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
Mark Brown
5fc4b770fc
selftests/mm: deduplicate second mmap() of 5*PAGE_SIZE at base
...
The map_fixed_noreplace test does two blocks of test starting from a
mapping of 5 pages at the base address, logging a test result for each
initial mapping. These are logged with the same test name, causing test
automation software to see two reports for the same test in a single run.
Tweak the log message for the second one to deduplicate.
Link: https://lkml.kernel.org/r/20250518-selftests-mm-map-fixed-noreplace-dup-v1-1-1a11a62c5e9f@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org >
Cc: Shuah Khan <shuah@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
Lorenzo Stoakes
6669d1aaa0
mm: remove WARN_ON_ONCE() in file_has_valid_mmap_hooks()
...
Having encountered a trinity report in linux-next (Linked in the 'Closes'
tag) it appears that there are legitimate situations where a file-backed
mapping can be acquired but no file->f_op->mmap or
file->f_op->mmap_prepare is set, at which point do_mmap() should simply
error out with -ENODEV.
Since previously we did not warn in this scenario and it appears we rely
upon this, restore this situation, while retaining a WARN_ON_ONCE() for
the case where both are set, which is absolutely incorrect and must be
addressed and thus always requires a warning.
If further work is required to chase down precisely what is causing this,
then we can later restore this, but it makes no sense to hold up this
series to do so, as this is existing and apparently expected behaviour.
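A user-space model of the revised check (the struct and function names below are illustrative stand-ins mirroring the message, not the kernel's file_operations): neither hook set is a quiet failure that the caller turns into -ENODEV, while both hooks set still warns.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model: a file-backed mapping whose file sets neither
 * hook is simply rejected (the caller errors out with -ENODEV) without
 * warning, while setting both hooks is a driver bug that deserves a
 * warning -- the real code uses WARN_ON_ONCE() there. */
struct mmap_hooks_model { bool mmap; bool mmap_prepare; };

static int warnings;

static bool has_valid_mmap_hooks_model(const struct mmap_hooks_model *ops)
{
    if (ops->mmap && ops->mmap_prepare) {
        warnings++;           /* both set: always-invalid, warn */
        return false;
    }
    return ops->mmap || ops->mmap_prepare;  /* false -> caller: -ENODEV */
}
```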
Link: https://lkml.kernel.org/r/20250514084024.29148-1-lorenzo.stoakes@oracle.com
Fixes: c84bf6dd2b ("mm: introduce new .mmap_prepare() file callback")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Reported-by: kernel test robot <oliver.sang@intel.com >
Closes: https://lore.kernel.org/oe-lkp/202505141434.96ce5e5d-lkp@intel.com
Reviewed-by: Vlastimil Babka <vbabka@suse.cz >
Reviewed-by: Pedro Falcato <pfalcato@suse.de >
Acked-by: David Hildenbrand <david@redhat.com >
Cc: Al Viro <viro@zeniv.linux.org.uk >
Cc: Christian Brauner <brauner@kernel.org >
Cc: Jan Kara <jack@suse.cz >
Cc: Jann Horn <jannh@google.com >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Michal Hocko <mhocko@suse.com >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Suren Baghdasaryan <surenb@google.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
Lorenzo Stoakes
cc0535acd1
MAINTAINERS: add kernel/fork.c to relevant sections
...
Currently kernel/fork.c both contains absolutely key logic relating to a
number of kernel subsystems and also has absolutely no assignment in
MAINTAINERS.
Correct this by placing this file in relevant sections - mm core, exec and
the scheduler so people know who to contact when making changes here.
scripts/get_maintainers.pl can perfectly well handle a file being in
multiple sections, so this functions correctly.
Intent is that we keep putting changes to kernel/fork.c through Andrew's
tree.
Link: https://lkml.kernel.org/r/20250513145706.122101-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Acked-by: Michal Hocko <mhocko@suse.com >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Reviewed-by: Kees Cook <kees@kernel.org >
Acked-by: David Hildenbrand <david@redhat.com >
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com >
Cc: Ben Segall <bsegall@google.com >
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Juri Lelli <juri.lelli@redhat.com >
Cc: Mel Gorman <mgorman@suse.de >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Suren Baghdasaryan <surenb@google.com >
Cc: Valentin Schneider <vschneid@redhat.com >
Cc: Vincent Guittot <vincent.guittot@linaro.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
Baolin Wang
698c0089cd
mm: convert do_set_pmd() to take a folio
...
In do_set_pmd(), we always use the folio->page to build PMD mappings for
the entire folio. Since all callers of do_set_pmd() already hold a stable
folio, converting do_set_pmd() to take a folio is safe and more
straightforward.
In addition, to ensure the extensibility of do_set_pmd() for supporting
larger folios beyond PMD size, we keep the 'page' parameter to specify
which page within the folio should be mapped.
No functional changes expected.
Link: https://lkml.kernel.org/r/9b488f4ecb4d3fd8634e3d448dd0ed6964482480.1747017104.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com >
Reviewed-by: Zi Yan <ziy@nvidia.com >
Acked-by: David Hildenbrand <david@redhat.com >
Cc: Dev Jain <dev.jain@arm.com >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Cc: Mariano Pache <npache@redhat.com >
Cc: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Michal Hocko <mhocko@suse.com >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Ryan Roberts <ryan.roberts@arm.com >
Cc: Suren Baghdasaryan <surenb@google.com >
Cc: Vlastimil Babka <vbabka@suse.cz >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
Baolin Wang
5053383829
mm: khugepaged: convert set_huge_pmd() to take a folio
...
We've already gotten the stable locked folio in collapse_pte_mapped_thp(),
so just use folio for set_huge_pmd() to set the PMD entry, which is more
straightforward.
Moreover, we will check the folio size in do_set_pmd(), so we can remove
the unnecessary VM_BUG_ON() in set_huge_pmd(). While we are at it, we can
also remove PageTransHuge(), as it currently has no callers.
Link: https://lkml.kernel.org/r/110c3e1ec5fe7854a0e2c95ffcbc985817180ed7.1747017104.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com >
Acked-by: David Hildenbrand <david@redhat.com >
Cc: Dev Jain <dev.jain@arm.com >
Cc: Johannes Weiner <hannes@cmpxchg.org >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Cc: Mariano Pache <npache@redhat.com >
Cc: Matthew Wilcox (Oracle) <willy@infradead.org >
Cc: Michal Hocko <mhocko@suse.com >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Ryan Roberts <ryan.roberts@arm.com >
Cc: Suren Baghdasaryan <surenb@google.com >
Cc: Vlastimil Babka <vbabka@suse.cz >
Cc: Zi Yan <ziy@nvidia.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
David Hildenbrand
a624c424d5
mm/io-mapping: track_pfn() -> "pfnmap tracking"
...
track_pfn() does not exist, let's simply refer to it as "pfnmap tracking".
Link: https://lkml.kernel.org/r/20250512123424.637989-12-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com >
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Acked-by: Ingo Molnar <mingo@kernel.org > [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dave Airlie <airlied@gmail.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Jann Horn <jannh@google.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Peter Xu <peterx@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tvrtko Ursulin <tursulin@ursulin.net >
Cc: Vlastimil Babka <vbabka@suse.cz >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
David Hildenbrand
11c82e7181
drm/i915: track_pfn() -> "pfnmap tracking"
...
track_pfn() does not exist, let's simply refer to it as "pfnmap tracking".
Link: https://lkml.kernel.org/r/20250512123424.637989-11-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com >
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Acked-by: Ingo Molnar <mingo@kernel.org > [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dave Airlie <airlied@gmail.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Jann Horn <jannh@google.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Peter Xu <peterx@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tvrtko Ursulin <tursulin@ursulin.net >
Cc: Vlastimil Babka <vbabka@suse.cz >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
David Hildenbrand
99e27b047c
x86/mm/pat: inline memtype_match() into memtype_erase()
...
Let's just have it in a single function. The resulting function is
certainly small enough and readable.
Link: https://lkml.kernel.org/r/20250512123424.637989-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com >
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dave Airlie <airlied@gmail.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Jann Horn <jannh@google.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Peter Xu <peterx@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tvrtko Ursulin <tursulin@ursulin.net >
Cc: Vlastimil Babka <vbabka@suse.cz >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
David Hildenbrand
81baf84501
x86/mm/pat: remove MEMTYPE_*_MATCH
...
The "memremap() shrinking" scenario no longer applies, so let's remove
that now-unnecessary handling.
Link: https://lkml.kernel.org/r/20250512123424.637989-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com >
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Acked-by: Ingo Molnar <mingo@kernel.org > [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dave Airlie <airlied@gmail.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Jann Horn <jannh@google.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Peter Xu <peterx@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tvrtko Ursulin <tursulin@ursulin.net >
Cc: Vlastimil Babka <vbabka@suse.cz >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00
David Hildenbrand
b3662fb91b
x86/mm/pat: remove strict_prot parameter from reserve_pfn_range()
...
Always set to 0, so let's remove it.
Link: https://lkml.kernel.org/r/20250512123424.637989-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com >
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com >
Acked-by: Ingo Molnar <mingo@kernel.org > [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com >
Cc: Andy Lutomirski <luto@kernel.org >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dave Airlie <airlied@gmail.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Jann Horn <jannh@google.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org >
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com >
Cc: Peter Xu <peterx@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Tvrtko Ursulin <tursulin@ursulin.net >
Cc: Vlastimil Babka <vbabka@suse.cz >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2025-05-22 14:55:37 -07:00