linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-14 04:09:18 -04:00

Author	SHA1	Message	Date
Lorenzo Stoakes	c3ef2cc695	MAINTAINERS: add missing shrinker files The mm/list_lru.[ch] files implement a shrinker-specific data structure so seem most suited to the SHRINKER section. Link: https://lkml.kernel.org/r/20250722173436.145526-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:23 -07:00
Lorenzo Stoakes	2011011ad6	MAINTAINERS: move memremap.[ch] to hotplug section This seems to be the most appropriate place for these files. Link: https://lkml.kernel.org/r/20250722172258.143488-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:23 -07:00
Lorenzo Stoakes	651ad43d56	MAINTAINERS: add missing mm_slot.h file THP section This seems to be the most appropriate place for this file. [lorenzo.stoakes@oracle.com: also add mm_slot.h to KSM section] Link: https://lkml.kernel.org/r/685747e2-a8cb-4620-a0c0-5cd9048d69b8@lucifer.local Link: https://lkml.kernel.org/r/20250722171904.142306-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Nico Pache <npache@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dev Jain <dev.jain@arm.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:23 -07:00
Lorenzo Stoakes	85c16ee6fa	MAINTAINERS: add missing interval_tree.c to memory mapping section This seems to be the best place for this file. Link: https://lkml.kernel.org/r/20250722171528.141083-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:22 -07:00
Lorenzo Stoakes	44d10df200	MAINTAINERS: add missing percpu-internal.h file to per-cpu section This file seems to most appropriately belong to the PER-CPU MEMORY ALLOCATOR section, so place it there. Link: https://lkml.kernel.org/r/20250722171023.139777-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:22 -07:00
Zi Yan	48e6561b66	mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() The trace event has not recorded the right data since it was introduced at commit `c8b3600312` ("mm: add alloc_contig_migrate_range allocation statistics"). Remove it. Link: https://lkml.kernel.org/r/20250722194649.4135191-1-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507220742.P3SaKlI6-lkp@intel.com/ Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Chang <richardycc@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:22 -07:00
Enze Li	511914506d	selftests/damon: introduce _common.sh to host shared function The current test scripts contain duplicated root permission checks in multiple locations. This patch consolidates these checks into _common.sh to eliminate code redundancy. Link: https://lkml.kernel.org/r/20250718064217.299300-1-lienze@kylinos.cn Signed-off-by: Enze Li <lienze@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	da5973a0b8	selftests/damon/sysfs.py: test runtime reduction of DAMON parameters sysfs.py is testing if non-default additional parameters can be committed. Add a test case for further reducing the parameters to the default set. Link: https://lkml.kernel.org/r/20250720171652.92309-23-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	62b7b1ffa2	selftests/damon/sysfs.py: test non-default parameters runtime commit sysfs.py is testing only the default and minimum DAMON parameters. Add another test case for more non-default additional DAMON parameters commitment on runtime. Link: https://lkml.kernel.org/r/20250720171652.92309-22-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	16797a55aa	selftests/damon/sysfs.py: generalize DAMON context commit assertion DAMON context commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-21-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:21 -07:00
SeongJae Park	a4027b5f24	selftests/damon/sysfs.py: generalize monitoring attributes commit assertion DAMON monitoring attributes commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-20-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	771d7754ab	selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion DAMOS schemes commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-19-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	53f800581f	selftests/damon/sysfs.py: test DAMOS filters commitment Current DAMOS scheme commitment assertion is not testing DAMOS filters. Add the test. Link: https://lkml.kernel.org/r/20250720171652.92309-18-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	f22ff7b5a5	selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion DAMOS scheme commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-17-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	bd0487a774	selftests/damon/sysfs.py: test DAMOS destinations commitment Current DAMOS commitment assertion is not testing quota destinations commitment. Add the test. Link: https://lkml.kernel.org/r/20250720171652.92309-16-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:20 -07:00
SeongJae Park	84dc442bd5	selftests/damon/sysfs.py: test quota goal commitment Current DAMOS quota commitment assertion is not testing quota goal commitment. Add the test. Link: https://lkml.kernel.org/r/20250720171652.92309-15-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:19 -07:00
SeongJae Park	f797e709f7	selftests/damon/sysfs.py: generalize DamosQuota commit assertion DamosQuota commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-14-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:19 -07:00
SeongJae Park	b50c48de61	selftests/damon/sysfs.py: generalize DAMOS Watermarks commit assertion DamosWatermarks commitment assertion is hard-coded for a specific test case. Split it out into a general version that can be reused for different test cases. Link: https://lkml.kernel.org/r/20250720171652.92309-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:19 -07:00
SeongJae Park	a1d52cd030	selftests/damon/drgn_dump_damon_status: dump DAMOS filters drgn_dump_damon_status.py is a script for dumping DAMON internal status in json format. It is being used for seeing if DAMON parameters that are set using _damon_sysfs.py are actually passed to DAMON in the kernel space. It is, however, not dumping full DAMON internal status, and it makes increasing test coverage difficult. Add damos filters dumping for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:19 -07:00
SeongJae Park	eb413daaf2	selftests/damon/drgn_dump_damon_status: dump ctx->ops.id drgn_dump_damon_status.py is a script for dumping DAMON internal status in json format. It is being used for seeing if DAMON parameters that are set using _damon_sysfs.py are actually passed to DAMON in the kernel space. It is, however, not dumping full DAMON internal status, and it makes increasing test coverage difficult. Add ctx->ops.id dumping for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:18 -07:00
SeongJae Park	c1a6958957	selftests/damon/drgn_dump_damon_status: dump damos->migrate_dests drgn_dump_damon_status.py is a script for dumping DAMON internal status in json format. It is being used for seeing if DAMON parameters that are set using _damon_sysfs.py are actually passed to DAMON in the kernel space. It is, however, not dumping full DAMON internal status, and it makes increasing test coverage difficult. Add damos->migrate_dests dumping for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:18 -07:00
SeongJae Park	80d4e38107	selftests/damon/_damon_sysfs: use 2**32 - 1 as max nr_accesses and age nr_accesses and age are unsigned int. Use the proper max value. Link: https://lkml.kernel.org/r/20250720171652.92309-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:18 -07:00
SeongJae Park	86e541f0be	selftests/damon/_damon_sysfs: support DAMOS target_nid setup _damon_sysfs.py contains code for test-purpose DAMON sysfs interface control. Add support of DAMOS action destination target_nid setup for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:18 -07:00
SeongJae Park	fca6ddf44d	selftests/damon/_damon_sysfs: support DAMOS action dests setup _damon_sysfs.py contains code for test-purpose DAMON sysfs interface control. Add support of DAMOS action destinations setup for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:17 -07:00
SeongJae Park	229b0af664	selftests/damon/_damon_sysfs: support DAMOS quota goal nid setup _damon_sysfs.py contains code for test-purpose DAMON sysfs interface control. Add support of DAMOS quota goal nid setup for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:17 -07:00
SeongJae Park	ff5aae307b	selftests/damon/_damon_sysfs: support DAMOS quota weights setup _damon_sysfs.py contains code for test-purpose DAMON sysfs interface control. Add support of DAMOS quotas prioritization weights setup for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:17 -07:00
SeongJae Park	b436dfaad2	selftests/damon/_damon_sysfs: support monitoring intervals goal setup _damon_sysfs.py contains code for test-purpose DAMON sysfs interface control. Add support of the monitoring intervals auto-tune goal setup for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:17 -07:00
SeongJae Park	9d93c103ed	selftests/damon/_damon_sysfs: support DAMOS filters setup _damon_sysfs.py contains code for test-purpose DAMON sysfs interface control. Add support of DAMOS filters setup for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:16 -07:00
SeongJae Park	6da5e2961f	selftests/damon/_damon_sysfs: support DAMOS watermarks setup Patch series "selftests/damon/sysfs.py: test all parameters". sysfs.py tests if DAMON sysfs interface is passing the user-requested parameters to DAMON as expected. But only the default (minimum) parameters are being tested. This is partially because _damon_sysfs.py, which is the library for making the parameter requests, is not supporting the entire parameters. The internal DAMON status dump script (drgn_dump_damon_status.py) is also not dumping entire parameters. Extend the test coverage by updating parameters input and status dumping scripts to support all parameters, and writing additional tests using those. This increased test coverage actually found one real bug (https://lore.kernel.org/20250719181932.72944-1-sj@kernel.org). First seven patches (1-7) extend _damon_sysfs.py for all parameters setup. The eight patch (8) fixes _damon_sysfs.py to use correct max nr_acceses and age values for their type. Following three patches (9-11) extend drgn_dump_damon_status.py to dump full DAMON parameters. Following nine patches (12-20) refactor sysfs.py for general testing code reuse, and extend it for full parameters check. Finally, two patches (21 and 22) add test cases in sysfs.py for full parameters testing. This patch (of 22): _damon_sysfs.py contains code for test-purpose DAMON sysfs interface control. Add support of DAMOS watermarks setup for more tests. Link: https://lkml.kernel.org/r/20250720171652.92309-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250720171652.92309-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:16 -07:00
SeongJae Park	cf20cb9ad1	selftests/damon/sysfs.py: stop DAMON for dumping failures Commit `4ece018976` ("selftests/damon: add python and drgn-based DAMON sysfs test") in mm-stable tree introduced sysfs.py that runs drgn for dumping DAMON status. When the DAMON status dumping fails for reasons including drgn uninstalled environment, the test fails without stopping DAMON. Following DAMON selftests that assumes DAMON is not running when they executed therefore fail. Catch dumping failures and stop DAMON for that case. Link: https://lkml.kernel.org/r/20250722060330.56068-1-sj@kernel.org Fixes: `4ece018976` ("selftests/damon: add python and drgn-based DAMON sysfs test") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507220707.9c5d6247-lkp@intel.com Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-26 15:08:16 -07:00
Matthew Wilcox (Oracle)	45cd52c44e	mm: remove grab_cache_page() All callers have been converted to use filemap_grab_folio(). Link: https://lkml.kernel.org/r/20250721204619.163883-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:43 -07:00
SeongJae Park	7e6c313069	mm/damon/ops-common: ignore migration request to invalid nodes damon_migrate_pages() tries migration even if the target node is invalid. If users mistakenly make such invalid requests via DAMOS_MIGRATE_{HOT,COLD} action, the below kernel BUG can happen. [ 7831.883495] BUG: unable to handle page fault for address: 0000000000001f48 [ 7831.884160] #PF: supervisor read access in kernel mode [ 7831.884681] #PF: error_code(0x0000) - not-present page [ 7831.885203] PGD 0 P4D 0 [ 7831.885468] Oops: Oops: 0000 [#1] SMP PTI [ 7831.885852] CPU: 31 UID: 0 PID: 94202 Comm: kdamond.0 Not tainted 6.16.0-rc5-mm-new-damon+ #93 PREEMPT(voluntary) [ 7831.886913] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014 [ 7831.887777] RIP: 0010:__alloc_frozen_pages_noprof (include/linux/mmzone.h:1724 include/linux/mmzone.h:1750 mm/page_alloc.c:4936 mm/page_alloc.c:5137) [...] [ 7831.895953] Call Trace: [ 7831.896195] <TASK> [ 7831.896397] __folio_alloc_noprof (mm/page_alloc.c:5183 mm/page_alloc.c:5192) [ 7831.896787] migrate_pages_batch (mm/migrate.c:1189 mm/migrate.c:1851) [ 7831.897228] ? __pfx_alloc_migration_target (mm/migrate.c:2137) [ 7831.897735] migrate_pages (mm/migrate.c:2078) [ 7831.898141] ? __pfx_alloc_migration_target (mm/migrate.c:2137) [ 7831.898664] damon_migrate_folio_list (mm/damon/ops-common.c:321 mm/damon/ops-common.c:354) [ 7831.899140] damon_migrate_pages (mm/damon/ops-common.c:405) [...] Add a target node validity check in damon_migrate_pages(). The validity check is stolen from that of do_pages_move(), which is being used for the move_pages() system call. Link: https://lkml.kernel.org/r/20250720185822.1451-1-sj@kernel.org Fixes: `b51820ebea` ("mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion") [6.11.x] Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Honggyu Kim <honggyu.kim@sk.com> Cc: Hyeongtak Ji <hyeongtak.ji@sk.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:43 -07:00
Lorenzo Stoakes	a27848a035	docs: update THP documentation to clarify sysfs "never" setting Rather confusingly, setting all Transparent Huge Page sysfs settings to "never" does not in fact result in THP being globally disabled. Rather, it results in khugepaged being disabled, but one can still obtain THP pages using madvise(..., MADV_COLLAPSE). This is something that has remained poorly documented for some time, and it is likely the received wisdom of most users of THP that never does, in fact, mean never. It is therefore important to highlight, very clearly, that this is not the case. [lorenzo.stoakes@oracle.com: update transhuge page to mention MADV_COLLAPSE] Link: https://lkml.kernel.org/r/d54d1dfb-f06d-4979-983b-73998f05867e@lucifer.local Link: https://lkml.kernel.org/r/20250721155530.75944-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:43 -07:00
Lorenzo Stoakes	7d6597dfef	tools/testing/selftests: explicitly test split multi VMA mremap move Check that moving a range of VMAs where we are offset into the first and last VMAs works correctly. This results in the VMAs being split at these points at which we are offset into VMAs. We explicitly test both the ordinary MREMAP_FIXED multi VMA move case and the MREMAP_DONTUNMAP multi VMA move case. Link: https://lkml.kernel.org/r/b04920bb6c09dc86c207c251eab8ec670fbbcaef.1753119043.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:42 -07:00
Lorenzo Stoakes	7062387ed6	tools/testing/selftests: test MREMAP_DONTUNMAP on multiple VMA move We support MREMAP_MAYMOVE \| MREMAP_FIXED \| MREMAP_DONTUNMAP for moving multiple VMAs via mremap(), so assert that the tests pass with both MREMAP_DONTUNMAP set and not set. Additionally, add success = false settings when mremap() fails. This is something that cannot realistically happen, so in no way impacted test outcome, but it is incorrect to indicate a test pass when something has failed. Link: https://lkml.kernel.org/r/d7359941981e4e44c774753b3e364d1c54928e6a.1753119043.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:42 -07:00
Lorenzo Stoakes	10aed7dac4	tools/testing/selftests: add mremap() shrink test for multiple VMAs Patch series "tools/testing: expand mremap testing". Expand our mremap() testing to further assert that behaviour is as expected. There is a poorly documented mremap() feature whereby it is possible to mremap() multiple VMAs (even with gaps) when shrinking, as long as the resultant shrunk range spans only a single VMA. So we start by asserting this behaviour functions correctly both with an in-place shrink and a shrink/move. Next, we further test the newly introduced ability to mremap() multiple VMAs when performing a MAP_FIXED move (that is without the size being changed), firstly by asserting that MREMAP_DONTUNMAP has no bearing on this behaviour. Finally, we explicitly test that such moves, when splitting source VMAs, function correctly. This patch (of 3): There is an apparently little-known feature of mremap() whereby, in stark contrast to other modes (other than the recently introduced capacity to move multiple VMAs), the input source range span multiple VMAs with gaps between. This is, when shrinking a VMA, whether moving it or not, and the shrink would reduce the range to a single VMA - this is permitted, as the shrink is actioned by an unmap. This patch adds tests to assert that this behaves as expected. Link: https://lkml.kernel.org/r/cover.1753119043.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/f08122893a26092a2bec6e69443e87f468ffdbed.1753119043.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:42 -07:00
wang lian	6f1cc9fb47	selftests/mm: guard-regions: Use SKIP() instead of ksft_exit_skip() To ensure only the current test is skipped on permission failure, instead of terminating the entire test binary. Link: https://lkml.kernel.org/r/20250717131857.59909-3-lianux.mm@gmail.com Signed-off-by: wang lian <lianux.mm@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:42 -07:00
wang lian	3f6bfd4789	selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" Patch series "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup", v2. This series introduces a common FORCE_READ() macro to replace the cryptic asm volatile("" : "+r" (variable)); construct used in several mm selftests. This improves code readability and maintainability by removing duplicated, hard-to-understand code. This patch (of 2): Several mm selftests use the `asm volatile("" : "+r" (variable));` construct to force a read of a variable, preventing the compiler from optimizing away the memory access. This idiom is cryptic and duplicated across multiple test files. Following a suggestion from David[1], this patch refactors this common pattern into a FORCE_READ() macro Link: https://lkml.kernel.org/r/20250717131857.59909-1-lianux.mm@gmail.com Link: https://lkml.kernel.org/r/20250717131857.59909-2-lianux.mm@gmail.com Link: https://lore.kernel.org/lkml/4a3e0759-caa1-4cfa-bc3f-402593f1eee3@redhat.com/ [1] Signed-off-by: wang lian <lianux.mm@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:41 -07:00
Dev Jain	7efa1cd5f8	arm64: add batched versions of ptep_modify_prot_start/commit Override the generic definition of modify_prot_start_ptes() to use get_and_clear_full_ptes(). This helper does a TLBI only for the starting and ending contpte block of the range, whereas the current implementation will call ptep_get_and_clear() for every contpte block, thus doing a TLBI on every contpte block. Therefore, we have a performance win. The arm64 definition of pte_accessible() allows us to batch in the errata specific case: #define pte_accessible(mm, pte) \ (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte)) All ptes are obviously present in the folio batch, and they are also valid. Override the generic definition of modify_prot_commit_ptes() to simply use set_ptes() to map the new ptes into the pagetable. Link: https://lkml.kernel.org/r/20250718090244.21092-8-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:41 -07:00
Dev Jain	cac1db8c3a	mm: optimize mprotect() by PTE batching Use folio_pte_batch to batch process a large folio. Note that, PTE batching here will save a few function calls, and this strategy in certain cases (not this one) batches atomic operations in general, so we have a performance win for all arches. This patch paves the way for patch 7 which will help us elide the TLBI per contig block on arm64. The correctness of this patch lies on the correctness of setting the new ptes based upon information only from the first pte of the batch (which may also have accumulated a/d bits via modify_prot_start_ptes()). Observe that the flag combination we pass to mprotect_folio_pte_batch() guarantees that the batch is uniform w.r.t the soft-dirty bit and the writable bit. Therefore, the only bits which may differ are the a/d bits. So we only need to worry about code which is concerned about the a/d bits of the PTEs. Setting extra a/d bits on the new ptes where previously they were not set, is fine - setting access bit when it was not set is not an incorrectness problem but will only possibly delay the reclaim of the page mapped by the pte (which is in fact intended because the kernel just operated on this region via mprotect()!). Setting dirty bit when it was not set is again not an incorrectness problem but will only possibly force an unnecessary writeback. So now we need to reason whether something can go wrong via can_change_pte_writable(). The pte_protnone, pte_needs_soft_dirty_wp, and userfaultfd_pte_wp cases are solved due to uniformity in the corresponding bits guaranteed by the flag combination. The ptes all belong to the same VMA (since callers guarantee that [start, end) will lie within the VMA) therefore the conditional based on the VMA is also safe to batch around. Since the dirty bit on the PTE really is just an indication that the folio got written to - even if the PTE is not actually dirty but one of the PTEs in the batch is, the wp-fault optimization can be made. Therefore, it is safe to batch around pte_dirty() in can_change_shared_pte_writable() (in fact this is better since without batching, it may happen that some ptes aren't changed to writable just because they are not dirty, even though the other ptes mapping the same large folio are dirty). To batch around the PageAnonExclusive case, we must check the corresponding condition for every single page. Therefore, from the large folio batch, we process sub batches of ptes mapping pages with the same PageAnonExclusive condition, and process that sub batch, then determine and process the next sub batch, and so on. Note that this does not cause any extra overhead; if suppose the size of the folio batch is 512, then the sub batch processing in total will take 512 iterations, which is the same as what we would have done before. For pte_needs_flush(): ppc does not care about the a/d bits. For x86, PAGE_SAVED_DIRTY is ignored. We will flush only when a/d bits get cleared; since we can only have extra a/d bits due to batching, we will only have an extra flush, not a case where we elide a flush due to batching when we shouldn't have. Link: https://lkml.kernel.org/r/20250718090244.21092-7-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:41 -07:00
Dev Jain	45199f715b	mm: split can_change_pte_writable() into private and shared parts In preparation for patch 6 and modularizing the code in general, split can_change_pte_writable() into private and shared VMA parts. No functional change intended. Link: https://lkml.kernel.org/r/20250718090244.21092-6-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:41 -07:00
Dev Jain	57fae936b4	mm: introduce FPB_RESPECT_WRITE for PTE batching infrastructure Patch 6 ("mm: Optimize mprotect() by PTE batching") optimizes mprotect() by batch clearing the ptes, masking in the new protections, and batch setting the ptes. Suppose that the first pte of the batch is writable - with the current implementation of folio_pte_batch(), it is not guaranteed that the other ptes in the batch are already writable too, so we may incorrectly end up setting the writable bit on all ptes via modify_prot_commit_ptes(). Therefore, introduce FPB_RESPECT_WRITE so that all ptes in the batch are writable or not. Link: https://lkml.kernel.org/r/20250718090244.21092-5-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:40 -07:00
Dev Jain	0aa3657df3	mm: add batched versions of ptep_modify_prot_start/commit Batch ptep_modify_prot_start/commit in preparation for optimizing mprotect, implementing them as a simple loop over the corresponding single pte helpers. Architecture may override these helpers. Link: https://lkml.kernel.org/r/20250718090244.21092-4-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:40 -07:00
Dev Jain	1d40f4e3d9	mm: optimize mprotect() for MM_CP_PROT_NUMA by batch-skipping PTEs For the MM_CP_PROT_NUMA skipping case, observe that, if we skip an iteration due to the underlying folio satisfying any of the skip conditions, then for all subsequent ptes which map the same folio, the iteration will be skipped for them too. Therefore, we can optimize by using folio_pte_batch() to batch skip the iterations. Use prot_numa_skip() introduced in the previous patch to determine whether we need to skip the iteration. Change its signature to have a double pointer to a folio, which will be used by mprotect_folio_pte_batch() to determine the number of iterations we can safely skip. Link: https://lkml.kernel.org/r/20250718090244.21092-3-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:40 -07:00
Dev Jain	b9bf6c2872	mm: refactor MM_CP_PROT_NUMA skipping case into new function Patch series "Optimize mprotect() for large folios", v5. Use folio_pte_batch() to optimize change_pte_range(). On arm64, if the ptes are painted with the contig bit, then ptep_get() will iterate through all 16 entries to collect a/d bits. Hence this optimization will result in a 16x reduction in the number of ptep_get() calls. Next, ptep_modify_prot_start() will eventually call contpte_try_unfold() on every contig block, thus flushing the TLB for the complete large folio range. Instead, use get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only do them on the starting and ending contig block. For split folios, there will be no pte batching; the batch size returned by folio_pte_batch() will be 1. For pagetable split folios, the ptes will still point to the same large folio; for arm64, this results in the optimization described above, and for other arches, a minor improvement is expected due to a reduction in the number of function calls. mm-selftests pass on arm64. I have some failing tests on my x86 VM already; no new tests fail as a result of this patchset. We use the following test cases to measure performance, mprotect()'ing the mapped memory to read-only then read-write 40 times: Test case 1: Mapping 1G of memory, touching it to get PMD-THPs, then pte-mapping those THPs Test case 2: Mapping 1G of memory with 64K mTHPs Test case 3: Mapping 1G of memory with 4K pages Average execution time on arm64, Apple M3: Before the patchset: T1: 2.1 seconds T2: 2 seconds T3: 1 second After the patchset: T1: 0.65 seconds T2: 0.7 seconds T3: 1.1 seconds Observing T1/T2 and T3 before the patchset, we also remove the regression introduced by ptep_get() on a contpte block. And, for large folios we get an almost 74% performance improvement, albeit the trade-off being a slight degradation in the small folio case. For x86: Before the patchset: T1: 3.75 seconds T2: 3.7 seconds T3: 3.85 seconds After the patchset: T1: 3.7 seconds T2: 3.7 seconds T3: 3.9 seconds So there is a minor improvement due to reduction in number of function calls, and a slight degradation in the small folio case due to the overhead of vm_normal_folio() + folio_test_large(). Here is the test program: #define _GNU_SOURCE #include <sys/mman.h> #include <stdlib.h> #include <string.h> #include <stdio.h> #include <unistd.h> #define SIZE (102410241024) unsigned long pmdsize = (1UL << 21); unsigned long pagesize = (1UL << 12); static void pte_map_thps(char mem, size_t size) { size_t offs; int ret = 0; / PTE-map each THP by temporarily splitting the VMAs. / for (offs = 0; offs < size; offs += pmdsize) { ret \|= madvise(mem + offs, pagesize, MADV_DONTFORK); ret \|= madvise(mem + offs, pagesize, MADV_DOFORK); } if (ret) { fprintf(stderr, "ERROR: mprotect() failed\n"); exit(1); } } int main(int argc, char argv[]) { char *p; int ret = 0; p = mmap((1UL << 30), SIZE, PROT_READ \| PROT_WRITE, MAP_PRIVATE \| MAP_ANONYMOUS, -1, 0); if (p != (1UL << 30)) { perror("mmap"); return 1; } memset(p, 0, SIZE); if (madvise(p, SIZE, MADV_NOHUGEPAGE)) perror("madvise"); explicit_bzero(p, SIZE); pte_map_thps(p, SIZE); for (int loops = 0; loops < 40; loops++) { if (mprotect(p, SIZE, PROT_READ)) perror("mprotect"), exit(1); if (mprotect(p, SIZE, PROT_READ\|PROT_WRITE)) perror("mprotect"), exit(1); explicit_bzero(p, SIZE); } } This patch (of 7): Reduce indentation by refactoring the prot_numa case into a new function. No functional change intended. Link: https://lkml.kernel.org/r/20250718090244.21092-1-dev.jain@arm.com Link: https://lkml.kernel.org/r/20250718090244.21092-2-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:40 -07:00
Zi Yan	fde47708f9	mm/huge_memory: refactor after-split (page) cache code Smatch/coverity checkers report NULL mapping referencing issues[1][2][3] every time the code is modified, because they do not understand that mapping cannot be NULL when a folio is in page cache in the code. Refactor the code to make it explicit. Remove "end = -1" for anonymous folios, since after code refactoring, end is no longer used by anonymous folio handling code. No functional change is intended. Link: https://lkml.kernel.org/r/20250718023000.4044406-7-ziy@nvidia.com Link: https://lore.kernel.org/linux-mm/2afe3d59-aca5-40f7-82a3-a6d976fb0f4f@stanley.mountain/ [1] Link: https://lore.kernel.org/oe-kbuild/64b54034-f311-4e7d-b935-c16775dbb642@suswa.mountain/ [2] Link: https://lore.kernel.org/linux-mm/20250716145804.4836-1-antonio@mandelbit.com/ [3] Link: https://lkml.kernel.org/r/20250718183720.4054515-7-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <k.shutemov@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:39 -07:00
Zi Yan	a3871560ff	mm/huge_memory: get frozen folio refcount with folio_expected_ref_count() Instead of open coding the refcount calculation, use folio_expected_ref_count() to calculate frozen folio refcount. Because: 1. __folio_split() does not split a folio with PG_private, so no elevated refcount from PG_private; 2. a frozen folio in __folio_split() is fully unmapped, so folio_mapcount() in folio_expected_ref_count() is always 0; 3. (mapping \|\| swap_cache) ? folio_nr_pages(folio) is taken care of by folio_expected_ref_count() too. Link: https://lkml.kernel.org/r/20250718023000.4044406-6-ziy@nvidia.com Link: https://lkml.kernel.org/r/20250718183720.4054515-6-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: Balbir Singh <balbirs@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Antonio Quartulli <antonio@mandelbit.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <k.shutemov@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:39 -07:00
Zi Yan	714b056c83	mm/huge_memory: convert VM_BUG* to VM_WARN* in __folio_split These VM_BUG* can be handled gracefully without crashing kernel. Link: https://lkml.kernel.org/r/20250718023000.4044406-5-ziy@nvidia.com Link: https://lkml.kernel.org/r/20250718183720.4054515-5-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Antonio Quartulli <antonio@mandelbit.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <k.shutemov@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:39 -07:00
Zi Yan	391dc7f405	mm/huge_memory: deduplicate code in __folio_split() xas unlock, remap_page(), local_irq_enable() are moved out of if branches to deduplicate the code. While at it, add remap_flags to clean up remap_page() call site. nr_dropped is renamed to nr_shmem_dropped, as it becomes a variable at __folio_split() scope. Link: https://lkml.kernel.org/r/20250718183720.4054515-4-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Antonio Quartulli <antonio@mandelbit.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <k.shutemov@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:39 -07:00
Zi Yan	3653fc1bdb	mm/huge_memory: remove after_split label in __split_unmapped_folio() Check stop_split instead to avoid the goto statement. Link: https://lkml.kernel.org/r/20250718183720.4054515-3-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Antonio Quartulli <antonio@mandelbit.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <k.shutemov@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:38 -07:00

1 2 3 4 5 ...

1368987 Commits