Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOMIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
Improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improve the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcgs linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
308 lines
9.3 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_MEMREMAP_H_
#define _LINUX_MEMREMAP_H_

#include <linux/mmzone.h>
#include <linux/range.h>
#include <linux/ioport.h>
#include <linux/percpu-refcount.h>

struct resource;
struct device;

/**
 * struct vmem_altmap - pre-allocated storage for vmemmap_populate
 * @base_pfn: base of the entire dev_pagemap mapping
 * @reserve: pages mapped, but reserved for driver use (relative to @base)
 * @free: free pages set aside in the mapping for memmap storage
 * @align: pages reserved to meet allocation alignments
 * @alloc: track pages consumed, private to vmemmap_populate()
 */
struct vmem_altmap {
	unsigned long base_pfn;
	const unsigned long end_pfn;
	const unsigned long reserve;
	unsigned long free;
	unsigned long align;
	unsigned long alloc;
};

/*
 * Specialize ZONE_DEVICE memory into multiple types, each with a different
 * usage.
 *
 * MEMORY_DEVICE_PRIVATE:
 * Device memory that is not directly addressable by the CPU: the CPU can
 * neither read nor write private memory. In this case, we do still have
 * struct pages backing the device memory. Doing so simplifies the
 * implementation, but it is important to remember that there are certain
 * points at which the struct page must be treated as an opaque object,
 * rather than a "normal" struct page.
 *
 * A more complete discussion of unaddressable memory may be found in
 * include/linux/hmm.h and Documentation/mm/hmm.rst.
 *
 * MEMORY_DEVICE_COHERENT:
 * Device memory that is cache coherent from the device and CPU points of
 * view. This is used on platforms that have an advanced system bus (like
 * CAPI or CXL). A driver can hotplug the device memory using ZONE_DEVICE
 * and with that memory type. Any page of a process can be migrated to such
 * memory. However, no one should be allowed to pin such memory so that it
 * can always be evicted.
 *
 * MEMORY_DEVICE_FS_DAX:
 * Host memory that has similar access semantics as System RAM, i.e. DMA
 * coherent and supports page pinning. In support of coordinating page
 * pinning vs other operations, MEMORY_DEVICE_FS_DAX arranges for a
 * wakeup event whenever a page is unpinned and becomes idle. This
 * wakeup is used to coordinate physical address space management (ex:
 * fs truncate/hole punch) vs pinned pages (ex: device dma).
 *
 * MEMORY_DEVICE_GENERIC:
 * Host memory that has similar access semantics as System RAM, i.e. DMA
 * coherent and supports page pinning. This is for example used by DAX devices
 * that expose memory using a character device.
 *
 * MEMORY_DEVICE_PCI_P2PDMA:
 * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
 * transactions.
 */
enum memory_type {
	/* 0 is reserved to catch uninitialized type fields */
	MEMORY_DEVICE_PRIVATE = 1,
	MEMORY_DEVICE_COHERENT,
	MEMORY_DEVICE_FS_DAX,
	MEMORY_DEVICE_GENERIC,
	MEMORY_DEVICE_PCI_P2PDMA,
};

struct dev_pagemap_ops {
	/*
	 * Called once the folio refcount reaches 0. The reference count will
	 * be reset to one by the core code after the method is called to
	 * prepare for handing out the folio again.
	 */
	void (*folio_free)(struct folio *folio);

	/*
	 * Used for private (un-addressable) device memory only. Must migrate
	 * the page back to a CPU accessible page.
	 */
	vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);

	/*
	 * Handle a memory failure that happens on a range of pfns. Notify
	 * the processes that are using these pfns, and try to recover the
	 * data on them if necessary. The mf_flags is finally passed to the
	 * recovery function through the whole notification routine.
	 *
	 * When this is not implemented, or it returns -EOPNOTSUPP, the caller
	 * will fall back to a common handler called mf_generic_kill_procs().
	 */
	int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
			      unsigned long nr_pages, int mf_flags);

	/*
	 * Used for private (un-addressable) device memory only.
	 * This callback is used when a folio is split into
	 * smaller folios.
	 */
	void (*folio_split)(struct folio *head, struct folio *tail);
};

#define PGMAP_ALTMAP_VALID	(1 << 0)

/**
 * struct dev_pagemap - metadata for ZONE_DEVICE mappings
 * @altmap: pre-allocated/reserved memory for vmemmap allocations
 * @ref: reference count that pins the devm_memremap_pages() mapping
 * @done: completion for @ref
 * @type: memory type: see MEMORY_* above in memremap.h
 * @flags: PGMAP_* flags to specify detailed behavior
 * @vmemmap_shift: structural definition of how the vmemmap page metadata
 *	is populated, specifically the metadata page order.
 *	A zero value (default) uses base pages as the vmemmap metadata
 *	representation. A bigger value will set up compound struct pages
 *	of the requested order value.
 * @ops: method table
 * @owner: an opaque pointer identifying the entity that manages this
 *	instance. Used by various helpers to make sure that no
 *	foreign ZONE_DEVICE memory is accessed.
 * @nr_range: number of ranges to be mapped
 * @range: range to be mapped when nr_range == 1
 * @ranges: array of ranges to be mapped when nr_range > 1
 */
struct dev_pagemap {
	struct vmem_altmap altmap;
	struct percpu_ref ref;
	struct completion done;
	enum memory_type type;
	unsigned int flags;
	unsigned long vmemmap_shift;
	const struct dev_pagemap_ops *ops;
	void *owner;
	int nr_range;
	union {
		struct range range;
		DECLARE_FLEX_ARRAY(struct range, ranges);
	};
};

static inline bool pgmap_has_memory_failure(struct dev_pagemap *pgmap)
{
	return pgmap->ops && pgmap->ops->memory_failure;
}

static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
{
	if (pgmap->flags & PGMAP_ALTMAP_VALID)
		return &pgmap->altmap;
	return NULL;
}

static inline unsigned long pgmap_vmemmap_nr(struct dev_pagemap *pgmap)
{
	return 1 << pgmap->vmemmap_shift;
}

static inline bool folio_is_device_private(const struct folio *folio)
{
	return IS_ENABLED(CONFIG_DEVICE_PRIVATE) &&
		folio_is_zone_device(folio) &&
		folio->pgmap->type == MEMORY_DEVICE_PRIVATE;
}

static inline bool is_device_private_page(const struct page *page)
{
	return IS_ENABLED(CONFIG_DEVICE_PRIVATE) &&
		folio_is_device_private(page_folio(page));
}

static inline bool folio_is_pci_p2pdma(const struct folio *folio)
{
	return IS_ENABLED(CONFIG_PCI_P2PDMA) &&
		folio_is_zone_device(folio) &&
		folio->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
}

static inline void *folio_zone_device_data(const struct folio *folio)
{
	VM_WARN_ON_FOLIO(!folio_is_device_private(folio), folio);
	return folio->page.zone_device_data;
}

static inline void folio_set_zone_device_data(struct folio *folio, void *data)
{
	VM_WARN_ON_FOLIO(!folio_is_device_private(folio), folio);
	folio->page.zone_device_data = data;
}

static inline bool is_pci_p2pdma_page(const struct page *page)
{
	return IS_ENABLED(CONFIG_PCI_P2PDMA) &&
		folio_is_pci_p2pdma(page_folio(page));
}

static inline bool folio_is_device_coherent(const struct folio *folio)
{
	return folio_is_zone_device(folio) &&
		folio->pgmap->type == MEMORY_DEVICE_COHERENT;
}

static inline bool is_device_coherent_page(const struct page *page)
{
	return folio_is_device_coherent(page_folio(page));
}

static inline bool folio_is_fsdax(const struct folio *folio)
{
	return folio_is_zone_device(folio) &&
		folio->pgmap->type == MEMORY_DEVICE_FS_DAX;
}

static inline bool is_fsdax_page(const struct page *page)
{
	return folio_is_fsdax(page_folio(page));
}

#ifdef CONFIG_ZONE_DEVICE
void zone_device_page_init(struct page *page, unsigned int order);
void *memremap_pages(struct dev_pagemap *pgmap, int nid);
void memunmap_pages(struct dev_pagemap *pgmap);
void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap);
struct dev_pagemap *get_dev_pagemap(unsigned long pfn);
bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);

unsigned long memremap_compat_align(void);

static inline void zone_device_folio_init(struct folio *folio, unsigned int order)
{
	zone_device_page_init(&folio->page, order);
	if (order)
		folio_set_large_rmappable(folio);
}

static inline void zone_device_private_split_cb(struct folio *original_folio,
						struct folio *new_folio)
{
	if (folio_is_device_private(original_folio)) {
		if (!original_folio->pgmap->ops->folio_split) {
			if (new_folio) {
				new_folio->pgmap = original_folio->pgmap;
				new_folio->page.mapping =
					original_folio->page.mapping;
			}
		} else {
			original_folio->pgmap->ops->folio_split(original_folio,
								new_folio);
		}
	}
}

#else
static inline void *devm_memremap_pages(struct device *dev,
					struct dev_pagemap *pgmap)
{
	/*
	 * Fail attempts to call devm_memremap_pages() without
	 * ZONE_DEVICE support enabled; this requires callers to fall
	 * back to plain devm_memremap() based on config.
	 */
	WARN_ON_ONCE(1);
	return ERR_PTR(-ENXIO);
}

static inline void devm_memunmap_pages(struct device *dev,
				       struct dev_pagemap *pgmap)
{
}

static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn)
{
	return NULL;
}

static inline bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn)
{
	return false;
}

/* when memremap_pages() is disabled all archs can remap a single page */
static inline unsigned long memremap_compat_align(void)
{
	return PAGE_SIZE;
}

static inline void zone_device_private_split_cb(struct folio *original_folio,
						struct folio *new_folio)
{
}
#endif /* CONFIG_ZONE_DEVICE */

static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
{
	if (pgmap)
		percpu_ref_put(&pgmap->ref);
}

#endif /* _LINUX_MEMREMAP_H_ */