Commit Graph

1157 Commits

Author SHA1 Message Date
Linus Torvalds
8804d970fa Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - "mm, swap: improve cluster scan strategy" from Kairui Song improves
   performance and reduces the failure rate of swap cluster allocation

 - "support large align and nid in Rust allocators" from Vitaly Wool
   permits Rust allocators to set NUMA node and large alignment when
   perforning slub and vmalloc reallocs

 - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
   DAMOS_STAT's handling of the DAMON operations sets for virtual
   address spaces for ops-level DAMOS filters

 - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
   Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps

 - "mm/mincore: minor clean up for swap cache checking" from Kairui Song
   performs some cleanup in the swap code

 - "mm: vm_normal_page*() improvements" from David Hildenbrand provides
   code cleanup in the pagemap code

 - "add persistent huge zero folio support" from Pankaj Raghav provides
   a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero

 - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
   the recently added Kexec Handover feature

 - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
   Stoakes turns mm_struct.flags into a bitmap. To end the constant
   struggle with space shortage on 32-bit conflicting with 64-bit's
   needs

 - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
   code

 - "selftests/mm: Fix false positives and skip unsupported tests" from
   Donet Tom fixes a few things in our selftests code

 - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
   from David Hildenbrand "allows individual processes to opt-out of
   THP=always into THP=madvise, without affecting other workloads on the
   system".

   It's a long story - the [1/N] changelog spells out the considerations

 - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
   the memdesc project. Please see

      https://kernelnewbies.org/MatthewWilcox/Memdescs and
      https://blogs.oracle.com/linux/post/introducing-memdesc

 - "Tiny optimization for large read operations" from Chi Zhiling
   improves the efficiency of the pagecache read path

 - "Better split_huge_page_test result check" from Zi Yan improves our
   folio splitting selftest code

 - "test that rmap behaves as expected" from Wei Yang adds some rmap
   selftests

 - "remove write_cache_pages()" from Christoph Hellwig removes that
   function and converts its two remaining callers

 - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
   selftests issues

 - "introduce kernel file mapped folios" from Boris Burkov introduces
   the concept of "kernel file pages". Using these permits btrfs to
   account its metadata pages to the root cgroup, rather than to the
   cgroups of random inappropriate tasks

 - "mm/pageblock: improve readability of some pageblock handling" from
   Wei Yang provides some readability improvements to the page allocator
   code

 - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
   to understand arm32 highmem

 - "tools: testing: Use existing atomic.h for vma/maple tests" from
   Brendan Jackman performs some code cleanups and deduplication under
   tools/testing/

 - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
   a couple of 32-bit issues in tools/testing/radix-tree.c

 - "kasan: unify kasan_enabled() and remove arch-specific
   implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
   initialization code into a common arch-neutral implementation

 - "mm: remove zpool" from Johannes Weiner removes zspool - an
   indirection layer which now only redirects to a single thing
   (zsmalloc)

 - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
   couple of cleanups in the fork code

 - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
   adjustments at various nth_page() callsites, eventually permitting
   the removal of that undesirable helper function

 - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
   creates a KASAN read-only mode for ARM, using that architecture's
   memory tagging feature. It is felt that a read-only mode KASAN is
   suitable for use in production systems rather than debug-only

 - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
   some tidying in the hugetlb folio allocation code

 - "mm: establish const-correctness for pointer parameters" from Max
   Kellermann makes quite a number of the MM API functions more accurate
   about the constness of their arguments. This was getting in the way
   of subsystems (in this case CEPH) when they attempt to improving
   their own const/non-const accuracy

 - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
   code sites which were confused over when to use free_pages() vs
   __free_pages()

 - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
   mapletree code accessible to Rust. Required by nouveau and by its
   forthcoming successor: the new Rust Nova driver

 - "selftests/mm: split_huge_page_test: split_pte_mapped_thp
   improvements" from David Hildenbrand adds a fix and some cleanups to
   the thp selftesting code

 - "mm, swap: introduce swap table as swap cache (phase I)" from Chris
   Li and Kairui Song is the first step along the path to implementing
   "swap tables" - a new approach to swap allocation and state tracking
   which is expected to yield speed and space improvements. This
   patchset itself yields a 5-20% performance benefit in some situations

 - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
   layer to clean up the ptdesc code a little

 - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
   issues in our 5-level pagetable selftesting code

 - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
   addresses a couple of minor issues in relatively new memory
   allocation profiling feature

 - "Small cleanups" from Matthew Wilcox has a few cleanups in
   preparation for more memdesc work

 - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
   Quanmin Yan makes some changes to DAMON in furtherance of supporting
   arm highmem

 - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
   Anjum adds that compiler check to selftests code and fixes the
   fallout, by removing dead code

 - "Improvements to Victim Process Thawing and OOM Reaper Traversal
   Order" from zhongjinji makes a number of improvements in the OOM
   killer: mainly thawing a more appropriate group of victim threads so
   they can release resources

 - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
   is a bunch of small and unrelated fixups for DAMON

 - "mm/damon: define and use DAMON initialization check function" from
   SeongJae Park implement reliability and maintainability improvements
   to a recently-added bug fix

 - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
   SeongJae Park provides additional transparency to userspace clients
   of the DAMON_STAT information

 - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
   some constraints on khubepaged's collapsing of anon VMAs. It also
   increases the success rate of MADV_COLLAPSE against an anon vma

 - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
   from Lorenzo Stoakes moves us further towards removal of
   file_operations.mmap(). This patchset concentrates upon clearing up
   the treatment of stacked filesystems

 - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
   provides some fixes and improvements to mlock's tracking of large
   folios. /proc/meminfo's "Mlocked" field became more accurate

 - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
   Donet Tom fixes several user-visible KSM stats inaccuracies across
   forks and adds selftest code to verify these counters

 - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
   some potential but presently benign issues in KSM's mm_slot handling

* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
  mm: swap: check for stable address space before operating on the VMA
  mm: convert folio_page() back to a macro
  mm/khugepaged: use start_addr/addr for improved readability
  hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
  alloc_tag: fix boot failure due to NULL pointer dereference
  mm: silence data-race in update_hiwater_rss
  mm/memory-failure: don't select MEMORY_ISOLATION
  mm/khugepaged: remove definition of struct khugepaged_mm_slot
  mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  hugetlb: increase number of reserving hugepages via cmdline
  selftests/mm: add fork inheritance test for ksm_merging_pages counter
  mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
  drivers/base/node: fix double free in register_one_node()
  mm: remove PMD alignment constraint in execmem_vmalloc()
  mm/memory_hotplug: fix typo 'esecially' -> 'especially'
  mm/rmap: improve mlock tracking for large folios
  mm/filemap: map entire large folio faultaround
  mm/fault: try to map the entire file folio in finish_fault()
  mm/rmap: mlock large folios in try_to_unmap_one()
  mm/rmap: fix a mlock race condition in folio_referenced_one()
  ...
2025-10-02 18:18:33 -07:00
Linus Torvalds
7f70725741 Merge tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nathan Chancellor:

 - Extend modules.builtin.modinfo to include module aliases from
   MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such
   as kmod) can verify that a particular module alias will be handled by
   a builtin module

 - Bump the minimum version of LLVM for building the kernel to 15.0.0

 - Upgrade several userspace API checks in headers_check.pl to errors

 - Unify and consolidate CONFIG_WERROR / W=e handling

 - Turn assembler and linker warnings into errors with CONFIG_WERROR /
   W=e

 - Respect CONFIG_WERROR / W=e when building userspace programs
   (userprogs)

 - Enable -Werror unconditionally when building host programs
   (hostprogs)

 - Support copy_file_range() and data segment alignment in gen_init_cpio
   to improve performance on filesystems that support reflinks such as
   btrfs and XFS

 - Miscellaneous small changes to scripts and configuration files

* tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (47 commits)
  modpost: Initialize builtin_modname to stop SIGSEGVs
  Documentation: kbuild: note CONFIG_DEBUG_EFI in reproducible builds
  kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o
  modpost: Create modalias for builtin modules
  modpost: Add modname to mod_device_table alias
  scsi: Always define blogic_pci_tbl structure
  kbuild: extract modules.builtin.modinfo from vmlinux.unstripped
  kbuild: keep .modinfo section in vmlinux.unstripped
  kbuild: always create intermediate vmlinux.unstripped
  s390: vmlinux.lds.S: Reorder sections
  KMSAN: Remove tautological checks
  objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY
  lib/Kconfig.debug: Drop CLANG_VERSION check from DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
  riscv: Remove ld.lld version checks from many TOOLCHAIN_HAS configs
  riscv: Unconditionally use linker relaxation
  riscv: Remove version check for LTO_CLANG selects
  powerpc: Drop unnecessary initializations in __copy_inst_from_kernel_nofault()
  mips: Unconditionally select ARCH_HAS_CURRENT_STACK_POINTER
  arm64: Remove tautological LLVM Kconfig conditions
  ARM: Clean up definition of ARM_HAS_GROUP_RELOCS
  ...
2025-10-01 20:58:51 -07:00
Linus Torvalds
4b81e2eb9e Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:

 - Further consolidation of the VDSO infrastructure and the common data
   store

 - Simplification of the related Kconfig logic

 - Improve the VDSO selftest suite

* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests: vDSO: Drop vdso_test_clock_getres
  selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
  selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
  selftests: vDSO: vdso_test_abi: Use explicit indices for name array
  selftests: vDSO: vdso_test_abi: Drop clock availability tests
  selftests: vDSO: vdso_test_abi: Use ksft_finished()
  selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
  selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
  vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
  vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
  vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
  vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
  vdso: Drop kconfig GENERIC_COMPAT_VDSO
  vdso: Drop kconfig GENERIC_VDSO_32
  riscv: vdso: Untangle Kconfig logic
  time: Build generic update_vsyscall() only with generic time vDSO
  vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
  vdso: Move ENABLE_COMPAT_VDSO from core to arm64
  ARM: VDSO: Remove cntvct_ok global variable
  vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30 16:58:21 -07:00
Linus Torvalds
6c7340a7a8 Merge tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Make migrate_{en,dis}able() inline, to improve performance
     (Menglong Dong)

   - Move STDL_INIT() functions out-of-line (Peter Zijlstra)

   - Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra)

  Fair scheduling:

   - Defer throttling to when tasks exit to user-space, to reduce the
     chance & impact of throttle-preemption with held locks and other
     resources (Aaron Lu, Valentin Schneider)

   - Get rid of sched_domains_curr_level hack for tl->cpumask(), as the
     warning was getting triggered on certain topologies (Peter
     Zijlstra)

  Misc cleanups & fixes:

   - Header cleanups (Menglong Dong)

   - Fix race in push_dl_task() (Harshit Agarwal)"

* tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix some typos in include/linux/preempt.h
  sched: Make migrate_{en,dis}able() inline
  rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h
  arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
  sched/fair: Do not balance task to a throttled cfs_rq
  sched/fair: Do not special case tasks in throttled hierarchy
  sched/fair: update_cfs_group() for throttled cfs_rqs
  sched/fair: Propagate load for throttled cfs_rq
  sched/fair: Get rid of throttled_lb_pair()
  sched/fair: Task based throttle time accounting
  sched/fair: Switch to task based throttle model
  sched/fair: Implement throttle task work and related helpers
  sched/fair: Add related data structure for task based throttle
  sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig
  sched: Move STDL_INIT() functions out-of-line
  sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  sched/deadline: Fix race in push_dl_task()
2025-09-30 10:35:11 -07:00
Linus Torvalds
feafee2845 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "There's good stuff across the board, including some nice mm
  improvements for CPUs with the 'noabort' BBML2 feature and a clever
  patch to allow ptdump to play nicely with block mappings in the
  vmalloc area.

  Confidential computing:

   - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
     and mapping them with appropriate attributes.

  CPU features:

   - Advertise atomic floating-point instructions to userspace

   - Extend Spectre workarounds to cover additional Arm CPU variants

   - Extend list of CPUs that support break-before-make level 2 and
     guarantee not to generate TLB conflict aborts for changes of
     mapping granularity (BBML2_NOABORT)

   - Add GCS support to our uprobes implementation.

  Documentation:

   - Remove bogus SME documentation concerning register state when
     entering/exiting streaming mode.

  Entry code:

   - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)

   - Micro-optimise syscall entry path with a compiler branch hint.

  Memory management:

   - Enable huge mappings in vmalloc space even when kernel page-table
     dumping is enabled

   - Tidy up the types used in our early MMU setup code

   - Rework rodata= for closer parity with the behaviour on x86

   - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
     linear map even when rodata= applies to virtual aliases

   - Don't re-allocate the virtual region between '_text' and '_stext',
     as doing so confused tools parsing /proc/vmcore.

  Miscellaneous:

   - Clean-up Kconfig menuconfig text for architecture features

   - Avoid redundant bitmap_empty() during determination of supported
     SME vector lengths

   - Re-enable warnings when building the 32-bit vDSO object

   - Avoid breaking our eggs at the wrong end.

  Perf and PMUs:

   - Support for v3 of the Hisilicon L3C PMU

   - Support for Hisilicon's MN and NoC PMUs

   - Support for Fujitsu's Uncore PMU

   - Support for SPE's extended event filtering feature

   - Preparatory work to enable data source filtering in SPE

   - Support for multiple lanes in the DWC PCIe PMU

   - Support for i.MX94 in the IMX DDR PMU driver

   - MAINTAINERS update (Thank you, Yicong)

   - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).

  Selftests:

   - Add basic LSFE check to the existing hwcaps test

   - Support nolibc in GCS tests

   - Extend SVE ptrace test to pass unsupported regsets and invalid
     vector lengths

   - Minor cleanups (typos, cosmetic changes).

  System registers:

   - Fix ID_PFR1_EL1 definition

   - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1

   - Sync TCR_EL1 definition with the latest Arm ARM (L.b)

   - Be stricter about the input fed into our AWK sysreg generator
     script

   - Typo fixes and removal of redundant definitions.

  ACPI, EFI and PSCI:

   - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
     support from the ACPI GHES code so that it can be used by platforms
     booted with device-tree

   - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
     runtime calls

   - Fix a node refcount imbalance in the PSCI device-tree code.

  CPU Features:

   - Ensure register sanitisation is applied to fields in ID_AA64MMFR4

   - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
     guests can reliably query the underlying CPU types from the VMM

   - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
     to our context-switching, signal handling and ptrace code"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: cpufeature: Remove duplicate asm/mmu.h header
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  perf/dwc_pcie: Fix use of uninitialized variable
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64/sysreg: Update TCR_EL1 register
  arm64: Enable vmalloc-huge with ptdump
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  ...
2025-09-29 18:48:39 -07:00
Kees Cook
23ef9d4397 kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).

Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-24 14:29:14 -07:00
Will Deacon
4e4e36dce3 Merge branch 'for-next/uprobes' into for-next/core
* for-next/uprobes:
  arm64: probes: Fix incorrect bl/blr address and register usage
  uprobes: uprobe_warn should use passed task
  arm64: Kconfig: Remove GCS restrictions on UPROBES
  arm64: uprobes: Add GCS support to uretprobes
  arm64: probes: Add GCS support to bl/blr/ret
  arm64: uaccess: Add additional userspace GCS accessors
  arm64: uaccess: Move existing GCS accessors definitions to gcs.h
  arm64: probes: Break ret out from bl/blr
2025-09-24 16:35:06 +01:00
Will Deacon
77dfca70ba Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64: Enable vmalloc-huge with ptdump
  arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs
  arm64: mm: support large block mapping when rodata=full
  arm64: Enable permission change on arm64 kernel block mappings
  arm64/Kconfig: Remove CONFIG_RODATA_FULL_DEFAULT_ENABLED
  arm64: mm: Rework the 'rodata=' options
  arm64: mm: Represent physical memory with phys_addr_t and resource_size_t
  arm64: mm: Make map_fdt() return mapped pointer
  arm64: mm: Cast start/end markers to char *, not u64
2025-09-24 16:34:34 +01:00
Will Deacon
30f9386820 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  arm64: Kconfig: Spell out "ARMv9.4" in menuconfig text
  arm64/fpsimd: simplify sme_setup()
2025-09-24 16:34:06 +01:00
Will Deacon
7df73a0049 Merge branch 'for-next/entry' into for-next/core
* for-next/entry:
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  arm64: entry: Switch to generic IRQ entry
  arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()
  arm64: entry: Refactor preempt_schedule_irq() check code
  entry: Add arch_irqentry_exit_need_resched() for arm64
  arm64: entry: Use preempt_count() and need_resched() helper
  arm64: entry: Rework arm64_preempt_schedule_irq()
  arm64: entry: Refactor the entry and exit for exceptions from EL1
  arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
2025-09-24 16:34:02 +01:00
Will Deacon
1cf89b6bf6 arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
Big-endian arm64 configurations are vanishingly rare, yet we still claim
to support them in Linux despite very limited testing or visible
interest. Supporting big-endian adds unnecessary burden to reviewers and
contributors which, without any known active users, is hard to justify.
For example, recent work to improve our futex routines and to implement
nested virtualisation support is non-trivially complicated by having to
support both big- and little-endianness.

Back in 2019 [1], it was claimed that Huawei were using arm64 big-endian
machines in their telecommunication products but I don't know whether
that's still the case and certainly haven't seen any patch contributions
to help support or maintain it.

Make CPU_BIG_ENDIAN depend on BROKEN as an initial deprecation step
towards its removal.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/linux-arm-kernel/73701e9f-bee1-7ae8-2277-7a3576171cd4@huawei.com/ [1]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-24 16:25:45 +01:00
Mark Rutland
0c33aa1804 arm64: errata: Apply workarounds for Neoverse-V3AE
Neoverse-V3AE is also affected by erratum #3312417, as described in its
Software Developer Errata Notice (SDEN) document:

  Neoverse V3AE (MP172) SDEN v9.0, erratum 3312417
  https://developer.arm.com/documentation/SDEN-2615521/9-0/

Enable the workaround for Neoverse-V3AE, and document this.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:27:14 +01:00
David Hildenbrand
84188a3ab5 arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE is
selected.

Link: https://lkml.kernel.org/r/20250901150359.867252-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:01 -07:00
Will Deacon
92d051a1c1 arm64: Kconfig: Spell out "ARMv9.4" in menuconfig text
The menuconfig entries to configure various architectural features are
all formatted as "ARMvx.y architecture features" with the unusual
exception of 9.4, which omits the "ARM" prefix.

Add the "ARM" prefix to the menuconfig entry for the ARMv9.4
architectural features.

Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 14:56:43 +01:00
Jeremy Linton
cc66c711e5 arm64: Kconfig: Remove GCS restrictions on UPROBES
Now that the uprobe paths have been made GCS compatible
drop the Kconfig restriction.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-16 21:34:48 +01:00
Huang Shijie
bfbbb0d321 arm64/Kconfig: Remove CONFIG_RODATA_FULL_DEFAULT_ENABLED
Now that 'rodata=full' has been removed in favour of parity with x86,
CONFIG_RODATA_FULL_DEFAULT_ENABLED no longer serves a useful purpose.

Remove it.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-16 20:53:13 +01:00
Jinjie Ruan
b3cf07851b arm64: entry: Switch to generic IRQ entry
Currently, x86, Riscv and Loongarch use the generic entry code, which
makes maintainer's work easier and code more elegant. Start converting
arm64 to use the generic entry infrastructure from kernel/entry/* by
switching it to generic IRQ entry, which removes 100+ lines of duplicate
code. arm64 will completely switch to generic entry in a later series.

The changes are below:
 - Remove *enter_from/exit_to_kernel_mode(), and wrap with generic
   irqentry_enter/exit() as their code and functionality are almost
   identical.

 - Define ARCH_EXIT_TO_USER_MODE_WORK and implement
   arch_exit_to_user_mode_work() to check arm64-specific thread flags
   "_TIF_MTE_ASYNC_FAULT" and "_TIF_FOREIGN_FPSTATE".
   So also remove *enter_from/exit_to_user_mode(), and wrap with
   generic enter_from/exit_to_user_mode() because they are
   exactly the same.

 - Remove arm64_enter/exit_nmi() and use generic irqentry_nmi_enter/exit()
   because they're exactly the same, so the temporary arm64 version
   irqentry_state can also be removed.

 - Remove PREEMPT_DYNAMIC code, as generic irqentry_exit_cond_resched()
   has the same functionality.

 - Implement arch_irqentry_exit_need_resched() with
   arm64_preempt_schedule_irq() for arm64 which will allow arm64 to do
   its architecture specific checks.

Tested-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Suggested-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-11 15:55:35 +01:00
Thomas Weißschuh
bad53ae2dc vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
All architectures implementing time-related functionality in the vDSO are
using the generic vDSO library which handles time namespaces properly.

Remove the now unnecessary Kconfig symbol.

Enables the use of time namespaces on architectures, which use the
generic vDSO but did not enable GENERIC_VDSO_TIME_NS, namely MIPS and arm.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-10-d9b65750e49f@linutronix.de
2025-09-04 11:23:50 +02:00
Thomas Weißschuh
7b338f6d4e vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
All users of the generic vDSO library also use the generic vDSO datastore.

Remove the now unnecessary Kconfig symbol.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-9-d9b65750e49f@linutronix.de
2025-09-04 11:23:50 +02:00
Thomas Weißschuh
bb5bc7bfab vdso: Drop kconfig GENERIC_COMPAT_VDSO
This configuration is never used.

Remove it.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-8-d9b65750e49f@linutronix.de
2025-09-04 11:23:50 +02:00
Peter Zijlstra
7bd291abe2 sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig
Like many Kconfig symbols, SCHED_{SMT,CLUSTER,MC} are duplicated
across arch/*/Kconfig. Try and clean up a little.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> # powerpc
Link: https://lkml.kernel.org/r/20250826094358.GG3245006@noisy.programming.kicks-ass.net
2025-09-03 10:03:13 +02:00
Nathan Chancellor
23cb051420 arm64: Remove tautological LLVM Kconfig conditions
Now that the minimum supported version of LLVM for building the kernel
has been bumped to 15.0.0, two sets of always true conditions can be
safely removed in the arm64 Kconfig file.

Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250821-bump-min-llvm-ver-15-v2-4-635f3294e5f0@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-08-28 16:58:44 -07:00
Linus Torvalds
beace86e61 Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "As usual, many cleanups. The below blurbiage describes 42 patchsets.
  21 of those are partially or fully cleanup work. "cleans up",
  "cleanup", "maintainability", "rationalizes", etc.

  I never knew the MM code was so dirty.

  "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
     addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
     mapped VMAs were not eligible for merging with existing adjacent
     VMAs.

  "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
     adds a new kernel module which simplifies the setup and usage of
     DAMON in production environments.

  "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
     is a cleanup to the writeback code which removes a couple of
     pointers from struct writeback_control.

  "drivers/base/node.c: optimization and cleanups" (Donet Tom)
     contains largely uncorrelated cleanups to the NUMA node setup and
     management code.

  "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
     does some maintenance work on the userfaultfd code.

  "Readahead tweaks for larger folios" (Ryan Roberts)
     implements some tuneups for pagecache readahead when it is reading
     into order>0 folios.

  "selftests/mm: Tweaks to the cow test" (Mark Brown)
     provides some cleanups and consistency improvements to the
     selftests code.

  "Optimize mremap() for large folios" (Dev Jain)
     does that. A 37% reduction in execution time was measured in a
     memset+mremap+munmap microbenchmark.

  "Remove zero_user()" (Matthew Wilcox)
     expunges zero_user() in favor of the more modern memzero_page().

  "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
     addresses some warts which David noticed in the huge page code.
     These were not known to be causing any issues at this time.

  "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
     provides some cleanup and consolidation work in DAMON.

  "use vm_flags_t consistently" (Lorenzo Stoakes)
     uses vm_flags_t in places where we were inappropriately using other
     types.

  "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
     increases the reliability of large page allocation in the memfd
     code.

  "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
     removes several now-unneeded PFN_* flags.

  "mm/damon: decouple sysfs from core" (SeongJae Park)
     implememnts some cleanup and maintainability work in the DAMON
     sysfs layer.

  "madvise cleanup" (Lorenzo Stoakes)
     does quite a lot of cleanup/maintenance work in the madvise() code.

  "madvise anon_name cleanups" (Vlastimil Babka)
     provides additional cleanups on top or Lorenzo's effort.

  "Implement numa node notifier" (Oscar Salvador)
     creates a standalone notifier for NUMA node memory state changes.
     Previously these were lumped under the more general memory
     on/offline notifier.

  "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
     cleans up the pageblock isolation code and fixes a potential issue
     which doesn't seem to cause any problems in practice.

  "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
     adds additional drgn- and python-based DAMON selftests which are
     more comprehensive than the existing selftest suite.

  "Misc rework on hugetlb faulting path" (Oscar Salvador)
     fixes a rather obscure deadlock in the hugetlb fault code and
     follows that fix with a series of cleanups.

  "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
     rationalizes and cleans up the highmem-specific code in the CMA
     allocator.

  "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
     provides cleanups and future-preparedness to the migration code.

  "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
     adds some tracepoints to some DAMON auto-tuning code.

  "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
     does that.

  "mm/damon: misc cleanups" (SeongJae Park)
     also does what it claims.

  "mm: folio_pte_batch() improvements" (David Hildenbrand)
     cleans up the large folio PTE batching code.

  "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
     facilitates dynamic alteration of DAMON's inter-node allocation
     policy.

  "Remove unmap_and_put_page()" (Vishal Moola)
     provides a couple of page->folio conversions.

  "mm: per-node proactive reclaim" (Davidlohr Bueso)
     implements a per-node control of proactive reclaim - beyond the
     current memcg-based implementation.

  "mm/damon: remove damon_callback" (SeongJae Park)
     replaces the damon_callback interface with a more general and
     powerful damon_call()+damos_walk() interface.

  "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
     implements a number of mremap cleanups (of course) in preparation
     for adding new mremap() functionality: newly permit the remapping
     of multiple VMAs when the user is specifying MREMAP_FIXED. It still
     excludes some specialized situations where this cannot be performed
     reliably.

  "drop hugetlb_free_pgd_range()" (Anthony Yznaga)
     switches some sparc hugetlb code over to the generic version and
     removes the thus-unneeded hugetlb_free_pgd_range().

  "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
     augments the present userspace-requested update of DAMON sysfs
     monitoring files. Automatic update is now provided, along with a
     tunable to control the update interval.

  "Some randome fixes and cleanups to swapfile" (Kemeng Shi)
     does what is claims.

  "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
     provides (and uses) a means by which debug-style functions can grab
     a copy of a pageframe and inspect it locklessly without tripping
     over the races inherent in operating on the live pageframe
     directly.

  "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
     addresses the large contention issues which can be triggered by
     reads from that procfs file. Latencies are reduced by more than
     half in some situations. The series also introduces several new
     selftests for the /proc/pid/maps interface.

  "__folio_split() clean up" (Zi Yan)
     cleans up __folio_split()!

  "Optimize mprotect() for large folios" (Dev Jain)
     provides some quite large (>3x) speedups to mprotect() when dealing
     with large folios.

  "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
     does some cleanup work in the selftests code.

  "tools/testing: expand mremap testing" (Lorenzo Stoakes)
     extends the mremap() selftest in several ways, including adding
     more checking of Lorenzo's recently added "permit mremap() move of
     multiple VMAs" feature.

  "selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
     extends the DAMON sysfs interface selftest so that it tests all
     possible user-requested parameters. Rather than the present minimal
     subset"

* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
  MAINTAINERS: add missing headers to mempory policy & migration section
  MAINTAINERS: add missing file to cgroup section
  MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
  MAINTAINERS: add missing zsmalloc file
  MAINTAINERS: add missing files to page alloc section
  MAINTAINERS: add missing shrinker files
  MAINTAINERS: move memremap.[ch] to hotplug section
  MAINTAINERS: add missing mm_slot.h file THP section
  MAINTAINERS: add missing interval_tree.c to memory mapping section
  MAINTAINERS: add missing percpu-internal.h file to per-cpu section
  mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
  selftests/damon: introduce _common.sh to host shared function
  selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
  selftests/damon/sysfs.py: test non-default parameters runtime commit
  selftests/damon/sysfs.py: generalize DAMON context commit assertion
  selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
  selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
  selftests/damon/sysfs.py: test DAMOS filters commitment
  selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
  selftests/damon/sysfs.py: test DAMOS destinations commitment
  ...
2025-07-31 14:57:54 -07:00
Linus Torvalds
63eb28bb14 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs w/o an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. And also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocation the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...
2025-07-30 17:14:01 -07:00
Linus Torvalds
2be6a7503d Merge tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracepoint cleanup from Steven Rostedt:
 "Remove or hide unused tracepoints

  Tracepoints take up memory (around 5K per tracepoint) even when they
  are unused. Changes are being made to detect when a tracepoint is
  defined but unused and a warning is shown at build. But those changes
  are not yet ready for inclusion.

   - Fix some of the unused tracepoints that it detected

     Some tracepoints were removed and others were hidden by config
     settings to match the config settings of where they are
     instantiated. Some tracepoints were moved into architecture
     specific code as only one architecture used them.

   - Call the ftrace_test_filter tracepoint in an unreachable if
     statement

     The ftrace_test_filter tracepoint which is defined when ftrace
     selftests are configured and is used to test the filter logic, but
     the tracepoint is not actually called. It is put into an if
     statement to not have it get compiled out, but also not warn for
     not being used"

* tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING
  powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64
  tracing: Call trace_ftrace_test_filter() for the event
  tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
  binder: Remove unused binder lock events
  PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS
  PM: tracing: Hide device_pm_callback events under PM_SLEEP
  PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE
  PM: cpufreq: powernv/tracing: Move powernv_throttle trace event
  alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured
  tracing, AER: Hide PCIe AER event when PCIEAER is not configured
2025-07-30 16:41:58 -07:00
Linus Torvalds
90a871f74b Merge tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:

 - Keep track of when fgraph_ops are registered or not

   Keep accounting of when fgraph_ops are registered as if a fgraph_ops
   is registered twice it can mess up the accounting and it will not
   work as expected later. Trigger a warning if something registers it
   twice as to catch bugs before they are found by things just not
   working as expected.

 - Make DYNAMIC_FTRACE always enabled for architectures that support it

   As static ftrace (where all functions are always traced) is very
   expensive and only exists to help architectures support ftrace, do
   not make it an option. As soon as an architecture supports
   DYNAMIC_FTRACE make it use it. This simplifies the code.

 - Remove redundant config HAVE_FTRACE_MCOUNT_RECORD

   The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
   DYNAMIC_FTRACE work, but now every architecture that implements
   DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it
   redundant with the HAVE_DYNAMIC_FTRACE.

 - Make pid_ptr string size match the comment

   In print_graph_proc() the pid_ptr string is of size 11, but the
   comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12.

* tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
  ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it
  fgraph: Keep track of when fgraph_ops are registered or not
  fgraph: Make pid_str size match the comment
2025-07-30 16:04:10 -07:00
Linus Torvalds
6fb44438a5 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "A quick summary: perf support for Branch Record Buffer Extensions
  (BRBE), typical PMU hardware updates, small additions to MTE for
  store-only tag checking and exposing non-address bits to signal
  handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on.

  There is also a TLBI optimisation on hardware that does not require
  break-before-make when changing the user PTEs between contiguous and
  non-contiguous.

  More details:

  Perf and PMU updates:

   - Add support for new (v3) Hisilicon SLLC and DDRC PMUs

   - Add support for Arm-NI PMU integrations that share interrupts
     between clock domains within a given instance

   - Allow SPE to be configured with a lower sample period than the
     minimum recommendation advertised by PMSIDR_EL1.Interval

   - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)

   - Adjust the perf watchdog period according to cpu frequency changes

   - Minor driver fixes and cleanups

  Hardware features:

   - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)

   - Support for reporting the non-address bits during a synchronous MTE
     tag check fault (FEAT_MTE_TAGGED_FAR)

   - Optimise the TLBI when folding/unfolding contiguous PTEs on
     hardware with FEAT_BBM (break-before-make) level 2 and no TLB
     conflict aborts

  Software features:

   - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
     and using the text-poke API for late module relocations

   - Force VMAP_STACK always on and change arm64_efi_rt_init() to use
     arch_alloc_vmap_stack() in order to avoid KASAN false positives

  ACPI:

   - Improve SPCR handling and messaging on systems lacking an SPCR
     table

  Debug:

   - Simplify the debug exception entry path

   - Drop redundant DBG_MDSCR_* macros

  Kselftests:

   - Cleanups and improvements for SME, SVE and FPSIMD tests

  Miscellaneous:

   - Optimise loop to reduce redundant operations in contpte_ptep_get()

   - Remove ISB when resetting POR_EL0 during signal handling

   - Mark the kernel as tainted on SEA and SError panic

   - Remove redundant gcs_free() call"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64/gcs: task_gcs_el0_enable() should use passed task
  arm64: Kconfig: Keep selects somewhat alphabetically ordered
  arm64: signal: Remove ISB when resetting POR_EL0
  kselftest/arm64: Handle attempts to disable SM on SME only systems
  kselftest/arm64: Fix SVE write data generation for SME only systems
  kselftest/arm64: Test SME on SME only systems in fp-ptrace
  kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
  kselftest/arm64: Allow sve-ptrace to run on SME only systems
  arm64/mm: Drop redundant addr increment in set_huge_pte_at()
  kselftest/arm4: Provide local defines for AT_HWCAP3
  arm64: Mark kernel as tainted on SAE and SError panic
  arm64/gcs: Don't call gcs_free() when releasing task_struct
  drivers/perf: hisi: Support PMUs with no interrupt
  drivers/perf: hisi: Relax the event number check of v2 PMUs
  drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
  drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
  drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
  drivers/perf: hisi: Simplify the probe process for each DDRC version
  perf/arm-ni: Support sharing IRQs within an NI instance
  perf/arm-ni: Consolidate CPU affinity handling
  ...
2025-07-29 20:21:54 -07:00
Paolo Bonzini
314b40b3b6 Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Linus Torvalds
a578dd095d Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:

 - Reorganize the architecture-optimized CRC code

   It now lives in lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/,
   and it is no longer artificially split into separate generic and arch
   modules. This allows better inlining and dead code elimination

   The generic CRC code is also no longer exported, simplifying the API.
   (This mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/,
   which can be found in the "Crypto library updates" pull request)

 - Improve crc32c() performance on newer x86_64 CPUs on long messages by
   enabling the VPCLMULQDQ optimized code

 - Simplify the crypto_shash wrappers for crc32_le() and crc32c()

   Register just one shash algorithm for each that uses the (fully
   optimized) library functions, instead of unnecessarily providing
   direct access to the generic CRC code

 - Remove unused and obsolete drivers for hardware CRC engines

 - Remove CRC-32 combination functions that are no longer used

 - Add kerneldoc for crc32_le(), crc32_be(), and crc32c()

 - Convert the crc32() macro to an inline function

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (26 commits)
  lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial
  lib/crc: x86: Reorganize crc-pclmul static_call initialization
  lib/crc: crc64: Add include/linux/crc64.h to kernel-api.rst
  lib/crc: crc32: Change crc32() from macro to inline function and remove cast
  nvmem: layouts: Switch from crc32() to crc32_le()
  lib/crc: crc32: Document crc32_le(), crc32_be(), and crc32c()
  lib/crc: Explicitly include <linux/export.h>
  lib/crc: Remove ARCH_HAS_* kconfig symbols
  lib/crc: x86: Migrate optimized CRC code into lib/crc/
  lib/crc: sparc: Migrate optimized CRC code into lib/crc/
  lib/crc: s390: Migrate optimized CRC code into lib/crc/
  lib/crc: riscv: Migrate optimized CRC code into lib/crc/
  lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
  lib/crc: mips: Migrate optimized CRC code into lib/crc/
  lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
  lib/crc: arm64: Migrate optimized CRC code into lib/crc/
  lib/crc: arm: Migrate optimized CRC code into lib/crc/
  lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
  lib/crc: Move files into lib/crc/
  lib/crc32: Remove unused combination support
  ...
2025-07-28 17:43:29 -07:00
Linus Torvalds
8e736a2eea Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:

 - Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)

 - mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)

 - string: Group str_has_prefix() and strstarts() (Andy Shevchenko)

 - Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)

 - Refactor and rename stackleak feature to support Clang

 - Add KUnit test for seq_buf API

 - Fix KUnit fortify test under LTO

* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
  sched/task_stack: Add missing const qualifier to end_of_stack()
  kstack_erase: Support Clang stack depth tracking
  kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
  init.h: Disable sanitizer coverage for __init and __head
  kstack_erase: Disable kstack_erase for all of arm compressed boot code
  x86: Handle KCOV __init vs inline mismatches
  arm64: Handle KCOV __init vs inline mismatches
  s390: Handle KCOV __init vs inline mismatches
  arm: Handle KCOV __init vs inline mismatches
  mips: Handle KCOV __init vs inline mismatch
  powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
  configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
  configs/hardening: Enable CONFIG_KSTACK_ERASE
  stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
  stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
  stackleak: Rename STACKLEAK to KSTACK_ERASE
  seq_buf: Introduce KUnit tests
  string: Group str_has_prefix() and strstarts()
  kunit/fortify: Add back "volatile" for sizeof() constants
  acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
  ...
2025-07-28 17:16:12 -07:00
Catalin Marinas
3ae8cef210 Merge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', 'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (23 commits)
  drivers/perf: hisi: Support PMUs with no interrupt
  drivers/perf: hisi: Relax the event number check of v2 PMUs
  drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
  drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
  drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
  drivers/perf: hisi: Simplify the probe process for each DDRC version
  perf/arm-ni: Support sharing IRQs within an NI instance
  perf/arm-ni: Consolidate CPU affinity handling
  perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation
  perf/cxlpmu: Remove unintended newline from IRQ name format string
  perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe()
  perf: arm_spe: Relax period restriction
  perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  KVM: arm64: nvhe: Disable branch generation in nVHE guests
  arm64: Handle BRBE booting requirements
  arm64/sysreg: Add BRBE registers and fields
  perf/arm: Add missing .suppress_bind_attrs
  perf/arm-cmn: Reduce stack usage during discovery
  perf: imx9_perf: make the read-only array mask static const
  perf/arm-cmn: Broaden module description for wider interconnect support
  ...

* for-next/livepatch:
  : Support for HAVE_LIVEPATCH on arm64
  arm64: Kconfig: Keep selects somewhat alphabetically ordered
  arm64: Implement HAVE_LIVEPATCH
  arm64: stacktrace: Implement arch_stack_walk_reliable()
  arm64: stacktrace: Check kretprobe_find_ret_addr() return value
  arm64/module: Use text-poke API for late relocations.

* for-next/user-contig-bbml2:
  : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts
  arm64/mm: Elide tlbi in contpte_convert() under BBML2
  iommu/arm: Add BBM Level 2 smmu feature
  arm64: Add BBM Level 2 cpu feature
  arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type

* for-next/misc:
  : Miscellaneous arm64 patches
  arm64/gcs: task_gcs_el0_enable() should use passed task
  arm64: signal: Remove ISB when resetting POR_EL0
  arm64/mm: Drop redundant addr increment in set_huge_pte_at()
  arm64: Mark kernel as tainted on SAE and SError panic
  arm64/gcs: Don't call gcs_free() when releasing task_struct
  arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y
  arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get
  arm64: pi: use 'targets' instead of extra-y in Makefile

* for-next/acpi:
  : Various ACPI arm64 changes
  ACPI: Suppress misleading SPCR console message when SPCR table is absent
  ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled

* for-next/debug-entry:
  : Simplify the debug exception entry path
  arm64: debug: remove debug exception registration infrastructure
  arm64: debug: split bkpt32 exception entry
  arm64: debug: split brk64 exception entry
  arm64: debug: split hardware watchpoint exception entry
  arm64: debug: split single stepping exception entry
  arm64: debug: refactor reinstall_suspended_bps()
  arm64: debug: split hardware breakpoint exception entry
  arm64: entry: Add entry and exit functions for debug exceptions
  arm64: debug: remove break/step handler registration infrastructure
  arm64: debug: call step handlers statically
  arm64: debug: call software breakpoint handlers statically
  arm64: refactor aarch32_break_handler()
  arm64: debug: clean up single_step_handler logic

* for-next/feat_mte_tagged_far:
  : Support for reporting the non-address bits during a synchronous MTE tag check fault
  kselftest/arm64/mte: Add mtefar tests on check_mmap_options
  kselftest/arm64/mte: Refactor check_mmap_option test
  kselftest/arm64/mte: Add verification for address tag in signal handler
  kselftest/arm64/mte: Add address tag related macro and function
  kselftest/arm64/mte: Check MTE_FAR feature is supported
  kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS
  kselftest/arm64: Add MTE_FAR hwcap test
  KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest
  arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported
  arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature

* for-next/kselftest:
  : Kselftest updates for arm64
  kselftest/arm64: Handle attempts to disable SM on SME only systems
  kselftest/arm64: Fix SVE write data generation for SME only systems
  kselftest/arm64: Test SME on SME only systems in fp-ptrace
  kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
  kselftest/arm64: Allow sve-ptrace to run on SME only systems
  kselftest/arm4: Provide local defines for AT_HWCAP3
  kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
  kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
  kselftest/arm64: Fix check for setting new VLs in sve-ptrace
  kselftest/arm64: Convert tpidr2 test to use kselftest.h

* for-next/mdscr-cleanup:
  : Drop redundant DBG_MDSCR_* macros
  KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t
  arm64/debug: Drop redundant DBG_MDSCR_* macros

* for-next/vmap-stack:
  : Force VMAP_STACK on arm64
  arm64: remove CONFIG_VMAP_STACK checks from entry code
  arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling
  arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic
  arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack
  arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup
  arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN
  arm64: efi: Remove CONFIG_VMAP_STACK check
  arm64: Mandate VMAP_STACK
  arm64: efi: Fix KASAN false positive for EFI runtime stack
  arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth()
  arm64/gcs: Don't call gcs_free() during flush_gcs()
  arm64: Restrict pagetable teardown to avoid false warning
  docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-07-24 16:01:22 +01:00
Steven Rostedt
9f0cb91767 tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
The ipi tracepoints are mostly generic, but the tracepoints ipi_raise,
ipi_entry and ipi_exit are only used by arm and arm64. This means these
trace events are wasting memory in all the other architectures that do not
use them.

Add CONFIG_HAVE_EXTRA_IPI_TRACEPOINTS and have arm and arm64 select it to
enable these trace events. The config makes it easy if other architectures
decide to trace these as well.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Will Deacon <will@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250722103714.64eba013@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-23 14:58:55 -04:00
Steven Rostedt
4d6d0a6263 tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
Ftrace is tightly coupled with architecture specific code because it
requires the use of trampolines written in assembly. This means that when
a new feature or optimization is made, it must be done for all
architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are
added to denote which architecture has the new enhancement so that other
architectures can still function until they too have been updated.

The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant
with the HAVE_DYNAMIC_FTRACE.

Remove the HAVE_FTRACE_MCOUNT config and use DYNAMIC_FTRACE directly where
applicable.

Link: https://lore.kernel.org/all/20250703154916.48e3ada7@gandalf.local.home/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250704104838.27a18690@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:15:56 -04:00
Catalin Marinas
8e7a67ca5a arm64: Kconfig: Keep selects somewhat alphabetically ordered
Recent patches selecting HAVE_RELIABLE_STACKTRACE and HAVE_LIVEPATCH
added them to the end of the ARM64 Kconfig select list. Move them around
to keep this list nearly alphabetically ordered.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-22 10:55:48 +01:00
Kees Cook
57fbad15c2 stackleak: Rename STACKLEAK to KSTACK_ERASE
In preparation for adding Clang sanitizer coverage stack depth tracking
that can support stack depth callbacks:

- Add the new top-level CONFIG_KSTACK_ERASE option which will be
  implemented either with the stackleak GCC plugin, or with the Clang
  stack depth callback support.
- Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE,
  but keep it for anything specific to the GCC plugin itself.
- Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named
  for what it does rather than what it protects against), but leave as
  many of the internals alone as possible to avoid even more churn.

While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS,
since that's the only place it is referenced from.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-21 21:35:01 -07:00
Alistair Popple
d438d27341 mm: remove devmap related functions and page table bits
Now that DAX and all other reference counts to ZONE_DEVICE pages are
managed normally there is no need for the special devmap PTE/PMD/PUD page
table bits.  So drop all references to these, freeing up a software
defined page table bit on architectures supporting it.

Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Will Deacon <will@kernel.org> # arm64
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: John Groves <john@groves.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:18 -07:00
Lorenzo Pieralisi
53bb952a62 arm64: Kconfig: Enable GICv5
Enable GICv5 driver code for the ARM64 architecture.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-31-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:52 +01:00
Breno Leitao
ef6861b8e6 arm64: Mandate VMAP_STACK
On arm64, VMAP_STACK has been enabled by default for a while now, and
the only reason to disable it was a historical lack of support for
KASAN_VMALLOC. Today there's no good reason to disable VMAP_STACK.

Mandate VMAP_STACK, which will allow code to be simplified in
subsequent patches.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707-arm64_vmap-v1-1-8de98ca0f91c@debian.org
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 13:41:08 +01:00
Marc Zyngier
727c2a53cf arm64: Unconditionally select CONFIG_JUMP_LABEL
Aneesh reports that his kernel fails to boot in nVHE mode with
KVM's protected mode enabled. Further investigation by Mostafa
reveals that this fails because CONFIG_JUMP_LABEL=n and that
we have static keys shared between EL1 and EL2.

While this can be worked around, it is obvious that we have long
relied on having CONFIG_JUMP_LABEL enabled at all times, as all
supported compilers now have 'asm goto' (which is the basic block
for jump labels).

Let's simplify our lives once and for all by mandating jump labels.
It's not like anyone else is testing anything without them, and
we already rely on them for other things (kfence, xfs, preempt).

Link: https://lore.kernel.org/r/yq5ah60pkq03.fsf@kernel.org
Reported-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Reported-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250613141936.2219895-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04 14:47:51 +01:00
Song Liu
fd1e0fd71f arm64: Implement HAVE_LIVEPATCH
Allocate a task flag used to represent the patch pending state for the
task. When a livepatch is being loaded or unloaded, the livepatch code
uses this flag to select the proper version of a being patched kernel
functions to use for current task.

In arch/arm64/Kconfig, select HAVE_LIVEPATCH and include proper Kconfig.

This is largely based on [1] by Suraj Jitindar Singh.

[1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/

Cc: Suraj Jitindar Singh <surajjs@amazon.com>
Cc: Torsten Duwe <duwe@suse.de>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Breno Leitao <leitao@debian.org>
Tested-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250630174502.842486-1-song@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-01 15:18:53 +01:00
Eric Biggers
2b7531b2a2 lib/crc: arm64: Migrate optimized CRC code into lib/crc/
Move the arm64-optimized CRC code from arch/arm64/lib/crc* into its new
location in lib/crc/arm64/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Song Liu
805f13e403 arm64: stacktrace: Implement arch_stack_walk_reliable()
Add arch_stack_walk_reliable(), which will be used during kernel live
patching to detect when threads have completed executing old versions of
functions.

Note that arch_stack_walk_reliable() only needs to guarantee that it
returns an error code when it cannot provide a reliable stacktrace. It
is not required to provide a reliable stacktrace in all scenarios so
long as it returns said error code.

At present we can only reliably unwind up to an exception boundary. In
future we should be able to improve this with additional data from the
compiler (e.g. sframe).

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250320171559.3423224-2-song@kernel.org
[ Mark: Simplify logic, clarify commit message ]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrea della Porta <andrea.porta@suse.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Song Liu <song@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250521111000.2237470-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-06-20 13:10:09 +01:00
Linus Torvalds
00c010e130 Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
   creating a pte which addresses the first page in a folio and reduces
   the amount of plumbing which architecture must implement to provide
   this.

 - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
   largely unrelated folio infrastructure changes which clean things up
   and better prepare us for future work.

 - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
   Price adds early-init code to prevent x86 from leaving physical
   memory unused when physical address regions are not aligned to memory
   block size.

 - "mm/compaction: allow more aggressive proactive compaction" from
   Michal Clapinski provides some tuning of the (sadly, hard-coded (more
   sadly, not auto-tuned)) thresholds for our invokation of proactive
   compaction. In a simple test case, the reduction of a guest VM's
   memory consumption was dramatic.

 - "Minor cleanups and improvements to swap freeing code" from Kemeng
   Shi provides some code cleaups and a small efficiency improvement to
   this part of our swap handling code.

 - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
   adds the ability for a ptracer to modify syscalls arguments. At this
   time we can alter only "system call information that are used by
   strace system call tampering, namely, syscall number, syscall
   arguments, and syscall return value.

   This series should have been incorporated into mm.git's "non-MM"
   branch, but I goofed.

 - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
   Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
   against /proc/pid/pagemap. This permits CRIU to more efficiently get
   at the info about guard regions.

 - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
   implements that fix. No runtime effect is expected because
   validate_page_before_insert() happens to fix up this error.

 - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
   Hildenbrand basically brings uprobe text poking into the current
   decade. Remove a bunch of hand-rolled implementation in favor of
   using more current facilities.

 - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
   Khandual provides enhancements and generalizations to the pte dumping
   code. This might be needed when 128-bit Page Table Descriptors are
   enabled for ARM.

 - "Always call constructor for kernel page tables" from Kevin Brodsky
   ensures that the ctor/dtor is always called for kernel pgtables, as
   it already is for user pgtables.

   This permits the addition of more functionality such as "insert hooks
   to protect page tables". This change does result in various
   architectures performing unnecesary work, but this is fixed up where
   it is anticipated to occur.

 - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
   Ryhl adds plumbing to permit Rust access to core MM structures.

 - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
   Stoakes takes advantage of some VMA merging opportunities which we've
   been missing for 15 years.

 - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
   SeongJae Park optimizes process_madvise()'s TLB flushing.

   Instead of flushing each address range in the provided iovec, we
   batch the flushing across all the iovec entries. The syscall's cost
   was approximately halved with a microbenchmark which was designed to
   load this particular operation.

 - "Track node vacancy to reduce worst case allocation counts" from
   Sidhartha Kumar makes the maple tree smarter about its node
   preallocation.

   stress-ng mmap performance increased by single-digit percentages and
   the amount of unnecessarily preallocated memory was dramaticelly
   reduced.

 - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
   a few unnecessary things which Baoquan noted when reading the code.

 - ""Enhance sysfs handling for memory hotplug in weighted interleave"
   from Rakie Kim "enhances the weighted interleave policy in the memory
   management subsystem by improving sysfs handling, fixing memory
   leaks, and introducing dynamic sysfs updates for memory hotplug
   support". Fixes things on error paths which we are unlikely to hit.

 - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
   from SeongJae Park introduces new DAMOS quota goal metrics which
   eliminate the manual tuning which is required when utilizing DAMON
   for memory tiering.

 - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
   provides cleanups and small efficiency improvements which Baoquan
   found via code inspection.

 - "vmscan: enforce mems_effective during demotion" from Gregory Price
   changes reclaim to respect cpuset.mems_effective during demotion when
   possible. because presently, reclaim explicitly ignores
   cpuset.mems_effective when demoting, which may cause the cpuset
   settings to violated.

   This is useful for isolating workloads on a multi-tenant system from
   certain classes of memory more consistently.

 - "Clean up split_huge_pmd_locked() and remove unnecessary folio
   pointers" from Gavin Guo provides minor cleanups and efficiency gains
   in in the huge page splitting and migrating code.

 - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
   for `struct mem_cgroup', yielding improved memory utilization.

 - "add max arg to swappiness in memory.reclaim and lru_gen" from
   Zhongkun He adds a new "max" argument to the "swappiness=" argument
   for memory.reclaim MGLRU's lru_gen.

   This directs proactive reclaim to reclaim from only anon folios
   rather than file-backed folios.

 - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
   first step on the path to permitting the kernel to maintain existing
   VMs while replacing the host kernel via file-based kexec. At this
   time only memblock's reserve_mem is preserved.

 - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
   and uses a smarter way of looping over a pfn range. By skipping
   ranges of invalid pfns.

 - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
   cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
   when a task is pinned a single NUMA mode.

   Dramatic performance benefits were seen in some real world cases.

 - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
   Garg addresses a warning which occurs during memory compaction when
   using JFS.

 - "move all VMA allocation, freeing and duplication logic to mm" from
   Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
   appropriate mm/vma.c.

 - "mm, swap: clean up swap cache mapping helper" from Kairui Song
   provides code consolidation and cleanups related to the folio_index()
   function.

 - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.

 - "memcg: Fix test_memcg_min/low test failures" from Waiman Long
   addresses some bogus failures which are being reported by the
   test_memcontrol selftest.

 - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
   Stoakes commences the deprecation of file_operations.mmap() in favor
   of the new file_operations.mmap_prepare().

   The latter is more restrictive and prevents drivers from messing with
   things in ways which, amongst other problems, may defeat VMA merging.

 - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
   the per-cpu memcg charge cache from the objcg's one.

   This is a step along the way to making memcg and objcg charging
   NMI-safe, which is a BPF requirement.

 - "mm/damon: minor fixups and improvements for code, tests, and
   documents" from SeongJae Park is yet another batch of miscellaneous
   DAMON changes. Fix and improve minor problems in code, tests and
   documents.

 - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
   stats to be irq safe. Another step along the way to making memcg
   charging and stats updates NMI-safe, a BPF requirement.

 - "Let unmap_hugepage_range() and several related functions take folio
   instead of page" from Fan Ni provides folio conversions in the
   hugetlb code.

* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
  mm: pcp: increase pcp->free_count threshold to trigger free_high
  mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
  mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: pass folio instead of page to unmap_ref_private()
  memcg: objcg stock trylock without irq disabling
  memcg: no stock lock for cpu hot-unplug
  memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
  memcg: make count_memcg_events re-entrant safe against irqs
  memcg: make mod_memcg_state re-entrant safe against irqs
  memcg: move preempt disable to callers of memcg_rstat_updated
  memcg: memcg_rstat_updated re-entrant safe against irqs
  mm: khugepaged: decouple SHMEM and file folios' collapse
  selftests/eventfd: correct test name and improve messages
  alloc_tag: check mem_profiling_support in alloc_tag_init
  Docs/damon: update titles and brief introductions to explain DAMOS
  selftests/damon/_damon_sysfs: read tried regions directories in order
  mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
  mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
  mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
  ...
2025-05-31 15:44:16 -07:00
Linus Torvalds
dee264c16a Merge tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull compiler version requirement update from Arnd Bergmann:
 "Require gcc-8 and binutils-2.30

  x86 already uses gcc-8 as the minimum version, this changes all other
  architectures to the same version. gcc-8 is used is Debian 10 and Red
  Hat Enterprise Linux 8, both of which are still supported, and
  binutils 2.30 is the oldest corresponding version on those.

  Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as
  the system compiler but additionally include toolchains that remain
  supported.

  With the new minimum toolchain versions, a number of workarounds for
  older versions can be dropped, in particular on x86_64 and arm64.
  Importantly, the updated compiler version allows removing two of the
  five remaining gcc plugins, as support for sancov and structeak
  features is already included in modern compiler versions.

  I tried collecting the known changes that are possible based on the
  new toolchain version, but expect that more cleanups will be possible.

  Since this touches multiple architectures, I merged the patches
  through the asm-generic tree."

* tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV
  Documentation: update binutils-2.30 version reference
  gcc-plugins: remove SANCOV gcc plugin
  Kbuild: remove structleak gcc plugin
  arm64: drop binutils version checks
  raid6: skip avx512 checks
  kbuild: require gcc-8 and binutils-2.30
2025-05-31 08:16:52 -07:00
Linus Torvalds
43db111107 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "As far as x86 goes this pull request "only" includes TDX host support.

  Quotes are appropriate because (at 6k lines and 100+ commits) it is
  much bigger than the rest, which will come later this week and
  consists mostly of bugfixes and selftests. s390 changes will also come
  in the second batch.

  ARM:

   - Add large stage-2 mapping (THP) support for non-protected guests
     when pKVM is enabled, clawing back some performance.

   - Enable nested virtualisation support on systems that support it,
     though it is disabled by default.

   - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
     and protected modes.

   - Large rework of the way KVM tracks architecture features and links
     them with the effects of control bits. While this has no functional
     impact, it ensures correctness of emulation (the data is
     automatically extracted from the published JSON files), and helps
     dealing with the evolution of the architecture.

   - Significant changes to the way pKVM tracks ownership of pages,
     avoiding page table walks by storing the state in the hypervisor's
     vmemmap. This in turn enables the THP support described above.

   - New selftest checking the pKVM ownership transition rules

   - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
     even if the host didn't have it.

   - Fixes for the address translation emulation, which happened to be
     rather buggy in some specific contexts.

   - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
     from the number of counters exposed to a guest and addressing a
     number of issues in the process.

   - Add a new selftest for the SVE host state being corrupted by a
     guest.

   - Keep HCR_EL2.xMO set at all times for systems running with the
     kernel at EL2, ensuring that the window for interrupts is slightly
     bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

   - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
     from a pretty bad case of TLB corruption unless accesses to HCR_EL2
     are heavily synchronised.

   - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
     tables in a human-friendly fashion.

   - and the usual random cleanups.

  LoongArch:

   - Don't flush tlb if the host supports hardware page table walks.

   - Add KVM selftests support.

  RISC-V:

   - Add vector registers to get-reg-list selftest

   - VCPU reset related improvements

   - Remove scounteren initialization from VCPU reset

   - Support VCPU reset from userspace using set_mpstate() ioctl

  x86:

   - Initial support for TDX in KVM.

     This finally makes it possible to use the TDX module to run
     confidential guests on Intel processors. This is quite a large
     series, including support for private page tables (managed by the
     TDX module and mirrored in KVM for efficiency), forwarding some
     TDVMCALLs to userspace, and handling several special VM exits from
     the TDX module.

     This has been in the works for literally years and it's not really
     possible to describe everything here, so I'll defer to the various
     merge commits up to and including commit 7bcf7246c4 ('Merge
     branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
  x86/tdx: mark tdh_vp_enter() as __flatten
  Documentation: virt/kvm: remove unreferenced footnote
  RISC-V: KVM: lock the correct mp_state during reset
  KVM: arm64: Fix documentation for vgic_its_iter_next()
  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  KVM: arm64: Stage-2 huge mappings for np-guests
  KVM: arm64: Add a range to pkvm_mappings
  KVM: arm64: Convert pkvm_mappings to interval tree
  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  KVM: arm64: Add a range to __pkvm_host_share_guest()
  KVM: arm64: Introduce for_each_hyp_page
  KVM: arm64: Handle huge mappings for np-guest CMOs
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  RISC-V: KVM: Remove scounteren initialization
  KVM: RISC-V: remove unnecessary SBI reset state
  ...
2025-05-29 08:10:01 -07:00
Will Deacon
53a087046a Merge branch 'for-next/sme-fixes' into for-next/core
* for-next/sme-fixes: (35 commits)
  arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected
  arm64/fpsimd: ptrace: Gracefully handle errors
  arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state
  arm64/fpsimd: ptrace: Do not present register data for inactive mode
  arm64/fpsimd: ptrace: Save task state before generating SVE header
  arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state
  arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data
  arm64/fpsimd: Make clone() compatible with ZA lazy saving
  arm64/fpsimd: Clear PSTATE.SM during clone()
  arm64/fpsimd: Consistently preserve FPSIMD state during clone()
  arm64/fpsimd: Remove redundant task->mm check
  arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return()
  arm64/fpsimd: Add task_smstop_sm()
  arm64/fpsimd: Factor out {sve,sme}_state_size() helpers
  arm64/fpsimd: Clarify sve_sync_*() functions
  arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE
  arm64/fpsimd: signal: Consistently read FPSIMD context
  arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state
  arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only
  arm64/fpsimd: Do not discard modified SVE state
  ...
2025-05-27 12:26:43 +01:00
Will Deacon
c73497194a Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  arm64/boot: Disallow BSS exports to startup code
  arm64/boot: Move global CPU override variables out of BSS
  arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace
  arm64: mm: Drop redundant check in pmd_trans_huge()
  arm64/mm: Permit lazy_mmu_mode to be nested
  arm64/mm: Disable barrier batching in interrupt contexts
  arm64/mm: Batch barriers when updating kernel mappings
  mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
  arm64/mm: Support huge pte-mapped pages in vmap
  mm/vmalloc: Gracefully unmap huge ptes
  mm/vmalloc: Warn on improper use of vunmap_range()
  arm64/mm: Hoist barriers out of set_ptes_anysz() loop
  arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz()
  arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear()
  mm/page_table_check: Batch-check pmds/puds just like ptes
  arm64: hugetlb: Refine tlb maintenance scope
  arm64: hugetlb: Cleanup huge_pte size discovery mechanisms
  arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings
  arm64: Support ARM64_VA_BITS=52 when setting ARCH_MMAP_RND_BITS_MAX
  arm64/mm: Remove randomization of the linear map
2025-05-27 12:26:06 +01:00
Will Deacon
9d27622f7d Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64/cpuinfo: only show one cpu's info in c_show()
  arm64: Extend pr_crit message on invalid FDT
  arm64: Kconfig: remove unnecessary selection of CRC32
  arm64: Add missing includes for mem_encrypt
2025-05-27 12:25:58 +01:00
D Scott Phillips
fed55f49fa arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous
translations for data addresses initiated by load/store instructions.
Only instruction initiated translations are vulnerable, not translations
from prefetches for example. A DSB before the store to HCR_EL2 is
sufficient to prevent older instructions from hitting the window for
corruption, and an ISB after is sufficient to prevent younger
instructions from hitting the window for corruption.

Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 12:46:26 +01:00
Alexander Graf
274cdcb1c0 arm64: add KHO support
We now have all bits in place to support KHO kexecs.  Add awareness of KHO
in the kexec file as well as boot path for arm64 and adds the respective
kconfig option to the architecture so that it can use KHO successfully.

Changes to the "chosen" node have been sent to
https://github.com/devicetree-org/dt-schema/pull/158.

Link: https://lkml.kernel.org/r/20250509074635.3187114-10-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:41 -07:00