Commit Graph

20689 Commits

Author SHA1 Message Date
Dave Airlie
9356b50af5 Merge tag 'drm-misc-next-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:

UAPI Changes:
- Add Task Information for the wedge API

Cross-subsystem Changes:

Core Changes:
- Fix warnings related to export.h
- fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures
- fence: Fix UAF issues
- format-helper: Improve tests

Driver Changes:
- ivpu: Add turbo flag, Add Wildcat Lake Support
- rz-du: Improve MIPI-DSI Support
- vmwgfx: fence improvement

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250619-perfect-industrious-whippet-8ed3db@houat
2025-06-20 11:34:09 +10:00
Thomas Zimmermann
33b4e4fcd2 video: Make global edid_info depend on CONFIG_FIRMWARE_EDID
Protect global edid_info behind CONFIG_FIRMWARE_EDID and remove
the config tests for CONFIG_X86. Makes edid_info available iff
its option has been enabled.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Link: https://lore.kernel.org/r/20250602075537.137759-3-tzimmermann@suse.de
2025-06-16 11:00:29 +02:00
Rafael J. Wysocki
dd3581853c Merge branch 'pm-cpuidle'
Merge cpuidle updates for 6.16-rc2:

 - Update data types of variables passed as arguments to
   mwait_idle_with_hints() to match the function definition
   after recent changes (Uros Bizjak).

 - Eliminate mwait_play_dead_cpuid_hint() again after reverting its
   elimination during the merge window due to a problem with handling
   "dead" SMT siblings, but this time prevent leaving them in C1 after
   initialization by taking them online and back offline when a proper
   cpuidle driver for the platform has been registered (Rafael Wysocki).

* pm-cpuidle:
  intel_idle: Update arguments of mwait_idle_with_hints()
  Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
  ACPI: processor: Rescan "dead" SMT siblings during initialization
  intel_idle: Rescan "dead" SMT siblings during initialization
  x86/smp: PM/hibernate: Split arch_resume_nosmt()
  intel_idle: Use subsys_initcall_sync() for initialization
2025-06-13 21:28:07 +02:00
Linus Torvalds
0529ef8c36 Merge tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - Cure IO bitmap inconsistencies

     A failed fork cleans up all resources of the newly created thread
     via exit_thread(). exit_thread() invokes io_bitmap_exit() which
     does the IO bitmap cleanups, which unfortunately assume that the
     cleanup is related to the current task, which is obviously bogus.

     Make it work correctly

   - A lockdep fix in the resctrl code removed the clearing of the
     command buffer in two places, which keeps stale error messages
     around. Bring them back.

   - Remove unused trace events"

* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
  x86/iopl: Cure TIF_IO_BITMAP inconsistencies
  x86/fpu: Remove unused trace events
2025-06-08 11:27:20 -07:00
Linus Torvalds
8630c59e99 Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
   exports a symbol only to specified modules

 - Improve ABI handling in gendwarfksyms

 - Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n

 - Add checkers for redundant or missing <linux/export.h> inclusion

 - Deprecate the extra-y syntax

 - Fix a genksyms bug when including enum constants from *.symref files

* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  genksyms: Fix enum consts from a reference affecting new values
  arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
  kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
  efi/libstub: use 'targets' instead of extra-y in Makefile
  module: make __mod_device_table__* symbols static
  scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
  scripts/misc-check: check missing #include <linux/export.h> when W=1
  scripts/misc-check: add double-quotes to satisfy shellcheck
  kbuild: move W=1 check for scripts/misc-check to top-level Makefile
  scripts/tags.sh: allow to use alternative ctags implementation
  kconfig: introduce menu type enum
  docs: symbol-namespaces: fix reST warning with literal block
  kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
  tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
  docs/core-api/symbol-namespaces: drop table of contents and section numbering
  modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
  kbuild: move kbuild syntax processing to scripts/Makefile.build
  Makefile: remove dependency on archscripts for header installation
  Documentation/kbuild: Add new gendwarfksyms kABI rules
  Documentation/kbuild: Drop section numbers
  ...
2025-06-07 10:05:35 -07:00
Rafael J. Wysocki
a18d098f2a Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
Revert commit 70523f3357 ("Revert "x86/smp: Eliminate
mwait_play_dead_cpuid_hint()"") to reapply the changes from commit
96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
reverted by it.

Previously, these changes caused idle power to rise on systems booting
with "nosmt" in the kernel command line because they effectively caused
"dead" SMT siblings to remain in idle state C1 after executing the HLT
instruction, which prevented the processor from reaching package idle
states deeper than PC2 going forward.

Now, the "dead" SMT siblings are rescanned after initializing a proper
cpuidle driver for the processor (either intel_idle or ACPI idle), at
which point they are able to enter a sufficiently deep idle state
in native_play_dead() via cpuidle, so the code changes in question can
be reapplied.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/7813065.EvYhyI6sBW@rjwysocki.net
2025-06-07 14:23:22 +02:00
Rafael J. Wysocki
4c529a4a72 x86/smp: PM/hibernate: Split arch_resume_nosmt()
Move the inner part of the arch_resume_nosmt() code into a separate
function called arch_cpu_rescan_dead_smt_siblings(), so it can be
used in other places where "dead" SMT siblings may need to be taken
online and offline again in order to get into deep idle states.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/3361688.44csPzL39Z@rjwysocki.net
[ rjw: Prevent build issues with CONFIG_SMP unset ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-06-07 14:22:56 +02:00
Masahiro Yamada
e21efe833e arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN),
which behaves equivalently.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2025-06-07 14:38:07 +09:00
Linus Torvalds
c00b285024 Merge tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel)

 - Fixes for Hyper-V UIO driver (Long Li)

 - Fixes for Hyper-V PCI driver (Michael Kelley)

 - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley)

 - Documentation updates for Hyper-V VMBus (Michael Kelley)

 - Enhance logging for hv_kvp_daemon (Shradha Gupta)

* tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits)
  Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests
  Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir
  Documentation: hyperv: Update VMBus doc with new features and info
  PCI: hv: Remove unnecessary flex array in struct pci_packet
  Drivers: hv: Remove hv_alloc/free_* helpers
  Drivers: hv: Use kzalloc for panic page allocation
  uio_hv_generic: Align ring size to system page
  uio_hv_generic: Use correct size for interrupt and monitor pages
  Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary
  arch/x86: Provide the CPU number in the wakeup AP callback
  x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap()
  PCI: hv: Get vPCI MSI IRQ domain from DeviceTree
  ACPI: irq: Introduce acpi_get_gsi_dispatcher()
  Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device()
  Drivers: hv: vmbus: Get the IRQ number from DeviceTree
  dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties
  arm64, x86: hyperv: Report the VTL the system boots in
  arm64: hyperv: Initialize the Virtual Trust Level field
  Drivers: hv: Provide arch-neutral implementation of get_vtl()
  Drivers: hv: Enable VTL mode for arm64
  ...
2025-06-03 08:39:20 -07:00
Thomas Gleixner
8b68e97871 x86/iopl: Cure TIF_IO_BITMAP inconsistencies
io_bitmap_exit() is invoked from exit_thread() when a task exists or
when a fork fails. In the latter case the exit_thread() cleans up
resources which were allocated during fork().

io_bitmap_exit() invokes task_update_io_bitmap(), which in turn ends up
in tss_update_io_bitmap(). tss_update_io_bitmap() operates on the
current task. If current has TIF_IO_BITMAP set, but no bitmap installed,
tss_update_io_bitmap() crashes with a NULL pointer dereference.

There are two issues, which lead to that problem:

  1) io_bitmap_exit() should not invoke task_update_io_bitmap() when
     the task, which is cleaned up, is not the current task. That's a
     clear indicator for a cleanup after a failed fork().

  2) A task should not have TIF_IO_BITMAP set and neither a bitmap
     installed nor IOPL emulation level 3 activated.

     This happens when a kernel thread is created in the context of
     a user space thread, which has TIF_IO_BITMAP set as the thread
     flags are copied and the IO bitmap pointer is cleared.

     Other than in the failed fork() case this has no impact because
     kernel threads including IO workers never return to user space and
     therefore never invoke tss_update_io_bitmap().

Cure this by adding the missing cleanups and checks:

  1) Prevent io_bitmap_exit() to invoke task_update_io_bitmap() if
     the to be cleaned up task is not the current task.

  2) Clear TIF_IO_BITMAP in copy_thread() unconditionally. For user
     space forks it is set later, when the IO bitmap is inherited in
     io_bitmap_share().

For paranoia sake, add a warning into tss_update_io_bitmap() to catch
the case, when that code is invoked with inconsistent state.

Fixes: ea5f1cd7ab ("x86/ioperm: Remove bitmap if all permissions dropped")
Reported-by: syzbot+e2b1803445d236442e54@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/87wmdceom2.ffs@tglx
2025-06-03 15:56:39 +02:00
Linus Torvalds
7f9039c524 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
  Generic:

   - Clean up locking of all vCPUs for a VM by using the *_nest_lock()
     family of functions, and move duplicated code to virt/kvm/. kernel/
     patches acked by Peter Zijlstra

   - Add MGLRU support to the access tracking perf test

  ARM fixes:

   - Make the irqbypass hooks resilient to changes in the GSI<->MSI
     routing, avoiding behind stale vLPI mappings being left behind. The
     fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
     and nuking the vLPI mapping upon a routing change

   - Close another VGIC race where vCPU creation races with VGIC
     creation, leading to in-flight vCPUs entering the kernel w/o
     private IRQs allocated

   - Fix a build issue triggered by the recently added workaround for
     Ampere's AC04_CPU_23 erratum

   - Correctly sign-extend the VA when emulating a TLBI instruction
     potentially targeting a VNCR mapping

   - Avoid dereferencing a NULL pointer in the VGIC debug code, which
     can happen if the device doesn't have any mapping yet

  s390:

   - Fix interaction between some filesystems and Secure Execution

   - Some cleanups and refactorings, preparing for an upcoming big
     series

  x86:

   - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
     to fix a race between AP destroy and VMRUN

   - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
     the VM

   - Refine and harden handling of spurious faults

   - Add support for ALLOWED_SEV_FEATURES

   - Add #VMGEXIT to the set of handlers special cased for
     CONFIG_RETPOLINE=y

   - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
     features that utilize those bits

   - Don't account temporary allocations in sev_send_update_data()

   - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
     Threshold

   - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
     IBPB, between SVM and VMX

   - Advertise support to userspace for WRMSRNS and PREFETCHI

   - Rescan I/O APIC routes after handling EOI that needed to be
     intercepted due to the old/previous routing, but not the
     new/current routing

   - Add a module param to control and enumerate support for device
     posted interrupts

   - Fix a potential overflow with nested virt on Intel systems running
     32-bit kernels

   - Flush shadow VMCSes on emergency reboot

   - Add support for SNP to the various SEV selftests

   - Add a selftest to verify fastops instructions via forced emulation

   - Refine and optimize KVM's software processing of the posted
     interrupt bitmap, and share the harvesting code between KVM and the
     kernel's Posted MSI handler"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  rtmutex_api: provide correct extern functions
  KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
  KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
  KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
  KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
  KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
  KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
  KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
  arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
  KVM: s390: Simplify and move pv code
  KVM: s390: Refactor and split some gmap helpers
  KVM: s390: Remove unneeded srcu lock
  s390: Remove unneeded includes
  s390/uv: Improve splitting of large folios that cannot be split while dirty
  s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
  s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
  rust: add helper for mutex_trylock
  RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
  KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
  x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
  ...
2025-06-02 12:24:58 -07:00
Linus Torvalds
7d4e49a77d Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "hung_task: extend blocking task stacktrace dump to semaphore" from
   Lance Yang enhances the hung task detector.

   The detector presently dumps the blocking tasks's stack when it is
   blocked on a mutex. Lance's series extends this to semaphores

 - "nilfs2: improve sanity checks in dirty state propagation" from
   Wentao Liang addresses a couple of minor flaws in nilfs2

 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
   fixes a couple of issues in the gdb scripts

 - "Support kdump with LUKS encryption by reusing LUKS volume keys" from
   Coiby Xu addresses a usability problem with kdump.

   When the dump device is LUKS-encrypted, the kdump kernel may not have
   the keys to the encrypted filesystem. A full writeup of this is in
   the series [0/N] cover letter

 - "sysfs: add counters for lockups and stalls" from Max Kellermann adds
   /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
   /sys/kernel/rcu_stall_count

 - "fork: Page operation cleanups in the fork code" from Pasha Tatashin
   implements a number of code cleanups in fork.c

 - "scripts/gdb/symbols: determine KASLR offset on s390 during early
   boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
   scripts

* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
  llist: make llist_add_batch() a static inline
  delayacct: remove redundant code and adjust indentation
  squashfs: add optional full compressed block caching
  crash_dump, nvme: select CONFIGFS_FS as built-in
  scripts/gdb/symbols: determine KASLR offset on s390 during early boot
  scripts/gdb/symbols: factor out pagination_off()
  scripts/gdb/symbols: factor out get_vmlinux()
  kernel/panic.c: format kernel-doc comments
  mailmap: update and consolidate Casey Connolly's name and email
  nilfs2: remove wbc->for_reclaim handling
  fork: define a local GFP_VMAP_STACK
  fork: check charging success before zeroing stack
  fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
  fork: clean-up ifdef logic around stack allocation
  kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
  kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
  x86/crash: make the page that stores the dm crypt keys inaccessible
  x86/crash: pass dm crypt keys to kdump kernel
  Revert "x86/mm: Remove unused __set_memory_prot()"
  crash_dump: retrieve dm crypt keys in kdump kernel
  ...
2025-05-31 19:12:53 -07:00
Linus Torvalds
00c010e130 Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
   creating a pte which addresses the first page in a folio and reduces
   the amount of plumbing which architecture must implement to provide
   this.

 - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
   largely unrelated folio infrastructure changes which clean things up
   and better prepare us for future work.

 - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
   Price adds early-init code to prevent x86 from leaving physical
   memory unused when physical address regions are not aligned to memory
   block size.

 - "mm/compaction: allow more aggressive proactive compaction" from
   Michal Clapinski provides some tuning of the (sadly, hard-coded (more
   sadly, not auto-tuned)) thresholds for our invokation of proactive
   compaction. In a simple test case, the reduction of a guest VM's
   memory consumption was dramatic.

 - "Minor cleanups and improvements to swap freeing code" from Kemeng
   Shi provides some code cleaups and a small efficiency improvement to
   this part of our swap handling code.

 - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
   adds the ability for a ptracer to modify syscalls arguments. At this
   time we can alter only "system call information that are used by
   strace system call tampering, namely, syscall number, syscall
   arguments, and syscall return value.

   This series should have been incorporated into mm.git's "non-MM"
   branch, but I goofed.

 - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
   Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
   against /proc/pid/pagemap. This permits CRIU to more efficiently get
   at the info about guard regions.

 - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
   implements that fix. No runtime effect is expected because
   validate_page_before_insert() happens to fix up this error.

 - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
   Hildenbrand basically brings uprobe text poking into the current
   decade. Remove a bunch of hand-rolled implementation in favor of
   using more current facilities.

 - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
   Khandual provides enhancements and generalizations to the pte dumping
   code. This might be needed when 128-bit Page Table Descriptors are
   enabled for ARM.

 - "Always call constructor for kernel page tables" from Kevin Brodsky
   ensures that the ctor/dtor is always called for kernel pgtables, as
   it already is for user pgtables.

   This permits the addition of more functionality such as "insert hooks
   to protect page tables". This change does result in various
   architectures performing unnecesary work, but this is fixed up where
   it is anticipated to occur.

 - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
   Ryhl adds plumbing to permit Rust access to core MM structures.

 - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
   Stoakes takes advantage of some VMA merging opportunities which we've
   been missing for 15 years.

 - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
   SeongJae Park optimizes process_madvise()'s TLB flushing.

   Instead of flushing each address range in the provided iovec, we
   batch the flushing across all the iovec entries. The syscall's cost
   was approximately halved with a microbenchmark which was designed to
   load this particular operation.

 - "Track node vacancy to reduce worst case allocation counts" from
   Sidhartha Kumar makes the maple tree smarter about its node
   preallocation.

   stress-ng mmap performance increased by single-digit percentages and
   the amount of unnecessarily preallocated memory was dramaticelly
   reduced.

 - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
   a few unnecessary things which Baoquan noted when reading the code.

 - ""Enhance sysfs handling for memory hotplug in weighted interleave"
   from Rakie Kim "enhances the weighted interleave policy in the memory
   management subsystem by improving sysfs handling, fixing memory
   leaks, and introducing dynamic sysfs updates for memory hotplug
   support". Fixes things on error paths which we are unlikely to hit.

 - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
   from SeongJae Park introduces new DAMOS quota goal metrics which
   eliminate the manual tuning which is required when utilizing DAMON
   for memory tiering.

 - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
   provides cleanups and small efficiency improvements which Baoquan
   found via code inspection.

 - "vmscan: enforce mems_effective during demotion" from Gregory Price
   changes reclaim to respect cpuset.mems_effective during demotion when
   possible. because presently, reclaim explicitly ignores
   cpuset.mems_effective when demoting, which may cause the cpuset
   settings to violated.

   This is useful for isolating workloads on a multi-tenant system from
   certain classes of memory more consistently.

 - "Clean up split_huge_pmd_locked() and remove unnecessary folio
   pointers" from Gavin Guo provides minor cleanups and efficiency gains
   in in the huge page splitting and migrating code.

 - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
   for `struct mem_cgroup', yielding improved memory utilization.

 - "add max arg to swappiness in memory.reclaim and lru_gen" from
   Zhongkun He adds a new "max" argument to the "swappiness=" argument
   for memory.reclaim MGLRU's lru_gen.

   This directs proactive reclaim to reclaim from only anon folios
   rather than file-backed folios.

 - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
   first step on the path to permitting the kernel to maintain existing
   VMs while replacing the host kernel via file-based kexec. At this
   time only memblock's reserve_mem is preserved.

 - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
   and uses a smarter way of looping over a pfn range. By skipping
   ranges of invalid pfns.

 - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
   cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
   when a task is pinned a single NUMA mode.

   Dramatic performance benefits were seen in some real world cases.

 - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
   Garg addresses a warning which occurs during memory compaction when
   using JFS.

 - "move all VMA allocation, freeing and duplication logic to mm" from
   Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
   appropriate mm/vma.c.

 - "mm, swap: clean up swap cache mapping helper" from Kairui Song
   provides code consolidation and cleanups related to the folio_index()
   function.

 - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.

 - "memcg: Fix test_memcg_min/low test failures" from Waiman Long
   addresses some bogus failures which are being reported by the
   test_memcontrol selftest.

 - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
   Stoakes commences the deprecation of file_operations.mmap() in favor
   of the new file_operations.mmap_prepare().

   The latter is more restrictive and prevents drivers from messing with
   things in ways which, amongst other problems, may defeat VMA merging.

 - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
   the per-cpu memcg charge cache from the objcg's one.

   This is a step along the way to making memcg and objcg charging
   NMI-safe, which is a BPF requirement.

 - "mm/damon: minor fixups and improvements for code, tests, and
   documents" from SeongJae Park is yet another batch of miscellaneous
   DAMON changes. Fix and improve minor problems in code, tests and
   documents.

 - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
   stats to be irq safe. Another step along the way to making memcg
   charging and stats updates NMI-safe, a BPF requirement.

 - "Let unmap_hugepage_range() and several related functions take folio
   instead of page" from Fan Ni provides folio conversions in the
   hugetlb code.

* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
  mm: pcp: increase pcp->free_count threshold to trigger free_high
  mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
  mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: pass folio instead of page to unmap_ref_private()
  memcg: objcg stock trylock without irq disabling
  memcg: no stock lock for cpu hot-unplug
  memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
  memcg: make count_memcg_events re-entrant safe against irqs
  memcg: make mod_memcg_state re-entrant safe against irqs
  memcg: move preempt disable to callers of memcg_rstat_updated
  memcg: memcg_rstat_updated re-entrant safe against irqs
  mm: khugepaged: decouple SHMEM and file folios' collapse
  selftests/eventfd: correct test name and improve messages
  alloc_tag: check mem_profiling_support in alloc_tag_init
  Docs/damon: update titles and brief introductions to explain DAMOS
  selftests/damon/_damon_sysfs: read tried regions directories in order
  mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
  mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
  mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
  ...
2025-05-31 15:44:16 -07:00
Linus Torvalds
976aa630da Merge tag 'pm-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "These revert an x86 commit that introduced a nasty power regression on
  some systems, fix PSCI cpuidle driver and ACPI cpufreq driver
  regressions, add Rust abstractions for cpufreq, OPP, clk, and
  cpumasks, add a Rust-based cpufreq-dt driver, and do a minor SCMI
  cpufreq driver cleanup:

   - Revert an x86 commit that went into 6.15 and caused idle power,
     including power in suspend-to-idle, to rise rather dramatically on
     systems booting with "nosmt" in the kernel command line (Rafael
     Wysocki)

   - Prevent freeing an uninitialized pointer in error path of
     dt_idle_state_present() in the PSCI cpuidle driver (Dan Carpenter)

   - Use KHz as the nominal_freq units in get_max_boost_ratio() in the
     ACPI cpufreq driver (iGautham Shenoy)

   - Add Rust abstractions for CPUFreq framework (Viresh Kumar)

   - Add Rust abstractions for OPP framework (Viresh Kumar)

   - Add basic Rust abstractions for Clk and Cpumask frameworks (Viresh
     Kumar)

   - Clean up the SCMI cpufreq driver somewhat (Mike Tipton)"

* tag 'pm-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (21 commits)
  Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
  acpi-cpufreq: Fix nominal_freq units to KHz in get_max_boost_ratio()
  rust: opp: Move `cfg(CONFIG_OF)` attribute to the top of doc test
  cpuidle: psci: Fix uninitialized variable in dt_idle_state_present()
  rust: opp: Make the doctest example depend on CONFIG_OF
  cpufreq: scmi: Skip SCMI devices that aren't used by the CPUs
  cpufreq: Add Rust-based cpufreq-dt driver
  rust: opp: Extend OPP abstractions with cpufreq support
  rust: cpufreq: Extend abstractions for driver registration
  rust: cpufreq: Extend abstractions for policy and driver ops
  rust: cpufreq: Add initial abstractions for cpufreq framework
  rust: opp: Add abstractions for the configuration options
  rust: opp: Add abstractions for the OPP table
  rust: opp: Add initial abstractions for OPP framework
  rust: cpu: Add from_cpu()
  rust: macros: enable use of hyphens in module names
  rust: clk: Add initial abstractions
  rust: clk: Add helpers for Rust code
  MAINTAINERS: Add entry for Rust cpumask API
  rust: cpumask: Add initial abstractions
  ...
2025-05-30 11:54:29 -07:00
Rafael J. Wysocki
3d031d0d8d Merge branch 'pm-cpuidle'
Fix an issue in the PSCI cpuidle driver introduced recently and a nasty
x86 power regression introduced in 6.15:

 - Prevent freeing an uninitialized pointer in error path of
   dt_idle_state_present() in the PSCI cpuidle driver (Dan Carpenter).

 - Revert an x86 commit that went into 6.15 and caused idle power,
   including power in suspend-to-idle, to rise rather dramatically on
   systems booting with "nosmt" in the kernel command line (Rafael Wysocki).

* pm-cpuidle:
  Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
  cpuidle: psci: Fix uninitialized variable in dt_idle_state_present()
2025-05-30 20:21:36 +02:00
Linus Torvalds
bbd9c366bf Merge tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel software guard extension (SGX) updates from Dave Hansen:
 "A couple of x86/sgx changes.

  The first one is a no-brainer to use the (simple) SHA-256 library.

  For the second one, some folks doing testing noticed that SGX systems
  under memory pressure were inducing fatal machine checks at pretty
  unnerving rates, despite the SGX code having _some_ awareness of
  memory poison.

  It turns out that the SGX reclaim path was not checking for poison
  _and_ it always accesses memory to copy it around. Make sure that
  poisoned pages are not reclaimed"

* tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Prevent attempts to reclaim poisoned pages
  x86/sgx: Use SHA-256 library API instead of crypto_shash API
2025-05-29 21:13:17 -07:00
Rafael J. Wysocki
70523f3357 Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
Revert commit 96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
because it introduced a significant power regression on systems that
start with "nosmt" in the kernel command line.

Namely, on such systems, SMT siblings permanently go offline early,
when cpuidle has not been initialized yet, so after the above commit,
hlt_play_dead() is called for them.  Later on, when the processor
attempts to enter a deep package C-state, including PC10 which is
requisite for reaching minimum power in suspend-to-idle, it is not
able to do that because of the SMT siblings staying in C1 (which
they have been put into by HLT).

As a result, the idle power (including power in suspend-to-idle)
rises quite dramatically on those systems with all of the possible
consequences, which (needless to say) may not be expected by their
users.

This issue is hard to debug and potentially dangerous, so it needs to
be addressed as soon as possible in a way that will work for 6.15.y,
hence the revert.

Of course, after this revert, the issue that commit 96040f7273
attempted to address will be back and it will need to be fixed again
later.

Fixes: 96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: 6.15+ <stable@vger.kernel.org> # 6.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/12674167.O9o76ZdvQC@rjwysocki.net
2025-05-29 17:34:18 +02:00
Linus Torvalds
43db111107 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "As far as x86 goes this pull request "only" includes TDX host support.

  Quotes are appropriate because (at 6k lines and 100+ commits) it is
  much bigger than the rest, which will come later this week and
  consists mostly of bugfixes and selftests. s390 changes will also come
  in the second batch.

  ARM:

   - Add large stage-2 mapping (THP) support for non-protected guests
     when pKVM is enabled, clawing back some performance.

   - Enable nested virtualisation support on systems that support it,
     though it is disabled by default.

   - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
     and protected modes.

   - Large rework of the way KVM tracks architecture features and links
     them with the effects of control bits. While this has no functional
     impact, it ensures correctness of emulation (the data is
     automatically extracted from the published JSON files), and helps
     dealing with the evolution of the architecture.

   - Significant changes to the way pKVM tracks ownership of pages,
     avoiding page table walks by storing the state in the hypervisor's
     vmemmap. This in turn enables the THP support described above.

   - New selftest checking the pKVM ownership transition rules

   - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
     even if the host didn't have it.

   - Fixes for the address translation emulation, which happened to be
     rather buggy in some specific contexts.

   - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
     from the number of counters exposed to a guest and addressing a
     number of issues in the process.

   - Add a new selftest for the SVE host state being corrupted by a
     guest.

   - Keep HCR_EL2.xMO set at all times for systems running with the
     kernel at EL2, ensuring that the window for interrupts is slightly
     bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

   - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
     from a pretty bad case of TLB corruption unless accesses to HCR_EL2
     are heavily synchronised.

   - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
     tables in a human-friendly fashion.

   - and the usual random cleanups.

  LoongArch:

   - Don't flush tlb if the host supports hardware page table walks.

   - Add KVM selftests support.

  RISC-V:

   - Add vector registers to get-reg-list selftest

   - VCPU reset related improvements

   - Remove scounteren initialization from VCPU reset

   - Support VCPU reset from userspace using set_mpstate() ioctl

  x86:

   - Initial support for TDX in KVM.

     This finally makes it possible to use the TDX module to run
     confidential guests on Intel processors. This is quite a large
     series, including support for private page tables (managed by the
     TDX module and mirrored in KVM for efficiency), forwarding some
     TDVMCALLs to userspace, and handling several special VM exits from
     the TDX module.

     This has been in the works for literally years and it's not really
     possible to describe everything here, so I'll defer to the various
     merge commits up to and including commit 7bcf7246c4 ('Merge
     branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
  x86/tdx: mark tdh_vp_enter() as __flatten
  Documentation: virt/kvm: remove unreferenced footnote
  RISC-V: KVM: lock the correct mp_state during reset
  KVM: arm64: Fix documentation for vgic_its_iter_next()
  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  KVM: arm64: Stage-2 huge mappings for np-guests
  KVM: arm64: Add a range to pkvm_mappings
  KVM: arm64: Convert pkvm_mappings to interval tree
  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  KVM: arm64: Add a range to __pkvm_host_share_guest()
  KVM: arm64: Introduce for_each_hyp_page
  KVM: arm64: Handle huge mappings for np-guest CMOs
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  RISC-V: KVM: Remove scounteren initialization
  KVM: RISC-V: remove unnecessary SBI reset state
  ...
2025-05-29 08:10:01 -07:00
Linus Torvalds
350a604221 Merge tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull mtrr update from Borislav Petkov:
 "A single change to verify the presence of fixed MTRR ranges before
  accessing the respective MSRs"

* tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
2025-05-27 10:17:03 -07:00
Linus Torvalds
664a231d90 Merge tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
 "Carve out the resctrl filesystem-related code into fs/resctrl/ so that
  multiple architectures can share the fs API for manipulating their
  respective hw resource control implementation.

  This is the second step in the work towards sharing the resctrl
  filesystem interface, the next one being plugging ARM's MPAM into the
  aforementioned fs API"

* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  MAINTAINERS: Add reviewers for fs/resctrl
  x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
  x86/resctrl: Always initialise rid field in rdt_resources_all[]
  x86/resctrl: Relax some asm #includes
  x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
  x86/resctrl: Squelch whitespace anomalies in resctrl core code
  x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
  x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
  x86/resctrl: Move enum resctrl_event_id to resctrl.h
  x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
  fs/resctrl: Add boiler plate for external resctrl code
  x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
  x86/resctrl: Split trace.h
  x86/resctrl: Expand the width of domid by replacing mon_data_bits
  x86/resctrl: Add end-marker to the resctrl_event_id enum
  x86/resctrl: Move is_mba_sc() out of core.c
  x86/resctrl: Drop __init/__exit on assorted symbols
  x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
  x86/resctrl: Check all domains are offline in resctrl_exit()
  x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
  ...
2025-05-27 09:53:02 -07:00
Paolo Bonzini
db44dcbdf8 Merge tag 'kvm-x86-pir-6.16' of https://github.com/kvm-x86/linux into HEAD
KVM x86 posted interrupt changes for 6.16:

Refine and optimize KVM's software processing of the PIR, and ultimately share
PIR harvesting code between KVM and the kernel's Posted MSI handler
2025-05-27 12:15:01 -04:00
Linus Torvalds
5e8bbb2caa Merge tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
 "Another set of timer API cleanups:

    - Convert init_timer*(), try_to_del_timer_sync() and
      destroy_timer_on_stack() over to the canonical timer_*()
      namespace convention.

  There is another large conversion pending, which has not been included
  because it would have caused a gazillion of merge conflicts in next.
  The conversion scripts will be run towards the end of the merge window
  and a pull request sent once all conflict dependencies have been
  merged"

* tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()
  treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()
  timers: Rename init_timers() as timers_init()
  timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA
  timers: Rename __init_timer_on_stack() as __timer_init_on_stack()
  timers: Rename __init_timer() as __timer_init()
  timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack()
  timers: Rename init_timer_key() as timer_init_key()
2025-05-27 08:31:21 -07:00
Linus Torvalds
2bd1bea5fa Merge tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Thomas Gleixner:
 "A set of cleanups for the generic interrupt subsystem:

   - Consolidate on one set of functions for the interrupt domain code
     to get rid of pointlessly duplicated code with only marginal
     different semantics.

   - Update the documentation accordingly and consolidate the coding
     style of the irqdomain header"

* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  irqdomain: Consolidate coding style
  irqdomain: Fix kernel-doc and add it to Documentation
  Documentation: irqdomain: Update it
  Documentation: irq-domain.rst: Simple improvements
  Documentation: irq/concepts: Minor improvements
  Documentation: irq/concepts: Add commas and reflow
  irqdomain: Improve kernel-docs of functions
  irqdomain: Make struct irq_domain_info variables const
  irqdomain: Use irq_domain_instantiate()'s return value as initializers
  irqdomain: Drop irq_linear_revmap()
  pinctrl: keembay: Switch to irq_find_mapping()
  irqchip/armada-370-xp: Switch to irq_find_mapping()
  gpu: ipu-v3: Switch to irq_find_mapping()
  gpio: idt3243x: Switch to irq_find_mapping()
  sh: Switch to irq_find_mapping()
  powerpc: Switch to irq_find_mapping()
  irqdomain: Drop irq_domain_add_*() functions
  powerpc: Switch irq_domain_add_nomap() to use fwnode
  thermal: Switch to irq_domain_create_linear()
  soc: Switch to irq_domain_create_*()
  ...
2025-05-27 08:07:32 -07:00
Linus Torvalds
0aee061726 Merge tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug updates from Ingo Molnar:
 "Move the x86 page fault tracepoints to generic code, because other
  architectures would like to make use of them as well"

* tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tracing, x86/mm: Move page fault tracepoints to generic
  x86/tracing, x86/mm: Remove redundant trace_pagefault_key
2025-05-26 21:18:59 -07:00
Linus Torvalds
020fca04c6 Merge tag 'x86-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc x86 cleanups: kernel-doc updates and a string API transition
  patch"

* tag 'x86-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/power: hibernate: Fix W=1 build kernel-doc warnings
  x86/mm/pat: Fix W=1 build kernel-doc warning
  x86/CPU/AMD: Replace strcpy() with strscpy()
2025-05-26 21:15:06 -07:00
Linus Torvalds
785cdec46e Merge tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
 "Boot code changes:

   - A large series of changes to reorganize the x86 boot code into a
     better isolated and easier to maintain base of PIC early startup
     code in arch/x86/boot/startup/, by Ard Biesheuvel.

     Motivation & background:

  	| Since commit
  	|
  	|    c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
  	|
  	| dated Jun 6 2017, we have been using C code on the boot path in a way
  	| that is not supported by the toolchain, i.e., to execute non-PIC C
  	| code from a mapping of memory that is different from the one provided
  	| to the linker. It should have been obvious at the time that this was a
  	| bad idea, given the need to sprinkle fixup_pointer() calls left and
  	| right to manipulate global variables (including non-pointer variables)
  	| without crashing.
  	|
  	| This C startup code has been expanding, and in particular, the SEV-SNP
  	| startup code has been expanding over the past couple of years, and
  	| grown many of these warts, where the C code needs to use special
  	| annotations or helpers to access global objects.

     This tree includes the first phase of this work-in-progress x86
     boot code reorganization.

  Scalability enhancements and micro-optimizations:

   - Improve code-patching scalability (Eric Dumazet)

   - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)

  CPU features enumeration updates:

   - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
     Darwish)

   - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
     Thomas Gleixner)

   - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)

  Memory management changes:

   - Allow temporary MMs when IRQs are on (Andy Lutomirski)

   - Opt-in to IRQs-off activate_mm() (Andy Lutomirski)

   - Simplify choose_new_asid() and generate better code (Borislav
     Petkov)

   - Simplify 32-bit PAE page table handling (Dave Hansen)

   - Always use dynamic memory layout (Kirill A. Shutemov)

   - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)

   - Make 5-level paging support unconditional (Kirill A. Shutemov)

   - Stop prefetching current->mm->mmap_lock on page faults (Mateusz
     Guzik)

   - Predict valid_user_address() returning true (Mateusz Guzik)

   - Consolidate initmem_init() (Mike Rapoport)

  FPU support and vector computing:

   - Enable Intel APX support (Chang S. Bae)

   - Reorgnize and clean up the xstate code (Chang S. Bae)

   - Make task_struct::thread constant size (Ingo Molnar)

   - Restore fpu_thread_struct_whitelist() to fix
     CONFIG_HARDENED_USERCOPY=y (Kees Cook)

   - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
     Nesterov)

   - Always preserve non-user xfeatures/flags in __state_perm (Sean
     Christopherson)

  Microcode loader changes:

   - Help users notice when running old Intel microcode (Dave Hansen)

   - AMD: Do not return error when microcode update is not necessary
     (Annie Li)

   - AMD: Clean the cache if update did not load microcode (Boris
     Ostrovsky)

  Code patching (alternatives) changes:

   - Simplify, reorganize and clean up the x86 text-patching code (Ingo
     Molnar)

   - Make smp_text_poke_batch_process() subsume
     smp_text_poke_batch_finish() (Nikolay Borisov)

   - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)

  Debugging support:

   - Add early IDT and GDT loading to debug relocate_kernel() bugs
     (David Woodhouse)

   - Print the reason for the last reset on modern AMD CPUs (Yazen
     Ghannam)

   - Add AMD Zen debugging document (Mario Limonciello)

   - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)

   - Stop decoding i64 instructions in x86-64 mode at opcode (Masami
     Hiramatsu)

  CPU bugs and bug mitigations:

   - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)

   - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)

   - Restructure and harmonize the various CPU bug mitigation methods
     (David Kaplan)

   - Fix spectre_v2 mitigation default on Intel (Pawan Gupta)

  MSR API:

   - Large MSR code and API cleanup (Xin Li)

   - In-kernel MSR API type cleanups and renames (Ingo Molnar)

  PKEYS:

   - Simplify PKRU update in signal frame (Chang S. Bae)

  NMI handling code:

   - Clean up, refactor and simplify the NMI handling code (Sohil Mehta)

   - Improve NMI duration console printouts (Sohil Mehta)

  Paravirt guests interface:

   - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)

  SEV support:

   - Share the sev_secrets_pa value again (Tom Lendacky)

  x86 platform changes:

   - Introduce the <asm/amd/> header namespace (Ingo Molnar)

   - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
     <asm/amd/fch.h> (Mario Limonciello)

  Fixes and cleanups:

   - x86 assembly code cleanups and fixes (Uros Bizjak)

   - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
     Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
     Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
     Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
     Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
     Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
     Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"

* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
  x86/bugs: Fix spectre_v2 mitigation default on Intel
  x86/bugs: Restructure ITS mitigation
  x86/xen/msr: Fix uninitialized variable 'err'
  x86/msr: Remove a superfluous inclusion of <asm/asm.h>
  x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
  x86/mm/64: Make 5-level paging support unconditional
  x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
  x86/mm/64: Always use dynamic memory layout
  x86/bugs: Fix indentation due to ITS merge
  x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
  x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
  x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
  x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
  x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
  x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header
  x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>
  x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
  x86/mm: Fix kernel-doc descriptions of various pgtable methods
  x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
  x86/boot: Defer initialization of VM space related global variables
  ...
2025-05-26 16:04:17 -07:00
Linus Torvalds
ddddf9d64f Merge tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
 "Core & generic-arch updates:

   - Add support for dynamic constraints and propagate it to the Intel
     driver (Kan Liang)

   - Fix & enhance driver-specific throttling support (Kan Liang)

   - Record sample last_period before updating on the x86 and PowerPC
     platforms (Mark Barnett)

   - Make perf_pmu_unregister() usable (Peter Zijlstra)

   - Unify perf_event_free_task() / perf_event_exit_task_context()
     (Peter Zijlstra)

   - Simplify perf_event_release_kernel() and perf_event_free_task()
     (Peter Zijlstra)

   - Allocate non-contiguous AUX pages by default (Yabin Cui)

  Uprobes updates:

   - Add support to emulate NOP instructions (Jiri Olsa)

   - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)

  x86 Intel PMU enhancements:

   - Support Intel Auto Counter Reload [ACR] (Kan Liang)

   - Add PMU support for Clearwater Forest (Dapeng Mi)

   - Arch-PEBS preparatory changes: (Dapeng Mi)
       - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
       - Decouple BTS initialization from PEBS initialization
       - Introduce pairs of PEBS static calls

  x86 AMD PMU enhancements:

   - Use hrtimer for handling overflows in the AMD uncore driver
     (Sandipan Das)

   - Prevent UMC counters from saturating (Sandipan Das)

  Fixes and cleanups:

   - Fix put_ctx() ordering (Frederic Weisbecker)

   - Fix irq work dereferencing garbage (Frederic Weisbecker)

   - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian
     Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan
     Das, Thorsten Blum)"

* tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  perf/headers: Clean up <linux/perf_event.h> a bit
  perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
  perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
  mips/perf: Remove driver-specific throttle support
  xtensa/perf: Remove driver-specific throttle support
  sparc/perf: Remove driver-specific throttle support
  loongarch/perf: Remove driver-specific throttle support
  csky/perf: Remove driver-specific throttle support
  arc/perf: Remove driver-specific throttle support
  alpha/perf: Remove driver-specific throttle support
  perf/apple_m1: Remove driver-specific throttle support
  perf/arm: Remove driver-specific throttle support
  s390/perf: Remove driver-specific throttle support
  powerpc/perf: Remove driver-specific throttle support
  perf/x86/zhaoxin: Remove driver-specific throttle support
  perf/x86/amd: Remove driver-specific throttle support
  perf/x86/intel: Remove driver-specific throttle support
  perf: Only dump the throttle log for the leader
  perf: Fix the throttle logic for a group
  perf/core: Add the is_event_in_freq_mode() helper to simplify the code
  ...
2025-05-26 15:40:23 -07:00
Linus Torvalds
14418ddcc2 Merge tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Fix memcpy_sglist to handle partially overlapping SG lists
   - Use memcpy_sglist to replace null skcipher
   - Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
   - Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
   - Hide CRYPTO_MANAGER
   - Add delayed freeing of driver crypto_alg structures

  Compression:
   - Allocate large buffers on first use instead of initialisation in scomp
   - Drop destination linearisation buffer in scomp
   - Move scomp stream allocation into acomp
   - Add acomp scatter-gather walker
   - Remove request chaining
   - Add optional async request allocation

  Hashing:
   - Remove request chaining
   - Add optional async request allocation
   - Move partial block handling into API
   - Add ahash support to hmac
   - Fix shash documentation to disallow usage in hard IRQs

  Algorithms:
   - Remove unnecessary SIMD fallback code on x86 and arm/arm64
   - Drop avx10_256 xts(aes)/ctr(aes) on x86
   - Improve avx-512 optimisations for xts(aes)
   - Move chacha arch implementations into lib/crypto
   - Move poly1305 into lib/crypto and drop unused Crypto API algorithm
   - Disable powerpc/poly1305 as it has no SIMD fallback
   - Move sha256 arch implementations into lib/crypto
   - Convert deflate to acomp
   - Set block size correctly in cbcmac

  Drivers:
   - Do not use sg_dma_len before mapping in sun8i-ss
   - Fix warm-reboot failure by making shutdown do more work in qat
   - Add locking in zynqmp-sha
   - Remove cavium/zip
   - Add support for PCI device 0x17D8 to ccp
   - Add qat_6xxx support in qat
   - Add support for RK3576 in rockchip-rng
   - Add support for i.MX8QM in caam

  Others:
   - Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
   - Add new SEV/SNP platform shutdown API in ccp"

* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
  x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
  crypto: qat - add missing header inclusion
  crypto: api - Redo lookup on EEXIST
  Revert "crypto: testmgr - Add hash export format testing"
  crypto: marvell/cesa - Do not chain submitted requests
  crypto: powerpc/poly1305 - add depends on BROKEN for now
  Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
  crypto: ccp - Add missing tee info reg for teev2
  crypto: ccp - Add missing bootloader info reg for pspv5
  crypto: sun8i-ce - move fallback ahash_request to the end of the struct
  crypto: octeontx2 - Use dynamic allocated memory region for lmtst
  crypto: octeontx2 - Initialize cptlfs device info once
  crypto: xts - Only add ecb if it is not already there
  crypto: lrw - Only add ecb if it is not already there
  crypto: testmgr - Add hash export format testing
  crypto: testmgr - Use ahash for generic tfm
  crypto: hmac - Add ahash support
  crypto: testmgr - Ignore EEXIST on shash allocation
  crypto: algapi - Add driver template support to crypto_inst_setname
  crypto: shash - Set reqsize in shash_alg
  ...
2025-05-26 13:47:28 -07:00
Paolo Bonzini
4d526b02df Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.16

* New features:

  - Add large stage-2 mapping support for non-protected pKVM guests,
    clawing back some performance.

  - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
    protected modes.

  - Enable nested virtualisation support on systems that support it
    (yes, it has been a long time coming), though it is disabled by
    default.

* Improvements, fixes and cleanups:

  - Large rework of the way KVM tracks architecture features and links
    them with the effects of control bits. This ensures correctness of
    emulation (the data is automatically extracted from the published
    JSON files), and helps dealing with the evolution of the
    architecture.

  - Significant changes to the way pKVM tracks ownership of pages,
    avoiding page table walks by storing the state in the hypervisor's
    vmemmap. This in turn enables the THP support described above.

  - New selftest checking the pKVM ownership transition rules

  - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
    even if the host didn't have it.

  - Fixes for the address translation emulation, which happened to be
    rather buggy in some specific contexts.

  - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
    from the number of counters exposed to a guest and addressing a
    number of issues in the process.

  - Add a new selftest for the SVE host state being corrupted by a
    guest.

  - Keep HCR_EL2.xMO set at all times for systems running with the
    kernel at EL2, ensuring that the window for interrupts is slightly
    bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

  - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
    from a pretty bad case of TLB corruption unless accesses to HCR_EL2
    are heavily synchronised.

  - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
    tables in a human-friendly fashion.

  - and the usual random cleanups.
2025-05-26 16:19:46 -04:00
Eric Biggers
2297554f01 x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
irq_fpu_usable() incorrectly returned true before the FPU is
initialized.  The x86 CPU onlining code can call sha256() to checksum
AMD microcode images, before the FPU is initialized.  Since sha256()
recently gained a kernel-mode FPU optimized code path, a crash occurred
in kernel_fpu_begin_mask() during hotplug CPU onlining.

(The crash did not occur during boot-time CPU onlining, since the
optimized sha256() code is not enabled until subsys_initcalls run.)

Fix this by making irq_fpu_usable() return false before fpu__init_cpu()
has run.  To do this without adding any additional overhead to
irq_fpu_usable(), replace the existing per-CPU bool in_kernel_fpu with
kernel_fpu_allowed which tracks both initialization and usage rather
than just usage.  The initial state is false; FPU initialization sets it
to true; kernel-mode FPU sections toggle it to false and then back to
true; and CPU offlining restores it to the initial state of false.

Fixes: 11d7956d52 ("crypto: x86/sha256 - implement library instead of shash")
Reported-by: Ayush Jain <Ayush.Jain3@amd.com>
Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ayush Jain <Ayush.Jain3@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-26 10:58:50 +08:00
Roman Kisel
43cb39ad26 arch/x86: Provide the CPU number in the wakeup AP callback
When starting APs, confidential guests and paravisor guests
need to know the CPU number, and the pattern of using the linear
search has emerged in several places. With N processors that leads
to the O(N^2) time complexity.

Provide the CPU number in the AP wake up callback so that one can
get the CPU number in constant time.

Suggested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250507182227.7421-3-romank@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250507182227.7421-3-romank@linux.microsoft.com>
2025-05-23 16:30:56 +00:00
Marc Zyngier
cb86616c39 Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next
* kvm-arm64/ubsan-el2:
  : .
  : Add UBSAN support to the EL2 portion of KVM, reusing most of the
  : existing logic provided by CONFIG_IBSAN_TRAP.
  :
  : Patches courtesy of Mostafa Saleh.
  : .
  KVM: arm64: Handle UBSAN faults
  KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
  ubsan: Remove regs from report_ubsan_failure()
  arm64: Introduce esr_is_ubsan_brk()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:57:32 +01:00
Coiby Xu
cc66e4863a x86/crash: make the page that stores the dm crypt keys inaccessible
This adds an addition layer of protection for the saved copy of dm crypt
key.  Trying to access the saved copy will cause page fault.

Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 10:48:22 -07:00
Coiby Xu
5eb3f60554 x86/crash: pass dm crypt keys to kdump kernel
1st kernel will build up the kernel command parameter dmcryptkeys as
similar to elfcorehdr to pass the memory address of the stored info of dm
crypt key to kdump kernel.

Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 10:48:21 -07:00
Pawan Gupta
6a7c3c2606 x86/bugs: Fix spectre_v2 mitigation default on Intel
Commit

  480e803dac ("x86/bugs: Restructure spectre_v2 mitigation")

inadvertently changed the spectre-v2 mitigation default from eIBRS to IBRS on
Intel. While splitting the spectre_v2 mitigation in select/update/apply
functions, eIBRS and IBRS selection logic was separated in select and update.

This caused IBRS selection to not consider that eIBRS mitigation is already
selected, fix it.

Fixes: 480e803dac ("x86/bugs: Restructure spectre_v2 mitigation")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250520-eibrs-fix-v1-1-91bacd35ed09@linux.intel.com
2025-05-21 11:51:32 +02:00
David Kaplan
61ab72c2c6 x86/bugs: Restructure ITS mitigation
Restructure the ITS mitigation to use select/update/apply functions like
the other mitigations.

There is a particularly complex interaction between ITS and Retbleed as CDT
(Call Depth Tracking) is a mitigation for both, and either its=stuff or
retbleed=stuff will attempt to enable CDT.

retbleed_update_mitigation() runs first and will check the necessary
pre-conditions for CDT if either ITS or Retbleed stuffing is selected.  If
checks pass and ITS stuffing is selected, it will select stuffing for
Retbleed as well.

its_update_mitigation() runs after and will either select stuffing if
retbleed stuffing was enabled, or fall back to the default (aligned thunks)
if stuffing could not be enabled.

Enablement of CDT is done exclusively in retbleed_apply_mitigation().
its_apply_mitigation() is only used to enable aligned thunks.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/20250516193212.128782-1-david.kaplan@amd.com
2025-05-21 08:45:27 +02:00
Ingo Molnar
412751aa69 Merge tag 'v6.15-rc7' into x86/core, to pick up fixes
Pick up build fixes from upstream to make this tree more testable.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-21 08:45:03 +02:00
Linus Torvalds
56b2b1fc90 Merge tag 'x86-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:

 - Fix SEV-SNP kdump bugs

 - Update the email address of Alexey Makhalov in MAINTAINERS

 - Add the CPU feature flag for the Zen6 microarchitecture

 - Fix typo in system message

* tag 'x86-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove duplicated word in warning message
  x86/CPU/AMD: Add X86_FEATURE_ZEN6
  x86/sev: Make sure pages are not skipped during kdump
  x86/sev: Do not touch VMSA pages during SNP guest memory kdump
  MAINTAINERS: Update Alexey Makhalov's email address
  x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL macro
2025-05-17 08:43:51 -07:00
Kirill A. Shutemov
09230b7554 x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
PARAVIRT_XXL is exclusively utilized by XEN_PV, which is only compatible
with 64-bit machines.

Clearly designate PARAVIRT_XXL as 64-bit only and remove ifdefs to
support CONFIG_PGTABLE_LEVELS < 5.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-5-kirill.shutemov@linux.intel.com
2025-05-17 10:38:29 +02:00
Kirill A. Shutemov
7212b58d6d x86/mm/64: Make 5-level paging support unconditional
Both Intel and AMD CPUs support 5-level paging, which is expected to
become more widely adopted in the future. All major x86 Linux
distributions have the feature enabled.

Remove CONFIG_X86_5LEVEL and related #ifdeffery for it to make it more readable.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
2025-05-17 10:38:16 +02:00
Kirill A. Shutemov
1bffe6f689 x86/mm/64: Always use dynamic memory layout
Dynamic memory layout is used by KASLR and 5-level paging.

CONFIG_X86_5LEVEL is going to be removed, making 5-level paging support
unconditional which requires unconditional support of dynamic memory
layout.

Remove CONFIG_DYNAMIC_MEMORY_LAYOUT.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-2-kirill.shutemov@linux.intel.com
2025-05-17 10:33:44 +02:00
Borislav Petkov (AMD)
a0f3fe547e x86/bugs: Fix indentation due to ITS merge
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-17 10:33:32 +02:00
Jiri Slaby (SUSE)
b712918091 x86/io_apic: Switch to of_fwnode_handle()
of_node_to_fwnode() is irqdomain's reimplementation of the "officially"
defined of_fwnode_handle(). The former is in the process of being
removed, so use the latter instead.

[ tglx: Fix up subject prefix ]

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-11-jirislaby@kernel.org
2025-05-16 21:06:08 +02:00
James Morse
7168ae330e x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
Resctrl is a filesystem interface to hardware that provides cache
allocation policy and bandwidth control for groups of tasks or CPUs.

To support more than one architecture, resctrl needs to live in /fs/.

Move the code that is concerned with the filesystem interface to
/fs/resctrl.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com
2025-05-16 14:36:09 +02:00
James Morse
f6b25be204 x86/resctrl: Always initialise rid field in rdt_resources_all[]
x86 has an array, rdt_resources_all[], of all possible resources.
The for-each-resource walkers depend on the rid field of all
resources being initialised.

If the array ever grows due to another architecture adding a resource
type that is not defined on x86, the for-each-resources walkers will
loop forever.

Initialise all the rid values in resctrl_arch_late_init() before
any for-each-resource walker can be called.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-24-james.morse@arm.com
2025-05-16 14:35:22 +02:00
Dave Martin
b7b57edbf5 x86/resctrl: Relax some asm #includes
checkpatch.pl identifies some direct #includes of asm headers that
can be satisfied by including the corresponding <linux/...> header
instead.

Fix them.

No intentional functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-23-james.morse@arm.com
2025-05-16 14:34:31 +02:00
Dave Martin
df3dc0efcc x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
rdt_init_fs_context() sizes a typed allocation using an explicit
sizeof(type) expression, which checkpatch.pl complains about.

Since this code is about to be factored out and made generic, this
is a good opportunity to fix the code to size the allocation based
on the target pointer instead, to reduce the chance of future mis-
maintenance.

Fix it.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-22-james.morse@arm.com
2025-05-16 14:33:56 +02:00
Dave Martin
556f48a509 x86/resctrl: Squelch whitespace anomalies in resctrl core code
checkpatch.pl complains about some whitespace anomalies in the
resctrl core code.

This doesn't matter, but since this code is about to be factored
out and made generic, this is a good opportunity to fix these
issues and so reduce future checkpatch fuzz.

Fix them.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-21-james.morse@arm.com
2025-05-16 14:09:18 +02:00
James Morse
3d95a49b36 x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
Once the filesystem parts of resctrl move to fs/resctrl, it cannot rely
on definitions in x86's internal.h.

Move definitions in internal.h that need to be shared between the
filesystem and architecture code to header files that fs/resctrl can
include.

Doing this separately means the filesystem code only moves between files
of the same name, instead of having these changes mixed in too.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-17-james.morse@arm.com
2025-05-16 11:35:23 +02:00
James Morse
bff70402d6 fs/resctrl: Add boiler plate for external resctrl code
Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL
for the common parts of the resctrl interface and make X86_CPU_RESCTRL
select this.

Adding an include of asm/resctrl.h to linux/resctrl.h allows the
/fs/resctrl files to switch over to using this header instead.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com
2025-05-16 11:05:40 +02:00