Commit Graph

6809 Commits

Author SHA1 Message Date
Linus Torvalds
b8a2c32b22 Merge tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Update the list of AMD microcode minimum Entrysign revisions

 - Add additional fixed AMD RDSEED microcode revisions

 - Update the language transliteration for Kiryl Shutsemau's name
   in the MAINTAINERS entry

* tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev
  x86/CPU/AMD: Add additional fixed RDSEED microcode revisions
  MAINTAINERS: Update name spelling
2025-11-15 08:55:29 -08:00
Borislav Petkov (AMD)
dd14022a7c x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev
Add the minimum Entrysign revision for that model+stepping to the list
of minimum revisions.

Fixes: 50cef76d5c ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/e94dd76b-4911-482f-8500-5c848a3df026@citrix.com
2025-11-14 14:04:49 +01:00
Mario Limonciello
e1a97a627c x86/CPU/AMD: Add additional fixed RDSEED microcode revisions
Microcode that resolves the RDSEED failure (SB-7055 [1]) has been released for
additional Zen5 models to linux-firmware [2]. Update the zen5_rdseed_microcode
array to cover these new models.

Fixes: 607b9fb2ce ("x86/CPU/AMD: Add RDSEED fix for Zen5")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7055.html [1]
Link: 6167e55669 [2]
Link: https://patch.msgid.link/20251113223608.1495655-1-mario.limonciello@amd.com
2025-11-14 13:02:11 +01:00
Linus Torvalds
0d7bee10be Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Fix AMD PCI root device caching regression that triggers
   on certain firmware variants

 - Fix the zen5_rdseed_microcode[] array to be NULL-terminated

 - Add more AMD models to microcode signature checking

* tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add more known models to entry sign checking
  x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
  x86/amd_node: Fix AMD root device caching
2025-11-08 09:01:11 -08:00
Mario Limonciello (AMD)
d23550efc6 x86/microcode/AMD: Add more known models to entry sign checking
Two Zen5 systems are missing from need_sha_check(). Add them.

Fixes: 50cef76d5c ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20251106182904.4143757-1-superm1@kernel.org
2025-11-07 12:12:21 +01:00
Linus Torvalds
284922f4c5 x86: uaccess: don't use runtime-const rewriting in modules
The runtime-const infrastructure was never designed to handle the
modular case, because the constant fixup is only done at boot time for
core kernel code.

But by the time I used it for the x86-64 user space limit handling in
commit 86e6b1547b ("x86: fix user address masking non-canonical
speculation issue"), I had completely repressed that fact.

And it all happens to work because the only code that currently actually
gets inlined by modules is for the access_ok() limit check, where the
default constant value works even when not fixed up.  Because at least I
had intentionally made it be something that is in the non-canonical
address space region.

But it's technically very wrong, and it does mean that at least in
theory, the use of 'access_ok()' + '__get_user()' can trigger the same
speculation issue with non-canonical addresses that the original commit
was all about.

The pattern is unusual enough that this probably doesn't matter in
practice, but very wrong is still very wrong.  Also, let's fix it before
the nice optimized scoped user accessor helpers that Thomas Gleixner is
working on cause this pseudo-constant to then be more widely used.

This all came up due to an unrelated discussion with Mateusz Guzik about
using the runtime const infrastructure for names_cachep accesses too.
There the modular case was much more obviously broken, and Mateusz noted
it in his 'v2' of the patch series.

That then made me notice how broken 'access_ok()' had been in modules
all along.  Mea culpa, mea maxima culpa.

Fix it by simply not using the runtime-const code in modules, and just
using the USER_PTR_MAX variable value instead.  This is not
performance-critical like the core user accessor functions (get_user()
and friends) are.

Also make sure this doesn't get forgotten the next time somebody wants
to do runtime constant optimizations by having the x86 runtime-const.h
header file error out if included by modules.

Fixes: 86e6b1547b ("x86: fix user address masking non-canonical speculation issue")
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Triggered-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-11-05 10:24:36 +09:00
Mario Limonciello
f1fdffe0af x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
Running x86_match_min_microcode_rev() on a Zen5 CPU trips up KASAN for an out
of bounds access.

Fixes: 607b9fb2ce ("x86/CPU/AMD: Add RDSEED fix for Zen5")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251104161007.269885-1-mario.limonciello@amd.com
2025-11-04 17:16:14 +01:00
Borislav Petkov (AMD)
847ebc4476 x86/CPU/AMD: Extend Zen6 model range
Add some more Zen6 models.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251029123056.19987-1-bp@kernel.org
2025-10-30 11:33:55 +01:00
Gregory Price
607b9fb2ce x86/CPU/AMD: Add RDSEED fix for Zen5
There's an issue with RDSEED's 16-bit and 32-bit register output
variants on Zen5 which return a random value of 0 "at a rate inconsistent
with randomness while incorrectly signaling success (CF=1)". Search the
web for AMD-SB-7055 for more detail.

Add a fix glue which checks microcode revisions.

  [ bp: Add microcode revisions checking, rewrite. ]

Cc: stable@vger.kernel.org
Signed-off-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251018024010.4112396-1-gourry@gourry.net
2025-10-28 12:37:49 +01:00
Borislav Petkov (AMD)
8a9fb5129e x86/microcode/AMD: Limit Entrysign signature checking to known generations
Limit Entrysign sha256 signature checking to CPUs in the range Zen1-Zen5.

X86_BUG cannot be used here because the loading on the BSP happens way
too early, before the cpufeatures machinery has been set up.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/all/20251023124629.5385-1-bp@kernel.org
2025-10-27 17:07:17 +01:00
Andy Shevchenko
84dfce65a7 x86/bugs: Remove dead code which might prevent from building
Clang, in particular, is not happy about dead code:

arch/x86/kernel/cpu/bugs.c:1830:20: error: unused function 'match_option' [-Werror,-Wunused-function]
 1830 | static inline bool match_option(const char *arg, int arglen, const char *opt)
      |                    ^~~~~~~~~~~~
1 error generated.

Remove a leftover from the previous cleanup.

Fixes: 02ac6cc8c5 ("x86/bugs: Simplify SSB cmdline parsing")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251024125959.1526277-1-andriy.shevchenko%40linux.intel.com
2025-10-24 09:42:00 -07:00
David Kaplan
204ced4108 x86/bugs: Qualify RETBLEED_INTEL_MSG
When retbleed mitigation is disabled, the kernel already prints an info
message that the system is vulnerable.  Recent code restructuring also
inadvertently led to RETBLEED_INTEL_MSG being printed as an error, which is
unnecessary as retbleed mitigation was already explicitly disabled (by config
option, cmdline, etc.).

Qualify this print statement so the warning is not printed unless an actual
retbleed mitigation was selected and is being disabled due to incompatibility
with spectre_v2.

Fixes: e3b78a7ad5 ("x86/bugs: Restructure retbleed mitigation")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220624
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251003171936.155391-1-david.kaplan@amd.com
2025-10-21 12:32:28 +02:00
Andrew Cooper
876f0d43af x86/microcode: Fix Entrysign revision check for Zen1/Naples
... to match AMD's statement here:

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7033.html

Fixes: 50cef76d5c ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20251020144124.2930784-1-andrew.cooper3@citrix.com
2025-10-21 12:16:51 +02:00
Babu Moger
19de7113bf x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode
The following NULL pointer dereference is encountered on mount of resctrl fs
after booting a system that supports assignable counters with the
"rdt=!mbmtotal,!mbmlocal" kernel parameters:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:mbm_cntr_get
  Call Trace:
  rdtgroup_assign_cntr_event
  rdtgroup_assign_cntrs
  rdt_get_tree

Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables
the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features
and the MBM events they represent. This results in the per-domain MBM event
related data structures to not be allocated during early initialization.

resctrl fs initialization follows by implicitly enabling both MBM total and
local events on a system that supports assignable counters (mbm_event mode),
but this enabling occurs after the per-domain data structures have been
created.

After booting, resctrl fs assumes that an enabled event can access all its
state. This results in NULL pointer dereference when resctrl attempts to
access the un-allocated structures of an enabled event.

Remove the late MBM event enabling from resctrl fs.

This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter
(mbm_event) mode is enabled without any events to support. Switching between
the "default" and "mbm_event" mode without any events is not practical.

Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and
X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that
supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL
or X86_FEATURE_CQM_MBM_LOCAL.

This ensures all needed MBM related data structures are created before use and
that it is only possible to switch between "default" and "mbm_event" mode when
the same events are available in both modes. This dependency does not exist in
the hardware but this usage of these feature settings work for known systems.

  [ bp: Massage commit message. ]

Fixes: 13390861b4 ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details")
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
2025-10-20 18:06:31 +02:00
Rong Zhang
e6416c2dfe x86/CPU/AMD: Prevent reset reasons from being retained across reboot
The S5_RESET_STATUS register is parsed on boot and printed to kmsg.
However, this could sometimes be misleading and lead to users wasting a
lot of time on meaningless debugging for two reasons:

* Some bits are never cleared by hardware. It's the software's
responsibility to clear them as per the Processor Programming Reference
(see [1]).

* Some rare hardware-initiated platform resets do not update the
register at all.

In both cases, a previous reboot could leave its trace in the register,
resulting in users seeing unrelated reboot reasons while debugging random
reboots afterward.

Write the read value back to the register in order to clear all reason bits
since they are write-1-to-clear while the others must be preserved.

  [1]: https://bugzilla.kernel.org/show_bug.cgi?id=206537#attach_303991

  [ bp: Massage commit message. ]

Fixes: ab81310287 ("x86/CPU/AMD: Print the reason for the last reset")
Signed-off-by: Rong Zhang <i@rong.moe>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/all/20250913144245.23237-1-i@rong.moe/
2025-10-15 21:38:06 +02:00
Babu Moger
15292f1b4c x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID
Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.

When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on first read after tracking re-starts and is clear on all
subsequent reads as long as the RMID is tracked.

resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.

This results in resctrl computing an event value that is either undercounting
(when new counter is more than stored counter) or a mistaken overflow (when
new counter is less than stored counter).

Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.

Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
   monitoring group.

   $mount -t resctrl resctrl /sys/fs/resctrl
   $mkdir /sys/fs/resctrl/mon_groups/test1/
   $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   21323            <- Total bytes on domain 0
   "Unavailable"    <- Total bytes on domain 1

   Task is running on domain 0. Counter on domain 1 is "Unavailable".

2. The task runs on domain 0 for a while and then moves to domain 1. The
   counter starts incrementing on domain 1.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   7345357          <- Total bytes on domain 0
   4545             <- Total bytes on domain 1

3. At some point, the RMID in domain 0 transitions to the "Unavailable"
   state because the task is no longer executing in that domain.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   "Unavailable"    <- Total bytes on domain 0
   434341           <- Total bytes on domain 1

4.  Since the task continues to migrate between domains, it may eventually
    return to domain 0.

    $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
    17592178699059  <- Overflow on domain 0
    3232332         <- Total bytes on domain 1

In this case, the RMID on domain 0 transitions from "Unavailable" state to
active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when
the counter is read and begins tracking the RMID counting from 0.

Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, the resctrl's overflow logic is
triggered, it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.

In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.

Here is the text from APM [1] available from [2].

"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."

[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
    Bandwidth (MBM).

  [ bp: Split commit message into smaller paragraph chunks for better
    consumption. ]

Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-10-13 21:24:39 +02:00
Linus Torvalds
9591fdb061 Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Borislav Petkov:

 - Remove a bunch of asm implementing condition flags testing in KVM's
   emulator in favor of int3_emulate_jcc() which is written in C

 - Replace KVM fastops with C-based stubs which avoids problems with the
   fastop infra related to latter not adhering to the C ABI due to their
   special calling convention and, more importantly, bypassing compiler
   control-flow integrity checking because they're written in asm

 - Remove wrongly used static branches and other ugliness accumulated
   over time in hyperv's hypercall implementation with a proper static
   function call to the correct hypervisor call variant

 - Add some fixes and modifications to allow running FRED-enabled
   kernels in KVM even on non-FRED hardware

 - Add kCFI improvements like validating indirect calls and prepare for
   enabling kCFI with GCC. Add cmdline params documentation and other
   code cleanups

 - Use the single-byte 0xd6 insn as the official #UD single-byte
   undefined opcode instruction as agreed upon by both x86 vendors

 - Other smaller cleanups and touchups all over the place

* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86,retpoline: Optimize patch_retpoline()
  x86,ibt: Use UDB instead of 0xEA
  x86/cfi: Remove __noinitretpoline and __noretpoline
  x86/cfi: Add "debug" option to "cfi=" bootparam
  x86/cfi: Standardize on common "CFI:" prefix for CFI reports
  x86/cfi: Document the "cfi=" bootparam options
  x86/traps: Clarify KCFI instruction layout
  compiler_types.h: Move __nocfi out of compiler-specific header
  objtool: Validate kCFI calls
  x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
  x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
  x86/fred: Install system vector handlers even if FRED isn't fully enabled
  x86/hyperv: Use direct call to hypercall-page
  x86/hyperv: Clean up hv_do_hypercall()
  KVM: x86: Remove fastops
  KVM: x86: Convert em_salc() to C
  KVM: x86: Introduce EM_ASM_3WCL
  KVM: x86: Introduce EM_ASM_1SRC2
  KVM: x86: Introduce EM_ASM_2CL
  KVM: x86: Introduce EM_ASM_2W
  ...
2025-10-11 11:19:16 -07:00
Linus Torvalds
2f0a750453 Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:

 - Simplify inline asm flag output operands now that the minimum
   compiler version supports the =@ccCOND syntax

 - Remove a bunch of AS_* Kconfig symbols which detect assembler support
   for various instruction mnemonics now that the minimum assembler
   version supports them all

 - The usual cleanups all over the place

* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
  x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
  x86/mtrr: Remove license boilerplate text with bad FSF address
  x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
  x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
  x86/entry/fred: Push __KERNEL_CS directly
  x86/kconfig: Remove CONFIG_AS_AVX512
  crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
  crypto: X86 - Remove CONFIG_AS_VAES
  crypto: x86 - Remove CONFIG_AS_GFNI
  x86/kconfig: Drop unused and needless config X86_64_SMP
2025-10-11 10:51:14 -07:00
Linus Torvalds
2215336295 Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - Unify guest entry code for KVM and MSHV (Sean Christopherson)

 - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain()
   (Nam Cao)

 - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV
   (Mukesh Rathor)

 - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov)

 - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna
   Kumar T S M)

 - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari,
   Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley)

* tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyperv: Remove the spurious null directive line
  MAINTAINERS: Mark hyperv_fb driver Obsolete
  fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver
  Drivers: hv: Make CONFIG_HYPERV bool
  Drivers: hv: Add CONFIG_HYPERV_VMBUS option
  Drivers: hv: vmbus: Fix typos in vmbus_drv.c
  Drivers: hv: vmbus: Fix sysfs output format for ring buffer index
  Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()
  x86/hyperv: Switch to msi_create_parent_irq_domain()
  mshv: Use common "entry virt" APIs to do work in root before running guest
  entry: Rename "kvm" entry code assets to "virt" to genericize APIs
  entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
  mshv: Handle NEED_RESCHED_LAZY before transferring to guest
  x86/hyperv: Add kexec/kdump support on Azure CVMs
  Drivers: hv: Simplify data structures for VMBus channel close message
  Drivers: hv: util: Cosmetic changes for hv_utils_transport.c
  mshv: Add support for a new parent partition configuration
  clocksource: hyper-v: Skip unnecessary checks for the root partition
  hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-07 08:40:15 -07:00
Linus Torvalds
256e341706 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
 "Generic:

   - Rework almost all of KVM's exports to expose symbols only to KVM's
     x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko

  x86:

   - Rework almost all of KVM x86's exports to expose symbols only to
     KVM's vendor modules, i.e. to kvm-{amd,intel}.ko

   - Add support for virtualizing Control-flow Enforcement Technology
     (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
     (Shadow Stacks).

     It is worth noting that while SHSTK and IBT can be enabled
     separately in CPUID, it is not really possible to virtualize them
     separately. Therefore, Intel processors will really allow both
     SHSTK and IBT under the hood if either is made visible in the
     guest's CPUID. The alternative would be to intercept
     XSAVES/XRSTORS, which is not feasible for performance reasons

   - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
     when completing userspace I/O. KVM has already committed to
     allowing L2 to to perform I/O at that point

   - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
     MSR is supposed to exist for v2 PMUs

   - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs

   - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
     emulator support (KVM should never need to emulate the MSRs outside
     of forced emulation and other contrived testing scenarios)

   - Clean up the MSR APIs in preparation for CET and FRED
     virtualization, as well as mediated vPMU support

   - Clean up a pile of PMU code in anticipation of adding support for
     mediated vPMUs

   - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
     vmexits needed to faithfully emulate an I/O APIC for such guests

   - Many cleanups and minor fixes

   - Recover possible NX huge pages within the TDP MMU under read lock
     to reduce guest jitter when restoring NX huge pages

   - Return -EAGAIN during prefault if userspace concurrently
     deletes/moves the relevant memslot, to fix an issue where
     prefaulting could deadlock with the memslot update

  x86 (AMD):

   - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
     supported

   - Require a minimum GHCB version of 2 when starting SEV-SNP guests
     via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
     errors instead of latent guest failures

   - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
     prevents unauthorized CPU accesses from reading the ciphertext of
     SNP guest private memory, e.g. to attempt an offline attack. This
     feature splits the shared SEV-ES/SEV-SNP ASID space into separate
     ranges for SEV-ES and SEV-SNP guests, therefore a new module
     parameter is needed to control the number of ASIDs that can be used
     for VMs with CipherText Hiding vs. how many can be used to run
     SEV-ES guests

   - Add support for Secure TSC for SEV-SNP guests, which prevents the
     untrusted host from tampering with the guest's TSC frequency, while
     still allowing the the VMM to configure the guest's TSC frequency
     prior to launch

   - Validate the XCR0 provided by the guest (via the GHCB) to avoid
     bugs resulting from bogus XCR0 values

   - Save an SEV guest's policy if and only if LAUNCH_START fully
     succeeds to avoid leaving behind stale state (thankfully not
     consumed in KVM)

   - Explicitly reject non-positive effective lengths during SNP's
     LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
     them

   - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
     host's desired TSC_AUX, to fix a bug where KVM was keeping a
     different vCPU's TSC_AUX in the host MSR until return to userspace

  KVM (Intel):

   - Preparation for FRED support

   - Don't retry in TDX's anti-zero-step mitigation if the target
     memslot is invalid, i.e. is being deleted or moved, to fix a
     deadlock scenario similar to the aforementioned prefaulting case

   - Misc bugfixes and minor cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: x86: Export KVM-internal symbols for sub-modules only
  KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
  KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
  KVM: Export KVM-internal symbols for sub-modules only
  KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
  KVM: VMX: Make CR4.CET a guest owned bit
  KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
  KVM: selftests: Add coverage for KVM-defined registers in MSRs test
  KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
  KVM: selftests: Extend MSRs test to validate vCPUs without supported features
  KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
  KVM: selftests: Add an MSR test to exercise guest/host and read/write
  KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
  KVM: x86: Define Control Protection Exception (#CP) vector
  KVM: x86: Add human friendly formatting for #XM, and #VE
  KVM: SVM: Enable shadow stack virtualization for SVM
  KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  KVM: SVM: Pass through shadow stack MSRs as appropriate
  KVM: SVM: Update dump_vmcb with shadow stack save area additions
  KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
  ...
2025-10-06 12:37:34 -07:00
Linus Torvalds
50ac57c3b1 Merge tag 'x86_tdx_for_6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
 "The biggest change here is making TDX and kexec play nicely together.

  Before this, the memory encryption hardware (which doesn't respect
  cache coherency) could write back old cachelines on top of data in the
  new kernel, so kexec and TDX were made mutually exclusive. This
  removes the limitation.

  There is also some work to tighten up a hardware bug workaround and
  some MAINTAINERS updates.

   - Make TDX and kexec work together

    - Skip TDX bug workaround when the bug is not present

    - Update maintainers entries"

* tag 'x86_tdx_for_6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/virt/tdx: Use precalculated TDVPR page physical address
  KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
  x86/virt/tdx: Update the kexec section in the TDX documentation
  x86/virt/tdx: Remove the !KEXEC_CORE dependency
  x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
  x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
  x86/sme: Use percpu boolean to control WBINVD during kexec
  x86/kexec: Consolidate relocate_kernel() function parameters
  x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
  x86/tdx: Tidy reset_pamt functions
  x86/tdx: Eliminate duplicate code in tdx_clear_page()
  MAINTAINERS: Add KVM mail list to the TDX entry
  MAINTAINERS: Add Rick Edgecombe as a TDX reviewer
  MAINTAINERS: Update the file list in the TDX entry.
2025-10-04 10:01:30 -07:00
Linus Torvalds
2cb8eeaf00 Merge tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
 "Add support on AMD for assigning QoS bandwidth counters to resources
  (RMIDs) with the ability for those resources to be tracked by the
  counters as long as they're assigned to them.

  Previously, due to hw limitations, bandwidth counts from untracked
  resources would get lost when those resources are not tracked.

  Refactor the code and user interfaces to be able to also support
  other, similar features on ARM, for example"

* tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled
  MAINTAINERS: resctrl: Add myself as reviewer
  x86/resctrl: Configure mbm_event mode if supported
  fs/resctrl: Introduce the interface to switch between monitor modes
  fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled
  fs/resctrl: Introduce the interface to modify assignments in a group
  fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  fs/resctrl: Auto assign counters on mkdir and clean up on group removal
  fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
  fs/resctrl: Provide interface to update the event configurations
  fs/resctrl: Add event configuration directory under info/L3_MON/
  fs/resctrl: Support counter read/reset with mbm_event assignment mode
  x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()
  x86/resctrl: Refactor resctrl_arch_rmid_read()
  fs/resctrl: Introduce counter ID read, reset calls in mbm_event mode
  fs/resctrl: Pass struct rdtgroup instead of individual members
  fs/resctrl: Add the functionality to unassign MBM events
  fs/resctrl: Add the functionality to assign MBM events
  x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  fs/resctrl: Introduce event configuration field in struct mon_evt
  ...
2025-09-30 13:29:42 -07:00
Linus Torvalds
a65879b458 Merge tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:

 - Make UMIP instruction detection more robust

 - Correct and cleanup AMD CPU topology detection; document the relevant
   CPUID leaves topology parsing precedence on AMD

 - Add support for running the kernel as guest on FreeBSD's Bhyve
   hypervisor

 - Cleanups and improvements

* tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases)
  x86/umip: Check that the instruction opcode is at least two bytes
  Documentation/x86/topology: Detail CPUID leaves used for topology enumeration
  x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR
  x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing has_xtopology
  x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_info
  x86/cpu: Rename and move CPU model entry for Diamond Rapids
  x86/cpu: Detect FreeBSD Bhyve hypervisor
2025-09-30 13:19:08 -07:00
Linus Torvalds
d7ec0cf1cd Merge tag 'x86_bugs_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mitigation updates from Borislav Petkov:

 - Add VMSCAPE to the attack vector controls infrastructure

 - A bunch of the usual cleanups and fixlets, some of them resulting
   from fuzzing the different mitigation options

* tag 'x86_bugs_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Report correct retbleed mitigation status
  x86/bugs: Fix reporting of LFENCE retpoline
  x86/bugs: Fix spectre_v2 forcing
  x86/bugs: Remove uses of cpu_mitigations_off()
  x86/bugs: Simplify SSB cmdline parsing
  x86/bugs: Use early_param() for spectre_v2
  x86/bugs: Use early_param() for spectre_v2_user
  x86/bugs: Add attack vector controls for VMSCAPE
  x86/its: Move ITS indirect branch thunks to .text..__x86.indirect_thunk
2025-09-30 12:46:57 -07:00
Linus Torvalds
d9c43b6e43 Merge tag 'ras_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:

 - Unify and refactor the MCA arch side and better separate code

 - Cleanup and simplify the AMD RAS side, unify code, drop unused stuff

* tag 'ras_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Add a clear_bank() helper
  x86/mce: Move machine_check_poll() status checks to helper functions
  x86/mce: Separate global and per-CPU quirks
  x86/mce: Do 'UNKNOWN' vendor check early
  x86/mce: Define BSP-only SMCA init
  x86/mce: Define BSP-only init
  x86/mce: Set CR4.MCE last during init
  x86/mce: Remove __mcheck_cpu_init_early()
  x86/mce: Cleanup bank processing on init
  x86/mce/amd: Put list_head in threshold_bank
  x86/mce/amd: Remove smca_banks_map
  x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device()
  x86/mce/amd: Rename threshold restart function
2025-09-30 12:43:17 -07:00
Linus Torvalds
bd91417a96 Merge tag 'x86_microcode_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:

 - Add infrastructure to be able to debug the microcode loader in a guest

 - Refresh Intel old microcode revisions

* tag 'x86_microcode_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Add microcode loader debugging functionality
  x86/microcode: Add microcode= cmdline parsing
  x86/microcode/intel: Refresh the revisions that determine old_microcode
2025-09-30 12:41:10 -07:00
Paolo Bonzini
d05ca6b793 Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD
KVM x86 changes for 6.18

 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
   where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
   intercepts and trigger a variety of WARNs in KVM.

 - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
   supposed to exist for v2 PMUs.

 - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.

 - Clean up KVM's vector hashing code for delivering lowest priority IRQs.

 - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
   actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
   in the process of doing so add fastpath support for TSC_DEADLINE writes on
   AMD CPUs.

 - Clean up a pile of PMU code in anticipation of adding support for mediated
   vPMUs.

 - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
   emulator support (KVM should never need to emulate the MSRs outside of
   forced emulation and other contrived testing scenarios).

 - Clean up the MSR APIs in preparation for CET and FRED virtualization, as
   well as mediated vPMU support.

 - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
   as KVM can't faithfully emulate an I/O APIC for such guests.

 - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
   for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
   response to PMU refreshes for guests with mediated vPMUs.

 - Misc cleanups and minor fixes.
2025-09-30 13:36:41 -04:00
Thomas Gleixner
2066f00e5b x86/topology: Implement topology_is_core_online() to address SMT regression
Christian reported that commit a430c11f40 ("intel_idle: Rescan "dead" SMT
siblings during initialization") broke the use case in which both 'nosmt'
and 'maxcpus' are on the kernel command line because it onlines primary
threads, which were offline due to the maxcpus limit.

The initially proposed fix to skip primary threads in the loop is
inconsistent. While it prevents the primary thread to be onlined, it then
onlines the corresponding hyperthread(s), which does not really make sense.

The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
threads of a core, when the primary thread is offline. The default
implementation is a NOOP and therefore not effective on x86.

Implement topology_is_core_online() on x86 to address this issue. This
makes the behaviour consistent between x86 and PowerPC.

Fixes: a430c11f40 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
Reported-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
2025-09-22 21:25:36 +02:00
K Prateek Nayak
bc6397cf0b x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR
Add defines for the 0xc001_1005 MSR (Core::X86::Msr::CPUID_ExtFeatures) used
to toggle the extended CPUID features, instead of using literal numbers. Also
define and use the bits necessary for an old TOPOEXT fixup on AMD Family 0x15
processors.

No functional changes intended.

  [ bp: Massage, rename MSR to adhere to the documentation name. ]

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/20250901170418.4314-1-kprateek.nayak@amd.com
2025-09-17 11:24:33 +02:00
K Prateek Nayak
d691c5f87f x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing has_xtopology
cpu_parse_topology_ext() sets X86_FEATURE_XTOPOLOGY before returning
true if any of the XTOPOLOGY leaf (0x80000026 / 0xb) could be parsed
successfully.

Instead of storing and passing around this return value using
"has_xtopology" in parse_topology_amd(), check for X86_FEATURE_XTOPOLOGY
directly in parse_8000_001e() to simplify the flow.

No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/20250901170418.4314-1-kprateek.nayak@amd.com
2025-09-17 11:23:40 +02:00
K Prateek Nayak
af507c6951 x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_info
struct _cpuid4_info has the same layout as the CPUID leaf 0x8000001d.
Use the encoded definition and amd_fill_cpuid4_info(), get_cache_id()
helpers instead of open coding masks and shifts to calculate the llc_id.

cacheinfo_amd_init_llc_id() is only called on AMD systems that support
X86_FEATURE_TOPOEXT and amd_fill_cpuid4_info() uses the information from
CPUID leaf 0x8000001d on all these systems which is consistent with the
current open coded implementation.

While at it, avoid reading  cpu_data() every time get_cache_id() is
called and instead pass the APIC ID necessary to return the
_cpuid4_info.id from get_cache_id().

No functional changes intended.

  [ bp: do what Ahmed suggests: merge into one patch, make id4 ptr
    const. ]

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ahmed S. Darwish <darwi@linutronix.de>
Link: https://lore.kernel.org/20250821051910.7351-2-kprateek.nayak@amd.com
2025-09-17 11:22:40 +02:00
David Kaplan
930f2361fe x86/bugs: Report correct retbleed mitigation status
On Intel CPUs, the default retbleed mitigation is IBRS/eIBRS but this
requires that a similar spectre_v2 mitigation is applied.  If the user
selects a different spectre_v2 mitigation (like spectre_v2=retpoline) a
warning is printed but sysfs will still report 'Mitigation: IBRS' or
'Mitigation: Enhanced IBRS'.  This is incorrect because retbleed is not
mitigated, and IBRS is not actually set.

Fix this by choosing RETBLEED_MITIGATION_NONE in this scenario so the
kernel correctly reports the system as vulnerable to retbleed.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
2025-09-16 13:32:18 +02:00
David Kaplan
d1cc1baef6 x86/bugs: Fix reporting of LFENCE retpoline
The LFENCE retpoline mitigation is not secure but the kernel prints
inconsistent messages about this fact.  The dmesg log says 'Mitigation:
LFENCE', implying the system is mitigated.  But sysfs reports 'Vulnerable:
LFENCE' implying the system (correctly) is not mitigated.

Fix this by printing a consistent 'Vulnerable: LFENCE' string everywhere
when this mitigation is selected.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
2025-09-16 13:21:21 +02:00
David Kaplan
30ef245c6f x86/bugs: Fix spectre_v2 forcing
There were two oddities with spectre_v2 command line options.

First, any option other than 'off' or 'auto' would force spectre_v2
mitigations even if the CPU (hypothetically) wasn't vulnerable to spectre_v2.
That was inconsistent with all the other bugs where mitigations are ignored
unless an explicit 'force' option is specified.

Second, even though spectre_v2 mitigations would be enabled in these cases,
the X86_BUG_SPECTRE_V2 bit wasn't set.  This is again inconsistent with the
forcing behavior of other bugs and arguably incorrect as it doesn't make sense
to enable a mitigation if the X86_BUG bit isn't set.

Fix both issues by only forcing spectre_v2 mitigations when the
'spectre_v2=on' option is specified (which was already called
SPECTRE_V2_CMD_FORCE) and setting the relevant X86_BUG_* bits in that case.

This also allows for simplifying bhi_update_mitigation() because
spectre_v2_cmd will now always be SPECTRE_V2_CMD_NONE if the CPU is immune to
spectre_v2.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
2025-09-16 12:59:55 +02:00
David Kaplan
440d20154a x86/bugs: Remove uses of cpu_mitigations_off()
cpu_mitigations_off() is no longer needed because all bugs use attack
vector controls to select a mitigation, and cpu_mitigations_off() is
equivalent to no attack vectors being selected.

Remove the few remaining unnecessary uses of this function in this file.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
2025-09-15 21:10:44 +02:00
David Kaplan
02ac6cc8c5 x86/bugs: Simplify SSB cmdline parsing
Simplify the SSB command line parsing by selecting a mitigation directly, as
is done in most of the simpler vulnerabilities.  Use early_param() instead of
cmdline_find_option() for consistency with the other mitigation selections.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/r/20250819192200.2003074-4-david.kaplan@amd.com
2025-09-15 18:04:20 +02:00
David Kaplan
9a9f8147ae x86/bugs: Use early_param() for spectre_v2
Most of the mitigations in bugs.c use early_param() for command line parsing.
Rework the spectre_v2 and nospectre_v2 command line options to be
consistent with the others.

Remove spec_v2_print_cond() as informing the user of the their cmdline
choice isn't interesting.

  [ bp: Zap spectre_v2_check_cmd(). ]

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/r/20250819192200.2003074-3-david.kaplan@amd.com
2025-09-15 17:26:49 +02:00
David Kaplan
8edb9e7711 x86/bugs: Use early_param() for spectre_v2_user
Most of the mitigations in bugs.c use early_param() to parse their command
line options.  Modify spectre_v2_user to use early_param() for consistency.

Remove spec_v2_user_print_cond() because informing a user about their
cmdline choice isn't very interesting and the chosen mitigation is already
printed in spectre_v2_user_update_mitigation().

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/r/20250819192200.2003074-2-david.kaplan@amd.com
2025-09-15 17:07:22 +02:00
David Woodhouse
fa1d117162 x86/cpu: Detect FreeBSD Bhyve hypervisor
Detect the Bhyve hypervisor and enable 15-bit MSI support if available.

Detecting Bhyve used to be a purely cosmetic issue of the kernel printing
'Hypervisor detected: Bhyve' at boot time.

But FreeBSD 15.0 will support¹ the 15-bit MSI enlightenment to support
more than 255 vCPUs (http://david.woodhou.se/ExtDestId.pdf) which means
there's now actually some functional reason to do so.

  ¹ https://github.com/freebsd/freebsd-src/commit/313a68ea20b4

  [ bp: Massage, move tail comment ontop. ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ahmed S. Darwish <darwi@linutronix.de>
Link: https://lore.kernel.org/03802f6f7f5b5cf8c5e8adfe123c397ca8e21093.camel@infradead.org
2025-09-15 14:06:44 +02:00
Babu Moger
0f1576e43a x86/resctrl: Configure mbm_event mode if supported
Configure mbm_event mode on AMD platforms. On AMD platforms, it is
recommended to use the mbm_event mode, if supported, to prevent the
hardware from resetting counters between reads. This can result in
misleading values or display "Unavailable" if no counter is assigned
to the event.

Enable mbm_event mode, known as ABMC (Assignable Bandwidth Monitoring
Counters) on AMD, by default if the system supports it.

Update ABMC across all logical processors within the resctrl domain to
ensure proper functionality.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:52:04 +02:00
Babu Moger
2a65b72c16 x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()
System software reads resctrl event data for a particular resource by writing
the RMID and Event Identifier (EvtID) to the QM_EVTSEL register and then
reading the event data from the QM_CTR register.

In ABMC mode, the event data of a specific counter ID is read by setting the
following fields: QM_EVTSEL.ExtendedEvtID = 1, QM_EVTSEL.EvtID = L3CacheABMC
(=1) and setting QM_EVTSEL.RMID to the desired counter ID.  Reading the QM_CTR
then returns the contents of the specified counter ID.  RMID_VAL_ERROR bit is
set if the counter configuration is invalid, or if an invalid counter ID is
set in the QM_EVTSEL.RMID field.  RMID_VAL_UNAVAIL bit is set if the counter
data is unavailable.

Introduce resctrl_arch_reset_cntr() and resctrl_arch_cntr_read() to reset and
read event data for a specific counter.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:30:22 +02:00
Babu Moger
7c9ac605e2 x86/resctrl: Refactor resctrl_arch_rmid_read()
resctrl_arch_rmid_read() adjusts the value obtained from MSR_IA32_QM_CTR to
account for the overflow for MBM events and apply counter scaling for all the
events. This logic is common to both reading an RMID and reading a hardware
counter directly.

Refactor the hardware value adjustment logic into get_corrected_val() to
prepare for support of reading a hardware counter.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:28:21 +02:00
Babu Moger
f7a4fb2231 x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
The ABMC feature allows users to assign a hardware counter to an RMID,
event pair and monitor bandwidth usage as long as it is assigned. The
hardware continues to track the assigned counter until it is explicitly
unassigned by the user.

Implement an x86 architecture-specific handler to configure a counter. This
architecture specific handler is called by resctrl fs when a counter is
assigned or unassigned as well as when an already assigned counter's
configuration should be updated. Configure counters by writing to the
L3_QOS_ABMC_CFG MSR, specifying the counter ID, bandwidth source (RMID),
and event configuration.

The ABMC feature details are documented in APM [1] available from [2].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
    Monitoring (ABMC).

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15 12:20:29 +02:00
Babu Moger
84ecefb766 x86/resctrl: Add data structures and definitions for ABMC assignment
The ABMC feature allows users to assign a hardware counter to an RMID,
event pair and monitor bandwidth usage as long as it is assigned. The
hardware continues to track the assigned counter until it is explicitly
unassigned by the user.

The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
ABMC counter assignment is done by setting the counter id, bandwidth
source (RMID) and bandwidth configuration.

Attempts to read or write the MSR when ABMC is not enabled will result
in a #GP(0) exception.

Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
(0xC000_03FDh):
=========================================================================
Bits 	Mnemonic	Description			Access Reset
							Type   Value
=========================================================================
63 	CfgEn 		Configuration Enable 		R/W 	0

62 	CtrEn 		Enable/disable counting		R/W 	0

61:53 	– 		Reserved 			MBZ 	0

52:48 	CtrID 		Counter Identifier		R/W	0

47 	IsCOS		BwSrc field is a CLOSID		R/W	0
			(not an RMID)

46:44 	–		Reserved			MBZ	0

43:32	BwSrc		Bandwidth Source		R/W	0
			(RMID or CLOSID)

31:0	BwType		Bandwidth configuration		R/W	0
			tracked by the CtrID
==========================================================================

The ABMC feature details are documented in APM [1] available from [2].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

  [ bp: Touchups. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15 12:17:33 +02:00
Babu Moger
faebbc58cd x86/resctrl: Add support to enable/disable AMD ABMC feature
Add the functionality to enable/disable the AMD ABMC feature.

The AMD ABMC feature is enabled by setting enabled bit(0) in the
L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs to be
updated on all the logical processors in the QOS Domain.

Hardware counters will reset when ABMC state is changed.

  [ bp: Massage commit message. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15 12:09:30 +02:00
Babu Moger
13390861b4 x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits Description
15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
     Monitoring Counter ID + 1

The ABMC feature details are documented in APM [1] available from [2].

  [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
  Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
  Monitoring (ABMC).

Detect the feature and number of assignable counters supported. For backward
compatibility, upon detecting the assignable counter feature, enable the
mbm_total_bytes and mbm_local_bytes events that users are familiar with as
part of original L3 MBM support.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15 12:08:01 +02:00
Babu Moger
5ad68c8f96 x86,fs/resctrl: Consolidate monitoring related data from rdt_resource
The cache allocation and memory bandwidth allocation feature properties are
consolidated into struct resctrl_cache and struct resctrl_membw respectively.

In preparation for more monitoring properties that will clobber the existing
resource struct more, re-organize the monitoring specific properties to also
be in a separate structure.

Also convert "bandwidth sources" terminology to "memory transactions" to have
consistency within resctrl for related monitoring features.

  [ bp: Massage commit message. ]

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:07:01 +02:00
Babu Moger
bebf57bf05 x86/resctrl: Add ABMC feature in the command line options
Add a kernel command-line parameter to enable or disable the exposure of
the ABMC (Assignable Bandwidth Monitoring Counters) hardware feature to
resctrl.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:05:23 +02:00
Babu Moger
e19c062199 x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
Users can create as many monitor groups as RMIDs supported by the hardware.
However, the bandwidth monitoring feature on AMD only guarantees that RMIDs
currently assigned to a processor will be tracked by hardware. The counters of
any other RMIDs which are no longer being tracked will be reset to zero.

The MBM event counters return "Unavailable" for the RMIDs that are not tracked
by hardware. So, there can be only limited number of groups that can give
guaranteed monitoring numbers. With ever changing configurations there is no
way to definitely know which of these groups are being tracked during
a particular time. Users do not have the option to monitor a group or set of
groups for a certain period of time without worrying about RMID being reset in
between.

The ABMC feature allows users to assign a hardware counter to an RMID, event
pair and monitor bandwidth usage as long as it is assigned. The hardware
continues to track the assigned counter until it is explicitly unassigned by
the user. There is no need to worry about counters being reset during this
period. Additionally, the user can specify the type of memory transactions
(e.g., reads, writes) for the counter to track.

Without ABMC enabled, monitoring will work in current mode without assignment
option.

The Linux resctrl subsystem provides an interface that allows monitoring of up
to two memory bandwidth events per group, selected from a combination of
available total and local events. When ABMC is enabled, two events will be
assigned to each group by default, in line with the current interface design.
Users will also have the option to configure which types of memory
transactions are counted by these events.

Due to the limited number of available counters (32), users may quickly
exhaust the available counters. If the system runs out of assignable ABMC
counters, the kernel will report an error. In such cases, users will need to
unassign one or more active counters to free up counters for new assignments.
resctrl will provide options to assign or unassign events through the
group-specific interface file.

The feature is detected via CPUID_Fn80000020_EBX_x00 bit 5: ABMC (Assignable
Bandwidth Monitoring Counters).

The ABMC feature details are documented in APM [1] available from [2]. [1]
AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

  [ bp: Massage commit message, fixup enumeration due to VMSCAPE ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15 12:04:15 +02:00
Tony Luck
83b0398773 x86,fs/resctrl: Prepare for more monitor events
There's a rule in computer programming that objects appear zero, once, or many
times. So code accordingly.

There are two MBM events and resctrl is coded with a lot of

  if (local)
          do one thing
  if (total)
          do a different thing

Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays of
pointers to per event data instead of explicit fields for total and local
bandwidth.

Simplify by coding for many events using loops on which are enabled.

Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more
widely. Also provide a for_each_mbm_event_id() helper macro.

Cleanup variable names in functions touched to consistently use "eventid" for
those with type enum resctrl_event_id.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 11:57:03 +02:00