linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-26 02:32:45 -04:00

Author	SHA1	Message	Date
Paolo Bonzini	ebec25438f	KVM: x86: Enable support for emulating AVX MOV instructions Some users of KVM have emulated devices (typically added to private forks of QEMU) that execute AVX instructions on PCI BARs. Whenever the guest OS tries to do that, an illegal instruction exception or emulation failure is triggered. Add the Avx flag to move instructions: - (66) 0f 10 - MOVUPS/MOVUPD from memory - (66) 0f 11 - MOVUPS/MOVUPD to memory - 66 0f 6f - MOVDQA from memory - 66 0f 7f - MOVDQA to memory - f3 0f 6f - MOVDQU from memory - f3 0f 7f - MOVDQU to memory - (66) 0f 28 - MOVAPS/MOVAPD from memory - (66) 0f 29 - MOVAPS/MOVAPD to memory - (66) 0f 2b - MOVNTPS/MOVNTPD to memory - 66 0f e7 - MOVNTDQ to memory - 66 0f 38 2a - MOVNTDQA to memory Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/kvm/BD108C42-0382-4B17-B601-434A4BD038E7@fb.com/T/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://patch.msgid.link/20251114003633.60689-11-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-20 07:26:08 -08:00
Paolo Bonzini	f0585a714a	KVM: x86: Add emulator support for decoding VEX prefixes After all the changes done in the previous patches, the only thing left to support AVX MOV instructions is to expand the VEX prefix into the appropriate REX, 66/F3/F2 and map prefixes. Three-operand instructions are not supported. The Avx bit in this case is not cleared, in fact it is used as the sign that the instruction does support VEX encoding. Until it is added to any instruction, however, the only functional change is to change some not-implemented instructions to #UD if they correspond to a VEX prefix with an invalid map. Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://patch.msgid.link/20251114003633.60689-10-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:36:52 -08:00
Chang S. Bae	825f0aece0	KVM: x86: Refactor REX prefix handling in instruction emulation Restructure how to represent and interpret REX fields, preparing for handling of both REX2 and VEX. REX uses the upper four bits of a single byte as a fixed identifier, and the lower four bits containing the data. VEX and REX2 extends this so that the first byte identifies the prefix and the rest encode additional bits; and while VEX only has the same four data bits as REX, eight zero bits are a valid value for the data bits of REX2. So, stop storing the REX byte as-is. Instead, store only the low bits of the REX prefix and track separately whether a REX-like prefix was used. No functional changes intended. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Message-ID: <20251110180131.28264-11-chang.seok.bae@intel.com> [Extracted from APX series; removed bitfields and REX2-specific default. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://patch.msgid.link/20251114003633.60689-9-pbonzini@redhat.com [sean: name REX_{BXRW} enum "rex_bits"] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:36:38 -08:00
Paolo Bonzini	4cb21be4c3	KVM: x86: Add AVX support to the emulator's register fetch and writeback Prepare struct operand for hosting AVX registers. Remove the existing, incomplete code that placed the Avx flag in the operand alignment field, and repurpose the name for a separate bit that indicates: - after decode, whether an instruction supports the VEX prefix; - before writeback, that the instruction did have the VEX prefix and therefore 1) it can have op_bytes == 32; 2) t should clear high bytes of XMM registers. Right now the bit will never be set and the patch has no intended functional change. However, this is actually more vexing than the decoder changes itself, and therefore worth separating. Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://patch.msgid.link/20251114003633.60689-8-pbonzini@redhat.com [sean: guard ymm[8-15] accesses with #ifdef CONFIG_X86_64] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:30:54 -08:00
Paolo Bonzini	f106797f81	KVM: x86: Add x86_emulate_ops.get_xcr() callback This will be necessary in order to check whether AVX is enabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com> Link: https://patch.msgid.link/20251114003633.60689-7-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:29:55 -08:00
Paolo Bonzini	7e11eec989	KVM: x86: Share emulator's common register decoding code Remove all duplicate handling of register operands, including picking the right register class and fetching it, by extracting a new function that can be used for both REG and MODRM operands. Centralize setting op->orig_val = op->val in fetch_register_operand() as well. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com> Link: https://patch.msgid.link/20251114003633.60689-6-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:28:53 -08:00
Paolo Bonzini	1a84b07aca	KVM: x86: Move op_prefix to struct x86_emulate_ctxt (from x86_decode_insn()) VEX decode will need to set it based on the "pp" bits, so make it a field in the struct rather than a local variable. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com> Link: https://patch.msgid.link/20251114003633.60689-5-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:28:30 -08:00
Paolo Bonzini	3d8834a0d1	KVM: x86: Improve formatting of the emulator's flags table Align a little better the comments on the right side and list explicitly the bits used by multi-bit fields. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com> Link: https://patch.msgid.link/20251114003633.60689-4-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:27:25 -08:00
Paolo Bonzini	3f3fc58df5	KVM: x86: Move Src2Shift up one bit (use bits 36:32 for Src2 in the emulator) An irresistible microoptimization (changing accesses to Src2 to just an AND :)) that also frees a bit for AVX in the low flags word. This makes it closer to SSE since both of them can access XMM registers, pointlessly shaving another clock cycle or two (maybe). No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com Link: https://patch.msgid.link/20251114003633.60689-3-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:25:59 -08:00
Paolo Bonzini	c57d9bafbd	KVM: x86: Add support for emulating MOVNTDQA MOVNTDQA is a simple MOV instruction, in fact it has the same characteristics as 0F E7 (MOVNTDQ) other than the aligned-address requirement. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://patch.msgid.link/20251114003633.60689-2-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 14:24:53 -08:00
Binbin Wu	0b28f21ad4	KVM: x86: Add a helper to dedup loading guest/host XCR0 and XSS Add and use a helper, kvm_load_xfeatures(), to dedup the code that loads guest/host xfeatures. Opportunistically return early if X86_CR4_OSXSAVE is not set to reduce indentations. No functional change intended. Suggested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://patch.msgid.link/20251110050539.3398759-1-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 05:41:12 -08:00
Sean Christopherson	7649412af3	KVM: x86: Load guest/host PKRU outside of the fastpath run loop Move KVM's swapping of PKRU outside of the fastpath loop, as there is no KVM code anywhere in the fastpath that accesses guest/userspace memory, i.e. that can consume protection keys. As documented by commit `1be0e61c1f` ("KVM, pkeys: save/restore PKRU when guest/host switches"), KVM just needs to ensure the host's PKRU is loaded when KVM (or the kernel at-large) may access userspace memory. And at the time of commit `1be0e61c1f`, KVM didn't have a fastpath, and PKU was strictly contained to VMX, i.e. there was no reason to swap PKRU outside of vmx_vcpu_run(). Over time, the "need" to swap PKRU close to VM-Enter was likely falsely solidified by the association with XFEATUREs in commit `37486135d3` ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c"), and XFEATURE swapping was in turn moved close to VM-Enter/VM-Exit as a KVM hack-a-fix ution for an #MC handler bug by commit `1811d979c7` ("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context"). Deferring the PKRU loads shaves ~40 cycles off the fastpath for Intel, and ~60 cycles for AMD. E.g. using INVD in KVM-Unit-Test's vmexit.c, with extra hacks to enable CR4.PKE and PKRU=(-1u & ~0x3), latency numbers for AMD Turin go from ~1560 => ~1500, and for Intel Emerald Rapids, go from ~810 => ~770. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Jon Kohler <jon@nutanix.com> Link: https://patch.msgid.link/20251118222328.2265758-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 05:41:11 -08:00
Sean Christopherson	75c69c82f2	KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run loop Move KVM's swapping of XFEATURE masks, i.e. XCR0 and XSS, out of the fastpath loop now that the guts of the #MC handler runs in task context, i.e. won't invoke schedule() with preemption disabled and clobber state (or crash the kernel) due to trying to context switch XSTATE with a mix of host and guest state. For all intents and purposes, this reverts commit `1811d979c7` ("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context"), which papered over an egregious bug/flaw in the #MC handler where it would do schedule() even though IRQs are disabled. E.g. the call stack from the commit: kvm_load_guest_xcr0 ... kvm_x86_ops->run(vcpu) vmx_vcpu_run vmx_complete_atomic_exit kvm_machine_check do_machine_check do_memory_failure memory_failure lock_page Commit `1811d979c7` "fixed" the immediate issue of XRSTORS exploding, but completely ignored that scheduling out a vCPU task while IRQs and preemption is wildly broken. Thankfully, commit `5567d11c21` ("x86/mce: Send #MC singal from task work") (somewhat incidentally?) fixed that flaw by pushing the meat of the work to the user-return path, i.e. to task context. KVM has also hardened itself against #MC goofs by moving #MC forwarding to kvm_x86_ops.handle_exit_irqoff(), i.e. out of the fastpath. While that's by no means a robust fix, restoring as much state as possible before handling the #MC will hopefully provide some measure of protection in the event that #MC handling goes off the rails again. Note, KVM always intercepts XCR0 writes for vCPUs without protected state, e.g. there's no risk of consuming a stale XCR0 when determining if a PKRU update is needed; kvm_load_host_xfeatures() only reads, and never writes, vcpu->arch.xcr0. Deferring the XCR0 and XSS loads shaves ~300 cycles off the fastpath for Intel, and ~500 cycles for AMD. E.g. using INVD in KVM-Unit-Test's vmexit.c, which an extra hack to enable CR4.OXSAVE, latency numbers for AMD Turin go from ~2000 => 1500, and for Intel Emerald Rapids, go from ~1300 => ~1000. Cc: Jon Kohler <jon@nutanix.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Jon Kohler <jon@nutanix.com> Link: https://patch.msgid.link/20251118222328.2265758-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 05:41:10 -08:00
Sean Christopherson	63669bd1d5	KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath Handle Machine Checks (#MC) that happen on VM-Enter (VMX or TDX) outside of KVM's fastpath so that as much host state as possible is re-loaded before invoking the kernel's #MC handler. The only requirement is that KVM invokes the #MC handler before enabling IRQs (and even that could _probably_ be related to handling #MCs before enabling preemption). Waiting to handle #MCs until "more" host state is loaded hardens KVM against flaws in the #MC handler, which has historically been quite brittle. E.g. prior to commit `5567d11c21` ("x86/mce: Send #MC singal from task work"), the #MC code could trigger a schedule() with IRQs and preemption disabled. That led to a KVM hack-a-fix in commit `1811d979c7` ("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context"). Note, vmx_handle_exit_irqoff() is common to VMX and TDX guests. Cc: Tony Lindgren <tony.lindgren@linux.intel.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Jon Kohler <jon@nutanix.com> Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com> Link: https://patch.msgid.link/20251118222328.2265758-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-19 05:40:52 -08:00
Sean Christopherson	ebd1a33655	KVM: SVM: Handle #MCs in guest outside of fastpath Handle Machine Checks (#MC) that happen in the guest (by forwarding them to the host) outside of KVM's fastpath so that as much host state as possible is re-loaded before invoking the kernel's #MC handler. The only requirement is that KVM invokes the #MC handler before enabling IRQs (and even that could _probably_ be relaxed to handling #MCs before enabling preemption). Waiting to handle #MCs until "more" host state is loaded hardens KVM against flaws in the #MC handler, which has historically been quite brittle. E.g. prior to commit `5567d11c21` ("x86/mce: Send #MC singal from task work"), the #MC code could trigger a schedule() with IRQs and preemption disabled. That led to a KVM hack-a-fix in commit `1811d979c7` ("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context"). Note, except for #MCs on VM-Enter, VMX already handles #MCs outside of the fastpath. Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Jon Kohler <jon@nutanix.com> Link: https://patch.msgid.link/20251118222328.2265758-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:46 -08:00
Brendan Jackman	38ee66cb18	KVM: x86: Unify L1TF flushing under per-CPU variable Currently the tracking of the need to flush L1D for L1TF is tracked by two bits: one per-CPU and one per-vCPU. The per-vCPU bit is always set when the vCPU shows up on a core, so there is no interesting state that's truly per-vCPU. Indeed, this is a requirement, since L1D is a part of the physical CPU. So simplify this by combining the two bits. The vCPU bit was being written from preemption-enabled regions. To play nice with those cases, wrap all calls from KVM and use a raw write so that request a flush with preemption enabled doesn't trigger what would effectively be DEBUG_PREEMPT false positives. Preemption doesn't need to be disabled, as kvm_arch_vcpu_load() will mark the new CPU as needing a flush if the vCPU task is migrated, or if userspace runs the vCPU on a different task. Signed-off-by: Brendan Jackman <jackmanb@google.com> [sean: put raw write in KVM instead of in a hardirq.h variant] Link: https://patch.msgid.link/20251113233746.1703361-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:45 -08:00
Sean Christopherson	05bd63959a	KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n Disable support for flushing the L1 data cache to mitigate L1TF if CPU mitigations are disabled for the entire kernel. KVM's mitigation of L1TF is in no way special enough to justify ignoring CONFIG_CPU_MITIGATIONS=n. Deliberately use CPU_MITIGATIONS instead of the more precise MITIGATION_L1TF, as MITIGATION_L1TF only controls the default behavior, i.e. CONFIG_MITIGATION_L1TF=n doesn't completely disable L1TF mitigations in the kernel. Keep the vmentry_l1d_flush module param to avoid breaking existing setups, and leverage the .set path to alert the user to the fact that vmentry_l1d_flush will be ignored. Don't bother validating the incoming value; if an admin misconfigures vmentry_l1d_flush, the fact that the bad configuration won't be detected when running with CONFIG_CPU_MITIGATIONS=n is likely the least of their worries. Reviewed-by: Brendan Jackman <jackmanb@google.com> Link: https://patch.msgid.link/20251113233746.1703361-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:44 -08:00
Sean Christopherson	0abd9610d6	KVM: VMX: Bundle all L1 data cache flush mitigation code together Move vmx_l1d_flush(), vmx_cleanup_l1d_flush(), and the vmentry_l1d_flush param code up in vmx.c so that all of the L1 data cache flushing code is bundled together. This will allow conditioning the mitigation code on CONFIG_CPU_MITIGATIONS=y with minimal #ifdefs. No functional change intended. Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://patch.msgid.link/20251113233746.1703361-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:44 -08:00
Sean Christopherson	fc704b5789	x86/bugs: KVM: Move VM_CLEAR_CPU_BUFFERS into SVM as SVM_CLEAR_CPU_BUFFERS Now that VMX encodes its own sequence for clearing CPU buffers, move VM_CLEAR_CPU_BUFFERS into SVM to minimize the chances of KVM botching a mitigation in the future, e.g. using VM_CLEAR_CPU_BUFFERS instead of checking multiple mitigation flags. No functional change intended. Reviewed-by: Brendan Jackman <jackmanb@google.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251113233746.1703361-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:43 -08:00
Sean Christopherson	e6ff1d61de	KVM: VMX: Handle MMIO Stale Data in VM-Enter assembly via ALTERNATIVES_2 Rework the handling of the MMIO Stale Data mitigation to clear CPU buffers immediately prior to VM-Enter, i.e. in the same location that KVM emits a VERW for unconditional (at runtime) clearing. Co-locating the code and using a single ALTERNATIVES_2 makes it more obvious how VMX mitigates the various vulnerabilities. Deliberately order the alternatives as: 0. Do nothing 1. Clear if vCPU can access MMIO 2. Clear always since the last alternative wins in ALTERNATIVES_2(), i.e. so that KVM will honor the strictest mitigation (always clear CPU buffers) if multiple mitigations are selected. E.g. even if the kernel chooses to mitigate MMIO Stale Data via X86_FEATURE_CLEAR_CPU_BUF_VM_MMIO, another mitigation may enable X86_FEATURE_CLEAR_CPU_BUF_VM, and that other thing needs to win. Note, decoupling the MMIO mitigation from the L1TF mitigation also fixes a mostly-benign flaw where KVM wouldn't do any clearing/flushing if the L1TF mitigation is configured to conditionally flush the L1D, and the MMIO mitigation but not any other "clear CPU buffers" mitigation is enabled. For that specific scenario, KVM would skip clearing CPU buffers for the MMIO mitigation even though the kernel requested a clear on every VM-Enter. Note #2, the flaw goes back to the introduction of the MDS mitigation. The MDS mitigation was inadvertently fixed by commit `43fb862de8` ("KVM/VMX: Move VERW closer to VMentry for MDS mitigation"), but previous kernels that flush CPU buffers in vmx_vcpu_enter_exit() are affected (though it's unlikely the flaw is meaningfully exploitable even older kernels). Fixes: `650b68a062` ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active") Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Brendan Jackman <jackmanb@google.com> Link: https://patch.msgid.link/20251113233746.1703361-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:42 -08:00
Sean Christopherson	f6106d41ec	x86/bugs: Use an x86 feature to track the MMIO Stale Data mitigation Convert the MMIO Stale Data mitigation tracking from a static branch into an x86 feature flag so that it can be used via ALTERNATIVE_2 in KVM. No functional change intended. Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Brendan Jackman <jackmanb@google.com> Link: https://patch.msgid.link/20251113233746.1703361-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:42 -08:00
Sean Christopherson	afb99ffbd5	x86/bugs: Decouple ALTERNATIVE usage from VERW macro definition Decouple the use of ALTERNATIVE from the encoding of VERW to clear CPU buffers so that KVM can use ALTERNATIVE_2 to handle "always clear buffers" and "clear if guest can access host MMIO" in a single statement. No functional change intended. Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://patch.msgid.link/20251113233746.1703361-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:41 -08:00
Pawan Gupta	aba7de6088	x86/bugs: Use VM_CLEAR_CPU_BUFFERS in VMX as well TSA mitigation: `d8010d4ba4` ("x86/bugs: Add a Transient Scheduler Attacks mitigation") introduced VM_CLEAR_CPU_BUFFERS for guests on AMD CPUs. Currently on Intel CLEAR_CPU_BUFFERS is being used for guests which has a much broader scope (kernel->user also). Make mitigations on Intel consistent with TSA. This would help handling the guest-only mitigations better in future. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> [sean: make CLEAR_CPU_BUF_VM mutually exclusive with the MMIO mitigation] Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Brendan Jackman <jackmanb@google.com> Link: https://patch.msgid.link/20251113233746.1703361-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:40 -08:00
Sean Christopherson	844afc1af3	KVM: VMX: Use on-stack copy of @flags in __vmx_vcpu_run() When testing for VMLAUNCH vs. VMRESUME, use the copy of @flags from the stack instead of first moving it to EBX, and then propagating VMX_RUN_VMRESUME to RFLAGS.CF (because RBX is clobbered with the guest value prior to the conditional branch to VMLAUNCH). Stashing information in RFLAGS is gross, especially with the writer and reader being bifurcated by yet more gnarly assembly code. Opportunistically drop the SHIFT macros as they existed purely to allow the VM-Enter flow to use Bit Test. Suggested-by: Borislav Petkov <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Brendan Jackman <jackmanb@google.com> Link: https://patch.msgid.link/20251113233746.1703361-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 16:22:35 -08:00
Chao Gao	11d984633f	KVM: x86: Allocate/free user_return_msrs at kvm.ko (un)loading time Move user_return_msrs allocation/free from vendor modules (kvm-intel.ko and kvm-amd.ko) (un)loading time to kvm.ko's to make it less risky to access user_return_msrs in kvm.ko. Tying the lifetime of user_return_msrs to vendor modules makes every access to user_return_msrs prone to use-after-free issues as vendor modules may be unloaded at any time. Opportunistically turn the per-CPU variable into full structs, as there's no practical difference between statically allocating the memory and allocating it unconditionally during module_init(). Zero out kvm_nr_uret_msrs on vendor module exit to further minimize the chances of consuming stale data, and WARN on vendor module load if KVM thinks there are existing user-return MSRs. Note! The user-return MSRs also need to be "destroyed" if ops->hardware_setup() fails, as both SVM and VMX expect common KVM to clean up (because common code, not vendor code, is responsible for kvm_nr_uret_msrs). Signed-off-by: Chao Gao <chao.gao@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251108013601.902918-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-18 15:53:54 -08:00
Lei Chen	e78fb96b41	KVM: x86: remove comment about ntp correction sync for Since vcpu local clock is no longer affected by ntp, remove comment about ntp correction sync for function kvm_gen_kvmclock_update. Signed-off-by: Lei Chen <lei.chen@smartx.com> Link: https://patch.msgid.link/20250819152027.1687487-4-lei.chen@smartx.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:25 -08:00
Lei Chen	446fcce2a5	Revert "x86: kvm: rate-limit global clock updates" This reverts commit `7e44e4495a`. Commit `7e44e4495a` ("x86: kvm: rate-limit global clock updates") intends to use a kvmclock_update_work to sync ntp corretion across all vcpus kvmclock, which is based on commit `0061d53daf` ("KVM: x86: limit difference between kvmclock updates") Since kvmclock has been switched to mono raw, this commit can be reverted. Signed-off-by: Lei Chen <lei.chen@smartx.com> Link: https://patch.msgid.link/20250819152027.1687487-3-lei.chen@smartx.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:24 -08:00
Lei Chen	43ddbf16ed	Revert "x86: kvm: introduce periodic global clock updates" This reverts commit `332967a3ea`. Commit `332967a3ea` ("x86: kvm: introduce periodic global clock updates") introduced a 300s interval work to sync ntp corrections across all vcpus. Since commit `53fafdbb8b` ("KVM: x86: switch KVMCLOCK base to monotonic raw clock"), kvmclock switched to mono raw clock, we can no longer take ntp into consideration. Signed-off-by: Lei Chen <lei.chen@smartx.com> Link: https://patch.msgid.link/20250819152027.1687487-2-lei.chen@smartx.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:23 -08:00
Sean Christopherson	a091fe60c2	KVM: x86: Grab lapic_timer in a local variable to cleanup periodic code Stash apic->lapic_timer in a local "ktimer" variable in advance_periodic_target_expiration() to eliminate a few unaligned wraps, and to make the code easier to read overall. No functional change intended. Link: https://patch.msgid.link/20251113205114.1647493-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:23 -08:00
fuqiang wang	18ab3fc8e8	KVM: x86: Fix VM hard lockup after prolonged inactivity with periodic HV timer When advancing the target expiration for the guest's APIC timer in periodic mode, set the expiration to "now" if the target expiration is in the past (similar to what is done in update_target_expiration()). Blindly adding the period to the previous target expiration can result in KVM generating a practically unbounded number of hrtimer IRQs due to programming an expired timer over and over. In extreme scenarios, e.g. if userspace pauses/suspends a VM for an extended duration, this can even cause hard lockups in the host. Currently, the bug only affects Intel CPUs when using the hypervisor timer (HV timer), a.k.a. the VMX preemption timer. Unlike the software timer, a.k.a. hrtimer, which KVM keeps running even on exits to userspace, the HV timer only runs while the guest is active. As a result, if the vCPU does not run for an extended duration, there will be a huge gap between the target expiration and the current time the vCPU resumes running. Because the target expiration is incremented by only one period on each timer expiration, this leads to a series of timer expirations occurring rapidly after the vCPU/VM resumes. More critically, when the vCPU first triggers a periodic HV timer expiration after resuming, advancing the expiration by only one period will result in a target expiration in the past. As a result, the delta may be calculated as a negative value. When the delta is converted into an absolute value (tscdeadline is an unsigned u64), the resulting value can overflow what the HV timer is capable of programming. I.e. the large value will exceed the VMX Preemption Timer's maximum bit width of cpu_preemption_timer_multi + 32, and thus cause KVM to switch from the HV timer to the software timer (hrtimers). After switching to the software timer, periodic timer expiration callbacks may be executed consecutively within a single clock interrupt handler, because hrtimers honors KVM's request for an expiration in the past and immediately re-invokes KVM's callback after reprogramming. And because the interrupt handler runs with IRQs disabled, restarting KVM's hrtimer over and over until the target expiration is advanced to "now" can result in a hard lockup. E.g. the following hard lockup was triggered in the host when running a Windows VM (only relevant because it used the APIC timer in periodic mode) after resuming the VM from a long suspend (in the host). NMI watchdog: Watchdog detected hard LOCKUP on cpu 45 ... RIP: 0010:advance_periodic_target_expiration+0x4d/0x80 [kvm] ... RSP: 0018:ff4f88f5d98d8ef0 EFLAGS: 00000046 RAX: fff0103f91be678e RBX: fff0103f91be678e RCX: 00843a7d9e127bcc RDX: 0000000000000002 RSI: 0052ca4003697505 RDI: ff440d5bfbdbd500 RBP: ff440d5956f99200 R08: ff2ff2a42deb6a84 R09: 000000000002a6c0 R10: 0122d794016332b3 R11: 0000000000000000 R12: ff440db1af39cfc0 R13: ff440db1af39cfc0 R14: ffffffffc0d4a560 R15: ff440db1af39d0f8 FS: 00007f04a6ffd700(0000) GS:ff440db1af380000(0000) knlGS:000000e38a3b8000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000d5651feff8 CR3: 000000684e038002 CR4: 0000000000773ee0 PKRU: 55555554 Call Trace: <IRQ> apic_timer_fn+0x31/0x50 [kvm] __hrtimer_run_queues+0x100/0x280 hrtimer_interrupt+0x100/0x210 ? ttwu_do_wakeup+0x19/0x160 smp_apic_timer_interrupt+0x6a/0x130 apic_timer_interrupt+0xf/0x20 </IRQ> Moreover, if the suspend duration of the virtual machine is not long enough to trigger a hard lockup in this scenario, since commit `98c25ead5e` ("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86"), KVM will continue using the software timer until the guest reprograms the APIC timer in some way. Since the periodic timer does not require frequent APIC timer register programming, the guest may continue to use the software timer in perpetuity. Fixes: `d8f2f498d9` ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode") Cc: stable@vger.kernel.org Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com> [sean: massage comments and changelog] Link: https://patch.msgid.link/20251113205114.1647493-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:22 -08:00
fuqiang wang	9633f180ce	KVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn() When restarting an hrtimer to emulate a the guest's APIC timer in periodic mode, explicitly set the expiration using the target expiration computed by advance_periodic_target_expiration() instead of adding the period to the existing timer. This will allow making adjustments to the expiration, e.g. to deal with expirations far in the past, without having to implement the same logic in both advance_periodic_target_expiration() and apic_timer_fn(). Cc: stable@vger.kernel.org Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com> [sean: split to separate patch, write changelog] Link: https://patch.msgid.link/20251113205114.1647493-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:21 -08:00
Sean Christopherson	0ea9494be9	KVM: x86: WARN if hrtimer callback for periodic APIC timer fires with period=0 WARN and don't restart the hrtimer if KVM's callback runs with the guest's APIC timer in periodic mode but with a period of '0', as not advancing the hrtimer's deadline would put the CPU into an infinite loop of hrtimer events. Observing a period of '0' should be impossible, even when the hrtimer is running on a different CPU than the vCPU, as KVM is supposed to cancel the hrtimer before changing (or zeroing) the period, e.g. when switching from periodic to one-shot. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251113205114.1647493-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:21 -08:00
Sean Christopherson	b3e5b670c9	KVM: x86: Use "checked" versions of get_user() and put_user() Use the normal, checked versions for get_user() and put_user() instead of the double-underscore versions that omit range checks, as the checked versions are actually measurably faster on modern CPUs (12%+ on Intel, 25%+ on AMD). The performance hit on the unchecked versions is almost entirely due to the added LFENCE on CPUs where LFENCE is serializing (which is effectively all modern CPUs), which was added by commit `304ec1b050` ("x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec"). The small optimizations done by commit `b19b74bc99` ("x86/mm: Rework address range check in get_user() and put_user()") likely shave a few cycles off, but the bulk of the extra latency comes from the LFENCE. Don't bother trying to open-code an equivalent for performance reasons, as the loss of inlining (e.g. see commit `ea6f043fc9` ("x86: Make __get_user() generate an out-of-line call") is largely a non-factor (ignoring setups where RET is something entirely different), As measured across tens of millions of calls of guest PTE reads in FNAME(walk_addr_generic): __get_user() get_user() open-coded open-coded, no LFENCE Intel (EMR) 75.1 67.6 75.3 65.5 AMD (Turin) 68.1 51.1 67.5 49.3 Note, Hyper-V MSR emulation is not a remotely hot path, but convert it anyways for consistency, and because there is a general desire to remove __{get,put}_user() entirely. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/all/CAHk-=wimh_3jM9Xe8Zx0rpuf8CPDu6DkRCGb44azk0Sz5yqSnw@mail.gmail.com Cc: Borislav Petkov <bp@alien8.de> Link: https://patch.msgid.link/20251106210206.221558-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-17 07:50:20 -08:00
Hou Wenlong	995d504100	KVM: x86: Don't disable IRQs when unregistering user-return notifier Remove the code to disable IRQs when unregistering KVM's user-return notifier now that KVM doesn't invoke kvm_on_user_return() when disabling virtualization via IPI function call, i.e. now that there's no need to guard against re-entrancy via IPI callback. Note, disabling IRQs has largely been unnecessary since commit `a377ac1cd9` ("x86/entry: Move user return notifier out of loop") moved fire_user_return_notifiers() into the section with IRQs disabled. In doing so, the commit somewhat inadvertently fixed the underlying issue that was papered over by commit `1650b4ebc9` ("KVM: Disable irq while unregistering user notifier"). I.e. in practice, the code and comment has been stale since commit `a377ac1cd9`. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> [sean: rewrite changelog after rebasing, drop lockdep assert] Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030191528.3380553-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-07 10:59:47 -08:00
Sean Christopherson	2baa33a8dd	KVM: x86: Leave user-return notifier registered on reboot/shutdown Leave KVM's user-return notifier registered in the unlikely case that the notifier is registered when disabling virtualization via IPI callback in response to reboot/shutdown. On reboot/shutdown, keeping the notifier registered is ok as far as MSR state is concerned (arguably better then restoring MSRs at an unknown point in time), as the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down. The only wrinkle is that if kvm.ko module unload manages to race with reboot/shutdown, then leaving the notifier registered could lead to use-after-free due to calling into unloaded kvm.ko module code. But such a race is only possible on --forced reboot/shutdown, because otherwise userspace tasks would be frozen before kvm_shutdown() is called, i.e. on a "normal" reboot/shutdown, it should be impossible for the CPU to return to userspace after kvm_shutdown(). Furthermore, on a --forced reboot/shutdown, unregistering the user-return hook from IRQ context doesn't fully guard against use-after-free, because KVM could immediately re-register the hook, e.g. if the IRQ arrives before kvm_user_return_register_notifier() is called. Rather than trying to guard against the IPI in the "normal" user-return code, which is difficult and noisy, simply leave the user-return notifier registered on a reboot, and bump the kvm.ko module refcount to defend against a use-after-free due to kvm.ko unload racing against reboot. Alternatively, KVM could allow kvm.ko and try to drop the notifiers during kvm_x86_exit(), but that's also a can of worms as registration is per-CPU, and so KVM would need to blast an IPI, and doing so while a reboot/shutdown is in-progress is far risky than preventing userspace from unloading KVM. Link: https://patch.msgid.link/20251030191528.3380553-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-07 10:59:46 -08:00
Sean Christopherson	b371174d2f	KVM: x86: WARN if user-return MSR notifier is registered on exit When freeing the per-CPU user-return MSRs structures, WARN if any CPU has a registered notifier to help detect and/or debug potential use-after-free issues. The lifecycle of the notifiers is rather convoluted, and has several non-obvious paths where notifiers are unregistered, i.e. isn't exactly the most robust code possible. The notifiers they are registered on-demand in KVM, on the first WRMSR to a tracked register. _Usually_ the notifier is unregistered whenever the CPU returns to userspace. But because any given CPU isn't guaranteed to return to userspace, e.g. the CPU could be offlined before doing so, KVM also "drops", a.k.a. unregisters, the notifiers when virtualization is disabled on the CPU. Further complicating the unregister path is the fact that the calls to disable virtualization come from common KVM, and the per-CPU calls are guarded by a per-CPU flag (to harden _that_ code against bugs, e.g. due to mishandling reboot). Reboot/shutdown in particular is problematic, as KVM disables virtualization via IPI function call, i.e. from IRQ context, instead of using the cpuhp framework, which runs in task context. I.e. on reboot/shutdown, drop_user_return_notifiers() is called asynchronously. Forced reboot/shutdown is the most problematic scenario, as userspace tasks are not frozen before kvm_shutdown() is invoked, i.e. KVM could be actively manipulating the user-return MSR lists and/or notifiers when the IPI arrives. To a certain extent, all bets are off when userspace forces a reboot/shutdown, but KVM should at least avoid a use-after-free, e.g. to avoid crashing the kernel when trying to reboot. Link: https://patch.msgid.link/20251030191528.3380553-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-07 10:59:46 -08:00
Sean Christopherson	c0711f8c61	KVM: TDX: Explicitly set user-return MSRs that may be clobbered by the TDX-Module Set all user-return MSRs to their post-TD-exit value when preparing to run a TDX vCPU to ensure the value that KVM expects to be loaded after running the vCPU is indeed the value that's loaded in hardware. If the TDX-Module doesn't actually enter the guest, i.e. doesn't do VM-Enter, then it won't "restore" VMM state, i.e. won't clobber user-return MSRs to their expected post-run values, in which case simply updating KVM's "cached" value will effectively corrupt the cache due to hardware still holding the original value. In theory, KVM could conditionally update the current user-return value if and only if tdh_vp_enter() succeeds, but in practice "success" doesn't guarantee the TDX-Module actually entered the guest, e.g. if the TDX-Module synthesizes an EPT Violation because it suspects a zero-step attack. Force-load the expected values instead of trying to decipher whether or not the TDX-Module restored/clobbered MSRs, as the risk doesn't justify the benefits. Effectively avoiding four WRMSRs once per run loop (even if the vCPU is scheduled out, user-return MSRs only need to be reloaded if the CPU exits to userspace or runs a non-TDX vCPU) is likely in the noise when amortized over all entries, given the cost of running a TDX vCPU. E.g. the cost of the WRMSRs is somewhere between ~300 and ~500 cycles, whereas the cost of a _single_ roundtrip to/from a TDX guest is thousands of cycles. Fixes: `e0b4f31a3c` ("KVM: TDX: restore user ret MSRs") Cc: stable@vger.kernel.org Cc: Yan Zhao <yan.y.zhao@intel.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://patch.msgid.link/20251030191528.3380553-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-07 10:59:45 -08:00
Maxim Levitsky	ab4e41eb9f	KVM: x86: Don't clear async #PF queue when CR0.PG is disabled (e.g. on #SMI) Fix an interaction between SMM and PV asynchronous #PFs where an #SMI can cause KVM to drop an async #PF ready event, and thus result in guest tasks becoming permanently stuck due to the task that encountered the #PF never being resumed. Specifically, don't clear the completion queue when paging is disabled, and re-check for completed async #PFs if/when paging is enabled. Prior to commit `2635b5c4a0` ("KVM: x86: interrupt based APF 'page ready' event delivery"), flushing the APF queue without notifying the guest of completed APF requests when paging is disabled was "necessary", in that delivering a #PF to the guest when paging is disabled would likely confuse and/or crash the guest. And presumably the original async #PF development assumed that a guest would only disable paging when there was no intent to ever re-enable paging. That assumption fails in several scenarios, most visibly on an emulated SMI, as entering SMM always disables CR0.PG (i.e. initially runs with paging disabled). When the SMM handler eventually executes RSM, the interrupted paging-enabled is restored, and the async #PF event is lost. Similarly, invoking firmware, e.g. via EFI runtime calls, might require a transition through paging modes and thus also disable paging with valid entries in the competion queue. To avoid dropping completion events, drop the "clear" entirely, and handle paging-enable transitions in the same way KVM already handles APIC enable/disable events: if a vCPU's APIC is disabled, APF completion events are not kept pending and not injected while APIC is disabled. Once a vCPU's APIC is re-enabled, KVM raises KVM_REQ_APF_READY so that the vCPU recognizes any pending pending #APF ready events. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251015033258.50974-4-mlevitsk@redhat.com [sean: rework changelog to call out #PF injection, drop "real mode" references, expand the code comment] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-07 10:59:44 -08:00
Maxim Levitsky	68c35f89d0	KVM: x86: Fix a semi theoretical bug in kvm_arch_async_page_present_queued() Fix a semi theoretical race condition related to a lack of memory barriers when dealing with vcpu->arch.apf.pageready_pending. In theory, the "ready" side could see a stale pageready_pending and neglect to kick the vCPU, and thus allow the vCPU to enter the guest with a pending KVM_REQ_APF_READY and no kick/IPI on the way, in which case the KVM would fail to deliver a completed async #PF event to the guest in a timely manner as the request would be recognized only on the next (coincidental) VM-Exit. kvm_arch_async_page_present_queued() running in workqueue context: kvm_make_request(KVM_REQ_APF_READY, vcpu); /* memory barrier is missing here/ if (!vcpu->arch.apf.pageready_pending) kvm_vcpu_kick(vcpu); kvm_set_msr_common() running in task context: vcpu->arch.apf.pageready_pending = false; / memory barrier is missing here*/ And later, vcpu_enter_guest() running in task context: if (kvm_check_request(KVM_REQ_APF_READY, vcpu)) kvm_check_async_pf_completion(vcpu) Add missing full memory barriers in both cases to avoid theoretical case of not kicking the vCPU thread. Note that the bug is mostly theoretical because kvm_make_request() uses an atomic operation, which is always serializing on x86, requiring only for documentation purposes the smp_mb__after_atomic() after it (smp_mb__after_atomic() is a NOP on x86). The second missing barrier, between kvm_set_msr_common() and vcpu_enter_guest(), isn't strictly needed because KVM executes several barriers in between calling these functions, however it still makes sense to have an explicit barrier to be on the safe side and to document the ordering dependencies. Finally, also use READ_ONCE/WRITE_ONCE. Thanks a lot to Paolo for the help with this patch. Link: https://lore.kernel.org/all/7c7a5a75-a786-4a05-a836-4368582ca4c2@redhat.com Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://patch.msgid.link/20251015033258.50974-3-mlevitsk@redhat.com [sean: explain the race and its impact in more detail] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-07 10:59:43 -08:00
Sean Christopherson	65a70164ab	KVM: x86: Add a helper to dedup reporting of unhandled VM-Exits Add and use a helper, kvm_prepare_unexpected_reason_exit(), to dedup the code that fills the exit reason and CPU when KVM encounters a VM-Exit that KVM doesn't know how to handle. Reviewed-by: yaoyuan@linux.alibaba.com Reviewed-by: Yao Yuan <yaoyuan@linux.alibaba.com> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030185004.3372256-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-04 09:14:47 -08:00
Linus Torvalds	211ddde082	Linux 6.18-rc2 v6.18-rc2	2025-10-19 15:19:16 -10:00
Linus Torvalds	d9043c79ba	Merge tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Make sure the check for lost pelt idle time is done unconditionally to have correct lost idle time accounting - Stop the deadline server task before a CPU goes offline * tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix pelt lost idle time detection sched/deadline: Stop dl_server before CPU goes offline	2025-10-19 04:59:43 -10:00
Linus Torvalds	343b4b44a1	Merge tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure perf reporting works correctly in setups using overlayfs or FUSE - Move the uprobe optimization to a better location logically * tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix MMAP2 event device with backing files perf/core: Fix MMAP event path names with backing files perf/core: Fix address filter match with backing files uprobe: Move arch_uprobe_optimize right after handlers execution	2025-10-19 04:54:08 -10:00
Linus Torvalds	c7864eeaa4	Merge tag 'x86_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Reset the why-the-system-rebooted register on AMD to avoid stale bits remaining from previous boots - Add a missing barrier in the TLB flushing code to prevent erroneously not flushing a TLB generation - Make sure cpa_flush() does not overshoot when computing the end range of a flush region - Fix resctrl bandwidth counting on AMD systems when the amount of monitoring groups created exceeds the number the hardware can track * tag 'x86_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Prevent reset reasons from being retained across reboot x86/mm: Fix SMP ordering in switch_mm_irqs_off() x86/mm: Fix overflow in __cpa_addr() x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID	2025-10-19 04:41:27 -10:00
Linus Torvalds	1c64efcb08	Merge tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rustfmt fixes from Miguel Ojeda: "Rust 'rustfmt' cleanup 'rustfmt', by default, formats imports in a way that is prone to conflicts while merging and rebasing, since in some cases it condenses several items into the same line. Document in our guidelines that we will handle this for the moment with the trailing empty comment workaround and make the tree 'rustfmt'-clean again" * tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: bitmap: fix formatting rust: cpufreq: fix formatting rust: alloc: employ a trailing comment to keep vertical layout docs: rust: add section on imports formatting	2025-10-18 10:05:13 -10:00
Linus Torvalds	648937f64a	Merge tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fix from Jarkko Sakkinen: "Correct the state transitions for ARM FF-A to match the spec and how tpm_crb behaves on other platforms" * tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm_crb: Add idle support for the Arm FF-A start method	2025-10-18 08:38:28 -10:00
Linus Torvalds	e67bb0da33	Merge tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Search for MSI Capability with correct ID to fix an MSI regression on platforms with Cadence IP (Hans Zhang) - Revert early bridge resource set up to fix resource assignment failures that broke at least alpha boot and Snapdragon ath12k WiFi (Ilpo Järvinen) - Implement VMD .irq_startup()/.irq_shutdown() to fix IRQ issues that caused boot crashes and broken devices below VMD (Inochi Amaoto) - Select CONFIG_SCREEN_INFO on X86 to fix black screen on boot when SCREEN_INFO not selected (Mario Limonciello) * tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/VGA: Select SCREEN_INFO on X86 PCI: vmd: Override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info() PCI: Revert early bridge resource set up PCI: cadence: Search for MSI Capability with correct ID	2025-10-18 08:35:09 -10:00
Linus Torvalds	ea0bdf2b94	Merge tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link fixes from Dave Jiang: "A small collection of CXL fixes. In addition to some misc fixes for the CXL subsystem, a number of fixes for CXL extended linear cache support are included to make it functional again. - Avoid missing port component registers setup due to dport enumeration failure - Add check for no entries in cxl_feature_info to address accessing invalid pointer. - Use %pa printk format to emit resource_size_t in validate_region_offset() CXL extended linear cache support fixes: - Fix setup of memory resource in cxl_acpi_set_cache_size() - Set range param for region_res_match_cxl_range() as const (addresses a compile warning for match_region_by_range() fix) - Fix match_region_by_range() to use region_res_match_cxl_range() - Subtract to find an hpa_alias0 in cxl_poison events to correct the alias math calculation" * tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/trace: Subtract to find an hpa_alias0 in cxl_poison events cxl/region: Use %pa printk format to emit resource_size_t cxl: Fix match_region_by_range() to use region_res_match_cxl_range() cxl: Set range param for region_res_match_cxl_range() as const cxl/acpi: Fix setup of memory resource in cxl_acpi_set_cache_size() cxl/features: Add check for no entries in cxl_feature_info cxl/port: Avoid missing port component registers setup	2025-10-18 08:22:07 -10:00
Linus Torvalds	2953fb6548	Merge tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - fix for sticky fingers handling in hid-multitouch (Benjamin Tissoires) - fix for reporting of 0 battery levels (Dmitry Torokhov) - build fix for hid-haptic in certain configurations (Jonathan Denose) - improved probe and avoiding spamming kernel log by hid-nintendo (Vicki Pfau) - fix for OOB in hid-cp2112 (Deepak Sharma) - interrupt handling fix for intel-thc-hid (Even Xu) - a couple of new device IDs and device-specific quirks * tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: logitech-hidpp: Add HIDPP_QUIRK_RESET_HI_RES_SCROLL selftests/hid: add tests for missing release on the Dell Synaptics HID: multitouch: fix sticky fingers HID: multitouch: fix name of Stylus input devices HID: hid-input: only ignore 0 battery events for digitizers HID: hid-debug: Fix spelling mistake "Rechargable" -> "Rechargeable" HID: Kconfig: Fix build error from CONFIG_HID_HAPTIC HID: nintendo: Rate limit IMU compensation message HID: nintendo: Wait longer for initial probe HID: core: Add printk_ratelimited variants to hid_warn() etc HID: quirks: Add ALWAYS_POLL quirk for VRS R295 steering wheel HID: quirks: avoid Cooler Master MM712 dongle wakeup bug HID: cp2112: Add parameter validation to data length HID: intel-thc-hid: intel-quickspi: Add ARL PCI Device Id's HID: intel-thc-hid: Intel-quickspi: switch first interrupt from level to edge detection HID: intel-thc-hid: intel-quicki2c: Fix wrong type casting	2025-10-18 08:18:18 -10:00
Linus Torvalds	d303caf5ca	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov) - Make selftests/bpf arg_parsing.c more robust to errors (Andrii Nakryiko) - Fix redefinition of 'off' as different kind of symbol when I40E driver is builtin (Brahmajit Das) - Do not disable preemption in bpf_test_run (Sahil Chandna) - Fix memory leak in __lookup_instance error path (Shardul Bankar) - Ensure test data is flushed to disk before reading it (Xing Guo) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Fix redefinition of 'off' as different kind of symbol bpf: Do not disable preemption in bpf_test_run(). bpf: Fix memory leak in __lookup_instance error path selftests: arg_parsing: Ensure data is flushed to disk before reading. bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures. selftests/bpf: make arg_parsing.c more robust to crashes bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path	2025-10-18 08:00:43 -10:00

1 2 3 4 5 ...

1397051 Commits