linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-26 02:32:45 -04:00

Author	SHA1	Message	Date
Sean Christopherson	e44eb58334	KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Load the guest's FPU state if userspace is accessing MSRs whose values are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(), to facilitate access to such kind of MSRs. If MSRs supported in kvm_caps.supported_xss are passed through to guest, the guest MSRs are swapped with host's before vCPU exits to userspace and after it reenters kernel before next VM-entry. Because the modified code is also used for the KVM_GET_MSRS device ioctl(), explicitly check @vcpu is non-null before attempting to load guest state. The XSAVE-managed MSRs cannot be retrieved via the device ioctl() without loading guest FPU state (which doesn't exist). Note that guest_cpuid_has() is not queried as host userspace is allowed to access MSRs that have not been exposed to the guest, e.g. it might do KVM_SET_MSRS prior to KVM_SET_CPUID2. The two helpers are put here in order to manifest accessing xsave-managed MSRs requires special check and handling to guarantee the correctness of read/write to the MSRs. Co-developed-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: drop S_CET, add big comment, move accessors to x86.c] Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250919223258.1604852-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:00:47 -07:00
Yang Weijiang	779ed05511	KVM: x86: Initialize kvm_caps.supported_xss Set original kvm_caps.supported_xss to (host_xss & KVM_SUPPORTED_XSS) if XSAVES is supported. host_xss contains the host supported xstate feature bits for thread FPU context switch, KVM_SUPPORTED_XSS includes all KVM enabled XSS feature bits, the resulting value represents the supervisor xstates that are available to guest and are backed by host FPU framework for swapping {guest,host} XSAVE-managed registers/MSRs. [sean: relocate and enhance comment about PT / XSS[8] ] Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:00:46 -07:00
Yang Weijiang	9622e116d0	KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Update CPUID.(EAX=0DH,ECX=1).EBX to reflect current required xstate size due to XSS MSR modification. CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of all enabled xstate features in (XCR0 \| IA32_XSS). The CPUID value can be used by guest before allocate sufficient xsave buffer. Note, KVM does not yet support any XSS based features, i.e. supported_xss is guaranteed to be zero at this time. Opportunistically skip CPUID updates if XSS value doesn't change. Suggested-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:00:46 -07:00
Chao Gao	338543cbe0	KVM: x86: Check XSS validity against guest CPUIDs Maintain per-guest valid XSS bits and check XSS validity against them rather than against KVM capabilities. This is to prevent bits that are supported by KVM but not supported for a guest from being set. Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the host_initiated check. Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:00:45 -07:00
Sean Christopherson	c0a5f29891	KVM: x86: Report XSS as to-be-saved if there are supported features Add MSR_IA32_XSS to list of MSRs reported to userspace if supported_xss is non-zero, i.e. KVM supports at least one XSS based feature. Before enabling CET virtualization series, guest IA32_MSR_XSS is guaranteed to be 0, i.e., XSAVES/XRSTORS is executed in non-root mode with XSS == 0, which equals to the effect of XSAVE/XRSTOR. Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:00:44 -07:00
Yang Weijiang	06f2969c6a	KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and other non-MSR registers through them, along with support for KVM_GET_REG_LIST to enumerate support for KVM-defined registers. This is in preparation for allowing userspace to read/write the guest SSP register, which is needed for the upcoming CET virtualization support. Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is added for registers that lack existing KVM uAPIs to access them. The "KVM" in the name is intended to be vague to give KVM flexibility to include other potential registers. More precise names like "SYNTHETIC" and "SYNTHETIC_MSR" were considered, but were deemed too confusing (e.g. can be conflated with synthetic guest-visible MSRs) and may put KVM into a corner (e.g. if KVM wants to change how a KVM-defined register is modeled internally). Enumerate only KVM-defined registers in KVM_GET_REG_LIST to avoid duplicating KVM_GET_MSR_INDEX_LIST, and so that KVM can return _only_ registers that are fully supported (KVM_GET_REG_LIST is vCPU-scoped, i.e. can be precise, whereas KVM_GET_MSR_INDEX_LIST is system-scoped). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com [1] Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250919223258.1604852-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:00:44 -07:00
Sean Christopherson	1f2bbbbbda	KVM: x86: Merge 'selftests' into 'cet' to pick up ex_str() Merge the queue of KVM selftests changes for 6.18 to pick up the ex_str() helper so that it can be used to pretty print expected versus actual exceptions in a new MSR selftest. CET virtualization will add support for several MSRs with non-trivial semantics, along with new uAPI for accessing the guest's Shadow Stack Pointer (SSP) from userspace.	2025-09-23 09:00:18 -07:00
Sean Christopherson	5dca3808b2	KVM: x86: Merge 'svm' into 'cet' to pick up GHCB dependencies Merge the queue of SVM changes for 6.18 to pick up the KVM-defined GHCB helpers so that kvm_ghcb_get_xss() can be used to virtualize CET for SEV-ES+ guests.	2025-09-23 08:59:49 -07:00
Sean Christopherson	4135a9a8cc	KVM: SEV: Validate XCR0 provided by guest in GHCB Use __kvm_set_xcr() to propagate XCR0 changes from the GHCB to KVM's software model in order to validate the new XCR0 against KVM's view of the supported XCR0. Allowing garbage is thankfully mostly benign, as kvm_load_{guest,host}_xsave_state() bail early for vCPUs with protected state, xstate_required_size() will simply provide garbage back to the guest, and attempting to save/restore the bad value via KVM_{G,S}ET_XCRS will only harm the guest (setting XCR0 will fail). However, allowing the guest to put junk into a field that KVM assumes is valid is a CVE waiting to happen. And as a bonus, using the proper API eliminates the ugly open coding of setting arch.cpuid_dynamic_bits_dirty. Simply ignore bad values, as either the guest managed to get an unsupported value into hardware, or the guest is misbehaving and providing pure garbage. In either case, KVM can't fix the broken guest. Note, using __kvm_set_xcr() also avoids recomputing dynamic CPUID bits if XCR0 isn't actually changing (relatively to KVM's previous snapshot). Cc: Tom Lendacky <thomas.lendacky@amd.com> Fixes: `291bd20d5d` ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250919223258.1604852-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:55:19 -07:00
Sean Christopherson	bd5f500d23	KVM: SEV: Read save fields from GHCB exactly once Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific GHCB get() utility to help guard against TOCTOU bugs. Using READ_ONCE() doesn't completely prevent such bugs, e.g. doesn't prevent KVM from redoing get() after checking the initial value, but at least addresses all potential TOCTOU issues in the current KVM code base. To prevent unintentional use of the generic helpers, take only @svm for the kvm_ghcb_get_xxx() helpers and retrieve the ghcb instead of explicitly passing it in. Opportunistically reduce the indentation of the macro-defined helpers and clean up the alignment. Fixes: `4e15a0ddc3` ("KVM: SEV: snapshot the GHCB before accessing it") Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250919223258.1604852-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:55:18 -07:00
Sean Christopherson	e0ff302b79	KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() to make it clear that KVM is getting the cached value, not reading directly from the guest-controlled GHCB. More importantly, vacating kvm_ghcb_get_sw_exit_code() will allow adding a KVM-specific macro-built kvm_ghcb_get_##field() helper to read values from the GHCB. No functional change intended. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250919223258.1604852-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:55:18 -07:00
Sean Christopherson	df1f294013	KVM: selftests: Add ex_str() to print human friendly name of exception vectors Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line lengths reasonable) and use it in assert messages that currently print the raw vector number. Co-developed-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-45-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:39:02 -07:00
Sukrut Heroorkar	ff86b48d4c	selftests/kvm: remove stale TODO in xapic_state_test The TODO about using the number of vCPUs instead of vcpu.id + 1 was already addressed by commit `376bc1b458` ("KVM: selftests: Don't assume vcpu->id is '0' in xAPIC state test"). The comment is now stale and can be removed. Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com> Link: https://lore.kernel.org/r/20250908210547.12748-1-hsukrut3@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:39:01 -07:00
dongsheng	c435978e4f	KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount Add a PMU errata framework and use it to relax precise event counts on Atom platforms that overcount "Instruction Retired" and "Branch Instruction Retired" events, as the overcount issues on VM-Exit/VM-Entry are impossible to prevent from userspace, e.g. the test can't prevent host IRQs. Setup errata during early initialization and automatically sync the mask to VMs so that tests can check for errata without having to manually manage host=>guest variables. For Intel Atom CPUs, the PMU events "Instruction Retired" or "Branch Instruction Retired" may be overcounted for some certain instructions, like FAR CALL/JMP, RETF, IRET, VMENTRY/VMEXIT/VMPTRLD and complex SGX/SMX/CSTATE instructions/flows. The detailed information can be found in the errata (section SRF7): https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details/ For the Atom platforms before Sierra Forest (including Sierra Forest), Both 2 events "Instruction Retired" and "Branch Instruction Retired" would be overcounted on these certain instructions, but for Clearwater Forest only "Instruction Retired" event is overcounted on these instructions. Signed-off-by: dongsheng <dongsheng.x.zhang@intel.com> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250919214648.1585683-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:38:59 -07:00
Dapeng Mi	2922b59588	KVM: selftests: Validate more arch-events in pmu_counters_test Add support for 5 new architectural events (4 topdown level 1 metrics events and LBR inserts event) that will first show up in Intel's Clearwater Forest CPUs. Detailed info about the new events can be found in SDM section 21.2.7 "Pre-defined Architectural Performance Events". Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> [sean: drop "unavailable_mask" changes] Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250919214648.1585683-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:38:59 -07:00
Sean Christopherson	1fcd3053aa	KVM: selftests: Reduce number of "unavailable PMU events" combos tested Reduce the number of combinations of unavailable PMU events masks that are testing by the PMU counters test. In reality, testing every possible combination isn't all that interesting, and certainly not worth the tens of seconds (or worse, minutes) of runtime. Fully testing the N^2 space will be especially problematic in the near future, as 5! new arch events are on their way. Use alternating bit patterns (and 0 and -1u) in the hopes that _if_ there is ever a KVM bug, it's not something horribly convoluted that shows up only with a super specific pattern/value. Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250919214648.1585683-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:38:59 -07:00
Sean Christopherson	571fc2833e	KVM: selftests: Track unavailable_mask for PMU events as 32-bit value Track the mask of "unavailable" PMU events as a 32-bit value. While bits 31:9 are currently reserved, silently truncating those bits is unnecessary and asking for missed coverage. To avoid running afoul of the sanity check in vcpu_set_cpuid_property(), explicitly adjust the mask based on the non-reserved bits as reported by KVM's supported CPUID. Opportunistically update the "all ones" testcase to pass -1u instead of 0xff. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250919214648.1585683-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:38:59 -07:00
Dapeng Mi	210b09fa42	KVM: selftests: Add timing_info bit support in vmx_pmu_caps_test A new bit PERF_CAPABILITIES[17] called "PEBS_TIMING_INFO" bit is added to indicated if PEBS supports to record timing information in a new "Retried Latency" field. Since KVM requires user can only set host consistent PEBS capabilities, otherwise the PERF_CAPABILITIES setting would fail, add pebs_timing_info into the "immutable_caps" to block host inconsistent PEBS configuration and cause errors. Opportunistically drop the anythread_deprecated bit. It isn't and likely never was a PERF_CAPABILITIES flag, the test's definition snuck in when the union was copy+pasted from the kernel's definition. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> [sean: call out anythread_deprecated change] Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250919214648.1585683-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 08:38:59 -07:00
Bagas Sanjaya	86bcd23df9	KVM: x86: Fix hypercalls docs section number order Commit `4180bf1b65` ("KVM: X86: Implement "send IPI" hypercall") documents KVM_HC_SEND_IPI hypercall, yet its section number duplicates KVM_HC_CLOCK_PAIRING one (which both are 6th). Fix the numbering order so that the former should be 7th. Fixes: `4180bf1b65` ("KVM: X86: Implement "send IPI" hypercall") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250909003952.10314-1-bagasdotme@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-22 07:51:36 -07:00
Sean Christopherson	e8f85d7884	KVM: x86: Don't treat ENTER and LEAVE as branches, because they aren't Remove the IsBranch flag from ENTER and LEAVE in KVM's emulator, as ENTER and LEAVE are stack operations, not branches. Add forced emulation of said instructions to the PMU counters test to prove that KVM diverges from hardware, and to guard against regressions. Opportunistically add a missing "1 MOV" to the selftest comment regarding the number of instructions per loop, which commit `7803339fa9` ("KVM: selftests: Use data load to trigger LLC references/misses in Intel PMU") forgot to add. Fixes: `018d70ffcf` ("KVM: x86: Update vPMCs when retiring branch instructions") Cc: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919004639.1360453-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-22 07:14:05 -07:00
Sean Christopherson	c49aa98376	KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to PMU v2+ Restrict support for GLOBAL_CTRL, GLOBAL_STATUS, fixed PMCs, and PEBS to v2 or later vPMUs. The SDM explicitly states that GLOBAL_{CTRL,STATUS} and fixed counters were introduced with PMU v2, and PEBS has hard dependencies on fixed counters and the bitmap MSR layouts established by PMU v2. Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:58:15 -07:00
Sean Christopherson	9bae7a0863	KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86 Move all initialization of all_valid_pmc_idx to common code, as the logic is 100% common to Intel and AMD, and KVM heavily relies on Intel and AMD having the same semantics. E.g. the fact that AMD doesn't support fixed counters doesn't allow KVM to use all_valid_pmc_idx[63:32] for other purposes. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:58:13 -07:00
Dapeng Mi	30c0267f15	KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents Replace a variety of "1ull << N" and "(u64)1 << N" snippets with BIT_ULL() in the PMU code. No functional change intended. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> [sean: split to separate patch, write changelog] Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-30-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:20 -07:00
Dapeng Mi	2bff2edf69	KVM: VMX: Add helpers to toggle/change a bit in VMCS execution controls Expand the VMCS controls builder macros to generate helpers to change a bit to the desired value, and use the new helpers when toggling APICv related controls. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> [sean: rewrite changelog] Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:19 -07:00
Sean Christopherson	5a1a726e68	KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates Defer recalculating MSR and instruction intercepts after a CPUID update via RECALC_INTERCEPTS to converge on RECALC_INTERCEPTS as the "official" mechanism for triggering recalcs. As a bonus, because KVM does a "recalc" during vCPU creation, and every functional VMM sets CPUID at least once, for all intents and purposes this saves at least one recalc. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:19 -07:00
Sean Christopherson	6057497336	KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS Rework the MSR_FILTER_CHANGED request into a more generic RECALC_INTERCEPTS request, and expand the responsibilities of vendor code to recalculate all intercepts that vary based on userspace input, e.g. instruction intercepts that are tied to guest CPUID. Providing a generic recalc request will allow the upcoming mediated PMU support to trigger a recalc when PMU features, e.g. PERF_CAPABILITIES, are set by userspace, without having to make multiple calls to/from PMU code. As a bonus, using a request will effectively coalesce recalcs, e.g. will reduce the number of recalcs for normal usage from 3+ to 1 (vCPU create, set CPUID, set PERF_CAPABILITIES (Intel only), set filter). The downside is that MSR filter changes that are done in isolation will do a small amount of unnecessary work, but that's already a relatively slow path, and the cost of recalculating instruction intercepts is negligible. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:18 -07:00
Dapeng Mi	cdfed9370b	KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h and rename them with PERF_CAP prefix to keep consistent with other perf capabilities macros. No functional change intended. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:57:16 -07:00
Dapeng Mi	1e24bece26	KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers Rename the two helpers vmx_vmentry/vmexit_ctrl() to vmx_get_initial_vmentry/vmexit_ctrl() to represent their real meaning. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:45 -07:00
Sean Christopherson	51f34b1e65	KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities Take a snapshot of the unadulterated PMU capabilities provided by perf so that KVM can compare guest vPMU capabilities against hardware capabilities when determining whether or not to intercept PMU MSRs (and RDPMC). Reviewed-by: Sandipan Das <sandipan.das@amd.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:44 -07:00
Sean Christopherson	e3d1f2826d	KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs Gate access to PMC MSRs based on pmu->version, not on kvm->arch.enable_pmu, to more accurately reflect KVM's behavior. This is a glorified nop, as pmu->version and pmu->nr_arch_gp_counters can only be non-zero if amd_pmu_refresh() is reached, kvm_pmu_refresh() invokes amd_pmu_refresh() if and only if kvm->arch.enable_pmu is true, and amd_pmu_refresh() forces pmu->version to be 1 or 2. I.e. the following holds true: !pmu->nr_arch_gp_counters \|\| kvm->arch.enable_pmu == (pmu->version > 0) and so the only way for amd_pmu_get_pmc() to return a non-NULL value is if both kvm->arch.enable_pmu and pmu->version evaluate to true. No real functional change intended. Reviewed-by: Sandipan Das <sandipan.das@amd.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:28 -07:00
Sean Christopherson	4687a2c4e6	KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init() Setup the golden VMCS config during vmx_init(), before the call to kvm_x86_vendor_init(), instead of waiting until the callback to do hardware setup. setup_vmcs_config() only touches VMX state, i.e. doesn't poke anything in kvm.ko, and has no runtime dependencies beyond hv_init_evmcs(). Setting the VMCS config early on will allow referencing VMCS and VMX capabilities at any point during setup, e.g. to check for PERF_GLOBAL_CTRL save/load support during mediated PMU initialization. Tested-by: Xudong Hao <xudong.hao@intel.com> Link: https://lore.kernel.org/r/20250806195706.1650976-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-18 12:56:28 -07:00
Jiaming Zhang	5b5133e6a5	Documentation: KVM: Call out that KVM strictly follows the 8254 PIT spec Explicitly document that the behavior of KVM_SET_PIT2 strictly conforms to the Intel 8254 PIT hardware specification, specifically that a write of '0' adheres to the spec's definition that a programmed count of '0' is converted to the maximum possible value (2^16). E.g. an unaware userspace might attempt to validate that KVM_GET_PIT2 returns the exact state set via KVM_SET_PIT2, and be surprised when the returned count is 65536, not 0. Add a references to the Intel 8254 PIT datasheet that will hopefully stay fresh for some time (the internet isn't exactly brimming with copies of the 8254 datasheet). Link: https://lore.kernel.org/all/CANypQFbEySjKOFLqtFFf2vrEe=NBr7XJfbkjQhqXuZGg7Rpoxw@mail.gmail.com Signed-off-by: Jiaming Zhang <r772577952@gmail.com> Link: https://lore.kernel.org/r/20250905174736.260694-1-r772577952@gmail.com [sean: add context Link, drop local APIC change, massage changelog accordingly] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-16 12:55:11 -07:00
Liao Yuanhong	cbd860293d	KVM: x86: hyper-v: Use guard() instead of mutex_lock() to simplify code Use guard(mutex) instead of mutex_lock/mutex_unlock pair to simplify the error handling when setting up the TSC page for a Hyper-V guest. No functional change intended. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Link: https://lore.kernel.org/r/20250901131604.646415-1-liaoyuanhong@vivo.com [sean: tweak changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-16 12:55:11 -07:00
Liao Yuanhong	4319fa120f	KVM: x86: Use guard() instead of mutex_lock() to simplify code Use guard(mutex) instead of mutex_lock/mutex_unlock pair to simplify the error handling when allocating the APIC access page. No functional change intended. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Link: https://lore.kernel.org/r/20250901131822.647802-1-liaoyuanhong@vivo.com [sean: add blank link to isolate guard(), tweak changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-16 12:55:10 -07:00
Dapeng Mi	06dc910f5e	KVM: x86/pmu: Correct typo "_COUTNERS" to "_COUNTERS" Fix typos. "_COUTNERS" -> "_COUNTERS". Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Link: https://lore.kernel.org/r/20250718001905.196989-2-dapeng1.mi@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-16 12:55:09 -07:00
Sagi Shahar	b3a37bff8d	KVM: TDX: Reject fully in-kernel irqchip if EOIs are protected, i.e. for TDX VMs Reject KVM_CREATE_IRQCHIP if the VM type has protected EOIs, i.e. if KVM can't intercept EOI and thus can't faithfully emulate level-triggered interrupts that are routed through the I/O APIC. For TDX VMs, the TDX-Module owns the VMX EOI-bitmap and configures all IRQ vectors to have the CPU accelerate EOIs, i.e. doesn't allow KVM to intercept any EOIs. KVM already requires a split irqchip[1], but does so during vCPU creation, which is both too late to allow userspace to fallback to a split irqchip and a less-than-stellar experience for userspace since an -EINVAL on KVM_VCPU_CREATE is far harder to debug/triage than failure exactly on KVM_CREATE_IRQCHIP. And of course, allowing an action that ultimately fails is arguably a bug regardless of the impact on userspace. Link: https://lore.kernel.org/lkml/20250222014757.897978-11-binbin.wu@linux.intel.com [1] Link: https://lore.kernel.org/lkml/aK3vZ5HuKKeFuuM4@google.com Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sagi Shahar <sagis@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250827011726.2451115-1-sagis@google.com [sean: massage shortlog+changelog, relocate setting has_protected_eoi] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-16 12:54:15 -07:00
Thorsten Blum	fc55b4cda0	KVM: nSVM: Replace kzalloc() + copy_from_user() with memdup_user() Replace kzalloc() followed by copy_from_user() with memdup_user() to improve and simplify svm_set_nested_state(). Return early if an error occurs instead of trying to allocate memory for 'save' when memory allocation for 'ctl' already failed. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250903002951.118912-1-thorsten.blum@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-11 09:03:14 -07:00
Sean Christopherson	aebc62b3de	KVM: selftests: Add support for DIV and IDIV in the fastops test Extend the fastops test coverage to DIV and IDIV, specifically to provide coverage for #DE (divide error) exceptions, as #DE is the only exception that can occur in KVM's fastops path, i.e. that requires exception fixup. Link: https://lore.kernel.org/r/20250909202835.333554-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-11 08:55:44 -07:00
Sean Christopherson	fe08478b1d	KVM: selftests: Dedup the gnarly constraints of the fastops tests (more macros!) Add a fastop() macro along with macros to define its required constraints, and use the macros to dedup the innermost guts of the fastop testcases. No functional change intended. Link: https://lore.kernel.org/r/20250909202835.333554-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-11 08:55:44 -07:00
Sean Christopherson	9bf5da1ca4	KVM: selftests: Add coverage for 'b' (byte) sized fastops emulation Extend the fastops test to cover instructions that operate on 8-bit data. Support for 8-bit instructions was omitted from the original commit purely due to complications with BT not having a r/m8 variant. To keep the RFLAGS.CF behavior deterministic and not heavily biased to '0' or '1', continue using BT, but cast and load the to-be-tested value into a dedicated 32-bit constraint. Supporting 8-bit operations will allow using guest_test_fastops() as-is to provide full coverage for DIV and IDIV. For divide operations, covering all operand sizes _is_ interesting, because KVM needs provide exception fixup for each size (failure to handle a #DE could panic the host). Link: https://lore.kernel.org/all/aIF7ZhWZxlkcpm4y@google.com Link: https://lore.kernel.org/r/20250909202835.333554-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-11 08:55:44 -07:00
Sean Christopherson	7b39b6c769	KVM: selftests: Add support for #DE exception fixup Add support for handling #DE (divide error) exceptions in KVM selftests so that the fastops test can verify KVM correctly handles #DE when emulating DIV or IDIV on behalf of the guest. Morph #DE to 0xff (i.e. to -1) as a mostly-arbitrary vector to indicate #DE, so that '0' (the real #DE vector) can still be used to indicate "no exception". Link: https://lore.kernel.org/r/20250909202835.333554-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-11 08:55:44 -07:00
Sean Christopherson	aac057dd62	KVM: x86: Move vector_hashing into lapic.c Move the vector_hashing module param into lapic.c now that all usage is contained within the local APIC emulation code. Opportunistically drop the accessor and append "_enabled" to the variable to help capture that it's a boolean module param. No functional change intended. Link: https://lore.kernel.org/r/20250821214209.3463350-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-10 12:05:13 -07:00
Sean Christopherson	73473f31a4	KVM: x86: Make "lowest priority" helpers local to lapic.c Make various helpers for resolving lowest priority IRQs local to lapic.c now that kvm_irq_delivery_to_apic() lives in lapic.c as well. No functional change intended. Link: https://lore.kernel.org/r/20250821214209.3463350-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-10 12:05:12 -07:00
Sean Christopherson	cbf5d94574	KVM: x86: Move kvm_irq_delivery_to_apic() from irq.c to lapic.c Move kvm_irq_delivery_to_apic() to lapic.c as it is specific to local APIC emulation. This will allow burying more local APIC code in lapic.c, e.g. the various "lowest priority" helpers. No functional change intended. Link: https://lore.kernel.org/r/20250821214209.3463350-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-10 12:05:09 -07:00
Sean Christopherson	2f5f8fb9de	KVM: SEV: Save the SEV policy if and only if LAUNCH_START succeeds Wait until LAUNCH_START fully succeeds to set a VM's SEV/SNP policy so that KVM doesn't keep a potentially stale policy. In practice, the issue is benign as the policy is only used to detect if the VMSA can be decrypted, and the VMSA only needs to be decrypted if LAUNCH_UPDATE and thus LAUNCH_START succeeded. Fixes: `962e2b6152` ("KVM: SVM: Decrypt SEV VMSA in dump_vmcb() if debugging is enabled") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Kim Phillips <kim.phillips@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250821213841.3462339-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-08 13:19:41 -07:00
Alok Tiwari	665071186c	KVM: selftests: Fix typo in hyperv cpuid test message Fix a typo in hyperv_cpuid.c test assertion log: replace "our of supported range" -> "out of supported range". Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://lore.kernel.org/r/20250824181642.629297-1-alok.a.tiwari@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-08 13:03:22 -07:00
Nikunj A Dadhania	a311fce2b6	KVM: SVM: Enable Secure TSC for SNP guests Add support for Secure TSC, allowing userspace to configure the Secure TSC feature for SNP guests. Use the SNP specification's desired TSC frequency parameter during the SNP_LAUNCH_START command to set the mean TSC frequency in KHz for Secure TSC enabled guests. Always use kvm->arch.arch.default_tsc_khz as the TSC frequency that is passed to SNP guests in the SNP_LAUNCH_START command. The default value is the host TSC frequency. The userspace can optionally change the TSC frequency via the KVM_SET_TSC_KHZ ioctl before calling the SNP_LAUNCH_START ioctl. Introduce the read-only MSR GUEST_TSC_FREQ (0xc0010134) that returns guest's effective frequency in MHZ when Secure TSC is enabled for SNP guests. Disable interception of this MSR when Secure TSC is enabled. Note that GUEST_TSC_FREQ MSR is accessible only to the guest and not from the hypervisor context. Co-developed-by: Ketan Chaturvedi <Ketan.Chaturvedi@amd.com> Signed-off-by: Ketan Chaturvedi <Ketan.Chaturvedi@amd.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> [sean: contain Secure TSC to sev.c] Link: https://lore.kernel.org/r/20250819234833.3080255-9-seanjc@google.com [sean: return -EINVAL if TSC frequency is '0'] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-08-21 08:51:39 -07:00
Sean Christopherson	f7b1f0c162	KVM: SEV: Fold sev_es_vcpu_reset() into sev_vcpu_create() Fold the remaining line of sev_es_vcpu_reset() into sev_vcpu_create() as there's no need for a dedicated RESET hook just to init a mutex, and the mutex should be initialized as early as possible anyways. No functional change intended. Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-08-21 08:45:15 -07:00
Sean Christopherson	baf6ed1772	KVM: SEV: Set RESET GHCB MSR value during sev_es_init_vmcb() Set the RESET value for the GHCB "MSR" during sev_es_init_vmcb() instead of sev_es_vcpu_reset() to allow for dropping sev_es_vcpu_reset() entirely. Note, the call to sev_init_vmcb() from sev_migrate_from() also kinda sorta emulates a RESET, but sev_migrate_from() immediately overwrites ghcb_gpa with the source's current value, so whether or not stuffing the GHCB version is correct/desirable is moot. No functional change intended. Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-08-21 08:45:04 -07:00
Sean Christopherson	3d4e882e34	KVM: SEV: Move init of SNP guest state into sev_init_vmcb() Move the initialization of SNP guest state from svm_vcpu_reset() into sev_init_vmcb() to reduce the number of paths that deal with INIT/RESET for SEV+ vCPUs from 4+ to 1. Plumb in @init_event as necessary. Opportunistically check for an SNP guest outside of sev_snp_init_protected_guest_state() so that sev_init_vmcb() is consistent with respect to checking for SEV-ES+ and SNP+ guests. No functional change intended. Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20250819234833.3080255-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-08-21 08:44:49 -07:00

1 2 3 4 5 ...

1382009 Commits