Commit Graph

1311374 Commits

Author SHA1 Message Date
Paolo Bonzini
1508bae370 Merge tag 'kvmarm-fixes-6.13-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.13, part #2

 - Constrain invalidations from GICR_INVLPIR to only affect the LPI
   INTID space

 - Set of robustness improvements to the management of vgic irqs and GIC
   ITS table entries

 - Fix compilation issue w/ CONFIG_CC_OPTIMIZE_FOR_SIZE=y where
   set_sysreg_masks() wasn't getting inlined, breaking check for a
   constant sysreg index

 - Correct KVM's vPMU overflow condition to match the architecture for
   hyp and non-hyp counters
2024-11-21 06:00:16 -05:00
Oliver Upton
13905f4547 KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters
The 'global enable control' (as it is termed in the architecture) for
counters reserved by EL2 is MDCR_EL2.HPME. Use that instead of
PMCR_EL0.E when evaluating the overflow state for hyp counters.

Change the return value to a bool while at it, which better reflects the
fact that the overflow state is a shared signal and not a per-counter
property.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241120005230.2335682-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:23:32 -08:00
Raghavendra Rao Ananta
54bbee190d KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
DDI0487K.a D13.3.1 describes the PMU overflow condition, which evaluates
to true if any counter's global enable (PMCR_EL0.E), overflow flag
(PMOVSSET_EL0[n]), and interrupt enable (PMINTENSET_EL1[n]) are all 1.
Of note, this does not require a counter to be enabled
(i.e. PMCNTENSET_EL0[n] = 1) to generate an overflow.

Align kvm_pmu_overflow_status() with the reality of the architecture
and stop using PMCNTENSET_EL0 as part of the overflow condition. The
bug was discovered while running an SBSA PMU test [*], which only sets
PMCR.E, PMOVSSET<0>, PMINTENSET<0>, and expects an overflow interrupt.

Cc: stable@vger.kernel.org
Fixes: 76d883c4e6 ("arm64: KVM: Add access handler for PMOVSSET and PMOVSCLR register")
Link: https://github.com/ARM-software/sbsa-acs/blob/master/test_pool/pmu/operating_system/test_pmu001.c
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
[ oliver: massaged changelog ]
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241120005230.2335682-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:22:49 -08:00
Marc Zyngier
0f3a0f23f5 KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
When compiling with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, set_sysreg_masks()
fails to compile thanks to:

	BUILD_BUG_ON(!__builtin_constant_p(sr));

as the compiler doesn't identify sr as a constant, despite all the
callers passing constants.

Fix the issue by always inlining this function, which allows GCC to
do the right thing.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411201857.ZNudtGJl-lkp@intel.com/
Fixes: a016202009 ("KVM: arm64: Extend masking facility to arbitrary registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241120111516.304250-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:22:00 -08:00
Marc Zyngier
3b2c81d5fe KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes
The ITS ABI infrastructure allows for some pretty lax code, where
the size of the data doesn't have to match the size of the entry,
potentially leading to a collection of interesting bugs.

Commit 7fe28d7e68 ("KVM: arm64: vgic-its: Add a data length check
in vgic_its_save_*") added some checks, but starts by implicitly
casting all writes to a 64bit value, hiding some of the issues.

Instead, introduce macros that will check the data type actually used
for dealing with the table entries. The macros are taking a symbolic
entry type that is used to fetch the size of the entry type for the
current ABI. This immediately catches a couple of low-impact gotchas
(zero values that are implicitly 32bit), easy enough to fix.

Given that we currently only have a single ABI, hardcode a couple of
BUILD_BUG_ON()s that will fire if we use anything but a 64bit quantity,
and some (currently unreachable) fallback code that may become useful
one day.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241117165757.247686-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:21:08 -08:00
Marc Zyngier
e7619f2a2f KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition
VGIC_MAX_PRIVATE is a pretty useless definition, and is better
replaced with VGIC_NR_PRIVATE_IRQS.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241117165757.247686-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:21:08 -08:00
Marc Zyngier
add570b39f KVM: arm64: vgic: Make vgic_get_irq() more robust
vgic_get_irq() has an awkward signature, as it takes both a kvm
*and* a vcpu, where the vcpu is allowed to be NULL if the INTID
being looked up is a global interrupt (SPI or LPI).

This leads to potentially problematic situations where the INTID
passed is a private interrupt, but that there is no vcpu.

In order to make things less ambiguous, let have *two* helpers
instead:

- vgic_get_irq(struct kvm *kvm, u32 intid), which is only concerned
  with *global* interrupts, as indicated by the lack of vcpu.

- vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid), which can
  return *any* interrupt class, but must have of course a non-NULL
  vcpu.

Most of the code nicely falls under one or the other situations,
except for a couple of cases (close to the UABI or in the debug code)
where we have to distinguish between the two cases.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241117165757.247686-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:21:08 -08:00
Marc Zyngier
d561491ba9 KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
Make sure we filter out non-LPI invalidation when handling writes
to GICR_INVLPIR.

Fixes: 4645d11f4a ("KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241117165757.247686-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:21:07 -08:00
Sean Christopherson
9ee62c33c0 KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
Rework CONFIG_KVM_X86's dependency to only check if KVM_INTEL or KVM_AMD
is selected, i.e. not 'n'.  Having KVM_X86 depend directly on the vendor
modules results in KVM_X86 being set to 'm' if at least one of KVM_INTEL
or KVM_AMD is enabled, but neither is 'y', regardless of the value of KVM
itself.

The documentation for def_tristate doesn't explicitly state that this is
the intended behavior, but it does clearly state that the "if" section is
parsed as a dependency, i.e. the behavior is consistent with how tristate
dependencies are handled in general.

  Optionally dependencies for this default value can be added with "if".

Fixes: ea4290d77b ("KVM: x86: leave kvm.ko out of the build if no vendor module is requested")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241118172002.1633824-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-19 19:34:51 -05:00
Arnd Bergmann
1331343af6 KVM: x86: add back X86_LOCAL_APIC dependency
Enabling KVM now causes a build failure on x86-32 if X86_LOCAL_APIC
is disabled:

arch/x86/kvm/svm/svm.c: In function 'svm_emergency_disable_virtualization_cpu':
arch/x86/kvm/svm/svm.c:597:9: error: 'kvm_rebooting' undeclared (first use in this function); did you mean 'kvm_irq_routing'?
  597 |         kvm_rebooting = true;
      |         ^~~~~~~~~~~~~
      |         kvm_irq_routing
arch/x86/kvm/svm/svm.c:597:9: note: each undeclared identifier is reported only once for each function it appears in
make[6]: *** [scripts/Makefile.build:221: arch/x86/kvm/svm/svm.o] Error 1
In file included from include/linux/rculist.h:11,
                 from include/linux/hashtable.h:14,
                 from arch/x86/kvm/svm/avic.c:18:
arch/x86/kvm/svm/avic.c: In function 'avic_pi_update_irte':
arch/x86/kvm/svm/avic.c:909:38: error: 'struct kvm' has no member named 'irq_routing'
  909 |         irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
      |                                      ^~
include/linux/rcupdate.h:538:17: note: in definition of macro '__rcu_dereference_check'
  538 |         typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \

Move the dependency to the same place as before.

Fixes: ea4290d77b ("KVM: x86: leave kvm.ko out of the build if no vendor module is requested")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410060426.e9Xsnkvi-lkp@intel.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sean Christopherson <seanjc@google.com>
[sean: add Cc to stable, tweak shortlog scope]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241118172002.1633824-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-19 19:34:51 -05:00
Sean Christopherson
85434c3c73 Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
Revert back to clearing VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL in KVM's
golden VMCS config, as applying the workaround during vCPU creation is
pointless and broken.  KVM *unconditionally* clears the controls in the
values returned by vmx_vmentry_ctrl() and vmx_vmexit_ctrl(), as KVM loads
PERF_GLOBAL_CTRL if and only if its necessary to do so.  E.g. if KVM wants
to run the guest with the same PERF_GLOBAL_CTRL as the host, then there's
no need to re-load the MSR on entry and exit.

Even worse, the buggy commit failed to apply the erratum where it's
actually needed, add_atomic_switch_msr().  As a result, KVM completely
ignores the erratum for all intents and purposes, i.e. uses the flawed
VMCS controls to load PERF_GLOBAL_CTRL.

To top things off, the patch was intended to be dropped, as the premise
of an L1 VMM being able to pivot on FMS is flawed, and KVM can (and now
does) fully emulate the controls in software.  Simply revert the commit,
as all upstream supported kernels that have the buggy commit should also
have commit f4c93d1a0e ("KVM: nVMX: Always emulate PERF_GLOBAL_CTRL
VM-Entry/VM-Exit controls"), i.e. the (likely theoretical) live migration
concern is a complete non-issue.

Opportunistically drop the manual "kvm: " scope from the warning about
the erratum, as KVM now uses pr_fmt() to provide the correct scope (v6.1
kernels and earlier don't, but the erratum only applies to CPUs that are
15+ years old; it's not worth a separate patch).

This reverts commit 9d78d6fb18.

Link: https://lore.kernel.org/all/YtnZmCutdd5tpUmz@google.com
Fixes: 9d78d6fb18 ("KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()")
Cc: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-ID: <20241119011433.1797921-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-19 19:34:35 -05:00
Paolo Bonzini
d96c77bd4e KVM: x86: switch hugepage recovery thread to vhost_task
kvm_vm_create_worker_thread() is meant to be used for kthreads that
can consume significant amounts of CPU time on behalf of a VM or in
response to how the VM behaves (for example how it accesses its memory).
Therefore it wants to charge the CPU time consumed by that work to
the VM's container.

However, because of these threads, cgroups which have kvm instances
inside never complete freezing.  This can be trivially reproduced:

  root@test ~# mkdir /sys/fs/cgroup/test
  root@test ~# echo $$ > /sys/fs/cgroup/test/cgroup.procs
  root@test ~# qemu-system-x86_64 -nographic -enable-kvm

and in another terminal:

  root@test ~# echo 1 > /sys/fs/cgroup/test/cgroup.freeze
  root@test ~# cat /sys/fs/cgroup/test/cgroup.events
  populated 1
  frozen 0

The cgroup freezing happens in the signal delivery path but
kvm_nx_huge_page_recovery_worker, while joining non-root cgroups, never
calls into the signal delivery path and thus never gets frozen. Because
the cgroup freezer determines whether a given cgroup is frozen by
comparing the number of frozen threads to the total number of threads
in the cgroup, the cgroup never becomes frozen and users waiting for
the state transition may hang indefinitely.

Since the worker kthread is tied to a user process, it's better if
it behaves similarly to user tasks as much as possible, including
being able to send SIGSTOP and SIGCONT.  In fact, vhost_task is all
that kvm_vm_create_worker_thread() wanted to be and more: not only it
inherits the userspace process's cgroups, it has other niceties like
being parented properly in the process tree.  Use it instead of the
homegrown alternative.

Incidentally, the new code is also better behaved when you flip recovery
back and forth to disabled and back to enabled.  If your recovery period
is 1 minute, it will run the next recovery after 1 minute independent
of how many times you flipped the parameter.

(Commit message based on emails from Tejun).

Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Luca Boccassi <bluca@debian.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Luca Boccassi <bluca@debian.org>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-14 13:20:04 -05:00
Paolo Bonzini
0586ade9e7 Merge tag 'loongarch-kvm-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.13

1. Add iocsr and mmio bus simulation in kernel.
2. Add in-kernel interrupt controller emulation.
3. Add virt extension support for eiointc irqchip.
2024-11-14 07:06:24 -05:00
Paolo Bonzini
7b541d557f Merge tag 'kvmarm-6.13' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.13, part #1

 - Support for stage-1 permission indirection (FEAT_S1PIE) and
   permission overlays (FEAT_S1POE), including nested virt + the
   emulated page table walker

 - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
   was introduced in PSCIv1.3 as a mechanism to request hibernation,
   similar to the S4 state in ACPI

 - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
   part of it, introduce trivial initialization of the host's MPAM
   context so KVM can use the corresponding traps

 - PMU support under nested virtualization, honoring the guest
   hypervisor's trap configuration and event filtering when running a
   nested guest

 - Fixes to vgic ITS serialization where stale device/interrupt table
   entries are not zeroed when the mapping is invalidated by the VM

 - Avoid emulated MMIO completion if userspace has requested synchronous
   external abort injection

 - Various fixes and cleanups affecting pKVM, vCPU initialization, and
   selftests
2024-11-14 07:05:36 -05:00
Paolo Bonzini
b467ab82a9 KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
For userspace that wants to disable KVM_X86_QUIRK_STUFF_FEATURE_MSRS, it
is useful to know what bits can be set to 1 in MSR_PLATFORM_INFO (apart
from the TSC ratio).  The right way to do that is via /dev/kvm's
feature MSR mechanism.

In fact, MSR_PLATFORM_INFO is already a feature MSR for the purpose of
blocking updates after the vCPU is run, but KVM_GET_MSRS did not return
a valid value for it.

Just like in a VM that leaves KVM_X86_QUIRK_STUFF_FEATURE_MSRS enabled,
the TSC ratio field is left to 0.  Only bit 31 is set.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-13 14:40:56 -05:00
Tao Su
a0423af92c x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
Latest Intel platform Clearwater Forest has introduced new instructions
enumerated by CPUIDs of SHA512, SM3, SM4 and AVX-VNNI-INT16. Advertise
these CPUIDs to userspace so that guests can query them directly.

SHA512, SM3 and SM4 are on an expected-dense CPUID leaf and some other
bits on this leaf have kernel usages. Considering they have not truly
kernel usages, hide them in /proc/cpuinfo.

These new instructions only operate in xmm, ymm registers and have no new
VMX controls, so there is no additional host enabling required for guests
to use these instructions, i.e. advertising these CPUIDs to userspace is
safe.

Tested-by: Jiaan Lu <jiaan.lu@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20241105054825.870939-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-13 14:40:40 -05:00
Paolo Bonzini
35ff7bfb04 Documentation: KVM: fix malformed table
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 5f6a3badbb ("KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-13 07:20:01 -05:00
Paolo Bonzini
2e9a2c624e Merge branch 'kvm-docs-6.13' into HEAD
- Drop obsolete references to PPC970 KVM, which was removed 10 years ago.

- Fix incorrect references to non-existing ioctls

- List registers supported by KVM_GET/SET_ONE_REG on s390

- Use rST internal links

- Reorganize the introduction to the API document
2024-11-13 07:18:12 -05:00
Paolo Bonzini
bb4409a9e7 Merge tag 'kvm-x86-misc-6.13' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.13

 - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.

 - Quirk KVM's misguided behavior of initialized certain feature MSRs to
   their maximum supported feature set, which can result in KVM creating
   invalid vCPU state.  E.g. initializing PERF_CAPABILITIES to a non-zero
   value results in the vCPU having invalid state if userspace hides PDCM
   from the guest, which can lead to save/restore failures.

 - Fix KVM's handling of non-canonical checks for vCPUs that support LA57
   to better follow the "architecture", in quotes because the actual
   behavior is poorly documented.  E.g. most MSR writes and descriptor
   table loads ignore CR4.LA57 and operate purely on whether the CPU
   supports LA57.

 - Bypass the register cache when querying CPL from kvm_sched_out(), as
   filling the cache from IRQ context is generally unsafe, and harden the
   cache accessors to try to prevent similar issues from occuring in the
   future.

 - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
   over-advertises SPEC_CTRL when trying to support cross-vendor VMs.

 - Minor cleanups
2024-11-13 06:33:00 -05:00
Paolo Bonzini
ef6fdc0e4c Merge tag 'kvm-x86-vmx-6.13' of https://github.com/kvm-x86/linux into HEAD
KVM VMX change for 6.13

 - Remove __invept()'s unused @gpa param, which was left behind when KVM
   dropped code for invalidating a specific GPA (Intel never officially
   documented support for single-address INVEPT; presumably pre-production
   CPUs supported it at some point).
2024-11-13 06:32:43 -05:00
Paolo Bonzini
edd1e59878 Merge tag 'kvm-x86-selftests-6.13' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.13

 - Enable XFAM-based features by default for all selftests VMs, which will
   allow removing the "no AVX" restriction.
2024-11-13 06:32:15 -05:00
Paolo Bonzini
c59de14133 Merge tag 'kvm-x86-mmu-6.13' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 6.13

 - Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve
   documentation, harden against unexpected changes, and to simplify
   A/D-disabled MMUs by using the hardware-defined A/D bits to track if a
   PFN is Accessed and/or Dirty.

 - Elide TLB flushes when aging SPTEs, as has been done in x86's primary
   MMU for over 10 years.

 - Batch TLB flushes when zapping collapsible TDP MMU SPTEs, i.e. when
   dirty logging is toggled off, which reduces the time it takes to disable
   dirty logging by ~3x.

 - Recover huge pages in-place in the TDP MMU instead of zapping the SP
   and waiting until the page is re-accessed to create a huge mapping.
   Proactively installing huge pages can reduce vCPU jitter in extreme
   scenarios.

 - Remove support for (poorly) reclaiming page tables in shadow MMUs via
   the primary MMU's shrinker interface.
2024-11-13 06:31:54 -05:00
Paolo Bonzini
b39d1578d5 Merge tag 'kvm-x86-generic-6.13' of https://github.com/kvm-x86/linux into HEAD
KVM generic changes for 6.13

 - Rework kvm_vcpu_on_spin() to use a single for-loop instead of making two
   partial poasses over "all" vCPUs.  Opportunistically expand the comment
   to better explain the motivation and logic.

 - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
   of RCU, so that running a vCPU on a different task doesn't encounter
   long stalls due to having to wait for all CPUs become quiescent.
2024-11-13 06:24:19 -05:00
Bibo Mao
9899b82010 irqchip/loongson-eiointc: Add virt extension support
Interrupts can be routed to maximal four virtual CPUs with real HW
EIOINTC interrupt controller model, since interrupt routing is encoded
with CPU bitmap and EIOINTC node combined method. Here add the EIOINTC
virt extension support so that interrupts can be routed to 256 vCPUs in
virtual machine mode. CPU bitmap is replaced with normal encoding and
EIOINTC node type is removed, so there are 8 bits for cpu selection, at
most 256 vCPUs are supported for interrupt routing.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
1928254c5c LoongArch: KVM: Add irqfd support
Enable the KVM_IRQ_ROUTING/KVM_IRQCHIP/KVM_MSI configuration items,
add the KVM_CAP_IRQCHIP capability, and implement the query interface
of the in-kernel irqchip.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
d206d95148 LoongArch: KVM: Add PCHPIC user mode read and write functions
Implement the communication interface between the user mode programs
and the kernel in PCHPIC interrupt control simulation, which is used
to obtain or send the simulation data of the interrupt controller in
the user mode process, and is also used in VM migration or VM saving
and restoration.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
f5f31efa3c LoongArch: KVM: Add PCHPIC read and write functions
Add implementation of IPI interrupt controller's address space read and
write function simulation.

Implement interrupt injection interface under loongarch.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
e785dfacf7 LoongArch: KVM: Add PCHPIC device support
Add device model for PCHPIC interrupt controller, implemente basic
create & destroy interface, and register device model to kvm device
table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
1ad7efa552 LoongArch: KVM: Add EIOINTC user mode read and write functions
Implement the communication interface between the user mode programs
and the kernel in EIOINTC interrupt controller simulation, which is
used to obtain or send the simulation data of the interrupt controller
in the user mode process, and is also used in VM migration or VM saving
and restoration.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
3956a52bc0 LoongArch: KVM: Add EIOINTC read and write functions
Add implementation of EIOINTC interrupt controller's address space read
and write function simulation.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
2e8b9df826 LoongArch: KVM: Add EIOINTC device support
Add device model for EIOINTC interrupt controller, implement basic
create & destroy interfaces, and register device model to kvm device
table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
8e3054261b LoongArch: KVM: Add IPI user mode read and write function
Implement the communication interface between the user mode programs
and the kernel in IPI interrupt controller simulation, which is used
to obtain or send the simulation data of the interrupt controller in
the user mode process, and is also used in VM migration or VM saving
and restoration.

Signed-off-by: Min Zhou <zhoumin@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
daee2f9cae LoongArch: KVM: Add IPI read and write function
Add implementation of IPI interrupt controller's address space read and
write function simulation.

Signed-off-by: Min Zhou <zhoumin@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
c532de5a67 LoongArch: KVM: Add IPI device support
Add device model for IPI interrupt controller, implement basic create &
destroy interfaces, and register device model to kvm device table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li
948ccbd950 LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
Add iocsr and mmio memory read and write simulation to the kernel. When
the VM accesses the device address space through iocsr instructions or
mmio, it does not need to return to the qemu user mode but can directly
completes the access in the kernel mode.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:26 +08:00
James Clark
60ad25e14a KVM: arm64: Pass on SVE mapping failures
This function can fail but its return value isn't passed onto the
caller. Presumably this could result in a broken state.

Fixes: 66d5b53e20 ("KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241112105604.795809-1-james.clark@linaro.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-12 11:04:39 -08:00
Paolo Bonzini
185e02d61e Merge tag 'kvm-s390-next-6.13-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- second part of the ucontrol selftest
- cpumodel sanity check selftest
- gen17 cpumodel changes
2024-11-12 13:17:55 -05:00
Oliver Upton
9d0bee66f7 Merge branch kvm-arm64/vgic-its-fixes into kvmarm/next
* kvm-arm64/vgic-its-fixes:
  : Fixes for vgic-its save/restore, courtesy of Kunkun Jiang and Jing Zhang
  :
  : Address bugs where restoring an ITS consumes a stale DTE/ITE, which
  : may lead to either garbage mappings in the ITS or the overall restore
  : ioctl failing. The fix in both cases is to zero a DTE/ITE when its
  : translation has been invalidated by the guest.
  KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE
  KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device
  KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 20:09:09 +00:00
Kunkun Jiang
7602ffd1d5 KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE
When DISCARD frees an ITE, it does not invalidate the
corresponding ITE. In the scenario of continuous saves and
restores, there may be a situation where an ITE is not saved
but is restored. This is unreasonable and may cause restore
to fail. This patch clears the corresponding ITE when DISCARD
frees an ITE.

Cc: stable@vger.kernel.org
Fixes: eff484e029 ("KVM: arm64: vgic-its: ITT save and restore")
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with entry write helper]
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20241107214137.428439-6-jingzhangos@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 19:54:03 +00:00
Kunkun Jiang
e9649129d3 KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device
vgic_its_save_device_tables will traverse its->device_list to
save DTE for each device. vgic_its_restore_device_tables will
traverse each entry of device table and check if it is valid.
Restore if valid.

But when MAPD unmaps a device, it does not invalidate the
corresponding DTE. In the scenario of continuous saves
and restores, there may be a situation where a device's DTE
is not saved but is restored. This is unreasonable and may
cause restore to fail. This patch clears the corresponding
DTE when MAPD unmaps a device.

Cc: stable@vger.kernel.org
Fixes: 57a9a11715 ("KVM: arm64: vgic-its: Device table save/restore")
Co-developed-by: Shusen Li <lishusen2@huawei.com>
Signed-off-by: Shusen Li <lishusen2@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with entry write helper]
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20241107214137.428439-5-jingzhangos@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 19:52:31 +00:00
Jing Zhang
7fe28d7e68 KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*
In all the vgic_its_save_*() functinos, they do not check whether
the data length is 8 bytes before calling vgic_write_guest_lock.
This patch adds the check. To prevent the kernel from being blown up
when the fault occurs, KVM_BUG_ON() is used. And the other BUG_ON()s
are replaced together.

Cc: stable@vger.kernel.org
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with the new entry read/write helpers]
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20241107214137.428439-4-jingzhangos@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 19:50:14 +00:00
Oliver Upton
6d4b81e2e7 Merge branch kvm-arm64/nv-pmu into kvmarm/next
* kvm-arm64/nv-pmu:
  : Support for vEL2 PMU controls
  :
  : Align the vEL2 PMU support with the current state of non-nested KVM,
  : including:
  :
  :  - Trap routing, with the annoying complication of EL2 traps that apply
  :    in Host EL0
  :
  :  - PMU emulation, using the correct configuration bits depending on
  :    whether a counter falls in the hypervisor or guest range of PMCs
  :
  :  - Perf event swizzling across nested boundaries, as the event filtering
  :    needs to be remapped to cope with vEL2
  KVM: arm64: nv: Reprogram PMU events affected by nested transition
  KVM: arm64: nv: Apply EL2 event filtering when in hyp context
  KVM: arm64: nv: Honor MDCR_EL2.HLP
  KVM: arm64: nv: Honor MDCR_EL2.HPME
  KVM: arm64: Add helpers to determine if PMC counts at a given EL
  KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
  KVM: arm64: Rename kvm_pmu_valid_counter_mask()
  KVM: arm64: nv: Advertise support for FEAT_HPMN0
  KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
  KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0
  KVM: arm64: nv: Reinject traps that take effect in Host EL0
  KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY
  KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
  KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
  arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1
  arm64: sysreg: Migrate MDCR_EL2 definition to table
  arm64: sysreg: Describe ID_AA64DFR2_EL1 fields

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:49:02 +00:00
Oliver Upton
4bc1a8808e Merge branch kvm-arm64/mmio-sea into kvmarm/next
* kvm-arm64/mmio-sea:
  : Fix for SEA injection in response to MMIO
  :
  : Fix + test coverage for SEA injection in response to an unhandled MMIO
  : exit to userspace. Naturally, if userspace decides to abort an MMIO
  : instruction KVM shouldn't continue with instruction emulation...
  KVM: arm64: selftests: Add tests for MMIO external abort injection
  KVM: arm64: selftests: Convert to kernel's ESR terminology
  tools: arm64: Grab a copy of esr.h from kernel
  KVM: arm64: Don't retire aborted MMIO instruction

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:48:12 +00:00
Oliver Upton
fbf3372baa Merge branch kvm-arm64/misc into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Drop useless check against vgic state in ICC_CLTR_EL1.SEIS read
  :    emulation
  :
  :  - Fix trap configuration for pKVM
  :
  :  - Close the door on initialization bugs surrounding userspace irqchip
  :    static key by removing it.
  KVM: selftests: Don't bother deleting memslots in KVM when freeing VMs
  KVM: arm64: Get rid of userspace_irqchip_in_use
  KVM: arm64: Initialize trap register values in hyp in pKVM
  KVM: arm64: Initialize the hypervisor's VM state at EL2
  KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use
  KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu()
  KVM: arm64: Don't map 'kvm_vgic_global_state' at EL2 with pKVM
  KVM: arm64: Just advertise SEIS as 0 when emulating ICC_CTLR_EL1

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:47:50 +00:00
Sean Christopherson
5afe18dfa4 KVM: selftests: Don't bother deleting memslots in KVM when freeing VMs
When freeing a VM, don't call into KVM to manually remove each memslot,
simply cleanup and free any userspace assets associated with the memory
region.  KVM is ultimately responsible for ensuring kernel resources are
freed when the VM is destroyed, deleting memslots one-by-one is
unnecessarily slow, and unless a test is already leaking the VM fd, the
VM will be destroyed when kvm_vm_release() is called.

Not deleting KVM's memslot also allows cleaning up dead VMs without having
to care whether or not the to-be-freed VM is dead or alive.

Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvmarm/Zy0bcM0m-N18gAZz@google.com/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:45:29 +00:00
Oliver Upton
24bb181136 Merge branch kvm-arm64/mpam-ni into kvmarm/next
* kvm-arm64/mpam-ni:
  : Hiding FEAT_MPAM from KVM guests, courtesy of James Morse + Joey Gouly
  :
  : Fix a longstanding bug where FEAT_MPAM was accidentally exposed to KVM
  : guests + the EL2 trap configuration was not explicitly configured. As
  : part of this, bring in skeletal support for initialising the MPAM CPU
  : context so KVM can actually set traps for its guests.
  :
  : Be warned -- if this series leads to boot failures on your system,
  : you're running on turd firmware.
  :
  : As an added bonus (that builds upon the infrastructure added by the MPAM
  : series), allow userspace to configure CTR_EL0.L1Ip, courtesy of Shameer
  : Kolothum.
  KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspace
  KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignored
  KVM: arm64: Disable MPAM visibility by default and ignore VMM writes
  KVM: arm64: Add a macro for creating filtered sys_reg_descs entries
  KVM: arm64: Fix missing traps of guest accesses to the MPAM registers
  arm64: cpufeature: discover CPU support for MPAM
  arm64: head.S: Initialise MPAM EL2 registers and disable traps
  arm64/sysreg: Convert existing MPAM sysregs and add the remaining entries

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:38:30 +00:00
Oliver Upton
7ccd615bc6 Merge branch kvm-arm64/psci-1.3 into kvmarm/next
* kvm-arm64/psci-1.3:
  : PSCI v1.3 support, courtesy of David Woodhouse
  :
  : Bump KVM's PSCI implementation up to v1.3, with the added bonus of
  : implementing the SYSTEM_OFF2 call. Like other system-scoped PSCI calls,
  : this gets relayed to userspace for further processing with a new
  : KVM_SYSTEM_EVENT_SHUTDOWN flag.
  :
  : As an added bonus, implement client-side support for hibernation with
  : the SYSTEM_OFF2 call.
  arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
  KVM: selftests: Add test for PSCI SYSTEM_OFF2
  KVM: arm64: Add support for PSCI v1.2 and v1.3
  KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
  firmware/psci: Add definitions for PSCI v1.3 specification

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:36:46 +00:00
Oliver Upton
2865463442 Merge branch kvm-arm64/nv-s1pie-s1poe into kvmarm/next
* kvm-arm64/nv-s1pie-s1poe: (36 commits)
  : NV support for S1PIE/S1POE, courtesy of Marc Zyngier
  :
  : Complete support for S1PIE/S1POE at vEL2, including:
  :
  :  - Save/restore of the vEL2 sysreg context
  :
  :  - Use the S1PIE/S1POE context for fast-path AT emulation
  :
  :  - Enlightening the software walker to the behavior of S1PIE/S1POE
  :
  :  - Like any other good NV series, some trap routing descriptions
  KVM: arm64: Handle WXN attribute
  KVM: arm64: Handle stage-1 permission overlays
  KVM: arm64: Make PAN conditions part of the S1 walk context
  KVM: arm64: Disable hierarchical permissions when POE is enabled
  KVM: arm64: Add POE save/restore for AT emulation fast-path
  KVM: arm64: Add save/restore support for POR_EL2
  KVM: arm64: Add basic support for POR_EL2
  KVM: arm64: Add kvm_has_s1poe() helper
  KVM: arm64: Subject S1PIE/S1POE registers to HCR_EL2.{TVM,TRVM}
  KVM: arm64: Drop bogus CPTR_EL2.E0POE trap routing
  arm64: Add encoding for POR_EL2
  KVM: arm64: Rely on visibility to let PIR*_ELx/TCR2_ELx UNDEF
  KVM: arm64: Hide S1PIE registers from userspace when disabled for guests
  KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guests
  KVM: arm64: Define helper for EL2 registers with custom visibility
  KVM: arm64: Add a composite EL2 visibility helper
  KVM: arm64: Implement AT S1PIE support
  KVM: arm64: Disable hierarchical permissions when S1PIE is enabled
  KVM: arm64: Split S1 permission evaluation into direct and hierarchical parts
  KVM: arm64: Add AT fast-path support for S1PIE
  ...

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:36:12 +00:00
Shameer Kolothum
e9b57d7f97 KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspace
Only allow userspace to set VIPT(0b10) or PIPT(0b11) for L1Ip based on
what hardware reports as both AIVIVT (0b01) and VPIPT (0b00) are
documented as reserved.

Using a VIPT for Guest where hardware reports PIPT may lead to over
invalidation, but is still correct. Hence, we can allow downgrading
PIPT to VIPT, but not the other way around.

Reviewed-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20241022073943.35764-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:33:09 +00:00
Hendrik Brueckner
7a1f314337 KVM: s390: selftests: Add regression tests for PFCR subfunctions
Check if the PFCR query reported in userspace coincides with the
kernel reported function list. Right now we don't mask the functions
in the kernel so they have to be the same.

Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Hariharan Mari <hari55@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-5-brueckner@linux.ibm.com
[frankja@linux.ibm.com: Added commit description]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-5-brueckner@linux.ibm.com>
2024-11-11 12:15:44 +00:00