Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Enable the per-vcpu dirty-ring tracking mechanism, together with an
     option to keep the good old dirty log around for pages that are
     dirtied by something other than a vcpu.

   - Switch to the relaxed parallel fault handling, using RCU to delay
     page table reclaim and giving better performance under load.

   - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
     option, which multi-process VMMs such as crosvm rely on (see merge
     commit 382b5b87a9: "Fix a number of issues with MTE, such as
     races on the tags being initialised vs the PG_mte_tagged flag as
     well as the lack of support for VM_SHARED when KVM is involved.
     Patches from Catalin Marinas and Peter Collingbourne").

   - Merge the pKVM shadow vcpu state tracking that allows the
     hypervisor to have its own view of a vcpu, keeping that state
     private.

   - Add support for the PMUv3p5 architecture revision, bringing support
     for 64bit counters on systems that support it, and fix the
     no-quite-compliant CHAIN-ed counter support for the machines that
     actually exist out there.

   - Fix a handful of minor issues around 52bit VA/PA support (64kB
     pages only) as a prefix of the oncoming support for 4kB and 16kB
     pages.

   - Pick a small set of documentation and spelling fixes, because no
     good merge window would be complete without those.

  s390:

   - Second batch of the lazy destroy patches

   - First batch of KVM changes for kernel virtual != physical address
     support

   - Removal of a unused function

  x86:

   - Allow compiling out SMM support

   - Cleanup and documentation of SMM state save area format

   - Preserve interrupt shadow in SMM state save area

   - Respond to generic signals during slow page faults

   - Fixes and optimizations for the non-executable huge page errata
     fix.

   - Reprogram all performance counters on PMU filter change

   - Cleanups to Hyper-V emulation and tests

   - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
     guest running on top of a L1 Hyper-V hypervisor)

   - Advertise several new Intel features

   - x86 Xen-for-KVM:

      - Allow the Xen runstate information to cross a page boundary

      - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

      - Add support for 32-bit guests in SCHEDOP_poll

   - Notable x86 fixes and cleanups:

      - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

      - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
        a few years back when eliminating unnecessary barriers when
        switching between vmcs01 and vmcs02.

      - Clean up vmread_error_trampoline() to make it more obvious that
        params must be passed on the stack, even for x86-64.

      - Let userspace set all supported bits in MSR_IA32_FEAT_CTL
        irrespective of the current guest CPUID.

      - Fudge around a race with TSC refinement that results in KVM
        incorrectly thinking a guest needs TSC scaling when running on a
        CPU with a constant TSC, but no hardware-enumerated TSC
        frequency.

      - Advertise (on AMD) that the SMM_CTL MSR is not supported

      - Remove unnecessary exports

  Generic:

   - Support for responding to signals during page faults; introduces
     new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks

  Selftests:

   - Fix an inverted check in the access tracking perf test, and restore
     support for asserting that there aren't too many idle pages when
     running on bare metal.

   - Fix build errors that occur in certain setups (unsure exactly what
     is unique about the problematic setup) due to glibc overriding
     static_assert() to a variant that requires a custom message.

   - Introduce actual atomics for clear/set_bit() in selftests

   - Add support for pinning vCPUs in dirty_log_perf_test.

   - Rename the so called "perf_util" framework to "memstress".

   - Add a lightweight psuedo RNG for guest use, and use it to randomize
     the access pattern and write vs. read percentage in the memstress
     tests.

   - Add a common ucall implementation; code dedup and pre-work for
     running SEV (and beyond) guests in selftests.

   - Provide a common constructor and arch hook, which will eventually
     be used by x86 to automatically select the right hypercall (AMD vs.
     Intel).

   - A bunch of added/enabled/fixed selftests for ARM64, covering
     memslots, breakpoints, stage-2 faults and access tracking.

   - x86-specific selftest changes:

      - Clean up x86's page table management.

      - Clean up and enhance the "smaller maxphyaddr" test, and add a
        related test to cover generic emulation failure.

      - Clean up the nEPT support checks.

      - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.

      - Fix an ordering issue in the AMX test introduced by recent
        conversions to use kvm_cpu_has(), and harden the code to guard
        against similar bugs in the future. Anything that tiggers
        caching of KVM's supported CPUID, kvm_cpu_has() in this case,
        effectively hides opt-in XSAVE features if the caching occurs
        before the test opts in via prctl().

  Documentation:

   - Remove deleted ioctls from documentation

   - Clean up the docs for the x86 MSR filter.

   - Various fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
  KVM: x86: Add proper ReST tables for userspace MSR exits/flags
  KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
  KVM: arm64: selftests: Align VA space allocator with TTBR0
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: x86: Advertise that the SMM_CTL MSR is not supported
  KVM: x86: remove unnecessary exports
  KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
  tools: KVM: selftests: Convert clear/set_bit() to actual atomics
  tools: Drop "atomic_" prefix from atomic test_and_set_bit()
  tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
  KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
  perf tools: Use dedicated non-atomic clear/set bit helpers
  tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
  KVM: arm64: selftests: Enable single-step without a "full" ucall()
  KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
  KVM: Remove stale comment about KVM_REQ_UNHALT
  KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
  KVM: Reference to kvm_userspace_memory_region in doc and comments
  KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
  ...
This commit is contained in:
Linus Torvalds
2022-12-15 11:12:21 -08:00
257 changed files with 12109 additions and 5029 deletions

View File

@@ -77,4 +77,13 @@
#define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
#define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
/*
* Compile time check that field has an expected offset
*/
#define ASSERT_STRUCT_OFFSET(type, field, expected_offset) \
BUILD_BUG_ON_MSG(offsetof(type, field) != (expected_offset), \
"Offset of " #field " in " #type " has changed.")
#endif /* _LINUX_BUILD_BUG_H */

View File

@@ -18,5 +18,6 @@
#define KPF_UNCACHED 39
#define KPF_SOFTDIRTY 40
#define KPF_ARCH_2 41
#define KPF_ARCH_3 42
#endif /* LINUX_KERNEL_PAGE_FLAGS_H */

View File

@@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
return 0;
}
static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
{
return true;
}
static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
int index, u32 size)
{
@@ -49,7 +54,7 @@ static inline int kvm_dirty_ring_reset(struct kvm *kvm,
return 0;
}
static inline void kvm_dirty_ring_push(struct kvm_dirty_ring *ring,
static inline void kvm_dirty_ring_push(struct kvm_vcpu *vcpu,
u32 slot, u64 offset)
{
}
@@ -64,13 +69,11 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
{
}
static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
{
return true;
}
#else /* CONFIG_HAVE_KVM_DIRTY_RING */
int kvm_cpu_dirty_log_size(void);
bool kvm_use_dirty_bitmap(struct kvm *kvm);
bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
u32 kvm_dirty_ring_get_rsvd_entries(void);
int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
@@ -84,13 +87,14 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
* returns =0: successfully pushed
* <0: unable to push, need to wait
*/
void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset);
void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
/* for use in vm_operations_struct */
struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring);
#endif /* CONFIG_HAVE_KVM_DIRTY_RING */

View File

@@ -50,8 +50,8 @@
#endif
/*
* The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
* in kvm, other bits are visible for userspace which are defined in
* The bit 16 ~ bit 31 of kvm_userspace_memory_region::flags are internally
* used in kvm, other bits are visible for userspace which are defined in
* include/linux/kvm_h.
*/
#define KVM_MEMSLOT_INVALID (1UL << 16)
@@ -96,6 +96,7 @@
#define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK)
#define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1)
#define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
#define KVM_PFN_ERR_SIGPENDING (KVM_PFN_ERR_MASK + 3)
/*
* error pfns indicate that the gfn is in slot but faild to
@@ -106,6 +107,15 @@ static inline bool is_error_pfn(kvm_pfn_t pfn)
return !!(pfn & KVM_PFN_ERR_MASK);
}
/*
* KVM_PFN_ERR_SIGPENDING indicates that fetching the PFN was interrupted
* by a pending signal. Note, the signal may or may not be fatal.
*/
static inline bool is_sigpending_pfn(kvm_pfn_t pfn)
{
return pfn == KVM_PFN_ERR_SIGPENDING;
}
/*
* error_noslot pfns indicate that the gfn can not be
* translated to pfn - it is not in slot or failed to
@@ -153,10 +163,11 @@ static inline bool is_error_page(struct page *page)
* Architecture-independent vcpu->requests bit members
* Bits 3-7 are reserved for more arch-independent bits.
*/
#define KVM_REQ_TLB_FLUSH (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_UNBLOCK 2
#define KVM_REQUEST_ARCH_BASE 8
#define KVM_REQ_TLB_FLUSH (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_UNBLOCK 2
#define KVM_REQ_DIRTY_RING_SOFT_FULL 3
#define KVM_REQUEST_ARCH_BASE 8
/*
* KVM_REQ_OUTSIDE_GUEST_MODE exists is purely as way to force the vCPU to
@@ -655,6 +666,8 @@ struct kvm_irq_routing_table {
};
#endif
bool kvm_arch_irqchip_in_kernel(struct kvm *kvm);
#ifndef KVM_INTERNAL_MEM_SLOTS
#define KVM_INTERNAL_MEM_SLOTS 0
#endif
@@ -710,6 +723,11 @@ struct kvm {
/* The current active memslot set for each address space */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
struct xarray vcpu_array;
/*
* Protected by slots_lock, but can be read outside if an
* incorrect answer is acceptable.
*/
atomic_t nr_memslots_dirty_logging;
/* Used to wait for completion of MMU notifiers. */
spinlock_t mn_invalidate_lock;
@@ -779,6 +797,7 @@ struct kvm {
bool override_halt_poll_ns;
unsigned int max_halt_poll_ns;
u32 dirty_ring_size;
bool dirty_ring_with_bitmap;
bool vm_bugged;
bool vm_dead;
@@ -1141,8 +1160,8 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
bool atomic, bool *async, bool write_fault,
bool *writable, hva_t *hva);
bool atomic, bool interruptible, bool *async,
bool write_fault, bool *writable, hva_t *hva);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -1244,18 +1263,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
* kvm_gpc_init - initialize gfn_to_pfn_cache.
*
* @gpc: struct gfn_to_pfn_cache object.
*
* This sets up a gfn_to_pfn_cache by initializing locks. Note, the cache must
* be zero-allocated (or zeroed by the caller before init).
*/
void kvm_gpc_init(struct gfn_to_pfn_cache *gpc);
/**
* kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
* physical address.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @vcpu: vCPU to be used for marking pages dirty and to be woken on
* invalidation.
* @usage: indicates if the resulting host physical PFN is used while
@@ -1264,28 +1272,36 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc);
* changes!---will also force @vcpu to exit the guest and
* refresh the cache); and/or if the PFN used directly
* by KVM (and thus needs a kernel virtual mapping).
*
* This sets up a gfn_to_pfn_cache by initializing locks and assigning the
* immutable attributes. Note, the cache must be zero-allocated (or zeroed by
* the caller before init).
*/
void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
struct kvm_vcpu *vcpu, enum pfn_cache_usage usage);
/**
* kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
* physical address.
*
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
* -EFAULT for an untranslatable guest physical address.
* -EFAULT for an untranslatable guest physical address.
*
* This primes a gfn_to_pfn_cache and links it into the @kvm's list for
* invalidations to be processed. Callers are required to use
* kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before
* accessing the target page.
* This primes a gfn_to_pfn_cache and links it into the @gpc->kvm's list for
* invalidations to be processed. Callers are required to use kvm_gpc_check()
* to ensure that the cache is valid before accessing the target page.
*/
int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
gpa_t gpa, unsigned long len);
int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len);
/**
* kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache.
* kvm_gpc_check - check validity of a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: current guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
*
* @return: %true if the cache is still valid and the address matches.
@@ -1298,52 +1314,35 @@ int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* Callers in IN_GUEST_MODE may do so without locking, although they should
* still hold a read lock on kvm->scru for the memslot checks.
*/
bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpa_t gpa, unsigned long len);
bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len);
/**
* kvm_gfn_to_pfn_cache_refresh - update a previously initialized cache.
* kvm_gpc_refresh - update a previously initialized cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: updated guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
* -EFAULT for an untranslatable guest physical address.
* -EFAULT for an untranslatable guest physical address.
*
* This will attempt to refresh a gfn_to_pfn_cache. Note that a successful
* returm from this function does not mean the page can be immediately
* return from this function does not mean the page can be immediately
* accessed because it may have raced with an invalidation. Callers must
* still lock and check the cache status, as this function does not return
* with the lock still held to permit access.
*/
int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpa_t gpa, unsigned long len);
/**
* kvm_gfn_to_pfn_cache_unmap - temporarily unmap a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
*
* This unmaps the referenced page. The cache is left in the invalid state
* but at least the mapping from GPA to userspace HVA will remain cached
* and can be reused on a subsequent refresh.
*/
void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len);
/**
* kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
*
* This removes a cache from the @kvm's list to be processed on MMU notifier
* This removes a cache from the VM's list to be processed on MMU notifier
* invocation.
*/
void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
void kvm_sigset_activate(struct kvm_vcpu *vcpu);
void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);

View File

@@ -67,6 +67,7 @@ struct gfn_to_pfn_cache {
gpa_t gpa;
unsigned long uhva;
struct kvm_memory_slot *memslot;
struct kvm *kvm;
struct kvm_vcpu *vcpu;
struct list_head list;
rwlock_t lock;

View File

@@ -3088,6 +3088,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
#define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */
#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */
#define FOLL_PCI_P2PDMA 0x100000 /* allow returning PCI P2PDMA pages */
#define FOLL_INTERRUPTIBLE 0x200000 /* allow interrupts from generic signals */
/*
* FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each

View File

@@ -132,8 +132,9 @@ enum pageflags {
PG_young,
PG_idle,
#endif
#ifdef CONFIG_64BIT
#ifdef CONFIG_ARCH_USES_PG_ARCH_X
PG_arch_2,
PG_arch_3,
#endif
#ifdef CONFIG_KASAN_HW_TAGS
PG_skip_kasan_poison,