Commit Graph

1324261 Commits

Author SHA1 Message Date
Marc Zyngier
d0670128d4 Merge branch kvm-arm64/pkvm-np-guest into kvmarm-master/next
* kvm-arm64/pkvm-np-guest:
  : .
  : pKVM support for non-protected guests using the standard MM
  : infrastructure, courtesy of Quentin Perret. From the cover letter:
  :
  : "This series moves the stage-2 page-table management of non-protected
  : guests to EL2 when pKVM is enabled. This is only intended as an
  : incremental step towards a 'feature-complete' pKVM, there is however a
  : lot more that needs to come on top.
  :
  : With that series applied, pKVM provides near-parity with standard KVM
  : from a functional perspective all while Linux no longer touches the
  : stage-2 page-tables itself at EL1. The majority of mm-related KVM
  : features work out of the box, including MMU notifiers, dirty logging,
  : RO memslots and things of that nature. There are however two gotchas:
  :
  :  - We don't support mapping devices into guests: this requires
  :    additional hypervisor support for tracking the 'state' of devices,
  :    which will come in a later series. No device assignment until then.
  :
  :  - Stage-2 mappings are forced to page-granularity even when backed by a
  :    huge page for the sake of simplicity of this series. I'm only aiming
  :    at functional parity-ish (from userspace's PoV) for now, support for
  :    HP can be added on top later as a perf improvement."
  : .
  KVM: arm64: Plumb the pKVM MMU in KVM
  KVM: arm64: Introduce the EL1 pKVM MMU
  KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
  KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
  KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
  KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
  KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
  KVM: arm64: Introduce __pkvm_host_unshare_guest()
  KVM: arm64: Introduce __pkvm_host_share_guest()
  KVM: arm64: Introduce __pkvm_vcpu_{load,put}()
  KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
  KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
  KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
  KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
  KVM: arm64: Move host page ownership tracking to the hyp vmemmap
  KVM: arm64: Make hyp_page::order a u8
  KVM: arm64: Move enum pkvm_page_state to memory.h
  KVM: arm64: Change the layout of enum pkvm_page_state

Signed-off-by: Marc Zyngier <maz@kernel.org>

# Conflicts:
#	arch/arm64/kvm/arm.c
2025-01-12 10:37:15 +00:00
Marc Zyngier
b7bf7c87bb Merge branch kvm-arm64/debug-6.14 into kvmarm-master/next
* kvm-arm64/debug-6.14:
  : .
  : Large rework of the debug code to make it a bit less horrid,
  : courtesy of Oliver Upton. From the original cover letter:
  :
  : "The debug code has become a bit difficult to reason about, especially
  : all the hacks and bandaids for state tracking + trap configuration.
  :
  : This series reworks the entire mess around using a single enumeration to
  : track the state of the debug registers (free, guest-owned, host-owned),
  : using that to drive trap configuration and save/restore.
  :
  : On top of that, this series wires most of the implementation into vCPU
  : load/put rather than the main KVM_RUN loop. This has been a long time
  : coming for VHE, as a lot of the trap configuration and EL1 state gets
  : loaded into hardware at that point anyway.
  :
  : The save/restore of the debug registers is simplified quite a bit as
  : well. KVM will now restore the registers for *any* access rather than
  : just writes, and keep doing so until the next vcpu_put() instead of
  : dropping it on the floor after the next exception."
  : .
  KVM: arm64: Promote guest ownership for DBGxVR/DBGxCR reads
  KVM: arm64: Fold DBGxVR/DBGxCR accessors into common set
  KVM: arm64: Avoid reading ID_AA64DFR0_EL1 for debug save/restore
  KVM: arm64: nv: Honor MDCR_EL2.TDE routing for debug exceptions
  KVM: arm64: Manage software step state at load/put
  KVM: arm64: Don't hijack guest context MDSCR_EL1
  KVM: arm64: Compute MDCR_EL2 at vcpu_load()
  KVM: arm64: Reload vCPU for accesses to OSLAR_EL1
  KVM: arm64: Use debug_owner to track if debug regs need save/restore
  KVM: arm64: Remove vestiges of debug_ptr
  KVM: arm64: Remove debug tracepoints
  KVM: arm64: Select debug state to save/restore based on debug owner
  KVM: arm64: Clean up KVM_SET_GUEST_DEBUG handler
  KVM: arm64: Evaluate debug owner at vcpu_load()
  KVM: arm64: Write MDCR_EL2 directly from kvm_arm_setup_mdcr_el2()
  KVM: arm64: Move host SME/SVE tracking flags to host data
  KVM: arm64: Track presence of SPE/TRBE in kvm_host_data instead of vCPU
  KVM: arm64: Get rid of __kvm_get_mdcr_el2() and related warts
  KVM: arm64: Drop MDSCR_EL1_DEBUG_MASK

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12 10:36:00 +00:00
Quentin Perret
fce886a602 KVM: arm64: Plumb the pKVM MMU in KVM
Introduce the KVM_PGT_CALL() helper macro to allow switching from the
traditional pgtable code to the pKVM version easily in mmu.c. The cost
of this 'indirection' is expected to be very minimal due to
is_protected_kvm_enabled() being backed by a static key.

With this, everything is in place to allow the delegation of
non-protected guest stage-2 page-tables to pKVM, so let's stop using the
host's kvm_s2_mmu from EL2 and enjoy the ride.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-19-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
e912efed48 KVM: arm64: Introduce the EL1 pKVM MMU
Introduce a set of helper functions allowing to manipulate the pKVM
guest stage-2 page-tables from EL1 using pKVM's HVC interface.

Each helper has an exact one-to-one correspondance with the traditional
kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
matching prototype. This will ease plumbing later on in mmu.c.

These callbacks track the gfn->pfn mappings in a simple rb_tree indexed
by IPA in lieu of a page-table. This rb-tree is kept in sync with pKVM's
state and is protected by the mmu_lock like a traditional stage-2
page-table.

Signed-off-by: Quentin Perret <qperret@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-18-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
0adce4d42f KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
Introduce a new hypercall to flush the TLBs of non-protected guests. The
host kernel will be responsible for issuing this hypercall after changing
stage-2 permissions using the __pkvm_host_relax_guest_perms() or
__pkvm_host_wrprotect_guest() paths. This is left under the host's
responsibility for performance reasons.

Note however that the TLB maintenance for all *unmap* operations still
remains entirely under the hypervisor's responsibility for security
reasons -- an unmapped page may be donated to another entity, so a stale
TLB entry could be used to leak private data.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-17-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
76f0b18b3d KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
Plumb the kvm_pgtable_stage2_mkyoung() callback into pKVM for
non-protected guests. It will be called later from the fault handling
path.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-16-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
56ab4de37f KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
Plumb the kvm_stage2_test_clear_young() callback into pKVM for
non-protected guest. It will be later be called from MMU notifiers.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-15-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
26117e4c63 KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
Introduce a new hypercall to remove the write permission from a
non-protected guest stage-2 mapping. This will be used for e.g. enabling
dirty logging.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-14-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
34884a0a4a KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
Introduce a new hypercall allowing the host to relax the stage-2
permissions of mappings in a non-protected guest page-table. It will be
used later once we start allowing RO memslots and dirty logging.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-13-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
72db3d3fba KVM: arm64: Introduce __pkvm_host_unshare_guest()
In preparation for letting the host unmap pages from non-protected
guests, introduce a new hypercall implementing the host-unshare-guest
transition.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-12-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
d0bd3e6570 KVM: arm64: Introduce __pkvm_host_share_guest()
In preparation for handling guest stage-2 mappings at EL2, introduce a
new pKVM hypercall allowing to share pages with non-protected guests.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-11-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Marc Zyngier
f7d03fcbf1 KVM: arm64: Introduce __pkvm_vcpu_{load,put}()
Rather than look-up the hyp vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated
by a pair of load/put hypercalls called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-10-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
99996d575e KVM: arm64: Add {get,put}_pkvm_hyp_vm() helpers
In preparation for accessing pkvm_hyp_vm structures at EL2 in a context
where we can't always expect a vCPU to be loaded (e.g. MMU notifiers),
introduce get/put helpers to get temporary references to hyp VMs from
any context.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-9-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
c77e5181fe KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
Turn kvm_pgtable_stage2_init() into a static inline function instead of
a macro. This will allow the usage of typeof() on it later on.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-8-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
e279c25d78 KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
kvm_pgtable_stage2_relax_perms currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM. To
allow for the re-use of that function, make the walk flags one of its
parameters.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-7-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:44:00 +00:00
Quentin Perret
5398ddc5c9 KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
kvm_pgtable_stage2_mkyoung currently assumes that it is being called
from a 'shared' walker, which will not be true once called from pKVM.
To allow for the re-use of that function, make the walk flags one of
its parameters.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-6-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:43:59 +00:00
Quentin Perret
e94a7dea29 KVM: arm64: Move host page ownership tracking to the hyp vmemmap
We currently store part of the page-tracking state in PTE software bits
for the host, guests and the hypervisor. This is sub-optimal when e.g.
sharing pages as this forces to break block mappings purely to support
this software tracking. This causes an unnecessarily fragmented stage-2
page-table for the host in particular when it shares pages with Secure,
which can lead to measurable regressions. Moreover, having this state
stored in the page-table forces us to do multiple costly walks on the
page transition path, hence causing overhead.

In order to work around these problems, move the host-side page-tracking
logic from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:43:59 +00:00
Quentin Perret
b35875d466 KVM: arm64: Make hyp_page::order a u8
We don't need 16 bits to store the hyp page order, and we'll need some
bits to store page ownership data soon, so let's reduce the order
member.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:43:59 +00:00
Quentin Perret
d4fc42a479 KVM: arm64: Move enum pkvm_page_state to memory.h
In order to prepare the way for storing page-tracking information in
pKVM's vmemmap, move the enum pkvm_page_state definition to
nvhe/memory.h.

No functional changes intended.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:43:59 +00:00
Quentin Perret
a1a1f1ff1f KVM: arm64: Change the layout of enum pkvm_page_state
The 'concrete' (a.k.a non-meta) page states are currently encoded using
software bits in PTEs. For performance reasons, the abstract
pkvm_page_state enum uses the same bits to encode these states as that
makes conversions from and to PTEs easy.

In order to prepare the ground for moving the 'concrete' state storage
to the hyp vmemmap, re-arrange the enum to use bits 0 and 1 for this
purpose.

No functional changes intended.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20241218194059.3670226-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:43:59 +00:00
Oliver Upton
c4a6ed8545 KVM: arm64: Promote guest ownership for DBGxVR/DBGxCR reads
Only yielding control of the debug registers for writes is a bit silly,
unless of course you're a fan of pointless traps. Give control of the
debug registers to the guest upon the first access, regardless of
direction.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-20-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:04:19 +00:00
Oliver Upton
3ce9f3357e KVM: arm64: Fold DBGxVR/DBGxCR accessors into common set
There is a nauseating amount of boilerplate for accessing the
breakpoint and watchpoint registers. Fold everything together into a
single set of accessors and select the right storage based on the sysreg
encoding.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-19-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:04:17 +00:00
Oliver Upton
8c02c2bbd6 KVM: arm64: Avoid reading ID_AA64DFR0_EL1 for debug save/restore
Similar to other per-CPU profiling/debug features we handle, store the
number of breakpoints/watchpoints in kvm_host_data to avoid reading the
ID register 4 times on every guest entry/exit. And if you're in the
nested virt business that's quite a few avoidable exits to the L0
hypervisor.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-18-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:04:14 +00:00
Oliver Upton
b0ee51033a KVM: arm64: nv: Honor MDCR_EL2.TDE routing for debug exceptions
Inject debug exceptions into vEL2 if MDCR_EL2.TDE is set.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-17-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:04:11 +00:00
Marc Zyngier
2ca3f03bf5 KVM: arm64: Manage software step state at load/put
KVM takes over the guest's software step state machine if the VMM is
debugging the guest, but it does the save/restore fiddling for every
guest entry.

Note that the only constraint on host usage of software step is that the
guest's configuration remains visible to userspace via the ONE_REG
ioctls. So, we can cut down on the amount of fiddling by doing this at
load/put instead.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-16-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:04:06 +00:00
Oliver Upton
4ad3a0b87f KVM: arm64: Don't hijack guest context MDSCR_EL1
Stealing MDSCR_EL1 in the guest's kvm_cpu_context for external debugging
is rather gross. Just add a field for this instead and let the context
switch code pick the correct one based on the debug owner.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-15-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:02:51 +00:00
Oliver Upton
75a5fbaf66 KVM: arm64: Compute MDCR_EL2 at vcpu_load()
KVM has picked up several hacks to cope with vcpu->arch.mdcr_el2 needing
to be prepared before vcpu_load(), which is when it gets programmed
into hardware on VHE.

Now that the flows for reprogramming MDCR_EL2 have been simplified, move
that computation to vcpu_load().

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-14-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
06d22a9c1b KVM: arm64: Reload vCPU for accesses to OSLAR_EL1
KVM takes ownership of the debug regs if the guest enables the OS lock,
as it needs to use MDSCR_EL1 to mask debug exceptions. Just reload the
vCPU if the guest toggles the OS lock, relying on kvm_vcpu_load_debug()
to update the debug owner and get the right trap configuration in place.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-13-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
beb470d96c KVM: arm64: Use debug_owner to track if debug regs need save/restore
Use the debug owner to determine if the debug regs are in use instead of
keeping around the DEBUG_DIRTY flag. Debug registers are now
saved/restored after the first trap, regardless of whether it was a read
or a write. This also shifts the point at which KVM becomes lazy to
vcpu_put() rather than the next exception taken from the guest.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-12-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
803602b0d9 KVM: arm64: Remove vestiges of debug_ptr
Delete the remnants of debug_ptr now that debug registers are selected
based on the debug owner instead.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-11-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
3b7780945c KVM: arm64: Remove debug tracepoints
The debug tracepoints are a useless firehose of information that track
implementation detail rather than well-defined events. These are going
to be rather difficult to uphold now that the implementation is getting
redone, so throw them out instead of bending over backwards.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-10-oliver.upton@linux.dev
[maz: fixed compilation after trace-ectomy]
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
58db67e9ac KVM: arm64: Select debug state to save/restore based on debug owner
Select the set of debug registers to use based on the owner rather than
relying on debug_ptr. Besides the code cleanup, this allows us to
eliminate a couple instances kern_hyp_va() as well.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-9-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
4cefbec97d KVM: arm64: Clean up KVM_SET_GUEST_DEBUG handler
No particular reason other than it isn't nice to look at.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-8-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
cd9b10102a KVM: arm64: Evaluate debug owner at vcpu_load()
In preparation for tossing the debug_ptr mess, introduce an enumeration
to track the ownership of the debug registers while in the guest. Update
the owner at vcpu_load() based on whether the host needs to steal the
guest's debug context or if breakpoints/watchpoints are actively in use.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-7-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
b47ffd13fd KVM: arm64: Write MDCR_EL2 directly from kvm_arm_setup_mdcr_el2()
Expecting the callee to know when MDCR_EL2 needs to be written to
hardware asking for trouble. Do the deed from kvm_arm_setup_mdcr_el2()
instead.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-6-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 09:01:25 +00:00
Oliver Upton
d381e53384 KVM: arm64: Move host SME/SVE tracking flags to host data
The SME/SVE state tracking flags have no business in the vCPU. Move them
to kvm_host_data.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-5-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 08:49:09 +00:00
Oliver Upton
38131c02a5 KVM: arm64: Track presence of SPE/TRBE in kvm_host_data instead of vCPU
Add flags to kvm_host_data to track if SPE/TRBE is present +
programmable on a per-CPU basis. Set the flags up at init rather than
vcpu_load() as the programmability of these buffers is unlikely to
change.

Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 08:49:09 +00:00
Oliver Upton
2417218f2f KVM: arm64: Get rid of __kvm_get_mdcr_el2() and related warts
KVM caches MDCR_EL2 on a per-CPU basis in order to preserve the
configuration of MDCR_EL2.HPMN while running a guest. This is a bit
gross, since we're relying on some baked configuration rather than the
hardware definition of implemented counters.

Discover the number of implemented counters by reading PMCR_EL0.N
instead. This works because:

 - In VHE the kernel runs at EL2, and N always returns the number of
   counters implemented in hardware

 - In {n,h}VHE, the EL2 setup code programs MDCR_EL2.HPMN with the EL2
   view of PMCR_EL0.N for the host

Lastly, avoid traps under nested virtualization by saving PMCR_EL0.N in
host data.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 08:49:08 +00:00
Oliver Upton
8ca19c40c4 KVM: arm64: Drop MDSCR_EL1_DEBUG_MASK
Nothing is using this macro, get rid of it.

Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241219224116.3941496-2-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 08:49:08 +00:00
Linus Torvalds
78d4f34e21 Linux 6.13-rc3 v6.13-rc3 2024-12-15 15:58:23 -08:00
Linus Torvalds
42a19aa170 Merge tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:

 - Sundry build and misc fixes

* tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: build: Try to guess GCC variant of cross compiler
  ARC: bpf: Correct conditional check in 'check_jmp_32'
  ARC: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices
  ARC: build: Use __force to suppress per-CPU cmpxchg warnings
  ARC: fix reference of dependency for PAE40 config
  ARC: build: disallow invalid PAE40 + 4K page config
  arc: rename aux.h to arc_aux.h
2024-12-15 15:38:12 -08:00
Linus Torvalds
7031a38ab7 Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Limit EFI zboot to GZIP and ZSTD before it comes in wider use

 - Fix inconsistent error when looking up a non-existent file in
   efivarfs with a name that does not adhere to the NAME-GUID format

 - Drop some unused code

* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/esrt: remove esre_attribute::store()
  efivarfs: Fix error on non-existent file
  efi/zboot: Limit compression options to GZIP and ZSTD
2024-12-15 15:33:41 -08:00
Linus Torvalds
151167d85a Merge tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was
  missing a sentinel, and RIIC was missing rounding up"

* tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: riic: Always round-up when calculating bus period
  i2c: nomadik: Add missing sentinel to match table
  i2c: pnx: Fix timeout in wait functions
2024-12-15 15:29:07 -08:00
Linus Torvalds
dccbe2047a Merge tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:

 - Make sure amd64_edac loads successfully on certain Zen4 memory
   configurations

* tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Simplify ECC check on unified memory controllers
2024-12-15 10:01:10 -08:00
Linus Torvalds
f7c7a1ba22 Merge tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:

 - Disable the secure programming interface of the GIC500 chip in the
   RK3399 SoC to fix interrupt priority assignment and even make a dead
   machine boot again when the gic-v3 driver enables pseudo NMIs

 - Correct the declaration of a percpu variable to fix several sparse
   warnings

* tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Work around insecure GIC integrations
  irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base
2024-12-15 09:58:27 -08:00
Linus Torvalds
acd855a949 Merge tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:

 - Prevent incorrect dequeueing of the deadline dlserver helper task and
   fix its time accounting

 - Properly track the CFS runqueue runnable stats

 - Check the total number of all queued tasks in a sched fair's runqueue
   hierarchy before deciding to stop the tick

 - Fix the scheduling of the task that got woken last (NEXT_BUDDY) by
   preventing those from being delayed

* tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/dlserver: Fix dlserver time accounting
  sched/dlserver: Fix dlserver double enqueue
  sched/eevdf: More PELT vs DELAYED_DEQUEUE
  sched/fair: Fix sched_can_stop_tick() for fair tasks
  sched/fair: Fix NEXT_BUDDY
2024-12-15 09:38:03 -08:00
Linus Torvalds
81576a9a27 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix confusion with implicitly-shifted MDCR_EL2 masks breaking
     SPE/TRBE initialization

   - Align nested page table walker with the intended memory attribute
     combining rules of the architecture

   - Prevent userspace from constraining the advertised ASID width,
     avoiding horrors of guest TLBIs not matching the intended context
     in hardware

   - Don't leak references on LPIs when insertion into the translation
     cache fails

  RISC-V:

   - Replace csr_write() with csr_set() for HVIEN PMU overflow bit

  x86:

   - Cache CPUID.0xD XSTATE offsets+sizes during module init

     On Intel's Emerald Rapids CPUID costs hundreds of cycles and there
     are a lot of leaves under 0xD. Getting rid of the CPUIDs during
     nested VM-Enter and VM-Exit is planned for the next release, for
     now just cache them: even on Skylake that is 40% faster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
  RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit
  KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation
  KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden
  KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type
  arm64: Fix usage of new shifted MDCR_EL2 values
2024-12-15 09:26:13 -08:00
Linus Torvalds
2d8308bf5b Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
 "Single one-line fix in the ufs driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe
2024-12-14 15:53:02 -08:00
Linus Torvalds
35f301dd45 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:

 - Fix a bug in the BPF verifier to track changes to packet data
   property for global functions (Eduard Zingerman)

 - Fix a theoretical BPF prog_array use-after-free in RCU handling of
   __uprobe_perf_func (Jann Horn)

 - Fix BPF tracing to have an explicit list of tracepoints and their
   arguments which need to be annotated as PTR_MAYBE_NULL (Kumar
   Kartikeya Dwivedi)

 - Fix a logic bug in the bpf_remove_insns code where a potential error
   would have been wrongly propagated (Anton Protopopov)

 - Avoid deadlock scenarios caused by nested kprobe and fentry BPF
   programs (Priya Bala Govindasamy)

 - Fix a bug in BPF verifier which was missing a size check for
   BTF-based context access (Kumar Kartikeya Dwivedi)

 - Fix a crash found by syzbot through an invalid BPF prog_array access
   in perf_event_detach_bpf_prog (Jiri Olsa)

 - Fix several BPF sockmap bugs including a race causing a refcount
   imbalance upon element replace (Michal Luczaj)

 - Fix a use-after-free from mismatching BPF program/attachment RCU
   flavors (Jann Horn)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
  bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
  selftests/bpf: Add tests for raw_tp NULL args
  bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
  bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
  selftests/bpf: Add test for narrow ctx load for pointer args
  bpf: Check size for BTF-based ctx access of pointer members
  selftests/bpf: extend changes_pkt_data with cases w/o subprograms
  bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs
  bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
  bpf: fix potential error return
  selftests/bpf: validate that tail call invalidates packet pointers
  bpf: consider that tail calls invalidate packet pointers
  selftests/bpf: freplace tests for tracking of changes_packet_data
  bpf: check changes_pkt_data property for extension programs
  selftests/bpf: test for changing packet data from global functions
  bpf: track changes_pkt_data property for global functions
  bpf: refactor bpf_helper_changes_pkt_data to use helper number
  bpf: add find_containing_subprog() utility function
  bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
  bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
  ...
2024-12-14 12:58:14 -08:00
Priya Bala Govindasamy
c83508da56 bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
BPF program types like kprobe and fentry can cause deadlocks in certain
situations. If a function takes a lock and one of these bpf programs is
hooked to some point in the function's critical section, and if the
bpf program tries to call the same function and take the same lock it will
lead to deadlock. These situations have been reported in the following
bug reports.

In percpu_freelist -
Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com/T/
In bpf_lru_list -
Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@mail.gmail.com/T/

Similar bugs have been reported by syzbot.
In queue_stack_maps -
Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/
Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@sina.com/T/
In lpm_trie -
Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@google.com/T/
In ringbuf -
Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@sina.com/T/

Prevent kprobe and fentry bpf programs from attaching to these critical
sections by removing CC_FLAGS_FTRACE for percpu_freelist.o,
bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files.

The bugs reported by syzbot are due to tracepoint bpf programs being
called in the critical sections. This patch does not aim to fix deadlocks
caused by tracepoint programs. However, it does prevent deadlocks from
occurring in similar situations due to kprobe and fentry programs.

Signed-off-by: Priya Bala Govindasamy <pgovind2@uci.edu>
Link: https://lore.kernel.org/r/CAPPBnEZpjGnsuA26Mf9kYibSaGLm=oF6=12L21X1GEQdqjLnzQ@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-14 09:49:27 -08:00