* kvm-arm64/nv-tcr2:
: Fixes to the handling of TCR_EL1, courtesy of Marc Zyngier
:
: Series addresses a couple gaps that are present in KVM (from cover
: letter):
:
: - VM configuration: HCRX_EL2.TCR2En is forced to 1, and we blindly
: save/restore stuff.
:
: - trap bit description and routing: none, obviously, since we make a
: point in not trapping.
KVM: arm64: Honor trap routing for TCR2_EL1
KVM: arm64: Make PIR{,E0}_EL1 save/restore conditional on FEAT_TCRX
KVM: arm64: Make TCR2_EL1 save/restore dependent on the VM features
KVM: arm64: Get rid of HCRX_GUEST_FLAGS
KVM: arm64: Correctly honor the presence of FEAT_TCRX
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/nv-sve:
: CPTR_EL2, FPSIMD/SVE support for nested
:
: This series brings support for honoring the guest hypervisor's CPTR_EL2
: trap configuration when running a nested guest, along with support for
: FPSIMD/SVE usage at L1 and L2.
KVM: arm64: Allow the use of SVE+NV
KVM: arm64: nv: Add additional trap setup for CPTR_EL2
KVM: arm64: nv: Add trap description for CPTR_EL2
KVM: arm64: nv: Add TCPAC/TTA to CPTR->CPACR conversion helper
KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2
KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap
KVM: arm64: nv: Handle CPACR_EL1 traps
KVM: arm64: Spin off helper for programming CPTR traps
KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state
KVM: arm64: nv: Use guest hypervisor's max VL when running nested guest
KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context
KVM: arm64: nv: Load guest hyp's ZCR into EL1 state
KVM: arm64: nv: Handle ZCR_EL2 traps
KVM: arm64: nv: Forward SVE traps to guest hypervisor
KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/el2-kcfi:
: kCFI support in the EL2 hypervisor, courtesy of Pierre-Clément Tosi
:
: Enable the usage fo CONFIG_CFI_CLANG (kCFI) for hardening indirect
: branches in the EL2 hypervisor. Unlike kernel support for the feature,
: CFI failures at EL2 are always fatal.
KVM: arm64: nVHE: Support CONFIG_CFI_CLANG at EL2
KVM: arm64: Introduce print_nvhe_hyp_panic helper
arm64: Introduce esr_brk_comment, esr_is_cfi_brk
KVM: arm64: VHE: Mark __hyp_call_panic __noreturn
KVM: arm64: nVHE: gen-hyprel: Skip R_AARCH64_ABS32
KVM: arm64: nVHE: Simplify invalid_host_el2_vect
KVM: arm64: Fix __pkvm_init_switch_pgd call ABI
KVM: arm64: Fix clobbered ELR in sync abort/SError
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/ctr-el0:
: Support for user changes to CTR_EL0, courtesy of Sebastian Ott
:
: Allow userspace to change the guest-visible value of CTR_EL0 for a VM,
: so long as the requested value represents a subset of features supported
: by hardware. In other words, prevent the VMM from over-promising the
: capabilities of hardware.
:
: Make this happen by fitting CTR_EL0 into the existing infrastructure for
: feature ID registers.
KVM: selftests: Assert that MPIDR_EL1 is unchanged across vCPU reset
KVM: arm64: nv: Unfudge ID_AA64PFR0_EL1 masking
KVM: selftests: arm64: Test writes to CTR_EL0
KVM: arm64: rename functions for invariant sys regs
KVM: arm64: show writable masks for feature registers
KVM: arm64: Treat CTR_EL0 as a VM feature ID register
KVM: arm64: unify code to prepare traps
KVM: arm64: nv: Use accessors for modifying ID registers
KVM: arm64: Add helper for writing ID regs
KVM: arm64: Use read-only helper for reading VM ID registers
KVM: arm64: Make idregs debugfs iterator search sysreg table directly
KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show()
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/shadow-mmu:
: Shadow stage-2 MMU support for NV, courtesy of Marc Zyngier
:
: Initial implementation of shadow stage-2 page tables to support a guest
: hypervisor. In the author's words:
:
: So here's the 10000m (approximately 30000ft for those of you stuck
: with the wrong units) view of what this is doing:
:
: - for each {VMID,VTTBR,VTCR} tuple the guest uses, we use a
: separate shadow s2_mmu context. This context has its own "real"
: VMID and a set of page tables that are the combination of the
: guest's S2 and the host S2, built dynamically one fault at a time.
:
: - these shadow S2 contexts are ephemeral, and behave exactly as
: TLBs. For all intent and purposes, they *are* TLBs, and we discard
: them pretty often.
:
: - TLB invalidation takes three possible paths:
:
: * either this is an EL2 S1 invalidation, and we directly emulate
: it as early as possible
:
: * or this is an EL1 S1 invalidation, and we need to apply it to
: the shadow S2s (plural!) that match the VMID set by the L1 guest
:
: * or finally, this is affecting S2, and we need to teardown the
: corresponding part of the shadow S2s, which invalidates the TLBs
KVM: arm64: nv: Truely enable nXS TLBI operations
KVM: arm64: nv: Add handling of NXS-flavoured TLBI operations
KVM: arm64: nv: Add handling of range-based TLBI operations
KVM: arm64: nv: Add handling of outer-shareable TLBI operations
KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information
KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level
KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations
KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations
KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations
KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations
KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1
KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation
KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives
KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
KVM: arm64: nv: Handle shadow stage 2 page faults
KVM: arm64: nv: Implement nested Stage-2 page table walk logic
KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/ffa-1p1:
: Improvements to the pKVM FF-A Proxy, courtesy of Sebastian Ene
:
: Various minor improvements to how host FF-A calls are proxied with the
: TEE, along with support for v1.1 of the protocol.
KVM: arm64: Use FF-A 1.1 with pKVM
KVM: arm64: Update the identification range for the FF-A smcs
KVM: arm64: Add support for FFA_PARTITION_INFO_GET
KVM: arm64: Trap FFA_VERSION host call in pKVM
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/misc:
: Miscellaneous updates
:
: - Provide a command-line parameter to statically control the WFx trap
: selection in KVM
:
: - Make sysreg masks allocation accounted
Revert "KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarity"
KVM: arm64: nv: Use GFP_KERNEL_ACCOUNT for sysreg_masks allocation
KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarity
KVM: arm64: Add early_param to control WFx trapping
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
This reverts commit eb9d53d4a9.
As Marc pointed out on the list [*], this patch is wrong, and those who
find themselves in the SOB chain should have their heads checked.
Annoyingly, the architecture has some FGT trap bits that are negative
(i.e. 0 implies trap), and there was some confusion how KVM handles
this for nested guests. However, it is clear now that KVM honors the
RES0-ness of FGT traps already, meaning traps for features never exposed
to the guest hypervisor get handled at L0. As they should.
Link: https://lore.kernel.org/kvmarm/86bk3c3uss.wl-maz@kernel.org/T/#mb9abb3dd79f6a4544a91cb35676bd637c3a5e836
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Although we now have support for nXS-flavoured TLBI instructions,
we still don't expose the feature to the guest thanks to a mixture
of misleading comment and use of a bunch of magic values.
Fix the comment and correctly express the masking of LS64, which
is enough to expose nXS to the world. Not that anyone cares...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240703154743.824824-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
TCR2_EL1 handling is missing the handling of its trap configuration:
- HCRX_EL2.TCR2En must be handled in conjunction with HCR_EL2.{TVM,TRVM}
- HFG{R,W}TR_EL2.TCR_EL1 does apply to TCR2_EL1 as well
Without these two controls being implemented, it is impossible to
correctly route TCR2_EL1 traps.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240625130042.259175-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
As for other registers, save/restore of TCR2_EL1 should be gated
on the feature being actually present.
In the case of a nVHE hypervisor, it is perfectly fine to leave
the host value in the register, as HCRX_EL2.TCREn==0 imposes that
TCR2_EL1 is treated as 0.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240625130042.259175-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
HCRX_GUEST_FLAGS gives random KVM hackers the impression that
they can stuff bits in this macro and unconditionally enable
features in the guest.
In general, this is wrong (we have been there with FEAT_MOPS,
and again with FEAT_TCRX).
Document that HCRX_EL2.SMPME is an exception rather than the rule,
and get rid of HCRX_GUEST_FLAGS.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20240625130042.259175-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Marc reports that L1 VMs aren't booting with the NV series applied to
today's kvmarm/next. After bisecting the issue, it appears that
44241f34fa ("KVM: arm64: nv: Use accessors for modifying ID
registers") is to blame.
Poking around at the issue a bit further, it'd appear that the value for
ID_AA64PFR0_EL1 is complete garbage, as 'val' still contains the value
we set ID_AA64ISAR1_EL1 to.
Fix the read-modify-write pattern to actually use ID_AA64PFR0_EL1 as the
starting point. Excuse me as I return to my shame cube.
Reported-by: Marc Zyngier <maz@kernel.org>
Fixes: 44241f34fa ("KVM: arm64: nv: Use accessors for modifying ID registers")
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240621224044.2465901-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Start folding the guest hypervisor's FP/SVE traps into the value
programmed in hardware. Note that as of writing this is dead code, since
KVM does a full put() / load() for every nested exception boundary which
saves + flushes the FP/SVE state.
However, this will become useful when we can keep the guest's FP/SVE
state alive across a nested exception boundary and the host no longer
needs to conservatively program traps.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240620164653.1130714-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Handle CPACR_EL1 accesses when running a VHE guest. In order to
limit the cost of the emulation, implement it ass a shallow exit.
In the other cases:
- this is a nVHE L1 which will write to memory, and we don't trap
- this is a L2 guest:
* the L1 has CPTR_EL2.TCPAC==0, and the L2 has direct register
access
* the L1 has CPTR_EL2.TCPAC==1, and the L2 will trap, but the
handling is defered to the general handling for forwarding
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240620164653.1130714-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
A subsequent change to KVM will add preliminary support for merging a
guest hypervisor's CPTR traps with that of KVM. Prepare by spinning off
a new helper for managing CPTR traps.
Avoid reading CPACR_EL1 for the baseline trap config, and start off with
the most restrictive set of traps that is subsequently relaxed.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240620164653.1130714-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
It is possible that the guest hypervisor has selected a smaller VL than
the maximum for its nested guest. As such, ZCR_EL2 may be configured for
a different VL when exiting a nested guest.
Set ZCR_EL2 (via the EL1 alias) to the maximum VL for the VM before
saving SVE state as the SVE save area is dimensioned by the max VL.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240620164653.1130714-8-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The max VL for nested guests is additionally constrained by the max VL
selected by the guest hypervisor. Use that instead of KVM's max VL when
running a nested guest.
Note that the guest hypervisor's ZCR_EL2 is sanitised against the VM's
max VL at the time of access, so there's no additional handling required
at the time of use.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240620164653.1130714-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Unlike other SVE-related registers, ZCR_EL2 takes a sysreg trap to EL2
when HCR_EL2.NV = 1. KVM still needs to honor the guest hypervisor's
trap configuration, which expects an SVE trap (i.e. ESR_EL2.EC = 0x19)
when CPTR traps are enabled for the vCPU's current context.
Otherwise, if the guest hypervisor has traps disabled, emulate the
access by mapping the requested VL into ZCR_EL1.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240620164653.1130714-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Give precedence to the guest hypervisor's trap configuration when
routing an FP/ASIMD trap taken to EL2. Take advantage of the
infrastructure for translating CPTR_EL2 into the VHE (i.e. EL1) format
and base the trap decision solely on the VHE view of the register. The
in-memory value of CPTR_EL2 will always be up to date for the guest
hypervisor (more on that later), so just read it directly from memory.
Bury all of this behind a macro keyed off of the CPTR bitfield in
anticipation of supporting other traps (e.g. SVE).
[maz: account for HCR_EL2.E2H when testing for TFP/FPEN, with
all the hard work actually being done by Chase Conklin]
[ oliver: translate nVHE->VHE format for testing traps; macro for reuse
in other CPTR_EL2.xEN fields ]
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240620164653.1130714-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The compiler implements kCFI by adding type information (u32) above
every function that might be indirectly called and, whenever a function
pointer is called, injects a read-and-compare of that u32 against the
value corresponding to the expected type. In case of a mismatch, a BRK
instruction gets executed. When the hypervisor triggers such an
exception in nVHE, it panics and triggers and exception return to EL1.
Therefore, teach nvhe_hyp_panic_handler() to detect kCFI errors from the
ESR and report them. If necessary, remind the user that EL2 kCFI is not
affected by CONFIG_CFI_PERMISSIVE.
Pass $(CC_FLAGS_CFI) to the compiler when building the nVHE hyp code.
Use SYM_TYPED_FUNC_START() for __pkvm_init_switch_pgd, as nVHE can't
call it directly and must use a PA function pointer from C (because it
is part of the idmap page), which would trigger a kCFI failure if the
type ID wasn't present.
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240610063244.2828978-9-ptosi@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Ignore R_AARCH64_ABS32 relocations, instead of panicking, when emitting
the relocation table of the hypervisor. The toolchain might produce them
when generating function calls with kCFI to represent the 32-bit type ID
which can then be resolved across compilation units at link time. These
are NOT actual 32-bit addresses and are therefore not needed in the
final (runtime) relocation table (which is unlikely to use 32-bit
absolute addresses for arm64 anyway).
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240610063244.2828978-5-ptosi@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Fix the mismatch between the (incorrect) C signature, C call site, and
asm implementation by aligning all three on an API passing the
parameters (pgd and SP) separately, instead of as a bundled struct.
Remove the now unnecessary memory accesses while the MMU is off from the
asm, which simplifies the C caller (as it does not need to convert a VA
struct pointer to PA) and makes the code slightly more robust by
offsetting the struct fields from C and properly expressing the call to
the C compiler (e.g. type checker and kCFI).
Fixes: f320bc742b ("KVM: arm64: Prepare the creation of s1 mappings at EL2")
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240610063244.2828978-3-ptosi@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When the hypervisor receives a SError or synchronous exception (EL2h)
while running with the __kvm_hyp_vector and if ELR_EL2 doesn't point to
an extable entry, it panics indirectly by overwriting ELR with the
address of a panic handler in order for the asm routine it returns to to
ERET into the handler.
However, this clobbers ELR_EL2 for the handler itself. As a result,
hyp_panic(), when retrieving what it believes to be the PC where the
exception happened, actually ends up reading the address of the panic
handler that called it! This results in an erroneous and confusing panic
message where the source of any synchronous exception (e.g. BUG() or
kCFI) appears to be __guest_exit_panic, making it hard to locate the
actual BRK instruction.
Therefore, store the original ELR_EL2 in the per-CPU kvm_hyp_ctxt and
point the sysreg to a routine that first restores it to its previous
value before running __guest_exit_panic.
Fixes: 7db2153047 ("KVM: arm64: Restore hyp when panicking in guest context")
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240610063244.2828978-2-ptosi@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
CTR_EL0 is currently handled as an invariant register, thus
guests will be presented with the host value of that register.
Add emulation for CTR_EL0 based on a per VM value. Userspace can
switch off DIC and IDC bits and reduce DminLine and IminLine sizes.
Naturally, ensure CTR_EL0 is trapped (HCR_EL2.TID2=1) any time that a
VM's CTR_EL0 differs from hardware.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240619174036.483943-8-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
There are 2 functions to calculate traps via HCR_EL2:
* kvm_init_sysreg() called via KVM_RUN (before the 1st run or when
the pid changes)
* vcpu_reset_hcr() called via KVM_ARM_VCPU_INIT
To unify these 2 and to support traps that are dependent on the
ID register configuration, move the code from vcpu_reset_hcr()
to sys_regs.c and call it via kvm_init_sysreg().
We still have to keep the non-FWB handling stuff in vcpu_reset_hcr().
Also the initialization with HCR_GUEST_FLAGS is kept there but guarded
by !vcpu_has_run_once() to ensure that previous calculated values
don't get overwritten.
While at it rename kvm_init_sysreg() to kvm_calculate_traps() to
better reflect what it's doing.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20240619174036.483943-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
In the interest of abstracting away the underlying storage of feature
ID registers, rework the nested code to go through the accessors instead
of directly iterating the id_regs array.
This means we now lose the property that ID registers unknown to the
nested code get zeroed, but we really ought to be handling those
explicitly going forward.
Link: https://lore.kernel.org/r/20240619174036.483943-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
IDREG() expands to the storage of a particular ID reg, which can be
useful for handling both reads and writes. However, outside of a select
few situations, the ID registers should be considered read only.
Replace current readers with a new macro that expands to the value of
the field rather than the field itself.
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20240619174036.483943-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
CTR_EL0 complicates the existing scheme for iterating feature ID
registers, as it is not in the contiguous range that we presently
support. Just search the sysreg table for the Nth feature ID register in
anticipation of this. Yes, the debugfs interface has quadratic time
completixy now. Boo hoo.
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20240619174036.483943-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>