* kvm-arm64/nv-misc-6.18:
: .
: Various NV-related fixes:
:
: - Relax KVM's SError injection to consider that HCR_EL2.AMO's
: effective value is 1 when HCR_EL2.{E2H,TGE)=={1,0}.
: (20250918164632.410404-1-oliver.upton@linux.dev)
:
: - Allow userspace to disable some S2 base granule sizes
: (20250918165505.415017-1-oliver.upton@linux.dev)
: .
KVM: arm64: nv: Allow userspace to de-feature stage-2 TGRANs
KVM: arm64: nv: Treat AMO as 1 when at EL2 and {E2H,TGE} = {1, 0}
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/el2-feature-control: (23 commits)
: .
: General rework of EL2 features that can be disabled to satisfy
: the requirement of migration between heterogeneous hosts:
:
: - Handle effective RES0 behaviour of undefined registers, making sure
: that disabling a feature affects full registeres, and not just
: individual control bits. (20250918151402.1665315-1-maz@kernel.org)
:
: - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace.
: (20250911114621.3724469-1-yangjinqian1@huawei.com)
:
: - Turn the NV feature management into a deny-list, and expose
: missing features to EL2 guests.
: (20250912212258.407350-1-oliver.upton@linux.dev)
: .
KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs
KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs
KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs
KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs
KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set
KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs
KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs
KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac
KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs
KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs
KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg()
KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED}
KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace
KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits()
KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits()
KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2
KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2}
KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits()
KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2
KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/nv-debug:
: .
: Fix handling of MDSCR_EL1 in NV context, which is unfortunately
: mishandled by the architecture. Patches courtesy of Oliver Upton
: (20250917203125.283116-2-oliver.upton@linux.dev)
: .
KVM: arm64: nv: Apply guest's MDCR traps in nested context
KVM: arm64: nv: Trap debug registers when in hyp context
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/gic-v5-nv:
: .
: Add NV support to GICv5 in GICv3 emulation mode, ensuring that the v3
: guest support is identical to that of a pure v3 platform.
:
: Patches courtesy of Sascha Bischoff (20250828105925.3865158-1-sascha.bischoff@arm.com)
: .
irqchip/gic-v5: Drop has_gcie_v3_compat from gic_kvm_info
KVM: arm64: Use ARM64_HAS_GICV5_LEGACY for GICv5 probing
arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capability
KVM: arm64: Enable nested for GICv5 host with FEAT_GCIE_LEGACY
KVM: arm64: Don't access ICC_SRE_EL2 if GICv3 doesn't support v2 compatibility
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/52bit-at:
: .
: Upgrade the S1 page table walker to support 52bit PA, and use it to
: report the fault level when taking a S2 fault on S1PTW, which is required
: by the architecture (20250915114451.660351-1-maz@kernel.org).
: .
KVM: arm64: selftest: Expand external_aborts test to look for TTW levels
KVM: arm64: Populate level on S1PTW SEA injection
KVM: arm64: Add S1 IPA to page table level walker
KVM: arm64: Add filtering hook to S1 page table walk
KVM: arm64: Don't switch MMU on translation from non-NV context
KVM: arm64: Allow EL1 control registers to be accessed from the CPU state
KVM: arm64: Allow use of S1 PTW for non-NV vcpus
KVM: arm64: Report faults from S1 walk setup at the expected start level
KVM: arm64: Expand valid block mappings to FEAT_LPA/LPA2 support
KVM: arm64: Populate PAR_EL1 with 52bit addresses
KVM: arm64: Compute shareability for LPA2
KVM: arm64: Pass the walk_info structure to compute_par_s1()
KVM: arm64: Decouple output address from the PT descriptor
KVM: arm64: Compute 52bit TTBR address and alignment
KVM: arm64: Account for 52bit when computing maximum OA
KVM: arm64: Add helper computing the state of 52bit PA support
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add a basic test corrupting a level-2 table entry to check that
the resulting abort is a SEA on a PTW at level-3.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Our fault injection mechanism is mildly primitive, and doesn't
really implement the architecture when it comes to reporting
the level of a failing S1 PTW (we blindly report a SEA outside
of a PTW).
Now that we can walk the S1 page tables and look for a particular
IPA in the descriptors, it is pretty easy to improve the SEA
injection code.
Note that we only do it for AArch64 guests, and that 32bit guests
are left to their own device (oddly enough, I don't fancy writing
a 32bit PTW...).
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Use the filtering hook infrastructure to implement a new walker
that, for a given VA and an IPA, returns the level of the first
occurence of this IPA in the walk from that VA.
This will be used to improve our SEA syndrome reporting.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add a filtering hook that can get called on each level of the
walk, and providing access to the full state.
Crucially, this is called *before* the access is made, so that
it is possible to track down the level of a faulting access.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
If calling into the AT code from guest EL1, there is no need
to consider any context switch, as we are guaranteed to be
in the correct context.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
As we are about to plug the SW PTW into the EL1-only code, we can
no longer assume that the EL1 state is not resident on the CPU,
as we don't necessarily get there from EL2 traps.
Turn the __vcpu_sys_reg() access on the EL1 state into calls to
the vcpu_read_sys_reg() helper, which is guaranteed to do the
right thing.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
As we are about to use the S1 PTW in non-NV contexts, we must make
sure that we don't evaluate the EL2 state when dealing with the EL1&0
translation regime.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Translation faults from TTBR must be reported on the start level,
and not level-0. Enforcing this requires moving quite a lot of
code around so that the start level can be computed early enough
that it is usable.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
With 52bit PAs, block mappings can exist at different levels (such
as level 0 for 4kB pages, or level 1 for 16kB and 64kB pages).
Account for this in walk_s1().
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Expand the output address populated in PAR_EL1 to 52bit addresses.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
LPA2 gets the memory access shareability from TCR_ELx instead of
getting it form the descriptors. Store it in the walk info struct
so that it is passed around and evaluated as required.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Instead of just passing the translation regime, pass the full
walk_info structure to compute_par_s1(). This will help further
chamges that will require it.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add a helper converting the descriptor into a nicely formed OA,
irrespective of the in-descriptor representation (< 52bit, LPA
or LPA2).
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
52bit addresses from TTBR need extra adjustment and alignment
checks. Implement the requirements of the architecture.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Adjust the computation of the max OA to account for 52bit PAs.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Track whether the guest is using 52bit PAs, either LPA or LPA2.
This further simplifies the handling of LVA for 4k and 16k pages,
as LPA2 implies LVA in this case.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The changes to the debug architecture up to v8.8 are concerned with
external debug, which of course has no direct impact on VMs. Raise the
feature limit and document what's preventing us from raising it further.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
While KVM does not expose IMPDEF features to VMs, FEAT_TIDCP1 is an
architecturally-defined EL1 trap of a particular sysreg encoding range.
Furthermore, KVM already advertises this feature to non-NV VMs.
As there is no interaction with EL2 traps, expose the feature.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
FEAT_SpecSEI is an informational feature describing whether speculative
loads may generate SErrors. Since there are already cases where KVM
reinjects an SError into the VM it is already possible this may happen
due to a speculative load within the VM.
Stop hiding the feature from NV-enabled VMs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM now handles HCR_EL2.{TWEDEn,TWEDEL} correctly when computing the
effective HCR for a nested context. Advertise the feature.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Ignore the guest hypervisor's configured TWE delay if it hasn't actually
requested WFE traps. Otherwise, OR'ing these fields into the effective
HCR when the guest sets TWE is safe as KVM doesn't use FEAT_TWED and
leaves the fields initialized to 0.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
FEAT_AFP doesn't intersect with any EL2 trap behavior, expose to
NV-enabled VMs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The exact wording of the restrictions on branch prediction due to
FEAT_ECBHB in DDI0487L.b is as follows:
When FEAT_ECBHB is implemented, the branch history information created
in a context before an exception to a higher Exception level using
AArch64 cannot be used by code before that exception to exploitatively
control the execution of any indirect branches in code in a different
context after the exception.
While vEL2 and EL1 are multiplexed at EL1, they exist in different
hardware-described contexts as KVM uses different stage-2 MMUs to
represent the corresponding translation regimes. Additionally, exception
entries into vEL2 always imply a hardware exception entry into literal EL2
for the emulated regime change.
Given all of this, and the fact that FEAT_ECBHB places no limitation on
the EL of the protected context after the exception, we can claim
FEAT_ECBHB on supporting hardware.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM already supports FEAT_RASv1p1 for NV-enabled VMs but only when
advertised through the canonical field. Stop masking the silly frac
field to expose the feature on systems without FEAT_DF.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The supporting infrastructure in KVM's abort injection code was merged a
while ago, but the author (me!) forgot to relax the NV limitation when
FEAT_DF2 got exposed to non-NV VMs. Fix it.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature
fields where a non-negative value implies that a feature is implemented
and a negative value implies that it is not. While the intention of
masking this field was likely to hide the feature, KVM actually
advertises it, even on unsupporting hardware.
Remove FEAT_DoubleLock from the mask, making the NI value visible to the
VM. Take care to accept the old, incorrect values for this field as
we've lied to userspace.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Consistently use denylisting of features such that the limitations of
KVM's nested implementation are explicitly documented (rather than
implied).
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Assert that the EL2 features {HCX, TWED} of ID_AA64MMFR1_EL1 are writable
from userspace. They are only allowed to be downgraded in userspace.
Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Allow userspace to downgrade {HCX, TWED} in ID_AA64MMFR1_EL1. Userspace can
only change the value from high to low.
Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
While MDCR_EL2 cannot be RES0, convert it to the same infrastructure
anyway, as it make things cleaner.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
While SCTLR_EL1 cannot be RES0, convert it to the same infrastructure
anyway, as it make things cleaner.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Enforce that TCR2_EL2 are RES0 when FEAT_TCR2 isn't present.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Enforce that SCTLR2_EL{1,2} are RES0 when FEAT_SCTLR2 isn't present.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
While HCR_EL2 is unlikely to ever be RES0 (at least when NV is on),
but consistency doesn't hurt, and it can be described in the same
way as the other registers.
Convert it over to the new RES0-computing infrastructure.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add the dependency between the HCRX_EL2 register and FEAT_HCX.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Similarly to the FEAT_FGT registers, add the dependency between
the registers and the controlling feature.
WHile we're at it, add the missing checks for the RES0 vs valid
bit overlap.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
As we want to enforce FGT registers behaving as RES0 when FEAT_FGT
is not exposed to the guest, We move a bumch of things that are
so far passed as parameter into a structure that points to the
bit description.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
struct reg_bits_to_feat_map is great to describe bit-to-feature
dependency, but not so much to describe register-to-feature
dependency. Yet both need to exist.
Add a new reg_feat_map_desc structure to describe this.
Extra complexity is added by the need to source the RES0 bits from
the runtime-computed FGT masks, for which we need an extra flag
and extra complexity. Oh well.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Turns out I'm rather bad at noticing that the description of features
has already been added. Remove superflusous definitions for SYSREG128
and MTE2.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM advertises the stage-2 TGRAN fields as writable to userspace but
prevents any modification for NV-enabled VMs. Update the special-cased
sanitization to permit de-featuring a particular TGRAN without allowing
the legacy value which refers to the stage-1 field for support.
Reported-by: Itaru Kitayama <itaru.kitayama@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
SErrors are not deliverable at EL2 when the effective value of
HCR_EL2.{TGE,AMO} = {0, 0}. This is bothersome to deal with in nested
as we need to use auxiliary pending state to track the pending vSError
since HCR_EL2.VSE has no mechanism for honoring the guest HCR. On top of
that, we have no way of making that auxiliary pending state visible in
ISR_EL1.
A defect against the architecture now allows an implementation to treat
HCR_EL2.AMO as 1 when HCR_EL2.{E2H,TGE} = {1, 0}. Let's do exactly that,
meaning SErrors are always deliverable at EL2 for the typical E2H=RES1
VM.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM needs to ensure the guest hypervisor's traps take effect when the
vCPU is in a nested context. While supporting infrastructure is in place
for most of the EL2 trap registers, MDCR_EL2 is not.
Fold the guest's trap configuration into the effective MDCR_EL2. Apply
it directly to the in-memory representation as it gets recomputed on
every vcpu_load() anyway.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
In case you haven't realized it yet, the architecture is _slightly_
broken in the context of nested virt. Here we have another example of
FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually
affects execution at vEL2.
Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this
mess at the expense of unnecessarily trapping the breakpoint/watchpoint
registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for
obvious correctness to start.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The presence of FEAT_GCIE_LEGACY is now handled as a CPU
feature. Therefore, drop the check and flag from the GIC driver and
gic_kvm_info as it is no longer required or used by KVM.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The previous implementation of the probing function had the flaw that
it wouldn't catch mismatched CPU features. Specifically, GICv5 legacy
support (support for GICv3 VMs on a GICv5 host) was being enabled as
long as the initial boot CPU had support for the feature. This allowed
the support to become enabled on mismatched configurations.
Move to using cpus_have_final_cap(ARM64_HAS_GICV5_LEGACY) instead,
which only returns true when all booted CPUs support
FEAT_GCIE_LEGACY. A byproduct of this is that it ensures that late
onlining of CPUs is blocked on feature mismatch.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>