Commit Graph

1185830 Commits

Author SHA1 Message Date
Oliver Upton
92d05e2492 Merge branch kvm-arm64/ampere1-hafdbs-mitigation into kvmarm/next
* kvm-arm64/ampere1-hafdbs-mitigation:
  : AmpereOne erratum AC03_CPU_38 mitigation
  :
  : AmpereOne does not advertise support for FEAT_HAFDBS due to an
  : underlying erratum in the feature. The associated control bits do not
  : have RES0 behavior as required by the architecture.
  :
  : Introduce mitigations to prevent KVM from enabling the feature at
  : stage-2 as well as preventing KVM guests from enabling HAFDBS at
  : stage-1.
  KVM: arm64: Prevent guests from enabling HA/HD on Ampere1
  KVM: arm64: Refactor HFGxTR configuration into separate helpers
  arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-16 00:49:36 +00:00
Oliver Upton
082fdfd138 KVM: arm64: Prevent guests from enabling HA/HD on Ampere1
An erratum in the HAFDBS implementation in AmpereOne was addressed by
clearing the feature in the ID register, with the expectation that
software would not attempt to use the corresponding controls in TCR_EL1.
The architecture, on the other hand, takes a much more pedantic stance
on the subject, requiring the TCR bits behave as RES0.

Take an extremely conservative stance on the issue and leverage the
precise write trap afforded by FGT. Handle guest writes by clearing HA
and HD before writing the intended value to the EL1 register alias.

Link: https://lore.kernel.org/r/20230609220104.1836988-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-16 00:31:44 +00:00
Oliver Upton
ce4a362257 KVM: arm64: Refactor HFGxTR configuration into separate helpers
A subsequent change will need to flip more trap bits in HFGWTR_EL2. Make
room for this by factoring out the programming of the HFGxTR registers
into helpers and using locals to build the set/clear masks.

Link: https://lore.kernel.org/r/20230609220104.1836988-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-16 00:31:44 +00:00
Oliver Upton
6df696cd9b arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2
AmpereOne has an erratum in its implementation of FEAT_HAFDBS that
required disabling the feature on the design. This was done by reporting
the feature as not implemented in the ID register, although the
corresponding control bits were not actually RES0. This does not align
well with the requirements of the architecture, which mandates these
bits be RES0 if HAFDBS isn't implemented.

The kernel's use of stage-1 is unaffected, as the HA and HD bits are
only set if HAFDBS is detected in the ID register. KVM, on the other
hand, relies on the RES0 behavior at stage-2 to use the same value for
VTCR_EL2 on any cpu in the system. Mitigate the non-RES0 behavior by
leaving VTCR_EL2.HA clear on affected systems.

Cc: stable@vger.kernel.org
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Acked-by: D Scott Phillips <scott@os.amperecomputing.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609220104.1836988-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-16 00:31:44 +00:00
Oliver Upton
e1e315c4d5 Merge branch kvm-arm64/misc into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Avoid trapping CTR_EL0 on systems with FEAT_EVT, as the register is
  :    commonly read by userspace
  :
  :  - Make use of FEAT_BTI at hyp stage-1, setting the Guard Page bit to 1
  :    for executable mappings
  :
  :  - Use a separate set of pointer authentication keys for the hypervisor
  :    when running in protected mode (i.e. pKVM)
  :
  :  - Plug a few holes in timer initialization where KVM fails to free the
  :    timer IRQ(s)
  KVM: arm64: Use different pointer authentication keys for pKVM
  KVM: arm64: timers: Fix resource leaks in kvm_timer_hyp_init()
  KVM: arm64: Use BTI for nvhe
  KVM: arm64: Relax trapping of CTR_EL0 when FEAT_EVT is available

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 13:09:43 +00:00
Oliver Upton
89a734b54c Merge branch kvm-arm64/configurable-id-regs into kvmarm/next
* kvm-arm64/configurable-id-regs:
  : Configurable ID register infrastructure, courtesy of Jing Zhang
  :
  : Create generalized infrastructure for allowing userspace to select the
  : supported feature set for a VM, so long as the feature set is a subset
  : of what hardware + KVM allows. This does not add any new features that
  : are user-configurable, and instead focuses on the necessary refactoring
  : to enable future work.
  :
  : As a consequence of the series, feature asymmetry is now deliberately
  : disallowed for KVM. It is unlikely that VMMs ever configured VMs with
  : asymmetry, nor does it align with the kernel's overall stance that
  : features must be uniform across all cores in the system.
  :
  : Furthermore, KVM incorrectly advertised an IMP_DEF PMU to guests for
  : some time. Migrations from affected kernels was supported by explicitly
  : allowing such an ID register value from userspace, and forwarding that
  : along to the guest. KVM now allows an IMP_DEF PMU version to be restored
  : through the ID register interface, but reinterprets the user value as
  : not implemented (0).
  KVM: arm64: Rip out the vestiges of the 'old' ID register scheme
  KVM: arm64: Handle ID register reads using the VM-wide values
  KVM: arm64: Use generic sanitisation for ID_AA64PFR0_EL1
  KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1
  KVM: arm64: Use arm64_ftr_bits to sanitise ID register writes
  KVM: arm64: Save ID registers' sanitized value per guest
  KVM: arm64: Reuse fields of sys_reg_desc for idreg
  KVM: arm64: Rewrite IMPDEF PMU version as NI
  KVM: arm64: Make vCPU feature flags consistent VM-wide
  KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF
  KVM: arm64: Separate out feature sanitisation and initialisation

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 13:05:11 +00:00
Oliver Upton
acfdf34c7d Merge branch for-next/module-alloc into kvmarm/next
* for-next/module-alloc:
  : Drag in module VA rework to handle conflicts w/ sw feature refactor
  arm64: module: rework module VA range selection
  arm64: module: mandate MODULE_PLTS
  arm64: module: move module randomization to module.c
  arm64: kaslr: split kaslr/module initialization
  arm64: kasan: remove !KASAN_VMALLOC remnants
  arm64: module: remove old !KASAN_VMALLOC logic

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 13:04:15 +00:00
Oliver Upton
b710fe0d30 Merge branch kvm-arm64/hvhe into kvmarm/next
* kvm-arm64/hvhe:
  : Support for running split-hypervisor w/VHE, courtesy of Marc Zyngier
  :
  : From the cover letter:
  :
  : KVM (on ARMv8.0) and pKVM (on all revisions of the architecture) use
  : the split hypervisor model that makes the EL2 code more or less
  : standalone. In the later case, we totally ignore the VHE mode and
  : stick with the good old v8.0 EL2 setup.
  :
  : We introduce a new "mode" for KVM called hVHE, in reference to the
  : nVHE mode, and indicating that only the hypervisor is using VHE.
  KVM: arm64: Fix hVHE init on CPUs where HCR_EL2.E2H is not RES1
  arm64: Allow arm64_sw.hvhe on command line
  KVM: arm64: Force HCR_E2H in guest context when ARM64_KVM_HVHE is set
  KVM: arm64: Program the timer traps with VHE layout in hVHE mode
  KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration
  KVM: arm64: Adjust EL2 stage-1 leaf AP bits when ARM64_KVM_HVHE is set
  KVM: arm64: Disable TTBR1_EL2 when using ARM64_KVM_HVHE
  KVM: arm64: Force HCR_EL2.E2H when ARM64_KVM_HVHE is set
  KVM: arm64: Key use of VHE instructions in nVHE code off ARM64_KVM_HVHE
  KVM: arm64: Remove alternatives from sysreg accessors in VHE hypervisor context
  arm64: Use CPACR_EL1 format to set CPTR_EL2 when E2H is set
  arm64: Allow EL1 physical timer access when running VHE
  arm64: Don't enable VHE for the kernel if OVERRIDE_HVHE is set
  arm64: Add KVM_HVHE capability and has_hvhe() predicate
  arm64: Turn kaslr_feature_override into a generic SW feature override
  arm64: Prevent the use of is_kernel_in_hyp_mode() in hypervisor code
  KVM: arm64: Drop is_kernel_in_hyp_mode() from __invalidate_icache_guest_page()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 13:02:49 +00:00
Oliver Upton
1a08f4927a Merge branch kvm-arm64/ffa-proxy into kvmarm/next
* kvm-arm64/ffa-proxy:
  : pKVM FF-A Proxy, courtesy Will Deacon and Andrew Walbran
  :
  : From the cover letter:
  :
  : pKVM's primary goal is to protect guest pages from a compromised host by
  : enforcing access control restrictions using stage-2 page-tables. Sadly,
  : this cannot prevent TrustZone from accessing non-secure memory, and a
  : compromised host could, for example, perform a 'confused deputy' attack
  : by asking TrustZone to use pages that have been donated to protected
  : guests. This would effectively allow the host to have TrustZone
  : exfiltrate guest secrets on its behalf, hence breaking the isolation
  : that pKVM intends to provide.
  :
  : This series addresses this problem by providing pKVM with the ability to
  : monitor SMCs following the Arm FF-A protocol. FF-A provides (among other
  : things) a set of memory management APIs allowing the Normal World to
  : share, donate or lend pages with Secure. By monitoring these SMCs, pKVM
  : can ensure that the pages that are shared, lent or donated to Secure by
  : the host kernel are only pages that it owns.
  KVM: arm64: pkvm: Add support for fragmented FF-A descriptors
  KVM: arm64: Handle FFA_FEATURES call from the host
  KVM: arm64: Handle FFA_MEM_LEND calls from the host
  KVM: arm64: Handle FFA_MEM_RECLAIM calls from the host
  KVM: arm64: Handle FFA_MEM_SHARE calls from the host
  KVM: arm64: Add FF-A helpers to share/unshare memory with secure world
  KVM: arm64: Handle FFA_RXTX_MAP and FFA_RXTX_UNMAP calls from the host
  KVM: arm64: Allocate pages for hypervisor FF-A mailboxes
  KVM: arm64: Probe FF-A version and host/hyp partition ID during init
  KVM: arm64: Block unsafe FF-A calls from the host

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 13:02:37 +00:00
Oliver Upton
83510396c0 Merge branch kvm-arm64/eager-page-splitting into kvmarm/next
* kvm-arm64/eager-page-splitting:
  : Eager Page Splitting, courtesy of Ricardo Koller.
  :
  : Dirty logging performance is dominated by the cost of splitting
  : hugepages to PTE granularity. On systems that mere mortals can get their
  : hands on, each fault incurs the cost of a full break-before-make
  : pattern, wherein the broadcast invalidation and ensuing serialization
  : significantly increases fault latency.
  :
  : The goal of eager page splitting is to move the cost of hugepage
  : splitting out of the stage-2 fault path and instead into the ioctls
  : responsible for managing the dirty log:
  :
  :  - If manual protection is enabled for the VM, hugepage splitting
  :    happens in the KVM_CLEAR_DIRTY_LOG ioctl. This is desirable as it
  :    provides userspace granular control over hugepage splitting.
  :
  :  - Otherwise, if userspace relies on the legacy dirty log behavior
  :    (clear on collection), hugepage splitting is done at the moment dirty
  :    logging is enabled for a particular memslot.
  :
  : Support for eager page splitting requires explicit opt-in from
  : userspace, which is realized through the
  : KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE capability.
  arm64: kvm: avoid overflow in integer division
  KVM: arm64: Use local TLBI on permission relaxation
  KVM: arm64: Split huge pages during KVM_CLEAR_DIRTY_LOG
  KVM: arm64: Open-code kvm_mmu_write_protect_pt_masked()
  KVM: arm64: Split huge pages when dirty logging is enabled
  KVM: arm64: Add kvm_uninit_stage2_mmu()
  KVM: arm64: Refactor kvm_arch_commit_memory_region()
  KVM: arm64: Add kvm_pgtable_stage2_split()
  KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
  KVM: arm64: Export kvm_are_all_memslots_empty()
  KVM: arm64: Add helper for creating unlinked stage2 subtrees
  KVM: arm64: Add KVM_PGTABLE_WALK flags for skipping CMOs and BBM TLBIs
  KVM: arm64: Rename free_removed to free_unlinked

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 13:02:11 +00:00
Oliver Upton
686672407e KVM: arm64: Rip out the vestiges of the 'old' ID register scheme
There's no longer a need for the baggage of the old scheme for handling
configurable ID register fields. Rip it all out in favor of the
generalized infrastructure.

Link: https://lore.kernel.org/r/20230609190054.1542113-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 12:55:35 +00:00
Oliver Upton
6db7af0d5b KVM: arm64: Handle ID register reads using the VM-wide values
Everything is in place now to use the generic ID register
infrastructure. Use the VM-wide values to service ID register reads.
The ID registers are invariant after the VM has started, so there is no
need for locking in that case. This is rather desirable for VM live
migration, as the needless lock contention could prolong the VM blackout
period.

Link: https://lore.kernel.org/r/20230609190054.1542113-11-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 12:55:35 +00:00
Jing Zhang
c39f5974d3 KVM: arm64: Use generic sanitisation for ID_AA64PFR0_EL1
KVM allows userspace to write to the CSV2 and CSV3 fields of
ID_AA64PFR0_EL1 so long as it doesn't over-promise on the
Spectre/Meltdown mitigation state. Switch over to the new way of the
world for screening user writes. Leave the old plumbing in place until
we actually start handling ID register reads from the VM-wide values.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-10-oliver.upton@linux.dev
[Oliver: split from monster patch, added commit description]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 12:55:29 +00:00
Jing Zhang
c118cead07 KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1
KVM allows userspace to specify a PMU version for the guest by writing
to the corresponding ID registers. Currently the validation of these
writes is done manuallly, but there's no reason we can't switch over to
the generic sanitisation infrastructure.

Start screening user writes through arm64_check_features() to prevent
userspace from over-promising in terms of vPMU support. Leave the old
masking in place for now, as we aren't completely ready to serve reads
from the VM-wide values.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-9-oliver.upton@linux.dev
[Oliver: split off from monster patch, cleaned up handling of NI vPMU
 values, wrote commit description]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 12:55:20 +00:00
Jing Zhang
2e8bf0cbd0 KVM: arm64: Use arm64_ftr_bits to sanitise ID register writes
Rather than reinventing the wheel in KVM to do ID register sanitisation
we can rely on the work already done in the core kernel. Implement a
generalized sanitisation of ID registers based on the combination of the
arm64_ftr_bits definitions from the core kernel and (optionally) a set
of KVM-specific overrides.

This all amounts to absolutely nothing for now, but will be used in
subsequent changes to realize user-configurable ID registers.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-8-oliver.upton@linux.dev
[Oliver: split off from monster patch, rewrote commit description,
 reworked RAZ handling, return EINVAL to userspace]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 12:55:20 +00:00
Jing Zhang
4733414690 KVM: arm64: Save ID registers' sanitized value per guest
Initialize the default ID register values upon the first call to
KVM_ARM_VCPU_INIT. The vCPU feature flags are finalized at that point,
so it is possible to determine the maximum feature set supported by a
particular VM configuration. Do nothing with these values for now, as we
need to rework the plumbing of what's already writable to be compatible
with the generic infrastructure.

Co-developed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-7-oliver.upton@linux.dev
[Oliver: Hoist everything into KVM_ARM_VCPU_INIT time, so the features
 are final]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 12:55:08 +00:00
Marc Zyngier
1700f89cb9 KVM: arm64: Fix hVHE init on CPUs where HCR_EL2.E2H is not RES1
On CPUs where E2H is RES1, we very quickly set the scene for
running EL2 with a VHE configuration, as we do not have any other
choice.

However, CPUs that conform to the current writing of the architecture
start with E2H=0, and only later upgrade with E2H=1. This is all
good, but nothing there is actually reconfiguring EL2 to be able
to correctly run the kernel at EL1. Huhuh...

The "obvious" solution is not to just reinitialise the timer
controls like we do, but to really intitialise *everything*
unconditionally.

This requires a bit of surgery, and is a good opportunity to
remove the macro that messes with SPSR_EL2 in init_el2_state.

With that, hVHE now works correctly on my trusted A55 machine!

Reported-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614155129.2697388-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-15 09:27:51 +00:00
Mostafa Saleh
8c15c2a028 KVM: arm64: Use different pointer authentication keys for pKVM
When the use of pointer authentication is enabled in the kernel it
applies to both the kernel itself as well as KVM's nVHE hypervisor. The
same keys are used for both the kernel and the nVHE hypervisor, which is
less than desirable for pKVM as the host is not trusted at runtime.

Naturally, the fix is to use a different set of keys for the hypervisor
when running in protected mode. Have the host generate a new set of keys
for the hypervisor before deprivileging the kernel. While there might be
other sources of random directly available at EL2, this keeps the
implementation simple, and the host is trusted anyways until it is
deprivileged.

Since the host and hypervisor no longer share a set of pointer
authentication keys, start context switching them on the host entry/exit
path exactly as we do for guest entry/exit. There is no need to handle
CPU migration as the nVHE code is not migratable in the first place.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20230614122600.2098901-1-smostafa@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-14 15:17:32 +00:00
Dan Carpenter
21e87daece KVM: arm64: timers: Fix resource leaks in kvm_timer_hyp_init()
Smatch detected this bug:
    arch/arm64/kvm/arch_timer.c:1425 kvm_timer_hyp_init()
    warn: missing unwind goto?

There are two resources to be freed the vtimer and ptimer.  The
line that Smatch complains about should free the vtimer first
before returning and then after that cleanup code should free
the ptimer.

I've added a out_free_ptimer_irq to free the ptimer and renamed
the existing label to out_free_vtimer_irq.

Fixes: 9e01dc76be ("KVM: arm/arm64: arch_timer: Assign the phys timer on VHE systems")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/72fffc35-7669-40b1-9d14-113c43269cf3@kili.mountain
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-13 12:06:05 +00:00
Marc Zyngier
ad744e8cb3 arm64: Allow arm64_sw.hvhe on command line
Add the arm64_sw.hvhe=1 option to force the use of the hVHE mode
in the hypervisor code only.

This enables the hVHE mode of operation when using KVM on VHE
hardware.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-17-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:24 +00:00
Marc Zyngier
38cba55008 KVM: arm64: Force HCR_E2H in guest context when ARM64_KVM_HVHE is set
Also make sure HCR_EL2.E2H is set when switching HCR_EL2 in guest
context.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-16-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:24 +00:00
Marc Zyngier
aca18585db KVM: arm64: Program the timer traps with VHE layout in hVHE mode
Just like the rest of the timer code, we need to shift the enable
bits around when HCR_EL2.E2H is set, which is the case in hVHE mode.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-15-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:24 +00:00
Marc Zyngier
75c76ab5a6 KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration
Just like we repainted the early arm64 code, we need to update
the CPTR_EL2 accesses that are taking place in the nVHE code
when hVHE is used, making them look as if they were CPACR_EL1
accesses. Just like the VHE code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-14-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:24 +00:00
Marc Zyngier
6537565fd9 KVM: arm64: Adjust EL2 stage-1 leaf AP bits when ARM64_KVM_HVHE is set
El2 stage-1 page-table format is subtly (and annoyingly) different
when HCR_EL2.E2H is set.

Take the ARM64_KVM_HVHE configuration into account when setting
the AP bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-13-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
cff3b5cf96 KVM: arm64: Disable TTBR1_EL2 when using ARM64_KVM_HVHE
When using hVHE, we end-up with two TTBRs at EL2. That's great,
but we're not quite ready for this just yet.

Disable TTBR1_EL2 by setting TCR_EL2.EPD1 so that we only
translate via TTBR0_EL2.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-12-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
d0daf5a21e KVM: arm64: Force HCR_EL2.E2H when ARM64_KVM_HVHE is set
Obviously, in order to be able to use VHE whilst at EL2, we need
to set HCR_EL2.E2H. Do so when ARM64_KVM_HVHE is set.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
6f617d3aa6 KVM: arm64: Key use of VHE instructions in nVHE code off ARM64_KVM_HVHE
We can now start with the fun stuff: if we enable VHE *only* for
the hypervisor, we need to generate the VHE instructions when
accessing the system registers.

For this, reporpose the alternative sequence to be keyed off
ARM64_KVM_HVHE in the nVHE hypervisor code, and only there.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
57e784b407 KVM: arm64: Remove alternatives from sysreg accessors in VHE hypervisor context
In the VHE hypervisor code, we should be using the remapped VHE
accessors, no ifs, no buts. No need to generate any alternative.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
659803aef4 arm64: Use CPACR_EL1 format to set CPTR_EL2 when E2H is set
When HCR_EL2.E2H is set, the CPTR_EL2 register takes the CPACR_EL1
format. Yes, this is good fun.

Hack the bits of startup code that assume E2H=0 while setting up
CPTR_EL2 to make them grok the CPTR_EL1 format.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-8-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
9e7462bbe0 arm64: Allow EL1 physical timer access when running VHE
To initialise the timer access from EL2 when HCR_EL2.E2H is set,
we must make use the CNTHCTL_EL2 formap used is appropriate.

This amounts to shifting the timer/counter enable bits by 10
to the left.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
7a26e1f51e arm64: Don't enable VHE for the kernel if OVERRIDE_HVHE is set
If the OVERRIDE_HVHE SW override is set (as a precursor of
the KVM_HVHE capability), do not enable VHE for the kernel
and drop to EL1 as if VHE was either disabled or unavailable.

Further changes will enable VHE at EL2 only, with the kernel
still running at EL1.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
e2d6c906f0 arm64: Add KVM_HVHE capability and has_hvhe() predicate
Expose a capability keying the hVHE feature as well as a new
predicate testing it. Nothing is so far using it, and nothing
is enabling it yet.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
0ddc312b7c arm64: Turn kaslr_feature_override into a generic SW feature override
Disabling KASLR from the command line is implemented as a feature
override. Repaint it slightly so that it can further be used as
more generic infrastructure for SW override purposes.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
35230be87e arm64: Prevent the use of is_kernel_in_hyp_mode() in hypervisor code
Using is_kernel_in_hyp_mode() in hypervisor code is a pretty bad
mistake. This helper only checks for CurrentEL being EL2, which
is always true.

Make the compilation fail if using the helper in hypervisor context
Whilst we're at it, flag the helper as __always_inline, which it
really should be.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Marc Zyngier
c4b9fd2ac0 KVM: arm64: Drop is_kernel_in_hyp_mode() from __invalidate_icache_guest_page()
It is pretty obvious that is_kernel_in_hyp_mode() doesn't make much
sense in the hypervisor part of KVM, and should be reserved to the
kernel side.

However, mem_protect.c::invalidate_icache_guest_page() calls into
__invalidate_icache_guest_page(), which uses is_kernel_in_hyp_mode().
Given that this is part of the pKVM side of the hypervisor, this
helper can only return true.

Nothing goes really bad, but __invalidate_icache_guest_page() could
spell out what the actual check is: we cannot invalidate the cache
if the i-cache is VPIPT and we're running at EL1.

Drop the is_kernel_in_hyp_mode() check for an explicit check against
CurrentEL being EL1 or not.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230609162200.2024064-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:17:23 +00:00
Jing Zhang
d86cde6e33 KVM: arm64: Reuse fields of sys_reg_desc for idreg
sys_reg_desc::{reset, val} are presently unused for ID register
descriptors. Repurpose these fields to support user-configurable ID
registers.
Use the ::reset() function pointer to return the sanitised value of a
given ID register, optionally with KVM-specific feature sanitisation.
Additionally, keep a mask of writable register fields in ::val.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:08:33 +00:00
Oliver Upton
f90f9360c3 KVM: arm64: Rewrite IMPDEF PMU version as NI
KVM allows userspace to write an IMPDEF PMU version to the corresponding
32bit and 64bit ID register fields for the sake of backwards
compatibility with kernels that lacked commit 3d0dba5764 ("KVM: arm64:
PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation"). Plumbing
that IMPDEF PMU version through to the gues is getting in the way of
progress, and really doesn't any sense in the first place.

Bite the bullet and reinterpret the IMPDEF PMU version as NI (0) for
userspace writes. Additionally, spill the dirty details into a comment.

Link: https://lore.kernel.org/r/20230609190054.1542113-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:08:33 +00:00
Oliver Upton
2251e9ff15 KVM: arm64: Make vCPU feature flags consistent VM-wide
To date KVM has allowed userspace to construct asymmetric VMs where
particular features may only be supported on a subset of vCPUs. This
wasn't really the intened usage pattern, and it is a total pain in the
ass to keep working in the kernel. What's more, this is at odds with CPU
features in host userspace, where asymmetric features are largely hidden
or disabled.

It's time to put an end to the whole game. Require all vCPUs in the VM
to have the same feature set, rejecting deviants in the
KVM_ARM_VCPU_INIT ioctl. Preserve some of the vestiges of per-vCPU
feature flags in case we need to reinstate the old behavior for some
limited configurations. Yes, this is a sign of cowardice around a
user-visibile change.

Hoist all of the 32-bit limitations into kvm_vcpu_init_check_features()
to avoid nested attempts to acquire the config_lock, which won't end
well.

Link: https://lore.kernel.org/r/20230609190054.1542113-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:08:33 +00:00
Oliver Upton
e3c1c0cae3 KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF
Allow the value of KVM_ARM_VCPU_POWER_OFF to differ between calls to
KVM_ARM_VCPU_INIT. Userspace can already change the state of the vCPU
through the KVM_SET_MP_STATE ioctl, so making the bit invariant seems
needlessly restrictive.

Link: https://lore.kernel.org/r/20230609190054.1542113-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:08:33 +00:00
Oliver Upton
a7a2c72ae0 KVM: arm64: Separate out feature sanitisation and initialisation
kvm_vcpu_set_target() iteratively sanitises and copies feature flags in
one go. This is rather odd, especially considering the fact that bitmap
accessors can do the heavy lifting. A subsequent change will make vCPU
features VM-wide, and fitting that into the present implementation is
just a chore.

Rework the whole thing to use bitmap accessors to sanitise and copy
flags.

Link: https://lore.kernel.org/r/20230609190054.1542113-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-12 23:08:33 +00:00
Mark Rutland
3e35d303ab arm64: module: rework module VA range selection
Currently, the modules region is 128M in size, which is a problem for
some large modules. Shanker reports [1] that the NVIDIA GPU driver alone
can consume 110M of module space in some configurations. We'd like to
make the modules region a full 2G such that we can always make use of a
2G range.

It's possible to build kernel images which are larger than 128M in some
configurations, such as when many debug options are selected and many
drivers are built in. In these configurations, we can't legitimately
select a base for a 128M module region, though we currently select a
value for which allocation will fail. It would be nicer to have a
diagnostic message in this case.

Similarly, in theory it's possible to build a kernel image which is
larger than 2G and which cannot support modules. While this isn't likely
to be the case for any realistic kernel deplyed in the field, it would
be nice if we could print a diagnostic in this case.

This patch reworks the module VA range selection to use a 2G range, and
improves handling of cases where we cannot select legitimate module
regions. We now attempt to select a 128M region and a 2G region:

* The 128M region is selected such that modules can use direct branches
  (with JUMP26/CALL26 relocations) to branch to kernel code and other
  modules, and so that modules can reference data and text (using PREL32
  relocations) anywhere in the kernel image and other modules.

  This region covers the entire kernel image (rather than just the text)
  to ensure that all PREL32 relocations are in range even when the
  kernel data section is absurdly large. Where we cannot allocate from
  this region, we'll fall back to the full 2G region.

* The 2G region is selected such that modules can use direct branches
  with PLTs to branch to kernel code and other modules, and so that
  modules can use reference data and text (with PREL32 relocations) in
  the kernel image and other modules.

  This region covers the entire kernel image, and the 128M region (if
  one is selected).

The two module regions are randomized independently while ensuring the
constraints described above.

[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 17:39:06 +01:00
Mark Rutland
ea3752ba96 arm64: module: mandate MODULE_PLTS
Contemporary kernels and modules can be relatively large, especially
when common debug options are enabled. Using GCC 12.1.0, a v6.3-rc7
defconfig kernel is ~38M, and with PROVE_LOCKING + KASAN_INLINE enabled
this expands to ~117M. Shanker reports [1] that the NVIDIA GPU driver
alone can consume 110M of module space in some configurations.

Both KASLR and ARM64_ERRATUM_843419 select MODULE_PLTS, so anyone
wanting a kernel to have KASLR or run on Cortex-A53 will have
MODULE_PLTS selected. This is the case in defconfig and distribution
kernels (e.g. Debian, Android, etc).

Practically speaking, this means we're very likely to need MODULE_PLTS
and while it's almost guaranteed that MODULE_PLTS will be selected, it
is possible to disable support, and we have to maintain some awkward
special cases for such unusual configurations.

This patch removes the MODULE_PLTS config option, with the support code
always enabled if MODULES is selected. This results in a slight
simplification, and will allow for further improvement in subsequent
patches.

For any config which currently selects MODULE_PLTS, there will be no
functional change as a result of this patch.

[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 17:39:05 +01:00
Mark Rutland
e46b7103ae arm64: module: move module randomization to module.c
When CONFIG_RANDOMIZE_BASE=y, module_alloc_base is a variable which is
configured by kaslr_module_init() in kaslr.c, and otherwise it is an
expression defined in module.h.

As kaslr_module_init() is no longer tightly coupled with the KASLR
initialization code, we can centralize this in module.c.

This patch moves kaslr_module_init() to module.c, making
module_alloc_base a static variable, and removing redundant includes from
kaslr.c. For the defintion of struct arm64_ftr_override we must include
<asm/cpufeature.h>, which was previously included transitively via
another header.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 17:39:05 +01:00
Mark Rutland
6e13b6b923 arm64: kaslr: split kaslr/module initialization
Currently kaslr_init() handles a mixture of detecting/announcing whether
KASLR is enabled, and randomizing the module region depending on whether
KASLR is enabled.

To make it easier to rework the module region initialization, split the
KASLR initialization into two steps:

* kaslr_init() determines whether KASLR should be enabled, and announces
  this choice, recording this to a new global boolean variable. This is
  called from setup_arch() just before the existing call to
  kaslr_requires_kpti() so that this will always provide the expected
  result.

* kaslr_module_init() randomizes the module region when required. This
  is called as a subsys_initcall, where we previously called
  kaslr_init().

As a bonus, moving the KASLR reporting earlier makes it easier to spot
and permits it to be logged via earlycon, making it easier to debug any
issues that could be triggered by KASLR.

Booting a v6.4-rc1 kernel with this patch applied, the log looks like:

| EFI stub: Booting Linux Kernel...
| EFI stub: Generating empty DTB
| EFI stub: Exiting boot services...
| [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
| [    0.000000] Linux version 6.4.0-rc1-00006-g4763a8f8aeb3 (mark@lakrids) (aarch64-linux-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #2 SMP PREEMPT Tue May  9 11:03:37 BST 2023
| [    0.000000] KASLR enabled
| [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
| [    0.000000] printk: bootconsole [pl11] enabled

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 17:39:05 +01:00
Mark Rutland
55123afffe arm64: kasan: remove !KASAN_VMALLOC remnants
Historically, KASAN could be selected with or without KASAN_VMALLOC, but
since commit:

  f6f37d9320 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")

... we can never select KASAN without KASAN_VMALLOC on arm64, and thus
arm64 code for KASAN && !KASAN_VMALLOC is redundant and can be removed.

Remove the redundant code kasan_init.c

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 17:39:05 +01:00
Mark Rutland
8339f7d8e1 arm64: module: remove old !KASAN_VMALLOC logic
Historically, KASAN could be selected with or without KASAN_VMALLOC, and
we had to be very careful where to place modules when KASAN_VMALLOC was
not selected.

However, since commit:

  f6f37d9320 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")

Selecting CONFIG_KASAN on arm64 will also select CONFIG_KASAN_VMALLOC,
and so the logic for handling CONFIG_KASAN without CONFIG_KASAN_VMALLOC
is redundant and can be removed.

Note: the "kasan.vmalloc={on,off}" option which only exists for HW_TAGS
changes whether the vmalloc region is given non-match-all tags, and does
not affect the page table manipulation code.

The VM_DEFER_KMEMLEAK flag was only necessary for !CONFIG_KASAN_VMALLOC
as described in its introduction in commit:

  60115fa54a ("mm: defer kmemleak object creation of module_alloc()")

... and therefore it can also be removed.

Remove the redundant logic for !CONFIG_KASAN_VMALLOC. At the same time,
add the missing braces around the multi-line conditional block in
arch/arm64/kernel/module.c.

Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 17:39:05 +01:00
Quentin Perret
0a9f15fd56 KVM: arm64: pkvm: Add support for fragmented FF-A descriptors
FF-A memory descriptors may need to be sent in fragments when they don't
fit in the mailboxes. Doing so involves using the FRAG_TX and FRAG_RX
primitives defined in the FF-A protocol.

Add support in the pKVM FF-A relayer for fragmented descriptors by
monitoring outgoing FRAG_TX transactions and by buffering large
descriptors on the reclaim path.

Co-developed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230523101828.7328-11-will@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-01 21:34:51 +00:00
Fuad Tabba
20936cd114 KVM: arm64: Handle FFA_FEATURES call from the host
Filter out advertising unsupported features, and only advertise
features and properties that are supported by the hypervisor proxy.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230523101828.7328-10-will@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-01 21:34:51 +00:00
Will Deacon
634d90cf0a KVM: arm64: Handle FFA_MEM_LEND calls from the host
Handle FFA_MEM_LEND calls from the host by treating them identically to
FFA_MEM_SHARE calls for the purposes of the host stage-2 page-table, but
forwarding on the original request to EL3.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230523101828.7328-9-will@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-01 21:34:51 +00:00
Will Deacon
0e3bcb49c1 KVM: arm64: Handle FFA_MEM_RECLAIM calls from the host
Intecept FFA_MEM_RECLAIM calls from the host and transition the host
stage-2 page-table entries from the SHARED_OWNED state back to the OWNED
state once EL3 has confirmed that the secure mapping has been reclaimed.

Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230523101828.7328-8-will@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-01 21:34:51 +00:00