Pull more RISC-V updates from Palmer Dabbelt:
- Support for handling misaligned accesses in S-mode
- Probing for misaligned access support is now properly cached and
handled in parallel
- PTDUMP now reflects the SW reserved bits, as well as the PBMT and
NAPOT extensions
- Performance improvements for TLB flushing
- Support for many new relocations in the module loader
- Various bug fixes and cleanups
* tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
riscv: Optimize bitops with Zbb extension
riscv: Rearrange hwcap.h and cpufeature.h
drivers: perf: Do not broadcast to other cpus when starting a counter
drivers: perf: Check find_first_bit() return value
of: property: Add fw_devlink support for msi-parent
RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs
riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings
riscv: Don't use PGD entries for the linear mapping
RISC-V: Probe misaligned access speed in parallel
RISC-V: Remove __init on unaligned_emulation_finish()
RISC-V: Show accurate per-hart isa in /proc/cpuinfo
RISC-V: Don't rely on positional structure initialization
riscv: Add tests for riscv module loading
riscv: Add remaining module relocations
riscv: Avoid unaligned access when relocating modules
riscv: split cache ops out of dma-noncoherent.c
riscv: Improve flush_tlb_kernel_range()
riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
riscv: Improve flush_tlb_range() for hugetlb pages
riscv: Improve tlb_flush()
...
Pull RISC-V updates from Palmer Dabbelt:
- Support for cbo.zero in userspace
- Support for CBOs on ACPI-based systems
- A handful of improvements for the T-Head cache flushing ops
- Support for software shadow call stacks
- Various cleanups and fixes
* tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
RISC-V: hwprobe: Fix vDSO SIGSEGV
riscv: configs: defconfig: Enable configs required for RZ/Five SoC
riscv: errata: prefix T-Head mnemonics with th.
riscv: put interrupt entries into .irqentry.text
riscv: mm: Update the comment of CONFIG_PAGE_OFFSET
riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause
riscv/mm: Fix the comment for swap pte format
RISC-V: clarify the QEMU workaround in ISA parser
riscv: correct pt_level name via pgtable_l5/4_enabled
RISC-V: Provide pgtable_l5_enabled on rv32
clocksource: timer-riscv: Increase rating of clock_event_device for Sstc
clocksource: timer-riscv: Don't enable/disable timer interrupt
lkdtm: Fix CFI_BACKWARD on RISC-V
riscv: Use separate IRQ shadow call stacks
riscv: Implement Shadow Call Stack
riscv: Move global pointer loading to a macro
riscv: Deduplicate IRQ stack switching
riscv: VMAP_STACK overflow detection thread-safe
RISC-V: cacheflush: Initialize CBO variables on ACPI systems
RISC-V: ACPI: RHCT: Add function to get CBO block sizes
...
Charlie Jenkins <charlie@rivosinc.com> says:
A handful of module relocations were missing, this patch includes the
remaining ones. I also wrote some test cases to ensure that module
loading works properly. Some relocations cannot be supported in the
kernel, these include the ones that rely on thread local storage and
dynamic linking.
This patch also overhauls the implementation of ADD/SUB/SET/ULEB128
relocations to handle overflow. "Overflow" is different for ULEB128
since it is a variable-length encoding that the compiler can be expected
to generate enough space for. Instead of overflowing, ULEB128 will
expand into the next 8-bit segment of the location.
A psABI proposal [1] was merged that mandates that SET_ULEB128 and
SUB_ULEB128 are paired, however the discussion following the merging of
the pull request revealed that while the pull request was valid, it
would be better for linkers to properly handle this overflow. This patch
proactively implements this methodology for future compatibility.
This can be tested by enabling KUNIT, RUNTIME_KERNEL_TESTING_MENU, and
RISCV_MODULE_LINKING_KUNIT.
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/403
* b4-shazam-merge:
riscv: Add tests for riscv module loading
riscv: Add remaining module relocations
riscv: Avoid unaligned access when relocating modules
Link: https://lore.kernel.org/r/20231101-module_relocations-v9-0-8dfa3483c400@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull kvm updates from Paolo Bonzini:
"ARM:
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its
guest
- Optimization for vSGI injection, opportunistically compressing
MPIDR to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems,
reducing the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
LoongArch:
- New architecture for kvm.
The hardware uses the same model as x86, s390 and RISC-V, where
guest/host mode is orthogonal to supervisor/user mode. The
virtualization extensions are very similar to MIPS, therefore the
code also has some similarities but it's been cleaned up to avoid
some of the historical bogosities that are found in arch/mips. The
kernel emulates MMU, timer and CSR accesses, while interrupt
controllers are only emulated in userspace, at least for now.
RISC-V:
- Support for the Smstateen and Zicond extensions
- Support for virtualizing senvcfg
- Support for virtualized SBI debug console (DBCN)
S390:
- Nested page table management can be monitored through tracepoints
and statistics
x86:
- Fix incorrect handling of VMX posted interrupt descriptor in
KVM_SET_LAPIC, which could result in a dropped timer IRQ
- Avoid WARN on systems with Intel IPI virtualization
- Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
without forcing more common use cases to eat the extra memory
overhead.
- Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
SBPB, aka Selective Branch Predictor Barrier).
- Fix a bug where restoring a vCPU snapshot that was taken within 1
second of creating the original vCPU would cause KVM to try to
synchronize the vCPU's TSC and thus clobber the correct TSC being
set by userspace.
- Compute guest wall clock using a single TSC read to avoid
generating an inaccurate time, e.g. if the vCPU is preempted
between multiple TSC reads.
- "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
complain about a "Firmware Bug" if the bit isn't set for select
F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
appease Windows Server 2022.
- Don't apply side effects to Hyper-V's synthetic timer on writes
from userspace to fix an issue where the auto-enable behavior can
trigger spurious interrupts, i.e. do auto-enabling only for guest
writes.
- Remove an unnecessary kick of all vCPUs when synchronizing the
dirty log without PML enabled.
- Advertise "support" for non-serializing FS/GS base MSR writes as
appropriate.
- Harden the fast page fault path to guard against encountering an
invalid root when walking SPTEs.
- Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
- Use the fast path directly from the timer callback when delivering
Xen timer events, instead of waiting for the next iteration of the
run loop. This was not done so far because previously proposed code
had races, but now care is taken to stop the hrtimer at critical
points such as restarting the timer or saving the timer information
for userspace.
- Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
flag.
- Optimize injection of PMU interrupts that are simultaneous with
NMIs.
- Usual handful of fixes for typos and other warts.
x86 - MTRR/PAT fixes and optimizations:
- Clean up code that deals with honoring guest MTRRs when the VM has
non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
- Zap EPT entries when non-coherent DMA assignment stops/start to
prevent using stale entries with the wrong memtype.
- Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y
This was done as a workaround for virtual machine BIOSes that did
not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
to set it, in turn), and there's zero reason to extend the quirk to
also ignore guest PAT.
x86 - SEV fixes:
- Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
SHUTDOWN while running an SEV-ES guest.
- Clean up the recognition of emulation failures on SEV guests, when
KVM would like to "skip" the instruction but it had already been
partially emulated. This makes it possible to drop a hack that
second guessed the (insufficient) information provided by the
emulator, and just do the right thing.
Documentation:
- Various updates and fixes, mostly for x86
- MTRR and PAT fixes and optimizations"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
KVM: selftests: Avoid using forced target for generating arm64 headers
tools headers arm64: Fix references to top srcdir in Makefile
KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
KVM: arm64: selftest: Perform ISB before reading PAR_EL1
KVM: arm64: selftest: Add the missing .guest_prepare()
KVM: arm64: Always invalidate TLB for stage-2 permission faults
KVM: x86: Service NMI requests after PMI requests in VM-Enter path
KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
KVM: arm64: Refine _EL2 system register list that require trap reinjection
arm64: Add missing _EL2 encodings
arm64: Add missing _EL12 encodings
KVM: selftests: aarch64: vPMU test for validating user accesses
KVM: selftests: aarch64: vPMU register test for unimplemented counters
KVM: selftests: aarch64: vPMU register test for implemented counters
KVM: selftests: aarch64: Introduce vpmu_counter_access test
tools: Import arm_pmuv3.h
KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
...
The frozen SBI v2.0 specification defines the SBI debug console
(DBCN) extension which replaces the legacy SBI v0.1 console
functions namely sbi_console_getchar() and sbi_console_putchar().
The SBI DBCN extension needs to be emulated in the KVM user-space
(i.e. QEMU-KVM or KVMTOOL) so we forward SBI DBCN calls from KVM
guest to the KVM user-space which can then redirect the console
input/output to wherever it wants (e.g. telnet, file, stdio, etc).
The SBI debug console is simply a early console available to KVM
guest for early prints and it does not intend to replace the proper
console devices such as 8250, VirtIO console, etc.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zicond extension for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Add senvcfg context save/restore for guest VCPUs and also add it to the
ONE_REG interface to allow its access from user space.
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Configure hstateen0 register so that the AIA state and envcfg are
accessible to the vcpus. This includes registers such as siselect,
sireg, siph, sieh and all the IMISC registers.
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Pull more RISC-V updates from Palmer Dabbelt:
- The kernel now dynamically probes for misaligned access speed, as
opposed to relying on a table of known implementations.
- Support for non-coherent devices on systems using the Andes AX45MP
core, including the RZ/Five SoCs.
- Support for the V extension in ptrace(), again.
- Support for KASLR.
- Support for the BPF prog pack allocator in RISC-V.
- A handful of bug fixes and cleanups.
* tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits)
soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
bpf, riscv: use prog pack allocator in the BPF JIT
riscv: implement a memset like function for text
riscv: extend patch_text_nosync() for multiple pages
bpf: make bpf_prog_pack allocator portable
riscv: libstub: Implement KASLR by using generic functions
libstub: Fix compilation warning for rv32
arm64: libstub: Move KASLR handling functions to kaslr.c
riscv: Dump out kernel offset information on panic
riscv: Introduce virtual kernel mapping KASLR
RISC-V: Add ptrace support for vectors
soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
cache: Add L2 cache management for Andes AX45MP RISC-V core
dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
riscv: mm: dma-noncoherent: nonstandard cache operations support
riscv: errata: Add Andes alternative ports
riscv: asm: vendorid_list: Add Andes Technology to the vendors list
...
Pull kvm updates from Paolo Bonzini:
"ARM:
- Clean up vCPU targets, always returning generic v8 as the preferred
target
- Trap forwarding infrastructure for nested virtualization (used for
traps that are taken from an L2 guest and are needed by the L1
hypervisor)
- FEAT_TLBIRANGE support to only invalidate specific ranges of
addresses when collapsing a table PTE to a block PTE. This avoids
that the guest refills the TLBs again for addresses that aren't
covered by the table PTE.
- Fix vPMU issues related to handling of PMUver.
- Don't unnecessary align non-stack allocations in the EL2 VA space
- Drop HCR_VIRT_EXCP_MASK, which was never used...
- Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
parameter instead
- Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
- Remove prototypes without implementations
RISC-V:
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
s390:
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
x86:
- Clean up KVM's handling of Intel architectural events
- Intel bugfixes
- Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE
update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to
reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
skip it)
- Retry APIC map recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie
the "emergency disabling" behavior to KVM actually being loaded,
and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
TSC ratio MSR cannot diverge from the default when TSC scaling is
disabled up related code
- Add a framework to allow "caching" feature flags so that KVM can
check if the guest can use a feature without needing to search
guest CPUID
- Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
- Fix KVM's handling of !visible guest roots to avoid premature
triple fault injection
- Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
API surface that is needed by external users (currently only
KVMGT), and fix a variety of issues in the process
Generic:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
events to pass action specific data without needing to constantly
update the main handlers.
- Drop unused function declarations
Selftests:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and covert all guest asserts
to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
KVM: x86/mmu: Include mmu.h in spte.h
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Add helper to convert root hpa to shadow page
drm/i915/gvt: Drop final dependencies on KVM internal details
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Assert that correct locks are held for page write-tracking
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86: Remove the unused page-track hook track_flush_slot()
drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
...
This patch add back the ptrace support with the following fix:
- Define NT_RISCV_CSR and re-number NT_RISCV_VECTOR to prevent
conflicting with gdb's NT_RISCV_CSR.
- Use struct __riscv_v_regset_state to handle ptrace requests
Since gdb does not directly include the note description header in
Linux and has already defined NT_RISCV_CSR as 0x900, we decide to
sync with gdb and renumber NT_RISCV_VECTOR to solve and prevent future
conflicts.
Fixes: 0c59922c76 ("riscv: Add ptrace vector support")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com
[Palmer: Drop the unused "size" variable in riscv_vr_set().]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull RISC-V updates from Palmer Dabbelt:
- Support for the new "riscv,isa-extensions" and "riscv,isa-base"
device tree interfaces for probing extensions
- Support for userspace access to the performance counters
- Support for more instructions in kprobes
- Crash kernels can be allocated above 4GiB
- Support for KCFI
- Support for ELFs in !MMU configurations
- ARCH_KMALLOC_MINALIGN has been reduced to 8
- mmap() defaults to sv48-sized addresses, with longer addresses hidden
behind a hint (similar to Arm and Intel)
- Also various fixes and cleanups
* tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V
riscv: support PREEMPT_DYNAMIC with static keys
riscv: Move create_tmp_mapping() to init sections
riscv: Mark KASAN tmp* page tables variables as static
riscv: mm: use bitmap_zero() API
riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B
riscv: remove redundant mv instructions
RISC-V: mm: Document mmap changes
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Restrict address space for sv39,sv48,sv57
riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
riscv: allow kmalloc() caches aligned to the smallest value
riscv: support the elf-fdpic binfmt loader
binfmt_elf_fdpic: support 64-bit systems
riscv: Allow CONFIG_CFI_CLANG to be selected
riscv/purgatory: Disable CFI
riscv: Add CFI error handling
riscv: Add ftrace_stub_graph
riscv: Add types to indirectly called assembly functions
...
KVM/riscv changes for 6.6
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for Guest/VM
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
Greg Ungerer <gerg@kernel.org> says:
The following changes add the ability to run ELF format binaries when
running RISC-V in nommu mode. That support is actually part of the
ELF-FDPIC loader, so these changes are all about making that work on
RISC-V.
The first issue to deal with is making the ELF-FDPIC loader capable of
handling 64-bit ELF files. As coded right now it only supports 32-bit
ELF files.
Secondly some changes are required to enable and compile the ELF-FDPIC
loader on RISC-V and to pass the ELF-FDPIC mapping addresses through to
user space when execing the new program.
These changes have not been used to run actual ELF-FDPIC binaries.
It is used to load and run normal ELF - compiled -pie format. Though the
underlying changes are expected to work with full ELF-FDPIC binaries if
or when that is supported on RISC-V in gcc.
To avoid needing changes to the C-library (tested with uClibc-ng
currently) there is a simple runtime dynamic loader (interpreter)
available to do the final relocations, https://github.com/gregungerer/uldso.
The nice thing about doing it this way is that the same program
binary can also be loaded with the usual ELF loader in MMU linux.
The motivation here is to provide an easy to use alternative to the
flat format binaries normally used for RISC-V nommu based systems.
* b4-shazam-merge:
riscv: support the elf-fdpic binfmt loader
binfmt_elf_fdpic: support 64-bit systems
Link: https://lore.kernel.org/r/20230711130754.481209-1-gerg@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull RISC-V fixes from Palmer Dabbelt:
"This is obviously not ideal, particularly for something this late in
the cycle.
Unfortunately we found some uABI issues in the vector support while
reviewing the GDB port, which has triggered a revert -- probably a
good sign we should have reviewed GDB before merging this, I guess I
just dropped the ball because I was so worried about the context
extension and libc suff I forgot. Hence the late revert.
There's some risk here as we're still exposing the vector context for
signal handlers, but changing that would have meant reverting all of
the vector support. The issues we've found so far have been fixed
already and they weren't absolute showstoppers, so we're essentially
just playing it safe by holding ptrace support for another release (or
until we get through a proper userspace code review).
Summary:
- The vector ucontext extension has been extended with vlenb
- The vector registers ELF core dump note type has been changed to
avoid aliasing with the CSR type used in embedded systems
- Support for accessing vector registers via ptrace() has been
reverted
- Another build fix for the ISA spec changes around Zifencei/Zicsr
that manifests on some systems built with binutils-2.37 and
gcc-11.2"
* tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix build errors using binutils2.37 toolchains
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
RISC-V: Remove ptrace support for vectors
Add support for enabling and using the binfmt_elf_fdpic program loader
on RISC-V platforms. The most important change is to setup registers
during program load to pass the mapping addresses to the new process.
One of the interesting features of the elf-fdpic loader is that it
also allows appropriately compiled ELF format binaries to be loaded on
nommu systems. Appropriate being those compiled with -pie.
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230711130754.481209-3-gerg@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull asm-generic regression fix from Arnd Bergmann:
"Just one partial revert for a commit from the merge window that caused
annoying behavior when building old kernels on arm64 hosts"
* tag 'asm-generic-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: partially revert "Unify uapi bitsperlong.h for arm64, riscv and loongarch"
Unifying the asm-generic headers across 32-bit and 64-bit architectures
based on the compiler provided macros was a good idea and appears to work
with all user space, but it caused a regression when building old kernels
on systems that have the new headers installed in /usr/include, as this
combination trips an inconsistency in the kernel's own tools/include
headers that are a mix of userspace and kernel-internal headers.
This affects kernel builds on arm64, riscv64 and loongarch64 systems that
might end up using the "#define __BITS_PER_LONG 32" default from the old
tools headers. Backporting the commit into stable kernels would address
this, but it would still break building kernels without that backport,
and waste time for developers trying to understand the problem.
arm64 build machines are rather common, and on riscv64 this can also
happen in practice, but loongarch64 is probably new enough to not
be used much for building old kernels, so only revert the bits
for arm64 and riscv.
Link: https://lore.kernel.org/all/20230731160402.GB1823389@dev-arch.thelio-3990X/
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 8386f58f8d ("asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch")
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
KVM userspaces need to be aware of the host SATP to allow them to
advertise it back to the guest OS.
Since this information is used to build the guest FDT we can't wait for
the SATP reg to be readable. We just need to read the SATP mode, thus
we can use the existing 'satp_mode' global that represents the SATP reg
with MODE set and both ASID and PPN cleared. E.g. for a 32 bit host
running with sv32 satp_mode is 0x80000000, for a 64 bit host running
sv57 satp_mode is 0xa000000000000000, and so on.
Add a new userspace virtual config register 'satp_mode' to allow
userspace to read the current SATP mode the host is using with
GET_ONE_REG API before spinning the vcpu.
'satp_mode' can't be changed via KVM, so SET_ONE_REG is allowed as long
as userspace writes the existing 'satp_mode'.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zicntr, Zicsr, Zifencei, and Zihpm
extensions for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zba and Zbs extensions for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Currently, the ISA extension ONE_REG interface only allows enabling or
disabling one extension at a time. To improve this, we extend the ISA
extension ONE_REG interface (similar to SBI extension ONE_REG interface)
so that KVM user space can enable/disable multiple extensions in one
ioctl.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Pull more RISC-V updates from Palmer Dabbelt:
- A bunch of fixes/cleanups from the first part of the merge window,
mostly related to ACPI and vector as those were large
- Some documentation improvements, mostly related to the new code
- The "riscv,isa" DT key is deprecated
- Support for link-time dead code elimination
- Support for minor fault registration in userfaultd
- A handful of cleanups around CMO alternatives
* tag 'riscv-for-linus-6.5-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (23 commits)
riscv: mm: mark noncoherent_supported as __ro_after_init
riscv: mm: mark CBO relate initialization funcs as __init
riscv: errata: thead: only set cbom size & noncoherent during boot
riscv: Select HAVE_ARCH_USERFAULTFD_MINOR
RISC-V: Document the ISA string parsing rules for ACPI
risc-v: Fix order of IPI enablement vs RCU startup
mm: riscv: fix an unsafe pte read in huge_pte_alloc()
dt-bindings: riscv: deprecate riscv,isa
RISC-V: drop error print from riscv_hartid_to_cpuid()
riscv: Discard vector state on syscalls
riscv: move memblock_allow_resize() after linear mapping is ready
riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle
riscv: vdso: include vdso/vsyscall.h for vdso_data
selftests: Test RISC-V Vector's first-use handler
riscv: vector: clear V-reg in the first-use trap
riscv: vector: only enable interrupts in the first-use trap
RISC-V: Fix up some vector state related build failures
RISC-V: Document that V registers are clobbered on syscalls
riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
...
Pull asm-generic updates from Arnd Bergmann:
"These are cleanups for architecture specific header files:
- the comments in include/linux/syscalls.h have gone out of sync and
are really pointless, so these get removed
- The asm/bitsperlong.h header no longer needs to be architecture
specific on modern compilers, so use a generic version for newer
architectures that use new enough userspace compilers
- A cleanup for virt_to_pfn/virt_to_bus to have proper type checking,
forcing the use of pointers"
* tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: Remove file path comments from headers
tools arch: Remove uapi bitsperlong.h of hexagon and microblaze
asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
m68k/mm: Make pfn accessors static inlines
arm64: memory: Make virt_to_pfn() a static inline
ARM: mm: Make virt_to_pfn() a static inline
asm-generic/page.h: Make pfn accessors static inlines
xen/netback: Pass (void *) to virt_to_page()
netfs: Pass a pointer to virt_to_page()
cifs: Pass a pointer to virt_to_page() in cifsglob
cifs: Pass a pointer to virt_to_page()
riscv: mm: init: Pass a pointer to virt_to_page()
ARC: init: Pass a pointer to virt_to_pfn() in init
m68k: Pass a pointer to virt_to_pfn() virt_to_page()
fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of hugepage splitting in the
stage-2 fault path.
- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
with services that live in the Secure world. pKVM intervenes on
FF-A calls to guarantee the host doesn't misuse memory donated to
the hyp or a pKVM guest.
- Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
- Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set
configuration from userspace, but the intent is to relax this
limitation and allow userspace to select a feature set consistent
with the CPU.
- Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
- Use a separate set of pointer authentication keys for the
hypervisor when running in protected mode, as the host is untrusted
at runtime.
- Ensure timer IRQs are consistently released in the init failure
paths.
- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
Traps (FEAT_EVT), as it is a register commonly read from userspace.
- Erratum workaround for the upcoming AmpereOne part, which has
broken hardware A/D state management.
RISC-V:
- Redirect AMO load/store misaligned traps to KVM guest
- Trap-n-emulate AIA in-kernel irqchip for KVM guest
- Svnapot support for KVM Guest
s390:
- New uvdevice secret API
- CMM selftest and fixes
- fix racy access to target CPU for diag 9c
x86:
- Fix missing/incorrect #GP checks on ENCLS
- Use standard mmu_notifier hooks for handling APIC access page
- Drop now unnecessary TR/TSS load after VM-Exit on AMD
- Print more descriptive information about the status of SEV and
SEV-ES during module load
- Add a test for splitting and reconstituting hugepages during and
after dirty logging
- Add support for CPU pinning in demand paging test
- Add support for AMD PerfMonV2, with a variety of cleanups and minor
fixes included along the way
- Add a "nx_huge_pages=never" option to effectively avoid creating NX
hugepage recovery threads (because nx_huge_pages=off can be toggled
at runtime)
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
- Fix output of PIC poll command emulation when there's an interrupt
- Add a maintainer's handbook to document KVM x86 processes,
preferred coding style, testing expectations, etc.
- Misc cleanups, fixes and comments
Generic:
- Miscellaneous bugfixes and cleanups
Selftests:
- Generate dependency files so that partial rebuilds work as
expected"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
Documentation/process: Add a maintainer handbook for KVM x86
Documentation/process: Add a label for the tip tree handbook's coding style
KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
RISC-V: KVM: Remove unneeded semicolon
RISC-V: KVM: Allow Svnapot extension for Guest/VM
riscv: kvm: define vcpu_sbi_ext_pmu in header
RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel emulation of AIA APLIC
RISC-V: KVM: Implement device interface for AIA irqchip
RISC-V: KVM: Skeletal in-kernel AIA irqchip support
RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
RISC-V: KVM: Add APLIC related defines
RISC-V: KVM: Add IMSIC related defines
RISC-V: KVM: Implement guest external interrupt line management
KVM: x86: Remove PRIx* definitions as they are solely for user space
s390/uv: Update query for secret-UVCs
s390/uv: replace scnprintf with sysfs_emit
s390/uvdevice: Add 'Lock Secret Store' UVC
...
I get a few build failures along the lines of
./arch/riscv/include/uapi/asm/sigcontext.h:19:36: error: field ‘v_state’ has incomplete type
19 | struct __riscv_v_ext_state v_state;
| ^~~~~~~
./arch/riscv/include/uapi/asm/sigcontext.h:32:49: error: field ‘sc_extdesc’ has incomplete type
32 | struct __riscv_extra_ext_header sc_extdesc;
The V structures in question are defined for !assembly, so let's just do
so for the others.
Fixes: 8ee0b41898 ("riscv: signal: Add sigcontext save/restore for vector")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230619172101.18692-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Svnapot extension for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Evan Green <evan@rivosinc.com> says:
This change detects the presence of Zba, Zbb, and Zbs extensions and exports
them per-hart to userspace via the hwprobe mechanism. Glibc can then use
these in setting up hwcaps-based library search paths.
There's a little bit of extra housekeeping here: the first change adds
Zba and Zbs to the set of extensions the kernel recognizes, and the second
change starts tracking ISA features per-hart (in addition to the ANDed
mask of features across all harts which the kernel uses to make
decisions). Now that we track the ISA information per-hart, we could
even fix up /proc/cpuinfo to accurately report extension per-hart,
though I've left that out of this series for now.
* b4-shazam-merge:
RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
RISC-V: Track ISA extensions per hart
RISC-V: Add Zba, Zbs extension probing
Link: https://lore.kernel.org/r/20230509182504.2997252-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We expose IMSIC registers as KVM device attributes of the in-kernel
AIA irqchip device. This will allow KVM user-space to save/restore
IMISC state of each VCPU using KVM device ioctls().
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We expose APLIC registers as KVM device attributes of the in-kernel
AIA irqchip device. This will allow KVM user-space to save/restore
APLIC state using KVM device ioctls().
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We implement KVM device interface for in-kernel AIA irqchip so that
user-space can use KVM device ioctls to create, configure, and destroy
in-kernel AIA irqchip.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
To incrementally implement in-kernel AIA irqchip support, we first
add minimal skeletal support which only compiles but does not provide
any functionality.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The vector register belongs to the signal context. They need to be stored
and restored as entering and leaving the signal handler. According to the
V-extension specification, the maximum length of the vector registers can
be 2^16. Hence, if userspace refers to the MINSIGSTKSZ to create a
sigframe, it may not be enough. To resolve this problem, this patch refers
to the commit 94b07c1f8c
("arm64: signal: Report signal frame size to userspace via auxv") to enable
userspace to know the minimum required sigframe size through the auxiliary
vector and use it to allocate enough memory for signal context.
Note that auxv always reports size of the sigframe as if V exists for
all starting processes, whenever the kernel has CONFIG_RISCV_ISA_V. The
reason is that users usually reference this value to allocate an
alternative signal stack, and the user may use V anytime. So the user
must reserve a space for V-context in sigframe in case that the signal
handler invokes after the kernel allocating V.
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-16-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This patch facilitates the existing fp-reserved words for placement of
the first extension's context header on the user's sigframe. A context
header consists of a distinct magic word and the size, including the
header itself, of an extension on the stack. Then, the frame is followed
by the context of that extension, and then a header + context body for
another extension if exists. If there is no more extension to come, then
the frame must be ended with a null context header. A special case is
rv64gc, where the kernel support no extensions requiring to expose
additional regfile to the user. In such case the kernel would place the
null context header right after the first reserved word of
__riscv_q_ext_state when saving sigframe. And the kernel would check if
all reserved words are zeros when a signal handler returns.
__riscv_q_ext_state---->| |<-__riscv_extra_ext_header
~ ~
.reserved[0]--->|0 |<- .reserved
<-------|magic |<- .hdr
| |size |_______ end of sc_fpregs
| |ext-bdy|
| ~ ~
+)size ------->|magic |<- another context header
|size |
|ext-bdy|
~ ~
|magic:0|<- null context header
|size:0 |
The vector registers will be saved in datap pointer. The datap pointer
will be allocated dynamically when the task needs in kernel space. On
the other hand, datap pointer on the sigframe will be set right after
the __riscv_v_ext_state data structure.
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Suggested-by: Vineet Gupta <vineetg@rivosinc.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-15-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add vector state context struct to be added later in thread_struct. And
prepare low-level helper functions to save/restore vector contexts.
This include Vector Regfile and CSRs holding dynamic configuration state
(vstart, vl, vtype, vcsr). The Vec Register width could be implementation
defined, but same for all processes, so that is saved separately.
This is not yet wired into final thread_struct - will be done when
__switch_to actually starts doing this in later patches.
Given the variable (and potentially large) size of regfile, they are
saved in dynamically allocated memory, pointed to by datap pointer in
__riscv_v_ext_state.
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-10-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>