Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert the left-over
uses in the s390/topology code.
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390 sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during
setup_arch() to deactivate the "elfcorehdr= kernel" parameter,
resulting in is_kdump_kernel() returning "false".
During vmcore_init()->elfcorehdr_alloc(), if on a dump kernel and
allocation succeeded, elfcorehdr_addr will be set to a valid address
and is_kdump_kernel() will consequently return "true".
is_kdump_kernel() should return a consistent result during all boot
stages, and properly return "true" if in a kdump environment - just
like it is done on powerpc where "false" is indicated in fadump
environments, as added in commit b098f1c323 ("powerpc/fadump: make
is_kdump_kernel() return false when fadump is active").
Similarly provide a custom is_kdump_kernel() implementation that will only
return "true" in kdump environments, and will do so consistently during
boot.
Update the documentation of dump_available().
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/20241023090651.1115507-1-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
kernel_page_present() was intentionally not implemented when adding
ARCH_HAS_SET_DIRECT_MAP support, since it was only used for suspend/resume
which is not supported anymore on s390.
A new bpf use case led to a compile error specific to s390. Even though
this specific use case went away implement kernel_page_present(), so that
the API is complete and potential future users won't run into this problem.
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/045de961-ac69-40cc-b141-ab70ec9377ec@iogearbox.net
Fixes: 0490d6d7ba ("s390/mm: enable ARCH_HAS_SET_DIRECT_MAP")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Compiling the kernel with clang W=2 produces a warning that the
parameter declarations in some routines would shadow the definition of
the global variable stfle_fac_list. Address this warning by renaming the
parameters to fac_list.
Fixes: 17e89e1340 ("s390/facilities: move stfl information from lowcore to global data")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over
sprintf for presenting attributes to user space. Convert over two
left-over uses in the s390 PCI code.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Set PMU enabled bit after lpp() call. This ensures the proper
task PID is loaded by the llp instruction into Program Parameter
register from where it is copied to each sampling data buffer
(SDB) sample entry after the sampling was enabled.
The barrier() instruction is not needed as cpumsf_pmu_enable()
changes a CPU specific variable. Only the CPU that task is
running on changes structure members.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Ensure that VFs used in isolation, that is with their parent PF not
visible to the configuration but with their RID exposed, are treated
compatibly with existing isolated VF use cases without exposed RID
including RoCE Express VFs. This allows creating configurations where
one LPAR manages PFs while their child VFs are used by other LPARs. This
gives the LPAR managing the PFs a role analogous to that of the
hypervisor in a typical use case of passing child VFs to guests.
Instead of creating a multifunction struct zpci_bus whenever a PCI
function with RID exposed is discovered only create such a bus for
configured physical functions and only consider multifunction busses
when searching for an existing bus. Additionally only set zdev->devfn to
the devfn part of the RID once the function is added to a multifunction
bus.
This also fixes probing of more than 7 such isolated VFs from the same
physical bus. This is because common PCI code in pci_scan_slot() only
looks for more functions when pdev->multifunction is set which somewhat
counter intutively is not the case for VFs.
Note that PFs are looked at before their child VFs is guaranteed because
we sort the zpci_list by RID ascending.
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The newly introduced topology ID (TID) field in the CLP Query PCI
Function explicitly identifies groups of PCI functions whose RIDs belong
to the same (sub-)topology. When available use the TID instead of the
PCHID to match zPCI busses/domains for multi-function devices. Note that
currently only a single PCI bus per TID is supported. This change is
required because in future machines the PCHID will not identify a PCI
card but a specific port in the case of some multi-port NICs while from
a PCI point of view the entire card is a subtopology.
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Instead of relying on the observed but not architected firmware behavior
that PCI functions from the same card are listed in ascending RID order
in clp_list_pci() ensure this by sorting. To allow for sorting separate
the initial clp_list_pci() and creation of the virtual PCI busses.
Note that fundamentally in our per-PCI function hotplug design non RID
order of discovery is still possible. For example when the two PFs of
a two port NIC are hotplugged after initial boot and in descending RID
order. In this case the virtual PCI bus would be created by the second
PF using that PF's UID as domain number instead of that of the first PF.
Thus the domain number would then change from the UID of the second PF
to that of the first PF on reboot but there is really nothing we can do
about that since changing domain numbers at runtime seems even worse.
This only impacts the domain number as the RIDs are consistent and thus
even with just the second PF visible it will show up in the correct
position on the virtual bus.
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix pKVM error path on init, making sure we do not change critical
system registers as we're about to fail
- Make sure that the host's vector length is at capped by a value
common to all CPUs
- Fix kvm_has_feat*() handling of "negative" features, as the current
code is pretty broken
- Promote Joey to the status of official reviewer, while James steps
down -- hopefully only temporarly
x86:
- Fix compilation with KVM_INTEL=KVM_AMD=n
- Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use
Selftests:
- Fix compilation on non-x86 architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86/reboot: emergency callbacks are now registered by common KVM code
KVM: x86: leave kvm.ko out of the build if no vendor module is requested
KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
KVM: arm64: Fix kvm_has_feat*() handling of negative features
KVM: selftests: Fix build on architectures other than x86_64
KVM: arm64: Another reviewer reshuffle
KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
Pull powerpc fix from Michael Ellerman:
- Allow r30 to be used in vDSO code generation of getrandom
Thanks to Jason A. Donenfeld
* tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/vdso: allow r30 in vDSO code generation of getrandom
KVM/arm64 fixes for 6.12, take #1
- Fix pKVM error path on init, making sure we do not change critical
system registers as we're about to fail
- Make sure that the host's vector length is at capped by a value
common to all CPUs
- Fix kvm_has_feat*() handling of "negative" features, as the current
code is pretty broken
- Promote Joey to the status of official reviewer, while James steps
down -- hopefully only temporarly
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d8 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222 ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
It provides no functionality on its own and it is unnecessary unless one
of the vendor-specific module is compiled. In particular, /dev/kvm is
not created until one of kvm-intel.ko or kvm-amd.ko is loaded.
Use CONFIG_KVM to decide if it is built-in or a module, but use the
vendor-specific modules for the actual decision on whether to build it.
This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
are both disabled. The cpu_emergency_register_virt_callback() function
is called from kvm.ko, but it is only defined if at least one of
CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.
Fixes: 590b09b1d8 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull arm64 fixes from Catalin Marinas:
"A couple of build/config issues and expanding the speculative SSBS
workaround to more CPUs:
- Expand the speculative SSBS workaround to cover Cortex-A715,
Neoverse-N3 and Microsoft Azure Cobalt 100
- Force position-independent veneers - in some kernel configurations,
the LLD linker generates position-dependent veneers for otherwise
position-independent code, resulting in early boot-time failures
- Fix Kconfig selection of HAVE_DYNAMIC_FTRACE_WITH_ARGS so that it
is not enabled when not supported by the combination of clang and
GNU ld"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Subscribe Microsoft Azure Cobalt 100 to erratum 3194386
arm64: fix selection of HAVE_DYNAMIC_FTRACE_WITH_ARGS
arm64: errata: Expand speculative SSBS workaround once more
arm64: cputype: Add Neoverse-N3 definitions
arm64: Force position-independent veneers
Pull RISC-V fixes from Palmer Dabbelt:
- PERF_TYPE_BREAKPOINT now returns -EOPNOTSUPP instead of -ENOENT,
which aligns to other ports and is a saner value
- The KASAN-related stack size increasing logic has been moved to a C
header, to avoid dependency issues
* tag 'riscv-for-linus-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix kernel stack size when KASAN is enabled
drivers/perf: riscv: Align errno for unsupported perf event
Pull tracing fixes from Steven Rostedt:
- Fix tp_printk command line option crashing the kernel
With the code that can handle a buffer from a previous boot, the
trace_check_vprintf() needed access to the delta of the address space
used by the old buffer and the current buffer. To do so, the
trace_array (tr) parameter was used. But when tp_printk is enabled on
the kernel command line, no trace buffer is used and the trace event
is sent directly to printk(). That meant the tr field of the iterator
descriptor was NULL, and since tp_printk still uses
trace_check_vprintf() it caused a NULL dereference.
- Add ptrace.h include to x86 ftrace file for completeness
- Fix rtla installation when done with out-of-tree build
- Fix the help messages in rtla that were incorrect
- Several fixes to fix races with the timerlat and hwlat code
Several locking issues were discovered with the coordination between
timerlat kthread creation and hotplug. As timerlat has callbacks from
hotplug code to start kthreads when CPUs come online. There are also
locking issues with grabbing the cpu_read_lock() and the locks within
timerlat.
* tag 'trace-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/hwlat: Fix a race during cpuhp processing
tracing/timerlat: Fix a race during cpuhp processing
tracing/timerlat: Drop interface_lock in stop_kthread()
tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline
x86/ftrace: Include <asm/ptrace.h>
rtla: Fix the help text in osnoise and timerlat top tools
tools/rtla: Fix installation from out-of-tree build
tracing: Fix trace_check_vprintf() when tp_printk is used
As was tried in commit 4e103134b8 ("KVM: x86/mmu: Zap only the relevant
pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs,
need to be zapped. All of the accounting for a shadow page is tied to the
memslot, i.e. the shadow page holds a reference to the memslot, for all
intents and purposes. Deleting the memslot without removing all relevant
shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled,
results in NULL pointer derefs when tearing down the VM.
Reintroduce from that commit the code that walks the whole memslot when
there are active shadow MMU pages.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver reports that the kvm_has_feat() helper is not behaviing as
expected for negative feature. On investigation, the main issue
seems to be caused by the following construct:
#define get_idreg_field(kvm, id, fld) \
(id##_##fld##_SIGNED ? \
get_idreg_field_signed(kvm, id, fld) : \
get_idreg_field_unsigned(kvm, id, fld))
where one side of the expression evaluates as something signed,
and the other as something unsigned. In retrospect, this is totally
braindead, as the compiler converts this into an unsigned expression.
When compared to something that is 0, the test is simply elided.
Epic fail. Similar issue exists in the expand_field_sign() macro.
The correct way to handle this is to chose between signed and unsigned
comparisons, so that both sides of the ternary expression are of the
same type (bool).
In order to keep the code readable (sort of), we introduce new
comparison primitives taking an operator as a parameter, and
rewrite the kvm_has_feat*() helpers in terms of these primitives.
Fixes: c62d7a23b9 ("KVM: arm64: Add feature checking helpers")
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Tested-by: Oliver Upton <oliver.upton@linux.dev>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241002204239.2051637-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
Declarations local to arch/*/kernel/*.c are better off *not* in a public
header - arch/arc/kernel/unaligned.h is just fine for those
bits.
Unlike the parisc case, here we have an extra twist - asm/mmu.h
has an implicit dependency on struct pt_regs, and in some users
that used to be satisfied by include of asm/ptrace.h from
asm/unaligned.h (note that asm/mmu.h itself did _not_ pull asm/unaligned.h
- it relied upon the users having pulled asm/unaligned.h before asm/mmu.h
got there).
Seeing that asm/mmu.h only wants struct pt_regs * arguments in
an extern, just pre-declare it there - less brittle that way.
With that done _all_ asm/unaligned.h instances are reduced to include
of asm-generic/unaligned.h and can be removed - unaligned.h is in
mandatory-y in include/asm-generic/Kbuild.
What's more, we can move asm-generic/unaligned.h to linux/unaligned.h
and switch includes of <asm/unaligned.h> to <linux/unaligned.h>; that's
better off as an auto-generated commit, though, to be done by Linus
at -rc1 time next cycle.
Acked-by: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Declarations local to arch/*/kernel/*.c are better off *not* in a public
header - arch/parisc/kernel/unaligned.h is just fine for those
bits.
With that done parisc asm/unaligned.h is reduced to include
of asm-generic/unaligned.h and can be removed - unaligned.h is in
mandatory-y in include/asm-generic/Kbuild.
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When pKVM saves and restores the host floating point state on a SVE system,
it programs the vector length in ZCR_EL2.LEN to be whatever the maximum VL
for the PE is. But it uses a buffer allocated with kvm_host_sve_max_vl, the
maximum VL shared by all PEs in the system. This means that if we run on a
system where the maximum VLs are not consistent, we will overflow the buffer
on PEs which support larger VLs.
Since the host will not currently attempt to make use of non-shared VLs, fix
this by explicitly setting the EL2 VL to be the maximum shared VL when we
save and restore. This will enforce the limit on host VL usage. Should we
wish to support asymmetric VLs, this code will need to be updated along with
the required changes for the host:
https://lore.kernel.org/r/20240730-kvm-arm64-fix-pkvm-sve-vl-v6-0-cae8a2e0bd66@kernel.org
Fixes: b5b9955617 ("KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM")
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240912-kvm-arm64-limit-guest-vl-v2-1-dd2c29cb2ac9@kernel.org
[maz: added punctuation to the commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
The Kconfig logic to select HAVE_DYNAMIC_FTRACE_WITH_ARGS is incorrect,
and HAVE_DYNAMIC_FTRACE_WITH_ARGS may be selected when it is not
supported by the combination of clang and GNU LD, resulting in link-time
errors:
aarch64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.meminit.data' in mm/sparse.o] sections
aarch64-linux-gnu-ld: final link failed: bad value
... which can be seen when building with CC=clang using a binutils
version older than 2.36.
We originally fixed that in commit:
45bd895180 ("arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang")
... by splitting the "select HAVE_DYNAMIC_FTRACE_WITH_ARGS" statement
into separete CLANG_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS and
GCC_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS options which individually select
HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Subsequently we accidentally re-introduced the common "select
HAVE_DYNAMIC_FTRACE_WITH_ARGS" statement in commit:
26299b3f6b ("ftrace: arm64: move from REGS to ARGS")
... then we removed it again in commit:
68a63a412d ("arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y")
... then we accidentally re-introduced it again in commit:
2aa6ac0351 ("arm64: ftrace: Add direct call support")
Fix this for the third time by keeping the unified select statement and
making this depend onf either GCC_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS or
CLANG_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS. This is more consistent with
usual style and less likely to go wrong in future.
Fixes: 2aa6ac0351 ("arm64: ftrace: Add direct call support")
Cc: <stable@vger.kernel.org> # 6.4.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240930120448.3352564-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A number of Arm Ltd CPUs suffer from errata whereby an MSR to the SSBS
special-purpose register does not affect subsequent speculative
instructions, permitting speculative store bypassing for a window of
time.
We worked around this for a number of CPUs in commits:
* 7187bb7d0b ("arm64: errata: Add workaround for Arm errata 3194386 and 3312417")
* 75b3c43eab ("arm64: errata: Expand speculative SSBS workaround")
* 145502cac7ea70b5 ("arm64: errata: Expand speculative SSBS workaround (again)")
Since then, a (hopefully final) batch of updates have been published,
with two more affected CPUs. For the affected CPUs the existing
mitigation is sufficient, as described in their respective Software
Developer Errata Notice (SDEN) documents:
* Cortex-A715 (MP148) SDEN v15.0, erratum 3456084https://developer.arm.com/documentation/SDEN-2148827/1500/
* Neoverse-N3 (MP195) SDEN v5.0, erratum 3456111
https://developer.arm.com/documentation/SDEN-3050973/0500/
Enable the existing mitigation by adding the relevant MIDRs to
erratum_spec_ssbs_list, and update silicon-errata.rst and the
Kconfig text accordingly.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240930111705.3352047-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Certain portions of code always need to be position-independent
regardless of CONFIG_RELOCATABLE, including code which is executed in an
idmap or which is executed before relocations are applied. In some
kernel configurations the LLD linker generates position-dependent
veneers for such code, and when executed these result in early boot-time
failures.
Marc Zyngier encountered a boot failure resulting from this when
building a (particularly cursed) configuration with LLVM, as he reported
to the list:
https://lore.kernel.org/linux-arm-kernel/86wmjwvatn.wl-maz@kernel.org/
In Marc's kernel configuration, the .head.text and .rodata.text sections
end up more than 128MiB apart, requiring a veneer to branch between the
two:
| [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w _text
| ffff800080000000 g .head.text 0000000000000000 _text
| [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w primary_entry
| ffff8000889df0e0 g .rodata.text 000000000000006c primary_entry,
... consequently, LLD inserts a position-dependent veneer for the branch
from _stext (in .head.text) to primary_entry (in .rodata.text):
| ffff800080000000 <_text>:
| ffff800080000000: fa405a4d ccmp x18, #0x0, #0xd, pl // pl = nfrst
| ffff800080000004: 14003fff b ffff800080010000 <__AArch64AbsLongThunk_primary_entry>
...
| ffff800080010000 <__AArch64AbsLongThunk_primary_entry>:
| ffff800080010000: 58000050 ldr x16, ffff800080010008 <__AArch64AbsLongThunk_primary_entry+0x8>
| ffff800080010004: d61f0200 br x16
| ffff800080010008: 889df0e0 .word 0x889df0e0
| ffff80008001000c: ffff8000 .word 0xffff8000
... and as this is executed early in boot before the kernel is mapped in
TTBR1 this results in a silent boot failure.
Fix this by passing '--pic-veneer' to the linker, which will cause the
linker to use position-independent veneers, e.g.
| ffff800080000000 <_text>:
| ffff800080000000: fa405a4d ccmp x18, #0x0, #0xd, pl // pl = nfrst
| ffff800080000004: 14003fff b ffff800080010000 <__AArch64ADRPThunk_primary_entry>
...
| ffff800080010000 <__AArch64ADRPThunk_primary_entry>:
| ffff800080010000: f004e3f0 adrp x16, ffff800089c8f000 <__idmap_text_start>
| ffff800080010004: 91038210 add x16, x16, #0xe0
| ffff800080010008: d61f0200 br x16
I've opted to pass '--pic-veneer' unconditionally, as:
* In addition to solving the boot failure, these sequences are generally
nicer as they require fewer instructions and don't need to perform
data accesses.
* While the position-independent veneer sequences have a limited +/-2GiB
range, this is not a new restriction. Even kernels built with
CONFIG_RELOCATABLE=n are limited to 2GiB in size as we have several
structues using 32-bit relative offsets and PPREL32 relocations, which
are similarly limited to +/-2GiB in range. These include extable
entries, jump table entries, and alt_instr entries.
* GNU LD defaults to using position-independent veneers, and supports
the same '--pic-veneer' option, so this change is not expected to
adversely affect GNU LD.
I've tested with GNU LD 2.30 to 2.42 inclusive and LLVM 13.0.1 to 19.1.0
inclusive, using the kernel.org binaries from:
* https://mirrors.edge.kernel.org/pub/tools/crosstool/
* https://mirrors.edge.kernel.org/pub/tools/llvm/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Marc Zyngier <maz@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240927101838.3061054-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For gettimeofday, -ffixed-r30 was passed to work around a bug in Go
code, where the vDSO trampoline forgot to save and restore this register
across function calls. But Go requires a different trampoline for every
call, and there's no reason that new Go code needs to be broken and add
more bugs. So remove -ffixed-r30 for getrandom.
Fixes: 8072b39c3a ("powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240925175021.1526936-2-Jason@zx2c4.com
The cpu_emergency_register_virt_callback() function is used
unconditionally by the x86 kvm code, but it is declared (and defined)
conditionally:
#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
...
leading to a build error when neither KVM_INTEL nor KVM_AMD support is
enabled:
arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’:
arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration]
12517 | cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’:
arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration]
12522 | cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix the build by defining empty helper functions the same way the old
cpu_emergency_disable_virtualization() function was dealt with for the
same situation.
Maybe we could instead have made the call sites conditional, since the
callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak
fallback. I'll leave that to the kvm people to argue about, this at
least gets the build going for that particular config.
Fixes: 590b09b1d8 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Chao Gao <chao.gao@intel.com>
Cc: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Ingo Molnar:
"Fix TDX MMIO #VE fault handling, and add two new Intel model numbers
for 'Pantherlake' and 'Diamond Rapids'"
* tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add two Intel CPU model numbers
x86/tdx: Fix "in-kernel MMIO" check
Pull x86 kvm updates from Paolo Bonzini:
"x86:
- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.
This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.
Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.
- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)
- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs
This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset
- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler
- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit
- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)
- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU
- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB
- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area
- Remove unnecessary accounting of temporary nested VMCB related
allocations
- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2
- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures
- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)
- Minor SGX fix and a cleanup
- Misc cleanups
Generic:
- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed
- Enable virtualization when KVM is loaded, not right before the
first VM is created
Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled
- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase
- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs
Selftests:
- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest
- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries
- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath
- Misc cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...
Pull more s390 updates from Vasily Gorbik:
- Clean up and improve vdso code: use SYM_* macros for function and
data annotations, add CFI annotations to fix GDB unwinding, optimize
the chacha20 implementation
- Add vfio-ap driver feature advertisement for use by libvirt and
mdevctl
* tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Driver feature advertisement
s390/vdso: Use one large alternative instead of an alternative branch
s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects
tools: Add additional SYM_*() stubs to linkage.h
s390/vdso: Use macros for annotation of asm functions
s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack()
s390/vdso: Fix comment within __arch_chacha20_blocks_nostack()
s390/vdso: Get rid of permutation constants
Pull UML updates from Richard Weinberger:
- Removal of dead code (TT mode leftovers, etc)
- Fixes for the network vector driver
- Fixes for time-travel mode
* tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: fix time-travel syscall scheduling hack
um: Remove outdated asm/sysrq.h header
um: Remove the declaration of user_thread function
um: Remove the call to SUBARCH_EXECVE1 macro
um: Remove unused mm_fd field from mm_id
um: Remove unused fields from thread_struct
um: Remove the redundant newpage check in update_pte_range
um: Remove unused kpte_clear_flush macro
um: Remove obsoleted declaration for execute_syscall_skas
user_mode_linux_howto_v2: add VDE vector support in doc
vector_user: add VDE support
um: remove ARCH_NO_PREEMPT_DYNAMIC
um: vector: Fix NAPI budget handling
um: vector: Replace locks guarding queue depth with atomics
um: remove variable stack array in os_rcv_fd_msg()
Pull more random number generator updates from Jason Donenfeld:
- Christophe realized that the LoongArch64 instructions could be
scheduled more similar to how GCC generates code, which Ruoyao
implemented, for a 5% speedup from basically some rearrangements
- An update to MAINTAINERS to match the right files
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
LoongArch: vDSO: Tune chacha implementation
MAINTAINERS: make vDSO getrandom matches more generic
Pull LoongArch updates from Huacai Chen:
- Fix objtool about do_syscall() and Clang
- Enable generic CPU vulnerabilites support
- Enable ACPI BGRT handling
- Rework CPU feature probe from CPUCFG/IOCSR
- Add ARCH_HAS_SET_MEMORY support
- Add ARCH_HAS_SET_DIRECT_MAP support
- Improve hardware page table walker
- Simplify _percpu_read() and _percpu_write()
- Add advanced extended IRQ model documentions
- Some bug fixes and other small changes
* tag 'loongarch-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
Docs/LoongArch: Add advanced extended IRQ model description
LoongArch: Remove posix_types.h include from sigcontext.h
LoongArch: Fix memleak in pci_acpi_scan_root()
LoongArch: Simplify _percpu_read() and _percpu_write()
LoongArch: Improve hardware page table walker
LoongArch: Add ARCH_HAS_SET_DIRECT_MAP support
LoongArch: Add ARCH_HAS_SET_MEMORY support
LoongArch: Rework CPU feature probe from CPUCFG/IOCSR
LoongArch: Enable ACPI BGRT handling
LoongArch: Enable generic CPU vulnerabilites support
LoongArch: Remove STACK_FRAME_NON_STANDARD(do_syscall)
LoongArch: Set AS_HAS_THIN_ADD_SUB as y if AS_IS_LLVM
LoongArch: Enable objtool for Clang
objtool: Handle frame pointer related instructions
Pull sh updates from John Paul Adrian Glaubitz:
"The first change by Gaosheng Cui removes unused declarations which
have been obsoleted since commit 5a4053b232 ("sh: Kill off dead
boards.") and the second by his colleague Hongbo Li replaces the use
of the unsafe simple_strtoul() with the safer kstrtoul() function in
the sh interrupt controller driver code"
* tag 'sh-for-v6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: intc: Replace simple_strtoul() with kstrtoul()
sh: Remove unused declarations for make_maskreg_irq() and irq_mask_register
Pull more xen updates from Juergen Gross:
"A second round of Xen related changes and features:
- a small fix of the xen-pciback driver for a warning issued by
sparse
- support PCI passthrough when using a PVH dom0
- enable loading the kernel in PVH mode at arbitrary addresses,
avoiding conflicts with the memory map when running as a Xen dom0
using the host memory layout"
* tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pvh: Add 64bit relocation page tables
x86/kernel: Move page table macros to header
x86/pvh: Set phys_base when calling xen_prepare_pvh()
x86/pvh: Make PVH entrypoint PIC for x86-64
xen: sync elfnote.h from xen tree
xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t
xen/privcmd: Add new syscall to get gsi from dev
xen/pvh: Setup gsi for passthrough device
xen/pci: Add a function to reset device for xen
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a bug in the LLVM implementation of KASAN and GCOV that makes
these options incompatible with the CFI_ICALL_NORMALIZE_INTEGERS option.
The bug has already been fixed in llvm/clang [1] and rustc [2]. However,
Kconfig currently has no way to gate features on the LLVM version inside
rustc, so we cannot write down a precise `depends on` clause in this
case. Instead, a `def_bool` option is defined for whether
CFI_ICALL_NORMALIZE_INTEGERS is available, and its default value is set
to false when GCOV or KASAN are turned on. End users using a patched
clang/rustc can turn on the HAVE_CFI_ICALL_NORMALIZE_INTEGERS option
directly to override this.
An alternative solution is to inspect a binary created by clang or rustc
to see whether the faulty CFI tags are in the binary. This would be a
precise check, but it would involve hard-coding the *hashed* version of
the CFI tag. This is because there's no way to get clang or rustc to
output the unhased version of the CFI tag. Relying on the precise
hashing algorithm using by CFI seems too fragile, so I have not pursued
this option. Besides, this kind of hack is exactly what lead to the LLVM
bug in the first place.
If the CFI_ICALL_NORMALIZE_INTEGERS option is used without CONFIG_RUST,
then we actually can perform a precise check today: just compare the
clang version number. This works since clang and llvm are always updated
in lockstep. However, encoding this in Kconfig would give the
HAVE_CFI_ICALL_NORMALIZE_INTEGERS option a dependency on CONFIG_RUST,
which is not possible as the reverse dependency already exists.
HAVE_CFI_ICALL_NORMALIZE_INTEGERS is defined to be a `def_bool` instead
of `bool` to avoid asking end users whether they want to turn on the
option. Turning it on explicitly is something only experts should do, so
making it hard to do so is not an issue.
I added a `depends on CFI_CLANG` clause to the new Kconfig option. I'm
not sure whether that makes sense or not, but it doesn't seem to make a
big difference.
In a future kernel release, I would like to add a Kconfig option similar
to CLANG_VERSION/RUSTC_VERSION for inspecting the version of the LLVM
inside rustc. Once that feature lands, this logic will be replaced with
a precise version check. This check is not being introduced here to
avoid introducing a new _VERSION constant in a fix.
Link: https://github.com/llvm/llvm-project/pull/104826 [1]
Link: https://github.com/rust-lang/rust/pull/129373 [2]
Fixes: ce4a262098 ("cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409231044.4f064459-oliver.sang@intel.com
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20240925-cfi-norm-kasan-fix-v1-1-0328985cdf33@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>