Pull x86 fixes from Ingo Molnar:
"Fix a kprobes bug, plus add a new Intel model number to the upstream
<asm/intel-family.h> header for drivers to use"
* tag 'x86-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Lunar Lake M
x86/kprobes: Fix 1 byte conditional jump target
The uncore subsystem for Meteor Lake is similar to the previous Alder
Lake. The main difference is that MTL provides PMU support for different
tiles, while ADL only provides PMU support for the whole package. On
ADL, there are CBOX, ARB, and clockbox uncore PMON units. On MTL, they
are split into CBOX/HAC_CBOX, ARB/HAC_ARB, and cncu/sncu which provides
a fixed counter for clockticks. Also, new MSR addresses are introduced
on MTL.
The IMC uncore PMON is the same as Alder Lake. Add new PCIIDs of IMC for
Meteor Lake.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230210190238.1726237-1-kan.liang@linux.intel.com
Some of Nano series processors will lead GP when accessing
PMC fixed counter. Meanwhile, their hardware support for PMC
has not announced externally. So exclude Nano CPUs from ZXC
by checking stepping information. This is an unambiguous way
to differentiate between ZXC and Nano CPUs.
Following are Nano and ZXC FMS information:
Nano FMS: Family=6, Model=F, Stepping=[0-A][C-D]
ZXC FMS: Family=6, Model=F, Stepping=E-F OR
Family=6, Model=0x19, Stepping=0-3
Fixes: 3a4ac121c2 ("x86/perf: Add hardware performance events support for Zhaoxin CPU.")
Reported-by: Arjan <8vvbbqzo567a@nospam.xutrox.com>
Reported-by: Kevin Brace <kevinbrace@gmx.com>
Signed-off-by: silviazhao <silviazhao-oc@zhaoxin.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
The time order is incorrect when the TSC in a PEBS record is used.
$perf record -e cycles:upp dd if=/dev/zero of=/dev/null
count=10000
$ perf script --show-task-events
perf-exec 0 0.000000: PERF_RECORD_COMM: perf-exec:915/915
dd 915 106.479872: PERF_RECORD_COMM exec: dd:915/915
dd 915 106.483270: PERF_RECORD_EXIT(915:915):(914:914)
dd 915 106.512429: 1 cycles:upp:
ffffffff96c011b7 [unknown] ([unknown])
... ...
The perf time is from sched_clock_cpu(). The current PEBS code
unconditionally convert the TSC to native_sched_clock(). There is a
shift between the two clocks. If the TSC is stable, the shift is
consistent, __sched_clock_offset. If the TSC is unstable, the shift has
to be calculated at runtime.
This patch doesn't support the conversion when the TSC is unstable. The
TSC unstable case is a corner case and very unlikely to happen. If it
happens, the TSC in a PEBS record will be dropped and fall back to
perf_event_clock().
Fixes: 47a3aeb39e ("perf/x86/intel/pebs: Fix PEBS timestamps overwritten")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/CAM9d7cgWDVAq8-11RbJ2uGfwkKD6fA-OMwOKDrNUrU_=8MgEjg@mail.gmail.com/
So far we were flushing almost the entire universe whenever a VM would
load/unload the SCTLR_EL1 and the two versions of that register had
different MMU enabled settings. This turned out to be so slow that it
prevented forward progress for a nested VM, because a scheduler timer
tick interrupt would always be pending when we reached the nested VM.
To avoid this problem, we consider the SCTLR_EL2 when evaluating if
caches are on or off when entering virtual EL2 (because this is the
value that we end up shadowing onto the hardware EL1 register).
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-19-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
With HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2
trap to EL2. Handle those traps just like we do for EL1 registers.
One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE
virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12
will trap since it's one of the EL12 registers controlled by HCR_EL2.NV
bit. Therefore, add a handler for it and don't treat it as a
non-trap-registers when preparing a shadow context.
These registers, being only a view on their EL1 counterpart, are
permanently hidden from userspace.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: EL12_REG(), register visibility]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-17-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
So far, we never needed to distinguish between registers hidden
from userspace and being hidden from a guest (they are always
either visible to both, or hidden from both).
With NV, we have the ugly case of the EL02 and EL12 registers,
which are only a view on the EL0 and EL1 registers. It makes
absolutely no sense to expose them to userspace, since it
already has the canonical view.
Add a new visibility flag (REG_HIDDEN_USER) and a new helper that
checks for it and REG_HIDDEN when checking whether to expose
a sysreg to userspace. Subsequent patches will make use of it.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-16-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return
to the guest and vice versa when taking an exception to the hypervisor,
because we emulate virtual EL2 in EL1 and therefore have to translate
the mode field from EL2 to EL1 and vice versa.
This requires keeping track of the state we enter the guest, for which
we transiently use a dedicated flag.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-15-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
For the same reason we trap virtual memory register accesses at virtual
EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3
introduces the HCR_EL2.NV1 bit to be able to trap on those register
accesses in EL1. Do not set this bit until the whole nesting support is
completed, which happens further down the line...
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-14-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Non-nested guests have used the hvc instruction to initiate SMCCC
calls into KVM. This is quite a poor fit for NV as hvc exceptions are
always taken to EL2. In other words, KVM needs to unconditionally
forward the hvc exception back into vEL2 to uphold the architecture.
Instead, treat the smc instruction from vEL2 as we would a guest
hypercall, thereby allowing the vEL2 to interact with KVM's hypercall
surface. Note that on NV-capable hardware HCR_EL2.TSC causes smc
instructions executed in non-secure EL1 to trap to EL2, even if EL3 is
not implemented.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-13-maz@kernel.org
[Oliver: redo commit message, only handle smc from vEL2]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Do the feature check as the very first thing in the function. Everything
else comes after that and is meaningless work if the TSC CPUID bit is
not even set. Switch to cpu_feature_enabled() too, while at it.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
A quick search doesn't reveal any use outside of the kernel - which
would be questionable to begin with anyway - so make the export GPL
only.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
this bit is set, accessing EL2 registers in EL1 traps to EL2. In
addition, executing the following instructions in EL1 will trap to EL2:
tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
instructions that trap to EL2 with the NV bit were undef at EL1 prior to
ARM v8.3. The only instruction that was not undef is eret.
This patch sets up a handler for EL2 registers and SP_EL1 register
accesses at EL1. The host hypervisor keeps those register values in
memory, and will emulate their behavior.
This patch doesn't set the NV bit yet. It will be set in a later patch
once nested virtualization support is completed.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: EL2_REG() macros]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Most of our S2 helpers take a kvm_s2_mmu pointer, but quickly
revert back to using the kvm structure. By doing so, we lose
track of which S2 MMU context we were initially using, and fallback
to the "canonical" context.
If we were trying to unmap a S2 context managed by a guest hypervisor,
we end-up parsing the wrong set of page tables, and bad stuff happens
(as this is often happening on the back of a trapped TLBI from the
guest hypervisor).
Instead, make sure we always use the provided MMU context all the way.
This has no impact on non-NV, as we always pass the canonical MMU
context.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20230209175820.1939006-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
====================
pull-request: bpf-next 2023-02-11
We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).
There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad2 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:
[...]
ice_set_netdev_features(netdev);
netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_XSK_ZEROCOPY;
ice_set_ops(netdev);
[...]
Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/
The main changes are:
1) Add support for BPF trampoline on s390x which finally allows to remove many
test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.
2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.
3) Add capability to export the XDP features supported by the NIC.
Along with that, add a XDP compliance test tool,
from Lorenzo Bianconi & Marek Majtyka.
4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
from David Vernet.
5) Add a deep dive documentation about the verifier's register
liveness tracking algorithm, from Eduard Zingerman.
6) Fix and follow-up cleanups for resolve_btfids to be compiled
as a host program to avoid cross compile issues,
from Jiri Olsa & Ian Rogers.
7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
when testing on different NICs, from Jesper Dangaard Brouer.
8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.
9) Extend libbpf to add an option for when the perf buffer should
wake up, from Jon Doron.
10) Follow-up fix on xdp_metadata selftest to just consume on TX
completion, from Stanislav Fomichev.
11) Extend the kfuncs.rst document with description on kfunc
lifecycle & stability expectations, from David Vernet.
12) Fix bpftool prog profile to skip attaching to offline CPUs,
from Tonghao Zhang.
====================
Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Match the x86 implementation to improve build errors.
Noticed when building allyesconfig.
e.g.
../arch/um/include/asm/processor-generic.h:94:19: error: called object is not a function or function pointer
94 | #define cpu_data (&boot_cpu_data)
| ~^~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:2157:16: note: in expansion of macro ‘cpu_data’
2157 | return cpu_data(first_cpu_of_numa_node).apicid;
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The 'syscall' instruction clobbers '%rcx' and '%r11', but they are not
listed in the inline Assembly that performs the syscall instruction.
No real bug is found. It wasn't buggy by luck because '%rcx' and '%r11'
are caller-saved registers, and not used in the functions, and the
functions are never inlined.
Add them to the clobber list for code correctness.
Fixes: f1c2bb8b99 ("um: implement a x86_64 vDSO")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
CONFIG_RUST currently supports x86_64, but does not support it under
UML. With the previous patches applied, adding support is trivial:
add CONFIG_HAVE_RUST to UML if X86_64 is set.
The scripts/generate_rust_target.rs file already checks for
CONFIG_X86_64, not CONFIG_X86, so is prepared for UML support.
The Rust support does not currently support X86_32.
Also, update the Rust architecture support documentation to not that
this is being maintained: I intend to look after this as best I can.
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Sergio González Collado <sergio.collado@gmail.com>
Tested-by: Sergio González Collado <sergio.collado@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The kernel disables all SSE and similar FP/SIMD instructions on
x86-based architectures (partly because we shouldn't be using floats in
the kernel, and partly to avoid the need for stack alignment, see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 )
UML does not do the same thing, which isn't in itself a problem, but
does add to the list of differences between UML and "normal" x86 builds.
In addition, there was a crash bug with LLVM < 15 / rustc < 1.65 when
building with SSE, so disabling it fixes rust builds with earlier
compiler versions, see:
https://github.com/Rust-for-Linux/linux/pull/881
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Sergio González Collado <sergio.collado@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
UML expects a position independent executable for some reason, so tell
rustc to generate pie objects. Otherwise we get a bunch of relocations
we can't deal with in libcore.
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
* for-next/signal:
: Signal handling cleanups
arm64/signal: Only read new data when parsing the ZT context
arm64/signal: Only read new data when parsing the ZA context
arm64/signal: Only read new data when parsing the SVE context
arm64/signal: Avoid rereading context frame sizes
arm64/signal: Make interface for restore_fpsimd_context() consistent
arm64/signal: Remove redundant size validation from parse_user_sigframe()
arm64/signal: Don't redundantly verify FPSIMD magic
* for-next/sysreg-hwcaps:
: Make use of sysreg helpers for hwcaps
arm64/cpufeature: Use helper macros to specify hwcaps
arm64/cpufeature: Always use symbolic name for feature value in hwcaps
arm64/sysreg: Initial unsigned annotations for ID registers
arm64/sysreg: Initial annotation of signed ID registers
arm64/sysreg: Allow enumerations to be declared as signed or unsigned
* arm64/for-next/perf:
perf: arm_spe: Print the version of SPE detected
perf: arm_spe: Add support for SPEv1.2 inverted event filtering
perf: Add perf_event_attr::config3
drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
perf: arm_spe: Support new SPEv1.2/v8.7 'not taken' event
perf: arm_spe: Use new PMSIDR_EL1 register enums
perf: arm_spe: Drop BIT() and use FIELD_GET/PREP accessors
arm64/sysreg: Convert SPE registers to automatic generation
arm64: Drop SYS_ from SPE register defines
perf: arm_spe: Use feature numbering for PMSEVFR_EL1 defines
perf/marvell: Add ACPI support to TAD uncore driver
perf/marvell: Add ACPI support to DDR uncore driver
perf/arm-cmn: Reset DTM_PMU_CONFIG at probe
drivers/perf: hisi: Extract initialization of "cpa_pmu->pmu"
drivers/perf: hisi: Simplify the parameters of hisi_pmu_init()
drivers/perf: hisi: Advertise the PERF_PMU_CAP_NO_EXCLUDE capability
* for-next/sysreg:
: arm64 sysreg and cpufeature fixes/updates
KVM: arm64: Use symbolic definition for ISR_EL1.A
arm64/sysreg: Add definition of ISR_EL1
arm64/sysreg: Add definition for ICC_NMIAR1_EL1
arm64/cpufeature: Remove 4 bit assumption in ARM64_FEATURE_MASK()
arm64/sysreg: Fix errors in 32 bit enumeration values
arm64/cpufeature: Fix field sign for DIT hwcap detection
* for-next/sme:
: SME-related updates
arm64/sme: Optimise SME exit on syscall entry
arm64/sme: Don't use streaming mode to probe the maximum SME VL
arm64/ptrace: Use system_supports_tpidr2() to check for TPIDR2 support
* for-next/kselftest: (23 commits)
: arm64 kselftest fixes and improvements
kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
kselftest/arm64: Copy whole EXTRA context
kselftest/arm64: Fix enumeration of systems without 128 bit SME for SSVE+ZA
kselftest/arm64: Fix enumeration of systems without 128 bit SME
kselftest/arm64: Don't require FA64 for streaming SVE tests
kselftest/arm64: Limit the maximum VL we try to set via ptrace
kselftest/arm64: Correct buffer size for SME ZA storage
kselftest/arm64: Remove the local NUM_VL definition
kselftest/arm64: Verify simultaneous SSVE and ZA context generation
kselftest/arm64: Verify that SSVE signal context has SVE_SIG_FLAG_SM set
kselftest/arm64: Remove spurious comment from MTE test Makefile
kselftest/arm64: Support build of MTE tests with clang
kselftest/arm64: Initialise current at build time in signal tests
kselftest/arm64: Don't pass headers to the compiler as source
kselftest/arm64: Remove redundant _start labels from FP tests
kselftest/arm64: Fix .pushsection for strings in FP tests
kselftest/arm64: Run BTI selftests on systems without BTI
kselftest/arm64: Fix test numbering when skipping tests
kselftest/arm64: Skip non-power of 2 SVE vector lengths in fp-stress
kselftest/arm64: Only enumerate power of two VLs in syscall-abi
...
* for-next/misc:
: Miscellaneous arm64 updates
arm64/mm: Intercept pfn changes in set_pte_at()
Documentation: arm64: correct spelling
arm64: traps: attempt to dump all instructions
arm64: Apply dynamic shadow call stack patching in two passes
arm64: el2_setup.h: fix spelling typo in comments
arm64: Kconfig: fix spelling
arm64: cpufeature: Use kstrtobool() instead of strtobool()
arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
arm64: make ARCH_FORCE_MAX_ORDER selectable
* for-next/sme2: (23 commits)
: Support for arm64 SME 2 and 2.1
arm64/sme: Fix __finalise_el2 SMEver check
kselftest/arm64: Remove redundant _start labels from zt-test
kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
kselftest/arm64: Add coverage of the ZT ptrace regset
kselftest/arm64: Add SME2 coverage to syscall-abi
kselftest/arm64: Add test coverage for ZT register signal frames
kselftest/arm64: Teach the generic signal context validation about ZT
kselftest/arm64: Enumerate SME2 in the signal test utility code
kselftest/arm64: Cover ZT in the FP stress test
kselftest/arm64: Add a stress test program for ZT0
arm64/sme: Add hwcaps for SME 2 and 2.1 features
arm64/sme: Implement ZT0 ptrace support
arm64/sme: Implement signal handling for ZT
arm64/sme: Implement context switching for ZT0
arm64/sme: Provide storage for ZT0
arm64/sme: Add basic enumeration for SME2
arm64/sme: Enable host kernel to access ZT0
arm64/sme: Manually encode ZT0 load and store instructions
arm64/esr: Document ISS for ZT0 being disabled
arm64/sme: Document SME 2 and SME 2.1 ABI
...
* for-next/tpidr2:
: Include TPIDR2 in the signal context
kselftest/arm64: Add test case for TPIDR2 signal frame records
kselftest/arm64: Add TPIDR2 to the set of known signal context records
arm64/signal: Include TPIDR2 in the signal context
arm64/sme: Document ABI for TPIDR2 signal information
* for-next/scs:
: arm64: harden shadow call stack pointer handling
arm64: Stash shadow stack pointer in the task struct on interrupt
arm64: Always load shadow stack pointer directly from the task struct
* for-next/compat-hwcap:
: arm64: Expose compat ARMv8 AArch32 features (HWCAPs)
arm64: Add compat hwcap SSBS
arm64: Add compat hwcap SB
arm64: Add compat hwcap I8MM
arm64: Add compat hwcap ASIMDBF16
arm64: Add compat hwcap ASIMDFHM
arm64: Add compat hwcap ASIMDDP
arm64: Add compat hwcap FPHP and ASIMDHP
* for-next/ftrace:
: Add arm64 support for DYNAMICE_FTRACE_WITH_CALL_OPS
arm64: avoid executing padding bytes during kexec / hibernation
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
arm64: ftrace: Update stale comment
arm64: patching: Add aarch64_insn_write_literal_u64()
arm64: insn: Add helpers for BTI
arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
ACPI: Don't build ACPICA with '-Os'
Compiler attributes: GCC cold function alignment workarounds
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
* for-next/efi-boot-mmu-on:
: Permit arm64 EFI boot with MMU and caches on
arm64: kprobes: Drop ID map text from kprobes blacklist
arm64: head: Switch endianness before populating the ID map
efi: arm64: enter with MMU and caches enabled
arm64: head: Clean the ID map and the HYP text to the PoC if needed
arm64: head: avoid cache invalidation when entering with the MMU on
arm64: head: record the MMU state at primary entry
arm64: kernel: move identity map out of .text mapping
arm64: head: Move all finalise_el2 calls to after __enable_mmu
* for-next/ptrauth:
: arm64 pointer authentication cleanup
arm64: pauth: don't sign leaf functions
arm64: unify asm-arch manipulation
* for-next/pseudo-nmi:
: Pseudo-NMI code generation optimisations
arm64: irqflags: use alternative branches for pseudo-NMI logic
arm64: add ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap
arm64: make ARM64_HAS_GIC_PRIO_MASKING depend on ARM64_HAS_GIC_CPUIF_SYSREGS
arm64: rename ARM64_HAS_IRQ_PRIO_MASKING to ARM64_HAS_GIC_PRIO_MASKING
arm64: rename ARM64_HAS_SYSREG_GIC_CPUIF to ARM64_HAS_GIC_CPUIF_SYSREGS
Pull ARM SoC fixes from Arnd Bergmann:
"All the changes this time are minor devicetree corrections, the
majority being for 64-bit Rockchip SoC support. These are a couple of
corrections for properties that are in violation of the binding, some
that put the machine into safer operating points for the eMMC and
thermal settings, and missing properties that prevented rk356x PCIe
and ethernet from working correctly.
The changes for amlogic and mediatek address incorrect properties that
were preventing the display support on MT8195 and the MMC support on
various Meson SoCs from working correctly.
The stihxxx-b2120 change fixes the GPIO polarity for the DVB tuner to
allow this to be used correctly after a futre driver change, though it
has no effect on older kernels"
* tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
ARM: dts: stihxxx-b2120: fix polarity of reset line of tsin0 port
arm64: dts: mediatek: mt8195: Fix vdosys* compatible strings
arm64: dts: rockchip: align rk3399 DMC OPP table with bindings
arm64: dts: rockchip: set sdmmc0 speed to sd-uhs-sdr50 on rock-3a
arm64: dts: rockchip: fix probe of analog sound card on rock-3a
arm64: dts: rockchip: add missing #interrupt-cells to rk356x pcie2x1
arm64: dts: rockchip: fix input enable pinconf on rk3399
ARM: dts: rockchip: add power-domains property to dp node on rk3288
arm64: dts: rockchip: add io domain setting to rk3566-box-demo
arm64: dts: rockchip: remove unsupported property from sdmmc2 for rock-3a
arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
arm64: dts: rockchip: reduce thermal limits on rk3399-pinephone-pro
arm64: dts: rockchip: use correct reset names for rk3399 crypto nodes
Pull RISC-V fixes from Palmer Dabbelt:
"This is a little bigger that I'd hope for this late in the cycle, but
they're all pretty concrete fixes and the only one that's bigger than
a few lines is pmdp_collapse_flush() (which is almost all
boilerplate/comment). It's also all bug fixes for issues that have
been around for a while.
So I think it's not all that scary, just bad timing.
- avoid partial TLB fences for huge pages, which are disallowed by
the ISA
- avoid missing a frame when dumping stacks
- avoid misaligned accesses (and possibly overflows) in kprobes
- fix a race condition in tracking page dirtiness"
* tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
riscv: kprobe: Fixup misaligned load text
riscv: stacktrace: Fix missing the first frame
riscv: mm: Implement pmdp_collapse_flush for THP
EFI mixed mode on x86 requires a recipe for each protocol method or
firmware service that takes u64 arguments by value, or returns pointer
or 'native int' (UINTN) values by reference (e.g,, through a void ** or
unsigned long * parameter), due to the fact that these types cannot be
translated 1:1 between the i386 and MS x64 calling conventions.
So add the missing recipe for GetMemoryAttributes, which is not actually
being used yet on x86, but the code exists and can be built for x86 so
let's make sure it works as it should.
Cc: Evgeniy Baskov <baskov@ispras.ru>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Some Kconfig options has moved around after a 'make savedefconfig' run,
so move them to their new location to make it easier to see what other
options got removed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
By default, KVM/SVM will intercept attempts by the guest to transition
out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used
by a VMM to change this behavior. To mitigate the cross-thread return
address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to
override the default behavior to intercept C0 transitions.
Use a module parameter to control the mitigation on processors that are
vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the
X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug,
KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
transitions out of C0 state, the other sibling thread could use return
target predictions from the sibling thread that transitioned out of C0.
The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM to
prevent exiting guest mode when transitioning out of C0. A guest could
act maliciously in this situation, so create a new x86 BUG that can be
used to detect if the processor is vulnerable.
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <91cec885656ca1fcd4f0185ce403a53dd9edecb7.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instrumented memory accesses provided by KCSAN will access core-local
memories (which will save and restore IRQs) as well as restoring IRQs
directly. Avoid recursive instrumentation by applying __no_kcsan
annotation to IRQ restore routines.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
[mpe: Resolve merge conflict with IRQ replay recursion changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-5-rmclure@linux.ibm.com
Annotate memory barriers *mb() with calls to kcsan_mb(), signaling to
compilers supporting KCSAN that the respective memory barrier has been
issued. Rename memory barrier *mb() to __*mb() to opt in for
asm-generic/barrier.h to generate the respective *mb() macro.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-4-rmclure@linux.ibm.com
In order for KCSAN to increase its likelihood of observing a data race,
it sets a watchpoint on memory accesses and stalls, allowing for
detection of conflicting accesses by other kernel threads or interrupts.
Stalls are implemented by injecting a call to udelay in instrumented code.
To prevent recursive instrumentation, exclude udelay from being instrumented.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
The early paca and boot cpuid dance is complicated and currently does
not quite work as expected for boot cpuid != 0 cases.
early_init_devtree() currently allocates the paca_ptrs and boot cpuid
paca, but until that returns and early_setup() calls setup_paca(), this
thread is currently still executing with smp_processor_id() == 0.
One problem this causes is the paca_ptrs[smp_processor_id()] pointer is
poisoned, so valid_emergency_stack() (any backtrace) and any similar
users will crash.
Another is that the hardware id which is set here will not be returned
by get_hard_smp_processor_id(smp_processor_id()), but it would work
correctly for boot_cpuid == 0, which could lead to difficult to
reproduce or find bugs. The hard id does not seem to be used by the rest
of early_init_devtree(), it just looks like all this code might have
been put here to allocate somewhere to store boot CPU hardware id while
scanning the devtree.
Rearrange things so the hwid is put in a global variable like
boot_cpuid, and do all the paca allocation and boot paca setup in the
64-bit early_setup() after we have everything ready to go.
The paca_ptrs[0] re-poisoning code in early_setup does not seem to have
ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid
is not 0.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build error on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com