asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
Pull random number generator updates from Jason Donenfeld:
"Originally I'd planned on sending each of the vDSO getrandom()
architecture ports to their respective arch trees. But as we started
to work on this, we found lots of interesting issues in the shared
code and infrastructure, the fixes for which the various archs needed
to base their work.
So in the end, this turned into a nice collaborative effort fixing up
issues and porting to 5 new architectures -- arm64, powerpc64,
powerpc32, s390x, and loongarch64 -- with everybody pitching in and
commenting on each other's code. It was a fun development cycle.
This contains:
- Numerous fixups to the vDSO selftest infrastructure, getting it
running successfully on more platforms, and fixing bugs in it.
- Additions to the vDSO getrandom & chacha selftests. Basically every
time manual review unearthed a bug in a revision of an arch patch,
or an ambiguity, the tests were augmented.
By the time the last arch was submitted for review, s390x, v1 of
the series was essentially fine right out of the gate.
- Fixes to the the generic C implementation of vDSO getrandom, to
build and run successfully on all archs, decoupling it from
assumptions we had (unintentionally) made on x86_64 that didn't
carry through to the other architectures.
- Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by
Huacai Chen.
- Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked
by Will Deacon.
- Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit
varieties, from Christophe Leroy and acked by Michael Ellerman.
- Port of vDSO getrandom to S390X from Heiko Carstens, the arch
maintainer.
While it'd be natural for there to be things to fix up over the course
of the development cycle, these patches got a decent amount of review
from a fairly diverse crew of folks on the mailing lists, and, for the
most part, they've been cooking in linux-next, which has been helpful
for ironing out build issues.
In terms of architectures, I think that mostly takes care of the
important 64-bit archs with hardware still being produced and running
production loads in settings where vDSO getrandom is likely to help.
Arguably there's still RISC-V left, and we'll see for 6.13 whether
they find it useful and submit a port"
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits)
selftests: vDSO: check cpu caps before running chacha test
s390/vdso: Wire up getrandom() vdso implementation
s390/vdso: Move vdso symbol handling to separate header file
s390/vdso: Allow alternatives in vdso code
s390/module: Provide find_section() helper
s390/facility: Let test_facility() generate static branch if possible
s390/alternatives: Remove ALT_FACILITY_EARLY
s390/facility: Disable compile time optimization for decompressor code
selftests: vDSO: fix vdso_config for s390
selftests: vDSO: fix ELF hash table entry size for s390x
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
powerpc/vdso: Refactor CFLAGS for CVDSO build
powerpc/vdso32: Add crtsavres
mm: Define VM_DROPPABLE for powerpc/32
powerpc/vdso: Fix VDSO data access when running in a non-root time namespace
selftests: vDSO: don't include generated headers for chacha test
arm64: vDSO: Wire up getrandom() vDSO implementation
arm64: alternative: make alternative_has_cap_likely() VDSO compatible
selftests: vDSO: also test counter in vdso_test_chacha
...
Pull misc x86 updates from Thomas Gleixner:
- Rework kcpuid to handle the the autogenerated CSV file correctly and
update the CSV file to cover the whole zoo of CPUID.
- Avoid memcpy() for ia32 syscall_get_arguments() and use direct
assignments as fortified memcpy() is unhappy about writing/reading
beyond the end of the addresses destination/source struct member
- A few new PCI IDs for AMD
- Update MAINTAINERS to cover x86 specific selftests
* tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add selftests/x86 entry
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h
x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()
MAINTAINERS: Add x86 cpuid database entry
tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file
tools/x86/kcpuid: Parse subleaf ranges if provided
tools/x86/kcpuid: Recognize all leaves with subleaves
tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace
tools/x86/kcpuid: Protect against faulty "max subleaf" values
tools/x86/kcpuid: Set max possible subleaves count to 64
tools/x86/kcpuid: Properly align long-description columns
tools/x86/kcpuid: Remove unused variable
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
Architectures use different location for vDSO sources:
arch/mips/vdso
arch/sparc/vdso
arch/arm64/kernel/vdso
arch/riscv/kernel/vdso
arch/csky/kernel/vdso
arch/x86/um/vdso
arch/x86/entry/vdso
arch/powerpc/kernel/vdso
arch/arm/vdso
arch/loongarch/vdso
Don't hard-code vdso sources location in selftest Makefile. Instead
create a vdso/ symbolic link in tools/arch/$arch/ and update Makefile
accordingly.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
To pick up changes from:
149fd4712b perf/x86/intel: Support Perfmon MSRs aliasing
21b362cc76 x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
4f460bff7b cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
7ea81936b8 x86/cpufeatures: Add HWP highest perf change feature flag
78ce84b9e0 x86/cpufeatures: Flip the /proc/cpuinfo appearance logic
1beb348d5c x86/sev: Provide SVSM discovery support
This should be used to beautify x86 syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
And other arch-specific UAPI headers to pick up changes from:
4b23e0c199 KVM: Ensure new code that references immediate_exit gets extra scrutiny
85542adb65 KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flag
6fef518594 KVM: x86: Add a capability to configure bus frequency for APIC timer
34ff659017 x86/sev: Use kernel provided SVSM Calling Areas
5dcc1e7614 Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD
9a0d2f4995 KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register
e9eb790b25 KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register
1a1e6865f5 KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register
This should be used to beautify KVM syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
For parsing the cpuid bitfields, kcpuid uses an incomplete CSV file with
300+ bitfields.
Use an auto-generated CSV file from the x86-cpuid.org project instead.
It provides complete bitfields coverage: 830+ bitfields, all with proper
descriptions.
The auto-generated file has the following blurb automatically added:
# SPDX-License-Identifier: CC0-1.0
# Generator: x86-cpuid-db v1.0
The generator tag includes the project's workspace "git describe"
version string. It is intended for projects like KernelCI, to aid in
verifying that the auto-generated files have not been tampered with.
The file also has the blurb:
# Auto-generated file.
# Please submit all updates and bugfixes to https://x86-cpuid.org
It's thus kindly requested that the Linux kernel's x86 tree maintainers
enforce sending all updates to x86-cpuid.org's upstream database first,
thus benefiting the whole ecosystem.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v1.0/LICENSE.rst
Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db
Link: https://lore.kernel.org/all/20240718134755.378115-9-darwi@linutronix.de
It's a common pattern in cpuid leaves to have the same bitfields format
repeated across a number of subleaves. Typically, this is used for
enumerating hierarchial structures like cache and TLB levels, CPU
topology levels, etc.
Modify kcpuid.c to handle subleaf ranges in the CSV file subleaves
column. For example, make it able to parse lines in the form:
# LEAF, SUBLEAVES, reg, bits, short_name , ...
0xb, 1:0, eax, 4:0, x2apic_id_shift , ...
0xb, 1:0, ebx, 15:0, domain_lcpus_count , ...
0xb, 1:0, ecx, 7:0, domain_nr , ...
This way, full output can be printed to the user.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-8-darwi@linutronix.de
cpuid.csv will be extended in further commits with all-publicly-known
CPUID leaves and bitfields. Thus, modify has_subleafs() to identify all
known leaves with subleaves.
Remove the redundant "is_amd" check since all x86 vendors already report
the maxium supported extended leaf at leaf 0x80000000 EAX register.
The extra mentioned leaves are:
- Leaf 0x12, Intel Software Guard Extensions (SGX) enumeration
- Leaf 0x14, Intel process trace (PT) enumeration
- Leaf 0x17, Intel SoC vendor attributes enumeration
- Leaf 0x1b, Intel PCONFIG (Platform configuration) enumeration
- Leaf 0x1d, Intel AMX (Advanced Matrix Extensions) tile information
- Leaf 0x1f, Intel v2 extended topology enumeration
- Leaf 0x23, Intel ArchPerfmonExt (Architectural PMU ext) enumeration
- Leaf 0x80000020, AMD Platform QoS extended features enumeration
- Leaf 0x80000026, AMD v2 extended topology enumeration
Set the 'max_subleaf' variable for all the newly marked leaves with extra
subleaves. Ideally, this should be fetched from the CSV file instead,
but the current kcpuid code architecture has two runs: one run to
serially invoke the cpuid instructions and save all the output in-memory,
and one run to parse this in-memory output through the CSV specification.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-7-darwi@linutronix.de
While parsing and saving bitfield names from the CSV file, an extra
leading space is copied verbatim. That extra space is not a big issue
now, but further commits will add a new CSV file with much more padding
for the bitfield's name column.
Strip leading/trailing whitespaces while saving bitfield names.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-6-darwi@linutronix.de
Protect against the kcpuid code parsing faulty max subleaf numbers
through a min() expression. Thus, ensuring that max_subleaf will always
be ≤ MAX_SUBLEAF_NUM.
Use "u32" for the subleaf numbers since kcpuid is compiled with -Wextra,
which includes signed/unsigned comparisons warnings.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-5-darwi@linutronix.de
cpuid.csv will be extended in further commits with all-publicly-known
CPUID leaves and bitfields. One of the new leaves is 0xd for extended
CPU state enumeration. Depending on XCR0 dword bits, it can export up to
64 subleaves.
Set kcpuid.c MAX_SUBLEAF_NUM to 64.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-4-darwi@linutronix.de
When kcpuid is invoked with "--all --details", the detailed description
column is not properly aligned for all bitfield rows:
CPUID_0x4_ECX[0x0]:
cache_level : 0x1 - Cache Level ...
cache_self_init - Cache Self Initialization
This is due to differences in output handling between boolean single-bit
"bitflags" and multi-bit bitfields. For the former, the bitfield's value
is not outputted as it is implied to be true by just outputting the
bitflag's name in its respective line.
If long descriptions were requested through the --all parameter, properly
align the bitflag's description columns through extra tabs. With that,
the sample output above becomes:
CPUID_0x4_ECX[0x0]:
cache_level : 0x1 - Cache Level ...
cache_self_init - Cache Self Initialization
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240718134755.378115-3-darwi@linutronix.de
So far the Makefile just installed the csv into $(HWDATADIR)/cpuid.csv, which
made it unaware about $DESTDIR. Add $DESTDIR to the install command and while
at it also create the directory, should it not exist already. This eases the
packaging of kcpuid and allows i.e. for the install on Arch to look like this:
$ make BINDIR=/usr/bin DESTDIR="$pkgdir" -C tools/arch/x86/kcpuid install
Some background on DESTDIR:
DESTDIR is commonly used in packaging for staged installs (regardless of the
used package manager):
https://www.gnu.org/prep/standards/html_node/DESTDIR.html
So the package is built and installed into a directory which the package
manager later picks up and creates some archive from it.
What is specific to Arch Linux here is only the usage of $pkgdir in the
example, DESTDIR itself is widely used.
[ bp: Extend the commit message with Christian's info on DESTDIR as a GNU
coding standards thing. ]
Signed-off-by: Christian Heusel <christian@heusel.eu>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240531111757.719528-2-christian@heusel.eu
To pick the changes in:
4af663c2f6 ("KVM: SEV: Allow per-guest configuration of GHCB protocol version")
4f5defae70 ("KVM: SEV: introduce KVM_SEV_INIT2 operation")
26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
ac5c48027b ("KVM: SEV: publish supported VMSA features")
651d61bc8b ("KVM: PPC: Fix documentation for ppc mmu caps")
That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/ZlYxAdHjyAkvGtMW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf tools updates from Arnaldo Carvalho de Melo:
"General:
- Integrate the shellcheck utility with the build of perf to allow
catching shell problems early in areas such as 'perf test', 'perf
trace' scrape scripts, etc
- Add 'uretprobe' variant in the 'perf bench uprobe' tool
- Add script to run instances of 'perf script' in parallel
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on
systems where those tracepoints aren't available
- Add Kan Liang to MAINTAINERS as a perf tools reviewer
- Add support for using the 'capstone' disassembler library in
various tools, such as 'perf script' and 'perf annotate'. This is
an alternative for the use of the 'xed' and 'objdump' disassemblers
Data-type profiling improvements:
- Resolve types for a->b->c by backtracking the assignments until it
finds DWARF info for one of those members
- Support for global variables, keeping a cache to speed up lookups
- Handle the 'call' instruction, dealing with effects on registers
and handling its return when tracking register data types
- Handle x86's segment based addressing like %gs:0x28, to support
things like per CPU variables, the stack canary, etc
- Data-type profiling got big speedups when using capstone for
disassembling. The objdump outoput parsing method is left as a
fallback when capstone fails or isn't available. There are patches
posted for 6.11 that to use a LLVM disassembler
- Support event group display in the TUI when annotating types with
--data-type, for instance to show memory load and store events for
the data type fields
- Optimize the 'perf annotate' data structures, reducing memory usage
- Add a initial 'perf test' for 'perf annotate', checking that a
target symbol appears on the output, specifying objdump via the
command line, etc
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
erroneously in TopdownL1
- Add AMD's Zen 5 core and uncore events and metrics. Those come from
the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
Processors" document, with events that capture information on op
dispatch, execution and retirement, branch prediction, L1 and L2
cache activity, TLB activity, etc
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
AmpereOneX
Miscellaneous:
- Sync header copies with the kernel sources
- Move some header copies used only for generating translation string
tables for ioctl cmds and other syscall integer arguments to a new
directory under tools/perf/beauty/, to separate from copies in
tools/include/ that are used to build the tools
- Introduce scrape script for several syscall 'flags'/'mask'
arguments
- Improve cpumap utilization, fixing up pairing of refcounts, using
the right iterators (perf_cpu_map__for_each_cpu), etc
- Give more details about raw event encodings in 'perf list', show
tracepoint encoding in the detailed output
- Refactor the DSOs handling code, reducing memory usage
- Document the BPF event modifier and add a 'perf test' for it
- Improve the event parser, better error messages and add further
'perf test's for it
- Add reference count checking to 'struct comm_str' and 'struct
mem_info'
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust
- Tweak the ARM64's Coresight 'perf test's
- Improve ARM64's CoreSight ETM version detection and error reporting
- Fix handling of symbols when using kcore
- Fix PAI (Processor Activity Instrumentation) counter names for s390
virtual machines in 'perf report'
- Fix -g/--call-graph option failure in 'perf sched timehist'
- Add LIBTRACEEVENT_DIR build option to allow building with
libtraceevent installed in non-standard directories, such as when
doing cross builds
- Various 'perf test' and 'perf bench' fixes
- Improve 'perf probe' error message for long C++ probe names"
* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
tools lib subcmd: Show parent options in help
perf pmu: Count sys and cpuid JSON events separately
perf stat: Don't display metric header for non-leader uncore events
perf annotate-data: Ensure the number of type histograms
perf annotate: Fix segfault on sample histogram
perf daemon: Fix file leak in daemon_session__control
libsubcmd: Fix parse-options memory leak
perf lock: Avoid memory leaks from strdup()
perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
perf tools: Ignore deleted cgroups
perf parse: Allow tracepoint names to start with digits
perf parse-events: Add new 'fake_tp' parameter for tests
perf parse-events: pass parse_state to add_tracepoint
perf symbols: Fix ownership of string in dso__load_vmlinux()
perf symbols: Update kcore map before merging in remaining symbols
perf maps: Re-use __maps__free_maps_by_name()
perf symbols: Remove map from list before updating addresses
perf tracepoint: Don't scan all tracepoints to test if one exists
perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
perf thread: Fixes to thread__new() related to initializing comm
...
Pull perf event updates from Ingo Molnar:
- Extend the x86 instruction decoder with APX and
other new instructions
- Misc cleanups
* tag 'perf-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/cstate: Remove unused 'struct perf_cstate_msr'
perf/x86/rapl: Rename 'maxdie' to nr_rapl_pmu and 'dieid' to rapl_pmu_idx
x86/insn: Add support for APX EVEX instructions to the opcode map
x86/insn: Add support for APX EVEX to the instruction decoder logic
x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map
x86/insn: Add support for REX2 prefix to the instruction decoder logic
x86/insn: Add misc new Intel instructions
x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS
x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
x86/insn: Add Key Locker instructions to the opcode map
Pull x86 platform driver updates from Hans de Goede:
- New drivers/platform/arm64 directory for arm64 embedded-controller
drivers
- New drivers:
- Acer Aspire 1 embedded controllers (for arm64 models)
- ACPI quickstart PNP0C32 buttons
- Dell All-In-One backlight support (dell-uart-backlight)
- Lenovo WMI camera buttons
- Lenovo Yoga Tablet 2 Pro 1380F/L fast charging
- MeeGoPad ANX7428 Type-C Cross Switch (power sequencing only)
- MSI WMI sensors (fan speed sensors only for now)
- Asus WMI:
- 2024 ROG Mini-LED support
- MCU powersave support
- Vivobook GPU MUX support
- Misc. other improvements
- Ideapad laptop:
- Export FnLock LED as LED class device
- Switch platform profiles using thermal management key
- Intel drivers:
- IFS: various improvements
- PMC: Lunar Lake support
- SDSI: various improvements
- TPMI/ISST: various improvements
- tools: intel-speed-select: various improvements
- MS Surface drivers:
- Fan profile switching support
- Surface Pro thermal sensors support
- ThinkPad ACPI:
- Reworked hotkey support to use sparse keymaps
- Add support for new trackpoint-doubletap, Fn+N and Fn+G hotkeys
- WMI core:
- New WMI driver development guide
- x86 Android tablets:
- Lenovo Yoga Tablet 2 Pro 1380F/L support
- Xiaomi MiPad 2 status LED and bezel touch buttons backlight
support
- Miscellaneous cleanups / fixes / improvements
* tag 'platform-drivers-x86-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (128 commits)
platform/x86: Add new MeeGoPad ANX7428 Type-C Cross Switch driver
devm-helpers: Fix a misspelled cancellation in the comments
tools arch x86: Add dell-uart-backlight-emulator
platform/x86: Add new Dell UART backlight driver
platform/x86: x86-android-tablets: Create LED device for Xiaomi Pad 2 bottom bezel touch buttons
platform/x86: x86-android-tablets: Xiaomi pad2 RGB LED fwnode updates
platform/x86: x86-android-tablets: Pass struct device to init()
platform/x86/amd: pmc: Add new ACPI ID AMDI000B
platform/x86/amd: pmf: Add new ACPI ID AMDI0105
platform/x86: p2sb: Don't init until unassigned resources have been assigned
platform/surface: aggregator: Log critical errors during SAM probing
platform/x86: ISST: Support SST-BF and SST-TF per level
platform/x86/fujitsu-laptop: Replace sprintf() with sysfs_emit()
tools/power/x86/intel-speed-select: v1.19 release
tools/power/x86/intel-speed-select: Display CPU as None for -1
tools/power/x86/intel-speed-select: SST BF/TF support per level
tools/power/x86/intel-speed-select: Increase number of CPUs displayed
tools/power/x86/intel-speed-select: Present all TRL levels for turbo-freq
tools/power/x86/intel-speed-select: Fix display for unsupported levels
tools/power/x86/intel-speed-select: Support multiple dies
...
Dell All In One (AIO) models released after 2017 use a backlight controller
board connected to an UART.
Add a small emulator to allow development and testing of
the drivers/platform/x86/dell/dell-uart-backlight.c driver for
this board, without requiring access to an actual Dell All In One.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240513144603.93874-3-hdegoede@redhat.com
To support APX functionality, the EVEX prefix is used to:
- promote legacy instructions
- promote VEX instructions
- add new instructions
Promoted VEX instructions require no extra annotation because the opcodes
do not change and the permissive nature of the instruction decoder already
allows them to have an EVEX prefix.
Promoted legacy instructions and new instructions are placed in map 4 which
has not been used before.
Create a new table for map 4 and add APX instructions.
Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add
support for APX EVEX to the instruction decoder logic". SCALABLE
instructions must be represented in both no-prefix (NP) and 66 prefix
forms.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-9-adrian.hunter@intel.com
Intel Advanced Performance Extensions (APX) extends the EVEX prefix to
support:
- extended general purpose registers (EGPRs) i.e. r16 to r31
- Push-Pop Acceleration (PPX) hints
- new data destination (NDD) register
- suppress status flags writes (NF) of common instructions
- new instructions
Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.
The extended EVEX prefix does not need amended instruction decoder logic,
except in one area. Some instructions are defined as SCALABLE which means
the EVEX.W bit and EVEX.pp bits are used to determine operand size.
Specifically, if an instruction is SCALABLE and EVEX.W is zero, then
EVEX.pp value 0 (representing no prefix NP) means default operand size,
whereas EVEX.pp value 1 (representing 66 prefix) means operand size
override i.e. 16 bits
Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and
amend the logic appropriately.
Amend the awk script that generates the attribute tables from the opcode
map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-8-adrian.hunter@intel.com
Support for REX2 has been added to the instruction decoder logic and the
awk script that generates the attribute tables from the opcode map.
Add REX2 prefix byte (0xD5) to the opcode map.
Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2.
Add JMPABS to the opcode map and add annotation (REX2) to identify that it
has a mandatory REX2 prefix. A separate opcode attribute table is not
needed at this time because JMPABS has the same attribute encoding as the
MOV instruction that it shares an opcode with i.e. INAT_MOFFSET.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-7-adrian.hunter@intel.com
Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.
The REX2 prefix is effectively an extended version of the REX prefix.
REX2 and EVEX are also used with PUSH/POP instructions to provide a
Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
fast-forward register data between matching PUSH and POP instructions.
REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
other maps is provided by the EVEX prefix, covered in a separate patch.
Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
for a new 64-bit absolute direct jump instruction JMPABS.
Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.
Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
opcodes (only JMPABS) that require a mandatory REX2 prefix
(INAT_REX2_VARIANT).
Amend logic to read the REX2 prefix and get the opcode attribute for the
map number (0 or 1) encoded in the REX2 prefix.
Amend the awk script that generates the attribute tables from the opcode
map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-6-adrian.hunter@intel.com
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Add instructions documented in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference March 2024
319433-052, that have not been added yet:
AADD
AAND
AOR
AXOR
CMPccXADD
PBNDKB
RDMSRLIST
URDMSR
UWRMSR
VBCSTNEBF162PS
VBCSTNESH2PS
VCVTNEEBF162PS
VCVTNEEPH2PS
VCVTNEOBF162PS
VCVTNEOPH2PS
VCVTNEPS2BF16
VPDPB[SU,UU,SS]D[,S]
VPDPW[SU,US,UU]D[,S]
VPMADD52HUQ
VPMADD52LUQ
VSHA512MSG1
VSHA512MSG2
VSHA512RNDS2
VSM3MSG1
VSM3MSG2
VSM3RNDS2
VSM4KEY4
VSM4RNDS4
WRMSRLIST
TCMMIMFP16PS
TCMMRLFP16PS
TDPFP16PS
PREFETCHIT1
PREFETCHIT0
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-5-adrian.hunter@intel.com
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Intel Architecture Instruction Set Extensions and Future Features manual
number 319433-044 of May 2021, documented VEX versions of instructions
VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode map has them
listed as EVEX only.
Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS,
VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX
or EVEX prefix.
Fixes: 0153d98f2d ("x86/insn: Add misc instructions to x86 instruction decoder")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-4-adrian.hunter@intel.com
The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.
Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
contradicted by the Instruction Set Reference section for PUSH in the
same manual.
Remove 64-bit operand size only annotation from opcode 0x68 PUSH
instruction.
Example:
$ cat pushw.s
.global _start
.text
_start:
pushw $0x1234
mov $0x1,%eax # system call number (sys_exit)
int $0x80
$ as -o pushw.o pushw.s
$ ld -s -o pushw pushw.o
$ objdump -d pushw | tail -4
0000000000401000 <.text>:
401000: 66 68 34 12 pushw $0x1234
401004: b8 01 00 00 00 mov $0x1,%eax
401009: cd 80 int $0x80
$ perf record -e intel_pt//u ./pushw
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
Before:
$ perf script --insn-trace=disasm
Warning:
1 instruction trace errors
pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax)
pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch
pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax)
instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction
After:
$ perf script --insn-trace=disasm
pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax
Fixes: eb13296cfa ("x86: Instruction decoder API")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com
The x86 instruction decoder needs to know these new instructions that
are going to be used in the crypto library as well as the x86 core
code. Add the following:
LOADIWKEY:
Load a CPU-internal wrapping key.
ENCODEKEY128:
Wrap a 128-bit AES key to a key handle.
ENCODEKEY256:
Wrap a 256-bit AES key to a key handle.
AESENC128KL:
Encrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.
AESENC256KL:
Encrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.
AESDEC128KL:
Decrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.
AESDEC256KL:
Decrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.
AESENCWIDE128KL:
Encrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.
AESENCWIDE256KL:
Encrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.
AESDECWIDE128KL:
Decrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.
AESDECWIDE256KL:
Decrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.
The detail can be found in Intel Software Developer Manual.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240502105853.5338-2-adrian.hunter@intel.com
To pick the changes from:
95a6ccbdc7 ("x86/bhi: Mitigate KVM by default")
ec9404e40e ("x86/bhi: Add BHI mitigation knob")
be482ff950 ("x86/bhi: Enumerate Branch History Injection (BHI) bug")
0f4a837615 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S")
7390db8aea ("x86/bhi: Add support for clearing branch history at syscall entry")
This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZirIx4kPtJwGFZS0@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from:
598c2fafc0 ("perf/x86/amd/lbr: Use freeze based on availability")
7f274e609f ("x86/cpufeatures: Add new word for scattered features")
This should address these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-6-namhyung@kernel.org
To pick up the changes from:
6bda055d62 ("KVM: define __KVM_HAVE_GUEST_DEBUG unconditionally")
5d9cb71642 ("KVM: arm64: move ARM-specific defines to uapi/asm/kvm.h")
71cd774ad2 ("KVM: s390: move s390-specific structs to uapi/asm/kvm.h")
d750951c9e ("KVM: powerpc: move powerpc-specific structs to uapi/asm/kvm.h")
bcac047727 ("KVM: x86: move x86-specific structs to uapi/asm/kvm.h")
c0a411904e ("KVM: remove more traces of device assignment UAPI")
f3c80061c0 ("KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP")
That should be used to beautify the KVM arguments and it addresses these
tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240408185520.1550865-4-namhyung@kernel.org
Pull kvm updates from Paolo Bonzini:
"S390:
- Changes to FPU handling came in via the main s390 pull request
- Only deliver to the guest the SCLP events that userspace has
requested
- More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same)
- Fix selftests undefined behavior
x86:
- Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can
be programmed *using the architectural encoding*. The enumeration
does NOT say anything about the encoding when the CPU doesn't
report support the event *in general*. It might support it, and it
might support it using the same encoding that made it into the
architectural PMU spec
- Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly
emulates RDMPC, counter availability, and a variety of other
PMC-related behaviors that depend on guest CPUID and therefore are
easier to validate with selftests than with custom guests (aka
kvm-unit-tests)
- Zero out PMU state on AMD if the virtual PMU is disabled, it does
not cause any bug but it wastes time in various cases where KVM
would check if a PMC event needs to be synthesized
- Optimize triggering of emulated events, with a nice ~10%
performance improvement in VM-Exit microbenchmarks when a vPMU is
exposed to the guest
- Tighten the check for "PMI in guest" to reduce false positives if
an NMI arrives in the host while KVM is handling an IRQ VM-Exit
- Fix a bug where KVM would report stale/bogus exit qualification
information when exiting to userspace with an internal error exit
code
- Add a VMX flag in /proc/cpuinfo to report 5-level EPT support
- Rework TDP MMU root unload, free, and alloc to run with mmu_lock
held for read, e.g. to avoid serializing vCPUs when userspace
deletes a memslot
- Tear down TDP MMU page tables at 4KiB granularity (used to be
1GiB). KVM doesn't support yielding in the middle of processing a
zap, and 1GiB granularity resulted in multi-millisecond lags that
are quite impolite for CONFIG_PREEMPT kernels
- Allocate write-tracking metadata on-demand to avoid the memory
overhead when a kernel is built with i915 virtualization support
but the workloads use neither shadow paging nor i915 virtualization
- Explicitly initialize a variety of on-stack variables in the
emulator that triggered KMSAN false positives
- Fix the debugregs ABI for 32-bit KVM
- Rework the "force immediate exit" code so that vendor code
ultimately decides how and when to force the exit, which allowed
some optimization for both Intel and AMD
- Fix a long-standing bug where kvm_has_noapic_vcpu could be left
elevated if vCPU creation ultimately failed, causing extra
unnecessary work
- Cleanup the logic for checking if the currently loaded vCPU is
in-kernel
- Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere
in the kernel) are detected earlier and are less likely to hang the
kernel
x86 Xen emulation:
- Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the gpa
but the underlying host virtual address remains the same
- When possible, use a single host TSC value when computing the
deadline for Xen timers in order to improve the accuracy of the
timer emulation
- Inject pending upcall events when the vCPU software-enables its
APIC to fix a bug where an upcall can be lost (and to follow Xen's
behavior)
- Fall back to the slow path instead of warning if "fast" IRQ
delivery of Xen events fails, e.g. if the guest has aliased xAPIC
IDs
RISC-V:
- Support exception and interrupt handling in selftests
- New self test for RISC-V architectural timer (Sstc extension)
- New extension support (Ztso, Zacas)
- Support userspace emulation of random number seed CSRs
ARM:
- Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
- Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
- Conversion of KVM's representation of LPIs to an xarray, utilized
to address serialization some of the serialization on the LPI
injection path
- Support for _architectural_ VHE-only systems, advertised through
the absence of FEAT_E2H0 in the CPU's ID register
- Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
- Set reserved bits as zero in CPUCFG
- Start SW timer only when vcpu is blocking
- Do not restart SW timer when it is expired
- Remove unnecessary CSR register saving during enter guest
- Misc cleanups and fixes as usual
Generic:
- Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
always true on all architectures except MIPS (where Kconfig
determines the available depending on CPU capabilities). It is
replaced either by an architecture-dependent symbol for MIPS, and
IS_ENABLED(CONFIG_KVM) everywhere else
- Factor common "select" statements in common code instead of
requiring each architecture to specify it
- Remove thoroughly obsolete APIs from the uapi headers
- Move architecture-dependent stuff to uapi/asm/kvm.h
- Always flush the async page fault workqueue when a work item is
being removed, especially during vCPU destruction, to ensure that
there are no workers running in KVM code when all references to
KVM-the-module are gone, i.e. to prevent a very unlikely
use-after-free if kvm.ko is unloaded
- Grab a reference to the VM's mm_struct in the async #PF worker
itself instead of gifting the worker a reference, so that there's
no need to remember to *conditionally* clean up after the worker
Selftests:
- Reduce boilerplate especially when utilize selftest TAP
infrastructure
- Add basic smoke tests for SEV and SEV-ES, along with a pile of
library support for handling private/encrypted/protected memory
- Fix benign bugs where tests neglect to close() guest_memfd files"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
selftests: kvm: remove meaningless assignments in Makefiles
KVM: riscv: selftests: Add Zacas extension to get-reg-list test
RISC-V: KVM: Allow Zacas extension for Guest/VM
KVM: riscv: selftests: Add Ztso extension to get-reg-list test
RISC-V: KVM: Allow Ztso extension for Guest/VM
RISC-V: KVM: Forward SEED CSR access to user space
KVM: riscv: selftests: Add sstc timer test
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Add exception handling support
LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
LoongArch: KVM: Do not restart SW timer when it is expired
LoongArch: KVM: Start SW timer only when vcpu is blocking
LoongArch: KVM: Set reserved bits as zero in CPUCFG
KVM: selftests: Explicitly close guest_memfd files in some gmem tests
KVM: x86/xen: fix recursive deadlock in timer injection
KVM: pfncache: simplify locking and make more self-contained
KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
KVM: x86/xen: improve accuracy of Xen timers
...
Pull core x86 updates from Ingo Molnar:
- The biggest change is the rework of the percpu code, to support the
'Named Address Spaces' GCC feature, by Uros Bizjak:
- This allows C code to access GS and FS segment relative memory
via variables declared with such attributes, which allows the
compiler to better optimize those accesses than the previous
inline assembly code.
- The series also includes a number of micro-optimizations for
various percpu access methods, plus a number of cleanups of %gs
accesses in assembly code.
- These changes have been exposed to linux-next testing for the
last ~5 months, with no known regressions in this area.
- Fix/clean up __switch_to()'s broken but accidentally working handling
of FPU switching - which also generates better code
- Propagate more RIP-relative addressing in assembly code, to generate
slightly better code
- Rework the CPU mitigations Kconfig space to be less idiosyncratic, to
make it easier for distros to follow & maintain these options
- Rework the x86 idle code to cure RCU violations and to clean up the
logic
- Clean up the vDSO Makefile logic
- Misc cleanups and fixes
* tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
x86/idle: Select idle routine only once
x86/idle: Let prefer_mwait_c1_over_halt() return bool
x86/idle: Cleanup idle_setup()
x86/idle: Clean up idle selection
x86/idle: Sanitize X86_BUG_AMD_E400 handling
sched/idle: Conditionally handle tick broadcast in default_idle_call()
x86: Increase brk randomness entropy for 64-bit systems
x86/vdso: Move vDSO to mmap region
x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together
x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
x86/retpoline: Ensure default return thunk isn't used at runtime
x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32
x86/vdso: Use $(addprefix ) instead of $(foreach )
x86/vdso: Simplify obj-y addition
x86/vdso: Consolidate targets and clean-files
x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK
x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO
x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY
x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY
x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS
...
Pull x86 asm updates from Ingo Molnar:
"Two changes to simplify the x86 decoder logic a bit"
* tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/insn: Directly assign x86_64 state in insn_init()
x86/insn: Remove superfluous checks from instruction decoding routines
Pull x86 SEV updates from Borislav Petkov:
- Add the x86 part of the SEV-SNP host support.
This will allow the kernel to be used as a KVM hypervisor capable of
running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP
is the ultimate goal of the AMD confidential computing side,
providing the most comprehensive confidential computing environment
up to date.
This is the x86 part and there is a KVM part which did not get ready
in time for the merge window so latter will be forthcoming in the
next cycle.
- Rework the early code's position-dependent SEV variable references in
order to allow building the kernel with clang and -fPIE/-fPIC and
-mcmodel=kernel
- The usual set of fixes, cleanups and improvements all over the place
* tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/sev: Disable KMSAN for memory encryption TUs
x86/sev: Dump SEV_STATUS
crypto: ccp - Have it depend on AMD_IOMMU
iommu/amd: Fix failure return from snp_lookup_rmpentry()
x86/sev: Fix position dependent variable references in startup code
crypto: ccp: Make snp_range_list static
x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
Documentation: virt: Fix up pre-formatted text block for SEV ioctls
crypto: ccp: Add the SNP_SET_CONFIG command
crypto: ccp: Add the SNP_COMMIT command
crypto: ccp: Add the SNP_PLATFORM_STATUS command
x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
crypto: ccp: Handle legacy SEV commands when SNP is enabled
crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
x86/sev: Introduce an SNP leaked pages list
crypto: ccp: Provide an API to issue SEV and SNP commands
...
Pull x86 FRED support from Thomas Gleixner:
"Support for x86 Fast Return and Event Delivery (FRED).
FRED is a replacement for IDT event delivery on x86 and addresses most
of the technical nightmares which IDT exposes:
1) Exception cause registers like CR2 need to be manually preserved
in nested exception scenarios.
2) Hardware interrupt stack switching is suboptimal for nested
exceptions as the interrupt stack mechanism rewinds the stack on
each entry which requires a massive effort in the low level entry
of #NMI code to handle this.
3) No hardware distinction between entry from kernel or from user
which makes establishing kernel context more complex than it needs
to be especially for unconditionally nestable exceptions like NMI.
4) NMI nesting caused by IRET unconditionally reenabling NMIs, which
is a problem when the perf NMI takes a fault when collecting a
stack trace.
5) Partial restore of ESP when returning to a 16-bit segment
6) Limitation of the vector space which can cause vector exhaustion
on large systems.
7) Inability to differentiate NMI sources
FRED addresses these shortcomings by:
1) An extended exception stack frame which the CPU uses to save
exception cause registers. This ensures that the meta information
for each exception is preserved on stack and avoids the extra
complexity of preserving it in software.
2) Hardware interrupt stack switching is non-rewinding if a nested
exception uses the currently interrupt stack.
3) The entry points for kernel and user context are separate and GS
BASE handling which is required to establish kernel context for
per CPU variable access is done in hardware.
4) NMIs are now nesting protected. They are only reenabled on the
return from NMI.
5) FRED guarantees full restore of ESP
6) FRED does not put a limitation on the vector space by design
because it uses a central entry points for kernel and user space
and the CPUstores the entry type (exception, trap, interrupt,
syscall) on the entry stack along with the vector number. The
entry code has to demultiplex this information, but this removes
the vector space restriction.
The first hardware implementations will still have the current
restricted vector space because lifting this limitation requires
further changes to the local APIC.
7) FRED stores the vector number and meta information on stack which
allows having more than one NMI vector in future hardware when the
required local APIC changes are in place.
The series implements the initial FRED support by:
- Reworking the existing entry and IDT handling infrastructure to
accomodate for the alternative entry mechanism.
- Expanding the stack frame to accomodate for the extra 16 bytes FRED
requires to store context and meta information
- Providing FRED specific C entry points for events which have
information pushed to the extended stack frame, e.g. #PF and #DB.
- Providing FRED specific C entry points for #NMI and #MCE
- Implementing the FRED specific ASM entry points and the C code to
demultiplex the events
- Providing detection and initialization mechanisms and the necessary
tweaks in context switching, GS BASE handling etc.
The FRED integration aims for maximum code reuse vs the existing IDT
implementation to the extent possible and the deviation in hot paths
like context switching are handled with alternatives to minimalize the
impact. The low level entry and exit paths are seperate due to the
extended stack frame and the hardware based GS BASE swichting and
therefore have no impact on IDT based systems.
It has been extensively tested on existing systems and on the FRED
simulation and as of now there are no outstanding problems"
* tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/fred: Fix init_task thread stack pointer initialization
MAINTAINERS: Add a maintainer entry for FRED
x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
x86/fred: Invoke FRED initialization code to enable FRED
x86/fred: Add FRED initialization functions
x86/syscall: Split IDT syscall setup code into idt_syscall_init()
KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
x86/traps: Add sysvec_install() to install a system interrupt handler
x86/fred: FRED entry/exit and dispatch code
x86/fred: Add a machine check entry stub for FRED
x86/fred: Add a NMI entry stub for FRED
x86/fred: Add a debug fault entry stub for FRED
x86/idtentry: Incorporate definitions/declarations of the FRED entries
x86/fred: Make exc_page_fault() work for FRED
x86/fred: Allow single-step trap and NMI when starting a new task
x86/fred: No ESPFIX needed when FRED is enabled
...
KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
avoid creating ABI that KVM can't sanely support.
- Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
clear that such VMs are purely a development and testing vehicle, and
come with zero guarantees.
- Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
is to support confidential VMs with deterministic private memory (SNP
and TDX) only in the TDP MMU.
- Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.