Commit Graph

41730 Commits

Author SHA1 Message Date
Linus Torvalds
efb2883060 Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
 "Only updating the turbostat tool here, no kernel changes"

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2022.07.28
  tools/power turbostat: do not decode ACC for ICX and SPR
  tools/power turbostat: fix SPR PC6 limits
  tools/power turbostat: cleanup 'automatic_cstate_conversion_probe()'
  tools/power turbostat: separate SPR from ICX
  tools/power turbosstat: fix comment
  tools/power turbostat: Support RAPTORLAKE P
  tools/power turbostat: add support for ALDERLAKE_N
  tools/power turbostat: dump secondary Turbo-Ratio-Limit
  tools/power turbostat: simplify dump_turbo_ratio_limits()
  tools/power turbostat: dump CPUID.7.EDX.Hybrid
  tools/power turbostat: update turbostat.8
  tools/power turbostat: Show uncore frequency
  tools/power turbostat: Fix file pointer leak
  tools/power turbostat: replace strncmp with single character compare
  tools/power turbostat: print the kernel boot commandline
  tools/power turbostat: Introduce support for RaptorLake
2022-08-02 12:47:31 -07:00
Linus Torvalds
9de1f9c8ca Merge tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "Updates for interrupt core and drivers:

  Core:

   - Fix a few inconsistencies between UP and SMP vs interrupt
     affinities

   - Small updates and cleanups all over the place

  New drivers:

   - LoongArch interrupt controller

   - Renesas RZ/G2L interrupt controller

  Updates:

   - Hotpath optimization for SiFive PLIC

   - Workaround for broken PLIC edge triggered interrupts

   - Simall cleanups and improvements as usual"

* tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  irqchip/mmp: Declare init functions in common header file
  irqchip/mips-gic: Check the return value of ioremap() in gic_of_init()
  genirq: Use for_each_action_of_desc in actions_show()
  irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch
  irqchip: Add LoongArch CPU interrupt controller support
  irqchip: Add Loongson Extended I/O interrupt controller support
  irqchip/loongson-liointc: Add ACPI init support
  irqchip/loongson-pch-msi: Add ACPI init support
  irqchip/loongson-pch-pic: Add ACPI init support
  irqchip: Add Loongson PCH LPC controller support
  LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain
  LoongArch: Use ACPI_GENERIC_GSI for gsi handling
  genirq/generic_chip: Export irq_unmap_generic_chip
  ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback
  APCI: irq: Add support for multiple GSI domains
  LoongArch: Provisionally add ACPICA data structures
  irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains
  irqdomain: Report irq number for NOMAP domains
  irqchip/gic-v3: Fix comment typo
  dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/V2L SoC
  ...
2022-08-01 12:48:15 -07:00
Linus Torvalds
63e6053add Merge tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:

 - Fix Intel Alder Lake PEBS memory access latency & data source
   profiling info bugs.

 - Use Intel large-PEBS hardware feature in more circumstances, to
   reduce PMI overhead & reduce sampling data.

 - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI
   variant, which tells tooling the exact number of samples lost.

 - Add new IBS register bits definitions.

 - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements.

* tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/ibs: Add new IBS register bits into header
  perf/x86/intel: Fix PEBS data source encoding for ADL
  perf/x86/intel: Fix PEBS memory access info encoding for ADL
  perf/core: Add a new read format to get a number of lost samples
  perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments
  perf/x86/amd/uncore: Add PerfMonV2 DF event format
  perf/x86/amd/uncore: Detect available DF counters
  perf/x86/amd/uncore: Use attr_update for format attributes
  perf/x86/amd/uncore: Use dynamic events array
  x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
2022-08-01 12:24:30 -07:00
Linus Torvalds
22a39c3d86 Merge tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "This was a fairly quiet cycle for the locking subsystem:

   - lockdep: Fix a handful of the more complex lockdep_init_map_*()
     primitives that can lose the lock_type & cause false reports. No
     such mishap was observed in the wild.

   - jump_label improvements: simplify the cross-arch support of initial
     NOP patching by making it arch-specific code (used on MIPS only),
     and remove the s390 initial NOP patching that was superfluous"

* tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix lockdep_init_map_*() confusion
  jump_label: make initial NOP patching the special case
  jump_label: mips: move module NOP patching into arch code
  jump_label: s390: avoid pointless initial NOP patching
2022-08-01 12:15:27 -07:00
Linus Torvalds
0cec3f24a7 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "Highlights include a major rework of our kPTI page-table rewriting
  code (which makes it both more maintainable and considerably faster in
  the cases where it is required) as well as significant changes to our
  early boot code to reduce the need for data cache maintenance and
  greatly simplify the KASLR relocation dance.

  Summary:

   - Remove unused generic cpuidle support (replaced by PSCI version)

   - Fix documentation describing the kernel virtual address space

   - Handling of some new CPU errata in Arm implementations

   - Rework of our exception table code in preparation for handling
     machine checks (i.e. RAS errors) more gracefully

   - Switch over to the generic implementation of ioremap()

   - Fix lockdep tracking in NMI context

   - Instrument our memory barrier macros for KCSAN

   - Rework of the kPTI G->nG page-table repainting so that the MMU
     remains enabled and the boot time is no longer slowed to a crawl
     for systems which require the late remapping

   - Enable support for direct swapping of 2MiB transparent huge-pages
     on systems without MTE

   - Fix handling of MTE tags with allocating new pages with HW KASAN

   - Expose the SMIDR register to userspace via sysfs

   - Continued rework of the stack unwinder, particularly improving the
     behaviour under KASAN

   - More repainting of our system register definitions to match the
     architectural terminology

   - Improvements to the layout of the vDSO objects

   - Support for allocating additional bits of HWCAP2 and exposing
     FEAT_EBF16 to userspace on CPUs that support it

   - Considerable rework and optimisation of our early boot code to
     reduce the need for cache maintenance and avoid jumping in and out
     of the kernel when handling relocation under KASLR

   - Support for disabling SVE and SME support on the kernel
     command-line

   - Support for the Hisilicon HNS3 PMU

   - Miscellanous cleanups, trivial updates and minor fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits)
  arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}
  arm64: fix KASAN_INLINE
  arm64/hwcap: Support FEAT_EBF16
  arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long
  arm64/hwcap: Document allocation of upper bits of AT_HWCAP
  arm64: enable THP_SWAP for arm64
  arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
  arm64: errata: Remove AES hwcap for COMPAT tasks
  arm64: numa: Don't check node against MAX_NUMNODES
  drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
  perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
  docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
  arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
  mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
  mm: kasan: Skip unpoisoning of user pages
  mm: kasan: Ensure the tags are visible before the tag in page->flags
  drivers/perf: hisi: add driver for HNS3 PMU
  drivers/perf: hisi: Add description for HNS3 PMU driver
  drivers/perf: riscv_pmu_sbi: perf format
  perf/arm-cci: Use the bitmap API to allocate bitmaps
  ...
2022-08-01 10:37:00 -07:00
Linus Torvalds
60ee49fac8 Merge tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kdump updates from Borislav Petkov:

 - Add the ability to pass early an RNG seed to the kernel from the boot
   loader

 - Add the ability to pass the IMA measurement of kernel and bootloader
   to the kexec-ed kernel

* tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/setup: Use rng seeds from setup_data
  x86/kexec: Carry forward IMA measurement log on kexec
2022-08-01 10:17:19 -07:00
Linus Torvalds
8b7054528c Merge tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:

 - Fix stack protector builds when cross compiling with Clang

 - Other Kbuild improvements and fixes

* tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory: Omit use of bin2c
  x86/purgatory: Hard-code obj-y in Makefile
  x86/build: Remove unused OBJECT_FILES_NON_STANDARD_test_nx.o
  x86/Kconfig: Fix CONFIG_CC_HAS_SANE_STACKPROTECTOR when cross compiling with clang
2022-08-01 10:14:19 -07:00
Linus Torvalds
ecf9b7bfea Merge tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:

 - Have invalid MSR accesses warnings appear only once after a
   pr_warn_once() change broke that

 - Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching
   infra take care of them instead of having unreadable alternative
   macros there

* tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/extable: Fix ex_handler_msr() print condition
  x86,nospec: Simplify {JMP,CALL}_NOSPEC
2022-08-01 10:04:00 -07:00
Linus Torvalds
98b1783de2 Merge tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:

 - Add a bunch of PCI IDs for new AMD CPUs and use them in k10temp

 - Free the pmem platform device on the registration error path

* tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hwmon: (k10temp): Add support for new family 17h and 19h models
  x86/amd_nb: Add AMD PCI IDs for SMN communication
  x86/pmem: Fix platform-device leak in error path
2022-08-01 10:00:43 -07:00
Linus Torvalds
42efa5e3a8 Merge tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:

 - Remove the vendor check when selecting MWAIT as the default idle
   state

 - Respect idle=nomwait when supplied on the kernel cmdline

 - Two small cleanups

* tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Use MSR_IA32_MISC_ENABLE constants
  x86: Fix comment for X86_FEATURE_ZEN
  x86: Remove vendor checks from prefer_mwait_c1_over_halt
  x86: Handle idle=nomwait cmdline properly for x86_idle
2022-08-01 09:49:29 -07:00
Linus Torvalds
650ea1f626 Merge tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu update from Borislav Petkov:

 - Add machinery to initialize AMX register state in order for
   AMX-capable CPUs to be able to enter deeper low-power state

* tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  intel_idle: Add a new flag to initialize the AMX state
  x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
2022-08-01 09:36:18 -07:00
Linus Torvalds
92598ae22f Merge tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:

 - Rename a PKRU macro to make more sense when reading the code

 - Update pkeys documentation

 - Avoid reading contended mm's TLB generation var if not absolutely
   necessary along with fixing a case where arch_tlbbatch_flush()
   doesn't adhere to the generation scheme and thus violates the
   conditions for the above avoidance.

* tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/tlb: Ignore f->new_tlb_gen when zero
  x86/pkeys: Clarify PKRU_AD_KEY macro
  Documentation/protection-keys: Clean up documentation for User Space pkeys
  x86/mm/tlb: Avoid reading mm_tlb_gen when possible
2022-08-01 09:34:39 -07:00
Linus Torvalds
94e37e8489 Merge tag 'x86_cleanups_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanup from Borislav Petkov:

 - A single CONFIG_ symbol correction in a comment

* tag 'x86_cleanups_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Refer to the intended config STRICT_DEVMEM in a comment
2022-08-01 09:33:17 -07:00
Linus Torvalds
dbc1f5a9f4 Merge tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vmware cleanup from Borislav Petkov:

 - A single statement simplification by using the BIT() macro

* tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vmware: Use BIT() macro for shifting
2022-08-01 09:31:49 -07:00
Linus Torvalds
296d3b3e05 Merge tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS update from Borislav Petkov:
 "A single RAS change:

   - Probe whether hardware error injection (direct MSR writes) is
     possible when injecting errors on AMD platforms. In some cases, the
     platform could prohibit those"

* tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Check whether writes to MCA_STATUS are getting ignored
2022-08-01 09:29:41 -07:00
Thadeu Lima de Souza Cascardo
571c30b1a8 x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Some cloud hypervisors do not provide IBPB on very recent CPU processors,
including AMD processors affected by Retbleed.

Using IBPB before firmware calls on such systems would cause a GPF at boot
like the one below. Do not enable such calls when IBPB support is not
present.

  EFI Variables Facility v0.08 2004-May-17
  general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
  Workqueue: efi_rts_wq efi_call_rts
  RIP: 0010:efi_call_rts
  Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48
  RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246
  RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049
  RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300
  RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000
  R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78
  R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048
  FS:  0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0
  Call Trace:
   <TASK>
   ? __wake_up
   process_one_work
   worker_thread
   ? rescuer_thread
   kthread
   ? kthread_complete_and_exit
   ret_from_fork
   </TASK>
  Modules linked in:

Fixes: 28a99e95f5 ("x86/amd: Use IBPB for firmware calls")
Reported-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com
2022-07-29 10:02:35 +02:00
Len Brown
4af184ee8b tools/power turbostat: dump secondary Turbo-Ratio-Limit
Intel Performance Hybrid processors have a 2nd MSR
describing the turbo limits enforced on the Ecores.

Note, TRL and Secondary-TRL are usually R/O information,
but on overclock-capable parts, they can be written.

Signed-off-by: Len Brown <len.brown@intel.com>
2022-07-28 14:23:26 -04:00
Borislav Petkov
5bb6c1d112 Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
This reverts commit 007faec014.

Now that hyperv does its own protocol negotiation:

  49d6a3c062 ("x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM")

revert this exposure of the sev_es_ghcb_hv_call() helper.

Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by:Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
2022-07-27 18:09:13 +02:00
Ravi Bangoria
326ecc15c6 perf/x86/ibs: Add new IBS register bits into header
IBS support has been enhanced with two new features in upcoming uarch:

  1. DataSrc extension and
  2. L3 miss filtering.

Additional set of bits has been introduced in IBS registers to use these
features. Define these new bits into arch/x86/ header.

  [ bp: Massage commit message. ]

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220604044519.594-7-ravi.bangoria@amd.com
2022-07-27 13:54:38 +02:00
Masahiro Yamada
2d17bd24b0 x86/purgatory: Omit use of bin2c
The .incbin assembler directive is much faster than bin2c + $(CC).

Do similar refactoring as in

  4c0f032d49 ("s390/purgatory: Omit use of bin2c").

Please note the .quad directive matches to size_t in C (both 8
byte) because the purgatory is compiled only for the 64-bit kernel.
(KEXEC_FILE depends on X86_64).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220725020812.622255-2-masahiroy@kernel.org
2022-07-25 10:32:32 +02:00
Masahiro Yamada
61922d3fa6 x86/purgatory: Hard-code obj-y in Makefile
arch/x86/Kbuild guards the entire purgatory/ directory, and
CONFIG_KEXEC_FILE is bool type.

$(CONFIG_KEXEC_FILE) is always 'y' when this directory is being built.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220725020812.622255-1-masahiroy@kernel.org
2022-07-25 10:26:49 +02:00
Linus Torvalds
af2c9ac240 Merge tag 'perf_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:

 - Reorganize the perf LBR init code so that a TSX quirk is applied
   early enough in order for the LBR MSR access to not #GP

* tag 'perf_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/lbr: Fix unchecked MSR access error on HSW
2022-07-24 09:55:53 -07:00
Linus Torvalds
05017fed92 Merge tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
 "A couple more retbleed fallout fixes.

  It looks like their urgency is decreasing so it seems like we've
  managed to catch whatever snafus the limited -rc testing has exposed.
  Maybe we're getting ready... :)

   - Make retbleed mitigations 64-bit only (32-bit will need a bit more
     work if even needed, at all).

   - Prevent return thunks patching of the LKDTM modules as it is not
     needed there

   - Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS
     parts

   - Enhance error output of apply_returns() when it fails to patch a
     return thunk

   - A sparse fix to the sev-guest module

   - Protect EFI fw calls by issuing an IBPB on AMD"

* tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Make all RETbleed mitigations 64-bit only
  lkdtm: Disable return thunks in rodata.c
  x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
  x86/alternative: Report missing return thunk details
  virt: sev-guest: Pass the appropriate argument type to iounmap()
  x86/amd: Use IBPB for firmware calls
2022-07-24 09:40:17 -07:00
Linus Torvalds
515f71412b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:

 - Check for invalid flags to KVM_CAP_X86_USER_SPACE_MSR

 - Fix use of sched_setaffinity in selftests

 - Sync kernel headers to tools

 - Fix KVM_STATS_UNIT_MAX

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Protect the unused bits in MSR exiting flags
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  KVM: selftests: Fix target thread to be migrated in rseq_test
  KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats
2022-07-23 10:22:26 -07:00
Ben Hutchings
b648ab487f x86/speculation: Make all RETbleed mitigations 64-bit only
The mitigations for RETBleed are currently ineffective on x86_32 since
entry_32.S does not use the required macros.  However, for an x86_32
target, the kconfig symbols for them are still enabled by default and
/sys/devices/system/cpu/vulnerabilities/retbleed will wrongly report
that mitigations are in place.

Make all of these symbols depend on X86_64, and only enable RETHUNK by
default on X86_64.

Fixes: f43b9876e8 ("x86/retbleed: Add fine grained Kconfig knobs")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YtwSR3NNsWp1ohfV@decadent.org.uk
2022-07-23 18:45:11 +02:00
Peter Zijlstra
1e9fdf21a4 mmu_gather: Remove per arch tlb_{start,end}_vma()
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.

 - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
   but does *NOT* want to call it for each VMA.

 - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
   invalidate across multiple VMAs if possible.

With these it is possible to capture the three forms:

  1) empty stubs;
     select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS

  2) start: flush_cache_range(), end: empty;
     select MMU_GATHER_MERGE_VMAS

  3) start: flush_cache_range(), end: flush_tlb_range();
     default

Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-21 10:50:13 -07:00
Peter Zijlstra
a1a5482a2c x86/extable: Fix ex_handler_msr() print condition
On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote:
> Some changes to the way invalid MSR accesses are reported by the
> kernel is causing some problems with messages printed on the
> console.
>
> We have seen several cases of ex_handler_msr() printing invalid MSR
> accesses once but the callstack multiple times causing confusion on
> the console.

> The problem here is that another earlier commit (5.13):
>
> a358f40600 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
>
> Modifies all the pr_*_once() calls to always return true claiming
> that no caller is ever checking the return value of the functions.
>
> This is why we are seeing the callstack printed without the
> associated printk() msg.

Extract the ONCE_IF(cond) part into __ONCE_LTE_IF() and use that to
implement DO_ONCE_LITE_IF() and fix the extable code.

Fixes: a358f40600 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/r/YqyVFsbviKjVGGZ9@worktop.programming.kicks-ass.net
2022-07-21 10:39:42 +02:00
Peter Zijlstra
09d09531a5 x86,nospec: Simplify {JMP,CALL}_NOSPEC
Have {JMP,CALL}_NOSPEC generate the same code GCC does for indirect
calls and rely on the objtool retpoline patching infrastructure.

There's no reason these should be alternatives while the vast bulk of
compiler generated retpolines are not.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-07-21 10:39:42 +02:00
Kan Liang
b0380e1350 perf/x86/intel/lbr: Fix unchecked MSR access error on HSW
The fuzzer triggers the below trace.

[ 7763.384369] unchecked MSR access error: WRMSR to 0x689
(tried to write 0x1fffffff8101349e) at rIP: 0xffffffff810704a4
(native_write_msr+0x4/0x20)
[ 7763.397420] Call Trace:
[ 7763.399881]  <TASK>
[ 7763.401994]  intel_pmu_lbr_restore+0x9a/0x1f0
[ 7763.406363]  intel_pmu_lbr_sched_task+0x91/0x1c0
[ 7763.410992]  __perf_event_task_sched_in+0x1cd/0x240

On a machine with the LBR format LBR_FORMAT_EIP_FLAGS2, when the TSX is
disabled, a TSX quirk is required to access LBR from registers.
The lbr_from_signext_quirk_needed() is introduced to determine whether
the TSX quirk should be applied. However, the
lbr_from_signext_quirk_needed() is invoked before the
intel_pmu_lbr_init(), which parses the LBR format information. Without
the correct LBR format information, the TSX quirk never be applied.

Move the lbr_from_signext_quirk_needed() into the intel_pmu_lbr_init().
Checking x86_pmu.lbr_has_tsx in the lbr_from_signext_quirk_needed() is
not required anymore.

Both LBR_FORMAT_EIP_FLAGS2 and LBR_FORMAT_INFO have LBR_TSX flag, but
only the LBR_FORMAT_EIP_FLAGS2 requirs the quirk. Update the comments
accordingly.

Fixes: 1ac7fd8159 ("perf/x86/intel/lbr: Support LBR format V7")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220714182630.342107-1-kan.liang@linux.intel.com
2022-07-20 19:24:55 +02:00
Josh Poimboeuf
efc72a665a lkdtm: Disable return thunks in rodata.c
The following warning was seen:

  WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:557 apply_returns (arch/x86/kernel/alternative.c:557 (discriminator 1))
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc4-00008-gee88d363d156 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
  RIP: 0010:apply_returns (arch/x86/kernel/alternative.c:557 (discriminator 1))
  Code: ff ff 74 cb 48 83 c5 04 49 39 ee 0f 87 81 fe ff ff e9 22 ff ff ff 0f 0b 48 83 c5 04 49 39 ee 0f 87 6d fe ff ff e9 0e ff ff ff <0f> 0b 48 83 c5 04 49 39 ee 0f 87 59 fe ff ff e9 fa fe ff ff 48 89

The warning happened when apply_returns() failed to convert "JMP
__x86_return_thunk" to RET.  It was instead a JMP to nowhere, due to the
thunk relocation not getting resolved.

That rodata.o code is objcopy'd to .rodata, and later memcpy'd, so
relocations don't work (and are apparently silently ignored).

LKDTM is only used for testing, so the naked RET should be fine.  So
just disable return thunks for that file.

While at it, disable objtool and KCSAN for the file.

Fixes: 0b53c374b9 ("x86/retpoline: Use -mfunction-return")
Reported-by: kernel test robot <oliver.sang@intel.com>
Debugged-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Ys58BxHxoDZ7rfpr@xsang-OptiPlex-9020/
2022-07-20 19:24:53 +02:00
Pawan Gupta
eb23b5ef91 x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
IBRS mitigation for spectre_v2 forces write to MSR_IA32_SPEC_CTRL at
every kernel entry/exit. On Enhanced IBRS parts setting
MSR_IA32_SPEC_CTRL[IBRS] only once at boot is sufficient. MSR writes at
every kernel entry/exit incur unnecessary performance loss.

When Enhanced IBRS feature is present, print a warning about this
unnecessary performance loss.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/2a5eaf54583c2bfe0edc4fea64006656256cca17.1657814857.git.pawan.kumar.gupta@linux.intel.com
2022-07-20 19:24:53 +02:00
Kees Cook
65cdf0d623 x86/alternative: Report missing return thunk details
Debugging missing return thunks is easier if we can see where they're
happening.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Ys66hwtFcGbYmoiZ@hirez.programming.kicks-ass.net/
2022-07-20 19:24:53 +02:00
Mario Limonciello
f8faf34966 x86/amd_nb: Add AMD PCI IDs for SMN communication
Add support for SMN communication on family 17h model A0h and family 19h
models 60h-70h.

  [ bp: Merge into a single patch. ]

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# pci_ids.h
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220719195256.1516-1-mario.limonciello@amd.com
2022-07-20 17:35:40 +02:00
Paolo Bonzini
3f2adf00f5 x86/cpu: Use MSR_IA32_MISC_ENABLE constants
Instead of the magic numbers 1<<11 and 1<<12 use the constants
from msr-index.h.  This makes it obvious where those bits
of MSR_IA32_MISC_ENABLE are consumed (and in fact that Linux
consumes them at all) to simple minds that grep for
MSR_IA32_MISC_ENABLE_.*_UNAVAIL.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220719174714.2410374-1-pbonzini@redhat.com
2022-07-19 20:53:10 +02:00
Aaron Lewis
cf5029d5dd KVM: x86: Protect the unused bits in MSR exiting flags
The flags for KVM_CAP_X86_USER_SPACE_MSR and KVM_X86_SET_MSR_FILTER
have no protection for their unused bits.  Without protection, future
development for these features will be difficult.  Add the protection
needed to make it possible to extend these features in the future.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20220714161314.1715227-1-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-19 14:04:18 -04:00
Chang S. Bae
f17b168734 x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
When a CPU enters an idle state, a non-initialized AMX register state may
be the cause of preventing a deeper low-power state. Other extended
register states whether initialized or not do not impact the CPU idle
state.

The new helper can ensure the AMX state is initialized before the CPU is
idle, and it will be used by the intel idle driver.

Check the AMX_TILE feature bit before using XGETBV1 as a chain of
dependencies was established via cpuid_deps[]: AMX->XFD->XGETBV1.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com
2022-07-19 18:46:15 +02:00
Nadav Amit
8f1d56f64f x86/mm/tlb: Ignore f->new_tlb_gen when zero
Commit aa44284960 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
possible") introduced an optimization to skip superfluous TLB
flushes based on the generation provided in flush_tlb_info.

However, arch_tlbbatch_flush() does not provide any generation in
flush_tlb_info and populates the flush_tlb_info generation with
0.  This 0 is causes the flush_tlb_info to be interpreted as a
superfluous, old flush.  As a result, try_to_unmap_one() would
not perform any TLB flushes.

Fix it by checking whether f->new_tlb_gen is nonzero. Zero value
is anyhow is an invalid generation value. To avoid future
confusion, introduce TLB_GENERATION_INVALID constant and use it
properly. Add warnings to ensure no partial flushes are done with
TLB_GENERATION_INVALID or when f->mm is NULL, since this does not
make any sense.

In addition, add the missing unlikely().

[ dhansen: change VM_BUG_ON() -> VM_WARN_ON(), clarify changelog ]

Fixes: aa44284960 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Hugh Dickins <hughd@google.com>
Link: https://lkml.kernel.org/r/20220710232837.3618-1-namit@vmware.com
2022-07-19 09:04:52 -07:00
Peter Zijlstra
28a99e95f5 x86/amd: Use IBPB for firmware calls
On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.

And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.

  [ bp: Massage. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
2022-07-18 15:38:09 +02:00
Linus Torvalds
59c80f053d Merge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Improve the check whether the kernel supports WP mappings so that it
   can accomodate a XenPV guest due to how the latter is setting up the
   PAT machinery

  - Now that the retbleed nightmare is public, here's the first round of
    fallout fixes:

      * Fix a build failure on 32-bit due to missing include

      * Remove an untraining point in espfix64 return path

      * other small cleanups

* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Remove apostrophe typo
  um: Add missing apply_returns()
  x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
  x86/bugs: Mark retbleed_strings static
  x86/pat: Fix x86_has_pat_wp()
  x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
2022-07-17 08:27:30 -07:00
Linus Torvalds
16c957f089 Merge tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
 "Fix more fallout from recent changes of the ACPI CPPC handling on AMD
  platforms (Mario Limonciello)"

* tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory
2022-07-16 10:52:41 -07:00
Thadeu Lima de Souza Cascardo
51a6fa0732 efi/x86: use naked RET on mixed mode call wrapper
When running with return thunks enabled under 32-bit EFI, the system
crashes with:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle page fault for address: 000000005bc02900
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0011) - permissions violation
  PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
  Oops: 0011 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:0x5bc02900
  Code: Unable to access opcode bytes at RIP 0x5bc028d6.
  RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
  RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
  RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
  R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
  R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
  CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
  Call Trace:
   ? efi_set_virtual_address_map+0x9c/0x175
   efi_enter_virtual_mode+0x4a6/0x53e
   start_kernel+0x67c/0x71e
   x86_64_start_reservations+0x24/0x2a
   x86_64_start_kernel+0xe9/0xf4
   secondary_startup_64_no_verify+0xe5/0xeb

That's because it cannot jump to the return thunk from the 32-bit code.

Using a naked RET and marking it as safe allows the system to proceed
booting.

Fixes: aa3d480315 ("x86: Use return-thunk in asm code")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: <stable@vger.kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-16 09:51:24 -07:00
Kim Phillips
bcf163150c x86/bugs: Remove apostrophe typo
Remove a superfluous ' in the mitigation string.

Fixes: e8ec1b6e08 ("x86/bugs: Enable STIBP for JMP2RET")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-16 11:39:23 +02:00
Linus Torvalds
a8ebfcd33c Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "RISC-V:
   - Fix missing PAGE_PFN_MASK

   - Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()

  x86:
   - Fix for nested virtualization when TSC scaling is active

   - Estimate the size of fastcc subroutines conservatively, avoiding
     disastrous underestimation when return thunks are enabled

   - Avoid possible use of uninitialized fields of 'struct
     kvm_lapic_irq'

  Generic:
   - Mark as such the boolean values available from the statistics file
     descriptors

   - Clarify statistics documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: emulate: do not adjust size of fastop and setcc subroutines
  KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
  Documentation: kvm: clarify histogram units
  kvm: stats: tell userspace which values are boolean
  x86/kvm: fix FASTOP_SIZE when return thunks are enabled
  KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
  RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
  riscv: Fix missing PAGE_PFN_MASK
2022-07-15 10:31:46 -07:00
Paolo Bonzini
7962918160 KVM: emulate: do not adjust size of fastop and setcc subroutines
Instead of doing complicated calculations to find the size of the subroutines
(which are even more complicated because they need to be stringified into
an asm statement), just hardcode to 16.

It is less dense for a few combinations of IBT/SLS/retbleed, but it has
the advantage of being really simple.

Cc: stable@vger.kernel.org # 5.15.x: 84e7051c0b: x86/kvm: fix FASTOP_SIZE when return thunks are enabled
Cc: stable@vger.kernel.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-15 07:49:40 -04:00
Nathan Chancellor
db88697968 x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Clang warns:

  arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]
  DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
                      ^
  arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here
  extern u64 x86_spec_ctrl_current;
             ^
  1 error generated.

The declaration should be using DECLARE_PER_CPU instead so all
attributes stay in sync.

Cc: stable@vger.kernel.org
Fixes: fc02735b14 ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-14 14:52:43 -07:00
Vitaly Kuznetsov
8a414f943f KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.

Fixes: a183b638b6 ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220708125147.593975-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 12:09:43 -04:00
Paolo Bonzini
cca3f3381b Merge commit 'kvm-vmx-nested-tsc-fix' into kvm-master
Merge bugfix needed in both 5.19 (because it's bad) and 5.20 (because
it is a prerequisite to test new features).
2022-07-14 10:04:44 -04:00
Paolo Bonzini
1b870fa557 kvm: stats: tell userspace which values are boolean
Some of the statistics values exported by KVM are always only 0 or 1.
It can be useful to export this fact to userspace so that it can track
them specially (for example by polling the value every now and then to
compute a % of time spent in a specific state).

Therefore, add "boolean value" as a new "unit".  While it is not exactly
a unit, it walks and quacks like one.  In particular, using the type
would be wrong because boolean values could be instantaneous or peak
values (e.g. "is the rmap allocated?") or even two-bucket histograms
(e.g. "number of posted vs. non-posted interrupt injections").

Suggested-by: Amneesh Singh <natto@weirdnatto.in>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 08:01:59 -04:00
Thadeu Lima de Souza Cascardo
84e7051c0b x86/kvm: fix FASTOP_SIZE when return thunks are enabled
The return thunk call makes the fastop functions larger, just like IBT
does. Consider a 16-byte FASTOP_SIZE when CONFIG_RETHUNK is enabled.

Otherwise, functions will be incorrectly aligned and when computing their
position for differently sized operators, they will executed in the middle
or end of a function, which may as well be an int3, leading to a crash
like:

[   36.091116] int3: 0000 [#1] SMP NOPTI
[   36.091119] CPU: 3 PID: 1371 Comm: qemu-system-x86 Not tainted 5.15.0-41-generic #44
[   36.091120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[   36.091121] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.091185] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.091186] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.091188] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.091188] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.091189] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.091190] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.091190] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.091191] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.091192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.091192] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.091195] PKRU: 55555554
[   36.091195] Call Trace:
[   36.091197]  <TASK>
[   36.091198]  ? fastop+0x5a/0xa0 [kvm]
[   36.091222]  x86_emulate_insn+0x7b8/0xe90 [kvm]
[   36.091244]  x86_emulate_instruction+0x2f4/0x630 [kvm]
[   36.091263]  ? kvm_arch_vcpu_load+0x7c/0x230 [kvm]
[   36.091283]  ? vmx_prepare_switch_to_host+0xf7/0x190 [kvm_intel]
[   36.091290]  complete_emulated_mmio+0x297/0x320 [kvm]
[   36.091310]  kvm_arch_vcpu_ioctl_run+0x32f/0x550 [kvm]
[   36.091330]  kvm_vcpu_ioctl+0x29e/0x6d0 [kvm]
[   36.091344]  ? kvm_vcpu_ioctl+0x120/0x6d0 [kvm]
[   36.091357]  ? __fget_files+0x86/0xc0
[   36.091362]  ? __fget_files+0x86/0xc0
[   36.091363]  __x64_sys_ioctl+0x92/0xd0
[   36.091366]  do_syscall_64+0x59/0xc0
[   36.091369]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091370]  ? do_syscall_64+0x69/0xc0
[   36.091371]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091372]  ? __x64_sys_writev+0x1c/0x30
[   36.091374]  ? do_syscall_64+0x69/0xc0
[   36.091374]  ? exit_to_user_mode_prepare+0x37/0xb0
[   36.091378]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091380]  ? do_syscall_64+0x69/0xc0
[   36.091381]  ? do_syscall_64+0x69/0xc0
[   36.091381]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   36.091384] RIP: 0033:0x7efdfe6d1aff
[   36.091390] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[   36.091391] RSP: 002b:00007efdfce8c460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   36.091393] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007efdfe6d1aff
[   36.091393] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c
[   36.091394] RBP: 0000558f1609e220 R08: 0000558f13fb8190 R09: 00000000ffffffff
[   36.091394] R10: 0000558f16b5e950 R11: 0000000000000246 R12: 0000000000000000
[   36.091394] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[   36.091396]  </TASK>
[   36.091397] Modules linked in: isofs nls_iso8859_1 kvm_intel joydev kvm input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler drm msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net net_failover crypto_simd ahci xhci_pci cryptd psmouse virtio_blk libahci xhci_pci_renesas failover
[   36.123271] ---[ end trace db3c0ab5a48fabcc ]---
[   36.123272] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.123319] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.123320] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.123321] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.123321] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.123322] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.123322] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.123323] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.123323] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.123324] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.123325] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.123327] PKRU: 55555554
[   36.123328] Kernel panic - not syncing: Fatal exception in interrupt
[   36.123410] Kernel Offset: 0x1400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   36.135305] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: aa3d480315 ("x86: Use return-thunk in asm code")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Message-Id: <20220713171241.184026-1-cascardo@canonical.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 07:44:38 -04:00
Vitaly Kuznetsov
9948272645 KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to
hang upon boot or shortly after when a non-default TSC frequency was
set for L1. The issue is observed on a host where TSC scaling is
supported. The problem appears to be that Windows doesn't use TSC
frequency for its guests even when the feature is advertised and KVM
filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from
L1's. This leads to L2 running with the default frequency (matching
host's) while L1 is running with an altered one.

Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when
it was set for L1. TSC_MULTIPLIER is already correctly computed and
written by prepare_vmcs02().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220712135009.952805-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 07:44:05 -04:00