Commit Graph

191053 Commits

Author SHA1 Message Date
Linus Torvalds
c3b68c27f5 Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull more parisc fixes from Helge Deller:
 "Fix a build error in stracktrace.c, fix resolving of addresses to
  function names in backtraces, fix single-stepping in assembly code and
  flush userspace pte's when using set_pte_at()"

* tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc/entry: fix trace test in syscall exit path
  parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
  parisc: Fix implicit declaration of function '__kernel_text_address'
  parisc: Fix backtrace to always include init funtion names
2021-11-14 11:53:59 -08:00
Linus Torvalds
24318ae80d Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh
Pull arch/sh updates from Rich Felker.

* tag 'sh-for-5.16' of git://git.libc.org/linux-sh:
  sh: pgtable-3level: Fix cast to pointer from integer of different size
  sh: fix READ/WRITE redefinition warnings
  sh: define __BIG_ENDIAN for math-emu
  sh: math-emu: drop unused functions
  sh: fix kconfig unmet dependency warning for FRAME_POINTER
  sh: Cleanup about SPARSE_IRQ
  sh: kdump: add some attribute to function
  maple: fix wrong return value of maple_bus_init().
  sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/
  sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y
  sh: boards: Fix the cacography in irq.c
  sh: check return code of request_irq
  sh: fix trivial misannotations
2021-11-14 11:37:49 -08:00
Linus Torvalds
6ea45c57dc Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:

 - Fix early_iounmap

 - Drop cc-option fallbacks for architecture selection

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9156/1: drop cc-option fallbacks for architecture selection
  ARM: 9155/1: fix early early_iounmap()
2021-11-14 11:30:50 -08:00
Linus Torvalds
0d1503d8d8 Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:

 - Two fixes due to DT node name changes on Arm, Ltd. boards

 - Treewide rename of Ingenic CGU headers

 - Update ST email addresses

 - Remove Netlogic DT bindings

 - Dropping few more cases of redundant 'maxItems' in schemas

 - Convert toshiba,tc358767 bridge binding to schema

* tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: watchdog: sunxi: fix error in schema
  bindings: media: venus: Drop redundant maxItems for power-domain-names
  dt-bindings: Remove Netlogic bindings
  clk: versatile: clk-icst: Ensure clock names are unique
  of: Support using 'mask' in making device bus id
  dt-bindings: treewide: Update @st.com email address to @foss.st.com
  dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml
  dt-bindings: media: Update maintainers for st,stm32-cec.yaml
  dt-bindings: mfd: timers: Update maintainers for st,stm32-timers
  dt-bindings: timer: Update maintainers for st,stm32-timer
  dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
  dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml
  dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
2021-11-14 11:11:51 -08:00
Linus Torvalds
218cc8b860 Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 static call update from Thomas Gleixner:
 "A single fix for static calls to make the trampoline patching more
  robust by placing explicit signature bytes after the call trampoline
  to prevent patching random other jumps like the CFI jump table
  entries"

* tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call,x86: Robustify trampoline patching
2021-11-14 10:30:17 -08:00
Linus Torvalds
fc661f2dcb Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:

 - Avoid touching ~100 config files in order to be able to select the
   preemption model

 - clear cluster CPU masks too, on the CPU unplug path

 - prevent use-after-free in cfs

 - Prevent a race condition when updating CPU cache domains

 - Factor out common shared part of smp_prepare_cpus() into a common
   helper which can be called by both baremetal and Xen, in order to fix
   a booting of Xen PV guests

* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  preempt: Restore preemption model selection configs
  arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
  sched/fair: Prevent dead task groups from regaining cfs_rq's
  sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  x86/smp: Factor out parts of native_smp_prepare_cpus()
2021-11-14 09:39:03 -08:00
Linus Torvalds
f7018be292 Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Prevent unintentional page sharing by checking whether a page
   reference to a PMU samples page has been acquired properly before
   that

 - Make sure the LBR_SELECT MSR is saved/restored too

 - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
   residual data left

* tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Avoid put_page() when GUP fails
  perf/x86/vlbr: Add c->flags to vlbr event constraints
  perf/x86/lbr: Reset LBR_SELECT during vlbr reset
2021-11-14 09:33:12 -08:00
Linus Torvalds
1654e95ee3 Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Add the model number of a new, Raptor Lake CPU, to intel-family.h

 - Do not log spurious corrected MCEs on SKL too, due to an erratum

 - Clarify the path of paravirt ops patches upstream

 - Add an optimization to avoid writing out AMX components to sigframes
   when former are in init state

* tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Raptor Lake to Intel family
  x86/mce: Add errata workaround for Skylake SKX37
  MAINTAINERS: Add some information to PARAVIRT_OPS entry
  x86/fpu: Optimize out sigframe xfeatures when in init state
2021-11-14 09:29:03 -08:00
Sven Schnelle
3ec18fc783 parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8 ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
usespace with single or block stepping and the recovery counter
enabled. The test however used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.

Fix this by using TIF_SINGLESTEP and TIF_BLOCKSTEP directly.

I encountered this bug by enabling syscall tracepoints. Both in qemu and
on real hardware. As soon as i enabled the tracepoint (sys_exit_read,
but i guess it doesn't really matter which one), i got random page
faults in userspace almost immediately.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-13 22:10:56 +01:00
John David Anglin
38860b2c8b parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
For years, there have been random segmentation faults in userspace on
SMP PA-RISC machines.  It occurred to me that this might be a problem in
set_pte_at().  MIPS and some other architectures do cache flushes when
installing PTEs with the present bit set.

Here I have adapted the code in update_mmu_cache() to flush the kernel
mapping when the kernel flush is deferred, or when the kernel mapping
may alias with the user mapping.  This simplifies calls to
update_mmu_cache().

I also changed the barrier in set_pte() from a compiler barrier to a
full memory barrier.  I know this change is not sufficient to fix the
problem.  It might not be needed.

I have had a few days of operation with 5.14.16 to 5.15.1 and haven't
seen any random segmentation faults on rp3440 or c8000 so far.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@kernel.org # 5.12+
2021-11-13 22:10:56 +01:00
Helge Deller
f0d1cfac45 parisc: Fix implicit declaration of function '__kernel_text_address'
Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-13 22:10:56 +01:00
Helge Deller
279917e27e parisc: Fix backtrace to always include init funtion names
I noticed that sometimes at kernel startup the backtraces did not
included the function names of init functions. Their address were not
resolved to function names and instead only the address was printed.

Debugging shows that the culprit is is_ksym_addr() which is called
by the backtrace functions to check if an address belongs to a function in
the kernel. The problem occurs only for CONFIG_KALLSYMS_ALL=y.

When looking at is_ksym_addr() one can see that for CONFIG_KALLSYMS_ALL=y
the function only tries to resolve the address via is_kernel() function,
which checks like this:
	if (addr >= _stext && addr <= _end)
                return 1;
On parisc the init functions are located before _stext, so this check fails.
Other platforms seem to have all functions (including init functions)
behind _stext.

The following patch moves the _stext symbol at the beginning of the
kernel and thus includes the init section. This fixes the check and does
not seem to have any negative side effects on where the kernel mapping
happens in the map_pages() function in arch/parisc/mm/init.c.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@kernel.org # 5.4+
2021-11-13 22:10:56 +01:00
Linus Torvalds
4d6fe79fde Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
 "New x86 features:

   - Guest API and guest kernel support for SEV live migration

   - SEV and SEV-ES intra-host migration

  Bugfixes and cleanups for x86:

   - Fix misuse of gfn-to-pfn cache when recording guest steal time /
     preempted status

   - Fix selftests on APICv machines

   - Fix sparse warnings

   - Fix detection of KVM features in CPUID

   - Cleanups for bogus writes to MSR_KVM_PV_EOI_EN

   - Fixes and cleanups for MSR bitmap handling

   - Cleanups for INVPCID

   - Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures

  Bugfixes for ARM:

   - Fix finalization of host stage2 mappings

   - Tighten the return value of kvm_vcpu_preferred_target()

   - Make sure the extraction of ESR_ELx.EC is limited to architected
     bits"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from
  KVM: x86: move guest_pv_has out of user_access section
  KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
  KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()
  KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT
  KVM: nVMX: Clean up x2APIC MSR handling for L2
  KVM: VMX: Macrofy the MSR bitmap getters and setters
  KVM: nVMX: Handle dynamic MSR intercept toggling
  KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
  KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
  KVM: x86: Rename kvm_lapic_enable_pv_eoi()
  KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES
  KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows
  kvm: mmu: Use fast PF path for access tracking of huge pages when possible
  KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator
  KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
  kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool
  KVM: x86: Fix recording of guest steal time / preempted status
  selftest: KVM: Add intra host migration tests
  selftest: KVM: Add open sev dev helper
  ...
2021-11-13 10:01:10 -08:00
Linus Torvalds
d4fa09e514 Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull vm86 fix from Eric Biederman:
 "Just the removal of an unnecessary (and incorrect) test from a BUG_ON"

* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal/vm86_32: Remove pointless test in BUG_ON
2021-11-13 09:54:24 -08:00
Linus Torvalds
be427a88a3 Merge tag 's390-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:

 - Add PCI automatic error recovery.

 - Fix tape driver timer initialization broken during timers api
   cleanup.

 - Fix bogus CPU measurement counters values on CPUs offlining.

 - Check the validity of subchanel before reading other fields in the
   schib in cio code.

* tag 's390-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: check the subchannel validity for dev_busid
  s390/cpumf: cpum_cf PMU displays invalid value after hotplug remove
  s390/tape: fix timer initialization in tape_std_assign()
  s390/pci: implement minimal PCI error recovery
  PCI: Export pci_dev_lock()
  s390/pci: implement reset_slot for hotplug slot
  s390/pci: refresh function handle in iomap
2021-11-13 09:18:06 -08:00
Linus Torvalds
b89f311d7e Merge tag 'riscv-for-linus-5.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:

 - Support for time namespaces in the VDSO, along with some associated
   cleanups.

 - Support for building rv32 randconfigs.

 - Improvements to the XIP port that allow larger kernels to function

 - Various device tree cleanups for both the SiFive and Microchip boards

 - A handful of defconfig updates, including enabling Nouveau.

There are also various small cleanups.

* tag 'riscv-for-linus-5.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: defconfig: enable DRM_NOUVEAU
  riscv/vdso: Drop unneeded part due to merge issue
  riscv: remove .text section size limitation for XIP
  riscv: dts: sifive: add missing compatible for plic
  riscv: dts: microchip: add missing compatibles for clint and plic
  riscv: dts: sifive: drop duplicated nodes and properties in sifive
  riscv: dts: sifive: fix Unleashed board compatible
  riscv: dts: sifive: use only generic JEDEC SPI NOR flash compatible
  riscv: dts: microchip: use vendor compatible for Cadence SD4HC
  riscv: dts: microchip: drop unused pinctrl-names
  riscv: dts: microchip: drop duplicated MMC/SDHC node
  riscv: dts: microchip: fix board compatible
  riscv: dts: microchip: drop duplicated nodes
  dt-bindings: mmc: cdns: document Microchip MPFS MMC/SDHCI controller
  riscv: add rv32 and rv64 randconfig build targets
  riscv: mm: don't advertise 1 num_asid for 0 asid bits
  riscv: set default pm_power_off to NULL
  riscv/vdso: Add support for time namespaces
2021-11-13 09:15:42 -08:00
Linus Torvalds
4218a96faf Merge tag 'mips_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull more MIPS updates from Thomas Bogendoerfer:

 - Config updates for BMIPS platform

 - Build fixes

 - Makefile cleanups

* tag 'mips_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mips: decompressor: do not copy source files while building
  MIPS: boot/compressed/: add __bswapdi2() to target for ZSTD decompression
  MIPS: fix duplicated slashes for Platform file path
  MIPS: fix *-pkg builds for loongson2ef platform
  PCI: brcmstb: Allow building for BMIPS_GENERIC
  MIPS: BMIPS: Enable PCI Kconfig
  MIPS: VDSO: remove -nostdlib compiler flag
  mips: BCM63XX: ensure that CPU_SUPPORTS_32BIT_KERNEL is set
  MIPS: Update bmips_stb_defconfig
  MIPS: Allow modules to set board_be_handler
2021-11-13 09:11:33 -08:00
Eric W. Biederman
c7a9b6471c signal/vm86_32: Remove pointless test in BUG_ON
kernel test robot <oliver.sang@intel.com> writes[1]:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 1a4d21a23c ("signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
> with following parameters:
>
>
> [ 70.645554][ T3747] kernel BUG at arch/x86/kernel/vm86_32.c:109!
> [ 70.646185][ T3747] invalid opcode: 0000 [#1] SMP
> [ 70.646682][ T3747] CPU: 0 PID: 3747 Comm: trinity-c6 Not tainted 5.15.0-rc1-00009-g1a4d21a23c4c #1
> [ 70.647598][ T3747] EIP: save_v86_state (arch/x86/kernel/vm86_32.c:109 (discriminator 3))
> [ 70.648113][ T3747] Code: 89 c3 64 8b 35 60 b8 25 c2 83 ec 08 89 55 f0 8b 96 10 19 00 00 89 55 ec e8 c6 2d 0c 00 fb 8b 55 ec 85 d2 74 05 83 3a 00 75 02 <0f> 0b 8b 86 10 19 00 00 8b 4b 38 8b 78 48 31 cf 89 f8 8b 7a 4c 81
> [ 70.650136][ T3747] EAX: 00000001 EBX: f5f49fac ECX: 0000000b EDX: f610b600
> [ 70.650852][ T3747] ESI: f5f79cc0 EDI: f5f79cc0 EBP: f5f49f04 ESP: f5f49ef0
> [ 70.651593][ T3747] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
> [ 70.652413][ T3747] CR0: 80050033 CR2: 00004000 CR3: 35fc7000 CR4: 000406d0
> [ 70.653169][ T3747] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 70.653897][ T3747] DR6: fffe0ff0 DR7: 00000400
> [ 70.654382][ T3747] Call Trace:
> [ 70.654719][ T3747] arch_do_signal_or_restart (arch/x86/kernel/signal.c:792 arch/x86/kernel/signal.c:867)
> [ 70.655288][ T3747] exit_to_user_mode_prepare (kernel/entry/common.c:174 kernel/entry/common.c:209)
> [ 70.655854][ T3747] irqentry_exit_to_user_mode (kernel/entry/common.c:126 kernel/entry/common.c:317)
> [ 70.656450][ T3747] irqentry_exit (kernel/entry/common.c:406)
> [ 70.656897][ T3747] exc_page_fault (arch/x86/mm/fault.c:1535)
> [ 70.657369][ T3747] ? sysvec_kvm_asyncpf_interrupt (arch/x86/mm/fault.c:1488)
> [ 70.657989][ T3747] handle_exception (arch/x86/entry/entry_32.S:1085)

vm86_32.c:109 is: "BUG_ON(!vm86 || !vm86->user_vm86)"

When trying to understand the failure Brian Gerst pointed out[2] that
the code does not need protection against vm86->user_vm86 being NULL.
The copy_from_user code will already handles that case if the address
is going to fault.

Looking futher I realized that if we care about not allowing struct
vm86plus_struct at address 0 it should be do_sys_vm86 (the system
call) that does the filtering.  Not way down deep when the emulation
has completed in save_v86_state.

So let's just remove the silly case of attempting to filter a
userspace address with a BUG_ON.  Existing userspace can't break and
it won't make the kernel any more attackable as the userspace access
helpers will handle it, if it isn't a good userspace pointer.

I have run the reproducer the fuzzer gave me before I made this change
and it reproduced, and after I made this change and I have not seen
the reported failure.  So it does looks like this fixes the reported
issue.

[1] https://lkml.kernel.org/r/20211112074030.GB19820@xsang-OptiPlex-9020
[2] https://lkml.kernel.org/r/CAMzpN2jkK5sAv-Kg_kVnCEyVySiqeTdUORcC=AdG1gV6r8nUew@mail.gmail.com
Suggested-by: Brian Gerst <brgerst@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-12 15:26:42 -06:00
Paolo Bonzini
84886c262e Merge tag 'kvmarm-fixes-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm64 fixes for 5.16, take #1

- Fix the host S2 finalization by solely iterating over the memblocks
  instead of the whole IPA space

- Tighten the return value of kvm_vcpu_preferred_target() now that
  32bit support is long gone

- Make sure the extraction of ESR_ELx.EC is limited to the architected
  bits

- Comment fixups
2021-11-12 16:01:55 -05:00
Tony Luck
fbdb5e8f29 x86/cpu: Add Raptor Lake to Intel family
Add model ID for Raptor Lake.

[ dhansen: These get added as soon as possible so that folks doing
  development can leverage them. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211112182835.924977-1-tony.luck@intel.com
2021-11-12 11:46:06 -08:00
Dave Jones
e629fc1407 x86/mce: Add errata workaround for Skylake SKX37
Errata SKX37 is word-for-word identical to the other errata listed in
this workaround.   I happened to notice this after investigating a CMCI
storm on a Skylake host.  While I can't confirm this was the root cause,
spurious corrected errors does sound like a likely suspect.

Fixes: 2976908e41 ("x86/mce: Do not log spurious corrected mce errors")
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20211029205759.GA7385@codemonkey.org.uk
2021-11-12 11:43:35 -08:00
Arnd Bergmann
418ace9992 ARM: 9156/1: drop cc-option fallbacks for architecture selection
Naresh and Antonio ran into a build failure with latest Debian
armhf compilers, with lots of output like

 tmp/ccY3nOAs.s:2215: Error: selected processor does not support `cpsid i' in ARM mode

As it turns out, $(cc-option) fails early here when the FPU is not
selected before CPU architecture is selected, as the compiler
option check runs before enabling -msoft-float, which causes
a problem when testing a target architecture level without an FPU:

cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU

Passing e.g. -march=armv6k+fp in place of -march=armv6k would avoid this
issue, but the fallback logic is already broken because all supported
compilers (gcc-5 and higher) are much more recent than these options,
and building with -march=armv5t as a fallback no longer works.

The best way forward that I see is to just remove all the checks, which
also has the nice side-effect of slightly improving the startup time for
'make'.

The -mtune=marvell-f option was apparently never supported by any mainline
compiler, and the custom Codesourcery gcc build that did support is
now too old to build kernels, so just use -mtune=xscale unconditionally
for those.

This should be safe to apply on all stable kernels, and will be required
in order to keep building them with gcc-11 and higher.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996419

Reported-by: Antonio Terceiro <antonio.terceiro@linaro.org>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Cc: Matthias Klose <doko@debian.org>
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-11-12 12:49:37 +00:00
Michał Mirosław
0d08e7bf0d ARM: 9155/1: fix early early_iounmap()
Currently __set_fixmap() bails out with a warning when called in early boot
from early_iounmap(). Fix it, and while at it, make the comment a bit easier
to understand.

Cc: <stable@vger.kernel.org>
Fixes: b089c31c51 ("ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-11-12 12:49:36 +00:00
Paolo Bonzini
501cfe0679 KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from
Use the same cleanup code independent of whether the cgroup to be
uncharged and unref'd is the source or the destination cgroup.  Use a
bool to track whether the destination cgroup has been charged, which also
fixes a bug in the error case: the destination cgroup must be uncharged
only if it does not match the source.

Fixes: b56639318b ("KVM: SEV: Add support for SEV intra host migration")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-12 04:37:51 -05:00
Paolo Bonzini
3e067fd850 KVM: x86: move guest_pv_has out of user_access section
When UBSAN is enabled, the code emitted for the call to guest_pv_has
includes a call to __ubsan_handle_load_invalid_value.  objtool
complains that this call happens with UACCESS enabled; to avoid
the warning, pull the calls to user_access_begin into both arms
of the "if" statement, after the check for guest_pv_has.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-12 02:53:52 -05:00
Paul Cercueil
c4a11bf423 dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
Tidy up a bit the tree, by prefixing all include/dt-bindings/clock/ files
related to Ingenic SoCs with 'ingenic,'.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211016133322.40771-1-paul@crapouillou.net
2021-11-11 22:27:14 -06:00
Linus Torvalds
dbf4989618 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "The post-linux-next material.

  7 patches.

  Subsystems affected by this patch series (all mm): debug,
  slab-generic, migration, memcg, and kasan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kasan: add kasan mode messages when kasan init
  mm: unexport {,un}lock_page_memcg
  mm: unexport folio_memcg_{,un}lock
  mm/migrate.c: remove MIGRATE_PFN_LOCKED
  mm: migrate: simplify the file-backed pages validation when migrating its mapping
  mm: allow only SLUB on PREEMPT_RT
  mm/page_owner.c: modify the type of argument "order" in some functions
2021-11-11 14:31:47 -08:00
Linus Torvalds
6d76f6eb46 Merge tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
 "Only two changes.

  One removes the now unused CONFIG_MCPU32 symbol. The other sets a
  default for the CONFIG_MEMORY_RESERVE config symbol (this aids
  scripting and other automation) so you don't interactively get asked
  for a value at configure time.

  Summary:

   - remove unused CONFIG_MCPU32 symbol

   - default CONFIG_MEMORY_RESERVE value (for scripting)"

* tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: Remove MCPU32 config symbol
  m68k: set a default value for MEMORY_RESERVE
2021-11-11 14:22:05 -08:00
Linus Torvalds
f54ca91fe6 Merge tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - bpf: do not reject when the stack read size is different from the
     tracked scalar size

   - net: fix premature exit from NAPI state polling in napi_disable()

   - riscv, bpf: fix RV32 broken build, and silence RV64 warning

  Current release - new code bugs:

   - net: fix possible NULL deref in sock_reserve_memory

   - amt: fix error return code in amt_init(); fix stopping the
     workqueue

   - ax88796c: use the correct ioctl callback

  Previous releases - always broken:

   - bpf: stop caching subprog index in the bpf_pseudo_func insn

   - security: fixups for the security hooks in sctp

   - nfc: add necessary privilege flags in netlink layer, limit
     operations to admin only

   - vsock: prevent unnecessary refcnt inc for non-blocking connect

   - net/smc: fix sk_refcnt underflow on link down and fallback

   - nfnetlink_queue: fix OOB when mac header was cleared

   - can: j1939: ignore invalid messages per standard

   - bpf, sockmap:
      - fix race in ingress receive verdict with redirect to self
      - fix incorrect sk_skb data_end access when src_reg = dst_reg
      - strparser, and tls are reusing qdisc_skb_cb and colliding

   - ethtool: fix ethtool msg len calculation for pause stats

   - vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to
     access an unregistering real_dev

   - udp6: make encap_rcv() bump the v6 not v4 stats

   - drv: prestera: add explicit padding to fix m68k build

   - drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge

   - drv: mvpp2: fix wrong SerDes reconfiguration order

  Misc & small latecomers:

   - ipvs: auto-load ipvs on genl access

   - mctp: sanity check the struct sockaddr_mctp padding fields

   - libfs: support RENAME_EXCHANGE in simple_rename()

   - avoid double accounting for pure zerocopy skbs"

* tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (123 commits)
  selftests/net: udpgso_bench_rx: fix port argument
  net: wwan: iosm: fix compilation warning
  cxgb4: fix eeprom len when diagnostics not implemented
  net: fix premature exit from NAPI state polling in napi_disable()
  net/smc: fix sk_refcnt underflow on linkdown and fallback
  net/mlx5: Lag, fix a potential Oops with mlx5_lag_create_definer()
  gve: fix unmatched u64_stats_update_end()
  net: ethernet: lantiq_etop: Fix compilation error
  selftests: forwarding: Fix packet matching in mirroring selftests
  vsock: prevent unnecessary refcnt inc for nonblocking connect
  net: marvell: mvpp2: Fix wrong SerDes reconfiguration order
  net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory
  net: stmmac: allow a tc-taprio base-time of zero
  selftests: net: test_vxlan_under_vrf: fix HV connectivity test
  net: hns3: allow configure ETS bandwidth of all TCs
  net: hns3: remove check VF uc mac exist when set by PF
  net: hns3: fix some mac statistics is always 0 in device version V2
  net: hns3: fix kernel crash when unload VF while it is being reset
  net: hns3: sync rx ring head in echo common pull
  net: hns3: fix pfc packet number incorrect after querying pfc parameters
  ...
2021-11-11 09:49:36 -08:00
Linus Torvalds
c55a04176c Merge tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
 "Here is a single fix for 5.16-rc1 to resolve a build problem that came
  in through the coresight tree (and as such came in through the
  char/misc tree merge in the 5.16-rc1 merge window).

  It resolves a build problem with 'allmodconfig' on arm64 and is acked
  by the proper subsystem maintainers. It has been in linux-next all
  week with no reported problems"

* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  arm64: cpufeature: Export this_cpu_has_cap helper
2021-11-11 09:44:29 -08:00
Kuan-Ying Lee
b873e98681 kasan: add kasan mode messages when kasan init
There are multiple kasan modes.  It makes sense that we add some
messages to know which kasan mode is active when booting up [1].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212195 [1]
Link: https://lkml.kernel.org/r/20211020094850.4113-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Yee Lee <yee.lee@mediatek.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-11 09:34:35 -08:00
Alistair Popple
ab09243aa9 mm/migrate.c: remove MIGRATE_PFN_LOCKED
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect().  If it
wasn't then the a second attempt is made to lock the page.  However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity.  So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.

Link: https://lkml.kernel.org/r/20211025041608.289017-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-11 09:34:35 -08:00
Paolo Bonzini
f5396f2d82 Merge branch 'kvm-5.16-fixes' into kvm-master
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status

* Fix selftests on APICv machines

* Fix sparse warnings

* Fix detection of KVM features in CPUID

* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN

* Fixes and cleanups for MSR bitmap handling

* Cleanups for INVPCID

* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
2021-11-11 11:03:05 -05:00
Paolo Bonzini
1f05833193 Merge branch 'kvm-sev-move-context' into kvm-master
Add support for AMD SEV and SEV-ES intra-host migration support.  Intra
host migration provides a low-cost mechanism for userspace VMM upgrades.

In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant.  As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next.  In-kernel
hand off makes it possible to move any data that would be
unsafe/impossible for the kernel to hand directly to userspace, and
cannot be reproduced using data that can be handed to userspace.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 11:02:58 -05:00
Vitaly Kuznetsov
da1bfd52b9 KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
either num_online_cpus() or num_present_cpus(), s390 has multiple
constants depending on hardware features. On x86, KVM reports an
arbitrary value of '710' which is supposed to be the maximum tested
value but it's possible to test all KVM_MAX_VCPUS even when there are
less physical CPUs available.

Drop the arbitrary '710' value and return num_online_cpus() on x86 as
well. The recommendation will match other architectures and will mean
'no CPU overcommit'.

For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
when the requested vCPU number exceeds it. The static limit of '710'
is quite weird as smaller systems with just a few physical CPUs should
certainly "recommend" less.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:24 -05:00
Vipin Sharma
796c83c58a KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()
Handle #GP on INVPCID due to an invalid type in the common switch
statement instead of relying on the callers (VMX and SVM) to manually
validate the type.

Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
the type before reading the operand from memory, so deferring the
type validity check until after that point is architecturally allowed.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109174426.2350547-3-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:24 -05:00
Vipin Sharma
329bd56ce5 KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT
handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2
field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that
holds the invalidation type. Add a helper to retrieve reg2 from VMX
instruction info to consolidate and document the shift+mask magic.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109174426.2350547-2-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:24 -05:00
Sean Christopherson
a5e0c25284 KVM: nVMX: Clean up x2APIC MSR handling for L2
Clean up the x2APIC MSR bitmap intereption code for L2, which is the last
holdout of open coded bitmap manipulations.  Freshen up the SDM/PRM
comment, rename the function to make it abundantly clear the funky
behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
(the previous comment was flat out wrong for x2APIC behavior).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:23 -05:00
Sean Christopherson
0cacb80b98 KVM: VMX: Macrofy the MSR bitmap getters and setters
Add builder macros to generate the MSR bitmap helpers to reduce the
amount of copy-paste code, especially with respect to all the magic
numbers needed to calc the correct bit location.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:23 -05:00
Sean Christopherson
67f4b9969c KVM: nVMX: Handle dynamic MSR intercept toggling
Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
and always update the relevant bits in vmcs02.  This fixes two distinct,
but intertwined bugs related to dynamic MSR bitmap modifications.

The first issue is that KVM fails to enable MSR interception in vmcs02
for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
and later enables interception.

The second issue is that KVM fails to honor userspace MSR filtering when
preparing vmcs02.

Fix both issues simultaneous as fixing only one of the issues (doesn't
matter which) would create a mess that no one should have to bisect.
Fixing only the first bug would exacerbate the MSR filtering issue as
userspace would see inconsistent behavior depending on the whims of L1.
Fixing only the second bug (MSR filtering) effectively requires fixing
the first, as the nVMX code only knows how to transition vmcs02's
bitmap from 1->0.

Move the various accessor/mutators that are currently buried in vmx.c
into vmx.h so that they can be shared by the nested code.

Fixes: 1a155254ff ("KVM: x86: Introduce MSR filtering")
Fixes: d69129b4e4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:23 -05:00
Sean Christopherson
7dfbc624eb KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled.  In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.

Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL.  Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.

Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.

Fixes: d28b387fb7 ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:22 -05:00
Vitaly Kuznetsov
afd67ee3cb KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails,
MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to
expect that the value we keep internally in KVM wasn't updated.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211108152819.12485-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:22 -05:00
Vitaly Kuznetsov
77c3323f48 KVM: x86: Rename kvm_lapic_enable_pv_eoi()
kvm_lapic_enable_pv_eoi() is a misnomer as the function is also
used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211108152819.12485-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:22 -05:00
Paul Durrant
760849b147 KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES
Currently when kvm_update_cpuid_runtime() runs, it assumes that the
KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
however, if Hyper-V support is enabled. In this case the KVM leaves will
be offset.

This patch introdues as new 'kvm_cpuid_base' field into struct
kvm_vcpu_arch to track the location of the KVM leaves and function
kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
target the correct leaf.

NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced
      into processor.h to avoid having duplicate code for the iteration
      over possible hypervisor base leaves.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:21 -05:00
Sean Christopherson
8b44b174f6 KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
only difference between the two ioctls() is the format of the userspace
struct.  A future fix will add yet more code to the core logic.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105095101.5384-2-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:21 -05:00
Junaid Shahid
10c30de019 kvm: mmu: Use fast PF path for access tracking of huge pages when possible
The fast page fault path bails out on write faults to huge pages in
order to accommodate dirty logging. This change adds a check to do that
only when dirty logging is actually enabled, so that access tracking for
huge pages can still use the fast path for write faults in the common
case.

Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211104003359.2201967-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:20 -05:00
Sean Christopherson
c435d4b7ba KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator
Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with
rcu_dereference().  Shadow pages in the TDP MMU, and thus their SPTEs,
are protected by rcu.

This fixes a Sparse warning at tdp_mmu.c:900:51:
  warning: incorrect type in argument 1 (different address spaces)
  expected unsigned long long [usertype] *sptep
  got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep

Fixes: 7158bee4b4 ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211103161833.3769487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:20 -05:00
Maxim Levitsky
cae72dcc3b KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
standard kvm's inject_pending_event, and not via APICv/AVIC.

Since this is a debug feature, just inhibit APICv/AVIC while
KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.

Fixes: 61e5f69ef0 ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:20 -05:00
Jim Mattson
e6cd31f1a8 kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool
These function names sound like predicates, and they have siblings,
*is_valid_msr(), which _are_ predicates. Moreover, there are comments
that essentially warn that these functions behave unexpectedly.

Flip the polarity of the return values, so that they become
predicates, and convert the boolean result to a success/failure code
at the outer call site.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:19 -05:00
David Woodhouse
7e2175ebd6 KVM: x86: Fix recording of guest steal time / preempted status
In commit b043138246 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.

Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.

Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.

In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.

Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have.  We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.

Cc: stable@vger.kernel.org
Fixes: b043138246 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
[I didn't entirely agree with David's assessment of the
 usefulness of the gfn_to_pfn cache, and integrated the outcome
 of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11 10:56:19 -05:00