Commit Graph

195501 Commits

Author SHA1 Message Date
Dave Airlie
e954d2c94d Backmerge tag 'v5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
Linux 5.18-rc5

There was a build fix for arm I wanted in drm-next, so backmerge rather then cherry-pick.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-05-03 16:08:48 +10:00
Linus Torvalds
b6b2648911 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Take care of faults occuring between the PARange and IPA range by
     injecting an exception

   - Fix S2 faults taken from a host EL0 in protected mode

   - Work around Oops caused by a PMU access from a 32bit guest when PMU
     has been created. This is a temporary bodge until we fix it for
     good.

  x86:

   - Fix potential races when walking host page table

   - Fix shadow page table leak when KVM runs nested

   - Work around bug in userspace when KVM synthesizes leaf 0x80000021
     on older (pre-EPYC) or Intel processors

  Generic (but affects only RISC-V):

   - Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: work around QEMU issue with synthetic CPUID leaves
  Revert "x86/mm: Introduce lookup_address_in_mm()"
  KVM: x86/mmu: fix potential races when walking host page table
  KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
  KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
  KVM: arm64: Inject exception on out-of-IPA-range translation fault
  KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set
  KVM: arm64: Handle host stage-2 faults from 32-bit EL0
2022-05-01 11:49:32 -07:00
Linus Torvalds
b2da7df52e Merge tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is
   solely controlled by the hypervisor

 - A build fix to make the function prototype (__warn()) as visible as
   the definition itself

 - A bunch of objtool annotation fixes which have accumulated over time

 - An ORC unwinder fix to handle bad input gracefully

 - Well, we thought the microcode gets loaded in time in order to
   restore the microcode-emulated MSRs but we thought wrong. So there's
   a fix for that to have the ordering done properly

 - Add new Intel model numbers

 - A spelling fix

* tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
  bug: Have __warn() prototype defined unconditionally
  x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config
  objtool: Use offstr() to print address of missing ENDBR
  objtool: Print data address for "!ENDBR" data warnings
  x86/xen: Add ANNOTATE_NOENDBR to startup_xen()
  x86/uaccess: Add ENDBR to __put_user_nocheck*()
  x86/retpoline: Add ANNOTATE_NOENDBR for retpolines
  x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline
  objtool: Enable unreachable warnings for CLANG LTO
  x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE
  x86,objtool: Mark cpu_startup_entry() __noreturn
  x86,xen,objtool: Add UNWIND hint
  lib/strn*,objtool: Enforce user_access_begin() rules
  MAINTAINERS: Add x86 unwinding entry
  x86/unwind/orc: Recheck address range after stack info was updated
  x86/cpu: Load microcode during restore_processor_state()
  x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
2022-05-01 10:03:36 -07:00
Linus Torvalds
b70ed23c23 Merge tag 'objtool_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:
 "A bunch of objtool fixes to improve unwinding, sibling call detection,
  fallthrough detection and relocation handling of weak symbols when the
  toolchain strips section symbols"

* tag 'objtool_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix code relocs vs weak symbols
  objtool: Fix type of reloc::addend
  objtool: Fix function fallthrough detection for vmlinux
  objtool: Fix sibling call detection in alternatives
  objtool: Don't set 'jump_dest' for sibling calls
  x86/uaccess: Don't jump between functions
2022-05-01 09:34:54 -07:00
Linus Torvalds
8013d1d3d2 Merge tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:

 - A fix for a regression caused by the previous set of bugfixes
   changing tegra and at91 pinctrl properties.

   More work is needed to figure out what this should actually be, but a
   revert makes it work for the moment.

 - Defconfig regression fixes for tegra after renamed symbols

 - Build-time warning and static checker fixes for imx, op-tee, sunxi,
   meson, at91, and omap

 - More at91 DT fixes for audio, regulator and spi nodes

 - A regression fix for Renesas Hyperflash memory probe

 - A stability fix for amlogic boards, modifying the allowed cpufreq
   states

 - Multiple fixes for system suspend on omap2+

 - DT fixes for various i.MX bugs

 - A probe error fix for imx6ull-colibri MMC

 - A MAINTAINERS file entry for samsung bug reports

* tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
  Revert "arm: dts: at91: Fix boolean properties with values"
  bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
  Revert "arm64: dts: tegra: Fix boolean properties with values"
  arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
  ARM: dts: imx6ull-colibri: fix vqmmc regulator
  MAINTAINERS: add Bug entry for Samsung and memory controller drivers
  memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode
  ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35
  ARM: dts: am3517-evm: Fix misc pinmuxing
  ARM: dts: am33xx-l4: Add missing touchscreen clock properties
  ARM: dts: Fix mmc order for omap3-gta04
  ARM: dts: at91: fix pinctrl phandles
  ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name
  ARM: dts: at91: Describe regulators on at91sam9g20ek
  ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek
  ARM: dts: at91: Fix boolean properties with values
  ARM: dts: at91: use generic node name for dataflash
  ARM: dts: at91: align SPI NOR node name with dtschema
  ARM: dts: at91: sama7g5ek: Align the impedance of the QSPI0's HSIO and PCB lines
  ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines
  ...
2022-04-29 15:51:05 -07:00
Linus Torvalds
c0e6265e6c Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
 "A semi-large pile of clk driver fixes this time around.

  Nothing is touching the core so these fixes are fairly well contained
  to specific devices that use these clk drivers.

   - Some Allwinner SoC fixes to gracefully handle errors and mark an
     RTC clk as critical so that the RTC keeps ticking.

   - Fix AXI bus clks and RTC clk design for Microchip PolarFire SoC
     driver introduced this cycle. This has some devicetree bits acked
     by riscv maintainers. We're fixing it now so that the prior
     bindings aren't released in a major kernel version.

   - Remove a reset on Microchip PolarFire SoCs that broke when enabling
     CONFIG_PM.

   - Set a min/max for the Qualcomm graphics clk. This got broken by the
     clk rate range patches introduced this cycle"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource()
  clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical
  riscv: dts: microchip: reparent mpfs clocks
  clk: microchip: mpfs: add RTCREF clock control
  clk: microchip: mpfs: re-parent the configurable clocks
  dt-bindings: rtc: add refclk to mpfs-rtc
  dt-bindings: clk: mpfs: add defines for two new clocks
  dt-bindings: clk: mpfs document msspll dri registers
  riscv: dts: microchip: fix usage of fic clocks on mpfs
  clk: microchip: mpfs: mark CLK_ATHENA as critical
  clk: microchip: mpfs: fix parents for FIC clocks
  clk: qcom: clk-rcg2: fix gfx3d frequency calculation
  clk: microchip: mpfs: don't reset disabled peripherals
  clk: sunxi-ng: fix not NULL terminated coccicheck error
2022-04-29 15:38:23 -07:00
Arnd Bergmann
adee8aa22a Revert "arm: dts: at91: Fix boolean properties with values"
This reverts commit 0dc23d1a8e, which caused another regression
as the pinctrl code actually expects an integer value of 0 or 1
rather than a simple boolean property.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 23:09:49 +02:00
Paolo Bonzini
f751d8eac1 KVM: x86: work around QEMU issue with synthetic CPUID leaves
Synthesizing AMD leaves up to 0x80000021 caused problems with QEMU,
which assumes the *host* CPUID[0x80000000].EAX is higher or equal
to what KVM_GET_SUPPORTED_CPUID reports.

This causes QEMU to issue bogus host CPUIDs when preparing the input
to KVM_SET_CPUID2.  It can even get into an infinite loop, which is
only terminated by an abort():

   cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e)

To work around this, only synthesize those leaves if 0x8000001d exists
on the host.  The synthetic 0x80000021 leaf is mostly useful on Zen2,
which satisfies the condition.

Fixes: f144c49e8c ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 15:24:58 -04:00
Linus Torvalds
2d0de93ca2 Merge tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to properly ensure a single CPU is running during patch_text().

 - A defconfig update to include RPMSG_CTRL when RPMSG_CHAR was set,
   necessary after a recent refactoring.

* tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL
  riscv: patch_text: Fixup last cpu should be master
2022-04-29 10:44:58 -07:00
Linus Torvalds
66c2112b74 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
 "Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type.

  This is a fix to the MTE ELF ABI for a bug that was added during the
  most recent merge window as part of the coredump support.

  The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE
  segment type has already been allocated to PT_AARCH64_UNWIND by the
  ELF ABI, so we've bumped the value and changed the name of the
  identifier to be better aligned with the existing one"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  elf: Fix the arm64 MTE ELF segment name and value
2022-04-29 10:36:47 -07:00
Sean Christopherson
643d95aac5 Revert "x86/mm: Introduce lookup_address_in_mm()"
Drop lookup_address_in_mm() now that KVM is providing it's own variant
of lookup_address_in_pgd() that is safe for use with user addresses, e.g.
guards against page tables being torn down.  A variant that provides a
non-init mm is inherently dangerous and flawed, as the only reason to use
an mm other than init_mm is to walk a userspace mapping, and
lookup_address_in_pgd() does not play nice with userspace mappings, e.g.
doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE()
to ensure an upper level entry isn't converted to a huge page between
checking the PAGE_SIZE bit and grabbing the address of the next level
down.

This reverts commit 13c72c060f.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <YmwIi3bXr/1yhYV/@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:40:41 -04:00
Paolo Bonzini
73331c5d84 Merge branch 'kvm-fixes-for-5.18-rc5' into HEAD
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees:

* Fix potential races when walking host page table

* Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT

* Fix shadow page table leak when KVM runs nested
2022-04-29 12:39:34 -04:00
Mingwei Zhang
44187235cb KVM: x86/mmu: fix potential races when walking host page table
KVM uses lookup_address_in_mm() to detect the hugepage size that the host
uses to map a pfn.  The function suffers from several issues:

 - no usage of READ_ONCE(*). This allows multiple dereference of the same
   page table entry. The TOCTOU problem because of that may cause KVM to
   incorrectly treat a newly generated leaf entry as a nonleaf one, and
   dereference the content by using its pfn value.

 - the information returned does not match what KVM needs; for non-present
   entries it returns the level at which the walk was terminated, as long
   as the entry is not 'none'.  KVM needs level information of only 'present'
   entries, otherwise it may regard a non-present PXE entry as a present
   large page mapping.

 - the function is not safe for mappings that can be torn down, because it
   does not disable IRQs and because it returns a PTE pointer which is never
   safe to dereference after the function returns.

So implement the logic for walking host page tables directly in KVM, and
stop using lookup_address_in_mm().

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220429031757.2042406-1-mizhang@google.com>
[Inline in host_pfn_mapping_level, ensure no semantic change for its
 callers. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:22 -04:00
Paolo Bonzini
d495f942f4 KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused.  Unfortunately this extensibility
mechanism has several issues:

- x86 is not writing the member, so it would not be possible to use it
  on x86 except for new events

- the member is not aligned to 64 bits, so the definition of the
  uAPI struct is incorrect for 32- on 64-bit userspace.  This is a
  problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
  usage of flags was only introduced in 5.18.

Since padding has to be introduced, place a new field in there
that tells if the flags field is valid.  To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid.  The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.

To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:22 -04:00
Sean Christopherson
86931ff720 KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's
MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR.
The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the
guest, possibly with help from userspace, manages to coerce KVM into
creating a SPTE for an "impossible" gfn, KVM will leak the associated
shadow pages (page tables):

  WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57
                                kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G        W         5.18.0-rc1+ #293
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
  Call Trace:
   <TASK>
   kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
   kvm_destroy_vm+0x162/0x2d0 [kvm]
   kvm_vm_release+0x1d/0x30 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5b/0x90
   exit_to_user_mode_prepare+0xd2/0xe0
   syscall_exit_to_user_mode+0x1d/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

On bare metal, encountering an impossible gpa in the page fault path is
well and truly impossible, barring CPU bugs, as the CPU will signal #PF
during the gva=>gpa translation (or a similar failure when stuffing a
physical address into e.g. the VMCS/VMCB).  But if KVM is running as a VM
itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR
of the underlying hardware, in which case the hardware will not fault on
the illegal-from-KVM's-perspective gpa.

Alternatively, KVM could continue allowing the dodgy behavior and simply
zap the max possible range.  But, for hosts with MAXPHYADDR < 52, that's
a (minor) waste of cycles, and more importantly, KVM can't reasonably
support impossible memslots when running on bare metal (or with an
accurate MAXPHYADDR as a VM).  Note, limiting the overhead by checking if
KVM is running as a guest is not a safe option as the host isn't required
to announce itself to the guest in any way, e.g. doesn't need to set the
HYPERVISOR CPUID bit.

A second alternative to disallowing the memslot behavior would be to
disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR.  That
restriction is undesirable as there are legitimate use cases for doing
so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous
systems so that VMs can be migrated between hosts with different
MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess.

Note that any guest.MAXPHYADDR is valid with shadow paging, and it is
even useful in order to test KVM with MAXPHYADDR=52 (i.e. without
any reserved physical address bits).

The now common kvm_mmu_max_gfn() is inclusive instead of exclusive.
The memslot and TDP MMU code want an exclusive value, but the name
implies the returned value is inclusive, and the MMIO path needs an
inclusive check.

Fixes: faaf05b00a ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 524a1e4e38 ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220428233416.2446833-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:21 -04:00
Paolo Bonzini
484c22df5a Merge tag 'kvmarm-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.18, take #2

- Take care of faults occuring between the PARange and
  IPA range by injecting an exception

- Fix S2 faults taken from a host EL0 in protected mode

- Work around Oops caused by a PMU access from a 32bit
  guest when PMU has been created. This is a temporary
  bodge until we fix it for good.
2022-04-29 12:32:14 -04:00
Arnd Bergmann
310b663753 Merge tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes
ARM: tegra: Default configuration fixes for v5.18

This contains two updates to the default configuration needed because of
a Kconfig symbol name change. This fixes a failure that was detected in
the NVIDIA automated test farm.

* tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  ARM: config: multi v7: Enable NVIDIA Tegra video decoder driver
  ARM: tegra_defconfig: Update CONFIG_TEGRA_VDE option

Link: https://lore.kernel.org/r/20220429080626.494150-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 16:41:22 +02:00
Arnd Bergmann
73c7bcdcfd Merge tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.18, 2nd round:

- Fix one sparse warning on imx-weim driver.
- Fix vqmmc regulator to get UHS-I mode work on imx6ull-colibri board.
- Add missing 32.768 kHz PMIC clock for imx8mn-ddr4-evk board to fix
  bd718xx-clk probe error.

* tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
  ARM: dts: imx6ull-colibri: fix vqmmc regulator
  bus: imx-weim: make symbol 'weim_of_notifier' static

Link: https://lore.kernel.org/r/20220426013427.GB14615@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 16:23:31 +02:00
Thomas Gleixner
7e0815b3e0 x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.

This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence interrupt delivery on the
affected device is not happening ever.

Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.

Fixes: 809f9267bb ("xen: map MSIs into pirqs")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Noah Meyerhans <noahm@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
2022-04-29 14:37:39 +02:00
Catalin Marinas
c35fe2a68f elf: Fix the arm64 MTE ELF segment name and value
Unfortunately, the name/value choice for the MTE ELF segment type
(PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by
PT_AARCH64_UNWIND, as defined in the AArch64 ELF ABI
(https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst).

Update the ELF segment type value to LOPROC+2 and also change the define
to PT_AARCH64_MEMTAG_MTE to match the AArch64 ELF ABI namespace. The
AArch64 ELF ABI document is updating accordingly (segment type not
previously mentioned in the document).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 761b9b366c ("elf: Introduce the ARM MTE ELF segment type")
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>
Link: https://lore.kernel.org/r/20220425151833.2603830-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-28 11:37:06 +01:00
Marc Zyngier
85ea6b1ec9 KVM: arm64: Inject exception on out-of-IPA-range translation fault
When taking a translation fault for an IPA that is outside of
the range defined by the hypervisor (between the HW PARange and
the IPA range), we stupidly treat it as an IO and forward the access
to userspace. Of course, userspace can't do much with it, and things
end badly.

Arguably, the guest is braindead, but we should at least catch the
case and inject an exception.

Check the faulting IPA against:
- the sanitised PARange: inject an address size fault
- the IPA size: inject an abort

Reported-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-04-27 23:02:23 +01:00
Alexandru Elisei
8f6379e207 KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set
kvm->arch.arm_pmu is set when userspace attempts to set the first PMU
attribute. As certain attributes are mandatory, arm_pmu ends up always
being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU.
However, this only happens if the VCPU has the PMU feature. If the VCPU
doesn't have the feature bit set, kvm->arch.arm_pmu will be left
uninitialized and equal to NULL.

KVM doesn't do ID register emulation for 32-bit guests and accesses to the
PMU registers aren't gated by the pmu_visibility() function. This is done
to prevent injecting unexpected undefined exceptions in guests which have
detected the presence of a hardware PMU. But even though the VCPU feature
is missing, KVM still attempts to emulate certain aspects of the PMU when
PMU registers are accessed. This leads to a NULL pointer dereference like
this one, which happens on an odroid-c4 board when running the
kvm-unit-tests pmu-cycle-counter test with kvmtool and without the PMU
feature being set:

[  454.402699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000150
[  454.405865] Mem abort info:
[  454.408596]   ESR = 0x96000004
[  454.411638]   EC = 0x25: DABT (current EL), IL = 32 bits
[  454.416901]   SET = 0, FnV = 0
[  454.419909]   EA = 0, S1PTW = 0
[  454.423010]   FSC = 0x04: level 0 translation fault
[  454.427841] Data abort info:
[  454.430687]   ISV = 0, ISS = 0x00000004
[  454.434484]   CM = 0, WnR = 0
[  454.437404] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000c924000
[  454.443800] [0000000000000150] pgd=0000000000000000, p4d=0000000000000000
[  454.450528] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[  454.456036] Modules linked in:
[  454.459053] CPU: 1 PID: 267 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc4 #113
[  454.465697] Hardware name: Hardkernel ODROID-C4 (DT)
[  454.470612] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  454.477512] pc : kvm_pmu_event_mask.isra.0+0x14/0x74
[  454.482427] lr : kvm_pmu_set_counter_event_type+0x2c/0x80
[  454.487775] sp : ffff80000a9839c0
[  454.491050] x29: ffff80000a9839c0 x28: ffff000000a83a00 x27: 0000000000000000
[  454.498127] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000a510000
[  454.505198] x23: ffff000000a83a00 x22: ffff000003b01000 x21: 0000000000000000
[  454.512271] x20: 000000000000001f x19: 00000000000003ff x18: 0000000000000000
[  454.519343] x17: 000000008003fe98 x16: 0000000000000000 x15: 0000000000000000
[  454.526416] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  454.533489] x11: 000000008003fdbc x10: 0000000000009d20 x9 : 000000000000001b
[  454.540561] x8 : 0000000000000000 x7 : 0000000000000d00 x6 : 0000000000009d00
[  454.547633] x5 : 0000000000000037 x4 : 0000000000009d00 x3 : 0d09000000000000
[  454.554705] x2 : 000000000000001f x1 : 0000000000000000 x0 : 0000000000000000
[  454.561779] Call trace:
[  454.564191]  kvm_pmu_event_mask.isra.0+0x14/0x74
[  454.568764]  kvm_pmu_set_counter_event_type+0x2c/0x80
[  454.573766]  access_pmu_evtyper+0x128/0x170
[  454.577905]  perform_access+0x34/0x80
[  454.581527]  kvm_handle_cp_32+0x13c/0x160
[  454.585495]  kvm_handle_cp15_32+0x1c/0x30
[  454.589462]  handle_exit+0x70/0x180
[  454.592912]  kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0
[  454.597485]  kvm_vcpu_ioctl+0x23c/0x940
[  454.601280]  __arm64_sys_ioctl+0xa8/0xf0
[  454.605160]  invoke_syscall+0x48/0x114
[  454.608869]  el0_svc_common.constprop.0+0xd4/0xfc
[  454.613527]  do_el0_svc+0x28/0x90
[  454.616803]  el0_svc+0x34/0xb0
[  454.619822]  el0t_64_sync_handler+0xa4/0x130
[  454.624049]  el0t_64_sync+0x18c/0x190
[  454.627675] Code: a9be7bfd 910003fd f9000bf3 52807ff3 (b9415001)
[  454.633714] ---[ end trace 0000000000000000 ]---

In this particular case, Linux hasn't detected the presence of a hardware
PMU because the PMU node is missing from the DTB, so userspace would have
been unable to set the VCPU PMU feature even if it attempted it. What
happens is that the 32-bit guest reads ID_DFR0, which advertises the
presence of the PMU, and when it tries to program a counter, it triggers
the NULL pointer dereference because kvm->arch.arm_pmu is NULL.

kvm-arch.arm_pmu was introduced by commit 46b1878214 ("KVM: arm64:
Keep a per-VM pointer to the default PMU"). Until that commit, this
error would be triggered instead:

[   73.388140] ------------[ cut here ]------------
[   73.388189] Unknown PMU version 0
[   73.390420] WARNING: CPU: 1 PID: 264 at arch/arm64/kvm/pmu-emul.c:36 kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.399821] Modules linked in:
[   73.402835] CPU: 1 PID: 264 Comm: kvm-vcpu-0 Not tainted 5.17.0 #114
[   73.409132] Hardware name: Hardkernel ODROID-C4 (DT)
[   73.414048] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   73.420948] pc : kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.425863] lr : kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.430779] sp : ffff80000a8db9b0
[   73.434055] x29: ffff80000a8db9b0 x28: ffff000000dbaac0 x27: 0000000000000000
[   73.441131] x26: ffff000000dbaac0 x25: 00000000c600000d x24: 0000000000180720
[   73.448203] x23: ffff800009ffbe10 x22: ffff00000b612000 x21: 0000000000000000
[   73.455276] x20: 000000000000001f x19: 0000000000000000 x18: ffffffffffffffff
[   73.462348] x17: 000000008003fe98 x16: 0000000000000000 x15: 0720072007200720
[   73.469420] x14: 0720072007200720 x13: ffff800009d32488 x12: 00000000000004e6
[   73.476493] x11: 00000000000001a2 x10: ffff800009d32488 x9 : ffff800009d32488
[   73.483565] x8 : 00000000ffffefff x7 : ffff800009d8a488 x6 : ffff800009d8a488
[   73.490638] x5 : ffff0000f461a9d8 x4 : 0000000000000000 x3 : 0000000000000001
[   73.497710] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000000dbaac0
[   73.504784] Call trace:
[   73.507195]  kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.511768]  kvm_pmu_set_counter_event_type+0x2c/0x80
[   73.516770]  access_pmu_evtyper+0x128/0x16c
[   73.520910]  perform_access+0x34/0x80
[   73.524532]  kvm_handle_cp_32+0x13c/0x160
[   73.528500]  kvm_handle_cp15_32+0x1c/0x30
[   73.532467]  handle_exit+0x70/0x180
[   73.535917]  kvm_arch_vcpu_ioctl_run+0x20c/0x6e0
[   73.540489]  kvm_vcpu_ioctl+0x2b8/0x9e0
[   73.544283]  __arm64_sys_ioctl+0xa8/0xf0
[   73.548165]  invoke_syscall+0x48/0x114
[   73.551874]  el0_svc_common.constprop.0+0xd4/0xfc
[   73.556531]  do_el0_svc+0x28/0x90
[   73.559808]  el0_svc+0x28/0x80
[   73.562826]  el0t_64_sync_handler+0xa4/0x130
[   73.567054]  el0t_64_sync+0x1a0/0x1a4
[   73.570676] ---[ end trace 0000000000000000 ]---
[   73.575382] kvm: pmu event creation failed -2

The root cause remains the same: kvm->arch.pmuver was never set to
something sensible because the VCPU feature itself was never set.

The odroid-c4 is somewhat of a special case, because Linux doesn't probe
the PMU. But the above errors can easily be reproduced on any hardware,
with or without a PMU driver, as long as userspace doesn't set the PMU
feature.

Work around the fact that KVM advertises a PMU even when the VCPU feature
is not set by gating all PMU emulation on the feature. The guest can still
access the registers without KVM injecting an undefined exception.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425145530.723858-1-alexandru.elisei@arm.com
2022-04-27 22:10:29 +01:00
Will Deacon
2a50fc5fd0 KVM: arm64: Handle host stage-2 faults from 32-bit EL0
When pKVM is enabled, host memory accesses are translated by an identity
mapping at stage-2, which is populated lazily in response to synchronous
exceptions from 64-bit EL1 and EL0.

Extend this handling to cover exceptions originating from 32-bit EL0 as
well. Although these are very unlikely to occur in practice, as the
kernel typically ensures that user pages are initialised before mapping
them in, drivers could still map previously untouched device pages into
userspace and expect things to work rather than panic the system.

Cc: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220427171332.13635-1-will@kernel.org
2022-04-27 21:54:30 +01:00
Linus Torvalds
46cf2c613f Merge tag 'pinctrl-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:

 - Fix some register offsets on Intel Alderlake

 - Fix the order the UFS and SDC pins on Qualcomm SM6350

 - Fix a build error in Mediatek Moore.

 - Fix a pin function table in the Sunplus SP7021.

 - Fix some Kconfig and static keywords on the Samsung Tesla FSD SoC.

 - Fix up the EOI function for edge triggered IRQs and keep the block
   clock enabled for level IRQs in the STM32 driver.

 - Fix some bits and order in the Rockchip RK3308 driver.

 - Handle the errorpath in the Pistachio driver probe() properly.

* tag 'pinctrl-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: pistachio: fix use of irq_of_parse_and_map()
  pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested
  pinctrl: rockchip: sort the rk3308_mux_recalced_data entries
  pinctrl: rockchip: fix RK3308 pinmux bits
  pinctrl: stm32: Do not call stm32_gpio_get() for edge triggered IRQs in EOI
  pinctrl: Fix an error in pin-function table of SP7021
  pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config
  pinctrl: mediatek: moore: Fix build error
  pinctrl: qcom: sm6350: fix order of UFS & SDC pins
  pinctrl: alderlake: Fix register offsets for ADL-N variant
  pinctrl: samsung: staticize fsd_pin_ctrl
2022-04-26 16:34:11 -07:00
Arnaud Pouliquen
ac0280a9ca RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL
In the commit 617d32938d ("rpmsg: Move the rpmsg control device
from rpmsg_char to rpmsg_ctrl"), we split the rpmsg_char driver in two.
By default give everyone who had the old driver enabled the rpmsg_ctrl
driver too.

Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20220404090527.582217-1-arnaud.pouliquen@foss.st.com
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-26 08:19:53 -07:00
Arnd Bergmann
2f477ee3ed Revert "arm64: dts: tegra: Fix boolean properties with values"
This reverts commit 1a67653de0, which caused a boot regression.

The behavior of the "drive-push-pull" in the kernel does not
match what the binding document describes. Revert Rob's patch
to make the DT match the kernel again, rather than the binding.

Link: https://lore.kernel.org/lkml/YlVAy95eF%2F9b1nmu@orome/
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-25 13:49:01 +02:00
Linus Torvalds
5206548f6e Merge tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Partly revert a change to our timer_interrupt() that caused lockups
   with high res timers disabled.

 - Fix a bug in KVM TCE handling that could corrupt kernel memory.

 - Two commits fixing Power9/Power10 perf alternative event selection.

Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic
Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin.

* tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf: Fix 32bit compile
  powerpc/perf: Fix power10 event alternatives
  powerpc/perf: Fix power9 event alternatives
  KVM: PPC: Fix TCE handling for VFIO
  powerpc/time: Always set decrementer in timer_interrupt()
2022-04-24 12:11:20 -07:00
Linus Torvalds
f48ffef19d Merge tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Add Sapphire Rapids CPU support

 - Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use)

* tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support
  perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
2022-04-24 12:01:16 -07:00
Fabio Estevam
0310b5aa06 arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
The ROHM BD71847 PMIC has a 32.768 kHz clock.

Describe the PMIC clock to fix the following boot errors:

bd718xx-clk bd71847-clk.1.auto: No parent clk found
bd718xx-clk: probe of bd71847-clk.1.auto failed with error -22

Based on the same fix done for imx8mm-evk as per commit
a6a355ede5 ("arm64: dts: imx8mm-evk: Add 32.768 kHz clock to PMIC")

Fixes: 3e44dd0973 ("arm64: dts: imx8mn-ddr4-evk: Add rohm,bd71847 PMIC support")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2022-04-24 21:16:18 +08:00
Max Krummenacher
45974e4276 ARM: dts: imx6ull-colibri: fix vqmmc regulator
The correct spelling for the property is gpios. Otherwise, the regulator
will neither reserve nor control any GPIOs. Thus, any SD/MMC card which
can use UHS-I modes will fail.

Fixes: c2e4987e0e ("ARM: dts: imx6ull: add Toradex Colibri iMX6ULL support")
Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Signed-off-by: Denys Drozdov <denys.drozdov@toradex.com>
Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2022-04-24 21:08:08 +08:00
Linus Torvalds
f39359260e Merge tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:

 - Assorted fixes

* tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: remove redundant READ_ONCE() in cmpxchg loop
  ARC: atomic: cleanup atomic-llsc definitions
  arc: drop definitions of pgd_index() and pgd_offset{, _k}() entirely
  ARC: dts: align SPI NOR node name with dtschema
  ARC: Remove a redundant memset()
  ARC: fix typos in comments
  ARC: entry: fix syscall_trace_exit argument
2022-04-23 16:24:30 -07:00
Linus Torvalds
b51bd23c61 Merge tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
 "A simple cleanup patch and a refcount fix for Xen on Arm"

* tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  arm/xen: Fix some refcount leaks
  xen: Convert kmap() to kmap_local_page()
2022-04-23 13:53:21 -07:00
Randy Dunlap
9423edfc51 sparc: cacheflush_32.h needs struct page
Add a struct page forward declaration to cacheflush_32.h.
Fixes this build warning:

    CC      drivers/crypto/xilinx/zynqmp-sha.o
  In file included from arch/sparc/include/asm/cacheflush.h:11,
                   from include/linux/cacheflush.h:5,
                   from drivers/crypto/xilinx/zynqmp-sha.c:6:
  arch/sparc/include/asm/cacheflush_32.h:38:37: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
     38 | void sparc_flush_page_to_ram(struct page *page);

Exposed by commit 0e03b8fd29 ("crypto: xilinx - Turn SHA into a
tristate and allow COMPILE_TEST") but not Fixes: that commit because the
underlying problem is older.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: sparclinux@vger.kernel.org
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-23 09:27:17 -07:00
Conor Dooley
6deb9bf458 riscv: dts: microchip: reparent mpfs clocks
The 600M clock in the fabric is not the real reference, replace it with
a 125M clock which is the correct value for the icicle kit. Rename the
msspllclk node to mssrefclk since this is now the input to, not the
output of, the msspll clock. Control of the msspll clock has been moved
into the clock configurator, so add the register range for it to the clk
configurator. Finally, add a new output of the clock config block which
will provide the 1M reference clock for the MTIMER and the rtc.

Fixes: 528a5b1f25 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Fixes: 0fa6107eca ("RISC-V: Initial DTS for Microchip ICICLE board")
Reviewed-by: Daire McNamara <daire.mcnamara@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220413075835.3354193-10-conor.dooley@microchip.com
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-04-22 18:40:30 -07:00
Conor Dooley
2b6190c804 riscv: dts: microchip: fix usage of fic clocks on mpfs
The fic clocks passed to the pcie controller and other peripherals in
the device tree are not the clocks they actually run on. The fics are
actually clock domain crossers & the clock config blocks output is the
mss/cpu side input to the interconnect. The peripherals are actually
clocked by fixed frequency clocks embedded in the fpga fabric.

Fix the device tree so that these peripherals use the correct clocks.
The fabric side FIC0 & FIC1 inputs both use the same 125 MHz, so only
one clock is created for them.

Fixes: 528a5b1f25 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Reviewed-by: Daire McNamara <daire.mcnamara@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220413075835.3354193-4-conor.dooley@microchip.com
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-04-22 18:40:07 -07:00
Linus Torvalds
bb4ce2c658 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "The main and larger change here is a workaround for AMD's lack of
  cache coherency for encrypted-memory guests.

  I have another patch pending, but it's waiting for review from the
  architecture maintainers.

  RISC-V:

   - Remove 's' & 'u' as valid ISA extension

   - Do not allow disabling the base extensions 'i'/'m'/'a'/'c'

  x86:

   - Fix NMI watchdog in guests on AMD

   - Fix for SEV cache incoherency issues

   - Don't re-acquire SRCU lock in complete_emulated_io()

   - Avoid NULL pointer deref if VM creation fails

   - Fix race conditions between APICv disabling and vCPU creation

   - Bugfixes for disabling of APICv

   - Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume

  selftests:

   - Do not use bitfields larger than 32-bits, they differ between GCC
     and clang"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: selftests: introduce and use more page size-related constants
  kvm: selftests: do not use bitfields larger than 32-bits for PTEs
  KVM: SEV: add cache flush to solve SEV cache incoherency issues
  KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
  KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
  KVM: selftests: Silence compiler warning in the kvm_page_table_test
  KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
  x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
  KVM: SPDX style and spelling fixes
  KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
  KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
  KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
  KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
  KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
  KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
  KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copy
  KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
  RISC-V: KVM: Restrict the extensions that can be disabled
  RISC-V: KVM: Remove 's' & 'u' as valid ISA extension
2022-04-22 17:58:36 -07:00
Linus Torvalds
4e339e5e2d Merge tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes Palmer Dabbelt:

 - A pair of build fixes for the recent cpuidle driver

 - A fix for systems without sv57 that manifests as a crash
   early in boot

* tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLE
  RISC-V: mm: Fix set_satp_mode() for platform not having Sv57
  cpuidle: riscv: support non-SMP config
2022-04-22 13:53:14 -07:00
Linus Torvalds
7200095fea Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "There's no real pattern to the fixes, but the main one fixes our
  pmd_leaf() definition to resolve a NULL dereference on the migration
  path.

   - Fix PMU event validation in the absence of any event counters

   - Fix allmodconfig build using clang in conjunction with binutils

   - Fix definitions of pXd_leaf() to handle PROT_NONE entries

   - More typo fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: fix p?d_leaf()
  arm64: fix typos in comments
  arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
  arm_pmu: Validate single/group leader events
2022-04-22 13:49:26 -07:00
Miaoqian Lin
533bec143a arm/xen: Fix some refcount leaks
The of_find_compatible_node() function returns a node pointer with
refcount incremented, We should use of_node_put() on it when done
Add the missing of_node_put() to release the refcount.

Fixes: 9b08aaa319 ("ARM: XEN: Move xen_early_init() before efi_init()")
Fixes: b2371587fe ("arm/xen: Read extended regions from DT and init Xen resource")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
2022-04-22 13:33:33 -07:00
Guo Ren
8ec1442953 riscv: patch_text: Fixup last cpu should be master
These patch_text implementations are using stop_machine_cpuslocked
infrastructure with atomic cpu_count. The original idea: When the
master CPU patch_text, the others should wait for it. But current
implementation is using the first CPU as master, which couldn't
guarantee the remaining CPUs are waiting. This patch changes the
last CPU as the master to solve the potential risk.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 043cb41a85 ("riscv: introduce interfaces to patch kernel code")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-22 08:29:24 -07:00
Muchun Song
23bc8f69f0 arm64: mm: fix p?d_leaf()
The pmd_leaf() is used to test a leaf mapped PMD, however, it misses
the PROT_NONE mapped PMD on arm64.  Fix it.  A real world issue [1]
caused by this was reported by Qian Cai. Also fix pud_leaf().

Link: https://patchwork.kernel.org/comment/24798260/ [1]
Fixes: 8aa82df3c1 ("arm64: mm: add p?d_leaf() definitions")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Link: https://lore.kernel.org/r/20220422060033.48711-1-songmuchun@bytedance.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-22 11:28:36 +01:00
Randy Dunlap
bf9bac40b7 RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLE
There can be lots of build errors when building cpuidle-riscv-sbi.o.
They are all caused by a kconfig problem with this warning:

WARNING: unmet direct dependencies detected for RISCV_SBI_CPUIDLE
  Depends on [n]: CPU_IDLE [=y] && RISCV [=y] && RISCV_SBI [=n]
  Selected by [y]:
  - SOC_VIRT [=y] && CPU_IDLE [=y]

so make the 'select' of RISCV_SBI_CPUIDLE also depend on RISCV_SBI.

Fixes: c5179ef1ca ("RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-21 15:10:47 -07:00
Anup Patel
d5fdade933 RISC-V: mm: Fix set_satp_mode() for platform not having Sv57
When Sv57 is not available the satp.MODE test in set_satp_mode() will
fail and lead to pgdir re-programming for Sv48. The pgdir re-programming
will fail as well due to pre-existing pgdir entry used for Sv57 and as
a result kernel fails to boot on RISC-V platform not having Sv57.

To fix above issue, we should clear the pgdir memory in set_satp_mode()
before re-programming.

Fixes: 011f09d120 ("riscv: mm: Set sv57 on defaultly")
Reported-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-21 15:10:39 -07:00
Mingwei Zhang
683412ccf6 KVM: SEV: add cache flush to solve SEV cache incoherency issues
Flush the CPU caches when memory is reclaimed from an SEV guest (where
reclaim also includes it being unmapped from KVM's memslots).  Due to lack
of coherency for SEV encrypted memory, failure to flush results in silent
data corruption if userspace is malicious/broken and doesn't ensure SEV
guest memory is properly pinned and unpinned.

Cache coherency is not enforced across the VM boundary in SEV (AMD APM
vol.2 Section 15.34.7). Confidential cachelines, generated by confidential
VM guests have to be explicitly flushed on the host side. If a memory page
containing dirty confidential cachelines was released by VM and reallocated
to another user, the cachelines may corrupt the new user at a later time.

KVM takes a shortcut by assuming all confidential memory remain pinned
until the end of VM lifetime. Therefore, KVM does not flush cache at
mmu_notifier invalidation events. Because of this incorrect assumption and
the lack of cache flushing, malicous userspace can crash the host kernel:
creating a malicious VM and continuously allocates/releases unpinned
confidential memory pages when the VM is running.

Add cache flush operations to mmu_notifier operations to ensure that any
physical memory leaving the guest VM get flushed. In particular, hook
mmu_notifier_invalidate_range_start and mmu_notifier_release events and
flush cache accordingly. The hook after releasing the mmu lock to avoid
contention with other vCPUs.

Cc: stable@vger.kernel.org
Suggested-by: Sean Christpherson <seanjc@google.com>
Reported-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-4-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 15:41:00 -04:00
Mingwei Zhang
d45829b351 KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
Use clflush_cache_range() to flush the confidential memory when
SME_COHERENT is supported in AMD CPU. Cache flush is still needed since
SME_COHERENT only support cache invalidation at CPU side. All confidential
cache lines are still incoherent with DMA devices.

Cc: stable@vger.kerel.org

Fixes: add5e2f045 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-3-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 13:16:59 -04:00
Sean Christopherson
4bbef7e8eb KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
Rework sev_flush_guest_memory() to explicitly handle only a single page,
and harden it to fall back to WBINVD if VM_PAGE_FLUSH fails.  Per-page
flushing is currently used only to flush the VMSA, and in its current
form, the helper is completely broken with respect to flushing actual
guest memory, i.e. won't work correctly for an arbitrary memory range.

VM_PAGE_FLUSH takes a host virtual address, and is subject to normal page
walks, i.e. will fault if the address is not present in the host page
tables or does not have the correct permissions.  Current AMD CPUs also
do not honor SMAP overrides (undocumented in kernel versions of the APM),
so passing in a userspace address is completely out of the question.  In
other words, KVM would need to manually walk the host page tables to get
the pfn, ensure the pfn is stable, and then use the direct map to invoke
VM_PAGE_FLUSH.  And the latter might not even work, e.g. if userspace is
particularly evil/clever and backs the guest with Secret Memory (which
unmaps memory from the direct map).

Signed-off-by: Sean Christopherson <seanjc@google.com>

Fixes: add5e2f045 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reported-by: Mingwei Zhang <mizhang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-2-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 13:16:30 -04:00
Like Xu
75189d1de1 KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
NMI-watchdog is one of the favorite features of kernel developers,
but it does not work in AMD guest even with vPMU enabled and worse,
the system misrepresents this capability via /proc.

This is a PMC emulation error. KVM does not pass the latest valid
value to perf_event in time when guest NMI-watchdog is running, thus
the perf_event corresponding to the watchdog counter will enter the
old state at some point after the first guest NMI injection, forcing
the hardware register PMC0 to be constantly written to 0x800000000001.

Meanwhile, the running counter should accurately reflect its new value
based on the latest coordinated pmc->counter (from vPMC's point of view)
rather than the value written directly by the guest.

Fixes: 168d918f26 ("KVM: x86: Adjust counter sample period after a wrmsr")
Reported-by: Dongli Cao <caodongli@kingsoft.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220409015226.38619-1-likexu@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 13:16:14 -04:00
Wanpeng Li
0361bdfddc x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
MSR_KVM_POLL_CONTROL is cleared on reset, thus reverting guests to
host-side polling after suspend/resume.  Non-bootstrap CPUs are
restored correctly by the haltpoll driver because they are hot-unplugged
during suspend and hot-plugged during resume; however, the BSP
is not hotpluggable and remains in host-sde polling mode after
the guest resume.  The makes the guest pay for the cost of vmexits
every time the guest enters idle.

Fix it by recording BSP's haltpoll state and resuming it during guest
resume.

Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1650267752-46796-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 13:16:14 -04:00
Sean Christopherson
0047fb33f8 KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
Skip the APICv inhibit update for KVM_GUESTDBG_BLOCKIRQ if APICv is
disabled at the module level to avoid having to acquire the mutex and
potentially process all vCPUs. The DISABLE inhibit will (barring bugs)
never be lifted, so piling on more inhibits is unnecessary.

Fixes: cae72dcc3b ("KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220420013732.3308816-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 13:16:13 -04:00
Sean Christopherson
423ecfea77 KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
Make a KVM_REQ_APICV_UPDATE request when creating a vCPU with an
in-kernel local APIC and APICv enabled at the module level.  Consuming
kvm_apicv_activated() and stuffing vcpu->arch.apicv_active directly can
race with __kvm_set_or_clear_apicv_inhibit(), as vCPU creation happens
before the vCPU is fully onlined, i.e. it won't get the request made to
"all" vCPUs.  If APICv is globally inhibited between setting apicv_active
and onlining the vCPU, the vCPU will end up running with APICv enabled
and trigger KVM's sanity check.

Mark APICv as active during vCPU creation if APICv is enabled at the
module level, both to be optimistic about it's final state, e.g. to avoid
additional VMWRITEs on VMX, and because there are likely bugs lurking
since KVM checks apicv_active in multiple vCPU creation paths.  While
keeping the current behavior of consuming kvm_apicv_activated() is
arguably safer from a regression perspective, force apicv_active so that
vCPU creation runs with deterministic state and so that if there are bugs,
they are found sooner than later, i.e. not when some crazy race condition
is hit.

  WARNING: CPU: 0 PID: 484 at arch/x86/kvm/x86.c:9877 vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
  Modules linked in:
  CPU: 0 PID: 484 Comm: syz-executor361 Not tainted 5.16.13 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014
  RIP: 0010:vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
  Call Trace:
   <TASK>
   vcpu_run arch/x86/kvm/x86.c:10039 [inline]
   kvm_arch_vcpu_ioctl_run+0x337/0x15e0 arch/x86/kvm/x86.c:10234
   kvm_vcpu_ioctl+0x4d2/0xc80 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3727
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:874 [inline]
   __se_sys_ioctl fs/ioctl.c:860 [inline]
   __x64_sys_ioctl+0x16d/0x1d0 fs/ioctl.c:860
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

The bug was hit by a syzkaller spamming VM creation with 2 vCPUs and a
call to KVM_SET_GUEST_DEBUG.

  r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x0, 0x0)
  r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
  ioctl$KVM_CAP_SPLIT_IRQCHIP(r1, 0x4068aea3, &(0x7f0000000000)) (async)
  r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) (async)
  r3 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x400000000000002)
  ioctl$KVM_SET_GUEST_DEBUG(r3, 0x4048ae9b, &(0x7f00000000c0)={0x5dda9c14aa95f5c5})
  ioctl$KVM_RUN(r2, 0xae80, 0x0)

Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Fixes: 8df14af42f ("kvm: x86: Add support for dynamic APICv activation")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220420013732.3308816-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 13:16:12 -04:00