Commit Graph

205010 Commits

Author SHA1 Message Date
Linus Torvalds
b2f317173e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Pass the correct address to mte_clear_page_tags() on initialising a
     tagged page

   - Plug a race against a GICv4.1 doorbell interrupt while saving the
     vgic-v3 pending state.

  x86:

   - A command line parsing fix and a clang compilation fix for
     selftests

   - A fix for a longstanding VMX issue, that surprisingly was only
     found now to affect real world guests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Make reclaim_period_ms input always be positive
  KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
  selftests: kvm: move declaration at the beginning of main()
  KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
  KVM: arm64: Pass the actual page address to mte_clear_page_tags()
2023-01-24 17:48:09 -08:00
Linus Torvalds
9946f0981f Merge tag 'efi-fixes-for-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
 "Another couple of EFI fixes, of which the first two were already in
  -next when I sent out the previous PR, but they caused some issues on
  non-EFI boots so I let them simmer for a bit longer.

   - ensure the EFI ResetSystem and ACPI PRM calls are recognized as
     users of the EFI runtime, and therefore protected against
     exceptions

   - account for the EFI runtime stack in the stacktrace code

   - remove Matthew Garrett's MAINTAINERS entry for efivarfs"

* tag 'efi-fixes-for-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Remove Matthew Garrett as efivarfs maintainer
  arm64: efi: Account for the EFI runtime stack in stack unwinder
  arm64: efi: Avoid workqueue to check whether EFI runtime is live
2023-01-23 11:46:19 -08:00
Linus Torvalds
2475bf0250 Merge tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:

 - Make sure the scheduler doesn't use stale frequency scaling values
   when latter get disabled due to a value error

 - Fix a NULL pointer access on UP configs

 - Use the proper locking when updating CPU capacity

* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
  sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
  sched/fair: Fixes for capacity inversion detection
  sched/uclamp: Fix a uninitialized variable warnings
2023-01-22 12:14:58 -08:00
Linus Torvalds
2b299a1cd4 Merge tag 'perf_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:

 - Add Emerald Rapids model support to more perf machinery

* tag 'perf_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/cstate: Add Emerald Rapids
  perf/x86/intel: Add Emerald Rapids
2023-01-22 12:06:18 -08:00
Hendrik Borghorst
a44b331614 KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
When serializing and deserializing kvm_sregs, attributes of the segment
descriptors are stored by user space. For unusable segments,
vmx_segment_access_rights skips all attributes and sets them to 0.

This means we zero out the DPL (Descriptor Privilege Level) for unusable
entries.

Unusable segments are - contrary to their name - usable in 64bit mode and
are used by guests to for example create a linear map through the
NULL selector.

VMENTER checks if SS.DPL is correct depending on the CS segment type.
For types 9 (Execute Only) and 11 (Execute Read), CS.DPL must be equal to
SS.DPL [1].

We have seen real world guests setting CS to a usable segment with DPL=3
and SS to an unusable segment with DPL=3. Once we go through an sregs
get/set cycle, SS.DPL turns to 0. This causes the virtual machine to crash
reproducibly.

This commit changes the attribute logic to always preserve attributes for
unusable segments. According to [2] SS.DPL is always saved on VM exits,
regardless of the unusable bit so user space applications should have saved
the information on serialization correctly.

[3] specifies that besides SS.DPL the rest of the attributes of the
descriptors are undefined after VM entry if unusable bit is set. So, there
should be no harm in setting them all to the previous state.

[1] Intel SDM Vol 3C 26.3.1.2 Checks on Guest Segment Registers
[2] Intel SDM Vol 3C 27.3.2 Saving Segment Registers and Descriptor-Table
Registers
[3] Intel SDM Vol 3C 26.3.2.2 Loading Guest Segment Registers and
Descriptor-Table Registers

Cc: Alexander Graf <graf@amazon.de>
Cc: stable@vger.kernel.org
Signed-off-by: Hendrik Borghorst <hborghor@amazon.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221114164823.69555-1-hborghor@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-22 04:09:28 -05:00
Paolo Bonzini
d732cbf78d Merge tag 'kvmarm-fixes-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.2, take #2

- Pass the correct address to mte_clear_page_tags() on initialising
  a tagged page

- Plug a race against a GICv4.1 doorbell interrupt while saving
  the vgic-v3 pending state.
2023-01-22 03:46:14 -05:00
Marc Zyngier
ef3691683d KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
To save the vgic LPI pending state with GICv4.1, the VPEs must all be
unmapped from the ITSs so that the sGIC caches can be flushed.
The opposite is done once the state is saved.

This is all done by using the activate/deactivate irqdomain callbacks
directly from the vgic code. Crutially, this is done without holding
the irqdesc lock for the interrupts that represent the VPE. And these
callbacks are changing the state of the irqdesc. What could possibly
go wrong?

If a doorbell fires while we are messing with the irqdesc state,
it will acquire the lock and change the interrupt state concurrently.
Since we don't hole the lock, curruption occurs in on the interrupt
state. Oh well.

While acquiring the lock would fix this (and this was Shanker's
initial approach), this is still a layering violation we could do
without. A better approach is actually to free the VPE interrupt,
do what we have to do, and re-request it.

It is more work, but this usually happens only once in the lifetime
of the VM and we don't really care about this sort of overhead.

Fixes: f66b7b151e ("KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables")
Reported-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230118022348.4137094-1-sdonthineni@nvidia.com
2023-01-21 11:02:19 +00:00
Catalin Marinas
c3b37c2d77 KVM: arm64: Pass the actual page address to mte_clear_page_tags()
Commit d77e59a8fc ("arm64: mte: Lock a page for MTE tag
initialisation") added a call to mte_clear_page_tags() in case a
prior mte_copy_tags_from_user() failed in order to avoid stale tags in
the guest page (it should have really been a separate commit).
Unfortunately, the argument passed to this function was the address of
the struct page rather than the actual page address. Fix this function
call.

Fixes: d77e59a8fc ("arm64: mte: Lock a page for MTE tag initialisation")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230119170902.1574756-1-catalin.marinas@arm.com
2023-01-21 11:02:19 +00:00
Linus Torvalds
1ed46384f8 Merge tag 'soc-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC DT and driver fixes from Arnd Bergmann:
 "Lots of dts fixes for Qualcomm Snapdragon and NXP i.MX platforms,
  including:

   - A regression fix for SDHCI controllers on Inforce 6540, and another
     SDHCI fix on SM8350

   - Reenable cluster idle on sm8250 after the the code fix is upstream

   - multiple fixes for the QMP PHY binding, needing an incompatible dt
     change

   - The reserved memory map is updated on Xiaomi Mi 4C and Huawei Nexus
     6P, to avoid instabilities caused by use of protected memory
     regions

   - Fix i.MX8MP DT for missing GPC Interrupt, power-domain typo and USB
     clock error

   - A couple of verdin-imx8mm DT fixes for audio playback support

   - Fix pca9547 i2c-mux node name for i.MX and Vybrid device trees

   - Fix an imx93-11x11-evk uSDHC pad setting problem that causes Micron
     eMMC CMD8 CRC error in HS400ES/HS400 mode

  The remaining ARM and RISC-V platforms only have very few smaller dts
  bugfixes this time:

   - A fix for the SiFive unmatched board's PCI memory space

   - A revert to fix a regression with GPIO on Marvell Armada

   - A fix for the UART address on Marvell AC5

   - Missing chip-select phandles for stm32 boards

   - Selecting the correct clock for the sam9x60 memory controller

   - Amlogic based Odroid-HC4 needs a revert to restore USB
     functionality.

  And finally, there are some minor code fixes:

   - Build fixes for OMAP1, pxa, riscpc, raspberry pi firmware, and zynq
     firmware

   - memory controller driver fixes for an OMAP regression and older
     bugs on tegra, atmel and mvebu

   - reset controller fixes for ti-sci and uniphier platforms

   - ARM SCMI firmware fixes for a couple of rare corner cases

   - Qualcomm platform driver fixes for incorrect error handling and a
     backwards compatibility fix for the apr driver using older dtb

   - NXP i.MX SoC driver fixes for HDMI output, error handling in the
     imx8 soc-id and missing reference counting on older cpuid code"

* tag 'soc-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (60 commits)
  firmware: zynqmp: fix declarations for gcc-13
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp151a-prtt1l
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp157c-emstamp-argon
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcom-som
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcor-som
  ARM: dts: at91: sam9x60: fix the ddr clock for sam9x60
  ARM: omap1: fix building gpio15xx
  ARM: omap1: fix !ARCH_OMAP1_ANY link failures
  firmware: raspberrypi: Fix type assignment
  arm64: dts: qcom: msm8992-libra: Fix the memory map
  arm64: dts: qcom: msm8992: Don't use sfpb mutex
  PM: AVS: qcom-cpr: Fix an error handling path in cpr_probe()
  arm64: dts: msm8994-angler: fix the memory map
  arm64: dts: marvell: AC5/AC5X: Fix address for UART1
  ARM: footbridge: drop unnecessary inclusion
  Revert "ARM: dts: armada-39x: Fix compatible string for gpios"
  Revert "ARM: dts: armada-38x: Fix compatible string for gpios"
  ARM: pxa: enable PXA310/PXA320 for DT-only build
  riscv: dts: sifive: fu740: fix size of pcie 32bit memory
  soc: qcom: apr: Make qcom,protection-domain optional again
  ...
2023-01-20 11:00:03 -08:00
Arnd Bergmann
aca5d87d93 Merge tag 'at91-fixes-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes
AT91 fixes for 6.2:

It contains:
- fix the clock provided via DT for DDR controller on SAM9X60

* tag 'at91-fixes-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: dts: at91: sam9x60: fix the ddr clock for sam9x60

Link: https://lore.kernel.org/r/20230119112101.42045-1-claudiu.beznea@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-20 10:40:52 +01:00
Arnd Bergmann
b7b0742883 Merge tag 'stm32-dt-for-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into arm/fixes
STM32 DT fixes for v6.2, round 1

Highlights:
-----------

 -STM32MP15:
  - Fix missing chip select phandle in several stm32mp15x based boards.

* tag 'stm32-dt-for-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp151a-prtt1l
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp157c-emstamp-argon
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcom-som
  ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcor-som

Link: https://lore.kernel.org/r/3fe26bf9-297b-5c78-682b-37fa6d8b6190@foss.st.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-20 10:39:46 +01:00
Linus Torvalds
a03df4ec37 Merge tag 's390-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 build fix from Heiko Carstens:

 - Workaround invalid gcc-11 out of bounds read warning caused by s390's
   S390_lowcore definition. This happens only with gcc 11.1.0 and
   11.2.0.

   The code which causes this warning will be gone with the next merge
   window. Therefore just replace the memcpy() with a for loop to get
   rid of the warning.

* tag 's390-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: workaround invalid gcc-11 out of bounds read warning
2023-01-19 12:28:53 -08:00
Kan Liang
5a8a05f165 perf/x86/intel/cstate: Add Emerald Rapids
From the perspective of Intel cstate residency counters,
Emerald Rapids is the same as the Sapphire Rapids and Ice Lake.
Add Emerald Rapids model.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-2-kan.liang@linux.intel.com
2023-01-18 12:42:49 +01:00
Kan Liang
6795e558e9 perf/x86/intel: Add Emerald Rapids
From core PMU's perspective, Emerald Rapids is the same as the Sapphire
Rapids. The only difference is the event list, which will be
supported in the perf tool later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-1-kan.liang@linux.intel.com
2023-01-18 12:42:49 +01:00
Heiko Carstens
41e1992665 s390: workaround invalid gcc-11 out of bounds read warning
GCC 11.1.0 and 11.2.0 generate a wrong warning when compiling the
kernel e.g. with allmodconfig:

arch/s390/kernel/setup.c: In function ‘setup_lowcore_dat_on’:
./include/linux/fortify-string.h:57:33: error: ‘__builtin_memcpy’ reading 128 bytes from a region of size 0 [-Werror=stringop-overread]
...
arch/s390/kernel/setup.c:526:9: note: in expansion of macro ‘memcpy’
  526 |         memcpy(abs_lc->cregs_save_area, S390_lowcore.cregs_save_area,
      |         ^~~~~~

This could be addressed by using absolute_pointer() with the
S390_lowcore macro, but this is not a good idea since this generates
worse code for performance critical paths.

Therefore simply use a for loop to copy the array in question and get
rid of the warning.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-17 19:00:59 +01:00
Patrice Chotard
175281f806 ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp151a-prtt1l
Chip select pinctrl phandle was missing in several stm32mp15x based boards.

Fixes: ea99a5a02e ("ARM: dts: stm32: Create separate pinmux for qspi cs pin in stm32mp15-pinctrl.dtsi")

Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2023-01-17 14:48:44 +01:00
Patrice Chotard
732dbcf52f ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp157c-emstamp-argon
Chip select pinctrl phandle was missing in several stm32mp15x based boards.

Fixes: ea99a5a02e ("ARM: dts: stm32: Create separate pinmux for qspi cs pin in stm32mp15-pinctrl.dtsi")

Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Reinhold Mueller <reinhold.mueller@emtrion.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2023-01-17 14:48:44 +01:00
Patrice Chotard
21d83512bf ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcom-som
Chip select pinctrl phandle was missing in several stm32mp15x based boards.

Fixes: ea99a5a02e ("ARM: dts: stm32: Create separate pinmux for qspi cs pin in stm32mp15-pinctrl.dtsi")

Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2023-01-17 14:48:44 +01:00
Patrice Chotard
7ffd2266bd ARM: dts: stm32: Fix qspi pinctrl phandle for stm32mp15xx-dhcor-som
Chip select pinctrl phandle was missing in several stm32mp15x based boards.

Fixes: ea99a5a02e ("ARM: dts: stm32: Create separate pinmux for qspi cs pin in stm32mp15-pinctrl.dtsi")

Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2023-01-17 14:48:44 +01:00
Jinyang He
dc74a9e8a8 LoongArch: Add generic ex-handler unwind in prologue unwinder
When exception is triggered, code flow go handle_\exception in some
cases. One of stackframe in this case as follows,

high -> +-------+
        | REGS  |  <- a pt_regs
        |       |
        |       |  <- ex trigger
        | REGS  |  <- ex pt_regs   <-+
        |       |                    |
        |       |                    |
low  -> +-------+           ->unwind-+

When unwinder unwinds to handler_\exception it cannot go on prologue
analysis. Because it is an asynchronous code flow, we should get the
next frame PC from regs->csr_era rather than regs->regs[1]. At init time
we copy the handlers to eentry and also copy them to NUMA-affine memory
named pcpu_handlers if NUMA is enabled. Thus, unwinder cannot unwind
normally. To solve this, we try to give some hints in handler_\exception
and fixup unwinders in unwind_next_frame().

Reported-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Jinyang He
c5ac25e0d7 LoongArch: Strip guess unwinder out from prologue unwinder
The prolugue unwinder rely on symbol info. When PC is not in kernel text
address, it cannot find relative symbol info and it will be broken. The
guess unwinder will be used in this case. And the guess unwinder code in
prolugue unwinder is redundant. Strip it out and set the unwinder type
in unwind_state. Make guess_unwinder::unwind_next_frame() as default way
when other unwinders cannot unwind in some extreme case.

Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Jinyang He
5bb8d34449 LoongArch: Use correct sp value to get graph addr in stack unwinders
The stack frame when function_graph enable like follows,

---------  <- function sp_on_entry
    |
    |
    |
 FAKE_RA   <- sp_on_entry - sizeof(pt_regs) + PT_R1
    |
---------  <- sp_on_entry - sizeof(pt_regs)

So if we want to get the &FAKE_RA we should get sp_on_entry first. In
the unwinder_prologue case, we can get the sp_on_entry as state->sp,
because we try to calculate each CFA and the ra saved address. But in
the unwinder_guess case, we cannot get it because we do not try to
calculate the CFA. Although LoongArch have not fixed frame, the $ra is
saved at CFA - 8 in most cases, we can try guess, too. As we store the
pc in state, we not need to dereference state->sp, too.

Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Jinyang He
429a9671f2 LoongArch: Get frame info in unwind_start() when regs is not available
At unwind_start(), it is better to get its frame info here rather than
get them outside, even we don't have 'regs'. In this way we can simply
use unwind_{start, next_frame, done} outside.

Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Jinyang He
e2f2739227 LoongArch: Adjust PC value when unwind next frame in unwinder
When state->first is not set, the PC is a return address in the previous
frame. We need to adjust its value in case overflow to the next symbol.

Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Youling Tang
3200983fa8 LoongArch: Simplify larch_insn_gen_xxx implementation
Simplify larch_insn_gen_xxx implementation by reusing emit_xxx.

Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Tiezhu Yang
2959fce7fd LoongArch: Use common function sign_extend64()
There exists a common function sign_extend64() to sign extend a 64-bit
value using specified bit as sign-bit in include/linux/bitops.h, it is
more efficient, let us use it and remove the arch-specific sign_extend()
under arch/loongarch.

Suggested-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Huacai Chen
d52fec86a4 LoongArch: Add HWCAP_LOONGARCH_CPUCFG to elf_hwcap
HWCAP_LOONGARCH_CPUCFG is missing in elf_hwcap, so add it for glibc's
later use.

Cc: stable@vger.kernel.org
Reported-by: Yinyu Cai <caiyinyu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-01-17 11:42:16 +08:00
Ard Biesheuvel
7ea55715c4 arm64: efi: Account for the EFI runtime stack in stack unwinder
The EFI runtime services run from a dedicated stack now, and so the
stack unwinder needs to be informed about this.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-16 15:27:31 +01:00
Ard Biesheuvel
8a9a1a1873 arm64: efi: Avoid workqueue to check whether EFI runtime is live
Comparing current_work() against efi_rts_work.work is sufficient to
decide whether current is currently running EFI runtime services code at
any level in its call stack.

However, there are other potential users of the EFI runtime stack, such
as the ACPI subsystem, which may invoke efi_call_virt_pointer()
directly, and so any sync exceptions occurring in firmware during those
calls are currently misidentified.

So instead, let's check whether the stashed value of the thread stack
pointer points into current's thread stack. This can only be the case if
current was interrupted while running EFI runtime code. Note that this
implies that we should clear the stashed value after switching back, to
avoid false positives.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-01-16 15:27:31 +01:00
Arnd Bergmann
e2f7096f1e Merge tag 'riscv-dt-fixes-for-v6.2-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes
RISC-V DeviceTrees for v6.2

SiFive:
A solitary fix for the PCI memory regions on the unmatched, triggered by
an SM768. No-one must have tried one until just recently!

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'riscv-dt-fixes-for-v6.2-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  riscv: dts: sifive: fu740: fix size of pcie 32bit memory

Link: https://lore.kernel.org/r/Y78rn+0fVNOxHLKt@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-16 11:05:14 +01:00
Arnd Bergmann
88bcc6fa5e Merge tag 'mvebu-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into arm/fixes
mvebu fixes for 6.2 (part 1)

Fix regression for gpio support on Armada 38x and Armada 38x

Fix address for UART1 on AC5/AC5X

* tag 'mvebu-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
  arm64: dts: marvell: AC5/AC5X: Fix address for UART1
  Revert "ARM: dts: armada-39x: Fix compatible string for gpios"
  Revert "ARM: dts: armada-38x: Fix compatible string for gpios"

Link: https://lore.kernel.org/r/87mt6mg08k.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-16 11:04:41 +01:00
Yair Podemsky
5f5cc9ed99 x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
Once disable_freq_invariance_work is called the scale_freq_tick function
will not compute or update the arch_freq_scale values.
However the scheduler will still read these values and use them.
The result is that the scheduler might perform unfair decisions based on stale
values.

This patch adds the step of setting the arch_freq_scale values for all
cpus to the default (max) value SCHED_CAPACITY_SCALE, Once all cpus
have the same arch_freq_scale value the scaling is meaningless.

Signed-off-by: Yair Podemsky <ypodemsk@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230110160206.75912-1-ypodemsk@redhat.com
2023-01-16 10:19:15 +01:00
Linus Torvalds
f0f70ddb8f Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Make sure the poking PGD is pinned for Xen PV as it requires it this
   way

 - Fixes for two resctrl races when moving a task or creating a new
   monitoring group

 - Fix SEV-SNP guests running under HyperV where MTRRs are disabled to
   not return a UC- type mapping type on memremap() and thus cause a
   serious slowdown

 - Fix insn mnemonics in bioscall.S now that binutils is starting to fix
   confusing insn suffixes

* tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: fix poking_init() for Xen PV guests
  x86/resctrl: Fix event counts regression in reused RMIDs
  x86/resctrl: Fix task CLOSID/RMID update race
  x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
  x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
2023-01-15 07:17:44 -06:00
Linus Torvalds
b1d63f0c77 Merge tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix a build failure with some versions of ld that have an odd version
   string

 - Fix incorrect use of mutex in the IMC PMU driver

Thanks to Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, and
Yang Yingliang.

* tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/hash: Make stress_hpt_timer_fn() static
  powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
  powerpc/boot: Fix incorrect version calculation issue in ld_version
2023-01-15 07:09:41 -06:00
Linus Torvalds
9e058c2952 Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:

 - Work around apparent firmware issue that made Linux reject MMCONFIG
   space, which broke PCI extended config space (Bjorn Helgaas)

 - Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
   PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
   Bulwahn)

* tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
  x86/pci: Simplify is_mmconf_reserved() messages
  PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN
2023-01-13 17:32:22 -06:00
Linus Torvalds
92783a90bc Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the PMCR_EL0 reset value after the PMU rework

   - Correctly handle S2 fault triggered by a S1 page table walk by not
     always classifying it as a write, as this breaks on R/O memslots

   - Document why we cannot exit with KVM_EXIT_MMIO when taking a write
     fault from a S1 PTW on a R/O memslot

   - Put the Apple M2 on the naughty list for not being able to
     correctly implement the vgic SEIS feature, just like the M1 before
     it

   - Reviewer updates: Alex is stepping down, replaced by Zenghui

  x86:

   - Fix various rare locking issues in Xen emulation and teach lockdep
     to detect them

   - Documentation improvements

   - Do not return host topology information from KVM_GET_SUPPORTED_CPUID"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
  KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
  KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
  Documentation: kvm: fix SRCU locking order docs
  KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  KVM: nSVM: clarify recalc_intercepts() wrt CR8
  MAINTAINERS: Remove myself as a KVM/arm64 reviewer
  MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
  KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: PMU: Fix PMCR_EL0 reset value
2023-01-13 14:41:50 -06:00
Bjorn Helgaas
fd3a8cff4d x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2).

07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.

Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.  After
07eab0901e, that E820 entry is removed, so we reject this ECAM space,
which makes PCI extended config space (offsets 0x100-0xfff) inaccessible.

The lack of extended config space breaks anything that relies on it,
including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc.

Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.

Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891
Fixes: 07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reported-by: Yunying Sun <yunying.sun@intel.com>
Reported-by: Baowen Zheng <baowen.zheng@corigine.com>
Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
2023-01-13 11:53:54 -06:00
Linus Torvalds
0bf913e07b Merge tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - avoid a potential crash on the efi_subsys_init() error path

 - use more appropriate error code for runtime services calls issued
   after a crash in the firmware occurred

 - avoid READ_ONCE() for accessing firmware tables that may appear
   misaligned in memory

* tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: tpm: Avoid READ_ONCE() for accessing the event log
  efi: rt-wrapper: Add missing include
  efi: fix userspace infinite retry read efivars after EFI runtime services page fault
  efi: fix NULL-deref in init error path
2023-01-13 10:37:10 -06:00
Linus Torvalds
d45b832d6f Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What
  could possibly go wrong?

  The obvious reason we have so much here is because of the holiday
  season right after the merge window, but we've also brought back an
  erratum workaround that was previously dropped at the last minute and
  there's an MTE coredumping fix that strays outside of the arch/arm64
  directory.

  Summary:

   - Fix PAGE_TABLE_CHECK failures on hugepage splitting path

   - Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header

   - Fix NULL deref when accessing debugfs node if PSCI is not present

   - Fix MTE core dumping when VMA list is being updated concurrently

   - Fix SME signal frame handling when SVE is not implemented by the
     CPU

   - Fix asm constraints for cmpxchg_double() to hazard both words

   - Fix build failure with stack tracer and older versions of Clang

   - Bring back workaround for Cortex-A715 erratum 2645198"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y
  arm64/mm: Define dummy pud_user_exec() when using 2-level page-table
  arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
  firmware/psci: Don't register with debugfs if PSCI isn't available
  firmware/psci: Fix MEM_PROTECT_RANGE function numbers
  arm64/signal: Always allocate SVE signal frames on SME only systems
  arm64/signal: Always accept SVE signal frames on SME only systems
  arm64/sme: Fix context switch for SME only systems
  arm64: cmpxchg_double*: hazard against entire exchange variable
  arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
  arm64: mte: Avoid the racy walk of the vma list during core dump
  elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
  arm64: mte: Fix double-freeing of the temporary tag storage during coredump
  arm64: ptrace: Use ARM64_SME to guard the SME register enumerations
  arm64/mm: add pud_user_exec() check in pud_user_accessible_page()
  arm64/mm: fix incorrect file_map_count for invalid pmd
2023-01-13 07:11:45 -06:00
Linus Torvalds
5be413a6e2 Merge tag 's390-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:

 - Add various missing READ_ONCE() to cmpxchg() loops prevent the
   compiler from potentially generating incorrect code. This includes a
   rather large change to the s390 specific hardware sampling code and
   its current use of cmpxchg_double().

   Do the fix now to get it out of the way of Peter Zijlstra's
   cmpxchg128() work, and have something that can be backported. The
   added new code includes a private 128 bit cmpxchg variant which will
   be removed again after Peter's rework is available. Also note that
   this 128 bit cmpxchg variant is used to implement 128 bit
   READ_ONCE(), while strictly speaking it wouldn't be necessary, and
   _READ_ONCE() should also be sufficient; even though it isn't obvious
   for all converted locations that this is the case. Therefore use this
   implementation for for the sake of clarity and consistency for now.

 - Fix ipl report address handling to avoid kdump failures/hangs.

 - Fix misuse of #(el)if in kernel decompressor.

 - Define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36,
   caused by the recently changed discard behaviour.

 - Make sure _edata and _end symbols are always page aligned.

 - The current header guard DEBUG_H in one of the s390 specific header
   files is too generic and conflicts with the ath9k wireless driver.
   Add an _ASM_S390_ prefix to the guard to make it unique.

 - Update defconfigs.

* tag 's390-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfigs
  KVM: s390: interrupt: use READ_ONCE() before cmpxchg()
  s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()
  s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops
  s390/kexec: fix ipl report address for kdump
  s390: fix -Wundef warning for CONFIG_KERNEL_ZSTD
  s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36
  s390: expicitly align _edata and _end symbols on page boundary
  s390/debug: add _ASM_S390_ prefix to header guard
2023-01-12 17:09:20 -06:00
Linus Torvalds
bad8c4a850 Merge tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - two cleanup patches

 - a fix of a memory leak in the Xen pvfront driver

 - a fix of a locking issue in the Xen hypervisor console driver

* tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvcalls: free active map buffer on pvcalls_front_free_map
  hvc/xen: lock console list traversal
  x86/xen: Remove the unused function p2m_index()
  xen: make remove callback of xen driver void returned
2023-01-12 17:02:20 -06:00
Linus Torvalds
a7b19c603e Merge tag 'perf-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events hw enablement from Ingo Molnar:

 - More hardware-enablement for Intel Meteor Lake & Emerald Rapid
   systems: pure model ID enumeration additions that do not affect other
   systems.

* tag 'perf-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add Emerald Rapids
  perf/x86/msr: Add Emerald Rapids
  perf/x86/msr: Add Meteor Lake support
  perf/x86/cstate: Add Meteor Lake support
2023-01-12 16:47:32 -06:00
Claudiu Beznea
9bfa2544db ARM: dts: at91: sam9x60: fix the ddr clock for sam9x60
The 2nd DDR clock for sam9x60 DDR controller is peripheral clock with
id 49.

Fixes: 1e5f532c27 ("ARM: dts: at91: sam9x60: add device tree for soc and board")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20221208115241.36312-1-claudiu.beznea@microchip.com
2023-01-12 13:45:49 +02:00
Juergen Gross
26ce6ec364 x86/mm: fix poking_init() for Xen PV guests
Commit 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()") broke
the kernel for running as Xen PV guest.

It seems as if the new address space is never activated before being
used, resulting in Xen rejecting to accept the new CR3 value (the PGD
isn't pinned).

Fix that by adding the now missing call of paravirt_arch_dup_mmap() to
poking_init(). That call was previously done by dup_mm()->dup_mmap() and
it is a NOP for all cases but for Xen PV, where it is just doing the
pinning of the PGD.

Fixes: 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230109150922.10578-1-jgross@suse.com
2023-01-12 11:22:20 +01:00
Yang Yingliang
f12cd06109 powerpc/64s/hash: Make stress_hpt_timer_fn() static
stress_hpt_timer_fn() is only used in hash_utils.c, make it static.

Fixes: 6b34a099fa ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221228093603.3166599-1-yangyingliang@huawei.com
2023-01-12 10:53:37 +11:00
David Woodhouse
310bc39546 KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
In commit 14243b3871 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
and event channel delivery") the clever version of me left some helpful
notes for those who would come after him:

       /*
        * For the irqfd workqueue, using the main kvm->lock mutex is
        * fine since this function is invoked from kvm_set_irq() with
        * no other lock held, no srcu. In future if it will be called
        * directly from a vCPU thread (e.g. on hypercall for an IPI)
        * then it may need to switch to using a leaf-node mutex for
        * serializing the shared_info mapping.
        */
       mutex_lock(&kvm->lock);

In commit 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
in the wrong order.

Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.

Fixes: 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-4-dwmw2@infradead.org>
[Rebase, add docs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 17:45:58 -05:00
Bjorn Helgaas
a48fe63769 x86/pci: Simplify is_mmconf_reserved() messages
is_mmconf_reserved() takes a "with_e820" parameter that only determines the
message logged if it finds the MMCONFIG region is reserved.  Pass the
message directly, which will simplify a future patch that adds a new way of
looking for that reservation.  No functional change intended.

Link: https://lore.kernel.org/r/20230110180243.1590045-2-helgaas@kernel.org
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
2023-01-11 16:28:09 -06:00
Heiko Carstens
1ecf7bd9c2 s390: update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11 21:26:40 +01:00
David Woodhouse
bbe17c625d KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
The kvm_xen_update_runstate_guest() function can be called when the vCPU
is being scheduled out, from a preempt notifier. It *opportunistically*
updates the runstate area in the guest memory, if the gfn_to_pfn_cache
which caches the appropriate address is still valid.

If there is *contention* when it attempts to obtain gpc->lock, then
locking inside the priority inheritance checks may cause a deadlock.
Lockdep reports:

[13890.148997] Chain exists of:
                 &gpc->lock --> &p->pi_lock --> &rq->__lock

[13890.149002]  Possible unsafe locking scenario:

[13890.149003]        CPU0                    CPU1
[13890.149004]        ----                    ----
[13890.149005]   lock(&rq->__lock);
[13890.149007]                                lock(&p->pi_lock);
[13890.149009]                                lock(&rq->__lock);
[13890.149011]   lock(&gpc->lock);
[13890.149013]
                *** DEADLOCK ***

In the general case, if there's contention for a read lock on gpc->lock,
that's going to be because something else is either invalidating or
revalidating the cache. Either way, we've raced with seeing it in an
invalid state, in which case we would have aborted the opportunistic
update anyway.

So in the 'atomic' case when called from the preempt notifier, just
switch to using read_trylock() and avoid the PI handling altogether.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 13:32:21 -05:00
David Woodhouse
23e60258ae KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
In commit 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate
area") we declared it safe to obtain two gfn_to_pfn_cache locks at the same
time:
	/*
	 * The guest's runstate_info is split across two pages and we
	 * need to hold and validate both GPCs simultaneously. We can
	 * declare a lock ordering GPC1 > GPC2 because nothing else
	 * takes them more than one at a time.
	 */

However, we forgot to tell lockdep. Do so, by setting a subclass on the
first lock before taking the second.

Fixes: 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate area")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 13:32:21 -05:00