Commit Graph

244117 Commits

Author SHA1 Message Date
Linus Torvalds
5862221fdd Merge tag 'parisc-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:

 - Revert "parisc: led: fix reference leak on failed device
   registration"

 - Fix build failures introduced when allowing to build 32-/64-bit only
   VDSO

 - Switch to dynamic parisc root device to avoid upcoming warnings

 - Fix IRQ leak in LASI driver

* tag 'parisc-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix IRQ leak in LASI driver
  parisc: Fix 64-bit kernel build when CONFIG_COMPAT=n
  parisc: Fix build failure for 32-bit kernel with PA2.0 instruction set
  parisc: drivers: switch to dynamic root device
  Revert "parisc: led: fix reference leak on failed device registration"
2026-05-06 12:51:07 -07:00
Linus Torvalds
adc1e5c620 Merge tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Fix issues in EFI graceful recovery on x86 introduced by changes to
   the kernel mode FPU APIs

 - I-cache coherency fixes for the LoongArch EFI stub

 - Locking fix for EFI pstore

 - Code tweak for efivarfs

* tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efi: Restore IRQ state in EFI page fault handler
  x86/efi: Fix graceful fault handling after FPU softirq changes
  efi/libstub: Synchronize instruction cache after kernel relocation
  efi/loongarch: Implement efi_cache_sync_image()
  efi/libstub: Move efi_relocate_kernel() into its only remaining user
  efi: pstore: Drop efivar lock when efi_pstore_open() returns with an error
  efivarfs: use QSTR() in efivarfs_alloc_dentry
2026-05-06 07:27:30 -07:00
Linus Torvalds
e80948062d Merge tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
 "Fix some build and runtime issues after 32BIT Kconfig option enabled,
  improve the platform-specific PCI controller compatibility, drop
  custom __arch_vdso_hres_capable(), and fix a lot of KVM bugs"

* tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Move unconditional delay into timer clear scenery
  LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software
  LoongArch: KVM: Move AVEC interrupt injection into switch loop
  LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()
  LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()
  LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry
  LoongArch: KVM: Compile switch.S directly into the kernel
  LoongArch: vDSO: Drop custom __arch_vdso_hres_capable()
  LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()
  LoongArch: Use per-root-bridge PCIH flag to skip mem resource fixup
  LoongArch: Fix SYM_SIGFUNC_START definition for 32BIT
  LoongArch: Specify -m32/-m64 explicitly for 32BIT/64BIT
  LoongArch: Make CONFIG_64BIT as the default option
2026-05-05 19:44:46 -07:00
Ard Biesheuvel
2c340aab54 x86/efi: Restore IRQ state in EFI page fault handler
The kernel's softirq API does not permit re-enabling softirqs while IRQs
are disabled. The reason for this is that local_bh_enable() will not
only re-enable delivery of softirqs over the back of IRQs, it will also
handle any pending softirqs immediately, regardless of whether IRQs are
enabled at that point.

For this reason, commit

  d021985504 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")

disables softirqs only when IRQs are enabled, as it is not permitted
otherwise, but also unnecessary, given that asynchronous softirq
delivery never happens to begin with while IRQs are disabled.

However, this does mean that entering a kernel mode FPU section with
IRQs enabled and leaving it with IRQs disabled leads to problems, as
identified by Sashiko [0]: the EFI page fault handler is called from
page_fault_oops() with IRQs disabled, and thus ends the kernel mode FPU
section with IRQs disabled as well, regardless of whether IRQs were
enabled when it was started. This may result in schedule() being called
with a non-zero preempt_count, causing a BUG().

So take care to re-enable IRQs when handling any EFI page faults if they
were taken with IRQs enabled.

[0] https://sashiko.dev/#/patchset/20260430074107.27051-1-ivan.hu%40canonical.com

Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ivan Hu <ivan.hu@canonical.com>
Cc: x86@kernel.org
Cc: <stable@vger.kernel.org>
Fixes: d021985504 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-05-05 09:31:28 +02:00
Ivan Hu
088f65e206 x86/efi: Fix graceful fault handling after FPU softirq changes
Since commit d021985504 ("x86/fpu: Improve crypto performance by
making kernel-mode FPU reliably usable in softirqs"), kernel_fpu_begin()
calls fpregs_lock() which uses local_bh_disable() instead of the
previous preempt_disable(). This sets SOFTIRQ_OFFSET in preempt_count
during the entire EFI runtime service call, causing in_interrupt() to
return true in normal task context.

The graceful page fault handler efi_crash_gracefully_on_page_fault()
uses in_interrupt() to bail out for faults in real interrupt context.
With SOFTIRQ_OFFSET now set, the handler always bails out, leaving EFI
firmware page faults unhandled. This escalates to die() which also sees
in_interrupt() as true and calls panic("Fatal exception in interrupt"),
resulting in a hard system freeze. On systems with buggy firmware that
triggers page faults during EFI runtime calls (e.g., accessing unmapped
memory in GetTime()), this causes an unrecoverable hang instead of the
expected graceful EFI_ABORTED recovery.

Fix by replacing in_interrupt() with !in_task(). This preserves the
original intent of bailing for interrupts or NMI faults, while no longer
falsely triggering from the FPU code path's local_bh_disable().

Fixes: d021985504 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ivan Hu <ivan.hu@canonical.com>
[ardb: Sashiko spotted that using 'in_hardirq() || in_nmi()' leaves a
       window where a softirq may be taken before fpregs_lock() is
       called, but after efi_rts_work.efi_rts_id has been assigned,
       and any page faults occurring in that window will then be
       misidentified as having been caused by the firmware. Instead,
       use !in_task(), which incorporates in_serving_softirq(). ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-05-04 12:41:51 +02:00
Bibo Mao
5a873d77ba LoongArch: KVM: Move unconditional delay into timer clear scenery
When timer interrupt arrives in guest kernel, guest kernel clears the
timer interrupt and program timer with the next incoming event.

During this stage, timer tick is -1 and timer interrupt status is
disabled in ESTAT register. KVM hypervisor need write zero with timer
tick register and wait timer interrupt injection from HW side, and
then clear timer interrupt.

So there is 2 cycle delay in KVM hypervisor to emulate such scenery,
and the delay is unnecessary if there is no need to clear the timer
interrupt.

Here move 2 cycle delay into timer clear scenery and add timer ESTAT
checking after delay, and set max timer expire value if timer interrupt
does not arrive still.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:48 +08:00
Bibo Mao
2433f3f572 LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software
With passthrough HW timer, timer interrupt is injected by HW. When
inject emulated CPU interrupt by software such SIP0/SIP1/IPI, HW timer
interrupt may be lost.

Here check whether there is timer tick value inversion before and after
injecting emulated CPU interrupt by software, timer enabling by reading
timer cfg register is skipped. If the timer tick value is detected with
changing, then timer should be enabled. And inject a timer interrupt by
software if there is.

Cc: <stable@vger.kernel.org>
Fixes: f45ad5b8aa ("LoongArch: KVM: Implement vcpu interrupt operations").
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:48 +08:00
Bibo Mao
6debfff785 LoongArch: KVM: Move AVEC interrupt injection into switch loop
When AVEC interrupt controller is emulated in user space, AVEC interrupt
is injected by software like SIP0/SIP1/TI/IPI interrupts. Here also move
the AVEC interrupt injection in switch loop.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:48 +08:00
Tao Cui
81e18777d6 LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()
kvm_flush_pte() is the only caller that directly assigns *pte instead
of using the kvm_set_pte() wrapper. Use the wrapper for consistency with
the rest of the file.

No functional change intended.

Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:38 +08:00
Tao Cui
f26faae96c LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()
In the ldptr (0x24...0x27) opcode decoding path, the default case only
breaks out but without setting "ret" value to EMULATE_FAIL. This leaves
run->mmio.len uninitialized (stale from a previous MMIO operation) while
"ret" value remains EMULATE_DO_MMIO, causing the code to proceed with an
incorrect MMIO length.

Add "ret = EMULATE_FAIL" to match the other default branches in the same
function (e.g. the 0x28...0x2e and 0x38 cases).

Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:38 +08:00
Qiang Ma
b3e31a6650 LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of vCPUs
which exceeds the maximum possible number of vCPUs.

Other architectures have already done this, such as commit 57a2e13ebd
("KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS")

Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:37 +08:00
Xianglai Li
b323a441da LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry
Insert the appropriate UNWIND hint into the kvm_exc_entry assembly
function to guide the generation of correct ORC table entries, thereby
solving the timeout problem ("unreliable stack") while loading the
livepatch-sample module on a physical machine running virtual machines
with multiple vcpus.

Cc: stable@vger.kernel.org
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:37 +08:00
Xianglai Li
5203012fa6 LoongArch: KVM: Compile switch.S directly into the kernel
If we directly compile the switch.S file into the kernel, the address of
the kvm_exc_entry function will definitely be within the DMW memory area.
Therefore, we will no longer need to perform a copy relocation of the
kvm_exc_entry.

So this patch compiles switch.S directly into the kernel, and then remove
the copy relocation execution logic for the kvm_exc_entry function.

Cc: stable@vger.kernel.org
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:37 +08:00
Thomas Weißschuh
7e2c41bc62 LoongArch: vDSO: Drop custom __arch_vdso_hres_capable()
The custom definition is identical to the generic fallback one.

So remove it.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:20 +08:00
Wentao Guan
8dfa2f8780 LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()
The switch case in loongson_gpu_fixup_dma_hang() may not DC2 or DC3, and
readl(crtc_reg) will access with random address, because the "device" is
from "base+PCI_DEVICE_ID", "base" is from "pdev->devfn+1". This is wrong
when my platform inserts a discrete GPU:

lspci -tv
-[0000:00]-+-00.0  Loongson Technology LLC Hyper Transport Bridge Controller
...
           +-06.0  Loongson Technology LLC LG100 GPU
           +-06.2  Loongson Technology LLC Device 7a37
...

Add a default switch case to fix the panic as below:

 Kernel ade access[#1]:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.136-loong64-desktop-hwe+ #4
 pc 90000000017e5534 ra 90000000017e54c0 tp 90000001002f8000 sp 90000001002fb6c0
 a0 80000efe00003100 a1 0000000000003100 a2 0000000000000000 a3 0000000000000002
 a4 90000001002fb6b4 a5 900000087cdb58fd a6 90000000027af000 a7 0000000000000001
 t0 00000000000085b9 t1 000000000000ffff t2 0000000000000000 t3 0000000000000000
 t4 fffffffffffffffd t5 00000000fffb6d9c t6 0000000000083b00 t7 00000000000070c0
 t8 900000087cdb4d94 u0 900000087cdb58fd s9 90000001002fb826 s0 90000000031c12c8
 s1 7fffffffffffff00 s2 90000000031c12d0 s3 0000000000002710 s4 0000000000000000
 s5 0000000000000000 s6 9000000100053000 s7 7fffffffffffff00 s8 90000000030d4000
    ra: 90000000017e54c0 loongson_gpu_fixup_dma_hang+0x40/0x210
   ERA: 90000000017e5534 loongson_gpu_fixup_dma_hang+0xb4/0x210
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 00480000 [ADEM] (IS= ECode=8 EsubCode=1)
  BADV: 7fffffffffffff00
  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
 Modules linked in:
 Process swapper/0 (pid: 1, threadinfo=(____ptrval____), task=(____ptrval____))
 Stack : 0000000000000006 90000001002fb778 90000001002fb704 0000000000000007
         0000000016a65700 90000000017e5690 000000000000ffff ffffffffffffffff
         900000000209f7c0 9000000100053000 900000000209f7a8 9000000000eebc08
         0000000000000000 0000000000000000 0000000000000006 90000001002fb778
         90000001000530b8 90000000027af000 0000000000000000 9000000100054000
         9000000100053000 9000000000ebb70c 9000000100004c00 9000000004000001
         90000001002fb7e4 bae765461f31cb12 0000000000000000 0000000000000000
         0000000000000006 90000000027af000 0000000000000030 90000000027af000
         900000087cd6f800 9000000100053000 0000000000000000 9000000000ebc560
         7a2500147cdaf720 bae765461f31cb12 0000000000000001 0000000000000030
         ...
 Call Trace:
 [<90000000017e5534>] loongson_gpu_fixup_dma_hang+0xb4/0x210
 [<9000000000eebc08>] pci_fixup_device+0x108/0x280
 [<9000000000ebb70c>] pci_setup_device+0x24c/0x690
 [<9000000000ebc560>] pci_scan_single_device+0xe0/0x140
 [<9000000000ebc684>] pci_scan_slot+0xc4/0x280
 [<9000000000ebdd00>] pci_scan_child_bus_extend+0x60/0x3f0
 [<9000000000f5bc94>] acpi_pci_root_create+0x2b4/0x420
 [<90000000017e5e74>] pci_acpi_scan_root+0x2d4/0x440
 [<9000000000f5b02c>] acpi_pci_root_add+0x21c/0x3a0
 [<9000000000f4ee54>] acpi_bus_attach+0x1a4/0x3c0
 [<90000000010e200c>] device_for_each_child+0x6c/0xe0
 [<9000000000f4bbf4>] acpi_dev_for_each_child+0x44/0x70
 [<9000000000f4ef40>] acpi_bus_attach+0x290/0x3c0
 [<90000000010e200c>] device_for_each_child+0x6c/0xe0
 [<9000000000f4bbf4>] acpi_dev_for_each_child+0x44/0x70
 [<9000000000f4ef40>] acpi_bus_attach+0x290/0x3c0
 [<9000000000f5211c>] acpi_bus_scan+0x6c/0x280
 [<900000000189c028>] acpi_scan_init+0x194/0x310
 [<900000000189bc6c>] acpi_init+0xcc/0x140
 [<9000000000220cdc>] do_one_initcall+0x4c/0x310
 [<90000000018618fc>] kernel_init_freeable+0x258/0x2d4
 [<900000000184326c>] kernel_init+0x28/0x13c
 [<9000000000222008>] ret_from_kernel_thread+0xc/0xa4

Cc: stable@vger.kernel.org
Fixes: 95db0c9f52 ("LoongArch: Workaround LS2K/LS7A GPU DMA hang bug")
Link: https://gist.github.com/opsiff/ebf2dac51b4013d22462f2124c55f807
Link: https://gist.github.com/opsiff/a62f2a73db0492b3c49bf223a339b133
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:20 +08:00
Huacai Chen
49f33840dc LoongArch: Use per-root-bridge PCIH flag to skip mem resource fixup
When firmware enables 64-bit PCI host bridge support, some root bridges
already provide valid 64-bit mem resource windows through ACPI.

In this case, the LoongArch-specific mem resource high-bits fixup in
acpi_prepare_root_resources() should not be applied unconditionally.
Otherwise, the kernel may override the native resource layout derived
from firmware, and later BAR assignment can fail to place device BARs
into the intended 64-bit address space correctly.

Add a per-root-bridge ACPI flag, PCIH, and evaluate it from the current
root bridge device scope. When PCIH is set, skip the mem resource high-
bits fixup path and let the kernel use the firmware-provided resource
description directly. When PCIH is absent or cleared, keep the existing
behavior and continue filling the high address bits from the host bridge
address.

This makes the behavior per-root-bridge configurable and avoids breaking
valid 64-bit BAR space allocation on bridges whose 64-bit windows have
already been fully described by firmware.

Cc: stable@vger.kernel.org
Suggested-by: Chao Li <lichao@loongson.cn>
Tested-by: Dongyan Qian <qiandongyan@loongson.cn>
Signed-off-by: Dongyan Qian <qiandongyan@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:20 +08:00
Huacai Chen
98b8aebb14 LoongArch: Fix SYM_SIGFUNC_START definition for 32BIT
The SYM_SIGFUNC_START definition should match sigcontext that the length
of GPRs are 8 bytes for both 32BIT and 64BIT. So replace SZREG with 8 to
fix it.

Cc: stable@vger.kernel.org
Fixes: e4878c37f6 ("LoongArch: vDSO: Emit GNU_EH_FRAME correctly")
Suggested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:01 +08:00
Huacai Chen
5643c6b2c8 LoongArch: Specify -m32/-m64 explicitly for 32BIT/64BIT
Clang/LLVM build needs -m32/-m64 to switch triple variants (i.e. the
--target=xxx parameter). Otherwise we get build errors for CONFIG_32BIT.

GCC doesn't support -m32/-m64 now, but maybe support in future, so use
cc-option to specify them.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604232041.ESJDwVG4-lkp@intel.com/
Suggested-by: Nathan Chancellor <nathan@kernel.org
Tested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:01 +08:00
Huacai Chen
4808e5cc4f LoongArch: Make CONFIG_64BIT as the default option
CONFIG_64BIT is the mandatory option before v7.0, but in v7.1-rc1 both
CONFIG_32BIT and CONFIG_64BIT are selectable and CONFIG_32BIT became the
default option. This breaks existing configurations, so explicitly make
CONFIG_64BIT as the default option to keep existing behavior.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-04 09:00:00 +08:00
Linus Torvalds
6d35786de2 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "Three bug fixes for x86:

   - Check that nEPT/nNPT is enabled in slow flush hypercalls. If it is
     not, the hypercalls can be processed as usual even while running a
     nested guest

   - Fix shadow paging use-after-free due to page tables changing
     outside execution of the guest. A bug that is 16 years old and
     stems from an imprecision in the very first KVM series

   - Scan IRR whenever PID.ON is true, even if PIR is empty, which
     avoids a somewhat rare WARN"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix shadow paging use-after-free due to unexpected GFN
  KVM: x86: Fix misleading variable names and add more comments for PIR=>IRR flow
  KVM: x86: Do IRR scan in __kvm_apic_update_irr even if PIR is empty
  KVM: x86: check for nEPT/nNPT in slow flush hypercalls
2026-05-03 15:25:47 -07:00
Sean Christopherson
0cb2af2ea6 KVM: x86: Fix shadow paging use-after-free due to unexpected GFN
The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus
the SPTE index. This assumption breaks for shadow paging if the guest
page tables are modified between VM entries (similar to commit
aad885e774, "KVM: x86/mmu: Drop/zap existing present SPTE even
when creating an MMIO SPTE", 2026-03-27).  The flow is as follows:

- a PDE is installed for a 2MB mapping, and a page in that area is
  accessed.  KVM creates a kvm_mmu_page consisting of 512 4KB pages;
  the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because
  the guest's mapping is a huge page (and thus contiguous).

- the PDE mapping is changed from outside the guest.

- the guest accesses another page in the same 2MB area.  KVM installs
  a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN
  (i.e. based on the new mapping, as changed in the previous step) but
  that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore
  the rmap entry cannot be found and removed when the kvm_mmu_page
  is zapped.

- the memslot that covers the first 2MB mapping is deleted, and the
  kvm_mmu_page for the now-invalid GPA is zapped.  However, rmap_remove()
  only looks at the [sp->gfn, sp->gfn + 511] range established in step 1,
  and fails to find the rmap entry that was recorded by step 3.

- any operation that causes an rmap walk for the same page accessed
  by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page.
  This includes dirty logging or MMU notifier invalidations (e.g., from
  MADV_DONTNEED).

The underlying issue is that KVM's walking of shadow PTEs assumes that
if a SPTE is present when KVM wants to install a non-leaf SPTE, then the
existing kvm_mmu_page must be for the correct gfn.  Because the only way
for the gfn to be wrong is if KVM messed up and failed to zap a SPTE...
which shouldn't happen, but *actually* only happens in response to a
guest write.

That bug dates back literally forever, as even the first version of KVM
assumes that the GFN matches and walks into the "wrong" shadow page.
However, that was only an imprecision until 2032a93d66 ("KVM: MMU:
Don't allocate gfns page for direct mmu pages") came along.

Fix it by checking for a target gfn mismatch and zapping the existing
SPTE.  That way the old SP and rmap entries are gone, KVM installs
the rmap in the right location, and everyone is happy.

Fixes: 2032a93d66 ("KVM: MMU: Don't allocate gfns page for direct mmu pages")
Fixes: 6aa8b732ca ("kvm: userspace interface")
Reported-by: Alexander Bulekov <bkov@amazon.com>
Reported-by: Fred Griffoul <fgriffo@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20260503201029.106481-1-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-03 22:32:53 +02:00
Sean Christopherson
0aec99f9bf KVM: x86: Fix misleading variable names and add more comments for PIR=>IRR flow
Rename kvm_apic_update_irr()'s "irr_updated" and vmx_sync_pir_to_irr()'s
"got_posted_interrupt" to a more accurate "max_irr_is_from_pir", as neither
"irr_updated" nor "got_posted_interrupt" is accurate.
__kvm_apic_update_irr() and thus kvm_apic_update_irr() specifically return
true if and only if the highest priority IRQ, i.e. max_irr, is a "new"
pending IRQ from the PIR.  I.e. it's possible for the IRR to be updated,
i.e. for a posted IRQ to be "got", *without* the APIs returning true.

Expand vmx_sync_pir_to_irr()'s comment to explain why it's necessary to
set KVM_REQ_EVENT only if a "new" IRQ was found, and to explain why it's
safe to do so only if a new IRQ is also the highest priority pending IRQ.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20260503201703.108231-3-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-03 22:32:41 +02:00
Paolo Bonzini
33fd0ccd25 KVM: x86: Do IRR scan in __kvm_apic_update_irr even if PIR is empty
Fall back to apic_find_highest_vector() when PID.ON is set but PIR
turns out to be empty, to correctly report the highest pending interrupt
from the existing IRR.

In a nested VM stress test, the following WARNING fires in
vmx_check_nested_events() when kvm_cpu_has_interrupt() reports a pending
interrupt but the subsequent kvm_apic_has_interrupt() (which invokes
vmx_sync_pir_to_irr() again) returns -1:

  WARNING: CPU: 99 PID: 57767 at arch/x86/kvm/vmx/nested.c:4449 vmx_check_nested_events+0x6bf/0x6e0 [kvm_intel]
  Call Trace:
   kvm_check_and_inject_events
   vcpu_enter_guest.constprop.0
   vcpu_run
   kvm_arch_vcpu_ioctl_run
   kvm_vcpu_ioctl
   __x64_sys_ioctl
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

The root cause is a race between vmx_sync_pir_to_irr() on the target vCPU
and __vmx_deliver_posted_interrupt() on a sender vCPU.  The sender
performs two individually-atomic operations that are not a single
transaction:

  1. pi_test_and_set_pir(vector)  -- sets the PIR bit
  2. pi_test_and_set_on()         -- sets PID.ON

The following interleaving triggers the bug:

  Sender vCPU (IPI):              Target vCPU (1st sync_pir_to_irr):
  B1: set PIR[vector]
                                  A1: pi_clear_on()
                                  A2: pi_harvest_pir() -> sees B1 bit
                                  A3: xchg() -> consumes bit, PIR=0
                                      (1st sync returns correct max_irr)
  B2: set PID.ON = 1

                                  Target vCPU (2nd sync_pir_to_irr):
                                  C1: pi_test_on() -> TRUE (from B2)
                                  C2: pi_clear_on() -> ON=0
                                  C3: pi_harvest_pir() -> PIR empty
                                  C4: *max_irr = -1, early return
                                      IRR NOT SCANNED

The interrupt is not lost (it resides in the IRR from the first sync and
is recovered on the next vcpu_enter_guest() iteration), but the incorrect
max_irr causes a spurious WARNING and a wasted L2 VM-Enter/VM-Exit cycle.

Fixes: b41f8638b9 ("KVM: VMX: Isolate pure loads from atomic XCHG when processing PIR")
Reported-by: Farrah Chen <farrah.chen@intel.com>
Analyzed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvm/20260428070349.1633238-1-chenyi.qiang@intel.com/T/
Link: https://patch.msgid.link/20260503201703.108231-2-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-03 22:18:15 +02:00
Paolo Bonzini
464af6fc2b KVM: x86: check for nEPT/nNPT in slow flush hypercalls
Checking is_guest_mode(vcpu) is incorrect, because translate_nested_gpa()
is only valid if an L2 guest is running *with nested EPT/NPT enabled*.
Instead use the same condition as translate_nested_gpa() itself.

Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Fixes: aee738236d ("KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs", 2022-11-18)
Link: https://patch.msgid.link/20260503200905.106077-1-pbonzini@redhat.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-03 22:17:30 +02:00
Linus Torvalds
f377d0025e Merge tag 'sh-for-v7.1-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh fix from John Paul Adrian Glaubitz:
 "The ZERO_PAGE consolidation in v7.1, introduced a regression on sh
  which made these systems unbootable.

  The problem was that on sh, the initial boot parameters were
  previously referenced as an array and after 6215d9f447 ("arch, mm:
  consolidate empty_zero_page"), they were referenced as a pointer which
  caused wrong code generation and boot hang.

  This changes the declaration back to being an array which fixes the
  boot hang"

* tag 'sh-for-v7.1-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: Fix fallout from ZERO_PAGE consolidation
2026-05-03 08:58:42 -07:00
Mike Rapoport (Microsoft)
b0aa5e4b08 sh: Fix fallout from ZERO_PAGE consolidation
Consolidation of empty_zero_page declarations broke boot on sh.

sh stores its initial boot parameters in a page reserved in
arch/sh/kernel/head_32.S. Before commit 6215d9f447 ("arch, mm:
consolidate empty_zero_page") this page was referenced in C code
as an array and after that commit it is referenced as a pointer.

This causes wrong code generation and boot hang.

Declare boot_params_page as an array to fix the issue.

Reported-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Tested-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Fixes: 6215d9f447 ("arch, mm: consolidate empty_zero_page")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Artur Rojek <contact@artur-rojek.eu>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2026-05-03 16:35:40 +02:00
Linus Torvalds
cd546f7ae2 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Avoid writing an uninitialised stack variable to POR_EL0 on sigreturn
   if the poe_context record is absent

 - Reserve one more page for the early 4K-page kernel mapping to cover
   the extra [_text, _stext) split introduced by the non-executable
   read-only mapping

 - Force the arch_local_irq_*() wrappers to be __always_inline so that
   noinstr entry and idle paths cannot call out-of-line, instrumentable
   copies

 - Fix potential sign extension in the arm64 SCS unwinder's DWARF
   advance_loc4 decoding

 - Tolerate arm64 ACPI platforms with only WFI and no deeper PSCI idle
   states, restoring cpuidle registration on such systems

 - Include the UAPI <asm/ptrace.h> header in the arm64 GCS libc test
   rather than carrying a duplicate struct user_gcs definition (the
   original #ifdef NT_ARM_GCS was wrong to cover the structure
   definition as it would be masked out if the toolchain defined it)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: signal: Preserve POR_EL0 if poe_context is missing
  arm64: Reserve an extra page for early kernel mapping
  kselftest/arm64: Include <asm/ptrace.h> for user_gcs definition
  ACPI: arm64: cpuidle: Tolerate platforms with no deep PSCI idle states
  arm64/irqflags: __always_inline the arch_local_irq_*() helpers
  arm64/scs: Fix potential sign extension issue of advance_loc4
2026-05-01 16:32:42 -07:00
Linus Torvalds
bb1d73f2cd Merge tag 's390-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:

 - Reject zero-length writes from userspace that corrupt Debug Facility
   buffers

 - Replace one s390 PCI maintainer

 - Remove SCLP_OFB Kconfig option and enable the guarded code
   unconditionally

 - Replace incorrect use of phys_to_folio() to virt_to_folio() in
   do_secure_storage_access()

* tag 's390-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: Fix phys_to_folio() usage in do_secure_storage_access()
  s390/sclp: Remove SCLP_OFB Kconfig option
  MAINTAINERS: Replace one of the maintainers for s390/pci
  s390/debug: Reject zero-length input in debug_input_flush_fn()
  s390/debug: Reject zero-length input before trimming a newline
2026-05-01 12:58:02 -07:00
Helge Deller
41ca998fbe parisc: Fix 64-bit kernel build when CONFIG_COMPAT=n
VDSO32_SYMBOL() is used in signal.c, defining the value to zero avoids
liker issues when CONFIG_COMPAT=n.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-05-01 19:09:30 +02:00
Kevin Brodsky
030e8a40ff arm64: signal: Preserve POR_EL0 if poe_context is missing
Commit 2e8a1acea8 ("arm64: signal: Improve POR_EL0 handling to
avoid uaccess failures") delayed the write to POR_EL0 in
rt_sigreturn to avoid spurious uaccess failures. This change however
relies on the poe_context frame record being present: on a system
supporting POE, calling sigreturn without a poe_context record now
results in writing arbitrary data from the kernel stack into POR_EL0.

Fix this by adding a __valid_fields member to struct
user_access_state, and zeroing the struct on allocation.
restore_poe_context() then indicates that the por_el0 field is valid
by setting the corresponding bit in __valid_fields, and
restore_user_access_state() only touches POR_EL0 if there is a valid
value to set it to. This is in line with how POR_EL0 was originally
handled; all frame records are currently optional, except
fpsimd_context.

To ensure that __valid_fields is kept in sync, fields (currently
just por_el0) are now accessed via accessors and prefixed with __ to
discourage direct access.

Fixes: 2e8a1acea8 ("arm64: signal: Improve POR_EL0 handling to avoid uaccess failures")
Cc: <stable@vger.kernel.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-05-01 17:44:25 +01:00
Zhaoyang Huang
4d8e74ad45 arm64: Reserve an extra page for early kernel mapping
The final part of [data, end) segment may overflow into the next page of
init_pg_end[1] which is the gap page before early_init_stack[2]:

[1]
crash_arm64_v9.0.1> vtop ffffffed00601000
VIRTUAL           PHYSICAL
ffffffed00601000  83401000

PAGE DIRECTORY: ffffffecffd62000
   PGD: ffffffecffd62da0 => 10000000833fb003
   PMD: ffffff80033fb018 => 10000000833fe003
   PTE: ffffff80033fe008 => 68000083401f03
  PAGE: 83401000

     PTE        PHYSICAL  FLAGS
68000083401f03  83401000  (VALID|SHARED|AF|NG|PXN|UXN)

      PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS
fffffffec00d0040 83401000                0        0  1 4000 reserved

[2]
ffffffed002c8000 (r) __pi__data
ffffffed0054e000 (d) __pi___bss_start
ffffffed005f5000 (b) __pi_init_pg_dir
ffffffed005fe000 (b) __pi_init_pg_end
ffffffed005ff000 (B) early_init_stack
ffffffed00608000 (b) __pi__end

For 4K pages, the early kernel mapping may use 2MB block entries but the
kernel segments are only 64KB aligned. Segment boundaries that fall
within a 2MB block therefore require a PTE table so that different
attributes can be applied on either side of the boundary.

KERNEL_SEGMENT_COUNT still correctly counts the five permanent kernel
VMAs registered by declare_kernel_vmas(). However, since commit
5973a62efa ("arm64: map [_text, _stext) virtual address range
non-executable+read-only"), the early mapper also maps [_text, _stext)
separately from [_stext, _etext). This adds one more early-only split
and can require one more page-table page than the existing
EARLY_SEGMENT_EXTRA_PAGES allowance reserves.

Increase the 4K-page early mapping allowance by one page to cover that
additional split.

Fixes: 5973a62efa ("arm64: map [_text, _stext) virtual address range non-executable+read-only")
Assisted-by: TRAE:GLM-5.1
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
[catalin.marinas@arm.com: rewrote part of the commit log]
[catalin.marinas@arm.com: expanded the code comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-05-01 16:20:35 +01:00
Helge Deller
e6a650acbd parisc: Fix build failure for 32-bit kernel with PA2.0 instruction set
The CONFIG_PA11 option can not be used as a reliable check if we build a
32-bit kernel which needs the 32-bit VDSO.
Instead depend on CONFIG_64BIT and CONFIG_COMPAT only.

Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-30 09:10:07 +02:00
Johan Hovold
b8425ceefe parisc: drivers: switch to dynamic root device
Driver core expects devices to be dynamically allocated and will, for
example, complain loudly if a device that lacks a release function
is ever freed.

Use root_device_register() to allocate and register the root device
instead of open coding using a static device.

While at it, drop the redundant additional reference taken at init.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-28 17:18:52 +02:00
Heiko Carstens
b95e0e7928 s390/mm: Fix phys_to_folio() usage in do_secure_storage_access()
In case of a Secure-Storage-Access exception the effective aka virtual
address which caused the exception is contained within the TEID.

do_secure_storage_access() incorrectly uses phys_to_folio() instead of
virt_to_folio() to translate the virtual address to the corresponding
folio.

Fix this by using virt_to_folio() instead of phys_to_folio().

Fixes: 084ea4d611 ("s390/mm: add (non)secure page access exceptions handlers")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-04-28 14:45:03 +02:00
Vasily Gorbik
e14622a758 s390/debug: Reject zero-length input in debug_input_flush_fn()
debug_input_flush_fn() always copies one byte from the userspace buffer
with copy_from_user() regardless of the supplied write length. A
zero-length write therefore reads one byte beyond the caller's buffer.
If the stale byte happens to be '-' or a digit the debug log is
silently flushed. With an unmapped buffer the call returns -EFAULT.

Reject zero-length writes before copying from userspace.

Cc: stable@vger.kernel.org # v5.10+
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-04-28 14:45:02 +02:00
Pengpeng Hou
c366a7b5ed s390/debug: Reject zero-length input before trimming a newline
debug_get_user_string() duplicates the userspace buffer with
memdup_user_nul() and then unconditionally looks at buffer[user_len - 1]
to strip a trailing newline.

A zero-length write reaches this helper unchanged, so the newline trim
reads before the start of the allocated buffer.

Reject empty writes before accessing the last input byte.

Fixes: 66a464dbc8 ("[PATCH] s390: debug feature changes")
Cc: stable@vger.kernel.org
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20260417073530.96002-1-pengpeng@iscas.ac.cn
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-04-28 14:45:02 +02:00
Breno Leitao
caecde119e arm64/irqflags: __always_inline the arch_local_irq_*() helpers
The arch_local_irq_*() wrappers in <asm/irqflags.h> dispatch between two
underlying primitives: the __daif_* path on most systems, and the
__pmr_* path on builds that use GIC PMR-based masking (Pseudo-NMI). The
leaf primitives are already __always_inline, but the wrappers themselves
are plain "static inline".

That is unsafe for noinstr callers: nothing prevents the compiler from
emitting an out-of-line copy of e.g. arch_local_irq_disable(), and an
out-of-line copy can be instrumented (ftrace, kcov, sanitizers), which
breaks the noinstr contract on the entry/idle paths that rely on these
helpers.

x86 hit and fixed exactly this class of bug in commit 7a745be1cc
("x86/entry: __always_inline irqflags for noinstr").

Force-inline all of the arch_local_irq_*() wrappers so they cannot be
emitted out-of-line:

  - arch_local_irq_enable()
  - arch_local_irq_disable()
  - arch_local_save_flags()
  - arch_irqs_disabled_flags()
  - arch_irqs_disabled()
  - arch_local_irq_save()
  - arch_local_irq_restore()

The primary motivation is noinstr safety. There is a useful side effect
for fleet-wide profiling: when the wrapper is emitted out-of-line,
samples taken inside it during the post-WFI IRQ unmask in
default_idle_call() are attributed to arch_local_irq_enable rather than
default_idle_call(), and the FP-unwinder loses default_idle_call() from
the chain.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Leonardo Bras <leo.bras@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-27 13:13:36 +01:00
Wentao Guan
4023b7424e arm64/scs: Fix potential sign extension issue of advance_loc4
The expression (*opcode++ << 24) and exp * code_alignment_factor
may overflow signed int and becomes negative.

Fix this by casting each byte to u64 before shifting. Also fix
the misaligned break statement while we are here.

Example of the result can be seen here:
Link: https://godbolt.org/z/zhY8d3595

It maybe not a real problem, but could be a issue in future.

Fixes: d499e9627d ("arm64/scs: Fix handling of advance_loc4")
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-27 12:16:26 +01:00
Paolo Bonzini
909eac682c Merge tag 'kvmarm-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.1, take #1

- Allow tracing for non-pKVM, which was accidentally disabled when
  the series was merged

- Rationalise the way the pKVM hypercall ranges are defined by using
  the same mechanism as already used for the vcpu_sysreg enum

- Enforce that SMCCC function numbers relayed by the pKVM proxy are
  actually compliant with the specification

- Fix a couple of feature to idreg mappings which resulted in the
  wrong sanitisation being applied

- Fix the GICD_IIDR revision number field that could never been
  written correctly by userspace

- Make kvm_vcpu_initialized() correctly use its parameter instead
  of relying on the surrounding context

- Enforce correct ordering in __pkvm_init_vcpu(), plugging a
  potential pin leak at the same time

- Move __pkvm_init_finalise() to a less dangerous spot, avoiding
  future problems

- Restore functional userspace irqchip support after a four year
  breakage (last functional kernel was 5.18...). This is obviously
  ripe for garbage collection.

- ... and the usual lot of spelling fixes
2026-04-27 04:24:34 -04:00
Linus Torvalds
129d6eb266 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM updates from Russell King:

 - fix a race condition handling PG_dcache_clean

 - further cleanups for the fault handling, allowing RT to be enabled

 - fixing nzones validation in adfs filesystem driver

 - fix for module unwinding

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9463/1: Allow to enable RT
  ARM: 9472/1: fix race condition on PG_dcache_clean in __sync_icache_dcache()
  ARM: 9471/1: module: fix unwind section relocation out of range error
  fs/adfs: validate nzones in adfs_validate_bblk()
  ARM: provide individual is_translation_fault() and is_permission_fault()
  ARM: move FSR fault status definitions before fsr_fs()
  ARM: use BIT() and GENMASK() for fault status register fields
  ARM: move is_permission_fault() and is_translation_fault() to fault.h
  ARM: move vmalloc() lazy-page table population
  ARM: ensure interrupts are enabled in __do_user_fault()
2026-04-25 07:44:26 -07:00
Linus Torvalds
8f4e8687c8 Merge tag 'x86-urgent-2026-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Prevent deadlock during shstk sigreturn (Rick Edgecombe)

 - Disable FRED when PTI is forced on (Dave Hansen)

 - Revert a CPA INVLPGB optimization that did not properly handle
   discontiguous virtual addresses (Dave Hansen)

* tag 'x86-urgent-2026-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Revert INVLPGB optimization for set_memory code
  x86/cpu: Disable FRED when PTI is forced on
  x86/shstk: Prevent deadlock during shstk sigreturn
2026-04-24 10:05:42 -07:00
Linus Torvalds
feff82eb5f Merge tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
 "There is one significant change outside arch/riscv in this pull
  request: the addition of a set of KUnit tests for strlen(), strnlen(),
  and strrchr().

  Otherwise, the most notable changes are to add some RISC-V-specific
  string function implementations, to remove XIP kernel support, to add
  hardware error exception handling, and to optimize our runtime
  unaligned access speed testing.

  A few comments on the motivation for removing XIP support. It's been
  broken in the RISC-V kernel for months. The code is not easy to
  maintain. Furthermore, for XIP support to truly be useful for RISC-V,
  we think that compile-time feature switches would need to be added for
  many of the RISC-V ISA features and microarchitectural properties that
  are currently implemented with runtime patching. No one has stepped
  forward to take responsibility for that work, so many of us think it's
  best to remove it until clear use cases and champions emerge.

  Summary:

   - Add Kunit correctness testing and microbenchmarks for strlen(),
     strnlen(), and strrchr()

   - Add RISC-V-specific strnlen(), strchr(), strrchr() implementations

   - Add hardware error exception handling

   - Clean up and optimize our unaligned access probe code

   - Enable HAVE_IOREMAP_PROT to be able to use generic_access_phys()

   - Remove XIP kernel support

   - Warn when addresses outside the vmemmap range are passed to
     vmemmap_populate()

   - Update the ACPI FADT revision check to warn if it's not at least
     ACPI v6.6, which is when key RISC-V-specific tables were added to
     the specification

   - Increase COMMAND_LINE_SIZE to 2048 to match ARM64, x86, PowerPC,
     etc.

   - Make kaslr_offset() a static inline function, since there's no need
     for it to show up in the symbol table

   - Add KASLR offset and SATP to the VMCOREINFO ELF notes to improve
     kdump support

   - Add Makefile cleanup rule for vdso_cfi copied source files, and add
     a .gitignore for the build artifacts in that directory

   - Remove some redundant ifdefs that check Kconfig macros

   - Add missing SPDX license tag to the CFI selftest

   - Simplify UTS_MACHINE assignment in the RISC-V Makefile

   - Clarify some unclear comments and remove some superfluous comments

   - Fix various English typos across the RISC-V codebase"

* tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
  riscv: Remove support for XIP kernel
  riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
  riscv: Split out compare_unaligned_access()
  riscv: Reuse measure_cycles() in check_vector_unaligned_access()
  riscv: Split out measure_cycles() for reuse
  riscv: Clean up & optimize unaligned scalar access probe
  riscv: lib: add strrchr() implementation
  riscv: lib: add strchr() implementation
  riscv: lib: add strnlen() implementation
  lib/string_kunit: extend benchmarks to strnlen() and chr searches
  lib/string_kunit: add performance benchmark for strlen()
  lib/string_kunit: add correctness test for strrchr()
  lib/string_kunit: add correctness test for strnlen()
  lib/string_kunit: add correctness test for strlen()
  riscv: vdso_cfi: Add .gitignore for build artifacts
  riscv: vdso_cfi: Add clean rule for copied sources
  riscv: enable HAVE_IOREMAP_PROT
  riscv: mm: WARN_ON() for bad addresses in vmemmap_populate()
  riscv: acpi: update FADT revision check to 6.6
  riscv: add hardware error trap handler support
  ...
2026-04-24 10:00:37 -07:00
Linus Torvalds
ff57d59200 Merge tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:

 - Adjust build infrastructure for 32BIT/64BIT

 - Add HIGHMEM (PKMAP and FIX_KMAP) support

 - Show and handle CPU vulnerabilites correctly

 - Batch the icache maintenance for jump_label

 - Add more atomic instructions support for BPF JIT

 - Add more features (e.g. fsession) support for BPF trampoline

 - Some bug fixes and other small changes

* tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (21 commits)
  selftests/bpf: Enable CAN_USE_LOAD_ACQ_STORE_REL for LoongArch
  LoongArch: BPF: Add fsession support for trampolines
  LoongArch: BPF: Introduce emit_store_stack_imm64() helper
  LoongArch: BPF: Support up to 12 function arguments for trampoline
  LoongArch: BPF: Support small struct arguments for trampoline
  LoongArch: BPF: Open code and remove invoke_bpf_mod_ret()
  LoongArch: BPF: Support load-acquire and store-release instructions
  LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
  LoongArch: BPF: Add the default case in emit_atomic() and rename it
  LoongArch: Define instruction formats for AM{SWAP/ADD}.{B/H} and DBAR
  LoongArch: Batch the icache maintenance for jump_label
  LoongArch: Add flush_icache_all()/local_flush_icache_all()
  LoongArch: Add spectre boundry for syscall dispatch table
  LoongArch: Show CPU vulnerabilites correctly
  LoongArch: Make arch_irq_work_has_interrupt() true only if IPI HW exist
  LoongArch: Use get_random_canary() for stack canary init
  LoongArch: Improve the logging of disabling KASLR
  LoongArch: Align FPU register state to 32 bytes
  LoongArch: Handle CONFIG_32BIT in syscall_get_arch()
  LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support
  ...
2026-04-24 09:54:45 -07:00
Linus Torvalds
64edfa6506 Merge tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking deletions from Jakub Kicinski:
 "Delete some obsolete networking code

  Old code like amateur radio and NFC have long been a burden to core
  networking developers. syzbot loves to find bugs in BKL-era code, and
  noobs try to fix them.

  If we want to have a fighting chance of surviving the LLM-pocalypse
  this code needs to find a dedicated owner or get deleted. We've talked
  about these deletions multiple times in the past and every time
  someone wanted the code to stay. It is never very clear to me how many
  of those people actually use the code vs are just nostalgic to see it
  go. Amateur radio did have occasional users (or so I think) but most
  users switched to user space implementations since its all super slow
  stuff. Nobody stepped up to maintain the kernel code.

  We were lucky enough to find someone who wants to help with NFC so
  we're giving that a chance. Let's try to put the rest of this code
  behind us"

* tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next:
  drivers: net: 8390: wd80x3: Remove this driver
  drivers: net: 8390: ultra: Remove this driver
  drivers: net: 8390: AX88190: Remove this driver
  drivers: net: fujitsu: fmvj18x: Remove this driver
  drivers: net: smsc: smc91c92: Remove this driver
  drivers: net: smsc: smc9194: Remove this driver
  drivers: net: amd: nmclan: Remove this driver
  drivers: net: amd: lance: Remove this driver
  drivers: net: 3com: 3c589: Remove this driver
  drivers: net: 3com: 3c574: Remove this driver
  drivers: net: 3com: 3c515: Remove this driver
  drivers: net: 3com: 3c509: Remove this driver
  net: packetengines: remove obsolete yellowfin driver and vendor dir
  net: packetengines: remove obsolete hamachi driver
  net: remove unused ATM protocols and legacy ATM device drivers
  net: remove ax25 and amateur radio (hamradio) subsystem
  net: remove ISDN subsystem and Bluetooth CMTP
  caif: remove CAIF NETWORK LAYER
2026-04-24 09:41:58 -07:00
Sebastian Andrzej Siewior
c6e61c06d6 ARM: 9463/1: Allow to enable RT
All known issues have been adressed.
Allow to select RT.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2026-04-24 15:14:59 +01:00
Russell King (Oracle)
6f685f12fd Merge branches 'adfs', 'arm-fault-handling', 'fixes' and 'misc' 2026-04-24 15:14:44 +01:00
Brian Ruley
75f9a484e8 ARM: 9472/1: fix race condition on PG_dcache_clean in __sync_icache_dcache()
This bug was already discovered and fixed for arm64 in
commit 588a513d34 ("arm64: Fix race condition on PG_dcache_clean in
__sync_icache_dcache()").

Verified with added instrumentation to track dcache flushes in a ring
buffer, as shown by the (distilled) output:

  kernel: SIGILL at b6b80ac0 cpu 1 pid 32663 linux_pte=8eff659f
          hw_pte=8eff6e7e young=1 exec=1
  kernel: dcache flush START   cpu0 pfn=8eff6 ts=48629557020154
  kernel: dcache flush SKIPPED cpu1 pfn=8eff6 ts=48629557020154
  kernel: dcache flush FINISH  cpu0 pfn=8eff6 ts=48629557036154
  audisp-syslog: comm="journalctl" exe="/usr/bin/journalctl" sig=4 [...]

Discussions in the mailing list mentioned that arch/arm is also affected
but the fix was never applied to it [1][2]. Apply the change now, since
the race condition can cause sporadic SIGILL's and SEGV's especially
while under high memory pressure.

Link: https://lore.kernel.org/all/adzMOdySgMIePcue@willie-the-truck [1]
Link: https://lore.kernel.org/all/20210514095001.13236-1-catalin.marinas@arm.com [2]
Signed-off-by: Brian Ruley <brian.ruley@gehealthcare.com>
Reviewed-by: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 6012191aa9 ("ARM: 6380/1: Introduce __sync_icache_dcache() for VIPT caches")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2026-04-24 15:12:52 +01:00
Dave Hansen
a39a701482 x86/mm: Revert INVLPGB optimization for set_memory code
tl;dr: Revert an INVLPGB optimization that did not properly handle
discontiguous virtual addresses.

Full story:

I got a report from some graphics (i915) folks that bisected a
regression in their test suite to 86e6815b31 ("x86/mm: Change
cpa_flush() to call flush_kernel_range() directly").  There was a bit
of flip-flopping on the exact bisect, but the code here does seem
wrong to me. The i915 folks were calling set_pages_array_wc(), so
using the CPA_PAGES_ARRAY mode.

Basically, the 'struct cpa_data' can wrap up all kinds of page table
changes.  Some of these are virtually contiguous, but some are very
much not which is one reason why there are ->vaddr and ->pages arrays.

86e6815b31 made the mistake of assuming that the virtual addresses
in the cpa_data are always contiguous. It got things right when neither
CPA_ARRAY/CPA_PAGES_ARRAY is used, but theoretically wrong when either
of those is used.

In the i915 case, it probably failed to flush some WB TLB entries and
install WC ones, leaving some data in the caches and not flushing it
out to where the device could see it. That eventually caused graphics
problems.

Revert the INVLPGB optimization. It can be reintroduced later, but it
will need to be a bit careful about the array modes.

Fixes: 86e6815b31 ("x86/mm: Change cpa_flush() to call flush_kernel_range()")
Reported-by: Cui, Ling <ling.cui@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260421151909.6B3281C6@davehans-spike.ostc.intel.com
2026-04-24 15:42:48 +02:00
Marc Zyngier
4ce98bf086 KVM: arm64: Wake-up from WFI when iqrchip is in userspace
It appears that there is nothing in the wake-up path that
evaluates whether the in-kernel interrupts are pending unless
we have a vgic.

This means that the userspace irqchip support has been broken for
about four years, and nobody noticed. It was also broken before
as we wouldn't wake-up on a PMU interrupt, but hey, who cares...

It is probably time to remove the feature altogether, because it
was a terrible idea 10 years ago, and it still is.

Fixes: b57de4ffd7 ("KVM: arm64: Simplify kvm_cpu_has_pending_timer()")
Link: https://patch.msgid.link/20260423163607.486345-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
2026-04-24 12:03:57 +01:00
Quentin Perret
5bb0aed57b KVM: arm64: Fix initialisation order in __pkvm_init_finalise()
fix_host_ownership() walks the hypervisor's stage-1 page-table to
adjust the host's stage-2 accordingly. Any such adjustment that
requires cache maintenance operations depends on the per-CPU hyp
fixmap being present. However, fix_host_ownership() is currently
called before fix_hyp_pgtable_refcnt() and hyp_create_fixmap(), so
the fixmap does not yet exist when it runs.

This is benign today because the host stage-2 starts empty and no
CMOs are needed, but it becomes a latent crash as soon as
fix_host_ownership() is extended to operate on a non-empty
page-table.

Reorder the calls so that fix_hyp_pgtable_refcnt() and
hyp_create_fixmap() complete before fix_host_ownership() is invoked.

Fixes: 0d16d12eb2 ("KVM: arm64: Fix-up hyp stage-1 refcounts for all pages mapped at EL2")
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260424084908.370776-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
2026-04-24 12:03:57 +01:00