Commit Graph

238443 Commits

Author SHA1 Message Date
shechenglong
7f16357378 arm64: proton-pack: Fix hard lockup due to print in scheduler context
Relocate the printk() calls from spectre_v4_mitigations_off() and
spectre_v2_mitigations_off() into setup_system_capabilities() function,
preventing hard lockups caused by printk calls in scheduler context:

  | _raw_spin_lock_nested+168
  | ttwu_queue+180 (rq_lock(rq, &rf); 2nd acquiring the rq->__lock)
  | try_to_wake_up+548
  | wake_up_process+32
  | __up+88
  | up+100
  | __up_console_sem+96
  | console_unlock+696
  | vprintk_emit+428
  | vprintk_default+64
  | vprintk_func+220
  | printk+104
  | spectre_v4_enable_task_mitigation+344
  | __switch_to+100
  | __schedule+1028 (rq_lock(rq, &rf); 1st acquiring the rq->__lock)
  | schedule_idle+48
  | do_idle+388
  | cpu_startup_entry+44
  | secondary_start_kernel+352

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: shechenglong <shechenglong@xfusion.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:49:12 +00:00
shechenglong
62e72463ca arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
Following the pattern established with other Spectre mitigations,
do not print a message when the CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
Kconfig option is disabled.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: shechenglong <shechenglong@xfusion.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:49:09 +00:00
Ryan Roberts
53357f14f9 arm64: mm: Tidy up force_pte_mapping()
Tidy up the implementation of force_pte_mapping() to make it easier to
read and introduce the split_leaf_mapping_possible() helper to reduce
code duplication in split_kernel_leaf_mapping() and
arch_kfence_init_pool().

Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:43:15 +00:00
Ryan Roberts
40a292f701 arm64: mm: Optimize range_split_to_ptes()
Enter lazy_mmu mode while splitting a range of memory to pte mappings.
This causes barriers, which would otherwise be emitted after every pte
(and pmd/pud) write, to be deferred until exiting lazy_mmu mode.

For large systems, this is expected to significantly speed up fallback
to pte-mapping the linear map for the case where the boot CPU has
BBML2_NOABORT, but secondary CPUs do not. I haven't directly measured
it, but this is equivalent to commit 1fcb7cea8a ("arm64: mm: Batch dsb
and isb when populating pgtables").

Note that for the path from arch_kfence_init_pool(), we may sleep while
allocating memory inside the lazy_mmu mode. Sleeping is not allowed by
generic code inside lazy_mmu, but we know that the arm64 implementation
is sleep-safe. So this is ok and follows the same pattern already used
by split_kernel_leaf_mapping().

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:43:15 +00:00
Ryan Roberts
ce2b3a50ad arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
It has been reported that split_kernel_leaf_mapping() is trying to sleep
in non-sleepable context. It does this when acquiring the
pgtable_split_lock mutex, when either CONFIG_DEBUG_PAGEALLOC or
CONFIG_KFENCE are enabled, which change linear map permissions within
softirq context during memory allocation and/or freeing. All other paths
into this function are called from sleepable context and so are safe.

But it turns out that the memory for which these 2 features may attempt
to modify the permissions is always mapped by pte, so there is no need
to attempt to split the mapping. So let's exit early in these cases and
avoid attempting to take the mutex.

There is one wrinkle to this approach; late-initialized kfence allocates
it's pool from the buddy which may be block mapped. So we must hook that
allocation and convert it to pte-mappings up front. Previously this was
done as a side-effect of kfence protecting all the individual pages in
its pool at init-time, but this no longer works due to the added early
exit path in split_kernel_leaf_mapping().

So instead, do this via the existing arch_kfence_init_pool() arch hook,
and reuse the existing linear_map_split_to_ptes() infrastructure.

Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/
Fixes: a166563e7e ("arm64: mm: support large block mapping when rodata=full")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <groeck@google.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:43:15 +00:00
Yang Shi
0ec364c0c9 arm64: kprobes: check the return value of set_memory_rox()
Since commit a166563e7e ("arm64: mm: support large block mapping when
rodata=full"), __change_memory_common has more chance to fail due to
memory allocation failure when splitting page table. So check the return
value of set_memory_rox(), then bail out if it fails otherwise we may have
RW memory mapping for kprobes insn page.

Fixes: 195a1b7d83 ("arm64: kprobes: call set_memory_rox() for kprobe page")
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:30:22 +00:00
Punit Agrawal
7991fda619 arm64: acpi: Drop message logging SPCR default console
Commit f5a4af3c75 ("ACPI: Add acpi=nospcr to disable ACPI SPCR as
default console on ARM64") introduced a command line parameter to
prevent using SPCR provided console as default. It also introduced a
message to log this choice.

Drop the message as it is not particularly useful and can be incorrect
in situations where no SPCR is provided by the firmware.

Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/
Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:23:33 +00:00
Punit Agrawal
eeb8c19896 Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent"
This reverts commit bad3fa2fb9.

Commit bad3fa2fb9 ("ACPI: Suppress misleading SPCR console message
when SPCR table is absent") mistakenly assumes acpi_parse_spcr()
returning 0 to indicate a failure to parse SPCR. While addressing the
resultant incorrect logging it was deemed that dropping the message is
a better approach as it is not particularly useful.

Roll back the commit introducing the bug as a step towards dropping
the log message.

Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/
Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:23:33 +00:00
Catalin Marinas
535fdfc5a2 arm64: Use load LSE atomics for the non-return per-CPU atomic operations
The non-return per-CPU this_cpu_*() atomic operations are implemented as
STADD/STCLR/STSET when FEAT_LSE is available. On many microarchitecture
implementations, these instructions tend to be executed "far" in the
interconnect or memory subsystem (unless the data is already in the L1
cache). This is in general more efficient when there is contention as it
avoids bouncing cache lines between CPUs. The load atomics (e.g. LDADD
without XZR as destination), OTOH, tend to be executed "near" with the
data loaded into the L1 cache.

STADD executed back to back as in srcu_read_{lock,unlock}*() incur an
additional overhead due to the default posting behaviour on several CPU
implementations. Since the per-CPU atomics are unlikely to be used
concurrently on the same memory location, encourage the hardware to to
execute them "near" by issuing load atomics - LDADD/LDCLR/LDSET - with
the destination register unused (but not XZR).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
[will: Add comment and link to the discussion thread]
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 14:20:07 +00:00
Mario Limonciello (AMD)
d23550efc6 x86/microcode/AMD: Add more known models to entry sign checking
Two Zen5 systems are missing from need_sha_check(). Add them.

Fixes: 50cef76d5c ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20251106182904.4143757-1-superm1@kernel.org
2025-11-07 12:12:21 +01:00
Linus Torvalds
225a97d6d4 Merge tag 'riscv-for-linus-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:

 - A fix to disable KASAN checks while walking a non-current task's
   stackframe (following x86)

 - A fix for a kvrealloc()-related memory leak in
   module_frob_arch_sections()

 - Two replacements of strcpy() with strscpy()

 - A change to use the RISC-V .insn assembler directive when possible to
   assemble instructions from hex opcodes

 - Some low-impact fixes in the ptdump code and kprobes test code

* tag 'riscv-for-linus-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  cpuidle: riscv-sbi: Replace deprecated strcpy in sbi_cpuidle_init_cpu
  riscv: KGDB: Replace deprecated strcpy in kgdb_arch_handle_qxfer_pkt
  riscv: asm: use .insn for making custom instructions
  riscv: tests: Make RISCV_KPROBES_KUNIT tristate
  riscv: tests: Rename kprobes_test_riscv to kprobes_riscv
  riscv: Fix memory leak in module_frob_arch_sections()
  riscv: ptdump: use seq_puts() in pt_dump_seq_puts() macro
  riscv: stacktrace: Disable KASAN checks for non-current tasks
2025-11-06 15:44:18 -08:00
Linus Torvalds
c90841db35 Merge tag 'hardening-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
 "This is a work-around for a (now fixed) corner case in the arm32 build
  with Clang KCFI enabled.

   - Introduce __nocfi_generic for arm32 Clang (Nathan Chancellor)"

* tag 'hardening-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  libeth: xdp: Disable generic kCFI pass for libeth_xdp_tx_xmit_bulk()
  ARM: Select ARCH_USES_CFI_GENERIC_LLVM_PASS
  compiler_types: Introduce __nocfi_generic
2025-11-06 11:54:59 -08:00
Sukrit Bhatnagar
d0164c1619 KVM: VMX: Fix check for valid GVA on an EPT violation
On an EPT violation, bit 7 of the exit qualification is set if the
guest linear-address is valid. The derived page fault error code
should not be checked for this bit.

Fixes: f300948251 ("KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid")
Cc: stable@vger.kernel.org
Signed-off-by: Sukrit Bhatnagar <Sukrit.Bhatnagar@sony.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://patch.msgid.link/20251106052853.3071088-1-Sukrit.Bhatnagar@sony.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-06 06:06:18 -08:00
Jiri Olsa
20a0bc1027 x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
Currently we don't get stack trace via ORC unwinder on top of fgraph exit
handler. We can see that when generating stacktrace from kretprobe_multi
bpf program which is based on fprobe/fgraph.

The reason is that the ORC unwind code won't get pass the return_to_handler
callback installed by fgraph return probe machinery.

Solving this by creating stack frame in return_to_handler expected by
ftrace_graph_ret_addr function to recover original return address and
continue with the unwind.

Also updating the pt_regs data with cs/flags/rsp which are needed for
successful stack retrieval from ebpf bpf_get_stackid helper.
 - in get_perf_callchain we check user_mode(regs) so CS has to be set
 - in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS/FIXED
    has to be unset

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-05 17:05:19 -08:00
Jiri Olsa
6d08340d1e Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"
This reverts commit 83f44ae0f8.

Currently we store initial stacktrace entry twice for non-HW ot_regs, which
means callers that fail perf_hw_regs(regs) condition in perf_callchain_kernel.

It's easy to reproduce this bpftrace:

  # bpftrace -e 'tracepoint:sched:sched_process_exec { print(kstack()); }'
  Attaching 1 probe...

        bprm_execve+1767
        bprm_execve+1767
        do_execveat_common.isra.0+425
        __x64_sys_execve+56
        do_syscall_64+133
        entry_SYSCALL_64_after_hwframe+118

When perf_callchain_kernel calls unwind_start with first_frame, AFAICS
we do not skip regs->ip, but it's added as part of the unwind process.
Hence reverting the extra perf_callchain_store for non-hw regs leg.

I was not able to bisect this, so I'm not really sure why this was needed
in v5.2 and why it's not working anymore, but I could see double entries
as far as v5.10.

I did the test for both ORC and framepointer unwind with and without the
this fix and except for the initial entry the stacktraces are the same.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-05 17:05:19 -08:00
Linus Torvalds
dc77806cf3 Merge tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:

 - Fix/workaround a couple Rust 1.91.0 build issues when sanitizers are
   enabled due to extra checking performed by the compiler and an
   upstream issue already fixed for Rust 1.93.0

 - Fix future Rust 1.93.0 builds by supporting the stabilized name for
   the 'no-jump-tables' flag

 - Fix a couple private/broken intra-doc links uncovered by the future
   move of pin-init to 'syn'

* tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: kbuild: support `-Cjump-tables=n` for Rust 1.93.0
  rust: kbuild: workaround `rustdoc` doctests modifier bug
  rust: kbuild: treat `build_error` and `rustdoc` as kernel objects
  rust: condvar: fix broken intra-doc link
  rust: devres: fix private intra-doc link
2025-11-05 11:15:36 -08:00
Linus Torvalds
284922f4c5 x86: uaccess: don't use runtime-const rewriting in modules
The runtime-const infrastructure was never designed to handle the
modular case, because the constant fixup is only done at boot time for
core kernel code.

But by the time I used it for the x86-64 user space limit handling in
commit 86e6b1547b ("x86: fix user address masking non-canonical
speculation issue"), I had completely repressed that fact.

And it all happens to work because the only code that currently actually
gets inlined by modules is for the access_ok() limit check, where the
default constant value works even when not fixed up.  Because at least I
had intentionally made it be something that is in the non-canonical
address space region.

But it's technically very wrong, and it does mean that at least in
theory, the use of 'access_ok()' + '__get_user()' can trigger the same
speculation issue with non-canonical addresses that the original commit
was all about.

The pattern is unusual enough that this probably doesn't matter in
practice, but very wrong is still very wrong.  Also, let's fix it before
the nice optimized scoped user accessor helpers that Thomas Gleixner is
working on cause this pseudo-constant to then be more widely used.

This all came up due to an unrelated discussion with Mateusz Guzik about
using the runtime const infrastructure for names_cachep accesses too.
There the modular case was much more obviously broken, and Mateusz noted
it in his 'v2' of the patch series.

That then made me notice how broken 'access_ok()' had been in modules
all along.  Mea culpa, mea maxima culpa.

Fix it by simply not using the runtime-const code in modules, and just
using the USER_PTR_MAX variable value instead.  This is not
performance-critical like the core user accessor functions (get_user()
and friends) are.

Also make sure this doesn't get forgotten the next time somebody wants
to do runtime constant optimizations by having the x86 runtime-const.h
header file error out if included by modules.

Fixes: 86e6b1547b ("x86: fix user address masking non-canonical speculation issue")
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Triggered-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-11-05 10:24:36 +09:00
Miguel Ojeda
789521b471 rust: kbuild: support -Cjump-tables=n for Rust 1.93.0
Rust 1.93.0 (expected 2026-01-22) is stabilizing `-Zno-jump-tables`
[1][2] as `-Cjump-tables=n` [3].

Without this change, one would eventually see:

      RUSTC L rust/core.o
    error: unknown unstable option: `no-jump-tables`

Thus support the upcoming version.

Link: https://github.com/rust-lang/rust/issues/116592 [1]
Link: https://github.com/rust-lang/rust/pull/105812 [2]
Link: https://github.com/rust-lang/rust/pull/145974 [3]
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251101094011.1024534-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-04 19:11:39 +01:00
Maxim Levitsky
fd92bd3b44 KVM: SVM: switch to raw spinlock for svm->ir_list_lock
Use a raw spinlock for vcpu_svm.ir_list_lock as the lock can be taken
during schedule() via kvm_sched_out() => __avic_vcpu_put(), and "normal"
spinlocks are sleepable locks when PREEMPT_RT=y.

This fixes the following lockdep warning:

  =============================
  [ BUG: Invalid wait context ]
  6.12.0-146.1640_2124176644.el10.x86_64+debug #1 Not tainted
  -----------------------------
  qemu-kvm/38299 is trying to lock:
  ff11000239725600 (&svm->ir_list_lock){....}-{3:3}, at: __avic_vcpu_put+0xfd/0x300 [kvm_amd]
  other info that might help us debug this:
  context-{5:5}
  2 locks held by qemu-kvm/38299:
   #0: ff11000239723ba8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x240/0xe00 [kvm]
   #1: ff11000b906056d8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2e/0x130
  stack backtrace:
  CPU: 1 UID: 0 PID: 38299 Comm: qemu-kvm Kdump: loaded Not tainted 6.12.0-146.1640_2124176644.el10.x86_64+debug #1 PREEMPT(voluntary)
  Hardware name: AMD Corporation QUARTZ/QUARTZ, BIOS RQZ100AB 09/14/2023
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6f/0xb0
   __lock_acquire+0x921/0xb80
   lock_acquire.part.0+0xbe/0x270
   _raw_spin_lock_irqsave+0x46/0x90
   __avic_vcpu_put+0xfd/0x300 [kvm_amd]
   svm_vcpu_put+0xfa/0x130 [kvm_amd]
   kvm_arch_vcpu_put+0x48c/0x790 [kvm]
   kvm_sched_out+0x161/0x1c0 [kvm]
   prepare_task_switch+0x36b/0xf60
   __schedule+0x4f7/0x1890
   schedule+0xd4/0x260
   xfer_to_guest_mode_handle_work+0x54/0xc0
   vcpu_run+0x69a/0xa70 [kvm]
   kvm_arch_vcpu_ioctl_run+0xdc0/0x17e0 [kvm]
   kvm_vcpu_ioctl+0x39f/0xe00 [kvm]

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://patch.msgid.link/20251030194130.307900-1-mlevitsk@redhat.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:28 -08:00
Sean Christopherson
aaac099459 KVM: SVM: Make avic_ga_log_notifier() local to avic.c
Make amd_iommu_register_ga_log_notifier() a local symbol now that it's
defined and used purely within avic.c.

No functional change intended.

Fixes: 4bdec12aa8 ("KVM: SVM: Detect X2APIC virtualization (x2AVIC) support")
Link: https://patch.msgid.link/20251016190643.80529-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:28 -08:00
Sean Christopherson
adc6ae9729 KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
Unregister the GALog notifier (used to get notified of wake events for
blocking vCPUs) on kvm-amd.ko exit so that a KVM or IOMMU driver bug that
results in a spurious GALog event "only" results in a spurious IRQ, and
doesn't trigger a use-after-free due to executing unloaded module code.

Fixes: 5881f73757 ("svm: Introduce AMD IOMMU avic_ga_log_notifier")
Reported-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Closes: https://lore.kernel.org/all/20250918130320.GA119526@k08j02272.eu95sqa
Link: https://patch.msgid.link/20251016190643.80529-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:27 -08:00
Sean Christopherson
59a217ced3 KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
Setup the per-CPU SVM data structures at the very end of hardware setup so
that svm_hardware_unsetup() can be used in svm_hardware_setup() to unwind
AVIC setup (for the GALog notifier).  Alternatively, the error path could
do an explicit, manual unwind, e.g. by adding a helper to free the per-CPU
structures.  But the per-CPU allocations have no interactions or
dependencies, i.e. can comfortably live at the end, and so converting to
a manual unwind would introduce churn and code without providing any
immediate advantage.

Link: https://patch.msgid.link/20251016190643.80529-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:26 -08:00
Chao Gao
cab4098be4 KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES
Update the comment above is_xstate_managed_msr() to note that
MSR_IA32_S_CET isn't saved/restored by XSAVES/XRSTORS.

MSR_IA32_S_CET isn't part of CET_U/S state as the SDM states:
  The register state used by Control-Flow Enforcement Technology (CET)
  comprises the two 64-bit MSRs (IA32_U_CET and IA32_PL3_SSP) that manage
  CET when CPL = 3 (CET_U state); and the three 64-bit MSRs
  (IA32_PL0_SSP–IA32_PL2_SSP) that manage CET when CPL < 3 (CET_S state).

Opportunistically shift the snippet about the safety of loading certain
MSRs to the function comment for kvm_access_xstate_msr(), which is where
the MSRs are actually loaded into hardware.

Fixes: e44eb58334 ("KVM: x86: Load guest FPU state when access XSAVE-managed MSRs")
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251028060142.29830-1-chao.gao@intel.com
[sean: shift snippet about safety to kvm_access_xstate_msr()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:26 -08:00
Sean Christopherson
9bc610b6a2 KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
Assert, via KVM_BUG_ON(), that guest FPU state isn't/is in use when
loading/putting the FPU to help detect KVM bugs without needing an assist
from KASAN.  If an imbalanced load/put is detected, skip the redundant
load/put to avoid clobbering guest state and/or crashing the host.

Note, kvm_access_xstate_msr() already provides a similar assertion.

Reviewed-by: Yao Yuan <yaoyuan@linux.alibaba.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251030185802.3375059-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:21 -08:00
Sean Christopherson
8819a49f9f KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
Replace the hack added by commit f958bd2314 ("KVM: x86: Fix potential
put_fpu() w/o load_fpu() on MPX platform") with a more robust approach of
unloading+reloading guest FPU state based on whether or not the vCPU's FPU
is currently in-use, i.e. currently loaded.  This fixes a bug on hosts
that support CET but not MPX, where kvm_arch_vcpu_ioctl_get_mpstate()
neglects to load FPU state (it only checks for MPX support) and leads to
KVM attempting to put FPU state due to kvm_apic_accept_events() triggering
INIT emulation.  E.g. on a host with CET but not MPX, syzkaller+KASAN
generates:

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
  CPU: 211 UID: 0 PID: 20451 Comm: syz.9.26 Tainted: G S                  6.18.0-smp-DEV #7 NONE
  Tainted: [S]=CPU_OUT_OF_SPEC
  Hardware name: Google Izumi/izumi, BIOS 0.20250729.1-0 07/29/2025
  RIP: 0010:fpu_swap_kvm_fpstate+0x3ce/0x610 ../arch/x86/kernel/fpu/core.c:377
  RSP: 0018:ff1100410c167cc0 EFLAGS: 00010202
  RAX: 0000000000000004 RBX: 0000000000000020 RCX: 00000000000001aa
  RDX: 00000000000001ab RSI: ffffffff817bb960 RDI: 0000000022600000
  RBP: dffffc0000000000 R08: ff110040d23c8007 R09: 1fe220081a479000
  R10: dffffc0000000000 R11: ffe21c081a479001 R12: ff110040d23c8d98
  R13: 00000000fffdc578 R14: 0000000000000000 R15: ff110040d23c8d90
  FS:  00007f86dd1876c0(0000) GS:ff11007fc969b000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f86dd186fa8 CR3: 00000040d1dfa003 CR4: 0000000000f73ef0
  PKRU: 80000000
  Call Trace:
   <TASK>
   kvm_vcpu_reset+0x80d/0x12c0 ../arch/x86/kvm/x86.c:11818
   kvm_apic_accept_events+0x1cb/0x500 ../arch/x86/kvm/lapic.c:3489
   kvm_arch_vcpu_ioctl_get_mpstate+0xd0/0x4e0 ../arch/x86/kvm/x86.c:12145
   kvm_vcpu_ioctl+0x5e2/0xed0 ../virt/kvm/kvm_main.c:4539
   __se_sys_ioctl+0x11d/0x1b0 ../fs/ioctl.c:51
   do_syscall_x64 ../arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x6e/0x940 ../arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f86de71d9c9
   </TASK>

with a very simple reproducer:

  r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x80b00, 0x0)
  r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
  ioctl$KVM_CREATE_IRQCHIP(r1, 0xae60)
  r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0)
  ioctl$KVM_SET_IRQCHIP(r1, 0x8208ae63, ...)
  ioctl$KVM_GET_MP_STATE(r2, 0x8004ae98, &(0x7f00000000c0))

Alternatively, the MPX hack in GET_MP_STATE could be extended to cover CET,
but from a "don't break existing functionality" perspective, that isn't any
less risky than peeking at the state of in_use, and it's far less robust
for a long term solution (as evidenced by this bug).

Reported-by: Alexander Potapenko <glider@google.com>
Fixes: 69cc3e8865 ("KVM: x86: Add XSS support for CET_KERNEL and CET_USER")
Reviewed-by: Yao Yuan <yaoyuan@linux.alibaba.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251030185802.3375059-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:11 -08:00
Mario Limonciello
f1fdffe0af x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
Running x86_match_min_microcode_rev() on a Zen5 CPU trips up KASAN for an out
of bounds access.

Fixes: 607b9fb2ce ("x86/CPU/AMD: Add RDSEED fix for Zen5")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251104161007.269885-1-mario.limonciello@amd.com
2025-11-04 17:16:14 +01:00
Helge Deller
fd9f30d103 parisc: Avoid crash due to unaligned access in unwinder
Guenter Roeck reported this kernel crash on his emulated B160L machine:

Starting network: udhcpc: started, v1.36.1
 Backtrace:
  [<104320d4>] unwind_once+0x1c/0x5c
  [<10434a00>] walk_stackframe.isra.0+0x74/0xb8
  [<10434a6c>] arch_stack_walk+0x28/0x38
  [<104e5efc>] stack_trace_save+0x48/0x5c
  [<105d1bdc>] set_track_prepare+0x44/0x6c
  [<105d9c80>] ___slab_alloc+0xfc4/0x1024
  [<105d9d38>] __slab_alloc.isra.0+0x58/0x90
  [<105dc80c>] kmem_cache_alloc_noprof+0x2ac/0x4a0
  [<105b8e54>] __anon_vma_prepare+0x60/0x280
  [<105a823c>] __vmf_anon_prepare+0x68/0x94
  [<105a8b34>] do_wp_page+0x8cc/0xf10
  [<105aad88>] handle_mm_fault+0x6c0/0xf08
  [<10425568>] do_page_fault+0x110/0x440
  [<10427938>] handle_interruption+0x184/0x748
  [<11178398>] schedule+0x4c/0x190
  BUG: spinlock recursion on CPU#0, ifconfig/2420
  lock: terminate_lock.2+0x0/0x1c, .magic: dead4ead, .owner: ifconfig/2420, .owner_cpu: 0

While creating the stack trace, the unwinder uses the stack pointer to guess
the previous frame to read the previous stack pointer from memory.  The crash
happens, because the unwinder tries to read from unaligned memory and as such
triggers the unalignment trap handler which then leads to the spinlock
recursion and finally to a deadlock.

Fix it by checking the alignment before accessing the memory.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # v6.12+
2025-11-04 12:21:59 +01:00
Yazen Ghannam
0a4b61d9c2 x86/amd_node: Fix AMD root device caching
Recent AMD node rework removed the "search and count" method of caching AMD
root devices. This depended on the value from a Data Fabric register that was
expected to hold the PCI bus of one of the root devices attached to that
fabric.

However, this expectation is incorrect. The register, when read from PCI
config space, returns the bitwise-OR of the buses of all attached root
devices.

This behavior is benign on AMD reference design boards, since the bus numbers
are aligned. This results in a bitwise-OR value matching one of the buses. For
example, 0x00 | 0x40 | 0xA0 | 0xE0 = 0xE0.

This behavior breaks on boards where the bus numbers are not exactly aligned.
For example, 0x00 | 0x07 | 0xE0 | 0x15 = 0x1F.

The examples above are for AMD node 0. The first root device on other nodes
will not be 0x00. The first root device for other nodes will depend on the
total number of root devices, the system topology, and the specific PCI bus
number assignment.

For example, a system with 2 AMD nodes could have this:

  Node 0 : 0x00 0x07 0x0e 0x15
  Node 1 : 0x1c 0x23 0x2a 0x31

The bus numbering style in the reference boards is not a requirement.  The
numbering found in other boards is not incorrect. Therefore, the root device
caching method needs to be adjusted.

Go back to the "search and count" method used before the recent rework.
Search for root devices using PCI class code rather than fixed PCI IDs.

This keeps the goal of the rework (remove dependency on PCI IDs) while being
able to support various board designs.

Merge helper functions to reduce code duplication.

  [ bp: Reflow comment. ]

Fixes: 40a5f6ffdf ("x86/amd_nb: Simplify root device search")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/all/20251028-fix-amd-root-v2-1-843e38f8be2c@amd.com
2025-11-03 12:46:57 +01:00
Linus Torvalds
e3e0141d3d Merge tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:

 - Limit AMD microcode Entrysign sha256 signature checking to
   known CPU generations

 - Disable AMD RDSEED32 on certain Zen5 CPUs that have a
   microcode version before when the microcode-based fix was
   issued for the AMD-SB-7055 erratum

 - Fix FPU AMD XFD state synchronization on signal delivery

 - Fix (work around) a SSE4a-disassembly related build failure
   on X86_NATIVE_CPU=y builds

 - Extend the AMD Zen6 model space with a new range of models

 - Fix <asm/intel-family.h> CPU model comments

 - Fix the CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y build, which
   was unhappy due to missing kCFI type annotations of clear_page()
   variants

* tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
  x86/cpu: Add/fix core comments for {Panther,Nova} Lake
  x86/CPU/AMD: Extend Zen6 model range
  x86/build: Disable SSE4a
  x86/fpu: Ensure XFD state on signal delivery
  x86/CPU/AMD: Add RDSEED fix for Zen5
  x86/microcode/AMD: Limit Entrysign signature checking to known generations
2025-11-01 10:20:07 -07:00
Linus Torvalds
f9bc8e0912 Merge tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
 "Miscellaneous fixes and CPU model updates:

   - Fix an out-of-bounds access on non-hybrid platforms in the Intel
     PMU DS code, reported by KASAN

   - Add WildcatLake PMU and uncore support: it's identical to the
     PantherLake version"

* tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add uncore PMU support for Wildcat Lake
  perf/x86/intel: Add PMU support for WildcatLake
  perf/x86/intel: Fix KASAN global-out-of-bounds warning
2025-11-01 10:17:40 -07:00
Linus Torvalds
ba36dd5ee6 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Mark migrate_disable/enable() as always_inline to avoid issues with
   partial inlining (Yonghong Song)

 - Fix powerpc stack register definition in libbpf bpf_tracing.h (Andrii
   Nakryiko)

 - Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann)

 - Conditionally include dynptr copy kfuncs (Malin Jonsson)

 - Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal)

 - Do not audit capability check in x86 do_jit() (Ondrej Mosnacek)

 - Fix arm64 JIT of BPF_ST insn when it writes into arena memory
   (Puranjay Mohan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf/arm64: Fix BPF_ST into arena memory
  bpf: Make migrate_disable always inline to avoid partial inlining
  bpf: Reject negative head_room in __bpf_skb_change_head
  bpf: Conditionally include dynptr copy kfuncs
  libbpf: Fix powerpc's stack register definition in bpf_tracing.h
  bpf: Do not audit capability check in do_jit()
  bpf: Sync pending IRQ work before freeing ring buffer
2025-10-31 18:22:26 -07:00
Nathan Chancellor
9b041a4b66 x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
When building with CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y, there is a series
of errors from the various versions of clear_page() not having __kcfi_typeid_
symbols.

  $ cat kernel/configs/repro.config
  CONFIG_CFI=y
  # CONFIG_LTO_NONE is not set
  CONFIG_LTO_CLANG_FULL=y

  $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config bzImage
  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_rep
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_rep)

  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_orig
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_orig)

  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_erms
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_erms)

With full LTO, it is possible for LLVM to realize that these functions never
have their address taken (as they are only used within an alternative, which
will make them a direct call) across the whole kernel and either drop or skip
generating their kCFI type identification symbols.

clear_page_{rep,orig,erms}() are defined in clear_page_64.S with
SYM_TYPED_FUNC_START as a result of

  2981557cb0 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI"),

as exported functions are free to be called indirectly thus need kCFI type
identifiers.

Use KCFI_REFERENCE with these clear_page() functions to force LLVM to see
these functions as address-taken and generate then keep the kCFI type
identifiers.

Fixes: 2981557cb0 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI")
Closes: https://github.com/ClangBuiltLinux/linux/issues/2128
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://patch.msgid.link/20251013-x86-fix-clear_page-cfi-full-lto-errors-v1-1-d69534c0be61@kernel.org
2025-10-31 22:47:24 +01:00
Linus Torvalds
b4f7f01ea1 Merge tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:

 - Use correct locking in zPCI event code to avoid deadlock

 - Get rid of irqs_registered flag in zpci_dev structure and restore IRQ
   unconditionally for zPCI devices. This fixes sit uations where the
   flag was not correctly updated

 - Fix potential memory leak kernel page table dumper code

 - Disable (revert) ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP for s390 again.

   The optimized hugetlb vmemmap code modifies kernel page tables in a
   way which does not work on s390 and leads to reproducible kernel
   crashes due to stale TLB entries. This needs to be addressed with
   some larger changes. For now simply disable the feature

 - Update defconfigs

* tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
  s390/mm: Fix memory leak in add_marker() when kvrealloc() fails
  s390/pci: Restore IRQ unconditionally for the zPCI device
  s390: Update defconfigs
  s390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump
2025-10-31 12:50:35 -07:00
Puranjay Mohan
be708ed300 bpf/arm64: Fix BPF_ST into arena memory
The arm64 JIT supports BPF_ST with BPF_PROBE_MEM32 (arena) by using the
tmp2 register to hold the dst + arena_vm_base value and using tmp2 as the
new dst register. But this is broken because in case is_lsi_offset()
returns false the tmp2 will be clobbered by emit_a64_mov_i(1, tmp2, off,
ctx); and hence the emitted store instruction will be of the form:
	strb    w10, [x11, x11]
Fix this by using the third temporary register to hold the dst +
arena_vm_base.

Fixes: 339af577ec ("bpf: Add arm64 JIT support for PROBE_MEM32 pseudo instructions.")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251030121715.55214-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 11:20:53 -07:00
Linus Torvalds
3ad81aa520 Merge tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - Fix double free in aspeed

 - Fix req->nbytes clobbering in s390/phmac

* tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aspeed - fix double free caused by devm
  crypto: s390/phmac - Do not modify the req->nbytes value
2025-10-31 07:25:10 -07:00
Sebastian Ene
103e17aac0 KVM: arm64: Check the untrusted offset in FF-A memory share
Verify the offset to prevent OOB access in the hypervisor
FF-A buffer in case an untrusted large enough value
[U32_MAX - sizeof(struct ffa_composite_mem_region) + 1, U32_MAX]
is set from the host kernel.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251017075710.2605118-1-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30 16:14:58 +00:00
Vincent Donnefort
f71f7afd0a KVM: arm64: Check range args for pKVM mem transitions
There's currently no verification for host issued ranges in most of the
pKVM memory transitions. The end boundary might therefore be subject to
overflow and later checks could be evaded.

Close this loophole with an additional pfn_range_is_valid() check on a
per public function basis. Once this check has passed, it is safe to
convert pfn and nr_pages into a phys_addr_t and a size.

host_unshare_guest transition is already protected via
__check_host_shared_guest(), while assert_host_shared_guest() callers
are already ignoring host checks.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20251016164541.3771235-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30 16:14:37 +00:00
Sascha Bischoff
da888524c3 KVM: arm64: vgic-v3: Trap all if no in-kernel irqchip
If there is no in-kernel irqchip for a GICv3 host set all of the trap
bits to block all accesses. This fixes the no-vgic-v3 selftest again.

Fixes: 3193287ddf ("KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/23072856-6b8c-41e2-93d1-ea8a240a7079@sirena.org.uk
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20251021094358.1963807-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30 16:11:21 +00:00
Heiko Carstens
64e2f60f35 s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
As reported by Luiz Capitulino enabling HVO on s390 leads to reproducible
crashes. The problem is that kernel page tables are modified without
flushing corresponding TLB entries.

Even if it looks like the empty flush_tlb_all() implementation on s390 is
the problem, it is actually a different problem: on s390 it is not allowed
to replace an active/valid page table entry with another valid page table
entry without the detour over an invalid entry. A direct replacement may
lead to random crashes and/or data corruption.

In order to invalidate an entry special instructions have to be used
(e.g. ipte or idte). Alternatively there are also special instructions
available which allow to replace a valid entry with a different valid
entry (e.g. crdte or cspg).

Given that the HVO code currently does not provide the hooks to allow for
an implementation which is compliant with the s390 architecture
requirements, disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP again, which is
basically a revert of the original patch which enabled it.

Reported-by: Luiz Capitulino <luizcap@redhat.com>
Closes: https://lore.kernel.org/all/20251028153930.37107-1-luizcap@redhat.com/
Fixes: 00a34d5a99 ("s390: select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP")
Cc: stable@vger.kernel.org
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-30 16:59:28 +01:00
Tony Luck
89216c9051 x86/cpu: Add/fix core comments for {Panther,Nova} Lake
The E-core in Panther Lake is Darkmont, not Crestmont.

Nova Lake is built from Coyote Cove (P-core) and Arctic Wolf (E-core).

Fixes: 43bb700cff ("x86/cpu: Update Intel Family comments")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20251028172948.6721-1-tony.luck@intel.com
2025-10-30 11:34:02 +01:00
Borislav Petkov (AMD)
847ebc4476 x86/CPU/AMD: Extend Zen6 model range
Add some more Zen6 models.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251029123056.19987-1-bp@kernel.org
2025-10-30 11:33:55 +01:00
Nathan Chancellor
1ed9e6b100 ARM: Select ARCH_USES_CFI_GENERIC_LLVM_PASS
Prior to clang 22.0.0 [1], ARM did not have an architecture specific
kCFI bundle lowering in the backend, which may cause issues. Select
CONFIG_ARCH_USES_CFI_GENERIC_LLVM_PASS to enable use of __nocfi_generic.

Link: d130f40264 [1]
Link: https://github.com/ClangBuiltLinux/linux/issues/2124
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251025-idpf-fix-arm-kcfi-build-error-v1-2-ec57221153ae@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-10-29 20:04:55 -07:00
Nathan Chancellor
39c89ee6e9 compiler_types: Introduce __nocfi_generic
There are two different ways that LLVM can expand kCFI operand bundles
in LLVM IR: generically in the middle end or using an architecture
specific sequence when lowering LLVM IR to machine code in the backend.
The generic pass allows any architecture to take advantage of kCFI but
the expansion of these bundles in the middle end can mess with
optimizations that may turn indirect calls into direct calls when the
call target is known at compile time, such as after inlining.

Add __nocfi_generic, dependent on an architecture selecting
CONFIG_ARCH_USES_CFI_GENERIC_LLVM_PASS, to disable kCFI bundle
generation in functions where only the generic kCFI pass may cause
problems.

Link: https://github.com/ClangBuiltLinux/linux/issues/2124
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251025-idpf-fix-arm-kcfi-build-error-v1-1-ec57221153ae@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-10-29 20:04:55 -07:00
Miaoqian Lin
07ad45e06b s390/mm: Fix memory leak in add_marker() when kvrealloc() fails
The function has a memory leak when kvrealloc() fails.
The function directly assigns NULL to the markers pointer, losing the
reference to the previously allocated memory. This causes kvfree() in
pt_dump_init() to free NULL instead of the leaked memory.

Fix by:
1. Using kvrealloc() uniformly for all allocations
2. Using a temporary variable to preserve the original pointer until
   allocation succeeds
3. Removing the error path that sets markers_cnt=0 to keep
   consistency between markers and markers_cnt

Found via static analysis and this is similar to commit 42378a9ca5
("bpf, verifier: Fix memory leak in array reallocation for stack state")

Fixes: d0e7915d2a ("s390/mm/ptdump: Generate address marker array dynamically")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-29 14:17:50 +01:00
dongsheng
f4c12e5cef perf/x86/intel/uncore: Add uncore PMU support for Wildcat Lake
WildcatLake (WCL) is a variant of PantherLake (PTL) and shares the same
uncore PMU features with PTL. Therefore, directly reuse Pantherlake's
uncore PMU enabling code for WildcatLake.

Signed-off-by: dongsheng <dongsheng.x.zhang@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20250908061639.938105-2-dapeng1.mi@linux.intel.com
2025-10-29 11:31:44 +01:00
Dapeng Mi
b796a8feb7 perf/x86/intel: Add PMU support for WildcatLake
WildcatLake is a variant of PantherLake and shares same PMU features,
so directly reuse Pantherlake's code to enable PMU features for
WildcatLake.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://patch.msgid.link/20250908061639.938105-1-dapeng1.mi@linux.intel.com
2025-10-29 11:31:44 +01:00
Dapeng Mi
0ba6502ce1 perf/x86/intel: Fix KASAN global-out-of-bounds warning
When running "perf mem record" command on CWF, the below KASAN
global-out-of-bounds warning is seen.

  ==================================================================
  BUG: KASAN: global-out-of-bounds in cmt_latency_data+0x176/0x1b0
  Read of size 4 at addr ffffffffb721d000 by task dtlb/9850

  Call Trace:

   kasan_report+0xb8/0xf0
   cmt_latency_data+0x176/0x1b0
   setup_arch_pebs_sample_data+0xf49/0x2560
   intel_pmu_drain_arch_pebs+0x577/0xb00
   handle_pmi_common+0x6c4/0xc80

The issue is caused by below code in __grt_latency_data(). The code
tries to access x86_hybrid_pmu structure which doesn't exist on
non-hybrid platform like CWF.

        WARN_ON_ONCE(hybrid_pmu(event->pmu)->pmu_type == hybrid_big)

So add is_hybrid() check before calling this WARN_ON_ONCE to fix the
global-out-of-bounds access issue.

Fixes: 090262439f ("perf/x86/intel: Rename model-specific pebs_latency_data functions")
Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251028064214.1451968-1-dapeng1.mi@linux.intel.com
2025-10-29 10:29:52 +01:00
Peter Zijlstra
0d6e9ec80c x86/build: Disable SSE4a
Leyvi Rose reported that his X86_NATIVE_CPU=y build is failing because our
instruction decoder doesn't support SSE4a and the AMDGPU code seems to be
generating those with his compiler of choice (CLANG+LTO).

Now, our normal build flags disable SSE MMX SSE2 3DNOW AVX, but then
CC_FLAGS_FPU re-enable SSE SSE2.

Since nothing mentions SSE3 or SSE4, I'm assuming that -msse (or its negative)
control all SSE variants -- but why then explicitly enumerate SSE2 ?

Anyway, until the instruction decoder gets fixed, explicitly disallow SSE4a
(an AMD specific SSE4 extension).

Fixes: ea1dcca1de ("x86/kbuild/64: Add the CONFIG_X86_NATIVE_CPU option to locally optimize the kernel with '-march=native'")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Arisu Tachibana <arisu.tachibana@miraclelinux.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Cc: <stable@kernel.org>
2025-10-28 20:43:36 +01:00
Chang S. Bae
388eff894d x86/fpu: Ensure XFD state on signal delivery
Sean reported [1] the following splat when running KVM tests:

   WARNING: CPU: 232 PID: 15391 at xfd_validate_state+0x65/0x70
   Call Trace:
    <TASK>
    fpu__clear_user_states+0x9c/0x100
    arch_do_signal_or_restart+0x142/0x210
    exit_to_user_mode_loop+0x55/0x100
    do_syscall_64+0x205/0x2c0
    entry_SYSCALL_64_after_hwframe+0x4b/0x53

Chao further identified [2] a reproducible scenario involving signal
delivery: a non-AMX task is preempted by an AMX-enabled task which
modifies the XFD MSR.

When the non-AMX task resumes and reloads XSTATE with init values,
a warning is triggered due to a mismatch between fpstate::xfd and the
CPU's current XFD state. fpu__clear_user_states() does not currently
re-synchronize the XFD state after such preemption.

Invoke xfd_update_state() which detects and corrects the mismatch if
there is a dynamic feature.

This also benefits the sigreturn path, as fpu__restore_sig() may call
fpu__clear_user_states() when the sigframe is inaccessible.

[ dhansen: minor changelog munging ]

Closes: https://lore.kernel.org/lkml/aDCo_SczQOUaB2rS@google.com [1]
Fixes: 672365477a ("x86/fpu: Update XFD state where required")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/all/aDWbctO%2FRfTGiCg3@intel.com [2]
Cc:stable@vger.kernel.org
Link: https://patch.msgid.link/20250610001700.4097-1-chang.seok.bae%40intel.com
2025-10-28 12:10:59 -07:00
Gregory Price
607b9fb2ce x86/CPU/AMD: Add RDSEED fix for Zen5
There's an issue with RDSEED's 16-bit and 32-bit register output
variants on Zen5 which return a random value of 0 "at a rate inconsistent
with randomness while incorrectly signaling success (CF=1)". Search the
web for AMD-SB-7055 for more detail.

Add a fix glue which checks microcode revisions.

  [ bp: Add microcode revisions checking, rewrite. ]

Cc: stable@vger.kernel.org
Signed-off-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251018024010.4112396-1-gourry@gourry.net
2025-10-28 12:37:49 +01:00