linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 02:01:18 -04:00

Author	SHA1	Message	Date
Linus Torvalds	4844e7c4c2	Merge tag 'for-linus-7.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - one simple cleanup - a fix for a corner case when running as Xen PV dom0 - a fix of a regression for Xen PV guests, introduced in 7.0 * tag 'for-linus-7.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Tolerate nested XEN_LAZY_MMU entering/leaving x86/xen: Fix xen_e820_swap_entry_with_ram() xen/arm: Replace __ASSEMBLY__ with __ASSEMBLER__ in interface.h	2026-05-15 11:24:51 -07:00
Linus Torvalds	48f76a1271	Merge tag 'acpi-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support fixes from Rafael Wysocki: "These fix several platform drivers that use the ACPI companion of the given platform device without checking its presence, which may lead to a NULL pointer dereference or other kind of malfunction if the driver is forced to match a device without an ACPI companion via driver override, and restore debug log level for some messages in the ACPI CPPC library: - Check ACPI_COMPANION() against NULL during probe in several core ACPI device drivers (Rafael Wysocki) - Restore log level of messages in amd_set_max_freq_ratio() (Mario Limonciello)" * tag 'acpi-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PAD: xen: Check ACPI_COMPANION() against NULL ACPI: driver: Check ACPI_COMPANION() against NULL during probe Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"	2026-05-14 14:06:31 -07:00
Rafael J. Wysocki	af149b667b	Merge branch 'acpi-cppc' Merge a revert of an ACPI CPPC commit that increased the log level of some debug messages which turned out to be a bad idea: - Restore log level of messages in amd_set_max_freq_ratio() (Mario Limonciello) * acpi-cppc: Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"	2026-05-14 22:46:33 +02:00
Juergen Gross	4594437880	x86/xen: Tolerate nested XEN_LAZY_MMU entering/leaving With the support of nested lazy mmu sections it can happen that arch_enter_lazy_mmu_mode() is being called twice without a call of arch_leave_lazy_mmu_mode() in between, as the lazy_mmu_*() helpers are not disabling preemption when checking for nested lazy mmu sections. This is a problem when running as a Xen PV guest, as xen_enter_lazy_mmu() and xen_leave_lazy_mmu() don't tolerate this case. Fix that in xen_enter_lazy_mmu() and xen_leave_lazy_mmu() in order not to hurt all other lazy mmu mode users. Fixes: `291b3abed6` ("x86/xen: use lazy_mmu_state when context-switching") Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260508143933.493013-1-jgross@suse.com>	2026-05-14 18:33:05 +02:00
Juergen Gross	28e03f78e6	x86/xen: Fix xen_e820_swap_entry_with_ram() When swapping a not page-aligned E820 map entry with RAM, the start address of the modified entry is calculated wrong (the offset into the page is subtracted instead of being added to the page address). Fixes: `be35d91c88` ("xen: tolerate ACPI NVS memory overlapping with Xen allocated memory") Reported-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260505102417.208138-1-jgross@suse.com>	2026-05-14 18:33:05 +02:00
Linus Torvalds	e1914add27	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "arm64: - Add the pKVM side of the workaround for ARM's erratum 4193714, provided that the EL3 firmware does its part of the job. KVM will refuse to initialise otherwise - Correctly handle 52bit VAs for guest EL2 stage-1 translations when running under NV with E2H==0 - Correctly deal with permission faults in guest_memfd memslots - Fix the steal-time selftest after the infrastructure was reworked - Make sure the host cannot pass a non-sensical clock update to the EL2 tracing infrastructure - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390 ability to run arm64 guests, which will inevitably lead to arm64 code being directly used on s390 - Make sure that EL2 is configured with both exception entry and exit being Context Synchronization Events - Handle the current vcpu being NULL on EL2 panic - Fix the selftest_vcpu memcache being empty at the point of donation or sharing - Check that the memcache has enough capacity before engaging on the share/donate path - Fix __deactivate_fgt() to use its parameter rather than a variable in the macro context s390: - Fix array overrun with large amounts of PCI devices x86: - Never use L0's PAUSE loop exiting while L2 is running, since it's unlikely that a nested guest will help solving the hypervisor's spinlock contention - Fix emulation of MOVNTDQA - Fix typo in Xen hypercall tracepoint - Add back an optimization that was left behind when recently fixing a bug - Add module parameter to disable CET, whose implementation seems to have issues. For now it remains enabled by default Generic: - Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn() Documentation: - Update stale links Selftests: - Fix guest_memfd_test with host page size > guest page size" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: VMX: introduce module parameter to disable CET KVM: x86: Swap the dst and src operand for MOVNTDQA KVM: x86: use again the flush argument of __link_shadow_page() KVM: selftests: Ensure gmem file sizes are multiple of host page size Documentation: kvm: update links in the references section of AMD Memory Encryption KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running KVM: x86: Fix Xen hypercall tracepoint argument assignment KVM: Reject wrapped offset in kvm_reset_dirty_gfn() KVM: arm64: Pre-check vcpu memcache for host->guest donate KVM: arm64: Pre-check vcpu memcache for host->guest share KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache KVM: arm64: Fix __deactivate_fgt macro parameter typo KVM: arm64: Guard against NULL vcpu on VHE hyp panic path KVM: arm64: Make EL2 exception entry and exit context-synchronization events MAINTAINERS: Add Steffen as reviewer for KVM/arm64 KVM: arm64: Remove potential UB on nvhe tracing clock update KVM: selftests: arm64: Fix steal_time test after UAPI refactoring KVM: arm64: Handle permission faults with guest_memfd KVM: arm64: nv: Consider the DS bit when translating TCR_EL2 KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests ...	2026-05-13 11:53:51 -07:00
Paolo Bonzini	2d5d3fc593	KVM: VMX: introduce module parameter to disable CET There have been reports of host hangs caused by CET virtualization. Until these are analyzed further, introduce a module parameter that makes it possible to easily disable it. Link: https://lore.kernel.org/all/85548beb-1486-40f9-beb4-632c78e3360b@proxmox.com/ Cc: David Riley <d.riley@proxmox.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-05-13 15:38:22 +02:00
Paolo Bonzini	ef7e0c51d9	Merge tag 'kvm-s390-master-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: pci: fix array indexing For large amounts of PCI devices its possible to overrun the arrays as the index was miscalculated in 2 places.	2026-05-12 23:15:38 +02:00
Sean Christopherson	3098c076c8	KVM: x86: Swap the dst and src operand for MOVNTDQA Swap the MOVNTDQA operands, as MOVNTDQA does NOT in fact have "the same characteristics as 0F E7 (MOVNTDQ)"; MOVNTDQA loads from memory and stores to registers, while MOVNTDQ loads from registers and stores to memory. Per the SDM: MOVNTDQ - Move packed integer values in xmm1 to m128 using non-temporal hint. MOVNTDQA - Move double quadword from m128 to xmm1 using non-temporal hint if WC memory type. Reported-by: Josh Eads <josheads@google.com> Fixes: `c57d9bafbd` ("KVM: x86: Add support for emulating MOVNTDQA") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260506213514.2781948-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-05-12 23:12:32 +02:00
Paolo Bonzini	6b72d0578c	KVM: x86: use again the flush argument of __link_shadow_page() Except in the case of parentless nested-TDP pages, mmu_page_zap_pte() clears the SPTE but leaves the invalid_list empty. In this case, using kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill. Avoid flushing the entirety of the remote TLBs unless the invalid_list was populated: instead, use a more efficient gfn-targeting flush (if available) and skip it altogether if the caller guarantees that a TLB flush is not necessary. Based-on: <20260503201029.106481-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20260503210917.121840-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-05-12 23:12:31 +02:00
Paolo Bonzini	4a9ee4fc79	Merge tag 'kvmarm-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #2 - Add the pKVM side of the workaround for ARM's erratum 4193714, provided that the EL3 firmware does its part of the job. KVM will refuse to initialise otherwise. - Correctly handle 52bit VAs for guest EL2 stage-1 translations when running under NV with E2H==0. - Correctly deal with permission faults in guest_memfd memslots. - Fix the steal-time selftest after the infrastructure was reworked. - Make sure the host cannot pass a non-sensical clock update to the EL2 tracing infrastructure. - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390 ability to run arm64 guests, which will inevitably lead to arm64 code being directly used on s390. - Make sure that EL2 is configured with both exception entry and exit being Context Synchronization Events. - Handle the current vcpu being NULL on EL2 panic. - Fix the selftest_vcpu memcache being empty at the point of donation or sharing. - Check that the memcache has enough capacity before engaging on the share/donate path. - Fix __deactivate_fgt() to use its parameter rather than a variable in the macro context.	2026-05-12 22:19:20 +02:00
Sean Christopherson	5bd1ddb791	KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running Never use L0's (KVM's) PAUSE loop exiting controls while L2 is running, and instead always configure vmcb02 according to L1's exact capabilities and desires. The purpose of intercepting PAUSE after N attempts is to detect when the vCPU may be stuck waiting on a lock, so that KVM can schedule in a different vCPU that may be holding said lock. Barring a very interesting setup, L1 and L2 do not share locks, and it's extremely unlikely that an L1 vCPU would hold a spinlock while running L2. I.e. having a vCPU executing in L1 yield to a vCPU running in L2 will not allow the L1 vCPU to make forward progress, and vice versa. While teaching KVM's "on spin" logic to only yield to other vCPUs in L2 is doable, in all likelihood it would do more harm than good for most setups. KVM has limited visibility into which L2 "vCPUs" belong to the same VM, and thus share a locking domain. And even if L2 vCPUs are in the same VM, KVM has no visilibity into L2 vCPU's that are scheduled out by the L1 hypervisor. Furthermore, KVM doesn't actually steal PAUSE exits from L1. If L1 is intercepting PAUSE, KVM will route PAUSE exits to L1, not L0, as nested_svm_intercept() gives priority to the vmcb12 intercept. As such, overriding the count/threshold fields in vmcb02 with vmcb01's values is nonsensical, as doing so clobbers all the training/learning that has been done in L1. Even worse, if L1 is not intercepting PAUSE, i.e. KVM is handling PAUSE exits, then KVM will adjust the PLE knobs based on L2 behavior, which could very well be detrimental to L1, e.g. due to essentially poisoning L1 PLE training with bad data. And copying the count from vmcb02 to vmcb01 on a nested VM-Exit makes even less sense, because again, the purpose of PLE is to detect spinning vCPUs. Whether or not a vCPU is spinning in L2 at the time of a nested VM-Exit has no relevance as to the behavior of the vCPU when it executes in L1. The only scenarios where any of this actually works is if at least one of KVM or L1 is NOT intercepting PAUSE for the guest. Per the original changelog, those were the only scenarios considered to be supported. Disabling KVM's use of PLE makes it so the VM is always in a "supported" mode. Last, but certainly not least, using KVM's count/threshold instead of the values provided by L1 is a blatant violation of the SVM architecture. Fixes: `74fd41ed16` ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Cc: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260508213321.373309-1-seanjc@google.com/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-05-12 22:17:28 +02:00
Qiang Ma	2b72f1674e	KVM: x86: Fix Xen hypercall tracepoint argument assignment TRACE_EVENT(kvm_xen_hypercall) stores a5 in __entry->a4 instead of __entry->a5. That overwrites the recorded a4 argument and leaves a5 unset in the trace entry. Fix the typo so both arguments are captured correctly. Signed-off-by: Qiang Ma <maqianga@uniontech.com> Link: https://patch.msgid.link/20260512015313.1685784-1-maqianga@uniontech.com/ Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-05-12 22:16:26 +02:00
Prathyushi Nangia	c21b90f776	x86/CPU/AMD: Prevent improper isolation of shared resources in Zen2's op cache Make sure resources are not improperly shared in the op cache and cause instruction corruption this way. Signed-off-by: Prathyushi Nangia <prathyushi.nangia@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-05-11 20:06:36 -07:00
Linus Torvalds	bf0e022821	Merge tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix KASAN sanitization flag for core_$(BITS).o - Fixes for handling offset values in pseries htmdump - Fix interrupt mask in cpm1_gpiochip_add16() - ps3/pasemi fixes to drop redundant result assignment - Fixes in papr-hvpipe code path - powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events Thanks to Aboorva Devarajan, Athira Rajeev, Christophe Leroy (CS GROUP), Geert Uytterhoeven, Haren Myneni, Krzysztof Kozlowski, Mukesh Kumar Chaurasiya (IBM), Nathan Chancellor, Ritesh Harjani (IBM), Shivani Nittor, Sourabh Jain, Thomas Zimmermann, and Venkat Rao Bagalkote. * tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits) powerpc/pasemi: Drop redundant res assignment powerpc/ps3: Drop redundant result assignment powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16() powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ() pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg() pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release() pseries/papr-hvpipe: Fix the usage of copy_to_user() pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init() pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle() pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace pseries/papr-hvpipe: Fix race with interrupt handler powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module powerpc/pseries/htmdump: Fix the offset value used in htm status dump powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump ...	2026-05-09 08:03:21 -07:00
Linus Torvalds	70390501d1	Merge tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix memory map enumeration bug in the Xen e820 parsing code (Juergen Gross) - Re-enable e820 BIOS fallback if e820 table is empty (David Gow) * tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/e820: Re-enable BIOS fallback if e820 table is empty x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()	2026-05-08 20:28:45 -07:00
Linus Torvalds	e5cf0260a7	Merge tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fixes from Ingo Molnar: - Fix deadlock in the perf_mmap() failure path (Peter Zijlstra) - Intel ACR (Auto Counter Reload) fixes (Dapeng Mi): - Fix validation and configuration of ACR masks - Fix ACR rescheduling bug causing stale masks - Disable the PMI on ACR-enabled hardware - Enable ACR on Panther Cover uarch too * tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Enable auto counter reload for DMR perf/x86/intel: Disable PMI for self-reloaded ACR events perf/x86/intel: Always reprogram ACR events to prevent stale masks perf/x86/intel: Improve validation and configuration of ACR masks perf/core: Fix deadlock in perf_mmap() failure path	2026-05-08 19:39:18 -07:00
Linus Torvalds	27a26ccfd5	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: - ptrace(PTRACE_SETREGSET) fix to zero the target's fpsimd_state rather than the tracer's * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's	2026-05-08 16:18:35 -07:00
Mario Limonciello	db5dadb562	Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn" Some older systems don't support CPPC in the firmware and this just makes noise for them when booting. Drop back to debug. This reverts commit `21fb59ab4b`. Fixes: `21fb59ab4b` ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn") Suggested-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: All applicable <stable@vger.kernel.org> Link: https://patch.msgid.link/20260504230141.484743-2-mario.limonciello@amd.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-05-08 21:14:19 +02:00
Fuad Tabba	effc0a39b8	KVM: arm64: Pre-check vcpu memcache for host->guest donate __pkvm_host_donate_guest() flips the host stage-2 PTE for the donated page to a non-valid annotation via host_stage2_set_owner_metadata_locked() and then calls kvm_pgtable_stage2_map() to install the matching guest stage-2 mapping. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM even at PAGE_SIZE granularity: the donate path verifies PKVM_NOPAGE for the guest IPA before the map, so the walker must allocate fresh page-table pages from the vcpu memcache, and the host controls the vcpu memcache via the topup interface. An under-provisioned donation request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation alongside the existing __host_check_page_state_range() / __guest_check_page_state_range() pre-checks, using the helper introduced for host->guest share. If the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(), return -ENOMEM before any state mutation. Fixes: `1e579adca1` ("KVM: arm64: Introduce __pkvm_host_donate_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:42 +01:00
Fuad Tabba	8234409ffb	KVM: arm64: Pre-check vcpu memcache for host->guest share __pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to install the guest stage-2 mapping, after a forward pass that mutates the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments host_share_guest_count) for every page in the range. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM when the stage-2 walker exhausts the caller's memcache, and the host controls the vcpu memcache via the topup interface, so an under-provisioned share request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation in the existing pre-check pass so that kvm_pgtable_stage2_map() cannot fail at the call site, using kvm_mmu_cache_min_pages() -- the same bound host EL1 uses for its own stage-2 maps. If the vcpu memcache holds fewer pages, return -ENOMEM before any state mutation. Fixes: `d0bd3e6570` ("KVM: arm64: Introduce __pkvm_host_share_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:42 +01:00
Fuad Tabba	5130d450d1	KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache The hypercall handlers call pkvm_refill_memcache() to top up the hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest(). pkvm_ownership_selftest invokes those functions directly with a static selftest_vcpu that has an empty memcache. Seed selftest_vcpu's memcache from the prepopulated selftest pages, leaving the remainder for selftest_vm.pool. Required by the memcache-sufficiency pre-check added in the following patches. Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:41 +01:00
Fuad Tabba	d4d215e5b8	KVM: arm64: Fix __deactivate_fgt macro parameter typo __deactivate_fgt() declares its first parameter as "htcxt" but the body references "hctxt". The parameter is unused; the macro silently captures "hctxt" from the enclosing scope. Both existing callers (__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen to define a local "struct kvm_cpu_context *hctxt", so the macro works by coincidence. A future caller without an "hctxt" local in scope, or naming it differently, would compile but bind to the wrong context. Align the parameter name with the sibling __activate_fgt() macro. The "vcpu" parameter remains unused in the body, kept for API symmetry with __activate_fgt() (which uses it). Fixes: `f5a5a406b4` ("KVM: arm64: Propagate and handle Fine-Grained UNDEF bits") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:41 +01:00
Fuad Tabba	300fac4cc2	KVM: arm64: Guard against NULL vcpu on VHE hyp panic path On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu) on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer is cleared after every guest exit (and is never set when no guest is running), so an unexpected EL2 exception landing in _guest_exit_panic, e.g. via the el2t*_invalid / el2h_irq_invalid vectors - reaches this function with vcpu == NULL. __deactivate_traps() then dereferences vcpu via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv() -> vcpu->arch.features, faulting inside the panic handler and obscuring the original failure. The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c) already guards its vcpu-using cleanup with "if (vcpu)"; mirror that here. sysreg_restore_host_state_vhe() does not depend on vcpu and continues to run unconditionally, preserving panic forensics. The trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's %p handling. Fixes: `6a0259ed29` ("KVM: arm64: Remove hyp_panic arguments") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:41 +01:00
Fuad Tabba	d7396a72ea	KVM: arm64: Make EL2 exception entry and exit context-synchronization events SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI 0487 M.b D24.2.175 (p. D24-9754): - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally a CSE. - FEAT_ExS: the reset value is architecturally UNKNOWN; software must set the bit to make the entry/exit a CSE. INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after MSRs to context-switching system registers (HCR_EL2, ZCR_EL2, ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not architecturally backed unless EOS=1 (and, for entry, EIS=1). Until commit `0a35bd285f` ("arm64: Convert SCTLR_EL2 to sysreg infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON was setting both unconditionally. The conversion made SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only models unconditionally-RES1 fields and EIS/EOS are RES1 only when FEAT_ExS is absent, the auto-generated mask is UL(0). The seven other bits dropped from the old mask (positions 4, 5, 16, 18, 23, 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS are the only bits whose semantics changed for FEAT_ExS hardware and where the kernel relies on the value being 1. Make the guarantee explicit: include SCTLR_ELx_EIS \| SCTLR_ELx_EOS in INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are unconditionally CSEs regardless of whether FEAT_ExS is implemented. This matches the pairing in arch/arm64/kvm/config.c which treats EIS and EOS together as RES1 under !FEAT_ExS. Fixes: `0a35bd285f` ("arm64: Convert SCTLR_EL2 to sysreg infrastructure") Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-07 14:12:41 +01:00
David Gow	5772f65352	x86/boot/e820: Re-enable BIOS fallback if e820 table is empty In commit: `157266edcc` ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables") the check on the number of entries in the e820 table was removed. The intention was to support single-entry maps, but by removing the check entirely, we also skip the fallback (to, e.g., the BIOS 88h function). This means that if no E820 map is passed in from the bootloader (which is the case on some bootloaders, like linld), we end up with an empty memory map, and the kernel fails to boot (either by deadlocking on OOM, or by failing to allocate the real mode trampoline, or similar). Re-instate the check in append_e820_table(), but only check that nr_entries is non-zero. This allows e820__memory_setup_default() to fall back to other memory size sources, and doesn't affect e820__memory_setup_extended(), as the latter ignores the return value from append_e820_table(). In doing so, we also update the return values to be proper error codes, with -ENOENT for this case (there are no entries), and -EINVAL for the case where an entry appears invalid. Given none of the callers check the actual value -- just whether it's nonzero -- this is largely aesthetic in practice. Tested against linld, and the kernel boots again fine. [ mingo: Readability edits to the comment and the changelog. ] Fixes: `157266edcc` ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables") Signed-off-by: David Gow <david@davidgow.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://patch.msgid.link/20260416065746.1896647-1-david@davidgow.net	2026-05-07 10:04:54 +02:00
Linus Torvalds	5862221fdd	Merge tag 'parisc-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: - Revert "parisc: led: fix reference leak on failed device registration" - Fix build failures introduced when allowing to build 32-/64-bit only VDSO - Switch to dynamic parisc root device to avoid upcoming warnings - Fix IRQ leak in LASI driver * tag 'parisc-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix IRQ leak in LASI driver parisc: Fix 64-bit kernel build when CONFIG_COMPAT=n parisc: Fix build failure for 32-bit kernel with PA2.0 instruction set parisc: drivers: switch to dynamic root device Revert "parisc: led: fix reference leak on failed device registration"	2026-05-06 12:51:07 -07:00
Mostafa Saleh	9a624ea3f2	KVM: arm64: Remove potential UB on nvhe tracing clock update Sashiko(locally) reports possiblity of division by zero and out-of-bounds bitwise shift in trace_clock_update(). Although the clock update is untrusted, we should at least have some basic checks to avoid undefined behaviours. Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://patch.msgid.link/20260430103724.2151625-1-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-06 17:09:48 +01:00
Alexandru Elisei	9be19df816	KVM: arm64: Handle permission faults with guest_memfd gmem_abort() calls kvm_pgtable_stage2_map() to make changes to stage 2. It does this for both relaxing permissions on an existing mapping and to install a missing mapping. kvm_pgtable_stage2_map() doesn't make changes to stage 2 if there is an existing, valid entry and the new entry modifies only the permissions. This is checked in: kvm_pgtable_stage2_map() stage2_map_walk_leaf() stage2_map_walker_try_leaf() stage2_pte_needs_update() and if only the permissions differ, kvm_pgtable_stage2_map() returns -EAGAIN and KVM returns to the guest to replay the instruction. The assumption is that a concurrent fault on a different VCPU already mapped the faulting IPA, and replaying the instruction will either succeed, or cause a permission fault, which should be handled with kvm_pgtable_stage2_relax_perms(). gmem_abort(), on a read or write fault on a system without DIC (instruction cache invalidation required for data to instruction coherence), installs a valid entry with read and write permissions, but without executable permissions. On an execution fault on the same page, gmem_abort() attempts to relax the permissions to allow execution, but calls kvm_pgtable_stage2_map() to change the existing, valid, entry. kvm_pgtable_stage2_map() returns -EAGAIN and KVM resumes execution from the faulting instruction, which leads to an infinite loop of permission faults on the same instruction. Allow the guest to make progress by using kvm_pgtable_stage2_relax_perms() to relax permissions. Fixes: `a7b57e0995` ("KVM: arm64: Handle guest_memfd-backed guest page faults") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260505094913.75317-1-alexandru.elisei@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-06 17:08:39 +01:00
Wei-Lin Chang	8d9b9d985a	KVM: arm64: nv: Consider the DS bit when translating TCR_EL2 When running an nVHE L1, TCR_EL2 is mapped to TCR_EL1. Writes to the register are trapped and written to TCR_EL1 after a translation. Booting an nVHE L1 with 52-bit VA isn't working because the translation was ignoring the DS bit set by the guest, hence causing repeating level 0 faults. Add it in the translation function. Signed-off-by: Wei-Lin Chang <weilin.chang@arm.com> Link: https://patch.msgid.link/20260505144735.1496530-1-weilin.chang@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-06 17:08:39 +01:00
James Morse	1f7305d87a	KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests C1-Pro cores with SME have an erratum where TLBI+DSB does not complete all outstanding SME accesses. Instead a DSB needs to be executed on the affected CPUs. The implication is that pages cannot be unmapped from the host Stage 2 and then provided to a protected guest or to the hypervisor. Host SME accesses may still complete after this point. This erratum breaks pKVM's guarantees, and the workaround is hard to implement as EL2 and EL1 share a security state meaning EL1 can mask IPIs sent by EL2, leading to interrupt blackouts. Instead, do this in EL3. This has the advantage of a separate security state, meaning lower EL cannot mask the IPI. It is also simpler for EL3 to know about CPUs that are off or in PSCI's CPU_SUSPEND. Add the needed hook to host_stage2_set_owner_metadata_locked(). This covers the cases where the host loses access to a page: __pkvm_host_donate_guest() __pkvm_guest_unshare_host() host_stage2_set_owner_locked() when owner_id == PKVM_ID_HYP Since pKVM relies on the firmware call for correctness, check for the firmware counterpart during protected KVM initialisation and fail the pKVM initialisation if it is missing. Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oupton@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Sudeep Holla <sudeep.holla@kernel.org> Link: https://patch.msgid.link/20260505165205.2690919-1-catalin.marinas@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-05-06 17:08:39 +01:00
Linus Torvalds	adc1e5c620	Merge tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Fix issues in EFI graceful recovery on x86 introduced by changes to the kernel mode FPU APIs - I-cache coherency fixes for the LoongArch EFI stub - Locking fix for EFI pstore - Code tweak for efivarfs * tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efi: Restore IRQ state in EFI page fault handler x86/efi: Fix graceful fault handling after FPU softirq changes efi/libstub: Synchronize instruction cache after kernel relocation efi/loongarch: Implement efi_cache_sync_image() efi/libstub: Move efi_relocate_kernel() into its only remaining user efi: pstore: Drop efivar lock when efi_pstore_open() returns with an error efivarfs: use QSTR() in efivarfs_alloc_dentry	2026-05-06 07:27:30 -07:00
Breno Leitao	5cbb61bf41	arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's sve_set_common() is the backend for PTRACE_SETREGSET(NT_ARM_SVE) and PTRACE_SETREGSET(NT_ARM_SSVE). Every write in the function operates on the tracee (target) - except a single memset that uses current instead, zeroing the tracer's saved V0-V31 / FPSR / FPCR shadow on every ptrace SETREGSET call. The memset is meant to give the tracee a defined zero register image before the user-supplied payload is copied in (for partial writes, header-only writes, and FPSIMD<->SVE format switches). Aiming it at current both denies the tracee that clean slate and silently corrupts the tracer. The corruption of the tracer's saved FPSIMD state is not always observable. Where the tracer's state is live on a CPU, this may be reused without loading the corrupted state from memory, and will eventually be written back over the corrupted state. Where the tracer's state is saved in SVE_PT_REGS_SVE format, only the FPSR and FPCR are clobbered, and the effective copy of the vectors is in the task's sve_state. Reproducible on an arm64 kernel with SVE: a single-threaded tracer that loads a known pattern into V0-V31, issues PTRACE_SETREGSET(NT_ARM_SVE) on a child, and reads V0-V31 back observes them all zeroed within tens of thousands of iterations when a sibling thread keeps stealing the FPSIMD CPU binding. Fixes: `316283f276` ("arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE") Cc: <stable@vger.kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-05-06 12:11:49 +01:00
Linus Torvalds	e80948062d	Merge tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix some build and runtime issues after 32BIT Kconfig option enabled, improve the platform-specific PCI controller compatibility, drop custom __arch_vdso_hres_capable(), and fix a lot of KVM bugs" * tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Move unconditional delay into timer clear scenery LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software LoongArch: KVM: Move AVEC interrupt injection into switch loop LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte() LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read() LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry LoongArch: KVM: Compile switch.S directly into the kernel LoongArch: vDSO: Drop custom __arch_vdso_hres_capable() LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang() LoongArch: Use per-root-bridge PCIH flag to skip mem resource fixup LoongArch: Fix SYM_SIGFUNC_START definition for 32BIT LoongArch: Specify -m32/-m64 explicitly for 32BIT/64BIT LoongArch: Make CONFIG_64BIT as the default option	2026-05-05 19:44:46 -07:00
Krzysztof Kozlowski	f583bd5f64	powerpc/pasemi: Drop redundant res assignment Return value of pas_add_bridge() is not used, so code can be simplified to fix W=1 clang warnings: arch/powerpc/platforms/pasemi/pci.c:275:6: error: variable 'res' set but not used [-Werror,-Wunused-but-set-variable] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260317130823.240279-4-krzysztof.kozlowski@oss.qualcomm.com	2026-05-06 07:49:19 +05:30
Krzysztof Kozlowski	8333e49160	powerpc/ps3: Drop redundant result assignment Return value of ps3_start_probe_thread() is not used, so code can be simplified to fix W=1 clang warnings: arch/powerpc/platforms/ps3/device-init.c:953:6: error: variable 'result' set but not used [-Werror,-Wunused-but-set-variable] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260317130823.240279-3-krzysztof.kozlowski@oss.qualcomm.com	2026-05-06 07:49:19 +05:30
Nathan Chancellor	60c71369ee	powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang After commit `73cdf24e81` ("powerpc64: make clang cross-build friendly"), building 64-bit little endian + CONFIG_COMPAT=y with clang results in many warnings along the lines of: $ cat arch/powerpc/configs/compat.config CONFIG_COMPAT=y $ make -skj"$(nproc)" ARCH=powerpc LLVM=1 ppc64le_defconfig compat.config arch/powerpc/kernel/vdso/ ... In file included from <built-in>:4: In file included from lib/vdso/gettimeofday.c:6: In file included from include/vdso/datapage.h:15: In file included from include/vdso/cache.h:5: arch/powerpc/include/asm/cache.h:77:8: warning: unknown attribute 'patchable_function_entry' ignored [-Wunknown-attributes] 77 \| static inline u32 l1_icache_bytes(void) \| ^~~~~~ include/linux/compiler_types.h:235:58: note: expanded from macro 'inline' 235 \| #define inline inline __gnu_inline __inline_maybe_unused notrace \| ^~~~~~~ include/linux/compiler_types.h:215:34: note: expanded from macro 'notrace' 215 \| #define notrace __attribute__((patchable_function_entry(0, 0))) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... arch/powerpc/Makefile adds -DCC_USING_PATCHABLE_FUNCTION_ENTRY to KBUILD_CPPFLAGS, which is inherited by the 32-bit vDSO. However, the 32-bit little endian target does not support '-fpatchable-function-entry', resulting in the warnings above. Remove -DCC_USING_PATCHABLE_FUNCTION_ENTRY from the 32-bit vDSO flags when building with clang to avoid the warnings. Fixes: `73cdf24e81` ("powerpc64: make clang cross-build friendly") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260311-ppc-vdso-drop-cc-using-pfe-define-clang-v1-1-66c790e22650@kernel.org	2026-05-06 07:45:59 +05:30
Thomas Zimmermann	4052b93204	arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files CONFIG_FIRMWARE_EDID=y depends on X86 or EFI_GENERIC_STUB. Neither is true here, so drop the lines from the defconfig files. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260401083023.214426-1-tzimmermann@suse.de	2026-05-06 07:41:15 +05:30
Shivani Nittor	131717e656	powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events The core-book3s PMU sampling code validates the SIER TYPE field when PERF_SAMPLE_DATA_SRC is requested. The SIER TYPE field indicates the instruction type and is only valid for random sampling (marked events). To handle cases observed where SIER TYPE could be zero even for marked events,validation was added to drop such samples and increment event->lost_samples. However, this validation was applied to all samples, including continuous sampling. In continuous sampling mode, the PMU does not set the SIER TYPE field, so it remains zero. As a result, valid continuous samples were incorrectly treated as invalid and dropped. Fixed this by gating the SIER TYPE validation with mark_event, so the check runs only for marked (random) events. Continuous samples now skip this check and are recorded normally in the final data recording path. Fixes: `2ffb26afa6` ("arch/powerpc/perf: Check the instruction type before creating sample with perf_mem_data_src") Signed-off-by: Shivani Nittor <shivani@linux.ibm.com> Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Reviewed-by: Athira Rajeev <atrajeev@linux.ibm.com> [Maddy: Fixed reviewed-by tag] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260421150628.96500-1-shivani@linux.ibm.com	2026-05-06 07:38:10 +05:30
Christophe Leroy (CS GROUP)	da107152c4	powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16() Allthough fsl,cpm1-gpio-irq-mask always contains a 16 bits value, it is a standard u32 OF property as documented in Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt The driver erroneously uses of_property_read_u16() leading to a mask which is always 0. Fix it by using of_property_read_u32() instead. Fixes: `726bd22310` ("powerpc/8xx: Adding support of IRQ in MPC8xx GPIO") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/bb0b6d6c4543238c38d5d29a776d0674a8c0c180.1776752750.git.chleroy@kernel.org	2026-05-06 07:35:19 +05:30
Sourabh Jain	38e989d504	powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec The kexec sequence invokes enter_vmx_ops() via copy_page() with the MMU disabled. In this context, code must not rely on normal virtual address translations or trigger page faults. With KASAN enabled, functions get instrumented and may access shadow memory using regular address translation. When executed with the MMU off, this can lead to page faults (bad_page_fault) from which the kernel cannot recover in the kexec path, resulting in a hang. The kexec path sets preempt_count to HARDIRQ_OFFSET before entering the MMU-off copy sequence. current_thread_info()->preempt_count = HARDIRQ_OFFSET kexec_sequence(..., copy_with_mmu_off = 1) -> kexec_copy_flush(image) copy_segments() -> copy_page(dest, addr) bl enter_vmx_ops() if (in_interrupt()) return 0 beq .Lnonvmx_copy Since kexec sets preempt_count to HARDIRQ_OFFSET, in_interrupt() evaluates to true and enter_vmx_ops() returns early. As in_interrupt() (and preempt_count()) are always inlined, mark enter_vmx_ops() with __no_sanitize_address to avoid KASAN instrumentation and shadow memory access with MMU disabled, helping kexec boot fine with KASAN enabled. Reported-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260407124349.1698552-2-sourabhjain@linux.ibm.com	2026-05-06 07:31:28 +05:30
Sourabh Jain	b3a97f9484	powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o KASAN instrumentation is intended to be disabled for the kexec core code, but the existing Makefile entry misses the object suffix. As a result, the flag is not applied correctly to core_$(BITS).o. So when KASAN is enabled, kexec_copy_flush and copy_segments in kexec/core_64.c are instrumented, which can result in accesses to shadow memory via normal address translation paths. Since these run with the MMU disabled, such accesses may trigger page faults (bad_page_fault) that cannot be handled in the kdump path, ultimately causing a hang and preventing the kdump kernel from booting. The same is true for kexec as well, since the same functions are used there. Update the entry to include the “.o” suffix so that KASAN instrumentation is properly disabled for this object file. Fixes: `2ab2d5794f` ("powerpc/kasan: Disable address sanitization in kexec paths") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/1dee8891-8bcc-46b4-93f3-fc3a774abd5b@linux.ibm.com/ Cc: stable@vger.kernel.org Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260407124349.1698552-1-sourabhjain@linux.ibm.com	2026-05-06 07:31:27 +05:30
Ritesh Harjani (IBM)	629d1a901d	pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ() While at it let's also fix the similar style issue in enable_hvpipe_IRQ() function. This also fixes a minor checkpatch warning which I got due to an extra space before " ==". Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/1174f60d0ae128e773dbefd11dd8d46d69e7f50e.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:25 +05:30
Ritesh Harjani (IBM)	fe53d2ae82	pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg() Simplify hvpipe_rtas_recv_msg() by removing three levels of nesting... if (!ret) if (buf) if (size < bytes_written) ... this refactoring of the function bails out to "out:" label first, in case of any error. This simplifies the init flow. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/bbe7ddf8b8e25c9be8fc5e2c4aea9e5fca128bf4.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:25 +05:30
Ritesh Harjani (IBM)	4e2d83c804	pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info We don't really use task_struct pointer for anything meaningful. So just kill it for now, and we can bring back later if we need this for any future debug purposes. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/895e061e45cdc95db36fa7f27aa1922b81eed867.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:25 +05:30
Ritesh Harjani (IBM)	2eeac57748	pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release() Once the src_info is removed from the global list, no one can access it. This simplies the usage of spin_unlock_irqrestore() in papr_hvpipe_handle_release() Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/4a980331557af3d10aada8576aaa16cddc691c65.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:25 +05:30
Ritesh Harjani (IBM)	d48654bd8b	pseries/papr-hvpipe: Fix the usage of copy_to_user() copy_to_user() return bytes_not_copied to the user buffer. If there was an error writing bytes into the user buffer, i.e. if copy_to_user returns a non-zero value, then we should simply return -EFAULT from the ->read() call. Otherwise, in the non-patched version, we may end up mixing "bytes_not_copied + bytes_copied (HVPIPE_HDR_LEN)" as the return value to the user in ->read() call Also let's make sure we clear the hvpipe_status flag, if we have consumed the hvpipe msg by making the rtas call. ret = -EFAULT means copy_to_user has failed but that still means that the msg was read from the hvpipe, hence for both cases, success & -EFAULT, we should clear the HVPIPE_MSG_AVAILABLE flag in hvpipe_status. Cc: stable@vger.kernel.org Fixes: `cebdb522fd` ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8fda3212a1ad48879c174e92f67472d9b9f1c3b7.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:24 +05:30
Ritesh Harjani (IBM)	713e468cdb	pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init() Remove such 3 levels of nesting patterns to check success return values from function calls. ret = enable_hvpipe_IRQ() if (!ret) ret = set_hvpipe_sys_param(1) if (!ret) ret = misc_register() Instead just bail out to "out*:" labels, in case of any error. This simplifies the init flow. While at it let's also fix the following error handling logic: We have already enabled interrupt sources and enabled hvpipe to received interrupts, if misc_register() fails, we will destroy the workqueue, but the HMC might send us a msg via hvpipe which will call, queue work on the workqueue which might be destroyed. So instead, let's reverse the order of enabling set_hvpipe_sys_param(1) and in case of an error let's remove the misc dev by calling misc_deregister(). Cc: stable@vger.kernel.org Fixes: `39a08a4f94` ("powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/f2141eafb80e7780395e03aa9a22e8a37be80513.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:24 +05:30
Ritesh Harjani (IBM)	1b9f7aafa4	pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle() commit `6d3789d347` ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()"), changed the create handle to FD_PREPARE(), but it caused kernel null-ptr-deref because after call to retain_and_null_ptr(src_info), src_info is re-used for adding it to the global list. Getting the following kernel panic in papr_hvpipe_dev_create_handle() when trying to add src_info to the list. Kernel attempted to write user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on write at 0x00000000 Faulting instruction address: 0xc0000000001b44a0 Oops: Kernel access of bad area, sig: 11 [#1] ... Call Trace: papr_hvpipe_dev_ioctl+0x1f4/0x48c (unreliable) sys_ioctl+0x528/0x1064 system_call_exception+0x128/0x360 system_call_vectored_common+0x15c/0x2ec Now, the error handling with FD_PREPARE's file cleanup and __free(kfree) auto cleanup is getting too convoluted. This is mainly because we need to ensure only 1 user get the srcID handle. To simplify this, we allocate prepare the src_info in the beginning and add it to the global list under a spinlock after checking that no duplicates exist. This simplify the error handling where if the FD_ADD fails, we can simply remove the src_info from the list and consume any pending msg in hvpipe to be cleared, after src_info became visible in the global list. Cc: stable@vger.kernel.org Fixes: `6d3789d347` ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()") Reported-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/31ad94bc89d44156ee700c5bd006cb47a748e3cb.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:24 +05:30
Ritesh Harjani (IBM)	cefeed4429	pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace The hdr variable is allocated on the stack and only hdr.version and hdr.flags are initialized explicitly. Because the struct papr_hvpipe_hdr contains reserved padding bytes (reserved[3] and reserved2[40]), these could leak the uninitialized bytes to userspace after copy_to_user(). This patch fixes that by initializing the whole struct to 0. Cc: stable@vger.kernel.org Fixes: `cebdb522fd` ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/7bfe03b65a282c856ed8182d1871bb973c0b78f2.1777606826.git.ritesh.list@gmail.com	2026-05-06 07:30:24 +05:30

1 2 3 4 5 ...

244176 Commits