linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 21:52:29 -04:00

Author	SHA1	Message	Date
Linus Torvalds	ac354b5cb0	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "s390: - Lots of small and not-so-small fixes for the newly rewritten gmap, mostly affecting the handling of nested guests. x86: - Fix an issue with shadow paging, which causes KVM to install an MMIO PTE in the shadow page tables without first zapping a non-MMIO SPTE if KVM didn't see the write that modified the shadowed guest PTE. While commit `a54aa15c6b` ("KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte()") was right about it being impossible to miss such a write if it was coming from the guest, it failed to account for writes to guest memory that are outside the scope of KVM: if userspace modifies the guest PTE, and then the guest hits a relevant page fault, KVM will get confused" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Only WARN in direct MMUs when overwriting shadow-present SPTE KVM: x86/mmu: Drop/zap existing present SPTE even when creating an MMIO SPTE KVM: s390: Fix KVM_S390_VCPU_FAULT ioctl KVM: s390: vsie: Fix guest page tables protection KVM: s390: vsie: Fix unshadowing while shadowing KVM: s390: vsie: Fix refcount overflow for shadow gmaps KVM: s390: vsie: Fix nested guest memory shadowing KVM: s390: Correctly handle guest mappings without struct page KVM: s390: Fix gmap_link() KVM: s390: vsie: Fix check for pre-existing shadow mapping KVM: s390: Remove non-atomic dat_crstep_xchg() KVM: s390: vsie: Fix dat_split_ste()	2026-03-29 11:58:47 -07:00
Linus Torvalds	e522b75c44	Merge tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Add array_index_nospec() to syscall dispatch table lookup to prevent limited speculative out-of-bounds access with user-controlled syscall number - Mark array_index_mask_nospec() __always_inline since GCC may emit an out-of-line call instead of the inline data dependency sequence the mitigation relies on - Clear r12 on kernel entry to prevent potential speculative use of user value in system_call, ext/io/mcck interrupt handlers * tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/entry: Scrub r12 register on kernel entry s390/syscalls: Add spectre boundary for syscall dispatch table s390/barrier: Make array_index_mask_nospec() __always_inline	2026-03-28 09:50:11 -07:00
Vasily Gorbik	0738d395aa	s390/entry: Scrub r12 register on kernel entry Before commit `f33f2d4c7c` ("s390/bp: remove TIF_ISOLATE_BP"), all entry handlers loaded r12 with the current task pointer (lg %r12,__LC_CURRENT) for use by the BPENTER/BPEXIT macros. That commit removed TIF_ISOLATE_BP, dropping both the branch prediction macros and the r12 load, but did not add r12 to the register clearing sequence. Add the missing xgr %r12,%r12 to make the register scrub consistent across all entry points. Fixes: `f33f2d4c7c` ("s390/bp: remove TIF_ISOLATE_BP") Cc: stable@kernel.org Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-28 00:43:39 +01:00
Greg Kroah-Hartman	48b8814e25	s390/syscalls: Add spectre boundary for syscall dispatch table The s390 syscall number is directly controlled by userspace, but does not have an array_index_nospec() boundary to prevent access past the syscall function pointer tables. Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Fixes: `56e62a7370` ("s390: convert to generic entry") Cc: stable@kernel.org Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/2026032404-sterling-swoosh-43e6@gregkh Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-28 00:43:39 +01:00
Vasily Gorbik	c5c0a268b3	s390/barrier: Make array_index_mask_nospec() __always_inline Mark array_index_mask_nospec() as __always_inline to guarantee the mitigation is emitted inline regardless of compiler inlining decisions. Fixes: `e2dd833389` ("s390: add optimized array_index_mask_nospec") Cc: stable@kernel.org Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-28 00:43:24 +01:00
Claudio Imbrenda	0a28e06575	KVM: s390: Fix KVM_S390_VCPU_FAULT ioctl A previous commit changed the behaviour of the KVM_S390_VCPU_FAULT ioctl. The current (wrong) implementation will trigger a guest addressing exception if the requested address lies outside of a memslot, unless the VM is UCONTROL. Restore the previous behaviour by open coding the fault-in logic. Fixes: `3762e905ec` ("KVM: s390: use __kvm_faultin_pfn()") Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:38 +01:00
Claudio Imbrenda	a12cc7e3d6	KVM: s390: vsie: Fix guest page tables protection When shadowing, the guest page tables are write-protected, in order to trap changes and properly unshadow the shadow mapping for the nested guest. Already shadowed levels are skipped, so that only the needed levels are write protected. Currently the levels that get write protected are exactly one level too deep: the last level (nested guest memory) gets protected in the wrong way, and will be protected again correctly a few lines afterwards; most importantly, the highest non-shadowed level does not get write protected. Moreover, if the nested guest is running in a real address space, there are no DAT tables to shadow. Write protect the correct levels, so that all the levels that need to be protected are protected, and avoid double protecting the last level; skip attempting to shadow the DAT tables when the nested guest is running in a real address space. Fixes: `e38c884df9` ("KVM: s390: Switch to new gmap") Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:34 +01:00
Claudio Imbrenda	19d6c5b804	KVM: s390: vsie: Fix unshadowing while shadowing If shadowing causes the shadow gmap to get unshadowed, exit early to prevent an attempt to dereference the parent pointer, which at this point is NULL. Opportunistically add some more checks to prevent NULL parents. Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Fixes: `e5f98a6899` ("KVM: s390: Add some helper functions needed for vSIE") Fixes: `e38c884df9` ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:30 +01:00
Claudio Imbrenda	0ec456b8a5	KVM: s390: vsie: Fix refcount overflow for shadow gmaps In most cases gmap_put() was not called when it should have. Add the missing gmap_put() in vsie_run(). Fixes: `e38c884df9` ("KVM: s390: Switch to new gmap") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:25 +01:00
Claudio Imbrenda	fd7bc612cf	KVM: s390: vsie: Fix nested guest memory shadowing Fix _do_shadow_pte() to use the correct pointer (guest pte instead of nested guest) to set up the new pte. Add a check to return -EOPNOTSUPP if the mapping for the nested guest is writeable but the same page in the guest is only read-only. Fixes: `e38c884df9` ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:21 +01:00
Claudio Imbrenda	0f2b760a17	KVM: s390: Correctly handle guest mappings without struct page Introduce a new special softbit for large pages, like already presend for normal pages, and use it to mark guest mappings that do not have struct pages. Whenever a leaf DAT entry becomes dirty, check the special softbit and only call SetPageDirty() if there is an actual struct page. Move the logic to mark pages dirty inside _gmap_ptep_xchg() and _gmap_crstep_xchg_atomic(), to avoid needlessly duplicating the code. Fixes: `5a74e3d934` ("KVM: s390: KVM-specific bitfields and helper functions") Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:18 +01:00
Claudio Imbrenda	45921d0212	KVM: s390: Fix gmap_link() The slow path of the fault handler ultimately called gmap_link(), which assumed the fault was a major fault, and blindly called dat_link(). In case of minor faults, things were not always handled properly; in particular the prefix and vsie marker bits were ignored. Move dat_link() into gmap.c, renaming it accordingly. Once moved, the new _gmap_link() function will be able to correctly honour the prefix and vsie markers. This will cause spurious unshadows in some uncommon cases. Fixes: `94fd9b16cc` ("KVM: s390: KVM page table management functions: lifecycle management") Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:13 +01:00
Claudio Imbrenda	6f93d1ed6f	KVM: s390: vsie: Fix check for pre-existing shadow mapping When shadowing a nested guest, a check is performed and no shadowing is attempted if the nested guest is already shadowed. The existing check was incomplete; fix it by also checking whether the leaf DAT table entry in the existing shadow gmap has the same protection as the one specified in the guest DAT entry. Fixes: `e38c884df9` ("KVM: s390: Switch to new gmap") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:07 +01:00
Claudio Imbrenda	b827ef02f4	KVM: s390: Remove non-atomic dat_crstep_xchg() In practice dat_crstep_xchg() is racy and hard to use correctly. Simply remove it and replace its uses with dat_crstep_xchg_atomic(). This solves some actual races that lead to system hangs / crashes. Opportunistically fix an alignment issue in _gmap_crstep_xchg_atomic(). Fixes: `589071eaaa` ("KVM: s390: KVM page table management functions: clear and replace") Fixes: `94fd9b16cc` ("KVM: s390: KVM page table management functions: lifecycle management") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:12:03 +01:00
Claudio Imbrenda	0f54755343	KVM: s390: vsie: Fix dat_split_ste() If the guest misbehaves and puts the page tables for its nested guest inside the memory of the nested guest itself, and the guest and nested guest are being mapped with large pages, the shadow mapping will lose synchronization with the actual mapping, since this will cause the large page with the vsie notification bit to be split, but the vsie notification bit will not be propagated to the resulting small pages. Fix this by propagating the vsie_notif bit from large pages to normal pages when splitting a large page. Fixes: `2db149a0a6` ("KVM: s390: KVM page table management functions: walks") Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-26 16:11:58 +01:00
Paolo Bonzini	12fd965871	Merge tag 'kvm-s390-master-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for 7.0 - fix deadlock in new memory management - handle kernel faults on donated memory properly - fix bounds checking for irq routing + selftest - fix invalid machine checks + logging	2026-03-24 17:32:13 +01:00
Christian Borntraeger	ab5119735e	KVM: s390: vsie: Avoid injecting machine check on signal The recent XFER_TO_GUEST_WORK change resulted in a situation, where the vsie code would interpret a signal during work as a machine check during SIE as both use the EINTR return code. The exit_reason of the sie64a function has nothing to do with the kvm_run exit_reason. Rename it and define a specific code for machine checks instead of abusing -EINTR. rename exit_reason into sie_return to avoid the naming conflict and change the code flow in vsie.c to have a separate variable for rc and sie_return. Fixes: `2bd1337a12` ("KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-16 16:56:39 +01:00
Christian Borntraeger	1ca90f4ae5	KVM: s390: log machine checks more aggressively KVM will reinject machine checks that happen during guest activity. From a host perspective this machine check is no longer visible and even for the guest, the guest might decide to only kill a userspace program or even ignore the machine check. As this can be a disruptive event nevertheless, we should log this not only in the VM debug event (that gets lost after guest shutdown) but also on the global KVM event as well as syslog. Consolidate the logging and log with loglevel 2 and higher. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>	2026-03-16 16:56:39 +01:00
Janosch Frank	dcf96f7ad5	KVM: s390: Limit adapter indicator access to mapped page While we check the address for errors, we don't seem to check the bit offsets and since they are 32 and 64 bits a lot of memory can be reached indirectly via those offsets. Fixes: `8422359877` ("KVM: s390: irq routing for adapter interrupts.") Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2026-03-16 16:56:39 +01:00
Janosch Frank	b00be77302	s390/mm: Add missing secure storage access fixups for donated memory There are special cases where secure storage access exceptions happen in a kernel context for pages that don't have the PG_arch_1 bit set. That bit is set for non-exported guest secure storage (memory) but is absent on storage donated to the Ultravisor since the kernel isn't allowed to export donated pages. Prior to this patch we would try to export the page by calling arch_make_folio_accessible() which would instantly return since the arch bit is absent signifying that the page was already exported and no further action is necessary. This leads to secure storage access exception loops which can never be resolved. With this patch we unconditionally try to export and if that fails we fixup. Fixes: `084ea4d611` ("s390/mm: add (non)secure page access exceptions handlers") Reported-by: Heiko Carstens <hca@linux.ibm.com> Suggested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2026-03-16 16:56:28 +01:00
Linus Torvalds	11e8c7e947	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones. ARM: - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start EOIcount deactivation walk after the last irq that made it into an LR - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only thhis isn't possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context - Address yet another syzkaller finding in the vgic initialisation, where we would end-up destroying an uninitialised vgic with nasty consequences - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults PPC: - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart RISC-V: - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access, float register access, and PMU counter access - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr() - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei() - Fix off-by-one array access in SBI PMU - Skip THP support check during dirty logging - Fix error code returned for Smstateen and Ssaia ONE_REG interface - Check host Ssaia extension when creating AIA irqchip x86: - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls - Fix a brown paper bug in add_atomic_switch_msr() - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level) - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors who want to freeze PMCs on VM-Entry - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks Generic: - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive Selftests: - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8 Documentation: kvm: fix formatting of the quirks table KVM: x86: clarify leave_smm() return value selftests: kvm: add a test that VMX validates controls on RSM selftests: kvm: extract common functionality out of smm_test.c KVM: SVM: check validity of VMCB controls when returning from SMM KVM: VMX: check validity of VMCS controls when returning from SMM KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers() KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr() KVM: x86: hyper-v: Validate all GVAs during PV TLB flush KVM: x86: synthesize CPUID bits only if CPU capability is set KVM: PPC: e500: Rip out "struct tlbe_ref" KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type KVM: selftests: Increase 'maxnode' for guest_memfd tests KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2() ...	2026-03-15 12:22:10 -07:00
Linus Torvalds	8d9968859c	Merge tag 's390-7.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Revert IRQ entry/exit path optimization that incorrectly cleared some PSW bits before irqentry_exit(), causing boot failures with linux-next and HRTIMER_REARM_DEFERRED (which only uncovered the problem) - Fix zcrypt code to show CCA card serial numbers even when the default crypto domain is offline by selecting any domain available, preventing empty sysfs entries * tag 's390-7.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Enable AUTOSEL_DOM for CCA serialnr sysfs attribute s390: Revert "s390/irq/idle: Remove psw bits early"	2026-03-13 14:18:13 -07:00
Paolo Bonzini	94fe3e6515	Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into HEAD KVM generic changes for 7.0 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end. - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive.	2026-03-11 18:01:55 +01:00
Heiko Carstens	75aa996ea6	s390: Revert "s390/irq/idle: Remove psw bits early" This reverts commit `d8b5cf9c63`. Mikhail Zaslonko reported that linux-next doesn't boot anymore [2]. Reason for this is recent change [2] was supposed to slightly optimize the irq entry/exit path by removing some psw bits early in case of an idle exit. This however is incorrect since irqentry_exit() requires the correct old psw state at irq entry. Otherwise the embedded regs_irqs_disabled() will not provide the correct result. With linux-next and HRTIMER_REARM_DEFERRED this leads to the observed boot problems, however the commit is broken in any case. Revert the commit which introduced this. Thanks to Peter Zijlstra for pointing out that this is a bug in the s390 entry code. Fixes: `d8b5cf9c63` ("s390/irq/idle: Remove psw bits early") [1] Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Closes: https://lore.kernel.org/r/af549a19-db99-4b16-8511-bf315177a13e@linux.ibm.com/ [2] Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260306111919.362559-1-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-07 22:41:10 +01:00
Linus Torvalds	4ae12d8bd9	Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Split out .modinfo section from ELF_DETAILS macro, as that macro may be used in other areas that expect to discard .modinfo, breaking certain image layouts - Adjust genksyms parser to handle optional attributes in certain declarations, necessary after commit `07919126ec` ("netfilter: annotate NAT helper hook pointers with __rcu") - Include resolve_btfids in external module build created by scripts/package/install-extmod-build when it may be run on external modules - Avoid removing objtool binary with 'make clean', as it is required for external module builds * tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Leave objtool binary around with 'make clean' kbuild: install-extmod-build: Package resolve_btfids if necessary genksyms: Fix parsing a declarator with a preceding attribute kbuild: Split .modinfo out from ELF_DETAILS	2026-03-06 20:27:13 -08:00
Claudio Imbrenda	f303406efd	KVM: s390: Fix a deadlock In some scenarios, a deadlock can happen, involving _do_shadow_pte(). Convert all usages of pgste_get_lock() to pgste_get_trylock() in _do_shadow_pte() and return -EAGAIN. All callers can already deal with -EAGAIN being returned. Fixes: `e38c884df9` ("KVM: s390: Switch to new gmap") Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-03-06 12:41:28 +01:00
Heiko Carstens	674c5ff0f4	s390/stackleak: Fix __stackleak_poison() inline assembly constraint The __stackleak_poison() inline assembly comes with a "count" operand where the "d" constraint is used. "count" is used with the exrl instruction and "d" means that the compiler may allocate any register from 0 to 15. If the compiler would allocate register 0 then the exrl instruction would not or the value of "count" into the executed instruction - resulting in a stackframe which is only partially poisoned. Use the correct "a" constraint, which excludes register 0 from register allocation. Fixes: `2a405f6bb3` ("s390/stackleak: provide fast __stackleak_poison() implementation") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-4-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-03 16:42:14 +01:00
Heiko Carstens	87ff6da300	s390/xor: Improve inline assembly constraints The inline assembly constraint for the "bytes" operand is "d" for all xor() inline assemblies. "d" means that any register from 0 to 15 can be used. If the compiler would use register 0 then the exrl instruction would not or the value of "bytes" into the executed instruction - resulting in an incorrect result. However all the xor() inline assemblies make hard-coded use of register 0, and it is correctly listed in the clobber list, so that this cannot happen. Given that this is quite subtle use the better "a" constraint, which excludes register 0 from register allocation in any case. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-03 16:42:14 +01:00
Heiko Carstens	f775276edc	s390/xor: Fix xor_xc_2() inline assembly constraints The inline assembly constraints for xor_xc_2() are incorrect. "bytes", "p1", and "p2" are input operands, while all three of them are modified within the inline assembly. Given that the function consists only of this inline assembly it seems unlikely that this may cause any problems, however fix this in any case. Fixes: `2cfc5f9ce7` ("s390/xor: optimized xor routing using the XC instruction") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-2-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-03 16:42:14 +01:00
Vasily Gorbik	5f25805303	s390/xor: Fix xor_xc_5() inline assembly xor_xc_5() contains a larl 1,2f that is not used by the asm and is not declared as a clobber. This can corrupt a compiler-allocated value in %r1 and lead to miscompilation. Remove the instruction. Fixes: `745600ed69` ("s390/lib: Use exrl instead of ex in xor functions") Cc: stable@vger.kernel.org Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-03-03 16:42:14 +01:00
Linus Torvalds	949d0a46ad	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Arm: - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent - Don't leak stage-2 mappings in protected mode - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... [his words, not mine] - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling - Fix slightly abusive const-ification in vgic_set_kvm_info() Generic: - Remove internal Kconfigs that are now set on all architectures - Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all architectures finally enable it in Linux 7.0" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: always define KVM_CAP_SYNC_MMU KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER KVM: arm64: Deduplicate ASID retrieval code irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs KVM: arm64: Fix protected mode handling of pages larger than 4kB KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state() KVM: arm64: Fix ID register initialization for non-protected pKVM guests KVM: arm64: Optimise away S1POE handling when not supported by host KVM: arm64: Hide S1POE from guests when not supported by the host	2026-03-01 15:34:47 -08:00
Paolo Bonzini	70295a479d	KVM: always define KVM_CAP_SYNC_MMU KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always available. Move the definition from individual architectures to common code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-02-28 15:31:35 +01:00
Paolo Bonzini	407fd8b8d8	KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER All architectures now use MMU notifier for KVM page table management. Remove the Kconfig symbol and the code that is used when it is disabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-02-28 15:31:35 +01:00
Nathan Chancellor	8678591b47	kbuild: Split .modinfo out from ELF_DETAILS Commit `3e86e4d74c` ("kbuild: keep .modinfo section in vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and ELF_DETAILS was present in all architecture specific vmlinux linker scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and COMMON_DISCARDS may be used by other linker scripts, such as the s390 and x86 compressed boot images, which may not expect to have a .modinfo section. In certain circumstances, this could result in a bootloader failing to load the compressed kernel [1]. Commit `ddc6cbef3e` ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer") recently addressed this for the s390 bzImage but the same bug remains for arm, parisc, and x86. The presence of .modinfo in the x86 bzImage was the root cause of the issue worked around with commit `d50f210913` ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition. Split .modinfo out from ELF_DETAILS into its own macro and handle it in all vmlinux linker scripts. Discard .modinfo in the places where it was previously being discarded from being in COMMON_DISCARDS, as it has never been necessary in those uses. Cc: stable@vger.kernel.org Fixes: `3e86e4d74c` ("kbuild: keep .modinfo section in vmlinux.unstripped") Reported-by: Ed W <lists@wildgooses.com> Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1] Tested-by: Ed W <lists@wildgooses.com> # x86_64 Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2026-02-26 11:50:19 -07:00
Alexander Gordeev	d879ac6756	s390/pfault: Fix virtual vs physical address confusion When Linux is running as guest, runs a user space process and the user space process accesses a page that the host has paged out, the guest gets a pfault interrupt and schedules a different process. Without this mechanism the host would have to suspend the whole virtual CPU until the page has been paged in. To setup the pfault interrupt the real address of parameter list should be passed to DIAGNOSE 0x258, but a virtual address is passed instead. That has a performance impact, since the pfault setup never succeeds, the interrupt is never delivered to a guest and the whole virtual CPU is suspended as result. Cc: stable@vger.kernel.org Fixes: `c98d2ecae0` ("s390/mm: Uncouple physical vs virtual address spaces") Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 17:00:25 +01:00
Vasily Gorbik	1623a554c6	s390/kexec: Disable stack protector in s390_reset_system() s390_reset_system() calls set_prefix(0), which switches back to the absolute lowcore. At that point the stack protector canary no longer matches the canary from the lowcore the function was entered with, so the stack check fails. Mark s390_reset_system() __no_stack_protector. This is safe here since its callers (__do_machine_kdump() and __do_machine_kexec()) are effectively no-return and fall back to disabled_wait() on failure. Fixes: `f5730d44e0` ("s390: Add stackprotector support") Reported-by: Nikita Dubrovskii <nikita@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 17:00:25 +01:00
Heiko Carstens	acbc0437ca	s390/idle: Remove psw_idle() prototype psw_idle() does not exist anymore. Remove its prototype. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	80f63fd09e	s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON() Use lockdep_assert_irqs_disabled() instead of BUG_ON(). This avoids crashing the kernel, and generates better code if CONFIG_PROVE_LOCKING is disabled. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	e282ccd308	s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE() do_account_vtime() runs always with interrupts disabled, therefore use __this_cpu_read() instead of this_cpu_read() to get rid of a pointless preempt_disable() / preempt_enable() pair. Also there are no concurrent writers to the cpu time accounting fields in lowcore. Therefore get rid of READ_ONCE() usages. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	d8b5cf9c63	s390/irq/idle: Remove psw bits early Remove wait, io, external interrupt bits early in do_io_irq()/do_ext_irq() when previous context was idle. This saves one conditional branch and is closer to the original old assembly code. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	257c14e5a1	s390/idle: Inline update_timer_idle() Inline update_timer_idle() again to avoid an extra function call. This way the generated code is close to old assembler version again. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	00d8b035eb	s390/idle: Slightly optimize idle time accounting Slightly optimize account_idle_time_irq() and update_timer_idle(): - Use fast single instruction __atomic64() primitives to update per cpu idle_time and idle_count, instead of READ_ONCE() / WRITE_ONCE() pairs - stcctm() is an inline assembly with a full memory barrier. This leads to a not necessary extra dereference of smp_cpu_mtid in update_timer_idle(). Avoid this and read smp_cpu_mtid into a variable - Use __this_cpu_add() instead of this_cpu_add() to avoid disabling / enabling of preemption several times in a loop in update_timer_idle(). Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	aefa6ec890	s390/idle: Add comment for non obvious code Add a comment to update_timer_idle() which describes why wall time (not steal time) is added to steal_timer. This is not obvious and was reported by Frederic Weisbecker. Reported-by: Frederic Weisbecker <frederic@kernel.org> Closes: https://lore.kernel.org/all/aXEVM-04lj0lntMr@localhost.localdomain/ Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	dbc0fb3567	s390/vtime: Fix virtual timer forwarding Since delayed accounting of system time [1] the virtual timer is forwarded by do_account_vtime() but also vtime_account_kernel(), vtime_account_softirq(), and vtime_account_hardirq(). This leads to double accounting of system, guest, softirq, and hardirq time. Remove accounting from the vtime_account*() family to restore old behavior. There is only one user of the vtimer interface, which might explain why nobody noticed this so far. Fixes: `b7394a5f4c` ("sched/cputime, s390: Implement delayed accounting of system time") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Heiko Carstens	0d785e2c32	s390/idle: Fix cpu idle exit cpu time accounting With the conversion to generic entry [1] cpu idle exit cpu time accounting was converted from assembly to C. This introduced an reversed order of cpu time accounting. On cpu idle exit the current accounting happens with the following call chain: -> do_io_irq()/do_ext_irq() -> irq_enter_rcu() -> account_hardirq_enter() -> vtime_account_irq() -> vtime_account_kernel() vtime_account_kernel() accounts the passed cpu time since last_update_timer as system time, and updates last_update_timer to the current cpu timer value. However the subsequent call of -> account_idle_time_irq() will incorrectly subtract passed cpu time from timer_idle_enter to the updated last_update_timer value from system_timer. Then last_update_timer is updated to a sys_enter_timer, which means that last_update_timer goes back in time. Subsequently account_hardirq_exit() will account too much cpu time as hardirq time. The sum of all accounted cpu times is still correct, however some cpu time which was previously accounted as system time is now accounted as hardirq time, plus there is the oddity that last_update_timer goes back in time. Restore previous behavior by extracting cpu time accounting code from account_idle_time_irq() into a new update_timer_idle() function and call it before irq_enter_rcu(). Fixes: `56e62a7370` ("s390: convert to generic entry") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2026-02-25 16:46:07 +01:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	9806790115	Merge tag 's390-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - Make KEXEC_SIG available again for CONFIG_MODULES=n - The s390 topology code used to call rebuild_sched_domains() before common code scheduling domains were setup. This was silently ignored by common code, but now results in a warning. Address by avoiding the early call - Convert debug area lock from spinlock to raw spinlock to address lockdep warnings - The recent 3490 tape device driver rework resulted in a different device driver name, which is visible via sysfs for user space. This breaks at least one user space application. Change the device driver name back to its old name to fix this * tag 's390-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/tape: Fix device driver name s390/debug: Convert debug area lock from a spinlock to a raw spinlock s390/smp: Avoid calling rebuild_sched_domains() early s390/kexec: Make KEXEC_SIG available when CONFIG_MODULES=n	2026-02-20 09:24:45 -08:00

1 2 3 4 5 ...

11242 Commits