linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-17 16:00:38 -05:00

Author	SHA1	Message	Date
Zi Yan	77008e1b2e	mm/huge_memory: do not change split_huge_page() target order silently Page cache folios from a file system that support large block size (LBS) can have minimal folio order greater than 0, thus a high order folio might not be able to be split down to order-0. Commit `e220917fa5` ("mm: split a folio in minimum folio order chunks") bumps the target order of split_huge_page() to the minimum allowed order when splitting a LBS folio. This causes confusion for some split_huge_page*() callers like memory failure handling code, since they expect after-split folios all have order-0 when split succeeds but in reality get min_order_for_split() order folios and give warnings. Fix it by failing a split if the folio cannot be split to the target order. Rename try_folio_split() to try_folio_split_to_order() to reflect the added new_order parameter. Remove its unused list parameter. [The test poisons LBS folios, which cannot be split to order-0 folios, and also tries to poison all memory. The non split LBS folios take more memory than the test anticipated, leading to OOM. The patch fixed the kernel warning and the test needs some change to avoid OOM.] Link: https://lkml.kernel.org/r/20251017013630.139907-1-ziy@nvidia.com Fixes: `e220917fa5` ("mm: split a folio in minimum folio order chunks") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/ Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:41 -08:00
Bibo Mao	237e74bfa2	LoongArch: KVM: Fix max supported vCPUs set with EIOINTC VM fails to boot with 256 vCPUs, the detailed command is qemu-system-loongarch64 -smp 256 and there is an error reported as follows: KVM_LOONGARCH_EXTIOI_INIT_NUM_CPU failed: Invalid argument There is typo issue in function kvm_eiointc_ctrl_access() when set max supported vCPUs. Cc: stable@vger.kernel.org Fixes: `47256c4c8b` ("LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:07 +08:00
Bibo Mao	11f340ece4	LoongArch: KVM: Skip PMU checking on vCPU context switch PMU hardware about VM is switched on VM exit to host rather than vCPU context sched off, PMU is checked and restored on return to VM. It is not necessary to check PMU on vCPU context sched on callback, since the request is made on the VM exit entry or VM PMU CSR access abort routine already. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:07 +08:00
Bibo Mao	5001bcf86e	LoongArch: KVM: Restore guest PMU if it is enabled On LoongArch system, guest PMU hardware is shared by guest and host but PMU interrupt is separated. PMU is pass-through to VM, and there is PMU context switch when exit to host and return to guest. There is optimiation to check whether PMU is enabled by guest. If not, it is not necessary to return to guest. However, if it is enabled, PMU context for guest need switch on. Now KVM_REQ_PMU notification is set on vCPU context switch, but it is missing if there is no vCPU context switch while PMU is used by guest VM, so fix it. Cc: <stable@vger.kernel.org> Fixes: `f4e40ea9f7` ("LoongArch: KVM: Add PMU support for guest") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:07 +08:00
Bibo Mao	d3c9515e4f	LoongArch: KVM: Add delay until timer interrupt injected When timer is fired in oneshot mode, CSR.TVAL will stop with value -1 rather than 0. However when the register CSR.TVAL is restored, it will continue to count down rather than stop there. Now the method is to write 0 to CSR.TVAL, wait to count down for 1 cycle at least, which is 10ns with a timer freq 100MHz, and then retore timer interrupt status. Here add 2 cycles delay to assure that timer interrupt is injected. With this patch, timer selftest case passes to run always. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:07 +08:00
Bibo Mao	37e9d1a913	LoongArch: KVM: Set page with write attribute if dirty track disabled With secondary MMU page table, if there is a read page fault, the page's write attribute will not set even if it is writable from master MMU page table. This logic only works if dirty tracking is enabled, so page table should be set with _PAGE_WRITE if dirty tracking is disabled. It reduces extra page fault on secondary MMU page table if a VM finishes migration, when the master MMU page table is ready and the secondary MMU page is fresh. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:07 +08:00
Qiang Ma	62cda5e54f	LoongArch: kexec: Print out debugging message if required When specifying '-d' for kexec_file_load interface, loaded locations of kernel/initrd/cmdline etc can be printed out to help debug. Commit `eb7622d908` ("kexec_file, riscv: print out debugging message if required") fixes the same issue on RISC-V. So, remove kexec_image_info() because the content has been printed out in generic code. And on Loongson-3A5000, the printed messages look like below: kexec_file: kernel: 00000000d9aad283 kernel_size: 0x2e77f30 kexec_file(EFI): No LoongArch PE image header. kexec_file: Loaded initrd at 0x80000000 bufsz=0x1637cd0 memsz=0x1638000 kexec_file(ELF): Loaded kernel at 0x9c20000 bufsz=0x27f1800 memsz=0x2950000 kexec_file: nr_segments = 2 kexec_file: segment[0]: buf=0x00000000cc3e6c33 bufsz=0x27f1800 mem=0x9c20000 memsz=0x2950000 kexec_file: segment[1]: buf=0x00000000bb75a541 bufsz=0x1637cd0 mem=0x80000000 memsz=0x1638000 kexec_file: kexec_file_load: type:0, start:0xb15d000 head:0x18db60002 flags:0x8 Signed-off-by: Qiang Ma <maqianga@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:07 +08:00
Youling Tang	df16b8956c	LoongArch: kexec: Initialize the kexec_buf structure The kexec_buf structure was previously declared without initialization. commit `bf454ec31a` ("kexec_file: allow to place kexec_buf randomly") added a field that is always read but not consistently populated by all architectures. This un-initialized field will contain garbage. This is also triggering a UBSAN warning when the uninitialized data is accessed: ------------[ cut here ]------------ UBSAN: invalid-load in ./include/linux/kexec.h:210:10 load of value 252 is not a valid value for type '_Bool' Zero-initializing kexec_buf at declaration ensures all fields are cleanly set, preventing future instances of uninitialized memory being used. Fixes: `bf454ec31a` ("kexec_file: allow to place kexec_buf randomly") Link: https://lore.kernel.org/r/20250827-kbuf_all-v1-2-1df9882bb01a@debian.org Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:07 +08:00
Huacai Chen	eeeeaafa62	LoongArch: Use correct accessor to read FWPC/MWPC CSR.FWPC and CSR.MWPC are 32bit registers, so use csr_read32() rather than csr_read64() to read the values of FWPC/MWPC. Cc: stable@vger.kernel.org Fixes: `edffa33c7b` ("LoongArch: Add hardware breakpoints/watchpoints support") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Tiezhu Yang	4c8a7c9827	LoongArch: Refine the init_hw_perf_events() function (1) Use the existing CPUCFG6_PMNUM_SHIFT macro definition instead of the magic value 4 to get the PMU number. (2) Detect the value of PMU bits via CPUCFG instruction according to the ISA manual instead of hard-coded as 64, because the value may be different for various micro-architectures. Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_cpucfg Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Vishal Moola (Oracle)	17f838512a	LoongArch: Remove __GFP_HIGHMEM masking in pud_alloc_one() Remove the unnecessary __GFP_HIGHMEM masking in pud_alloc_one(), which was introduced with commit `382739797f` ("loongarch: convert various functions to use ptdescs"). GFP_KERNEL doesn't contain __GFP_HIGHMEM. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Tianyang Zhang	a073d637c8	LoongArch: Let {pte,pmd}_modify() record the status of _PAGE_DIRTY Now if the PTE/PMD is dirty with _PAGE_DIRTY but without _PAGE_MODIFIED, after {pte,pmd}_modify() we lose _PAGE_DIRTY, then {pte,pmd}_dirty() return false and lead to data loss. This can happen in certain scenarios such as HW PTW doesn't set _PAGE_MODIFIED automatically, so here we need _PAGE_MODIFIED to record the dirty status (_PAGE_DIRTY). The new modification involves checking whether the original PTE/PMD has the _PAGE_DIRTY flag. If it exists, the _PAGE_MODIFIED bit is also set, ensuring that the {pte,pmd}_dirty() interface can always return accurate information. Cc: stable@vger.kernel.org Co-developed-by: Liupu Wang <wangliupu@loongson.cn> Signed-off-by: Liupu Wang <wangliupu@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>	2025-11-10 08:37:06 +08:00
Huacai Chen	ce5ad03e45	LoongArch: Consolidate max_pfn & max_low_pfn calculation Now there 5 places which calculate max_pfn & max_low_pfn: 1. in fdt_setup() for FDT systems; 2. in memblock_init() for ACPI systems; 3. in init_numa_memory() for NUMA systems; 4. in arch_mem_init() to recalculate for "mem=" cmdline; 5. in paging_init() to recalculate for NUMA systems. Since memblock_init() is called both for ACPI and FDT systems, move the calculation out of the for_each_efi_memory_desc() loop can eliminate the first case. The last case is very questionable (may be derived from the MIPS/Loongson code) and breaks the "mem=" cmdline, so should be removed. And then the NUMA version of paging_init() can be also eliminated. After consolidation there are 3 places of calculation: 1. in memblock_init() for both ACPI and FDT systems; 2. in init_numa_memory() to recalculate for NUMA systems; 3. in arch_mem_init() to recalculate for the "mem=" cmdline. For all cases the calculation is: max_pfn = PFN_DOWN(memblock_end_of_DRAM()); max_low_pfn = min(PFN_DOWN(HIGHMEM_START), max_pfn); Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Huacai Chen	43a9e6a10b	LoongArch: Consolidate early_ioremap()/ioremap_prot() 1. Use phys_addr_t instead of u64, which can work for both 32/64 bits. 2. Check whether the input physical address is above TO_PHYS_MASK (and return NULL if yes) for the DMW version. Note: In theory early_ioremap() also need the TO_PHYS_MASK checking, but the UEFI BIOS pass some DMW virtual addresses. Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Huacai Chen	4e67526840	LoongArch: Use physical addresses for CSR_MERRENTRY/CSR_TLBRENTRY Now we use virtual addresses to fill CSR_MERRENTRY/CSR_TLBRENTRY, but hardware hope physical addresses. Now it works well because the high bits are ignored above PA_BITS (48 bits), but explicitly use physical addresses can avoid potential bugs. So fix it. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Huacai Chen	f28abb9f96	LoongArch: Clarify 3 MSG interrupt features LoongArch's MSG interrupt features are used across multiple subsystems. Clarify these features to avoid misuse, existing users will be adjusted if necessary. MSGINT: Infrastructure, means the CPU core supports message interupts. Indicated by CPUCFG1.MSGINT. AVECINT: AVEC interrupt controller based on MSGINT, means the CPU chip supports direct message interrupts. Indicated by IOCSR.FEATURES.DMSI. REDIRECTINT: REDIRECT interrupt controller based on MSGINT and AVECINT, means the CPU chip supports redirect message interrupts. Indicated by IOCSR.FEATURES.RMSI. For example: Loongson-3A5000/3C5000 doesn't support MSGINT/AVECINT/REDIRECTINT; Loongson-3A6000 supports MSGINT but doesn't support AVECINT/REDIRECTINT; Loongson-3C6000 supports MSGINT/AVECINT/REDIRECTINT. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Xi Ruoyao	fe4b3a34e9	rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags It's used to work around an objtool issue since commit `abb2a55722` ("LoongArch: Add cflag -fno-isolate-erroneous-paths-dereference"), but it's then passed to bindgen and cause an error because Clang does not have this option. Fixes: `abb2a55722` ("LoongArch: Add cflag -fno-isolate-erroneous-paths-dereference") Acked-by: Miguel Ojeda <ojeda@kernel.org> Tested-by: Mingcong Bai <jeffbai@aosc.io> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-11-10 08:37:06 +08:00
Joshua Rogers	98a5fd31cb	ksmbd: close accepted socket when per-IP limit rejects connection When the per-IP connection limit is exceeded in ksmbd_kthread_fn(), the code sets ret = -EAGAIN and continues the accept loop without closing the just-accepted socket. That leaks one socket per rejected attempt from a single IP and enables a trivial remote DoS. Release client_sk before continuing. This bug was found with ZeroPath. Cc: stable@vger.kernel.org Signed-off-by: Joshua Rogers <linux@joshua.hu> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-11-09 17:47:52 -06:00
Joshua Rogers	e904d81ad1	smb: server: rdma: avoid unmapping posted recv on accept failure smb_direct_prepare_negotiation() posts a recv and then, if smb_direct_accept_client() fails, calls put_recvmsg() on the same buffer. That unmaps and recycles a buffer that is still posted on the QP., which can lead to device DMA into unmapped or reused memory. Track whether the recv was posted and only return it if it was never posted. If accept fails after a post, leave it for teardown to drain and complete safely. Signed-off-by: Joshua Rogers <linux@joshua.hu> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-11-09 17:47:49 -06:00
Linus Torvalds	e9a6fb0bcd	Linux 6.18-rc5 v6.18-rc5	2025-11-09 15:10:19 -08:00
Linus Torvalds	f850568efe	Merge tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "Two reverts merged into one commit to handle a regression caused by a wrong cleanup because the underlying implications were unclear" * tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: muxes: pca954x: Fix broken reset-gpio usage	2025-11-09 09:29:44 -08:00
Linus Torvalds	3461e958c1	Merge tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Strip trailing padding bytes from modules.builtin.modinfo to fix error during modules_install with certain versions of kmod - Drop unused static inline function warning in .c files with clang from W=1 to W=2 - Ensure kernel-doc.py invocations use the PYTHON3 make variable to ensure user's choice of Python interpreter is always respected * tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Let kernel-doc.py use PYTHON3 override compiler_types: Move unused static inline functions warning to W=2 kbuild: Strip trailing padding bytes from modules.builtin.modinfo	2025-11-09 09:22:08 -08:00
Patrisious Haddad	583b4fe1c1	net/mlx5: fs, set non default device per namespace Add mlx5_fs_set_root_dev() function which swaps the root namespace core device with another one for a given table_type. It is intended for usage only by RDMA_TRANSPORT tables in case of LAG configuration, to allow the creation of tables during LAG always through the LAG master device, which is valid since during LAG the master is allowed to manage the RDMA_TRANSPORT tables of its slaves. In addition move the table_type enum to global include to allow its use in a downstream patch in the RDMA driver. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-3-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:16:58 -05:00
Patrisious Haddad	3b848dec7e	net/mlx5: fs, Add other_eswitch support for steering tables Add other_eswitch support which allows flow tables creation above vports that reside on different esw managers. The new flag MLX5_FLOW_TABLE_OTHER_ESWITCH indicates if the esw_owner_vhca_id attribute is supported. Note that this is only supported if the Advanced-RDMA cap- rdma_transport_manager_other_eswitch is set. And it is the caller responsibility to check that. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-2-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:16:53 -05:00
Patrisious Haddad	6948417b3f	net/mlx5: Add OTHER_ESWITCH HW capabilities Add OTHER_ESWITCH capabilities which includes other_eswitch and eswitch_owner_vhca_id to all steering objects. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-1-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:16:47 -05:00
Yishai Hadas	2d838c11e1	net/mlx5: Add direct ST mode support for RDMA Add support for direct ST mode where ST Table Location equals PCI_TPH_LOC_NONE. In that case, no steering table exists, the steering tag itself will be used directly by the SW, FW, HW from the mkey. This enables RDMA users to use the current exposed APIs to work in direct mode. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251027-st-direct-mode-v1-2-e0ad953866b6@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:13:34 -05:00
Yishai Hadas	7b8a8ec20c	PCI/TPH: Expose pcie_tph_get_st_table_loc() Expose pcie_tph_get_st_table_loc() to be used by drivers as will be done in the next patch from the series. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251027-st-direct-mode-v1-1-e0ad953866b6@nvidia.com Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:13:02 -05:00
Yosry Ahmed	8a4821412c	KVM: nSVM: Fix and simplify LBR virtualization handling with nested The current scheme for handling LBRV when nested is used is very complicated, especially when L1 does not enable LBRV (i.e. does not set LBR_CTL_ENABLE_MASK). To avoid copying LBRs between VMCB01 and VMCB02 on every nested transition, the current implementation switches between using VMCB01 or VMCB02 as the source of truth for the LBRs while L2 is running. If L2 enables LBR, VMCB02 is used as the source of truth. When L2 disables LBR, the LBRs are copied to VMCB01 and VMCB01 is used as the source of truth. This introduces significant complexity, and incorrect behavior in some cases. For example, on a nested #VMEXIT, the LBRs are only copied from VMCB02 to VMCB01 if LBRV is enabled in VMCB01. This is because L2's writes to MSR_IA32_DEBUGCTLMSR to enable LBR are intercepted and propagated to VMCB01 instead of VMCB02. However, LBRV is only enabled in VMCB02 when L2 is running. This means that if L2 enables LBR and exits to L1, the LBRs will not be propagated from VMCB02 to VMCB01, because LBRV is disabled in VMCB01. There is no meaningful difference in CPUID rate in L2 when copying LBRs on every nested transition vs. the current approach, so do the simple and correct thing and always copy LBRs between VMCB01 and VMCB02 on nested transitions (when LBRV is disabled by L1). Drop the conditional LBRs copying in __svm_{enable/disable}_lbrv() as it is now unnecessary. VMCB02 becomes the only source of truth for LBRs when L2 is running, regardless of LBRV being enabled by L1, drop svm_get_lbr_vmcb() and use svm->vmcb directly in its place. Fixes: `1d5a1b5860` ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running") Cc: stable@vger.kernel.org Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251108004524.1600006-4-yosry.ahmed@linux.dev Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-11-09 08:50:13 +01:00
Yosry Ahmed	fbe5e5f030	KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv() svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on nested transitions where LBRV is used. It checks whether LBRV enablement needs to be changed in the current VMCB, and if it does, it also recalculate intercepts to LBR MSRs. However, there are cases where intercepts need to be updated even when LBRV enablement doesn't. Example scenario: - L1 has MSR_IA32_DEBUGCTLMSR cleared. - L1 runs L2 without LBR_CTL_ENABLE (no LBRV). - L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs. - L2 exits to L1, svm_update_lbrv() is not called on this transition. - L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing. - Intercepts remain disabled, L1 reads to LBR MSRs read the host MSRs. Fix it by always recalculating intercepts in svm_update_lbrv(). Fixes: `1d5a1b5860` ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running") Cc: stable@vger.kernel.org Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251108004524.1600006-3-yosry.ahmed@linux.dev Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-11-09 08:49:52 +01:00
Yosry Ahmed	dc55b3c3f6	KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated The APM lists the DbgCtlMsr field as being tracked by the VMCB_LBR clean bit. Always clear the bit when MSR_IA32_DEBUGCTLMSR is updated. The history is complicated, it was correctly cleared for L1 before commit `1d5a1b5860` ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running"). At that point svm_set_msr() started to rely on svm_update_lbrv() to clear the bit, but when nested virtualization is enabled the latter does not always clear it even if MSR_IA32_DEBUGCTLMSR changed. Go back to clearing it directly in svm_set_msr(). Fixes: `1d5a1b5860` ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running") Reported-by: Matteo Rizzo <matteorizzo@google.com> Reported-by: evn@google.com Co-developed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251108004524.1600006-2-yosry.ahmed@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-11-09 08:49:48 +01:00
Paolo Bonzini	ca00c3af8e	Merge tag 'kvmarm-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm654 fixes for 6.18, take #2 * Core fixes - Fix trapping regression when no in-kernel irqchip is present (20251021094358.1963807-1-sascha.bischoff@arm.com) - Check host-provided, untrusted ranges and offsets in pKVM (20251016164541.3771235-1-vdonnefort@google.com) (20251017075710.2605118-1-sebastianene@google.com) - Fix regression restoring the ID_PFR1_EL1 register (20251030122707.2033690-1-maz@kernel.org - Fix vgic ITS locking issues when LPIs are not directly injected (20251107184847.1784820-1-oupton@kernel.org) * Test fixes - Correct target CPU programming in vgic_lpi_stress selftest (20251020145946.48288-1-mdittgen@amazon.de) - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest (20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org) (20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org) * Misc - Update Oliver's email address (20251107012830.1708225-1-oupton@kernel.org)	2025-11-09 08:07:55 +01:00
Paolo Bonzini	0e5ba55750	Merge tag 'kvm-x86-fixes-6.18-rc5' of https://github.com/kvm-x86/linux into HEAD KVM x86 fixes for 6.18: - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM doesn't support virtualization the instructions, but the instructions are gated only by VMXON, i.e. will VM-Exit instead of taking a #UD and thus result in KVM exiting to userspace with an emulation error. - Unload the "FPU" when emulating INIT of XSTATE features if and only if the FPU is actually loaded, instead of trying to predict when KVM will emulate an INIT (CET support missed the MP_STATE path). Add sanity checks to detect and harden against similar bugs in the future. - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded. - Use a raw spinlock for svm->ir_list_lock as the lock is taken during schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y. - Remove guest_memfd bindings on memslot deletion when a gmem file is dying to fix a use-after-free race found by syzkaller. - Fix a goof in the EPT Violation handler where KVM checks the wrong variable when determining if the reported GVA is valid.	2025-11-09 08:07:32 +01:00
Paolo Bonzini	36567f1de1	Merge tag 'kvm-riscv-fixes-6.18-2' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv fixes for 6.18, take #2 - Fix check for local interrupts on riscv32 - Read HGEIP CSR on the correct cpu when checking for IMSIC interrupts - Remove automatic I/O mapping from kvm_arch_prepare_memory_region()	2025-11-09 08:07:03 +01:00
Jean Delvare	002621a4df	kbuild: Let kernel-doc.py use PYTHON3 override It is possible to force a specific version of python to be used when building the kernel by passing PYTHON3= on the make command line. However kernel-doc.py is currently called with python3 hard-coded and thus ignores this setting. Use $(PYTHON3) to run $(KERNELDOC) so that the desired version of python is used. Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://patch.msgid.link/20251107192933.2bfe9e57@endymion Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2025-11-08 19:42:22 -07:00
Linus Torvalds	439fc29dfd	Merge tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel Pull drm fix from Dave Airlie: "Brown paper bag, the dma mask fix which I applied and actually looked through for bad things, actually broke newer GPUs, there might be some latent part in the boot path that is assuming 32-bit still, but we will figure that out elsewhere. nouveau: - revert DMA mask change" * tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel: Revert "drm/nouveau: set DMA mask before creating the flush page"	2025-11-08 15:37:03 -08:00
Linus Torvalds	41d318c47f	Merge tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC fixes from Alexandre Belloni: "The two reverts are for patches that I shouldn't have applied. The rx8025 patch fixes an issue present since 2022: - cpcap, tps6586x: revert incorrect irq enable/disable balance fix - rx8025: fix incorrect register reference" * tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: rx8025: fix incorrect register reference Revert "rtc: cpcap: Fix initial enable_irq/disable_irq balance" Revert "rtc: tps6586x: Fix initial enable_irq/disable_irq balance"	2025-11-08 15:34:23 -08:00
Yuta Hayama	162f24cbb0	rtc: rx8025: fix incorrect register reference This code is intended to operate on the CTRL1 register, but ctrl[1] is actually CTRL2. Correctly, ctrl[0] is CTRL1. Signed-off-by: Yuta Hayama <hayama@lineo.co.jp> Fixes: `71af915650` ("rtc: rx8025: fix 12/24 hour mode detection on RX-8035") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/eae5f479-5d28-4a37-859d-d54794e7628c@lineo.co.jp Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>	2025-11-08 20:56:12 +01:00
Linus Torvalds	7bb4d65125	Merge tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix change notify packet validation check - Refcount fix (e.g. rename error paths) - Fix potential UAF due to missing locks on directory lease refcount * tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: validate change notify buffer before copy smb: client: fix refcount leak in smb2_set_path_attr smb: client: fix potential UAF in smb2_close_cached_fid()	2025-11-08 10:17:30 -08:00
Linus Torvalds	0d7bee10be	Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix AMD PCI root device caching regression that triggers on certain firmware variants - Fix the zen5_rdseed_microcode[] array to be NULL-terminated - Add more AMD models to microcode signature checking * tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Add more known models to entry sign checking x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode x86/amd_node: Fix AMD root device caching	2025-11-08 09:01:11 -08:00
Linus Torvalds	b5c0946029	Merge tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a group-throttling bug in the fair scheduler" * tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining	2025-11-08 08:59:05 -08:00
Linus Torvalds	133262cae9	Merge tag 'perf-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix a system hang caused by cpu-clock events deadlock" * tag 'perf-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix system hang caused by cpu-clock usage	2025-11-08 08:54:13 -08:00
Linus Torvalds	e6f55fe790	Merge tag 'locking-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix (well, cut in half) a futex performance regression on PowerPC" * tag 'locking-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Optimize per-cpu reference counting	2025-11-08 08:51:22 -08:00
Linus Torvalds	3636cfa745	Merge tag 'io_uring-6.18-20251107' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fix from Jens Axboe: "Single fix in there, fixing an overflow in calculating the needed segments for converting into a bvec array" * tag 'io_uring-6.18-20251107' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: fix regbuf vector size truncation	2025-11-08 08:47:31 -08:00
Linus Torvalds	e284d5118a	Merge tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: "This contain fixes for the RT and zoned allocator, and a few fixes for atomic writes" * tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: free xfs_busy_extents structure when no RT extents are queued xfs: fix zone selection in xfs_select_open_zone_mru xfs: fix a rtgroup leak when xfs_init_zone fails xfs: fix various problems in xfs_atomic_write_cow_iomap_begin xfs: fix delalloc write failures in software-provided atomic writes	2025-11-08 08:43:01 -08:00
Oliver Upton	4af235bf64	MAINTAINERS: Switch myself to using kernel.org address I've been running into issues with the linux.dev email semi-periodically, switching to my kernel.org address while I go figure out a better home for my inbox. Signed-off-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251107012830.1708225-1-oupton@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:21:20 +00:00
Oliver Upton	66768669f2	KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock xa_release() expects to be called outside of the xa_lock. Fix vgic_add_lpi() to drop the lock before calling and restructure to get rid of the goto label. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Closes: https://lore.kernel.org/kvmarm/d0853e82-7d95-5025-7abf-c6f1e0cdf7b5@huawei.com/ Fixes: `481c9ee846` ("KVM: arm64: vgic-its: Get rid of the lpi_list_lock") Signed-off-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251107184847.1784820-3-oupton@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:19:32 +00:00
Oliver Upton	75360a9a33	KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray Zenghui reports that running a KVM guest with an assigned device and lockdep enabled produces an unfriendly splat due to an inconsistent irq context when taking the lpi_xa's spinlock. This is no good as in rare cases the last reference to an LPI can get dropped after injection of a cached LPI translation. In this case, vgic_put_irq() will release the IRQ struct and take the lpi_xa's spinlock to erase it from the xarray. Reinstate the IRQ ordering and update the lockdep hint accordingly. Note that there is no irqsave equivalent of might_lock(), so just explictly grab and release the spinlock on lockdep kernels. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Closes: https://lore.kernel.org/kvmarm/b4d7cb0f-f007-0b81-46d1-998b15cc14bc@huawei.com/ Fixes: `982f31bbb5` ("KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock") Signed-off-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251107184847.1784820-2-oupton@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:19:32 +00:00
Marc Zyngier	50e7cce81b	KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip Now that the idreg's GIC field is in sync with the irqchip, limit the runtime clearing of these fields to the pathological case where we do not have an in-kernel GIC. While we're at it, use the existing API instead of open-coded accessors to access the ID regs. Fixes: `5cb57a1aff` ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest") Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251030122707.2033690-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:17:28 +00:00
Marc Zyngier	8a9866ff86	KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured Drive the idreg fields indicating the presence of GICv3 directly from the vgic code. This avoids having to do any sort of runtime clearing of the idreg. Fixes: `5cb57a1aff` ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest") Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251030122707.2033690-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:17:28 +00:00
Marc Zyngier	3f9eacf4f0	KVM: arm64: Make all 32bit ID registers fully writable 32bit ID registers aren't getting much love these days, and are often missed in updates. One of these updates broke restoring a GICv2 guest on a GICv3 machine. Instead of performing a piecemeal fix, just bite the bullet and make all 32bit ID regs fully writable. KVM itself never relies on them for anything, and if the VMM wants to mess up the guest, so be it. Fixes: `5cb57a1aff` ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest") Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: stable@vger.kernel.org Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20251030122707.2033690-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-11-08 11:17:28 +00:00

... 7 8 9 10 11 ...

1399042 Commits