Commit Graph

1399042 Commits

Author SHA1 Message Date
Zi Yan
77008e1b2e mm/huge_memory: do not change split_huge_page*() target order silently
Page cache folios from a file system that support large block size (LBS)
can have minimal folio order greater than 0, thus a high order folio might
not be able to be split down to order-0.  Commit e220917fa5 ("mm: split
a folio in minimum folio order chunks") bumps the target order of
split_huge_page*() to the minimum allowed order when splitting a LBS
folio.  This causes confusion for some split_huge_page*() callers like
memory failure handling code, since they expect after-split folios all
have order-0 when split succeeds but in reality get min_order_for_split()
order folios and give warnings.

Fix it by failing a split if the folio cannot be split to the target
order.  Rename try_folio_split() to try_folio_split_to_order() to reflect
the added new_order parameter.  Remove its unused list parameter.

[The test poisons LBS folios, which cannot be split to order-0 folios, and
also tries to poison all memory.  The non split LBS folios take more
memory than the test anticipated, leading to OOM.  The patch fixed the
kernel warning and the test needs some change to avoid OOM.]

Link: https://lkml.kernel.org/r/20251017013630.139907-1-ziy@nvidia.com
Fixes: e220917fa5 ("mm: split a folio in minimum folio order chunks")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-09 21:19:41 -08:00
Bibo Mao
237e74bfa2 LoongArch: KVM: Fix max supported vCPUs set with EIOINTC
VM fails to boot with 256 vCPUs, the detailed command is

  qemu-system-loongarch64 -smp 256

and there is an error reported as follows:

  KVM_LOONGARCH_EXTIOI_INIT_NUM_CPU failed: Invalid argument

There is typo issue in function kvm_eiointc_ctrl_access() when set
max supported vCPUs.

Cc: stable@vger.kernel.org
Fixes: 47256c4c8b ("LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:07 +08:00
Bibo Mao
11f340ece4 LoongArch: KVM: Skip PMU checking on vCPU context switch
PMU hardware about VM is switched on VM exit to host rather than vCPU
context sched off, PMU is checked and restored on return to VM. It is
not necessary to check PMU on vCPU context sched on callback, since the
request is made on the VM exit entry or VM PMU CSR access abort routine
already.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:07 +08:00
Bibo Mao
5001bcf86e LoongArch: KVM: Restore guest PMU if it is enabled
On LoongArch system, guest PMU hardware is shared by guest and host but
PMU interrupt is separated. PMU is pass-through to VM, and there is PMU
context switch when exit to host and return to guest.

There is optimiation to check whether PMU is enabled by guest. If not,
it is not necessary to return to guest. However, if it is enabled, PMU
context for guest need switch on. Now KVM_REQ_PMU notification is set
on vCPU context switch, but it is missing if there is no vCPU context
switch while PMU is used by guest VM, so fix it.

Cc: <stable@vger.kernel.org>
Fixes: f4e40ea9f7 ("LoongArch: KVM: Add PMU support for guest")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:07 +08:00
Bibo Mao
d3c9515e4f LoongArch: KVM: Add delay until timer interrupt injected
When timer is fired in oneshot mode, CSR.TVAL will stop with value -1
rather than 0. However when the register CSR.TVAL is restored, it will
continue to count down rather than stop there.

Now the method is to write 0 to CSR.TVAL, wait to count down for 1 cycle
at least, which is 10ns with a timer freq 100MHz, and then retore timer
interrupt status. Here add 2 cycles delay to assure that timer interrupt
is injected.

With this patch, timer selftest case passes to run always.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:07 +08:00
Bibo Mao
37e9d1a913 LoongArch: KVM: Set page with write attribute if dirty track disabled
With secondary MMU page table, if there is a read page fault, the page's
write attribute will not set even if it is writable from master MMU page
table. This logic only works if dirty tracking is enabled, so page table
should be set with _PAGE_WRITE if dirty tracking is disabled.

It reduces extra page fault on secondary MMU page table if a VM finishes
migration, when the master MMU page table is ready and the secondary MMU
page is fresh.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:07 +08:00
Qiang Ma
62cda5e54f LoongArch: kexec: Print out debugging message if required
When specifying '-d' for kexec_file_load interface, loaded locations of
kernel/initrd/cmdline etc can be printed out to help debug.

Commit eb7622d908 ("kexec_file, riscv: print out debugging message if
required") fixes the same issue on RISC-V.

So, remove kexec_image_info() because the content has been printed out
in generic code.

And on Loongson-3A5000, the printed messages look like below:

 kexec_file: kernel: 00000000d9aad283 kernel_size: 0x2e77f30
 kexec_file(EFI): No LoongArch PE image header.
 kexec_file: Loaded initrd at 0x80000000 bufsz=0x1637cd0 memsz=0x1638000
 kexec_file(ELF): Loaded kernel at 0x9c20000 bufsz=0x27f1800 memsz=0x2950000
 kexec_file: nr_segments = 2
 kexec_file: segment[0]: buf=0x00000000cc3e6c33 bufsz=0x27f1800 mem=0x9c20000 memsz=0x2950000
 kexec_file: segment[1]: buf=0x00000000bb75a541 bufsz=0x1637cd0 mem=0x80000000 memsz=0x1638000
 kexec_file: kexec_file_load: type:0, start:0xb15d000 head:0x18db60002 flags:0x8

Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:07 +08:00
Youling Tang
df16b8956c LoongArch: kexec: Initialize the kexec_buf structure
The kexec_buf structure was previously declared without initialization.
commit bf454ec31a ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.

This is also triggering a UBSAN warning when the uninitialized data is
accessed:

        ------------[ cut here ]------------
        UBSAN: invalid-load in ./include/linux/kexec.h:210:10
        load of value 252 is not a valid value for type '_Bool'

Zero-initializing kexec_buf at declaration ensures all fields are
cleanly set, preventing future instances of uninitialized memory being
used.

Fixes: bf454ec31a ("kexec_file: allow to place kexec_buf randomly")
Link: https://lore.kernel.org/r/20250827-kbuf_all-v1-2-1df9882bb01a@debian.org
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:07 +08:00
Huacai Chen
eeeeaafa62 LoongArch: Use correct accessor to read FWPC/MWPC
CSR.FWPC and CSR.MWPC are 32bit registers, so use csr_read32() rather
than csr_read64() to read the values of FWPC/MWPC.

Cc: stable@vger.kernel.org
Fixes: edffa33c7b ("LoongArch: Add hardware breakpoints/watchpoints support")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Tiezhu Yang
4c8a7c9827 LoongArch: Refine the init_hw_perf_events() function
(1) Use the existing CPUCFG6_PMNUM_SHIFT macro definition instead of
the magic value 4 to get the PMU number.

(2) Detect the value of PMU bits via CPUCFG instruction according to
the ISA manual instead of hard-coded as 64, because the value may be
different for various micro-architectures.

Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_cpucfg
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Vishal Moola (Oracle)
17f838512a LoongArch: Remove __GFP_HIGHMEM masking in pud_alloc_one()
Remove the unnecessary __GFP_HIGHMEM masking in pud_alloc_one(), which
was introduced with commit 382739797f ("loongarch: convert various
functions to use ptdescs"). GFP_KERNEL doesn't contain __GFP_HIGHMEM.

Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Tianyang Zhang
a073d637c8 LoongArch: Let {pte,pmd}_modify() record the status of _PAGE_DIRTY
Now if the PTE/PMD is dirty with _PAGE_DIRTY but without _PAGE_MODIFIED,
after {pte,pmd}_modify() we lose _PAGE_DIRTY, then {pte,pmd}_dirty()
return false and lead to data loss. This can happen in certain scenarios
such as HW PTW doesn't set _PAGE_MODIFIED automatically, so here we need
_PAGE_MODIFIED to record the dirty status (_PAGE_DIRTY).

The new modification involves checking whether the original PTE/PMD has
the _PAGE_DIRTY flag. If it exists, the _PAGE_MODIFIED bit is also set,
ensuring that the {pte,pmd}_dirty() interface can always return accurate
information.

Cc: stable@vger.kernel.org
Co-developed-by: Liupu Wang <wangliupu@loongson.cn>
Signed-off-by: Liupu Wang <wangliupu@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
2025-11-10 08:37:06 +08:00
Huacai Chen
ce5ad03e45 LoongArch: Consolidate max_pfn & max_low_pfn calculation
Now there 5 places which calculate max_pfn & max_low_pfn:
1. in fdt_setup() for FDT systems;
2. in memblock_init() for ACPI systems;
3. in init_numa_memory() for NUMA systems;
4. in arch_mem_init() to recalculate for "mem=" cmdline;
5. in paging_init() to recalculate for NUMA systems.

Since memblock_init() is called both for ACPI and FDT systems, move the
calculation out of the for_each_efi_memory_desc() loop can eliminate the
first case. The last case is very questionable (may be derived from the
MIPS/Loongson code) and breaks the "mem=" cmdline, so should be removed.
And then the NUMA version of paging_init() can be also eliminated.

After consolidation there are 3 places of calculation:
1. in memblock_init() for both ACPI and FDT systems;
2. in init_numa_memory() to recalculate for NUMA systems;
3. in arch_mem_init() to recalculate for the "mem=" cmdline.

For all cases the calculation is:
max_pfn = PFN_DOWN(memblock_end_of_DRAM());
max_low_pfn = min(PFN_DOWN(HIGHMEM_START), max_pfn);

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Huacai Chen
43a9e6a10b LoongArch: Consolidate early_ioremap()/ioremap_prot()
1. Use phys_addr_t instead of u64, which can work for both 32/64 bits.
2. Check whether the input physical address is above TO_PHYS_MASK (and
   return NULL if yes) for the DMW version.

Note: In theory early_ioremap() also need the TO_PHYS_MASK checking, but
the UEFI BIOS pass some DMW virtual addresses.

Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Huacai Chen
4e67526840 LoongArch: Use physical addresses for CSR_MERRENTRY/CSR_TLBRENTRY
Now we use virtual addresses to fill CSR_MERRENTRY/CSR_TLBRENTRY, but
hardware hope physical addresses. Now it works well because the high
bits are ignored above PA_BITS (48 bits), but explicitly use physical
addresses can avoid potential bugs. So fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Huacai Chen
f28abb9f96 LoongArch: Clarify 3 MSG interrupt features
LoongArch's MSG interrupt features are used across multiple subsystems.
Clarify these features to avoid misuse, existing users will be adjusted
if necessary.

MSGINT: Infrastructure, means the CPU core supports message interupts.
Indicated by CPUCFG1.MSGINT.

AVECINT: AVEC interrupt controller based on MSGINT, means the CPU chip
supports direct message interrupts. Indicated by IOCSR.FEATURES.DMSI.

REDIRECTINT: REDIRECT interrupt controller based on MSGINT and AVECINT,
means the CPU chip supports redirect message interrupts. Indicated by
IOCSR.FEATURES.RMSI.

For example:
Loongson-3A5000/3C5000 doesn't support MSGINT/AVECINT/REDIRECTINT;
Loongson-3A6000 supports MSGINT but doesn't support AVECINT/REDIRECTINT;
Loongson-3C6000 supports MSGINT/AVECINT/REDIRECTINT.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Xi Ruoyao
fe4b3a34e9 rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags
It's used to work around an objtool issue since commit abb2a55722
("LoongArch: Add cflag -fno-isolate-erroneous-paths-dereference"), but
it's then passed to bindgen and cause an error because Clang does not
have this option.

Fixes: abb2a55722 ("LoongArch: Add cflag -fno-isolate-erroneous-paths-dereference")
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10 08:37:06 +08:00
Joshua Rogers
98a5fd31cb ksmbd: close accepted socket when per-IP limit rejects connection
When the per-IP connection limit is exceeded in ksmbd_kthread_fn(),
the code sets ret = -EAGAIN and continues the accept loop without
closing the just-accepted socket. That leaks one socket per rejected
attempt from a single IP and enables a trivial remote DoS.

Release client_sk before continuing.

This bug was found with ZeroPath.

Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-09 17:47:52 -06:00
Joshua Rogers
e904d81ad1 smb: server: rdma: avoid unmapping posted recv on accept failure
smb_direct_prepare_negotiation() posts a recv and then, if
smb_direct_accept_client() fails, calls put_recvmsg() on the same
buffer. That unmaps and recycles a buffer that is still posted on
the QP., which can lead to device DMA into unmapped or reused memory.

Track whether the recv was posted and only return it if it was never
posted. If accept fails after a post, leave it for teardown to drain
and complete safely.

Signed-off-by: Joshua Rogers <linux@joshua.hu>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-09 17:47:49 -06:00
Linus Torvalds
e9a6fb0bcd Linux 6.18-rc5 v6.18-rc5 2025-11-09 15:10:19 -08:00
Linus Torvalds
f850568efe Merge tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "Two reverts merged into one commit to handle a regression caused by a
  wrong cleanup because the underlying implications were unclear"

* tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: muxes: pca954x: Fix broken reset-gpio usage
2025-11-09 09:29:44 -08:00
Linus Torvalds
3461e958c1 Merge tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:

 - Strip trailing padding bytes from modules.builtin.modinfo to fix
   error during modules_install with certain versions of kmod

 - Drop unused static inline function warning in .c files with clang
   from W=1 to W=2

 - Ensure kernel-doc.py invocations use the PYTHON3 make variable to
   ensure user's choice of Python interpreter is always respected

* tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: Let kernel-doc.py use PYTHON3 override
  compiler_types: Move unused static inline functions warning to W=2
  kbuild: Strip trailing padding bytes from modules.builtin.modinfo
2025-11-09 09:22:08 -08:00
Patrisious Haddad
583b4fe1c1 net/mlx5: fs, set non default device per namespace
Add mlx5_fs_set_root_dev() function which swaps the root namespace
core device with another one for a given table_type.

It is intended for usage only by RDMA_TRANSPORT tables in case of LAG
configuration, to allow the creation of tables during LAG always
through the LAG master device, which is valid since during LAG the
master is allowed to manage the RDMA_TRANSPORT tables of its slaves.

In addition move the table_type enum to global include to allow its use
in a downstream patch in the RDMA driver.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-3-98bb707b5d57@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09 05:16:58 -05:00
Patrisious Haddad
3b848dec7e net/mlx5: fs, Add other_eswitch support for steering tables
Add other_eswitch support which allows flow tables creation above vports
that reside on different esw managers.

The new flag MLX5_FLOW_TABLE_OTHER_ESWITCH indicates if the
esw_owner_vhca_id attribute is supported.

Note that this is only supported if the Advanced-RDMA cap-
rdma_transport_manager_other_eswitch is set.
And it is the caller responsibility to check that.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-2-98bb707b5d57@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09 05:16:53 -05:00
Patrisious Haddad
6948417b3f net/mlx5: Add OTHER_ESWITCH HW capabilities
Add OTHER_ESWITCH capabilities which includes other_eswitch and
eswitch_owner_vhca_id to all steering objects.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-1-98bb707b5d57@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09 05:16:47 -05:00
Yishai Hadas
2d838c11e1 net/mlx5: Add direct ST mode support for RDMA
Add support for direct ST mode where ST Table Location equals
PCI_TPH_LOC_NONE.

In that case, no steering table exists, the steering tag itself will be
used directly by the SW, FW, HW from the mkey.

This enables RDMA users to use the current exposed APIs to work in
direct mode.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20251027-st-direct-mode-v1-2-e0ad953866b6@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09 05:13:34 -05:00
Yishai Hadas
7b8a8ec20c PCI/TPH: Expose pcie_tph_get_st_table_loc()
Expose pcie_tph_get_st_table_loc() to be used by drivers as will be done
in the next patch from the series.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20251027-st-direct-mode-v1-1-e0ad953866b6@nvidia.com
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09 05:13:02 -05:00
Yosry Ahmed
8a4821412c KVM: nSVM: Fix and simplify LBR virtualization handling with nested
The current scheme for handling LBRV when nested is used is very
complicated, especially when L1 does not enable LBRV (i.e. does not set
LBR_CTL_ENABLE_MASK).

To avoid copying LBRs between VMCB01 and VMCB02 on every nested
transition, the current implementation switches between using VMCB01 or
VMCB02 as the source of truth for the LBRs while L2 is running. If L2
enables LBR, VMCB02 is used as the source of truth. When L2 disables
LBR, the LBRs are copied to VMCB01 and VMCB01 is used as the source of
truth. This introduces significant complexity, and incorrect behavior in
some cases.

For example, on a nested #VMEXIT, the LBRs are only copied from VMCB02
to VMCB01 if LBRV is enabled in VMCB01. This is because L2's writes to
MSR_IA32_DEBUGCTLMSR to enable LBR are intercepted and propagated to
VMCB01 instead of VMCB02. However, LBRV is only enabled in VMCB02 when
L2 is running.

This means that if L2 enables LBR and exits to L1, the LBRs will not be
propagated from VMCB02 to VMCB01, because LBRV is disabled in VMCB01.

There is no meaningful difference in CPUID rate in L2 when copying LBRs
on every nested transition vs. the current approach, so do the simple
and correct thing and always copy LBRs between VMCB01 and VMCB02 on
nested transitions (when LBRV is disabled by L1). Drop the conditional
LBRs copying in __svm_{enable/disable}_lbrv() as it is now unnecessary.

VMCB02 becomes the only source of truth for LBRs when L2 is running,
regardless of LBRV being enabled by L1, drop svm_get_lbr_vmcb() and use
svm->vmcb directly in its place.

Fixes: 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251108004524.1600006-4-yosry.ahmed@linux.dev
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-09 08:50:13 +01:00
Yosry Ahmed
fbe5e5f030 KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on
nested transitions where LBRV is used. It checks whether LBRV enablement
needs to be changed in the current VMCB, and if it does, it also
recalculate intercepts to LBR MSRs.

However, there are cases where intercepts need to be updated even when
LBRV enablement doesn't. Example scenario:
- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv()
  sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that
  LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, L1 reads to LBR MSRs read the host MSRs.

Fix it by always recalculating intercepts in svm_update_lbrv().

Fixes: 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251108004524.1600006-3-yosry.ahmed@linux.dev
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-09 08:49:52 +01:00
Yosry Ahmed
dc55b3c3f6 KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
The APM lists the DbgCtlMsr field as being tracked by the VMCB_LBR clean
bit.  Always clear the bit when MSR_IA32_DEBUGCTLMSR is updated.

The history is complicated, it was correctly cleared for L1 before
commit 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when
L2 is running").  At that point svm_set_msr() started to rely on
svm_update_lbrv() to clear the bit, but when nested virtualization
is enabled the latter does not always clear it even if MSR_IA32_DEBUGCTLMSR
changed. Go back to clearing it directly in svm_set_msr().

Fixes: 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Reported-by: Matteo Rizzo <matteorizzo@google.com>
Reported-by: evn@google.com
Co-developed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251108004524.1600006-2-yosry.ahmed@linux.dev
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-09 08:49:48 +01:00
Paolo Bonzini
ca00c3af8e Merge tag 'kvmarm-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm654 fixes for 6.18, take #2

* Core fixes

  - Fix trapping regression when no in-kernel irqchip is present
    (20251021094358.1963807-1-sascha.bischoff@arm.com)

  - Check host-provided, untrusted ranges and offsets in pKVM
    (20251016164541.3771235-1-vdonnefort@google.com)
    (20251017075710.2605118-1-sebastianene@google.com)

  - Fix regression restoring the ID_PFR1_EL1 register
    (20251030122707.2033690-1-maz@kernel.org

  - Fix vgic ITS locking issues when LPIs are not directly injected
    (20251107184847.1784820-1-oupton@kernel.org)

* Test fixes

  - Correct target CPU programming in vgic_lpi_stress selftest
    (20251020145946.48288-1-mdittgen@amazon.de)

  - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
    (20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org)
    (20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org)

* Misc

  - Update Oliver's email address
    (20251107012830.1708225-1-oupton@kernel.org)
2025-11-09 08:07:55 +01:00
Paolo Bonzini
0e5ba55750 Merge tag 'kvm-x86-fixes-6.18-rc5' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.18:

 - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM
   doesn't support virtualization the instructions, but the instructions
   are gated only by VMXON, i.e. will VM-Exit instead of taking a #UD and
   thus result in KVM exiting to userspace with an emulation error.

 - Unload the "FPU" when emulating INIT of XSTATE features if and only if
   the FPU is actually loaded, instead of trying to predict when KVM will
   emulate an INIT (CET support missed the MP_STATE path).  Add sanity
   checks to detect and harden against similar bugs in the future.

 - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded.

 - Use a raw spinlock for svm->ir_list_lock as the lock is taken during
   schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y.

 - Remove guest_memfd bindings on memslot deletion when a gmem file is dying
   to fix a use-after-free race found by syzkaller.

 - Fix a goof in the EPT Violation handler where KVM checks the wrong
   variable when determining if the reported GVA is valid.
2025-11-09 08:07:32 +01:00
Paolo Bonzini
36567f1de1 Merge tag 'kvm-riscv-fixes-6.18-2' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.18, take #2

- Fix check for local interrupts on riscv32
- Read HGEIP CSR on the correct cpu when checking for IMSIC interrupts
- Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
2025-11-09 08:07:03 +01:00
Jean Delvare
002621a4df kbuild: Let kernel-doc.py use PYTHON3 override
It is possible to force a specific version of python to be used when
building the kernel by passing PYTHON3= on the make command line.
However kernel-doc.py is currently called with python3 hard-coded and
thus ignores this setting.

Use $(PYTHON3) to run $(KERNELDOC) so that the desired version of
python is used.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://patch.msgid.link/20251107192933.2bfe9e57@endymion
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-11-08 19:42:22 -07:00
Linus Torvalds
439fc29dfd Merge tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fix from Dave Airlie:
 "Brown paper bag, the dma mask fix which I applied and actually looked
  through for bad things, actually broke newer GPUs, there might be some
  latent part in the boot path that is assuming 32-bit still, but we
  will figure that out elsewhere.

  nouveau:
   - revert DMA mask change"

* tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel:
  Revert "drm/nouveau: set DMA mask before creating the flush page"
2025-11-08 15:37:03 -08:00
Linus Torvalds
41d318c47f Merge tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
 "The two reverts are for patches that I shouldn't have applied. The
  rx8025 patch fixes an issue present since 2022:

   - cpcap, tps6586x: revert incorrect irq enable/disable balance fix

   - rx8025: fix incorrect register reference"

* tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: rx8025: fix incorrect register reference
  Revert "rtc: cpcap: Fix initial enable_irq/disable_irq balance"
  Revert "rtc: tps6586x: Fix initial enable_irq/disable_irq balance"
2025-11-08 15:34:23 -08:00
Yuta Hayama
162f24cbb0 rtc: rx8025: fix incorrect register reference
This code is intended to operate on the CTRL1 register, but ctrl[1] is
actually CTRL2. Correctly, ctrl[0] is CTRL1.

Signed-off-by: Yuta Hayama <hayama@lineo.co.jp>
Fixes: 71af915650 ("rtc: rx8025: fix 12/24 hour mode detection on RX-8035")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/eae5f479-5d28-4a37-859d-d54794e7628c@lineo.co.jp
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-11-08 20:56:12 +01:00
Linus Torvalds
7bb4d65125 Merge tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - Fix change notify packet validation check

 - Refcount fix (e.g. rename error paths)

 - Fix potential UAF due to missing locks on directory lease refcount

* tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: validate change notify buffer before copy
  smb: client: fix refcount leak in smb2_set_path_attr
  smb: client: fix potential UAF in smb2_close_cached_fid()
2025-11-08 10:17:30 -08:00
Linus Torvalds
0d7bee10be Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Fix AMD PCI root device caching regression that triggers
   on certain firmware variants

 - Fix the zen5_rdseed_microcode[] array to be NULL-terminated

 - Add more AMD models to microcode signature checking

* tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add more known models to entry sign checking
  x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
  x86/amd_node: Fix AMD root device caching
2025-11-08 09:01:11 -08:00
Linus Torvalds
b5c0946029 Merge tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix a group-throttling bug in the fair scheduler"

* tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
2025-11-08 08:59:05 -08:00
Linus Torvalds
133262cae9 Merge tag 'perf-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
 "Fix a system hang caused by cpu-clock events deadlock"

* tag 'perf-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix system hang caused by cpu-clock usage
2025-11-08 08:54:13 -08:00
Linus Torvalds
e6f55fe790 Merge tag 'locking-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
 "Fix (well, cut in half) a futex performance regression on PowerPC"

* tag 'locking-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Optimize per-cpu reference counting
2025-11-08 08:51:22 -08:00
Linus Torvalds
3636cfa745 Merge tag 'io_uring-6.18-20251107' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fix from Jens Axboe:
 "Single fix in there, fixing an overflow in calculating the needed
  segments for converting into a bvec array"

* tag 'io_uring-6.18-20251107' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring: fix regbuf vector size truncation
2025-11-08 08:47:31 -08:00
Linus Torvalds
e284d5118a Merge tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
 "This contain fixes for the RT and zoned allocator, and a few fixes for
  atomic writes"

* tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: free xfs_busy_extents structure when no RT extents are queued
  xfs: fix zone selection in xfs_select_open_zone_mru
  xfs: fix a rtgroup leak when xfs_init_zone fails
  xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
  xfs: fix delalloc write failures in software-provided atomic writes
2025-11-08 08:43:01 -08:00
Oliver Upton
4af235bf64 MAINTAINERS: Switch myself to using kernel.org address
I've been running into issues with the linux.dev email
semi-periodically, switching to my kernel.org address while I go figure
out a better home for my inbox.

Signed-off-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251107012830.1708225-1-oupton@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-11-08 11:21:20 +00:00
Oliver Upton
66768669f2 KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
xa_release() expects to be called outside of the xa_lock. Fix
vgic_add_lpi() to drop the lock before calling and restructure to get
rid of the goto label.

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Closes: https://lore.kernel.org/kvmarm/d0853e82-7d95-5025-7abf-c6f1e0cdf7b5@huawei.com/
Fixes: 481c9ee846 ("KVM: arm64: vgic-its: Get rid of the lpi_list_lock")
Signed-off-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251107184847.1784820-3-oupton@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-11-08 11:19:32 +00:00
Oliver Upton
75360a9a33 KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
Zenghui reports that running a KVM guest with an assigned device and
lockdep enabled produces an unfriendly splat due to an inconsistent irq
context when taking the lpi_xa's spinlock.

This is no good as in rare cases the last reference to an LPI can get
dropped after injection of a cached LPI translation. In this case,
vgic_put_irq() will release the IRQ struct and take the lpi_xa's
spinlock to erase it from the xarray.

Reinstate the IRQ ordering and update the lockdep hint accordingly. Note
that there is no irqsave equivalent of might_lock(), so just explictly
grab and release the spinlock on lockdep kernels.

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Closes: https://lore.kernel.org/kvmarm/b4d7cb0f-f007-0b81-46d1-998b15cc14bc@huawei.com/
Fixes: 982f31bbb5 ("KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock")
Signed-off-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251107184847.1784820-2-oupton@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-11-08 11:19:32 +00:00
Marc Zyngier
50e7cce81b KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip
Now that the idreg's GIC field is in sync with the irqchip, limit
the runtime clearing of these fields to the pathological case where
we do not have an in-kernel GIC.

While we're at it, use the existing API instead of open-coded
accessors to access the ID regs.

Fixes: 5cb57a1aff ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest")
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251030122707.2033690-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-11-08 11:17:28 +00:00
Marc Zyngier
8a9866ff86 KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
Drive the idreg fields indicating the presence of GICv3 directly from
the vgic code. This avoids having to do any sort of runtime clearing
of the idreg.

Fixes: 5cb57a1aff ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest")
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251030122707.2033690-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-11-08 11:17:28 +00:00
Marc Zyngier
3f9eacf4f0 KVM: arm64: Make all 32bit ID registers fully writable
32bit ID registers aren't getting much love these days, and are
often missed in updates. One of these updates broke restoring
a GICv2 guest on a GICv3 machine.

Instead of performing a piecemeal fix, just bite the bullet
and make all 32bit ID regs fully writable. KVM itself never
relies on them for anything, and if the VMM wants to mess up
the guest, so be it.

Fixes: 5cb57a1aff ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: stable@vger.kernel.org
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251030122707.2033690-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-11-08 11:17:28 +00:00