linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-05 09:36:39 -04:00

Author	SHA1	Message	Date
Linus Torvalds	dbfc6422a3	Merge tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Remove dead code leftovers after a recent mitigations cleanup which fail a Clang build - Make sure a Retbleed mitigation message is printed only when necessary - Correct the last Zen1 microcode revision for which Entrysign sha256 check is needed - Fix a NULL ptr deref when mounting the resctrl fs on a system which supports assignable counters but where L3 total and local bandwidth monitoring has been disabled at boot * tag 'x86_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Remove dead code which might prevent from building x86/bugs: Qualify RETBLEED_INTEL_MSG x86/microcode: Fix Entrysign revision check for Zen1/Naples x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode	2025-10-26 09:57:18 -07:00
Linus Torvalds	9bb956508c	Merge tag 'riscv-for-linus-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: - Close a race during boot between userspace vDSO usage and some late-initialized vDSO data - Improve performance on systems with non-CPU-cache-coherent DMA-capable peripherals by enabling write combining on pgprot_dmacoherent() allocations - Add human-readable detail for RISC-V IPI tracing - Provide more information to zsmalloc on 64-bit RISC-V to improve allocation - Silence useless boot messages about CPUs that have been disabled in DT - Resolve some compiler and smatch warnings and remove a redundant macro * tag 'riscv-for-linus-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: hwprobe: avoid uninitialized variable use in hwprobe_arch_id() riscv: cpufeature: avoid uninitialized variable in has_thead_homogeneous_vlenb() riscv: hwprobe: Fix stale vDSO data for late-initialized keys at boot riscv: add a forward declaration for cpuinfo_op RISC-V: Don't print details of CPUs disabled in DT riscv: Remove the PER_CPU_OFFSET_SHIFT macro riscv: mm: Define MAX_POSSIBLE_PHYSMEM_BITS for zsmalloc riscv: Register IPI IRQs with unique names ACPI: RIMT: Fix unused function warnings when CONFIG_IOMMU_API is disabled RISC-V: Define pgprot_dmacoherent() for non-coherent devices	2025-10-25 09:35:26 -07:00
Linus Torvalds	31009296f8	Merge tag 'pci-v6.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Add DWC custom pci_ops for the root bus instead of overwriting the DBI base address, which broke drivers that rely on the DBI address for iATU programming; fixes an FU740 probe regression (Krishna Chaitanya Chundru) - Revert qcom ECAM enablement, which is rendered unnecessary by the DWC custom pci_ops (Krishna Chaitanya Chundru) - Fix longstanding MIPS Malta resource registration issues to avoid exposing them when the next commit fixes the boot failure (Maciej W. Rozycki) - Use pcibios_align_resource() on MIPS Malta to fix boot failure caused by using the generic pci_enable_resources() (Ilpo Järvinen) - Enable only ASPM L0s and L1, not L1 PM Substates, for devicetree platforms because we lack information required to configure L1 Substates; fixes regressions on powerpc and rockchip. A qcom regression (L1 Substates no longer enabled) remains and will be addressed next (Bjorn Helgaas) * tag 'pci-v6.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/ASPM: Enable only L0s and L1 for devicetree platforms MIPS: Malta: Use pcibios_align_resource() to block io range MIPS: Malta: Fix PCI southbridge legacy resource reservations MIPS: Malta: Fix keyboard resource preventing i8042 driver from registering Revert "PCI: qcom: Prepare for the DWC ECAM enablement" PCI: dwc: Use custom pci_ops for root bus DBI vs ECAM config access	2025-10-24 16:43:08 -07:00
Linus Torvalds	9b9b6e71ee	Merge tag 'soc-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The main change this time is an update to the MAINTAINERS file, listing Krzysztof Kozlowski, Alexandre Belloni, and Linus Walleij as additional maintainers for the SoC tree, in order to go back to a group maintainership. Drew Fustini joins as an additional reviewer for the SoC tree. Thanks to all of you for volunteering to help out. On the actual bugfixes, we have a few correctness changes for firmware drivers (qtee, arm-ffa, scmi) and two devicetree fixes for Raspberry Pi" * tag 'soc-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: soc: officially expand maintainership team firmware: arm_scmi: Fix premature SCMI_XFER_FLAG_IS_RAW clearing in raw mode firmware: arm_scmi: Skip RAW initialization on failure include: trace: Fix inflight count helper on failed initialization firmware: arm_scmi: Account for failed debug initialization ARM: dts: broadcom: rpi: Switch to V3D firmware clock arm64: dts: broadcom: bcm2712: Define VGIC interrupt firmware: arm_ffa: Add support for IMPDEF value in the memory access descriptor tee: QCOMTEE should depend on ARCH_QCOM tee: qcom: return -EFAULT instead of -EINVAL if copy_from_user() fails tee: qcom: prevent potential off by one read	2025-10-24 11:15:17 -07:00
Andy Shevchenko	84dfce65a7	x86/bugs: Remove dead code which might prevent from building Clang, in particular, is not happy about dead code: arch/x86/kernel/cpu/bugs.c:1830:20: error: unused function 'match_option' [-Werror,-Wunused-function] 1830 \| static inline bool match_option(const char arg, int arglen, const char opt) \| ^~~~~~~~~~~~ 1 error generated. Remove a leftover from the previous cleanup. Fixes: `02ac6cc8c5` ("x86/bugs: Simplify SSB cmdline parsing") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251024125959.1526277-1-andriy.shevchenko%40linux.intel.com	2025-10-24 09:42:00 -07:00
Arnd Bergmann	e58794dbf7	Merge tag 'arm-soc/for-6.18/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM64-based SoCs Device Tree fixes for 6.18, please pull the following: - Peter describes the VGIC interrupt line such that KVM can be used on Raspberry Pi 5 systems. * tag 'arm-soc/for-6.18/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux: arm64: dts: broadcom: bcm2712: Define VGIC interrupt Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-10-23 22:30:41 +02:00
Linus Torvalds	266ee584e5	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Do not make a clean PTE dirty in pte_mkwrite() The Arm architecture, for backwards compatibility reasons (ARMv8.0 before in-hardware dirty bit management - DBM), uses the PTE_RDONLY bit to mean !dirty while the PTE_WRITE bit means DBM enabled. The arm64 pte_mkwrite() simply clears the PTE_RDONLY bit and this inadvertently makes the PTE pte_hw_dirty(). Most places making a PTE writable also invoke pte_mkdirty() but do_swap_page() does not and we end up with dirty, freshly swapped in, writeable pages. - Do not warn if the destination page is already MTE-tagged in copy_highpage() In the majority of the cases, a destination page copied into is freshly allocated without the PG_mte_tagged flag set. However, the folio migration may be restarted if __folio_migrate_mapping() failed, triggering the benign WARN_ON_ONCE(). * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not warn if the page is already tagged in copy_highpage() arm64, mm: avoid always making PTE dirty in pte_mkwrite()	2025-10-23 09:26:47 -10:00
Catalin Marinas	b98c94eed4	arm64: mte: Do not warn if the page is already tagged in copy_highpage() The arm64 copy_highpage() assumes that the destination page is newly allocated and not MTE-tagged (PG_mte_tagged unset) and warns accordingly. However, following commit `060913999d` ("mm: migrate: support poisoned recover from migrate folio"), folio_mc_copy() is called before __folio_migrate_mapping(). If the latter fails (-EAGAIN), the copy will be done again to the same destination page. Since copy_highpage() already set the PG_mte_tagged flag, this second copy will warn. Replace the WARN_ON_ONCE(page already tagged) in the arm64 copy_highpage() with a comment. Reported-by: syzbot+d1974fc28545a3e6218b@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/68dda1ae.a00a0220.102ee.0065.GAE@google.com Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: stable@vger.kernel.org # 6.12.x Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-10-23 17:34:58 +01:00
Linus Torvalds	0f3ad9c610	Merge tag 'mm-hotfixes-stable-2025-10-22-12-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "17 hotfixes. 12 are cc:stable and 14 are for MM. There's a two-patch DAMON series from SeongJae Park which addresses a missed check and possible memory leak. Apart from that it's all singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-10-22-12-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: csky: abiv2: adapt to new folio flags field mm/damon/core: use damos_commit_quota_goal() for new goal commit mm/damon/core: fix potential memory leak by cleaning ops_filter in damon_destroy_scheme hugetlbfs: move lock assertions after early returns in huge_pmd_unshare() vmw_balloon: indicate success when effectively deflating during migration mm/damon/core: fix list_add_tail() call on damon_call() mm/mremap: correctly account old mapping after MREMAP_DONTUNMAP remap mm: prevent poison consumption when splitting THP ocfs2: clear extent cache after moving/defragmenting extents mm: don't spin in add_stack_record when gfp flags don't allow dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC mm/damon/sysfs: dealloc commit test ctx always mm/damon/sysfs: catch commit test ctx alloc failure hung_task: fix warnings caused by unaligned lock pointers	2025-10-22 14:57:35 -10:00
Ilpo Järvinen	f294a5fd34	MIPS: Malta: Use pcibios_align_resource() to block io range According to Maciej W. Rozycki <macro@orcam.me.uk>, the mips_pcibios_init() for malta adjusts root bus IO resource start address to prevent interfering with PIIX4 I/O cycle decoding. Adjusting lower bound leaves PIIX4 IO resources outside of the root bus resource and assign_fixed_resource_on_bus() does not link the resources into the resource tree. Prior to commit `ae81aad5c2` ("MIPS: PCI: Use pci_enable_resources()") the arch specific pcibios_enable_resources() did not check if the resources were assigned which diverges from what PCI core checks, effectively hiding the PIIX4 IO resources were not properly within the resource tree. After starting to use pcibios_enable_resources() from PCI core, enabling PIIX4 fails: ata_piix 0000:00:0a.1: BAR 0 [io 0x01f0-0x01f7]: not claimed; can't enable device ata_piix 0000:00:0a.1: probe with driver ata_piix failed with error -22 MIPS PCI code already has support for enforcing lower bounds using PCIBIOS_MIN_IO in pcibios_align_resource() without altering the IO window start address itself. Make malta PCI code too to use PCIBIOS_MIN_IO. Fixes: `ae81aad5c2` ("MIPS: PCI: Use pci_enable_resources()") Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/linux-pci/9085ab12-1559-4462-9b18-f03dcb9a4088@roeck-us.net/ Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/linux-pci/alpine.DEB.2.21.2510132229120.39634@angie.orcam.me.uk/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://patch.msgid.link/20251017110903.1973-1-ilpo.jarvinen@linux.intel.com	2025-10-22 11:06:31 -05:00
Maciej W. Rozycki	1d5d166361	MIPS: Malta: Fix PCI southbridge legacy resource reservations Covering the PCI southbridge legacy port I/O range with a northbridge resource reservation prevents MIPS Malta platform code from claiming its standard legacy resources. This is because request_resource() calls cause a clash with the previous reservation and consequently fail. Change to using insert_resource() so as to prevent the clash, switching the legacy reservations from: 00000000-00ffffff : MSC PCI I/O 00000020-00000021 : pic1 00000070-00000077 : rtc0 000000a0-000000a1 : pic2 [...] to: 00000000-00ffffff : MSC PCI I/O 00000000-0000001f : dma1 00000020-00000021 : pic1 00000040-0000005f : timer 00000060-0000006f : keyboard 00000070-00000077 : rtc0 00000080-0000008f : dma page reg 000000a0-000000a1 : pic2 000000c0-000000df : dma2 [...] Fixes: `ae81aad5c2` ("MIPS: PCI: Use pci_enable_resources()") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: stable@vger.kernel.org # v6.18+ Link: https://patch.msgid.link/alpine.DEB.2.21.2510212001250.8377@angie.orcam.me.uk	2025-10-22 11:06:24 -05:00
Maciej W. Rozycki	bf5570590a	MIPS: Malta: Fix keyboard resource preventing i8042 driver from registering MIPS Malta platform code registers the PCI southbridge legacy port I/O PS/2 keyboard range as a standard resource marked as busy. It prevents the i8042 driver from registering as it fails to claim the resource in a call to i8042_platform_init(). Consequently PS/2 keyboard and mouse devices cannot be used with this platform. Fix the issue by removing the busy marker from the standard reservation, making the driver register successfully: serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 and the resource show up as expected among the legacy devices: 00000000-00ffffff : MSC PCI I/O 00000000-0000001f : dma1 00000020-00000021 : pic1 00000040-0000005f : timer 00000060-0000006f : keyboard 00000060-0000006f : i8042 00000070-00000077 : rtc0 00000080-0000008f : dma page reg 000000a0-000000a1 : pic2 000000c0-000000df : dma2 [...] If the i8042 driver has not been configured, then the standard resource will remain there preventing any conflicting dynamic assignment of this PCI port I/O address range. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/alpine.DEB.2.21.2510211919240.8377@angie.orcam.me.uk	2025-10-22 11:06:07 -05:00
Thomas Weißschuh	9aa12167ef	csky: abiv2: adapt to new folio flags field Recent changes require the raw folio flags to be accessed via ".f". The merge commit introducing this change adapted most architecture code but forgot the csky abiv2. [rppt@kernel.org: add fix for arch/csky/abiv2/cacheflush.c] Link: https://lkml.kernel.org/r/aPCE238oxAB9QcZa@kernel.org Fixes: `53fbef56e0` ("mm: introduce memdesc_flags_t") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Guo Ren <guoren@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-10-21 15:46:18 -07:00
Huang Ying	143937ca51	arm64, mm: avoid always making PTE dirty in pte_mkwrite() Current pte_mkwrite_novma() makes PTE dirty unconditionally. This may mark some pages that are never written dirty wrongly. For example, do_swap_page() may map the exclusive pages with writable and clean PTEs if the VMA is writable and the page fault is for read access. However, current pte_mkwrite_novma() implementation always dirties the PTE. This may cause unnecessary disk writing if the pages are never written before being reclaimed. So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the PTE_DIRTY bit is set to make it possible to make the PTE writable and clean. The current behavior was introduced in commit `73e86cb03c` ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"). Before that, pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits are set. To test the performance impact of the patch, on an arm64 server machine, run 16 redis-server processes on socket 1 and 16 memtier_benchmark processes on socket 0 with mostly get transactions (that is, redis-server will mostly read memory only). The memory footprint of redis-server is larger than the available memory, so swap out/in will be triggered. Test results show that the patch can avoid most swapping out because the pages are mostly clean. And the benchmark throughput improves ~23.9% in the test. Fixes: `73e86cb03c` ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-10-21 15:00:25 +01:00
David Kaplan	204ced4108	x86/bugs: Qualify RETBLEED_INTEL_MSG When retbleed mitigation is disabled, the kernel already prints an info message that the system is vulnerable. Recent code restructuring also inadvertently led to RETBLEED_INTEL_MSG being printed as an error, which is unnecessary as retbleed mitigation was already explicitly disabled (by config option, cmdline, etc.). Qualify this print statement so the warning is not printed unless an actual retbleed mitigation was selected and is being disabled due to incompatibility with spectre_v2. Fixes: `e3b78a7ad5` ("x86/bugs: Restructure retbleed mitigation") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220624 Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251003171936.155391-1-david.kaplan@amd.com	2025-10-21 12:32:28 +02:00
Andrew Cooper	876f0d43af	x86/microcode: Fix Entrysign revision check for Zen1/Naples ... to match AMD's statement here: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7033.html Fixes: `50cef76d5c` ("x86/microcode/AMD: Load only SHA256-checksummed patches") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://patch.msgid.link/20251020144124.2930784-1-andrew.cooper3@citrix.com	2025-10-21 12:16:51 +02:00
Babu Moger	19de7113bf	x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode The following NULL pointer dereference is encountered on mount of resctrl fs after booting a system that supports assignable counters with the "rdt=!mbmtotal,!mbmlocal" kernel parameters: BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:mbm_cntr_get Call Trace: rdtgroup_assign_cntr_event rdtgroup_assign_cntrs rdt_get_tree Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features and the MBM events they represent. This results in the per-domain MBM event related data structures to not be allocated during early initialization. resctrl fs initialization follows by implicitly enabling both MBM total and local events on a system that supports assignable counters (mbm_event mode), but this enabling occurs after the per-domain data structures have been created. After booting, resctrl fs assumes that an enabled event can access all its state. This results in NULL pointer dereference when resctrl attempts to access the un-allocated structures of an enabled event. Remove the late MBM event enabling from resctrl fs. This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter (mbm_event) mode is enabled without any events to support. Switching between the "default" and "mbm_event" mode without any events is not practical. Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL or X86_FEATURE_CQM_MBM_LOCAL. This ensures all needed MBM related data structures are created before use and that it is only possible to switch between "default" and "mbm_event" mode when the same events are available in both modes. This dependency does not exist in the hardware but this usage of these feature settings work for known systems. [ bp: Massage commit message. ] Fixes: `13390861b4` ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details") Co-developed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com	2025-10-20 18:06:31 +02:00
Linus Torvalds	c7864eeaa4	Merge tag 'x86_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Reset the why-the-system-rebooted register on AMD to avoid stale bits remaining from previous boots - Add a missing barrier in the TLB flushing code to prevent erroneously not flushing a TLB generation - Make sure cpa_flush() does not overshoot when computing the end range of a flush region - Fix resctrl bandwidth counting on AMD systems when the amount of monitoring groups created exceeds the number the hardware can track * tag 'x86_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Prevent reset reasons from being retained across reboot x86/mm: Fix SMP ordering in switch_mm_irqs_off() x86/mm: Fix overflow in __cpa_addr() x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID	2025-10-19 04:41:27 -10:00
Linus Torvalds	02e5f74ef0	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the handling of ZCR_EL2 in NV VMs - Pick the correct translation regime when doing a PTW on the back of a SEA - Prevent userspace from injecting an event into a vcpu that isn't initialised yet - Move timer save/restore to the sysreg handling code, fixing EL2 timer access in the process - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug - Fix trapping configuration when the host isn't GICv3 - Improve the detection of HCR_EL2.E2H being RES1 - Drop a spurious 'break' statement in the S1 PTW - Don't try to access SPE when owned by EL3 Documentation updates: - Document the failure modes of event injection - Document that a GICv3 guest can be created on a GICv5 host with FEAT_GCIE_LEGACY Selftest improvements: - Add a selftest for the effective value of HCR_EL2.AMO - Address build warning in the timer selftest when building with clang - Teach irqfd selftests about non-x86 architectures - Add missing sysregs to the set_id_regs selftest - Fix vcpu allocation in the vgic_lpi_stress selftest - Correctly enable interrupts in the vgic_lpi_stress selftest x86: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit `3ccbf6f470` ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. guest_memfd: - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits) arm64: Revamp HCR_EL2.E2H RES1 detection KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available KVM: arm64: Compute per-vCPU FGTs at vcpu_load() KVM: arm64: selftests: Fix misleading comment about virtual timer encoding KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit KVM: arm64: Kill leftovers of ad-hoc timer userspace access KVM: arm64: Fix WFxT handling of nested virt KVM: arm64: Move CNTCT_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT_CVAL_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT_CTL_EL0 userspace accessors to generic infrastructure KVM: arm64: Add timer UAPI workaround to sysreg infrastructure KVM: arm64: Make timer_set_offset() generally accessible KVM: arm64: Replace timer context vcpu pointer with timer_id KVM: arm64: Introduce timer_context_to_vcpu() helper KVM: arm64: Hide CNTHV__EL2 from userspace for nVHE guests Documentation: KVM: Update GICv3 docs for GICv5 hosts KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress KVM: arm64: selftests: Allocate vcpus with correct size ...	2025-10-18 07:07:14 -10:00
Linus Torvalds	0e622c4b0e	Merge tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix to handle NULL pointer dereference at irq domain teardown - Fix for handling extraction of struct xive_irq_data - Fix to skip parameter area allocation when fadump disabled Thanks to Ganesh Goudar, Hari Bathini, Nam Cao, Ritesh Harjani (IBM), Sourabh Jain, and Venkat Rao Bagalkote, * tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/fadump: skip parameter area allocation when fadump is disabled powerpc, ocxl: Fix extraction of struct xive_irq_data powerpc/pseries/msi: Fix NULL pointer dereference at irq domain teardown	2025-10-18 07:02:28 -10:00
Paul Walmsley	b7776a802f	riscv: hwprobe: avoid uninitialized variable use in hwprobe_arch_id() Resolve this smatch warning: arch/riscv/kernel/sys_hwprobe.c:50 hwprobe_arch_id() error: uninitialized symbol 'cpu_id'. This could happen if hwprobe_arch_id() was called with a key ID of something other than MVENDORID, MIMPID, and MARCHID. This does not happen in the current codebase. The only caller of hwprobe_arch_id() is a function that only passes one of those three key IDs. For the sake of reducing static analyzer warning noise, and in the unlikely event that hwprobe_arch_id() is someday called with some other key ID, validate hwprobe_arch_id()'s input to ensure that 'cpu_id' is always initialized before use. Fixes: `ea3de9ce8a` ("RISC-V: Add a syscall for HW probing") Cc: Evan Green <evan@rivosinc.com> Signed-off-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/cf5a13ec-19d0-9862-059b-943f36107bf3@kernel.org	2025-10-18 09:36:36 -06:00
Paul Walmsley	2dc99ea272	riscv: cpufeature: avoid uninitialized variable in has_thead_homogeneous_vlenb() In has_thead_homogeneous_vlenb(), smatch detected that the vlenb variable could be used while uninitialized. It appears that this could happen if no CPUs described in DT have the "thead,vlenb" property. Fix by initializing vlenb to 0, which will keep thead_vlenb_of set to 0 (as it was statically initialized). This in turn will cause riscv_v_setup_vsize() to fall back to CSR probing - the desired result if thead,vlenb isn't provided in the DT data. While here, fix a nearby comment typo. Cc: stable@vger.kernel.org Cc: Charlie Jenkins <charlie@rivosinc.com> Fixes: `377be47f90` ("riscv: vector: Use vlenb from DT for thead") Signed-off-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/22674afb-2fe8-2a83-1818-4c37bd554579@kernel.org	2025-10-18 09:36:11 -06:00
Paolo Bonzini	4361f5aa8b	Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD KVM x86 fixes for 6.18: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit `3ccbf6f470` ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabbilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory.	2025-10-18 10:25:43 +02:00
Jingwei Wang	5d15d2ad36	riscv: hwprobe: Fix stale vDSO data for late-initialized keys at boot The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF, is determined by an asynchronous kthread. This can create a race condition where the kthread finishes after the vDSO data has already been populated, causing userspace to read stale values. To fix this race, a new 'ready' flag is added to the vDSO data, initialized to 'false' during arch_initcall_sync. This flag is checked by both the vDSO's user-space code and the riscv_hwprobe syscall. The syscall serves as a one-time gate, using a completion to wait for any pending probes before populating the data and setting the flag to 'true', thus ensuring userspace reads fresh values on its first request. Reported-by: Tsukasa OI <research_trasio@irq.a4lg.com> Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@irq.a4lg.com/ Fixes: `e7c9d66e31` ("RISC-V: Report vector unaligned access speed hwprobe") Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Olof Johansson <olof@lixom.net> Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Co-developed-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Jingwei Wang <wangjingwei@iscas.ac.cn> Link: https://lore.kernel.org/r/20250811142035.105820-1-wangjingwei@iscas.ac.cn [pjw@kernel.org: fix checkpatch issues] Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-10-17 22:23:11 -06:00
Paul Walmsley	492c513ec6	riscv: add a forward declaration for cpuinfo_op Add a forward declaration for cpuinfo_op to resolve a sparse warning. Link: https://lore.kernel.org/r/b831f349-5d0c-f7ac-8362-acb20bc6221a@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-10-17 22:10:01 -06:00
Anup Patel	d2721bb165	RISC-V: Don't print details of CPUs disabled in DT Early boot stages may disable CPU DT nodes for unavailable CPUs based on SKU, pinstraps, eFuse, etc. Currently, the riscv_early_of_processor_hartid() prints details of a CPU if it is disabled in DT which has no value and gives a false impression to the users that there some issue with the CPU. Fixes: `e3d794d555` ("riscv: treat cpu devicetree nodes without status as enabled") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20251014163009.182381-1-apatel@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-10-17 22:02:21 -06:00
Samuel Holland	768e054de0	riscv: Remove the PER_CPU_OFFSET_SHIFT macro __per_cpu_offset is an array of unsigned long, so we can reuse the existing RISCV_LGPTR macro. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20251015225604.3860409-1-samuel.holland@sifive.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-10-17 22:00:29 -06:00
Samuel Holland	5898fc01ff	riscv: mm: Define MAX_POSSIBLE_PHYSMEM_BITS for zsmalloc This definition is used by zsmalloc to optimize memory allocation. On riscv64, it is the same as MAX_PHYSMEM_BITS from asm/sparsemem.h, but that definition depends on CONFIG_SPARSEMEM. The correct definition is already provided for riscv32. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20251015233327.3885003-1-samuel.holland@sifive.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-10-17 22:00:29 -06:00
Samuel Holland	223bfc4d40	riscv: Register IPI IRQs with unique names This allows different IPIs to be distinguished in tracing output. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20251016003244.3910332-1-samuel.holland@sifive.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-10-17 22:00:29 -06:00
Anup Patel	ca525d53f9	RISC-V: Define pgprot_dmacoherent() for non-coherent devices The pgprot_dmacoherent() is used when allocating memory for non-coherent devices and by default pgprot_dmacoherent() is same as pgprot_noncached() unless architecture overrides it. Currently, there is no pgprot_dmacoherent() definition for RISC-V hence non-coherent device memory is being mapped as IO thereby making CPU access to such memory slow. Define pgprot_dmacoherent() to be same as pgprot_writecombine() for RISC-V so that CPU access non-coherent device memory as NOCACHE which is better than accessing it as IO. Fixes: `ff689fd21c` ("riscv: add RISC-V Svpbmt extension support") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Tested-by: Han Gao <rabenda.cn@gmail.com> Tested-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Link: https://lore.kernel.org/r/20250820152316.1012757-1-apatel@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-10-17 21:30:05 -06:00
Linus Torvalds	f406055cb1	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Explicitly encode the XZR register if the value passed to write_sysreg_s() is 0. The GIC CDEOI instruction is encoded as a system register write with XZR as the source register. However, clang does not honour the "Z" register constraint, leading to incorrect code generation - Ensure the interrupts (DAIF.IF) are unmasked when completing single-step of a suspended breakpoint before calling exit_to_user_mode(). With pseudo-NMIs, interrupts are (additionally) masked at the PMR_EL1 register, handled by local_irq_() tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: debug: always unmask interrupts in el0_softstp() arm64/sysreg: Fix GIC CDEOI instruction encoding	2025-10-17 13:04:21 -10:00
Linus Torvalds	fe69107ec7	Merge tag 'riscv-for-linux-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: - Disable CFI with Rust for any platform other than x86 and ARM64 - Keep task mm_cpumasks up-to-date to avoid triggering M-mode firmware warnings if the kernel tries to send an IPI to an offline CPU - Improve kprobe address validation performance and avoid desyncs (following x86) - Avoid duplicate device probes by avoiding DT hardware probing when ACPI is enabled in early boot - Use the correct set of dependencies for CONFIG_ARCH_HAS_ELF_CORE_EFLAGS, avoiding an allnoconfig warning - Fix a few other minor issues * tag 'riscv-for-linux-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: kprobes: convert one final __ASSEMBLY__ to __ASSEMBLER__ riscv: Respect dependencies of ARCH_HAS_ELF_CORE_EFLAGS riscv: acpi: avoid errors caused by probing DT devices when ACPI is used riscv: kprobes: Fix probe address validation riscv: entry: fix typo in comment 'instruciton' -> 'instruction' RISC-V: clear hot-unplugged cores from all task mm_cpumasks to avoid rfence errors riscv: kgdb: Ensure that BUFMAX > NUMREGBYTES rust: cfi: only 64-bit arm and x86 support CFI_CLANG	2025-10-17 12:59:31 -10:00
Ada Couprie Diaz	ea0d55ae4b	arm64: debug: always unmask interrupts in el0_softstp() We intend that EL0 exception handlers unmask all DAIF exceptions before calling exit_to_user_mode(). When completing single-step of a suspended breakpoint, we do not call local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(), leaving all DAIF exceptions masked. When pseudo-NMIs are not in use this is benign. When pseudo-NMIs are in use, this is unsound. At this point interrupts are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag manipulation may not work correctly. For example, a subsequent local_irq_enable() within exit_to_user_mode_loop() will only unmask interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and anything depending on interrupts being unmasked (e.g. delivery of signals) will not work correctly. This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING. Move the call to `try_step_suspended_breakpoints()` outside of the check so that interrupts can be unmasked even if we don't call the step handler. Fixes: `0ac7584c08` ("arm64: debug: split single stepping exception entry") Cc: <stable@vger.kernel.org> # 6.17 Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-10-17 18:08:05 +01:00
Lorenzo Pieralisi	e9ad390a48	arm64/sysreg: Fix GIC CDEOI instruction encoding The GIC CDEOI system instruction requires the Rt field to be set to 0b11111 otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE. Currenly, its usage is encoded as a system register write, with a constant 0 value: write_sysreg_s(0, GICV5_OP_GIC_CDEOI) While compiling with GCC, the 0 constant value, through these asm constraints and modifiers ('x' modifier and 'Z' constraint combo): asm volatile(__msr_s(r, "%x0") : : "rZ" (__val)); forces the compiler to issue the XZR register for the MSR operation (ie that corresponds to Rt == 0b11111) issuing the right instruction encoding. Unfortunately LLVM does not yet understand that modifier/constraint combo so it ends up issuing a different register from XZR for the MSR source, which in turns means that it encodes the GIC CDEOI instruction wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE that we must prevent. Add a conditional to write_sysreg_s() macro that detects whether it is passed a constant 0 value and issues an MSR write with XZR as source register - explicitly doing what the asm modifier/constraint is meant to achieve through constraints/modifiers, fixing the LLVM compilation issue. Fixes: `7ec80fb3f0` ("irqchip/gic-v5: Add GICv5 PPI support") Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Cc: Sascha Bischoff <sascha.bischoff@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-10-17 17:55:01 +01:00
Rong Zhang	e6416c2dfe	x86/CPU/AMD: Prevent reset reasons from being retained across reboot The S5_RESET_STATUS register is parsed on boot and printed to kmsg. However, this could sometimes be misleading and lead to users wasting a lot of time on meaningless debugging for two reasons: * Some bits are never cleared by hardware. It's the software's responsibility to clear them as per the Processor Programming Reference (see [1]). * Some rare hardware-initiated platform resets do not update the register at all. In both cases, a previous reboot could leave its trace in the register, resulting in users seeing unrelated reboot reasons while debugging random reboots afterward. Write the read value back to the register in order to clear all reason bits since they are write-1-to-clear while the others must be preserved. [1]: https://bugzilla.kernel.org/show_bug.cgi?id=206537#attach_303991 [ bp: Massage commit message. ] Fixes: `ab81310287` ("x86/CPU/AMD: Print the reason for the last reset") Signed-off-by: Rong Zhang <i@rong.moe> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/all/20250913144245.23237-1-i@rong.moe/	2025-10-15 21:38:06 +02:00
Linus Torvalds	1f4a222b0e	Remove long-stale ext3 defconfig option Inspired by commit `c065b6046b` ("Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs") I looked around for any other left-over EXT3 config options, and found some old defconfig files still mentioned CONFIG_EXT3_DEFAULTS_TO_ORDERED. That config option was removed a decade ago in commit `c290ea01ab` ("fs: Remove ext3 filesystem driver"). It had a good run, but let's remove it for good. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-10-15 07:57:28 -07:00
Linus Torvalds	66f8e4df00	Merge tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: - Fix regression caused by removing CONFIG_EXT3_FS when testing some very old defconfigs - Avoid a BUG_ON when opening a file on a maliciously corrupted file system - Avoid mm warnings when freeing a very large orphan file metadata - Avoid a theoretical races between metadata writeback and checkpoints (it's very hard to hit in practice, since the race requires that the writeback take a very long time) * tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs ext4: free orphan info with kvfree ext4: detect invalid INLINE_DATA + EXTENTS flag combination ext4, doc: fix and improve directory hash tree description ext4: wait for ongoing I/O to complete before freeing blocks jbd2: ensure that all ongoing I/O complete before freeing blocks	2025-10-15 07:51:57 -07:00
Marc Zyngier	ca88ecdce5	arm64: Revamp HCR_EL2.E2H RES1 detection We currently have two ways to identify CPUs that only implement FEAT_VHE and not FEAT_E2H0: - either they advertise it via ID_AA64MMFR4_EL1.E2H0, - or the HCR_EL2.E2H bit is RAO/WI However, there is a third category of "cpus" that fall between these two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value is zero. A consequence of this is that on systems such as Neoverse V2, a NV guest cannot reliably detect that it is in a VHE-only configuration (E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's best effort to repaint the id register. Replace the RAO/WI test by a sequence that makes use of the VHE register remnapping between EL1 and EL2 to detect this situation, and work out whether we get the VHE behaviour even after having set HCR_EL2.E2H to 0. This solves the NV problem, and provides a more reliable acid test for CPUs that do not completely follow the letter of the architecture while providing a RES1 behaviour for HCR_EL2.E2H. Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Jan Kotas <jank@cadence.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com	2025-10-14 08:18:40 +01:00
Theodore Ts'o	c065b6046b	Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs Commit `d6ace46c82` ("ext4: remove obsolete EXT3 config options") removed the obsolete EXT3_CONFIG options, since it had been over a decade since fs/ext3 had been removed. Unfortunately, there were a number of defconfigs that still used CONFIG_EXT3_FS which the cleanup commit didn't fix up. This led to a large number of defconfig test builds to fail. Oops. Fixes: `d6ace46c82` ("ext4: remove obsolete EXT3 config options") Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-10-13 21:50:40 -04:00
Ingo Molnar	83b0177a6c	x86/mm: Fix SMP ordering in switch_mm_irqs_off() Stephen noted that it is possible to not have an smp_mb() between the loaded_mm store and the tlb_gen load in switch_mm(), meaning the ordering against flush_tlb_mm_range() goes out the window, and it becomes possible for switch_mm() to not observe a recent tlb_gen update and fail to flush the TLBs. [ dhansen: merge conflict fixed by Ingo ] Fixes: `209954cbc7` ("x86/mm/tlb: Update mm_cpumask lazily") Reported-by: Stephen Dolan <sdolan@janestreet.com> Closes: https://lore.kernel.org/all/CAHDw0oGd0B4=uuv8NGqbUQ_ZVmSheU2bN70e4QhFXWvuAZdt2w@mail.gmail.com/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>	2025-10-13 13:55:53 -07:00
Rik van Riel	f25785f9b0	x86/mm: Fix overflow in __cpa_addr() The change to have cpa_flush() call flush_kernel_pages() introduced a bug where __cpa_addr() can access an address one larger than the largest one in the cpa->pages array. KASAN reports the issue like this: BUG: KASAN: slab-out-of-bounds in __cpa_addr arch/x86/mm/pat/set_memory.c:309 [inline] BUG: KASAN: slab-out-of-bounds in __cpa_addr+0x1d3/0x220 arch/x86/mm/pat/set_memory.c:306 Read of size 8 at addr ffff88801f75e8f8 by task syz.0.17/5978 This bug could cause cpa_flush() to not properly flush memory, which somehow never showed any symptoms in my tests, possibly because cpa_flush() is called so rarely, but could potentially cause issues for other people. Fix the issue by directly calculating the flush end address from the start address. Fixes: `86e6815b31` ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly") Reported-by: syzbot+afec6555eef563c66c97@syzkaller.appspotmail.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Link: https://lore.kernel.org/all/68e2ff90.050a0220.2c17c1.0038.GAE@google.com/	2025-10-13 13:55:48 -07:00
Babu Moger	15292f1b4c	x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID Users can create as many monitoring groups as the number of RMIDs supported by the hardware. However, on AMD systems, only a limited number of RMIDs are guaranteed to be actively tracked by the hardware. RMIDs that exceed this limit are placed in an "Unavailable" state. When a bandwidth counter is read for such an RMID, the hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable remains set on first read after tracking re-starts and is clear on all subsequent reads as long as the RMID is tracked. resctrl miscounts the bandwidth events after an RMID transitions from the "Unavailable" state back to being tracked. This happens because when the hardware starts counting again after resetting the counter to zero, resctrl in turn compares the new count against the counter value stored from the previous time the RMID was tracked. This results in resctrl computing an event value that is either undercounting (when new counter is more than stored counter) or a mistaken overflow (when new counter is less than stored counter). Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to zero whenever the RMID is in the "Unavailable" state to ensure accurate counting after the RMID resets to zero when it starts to be tracked again. Example scenario that results in mistaken overflow ================================================== 1. The resctrl filesystem is mounted, and a task is assigned to a monitoring group. $mount -t resctrl resctrl /sys/fs/resctrl $mkdir /sys/fs/resctrl/mon_groups/test1/ $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes 21323 <- Total bytes on domain 0 "Unavailable" <- Total bytes on domain 1 Task is running on domain 0. Counter on domain 1 is "Unavailable". 2. The task runs on domain 0 for a while and then moves to domain 1. The counter starts incrementing on domain 1. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes 7345357 <- Total bytes on domain 0 4545 <- Total bytes on domain 1 3. At some point, the RMID in domain 0 transitions to the "Unavailable" state because the task is no longer executing in that domain. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes "Unavailable" <- Total bytes on domain 0 434341 <- Total bytes on domain 1 4. Since the task continues to migrate between domains, it may eventually return to domain 0. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1 In this case, the RMID on domain 0 transitions from "Unavailable" state to active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when the counter is read and begins tracking the RMID counting from 0. Subsequent reads succeed but return a value smaller than the previously saved MSR value (7345357). Consequently, the resctrl's overflow logic is triggered, it compares the previous value (7345357) with the new, smaller value and incorrectly interprets this as a counter overflow, adding a large delta. In reality, this is a false positive: the counter did not overflow but was simply reset when the RMID transitioned from "Unavailable" back to active state. Here is the text from APM [1] available from [2]. "In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the first QM_CTR read when it begins tracking an RMID that it was not previously tracking. The U bit will be zero for all subsequent reads from that RMID while it is still tracked by the hardware. Therefore, a QM_CTR read with the U bit set when that RMID is in use by a processor can be considered 0 when calculating the difference with a subsequent read." [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory Bandwidth (MBM). [ bp: Split commit message into smaller paragraph chunks for better consumption. ] Fixes: `4d05bf71f1` ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org # needs adjustments for <= v6.17 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]	2025-10-13 21:24:39 +02:00
Stefan Wahren	4adc20ba95	ARM: dts: broadcom: rpi: Switch to V3D firmware clock Until commit `919d6924ae` ("clk: bcm: rpi: Turn firmware clock on/off when preparing/unpreparing") the clk-raspberrypi driver wasn't able to change the state of the V3D clock. Only the clk-bcm2835 was able to do this before. After this commit both drivers were able to work against each other, which could result in a system freeze. One step to avoid this conflict is to switch all V3D consumer to the firmware clock. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/linux-arm-kernel/727aa0c8-2981-4662-adf3-69cac2da956d@samsung.com/ Fixes: `919d6924ae` ("clk: bcm: rpi: Turn firmware clock on/off when preparing/unpreparing") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Co-developed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251005113816.6721-1-wahrenst@gmx.net Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-13 10:31:25 -07:00
Peter Robinson	aa960b5976	arm64: dts: broadcom: bcm2712: Define VGIC interrupt Define the interrupt in the GICv2 for vGIC so KVM can be used, it was missed from the original upstream DTB for some reason. Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Cc: Andrea della Porta <andrea.porta@suse.com> Cc: Phil Elwell <phil@raspberrypi.com> Fixes: `faa3381267` ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5") Link: https://lore.kernel.org/r/20250924085612.1039247-1-pbrobinson@gmail.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-13 10:31:04 -07:00
Oliver Upton	e0b5a7967d	KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available Marc reports that the performance of running an L3 guest has regressed by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture. That's of course terrible for the single user of recursive NV ;-) While there's nothing to be done on non-FGT systems, take advantage of the precise write trap of MDSCR_EL1 and leave the rest of the debug registers untrapped. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:44:37 +01:00
Oliver Upton	fb10ddf35c	KVM: arm64: Compute per-vCPU FGTs at vcpu_load() To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:44:37 +01:00
Marc Zyngier	386aac77da	KVM: arm64: Kill leftovers of ad-hoc timer userspace access Now that the whole timer infrastructure is handled as system register accesses, get rid of the now unused ad-hoc infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	892f7c38ba	KVM: arm64: Fix WFxT handling of nested virt The spec for WFxT indicates that the parameter to the WFxT instruction is relative to the reading of CNTVCT_EL0. This means that the implementation needs to take the execution context into account, as CNTVOFF_EL2 does not always affect readings of CNTVCT_EL0 (such as when HCR_EL2.E2H is 1 and that we're in host context). This also rids us of the last instance of KVM_REG_ARM_TIMER_CNT outside of the userspace interaction code. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	c3be3a48fb	KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure Moving the counter registers is a bit more involved than for the control and comparator (there is no shadow data for the counter), but still pretty manageable. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	8af198980e	KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure As for the control registers, move the comparator registers to the common infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00

1 2 3 4 5 ...

238326 Commits