linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-17 23:37:10 -04:00

Author	SHA1	Message	Date
Linus Torvalds	40286d6379	Merge tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Allow TLP Processing Hints to be enabled for RCiEPs (George Abraham P) - Enable AtomicOps only if we know the Root Port supports them (Gerd Bayer) - Don't enable AtomicOps for RCiEPs since none of them need Atomic Ops and we can't tell whether the Root Complex would support them (Gerd Bayer) - Leave Precision Time Measurement disabled until a driver enables it to avoid PCIe errors (Mika Westerberg) - Make pci_set_vga_state() fail if bridge doesn't support VGA routing, i.e., PCI_BRIDGE_CTL_VGA is not writable, and return errors to vga_get() callers including userspace via /dev/vga_arbiter (Simon Richter) - Validate max-link-speed from DT in j721e, brcmstb, mediatek-gen3, rzg3s drivers (where the actual controller constraints are known), and remove validation from the generic OF DT accessor (Hans Zhang) - Remove pc110pad driver (no longer useful after 486 CPU support removed) and no_pci_devices() (pc110pad was the last user) (Dmitry Torokhov, Heiner Kallweit) Resource management: - Prevent assigning space to unimplemented bridge windows; previously we mistakenly assumed prefetchable window existed and assigned space and put a BAR there (Ahmed Naseef) - Avoid shrinking bridge windows to fit in the initial Root Port window; fixes one problem with devices with large BARs connected via switches, e.g., Thunderbolt (Ilpo Järvinen) - Pass full extent of empty space, not just the aligned space, to resource_alignf callback so free space before the requested alignment can be used (Ilpo Järvinen) - Place small resources before larger ones for better utilization of address space (Ilpo Järvinen) - Fix alignment calculation for resource size larger than align, e.g., bridge windows larger than the 1MB required alignment (Ilpo Järvinen) Reset: - Update slot handling so all ARI functions are treated as being in the same slot. They're all reset by Secondary Bus Reset, but previously drivers of ARI functions that appeared to be on a non-zero device weren't notified and fatal hardware errors could result (Keith Busch) - Make sysfs reset_subordinate hotplug safe to avoid spurious hotplug events (Keith Busch) - Hide Secondary Bus Reset ('bus') from sysfs reset_methods if masked by CXL because it has no effect (Vidya Sagar) - Avoid FLR for AMD NPU device, where it causes the device to hang (Lizhi Hou) Error handling: - Clear only error bits in PCIe Device Status to avoid accidentally clearing Emergency Power Reduction Detected (Shuai Xue) - Check for AER errors even in devices without drivers (Lukas Wunner) - Initialize ratelimit info so DPC and EDR paths log AER error information (Kuppuswamy Sathyanarayanan) Power control: - Add UPD720201/UPD720202 USB 3.0 xHCI Host Controller .compatible so generic pwrctrl driver can control it (Neil Armstrong) Hotplug: - Set LED_HW_PLUGGABLE for NPEM hotplug-capable ports so LED core doesn't complain when setting brightness fails because the endpoint is gone (Richard Cheng) Peer-to-peer DMA: - Allow wildcards in list of host bridges that support peer-to-peer DMA between hierarchy domains and add all Google SoCs (Jacob Moroni) Endpoint framework: - Advertise dynamic inbound mapping support in pci-epf-test and update host pci_endpoint_test to skip doorbell testing if not advertised by endpoint (Koichiro Den) - Return 0, not remaining timeout, when MHI eDMA ops complete so mhi_ep_ring_add_element() doesn't interpret non-zero as failure (Daniel Hodges) - Remove vntb and ntb duplicate resource teardown that leads to oops when .allow_link() fails or .drop_link() is called (Koichiro Den) - Disable vntb delayed work before clearing BAR mappings and doorbells to avoid oops caused by doing the work after resources have been torn down (Koichiro Den) - Add a way to describe reserved subregions within BARs, e.g., platform-owned fixed register windows, and use it for the RK3588 BAR4 DMA ctrl window (Koichiro Den) - Add BAR_DISABLED for BARs that will never be available to an EPF driver, and change some BAR_RESERVED annotations to BAR_DISABLED (Niklas Cassel) - Add NTB .get_dma_dev() callback for cases where DMA API requires a different device, e.g., vNTB devices (Koichiro Den) - Add reserved region types for MSI-X Table and PBA so Endpoint controllers can them as describe hardware-owned regions in a BAR_RESERVED BAR (Manikanta Maddireddy) - Make Tegra194/234 BAR0 programmable and remove 1MB size limit (Manikanta Maddireddy) - Expose Tegra BAR2 (MSI-X) and BAR4 (DMA) as 64-bit BAR_RESERVED (Manikanta Maddireddy) - Add Tegra194 and Tegra234 device table entries to pci_endpoint_test (Manikanta Maddireddy) - Skip the BAR subrange selftest if there are not enough inbound window resources to run the test (Christian Bruel) New native PCIe controller drivers: - Add DT binding and driver for Andes QiLai SoC PCIe host controller (Randolph Lin) - Add DT binding and driver for ESWIN PCIe Root Complex (Senchuan Zhang) Baikal T-1 PCIe controller driver: - Remove driver since it never quite became usable (Andy Shevchenko) Cadence PCIe controller driver: - Implement byte/word config reads with dword (32-bit) reads because some Cadence controllers don't support sub-dword accesses (Aksh Garg) CIX Sky1 PCIe controller driver: - Add 'power-domains' to DT binding for SCMI power domain (Gary Yang) Freescale i.MX6 PCIe controller driver: - Add i.MX94 and i.MX943 to fsl,imx6q-pcie-ep DT binding (Richard Zhu) - Delay instead of polling for L2/L3 Ready after PME_Turn_off when suspending i.MX6SX because LTSSM registers are inaccessible (Richard Zhu) - Separate PERST# assertion (for resetting endpoints) from core reset (for resetting the RC itself) to prepare for new DTs with PERST# GPIO in per-Root Port nodes (Sherry Sun) - Retain Root Port MSI capability on i.MX7D, i.MX8MM, and i.MX8MQ so MSI from downstream devices will work (Richard Zhu) - Fix i.MX95 reference clock source selection when internal refclk is used (Franz Schnyder) Freescale Layerscape PCIe controller driver: - Allow building as a removable module (Sascha Hauer) MediaTek PCIe Gen3 controller driver: - Use dev_err_probe() to simplify error paths and make deferred probe messages visible in /sys/kernel/debug/devices_deferred (Chen-Yu Tsai) - Power off device if setup fails (Chen-Yu Tsai) - Integrate new pwrctrl API to enable power control for WiFi/BT adapters on mainboard or in PCIe or M.2 slots (Chen-Yu Tsai) NVIDIA Tegra194 PCIe controller driver: - Poll less aggressively and non-atomically for PME_TO_Ack during transition to L2 (Vidya Sagar) - Disable LTSSM after transition to Detect on surprise link down to stop toggling between Polling and Detect (Manikanta Maddireddy) - Don't force the device into the D0 state before L2 when suspending or shutting down the controller (Vidya Sagar) - Disable PERST# IRQ only in Endpoint mode because it's not registered in Root Port mode (Manikanta Maddireddy) - Handle 'nvidia,refclk-select' as optional (Vidya Sagar) - Disable direct speed change in Endpoint mode so link speed change is controlled by the host (Vidya Sagar) - Set LTR values before link up to avoid bogus LTR messages with 0 latency (Vidya Sagar) - Allow system suspend when the Endpoint link is down (Vidya Sagar) - Use DWC IP core version, not Tegra custom values, to avoid DWC core version check warnings (Manikanta Maddireddy) - Apply ECRC workaround to devices based on DesignWare 5.00a as well as 4.90a (Manikanta Maddireddy) - Disable PM Substate L1.2 in Endpoint mode to work around Tegra234 erratum (Vidya Sagar) - Delay post-PERST# cleanup until core is powered on to avoid CBB timeout (Manikanta Maddireddy) - Assert CLKREQ# so switches that forward it to their downstream side can bring up those links successfully (Vidya Sagar) - Calibrate pipe to UPHY for Endpoint mode to reset stale PLL state from any previous bad link state (Vidya Sagar) - Remove IRQF_ONESHOT flag from Endpoint interrupt registration so DMA driver and Endpoint controller driver can share the interrupt line (Vidya Sagar) - Enable DMA interrupt to support DMA in both Root Port and Endpoint modes (Vidya Sagar) - Enable hardware link retraining after link goes down in Endpoint mode (Vidya Sagar) - Add DT binding and driver support for core clock monitoring (Vidya Sagar) Qualcomm PCIe controller driver: - Advertise 'Hot-Plug Capable' and set 'No Command Completed Support' since Qcom Root Ports support hotplug events like DL_Up/Down and can accept writes to Slot Control without delays between writes (Krishna Chaitanya Chundru) Renesas R-Car PCIe controller driver: - Mark Endpoint BAR0 and BAR2 as Resizable (Koichiro Den) - Reduce EPC BAR alignment requirement to 4K (Koichiro Den) Renesas RZ/G3S PCIe controller driver: - Add RZ/G3E to DT binding and to driver (John Madieu) - Assert (not deassert) resets in probe error path (John Madieu) - Assert resets in suspend path in reverse order they were deasserted during probe (John Madieu) - Rework inbound window algorithm to prevent mapping more than intended region and enforce alignment on size, to prepare for RZ/G3E support (John Madieu) Rockchip DesignWare PCIe controller driver: - Add tracepoints for PCIe controller LTSSM transitions and link rate changes (Shawn Lin) - Trace LTSSM events collected by the dw-rockchip debug FIFO (Shawn Lin) SOPHGO PCIe controller driver: - Disable ASPM L0s and L1 on Sophgo 2042 PCIe Root Ports that advertise support for them (Yao Zi) Synopsys DesignWare PCIe controller driver: - Continue with system suspend even if an Endpoint doesn't respond with PME_TO_Ack message (Manivannan Sadhasivam) - Set Endpoint MSI-X Table Size in the correct function of a multi-function device when configuring MSI-X, not in Function 0 (Aksh Garg) - Set Max Link Width and Max Link Speed for all functions of a multi-function device, not just Function 0 (Aksh Garg) - Expose PCIe event counters in groups 5-7 in debugfs (Hans Zhang) Miscellaneous: - Warn only once about invalid ACS kernel parameter format (Richard Cheng) - Suppress FW_BUG warning when writing sysfs 'numa_node' with the current value (Li RongQing) - Drop redundant 'depends on PCI' from Kconfig (Julian Braha)" * tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (165 commits) PCI/P2PDMA: Add Google SoCs to the P2P DMA host bridge list PCI/P2PDMA: Allow wildcard Device IDs in host bridge list PCI: sg2042: Avoid L0s and L1 on Sophgo 2042 PCIe Root Ports PCI: cadence: Add flags for disabling ASPM capability for broken Root Ports PCI: tegra194: Add core monitor clock support dt-bindings: PCI: tegra194: Add monitor clock support PCI: tegra194: Enable hardware hot reset mode in Endpoint mode PCI: tegra194: Enable DMA interrupt PCI: tegra194: Remove IRQF_ONESHOT flag during Endpoint interrupt registration PCI: tegra194: Calibrate pipe to UPHY for Endpoint mode PCI: tegra194: Assert CLKREQ# explicitly by default PCI: tegra194: Fix CBB timeout caused by DBI access before core power-on PCI: tegra194: Disable L1.2 capability of Tegra234 EP PCI: dwc: Apply ECRC workaround to DesignWare 5.00a as well PCI: tegra194: Use DWC IP core version PCI: tegra194: Free up Endpoint resources during remove() PCI: tegra194: Allow system suspend when the Endpoint link is not up PCI: tegra194: Set LTR message request before PCIe link up in Endpoint mode PCI: tegra194: Disable direct speed change for Endpoint mode PCI: tegra194: Use devm_gpiod_get_optional() to parse "nvidia,refclk-select" ...	2026-04-15 14:41:21 -07:00
Linus Torvalds	3203a08c12	Merge tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Madhavan Srinivasan: - powerpc support for huge pfnmaps - Cleanups to use masked user access - Rework pnv_ioda_pick_m64_pe() to use better bitmap API - Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC - Backup region offset update to eflcorehdr - Fixes for wii/ps3 platform - Implement JIT support for private stack in powerpc - Implement JIT support for fsession in powerpc64 trampoline - Add support for instruction array and indirect jump in powerpc - Misc selftest fixes and cleanups Thanks to Abhishek Dubey, Aditya Gupta, Alex Williamson, Amit Machhiwal, Andrew Donnellan, Bartosz Golaszewski, Cédric Le Goater, Chen Ni, Christophe Leroy (CS GROUP), Hari Bathini, J. Neuschäfer, Mukesh Kumar Chaurasiya (IBM), Nam Cao, Nilay Shroff, Pavithra Prakash, Randy Dunlap, Ritesh Harjani (IBM), Shrikanth Hegde, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote, and Yury Norov (NVIDIA) * tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (47 commits) mailmap: Add entry for Andrew Donnellan powerpc32/bpf: fix loading fsession func metadata using PPC_LI32 selftest/bpf: Enable gotox tests for powerpc64 powerpc64/bpf: Add support for indirect jump selftest/bpf: Enable instruction array test for powerpc powerpc/bpf: Add support for instruction array powerpc32/bpf: Add fsession support powerpc64/bpf: Implement fsession support selftests/bpf: Enable private stack tests for powerpc64 powerpc64/bpf: Implement JIT support for private stack powerpc: pci-ioda: Optimize pnv_ioda_pick_m64_pe() powerpc: pci-ioda: use bitmap_alloc() in pnv_ioda_pick_m64_pe() powerpc/net: Inline checksum wrappers and convert to scoped user access powerpc/sstep: Convert to scoped user access powerpc/align: Convert emulate_spe() to scoped user access powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access powerpc/futex: Use masked user access powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC cpuidle: powerpc: avoid double clear when breaking snooze powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings ...	2026-04-14 17:10:15 -07:00
Linus Torvalds	f21f7b5162	Merge tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vdso updates from Thomas Gleixner: - Make the handling of compat functions consistent and more robust - Rework the underlying data store so that it is dynamically allocated, which allows the conversion of the last holdout SPARC64 to the generic VDSO implementation - Rework the SPARC64 VDSO to utilize the generic implementation - Mop up the left overs of the non-generic VDSO support in the core code - Expand the VDSO selftest and make them more robust - Allow time namespaces to be enabled independently of the generic VDSO support, which was not possible before due to SPARC64 not using it - Various cleanups and improvements in the related code * tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) timens: Use task_lock guard in timens_get*() timens: Use mutex guard in proc_timens_set_offset() timens: Simplify some calls to put_time_ns() timens: Add a __free() wrapper for put_time_ns() timens: Remove dependency on the vDSO vdso/timens: Move functions to new file selftests: vDSO: vdso_test_correctness: Add a test for time() selftests: vDSO: vdso_test_correctness: Use facilities from parse_vdso.c selftests: vDSO: vdso_test_correctness: Handle different tv_usec types selftests: vDSO: vdso_test_correctness: Drop SYS_getcpu fallbacks selftests: vDSO: vdso_test_gettimeofday: Remove nolibc checks Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers" random: vDSO: Remove ifdeffery random: vDSO: Trim vDSO includes vdso/datapage: Trim down unnecessary includes vdso/datapage: Remove inclusion of gettimeofday.h vdso/helpers: Explicitly include vdso/processor.h vdso/gettimeofday: Add explicit includes random: vDSO: Add explicit includes MIPS: vdso: Explicitly include asm/vdso/vdso.h ...	2026-04-14 10:53:44 -07:00
Linus Torvalds	d568788baa	Merge tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - randomize_kstack: Improve implementation across arches (Ryan Roberts) - lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test - refcount: Remove unused __signed_wrap function annotations * tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test refcount: Remove unused __signed_wrap function annotations randomize_kstack: Unify random source across arches randomize_kstack: Maintain kstack_offset per task	2026-04-13 17:52:29 -07:00
Linus Torvalds	ef3da345cc	Merge tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - coredump: add tracepoint for coredump events - fs: hide file and bfile caches behind runtime const machinery Fixes: - fix architecture-specific compat_ftruncate64 implementations - dcache: Limit the minimal number of bucket to two - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START - fs/mbcache: cancel shrink work before destroying the cache - dcache: permit dynamic_dname()s up to NAME_MAX Cleanups: - remove or unexport unused fs_context infrastructure - trivial ->setattr cleanups - selftests/filesystems: Assume that TIOCGPTPEER is defined - writeback: fix kernel-doc function name mismatch for wb_put_many() - autofs: replace manual symlink buffer allocation in autofs_dir_symlink - init/initramfs.c: trivial fix: FSM -> Finite-state machine - fs: remove stale and duplicate forward declarations - readdir: Introduce dirent_size() - fs: Replace user_access_{begin/end} by scoped user access - kernel: acct: fix duplicate word in comment - fs: write a better comment in step_into() concerning .mnt assignment - fs: attr: fix comment formatting and spelling issues" * tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) dcache: permit dynamic_dname()s up to NAME_MAX fs: attr: fix comment formatting and spelling issues fs: hide file and bfile caches behind runtime const machinery fs: write a better comment in step_into() concerning .mnt assignment proc: rename proc_notify_change to proc_setattr proc: rename proc_setattr to proc_nochmod_setattr affs: rename affs_notify_change to affs_setattr adfs: rename adfs_notify_change to adfs_setattr hfs: update comments on hfs_inode_setattr kernel: acct: fix duplicate word in comment fs: Replace user_access_{begin/end} by scoped user access readdir: Introduce dirent_size() coredump: add tracepoint for coredump events fs: remove do_sys_truncate fs: pass on FTRUNCATE_* flags to do_truncate fs: fix archiecture-specific compat_ftruncate64 fs: remove stale and duplicate forward declarations init/initramfs.c: trivial fix: FSM -> Finite-state machine autofs: replace manual symlink buffer allocation in autofs_dir_symlink fs/mbcache: cancel shrink work before destroying the cache ...	2026-04-13 14:20:11 -07:00
Gaurav Batra	328335a794	powerpc/powernv/iommu: iommu incorrectly bypass DMA APIs In a PowerNV environment, for devices that supports DMA mask less than 64 bit but larger than 32 bits, iommu is incorrectly bypassing DMA APIs while allocating and mapping buffers for DMA operations. Devices are failing with ENOMEN during probe with the following messages amdgpu 0000:01:00.0: [drm] Detected VRAM RAM=4096M, BAR=4096M amdgpu 0000:01:00.0: [drm] RAM width 128bits GDDR5 amdgpu 0000:01:00.0: iommu: 64-bit OK but direct DMA is limited by 0 amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff amdgpu 0000:01:00.0: 4096M of VRAM memory ready amdgpu 0000:01:00.0: 32570M of GTT memory ready. amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo amdgpu 0000:01:00.0: [drm] Debug VRAM access will use slowpath MM access amdgpu 0000:01:00.0: [drm] GART: num cpu pages 4096, num gpu pages 65536 amdgpu 0000:01:00.0: [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000). amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo amdgpu 0000:01:00.0: (-12) create WB bo failed amdgpu 0000:01:00.0: amdgpu_device_wb_init failed -12 amdgpu 0000:01:00.0: amdgpu_device_ip_init failed amdgpu 0000:01:00.0: Fatal error during GPU init amdgpu 0000:01:00.0: finishing device. amdgpu 0000:01:00.0: probe with driver amdgpu failed with error -12 amdgpu 0000:01:00.0: ttm finalized Fixes: `1471c517cf` ("powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory") Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reported-by: Dan Horák <dan@danny.cz> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5039 Tested-by: Dan Horak <dan@danny.cz> Closes: https://lore.kernel.org/linuxppc-dev/20260313142351.609bc4c3efe1184f64ca5f44@danny.cz/ Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com> Closes: https://lore.kernel.org/linuxppc-dev/20260313142351.609bc4c3efe1184f64ca5f44@danny.cz/ [Maddy: Fixed tags] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260331223022.47488-1-gbatra@linux.ibm.com	2026-04-01 22:08:55 +05:30
Christophe Leroy (CS GROUP)	bf53ede003	powerpc/align: Convert emulate_spe() to scoped user access Commit `861574d51b` ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert emulate_spe() to scoped user access to benefit from masked user access. Scoped user access also make the code simpler. Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/4ff83cb240da4e2d0c34e2bce4b8b6ef19a33777.1773136880.git.chleroy@kernel.org	2026-04-01 09:21:07 +05:30
Christophe Leroy (CS GROUP)	679fa9c756	powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access Commit `861574d51b` ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert gpr32_set_common_user() to scoped user access to benefit from masked user access. Scoped user access also make the code simpler. Also changes label from Efault to efault to avoid checkpatch complaining about CamelCase. Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/2409643daab08b4bc07004c2b88f42085d1ef45a.1773136838.git.chleroy@kernel.org	2026-04-01 09:21:07 +05:30
Christophe Leroy	f26ad12356	powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC Commit `e65e1fc2d2` ("[PATCH] syscall class hookup for all normal targets") added generic support for AUDIT but that didn't include support for bi-arch like powerpc. Commit `4b58841149` ("audit: Add generic compat syscall support") added generic support for bi-arch. Convert powerpc to that bi-arch generic audit support. With this change generated text is similar. Thomas has confirmed that the previously failing filter_exclude/test is now successful both without and with this patch, see [1] [1] https://lore.kernel.org/all/20260306115350-ef265661-6d6b-4043-9bd0-8e6b437d0d67@linutronix.de/ Link: https://github.com/linuxppc/issues/issues/412 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/261b1be5b8dc526b83d73e8281e682a73536ea28.1773155031.git.chleroy@kernel.org	2026-04-01 09:21:06 +05:30
Randy Dunlap	7695a4e12e	powerpc: kgdb: fix kernel-doc warnings Remove empty comment line at the beginning of a kernel-doc function block. Add a "Return:" section for this function. These changes prevent 2 kernel-doc warnings: Warning: ../arch/powerpc/kernel/kgdb.c:103 Cannot find identifier on line: * Warning: kgdb.c:113 No description found for return value of 'kgdb_skipexception' Fixes: `949616cf2d` ("powerpc/kgdb: Bail out of KGDB when we've been triggered") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260225055314.247966-1-rdunlap@infradead.org	2026-04-01 09:21:06 +05:30
Ilpo Järvinen	9036bd0efc	PCI: Align head space better When a bridge window contains big and small resource(s), the small resource(s) may not amount to the half of the size of the big resource which would allow calculate_head_align() to shrink the head alignment. This results in always placing the small resource(s) after the big resource. In general, it would be good to be able to place the small resource(s) before the big resource to achieve better utilization of the address space. In the cases where the large resource can only fit at the end of the window, it is even required. However, carrying the information over from pbus_size_mem() and calculate_head_align() to __pci_assign_resource() and pcibios_align_resource() is not easy with the current data structures. A somewhat hacky way to move the non-aligning tail part to the head is possible within pcibios_align_resource(). The free space between the start of the free space span and the aligned start address can be compared with the non-aligning remainder of the size. If the free space is larger than the remainder, placing the remainder before the start address is possible. This relocation should generally work, because PCI resources consist only power-of-2 atoms. Various arch requirements may still need to override the relocation, so the relocation is only applied selectively in such cases. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221205 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xifer <xiferdev@gmail.com> Link: https://patch.msgid.link/20260324165633.4583-10-ilpo.jarvinen@linux.intel.com	2026-03-27 10:19:08 -05:00
Ilpo Järvinen	f699bcc8bc	resource: Pass full extent of empty space to resource_alignf callback __find_resource_space() calculates the full extent of empty space but only passes the aligned space to resource_alignf callback. In some situations, the callback may choose take advantage of the free space before the requested alignment. Pass the full extent of the calculated empty space to resource_alignf callback as an additional parameter. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xifer <xiferdev@gmail.com> Link: https://patch.msgid.link/20260324165633.4583-3-ilpo.jarvinen@linux.intel.com	2026-03-27 10:18:39 -05:00
Ryan Roberts	a96ef5848c	randomize_kstack: Unify random source across arches Previously different architectures were using random sources of differing strength and cost to decide the random kstack offset. A number of architectures (loongarch, powerpc, s390, x86) were using their timestamp counter, at whatever the frequency happened to be. Other arches (arm64, riscv) were using entropy from the crng via get_random_u16(). There have been concerns that in some cases the timestamp counters may be too weak, because they can be easily guessed or influenced by user space. And get_random_u16() has been shown to be too costly for the level of protection kstack offset randomization provides. So let's use a common, architecture-agnostic source of entropy; a per-cpu prng, seeded at boot-time from the crng. This has a few benefits: - We can remove choose_random_kstack_offset(); That was only there to try to make the timestamp counter value a bit harder to influence from user space []. - The architecture code is simplified. All it has to do now is call add_random_kstack_offset() in the syscall path. - The strength of the randomness can be reasoned about independently of the architecture. - Arches previously using get_random_u16() now have much faster syscall paths, see below results. [] Additionally, this gets rid of some redundant work on s390 and x86. Before this patch, those architectures called choose_random_kstack_offset() under arch_exit_to_user_mode_prepare(), which is also called for exception returns to userspace which were not syscalls (e.g. regular interrupts). Getting rid of choose_random_kstack_offset() avoids a small amount of redundant work for the non-syscall cases. In some configurations, add_random_kstack_offset() will now call instrumentable code, so for a couple of arches, I have moved the call a bit later to the first point where instrumentation is allowed. This doesn't impact the efficacy of the mechanism. There have been some claims that a prng may be less strong than the timestamp counter if not regularly reseeded. But the prng has a period of about 2^113. So as long as the prng state remains secret, it should not be possible to guess. If the prng state can be accessed, we have bigger problems. Additionally, we are only consuming 6 bits to randomize the stack, so there are only 64 possible random offsets. I assert that it would be trivial for an attacker to brute force by repeating their attack and waiting for the random stack offset to be the desired one. The prng approach seems entirely proportional to this level of protection. Performance data are provided below. The baseline is v6.18 with rndstack on for each respective arch. (I)/(R) indicate statistically significant improvement/regression. arm64 platform is AWS Graviton3 (m7g.metal). x86_64 platform is AWS Sapphire Rapids (m7i.24xlarge): +-----------------+--------------+---------------+---------------+ \| Benchmark \| Result Class \| per-cpu-prng \| per-cpu-prng \| \| \| \| arm64 (metal) \| x86_64 (VM) \| +=================+==============+===============+===============+ \| syscall/getpid \| mean (ns) \| (I) -9.50% \| (I) -17.65% \| \| \| p99 (ns) \| (I) -59.24% \| (I) -24.41% \| \| \| p99.9 (ns) \| (I) -59.52% \| (I) -28.52% \| +-----------------+--------------+---------------+---------------+ \| syscall/getppid \| mean (ns) \| (I) -9.52% \| (I) -19.24% \| \| \| p99 (ns) \| (I) -59.25% \| (I) -25.03% \| \| \| p99.9 (ns) \| (I) -59.50% \| (I) -28.17% \| +-----------------+--------------+---------------+---------------+ \| syscall/invalid \| mean (ns) \| (I) -10.31% \| (I) -18.56% \| \| \| p99 (ns) \| (I) -60.79% \| (I) -20.06% \| \| \| p99.9 (ns) \| (I) -61.04% \| (I) -25.04% \| +-----------------+--------------+---------------+---------------+ I tested an earlier version of this change on x86 bare metal and it showed a smaller but still significant improvement. The bare metal system wasn't available this time around so testing was done in a VM instance. I'm guessing the cost of rdtsc is higher for VMs. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://patch.msgid.link/20260303150840.3789438-3-ryan.roberts@arm.com Signed-off-by: Kees Cook <kees@kernel.org>	2026-03-24 21:12:03 -07:00
Christoph Hellwig	e43dce8a0b	fs: fix archiecture-specific compat_ftruncate64 The "small" argument to do_sys_ftruncate indicates if > 32-bit size should be reject, but all the arch-specific compat ftruncate64 implementations get this wrong. Merge do_sys_ftruncate and ksys_ftruncate, replace the integer as boolean small flag with a descriptive one about LFS semantics, and use it correctly in the architecture-specific ftruncate64 implementations. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Fixes: `3dd681d944` ("arm64: 32-bit (compat) applications support") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260323070205.2939118-2-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-23 12:41:57 +01:00
Ritesh Harjani (IBM)	07791ff060	powerpc: Print MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS at startup Similar to CPU_FTRS_[POSSIBLE\|ALWAYS], let's also print MMU_FTRS_[POSSIBLE\|ALWAYS]. This has some useful data to capture during bootup. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/c37a9f314a723048d25aa5424f7ede8eec691d86.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Linus Torvalds	4f3df2e5ea	Merge tag 'powerpc-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix KUAP warning in VMX usercopy path - Fix lockdep warning during PCI enumeration - Fix to move CMA reservations to arch_mm_preinit - Fix to check current->mm is alive before getting user callchain Thanks to Aboorva Devarajan, Christophe Leroy (CS GROUP), Dan Horák, Nicolin Chen, Nilay Shroff, Qiao Zhao, Ritesh Harjani (IBM), Saket Kumar Bhaskar, Sayali Patil, Shrikanth Hegde, Venkat Rao Bagalkote, and Viktor Malik. * tag 'powerpc-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/iommu: fix lockdep warning during PCI enumeration powerpc/selftests/copyloops: extend selftest to exercise __copy_tofrom_user_power7_vmx powerpc: fix KUAP warning in VMX usercopy path powerpc, perf: Check that current->mm is alive before getting user callchain powerpc/mem: Move CMA reservations to arch_mm_preinit	2026-03-15 11:36:11 -07:00
Nilay Shroff	82f73ef9c4	powerpc/iommu: fix lockdep warning during PCI enumeration Commit `a75b2be249` ("iommu: Add iommu_driver_get_domain_for_dev() helper") introduced iommu_driver_get_domain_for_dev() for driver code paths that hold iommu_group->mutex while attaching a device to an IOMMU domain. The same commit also added a lockdep assertion in iommu_get_domain_for_dev() to ensure that callers do not hold iommu_group->mutex when invoking it. On powerpc platforms, when PCI device ownership is switched from BLOCKED to the PLATFORM domain, the attach callback spapr_tce_platform_iommu_attach_dev() still calls iommu_get_domain_for_dev(). This happens while iommu_group->mutex is held during domain switching, which triggers the lockdep warning below during PCI enumeration: WARNING: drivers/iommu/iommu.c:2252 at iommu_get_domain_for_dev+0x38/0x80, CPU#2: swapper/0/1 Modules linked in: CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc2+ #35 PREEMPT Hardware name: IBM,9105-22A Power11 (architected) 0x820200 0xf000007 of:IBM,FW1120.00 (RB1120_115) hv:phyp pSeries NIP: c000000000c244c4 LR: c00000000005b5a4 CTR: c00000000005b578 REGS: c00000000a7bf280 TRAP: 0700 Not tainted (7.0.0-rc2+) MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 22004422 XER: 0000000a CFAR: c000000000c24508 IRQMASK: 0 GPR00: c00000000005b5a4 c00000000a7bf520 c000000001dc8100 0000000000000001 GPR04: c00000000f972f10 0000000000000000 0000000000000000 0000000000000001 GPR08: 0000001ffbc60000 0000000000000001 0000000000000000 0000000000000000 GPR12: c00000000005b578 c000001fffffe480 c000000000011618 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: ffffffffffffefff 0000000000000000 c000000002d30eb0 0000000000000001 GPR24: c0000000017881f8 0000000000000000 0000000000000001 c00000000f972e00 GPR28: c00000000bbba0d0 0000000000000000 c00000000bbba0d0 c00000000f972e00 NIP [c000000000c244c4] iommu_get_domain_for_dev+0x38/0x80 LR [c00000000005b5a4] spapr_tce_platform_iommu_attach_dev+0x2c/0x98 Call Trace: iommu_get_domain_for_dev+0x68/0x80 (unreliable) spapr_tce_platform_iommu_attach_dev+0x2c/0x98 __iommu_attach_device+0x44/0x220 __iommu_device_set_domain+0xf4/0x194 __iommu_group_set_domain_internal+0xec/0x228 iommu_setup_default_domain+0x5f4/0x6a4 __iommu_probe_device+0x674/0x724 iommu_probe_device+0x50/0xb4 iommu_add_device+0x48/0x198 pci_dma_dev_setup_pSeriesLP+0x198/0x4f0 pcibios_bus_add_device+0x80/0x464 pci_bus_add_device+0x40/0x100 pci_bus_add_devices+0x54/0xb0 pcibios_init+0xd8/0x140 do_one_initcall+0x8c/0x598 kernel_init_freeable+0x3ec/0x850 kernel_init+0x34/0x270 ret_from_kernel_user_thread+0x14/0x1c Fix this by using iommu_driver_get_domain_for_dev() instead of iommu_get_domain_for_dev() in spapr_tce_platform_iommu_attach_dev(), which is the appropriate helper for callers holding the group mutex. Cc: stable@vger.kernel.org Fixes: `a75b2be249` ("iommu: Add iommu_driver_get_domain_for_dev() helper") Closes: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/d5c834ff-4c95-44dd-8bef-57242d63aeee@linux.ibm.com/ Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> [Maddy: Added Closes, tested and reviewed by tags] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260310082129.3630996-1-nilay@linux.ibm.com	2026-03-13 12:13:35 +05:30
Ritesh Harjani (IBM)	0a8321dde0	powerpc/mem: Move CMA reservations to arch_mm_preinit commit `4267739cab` ("arch, mm: consolidate initialization of SPARSE memory model"), changed the initialization order of "pageblock_order" from... start_kernel() - setup_arch() - initmem_init() - sparse_init() - set_pageblock_order(); // this sets the pageblock_order - xxx_cma_reserve(); to... start_kernel() - setup_arch() - xxx_cma_reserve(); - mm_core_init_early() - free_area_init() - sparse_init() - set_pageblock_order() // this sets the pageblock_order. So this means, pageblock_order is not initialized before these cma reservation function calls, hence we are seeing CMA failures like... [ 0.000000] kvm_cma_reserve: reserving 3276 MiB for global area [ 0.000000] cma: pageblock_order not yet initialized. Called during early boot? [ 0.000000] cma: Failed to reserve 3276 MiB .... [ 0.000000][ T0] cma: pageblock_order not yet initialized. Called during early boot? [ 0.000000][ T0] cma: Failed to reserve 1024 MiB This patch moves these CMA reservations to arch_mm_preinit() which happens in mm_core_init() (which happens after pageblock_order is initialized), but before the memblock moves the free memory to buddy. Fixes: `4267739cab` ("arch, mm: consolidate initialization of SPARSE memory model") Suggested-by: Mike Rapoport <rppt@kernel.org> Reported-and-tested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Closes: https://lore.kernel.org/linuxppc-dev/4c338a29-d190-44f3-8874-6cfa0a031f0b@linux.ibm.com/ Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Dan Horák <dan@danny.cz> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/6e532cf0db5be99afbe20eed699163d5e86cd71f.1772303986.git.ritesh.list@gmail.com	2026-03-12 10:57:31 +05:30
Linus Torvalds	2b8e3fac9b	Merge tag 'powerpc-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Correct MSI allocation tracking - Always use 64 bits PTE for powerpc/e500 - Fix inline assembly for clang build on PPC32 - Fixes for clang build issues in powerpc64/ftrace - Fixes for powerpc64/bpf JIT and tailcall support - Cleanup MPC83XX devicetrees - Fix keymile vendor prefix - Fix to use big-endian types for crash variables Thanks to Abhishek Dubey, Christophe Leroy (CS GROUP), Hari Bathini, Heiko Schocher, J. Neuschäfer, Mahesh Salgaonkar, Nam Cao, Nilay Shroff, Rob Herring (Arm), Saket Kumar Bhaskar, Sourabh Jain, Stan Johnson, and Venkat Rao Bagalkote. * tag 'powerpc-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (23 commits) powerpc/pseries: Correct MSI allocation tracking powerpc: dts: mpc83xx: Add unit addresses to /memory powerpc: dts: mpc8315erdb: Add missing #cells properties to SPI bus powerpc: dts: mpc8315erdb: Rename LED nodes to comply with schema powerpc: dts: mpc8315erdb: Use IRQ_TYPE_* macros powerpc: dts: mpc8313erdb: Use IRQ_TYPE_* macros powerpc: 83xx: km83xx: Fix keymile vendor prefix dt-bindings: powerpc: Add Freescale/NXP MPC83xx SoCs powerpc64/bpf: fix kfunc call support powerpc64/bpf: fix handling of BPF stack in exception callback powerpc64/bpf: remove BPF redzone protection in trampoline stack powerpc64/bpf: use consistent tailcall offset in trampoline powerpc64/bpf: fix the address returned by bpf_get_func_ip powerpc64/bpf: do not increment tailcall count when prog is NULL powerpc64/ftrace: workaround clang recording GEP in __patchable_function_entries powerpc64/ftrace: fix OOL stub count with clang powerpc64: make clang cross-build friendly powerpc/crash: adjust the elfcorehdr size powerpc/kexec/core: use big-endian types for crash variables powerpc/prom_init: Fixup missing #size-cells on PowerMac media-bay nodes ...	2026-03-11 08:35:31 -07:00
Thomas Weißschuh	9b444349a2	powerpc/audit: Directly include unistd_32.h from compat_audit.c This source file undefines '__powerpc64__' to get the 32-bit system call numbers from asm/unistd.h. However this symbol is also evaluated by other headers, among them is asm/bitsperlong.h. The undefinition leads to an inconsistency between __BITS_PER_LONG and the C type 'long'. An upcoming consistency check will be tripped by this. Directly include asm/unistd_32.h to get access to the 32-bit system call numbers instead. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260302-vdso-compat-checkflags-v2-4-78e55baa58ba@linutronix.de	2026-03-11 10:15:43 +01:00
Hari Bathini	db54c28702	powerpc64/ftrace: workaround clang recording GEP in __patchable_function_entries Support for -fpatchable-function-entry on ppc64le was added in Clang with [1]. However, when no prefix NOPs are specified - as is the case with CONFIG_PPC_FTRACE_OUT_OF_LINE - the first NOP is emitted at LEP, but Clang records the Global Entry Point (GEP) unlike GCC which does record the Local Entry Point (LEP). Issue [2] has been raised to align Clang's behavior with GCC. As a temporary workaround to ensure ftrace initialization works as expected with Clang, derive the LEP using ppc_function_entry() for kernel symbols and by looking for the below module GEP sequence for module addresses, until [2] is resolved: ld r2, -8(r12) add r2, r2, r12 [1] https://github.com/llvm/llvm-project/pull/151569 [2] https://github.com/llvm/llvm-project/issues/163706 Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260127084926.34497-4-hbathini@linux.ibm.com	2026-03-07 16:02:25 +05:30
Linus Torvalds	4ae12d8bd9	Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Split out .modinfo section from ELF_DETAILS macro, as that macro may be used in other areas that expect to discard .modinfo, breaking certain image layouts - Adjust genksyms parser to handle optional attributes in certain declarations, necessary after commit `07919126ec` ("netfilter: annotate NAT helper hook pointers with __rcu") - Include resolve_btfids in external module build created by scripts/package/install-extmod-build when it may be run on external modules - Avoid removing objtool binary with 'make clean', as it is required for external module builds * tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Leave objtool binary around with 'make clean' kbuild: install-extmod-build: Package resolve_btfids if necessary genksyms: Fix parsing a declarator with a preceding attribute kbuild: Split .modinfo out from ELF_DETAILS	2026-03-06 20:27:13 -08:00
Rob Herring (Arm)	6fc5d63c6f	powerpc/prom_init: Fixup missing #size-cells on PowerMac media-bay nodes Similar to other PowerMac mac-io devices, the media-bay node is missing the "#size-cells" property. Depends-on: commit `045b14ca5c` ("of: WARN on deprecated #address-cells/#size-cells handling") Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251029174047.1620073-1-robh@kernel.org	2026-03-04 11:07:36 +05:30
Christophe Leroy	b9e7e3ea60	powerpc/e500: Always use 64 bits PTE Today there are two PTE formats for e500: - The 64 bits format, used - On 64 bits kernel - On 32 bits kernel with 64 bits physical addresses - On 32 bits kernel with support of huge pages - The 32 bits format, used in other cases Maintaining two PTE formats means unnecessary maintenance burden because every change needs to be implemented and tested for both formats. Remove the 32 bits PTE format. The memory usage increase due to larger PTEs is minimal (approx. 0,1% of memory). This also means that from now on huge pages are supported also with 32 bits physical addresses. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/04a658209ea78dcc0f3dbde6b2c29cf1939adfe9.1767721208.git.chleroy@kernel.org	2026-03-04 11:05:06 +05:30
Nilay Shroff	2185904ff8	powerpc/pci: Initialize msi_addr_mask for OF-created PCI devices Recent changes replaced the use of no_64bit_msi with msi_addr_mask. As a result, msi_addr_mask is now expected to be initialized to DMA_BIT_MASK(64) when a pci_dev is set up. However, this initialization was missed on powerpc due to differences in the device initialization path compared to other (x86) architecture. Due to this, now PCI device probe method fails on powerpc system. On powerpc systems, struct pci_dev instances are created from device tree nodes via of_create_pci_dev(). Because msi_addr_mask was not initialized there, it remained zero. Later, during MSI setup, msi_verify_entries() validates the programmed MSI address against pdev->msi_addr_mask. Since the mask was not set correctly, the validation fails, causing PCI driver probe failures for devices on powerpc systems. Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev() so that MSI address validation succeeds and device probe works as expected. Fixes: `386ced19e9` ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Tested-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260220070239.1693303-2-nilay@linux.ibm.com	2026-03-03 10:29:15 -06:00
Nathan Chancellor	8678591b47	kbuild: Split .modinfo out from ELF_DETAILS Commit `3e86e4d74c` ("kbuild: keep .modinfo section in vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and ELF_DETAILS was present in all architecture specific vmlinux linker scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and COMMON_DISCARDS may be used by other linker scripts, such as the s390 and x86 compressed boot images, which may not expect to have a .modinfo section. In certain circumstances, this could result in a bootloader failing to load the compressed kernel [1]. Commit `ddc6cbef3e` ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer") recently addressed this for the s390 bzImage but the same bug remains for arm, parisc, and x86. The presence of .modinfo in the x86 bzImage was the root cause of the issue worked around with commit `d50f210913` ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition. Split .modinfo out from ELF_DETAILS into its own macro and handle it in all vmlinux linker scripts. Discard .modinfo in the places where it was previously being discarded from being in COMMON_DISCARDS, as it has never been necessary in those uses. Cc: stable@vger.kernel.org Fixes: `3e86e4d74c` ("kbuild: keep .modinfo section in vmlinux.unstripped") Reported-by: Ed W <lists@wildgooses.com> Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1] Tested-by: Ed W <lists@wildgooses.com> # x86_64 Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2026-02-26 11:50:19 -07:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	136114e0ab	Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...	2026-02-12 12:13:01 -08:00
Linus Torvalds	4cff5c05e0	Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "powerpc/64s: do not re-activate batched TLB flush" makes arch_{enter\|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev) It adds a generic enter/leave layer and switches architectures to use it. Various hacks were removed in the process. - "zram: introduce compressed data writeback" implements data compression for zram writeback (Richard Chang and Sergey Senozhatsky) - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous page ranges for hugepages. Large improvements during demand faulting are demonstrated (David Hildenbrand) - "memcg cleanups" tidies up some memcg code (Chen Ridong) - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats" improves DAMOS stat's provided information, deterministic control, and readability (SeongJae Park) - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few issues in the hugetlb cgroup charging selftests (Li Wang) - "Fix va_high_addr_switch.sh test failure - again" addresses several issues in the va_high_addr_switch test (Chunyu Hu) - "mm/damon/tests/core-kunit: extend existing test scenarios" improves the KUnit test coverage for DAMON (Shu Anzai) - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN (Shivank Garg) - "arch, mm: consolidate hugetlb early reservation" reworks and consolidates a pile of straggly code related to reservation of hugetlb memory from bootmem and creation of CMA areas for hugetlb (Mike Rapoport) - "mm: clean up anon_vma implementation" cleans up the anon_vma implementation in various ways (Lorenzo Stoakes) - "tweaks for __alloc_pages_slowpath()" does a little streamlining of the page allocator's slowpath code (Vlastimil Babka) - "memcg: separate private and public ID namespaces" cleans up the memcg ID code and prevents the internal-only private IDs from being exposed to userspace (Shakeel Butt) - "mm: hugetlb: allocate frozen gigantic folio" cleans up the allocation of frozen folios and avoids some atomic refcount operations (Kefeng Wang) - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement of memory betewwn the active and inactive LRUs and adds auto-tuning of the ratio-based quotas and of monitoring intervals (SeongJae Park) - "Support page table check on PowerPC" makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan) - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes nodes_and() and nodes_andnot() propagate the return values from the underlying bit operations, enabling some cleanup in calling code (Yury Norov) - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up some DAMON internal interfaces (SeongJae Park) - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work in khupaged and fixes a scan limit accounting issue (Shivank Garg) - "mm: balloon infrastructure cleanups" goes to town on the balloon infrastructure and its page migration function. Mainly cleanups, also some locking simplification (David Hildenbrand) - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds additional tracepoints to the page reclaim code (Jiayuan Chen) - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is part of Marco's kernel-wide migration from the legacy workqueue APIs over to the preferred unbound workqueues (Marco Crivellari) - "Various mm kselftests improvements/fixes" provides various unrelated improvements/fixes for the mm kselftests (Kevin Brodsky) - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic folio allocation, mainly by avoiding unnecessary work in pfn_range_valid_contig() (Kefeng Wang) - "selftests/damon: improve leak detection and wss estimation reliability" improves the reliability of two of the DAMON selftests (SeongJae Park) - "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION" does some cleanup work in the core DAMON code (SeongJae Park) - "Docs/mm/damon: update intro, modules, maintainer profile, and misc" performs maintenance work on the DAMON documentation (SeongJae Park) - "mm: add and use vma_assert_stabilised() helper" refactors and cleans up the core VMA code. The main aim here is to be able to use the mmap write lock's lockdep state to perform various assertions regarding the locking which the VMA code requires (Lorenzo Stoakes) - "mm, swap: swap table phase II: unify swapin use" removes some old swap code (swap cache bypassing and swap synchronization) which wasn't working very well. Various other cleanups and simplifications were made. The end result is a 20% speedup in one benchmark (Kairui Song) - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM available on 64-bit alpha, loongarch, mips, parisc, and um. Various cleanups were performed along the way (Qi Zheng) * tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits) mm/memory: handle non-split locks correctly in zap_empty_pte_table() mm: move pte table reclaim code to memory.c mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config um: mm: enable MMU_GATHER_RCU_TABLE_FREE parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE mips: mm: enable MMU_GATHER_RCU_TABLE_FREE LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles zsmalloc: make common caches global mm: add SPDX id lines to some mm source files mm/zswap: use %pe to print error pointers mm/vmscan: use %pe to print error pointers mm/readahead: fix typo in comment mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file() mm: refactor vma_map_pages to use vm_insert_pages mm/damon: unify address range representation with damon_addr_range mm/cma: replace snprintf with strscpy in cma_new_area ...	2026-02-12 11:32:37 -08:00
Linus Torvalds	192c015940	Merge tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates for 7.0 - Implement masked user access - Add bpf support for internal only per-CPU instructions and inline the bpf_get_smp_processor_id() and bpf_get_current_task() functions - Fix pSeries MSI-X allocation failure when quota is exceeded - Fix recursive pci_lock_rescan_remove locking in EEH event handling - Support tailcalls with subprogs & BPF exceptions on 64bit - Extend "trusted" keys to support the PowerVM Key Wrapping Module (PKWM) Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li, Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote. * tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits) powerpc/pseries: plpks: export plpks_wrapping_is_supported docs: trusted-encryped: add PKWM as a new trust source keys/trusted_keys: establish PKWM as a trusted source pseries/plpks: add HCALLs for PowerVM Key Wrapping Module pseries/plpks: expose PowerVM wrapping features via the sysfs powerpc/pseries: move the PLPKS config inside its own sysfs directory pseries/plpks: fix kernel-doc comment inconsistencies powerpc/smp: Add check for kcalloc() failure in parse_thread_groups() powerpc: kgdb: Remove OUTBUFMAX constant powerpc64/bpf: Additional NVR handling for bpf_throw powerpc64/bpf: Support exceptions powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT powerpc64/bpf: Avoid tailcall restore from trampoline powerpc64/bpf: Support tailcalls with subprogs powerpc64/bpf: Moving tail_call_cnt to bottom of frame powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf() powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs ...	2026-02-10 21:46:12 -08:00
Linus Torvalds	f1c538ca81	Merge tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO updates from Thomas Gleixner: - Provide the missing 64-bit variant of clock_getres() This allows the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and finally the removal of 32-bit time types from the kernel and UAPI. - Remove the useless and broken getcpu_cache from the VDSO The intention was to provide a trivial way to retrieve the CPU number from the VDSO, but as the VDSO data is per process there is no way to make it work. - Switch get/put_unaligned() from packed struct to memcpy() The packed struct violates strict aliasing rules which requires to pass -fno-strict-aliasing to the compiler. As this are scalar values __builtin_memcpy() turns them into simple loads and stores - Use __typeof_unqual__() for __unqual_scalar_typeof() The get/put_unaligned() changes triggered a new sparse warning when __beNN types are used with get/put_unaligned() as sparse builds add a special 'bitwise' attribute to them which prevents sparse to evaluate the Generic in __unqual_scalar_typeof(). Newer sparse versions support __typeof_unqual__() which avoids the problem, but requires a recent sparse install. So this adds a sanity check to sparse builds, which validates that sparse is available and capable of handling it. - Force inline __cvdso_clock_getres_common() Compilers sometimes un-inline agressively, which results in function call overhead and problems with automatic stack variable initialization. Interestingly enough the force inlining results in smaller code than the un-inlined variant produced by GCC when optimizing for size. * tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vdso/gettimeofday: Force inlining of __cvdso_clock_getres_common() x86/percpu: Make CONFIG_USE_X86_SEG_SUPPORT work with sparse compiler: Use __typeof_unqual__() for __unqual_scalar_typeof() powerpc/vdso: Provide clock_getres_time64() tools headers: Remove unneeded ignoring of warnings in unaligned.h tools headers: Update the linux/unaligned.h copy with the kernel sources vdso: Switch get/put_unaligned() from packed struct to memcpy() parisc: Inline a type punning version of get_unaligned_le32() vdso: Remove struct getcpu_cache MIPS: vdso: Provide getres_time64() for 32-bit ABIs arm64: vdso32: Provide clock_getres_time64() ARM: VDSO: Provide clock_getres_time64() ARM: VDSO: Patch out __vdso_clock_getres() if unavailable x86/vdso: Provide clock_getres_time64() for x86-32 selftests: vDSO: vdso_test_abi: Add test for clock_getres_time64() selftests: vDSO: vdso_test_abi: Use UAPI system call numbers selftests: vDSO: vdso_config: Add configurations for clock_getres_time64() vdso: Add prototype for __vdso_clock_getres_time64()	2026-02-10 17:02:23 -08:00
Peter Zijlstra	3e4067169c	Merge branch 'v6.19-rc8' Update to avoid conflicts with /urgent patches. Signed-off-by: Peter Zijlstra <peterz@infradead.org>	2026-02-03 12:04:13 +01:00
Srish Srinivasan	40850c909f	powerpc/pseries: move the PLPKS config inside its own sysfs directory The /sys/firmware/secvar/config directory represents Power LPAR Platform KeyStore (PLPKS) configuration properties such as max_object_size, signed_ update_algorithms, supported_policies, total_size, used_space, and version. These attributes describe the PLPKS, and not the secure boot variables (secvars). Create /sys/firmware/plpks directory and move the PLPKS config inside this directory. For backwards compatibility, create a soft link from the secvar sysfs directory to this config and emit a warning stating that the older sysfs path has been deprecated. Separate out the plpks specific documentation from secvar. Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260127145228.48320-3-ssrish@linux.ibm.com	2026-01-30 09:27:26 +05:30
Guangshuo Li	33c1c6d8a2	powerpc/smp: Add check for kcalloc() failure in parse_thread_groups() As kcalloc() may fail, check its return value to avoid a NULL pointer dereference when passing it to of_property_read_u32_array(). Fixes: `790a1662d3` ("powerpc/smp: Parse ibm,thread-groups with multiple properties") Cc: stable@vger.kernel.org Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250923133235.1862108-1-lgs201920130244@gmail.com	2026-01-29 09:06:01 +05:30
Mike Rapoport (Microsoft)	9fac145b6d	mm, arch: consolidate hugetlb CMA reservation Every architecture that supports hugetlb_cma command line parameter reserves CMA areas for hugetlb during setup_arch(). This obfuscates the ordering of hugetlb CMA initialization with respect to the rest initialization of the core MM. Introduce arch_hugetlb_cma_order() callback to allow architectures report the desired order-per-bit of CMA areas and provide a week implementation of arch_hugetlb_cma_order() for architectures that don't support hugetlb with CMA. Use this callback in hugetlb_cma_reserve() instead if passing the order as parameter and call hugetlb_cma_reserve() from mm_core_init_early() rather than have it spread over architecture specific code. Link: https://lkml.kernel.org/r/20260111082105.290734-28-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:19 -08:00
Thomas Gleixner	99d2592023	rseq: Implement sys_rseq_slice_yield() Provide a new syscall which has the only purpose to yield the CPU after the kernel granted a time slice extension. sched_yield() is not suitable for that because it unconditionally schedules, but the end of the time slice extension is not required to schedule when the task was already preempted. This also allows to have a strict check for termination to catch user space invoking random syscalls including sched_yield() from a time slice extension region. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20251215155708.929634896@linutronix.de	2026-01-22 11:11:17 +01:00
Randy Dunlap	24c776355f	kernel.h: drop hex.h and update all hex.h users Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:19 -08:00
Alexander Gordeev	58852f24f9	powerpc/64s: do not re-activate batched TLB flush Patch series "Nesting support for lazy MMU mode", v6. When the lazy MMU mode was introduced eons ago, it wasn't made clear whether such a sequence was legal: arch_enter_lazy_mmu_mode() ... arch_enter_lazy_mmu_mode() ... arch_leave_lazy_mmu_mode() ... arch_leave_lazy_mmu_mode() It seems fair to say that nested calls to arch_{enter,leave}_lazy_mmu_mode() were not expected, and most architectures never explicitly supported it. Nesting does in fact occur in certain configurations, and avoiding it has proved difficult. This series therefore enables lazy_mmu sections to nest, on all architectures. Nesting is handled using a counter in task_struct (patch 8), like other stateless APIs such as pagefault_{disable,enable}(). This is fully handled in a new generic layer in <linux/pgtable.h>; the arch_* API remains unchanged. A new pair of calls, lazy_mmu_mode_{pause,resume}(), is also introduced to allow functions that are called with the lazy MMU mode enabled to temporarily pause it, regardless of nesting. An arch now opts in to using the lazy MMU mode by selecting CONFIG_ARCH_LAZY_MMU; this is more appropriate now that we have a generic API, especially with state conditionally added to task_struct. This patch (of 14): Since commit `b9ef323ea1` ("powerpc/64s: Disable preemption in hash lazy mmu mode") a task can not be preempted while in lazy MMU mode. Therefore, the batch re-activation code is never called, so remove it. Link: https://lkml.kernel.org/r/20251215150323.2218608-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20251215150323.2218608-2-kevin.brodsky@arm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: levi.yun <yeoreum.yun@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:32 -08:00
Feng Tang	e561383a39	powerpc/watchdog: add support for hardlockup_sys_info sysctl Commit `a9af76a787` ("watchdog: add sys_info sysctls to dump sys info on system lockup") adds 'hardlock_sys_info' systcl knob for general kernel watchdog to control what kinds of system debug info to be dumped on hardlockup. Add similar support in powerpc watchdog code to make the sysctl knob more general, which also fixes a compiling warning in general watchdog code reported by 0day bot. Link: https://lkml.kernel.org/r/20251231080309.39642-1-feng.tang@linux.alibaba.com Fixes: `a9af76a787` ("watchdog: add sys_info sysctls to dump sys info on system lockup") Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512030920.NFKtekA7-lkp@intel.com/ Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-14 22:16:22 -08:00
Thomas Weißschuh	759a1f9737	powerpc/vdso: Provide clock_getres_time64() For consistency with __vdso_clock_gettime64() there should also be a 64-bit variant of clock_getres(). This will allow the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and finally the removal of 32-bit time types from the kernel and UAPI. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Link: https://patch.msgid.link/20260114-vdso-powerpc-align-v1-1-acf09373d568@linutronix.de	2026-01-14 14:57:39 +01:00
Narayana Murty N	815a8d2feb	powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling The recent commit `1010b4c012` ("powerpc/eeh: Make EEH driver device hotplug safe") restructured the EEH driver to improve synchronization with the PCI hotplug layer. However, it inadvertently moved pci_lock_rescan_remove() outside its intended scope in eeh_handle_normal_event(), leading to broken PCI error reporting and improper EEH event triggering. Specifically, eeh_handle_normal_event() acquired pci_lock_rescan_remove() before calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to acquire the same lock internally, causing nested locking and disrupting normal EEH event handling paths. This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(), with two public wrappers: eeh_pe_bus_get() with locking enabled. eeh_pe_bus_get_nolock() that skips locking. Callers that already hold pci_lock_rescan_remove() now use eeh_pe_bus_get_nolock() to avoid recursive lock acquisition. Additionally, pci_lock_rescan_remove() calls are restored to the correct position—after eeh_pe_bus_get() and immediately before iterating affected PEs and devices. This ensures EEH-triggered PCI removes occur under proper bus rescan locking without recursive lock contention. The eeh_pe_loc_get() function has been split into two functions: eeh_pe_loc_get(struct eeh_pe pe) which retrieves the loc for given PE. eeh_pe_loc_get_bus(struct pci_bus bus) which retrieves the location code for given bus. This resolves lockdep warnings such as: <snip> [ 84.964298] [ T928] ============================================ [ 84.964304] [ T928] WARNING: possible recursive locking detected [ 84.964311] [ T928] 6.18.0-rc3 #51 Not tainted [ 84.964315] [ T928] -------------------------------------------- [ 84.964320] [ T928] eehd/928 is trying to acquire lock: [ 84.964324] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964342] [ T928] but task is already holding lock: [ 84.964347] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964357] [ T928] other info that might help us debug this: [ 84.964363] [ T928] Possible unsafe locking scenario: [ 84.964367] [ T928] CPU0 [ 84.964370] [ T928] ---- [ 84.964373] [ T928] lock(pci_rescan_remove_lock); [ 84.964378] [ T928] lock(pci_rescan_remove_lock); [ 84.964383] [ T928] * DEADLOCK * [ 84.964388] [ T928] May be due to missing lock nesting notation [ 84.964393] [ T928] 1 lock held by eehd/928: [ 84.964397] [ T928] #0: c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964408] [ T928] stack backtrace: [ 84.964414] [ T928] CPU: 2 UID: 0 PID: 928 Comm: eehd Not tainted 6.18.0-rc3 #51 VOLUNTARY [ 84.964417] [ T928] Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries [ 84.964419] [ T928] Call Trace: [ 84.964420] [ T928] [c0000011a7157990] [c000000001705de4] dump_stack_lvl+0xc8/0x130 (unreliable) [ 84.964424] [ T928] [c0000011a71579d0] [c0000000002f66e0] print_deadlock_bug+0x430/0x440 [ 84.964428] [ T928] [c0000011a7157a70] [c0000000002fd0c0] __lock_acquire+0x1530/0x2d80 [ 84.964431] [ T928] [c0000011a7157ba0] [c0000000002fea54] lock_acquire+0x144/0x410 [ 84.964433] [ T928] [c0000011a7157cb0] [c0000011a7157cb0] __mutex_lock+0xf4/0x1050 [ 84.964436] [ T928] [c0000011a7157e00] [c000000000de21d8] pci_lock_rescan_remove+0x28/0x40 [ 84.964439] [ T928] [c0000011a7157e20] [c00000000004ed98] eeh_pe_bus_get+0x48/0xc0 [ 84.964442] [ T928] [c0000011a7157e50] [c000000000050434] eeh_handle_normal_event+0x64/0xa60 [ 84.964446] [ T928] [c0000011a7157f30] [c000000000051de8] eeh_event_handler+0xf8/0x190 [ 84.964450] [ T928] [c0000011a7157f90] [c0000000002747ac] kthread+0x16c/0x180 [ 84.964453] [ T928] [c0000011a7157fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18 </snip> Fixes: `1010b4c012` ("powerpc/eeh: Make EEH driver device hotplug safe") Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251210142559.8874-1-nnmlinux@linux.ibm.com	2026-01-08 17:56:53 +05:30
Gaurav Batra	1471c517cf	powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory Leverage ARCH_HAS_DMA_MAP_DIRECT config option for coherent allocations as well. This will bypass DMA ops for memory allocations that have been pre-mapped. Always set device bus_dma_limit when memory is pre-mapped. In some architectures, like PowerPC, pmemory can be converted to regular memory via daxctl command. This will gate the coherent allocations to pre-mapped RAM only, by dma_coherent_ok(). Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251107161105.85999-1-gbatra@linux.ibm.com	2026-01-07 09:33:55 +05:30
Christophe Leroy	fb7903771c	powerpc/32s: Fix segments setup when TASK_SIZE is not a multiple of 256M For book3s/32 it is assumed that TASK_SIZE is a multiple of 256 Mbytes, but Kconfig allows any value for TASK_SIZE. In all relevant calculations, align TASK_SIZE to the upper 256 Mbytes boundary. Also use ASM_CONST() in the definition of TASK_SIZE to ensure it is seen as an unsigned constant. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8928d906079e156c59794c41e826a684eaaaebb4.1766574657.git.chleroy@kernel.org	2026-01-07 09:31:04 +05:30
Christophe Leroy (CS GROUP)	608328ba5b	powerpc/32: Restore disabling of interrupts at interrupt/syscall exit Commit `2997876c4a` ("powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit") delayed clearing of MSR[RI], but missed that both MSR[RI] and MSR[EE] are cleared at the same time, so the commit also delayed the disabling of interrupts, leading to unexpected behaviour. To fix that, mostly revert the blamed commit and restore the clearing of MSR[RI] in interrupt_exit_kernel_prepare() instead. For 8xx it implies adding a synchronising instruction after the mtspr in order to make sure no instruction counter interrupt (used for perf events) will fire just after clearing MSR[RI]. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Closes: https://lore.kernel.org/all/4d0bd05d-6158-1323-3509-744d3fbe8fc7@xenosoft.de/ Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/all/6b05eb1c-fdef-44e0-91a7-8286825e68f1@roeck-us.net/ Fixes: `2997876c4a` ("powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/585ea521b2be99d293b539bbfae148366cfb3687.1766146895.git.chleroy@kernel.org	2025-12-22 18:25:07 +05:30
Finn Thain	b94b735675	powerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf() Since Linux v6.7, booting using BootX on an Old World PowerMac produces an early crash. Stan Johnson writes, "the symptoms are that the screen goes blank and the backlight stays on, and the system freezes (Linux doesn't boot)." Further testing revealed that the failure can be avoided by disabling CONFIG_BOOTX_TEXT. Bisection revealed that the regression was caused by a change to the font bitmap pointer that's used when btext_init() begins painting characters on the display, early in the boot process. Christophe Leroy explains, "before kernel text is relocated to its final location ... data is addressed with an offset which is added to the Global Offset Table (GOT) entries at the start of bootx_init() by function reloc_got2(). But the pointers that are located inside a structure are not referenced in the GOT and are therefore not updated by reloc_got2(). It is therefore needed to apply the offset manually by using PTRRELOC() macro." Cc: stable@vger.kernel.org Link: https://lists.debian.org/debian-powerpc/2025/10/msg00111.html Link: https://lore.kernel.org/linuxppc-dev/d81ddca8-c5ee-d583-d579-02b19ed95301@yahoo.com/ Reported-by: Cedar Maxwell <cedarmaxwell@mac.com> Closes: https://lists.debian.org/debian-powerpc/2025/09/msg00031.html Bisected-by: Stan Johnson <userm57@yahoo.com> Tested-by: Stan Johnson <userm57@yahoo.com> Fixes: `0ebc7feae7` ("powerpc: Use shared font data") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/22b3b247425a052b079ab84da926706b3702c2c7.1762731022.git.fthain@linux-m68k.org	2025-12-22 17:59:07 +05:30
Linus Torvalds	a7405aa92f	Merge tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - More DMA mapping API refactoring to physical addresses as the primary interface instead of page+offset parameters. This time dma_map_ops callbacks are converted to physical addresses, what in turn results also in some simplification of architecture specific code (Leon Romanovsky and Jason Gunthorpe) - Clarify that dma_map_benchmark is not a kernel self-test, but standalone tool (Qinxin Xia) * tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: remove unused map_page callback xen: swiotlb: Convert mapping routine to rely on physical address x86: Use physical address for DMA mapping sparc: Use physical address DMA mapping powerpc: Convert to physical address DMA mapping parisc: Convert DMA map_page to map_phys interface MIPS/jazzdma: Provide physical address directly alpha: Convert mapping routine to rely on physical address dma-mapping: remove unused mapping resource callbacks xen: swiotlb: Switch to physical address mapping callbacks ARM: dma-mapping: Switch to physical address mapping callbacks ARM: dma-mapping: Reduce struct page exposure in arch_sync_dma*() dma-mapping: convert dummy ops to physical address mapping dma-mapping: prepare dma_map_ops to conversion to physical address tools/dma: move dma_map_benchmark from selftests to tools/dma	2025-12-06 09:25:05 -08:00
Linus Torvalds	ad952db4a8	Merge tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Restore clearing of MSR[RI] at interrupt/syscall exit on 32-bit - Fix unpaired stwcx on interrupt exit on 32-bit - Fix race condition leading to double list-add in mac_hid_toggle_emumouse() - Fix mprotect on book3s 32-bit - Fix SLB multihit issue during SLB preload with 64-bit hash MMU - Add support for crashkernel CMA reservation - Add die_id and die_cpumask for Power10 & later to expose chip hemispheres - A series of minor fixes and improvements to the hash SLB code Thanks to Antonio Alvarez Feijoo, Ben Collins, Bhaskar Chowdhury, Christophe Leroy, Daniel Thompson, Dave Vasilevsky, Donet Tom, J. Neuschäfer, Kunwu Chan, Long Li, Naresh Kamboju, Nathan Chancellor, Ritesh Harjani (IBM), Shirisha G, Shrikanth Hegde, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Thomas Zimmermann, Venkat Rao Bagalkote, and Vishal Chourasia. * tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (32 commits) macintosh/via-pmu-backlight: Include <linux/fb.h> and <linux/of.h> powerpc/powermac: backlight: Include <linux/of.h> powerpc/64s/slb: Add no_slb_preload early cmdline param powerpc/64s/slb: Make preload_add return type as void powerpc/ptdump: Dump PXX level info for kernel_page_tables powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash powerpc/64s/hash: Update directMap page counters for Hash powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU powerpc/64s/hash: Improve hash mmu printk messages powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize() powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit powerpc/64s/slb: Fix SLB multihit issue during SLB preload powerpc, mm: Fix mprotect on book3s 32-bit powerpc/smp: Expose die_id and die_cpumask powerpc/83xx: Add a null pointer check to mcu_gpiochip_add arch:powerpc:tools This file was missing shebang line, so added it kexec: Include kernel-end even without crashkernel powerpc: p2020: Rename wdt@ nodes to watchdog@ powerpc: 86xx: Rename wdt@ nodes to watchdog@ ...	2025-12-05 16:18:21 -08:00
Linus Torvalds	ce5cfb0fa2	Merge tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: - Introduction of the generic IO page-table framework with support for Intel and AMD IOMMU formats from Jason. This has good potential for unifying more IO page-table implementations and making future enhancements more easy. But this also needed quite some fixes during development. All known issues have been fixed, but my feeling is that there is a higher potential than usual that more might be needed. - Intel VT-d updates: - Use right invalidation hint in qi_desc_iotlb() - Reduce the scope of INTEL_IOMMU_FLOPPY_WA - ARM-SMMU updates: - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs and a new clock for the TBU. - Fix error handling if level 1 CD table allocation fails. - Permit more than the architectural maximum number of SMRs for funky Qualcomm mis-implementations of SMMUv2. - Mediatek driver: - MT8189 iommu support - Move ARM IO-pgtable selftests to kunit - Device leak fixes for a couple of drivers - Random smaller fixes and improvements * tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits) iommupt/vtd: Support mgaw's less than a 4 level walk for first stage iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires powerpc/pseries/svm: Make mem_encrypt.h self contained genpt: Make GENERIC_PT invisible iommupt: Avoid a compiler bug with sw_bit iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal iommupt: Fix unlikely flows in increase_top() iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga() MAINTAINERS: Update my email address iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock iommu/vt-d: Restore previous domain::aperture_end calculation iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD iommu/tegra: fix device leak on probe_device() iommu/sun50i: fix device leak on of_xlate() iommu/omap: simplify probe_device() error handling iommu/omap: fix device leaks on probe_device() iommu/mediatek-v1: add missing larb count sanity check iommu/mediatek-v1: fix device leaks on probe() ...	2025-12-04 18:05:06 -08:00
Donet Tom	00312419f0	powerpc/64s/slb: Fix SLB multihit issue during SLB preload On systems using the hash MMU, there is a software SLB preload cache that mirrors the entries loaded into the hardware SLB buffer. This preload cache is subject to periodic eviction — typically after every 256 context switches — to remove old entry. To optimize performance, the kernel skips switch_mmu_context() in switch_mm_irqs_off() when the prev and next mm_struct are the same. However, on hash MMU systems, this can lead to inconsistencies between the hardware SLB and the software preload cache. If an SLB entry for a process is evicted from the software cache on one CPU, and the same process later runs on another CPU without executing switch_mmu_context(), the hardware SLB may retain stale entries. If the kernel then attempts to reload that entry, it can trigger an SLB multi-hit error. The following timeline shows how stale SLB entries are created and can cause a multi-hit error when a process moves between CPUs without a MMU context switch. CPU 0 CPU 1 ----- ----- Process P exec swapper/1 load_elf_binary begin_new_exc activate_mm switch_mm_irqs_off switch_mmu_context switch_slb /* * This invalidates all * the entries in the HW * and setup the new HW * SLB entries as per the * preload cache. / context_switch sched_migrate_task migrates process P to cpu-1 Process swapper/0 context switch (to process P) (uses mm_struct of Process P) switch_mm_irqs_off() switch_slb load_slb++ / * load_slb becomes 0 here * and we evict an entry from * the preload cache with * preload_age(). We still * keep HW SLB and preload * cache in sync, that is * because all HW SLB entries * anyways gets evicted in * switch_slb during SLBIA. * We then only add those * entries back in HW SLB, * which are currently * present in preload_cache * (after eviction). / load_elf_binary continues... setup_new_exec() slb_setup_new_exec() sched_switch event sched_migrate_task migrates process P to cpu-0 context_switch from swapper/0 to Process P switch_mm_irqs_off() / * Since both prev and next mm struct are same we don't call * switch_mmu_context(). This will cause the HW SLB and SW preload * cache to go out of sync in preload_new_slb_context. Because there * was an SLB entry which was evicted from both HW and preload cache * on cpu-1. Now later in preload_new_slb_context(), when we will try * to add the same preload entry again, we will add this to the SW * preload cache and then will add it to the HW SLB. Since on cpu-0 * this entry was never invalidated, hence adding this entry to the HW * SLB will cause a SLB multi-hit error. / load_elf_binary continues... START_THREAD start_thread preload_new_slb_context / * This tries to add a new EA to preload cache which was earlier * evicted from both cpu-1 HW SLB and preload cache. This caused the * HW SLB of cpu-0 to go out of sync with the SW preload cache. The * reason for this was, that when we context switched back on CPU-0, * we should have ideally called switch_mmu_context() which will * bring the HW SLB entries on CPU-0 in sync with SW preload cache * entries by setting up the mmu context properly. But we didn't do * that since the prev mm_struct running on cpu-0 was same as the * next mm_struct (which is true for swapper / kernel threads). So * now when we try to add this new entry into the HW SLB of cpu-0, * we hit a SLB multi-hit error. */ WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149] Modules linked in: CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000 REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty) MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000 CFAR: c0000000001543b0 IRQMASK: 3 <...> NIP [c00000000015426c] assert_slb_presence+0x2c/0x50 LR [c0000000001543b4] slb_insert_entry+0x124/0x390 Call Trace: 0x7fffceb5ffff (unreliable) preload_new_slb_context+0x100/0x1a0 start_thread+0x26c/0x420 load_elf_binary+0x1b04/0x1c40 bprm_execve+0x358/0x680 do_execveat_common+0x1f8/0x240 sys_execve+0x58/0x70 system_call_exception+0x114/0x300 system_call_common+0x160/0x2c4 >From the above analysis, during early exec the hardware SLB is cleared, and entries from the software preload cache are reloaded into hardware by switch_slb. However, preload_new_slb_context and slb_setup_new_exec also attempt to load some of the same entries, which can trigger a multi-hit. In most cases, these additional preloads simply hit existing entries and add nothing new. Removing these functions avoids redundant preloads and eliminates the multi-hit issue. This patch removes these two functions. We tested process switching performance using the context_switch benchmark on POWER9/hash, and observed no regression. Without this patch: 129041 ops/sec With this patch: 129341 ops/sec We also measured SLB faults during boot, and the counts are essentially the same with and without this patch. SLB faults without this patch: 19727 SLB faults with this patch: 19786 Fixes: `5434ae7462` ("powerpc/64s/hash: Add a SLB preload cache") cc: stable@vger.kernel.org Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com	2025-11-18 12:35:52 +05:30

1 2 3 4 5 ...

9357 Commits