linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 14:51:51 -04:00

Author	SHA1	Message	Date
Abhishek Dubey	e640bcd1bf	selftests/bpf: Enable private stack tests for powerpc64 With support of private stack, relevant tests must pass on powerpc64. #./test_progs -t struct_ops_private_stack #434/1 struct_ops_private_stack/private_stack:OK #434/2 struct_ops_private_stack/private_stack_fail:OK #434/3 struct_ops_private_stack/private_stack_recur:OK #434 struct_ops_private_stack:OK Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260401103215.104438-2-adubey@linux.ibm.com	2026-04-03 14:09:43 +05:30
Abhishek Dubey	156d985123	powerpc64/bpf: Implement JIT support for private stack Provision the private stack as a per-CPU allocation during bpf_int_jit_compile(). Align the stack to 16 bytes and place guard regions at both ends to detect runtime stack overflow and underflow. Round the private stack size up to the nearest 16-byte boundary. Make each guard region 16 bytes to preserve the required overall 16-byte alignment. When private stack is set, skip bpf stack size accounting in kernel stack. There is no stack pointer in powerpc. Stack referencing during JIT is done using frame pointer. Frame pointer calculation goes like: BPF frame pointer = Priv stack allocation start address + Overflow guard + Actual stack size defined by verifier Memory layout: High Addr +--------------------------------------------------+ \| \| \| 16 bytes Underflow guard (0xEB9F12345678eb9fULL) \| \| \| BPF FP -> +--------------------------------------------------+ \| \| \| Private stack - determined by verifier \| \| 16-bytes aligned \| \| \| +--------------------------------------------------+ \| \| Lower Addr \| 16 byte Overflow guard (0xEB9F12345678eb9fULL) \| \| \| Priv stack alloc ->+--------------------------------------------------+ start Update BPF_REG_FP to point to the calculated offset within the allocated private stack buffer. Now, BPF stack usage reference in the allocated private stack. Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260401103215.104438-1-adubey@linux.ibm.com	2026-04-03 14:09:43 +05:30
Yury Norov (NVIDIA)	bd77a34e9a	powerpc: pci-ioda: Optimize pnv_ioda_pick_m64_pe() bitmap_empty() in pnv_ioda_pick_m64_pe() is O(N) and useless because the following find_next_bit() does the same work. Drop it, and while there replace a while() loop with the dedicated for_each_set_bit(). Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250814190936.381346-3-yury.norov@gmail.com	2026-04-01 09:21:07 +05:30
Yury Norov (NVIDIA)	f73338d089	powerpc: pci-ioda: use bitmap_alloc() in pnv_ioda_pick_m64_pe() Use the dedicated bitmap_alloc() in pnv_ioda_pick_m64_pe() and drop some housekeeping code. Because pe_alloc is local, annotate it with __free() and get rid of the explicit kfree() calls. Suggested-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250814190936.381346-2-yury.norov@gmail.com	2026-04-01 09:21:07 +05:30
Christophe Leroy (CS GROUP)	cae734710d	powerpc/net: Inline checksum wrappers and convert to scoped user access Commit `861574d51b` ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert csum_and_copy_to_user() and csum_and_copy_from_user() to scoped user access to benefit from masked user access. csum_and_copy_to_user() and csum_and_copy_from_user() are only called respectively by csum_and_copy_to_iter() and csum_and_copy_from_iter_full() and they are only called twice. Those functions used to be large but they were first reduced by commit `c693cc4676` ("saner calling conventions for csum_and_copy_..._user()") then commit `70d65cd555` ("ppc: propagate the calling conventions change down to csum_partial_copy_generic()"). With the additional size reduction provided by conversion to scoped user access they are not worth being kept out of line. $ ./scripts/bloat-o-meter vmlinux.0 vmlinux.1 add/remove: 0/2 grow/shrink: 2/0 up/down: 136/-176 (-40) Function old new delta csum_and_copy_to_iter 2416 2488 +72 csum_and_copy_from_iter_full 2272 2336 +64 csum_and_copy_to_user 88 - -88 csum_and_copy_from_user 88 - -88 Total: Before=11514471, After=11514431, chg -0.00% Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/f44e1b2760dbed35b237040001a91bc8304b726b.1773137098.git.chleroy@kernel.org	2026-04-01 09:21:07 +05:30
Christophe Leroy (CS GROUP)	cd54714e93	powerpc/sstep: Convert to scoped user access Commit `861574d51b` ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert single step emulation functions to scoped user access to benefit from masked user access. Scoped user access also make the code simpler. Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8f2d85bddacff18046096dc255fd94f6a0f8b230.1773137010.git.chleroy@kernel.org	2026-04-01 09:21:07 +05:30
Christophe Leroy (CS GROUP)	bf53ede003	powerpc/align: Convert emulate_spe() to scoped user access Commit `861574d51b` ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert emulate_spe() to scoped user access to benefit from masked user access. Scoped user access also make the code simpler. Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/4ff83cb240da4e2d0c34e2bce4b8b6ef19a33777.1773136880.git.chleroy@kernel.org	2026-04-01 09:21:07 +05:30
Christophe Leroy (CS GROUP)	679fa9c756	powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access Commit `861574d51b` ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert gpr32_set_common_user() to scoped user access to benefit from masked user access. Scoped user access also make the code simpler. Also changes label from Efault to efault to avoid checkpatch complaining about CamelCase. Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/2409643daab08b4bc07004c2b88f42085d1ef45a.1773136838.git.chleroy@kernel.org	2026-04-01 09:21:07 +05:30
Christophe Leroy (CS GROUP)	40a1b9d044	powerpc/futex: Use masked user access Commit `861574d51b` ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Use masked user access in arch_futex_atomic_op_inuser() Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/e29f6a5c14e5938df68d94bfac6b2f762fb922aa.1773136636.git.chleroy@kernel.org	2026-04-01 09:21:06 +05:30
Christophe Leroy	f26ad12356	powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC Commit `e65e1fc2d2` ("[PATCH] syscall class hookup for all normal targets") added generic support for AUDIT but that didn't include support for bi-arch like powerpc. Commit `4b58841149` ("audit: Add generic compat syscall support") added generic support for bi-arch. Convert powerpc to that bi-arch generic audit support. With this change generated text is similar. Thomas has confirmed that the previously failing filter_exclude/test is now successful both without and with this patch, see [1] [1] https://lore.kernel.org/all/20260306115350-ef265661-6d6b-4043-9bd0-8e6b437d0d67@linutronix.de/ Link: https://github.com/linuxppc/issues/issues/412 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/261b1be5b8dc526b83d73e8281e682a73536ea28.1773155031.git.chleroy@kernel.org	2026-04-01 09:21:06 +05:30
Shrikanth Hegde	64ed1e3e72	cpuidle: powerpc: avoid double clear when breaking snooze snooze_loop is done often in any system which has fair bit of idle time. So it qualifies for even micro-optimizations. When breaking the snooze due to timeout, TIF_POLLING_NRFLAG is cleared twice. Clearing the bit invokes atomics. Avoid double clear and thereby avoid one atomic write. dev->poll_time_limit indicates whether the loop was broken due to timeout. Use that instead of defining a new variable. Fixes: `7ded429152` ("cpuidle: powerpc: no memory barrier after break from idle") Cc: stable@vger.kernel.org Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260311061709.1230440-1-sshegde@linux.ibm.com	2026-04-01 09:21:06 +05:30
Randy Dunlap	26d76caac4	powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings Fix enum and function return value kernel-doc warnings: Warning: spu.c:36 Excess enum value '%spe_type_logical' description in 'spe_type' Warning: spu.c:78 Excess enum value '%spe_ex_state_unexecutable' description in 'spe_ex_state' Warning: spu.c:78 Excess enum value '%spe_ex_state_executable' description in 'spe_ex_state' Warning: spu.c:78 Excess enum value '%spe_ex_state_executed' description in 'spe_ex_state' Warning: spu.c:190 No description found for return value of 'setup_areas' Fixes: `de91a53429` ("[POWERPC] ps3: add spu support") Fixes: `b47027795a` ("powerpc/ps3: Fix ioremap of spu shadow regs") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260225055328.249204-1-rdunlap@infradead.org	2026-04-01 09:21:06 +05:30
Randy Dunlap	7695a4e12e	powerpc: kgdb: fix kernel-doc warnings Remove empty comment line at the beginning of a kernel-doc function block. Add a "Return:" section for this function. These changes prevent 2 kernel-doc warnings: Warning: ../arch/powerpc/kernel/kgdb.c:103 Cannot find identifier on line: * Warning: kgdb.c:113 No description found for return value of 'kgdb_skipexception' Fixes: `949616cf2d` ("powerpc/kgdb: Bail out of KGDB when we've been triggered") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260225055314.247966-1-rdunlap@infradead.org	2026-04-01 09:21:06 +05:30
Randy Dunlap	d1e6f90d6b	powerpc/ps3: fix ps3.h kernel-doc warnings Fix some kernel-doc warnings in ps3.h: - add @dev to struct ps3_dma_region - don't mark a function as "struct" - add Returns: description for one function - add a short description for ps3_system_bus_set_drvdata() - correct an enum @name - move intervening "struct ps3_system_bus_device;" from between kernel-doc for ps3_dma_region_init() and the function declaration to eliminate these warnings: Warning: arch/powerpc/include/asm/ps3.h:96 struct member 'dev' not described in 'ps3_dma_region' Warning: arch/powerpc/include/asm/ps3.h:118 struct ps3_system_bus_device; error: Cannot parse struct or union! Warning: arch/powerpc/include/asm/ps3.h:166 int ps3_mmio_region_init(struct ps3_system_bus_device dev, struct ps3_mmio_region r, unsigned long bus_addr, unsigned long len, enum ps3_mmio_page_size page_size); error: Cannot parse struct or union! Warning: arch/powerpc/include/asm/ps3.h:167 No description found for return value of 'ps3_mmio_region_init' Warning: arch/powerpc/include/asm/ps3.h:407 missing initial short description on line: * ps3_system_bus_set_drvdata - Warning: arch/powerpc/include/asm/ps3.h:473 Enum value 'PS3_LPM_TB_TYPE_INTERNAL' not described in enum 'ps3_lpm_tb_type' Warning: arch/powerpc/include/asm/ps3.h:473 Excess enum value '@PS3_LPM_RIGHTS_USE_TB' description in 'ps3_lpm_tb_type' This leaves struct members in several structs and function parameters in one function still undescribed. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251129183636.1893634-1-rdunlap@infradead.org	2026-04-01 09:21:06 +05:30
J. Neuschäfer	47a05517c6	powerpc: wii: Fix LED name pattern Adjust the name of the drive slot LED node to comply with the schema in Documentation/devicetree/bindings/leds/leds-gpio.yaml. arch/powerpc/boot/dts/wii.dtb: gpio-leds: 'drive-slot' does not match any of the regexes: '(^led-[0-9a-f]$\|led)', 'pinctrl-[0-9]+' Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260311-wii-schema-v1-3-1563ac4aefa8@posteo.net	2026-04-01 09:21:05 +05:30
J. Neuschäfer	4a03d824b3	powerpc: wii: Fix GPIO key name pattern Adjust the names of GPIO key nodes to comply with the schema in Documentation/devicetree/bindings/input/gpio-keys.yaml. Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260311-wii-schema-v1-2-1563ac4aefa8@posteo.net	2026-04-01 09:21:05 +05:30
J. Neuschäfer	d1620f27ed	powerpc: wii: Add unit address to /memory This fixes the following dtschema warning: arch/powerpc/boot/dts/wii.dtb: /: memory: False schema does not allow {'device_type': ['memory'], 'reg': [[0, 25165824], [268435456, 67108864]]} Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260311-wii-schema-v1-1-1563ac4aefa8@posteo.net	2026-04-01 09:21:05 +05:30
J. Neuschäfer	89f46b5786	powerpc: Move GameCube/Wii options under EMBEDDED6xx Move CONFIG_GAMECUBE and CONFIG_WII directly below other embedded6xx boards, and above options such as TSI108_BRIDGE. This has two advantages for the GC/Wii options: - They won't be moved around by USBGECKO_UDBG appearing or disappearing - They will be intendented in menuconfig/nconfig, to make it clear they are part of the embedded6xx platforms Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260303-gcwii-kconfig-v1-1-636b288e7270@posteo.net	2026-04-01 09:21:05 +05:30
Chen Ni	5716caceba	powerpc/44x/uic: Consolidate chained IRQ handler install/remove The driver currently sets the handler data and the chained handler in two separate steps. This creates a theoretical race window where an interrupt could fire after the handler is set but before the data is assigned, leading to a NULL pointer dereference. Replace the two calls with irq_set_chained_handler_and_data() to set both the handler and its data atomically under the irq_desc->lock. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260119063507.940782-1-nichen@iscas.ac.cn	2026-04-01 09:21:05 +05:30
Chen Ni	7593721cd7	powerpc/52xx/mpc52xx_gpt: consolidate chained IRQ handler install/remove The driver currently sets the handler data and the chained handler in two separate steps. This creates a theoretical race window where an interrupt could fire after the handler is set but before the data is assigned, leading to a NULL pointer dereference. Replace the two calls with irq_set_chained_handler_and_data() to set both the handler and its data atomically under the irq_desc->lock. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260119061232.889236-1-nichen@iscas.ac.cn	2026-04-01 09:21:05 +05:30
Chen Ni	1ef8cf10cd	powerpc/52xx/media5200: Consolidate chained IRQ handler install/remove The driver currently sets the handler data and the chained handler in two separate steps. This creates a theoretical race window where an interrupt could fire after the handler is set but before the data is assigned, leading to a NULL pointer dereference. Replace the two calls with irq_set_chained_handler_and_data() to set both the handler and its data atomically under the irq_desc->lock. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260119060450.889119-1-nichen@iscas.ac.cn	2026-04-01 09:21:05 +05:30
Amit Machhiwal	6e65886fce	selftests/powerpc: Suppress -Wmaybe-uninitialized with GCC 15 GCC 15 reports the below false positive '-Wmaybe-uninitialized' warning in vphn_unpack_associativity() when building the powerpc selftests. # make -C tools/testing/selftests TARGETS="powerpc" [...] CC test-vphn In file included from test-vphn.c:3: In function ‘vphn_unpack_associativity’, inlined from ‘test_one’ at test-vphn.c:371:2, inlined from ‘test_vphn’ at test-vphn.c:399:9: test-vphn.c:10:33: error: ‘be_packed’ may be used uninitialized [-Werror=maybe-uninitialized] 10 \| #define be16_to_cpup(x) bswap_16(x) \| ^~~~~~~~ vphn.c:42:27: note: in expansion of macro ‘be16_to_cpup’ 42 \| u16 new = be16_to_cpup(field++); \| ^~~~~~~~~~~~ In file included from test-vphn.c:19: vphn.c: In function ‘test_vphn’: vphn.c:27:16: note: ‘be_packed’ declared here 27 \| __be64 be_packed[VPHN_REGISTER_COUNT]; \| ^~~~~~~~~ cc1: all warnings being treated as errors When vphn_unpack_associativity() is called from hcall_vphn() in kernel the error is not seen while building vphn.c during kernel compilation. This is because the top level Makefile includes '-fno-strict-aliasing' flag always. The issue here is that GCC 15 emits '-Wmaybe-uninitialized' due to type punning between __be64[] and __b16 when accessing the buffer via be16_to_cpup(). The underlying object is fully initialized but GCC 15 fails to track the aliasing due to the strict aliasing violation here. Please refer [1] and [2]. This results in a false positive warning which is promoted to an error under '-Werror'. This problem is not seen when the compilation is performed with GCC 13 and 14. An issue [1] has also been created on GCC bugzilla. The selftest compiles fine with '-fno-strict-aliasing'. Since this GCC flag is used to compile vphn.c in kernel too, the same flag should be used to build vphn tests when compiling vphn.c in the selftest as well. Fix this by including '-fno-strict-aliasing' during vphn.c compilation in the selftest. This keeps the build working while limiting the scope of the suppression to building vphn tests. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124427 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99768 Fixes: `58dae82843` ("selftests/powerpc: Add test for VPHN") Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260313165426.43259-1-amachhiw@linux.ibm.com	2026-04-01 09:21:04 +05:30
Yury Norov	ce7c43b087	powerpc/xive: rework xive_find_target_in_mask() Switch the function to using modern cpumask API and drop most of the housekeeping code. Notice, if first >= nr_cpu_ids, for_each_cpu_wrap() iterator behaves just like for_each_cpu(), i.e. begins from 0. So even if WARN_ON() is triggered, no special handling is needed. Signed-off-by: Yury Norov <ynorov@nvidia.com> Tested-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260319033647.881246-3-ynorov@nvidia.com	2026-04-01 09:21:04 +05:30
Yury Norov	cad2a72c29	Revert "powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask()" This reverts commit `a9dadc1c51`. The commit message states: When called from xive_irq_startup(), the size of the cpumask can be larger than nr_cpu_ids. This can result in a WARN_ON. [...] This happens because we're being called with our affinity mask set to irq_default_affinity. That in turn was populated using cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a mask which has > nr_cpu_ids bits set. In modern kernel, cpumask_weight() can't return > nr_cpu_ids. In inline case, cpumask_setall() explicitly clears all bits above nr_cpu_ids, see commit `63355b9884` ("cpumask: be more careful with 'cpumask_setall()'"). So, despite that cpumask_weight() is passed with small_cpumask_bits, which is NR_CPUS in this case, it can't count over the nr_cpu_ids. In outline case, cpumask_setall() may set bits beyond the limit up to the next byte alignment, but in this case small_cpumask_bits is wired to nr_cpu_ids, thus making overcounting impossible. Signed-off-by: Yury Norov <ynorov@nvidia.com> Tested-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260319033647.881246-2-ynorov@nvidia.com	2026-04-01 09:21:04 +05:30
Sourabh Jain	f53b24d1fa	powerpc/crash: Update backup region offset in elfcorehdr on memory hotplug When elfcorehdr is prepared for kdump, the program header representing the first 64 KB of memory is expected to have its offset point to the backup region. This is required because purgatory copies the first 64 KB of the crashed kernel memory to this backup region following a kernel crash. This allows the capture kernel to use the first 64 KB of memory to place the exception vectors and other required data. When elfcorehdr is recreated due to memory hotplug, the offset of the program header representing the first 64 KB is not updated. As a result, the capture kernel exports the first 64 KB at offset 0, even though the data actually resides in the backup region. Fix this by calling sync_backup_region_phdr() to update the program header offset in the elfcorehdr created during memory hotplug. sync_backup_region_phdr() works for images loaded via the kexec_file_load syscall. However, it does not work for kexec_load, because image->arch.backup_start is not initialized in that case. So introduce machine_kexec_post_load() to process the elfcorehdr prepared by kexec-tools and initialize image->arch.backup_start for kdump images loaded via kexec_load syscall. Rename update_backup_region_phdr() to sync_backup_region_phdr() and extend it to synchronize the backup region offset between the kdump image and the ELF core header. The helper now supports updating either the kdump image from the ELF program header or updating the ELF program header from the kdump image, avoiding code duplication. Define ARCH_HAS_KIMAGE_ARCH and struct kimage_arch when CONFIG_KEXEC_FILE or CONFIG_CRASH_DUMP is enabled so that kimage->arch.backup_start is available with the kexec_load system call. This patch depends on the patch titled "powerpc/crash: fix backup region offset update to elfcorehdr". Fixes: `849599b702` ("powerpc/crash: add crash memory hotplug support") Reviewed-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260312083051.1935737-3-sourabhjain@linux.ibm.com	2026-04-01 09:21:04 +05:30
Sourabh Jain	789335cacd	powerpc/crash: fix backup region offset update to elfcorehdr update_backup_region_phdr() in file_load_64.c iterates over all the program headers in the kdump kernel’s elfcorehdr and updates the p_offset of the program header whose physical address starts at 0. However, the loop logic is incorrect because the program header pointer is not updated during iteration. Since elfcorehdr typically contains PT_NOTE entries first, the PT_LOAD program header with physical address 0 is never reached. As a result, its p_offset is not updated to point to the backup region. Because of this behavior, the capture kernel exports the first 64 KB of the crashed kernel’s memory at offset 0, even though that memory actually lives in the backup region. When a crash happens, purgatory copies the first 64 KB of the crashed kernel’s memory into the backup region so the capture kernel can safely use it. This has not caused problems so far because the first 64 KB is usually identical in both the crashed and capture kernels. However, this is just an assumption and is not guaranteed to always hold true. Fix update_backup_region_phdr() to correctly update the p_offset of the program header with a starting physical address of 0 by correcting the logic used to iterate over the program headers. Fixes: `cb350c1f1f` ("powerpc/kexec_file: Prepare elfcore header for crashing kernel") Reviewed-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260312083051.1935737-2-sourabhjain@linux.ibm.com	2026-04-01 09:21:04 +05:30
Nilay Shroff	6771c54728	powerpc/xive: fix kmemleak caused by incorrect chip_data lookup The kmemleak reports the following memory leak: Unreferenced object 0xc0000002a7fbc640 (size 64): comm "kworker/8:1", pid 540, jiffies 4294937872 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 09 04 00 04 00 00 ................ 00 00 a7 81 00 00 0a c0 00 00 08 04 00 04 00 00 ................ backtrace (crc 177d48f6): __kmalloc_cache_noprof+0x520/0x730 xive_irq_alloc_data.constprop.0+0x40/0xe0 xive_irq_domain_alloc+0xd0/0x1b0 irq_domain_alloc_irqs_parent+0x44/0x6c pseries_irq_domain_alloc+0x1cc/0x354 irq_domain_alloc_irqs_parent+0x44/0x6c msi_domain_alloc+0xb0/0x220 irq_domain_alloc_irqs_locked+0x138/0x4d0 __irq_domain_alloc_irqs+0x8c/0xfc __msi_domain_alloc_irqs+0x214/0x4d8 msi_domain_alloc_irqs_all_locked+0x70/0xf8 pci_msi_setup_msi_irqs+0x60/0x78 __pci_enable_msix_range+0x54c/0x98c pci_alloc_irq_vectors_affinity+0x16c/0x1d4 nvme_pci_enable+0xac/0x9c0 [nvme] nvme_probe+0x340/0x764 [nvme] This occurs when allocating MSI-X vectors for an NVMe device. During allocation the XIVE code creates a struct xive_irq_data and stores it in irq_data->chip_data. When the MSI-X irqdomain is later freed, xive_irq_free_data() is responsible for retrieving this structure and freeing it. However, after commit `cc0cc23bab` ("powerpc/xive: Untangle xive from child interrupt controller drivers"), xive_irq_free_data() retrieves the chip_data using irq_get_chip_data(), which looks up the data through the child domain. This is incorrect because the XIVE-specific irq data is associated with the XIVE (parent) domain. As a result the lookup fails and the allocated struct xive_irq_data is never freed, leading to the kmemleak report shown above. Fix this by retrieving the irq_data from the correct domain using irq_domain_get_irq_data() and then accessing the chip_data via irq_data_get_irq_chip_data(). Cc: stable@vger.kernel.org Fixes: `cc0cc23bab` ("powerpc/xive: Untangle xive from child interrupt controller drivers") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260311134336.326996-1-nilay@linux.ibm.com	2026-03-30 16:15:57 +05:30
Ritesh Harjani (IBM)	d1503aa9ab	powerpc/64s: Add support for huge pfnmaps This uses _RPAGE_SW2 bit for the PMD and PUDs similar to PTEs. This also adds support for {pte,pmd,pud}_pgprot helpers needed for follow_pfnmap APIs. This allows us to extend the PFN mappings, e.g. PCI MMIO bars where it can grow as large as 8GB or even bigger, to map at PMD / PUD level. VFIO PCI core driver already supports fault handling at PMD / PUD level for more efficient BAR mappings. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/6fca726574236f556dd4e1e259692e82a4c29e85.1773058761.git.ritesh.list@gmail.com	2026-03-30 15:52:17 +05:30
Ritesh Harjani (IBM)	948b71aa81	drivers/vfio_pci_core: Change PXD_ORDER check from switch case to if/else block Architectures like PowerPC uses runtime defined values for PMD_ORDER/PUD_ORDER. This is because it can use either RADIX or HASH MMU at runtime using kernel cmdline. So the pXd_index_size is not known at compile time. Without this fix, when we add huge pfn support on powerpc in the next patch, vfio_pci_core driver compilation can fail with the following errors. CC [M] drivers/vfio/vfio_main.o CC [M] drivers/vfio/group.o CC [M] drivers/vfio/container.o CC [M] drivers/vfio/virqfd.o CC [M] drivers/vfio/vfio_iommu_spapr_tce.o CC [M] drivers/vfio/pci/vfio_pci_core.o CC [M] drivers/vfio/pci/vfio_pci_intrs.o CC [M] drivers/vfio/pci/vfio_pci_rdwr.o CC [M] drivers/vfio/pci/vfio_pci_config.o CC [M] drivers/vfio/pci/vfio_pci.o AR kernel/built-in.a ../drivers/vfio/pci/vfio_pci_core.c: In function ‘vfio_pci_vmf_insert_pfn’: ../drivers/vfio/pci/vfio_pci_core.c:1678:9: error: case label does not reduce to an integer constant 1678 \| case PMD_ORDER: \| ^~~~ ../drivers/vfio/pci/vfio_pci_core.c:1682:9: error: case label does not reduce to an integer constant 1682 \| case PUD_ORDER: \| ^~~~ make[6]: * [../scripts/Makefile.build:289: drivers/vfio/pci/vfio_pci_core.o] Error 1 make[6]: * Waiting for unfinished jobs.... make[5]: * [../scripts/Makefile.build:546: drivers/vfio/pci] Error 2 make[5]: * Waiting for unfinished jobs.... make[4]: * [../scripts/Makefile.build:546: drivers/vfio] Error 2 make[3]: * [../scripts/Makefile.build:546: drivers] Error 2 Fixes: `f9e54c3a2f` ("vfio/pci: implement huge_fault support") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Alex Williamson <alex@shazbot.org> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/b155e19993ee1f5584c72050192eb468b31c5029.1773058761.git.ritesh.list@gmail.com	2026-03-30 15:52:17 +05:30
Ritesh Harjani (IBM)	07791ff060	powerpc: Print MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS at startup Similar to CPU_FTRS_[POSSIBLE\|ALWAYS], let's also print MMU_FTRS_[POSSIBLE\|ALWAYS]. This has some useful data to capture during bootup. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/c37a9f314a723048d25aa5424f7ede8eec691d86.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	24eb637840	powerpc/64s: Make use of H_RPTI_TYPE_ALL macro Instead of opencoding, let's use the pre-defined macro (H_RPTI_TYPE_ALL) at the following places. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/d1d32404d5f0d3e93cd0faad2298b7bfed31288f.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	f074059c7a	powerpc/64s: Rename tlbie_lpid_va to tlbie_va_lpid In previous patch we renamed tlbie_va_lpid functions to tlbie_va_pid_lpid() since those were working with PIDs as well. This then allows us to rename tlbie_lpid_va to tlbie_va_lpid, which finally makes all the tlbie function naming consistent. No functional change in this patch. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8fadd2beb2f883c65ba0d797c87d238098cd13c8.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	7bcfba20e9	powerpc/64s: Rename tlbie_va_lpid to tlbie_va_pid_lpid It only make sense to rename these functions, so it's better reflect what they are supposed to do. For e.g. __tlbie_va_pid_lpid name better reflect that it is invalidating tlbie using VA, PID and LPID. No functional change in this patch. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/0a0b2cf23b9522f891f9a0f976bbdc5c8e6f6d8b.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	4894e2fb7b	powerpc/64s: Kill the unused argument of exit_lazy_flush_tlb In previous patch we removed the only caller of exit_lazy_flush_tlb() which was passing always_flush = false in it's second argument. With that gone, all the callers of exit_lazy_flush_tlb() are local to radix_pgtable.c and there is no need of an additional argument. This patch does the required cleanup. There should not be any functionality change in this patch. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/6f96ea53588034312ae84f74b1e2fa9c4ce7cfd5.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	bf7c1497d2	powerpc/64s: Move serialize_against_pte_lookup() to hash_pgtable.c Originally, commit `fa4531f753` ("powerpc/mm: Don't send IPI to all cpus on THP updates") introduced serialize_against_pte_lookup() call for both Radix and Hash. However below commit fixed the race with Radix commit `70cbc3cc78` ("mm: gup: fix the fast GUP race against THP collapse") And therefore following commit removed the serialize_against_pte_lookup() call from radix_pgtable.c commit `bedf034169` ("powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush") Now since serialize_against_pte_lookup() only gets called from hash__pmdp_collapse_flush(), thus move the related functions to hash_pgtable.c Hence this patch: - moves serialize_against_pte_lookup() from radix_pgtable.c to hash_pgtable.c - removes the radix specific calls from do_serialize() - renames do_serialize() to do_nothing(). There should not be any functionality change in this patch. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/a73ebe800a9be257329507703779f822363f8b2f.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	4a342f3e6f	powerpc/64s/tlbflush-radix: Remove unused radix__flush_tlb_pwc() Commit `52162ec784` ("powerpc/mm/book3s64/radix: Use freed_tables instead of need_flush_all") removed radix__flush_tlb_pwc() definition, but missed to remove the extern declaration. This patch removes it. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/b79c8ce8f00aa3e96ab9b1c77bc004759c397d3f.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	68b1fa0ed5	powerpc/64s: Fix _HPAGE_CHG_MASK to include _PAGE_SPECIAL bit commit `af38538801` ("mm/memory: factor out common code from vm_normal_page_()"), added a VM_WARN_ON_ONCE for huge zero pfn. This can lead to the following call stack. ------------[ cut here ]------------ WARNING: mm/memory.c:735 at vm_normal_page_pmd+0xf0/0x140, CPU#19: hmm-tests/3366 NIP [c00000000078d0c0] vm_normal_page_pmd+0xf0/0x140 LR [c00000000078d060] vm_normal_page_pmd+0x90/0x140 Call Trace: [c00000016f56f850] [c00000000078d060] vm_normal_page_pmd+0x90/0x140 (unreliable) [c00000016f56f8a0] [c0000000008a9e30] change_huge_pmd+0x7c0/0x870 [c00000016f56f930] [c0000000007b2bc4] change_protection+0x17a4/0x1e10 [c00000016f56fba0] [c0000000007b3440] mprotect_fixup+0x210/0x4c0 [c00000016f56fc30] [c0000000007b3c3c] do_mprotect_pkey+0x54c/0x780 [c00000016f56fdb0] [c0000000007b3ed8] sys_mprotect+0x68/0x90 [c00000016f56fdf0] [c00000000003ae40] system_call_exception+0x190/0x500 [c00000016f56fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec This happens when we call mprotect -> change_huge_pmd() mprotect() change_pmd_range() pmd_modify(oldpmd, newprot) # this clears _PAGE_SPECIAL for zero huge pmd pmdv = pmd_val(pmd); pmdv &= _HPAGE_CHG_MASK; # -> gets cleared here return pmd_set_protbits(__pmd(pmdv), newprot); can_change_pmd_writable(vma, vmf->address, pmd) vm_normal_page_pmd(vma, addr, pmd) __vm_normal_page() VM_WARN_ON(is_zero_pfn(pfn) \|\| is_huge_zero_pfn(pfn)); # this get hits as _PAGE_SPECIAL for zero huge pmd was cleared. It can be easily reproduced with the following testcase: p = mmap(NULL, 2 hpage_pmd_size, PROT_READ, MAP_PRIVATE \| MAP_ANONYMOUS, -1, 0); madvise((void )p, 2 hpage_pmd_size, MADV_HUGEPAGE); aligned = (char)(((unsigned long)p + hpage_pmd_size - 1) & ~(hpage_pmd_size - 1)); (void)((volatile char)aligned); // read fault, installs huge zero PMD mprotect((void )aligned, hpage_pmd_size, PROT_READ \| PROT_WRITE); This patch adds _PAGE_SPECIAL to _HPAGE_CHG_MASK similar to _PAGE_CHG_MASK, as we don't want to clear this bit when calling pmd_modify() while changing protection bits. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/7416f5cdbcfeaad947860fcac488b483f1287172.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	bbcbf045d6	powerpc/64s: Fix unmap race with PMD migration entries The following race is possible with migration swap entries or device-private THP entries. e.g. when move_pages is called on a PMD THP page, then there maybe an intermediate state, where PMD entry acts as a migration swap entry (pmd_present() is true). Then if an munmap happens at the same time, then this VM_BUG_ON() can happen in pmdp_huge_get_and_clear_full(). This patch fixes that. Thread A: move_pages() syscall add_folio_for_migration() mmap_read_lock(mm) folio_isolate_lru(folio) mmap_read_unlock(mm) do_move_pages_to_node() migrate_pages() try_to_migrate_one() spin_lock(ptl) set_pmd_migration_entry() pmdp_invalidate() # PMD: _PAGE_INVALID \| _PAGE_PTE \| pfn set_pmd_at() # PMD: migration swap entry (pmd_present=0) spin_unlock(ptl) [page copy phase] # <--- RACE WINDOW --> Thread B: munmap() mmap_write_downgrade(mm) unmap_vmas() -> zap_pmd_range() zap_huge_pmd() __pmd_trans_huge_lock() pmd_is_huge(): # !pmd_present && !pmd_none -> TRUE (swap entry) pmd_lock() -> # spin_lock(ptl), waits for Thread A to release ptl pmdp_huge_get_and_clear_full() VM_BUG_ON(!pmd_present(*pmdp)) # HITS! [ 287.738700][ T1867] ------------[ cut here ]------------ [ 287.743843][ T1867] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:187! cpu 0x0: Vector: 700 (Program Check) at [c00000044037f4f0] pc: c000000000094ca4: pmdp_huge_get_and_clear_full+0x6c/0x23c lr: c000000000645dec: zap_huge_pmd+0xb0/0x868 sp: c00000044037f790 msr: 800000000282b033 current = 0xc0000004032c1a00 paca = 0xc000000004fe0000 irqmask: 0x03 irq_happened: 0x09 pid = 1867, comm = a.out kernel BUG at :187! Linux version 6.19.0-12136-g14360d4f917c-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #27 SMP PREEMPT Sun Feb 22 10:38:56 IST 2026 enter ? for help [link register ] c000000000645dec zap_huge_pmd+0xb0/0x868 [c00000044037f790] c00000044037f7d0 (unreliable) [c00000044037f7d0] c000000000645dcc zap_huge_pmd+0x90/0x868 [c00000044037f840] c0000000005724cc unmap_page_range+0x176c/0x1f40 [c00000044037fa00] c000000000572ea0 unmap_vmas+0xb0/0x1d8 [c00000044037fa90] c0000000005af254 unmap_region+0xb4/0x128 [c00000044037fb50] c0000000005af400 vms_complete_munmap_vmas+0x138/0x310 [c00000044037fbe0] c0000000005b0f1c do_vmi_align_munmap+0x1ec/0x238 [c00000044037fd30] c0000000005b3688 __vm_munmap+0x170/0x1f8 [c00000044037fdf0] c000000000587f74 sys_munmap+0x2c/0x40 [c00000044037fe10] c000000000032668 system_call_exception+0x128/0x350 [c00000044037fe50] c00000000000d05c system_call_vectored_common+0x15c/0x2ec ---- Exception: 3000 (System Call Vectored) at 0000000010064a2c SP (7fff9b1ee9c0) is in userspace 0:mon> zh commit `a30b48bf1b` ("mm/migrate_device: implement THP migration of zone device pages"), enabled migration for device-private PMD entries. Hence this is one other path where this warning could get trigger from. ------------[ cut here ]------------ WARNING: arch/powerpc/mm/book3s64/hash_pgtable.c:199 at hash__pmd_hugepage_update+0x48/0x284, CPU#3: hmm-tests/1905 Modules linked in: test_hmm CPU: 3 UID: 0 PID: 1905 Comm: hmm-tests Tainted: G B W L N 7.0.0-rc1-01438-g7e2f0ee7581c #21 PREEMPT Tainted: [B]=BAD_PAGE, [W]=WARN, [L]=SOFTLOCKUP, [N]=TEST Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries NIP [c000000000096b70] hash__pmd_hugepage_update+0x48/0x284 LR [c000000000096e7c] hash__pmdp_huge_get_and_clear+0xd0/0xd4 Call Trace: [c000000604707670] [c000000004e102b8] 0xc000000004e102b8 (unreliable) [c000000604707700] [c00000000064ec3c] set_pmd_migration_entry+0x414/0x498 [c000000604707760] [c00000000063e5a4] migrate_vma_collect_pmd+0x12e8/0x16c4 [c000000604707890] [c00000000059282c] walk_pgd_range+0x7fc/0xd2c [c000000604707990] [c000000000592e40] __walk_page_range+0xe4/0x2ac [c000000604707a10] [c000000000593534] walk_page_range_mm_unsafe+0x204/0x2a4 [c000000604707ab0] [c00000000063af10] migrate_vma_setup+0x1dc/0x2e8 [c000000604707b10] [c008000006a21838] dmirror_migrate_to_system.constprop.0+0x210/0x4b0 [test_hmm] [c000000604707c30] [c008000006a245b0] dmirror_fops_unlocked_ioctl+0x454/0xa5c [test_hmm] [c000000604707d20] [c0000000006aab84] sys_ioctl+0x4ec/0x1178 [c000000604707e10] [c0000000000326a8] system_call_exception+0x128/0x350 [c000000604707e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec ---- interrupt: 3000 at 0x7fffbe44f50c Fixes: `75358ea359` ("powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race") Fixes: `a30b48bf1b` ("mm/migrate_device: implement THP migration of zone device pages") Reported-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/9437e5ef28d1e2f5cbdd7f8286350ce93c1d43c5.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)	fda4d71651	powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy powerpc uses pt_frag_refcount as a reference counter for tracking it's pte and pmd page table fragments. For PTE table, in case of Hash with 64K pagesize, we have 16 fragments of 4K size in one 64K page. Patch series [1] "mm: free retracted page table by RCU" added pte_free_defer() to defer the freeing of PTE tables when retract_page_tables() is called for madvise MADV_COLLAPSE on shmem range. [1]: https://lore.kernel.org/all/7cd843a9-aa80-14f-5eb2-33427363c20@google.com/ pte_free_defer() sets the active flag on the corresponding fragment's folio & calls pte_fragment_free(), which reduces the pt_frag_refcount. When pt_frag_refcount reaches 0 (no active fragment using the folio), it checks if the folio active flag is set, if set, it calls call_rcu to free the folio, it the active flag is unset then it calls pte_free_now(). Now, this can lead to following problem in a corner case... [ 265.351553][ T183] BUG: Bad page state in process a.out pfn:20d62 [ 265.353555][ T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62 [ 265.355457][ T183] flags: 0x3ffff800000100(active\|node=0\|zone=0\|lastcpupid=0x7ffff) [ 265.358719][ T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000 [ 265.360177][ T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000 [ 265.361438][ T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 265.362572][ T183] Modules linked in: [ 265.364622][ T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY [ 265.364785][ T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries [ 265.364908][ T183] Call Trace: [ 265.364955][ T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable) [ 265.365202][ T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8 [ 265.365384][ T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08 [ 265.365554][ T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310 [ 265.365729][ T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218 [ 265.365912][ T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820 [ 265.366080][ T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300 [ 265.366244][ T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508 [ 265.366421][ T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148 [ 265.366602][ T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178 [ 265.366780][ T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0 [ 265.366958][ T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec The bad page state error occurs when such a folio gets freed (with active flag set), from do_exit() path in parallel. ... this can happen when the pte fragment was allocated from this folio, but when all the fragments get freed, the pte_frag_refcount still had some unused fragments. Now, if this process exits, with such folio as it's cached pte_frag in mm->context, then during pte_frag_destroy(), we simply call pagetable_dtor() and pagetable_free(), meaning it doesn't clear the active flag. This, can lead to the above bug. Since we are anyway in do_exit() path, then if the refcount is 0, then I guess it should be ok to simply clear the folio active flag before calling pagetable_dtor() & pagetable_free(). Fixes: `32cc0b7c9d` ("powerpc: add pte_free_defer() for pgtables sharing page") Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/ee13e7f99b8f258019da2b37655b998e73e5ef8b.1773078178.git.ritesh.list@gmail.com	2026-03-17 13:56:36 +05:30
Linus Torvalds	f338e77383	Linux 7.0-rc4 v7.0-rc4	2026-03-15 13:52:05 -07:00
Linus Torvalds	5c2fe8d11a	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "The one core change is a re-roll of the tag allocation fix from the last pull request that uses the correct goto to unroll all the allocations. The remianing fixes are all small ones in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: hisi_sas: Fix NULL pointer exception during user_scan() scsi: qla2xxx: Completely fix fcport double free scsi: ufs: core: Fix SError in ufshcd_rtc_work() during UFS suspend scsi: core: Fix error handling for scsi_alloc_sdev()	2026-03-15 13:15:39 -07:00
Linus Torvalds	d9bf296c39	Merge tag 'probes-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Avoid crash when rmmod/insmod after ftrace killed This fixes a kernel crash caused by kprobes on the symbol in a module which is unloaded after ftrace_kill() is called. - Remove unneeded warnings from __arm_kprobe_ftrace() Remove unneeded WARN messages which can be triggered if the kprobe is using ftrace and it fails to enable the ftrace. Since kprobes correctly handle such failure, we don't need to warn it. * tag 'probes-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() kprobes: avoid crash when rmmod/insmod after ftrace killed	2026-03-15 13:08:05 -07:00
Linus Torvalds	62cda74c79	Merge tag 'bootconfig-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig fixes from Masami Hiramatsu: - fix off-by-one in xbc_verify_tree() unclosed brace error. This fixes a wrong error place in unclosed brace error message - check bounds before writing in __xbc_open_brace(). This fixes to check the array index before setting array, so that the bootconfig can support 16th-depth nested brace correctly - fix snprintf truncation check in xbc_node_compose_key_after(). This fixes to handle the return value of snprintf() correctly in case of the return value == size - Add bootconfig tests about braces Add test cases for checking error position about unclosed brace and ensuring supporting 16th depth nested braces correctly * tag 'bootconfig-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: bootconfig: Add bootconfig tests about braces lib/bootconfig: fix snprintf truncation check in xbc_node_compose_key_after() lib/bootconfig: check bounds before writing in __xbc_open_brace() lib/bootconfig: fix off-by-one in xbc_verify_tree() unclosed brace error	2026-03-15 12:50:05 -07:00
Linus Torvalds	11e8c7e947	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones. ARM: - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start EOIcount deactivation walk after the last irq that made it into an LR - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only thhis isn't possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context - Address yet another syzkaller finding in the vgic initialisation, where we would end-up destroying an uninitialised vgic with nasty consequences - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults PPC: - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart RISC-V: - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access, float register access, and PMU counter access - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr() - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei() - Fix off-by-one array access in SBI PMU - Skip THP support check during dirty logging - Fix error code returned for Smstateen and Ssaia ONE_REG interface - Check host Ssaia extension when creating AIA irqchip x86: - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls - Fix a brown paper bug in add_atomic_switch_msr() - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level) - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors who want to freeze PMCs on VM-Entry - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks Generic: - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive Selftests: - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8 Documentation: kvm: fix formatting of the quirks table KVM: x86: clarify leave_smm() return value selftests: kvm: add a test that VMX validates controls on RSM selftests: kvm: extract common functionality out of smm_test.c KVM: SVM: check validity of VMCB controls when returning from SMM KVM: VMX: check validity of VMCS controls when returning from SMM KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers() KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr() KVM: x86: hyper-v: Validate all GVAs during PV TLB flush KVM: x86: synthesize CPUID bits only if CPU capability is set KVM: PPC: e500: Rip out "struct tlbe_ref" KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type KVM: selftests: Increase 'maxnode' for guest_memfd tests KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2() ...	2026-03-15 12:22:10 -07:00
Linus Torvalds	4f3df2e5ea	Merge tag 'powerpc-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix KUAP warning in VMX usercopy path - Fix lockdep warning during PCI enumeration - Fix to move CMA reservations to arch_mm_preinit - Fix to check current->mm is alive before getting user callchain Thanks to Aboorva Devarajan, Christophe Leroy (CS GROUP), Dan Horák, Nicolin Chen, Nilay Shroff, Qiao Zhao, Ritesh Harjani (IBM), Saket Kumar Bhaskar, Sayali Patil, Shrikanth Hegde, Venkat Rao Bagalkote, and Viktor Malik. * tag 'powerpc-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/iommu: fix lockdep warning during PCI enumeration powerpc/selftests/copyloops: extend selftest to exercise __copy_tofrom_user_power7_vmx powerpc: fix KUAP warning in VMX usercopy path powerpc, perf: Check that current->mm is alive before getting user callchain powerpc/mem: Move CMA reservations to arch_mm_preinit	2026-03-15 11:36:11 -07:00
Linus Torvalds	13af67f599	Merge tag 'x86-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Work around S2RAM hang if the firmware unexpectedly re-enables the x2apic hardware while it was disabled by the kernel. Force-disable it again and issue a warning into the syslog" * tag 'x86-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Disable x2apic on resume if the kernel expects so	2026-03-15 11:26:36 -07:00
Linus Torvalds	164cb546e9	Merge tag 'timers-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix function tracer recursion bug by marking jiffies_64_to_clock_t() notrace" * tag 'timers-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time/jiffies: Mark jiffies_64_to_clock_t() notrace	2026-03-15 11:14:09 -07:00
Linus Torvalds	63724e9519	Merge tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "More MM-CID fixes, mostly fixing hangs/races: - Fix CID hangs due to a race between concurrent forks - Fix vfork()/CLONE_VM MMCID bug causing hangs - Remove pointless preemption guard - Fix CID task list walk performance regression on large systems by removing the known-flaky and slow counting logic using for_each_process_thread() in mm_cid_fixup_tasks_to_cpus(), and implementing a simple sched_mm_cid::node list instead" tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Avoid full tasklist walks sched/mmcid: Remove pointless preempt guard sched/mmcid: Handle vfork()/CLONE_VM correctly sched/mmcid: Prevent CID stalls due to concurrent forks	2026-03-15 10:49:47 -07:00
Linus Torvalds	9745031130	Merge tag 'objtool-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: - Fix cross-build bug by using HOSTCFLAGS for HAVE_XXHASH test - Fix klp bug by fixing detection of corrupt static branch/call entries - Handle unsupported pr_debug() usage more gracefully - Fix hypothetical klp bug by avoiding NULL pointer dereference when printing code symbol name - Fix data alignment bug in elf_add_data() causing mangled strings - Fix confusing ERROR_INSN() error message - Handle unexpected Clang RSP musical chairs causing false positive warnings - Fix another objtool stack overflow in validate_branch() * tag 'objtool-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix another stack overflow in validate_branch() objtool: Handle Clang RSP musical chairs objtool: Fix ERROR_INSN() error message objtool: Fix data alignment in elf_add_data() objtool: Use HOSTCFLAGS for HAVE_XXHASH test objtool/klp: Avoid NULL pointer dereference when printing code symbol name objtool/klp: Disable unsupported pr_debug() usage objtool/klp: Fix detection of corrupt static branch/call entries	2026-03-15 10:36:01 -07:00
Linus Torvalds	be2e3750ce	Merge tag 'irq-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Two fixes for the riscv-aplic irqchip driver: - Fix probing dependency bug on probing failure - Fix double register_syscore() bug" * tag 'irq-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-aplic: Register syscore operations only once irqchip/riscv-aplic: Do not clear ACPI dependencies on probe failure	2026-03-15 10:32:57 -07:00

1 2 3 4 5 ...

1428451 Commits