From fda4d71651f71c44b35829d13f3c8bf920032f77 Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:24 +0530
Subject: [PATCH 01/47] powerpc/pgtable-frag: Fix bad page state in
 pte_frag_destroy

powerpc uses pt_frag_refcount as a reference counter for tracking its
pte and pmd page table fragments. For a PTE table, in case of Hash with
64K pagesize, we have 16 fragments of 4K size in one 64K page.

Patch series [1] "mm: free retracted page table by RCU" added
pte_free_defer() to defer the freeing of PTE tables when
retract_page_tables() is called for madvise MADV_COLLAPSE on a shmem
range.

[1]: https://lore.kernel.org/all/7cd843a9-aa80-14f-5eb2-33427363c20@google.com/

pte_free_defer() sets the active flag on the corresponding fragment's
folio and calls pte_fragment_free(), which decrements pt_frag_refcount.
When pt_frag_refcount reaches 0 (no active fragment using the folio),
it checks whether the folio active flag is set: if set, it uses
call_rcu() to free the folio; if the active flag is unset, it calls
pte_free_now() directly.

Now, this can lead to the following problem in a corner case...
[ 265.351553][ T183] BUG: Bad page state in process a.out pfn:20d62
[ 265.353555][ T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62
[ 265.355457][ T183] flags: 0x3ffff800000100(active|node=0|zone=0|lastcpupid=0x7ffff)
[ 265.358719][ T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000
[ 265.360177][ T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000
[ 265.361438][ T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 265.362572][ T183] Modules linked in:
[ 265.364622][ T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY
[ 265.364785][ T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
[ 265.364908][ T183] Call Trace:
[ 265.364955][ T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable)
[ 265.365202][ T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8
[ 265.365384][ T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08
[ 265.365554][ T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310
[ 265.365729][ T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218
[ 265.365912][ T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820
[ 265.366080][ T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300
[ 265.366244][ T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508
[ 265.366421][ T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148
[ 265.366602][ T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178
[ 265.366780][ T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0
[ 265.366958][ T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec

The bad page state error occurs when such a folio gets freed (with the
active flag set) from the do_exit() path in
parallel.

... this can happen when the pte fragment was allocated from this folio,
but after all the allocated fragments got freed, pt_frag_refcount still
accounted for some unused fragments. Now, if this process exits with
such a folio as its cached pte_frag in mm->context, then during
pte_frag_destroy() we simply call pagetable_dtor() and pagetable_free(),
meaning the active flag never gets cleared. This can lead to the above
bug.

Since we are anyway in the do_exit() path, if the refcount is 0 it
should be ok to simply clear the folio active flag before calling
pagetable_dtor() & pagetable_free().

Fixes: 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables sharing page")
Reviewed-by: Christophe Leroy (CS GROUP)
Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/ee13e7f99b8f258019da2b37655b998e73e5ef8b.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/mm/pgtable-frag.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 77e55eac16e4..ae742564a3d5 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -25,6 +25,7 @@ void pte_frag_destroy(void *pte_frag)
 	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
 	if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+		folio_clear_active(ptdesc_folio(ptdesc));
 		pagetable_dtor(ptdesc);
 		pagetable_free(ptdesc);
 	}

From bbcbf045d6c778e82b47a35fc8728387708e9a3d Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:25 +0530
Subject: [PATCH 02/47] powerpc/64s: Fix unmap race with PMD migration entries

The following race is possible with migration swap entries or
device-private THP entries, e.g.
when move_pages() is called on a PMD THP page, there may be an
intermediate state where the PMD entry acts as a migration swap entry
(pmd_present() returns false). If an munmap() happens at the same time,
this VM_BUG_ON() can be hit in pmdp_huge_get_and_clear_full(). This
patch fixes that.

Thread A: move_pages() syscall
  add_folio_for_migration()
    mmap_read_lock(mm)
    folio_isolate_lru(folio)
    mmap_read_unlock(mm)
  do_move_pages_to_node()
    migrate_pages()
      try_to_migrate_one()
        spin_lock(ptl)
        set_pmd_migration_entry()
          pmdp_invalidate()   # PMD: _PAGE_INVALID | _PAGE_PTE | pfn
          set_pmd_at()        # PMD: migration swap entry (pmd_present=0)
        spin_unlock(ptl)
        [page copy phase]     # <--- RACE WINDOW -->

Thread B: munmap()
  mmap_write_downgrade(mm)
  unmap_vmas() -> zap_pmd_range()
    zap_huge_pmd()
      __pmd_trans_huge_lock()
        pmd_is_huge():  # !pmd_present && !pmd_none -> TRUE (swap entry)
        pmd_lock() ->   # spin_lock(ptl), waits for Thread A to release ptl
      pmdp_huge_get_and_clear_full()
        VM_BUG_ON(!pmd_present(*pmdp))  # HITS!

[ 287.738700][ T1867] ------------[ cut here ]------------
[ 287.743843][ T1867] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:187!
cpu 0x0: Vector: 700 (Program Check) at [c00000044037f4f0]
    pc: c000000000094ca4: pmdp_huge_get_and_clear_full+0x6c/0x23c
    lr: c000000000645dec: zap_huge_pmd+0xb0/0x868
    sp: c00000044037f790
   msr: 800000000282b033
  current = 0xc0000004032c1a00
  paca    = 0xc000000004fe0000   irqmask: 0x03   irq_happened: 0x09
    pid   = 1867, comm = a.out
kernel BUG at :187!
Linux version 6.19.0-12136-g14360d4f917c-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #27 SMP PREEMPT Sun Feb 22 10:38:56 IST 2026
enter ?
for help
[link register   ] c000000000645dec zap_huge_pmd+0xb0/0x868
[c00000044037f790] c00000044037f7d0 (unreliable)
[c00000044037f7d0] c000000000645dcc zap_huge_pmd+0x90/0x868
[c00000044037f840] c0000000005724cc unmap_page_range+0x176c/0x1f40
[c00000044037fa00] c000000000572ea0 unmap_vmas+0xb0/0x1d8
[c00000044037fa90] c0000000005af254 unmap_region+0xb4/0x128
[c00000044037fb50] c0000000005af400 vms_complete_munmap_vmas+0x138/0x310
[c00000044037fbe0] c0000000005b0f1c do_vmi_align_munmap+0x1ec/0x238
[c00000044037fd30] c0000000005b3688 __vm_munmap+0x170/0x1f8
[c00000044037fdf0] c000000000587f74 sys_munmap+0x2c/0x40
[c00000044037fe10] c000000000032668 system_call_exception+0x128/0x350
[c00000044037fe50] c00000000000d05c system_call_vectored_common+0x15c/0x2ec
---- Exception: 3000 (System Call Vectored) at 0000000010064a2c
SP (7fff9b1ee9c0) is in userspace
0:mon> zh

commit a30b48bf1b24 ("mm/migrate_device: implement THP migration of
zone device pages") enabled migration for device-private PMD entries.
Hence this is one other path from where this warning could get
triggered.
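The behaviour the fix aims for can be modelled outside the kernel. Below
is a minimal stand-alone C sketch (toy enum states, not the real powerpc
pmd_t or the kernel API) of clearing a PMD that may be a
migration/device-private swap entry: the entry is cleared either way,
but only a previously-present huge mapping requires a TLB flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a PMD is either empty, a present huge mapping, or a
 * migration/device-private swap entry (which is NOT pmd_present()). */
enum pmd_state { PMD_NONE, PMD_HUGE_PRESENT, PMD_SWAP_ENTRY };

static bool toy_pmd_present(enum pmd_state s)
{
	return s == PMD_HUGE_PRESENT;
}

/* Mirrors the shape of the patched clear path: tolerate a non-present
 * (swap-entry) PMD instead of BUG-ing, clear it, and report whether a
 * TLB flush is needed (only for a previously-present mapping). */
static enum pmd_state toy_huge_get_and_clear(enum pmd_state *pmdp,
					     bool *need_flush)
{
	enum pmd_state old = *pmdp;

	*pmdp = PMD_NONE;
	*need_flush = toy_pmd_present(old);
	return old;
}
```

All names here are invented for illustration; the real logic lives in
pmdp_huge_get_and_clear() / pmdp_huge_get_and_clear_full() in the diff
below.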
------------[ cut here ]------------
WARNING: arch/powerpc/mm/book3s64/hash_pgtable.c:199 at hash__pmd_hugepage_update+0x48/0x284, CPU#3: hmm-tests/1905
Modules linked in: test_hmm
CPU: 3 UID: 0 PID: 1905 Comm: hmm-tests Tainted: G B W L N 7.0.0-rc1-01438-g7e2f0ee7581c #21 PREEMPT
Tainted: [B]=BAD_PAGE, [W]=WARN, [L]=SOFTLOCKUP, [N]=TEST
Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
NIP [c000000000096b70] hash__pmd_hugepage_update+0x48/0x284
LR [c000000000096e7c] hash__pmdp_huge_get_and_clear+0xd0/0xd4
Call Trace:
[c000000604707670] [c000000004e102b8] 0xc000000004e102b8 (unreliable)
[c000000604707700] [c00000000064ec3c] set_pmd_migration_entry+0x414/0x498
[c000000604707760] [c00000000063e5a4] migrate_vma_collect_pmd+0x12e8/0x16c4
[c000000604707890] [c00000000059282c] walk_pgd_range+0x7fc/0xd2c
[c000000604707990] [c000000000592e40] __walk_page_range+0xe4/0x2ac
[c000000604707a10] [c000000000593534] walk_page_range_mm_unsafe+0x204/0x2a4
[c000000604707ab0] [c00000000063af10] migrate_vma_setup+0x1dc/0x2e8
[c000000604707b10] [c008000006a21838] dmirror_migrate_to_system.constprop.0+0x210/0x4b0 [test_hmm]
[c000000604707c30] [c008000006a245b0] dmirror_fops_unlocked_ioctl+0x454/0xa5c [test_hmm]
[c000000604707d20] [c0000000006aab84] sys_ioctl+0x4ec/0x1178
[c000000604707e10] [c0000000000326a8] system_call_exception+0x128/0x350
[c000000604707e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
---- interrupt: 3000 at 0x7fffbe44f50c

Fixes: 75358ea359e7c ("powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race")
Fixes: a30b48bf1b24 ("mm/migrate_device: implement THP migration of zone device pages")
Reported-by: Pavithra Prakash
Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/9437e5ef28d1e2f5cbdd7f8286350ce93c1d43c5.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 15 +++++++++++++++
 arch/powerpc/mm/book3s64/pgtable.c           | 13 +++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 1a91762b455d..66a953046a49 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1313,12 +1313,27 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 {
 	pmd_t old_pmd;
 
+	/*
+	 * Non-present PMDs can be migration entries or device-private THP
+	 * entries. This can happen at 2 places:
+	 * - When the address space is being unmapped zap_huge_pmd(), and we
+	 *   encounter non-present pmds.
+	 * - migrate_vma_collect_huge_pmd() could calls this during migration
+	 *   of device-private pmd entries.
+	 */
+	if (!pmd_present(*pmdp)) {
+		old_pmd = READ_ONCE(*pmdp);
+		pmd_clear(pmdp);
+		goto out;
+	}
+
 	if (radix_enabled()) {
 		old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
 	} else {
 		old_pmd = hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
 	}
+out:
 	page_table_check_pmd_clear(mm, addr, old_pmd);
 
 	return old_pmd;

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 4b09c04654a8..42c7906d0e43 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -209,16 +209,21 @@ pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmdp, int full)
 {
 	pmd_t pmd;
+	bool was_present = pmd_present(*pmdp);
 
 	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
-	VM_BUG_ON((pmd_present(*pmdp) && !pmd_trans_huge(*pmdp)) ||
-		  !pmd_present(*pmdp));
+	VM_BUG_ON(was_present && !pmd_trans_huge(*pmdp));
+	/*
+	 * Check pmdp_huge_get_and_clear() for non-present pmd case.
+	 */
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp);
 
 	/*
 	 * if it not a fullmm flush, then we can possibly end up converting
 	 * this PMD pte entry to a regular level 0 PTE by a parallel page fault.
-	 * Make sure we flush the tlb in this case.
+	 * Make sure we flush the tlb in this case. TLB flush not needed for
+	 * non-present case.
 	 */
-	if (!full)
+	if (was_present && !full)
 		flush_pmd_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);
 	return pmd;
 }

From 68b1fa0ed5c84769e4e60d58f6a5af37e7273b51 Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:26 +0530
Subject: [PATCH 03/47] powerpc/64s: Fix _HPAGE_CHG_MASK to include
 _PAGE_SPECIAL bit

commit af38538801c6a ("mm/memory: factor out common code from
vm_normal_page_*()") added a VM_WARN_ON_ONCE for the huge zero pfn.
This can lead to the following call stack.

------------[ cut here ]------------
WARNING: mm/memory.c:735 at vm_normal_page_pmd+0xf0/0x140, CPU#19: hmm-tests/3366
NIP [c00000000078d0c0] vm_normal_page_pmd+0xf0/0x140
LR [c00000000078d060] vm_normal_page_pmd+0x90/0x140
Call Trace:
[c00000016f56f850] [c00000000078d060] vm_normal_page_pmd+0x90/0x140 (unreliable)
[c00000016f56f8a0] [c0000000008a9e30] change_huge_pmd+0x7c0/0x870
[c00000016f56f930] [c0000000007b2bc4] change_protection+0x17a4/0x1e10
[c00000016f56fba0] [c0000000007b3440] mprotect_fixup+0x210/0x4c0
[c00000016f56fc30] [c0000000007b3c3c] do_mprotect_pkey+0x54c/0x780
[c00000016f56fdb0] [c0000000007b3ed8] sys_mprotect+0x68/0x90
[c00000016f56fdf0] [c00000000003ae40] system_call_exception+0x190/0x500
[c00000016f56fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec

This happens when we call mprotect -> change_huge_pmd():

mprotect()
  change_pmd_range()
    pmd_modify(oldpmd, newprot)   # this clears _PAGE_SPECIAL for zero huge pmd
      pmdv = pmd_val(pmd);
      pmdv &= _HPAGE_CHG_MASK;    # -> gets cleared here
      return pmd_set_protbits(__pmd(pmdv), newprot);
    can_change_pmd_writable(vma, vmf->address, pmd)
      vm_normal_page_pmd(vma, addr, pmd)
        __vm_normal_page()
          VM_WARN_ON(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn));
          # this gets hit as _PAGE_SPECIAL for zero huge pmd was cleared.
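The masking bug above can be illustrated with a stand-alone toy model
(invented bit values, not the real powerpc PTE layout): pmd_modify()
keeps only the bits present in the change mask before applying the new
protection bits, so any flag missing from the mask, here the
special-page bit, is silently dropped.

```c
#include <stdint.h>

/* Toy bit layout for illustration only. */
#define PFN_MASK       0xffff0000u
#define PAGE_DIRTY     0x1u
#define PAGE_ACCESSED  0x2u
#define PAGE_SPECIAL   0x4u	/* marks the huge zero page */
#define PAGE_RW        0x8u

/* Old mask: missing PAGE_SPECIAL (the bug). New mask: includes it. */
#define HPAGE_CHG_MASK_OLD (PFN_MASK | PAGE_DIRTY | PAGE_ACCESSED)
#define HPAGE_CHG_MASK_NEW (HPAGE_CHG_MASK_OLD | PAGE_SPECIAL)

/* Mirrors the shape of pmd_modify(): keep only the bits in the change
 * mask, then apply the new protection bits. */
static uint32_t toy_pmd_modify(uint32_t pmd, uint32_t newprot,
			       uint32_t chg_mask)
{
	return (pmd & chg_mask) | newprot;
}
```

With the old mask, a huge zero PMD loses its special bit across
mprotect(); with the fixed mask it survives, which is exactly what the
one-line diff below achieves for _HPAGE_CHG_MASK.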
It can be easily reproduced with the following testcase:

	p = mmap(NULL, 2 * hpage_pmd_size, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	madvise((void *)p, 2 * hpage_pmd_size, MADV_HUGEPAGE);
	aligned = (char *)(((unsigned long)p + hpage_pmd_size - 1) &
			   ~(hpage_pmd_size - 1));
	(void)(*(volatile char *)aligned); // read fault, installs huge zero PMD
	mprotect((void *)aligned, hpage_pmd_size, PROT_READ | PROT_WRITE);

This patch adds _PAGE_SPECIAL to _HPAGE_CHG_MASK, similar to
_PAGE_CHG_MASK, as we don't want to clear this bit when calling
pmd_modify() while changing protection bits.

Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/7416f5cdbcfeaad947860fcac488b483f1287172.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 66a953046a49..59e24507f237 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -107,8 +107,8 @@
  * in here, on radix we expect them to be zero.
  */
 #define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
-			 _PAGE_ACCESSED | H_PAGE_THP_HUGE | _PAGE_PTE | \
-			 _PAGE_SOFT_DIRTY)
+			 _PAGE_ACCESSED | H_PAGE_THP_HUGE | _PAGE_SPECIAL | \
+			 _PAGE_PTE | _PAGE_SOFT_DIRTY)
 /*
  * user access blocked by key
  */

From 4a342f3e6f6848c816a661d8d7b10c75430598cf Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:27 +0530
Subject: [PATCH 04/47] powerpc/64s/tlbflush-radix: Remove unused
 radix__flush_tlb_pwc()

Commit 52162ec784fa ("powerpc/mm/book3s64/radix: Use freed_tables
instead of need_flush_all") removed the radix__flush_tlb_pwc()
definition, but missed removing the extern declaration. This patch
removes it.
Reviewed-by: Christophe Leroy (CS GROUP)
Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/b79c8ce8f00aa3e96ab9b1c77bc004759c397d3f.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index a38542259fab..de9b96660582 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -92,7 +92,6 @@ extern void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmad
 #define radix__flush_tlb_page(vma,addr) radix__local_flush_tlb_page(vma,addr)
 #define radix__flush_tlb_page_psize(mm,addr,p) radix__local_flush_tlb_page_psize(mm,addr,p)
 #endif
-extern void radix__flush_tlb_pwc(struct mmu_gather *tlb, unsigned long addr);
 extern void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr);
 extern void radix__flush_tlb_all(void);

From bf7c1497d2568ff803a0b0fc6728a1c06d11bf6e Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:28 +0530
Subject: [PATCH 05/47] powerpc/64s: Move serialize_against_pte_lookup() to
 hash_pgtable.c

Originally, commit fa4531f753f1 ("powerpc/mm: Don't send IPI to all cpus
on THP updates") introduced the serialize_against_pte_lookup() call for
both Radix and Hash.
However, the below commit fixed the race for Radix:

commit 70cbc3cc78a9 ("mm: gup: fix the fast GUP race against THP
collapse")

And therefore the following commit removed the
serialize_against_pte_lookup() call from radix_pgtable.c:

commit bedf03416913 ("powerpc/64s/radix: don't need to broadcast IPI for
radix pmd collapse flush")

Now, since serialize_against_pte_lookup() only gets called from
hash__pmdp_collapse_flush(), move the related functions to
hash_pgtable.c.

Hence this patch:
- moves serialize_against_pte_lookup() from radix_pgtable.c to
  hash_pgtable.c
- removes the radix specific calls from do_serialize()
- renames do_serialize() to do_nothing().

There should not be any functionality change in this patch.

Reviewed-by: Christophe Leroy (CS GROUP)
Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/a73ebe800a9be257329507703779f822363f8b2f.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
 arch/powerpc/mm/book3s64/hash_pgtable.c      | 21 ++++++++++++++++
 arch/powerpc/mm/book3s64/pgtable.c           | 25 --------------------
 3 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 59e24507f237..a105aede4f6b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1415,7 +1415,6 @@ static inline bool arch_needs_pgtable_deposit(void)
 		return false;
 	return true;
 }
-extern void serialize_against_pte_lookup(struct mm_struct *mm);
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index ac2a24d15d2e..d9b5b751d7b7 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -221,6 +221,27 @@ unsigned long hash__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr
 	return old;
 }
+static void do_nothing(void *arg)
+{
+
+}
+
+/*
+ * Serialize against __find_linux_pte() which does lock-less
+ * lookup in page tables with local interrupts disabled. For huge pages
+ * it casts pmd_t to pte_t. Since format of pte_t is different from
+ * pmd_t we want to prevent transit from pmd pointing to page table
+ * to pmd pointing to huge page (and back) while interrupts are disabled.
+ * We clear pmd to possibly replace it with page table pointer in
+ * different code paths. So make sure we wait for the parallel
+ * __find_linux_pte() to finish.
+ */
+static void serialize_against_pte_lookup(struct mm_struct *mm)
+{
+	smp_mb();
+	smp_call_function_many(mm_cpumask(mm), do_nothing, mm, 1);
+}
+
 pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 				pmd_t *pmdp)
 {

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 42c7906d0e43..faec2dc71a5c 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -150,31 +150,6 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
 	return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }
 
-static void do_serialize(void *arg)
-{
-	/* We've taken the IPI, so try to trim the mask while here */
-	if (radix_enabled()) {
-		struct mm_struct *mm = arg;
-		exit_lazy_flush_tlb(mm, false);
-	}
-}
-
-/*
- * Serialize against __find_linux_pte() which does lock-less
- * lookup in page tables with local interrupts disabled. For huge pages
- * it casts pmd_t to pte_t. Since format of pte_t is different from
- * pmd_t we want to prevent transit from pmd pointing to page table
- * to pmd pointing to huge page (and back) while interrupts are disabled.
- * We clear pmd to possibly replace it with page table pointer in
- * different code paths. So make sure we wait for the parallel
- * __find_linux_pte() to finish.
- */
-void serialize_against_pte_lookup(struct mm_struct *mm)
-{
-	smp_mb();
-	smp_call_function_many(mm_cpumask(mm), do_serialize, mm, 1);
-}
-
 /*
  * We use this to invalidate a pmdp entry before switching from a
  * hugepte to regular pmd entry.

From 4894e2fb7b9a25cef843ee2c3b2ac49fd808647d Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:29 +0530
Subject: [PATCH 06/47] powerpc/64s: Kill the unused argument of
 exit_lazy_flush_tlb

The previous patch removed the only caller of exit_lazy_flush_tlb()
which was passing always_flush = false as its second argument. With
that gone, all the callers of exit_lazy_flush_tlb() are local to
radix_tlb.c and there is no need for an additional argument.

This patch does the required cleanup. There should not be any
functionality change in this patch.

Reviewed-by: Christophe Leroy (CS GROUP)
Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/6f96ea53588034312ae84f74b1e2fa9c4ce7cfd5.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/mm/book3s64/internal.h  |  2 --
 arch/powerpc/mm/book3s64/pgtable.c   |  2 --
 arch/powerpc/mm/book3s64/radix_tlb.c | 14 +++++---------
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index cad08d83369c..f7055251c8b7 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -31,6 +31,4 @@ static inline bool slb_preload_disabled(void)
 
 void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
 
-void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
-
 #endif /* ARCH_POWERPC_MM_BOOK3S64_INTERNAL_H */

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index faec2dc71a5c..d32197d3298a 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -23,8 +23,6 @@
 #include
 #include
-#include "internal.h"
-
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 EXPORT_SYMBOL_GPL(mmu_psize_defs);

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 9e1f6558d026..339bd276840b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -19,8 +19,6 @@
 #include
 #include
 
-#include "internal.h"
-
 /*
  * tlbiel instruction for radix, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
@@ -660,7 +658,7 @@ static bool mm_needs_flush_escalation(struct mm_struct *mm)
 * If always_flush is true, then flush even if this CPU can't be removed
 * from mm_cpumask.
 */
-void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
+static void exit_lazy_flush_tlb(struct mm_struct *mm)
 {
 	unsigned long pid = mm->context.id;
 	int cpu = smp_processor_id();
@@ -703,19 +701,17 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
 	if (cpumask_test_cpu(cpu, mm_cpumask(mm))) {
 		dec_mm_active_cpus(mm);
 		cpumask_clear_cpu(cpu, mm_cpumask(mm));
-		always_flush = true;
 	}
 
 out:
-	if (always_flush)
-		_tlbiel_pid(pid, RIC_FLUSH_ALL);
+	_tlbiel_pid(pid, RIC_FLUSH_ALL);
 }
 
 #ifdef CONFIG_SMP
 static void do_exit_flush_lazy_tlb(void *arg)
 {
 	struct mm_struct *mm = arg;
 
-	exit_lazy_flush_tlb(mm, true);
+	exit_lazy_flush_tlb(mm);
 }
 
 static void exit_flush_lazy_tlbs(struct mm_struct *mm)
@@ -777,7 +773,7 @@ static enum tlb_flush_type flush_type_needed(struct mm_struct *mm, bool fullmm)
 		 * to trim.
		 */
		if (tick_and_test_trim_clock()) {
-			exit_lazy_flush_tlb(mm, true);
+			exit_lazy_flush_tlb(mm);
			return FLUSH_TYPE_NONE;
		}
	}
@@ -823,7 +819,7 @@ static enum tlb_flush_type flush_type_needed(struct mm_struct *mm, bool fullmm)
		if (current->mm == mm)
			return FLUSH_TYPE_LOCAL;
		if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
-			exit_lazy_flush_tlb(mm, true);
+			exit_lazy_flush_tlb(mm);
		return FLUSH_TYPE_NONE;
	}

From 7bcfba20e946ec160fd72c3a0b4cf6e3e845d629 Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:30 +0530
Subject: [PATCH 07/47] powerpc/64s: Rename tlbie_va_lpid to tlbie_va_pid_lpid

Rename these functions so that their names better reflect what they do.
For example, the name __tlbie_va_pid_lpid better reflects that it
invalidates the TLB using VA, PID and LPID.

No functional change in this patch.

Reviewed-by: Christophe Leroy (CS GROUP)
Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/0a0b2cf23b9522f891f9a0f976bbdc5c8e6f6d8b.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 339bd276840b..1adf20798ca6 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1411,7 +1411,7 @@ static __always_inline void __tlbie_pid_lpid(unsigned long pid,
	trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }

-static __always_inline void __tlbie_va_lpid(unsigned long va, unsigned long pid,
+static __always_inline void __tlbie_va_pid_lpid(unsigned long va, unsigned long pid,
					    unsigned long lpid, unsigned long ap,
					    unsigned long ric)
 {
@@ -1443,7 +1443,7 @@ static inline void fixup_tlbie_pid_lpid(unsigned long pid, unsigned long lpid)

	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
		asm volatile("ptesync" : : : "memory");
-		__tlbie_va_lpid(va, pid, lpid, mmu_get_ap(MMU_PAGE_64K),
+		__tlbie_va_pid_lpid(va, pid, lpid, mmu_get_ap(MMU_PAGE_64K),
				RIC_FLUSH_TLB);
	}
 }
@@ -1474,7 +1474,7 @@ static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }

-static inline void fixup_tlbie_va_range_lpid(unsigned long va,
+static inline void fixup_tlbie_va_range_pid_lpid(unsigned long va,
					     unsigned long pid,
					     unsigned long lpid,
					     unsigned long ap)
@@ -1486,11 +1486,11 @@ static inline void fixup_tlbie_va_range_lpid(unsigned long va,

	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
		asm volatile("ptesync" : : : "memory");
-		__tlbie_va_lpid(va, pid, lpid, ap, RIC_FLUSH_TLB);
+		__tlbie_va_pid_lpid(va, pid, lpid, ap, RIC_FLUSH_TLB);
	}
 }

-static inline void __tlbie_va_range_lpid(unsigned long start, unsigned long end,
+static inline void __tlbie_va_range_pid_lpid(unsigned long start, unsigned long end,
					 unsigned long pid, unsigned long lpid,
					 unsigned long page_size,
					 unsigned long psize)
@@ -1499,12 +1499,12 @@ static inline void __tlbie_va_range_lpid(unsigned long start, unsigned long end,
	unsigned long ap = mmu_get_ap(psize);

	for (addr = start; addr < end; addr += page_size)
-		__tlbie_va_lpid(addr, pid, lpid, ap, RIC_FLUSH_TLB);
+		__tlbie_va_pid_lpid(addr, pid, lpid, ap, RIC_FLUSH_TLB);

-	fixup_tlbie_va_range_lpid(addr - page_size, pid, lpid, ap);
+	fixup_tlbie_va_range_pid_lpid(addr - page_size, pid, lpid, ap);
 }

-static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
+static inline void _tlbie_va_range_pid_lpid(unsigned long start, unsigned long end,
					unsigned long pid, unsigned long lpid,
					unsigned long page_size,
					unsigned long psize, bool also_pwc)
@@ -1512,7 +1512,7 @@ static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
	asm volatile("ptesync" : : : "memory");
	if (also_pwc)
		__tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
-	__tlbie_va_range_lpid(start, end, pid, lpid, page_size, psize);
+	__tlbie_va_range_pid_lpid(start, end, pid, lpid, page_size, psize);
	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }

@@ -1563,7 +1563,7 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
			_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
			return;
		}
-		_tlbie_va_range_lpid(start, end, pid, lpid,
+		_tlbie_va_range_pid_lpid(start, end, pid, lpid,
				     (1UL << def->shift), psize, false);
	}
 }

From f074059c7a4d4b93914eee404391dcdb0fd60aa6 Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:31 +0530
Subject: [PATCH 08/47] powerpc/64s: Rename tlbie_lpid_va to tlbie_va_lpid

The previous patch renamed the tlbie_va_lpid functions to
tlbie_va_pid_lpid() since those were working with PIDs as well. This
then allows us to rename tlbie_lpid_va to tlbie_va_lpid, which finally
makes all the tlbie function naming consistent.

No functional change in this patch.

Reviewed-by: Christophe Leroy (CS GROUP)
Signed-off-by: Ritesh Harjani (IBM)
Tested-by: Venkat Rao Bagalkote
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/8fadd2beb2f883c65ba0d797c87d238098cd13c8.1773078178.git.ritesh.list@gmail.com
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 1adf20798ca6..6ce94eaefc1b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -185,7 +185,7 @@ static __always_inline void __tlbie_va(unsigned long va, unsigned long pid,
	trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }

-static __always_inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
+static __always_inline void __tlbie_va_lpid(unsigned long va, unsigned long lpid,
					    unsigned long ap, unsigned long ric)
 {
	unsigned long rb,rs,prs,r;
@@ -249,17 +249,17 @@ static inline void fixup_tlbie_pid(unsigned long pid)
	}
 }

-static inline void fixup_tlbie_lpid_va(unsigned long va, unsigned long lpid,
+static inline void fixup_tlbie_va_lpid(unsigned long va, unsigned long lpid,
				       unsigned long ap)
 {
	if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
		asm volatile("ptesync": : :"memory");
-		__tlbie_lpid_va(va, 0, ap, RIC_FLUSH_TLB);
+		__tlbie_va_lpid(va, 0, ap, RIC_FLUSH_TLB);
	}

	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
		asm volatile("ptesync": : :"memory");
-		__tlbie_lpid_va(va, lpid, ap, RIC_FLUSH_TLB);
+		__tlbie_va_lpid(va, lpid, ap, RIC_FLUSH_TLB);
	}
 }

@@ -278,7 +278,7 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)

	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
		asm volatile("ptesync": : :"memory");
-		__tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
+		__tlbie_va_lpid(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
	}
 }

@@ -529,14 +529,14 @@ static void do_tlbiel_va_range(void *info)
			t->psize, t->also_pwc);
 }

-static __always_inline void _tlbie_lpid_va(unsigned long va, unsigned long lpid,
+static __always_inline void _tlbie_va_lpid(unsigned long va, unsigned long lpid,
					   unsigned long psize, unsigned long ric)
 {
	unsigned long ap = mmu_get_ap(psize);

	asm volatile("ptesync": : :"memory");
-	__tlbie_lpid_va(va, lpid, ap, ric);
-	fixup_tlbie_lpid_va(va, lpid, ap);
+	__tlbie_va_lpid(va, lpid, ap, ric);
+	fixup_tlbie_va_lpid(va, lpid, ap);
	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }

@@ -1147,7 +1147,7 @@ void radix__flush_tlb_lpid_page(unsigned int lpid,
 {
	int psize = radix_get_mmu_psize(page_size);

-	_tlbie_lpid_va(addr, lpid, psize, RIC_FLUSH_TLB);
+	_tlbie_va_lpid(addr, lpid, psize, RIC_FLUSH_TLB);
 }
 EXPORT_SYMBOL_GPL(radix__flush_tlb_lpid_page);

From 24eb6378408fc125eacc4ad498d120ecf7becc35 Mon Sep 17 00:00:00 2001
From: "Ritesh Harjani (IBM)"
Date: Mon, 9 Mar 2026 23:44:32 +0530
Subject: [PATCH 09/47] powerpc/64s: Make use of H_RPTI_TYPE_ALL macro

Instead of opencoding, let's use the pre-defined macro
(H_RPTI_TYPE_ALL) at the following places.
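The substitution is purely mechanical, which a small stand-alone C check
can illustrate. The flag values below are invented for illustration
(the real H_RPTI_* constants live in the powerpc hvcall header), under
the assumption the diff itself encodes: that H_RPTI_TYPE_ALL is defined
as the union of the TLB, PWC and PRT invalidation types.

```c
/* Toy stand-ins for the H_RPTI invalidation-type flags. */
#define H_RPTI_TYPE_TLB 0x1UL
#define H_RPTI_TYPE_PWC 0x2UL
#define H_RPTI_TYPE_PRT 0x4UL
/* The patch relies on ALL covering exactly these three types. */
#define H_RPTI_TYPE_ALL (H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC | H_RPTI_TYPE_PRT)
```

If that assumption holds, replacing the open-coded OR with the macro
cannot change the value passed to pseries_rpt_invalidate().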
Reviewed-by: Christophe Leroy (CS GROUP) Signed-off-by: Ritesh Harjani (IBM) Tested-by: Venkat Rao Bagalkote Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/d1d32404d5f0d3e93cd0faad2298b7bfed31288f.1773078178.git.ritesh.list@gmail.com --- arch/powerpc/mm/book3s64/radix_tlb.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c index 6ce94eaefc1b..7de5760164a9 100644 --- a/arch/powerpc/mm/book3s64/radix_tlb.c +++ b/arch/powerpc/mm/book3s64/radix_tlb.c @@ -885,8 +885,7 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm) } else if (type == FLUSH_TYPE_GLOBAL) { if (!mmu_has_feature(MMU_FTR_GTSE)) { unsigned long tgt = H_RPTI_TARGET_CMMU; - unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC | - H_RPTI_TYPE_PRT; + unsigned long type = H_RPTI_TYPE_ALL; if (atomic_read(&mm->context.copros) > 0) tgt |= H_RPTI_TARGET_NMMU; @@ -982,8 +981,7 @@ void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end) { if (!mmu_has_feature(MMU_FTR_GTSE)) { unsigned long tgt = H_RPTI_TARGET_CMMU | H_RPTI_TARGET_NMMU; - unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC | - H_RPTI_TYPE_PRT; + unsigned long type = H_RPTI_TYPE_ALL; pseries_rpt_invalidate(0, tgt, type, H_RPTI_PAGE_ALL, start, end); @@ -1337,8 +1335,7 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr) unsigned long tgt, type, pg_sizes; tgt = H_RPTI_TARGET_CMMU; - type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC | - H_RPTI_TYPE_PRT; + type = H_RPTI_TYPE_ALL; pg_sizes = psize_to_rpti_pgsize(mmu_virtual_psize); if (atomic_read(&mm->context.copros) > 0) From 07791ff060dd3aa270cc03861f2599d81a77b97f Mon Sep 17 00:00:00 2001 From: "Ritesh Harjani (IBM)" Date: Mon, 9 Mar 2026 23:44:33 +0530 Subject: [PATCH 10/47] powerpc: Print MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS at startup Similar to CPU_FTRS_[POSSIBLE|ALWAYS], let's also print MMU_FTRS_[POSSIBLE|ALWAYS]. 
This has some useful data to capture during bootup. Reviewed-by: Christophe Leroy (CS GROUP) Signed-off-by: Ritesh Harjani (IBM) Tested-by: Venkat Rao Bagalkote Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/c37a9f314a723048d25aa5424f7ede8eec691d86.1773078178.git.ritesh.list@gmail.com --- arch/powerpc/kernel/setup-common.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index b1761909c23f..8a86b0efcb1c 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -865,6 +865,10 @@ static __init void print_system_info(void) cur_cpu_spec->cpu_user_features, cur_cpu_spec->cpu_user_features2); pr_info("mmu_features = 0x%08x\n", cur_cpu_spec->mmu_features); + pr_info(" possible = 0x%016lx\n", + (unsigned long)MMU_FTRS_POSSIBLE); + pr_info(" always = 0x%016lx\n", + (unsigned long)MMU_FTRS_ALWAYS); #ifdef CONFIG_PPC64 pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features); #ifdef CONFIG_PPC_BOOK3S From 948b71aa81cd89b222942db6055e8d9c51c54e78 Mon Sep 17 00:00:00 2001 From: "Ritesh Harjani (IBM)" Date: Mon, 9 Mar 2026 18:08:37 +0530 Subject: [PATCH 11/47] drivers/vfio_pci_core: Change PXD_ORDER check from switch case to if/else block MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Architectures like PowerPC use runtime-defined values for PMD_ORDER/PUD_ORDER. This is because they can use either the Radix or Hash MMU, selected at runtime via the kernel cmdline, so the pXd_index_size is not known at compile time. Without this fix, when we add huge pfn support on powerpc in the next patch, vfio_pci_core driver compilation can fail with the following errors.

CC [M] drivers/vfio/vfio_main.o CC [M] drivers/vfio/group.o CC [M] drivers/vfio/container.o CC [M] drivers/vfio/virqfd.o CC [M] drivers/vfio/vfio_iommu_spapr_tce.o CC [M] drivers/vfio/pci/vfio_pci_core.o CC [M] drivers/vfio/pci/vfio_pci_intrs.o CC [M] drivers/vfio/pci/vfio_pci_rdwr.o CC [M] drivers/vfio/pci/vfio_pci_config.o CC [M] drivers/vfio/pci/vfio_pci.o AR kernel/built-in.a ../drivers/vfio/pci/vfio_pci_core.c: In function ‘vfio_pci_vmf_insert_pfn’: ../drivers/vfio/pci/vfio_pci_core.c:1678:9: error: case label does not reduce to an integer constant 1678 | case PMD_ORDER: | ^~~~ ../drivers/vfio/pci/vfio_pci_core.c:1682:9: error: case label does not reduce to an integer constant 1682 | case PUD_ORDER: | ^~~~ make[6]: *** [../scripts/Makefile.build:289: drivers/vfio/pci/vfio_pci_core.o] Error 1 make[6]: *** Waiting for unfinished jobs.... make[5]: *** [../scripts/Makefile.build:546: drivers/vfio/pci] Error 2 make[5]: *** Waiting for unfinished jobs.... make[4]: *** [../scripts/Makefile.build:546: drivers/vfio] Error 2 make[3]: *** [../scripts/Makefile.build:546: drivers] Error 2 Fixes: f9e54c3a2f5b7 ("vfio/pci: implement huge_fault support") Signed-off-by: Ritesh Harjani (IBM) Tested-by: Venkat Rao Bagalkote Reviewed-by: Alex Williamson Reviewed-by: Christophe Leroy (CS GROUP) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/b155e19993ee1f5584c72050192eb468b31c5029.1773058761.git.ritesh.list@gmail.com --- drivers/vfio/pci/vfio_pci_core.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index d43745fe4c84..0967307235b8 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1670,21 +1670,16 @@ vm_fault_t vfio_pci_vmf_insert_pfn(struct vfio_pci_core_device *vdev, if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev)) return VM_FAULT_SIGBUS; - switch (order) { - case 0: + if (!order) return 
vmf_insert_pfn(vmf->vma, vmf->address, pfn); -#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP - case PMD_ORDER: + + if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PMD_PFNMAP) && order == PMD_ORDER) return vmf_insert_pfn_pmd(vmf, pfn, false); -#endif -#ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP - case PUD_ORDER: + + if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && order == PUD_ORDER) return vmf_insert_pfn_pud(vmf, pfn, false); - break; -#endif - default: - return VM_FAULT_FALLBACK; - } + + return VM_FAULT_FALLBACK; } EXPORT_SYMBOL_GPL(vfio_pci_vmf_insert_pfn); From d1503aa9ab8057cb93367e0184528f61f7510845 Mon Sep 17 00:00:00 2001 From: "Ritesh Harjani (IBM)" Date: Mon, 9 Mar 2026 18:08:38 +0530 Subject: [PATCH 12/47] powerpc/64s: Add support for huge pfnmaps This uses _RPAGE_SW2 bit for the PMD and PUDs similar to PTEs. This also adds support for {pte,pmd,pud}_pgprot helpers needed for follow_pfnmap APIs. This allows us to extend the PFN mappings, e.g. PCI MMIO bars where it can grow as large as 8GB or even bigger, to map at PMD / PUD level. VFIO PCI core driver already supports fault handling at PMD / PUD level for more efficient BAR mappings. 
Reviewed-by: Christophe Leroy (CS GROUP) Signed-off-by: Ritesh Harjani (IBM) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/6fca726574236f556dd4e1e259692e82a4c29e85.1773058761.git.ritesh.list@gmail.com --- arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/book3s/64/pgtable.h | 23 ++++++++++++++++++++ arch/powerpc/include/asm/pgtable.h | 14 ++++++++++++ 3 files changed, 38 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 10240cb80904..fd3e66e4ee0a 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -172,6 +172,7 @@ config PPC select ARCH_STACKWALK select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_DEBUG_PAGEALLOC if PPC_BOOK3S || PPC_8xx + select ARCH_SUPPORTS_HUGE_PFNMAP if PPC_BOOK3S_64 && TRANSPARENT_HUGEPAGE select ARCH_SUPPORTS_PAGE_TABLE_CHECK if !HUGETLB_PAGE select ARCH_SUPPORTS_SCHED_MC if SMP select ARCH_SUPPORTS_SCHED_SMT if PPC64 && SMP diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index a105aede4f6b..1b8916618f89 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1289,6 +1289,29 @@ static inline pud_t pud_mkhuge(pud_t pud) return pud; } +#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP +static inline bool pmd_special(pmd_t pmd) +{ + return pte_special(pmd_pte(pmd)); +} + +static inline pmd_t pmd_mkspecial(pmd_t pmd) +{ + return pte_pmd(pte_mkspecial(pmd_pte(pmd))); +} +#endif + +#ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP +static inline bool pud_special(pud_t pud) +{ + return pte_special(pud_pte(pud)); +} + +static inline pud_t pud_mkspecial(pud_t pud) +{ + return pte_pud(pte_mkspecial(pud_pte(pud))); +} +#endif #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS extern int pmdp_set_access_flags(struct vm_area_struct *vma, diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index dcd3a88caaf6..97ccfa6e3dde 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ 
b/arch/powerpc/include/asm/pgtable.h @@ -63,6 +63,20 @@ static inline pgprot_t pte_pgprot(pte_t pte) return __pgprot(pte_flags); } +#ifdef CONFIG_PPC64 +#define pmd_pgprot pmd_pgprot +static inline pgprot_t pmd_pgprot(pmd_t pmd) +{ + return pte_pgprot(pmd_pte(pmd)); +} + +#define pud_pgprot pud_pgprot +static inline pgprot_t pud_pgprot(pud_t pud) +{ + return pte_pgprot(pud_pte(pud)); +} +#endif /* CONFIG_PPC64 */ + static inline pgprot_t pgprot_nx(pgprot_t prot) { return pte_pgprot(pte_exprotect(__pte(pgprot_val(prot)))); From 6771c54728c278bf1e4bfdab4fddbbb186e33498 Mon Sep 17 00:00:00 2001 From: Nilay Shroff Date: Wed, 11 Mar 2026 19:13:31 +0530 Subject: [PATCH 13/47] powerpc/xive: fix kmemleak caused by incorrect chip_data lookup The kmemleak reports the following memory leak: Unreferenced object 0xc0000002a7fbc640 (size 64): comm "kworker/8:1", pid 540, jiffies 4294937872 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 09 04 00 04 00 00 ................ 00 00 a7 81 00 00 0a c0 00 00 08 04 00 04 00 00 ................ backtrace (crc 177d48f6): __kmalloc_cache_noprof+0x520/0x730 xive_irq_alloc_data.constprop.0+0x40/0xe0 xive_irq_domain_alloc+0xd0/0x1b0 irq_domain_alloc_irqs_parent+0x44/0x6c pseries_irq_domain_alloc+0x1cc/0x354 irq_domain_alloc_irqs_parent+0x44/0x6c msi_domain_alloc+0xb0/0x220 irq_domain_alloc_irqs_locked+0x138/0x4d0 __irq_domain_alloc_irqs+0x8c/0xfc __msi_domain_alloc_irqs+0x214/0x4d8 msi_domain_alloc_irqs_all_locked+0x70/0xf8 pci_msi_setup_msi_irqs+0x60/0x78 __pci_enable_msix_range+0x54c/0x98c pci_alloc_irq_vectors_affinity+0x16c/0x1d4 nvme_pci_enable+0xac/0x9c0 [nvme] nvme_probe+0x340/0x764 [nvme] This occurs when allocating MSI-X vectors for an NVMe device. During allocation the XIVE code creates a struct xive_irq_data and stores it in irq_data->chip_data. When the MSI-X irqdomain is later freed, xive_irq_free_data() is responsible for retrieving this structure and freeing it. 
However, after commit cc0cc23babc9 ("powerpc/xive: Untangle xive from child interrupt controller drivers"), xive_irq_free_data() retrieves the chip_data using irq_get_chip_data(), which looks up the data through the child domain. This is incorrect because the XIVE-specific irq data is associated with the XIVE (parent) domain. As a result the lookup fails and the allocated struct xive_irq_data is never freed, leading to the kmemleak report shown above. Fix this by retrieving the irq_data from the correct domain using irq_domain_get_irq_data() and then accessing the chip_data via irq_data_get_irq_chip_data(). Cc: stable@vger.kernel.org Fixes: cc0cc23babc9 ("powerpc/xive: Untangle xive from child interrupt controller drivers") Signed-off-by: Nilay Shroff Tested-by: Venkat Rao Bagalkote Reviewed-by: Nam Cao Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260311134336.326996-1-nilay@linux.ibm.com --- arch/powerpc/sysdev/xive/common.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index e1a4f8a97393..6b1b7541ca31 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1038,13 +1038,19 @@ static struct xive_irq_data *xive_irq_alloc_data(unsigned int virq, irq_hw_numbe return xd; } -static void xive_irq_free_data(unsigned int virq) +static void xive_irq_free_data(struct irq_domain *domain, unsigned int virq) { - struct xive_irq_data *xd = irq_get_chip_data(virq); + struct xive_irq_data *xd; + struct irq_data *data = irq_domain_get_irq_data(domain, virq); + if (!data) + return; + + xd = irq_data_get_irq_chip_data(data); if (!xd) return; - irq_set_chip_data(virq, NULL); + + irq_domain_reset_irq_data(data); xive_cleanup_irq_data(xd); kfree(xd); } @@ -1305,7 +1311,7 @@ static int xive_irq_domain_map(struct irq_domain *h, unsigned int virq, static void xive_irq_domain_unmap(struct irq_domain *d, unsigned int virq) { - 
xive_irq_free_data(virq); + xive_irq_free_data(d, virq); } static int xive_irq_domain_xlate(struct irq_domain *h, struct device_node *ct, @@ -1443,7 +1449,7 @@ static void xive_irq_domain_free(struct irq_domain *domain, pr_debug("%s %d #%d\n", __func__, virq, nr_irqs); for (i = 0; i < nr_irqs; i++) - xive_irq_free_data(virq + i); + xive_irq_free_data(domain, virq + i); } #endif From 789335cacdf37da93bb7c70322dff8c7e82881df Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Thu, 12 Mar 2026 14:00:49 +0530 Subject: [PATCH 14/47] powerpc/crash: fix backup region offset update to elfcorehdr MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit update_backup_region_phdr() in file_load_64.c iterates over all the program headers in the kdump kernel’s elfcorehdr and updates the p_offset of the program header whose physical address starts at 0. However, the loop logic is incorrect because the program header pointer is not updated during iteration. Since elfcorehdr typically contains PT_NOTE entries first, the PT_LOAD program header with physical address 0 is never reached. As a result, its p_offset is not updated to point to the backup region. Because of this behavior, the capture kernel exports the first 64 KB of the crashed kernel’s memory at offset 0, even though that memory actually lives in the backup region. When a crash happens, purgatory copies the first 64 KB of the crashed kernel’s memory into the backup region so the capture kernel can safely use it. This has not caused problems so far because the first 64 KB is usually identical in both the crashed and capture kernels. However, this is just an assumption and is not guaranteed to always hold true. Fix update_backup_region_phdr() to correctly update the p_offset of the program header with a starting physical address of 0 by correcting the logic used to iterate over the program headers. 
Fixes: cb350c1f1f86 ("powerpc/kexec_file: Prepare elfcore header for crashing kernel") Reviewed-by: Aditya Gupta Signed-off-by: Sourabh Jain Reviewed-by: Hari Bathini Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260312083051.1935737-2-sourabhjain@linux.ibm.com --- arch/powerpc/kexec/file_load_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 5f6d50e4c3d4..a7db7eca0481 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -391,7 +391,7 @@ static void update_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr) unsigned int i; phdr = (Elf64_Phdr *)(ehdr + 1); - for (i = 0; i < ehdr->e_phnum; i++) { + for (i = 0; i < ehdr->e_phnum; i++, phdr++) { if (phdr->p_paddr == BACKUP_SRC_START) { phdr->p_offset = image->arch.backup_start; kexec_dprintk("Backup region offset updated to 0x%lx\n", From f53b24d1fa263f56155213eabab734c18d884aff Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Thu, 12 Mar 2026 14:00:50 +0530 Subject: [PATCH 15/47] powerpc/crash: Update backup region offset in elfcorehdr on memory hotplug When elfcorehdr is prepared for kdump, the program header representing the first 64 KB of memory is expected to have its offset point to the backup region. This is required because purgatory copies the first 64 KB of the crashed kernel memory to this backup region following a kernel crash. This allows the capture kernel to use the first 64 KB of memory to place the exception vectors and other required data. When elfcorehdr is recreated due to memory hotplug, the offset of the program header representing the first 64 KB is not updated. As a result, the capture kernel exports the first 64 KB at offset 0, even though the data actually resides in the backup region. Fix this by calling sync_backup_region_phdr() to update the program header offset in the elfcorehdr created during memory hotplug. 
sync_backup_region_phdr() works for images loaded via the kexec_file_load syscall. However, it does not work for kexec_load, because image->arch.backup_start is not initialized in that case. So introduce machine_kexec_post_load() to process the elfcorehdr prepared by kexec-tools and initialize image->arch.backup_start for kdump images loaded via kexec_load syscall. Rename update_backup_region_phdr() to sync_backup_region_phdr() and extend it to synchronize the backup region offset between the kdump image and the ELF core header. The helper now supports updating either the kdump image from the ELF program header or updating the ELF program header from the kdump image, avoiding code duplication. Define ARCH_HAS_KIMAGE_ARCH and struct kimage_arch when CONFIG_KEXEC_FILE or CONFIG_CRASH_DUMP is enabled so that kimage->arch.backup_start is available with the kexec_load system call. This patch depends on the patch titled "powerpc/crash: fix backup region offset update to elfcorehdr". Fixes: 849599b702ef ("powerpc/crash: add crash memory hotplug support") Reviewed-by: Aditya Gupta Signed-off-by: Sourabh Jain Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260312083051.1935737-3-sourabhjain@linux.ibm.com --- arch/powerpc/include/asm/kexec.h | 14 +++++-- arch/powerpc/kexec/crash.c | 64 +++++++++++++++++++++++++++++++ arch/powerpc/kexec/file_load_64.c | 29 +------------- 3 files changed, 76 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index bd4a6c42a5f3..e02710d6a2e1 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -66,11 +66,9 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co unsigned long start_address) __noreturn; void kexec_copy_flush(struct kimage *image); -#ifdef CONFIG_KEXEC_FILE -extern const struct kexec_file_ops kexec_elf64_ops; +#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP) #define 
ARCH_HAS_KIMAGE_ARCH - struct kimage_arch { struct crash_mem *exclude_ranges; @@ -78,6 +76,10 @@ struct kimage_arch { void *backup_buf; void *fdt; }; +#endif + +#ifdef CONFIG_KEXEC_FILE +extern const struct kexec_file_ops kexec_elf64_ops; char *setup_kdump_cmdline(struct kimage *image, char *cmdline, unsigned long cmdline_len); @@ -145,6 +147,10 @@ int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size + +int machine_kexec_post_load(struct kimage *image); +#define machine_kexec_post_load machine_kexec_post_load + #endif /* CONFIG_CRASH_HOTPLUG */ extern int crashing_cpu; @@ -159,6 +165,8 @@ extern void default_machine_crash_shutdown(struct pt_regs *regs); extern void crash_kexec_prepare(void); extern void crash_kexec_secondary(struct pt_regs *regs); +extern void sync_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr, + bool phdr_to_kimage); static inline bool kdump_in_progress(void) { return crashing_cpu >= 0; diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c index a325c1c02f96..e6539f213b3d 100644 --- a/arch/powerpc/kexec/crash.c +++ b/arch/powerpc/kexec/crash.c @@ -27,6 +27,7 @@ #include #include #include +#include /* * The primary CPU waits a while for all secondary CPUs to enter. This is to @@ -399,7 +400,68 @@ void default_machine_crash_shutdown(struct pt_regs *regs) ppc_md.kexec_cpu_down(1, 0); } +#ifdef CONFIG_CRASH_DUMP +/** + * sync_backup_region_phdr - synchronize backup region offset between + * kexec image and ELF core header. + * @image: Kexec image. + * @ehdr: ELF core header. + * @phdr_to_kimage: If true, read the offset from the ELF program header + * and update the kimage backup region. If false, update + * the ELF program header offset from the kimage backup + * region. + * + * Note: During kexec_load, this is called with phdr_to_kimage = true. 
For + * kexec_file_load and ELF core header recreation during memory hotplug + * events, it is called with phdr_to_kimage = false. + * + * Returns nothing. + */ +void sync_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr, bool phdr_to_kimage) +{ + Elf64_Phdr *phdr; + unsigned int i; + + phdr = (Elf64_Phdr *)(ehdr + 1); + for (i = 0; i < ehdr->e_phnum; i++, phdr++) { + if (phdr->p_paddr == BACKUP_SRC_START) { + if (phdr_to_kimage) + image->arch.backup_start = phdr->p_offset; + else + phdr->p_offset = image->arch.backup_start; + + kexec_dprintk("Backup region offset updated to 0x%lx\n", + image->arch.backup_start); + return; + } + } +} +#endif /* CONFIG_CRASH_DUMP */ + #ifdef CONFIG_CRASH_HOTPLUG + +int machine_kexec_post_load(struct kimage *image) +{ + int i; + unsigned long mem; + unsigned char *ptr; + + if (image->type != KEXEC_TYPE_CRASH) + return 0; + + if (image->file_mode) + return 0; + + for (i = 0; i < image->nr_segments; i++) { + mem = image->segment[i].mem; + ptr = (char *)__va(mem); + + if (ptr && memcmp(ptr, ELFMAG, SELFMAG) == 0) + sync_backup_region_phdr(image, (Elf64_Ehdr *) ptr, true); + } + return 0; +} + #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt @@ -474,6 +536,8 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify * goto out; } + sync_backup_region_phdr(image, (Elf64_Ehdr *) elfbuf, false); + ptr = __va(mem); if (ptr) { /* Temporarily invalidate the crash image while it is replaced */ diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index a7db7eca0481..8c72e12ea44e 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -374,33 +374,6 @@ static int load_backup_segment(struct kimage *image, struct kexec_buf *kbuf) return 0; } -/** - * update_backup_region_phdr - Update backup region's offset for the core to - * export the region appropriately. - * @image: Kexec image. - * @ehdr: ELF core header. 
- * - * Assumes an exclusive program header is setup for the backup region - * in the ELF headers - * - * Returns nothing. - */ -static void update_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr) -{ - Elf64_Phdr *phdr; - unsigned int i; - - phdr = (Elf64_Phdr *)(ehdr + 1); - for (i = 0; i < ehdr->e_phnum; i++, phdr++) { - if (phdr->p_paddr == BACKUP_SRC_START) { - phdr->p_offset = image->arch.backup_start; - kexec_dprintk("Backup region offset updated to 0x%lx\n", - image->arch.backup_start); - return; - } - } -} - static unsigned int kdump_extra_elfcorehdr_size(struct crash_mem *cmem) { #if defined(CONFIG_CRASH_HOTPLUG) && defined(CONFIG_MEMORY_HOTPLUG) @@ -445,7 +418,7 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf) } /* Fix the offset for backup region in the ELF header */ - update_backup_region_phdr(image, headers); + sync_backup_region_phdr(image, headers, false); kbuf->buffer = headers; kbuf->mem = KEXEC_BUF_MEM_UNKNOWN; From cad2a72c29e037f1ade0079f7e4b925508680e20 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Wed, 18 Mar 2026 23:36:45 -0400 Subject: [PATCH 16/47] Revert "powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask()" This reverts commit a9dadc1c512807f955f0799e85830b420da47932. The commit message states: When called from xive_irq_startup(), the size of the cpumask can be larger than nr_cpu_ids. This can result in a WARN_ON. [...] This happens because we're being called with our affinity mask set to irq_default_affinity. That in turn was populated using cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a mask which has > nr_cpu_ids bits set. In modern kernel, cpumask_weight() can't return > nr_cpu_ids. In inline case, cpumask_setall() explicitly clears all bits above nr_cpu_ids, see commit 63355b9884b3 ("cpumask: be more careful with 'cpumask_setall()'"). 
So, despite that cpumask_weight() is passed with small_cpumask_bits, which is NR_CPUS in this case, it can't count over the nr_cpu_ids. In outline case, cpumask_setall() may set bits beyond the limit up to the next byte alignment, but in this case small_cpumask_bits is wired to nr_cpu_ids, thus making overcounting impossible. Signed-off-by: Yury Norov Tested-by: Mukesh Kumar Chaurasiya (IBM) Reviewed-by: Mukesh Kumar Chaurasiya (IBM) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260319033647.881246-2-ynorov@nvidia.com --- arch/powerpc/sysdev/xive/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index 6b1b7541ca31..33e9fb3436e1 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -551,7 +551,7 @@ static int xive_find_target_in_mask(const struct cpumask *mask, int cpu, first, num, i; /* Pick up a starting point CPU in the mask based on fuzz */ - num = min_t(int, cpumask_weight(mask), nr_cpu_ids); + num = cpumask_weight(mask); first = fuzz % num; /* Locate it */ From ce7c43b0871989b4c665cceb9720d79b933c1818 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Wed, 18 Mar 2026 23:36:46 -0400 Subject: [PATCH 17/47] powerpc/xive: rework xive_find_target_in_mask() Switch the function to using modern cpumask API and drop most of the housekeeping code. Notice, if first >= nr_cpu_ids, for_each_cpu_wrap() iterator behaves just like for_each_cpu(), i.e. begins from 0. So even if WARN_ON() is triggered, no special handling is needed. 
Signed-off-by: Yury Norov Tested-by: Mukesh Kumar Chaurasiya (IBM) Reviewed-by: Shrikanth Hegde Reviewed-by: Mukesh Kumar Chaurasiya (IBM) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260319033647.881246-3-ynorov@nvidia.com --- arch/powerpc/sysdev/xive/common.c | 31 ++++++------------------------- 1 file changed, 6 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index 33e9fb3436e1..c120be73d149 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -548,40 +548,21 @@ static void xive_dec_target_count(int cpu) static int xive_find_target_in_mask(const struct cpumask *mask, unsigned int fuzz) { - int cpu, first, num, i; + int cpu, first; /* Pick up a starting point CPU in the mask based on fuzz */ - num = cpumask_weight(mask); - first = fuzz % num; - - /* Locate it */ - cpu = cpumask_first(mask); - for (i = 0; i < first && cpu < nr_cpu_ids; i++) - cpu = cpumask_next(cpu, mask); - - /* Sanity check */ - if (WARN_ON(cpu >= nr_cpu_ids)) - cpu = cpumask_first(cpu_online_mask); - - /* Remember first one to handle wrap-around */ - first = cpu; + fuzz %= cpumask_weight(mask); + first = cpumask_nth(fuzz, mask); + WARN_ON(first >= nr_cpu_ids); /* * Now go through the entire mask until we find a valid * target. 
*/ - do { - /* - * We re-check online as the fallback case passes us - * an untested affinity mask - */ + for_each_cpu_wrap(cpu, mask, first) { if (cpu_online(cpu) && xive_try_pick_target(cpu)) return cpu; - cpu = cpumask_next(cpu, mask); - /* Wrap around */ - if (cpu >= nr_cpu_ids) - cpu = cpumask_first(mask); - } while (cpu != first); + } return -1; } From 6e65886fceb23605eff952d6b1975737b4c4b154 Mon Sep 17 00:00:00 2001 From: Amit Machhiwal Date: Fri, 13 Mar 2026 22:24:26 +0530 Subject: [PATCH 18/47] selftests/powerpc: Suppress -Wmaybe-uninitialized with GCC 15 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GCC 15 reports the below false positive '-Wmaybe-uninitialized' warning in vphn_unpack_associativity() when building the powerpc selftests. # make -C tools/testing/selftests TARGETS="powerpc" [...] CC test-vphn In file included from test-vphn.c:3: In function ‘vphn_unpack_associativity’, inlined from ‘test_one’ at test-vphn.c:371:2, inlined from ‘test_vphn’ at test-vphn.c:399:9: test-vphn.c:10:33: error: ‘be_packed’ may be used uninitialized [-Werror=maybe-uninitialized] 10 | #define be16_to_cpup(x) bswap_16(*x) | ^~~~~~~~ vphn.c:42:27: note: in expansion of macro ‘be16_to_cpup’ 42 | u16 new = be16_to_cpup(field++); | ^~~~~~~~~~~~ In file included from test-vphn.c:19: vphn.c: In function ‘test_vphn’: vphn.c:27:16: note: ‘be_packed’ declared here 27 | __be64 be_packed[VPHN_REGISTER_COUNT]; | ^~~~~~~~~ cc1: all warnings being treated as errors When vphn_unpack_associativity() is called from hcall_vphn() in the kernel, the error is not seen while building vphn.c during kernel compilation. This is because the top level Makefile always includes the '-fno-strict-aliasing' flag. The issue here is that GCC 15 emits '-Wmaybe-uninitialized' due to type punning between __be64[] and __be16* when accessing the buffer via be16_to_cpup().
The underlying object is fully initialized but GCC 15 fails to track the aliasing due to the strict aliasing violation here. Please refer [1] and [2]. This results in a false positive warning which is promoted to an error under '-Werror'. This problem is not seen when the compilation is performed with GCC 13 and 14. An issue [1] has also been created on GCC bugzilla. The selftest compiles fine with '-fno-strict-aliasing'. Since this GCC flag is used to compile vphn.c in kernel too, the same flag should be used to build vphn tests when compiling vphn.c in the selftest as well. Fix this by including '-fno-strict-aliasing' during vphn.c compilation in the selftest. This keeps the build working while limiting the scope of the suppression to building vphn tests. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124427 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99768 Fixes: 58dae82843f5 ("selftests/powerpc: Add test for VPHN") Reviewed-by: Vaibhav Jain Signed-off-by: Amit Machhiwal Tested-by: Venkat Rao Bagalkote Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260313165426.43259-1-amachhiw@linux.ibm.com --- tools/testing/selftests/powerpc/vphn/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/vphn/Makefile b/tools/testing/selftests/powerpc/vphn/Makefile index 61d519a076c6..778fc396340d 100644 --- a/tools/testing/selftests/powerpc/vphn/Makefile +++ b/tools/testing/selftests/powerpc/vphn/Makefile @@ -5,7 +5,7 @@ top_srcdir = ../../../../.. include ../../lib.mk include ../flags.mk -CFLAGS += -m64 -I$(CURDIR) +CFLAGS += -m64 -I$(CURDIR) -fno-strict-aliasing $(TEST_GEN_PROGS): ../harness.c From 1ef8cf10cdbe79823fd6de0f0b93ca996045d1cc Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Mon, 19 Jan 2026 14:04:50 +0800 Subject: [PATCH 19/47] powerpc/52xx/media5200: Consolidate chained IRQ handler install/remove The driver currently sets the handler data and the chained handler in two separate steps. 
This creates a theoretical race window where an interrupt could fire after the handler is set but before the data is assigned, leading to a NULL pointer dereference. Replace the two calls with irq_set_chained_handler_and_data() to set both the handler and its data atomically under the irq_desc->lock. Signed-off-by: Chen Ni Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260119060450.889119-1-nichen@iscas.ac.cn --- arch/powerpc/platforms/52xx/media5200.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/52xx/media5200.c b/arch/powerpc/platforms/52xx/media5200.c index bc7f83cfec1d..c20ac8010f6d 100644 --- a/arch/powerpc/platforms/52xx/media5200.c +++ b/arch/powerpc/platforms/52xx/media5200.c @@ -176,8 +176,8 @@ static void __init media5200_init_irq(void) of_node_put(fpga_np); - irq_set_handler_data(cascade_virq, &media5200_irq); - irq_set_chained_handler(cascade_virq, media5200_irq_cascade); + irq_set_chained_handler_and_data(cascade_virq, media5200_irq_cascade, + &media5200_irq); return; From 7593721cd7c1315557956d5241bbb65fb33115eb Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Mon, 19 Jan 2026 14:12:32 +0800 Subject: [PATCH 20/47] powerpc/52xx/mpc52xx_gpt: consolidate chained IRQ handler install/remove The driver currently sets the handler data and the chained handler in two separate steps. This creates a theoretical race window where an interrupt could fire after the handler is set but before the data is assigned, leading to a NULL pointer dereference. Replace the two calls with irq_set_chained_handler_and_data() to set both the handler and its data atomically under the irq_desc->lock. 
Signed-off-by: Chen Ni Reviewed-by: Bartosz Golaszewski Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260119061232.889236-1-nichen@iscas.ac.cn --- arch/powerpc/platforms/52xx/mpc52xx_gpt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c index 7748b6641a3c..e8163fdee69a 100644 --- a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c +++ b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c @@ -253,8 +253,7 @@ mpc52xx_gpt_irq_setup(struct mpc52xx_gpt_priv *gpt, struct device_node *node) return; } - irq_set_handler_data(cascade_virq, gpt); - irq_set_chained_handler(cascade_virq, mpc52xx_gpt_irq_cascade); + irq_set_chained_handler_and_data(cascade_virq, mpc52xx_gpt_irq_cascade, gpt); /* If the GPT is currently disabled, then change it to be in Input * Capture mode. If the mode is non-zero, then the pin could be From 5716cacebac887b45091c658caa6d1ea25c238dc Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Mon, 19 Jan 2026 14:35:07 +0800 Subject: [PATCH 21/47] powerpc/44x/uic: Consolidate chained IRQ handler install/remove The driver currently sets the handler data and the chained handler in two separate steps. This creates a theoretical race window where an interrupt could fire after the handler is set but before the data is assigned, leading to a NULL pointer dereference. Replace the two calls with irq_set_chained_handler_and_data() to set both the handler and its data atomically under the irq_desc->lock. 
Signed-off-by: Chen Ni Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260119063507.940782-1-nichen@iscas.ac.cn --- arch/powerpc/platforms/44x/uic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/44x/uic.c b/arch/powerpc/platforms/44x/uic.c index cf4fc5263c89..3f90126a9056 100644 --- a/arch/powerpc/platforms/44x/uic.c +++ b/arch/powerpc/platforms/44x/uic.c @@ -309,8 +309,8 @@ void __init uic_init_tree(void) cascade_virq = irq_of_parse_and_map(np, 0); - irq_set_handler_data(cascade_virq, uic); - irq_set_chained_handler(cascade_virq, uic_irq_cascade); + irq_set_chained_handler_and_data(cascade_virq, + uic_irq_cascade, uic); /* FIXME: setup critical cascade?? */ } From 89f46b578694f1549426277c370488479d20e1ad Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=2E=20Neusch=C3=A4fer?= Date: Tue, 3 Mar 2026 16:09:49 +0100 Subject: [PATCH 22/47] powerpc: Move GameCube/Wii options under EMBEDDED6xx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Move CONFIG_GAMECUBE and CONFIG_WII directly below other embedded6xx boards, and above options such as TSI108_BRIDGE. This has two advantages for the GC/Wii options: - They won't be moved around by USBGECKO_UDBG appearing or disappearing - They will be indented in menuconfig/nconfig, to make it clear they are part of the embedded6xx platforms Signed-off-by: J.
Neuschäfer Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260303-gcwii-kconfig-v1-1-636b288e7270@posteo.net --- arch/powerpc/platforms/embedded6xx/Kconfig | 31 +++++++++++----------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/platforms/embedded6xx/Kconfig b/arch/powerpc/platforms/embedded6xx/Kconfig index c6adff216fe6..f406b3c7936b 100644 --- a/arch/powerpc/platforms/embedded6xx/Kconfig +++ b/arch/powerpc/platforms/embedded6xx/Kconfig @@ -51,6 +51,22 @@ config MVME5100 This option enables support for the Motorola (now Emerson) MVME5100 board. +config GAMECUBE + bool "Nintendo-GameCube" + depends on EMBEDDED6xx + select GAMECUBE_COMMON + help + Select GAMECUBE if configuring for the Nintendo GameCube. + More information at: + +config WII + bool "Nintendo-Wii" + depends on EMBEDDED6xx + select GAMECUBE_COMMON + help + Select WII if configuring for the Nintendo Wii. + More information at: + config TSI108_BRIDGE bool select FORCE_PCI @@ -77,18 +93,3 @@ config USBGECKO_UDBG If in doubt, say N here. -config GAMECUBE - bool "Nintendo-GameCube" - depends on EMBEDDED6xx - select GAMECUBE_COMMON - help - Select GAMECUBE if configuring for the Nintendo GameCube. - More information at: - -config WII - bool "Nintendo-Wii" - depends on EMBEDDED6xx - select GAMECUBE_COMMON - help - Select WII if configuring for the Nintendo Wii. - More information at: From d1620f27ed1aa3be4255513e1a213ab1805ec892 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=2E=20Neusch=C3=A4fer?= Date: Wed, 11 Mar 2026 18:35:56 +0100 Subject: [PATCH 23/47] powerpc: wii: Add unit address to /memory MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This fixes the following dtschema warning: arch/powerpc/boot/dts/wii.dtb: /: memory: False schema does not allow {'device_type': ['memory'], 'reg': [[0, 25165824], [268435456, 67108864]]} Signed-off-by: J. 
Neuschäfer Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260311-wii-schema-v1-1-1563ac4aefa8@posteo.net --- arch/powerpc/boot/dts/wii.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/boot/dts/wii.dts b/arch/powerpc/boot/dts/wii.dts index e46143c32308..e001c4c6fd79 100644 --- a/arch/powerpc/boot/dts/wii.dts +++ b/arch/powerpc/boot/dts/wii.dts @@ -29,7 +29,7 @@ chosen { bootargs = "root=/dev/mmcblk0p2 rootwait udbg-immortal"; }; - memory { + memory@0 { device_type = "memory"; reg = <0x00000000 0x01800000 /* MEM1 24MB 1T-SRAM */ 0x10000000 0x04000000>; /* MEM2 64MB GDDR3 */ From 4a03d824b3204bae7e19cdf47a85ac01027603bb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=2E=20Neusch=C3=A4fer?= Date: Wed, 11 Mar 2026 18:35:57 +0100 Subject: [PATCH 24/47] powerpc: wii: Fix GPIO key name pattern MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adjust the names of GPIO key nodes to comply with the schema in Documentation/devicetree/bindings/input/gpio-keys.yaml. Signed-off-by: J. 
Neuschäfer Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260311-wii-schema-v1-2-1563ac4aefa8@posteo.net --- arch/powerpc/boot/dts/wii.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/boot/dts/wii.dts b/arch/powerpc/boot/dts/wii.dts index e001c4c6fd79..57d428b1f787 100644 --- a/arch/powerpc/boot/dts/wii.dts +++ b/arch/powerpc/boot/dts/wii.dts @@ -256,13 +256,13 @@ drive-slot { gpio-keys { compatible = "gpio-keys"; - power { + button-power { label = "Power Button"; gpios = <&GPIO 0 GPIO_ACTIVE_HIGH>; linux,code = ; }; - eject { + button-eject { label = "Eject Button"; gpios = <&GPIO 6 GPIO_ACTIVE_HIGH>; linux,code = ; From 47a05517c6edbf5160ce1bff107c10b76aa09ef7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=2E=20Neusch=C3=A4fer?= Date: Wed, 11 Mar 2026 18:35:58 +0100 Subject: [PATCH 25/47] powerpc: wii: Fix LED name pattern MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adjust the name of the drive slot LED node to comply with the schema in Documentation/devicetree/bindings/leds/leds-gpio.yaml. arch/powerpc/boot/dts/wii.dtb: gpio-leds: 'drive-slot' does not match any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+' Signed-off-by: J. 
Neuschäfer Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260311-wii-schema-v1-3-1563ac4aefa8@posteo.net --- arch/powerpc/boot/dts/wii.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/boot/dts/wii.dts b/arch/powerpc/boot/dts/wii.dts index 57d428b1f787..473d9feb9cdb 100644 --- a/arch/powerpc/boot/dts/wii.dts +++ b/arch/powerpc/boot/dts/wii.dts @@ -246,7 +246,7 @@ gpio-leds { compatible = "gpio-leds"; /* This is the blue LED in the disk drive slot */ - drive-slot { + led-0 { label = "wii:blue:drive_slot"; gpios = <&GPIO 5 GPIO_ACTIVE_HIGH>; panic-indicator; From d1e6f90d6befb970dd6eb7fb5922c8690bf12623 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sat, 29 Nov 2025 10:36:36 -0800 Subject: [PATCH 26/47] powerpc/ps3: fix ps3.h kernel-doc warnings Fix some kernel-doc warnings in ps3.h: - add @dev to struct ps3_dma_region - don't mark a function as "struct" - add Returns: description for one function - add a short description for ps3_system_bus_set_drvdata() - correct an enum @name - move intervening "struct ps3_system_bus_device;" from between kernel-doc for ps3_dma_region_init() and the function declaration to eliminate these warnings: Warning: arch/powerpc/include/asm/ps3.h:96 struct member 'dev' not described in 'ps3_dma_region' Warning: arch/powerpc/include/asm/ps3.h:118 struct ps3_system_bus_device; error: Cannot parse struct or union! Warning: arch/powerpc/include/asm/ps3.h:166 int ps3_mmio_region_init(struct ps3_system_bus_device *dev, struct ps3_mmio_region *r, unsigned long bus_addr, unsigned long len, enum ps3_mmio_page_size page_size); error: Cannot parse struct or union! 
Warning: arch/powerpc/include/asm/ps3.h:167 No description found for return value of 'ps3_mmio_region_init' Warning: arch/powerpc/include/asm/ps3.h:407 missing initial short description on line: * ps3_system_bus_set_drvdata - Warning: arch/powerpc/include/asm/ps3.h:473 Enum value 'PS3_LPM_TB_TYPE_INTERNAL' not described in enum 'ps3_lpm_tb_type' Warning: arch/powerpc/include/asm/ps3.h:473 Excess enum value '@PS3_LPM_RIGHTS_USE_TB' description in 'ps3_lpm_tb_type' This leaves struct members in several structs and function parameters in one function still undescribed. Signed-off-by: Randy Dunlap Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20251129183636.1893634-1-rdunlap@infradead.org --- arch/powerpc/include/asm/ps3.h | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/ps3.h b/arch/powerpc/include/asm/ps3.h index 987e23a2bd28..b090ceb32a69 100644 --- a/arch/powerpc/include/asm/ps3.h +++ b/arch/powerpc/include/asm/ps3.h @@ -65,6 +65,7 @@ struct ps3_dma_region_ops; /** * struct ps3_dma_region - A per device dma state variables structure + * @dev: device structure * @did: The HV device id. * @page_size: The ioc pagesize. * @region_type: The HV region type. @@ -108,15 +109,15 @@ struct ps3_dma_region_ops { dma_addr_t bus_addr, unsigned long len); }; + +struct ps3_system_bus_device; + /** * struct ps3_dma_region_init - Helper to initialize structure variables * * Helper to properly initialize variables prior to calling * ps3_system_bus_device_register. 
*/ - -struct ps3_system_bus_device; - int ps3_dma_region_init(struct ps3_system_bus_device *dev, struct ps3_dma_region *r, enum ps3_dma_page_size page_size, enum ps3_dma_region_type region_type, void *addr, unsigned long len); @@ -156,10 +157,12 @@ struct ps3_mmio_region_ops { int (*free)(struct ps3_mmio_region *); }; /** - * struct ps3_mmio_region_init - Helper to initialize structure variables + * ps3_mmio_region_init - Helper to initialize structure variables * * Helper to properly initialize variables prior to calling * ps3_system_bus_device_register. + * + * Returns: %0 on success, %-errno on error (or BUG()) */ int ps3_mmio_region_init(struct ps3_system_bus_device *dev, @@ -405,7 +408,7 @@ static inline struct ps3_system_bus_driver * } /** - * ps3_system_bus_set_drvdata - + * ps3_system_bus_set_drvdata - set driver's private data for this device * @dev: device structure * @data: Data to set */ @@ -464,7 +467,7 @@ enum ps3_lpm_rights { * enum ps3_lpm_tb_type - Type of trace buffer lv1 should use. * * @PS3_LPM_TB_TYPE_NONE: Do not use a trace buffer. - * @PS3_LPM_RIGHTS_USE_TB: Use the lv1 internal trace buffer. Must have + * @PS3_LPM_TB_TYPE_INTERNAL: Use the lv1 internal trace buffer. Must have * rights @PS3_LPM_RIGHTS_USE_TB. */ From 7695a4e12e5506ddd17b05a5e1ef61a9bd315a14 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 24 Feb 2026 21:53:14 -0800 Subject: [PATCH 27/47] powerpc: kgdb: fix kernel-doc warnings Remove empty comment line at the beginning of a kernel-doc function block. Add a "Return:" section for this function. 
These changes prevent 2 kernel-doc warnings: Warning: ../arch/powerpc/kernel/kgdb.c:103 Cannot find identifier on line: * Warning: kgdb.c:113 No description found for return value of 'kgdb_skipexception' Fixes: 949616cf2d30 ("powerpc/kgdb: Bail out of KGDB when we've been triggered") Signed-off-by: Randy Dunlap Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260225055314.247966-1-rdunlap@infradead.org --- arch/powerpc/kernel/kgdb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c index 5081334b7bd2..861db2334db8 100644 --- a/arch/powerpc/kernel/kgdb.c +++ b/arch/powerpc/kernel/kgdb.c @@ -101,7 +101,6 @@ static int computeSignal(unsigned int tt) } /** - * * kgdb_skipexception - Bail out of KGDB when we've been triggered. * @exception: Exception vector number * @regs: Current &struct pt_regs. @@ -109,6 +108,8 @@ static int computeSignal(unsigned int tt) * On some architectures we need to skip a breakpoint exception when * it occurs after a breakpoint has been removed. 
* + * Return: return %1 if the breakpoint for this address has been removed, + * otherwise return %0 */ int kgdb_skipexception(int exception, struct pt_regs *regs) { From 26d76caac47f44b3ee4cdf080614bbee07713007 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 24 Feb 2026 21:53:28 -0800 Subject: [PATCH 28/47] powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings Fix enum and function return value kernel-doc warnings: Warning: spu.c:36 Excess enum value '%spe_type_logical' description in 'spe_type' Warning: spu.c:78 Excess enum value '%spe_ex_state_unexecutable' description in 'spe_ex_state' Warning: spu.c:78 Excess enum value '%spe_ex_state_executable' description in 'spe_ex_state' Warning: spu.c:78 Excess enum value '%spe_ex_state_executed' description in 'spe_ex_state' Warning: spu.c:190 No description found for return value of 'setup_areas' Fixes: de91a5342995 ("[POWERPC] ps3: add spu support") Fixes: b47027795a22 ("powerpc/ps3: Fix ioremap of spu shadow regs") Signed-off-by: Randy Dunlap Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260225055328.249204-1-rdunlap@infradead.org --- arch/powerpc/platforms/ps3/spu.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/platforms/ps3/spu.c b/arch/powerpc/platforms/ps3/spu.c index 10ab256b675c..e4e0b45e1b9d 100644 --- a/arch/powerpc/platforms/ps3/spu.c +++ b/arch/powerpc/platforms/ps3/spu.c @@ -26,7 +26,7 @@ /** * enum spe_type - Type of spe to create. - * @spe_type_logical: Standard logical spe. + * @SPE_TYPE_LOGICAL: Standard logical spe. * * For use with lv1_construct_logical_spe(). The current HV does not support * any types other than those listed. @@ -64,9 +64,9 @@ struct spe_shadow { /** * enum spe_ex_state - Logical spe execution state. - * @spe_ex_state_unexecutable: Uninitialized. - * @spe_ex_state_executable: Enabled, not ready. - * @spe_ex_state_executed: Ready for use. + * @SPE_EX_STATE_UNEXECUTABLE: Uninitialized. 
+ * @SPE_EX_STATE_EXECUTABLE: Enabled, not ready. + * @SPE_EX_STATE_EXECUTED: Ready for use. * * The execution state (status) of the logical spe as reported in * struct spe_shadow:spe_execution_status. @@ -185,6 +185,8 @@ static void spu_unmap(struct spu *spu) * * The current HV requires the spu shadow regs to be mapped with the * PTE page protection bits set as read-only. + * + * Returns: %0 on success or -errno on error. */ static int __init setup_areas(struct spu *spu) From 64ed1e3e728afb57ba9acb59e69de930ead847d9 Mon Sep 17 00:00:00 2001 From: Shrikanth Hegde Date: Wed, 11 Mar 2026 11:47:09 +0530 Subject: [PATCH 29/47] cpuidle: powerpc: avoid double clear when breaking snooze snooze_loop runs often on any system which has a fair bit of idle time. So it qualifies for even micro-optimizations. When breaking the snooze due to timeout, TIF_POLLING_NRFLAG is cleared twice. Clearing the bit invokes atomics. Avoid the double clear and thereby save one atomic write. dev->poll_time_limit indicates whether the loop was broken due to timeout. Use that instead of defining a new variable.
Fixes: 7ded429152e8 ("cpuidle: powerpc: no memory barrier after break from idle") Cc: stable@vger.kernel.org Reviewed-by: Mukesh Kumar Chaurasiya (IBM) Signed-off-by: Shrikanth Hegde Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260311061709.1230440-1-sshegde@linux.ibm.com --- drivers/cpuidle/cpuidle-powernv.c | 5 ++++- drivers/cpuidle/cpuidle-pseries.c | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index 9ebedd972df0..b89e7111e7b8 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -95,7 +95,10 @@ static int snooze_loop(struct cpuidle_device *dev, HMT_medium(); ppc64_runlatch_on(); - clear_thread_flag(TIF_POLLING_NRFLAG); + + /* Avoid double clear when breaking */ + if (!dev->poll_time_limit) + clear_thread_flag(TIF_POLLING_NRFLAG); local_irq_disable(); diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index f68c65f1d023..864dd5d6e627 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -64,7 +64,10 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv, } HMT_medium(); - clear_thread_flag(TIF_POLLING_NRFLAG); + + /* Avoid double clear when breaking */ + if (!dev->poll_time_limit) + clear_thread_flag(TIF_POLLING_NRFLAG); raw_local_irq_disable(); From f26ad12356a275ab303d5d3af4790ad94acc20d7 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 10 Mar 2026 16:08:07 +0100 Subject: [PATCH 30/47] powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit e65e1fc2d24b ("[PATCH] syscall class hookup for all normal targets") added generic support for AUDIT but that didn't include support for bi-arch like powerpc. Commit 4b58841149dc ("audit: Add generic compat syscall support") added generic support for bi-arch. 
Convert powerpc to that bi-arch generic audit support. With this change generated text is similar. Thomas has confirmed that the previously failing filter_exclude/test is now successful both without and with this patch, see [1] [1] https://lore.kernel.org/all/20260306115350-ef265661-6d6b-4043-9bd0-8e6b437d0d67@linutronix.de/ Link: https://github.com/linuxppc/issues/issues/412 Signed-off-by: Christophe Leroy Reviewed-by: Cédric Le Goater Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/261b1be5b8dc526b83d73e8281e682a73536ea28.1773155031.git.chleroy@kernel.org --- arch/powerpc/Kconfig | 5 +- arch/powerpc/include/asm/unistd32.h | 7 +++ arch/powerpc/kernel/Makefile | 3 - arch/powerpc/kernel/audit.c | 87 ----------------------------- arch/powerpc/kernel/compat_audit.c | 49 ---------------- 5 files changed, 8 insertions(+), 143 deletions(-) create mode 100644 arch/powerpc/include/asm/unistd32.h delete mode 100644 arch/powerpc/kernel/audit.c delete mode 100644 arch/powerpc/kernel/compat_audit.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index fd3e66e4ee0a..60b9862d530e 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -189,6 +189,7 @@ config PPC select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if PPC_RADIX_MMU select ARCH_WANTS_MODULES_DATA_IN_VMALLOC if PPC_BOOK3S_32 || PPC_8xx select ARCH_WEAK_RELEASE_ACQUIRE + select AUDIT_ARCH_COMPAT_GENERIC select BINFMT_ELF select BUILDTIME_TABLE_SORT select CLONE_BACKWARDS @@ -371,10 +372,6 @@ config GENERIC_TBSYNC bool default y if PPC32 && SMP -config AUDIT_ARCH - bool - default y - config GENERIC_BUG bool default y diff --git a/arch/powerpc/include/asm/unistd32.h b/arch/powerpc/include/asm/unistd32.h new file mode 100644 index 000000000000..07689897d206 --- /dev/null +++ b/arch/powerpc/include/asm/unistd32.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _ASM_POWERPC_UNISTD32_H_ +#define _ASM_POWERPC_UNISTD32_H_ + +#include + +#endif /* _ASM_POWERPC_UNISTD32_H_ */ diff 
--git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 2f0a2e69c607..7bf6b16b2d93 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -149,9 +149,6 @@ obj-$(CONFIG_PCI) += pci_$(BITS).o $(pci64-y) \ pci-common.o pci_of_scan.o obj-$(CONFIG_PCI_MSI) += msi.o -obj-$(CONFIG_AUDIT) += audit.o -obj64-$(CONFIG_AUDIT) += compat_audit.o - obj-y += trace/ ifneq ($(CONFIG_PPC_INDIRECT_PIO),y) diff --git a/arch/powerpc/kernel/audit.c b/arch/powerpc/kernel/audit.c deleted file mode 100644 index 92298d6a3a37..000000000000 --- a/arch/powerpc/kernel/audit.c +++ /dev/null @@ -1,87 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include -#include -#include -#include - -#include "audit_32.h" - -static unsigned dir_class[] = { -#include -~0U -}; - -static unsigned read_class[] = { -#include -~0U -}; - -static unsigned write_class[] = { -#include -~0U -}; - -static unsigned chattr_class[] = { -#include -~0U -}; - -static unsigned signal_class[] = { -#include -~0U -}; - -int audit_classify_arch(int arch) -{ -#ifdef CONFIG_PPC64 - if (arch == AUDIT_ARCH_PPC) - return 1; -#endif - return 0; -} - -int audit_classify_syscall(int abi, unsigned syscall) -{ -#ifdef CONFIG_PPC64 - if (abi == AUDIT_ARCH_PPC) - return ppc32_classify_syscall(syscall); -#endif - switch(syscall) { - case __NR_open: - return AUDITSC_OPEN; - case __NR_openat: - return AUDITSC_OPENAT; - case __NR_socketcall: - return AUDITSC_SOCKETCALL; - case __NR_execve: - return AUDITSC_EXECVE; - case __NR_openat2: - return AUDITSC_OPENAT2; - default: - return AUDITSC_NATIVE; - } -} - -static int __init audit_classes_init(void) -{ -#ifdef CONFIG_PPC64 - extern __u32 ppc32_dir_class[]; - extern __u32 ppc32_write_class[]; - extern __u32 ppc32_read_class[]; - extern __u32 ppc32_chattr_class[]; - extern __u32 ppc32_signal_class[]; - audit_register_class(AUDIT_CLASS_WRITE_32, ppc32_write_class); - audit_register_class(AUDIT_CLASS_READ_32, ppc32_read_class); - 
audit_register_class(AUDIT_CLASS_DIR_WRITE_32, ppc32_dir_class); - audit_register_class(AUDIT_CLASS_CHATTR_32, ppc32_chattr_class); - audit_register_class(AUDIT_CLASS_SIGNAL_32, ppc32_signal_class); -#endif - audit_register_class(AUDIT_CLASS_WRITE, write_class); - audit_register_class(AUDIT_CLASS_READ, read_class); - audit_register_class(AUDIT_CLASS_DIR_WRITE, dir_class); - audit_register_class(AUDIT_CLASS_CHATTR, chattr_class); - audit_register_class(AUDIT_CLASS_SIGNAL, signal_class); - return 0; -} - -__initcall(audit_classes_init); diff --git a/arch/powerpc/kernel/compat_audit.c b/arch/powerpc/kernel/compat_audit.c deleted file mode 100644 index 57b38c592b9f..000000000000 --- a/arch/powerpc/kernel/compat_audit.c +++ /dev/null @@ -1,49 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#undef __powerpc64__ -#include -#include - -#include "audit_32.h" - -unsigned ppc32_dir_class[] = { -#include -~0U -}; - -unsigned ppc32_chattr_class[] = { -#include -~0U -}; - -unsigned ppc32_write_class[] = { -#include -~0U -}; - -unsigned ppc32_read_class[] = { -#include -~0U -}; - -unsigned ppc32_signal_class[] = { -#include -~0U -}; - -int ppc32_classify_syscall(unsigned syscall) -{ - switch(syscall) { - case __NR_open: - return AUDITSC_OPEN; - case __NR_openat: - return AUDITSC_OPENAT; - case __NR_socketcall: - return AUDITSC_SOCKETCALL; - case __NR_execve: - return AUDITSC_EXECVE; - case __NR_openat2: - return AUDITSC_OPENAT2; - default: - return AUDITSC_COMPAT; - } -} From 40a1b9d044c7dbbc2976f0432e32dc57d4896b00 Mon Sep 17 00:00:00 2001 From: "Christophe Leroy (CS GROUP)" Date: Tue, 10 Mar 2026 10:59:58 +0100 Subject: [PATCH 31/47] powerpc/futex: Use masked user access Commit 861574d51bbd ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). 
Use masked user access in arch_futex_atomic_op_inuser(). Signed-off-by: Christophe Leroy (CS GROUP) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/e29f6a5c14e5938df68d94bfac6b2f762fb922aa.1773136636.git.chleroy@kernel.org --- arch/powerpc/include/asm/futex.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h index b3001f8b2c1e..8cf3b2e97e17 100644 --- a/arch/powerpc/include/asm/futex.h +++ b/arch/powerpc/include/asm/futex.h @@ -33,8 +33,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, { int oldval = 0, ret; - if (!user_access_begin(uaddr, sizeof(u32))) - return -EFAULT; + uaddr = masked_user_access_begin(uaddr); switch (op) { case FUTEX_OP_SET: @@ -69,8 +68,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, int ret = 0; u32 prev; - if (!user_access_begin(uaddr, sizeof(u32))) - return -EFAULT; + uaddr = masked_user_access_begin(uaddr); __asm__ __volatile__ ( PPC_ATOMIC_ENTRY_BARRIER From 679fa9c756c7d6fcb6ae611f695d286c53dca076 Mon Sep 17 00:00:00 2001 From: "Christophe Leroy (CS GROUP)" Date: Tue, 10 Mar 2026 11:00:53 +0100 Subject: [PATCH 32/47] powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access Commit 861574d51bbd ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert gpr32_set_common_user() to scoped user access to benefit from masked user access. Scoped user access also makes the code simpler. Also change the label from Efault to efault to avoid checkpatch complaining about CamelCase.
Signed-off-by: Christophe Leroy (CS GROUP) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/2409643daab08b4bc07004c2b88f42085d1ef45a.1773136838.git.chleroy@kernel.org --- arch/powerpc/kernel/ptrace/ptrace-view.c | 60 ++++++++++++------------ 1 file changed, 29 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c index 0310f9097e39..eb5f2091bb59 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-view.c +++ b/arch/powerpc/kernel/ptrace/ptrace-view.c @@ -758,39 +758,38 @@ static int gpr32_set_common_user(struct task_struct *target, const void *kbuf = NULL; compat_ulong_t reg; - if (!user_read_access_begin(u, count)) - return -EFAULT; + scoped_user_read_access_size(ubuf, count, efault) { + u = ubuf; + pos /= sizeof(reg); + count /= sizeof(reg); - pos /= sizeof(reg); - count /= sizeof(reg); + for (; count > 0 && pos < PT_MSR; --count) { + unsafe_get_user(reg, u++, efault); + regs[pos++] = reg; + } - for (; count > 0 && pos < PT_MSR; --count) { - unsafe_get_user(reg, u++, Efault); - regs[pos++] = reg; + if (count > 0 && pos == PT_MSR) { + unsafe_get_user(reg, u++, efault); + set_user_msr(target, reg); + ++pos; + --count; + } + + for (; count > 0 && pos <= PT_MAX_PUT_REG; --count) { + unsafe_get_user(reg, u++, efault); + regs[pos++] = reg; + } + for (; count > 0 && pos < PT_TRAP; --count, ++pos) + unsafe_get_user(reg, u++, efault); + + if (count > 0 && pos == PT_TRAP) { + unsafe_get_user(reg, u++, efault); + set_user_trap(target, reg); + ++pos; + --count; + } } - if (count > 0 && pos == PT_MSR) { - unsafe_get_user(reg, u++, Efault); - set_user_msr(target, reg); - ++pos; - --count; - } - - for (; count > 0 && pos <= PT_MAX_PUT_REG; --count) { - unsafe_get_user(reg, u++, Efault); - regs[pos++] = reg; - } - for (; count > 0 && pos < PT_TRAP; --count, ++pos) - unsafe_get_user(reg, u++, Efault); - - if (count > 0 && pos == PT_TRAP) { - unsafe_get_user(reg, u++, Efault); - 
set_user_trap(target, reg); - ++pos; - --count; - } - user_read_access_end(); - ubuf = u; pos *= sizeof(reg); count *= sizeof(reg); @@ -798,8 +797,7 @@ static int gpr32_set_common_user(struct task_struct *target, (PT_TRAP + 1) * sizeof(reg), -1); return 0; -Efault: - user_read_access_end(); +efault: return -EFAULT; } From bf53ede0038fe2a7b02cad85f337aba43ced572a Mon Sep 17 00:00:00 2001 From: "Christophe Leroy (CS GROUP)" Date: Tue, 10 Mar 2026 11:01:31 +0100 Subject: [PATCH 33/47] powerpc/align: Convert emulate_spe() to scoped user access Commit 861574d51bbd ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert emulate_spe() to scoped user access to benefit from masked user access. Scoped user access also makes the code simpler. Signed-off-by: Christophe Leroy (CS GROUP) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/4ff83cb240da4e2d0c34e2bce4b8b6ef19a33777.1773136880.git.chleroy@kernel.org --- arch/powerpc/kernel/align.c | 75 ++++++++++++++++--------------------- 1 file changed, 33 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 3e37ece06739..61409431138f 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -165,25 +165,23 @@ static int emulate_spe(struct pt_regs *regs, unsigned int reg, temp.ll = data.ll = 0; p = addr; - if (!user_read_access_begin(addr, nb)) - return -EFAULT; - - switch (nb) { - case 8: - unsafe_get_user(temp.v[0], p++, Efault_read); - unsafe_get_user(temp.v[1], p++, Efault_read); - unsafe_get_user(temp.v[2], p++, Efault_read); - unsafe_get_user(temp.v[3], p++, Efault_read); - fallthrough; - case 4: - unsafe_get_user(temp.v[4], p++, Efault_read); - unsafe_get_user(temp.v[5], p++, Efault_read); - fallthrough; - case 2: - unsafe_get_user(temp.v[6], p++, Efault_read); - unsafe_get_user(temp.v[7], p++, Efault_read); + scoped_user_read_access_size(addr, nb, efault) { + switch
(nb) { + case 8: + unsafe_get_user(temp.v[0], p++, efault); + unsafe_get_user(temp.v[1], p++, efault); + unsafe_get_user(temp.v[2], p++, efault); + unsafe_get_user(temp.v[3], p++, efault); + fallthrough; + case 4: + unsafe_get_user(temp.v[4], p++, efault); + unsafe_get_user(temp.v[5], p++, efault); + fallthrough; + case 2: + unsafe_get_user(temp.v[6], p++, efault); + unsafe_get_user(temp.v[7], p++, efault); + } } - user_read_access_end(); switch (instr) { case EVLDD: @@ -252,25 +250,23 @@ static int emulate_spe(struct pt_regs *regs, unsigned int reg, if (flags & ST) { p = addr; - if (!user_write_access_begin(addr, nb)) - return -EFAULT; - - switch (nb) { - case 8: - unsafe_put_user(data.v[0], p++, Efault_write); - unsafe_put_user(data.v[1], p++, Efault_write); - unsafe_put_user(data.v[2], p++, Efault_write); - unsafe_put_user(data.v[3], p++, Efault_write); - fallthrough; - case 4: - unsafe_put_user(data.v[4], p++, Efault_write); - unsafe_put_user(data.v[5], p++, Efault_write); - fallthrough; - case 2: - unsafe_put_user(data.v[6], p++, Efault_write); - unsafe_put_user(data.v[7], p++, Efault_write); + scoped_user_write_access_size(addr, nb, efault) { + switch (nb) { + case 8: + unsafe_put_user(data.v[0], p++, efault); + unsafe_put_user(data.v[1], p++, efault); + unsafe_put_user(data.v[2], p++, efault); + unsafe_put_user(data.v[3], p++, efault); + fallthrough; + case 4: + unsafe_put_user(data.v[4], p++, efault); + unsafe_put_user(data.v[5], p++, efault); + fallthrough; + case 2: + unsafe_put_user(data.v[6], p++, efault); + unsafe_put_user(data.v[7], p++, efault); + } } - user_write_access_end(); } else { *evr = data.w[0]; regs->gpr[reg] = data.w[1]; @@ -278,12 +274,7 @@ static int emulate_spe(struct pt_regs *regs, unsigned int reg, return 1; -Efault_read: - user_read_access_end(); - return -EFAULT; - -Efault_write: - user_write_access_end(); +efault: return -EFAULT; } #endif /* CONFIG_SPE */ From cd54714e938d4951abc671e562d10c2308613901 Mon Sep 17 00:00:00 2001 From: 
"Christophe Leroy (CS GROUP)" Date: Tue, 10 Mar 2026 11:03:41 +0100 Subject: [PATCH 34/47] powerpc/sstep: Convert to scoped user access Commit 861574d51bbd ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert single step emulation functions to scoped user access to benefit from masked user access. Scoped user access also makes the code simpler. Signed-off-by: Christophe Leroy (CS GROUP) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/8f2d85bddacff18046096dc255fd94f6a0f8b230.1773137010.git.chleroy@kernel.org --- arch/powerpc/lib/sstep.c | 77 +++++++++++++++++----------------- 1 file changed, 33 insertions(+), 44 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index ac3ee19531d8..f0d6aa657c1a 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -329,20 +329,17 @@ __read_mem_aligned(unsigned long *dest, unsigned long ea, int nb, struct pt_regs static nokprobe_inline int read_mem_aligned(unsigned long *dest, unsigned long ea, int nb, struct pt_regs *regs) { - int err; + void __user *uea = (void __user *)ea; if (is_kernel_addr(ea)) return __read_mem_aligned(dest, ea, nb, regs); - if (user_read_access_begin((void __user *)ea, nb)) { - err = __read_mem_aligned(dest, ea, nb, regs); - user_read_access_end(); - } else { - err = -EFAULT; - regs->dar = ea; - } + scoped_user_read_access_size(uea, nb, efault) + return __read_mem_aligned(dest, (unsigned long)uea, nb, regs); - return err; +efault: + regs->dar = ea; + return -EFAULT; } /* @@ -385,20 +382,17 @@ static __always_inline int __copy_mem_in(u8 *dest, unsigned long ea, int nb, str static nokprobe_inline int copy_mem_in(u8 *dest, unsigned long ea, int nb, struct pt_regs *regs) { - int err; + void __user *uea = (void __user *)ea; if (is_kernel_addr(ea)) return __copy_mem_in(dest, ea, nb, regs); - if (user_read_access_begin((void __user *)ea, nb)) { - err = __copy_mem_in(dest, ea,
nb, regs); - user_read_access_end(); - } else { - err = -EFAULT; - regs->dar = ea; - } + scoped_user_read_access_size(uea, nb, efault) + return __copy_mem_in(dest, (unsigned long)uea, nb, regs); - return err; +efault: + regs->dar = ea; + return -EFAULT; } static nokprobe_inline int read_mem_unaligned(unsigned long *dest, @@ -465,20 +459,17 @@ __write_mem_aligned(unsigned long val, unsigned long ea, int nb, struct pt_regs static nokprobe_inline int write_mem_aligned(unsigned long val, unsigned long ea, int nb, struct pt_regs *regs) { - int err; + void __user *uea = (void __user *)ea; if (is_kernel_addr(ea)) return __write_mem_aligned(val, ea, nb, regs); - if (user_write_access_begin((void __user *)ea, nb)) { - err = __write_mem_aligned(val, ea, nb, regs); - user_write_access_end(); - } else { - err = -EFAULT; - regs->dar = ea; - } + scoped_user_write_access_size(uea, nb, efault) + return __write_mem_aligned(val, (unsigned long)uea, nb, regs); - return err; +efault: + regs->dar = ea; + return -EFAULT; } /* @@ -521,20 +512,17 @@ static __always_inline int __copy_mem_out(u8 *dest, unsigned long ea, int nb, st static nokprobe_inline int copy_mem_out(u8 *dest, unsigned long ea, int nb, struct pt_regs *regs) { - int err; + void __user *uea = (void __user *)ea; if (is_kernel_addr(ea)) return __copy_mem_out(dest, ea, nb, regs); - if (user_write_access_begin((void __user *)ea, nb)) { - err = __copy_mem_out(dest, ea, nb, regs); - user_write_access_end(); - } else { - err = -EFAULT; - regs->dar = ea; - } + scoped_user_write_access_size(uea, nb, efault) + return __copy_mem_out(dest, (unsigned long)uea, nb, regs); - return err; +efault: + regs->dar = ea; + return -EFAULT; } static nokprobe_inline int write_mem_unaligned(unsigned long val, @@ -1065,6 +1053,7 @@ static __always_inline int __emulate_dcbz(unsigned long ea) int emulate_dcbz(unsigned long ea, struct pt_regs *regs) { + void __user *uea = (void __user *)ea; int err; unsigned long size = l1_dcache_bytes(); @@ -1073,20 
+1062,20 @@ int emulate_dcbz(unsigned long ea, struct pt_regs *regs) if (!address_ok(regs, ea, size)) return -EFAULT; - if (is_kernel_addr(ea)) { + if (is_kernel_addr(ea)) err = __emulate_dcbz(ea); - } else if (user_write_access_begin((void __user *)ea, size)) { - err = __emulate_dcbz(ea); - user_write_access_end(); - } else { - err = -EFAULT; - } + else + scoped_user_write_access_size(uea, size, efault) + err = __emulate_dcbz((unsigned long)uea); if (err) regs->dar = ea; - return err; + +efault: + regs->dar = ea; + return -EFAULT; } NOKPROBE_SYMBOL(emulate_dcbz); From cae734710dd156e2fbb4d66cdb22bbd5080beb52 Mon Sep 17 00:00:00 2001 From: "Christophe Leroy (CS GROUP)" Date: Tue, 10 Mar 2026 11:05:54 +0100 Subject: [PATCH 35/47] powerpc/net: Inline checksum wrappers and convert to scoped user access Commit 861574d51bbd ("powerpc/uaccess: Implement masked user access") provides optimised user access by avoiding the cost of access_ok(). Convert csum_and_copy_to_user() and csum_and_copy_from_user() to scoped user access to benefit from masked user access. csum_and_copy_to_user() and csum_and_copy_from_user() are only called respectively by csum_and_copy_to_iter() and csum_and_copy_from_iter_full() and they are only called twice. Those functions used to be large but they were first reduced by commit c693cc4676a0 ("saner calling conventions for csum_and_copy_..._user()") then commit 70d65cd555c5 ("ppc: propagate the calling conventions change down to csum_partial_copy_generic()"). With the additional size reduction provided by conversion to scoped user access they are not worth being kept out of line. 
$ ./scripts/bloat-o-meter vmlinux.0 vmlinux.1
add/remove: 0/2 grow/shrink: 2/0 up/down: 136/-176 (-40)
Function                       old   new   delta
csum_and_copy_to_iter          2416  2488  +72
csum_and_copy_from_iter_full   2272  2336  +64
csum_and_copy_to_user          88    -     -88
csum_and_copy_from_user        88    -     -88
Total: Before=11514471, After=11514431, chg -0.00%
Signed-off-by: Christophe Leroy (CS GROUP) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/f44e1b2760dbed35b237040001a91bc8304b726b.1773137098.git.chleroy@kernel.org --- arch/powerpc/include/asm/checksum.h | 22 +++++++++++++--- arch/powerpc/lib/Makefile | 3 +-- arch/powerpc/lib/checksum_wrappers.c | 39 ---------------------------- 3 files changed, 19 insertions(+), 45 deletions(-) delete mode 100644 arch/powerpc/lib/checksum_wrappers.c diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h index 4b573a3b7e17..52921ea2494a 100644 --- a/arch/powerpc/include/asm/checksum.h +++ b/arch/powerpc/include/asm/checksum.h @@ -8,6 +8,7 @@ #include #include +#include /* * Computes the checksum of a memory block at src, length len, * and adds in "sum" (32-bit), while copying the block to dst.
@@ -21,11 +22,24 @@ extern __wsum csum_partial_copy_generic(const void *src, void *dst, int len); #define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER -extern __wsum csum_and_copy_from_user(const void __user *src, void *dst, - int len); +static inline __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len) +{ + scoped_user_read_access_size(src, len, efault) + return csum_partial_copy_generic((void __force *)src, dst, len); + +efault: + return 0; +} + #define HAVE_CSUM_COPY_USER -extern __wsum csum_and_copy_to_user(const void *src, void __user *dst, - int len); +static inline __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len) +{ + scoped_user_write_access_size(dst, len, efault) + return csum_partial_copy_generic(src, (void __force *)dst, len); + +efault: + return 0; +} #define _HAVE_ARCH_CSUM_AND_COPY #define csum_partial_copy_nocheck(src, dst, len) \ diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index f14ecab674a3..bcdf387f3998 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -62,8 +62,7 @@ obj64-$(CONFIG_ALTIVEC) += vmx-helper.o obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o \ test_emulate_step_exec_instr.o -obj-y += checksum_$(BITS).o checksum_wrappers.o \ - string_$(BITS).o +obj-y += checksum_$(BITS).o string_$(BITS).o obj-y += sstep.o obj-$(CONFIG_PPC_FPU) += ldstfp.o diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c deleted file mode 100644 index 1a14c8780278..000000000000 --- a/arch/powerpc/lib/checksum_wrappers.c +++ /dev/null @@ -1,39 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * - * Copyright (C) IBM Corporation, 2010 - * - * Author: Anton Blanchard - */ -#include -#include -#include -#include -#include - -__wsum csum_and_copy_from_user(const void __user *src, void *dst, - int len) -{ - __wsum csum; - - if (unlikely(!user_read_access_begin(src, len))) - return 0; - - csum = csum_partial_copy_generic((void __force 
*)src, dst, len); - - user_read_access_end(); - return csum; -} - -__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len) -{ - __wsum csum; - - if (unlikely(!user_write_access_begin(dst, len))) - return 0; - - csum = csum_partial_copy_generic(src, (void __force *)dst, len); - - user_write_access_end(); - return csum; -} From f73338d089deedb4f4f1e49751c30b8b7f595ecd Mon Sep 17 00:00:00 2001 From: "Yury Norov (NVIDIA)" Date: Thu, 14 Aug 2025 15:09:35 -0400 Subject: [PATCH 36/47] powerpc: pci-ioda: use bitmap_alloc() in pnv_ioda_pick_m64_pe() Use the dedicated bitmap_alloc() in pnv_ioda_pick_m64_pe() and drop some housekeeping code. Because pe_alloc is local, annotate it with __free() and get rid of the explicit kfree() calls. Suggested-by: Jiri Slaby Signed-off-by: Yury Norov (NVIDIA) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20250814190936.381346-2-yury.norov@gmail.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 885392f4cd94..83f75d88e53b 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -292,18 +292,16 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus, static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all) { + unsigned long *pe_alloc __free(bitmap) = NULL; struct pnv_phb *phb = pci_bus_to_pnvhb(bus); struct pnv_ioda_pe *master_pe, *pe; - unsigned long size, *pe_alloc; int i; /* Root bus shouldn't use M64 */ if (pci_is_root_bus(bus)) return NULL; - /* Allocate bitmap */ - size = ALIGN(phb->ioda.total_pe_num / 8, sizeof(unsigned long)); - pe_alloc = kzalloc(size, GFP_KERNEL); + pe_alloc = bitmap_zalloc(phb->ioda.total_pe_num, GFP_KERNEL); if (!pe_alloc) { pr_warn("%s: Out of memory !\n", __func__); @@ -319,7 +317,6 @@ static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct 
pci_bus *bus, bool all) * pick M64 dependent PE#. */ if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) { - kfree(pe_alloc); return NULL; } @@ -345,7 +342,6 @@ static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all) } } - kfree(pe_alloc); return master_pe; } From bd77a34e9a619ee92c03cbb227ca86d814aa6601 Mon Sep 17 00:00:00 2001 From: "Yury Norov (NVIDIA)" Date: Thu, 14 Aug 2025 15:09:36 -0400 Subject: [PATCH 37/47] powerpc: pci-ioda: Optimize pnv_ioda_pick_m64_pe() bitmap_empty() in pnv_ioda_pick_m64_pe() is O(N) and useless because the following find_next_bit() does the same work. Drop it, and while there replace a while() loop with the dedicated for_each_set_bit(). Reviewed-by: Andrew Donnellan Signed-off-by: Yury Norov (NVIDIA) Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20250814190936.381346-3-yury.norov@gmail.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 83f75d88e53b..32ecbc46e74b 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -295,7 +295,7 @@ static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all) unsigned long *pe_alloc __free(bitmap) = NULL; struct pnv_phb *phb = pci_bus_to_pnvhb(bus); struct pnv_ioda_pe *master_pe, *pe; - int i; + unsigned int i; /* Root bus shouldn't use M64 */ if (pci_is_root_bus(bus)) @@ -312,22 +312,15 @@ static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all) pnv_ioda_reserve_m64_pe(bus, pe_alloc, all); /* - * the current bus might not own M64 window and that's all + * Figure out the master PE and put all slave PEs to master + * PE's list to form compound PE. + * + * The current bus might not own M64 window and that's all * contributed by its child buses. For the case, we needn't * pick M64 dependent PE#. 
*/ - if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) { - return NULL; - } - - /* - * Figure out the master PE and put all slave PEs to master - * PE's list to form compound PE. - */ master_pe = NULL; - i = -1; - while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) < - phb->ioda.total_pe_num) { + for_each_set_bit(i, pe_alloc, phb->ioda.total_pe_num) { pe = &phb->ioda.pe_array[i]; phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number; From 156d985123b6d6e5189cfd0286b93c12167ae798 Mon Sep 17 00:00:00 2001 From: Abhishek Dubey Date: Wed, 1 Apr 2026 06:32:14 -0400 Subject: [PATCH 38/47] powerpc64/bpf: Implement JIT support for private stack Provision the private stack as a per-CPU allocation during bpf_int_jit_compile(). Align the stack to 16 bytes and place guard regions at both ends to detect runtime stack overflow and underflow. Round the private stack size up to the nearest 16-byte boundary. Make each guard region 16 bytes to preserve the required overall 16-byte alignment. When the private stack is in use, skip BPF stack size accounting on the kernel stack. There is no dedicated stack pointer on powerpc; stack references during JIT go through the frame pointer, which is calculated as: BPF frame pointer = Priv stack allocation start address + Overflow guard + Actual stack size defined by verifier

Memory layout:

High Addr            +--------------------------------------------------+
                     |                                                  |
                     | 16 bytes Underflow guard (0xEB9F12345678eb9fULL) |
                     |                                                  |
BPF FP ->            +--------------------------------------------------+
                     |                                                  |
                     | Private stack - determined by verifier           |
                     | 16-bytes aligned                                 |
                     |                                                  |
                     +--------------------------------------------------+
                     |                                                  |
Lower Addr           | 16 byte Overflow guard (0xEB9F12345678eb9fULL)   |
                     |                                                  |
Priv stack alloc ->  +--------------------------------------------------+
start

Update BPF_REG_FP to point to the calculated offset within the allocated private stack buffer. BPF stack usage now references the allocated private stack.
Signed-off-by: Abhishek Dubey Tested-by: Venkat Rao Bagalkote Acked-by: Hari Bathini Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260401103215.104438-1-adubey@linux.ibm.com --- arch/powerpc/net/bpf_jit.h | 6 ++ arch/powerpc/net/bpf_jit_comp.c | 97 +++++++++++++++++++++++++++++-- arch/powerpc/net/bpf_jit_comp64.c | 31 +++++++++- 3 files changed, 126 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index 7354e1d72f79..a232f3fb73be 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -178,8 +178,14 @@ struct codegen_context { bool is_subprog; bool exception_boundary; bool exception_cb; + void __percpu *priv_sp; + unsigned int priv_stack_size; }; +/* Memory size & magic-value to detect private stack overflow/underflow */ +#define PRIV_STACK_GUARD_SZ 16 +#define PRIV_STACK_GUARD_VAL 0xEB9F12345678eb9fULL + #define bpf_to_ppc(r) (ctx->b2p[r]) #ifdef CONFIG_PPC32 diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index a62a9a92b7b5..2018260f56c6 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -129,25 +129,60 @@ bool bpf_jit_needs_zext(void) return true; } +static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size) +{ + int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3; + u64 *stack_ptr; + + for_each_possible_cpu(cpu) { + stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu); + stack_ptr[0] = PRIV_STACK_GUARD_VAL; + stack_ptr[1] = PRIV_STACK_GUARD_VAL; + stack_ptr[underflow_idx] = PRIV_STACK_GUARD_VAL; + stack_ptr[underflow_idx + 1] = PRIV_STACK_GUARD_VAL; + } +} + +static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size, + struct bpf_prog *fp) +{ + int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3; + u64 *stack_ptr; + + for_each_possible_cpu(cpu) { + stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu); + if (stack_ptr[0] != PRIV_STACK_GUARD_VAL || + 
stack_ptr[1] != PRIV_STACK_GUARD_VAL || + stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL || + stack_ptr[underflow_idx + 1] != PRIV_STACK_GUARD_VAL) { + pr_err("BPF private stack overflow/underflow detected for prog %s\n", + bpf_jit_get_prog_name(fp)); + break; + } + } +} + struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) { u32 proglen; u32 alloclen; u8 *image = NULL; - u32 *code_base; - u32 *addrs; - struct powerpc_jit_data *jit_data; + u32 *code_base = NULL; + u32 *addrs = NULL; + struct powerpc_jit_data *jit_data = NULL; struct codegen_context cgctx; int pass; int flen; + int priv_stack_alloc_size; + void __percpu *priv_stack_ptr = NULL; struct bpf_binary_header *fhdr = NULL; struct bpf_binary_header *hdr = NULL; struct bpf_prog *org_fp = fp; - struct bpf_prog *tmp_fp; + struct bpf_prog *tmp_fp = NULL; bool bpf_blinded = false; bool extra_pass = false; u8 *fimage = NULL; - u32 *fcode_base; + u32 *fcode_base = NULL; u32 extable_len; u32 fixup_len; @@ -173,6 +208,26 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) fp->aux->jit_data = jit_data; } + priv_stack_ptr = fp->aux->priv_stack_ptr; + if (!priv_stack_ptr && fp->aux->jits_use_priv_stack) { + /* + * Allocate private stack of size equivalent to + * verifier-calculated stack size plus two memory + * guard regions to detect private stack overflow + * and underflow. 
+ */ + priv_stack_alloc_size = round_up(fp->aux->stack_depth, 16) + + 2 * PRIV_STACK_GUARD_SZ; + priv_stack_ptr = __alloc_percpu_gfp(priv_stack_alloc_size, 16, GFP_KERNEL); + if (!priv_stack_ptr) { + fp = org_fp; + goto out_priv_stack; + } + + priv_stack_init_guard(priv_stack_ptr, priv_stack_alloc_size); + fp->aux->priv_stack_ptr = priv_stack_ptr; + } + flen = fp->len; addrs = jit_data->addrs; if (addrs) { @@ -209,6 +264,19 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) cgctx.is_subprog = bpf_is_subprog(fp); cgctx.exception_boundary = fp->aux->exception_boundary; cgctx.exception_cb = fp->aux->exception_cb; + cgctx.priv_sp = priv_stack_ptr; + cgctx.priv_stack_size = 0; + if (priv_stack_ptr) { + /* + * priv_stack_size required for setting bpf FP inside + * percpu allocation. + * stack_size is marked 0 to prevent allocation on + * general stack and offset calculation don't go for + * a toss in bpf_jit_stack_offsetof() & bpf_jit_stack_local() + */ + cgctx.priv_stack_size = cgctx.stack_size; + cgctx.stack_size = 0; + } /* Scouting faux-generate pass 0 */ if (bpf_jit_build_body(fp, NULL, NULL, &cgctx, addrs, 0, false)) { @@ -306,6 +374,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) } bpf_prog_fill_jited_linfo(fp, addrs); out_addrs: + if (!image && priv_stack_ptr) { + fp->aux->priv_stack_ptr = NULL; + free_percpu(priv_stack_ptr); + } +out_priv_stack: kfree(addrs); kfree(jit_data); fp->aux->jit_data = NULL; @@ -419,6 +492,8 @@ void bpf_jit_free(struct bpf_prog *fp) if (fp->jited) { struct powerpc_jit_data *jit_data = fp->aux->jit_data; struct bpf_binary_header *hdr; + void __percpu *priv_stack_ptr; + int priv_stack_alloc_size; /* * If we fail the final pass of JIT (from jit_subprogs), @@ -432,6 +507,13 @@ void bpf_jit_free(struct bpf_prog *fp) } hdr = bpf_jit_binary_pack_hdr(fp); bpf_jit_binary_pack_free(hdr, NULL); + priv_stack_ptr = fp->aux->priv_stack_ptr; + if (priv_stack_ptr) { + priv_stack_alloc_size = round_up(fp->aux->stack_depth, 16) 
+ + 2 * PRIV_STACK_GUARD_SZ; + priv_stack_check_guard(priv_stack_ptr, priv_stack_alloc_size, fp); + free_percpu(priv_stack_ptr); + } WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp)); } @@ -453,6 +535,11 @@ bool bpf_jit_supports_kfunc_call(void) return IS_ENABLED(CONFIG_PPC64); } +bool bpf_jit_supports_private_stack(void) +{ + return IS_ENABLED(CONFIG_PPC64); +} + bool bpf_jit_supports_arena(void) { return IS_ENABLED(CONFIG_PPC64); diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index c5e26d231cd5..6670d8c69ade 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -183,6 +183,24 @@ void bpf_jit_realloc_regs(struct codegen_context *ctx) { } +static void emit_fp_priv_stack(u32 *image, struct codegen_context *ctx) +{ + PPC_LI64(bpf_to_ppc(BPF_REG_FP), (__force long)ctx->priv_sp); + /* + * Load base percpu pointer of private stack allocation. + * Runtime per-cpu address = (base + data_offset) + (guard + stack_size) + */ +#ifdef CONFIG_SMP + /* Load percpu data offset */ + EMIT(PPC_RAW_LD(bpf_to_ppc(TMP_REG_1), _R13, + offsetof(struct paca_struct, data_offset))); + EMIT(PPC_RAW_ADD(bpf_to_ppc(BPF_REG_FP), + bpf_to_ppc(TMP_REG_1), bpf_to_ppc(BPF_REG_FP))); +#endif + EMIT(PPC_RAW_ADDI(bpf_to_ppc(BPF_REG_FP), bpf_to_ppc(BPF_REG_FP), + PRIV_STACK_GUARD_SZ + round_up(ctx->priv_stack_size, 16))); +} + /* * For exception boundary & exception_cb progs: * return increased size to accommodate additional NVRs. @@ -307,9 +325,16 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx) * Exception_cb not restricted from using stack area or arena. 
* Setup frame pointer to point to the bpf stack area */ - if (bpf_is_seen_register(ctx, bpf_to_ppc(BPF_REG_FP))) - EMIT(PPC_RAW_ADDI(bpf_to_ppc(BPF_REG_FP), _R1, - STACK_FRAME_MIN_SIZE + ctx->stack_size)); + if (bpf_is_seen_register(ctx, bpf_to_ppc(BPF_REG_FP))) { + if (ctx->priv_sp) { + /* Set up fp in private stack */ + emit_fp_priv_stack(image, ctx); + } else { + /* Setup frame pointer to point to the bpf stack area */ + EMIT(PPC_RAW_ADDI(bpf_to_ppc(BPF_REG_FP), _R1, + STACK_FRAME_MIN_SIZE + ctx->stack_size)); + } + } if (ctx->arena_vm_start) PPC_LI64(bpf_to_ppc(ARENA_VM_START), ctx->arena_vm_start); From e640bcd1bf83dbdaa967b20cd98a782d52ec89cf Mon Sep 17 00:00:00 2001 From: Abhishek Dubey Date: Wed, 1 Apr 2026 06:32:15 -0400 Subject: [PATCH 39/47] selftests/bpf: Enable private stack tests for powerpc64 With support of private stack, relevant tests must pass on powerpc64. #./test_progs -t struct_ops_private_stack #434/1 struct_ops_private_stack/private_stack:OK #434/2 struct_ops_private_stack/private_stack_fail:OK #434/3 struct_ops_private_stack/private_stack_recur:OK #434 struct_ops_private_stack:OK Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Abhishek Dubey Tested-by: Venkat Rao Bagalkote Reviewed-by: Hari Bathini Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260401103215.104438-2-adubey@linux.ibm.com --- .../bpf/prog_tests/struct_ops_private_stack.c | 30 ++++++++----------- .../bpf/progs/struct_ops_private_stack.c | 6 ---- .../bpf/progs/struct_ops_private_stack_fail.c | 6 ---- .../progs/struct_ops_private_stack_recur.c | 6 ---- 4 files changed, 13 insertions(+), 35 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c index d42123a0fb16..98db9bafa44b 100644 --- a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c +++ b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c @@ -5,6 +5,7 @@ 
#include "struct_ops_private_stack_fail.skel.h" #include "struct_ops_private_stack_recur.skel.h" +#if defined(__x86_64__) || defined(__aarch64__) || defined(__powerpc64__) static void test_private_stack(void) { struct struct_ops_private_stack *skel; @@ -15,11 +16,6 @@ static void test_private_stack(void) if (!ASSERT_OK_PTR(skel, "struct_ops_private_stack__open")) return; - if (skel->data->skip) { - test__skip(); - goto cleanup; - } - err = struct_ops_private_stack__load(skel); if (!ASSERT_OK(err, "struct_ops_private_stack__load")) goto cleanup; @@ -48,15 +44,9 @@ static void test_private_stack_fail(void) if (!ASSERT_OK_PTR(skel, "struct_ops_private_stack_fail__open")) return; - if (skel->data->skip) { - test__skip(); - goto cleanup; - } - err = struct_ops_private_stack_fail__load(skel); ASSERT_ERR(err, "struct_ops_private_stack_fail__load"); -cleanup: struct_ops_private_stack_fail__destroy(skel); } @@ -70,11 +60,6 @@ static void test_private_stack_recur(void) if (!ASSERT_OK_PTR(skel, "struct_ops_private_stack_recur__open")) return; - if (skel->data->skip) { - test__skip(); - goto cleanup; - } - err = struct_ops_private_stack_recur__load(skel); if (!ASSERT_OK(err, "struct_ops_private_stack_recur__load")) goto cleanup; @@ -93,7 +78,7 @@ static void test_private_stack_recur(void) struct_ops_private_stack_recur__destroy(skel); } -void test_struct_ops_private_stack(void) +static void __test_struct_ops_private_stack(void) { if (test__start_subtest("private_stack")) test_private_stack(); @@ -102,3 +87,14 @@ void test_struct_ops_private_stack(void) if (test__start_subtest("private_stack_recur")) test_private_stack_recur(); } +#else +static void __test_struct_ops_private_stack(void) +{ + test__skip(); +} +#endif + +void test_struct_ops_private_stack(void) +{ + __test_struct_ops_private_stack(); +} diff --git a/tools/testing/selftests/bpf/progs/struct_ops_private_stack.c b/tools/testing/selftests/bpf/progs/struct_ops_private_stack.c index dbe646013811..3cd0c1a55cbd 100644 
--- a/tools/testing/selftests/bpf/progs/struct_ops_private_stack.c +++ b/tools/testing/selftests/bpf/progs/struct_ops_private_stack.c @@ -7,12 +7,6 @@ char _license[] SEC("license") = "GPL"; -#if defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64) -bool skip __attribute((__section__(".data"))) = false; -#else -bool skip = true; -#endif - void bpf_testmod_ops3_call_test_2(void) __ksym; int val_i, val_j; diff --git a/tools/testing/selftests/bpf/progs/struct_ops_private_stack_fail.c b/tools/testing/selftests/bpf/progs/struct_ops_private_stack_fail.c index 3d89ad7cbe2a..1442728f5604 100644 --- a/tools/testing/selftests/bpf/progs/struct_ops_private_stack_fail.c +++ b/tools/testing/selftests/bpf/progs/struct_ops_private_stack_fail.c @@ -7,12 +7,6 @@ char _license[] SEC("license") = "GPL"; -#if defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64) -bool skip __attribute((__section__(".data"))) = false; -#else -bool skip = true; -#endif - void bpf_testmod_ops3_call_test_2(void) __ksym; int val_i, val_j; diff --git a/tools/testing/selftests/bpf/progs/struct_ops_private_stack_recur.c b/tools/testing/selftests/bpf/progs/struct_ops_private_stack_recur.c index b1f6d7e5a8e5..faaa0f8d65a4 100644 --- a/tools/testing/selftests/bpf/progs/struct_ops_private_stack_recur.c +++ b/tools/testing/selftests/bpf/progs/struct_ops_private_stack_recur.c @@ -7,12 +7,6 @@ char _license[] SEC("license") = "GPL"; -#if defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64) -bool skip __attribute((__section__(".data"))) = false; -#else -bool skip = true; -#endif - void bpf_testmod_ops3_call_test_1(void) __ksym; int val_i, val_j; From 6fab063bd8d64f15cde2d194c08a159ad3afdf27 Mon Sep 17 00:00:00 2001 From: Abhishek Dubey Date: Wed, 1 Apr 2026 10:10:42 -0400 Subject: [PATCH 40/47] powerpc64/bpf: Implement fsession support Implement JIT support for fsession in the powerpc64 trampoline. The trampoline stack now accommodates session cookies and function metadata in place of function arguments.
fentry/fexit programs consume the corresponding function metadata. This mirrors existing x86 behavior and enables session cookies on powerpc64.

# ./test_progs -t fsession
#135/1 fsession_test/fsession_test:OK
#135/2 fsession_test/fsession_reattach:OK
#135/3 fsession_test/fsession_cookie:OK
#135 fsession_test:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Abhishek Dubey Tested-by: Venkat Rao Bagalkote Acked-by: Hari Bathini Signed-off-by: Madhavan Srinivasan Link: https://patch.msgid.link/20260401141043.41513-1-adubey@linux.ibm.com --- arch/powerpc/net/bpf_jit.h | 4 +- arch/powerpc/net/bpf_jit_comp.c | 69 ++++++++++++++++++++++++++----- arch/powerpc/net/bpf_jit_comp64.c | 25 +++++++++++ 3 files changed, 87 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index a232f3fb73be..f32de8704d4d 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -218,7 +218,9 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx); void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx); void bpf_jit_realloc_regs(struct codegen_context *ctx); int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr); - +void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt, + int cookie_off, int retval_off); +void store_func_meta(u32 *image, struct codegen_context *ctx, u64 func_meta, int func_meta_off); int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int pass, struct codegen_context *ctx, int insn_idx, int jmp_off, int dst_reg, u32 code); diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index 2018260f56c6..16d15ff3145a 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -540,6 +540,11 @@ bool bpf_jit_supports_private_stack(void) return IS_ENABLED(CONFIG_PPC64); } +bool bpf_jit_supports_fsession(void) +{ + return
IS_ENABLED(CONFIG_PPC64); +} + bool bpf_jit_supports_arena(void) { return IS_ENABLED(CONFIG_PPC64); @@ -812,12 +817,16 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im struct bpf_tramp_links *tlinks, void *func_addr) { - int regs_off, nregs_off, ip_off, run_ctx_off, retval_off, nvr_off, alt_lr_off, r4_off = 0; + int regs_off, func_meta_off, ip_off, run_ctx_off, retval_off; + int nvr_off, alt_lr_off, r4_off = 0; struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN]; struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY]; struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; int i, ret, nr_regs, retaddr_off, bpf_frame_size = 0; struct codegen_context codegen_ctx, *ctx; + int cookie_off, cookie_cnt, cookie_ctx_off; + int fsession_cnt = bpf_fsession_cnt(tlinks); + u64 func_meta; u32 *image = (u32 *)rw_image; ppc_inst_t branch_insn; u32 *branches = NULL; @@ -853,9 +862,11 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im * [ reg argN ] * [ ... ] * regs_off [ reg_arg1 ] prog_ctx - * nregs_off [ args count ] ((u64 *)prog_ctx)[-1] + * func_meta_off [ args count ] ((u64 *)prog_ctx)[-1] * ip_off [ traced function ] ((u64 *)prog_ctx)[-2] + * [ stack cookieN ] * [ ... ] + * cookie_off [ stack cookie1 ] * run_ctx_off [ bpf_tramp_run_ctx ] * [ reg argN ] * [ ... 
] @@ -887,16 +898,21 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im run_ctx_off = bpf_frame_size; bpf_frame_size += round_up(sizeof(struct bpf_tramp_run_ctx), SZL); + /* room for session cookies */ + cookie_off = bpf_frame_size; + cookie_cnt = bpf_fsession_cookie_cnt(tlinks); + bpf_frame_size += cookie_cnt * 8; + /* Room for IP address argument */ ip_off = bpf_frame_size; if (flags & BPF_TRAMP_F_IP_ARG) bpf_frame_size += SZL; - /* Room for args count */ - nregs_off = bpf_frame_size; + /* Room for function metadata, arg regs count */ + func_meta_off = bpf_frame_size; bpf_frame_size += SZL; - /* Room for args */ + /* Room for arg regs */ regs_off = bpf_frame_size; bpf_frame_size += nr_regs * SZL; @@ -995,9 +1011,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im EMIT(PPC_RAW_STL(_R3, _R1, retaddr_off)); } - /* Save function arg count -- see bpf_get_func_arg_cnt() */ - EMIT(PPC_RAW_LI(_R3, nr_regs)); - EMIT(PPC_RAW_STL(_R3, _R1, nregs_off)); + /* Save function arg regs count -- see bpf_get_func_arg_cnt() */ + func_meta = nr_regs; + store_func_meta(image, ctx, func_meta, func_meta_off); /* Save nv regs */ EMIT(PPC_RAW_STL(_R25, _R1, nvr_off)); @@ -1011,10 +1027,28 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im return ret; } - for (i = 0; i < fentry->nr_links; i++) + if (fsession_cnt) { + /* + * Clear all the session cookies' values + * Clear the return value to make sure fentry always get 0 + */ + prepare_for_fsession_fentry(image, ctx, cookie_cnt, cookie_off, retval_off); + } + + cookie_ctx_off = (regs_off - cookie_off) / 8; + + for (i = 0; i < fentry->nr_links; i++) { + if (bpf_prog_calls_session_cookie(fentry->links[i])) { + u64 meta = func_meta | (cookie_ctx_off << BPF_TRAMP_COOKIE_INDEX_SHIFT); + + store_func_meta(image, ctx, meta, func_meta_off); + cookie_ctx_off--; + } + if (invoke_bpf_prog(image, ro_image, ctx, fentry->links[i], regs_off, retval_off, 
run_ctx_off, flags & BPF_TRAMP_F_RET_FENTRY_RET)) return -EINVAL; + } if (fmod_ret->nr_links) { branches = kcalloc(fmod_ret->nr_links, sizeof(u32), GFP_KERNEL); @@ -1076,12 +1110,27 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im image[branches[i]] = ppc_inst_val(branch_insn); } - for (i = 0; i < fexit->nr_links; i++) + /* set the "is_return" flag for fsession */ + func_meta |= (1ULL << BPF_TRAMP_IS_RETURN_SHIFT); + if (fsession_cnt) + store_func_meta(image, ctx, func_meta, func_meta_off); + + cookie_ctx_off = (regs_off - cookie_off) / 8; + + for (i = 0; i < fexit->nr_links; i++) { + if (bpf_prog_calls_session_cookie(fexit->links[i])) { + u64 meta = func_meta | (cookie_ctx_off << BPF_TRAMP_COOKIE_INDEX_SHIFT); + + store_func_meta(image, ctx, meta, func_meta_off); + cookie_ctx_off--; + } + if (invoke_bpf_prog(image, ro_image, ctx, fexit->links[i], regs_off, retval_off, run_ctx_off, false)) { ret = -EINVAL; goto cleanup; } + } if (flags & BPF_TRAMP_F_CALL_ORIG) { if (ro_image) /* image is NULL for dummy pass */ diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 6670d8c69ade..d9038c468af6 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -179,6 +179,31 @@ static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg) BUG(); } +void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt, + int cookie_off, int retval_off) +{ + EMIT(PPC_RAW_LI(bpf_to_ppc(TMP_REG_1), 0)); + + for (int i = 0; i < cookie_cnt; i++) + EMIT(PPC_RAW_STD(bpf_to_ppc(TMP_REG_1), _R1, cookie_off + 8 * i)); + EMIT(PPC_RAW_STD(bpf_to_ppc(TMP_REG_1), _R1, retval_off)); +} + +void store_func_meta(u32 *image, struct codegen_context *ctx, + u64 func_meta, int func_meta_off) +{ + /* + * Store func_meta to stack at [R1 + func_meta_off] = func_meta + * + * func_meta : + * bit[63]: is_return flag + * byte[1]: cookie offset from ctx + * byte[0]: args count + 
+	 */
+	PPC_LI64(bpf_to_ppc(TMP_REG_1), func_meta);
+	EMIT(PPC_RAW_STD(bpf_to_ppc(TMP_REG_1), _R1, func_meta_off));
+}
+
 void bpf_jit_realloc_regs(struct codegen_context *ctx)
 {
 }

From 92258b5bf1ec10204c23a793793a65dc92d17014 Mon Sep 17 00:00:00 2001
From: Abhishek Dubey
Date: Wed, 1 Apr 2026 10:10:43 -0400
Subject: [PATCH 41/47] powerpc32/bpf: Add fsession support

Extend the powerpc64 trampoline's fsession JIT support to ppc32, since ppc64 and ppc32 share a common trampoline implementation. Arch-specific helpers handle the 64-bit data copies using 32-bit regs.

Fsession support still needs to be validated along with trampoline support.

Signed-off-by: Abhishek Dubey
Acked-by: Hari Bathini
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/20260401141043.41513-2-adubey@linux.ibm.com
---
 arch/powerpc/net/bpf_jit_comp.c   |  8 ++++++-
 arch/powerpc/net/bpf_jit_comp32.c | 35 +++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 16d15ff3145a..b2fdf8ff9c60 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -542,7 +542,13 @@ bool bpf_jit_supports_private_stack(void)
 
 bool bpf_jit_supports_fsession(void)
 {
-	return IS_ENABLED(CONFIG_PPC64);
+	/*
+	 * TODO: Remove after validating support
+	 * for fsession and trampoline on ppc32.
+	 */
+	if (IS_ENABLED(CONFIG_PPC32))
+		return false;
+	return true;
 }
 
 bool bpf_jit_supports_arena(void)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 3087e744fb25..f3ae89e1d1d0 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -123,6 +123,41 @@ void bpf_jit_realloc_regs(struct codegen_context *ctx)
 	}
 }
 
+void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt,
+				 int cookie_off, int retval_off)
+{
+	/*
+	 * Set session cookies value
+	 * Clear cookies field on stack
+	 * Ensure retval to be cleared on fentry
+	 */
+	EMIT(PPC_RAW_LI(bpf_to_ppc(TMP_REG), 0));
+
+	for (int i = 0; i < cookie_cnt; i++) {
+		EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, cookie_off + 4 * i));
+		EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, cookie_off + 4 * i + 4));
+	}
+
+	EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, retval_off));
+	EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, retval_off + 4));
+}
+
+void store_func_meta(u32 *image, struct codegen_context *ctx,
+		     u64 func_meta, int func_meta_off)
+{
+	/*
+	 * Store func_meta to stack: [R1 + func_meta_off] = func_meta
+	 * func_meta := argument count in first byte + cookie value
+	 */
+	/* Store lower word */
+	EMIT(PPC_RAW_LI32(bpf_to_ppc(TMP_REG), (u32)func_meta));
+	EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, func_meta_off));
+
+	/* Store upper word */
+	EMIT(PPC_RAW_LI32(bpf_to_ppc(TMP_REG), (u32)(func_meta >> 32)));
+	EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, func_meta_off + 4));
+}
+
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;

From 1e4bac7eb95a5a1aed5b39971ef77dca5b0f8a9f Mon Sep 17 00:00:00 2001
From: Abhishek Dubey
Date: Wed, 1 Apr 2026 11:21:30 -0400
Subject: [PATCH 42/47] powerpc/bpf: Add support for instruction array

On loading the BPF program, the verifier might adjust/omit some instructions.
The adjusted instruction offsets are accounted in the map containing the original instruction -> xlated mapping. This patch adds ppc64 JIT support to additionally build the xlated -> jitted mapping for every instruction present in the instruction array. This change is needed to enable support for indirect jumps, added in a subsequent patch.

Invoke bpf_prog_update_insn_ptrs() with the offset pair of xlated_offset and jited_offset. The offset mapping is already available, being used for bpf_prog_fill_jited_linfo(), and can be directly used for bpf_prog_update_insn_ptrs() as well.

Additional details are in commit b4ce5923e780 ("bpf, x86: add new map type: instructions array").

Signed-off-by: Abhishek Dubey
Acked-by: Hari Bathini
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/20260401152133.42544-2-adubey@linux.ibm.com
---
 arch/powerpc/net/bpf_jit_comp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index b2fdf8ff9c60..50103b3794fb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -373,6 +373,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		goto out_addrs;
 	}
 
 	bpf_prog_fill_jited_linfo(fp, addrs);
+	/*
+	 * On ABI V1, executable code starts after the function
+	 * descriptor, so adjust base accordingly.
+	 */
+	bpf_prog_update_insn_ptrs(fp, addrs,
+				  (void *)fimage + FUNCTION_DESCR_SIZE);
+
 out_addrs:
 	if (!image && priv_stack_ptr) {
 		fp->aux->priv_stack_ptr = NULL;

From 66cad93ad325b332868c062bbd0de65ca4e59657 Mon Sep 17 00:00:00 2001
From: Abhishek Dubey
Date: Wed, 1 Apr 2026 11:21:33 -0400
Subject: [PATCH 43/47] selftest/bpf: Enable instruction array test for powerpc

With the instruction array now supported, enable the corresponding bpf selftest for powerpc.
Signed-off-by: Abhishek Dubey
Tested-by: Venkat Rao Bagalkote
Acked-by: Hari Bathini
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/20260401152133.42544-3-adubey@linux.ibm.com
---
 tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
index 269870bec941..482d38b9c29e 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_insn_array.c
@@ -3,7 +3,7 @@
 #include
 #include
 
-#ifdef __x86_64__
+#if defined(__x86_64__) || defined(__powerpc__)
 static int map_create(__u32 map_type, __u32 max_entries)
 {
 	const char *map_name = "insn_array";

From a32325c0e623d594992c4e4616fa685c0e765a33 Mon Sep 17 00:00:00 2001
From: Abhishek Dubey
Date: Wed, 1 Apr 2026 11:21:32 -0400
Subject: [PATCH 44/47] powerpc64/bpf: Add support for indirect jump
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add support for a new instruction

BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=0

which does an indirect jump to a location stored in Rx. The register Rx should have type PTR_TO_INSN. This new type ensures that the Rx register contains a value (or a range of values) loaded from a correct jump table: a map of type instruction array.

Support indirect jumps to all registers in the powerpc64 JIT using the ctr register: move the Rx contents to ctr, then invoke the bctr instruction to branch to the address stored there. Skip save and restore of the TOC, as the jump is always within the program context.
Signed-off-by: Abhishek Dubey
Acked-by: Hari Bathini
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/20260401152133.42544-4-adubey@linux.ibm.com
---
 arch/powerpc/net/bpf_jit_comp64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index d9038c468af6..db364d9083e7 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -1708,6 +1708,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 			addrs[++i] = ctx->idx * 4;
 			break;
 
+		/*
+		 * JUMP reg
+		 */
+		case BPF_JMP | BPF_JA | BPF_X:
+			EMIT(PPC_RAW_MTCTR(dst_reg));
+			EMIT(PPC_RAW_BCTR());
+			break;
+
 		/*
 		 * Return/Exit
 		 */

From e1f7a0e196e293c223a882788c6d1a884d06d6d8 Mon Sep 17 00:00:00 2001
From: Abhishek Dubey
Date: Wed, 1 Apr 2026 11:21:33 -0400
Subject: [PATCH 45/47] selftest/bpf: Enable gotox tests for powerpc64

With the gotox instruction and jump tables now supported, enable the corresponding bpf selftest on powerpc.
Signed-off-by: Abhishek Dubey
Tested-by: Venkat Rao Bagalkote
Acked-by: Hari Bathini
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/20260401152133.42544-5-adubey@linux.ibm.com
---
 tools/testing/selftests/bpf/progs/verifier_gotox.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_gotox.c b/tools/testing/selftests/bpf/progs/verifier_gotox.c
index 607dad058ca1..0f43b56ec2bc 100644
--- a/tools/testing/selftests/bpf/progs/verifier_gotox.c
+++ b/tools/testing/selftests/bpf/progs/verifier_gotox.c
@@ -6,7 +6,7 @@
 #include "bpf_misc.h"
 #include "../../../include/linux/filter.h"
 
-#if defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64)
+#if defined(__TARGET_ARCH_x86) || defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_powerpc)
 
 #define DEFINE_SIMPLE_JUMP_TABLE_PROG(NAME, SRC_REG, OFF, IMM, OUTCOME)	\
 									\
@@ -384,6 +384,6 @@ jt0_%=:								\
 		: __clobber_all);
 }
 
-#endif /* __TARGET_ARCH_x86 || __TARGET_ARCH_arm64 */
+#endif /* __TARGET_ARCH_x86 || __TARGET_ARCH_arm64 || __TARGET_ARCH_powerpc */
 
 char _license[] SEC("license") = "GPL";

From e6ef4eb871ed884f5f480579b2e5f4fc9d2cb003 Mon Sep 17 00:00:00 2001
From: Abhishek Dubey
Date: Wed, 8 Apr 2026 01:53:01 -0400
Subject: [PATCH 46/47] powerpc32/bpf: fix loading fsession func metadata using
 PPC_LI32

PPC_RAW_LI32 is not a valid macro in the PowerPC BPF JIT. Use PPC_LI32, which correctly handles immediate loads for large values. This fixes the build error introduced when adding fsession support on ppc32.
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202604040212.jIxEd2DW-lkp@intel.com/
Fixes: 92258b5bf1ec ("powerpc32/bpf: Add fsession support")
Signed-off-by: Abhishek Dubey
Reviewed-by: Hari Bathini
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/20260408055301.232745-1-adubey@linux.ibm.com
---
 arch/powerpc/net/bpf_jit_comp32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index f3ae89e1d1d0..bfdc50740da8 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -150,11 +150,11 @@ void store_func_meta(u32 *image, struct codegen_context *ctx,
 	 * func_meta := argument count in first byte + cookie value
 	 */
 	/* Store lower word */
-	EMIT(PPC_RAW_LI32(bpf_to_ppc(TMP_REG), (u32)func_meta));
+	PPC_LI32(bpf_to_ppc(TMP_REG), (u32)func_meta);
 	EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, func_meta_off));
 
 	/* Store upper word */
-	EMIT(PPC_RAW_LI32(bpf_to_ppc(TMP_REG), (u32)(func_meta >> 32)));
+	PPC_LI32(bpf_to_ppc(TMP_REG), (u32)(func_meta >> 32));
 	EMIT(PPC_RAW_STW(bpf_to_ppc(TMP_REG), _R1, func_meta_off + 4));
 }

From b80777aef570ac561977d7210d04890f9df7e484 Mon Sep 17 00:00:00 2001
From: Andrew Donnellan
Date: Mon, 8 Dec 2025 16:13:33 +1100
Subject: [PATCH 47/47] mailmap: Add entry for Andrew Donnellan

I'm leaving IBM in January 2026. Add mailmap aliases to switch to using my personal email for now.

(I will send a patch to update MAINTAINERS soon, hopefully after I can get someone to replace me.)
Signed-off-by: Andrew Donnellan
Signed-off-by: Madhavan Srinivasan
Link: https://patch.msgid.link/20251208-mailmap-v1-1-524d5b9d175b@linux.ibm.com
---
 .mailmap | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.mailmap b/.mailmap
index 63c11ea7e35d..07561ce8e61d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -75,6 +75,9 @@ Andreas Herrmann
 Andreas Hindborg
 Andrej Shadura
 Andrej Shadura
+Andrew Donnellan
+Andrew Donnellan
+Andrew Donnellan
 Andrew Morton
 Andrew Murray
 Andrew Murray