linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-19 12:51:10 -04:00

Author	SHA1	Message	Date
Mateusz Guzik	262ef8e55b	fork: stop ignoring NUMA while handling cached thread stacks 1. the numa parameter was straight up ignored. 2. nothing was done to check if the to-be-cached/allocated stack matches the local node The id remains ignored on free in case of memoryless nodes. Note the current caching is already bad as the cache keeps overflowing and a different solution is needed for the long run, to be worked out(tm). Stats collected over a kernel build with the patch with the following topology: NUMA node(s): 2 NUMA node0 CPU(s): 0-11 NUMA node1 CPU(s): 12-23 caller's node vs stack backing pages on free: matching: 50083 (70%) mismatched: 21492 (30%) caching efficiency: cached: 32651 (65.2%) dropped: 17432 (34.8%) Link: https://lkml.kernel.org/r/20251120054015.3019419-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Linus Waleij <linus.walleij@linaro.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Kees Cook <kees@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:31 -08:00
Eric Dumazet	94984bfed5	rbtree: inline rb_last() This is a very small function, inlining it saves cpu cycles in TCP by reducing register pressure and removing call/ret overhead. It also reduces vmlinux text size by 122 bytes on a typical x86_64 build. Before: size vmlinux text data bss dec hex filename 34811781 22177365 5685248 62674394 3bc55da vmlinux After: size vmlinux text data bss dec hex filename 34811659 22177365 5685248 62674272 3bc5560 vmlinux [ojeda@kernel.org: fix rust build] Link: https://lkml.kernel.org/r/20251120085518.1463498-1-ojeda@kernel.org Link: https://lkml.kernel.org/r/20251114140646.3817319-3-edumazet@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:30 -08:00
Eric Dumazet	c2d2dad245	rbtree: inline rb_first() Patch series "rbree: inline rb_first() and rb_last()". Inline these two small helpers, heavily used in TCP and FQ packet scheduler, and in many other places. This reduces kernel text size, and brings an 1.5 % improvement on network TCP stress test. This patch (of 2): This is a very small function, inlining it saves cpu cycles by reducing register pressure and removing call/ret overhead. It also reduces vmlinux text size by 744 bytes on a typical x86_64 build. Before: size vmlinux text data bss dec hex filename 34812525 22177365 5685248 62675138 3bc58c2 vmlinux After: size vmlinux text data bss dec hex filename 34811781 22177365 5685248 62674394 3bc55da vmlinux [ojeda@kernel.org: fix rust build] Link: https://lkml.kernel.org/r/20251120085518.1463498-1-ojeda@kernel.org Link: https://lkml.kernel.org/r/20251114140646.3817319-1-edumazet@google.com Link: https://lkml.kernel.org/r/20251114140646.3817319-2-edumazet@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:30 -08:00
Andrew Morton	bc947af677	Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be able to merge "kho: make debugfs interface optional" into mm-nonmm-stable.	2025-11-27 14:17:02 -08:00
Kiryl Shutsemau	7c9580f44f	mm/filemap: fix logic around SIGBUS in filemap_map_pages() Chris noticed that filemap_map_pages() calculates can_map_large only once for the first page in the fault around range. The value is not valid for the following pages in the range and must be recalculated. Instead of recalculating can_map_large on each iteration, pass down file_end to filemap_map_folio_range() and let it make the decision on what can be mapped. Link: https://lkml.kernel.org/r/20251120161411.859078-1-kirill@shutemov.name Fixes: `74207de2ba` ("mm/memory: do not populate page table entries beyond i_size")h Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reported-by: Chris Mason <clm@meta.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Mason <clm@meta.com> Cc: Christian Brauner <brauner@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:18 -08:00
Wei Yang	cff47b9e39	mm/huge_memory: fix NULL pointer deference when splitting folio Commit `c010d47f10` ("mm: thp: split huge page to any lower order pages") introduced an early check on the folio's order via mapping->flags before proceeding with the split work. This check introduced a bug: for shmem folios in the swap cache and truncated folios, the mapping pointer can be NULL. Accessing mapping->flags in this state leads directly to a NULL pointer dereference. This commit fixes the issue by moving the check for mapping != NULL before any attempt to access mapping->flags. Link: https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com Fixes: `c010d47f10` ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:17 -08:00
Pratyush Yadav	6c96c6bd2c	MAINTAINERS: add test_kho to KHO's entry Commit `b753522bed` ("kho: add test for kexec handover") introduced the KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the KHO maintainers can get patches for its test. Link: https://lkml.kernel.org/r/20251118182416.70660-1-pratyush@kernel.org Fixes: `b753522bed` ("kho: add test for kexec handover") Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:17 -08:00
Sam Protsenko	52ac3f5829	mailmap: add entry for Sam Protsenko Use 'Sam Protsenko' as my name consistently in git-shortlog. Also map my old GlobalLogic email address to my current email to stay reachable. Link: https://lkml.kernel.org/r/20251118033111.23382-1-semen.protsenko@linaro.org Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:17 -08:00
Carlos Llamas	f0bb6dba3d	selftests/mm: fix division-by-zero in uffd-unit-tests Commit `4dfd4bba85` ("selftests/mm/uffd: refactor non-composite global vars into struct") moved some of the operations previously implemented in uffd_setup_environment() earlier in the main test loop. The calculation of nr_pages, which involves a division by page_size, now occurs before checking that default_huge_page_size() returns a non-zero This leads to a division-by-zero error on systems with !CONFIG_HUGETLB. Fix this by relocating the non-zero page_size check before the nr_pages calculation, as it was originally implemented. Link: https://lkml.kernel.org/r/20251113034623.3127012-1-cmllamas@google.com Fixes: `4dfd4bba85` ("selftests/mm/uffd: refactor non-composite global vars into struct") Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Ujwal Kundur <ujwal.kundur@gmail.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:17 -08:00
Liam R. Howlett	270065f514	mm/mmap_lock: reset maple state on lock_vma_under_rcu() retry The retry in lock_vma_under_rcu() drops the rcu read lock before reacquiring the lock and trying again. This may cause a use-after-free if the maple node the maple state was using was freed. The maple state is protected by the rcu read lock. When the lock is dropped, the state cannot be reused as it tracks pointers to objects that may be freed during the time where the lock was not held. Any time the rcu read lock is dropped, the maple state must be invalidated. Resetting the address and state to MA_START is the safest course of action, which will result in the next operation starting from the top of the tree. Prior to commit `0b16f8bed1` ("mm: change vma_start_read() to drop RCU lock on failure"), vma_start_read() would drop rcu read lock and return NULL, so the retry would not have happened. However, now that vma_start_read() drops rcu read lock on failure followed by a retry, we may end up using a freed maple tree node cached in the maple state. [surenb@google.com: changelog alteration] Link: https://lkml.kernel.org/r/CAJuCfpEWMD-Z1j=nPYHcQW4F7E2Wka09KTXzGv7VE7oW1S8hcw@mail.gmail.com Link: https://lkml.kernel.org/r/20251111215605.1721380-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Fixes: `0b16f8bed1` ("mm: change vma_start_read() to drop RCU lock on failure") Reported-by: syzbot+131f9eb2b5807573275c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=131f9eb2b5807573275c Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:17 -08:00
Deepanshu Kartikey	de8798965f	mm/memfd: fix information leak in hugetlb folios When allocating hugetlb folios for memfd, three initialization steps are missing: 1. Folios are not zeroed, leading to kernel memory disclosure to userspace 2. Folios are not marked uptodate before adding to page cache 3. hugetlb_fault_mutex is not taken before hugetlb_add_to_page_cache() The memfd allocation path bypasses the normal page fault handler (hugetlb_no_page) which would handle all of these initialization steps. This is problematic especially for udmabuf use cases where folios are pinned and directly accessed by userspace via DMA. Fix by matching the initialization pattern used in hugetlb_no_page(): - Zero the folio using folio_zero_user() which is optimized for huge pages - Mark it uptodate with folio_mark_uptodate() - Take hugetlb_fault_mutex before adding to page cache to prevent races The folio_zero_user() change also fixes a potential security issue where uninitialized kernel memory could be disclosed to userspace through read() or mmap() operations on the memfd. Link: https://lkml.kernel.org/r/20251112145034.2320452-1-kartikey406@gmail.com Fixes: `89c1905d9c` ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios") Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reported-by: syzbot+f64019ba229e3a5c411b@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/20251112031631.2315651-1-kartikey406@gmail.com/ [v1] Closes: https://syzkaller.appspot.com/bug?extid=f64019ba229e3a5c411b Suggested-by: Oscar Salvador <osalvador@suse.de> Suggested-by: David Hildenbrand <david@redhat.com> Tested-by: syzbot+f64019ba229e3a5c411b@syzkaller.appspotmail.com Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> (v2) Cc: Christoph Hellwig <hch@lst.de> (v6) Cc: Dave Airlie <airlied@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:17 -08:00
Youngjun Park	f5e31a196e	mm: swap: remove duplicate nr_swap_pages decrement in get_swap_page_of_type() After commit `4f78252da8`, nr_swap_pages is decremented in swap_range_alloc(). Since cluster_alloc_swap_entry() calls swap_range_alloc() internally, the decrement in get_swap_page_of_type() causes double-decrementing. As a representative userspace-visible runtime example of the impact, /proc/meminfo reports increasingly inaccurate SwapFree values. The discrepancy grows with each swap allocation, and during hibernation when large amounts of memory are written to swap, the reported value can deviate significantly from actual available swap space, misleading users and monitoring tools. Remove the duplicate decrement. Link: https://lkml.kernel.org/r/20251102082456.79807-1-youngjun.park@lge.com Fixes: `4f78252da8` ("mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc()") Signed-off-by: Youngjun Park <youngjun.park@lge.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Kairui Song <kasong@tencent.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: <stable@vger.kernel.org> [6.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-24 14:25:17 -08:00
Ahmet Eray Karadag	58b6fcd2ab	ocfs2: mark inode bad upon validation failure during read A VFS cache inconsistency, potentially triggered by sequences like buffered writes followed by open(O_DIRECT), can result in an invalid on-disk inode block (e.g., bad signature). OCFS2 detects this corruption when reading the inode block via ocfs2_validate_inode_block(), logs "Invalid dinode", and often switches the filesystem to read-only mode. The VFS open(O_DIRECT) operation appears to incorrectly clear the inode's I_DIRTY flag without ensuring the dirty metadata (reflecting the earlier buffered write, e.g., an updated i_size) is flushed to disk. This leaves the in-memory VFS inode object "in limbo" with an updated size (e.g., 38639 from the write) but marked clean, while its on-disk counterpart remains stale (e.g., size 0) or invalid. Currently, the function reading the inode block (ocfs2_read_inode_block_full()) fails to call make_bad_inode() upon detecting the validation error. Because the in-memory inode is not marked bad, subsequent operations (like ftruncate) proceed erroneously. They eventually reach code (e.g., ocfs2_truncate_file()) that compares the inconsistent in-memory size (38639) against the invalid/stale on-disk size (0), leading to kernel crashes via BUG_ON. Fix this by calling make_bad_inode(inode) within the error handling path of ocfs2_read_inode_block_full() immediately after a block read or validation error occurs. This ensures VFS is properly notified about the corrupt inode at the point of detection. Marking the inode bad allows VFS to correctly fail subsequent operations targeting this inode early, preventing kernel panics caused by operating on known inconsistent inode states. Link: https://lkml.kernel.org/r/20251118001833.423470-2-eraykrdg1@gmail.com Link: https://lore.kernel.org/all/20251029225748.11361-2-eraykrdg1@gmail.com/T/ Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Signed-off-by: Ahmet Eray Karadag <eraykrdg1@gmail.com> Reported-by: syzbot+b93b65ee321c97861072@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=b93b65ee321c97861072 Reviewed-by: Heming Zhao <heming.zhao@suse.com> Co-developed-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: David Hunter <david.hunter.linux@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:45 -08:00
Thorsten Blum	13db54aad7	ocfs2: replace deprecated strcpy with strscpy strcpy() has been deprecated [1] because it performs no bounds checking on the destination buffer, which can lead to buffer overflows. Replace it with the safer strscpy(), and copy directly into '->rf_signature' instead of using the start of the struct as the destination buffer. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Link: https://lkml.kernel.org/r/20251118185345.132411-3-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:45 -08:00
Thorsten Blum	4022ba2005	ocfs2: replace deprecated strcpy in ocfs2_create_xattr_block strcpy() has been deprecated [1] because it performs no bounds checking on the destination buffer, which can lead to buffer overflows. Replace it with the safer strscpy(), and copy directly into '->xb_signature' instead of using the start of the struct as the destination buffer. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Link: https://lkml.kernel.org/r/20251118185345.132411-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:45 -08:00
Chia-Liang Wang	ff713698ba	lib: ratelimit: fix spelling mistake 'seperately' Corrects a spelling mistake in a comment in ratelimit.c where 'seperately' was used instead of 'separately'. Link: https://lkml.kernel.org/r/20251119101144.3175-1-a0979625527@icloud.com Signed-off-by: Chia-Liang Wang <a0979652527@icloud.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:45 -08:00
Lance Yang	2fe869ecbd	MAINTAINERS: add Petr as a reviewer of hung task detector Petr has been actively reviewing hung task detector patches lately. It's always good to have a fresh pair of eyes, so let's make it official. I checked with him, and he's happy to be added. Link: https://lkml.kernel.org/r/20251119110822.46566-1-lance.yang@linux.dev Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: Petr Mladek <pmladek@suse.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Alice Ryhl	9031b852c9	uaccess: gate _copy_[to\|from]_user on !INLINE_COPY_FROM_USER These methods only exist when INLINE_COPY_FROM_USER is disabled, so update the header file to reflect that. This fixes the following error on builds that enable both RUST and INLINE_COPY_FROM_USER. ERROR: modpost: "_copy_from_user" [samples/rust/rust_misc_device.ko] undefined! ERROR: modpost: "_copy_to_user" [samples/rust/rust_misc_device.ko] undefined! This error is triggered because when a method is available both as a rust_helper_* and normal method, Rust will call the normal method. [akpm@linux-foundation.org: s/INLINE_COPY_FROM_USER/INLINE_COPY_TO_USER/, per Alice] Link: https://lkml.kernel.org/r/20251118173250.2821388-1-aliceryhl@google.com Fixes: `d99dc586ca` ("uaccess: decouple INLINE_COPY_FROM_USER and CONFIG_RUST") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Trevor Gross <tmgross@umich.edu> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Sourabh Jain	aa0145563c	crash: export crashkernel CMA reservation to userspace Add a sysfs entry /sys/kernel/kexec_crash_cma_ranges to expose all CMA crashkernel ranges. This allows userspace tools configuring kdump to determine how much memory is reserved for crashkernel. If CMA is used, tools can warn users when attempting to capture user pages with CMA reservation. The new sysfs hold the CMA ranges in below format: cat /sys/kernel/kexec_crash_cma_ranges 100000000-10c7fffff The reason for not including Crash CMA Ranges in /proc/iomem is to avoid conflicts. It has been observed that contiguous memory ranges are sometimes shown as two separate System RAM entries in /proc/iomem. If a CMA range overlaps two System RAM ranges, adding crashk_res to /proc/iomem can create a conflict. Reference [1] describes one such instance on the PowerPC architecture. Link: https://lkml.kernel.org/r/20251118071023.1673329-1-sourabhjain@linux.ibm.com Link: https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain@linux.ibm.com/ [1] Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shivang Upadhyay <shivangu@linux.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Sourabh Jain	fdd76c8d63	Documentation/ABI: add kexec and kdump sysfs interface Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Link: https://lkml.kernel.org/r/20251117035153.1199665-1-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shivang Upadhyay <shivangu@linux.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Guan-Chun Wu	b1b72ac25f	ceph: replace local base64 helpers with lib/base64 Remove the ceph_base64_encode() and ceph_base64_decode() functions and replace their usage with the generic base64_encode() and base64_decode() helpers from lib/base64. This eliminates the custom implementation in Ceph, reduces code duplication, and relies on the shared Base64 code in lib. The helpers preserve RFC 3501-compliant Base64 encoding without padding, so there are no functional changes. This change also improves performance: encoding is about 2.7x faster and decoding achieves 43-52x speedups compared to the previous local implementation. Link: https://lkml.kernel.org/r/20251114060240.89965-1-409411716@gms.tku.edu.tw Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Xiubo Li <xiubli@redhat.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: David Laight <david.laight.linux@gmail.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Guan-Chun Wu	7794510e20	fscrypt: replace local base64url helpers with lib/base64 Replace the base64url encoding and decoding functions in fscrypt with the generic base64_encode() and base64_decode() helpers from lib/base64. This removes the custom implementation in fscrypt, reduces code duplication, and relies on the shared Base64 implementation in lib. The helpers preserve RFC 4648-compliant URL-safe Base64 encoding without padding, so there are no functional changes. This change also improves performance: encoding is about 2.7x faster and decoding achieves 43-52x speedups compared to the previous implementation. Link: https://lkml.kernel.org/r/20251114060221.89734-1-409411716@gms.tku.edu.tw Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Cc: Christoph Hellwig <hch@lst.de> Cc: David Laight <david.laight.linux@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Guan-Chun Wu	8b365c4f5b	lib: add KUnit tests for base64 encoding/decoding Add a KUnit test suite to validate the base64 helpers. The tests cover both encoding and decoding, including padded and unpadded forms as defined by RFC 4648 (standard base64), and add negative cases for malformed inputs and padding errors. The test suite also validates other variants (URLSAFE, IMAP) to ensure their correctness. In addition to functional checks, the suite includes simple microbenchmarks which report average encode/decode latency for small (64B) and larger (1KB) inputs. These numbers are informational only and do not gate the tests. Kconfig (BASE64_KUNIT) and lib/tests/Makefile are updated accordingly. Sample KUnit output: KTAP version 1 # Subtest: base64 # module: base64_kunit 1..4 # base64_performance_tests: [64B] encode run : 32ns # base64_performance_tests: [64B] decode run : 35ns # base64_performance_tests: [1KB] encode run : 510ns # base64_performance_tests: [1KB] decode run : 530ns ok 1 base64_performance_tests ok 2 base64_std_encode_tests ok 3 base64_std_decode_tests ok 4 base64_variant_tests # base64: pass:4 fail:0 skip:0 total:4 # Totals: pass:4 fail:0 skip:0 total:4 Link: https://lkml.kernel.org/r/20251114060157.89507-1-409411716@gms.tku.edu.tw Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David Laight <david.laight.linux@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Guan-Chun Wu	9c7d3cf94d	lib/base64: rework encode/decode for speed and stricter validation The old base64 implementation relied on a bit-accumulator loop, which was slow for larger inputs and too permissive in validation. It would accept extra '=', missing '=', or even '=' appearing in the middle of the input, allowing malformed strings to pass. This patch reworks the internals to improve performance and enforce stricter validation. Changes: - Encoder: * Process input in 3-byte blocks, mapping 24 bits into four 6-bit symbols, avoiding bit-by-bit shifting and reducing loop iterations. * Handle the final 1-2 leftover bytes explicitly and emit '=' only when requested. - Decoder: * Based on the reverse lookup tables from the previous patch, decode input in 4-character groups. * Each group is looked up directly, converted into numeric values, and combined into 3 output bytes. * Explicitly handle padded and unpadded forms: - With padding: input length must be a multiple of 4, and '=' is allowed only in the last two positions. Reject stray or early '='. - Without padding: validate tail lengths (2 or 3 chars) and require unused low bits to be zero. * Removed the bit-accumulator style loop to reduce loop iterations. Performance (x86_64, Intel Core i7-10700 @ 2.90GHz, avg over 1000 runs, KUnit): Encode: 64B ~90ns -> ~32ns (~2.8x) 1KB ~1332ns -> ~510ns (~2.6x) Decode: 64B ~1530ns -> ~35ns (~43.7x) 1KB ~27726ns -> ~530ns (~52.3x) [akpm@linux-foundation.org: remove u32 casts, per David and Guan-Chun] Link: https://lkml.kernel.org/r/20251114060132.89279-1-409411716@gms.tku.edu.tw Co-developed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Co-developed-by: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: David Laight <david.laight.linux@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Kuan-Wei Chiu	c4eb7ad32e	lib/base64: optimize base64_decode() with reverse lookup tables Replace the use of strchr() in base64_decode() with precomputed reverse lookup tables for each variant. This avoids repeated string scans and improves performance. Use -1 in the tables to mark invalid characters. Decode: 64B ~1530ns -> ~80ns (~19.1x) 1KB ~27726ns -> ~1239ns (~22.4x) [akpm@linux-foundation.org: fix kernedoc] Link: https://lkml.kernel.org/r/20251114060107.89026-1-409411716@gms.tku.edu.tw Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: David Laight <david.laight.linux@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:44 -08:00
Kuan-Wei Chiu	f1e2ca801c	lib/base64: add support for multiple variants Patch series " lib/base64: add generic encoder/decoder, migrate users", v5. This series introduces a generic Base64 encoder/decoder to the kernel library, eliminating duplicated implementations and delivering significant performance improvements. The Base64 API has been extended to support multiple variants (Standard, URL-safe, and IMAP) as defined in RFC 4648 and RFC 3501. The API now takes a variant parameter and an option to control padding. As part of this series, users are migrated to the new interface while preserving their specific formats: fscrypt now uses BASE64_URLSAFE, Ceph uses BASE64_IMAP, and NVMe is updated to BASE64_STD. On the encoder side, the implementation processes input in 3-byte blocks, mapping 24 bits directly to 4 output symbols. This avoids bit-by-bit streaming and reduces loop overhead, achieving about a 2.7x speedup compared to previous implementations. On the decoder side, replace strchr() lookups with per-variant reverse tables and process input in 4-character groups. Each group is mapped to numeric values and combined into 3 bytes. Padded and unpadded forms are validated explicitly, rejecting invalid '=' usage and enforcing tail rules. This improves throughput by ~43-52x. This patch (of 6): Extend the base64 API to support multiple variants (standard, URL-safe, and IMAP) as defined in RFC 4648 and RFC 3501. The API now takes a variant parameter and an option to control padding. Update NVMe auth code to use the new interface with BASE64_STD. Link: https://lkml.kernel.org/r/20251114055829.87814-1-409411716@gms.tku.edu.tw Link: https://lkml.kernel.org/r/20251114060045.88792-1-409411716@gms.tku.edu.tw Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: David Laight <david.laight.linux@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Feng Tang	03ef32d665	sys_info: add a default kernel sys_info mask Which serves as a global default sys_info mask. When users want the same system information for many error cases (panic, hung, lockup ...), they can chose to set this global knob only once, while not setting up each individual sys_info knobs. This just adds a 'lazy' option, and doesn't change existing kernel behavior as the mask is 0 by default. Link: https://lkml.kernel.org/r/20251113111039.22701-5-feng.tang@linux.alibaba.com Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Feng Tang	a9af76a787	watchdog: add sys_info sysctls to dump sys info on system lockup When soft/hard lockup happens, developers may need different kinds of system information (call-stacks, memory info, locks, etc.) to help debugging. Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs to take human readable string like "tasks,mem,timers,locks,ftrace,...", and when system lockup happens, all requested information will be printed out. (refer kernel/sys_info.c for more details). Link: https://lkml.kernel.org/r/20251113111039.22701-4-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Feng Tang	8b2b9b4f6f	hung_task: add hung_task_sys_info sysctl to dump sys info on task-hung When task-hung happens, developers may need different kinds of system information (call-stacks, memory info, locks, etc.) to help debugging. Add 'hung_task_sys_info' sysctl knob to take human readable string like "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all requested information will be dumped. (refer kernel/sys_info.c for more details). Meanwhile, the newly introduced sys_info() call is used to unify some existing info-dumping knobs. [feng.tang@linux.alibaba.com: maintain consistecy established behavior, per Lance and Petr] Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local Link: https://lkml.kernel.org/r/20251113111039.22701-3-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Feng Tang	5f264c00b6	docs: panic: correct some sys_ifo names in sysctl doc Patch series "Enable hung_task and lockup cases to dump system info on demand", v2. When working on kernel stability issues: panic, task-hung and soft/hard lockup are frequently met. And to debug them, user may need lots of system information at that time, like task call stacks, lock info, memory info, ftrace dump, etc. panic case already uses sys_info() for this purpose, and has a 'panic_sys_info' sysctl(also support cmdline setup) interface to take human readable string like "tasks,mem,timers,locks,ftrace,..." to control what kinds of information is needed. Which is also helpful to debug task-hung and lockup cases. So this patchset introduces the similar sys_info sysctl interface for task-hung and lockup cases. his is mainly for debugging and the info dumping could be intrusive, like dumping call stack for all tasks when system has huge number of tasks, similarly for ftrace dump (we may add tracing_stop() and tracing_start() around it) Locally these have been used in our bug chasing for stability issues and were helpful. As Andrew suggested, add a configurable global 'kernel_sys_info' knob. When error scenarios like panic/hung-task/lockup etc doesn't setup their own sys_info knob and calls sys_info() with parameter "0", this global knob will take effect. It could be used for other kernel cases like OOM, which may not need one dedicated sys_info knob. This patch (of 4): Some sys_info names wered forgotten to change in patch iterations, while the right names are defined in kernel/sys_info.c. Link: https://lkml.kernel.org/r/20251113111039.22701-1-feng.tang@linux.alibaba.com Link: https://lkml.kernel.org/r/20251113111039.22701-2-feng.tang@linux.alibaba.com Fixes: `d747755917` ("panic: add 'panic_sys_info' sysctl to take human readable string parameter") Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Kuan-Wei Chiu	9ab38c5216	Revert "lib/plist.c: enforce memory ordering in plist_check_list" This reverts commit `7abcb84f95`. The introduction of WRITE_ONCE() calls for the 'prev' and 'next' variables inside plist_check_list() was a misapplication. WRITE_ONCE() is fundamentally a compiler barrier designed to prevent compiler optimizations (like caching or reordering) on shared memory locations. However, the variables 'prev' and 'next' are local, stack-allocated pointers accessed only by the current thread's invocation of the function. Since these pointers are thread-local and are never accessed concurrently, applying WRITE_ONCE() to them is semantically incorrect and unnecessary. Furthermore, the use of WRITE_ONCE() on local variables prevents the compiler from performing standard optimizations, such as keeping these variables cached solely in CPU registers throughout the loop, potentially introducing performance overhead. Restore the conventional C assignment for local loop variables, allowing the compiler to generate optimal code. Link: https://lkml.kernel.org/r/20251113193413.499309-1-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: I Hsin Cheng <richard120310@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Gustavo Padovan	b50144900a	MAINTAINERS: remove Gustavo from sync framework I haven't been involved in the work anymore for some time. It is only fair that I remove myself from it and let other continue to take care of it. Link: https://lkml.kernel.org/r/20251112134330.64130-1-gustavo.padovan@collabora.com Signed-off-by: Gustavo Padovan <gustavo@padovan.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Xie Yuanbin	242b872239	include/linux/once_lite.h: fix judgment in WARN_ONCE with clang For c code: ```c extern int xx; void test(void) { if (WARN_ONCE(xx, "x")) __asm__ volatile ("nop":::); } ``` Clang will generate the following assembly code: ```assemble test: movl xx(%rip), %eax // Assume xx == 0 (likely case) testl %eax, %eax // judge once je .LBB0_3 // jump to .LBB0_3 testb $1, test.__already_done(%rip) je .LBB0_2 .LBB0_3: testl %eax, %eax // judge again je .LBB0_5 // jump to .LBB0_5 .LBB0_4: nop .LBB0_5: retq // omit ``` In the above code, `xx == 0` should be a likely case, but in this case, xx has been judged twice. Test info: 1. kernel source: linux-next commit 9c0826a5d9aa4d52206d ("Add linux-next specific files for 20251107") 2. compiler: clang: Debian clang version 21.1.4 (8) with Debian LLD 21.1.4 (compatible with GNU linkers) 3. config: base on default x86_64_defconfig, and setting: CONFIG_MITIGATION_RETHUNK=n CONFIG_STACKPROTECTOR=n Add unlikely to __ret_cond to help the compiler optimize correctly. [akpm@linux-foundation.org: undo whitespace changes] Link: https://lkml.kernel.org/r/20251109083715.24495-1-qq570070308@gmail.com Signed-off-by: Xie Yuanbin <qq570070308@gmail.com> Cc: Bill Wendling <morbo@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Justin Stitt <justinstitt@google.com> Cc: Maninder Singh <maninder1.s@samsung.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Ryusuke Konishi	1ab980e90c	MAINTAINERS: update nilfs2 entry Viacheslav has kindly offered to help with the maintenance of nilfs2 by upstreaming patches, similar to the HFS/HFS+ tree. I've accepted his offer, and will therefore add him as a co-maintainer and switch the project's git tree for that role. At the same time, change the outdated status field to Maintained to reflect the current state. Link: https://lkml.kernel.org/r/20251107153530.9023-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Acked-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
zhang jiao	14954cd190	fs/proc/page: remove unused KPMBITS KPMBITS is never referenced in the code. Just remove it. Link: https://lkml.kernel.org/r/20251106010735.1603-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liu Ye <liuye@kylinos.cn> Cc: Luiz Capitulino <luizcap@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
Andy Shevchenko	f3fb126fdc	math.h: amend abs() kernel-doc and add a note about signed type limits - amend the kernel-doc so the description is decoupled from the parameter descriptions. - add a note to explain behaviour for the signed types when supplied value is the minimum (e.g., INT_MIN for int type). Link: https://lkml.kernel.org/r/20251106152051.2361551-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
Ilya Leoshkevich	581ee79a25	scripts/gdb/symbols: make BPF debug info available to GDB One can debug BPF programs with QEMU gdbstub by setting a breakpoint on bpf_prog_kallsyms_add(), waiting for a hit with a matching aux.name, and then setting a breakpoint on bpf_func. This is tedious, error-prone, and also lacks line numbers. Automate this in a way similar to the existing support for modules in lx-symbols. Enumerate and monitor changes to both BPF kallsyms and JITed progs. For each ksym, generate and compile a synthetic .s file containing the name, code, and size. In addition, if this ksym is also a prog, and not a trampoline, add line number information. Ensure that this is a no-op if the kernel is built without BPF support or if "as" is missing. In theory the "as" dependency may be dropped by generating the synthetic .o file manually, but this is too much complexity for too little benefit. Now one can debug BPF progs out of the box like this: (gdb) lx-symbols -bpf (gdb) b bpf_prog_4e612a6a881a086b_arena_list_add Breakpoint 2 (bpf_prog_4e612a6a881a086b_arena_list_add) pending. # ./test_progs -t arena_list Thread 4 hit Breakpoint 2, bpf_prog_4e612a6a881a086b_arena_list_add () at linux/tools/testing/selftests/bpf/progs/arena_list.c:51 51 list_head = &global_head; (gdb) n bpf_prog_4e612a6a881a086b_arena_list_add () at linux/tools/testing/selftests/bpf/progs/arena_list.c:53 53 for (i = zero; i < cnt && can_loop; i++) { This also works for subprogs. Link: https://lkml.kernel.org/r/20251106124600.86736-3-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
Ilya Leoshkevich	caa71919a6	scripts/gdb/radix-tree: add lx-radix-tree-command Patch series "scripts/gdb/symbols: make BPF debug info available to GDB", v2. This series greatly simplifies debugging BPF progs when using QEMU gdbstub by providing symbol names, sizes, and line numbers to GDB. Patch 1 adds radix tree iteration, which is necessary for parsing prog_idr. Patch 2 is the actual implementation; its description contains some details on how to use this. This patch (of 2): Add a function and a command to iterate over radix tree contents. Duplicate the C implementation in Python, but drop support for tagging. Link: https://lkml.kernel.org/r/20251106124600.86736-1-iii@linux.ibm.com Link: https://lkml.kernel.org/r/20251106124600.86736-2-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
Pratyush Yadav	c9dddd9816	MAINTAINERS: add Pratyush as a reviewer for KHO I have been reviewing most patches for KHO already, and it is easier to spot them if I am directly in Cc. Link: https://lkml.kernel.org/r/20251105102022.18798-1-pratyush@kernel.org Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
David Laight	1d1ef8c1fb	lib: test_mul_u64_u64_div_u64(): test the 32bit code on 64bit There are slight differences in the mul_u64_add_u64_div_u64() code between 32bit and 64bit systems. Compile and test the 32bit version on 64bit hosts for better test coverage. Link: https://lkml.kernel.org/r/20251105201035.64043-10-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
David Laight	d10bb374c4	lib: mul_u64_u64_div_u64(): optimise the divide code Replace the bit by bit algorithm with one that generates 16 bits per iteration on 32bit architectures and 32 bits on 64bit ones. On my zen 5 this reduces the time for the tests (using the generic code) from ~3350ns to ~1000ns. Running the 32bit algorithm on 64bit x86 takes ~1500ns. It'll be slightly slower on a real 32bit system, mostly due to register pressure. The savings for 32bit x86 are much higher (tested in userspace). The worst case (lots of bits in the quotient) drops from ~900 clocks to ~130 (pretty much independant of the arguments). Other 32bit architectures may see better savings. It is possibly to optimise for divisors that span less than __LONG_WIDTH__/2 bits. However I suspect they don't happen that often and it doesn't remove any slow cpu divide instructions which dominate the result. Typical improvements for 64bit random divides: old new sandy bridge: 470 150 haswell: 400 144 piledriver: 960 467 I think rdpmc is very slow. zen5: 244 80 (Timing is 'rdpmc; mul_div(); rdpmc' with the multiply depending on the first rdpmc and the second rdpmc depending on the quotient.) Object code (64bit x86 test program): old 0x173 new 0x141. Link: https://lkml.kernel.org/r/20251105201035.64043-9-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
David Laight	630f96a687	lib: mul_u64_u64_div_u64(): optimise multiply on 32bit x86 gcc generates horrid code for both ((u64)u32_a * u32_b) and (u64_a + u32_b). As well as the extra instructions it can generate a lot of spills to stack (including spills of constant zeros and even multiplies by constant zero). mul_u32_u32() already exists to optimise the multiply. Add a similar add_u64_32() for the addition. Disable both for clang - it generates better code without them. Move the 64x64 => 128 multiply into a static inline helper function for code clarity. No need for the a/b_hi/lo variables, the implicit casts on the function calls do the work for us. Should have minimal effect on the generated code. Use mul_u32_u32() and add_u64_u32() in the 64x64 => 128 multiply in mul_u64_add_u64_div_u64(). Link: https://lkml.kernel.org/r/20251105201035.64043-8-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
David Laight	f0bff2eb04	lib: test_mul_u64_u64_div_u64(): test both generic and arch versions Change the #if in div64.c so that test_mul_u64_u64_div_u64.c can compile and test the generic version (including the 'long multiply') on architectures (eg amd64) that define their own copy. Test the kernel version and the locally compiled version on all arch. Output the time taken (in ns) on the 'test completed' trace. For reference, on my zen 5, the optimised version takes ~220ns and the generic version ~3350ns. Using the native multiply saves ~200ns and adding back the ilog2() 'optimisation' test adds ~50ms. Link: https://lkml.kernel.org/r/20251105201035.64043-7-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
David Laight	500db21917	lib: add tests for mul_u64_u64_div_u64_roundup() Replicate the existing mul_u64_u64_div_u64() test cases with round up. Update the shell script that verifies the table, remove the comment markers so that it can be directly pasted into a shell. Rename the divisor from 'c' to 'd' to match mul_u64_add_u64_div_u64(). It any tests fail then fail the module load with -EINVAL. Link: https://lkml.kernel.org/r/20251105201035.64043-6-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:42 -08:00
David Laight	6480241f31	lib: add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() The existing mul_u64_u64_div_u64() rounds down, a 'rounding up' variant needs 'divisor - 1' adding in between the multiply and divide so cannot easily be done by a caller. Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d and implement the 'round down' and 'round up' using it. Update the x86-64 asm to optimise for 'c' being a constant zero. Add kerndoc definitions for all three functions. Link: https://lkml.kernel.org/r/20251105201035.64043-5-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:41 -08:00
David Laight	d91f891d58	lib: mul_u64_u64_div_u64(): simplify check for a 64bit product If the product is only 64bits div64_u64() can be used for the divide. Replace the pre-multiply check (ilog2(a) + ilog2(b) <= 62) with a simple post-multiply check that the high 64bits are zero. This has the advantage of being simpler, more accurate and less code. It will always be faster when the product is larger than 64bits. Most 64bit cpu have a native 64x64=128 bit multiply, this is needed (for the low 64bits) even when div64_u64() is called - so the early check gains nothing and is just extra code. 32bit cpu will need a compare (etc) to generate the 64bit ilog2() from two 32bit bit scans - so that is non-trivial. (Never mind the mess of x86's 'bsr' and any oddball cpu without fast bit-scan instructions.) Whereas the additional instructions for the 128bit multiply result are pretty much one multiply and two adds (typically the 'adc $0,%reg' can be run in parallel with the instruction that follows). The only outliers are 64bit systems without 128bit mutiply and simple in order 32bit ones with fast bit scan but needing extra instructions to get the high bits of the multiply result. I doubt it makes much difference to either, the latter is definitely not mainstream. If anyone is worried about the analysis they can look at the generated code for x86 (especially when cmov isn't used). Link: https://lkml.kernel.org/r/20251105201035.64043-4-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:41 -08:00
David Laight	08092babd3	lib: mul_u64_u64_div_u64(): combine overflow and divide by zero checks Since the overflow check always triggers when the divisor is zero move the check for divide by zero inside the overflow check. This means there is only one test in the normal path. Link: https://lkml.kernel.org/r/20251105201035.64043-3-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:41 -08:00
David Laight	5944f875ac	lib: mul_u64_u64_div_u64(): rename parameter 'c' to 'd' Patch series "Implement mul_u64_u64_div_u64_roundup()", v5. The pwm-stm32.c code wants a 'rounding up' version of mul_u64_u64_div_u64(). This can be done simply by adding 'divisor - 1' to the 128bit product. Implement mul_u64_add_u64_div_u64(a, b, c, d) = (a * b + c)/d based on the existing code. Define mul_u64_u64_div_u64(a, b, d) as mul_u64_add_u64_div_u64(a, b, 0, d) and mul_u64_u64_div_u64_roundup(a, b, d) as mul_u64_add_u64_div_u64(a, b, d-1, d). Only x86-64 has an optimsed (asm) version of the function. That is optimised to avoid the 'add c' when c is known to be zero. In all other cases the extra code will be noise compared to the software divide code. The test module has been updated to test mul_u64_u64_div_u64_roundup() and also enhanced it to verify the C division code on x86-64 and the 32bit division code on 64bit. This patch (of 9): Change to prototype from mul_u64_u64_div_u64(u64 a, u64 b, u64 c) to mul_u64_u64_div_u64(u64 a, u64 b, u64 d). Using 'd' for 'divisor' makes more sense. An upcoming change adds a 'c' parameter to calculate (a * b + c)/d. Link: https://lkml.kernel.org/r/20251105201035.64043-1-david.laight.linux@gmail.com Link: https://lkml.kernel.org/r/20251105201035.64043-2-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:41 -08:00
Christoph Hellwig	af9b65d686	kernel/hung_task: unexport sysctl_hung_task_timeout_secs This was added by the bcachefs pull requests despite various objections, and with bcachefs removed is now unused. This reverts commit `5c3273ec3c` ("kernel/hung_task.c: export sysctl_hung_task_timeout_secs"). Link: https://lkml.kernel.org/r/20251104121920.2430568-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:41 -08:00
Andy Shevchenko	bd97c97641	util_macros.h: fix kernel-doc for u64_to_user_ptr() The added documentation to u64_to_user_ptr() misspelled the function name. Fix it. Link: https://lkml.kernel.org/r/20251104183834.1046584-1-andriy.shevchenko@linux.intel.com Fixes: `029c896c41` ("kernel.h: move PTR_IF() and u64_to_user_ptr() to util_macros.h") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexandru Ardelean <aardelean@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:41 -08:00

1 2 3 4 5 ...

1398010 Commits