Commit Graph

10251 Commits

Author SHA1 Message Date
Eric Biggers
16f2c30e29 lib/crypto: x86/poly1305: Fix register corruption in no-SIMD contexts
Restore the SIMD usability check and base conversion that were removed
by commit 318c53ae02 ("crypto: x86/poly1305 - Add block-only
interface").

This safety check is cheap and is well worth eliminating a footgun.
While the Poly1305 functions should not be called when SIMD registers
are unusable, if they are anyway, they should just do the right thing
instead of corrupting random tasks' registers and/or computing incorrect
MACs.  Fixing this is also needed for poly1305_kunit to pass.

Just use irq_fpu_usable() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.

Fixes: 318c53ae02 ("crypto: x86/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250706231100.176113-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-11 14:29:42 -07:00
Eric Biggers
eec76ea5a7 lib/crypto: arm64/poly1305: Fix register corruption in no-SIMD contexts
Restore the SIMD usability check that was removed by commit a59e5468a9
("crypto: arm64/poly1305 - Add block-only interface").

This safety check is cheap and is well worth eliminating a footgun.
While the Poly1305 functions should not be called when SIMD registers
are unusable, if they are anyway, they should just do the right thing
instead of corrupting random tasks' registers and/or computing incorrect
MACs.  Fixing this is also needed for poly1305_kunit to pass.

Just use may_use_simd() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.

Fixes: a59e5468a9 ("crypto: arm64/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250706231100.176113-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-11 14:29:42 -07:00
Eric Biggers
52c3e242f4 lib/crypto: arm/poly1305: Fix register corruption in no-SIMD contexts
Restore the SIMD usability check that was removed by commit 773426f477
("crypto: arm/poly1305 - Add block-only interface").

This safety check is cheap and is well worth eliminating a footgun.
While the Poly1305 functions should not be called when SIMD registers
are unusable, if they are anyway, they should just do the right thing
instead of corrupting random tasks' registers and/or computing incorrect
MACs.  Fixing this is also needed for poly1305_kunit to pass.

Just use may_use_simd() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.

Fixes: 773426f477 ("crypto: arm/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250706231100.176113-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-11 14:29:42 -07:00
Sergio González Collado
34db4fba81 kunit: fix longest symbol length test
The kunit test that checks the longests symbol length [1], has triggered
warnings in some pilelines when symbol prefixes are used [2][3]. The test
will to depend on !PREFIX_SYMBOLS and !CFI_CLANG as sujested in [4] and
on !GCOV_KERNEL.

[1] https://lore.kernel.org/rust-for-linux/CABVgOSm=5Q0fM6neBhxSbOUHBgNzmwf2V22vsYC10YRBT=kN1g@mail.gmail.com/T/#t
[2] https://lore.kernel.org/all/20250328112156.2614513-1-arnd@kernel.org/T/#u
[3] https://lore.kernel.org/rust-for-linux/bbd03b37-c4d9-4a92-9be2-75aaf8c19815@infradead.org/T/#t
[4] https://lore.kernel.org/linux-kselftest/20250427200916.GA1661412@ax162/T/#t

Link: https://lore.kernel.org/r/20250706201855.232451-1-sergio.collado@gmail.com
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Sergio González Collado <sergio.collado@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-07-10 14:02:07 -06:00
Sebastian Andrzej Siewior
2885daf470 lib/smp_processor_id: Make migration check unconditional of SMP
Commit cac5cefbad ("sched/smp: Make SMP unconditional")
migrate_disable() even on UP builds.
Commit 06ddd17521 ("sched/smp: Always define is_percpu_thread() and
scheduler_ipi()") made is_percpu_thread() check the affinity mask
instead replying always true for UP mask.

As a consequence smp_processor_id() now complains if invoked within a
migrate_disable() section because is_percpu_thread() checks its mask and
the migration check is left out.

Make migration check unconditional of SMP.

Fixes: cac5cefbad ("sched/smp: Make SMP unconditional")
Closes: https://lore.kernel.org/oe-lkp/202507100448.6b88d6f1-lkp@intel.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/r/20250710082748.-DPO1rjO@linutronix.de
2025-07-10 17:52:13 +02:00
Arnd Bergmann
caf728dfa7 lib: test_objagg: split test_hints_case() into two functions
With sanitizers enabled, this function uses a lot of stack, causing a
harmless warning:

lib/test_objagg.c: In function 'test_hints_case.constprop':
lib/test_objagg.c:994:1: error: the frame size of 1440 bytes is larger than 1408 bytes [-Werror=frame-larger-than=]

Most of this is from the two 'struct world' structures.  Since most of the
work in this function is duplicated for the two, split it up into separate
functions that each use one of them.

The combined stack usage is still the same here, but there is no warning
any more, and the code is still safe because of the known call chain.

Link: https://lkml.kernel.org/r/20250620111907.3395296-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:55 -07:00
Herbert Xu
1857fcc847 lib/raid6: replace custom zero page with ZERO_PAGE
Use the system-wide zero page instead of a custom zero page.

[herbert@gondor.apana.org.au: update lib/raid6/recov_rvv.c, per Klara]
  Link: https://lkml.kernel.org/r/aFkUnXWtxcgOTVkw@gondor.apana.org.au
Link: https://lkml.kernel.org/r/Z9flJNkWQICx0PXk@gondor.apana.org.au
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Song Liu <song@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:54 -07:00
Nicolas Pitre
e795000e75 mul_u64_u64_div_u64: fix the division-by-zero behavior
The current implementation forces a compile-time 1/0 division, which
generates an undefined instruction (ud2 on x86) rather than a proper
runtime division-by-zero exception.

Change to trigger an actual div-by-0 exception at runtime, consistent with
other division operations.  Use a non-1 dividend to prevent the compiler
from optimizing the division into a comparison.

Link: https://lkml.kernel.org/r/q246p466-1453-qon9-29so-37105116009q@onlyvoer.pbz
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Cc: Biju Das <biju.das.jz@bp.renesas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:53 -07:00
Liam R. Howlett
1f0bce2fa8 maple_tree: add testing for restoring maple state to active
Restoring maple status to ma_active on overflow/underflow when mas->node
was NULL could have happened in the past, but was masked by a bug in
mas_walk().  Add test cases that triggered the bug when the node was
mas->node prior to fixing the maple state setup.

Add a few extra tests around restoring the active maple status.

Link: https://lore.kernel.org/all/202506191556.6bfc7b93-lkp@intel.com/
Link: https://lkml.kernel.org/r/20250624154823.52221-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:22 -07:00
Liam R. Howlett
12f1d79312 maple_tree: fix status setup on restore to active
During the initial call with a maple state, an error status may be set
before a valid node is populated into the maple state node.  Subsequent
calls with the maple state may restore the state into an active state with
no node set.  This was masked by the mas_walk() always resetting the
status to ma_start and result in an extra walk in this rare scenario.

Don't restore the state to active unless there is a value in the structs
node.  This also allows mas_walk() to be fixed to use the active state
without exposing an issue.

User visible results are marginal performance improvements when an active
state can be restored and used instead of rewalking the tree.

Stable is not Cc'ed because the existing code is stable and the
performance gains are not worth the risk.

Link: https://lore.kernel.org/all/20250611011253.19515-1-richard.weiyang@gmail.com/
Link: https://lore.kernel.org/all/20250407231354.11771-1-richard.weiyang@gmail.com/
Link: https://lore.kernel.org/all/202506191556.6bfc7b93-lkp@intel.com/
Link: https://lkml.kernel.org/r/20250624154823.52221-1-Liam.Howlett@oracle.com
Fixes: a8091f039c ("maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Wei Yang <richard.weiyang@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506191556.6bfc7b93-lkp@intel.com
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:22 -07:00
Uladzislau Rezki (Sony)
d8e77a0b63 lib/test_vmalloc.c: restrict default test mask to avoid test warnings
When the vmalloc test is built into the kernel, it runs automatically
during the boot.  The current-default "run_test_mask" includes all test
cases, including those which are designed to fail and which trigger kernel
warnings.

These kernel splats can be misinterpreted as actual kernel bugs, leading
to false alarms and unnecessary reports.

To address this, limit the default test mask to only the first few tests
which are expected to pass cleanly.  These tests are safe and should not
generate any warnings unless there is a real bug.

Users who wish to explicitly run specific test cases have to pass the
run_test_mask as a boot parameter or at module load time.

Link: https://lkml.kernel.org/r/20250623184035.581229-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: David Wang <00107082@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:21 -07:00
Uladzislau Rezki (Sony)
2e72850549 lib/test_vmalloc.c: use late_initcall() if built-in for init ordering
When the vmalloc test code is compiled as a built-in, use late_initcall()
instead of module_init() to defer a vmalloc test execution until most
subsystems are up and running.

It avoids interfering with components that may not yet be initialized at
module_init() time.  For example, there was a recent report of memory
profiling infrastructure not being ready early enough leading to kernel
crash.

By using late_initcall() in the built-in case, we ensure the tests are run
at a safer point during a boot sequence.

Link: https://lkml.kernel.org/r/20250623184035.581229-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: David Wang <00107082@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:21 -07:00
Dev Jain
592b939b59 maple tree: use goto label to simplify code
Use the underflow goto label to set the status to ma_underflow and return
NULL, as is being done elsewhere.

[akpm@linux-foundation.org: add newline, per Liam (and remove one, per akpm)]
Link: https://lkml.kernel.org/r/20250624080748.4855-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:19 -07:00
Hao Ge
59b5ed409d mm/percpu: conditionally define _shared_alloc_tag via CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU
Recently discovered this entry while checking kallsyms on ARM64:
ffff800083e509c0 D _shared_alloc_tag

If ARCH_NEEDS_WEAK_PER_CPU is not defined(it is only defined for s390 and
alpha architectures), there's no need to statically define the percpu
variable _shared_alloc_tag.

Therefore, we need to implement isolation for this purpose.

When building the core kernel code for s390 or alpha architectures,
ARCH_NEEDS_WEAK_PER_CPU remains undefined (as it is gated by #if
defined(MODULE)).  However, when building modules for these architectures,
the macro is explicitly defined.

Therefore, we remove all instances of ARCH_NEEDS_WEAK_PER_CPU from the
code and introduced CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU to replace the
relevant logic.  We can now conditionally define the perpcu variable
_shared_alloc_tag based on CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU.  This
allows architectures (such as s390/alpha) that require weak definitions
for percpu variables in modules to include the definition, while others
can omit it via compile-time exclusion.

Link: https://lkml.kernel.org/r/20250618015809.1235761-1-hao.ge@linux.dev
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>	[s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Chistoph Lameter <cl@linux.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:15 -07:00
David Wang
9f44df50fe alloc_tag: keep codetag iterator active between read()
When reading /proc/allocinfo, for each read syscall, seq_file would invoke
start/stop callbacks.  In start callback, a memory is alloced to store
iterator and the iterator would start from beginning to walk linearly to
current read position.

seq_file read() takes at most 4096 bytes, even if read with a larger user
space buffer, meaning read out all of /proc/allocinfo, tens of read
syscalls are needed.  For example, a 306036 bytes allocinfo files need 76
reads:

 $ sudo cat /proc/allocinfo  | wc
    3964   16678  306036
 $ sudo strace -T -e read cat /proc/allocinfo
 ...
 read(3, "        4096        1 arch/x86/k"..., 131072) = 4063 <0.000062>
 ...
 read(3, "           0        0 sound/core"..., 131072) = 4021 <0.000150>
 ...
For those n=3964 lines, each read takes about m=3964/76=52 lines,
since iterator restart from beginning for each read(),
it would move forward
   m  steps on 1st read
 2*m  steps on 2nd read
 3*m  steps on 3rd read
 ...
   n  steps on last read
As read() along, those linear seek steps make read() calls slower and
slower.  Adding those up, codetag iterator moves about O(n*n/m) steps,
making data structure traversal take significant part of the whole
reading.  Profiling when stress reading /proc/allocinfo confirms it:

 vfs_read(99.959% 1677299/1677995)
     proc_reg_read_iter(99.856% 1674881/1677299)
         seq_read_iter(99.959% 1674191/1674881)
             allocinfo_start(75.664% 1266755/1674191)
                 codetag_next_ct(79.217% 1003487/1266755)  <---
                 srso_return_thunk(1.264% 16011/1266755)
                 __kmalloc_cache_noprof(0.102% 1296/1266755)
                 ...
             allocinfo_show(21.287% 356378/1674191)
             allocinfo_next(1.530% 25621/1674191)
codetag_next_ct() takes major part.

A private data alloced at open() time can be used to carry iterator alive
across read() calls, and avoid the memory allocation and iterator reset
for each read().  This way, only O(1) memory allocation and O(n) steps
iterating, and `time` shows performance improvement from ~7ms to ~4ms. 
Profiling with the change:

 vfs_read(99.865% 1581073/1583214)
     proc_reg_read_iter(99.485% 1572934/1581073)
         seq_read_iter(99.846% 1570519/1572934)
             allocinfo_show(87.428% 1373074/1570519)
                 seq_buf_printf(83.695% 1149196/1373074)
                 seq_buf_putc(1.917% 26321/1373074)
                 _find_next_bit(1.531% 21023/1373074)
                 ...
                 codetag_to_text(0.490% 6727/1373074)
                 ...
             allocinfo_next(6.275% 98543/1570519)
             ...
             allocinfo_start(0.369% 5790/1570519)
             ...
Now seq_buf_printf() takes major part.

Link: https://lkml.kernel.org/r/20250609064408.112783-1-00107082@163.com
Signed-off-by: David Wang <00107082@163.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:06 -07:00
David Wang
b0da7709c2 alloc_tag: add sequence number for module and iterator
Codetag iterator use <id,address> pair to guarantee the validness.  But
both id and address can be reused, there is theoretical possibility when
module inserted right after another module removed, kmalloc returns an
address same as the address kfree by previous module and IDR key reuses
the key recently removed.

Add a sequence number to codetag_module and code_iterator, the sequence
number is strickly incremented whenever a module is loaded.  An iterator
is valid if and only if its sequence number match codetag_module's.

Link: https://lkml.kernel.org/r/20250609064200.112639-1-00107082@163.com
Signed-off-by: David Wang <00107082@163.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:06 -07:00
Arnd Bergmann
6046a3bed1 lib/test_hmm: reduce stack usage
The various test ioctl handlers use arrays of 64 integers that add up to
1KiB of stack data, which in turn leads to exceeding the warning limit in
some configurations:

lib/test_hmm.c:935:12: error: stack frame size (1408) exceeds limit (1280)
in 'dmirror_migrate_to_device' [-Werror,-Wframe-larger-than]

Use half the size for these arrays, in order to stay under the warning
limits.  The code can already deal with arbitrary lengths, but this may be
a little less efficient.

Link: https://lkml.kernel.org/r/20250610092159.2639515-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:05 -07:00
Dev Jain
1e6b17b423 xarray: add a BUG_ON() to ensure caller is not sibling
Suppose xas is pointing somewhere near the end of the multi-entry batch. 
Then it may happen that the computed slot already falls beyond the batch,
thus breaking the loop due to !xa_is_sibling(), and computing the wrong
order.

For example, suppose we have a shift-6 node having an order-9 entry => 8 -
1 = 7 siblings, so assume the slots are at offset 0 till 7 in this node. 
If xas->xa_offset is 6, then the code will compute order as 1 +
xas->xa_node->shift = 7.  Therefore, the order computation must start from
the beginning of the multi-slot entries, that is, the non-sibling entry.

Thus ensure that the caller is aware of this by triggering a BUG when the
entry is a sibling entry.  Note that this BUG_ON() is only active while
running selftests, so there is no overhead in a running kernel.

Link: https://lkml.kernel.org/r/20250604041533.91198-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:02 -07:00
Wei Yang
ea9b77f98d maple_tree: fix mt_destroy_walk() on root leaf node
On destroy, we should set each node dead.  But current code miss this when
the maple tree has only the root node.

The reason is mt_destroy_walk() leverage mte_destroy_descend() to set node
dead, but this is skipped since the only root node is a leaf.

Fixes this by setting the node dead if it is a leaf.

Link: https://lore.kernel.org/all/20250407231354.11771-1-richard.weiyang@gmail.com/
Link: https://lkml.kernel.org/r/20250624191841.64682-1-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 21:07:52 -07:00
Harry Yoo
99af22cd34 lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock even
when the alloc_tag_cttype is not allocated because:

  1) alloc tagging is disabled because mem profiling is disabled
     (!alloc_tag_cttype)
  2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
  3) alloc tagging is enabled, but failed initialization
     (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))

In all cases, alloc_tag_cttype is not allocated, and therefore
alloc_tag_top_users() should not attempt to acquire the semaphore.

This leads to a crash on memory allocation failure by attempting to
acquire a non-existent semaphore:

  Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
  CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G      D             6.16.0-rc2 #1 VOLUNTARY
  Tainted: [D]=DIE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
  RIP: 0010:down_read_trylock+0xaa/0x3b0
  Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
  RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
  RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
  RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
  RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
  R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
  R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
  FS:  00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   codetag_trylock_module_list+0xd/0x20
   alloc_tag_top_users+0x369/0x4b0
   __show_mem+0x1cd/0x6e0
   warn_alloc+0x2b1/0x390
   __alloc_frozen_pages_noprof+0x12b9/0x21a0
   alloc_pages_mpol+0x135/0x3e0
   alloc_slab_page+0x82/0xe0
   new_slab+0x212/0x240
   ___slab_alloc+0x82a/0xe00
   </TASK>

As David Wang points out, this issue became easier to trigger after commit
780138b123 ("alloc_tag: check mem_profiling_support in alloc_tag_init").

Before the commit, the issue occurred only when it failed to allocate and
initialize alloc_tag_cttype or if a memory allocation fails before
alloc_tag_init() is called.  After the commit, it can be easily triggered
when memory profiling is compiled but disabled at boot.

To properly determine whether alloc_tag_init() has been called and its
data structures initialized, verify that alloc_tag_cttype is a valid
pointer before acquiring the semaphore.  If the variable is NULL or an
error value, it has not been properly initialized.  In such a case, just
skip and do not attempt to acquire the semaphore.

[harry.yoo@oracle.com: v3]
  Link: https://lkml.kernel.org/r/20250624072513.84219-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250620195305.1115151-1-harry.yoo@oracle.com
Fixes: 780138b123 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Fixes: 1438d349d1 ("lib: add memory allocations report in show_mem()")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Acked-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: David Wang <00107082@163.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Yuanyuan Zhong <yzhong@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 21:07:52 -07:00
Thomas Weißschuh
34f888e340 vdso/gettimeofday: Return bool from clock_getres() helpers
The internal helpers are effectively using boolean results,
while pretending to use error numbers.

Switch the return type to bool for more clarity.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250701-vdso-auxclock-v1-5-df7d9f87b9b8@linutronix.de
2025-07-09 11:52:34 +02:00
Yury Norov [NVIDIA]
c56f97c5c7 bitmap: generalize node_random()
Generalize node_random() and make it available to general bitmaps and
cpumasks users.

Notice, find_first_bit() is generally faster than find_nth_bit(), and we
employ it when there's a single set bit in the bitmap.

See commit 3e061d924f ("lib/nodemask: optimize node_random for
nodemask with single NUMA node").

CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>
2025-07-08 19:11:57 -04:00
Eric Biggers
aacb37f597 lib/crypto: hash_info: Move hash_info.c into lib/crypto/
crypto/hash_info.c just contains a couple of arrays that map HASH_ALGO_*
algorithm IDs to properties of those algorithms.  It is compiled only
when CRYPTO_HASH_INFO=y, but currently CRYPTO_HASH_INFO depends on
CRYPTO.  Since this can be useful without the old-school crypto API,
move it into lib/crypto/ so that it no longer depends on CRYPTO.

This eliminates the need for FS_VERITY to select CRYPTO after it's been
converted to use lib/crypto/.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630172224.46909-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-08 12:03:44 -07:00
Eric Biggers
57b15e9260 lib/crypto: x86/sha256: Remove unnecessary checks for nblocks==0
Since sha256_blocks() is called only with nblocks >= 1, remove
unnecessary checks for nblocks == 0 from the x86 SHA-256 assembly code.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250704023958.73274-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:23:56 -07:00
Eric Biggers
a8c60a9aca lib/crypto: x86/sha256: Move static_call above kernel-mode FPU section
As I did for sha512_blocks(), reorganize x86's sha256_blocks() to be
just a static_call.  To achieve that, for each assembly function add a C
function that handles the kernel-mode FPU section and fallback.  While
this increases total code size slightly, the amount of code actually
executed on a given system does not increase, and it is slightly more
efficient since it eliminates the extra static_key.  It also makes the
assembly functions be called with standard direct calls instead of
static calls, eliminating the need for ANNOTATE_NOENDBR.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250704023958.73274-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:23:55 -07:00
Eric Biggers
773d2b99bb lib/crypto: sha256: Sync sha256_update() with sha512_update()
The BLOCK_HASH_UPDATE_BLOCKS macro is difficult to read.  For now, let's
just write the update explicitly in the straightforward way, mirroring
sha512_update().  It's possible that we'll bring back a macro for this
later, but it needs to be properly justified and hopefully a bit more
readable.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:23:11 -07:00
Eric Biggers
e96cb9507f lib/crypto: sha256: Consolidate into single module
Consolidate the CPU-based SHA-256 code into a single module, following
what I did with SHA-512:

- Each arch now provides a header file lib/crypto/$(SRCARCH)/sha256.h,
  replacing lib/crypto/$(SRCARCH)/sha256.c.  The header defines
  sha256_blocks() and optionally sha256_mod_init_arch().  It is included
  by lib/crypto/sha256.c, and thus the code gets built into the single
  libsha256 module, with proper inlining and dead code elimination.

- sha256_blocks_generic() is moved from lib/crypto/sha256-generic.c into
  lib/crypto/sha256.c.  It's now a static function marked with
  __maybe_unused, so the compiler automatically eliminates it in any
  cases where it's not used.

- Whether arch-optimized SHA-256 is buildable is now controlled
  centrally by lib/crypto/Kconfig instead of by
  lib/crypto/$(SRCARCH)/Kconfig.  The conditions for enabling it remain
  the same as before, and it remains enabled by default.

- Any additional arch-specific translation units for the optimized
  SHA-256 code (such as assembly files) are now compiled by
  lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:23:11 -07:00
Eric Biggers
9f9846a72e lib/crypto: sha256: Remove sha256_is_arch_optimized()
Remove sha256_is_arch_optimized(), since it is no longer used.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:23:11 -07:00
Eric Biggers
077833cd60 lib/crypto: sha256: Add HMAC-SHA224 and HMAC-SHA256 support
Since HMAC support is commonly needed and is fairly simple, include it
as a first-class citizen of the SHA-256 library.

The API supports both incremental and one-shot computation, and either
preparing the key ahead of time or just using a raw key.  The
implementation is much more streamlined than crypto/hmac.c.

I've kept it consistent with the HMAC-SHA384 and HMAC-SHA512 code as
much as possible.

Testing of these functions will be via sha224_kunit and sha256_kunit,
added by a later commit.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:23:11 -07:00
Eric Biggers
4c855d5069 lib/crypto: sha256: Propagate sha256_block_state type to implementations
The previous commit made the SHA-256 compression function state be
strongly typed, but it wasn't propagated all the way down to the
implementations of it.  Do that now.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:22:57 -07:00
Eric Biggers
b86ced882b lib/crypto: sha256: Make library API use strongly-typed contexts
Currently the SHA-224 and SHA-256 library functions can be mixed
arbitrarily, even in ways that are incorrect, for example using
sha224_init() and sha256_final().  This is because they operate on the
same structure, sha256_state.

Introduce stronger typing, as I did for SHA-384 and SHA-512.

Also as I did for SHA-384 and SHA-512, use the names *_ctx instead of
*_state.  The *_ctx names have the following small benefits:

- They're shorter.
- They avoid an ambiguity with the compression function state.
- They're consistent with the well-known OpenSSL API.
- Users usually name the variable 'sctx' anyway, which suggests that
  *_ctx would be the more natural name for the actual struct.

Therefore: update the SHA-224 and SHA-256 APIs, implementation, and
calling code accordingly.

In the new structs, also strongly-type the compression function state.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:18:53 -07:00
Eric Biggers
6fa4b29220 lib/crypto: sha256: Add sha224() and sha224_update()
Add a one-shot SHA-224 computation function sha224(), for consistency
with sha256(), sha384(), and sha512() which all already exist.

Similarly, add sha224_update().  While for now it's identical to
sha256_update(), omitting it makes the API harder to use since users
have to "know" which functions are the same between SHA-224 and SHA-256.
Also, this is a prerequisite for using different context types for each.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:18:53 -07:00
Eric Biggers
9f97707bdb lib/crypto: sha256: Remove sha256_blocks_simd()
Instead of having both sha256_blocks_arch() and sha256_blocks_simd(),
instead have just sha256_blocks_arch() which uses the most efficient
implementation that is available in the calling context.

This is simpler, as it reduces the API surface.  It's also safer, since
sha256_blocks_arch() just works in all contexts, including contexts
where the FPU/SIMD/vector registers cannot be used.  This doesn't mean
that SHA-256 computations *should* be done in such contexts, but rather
we should just do the right thing instead of corrupting a random task's
registers.  Eliminating this footgun and simplifying the code is well
worth the very small performance cost of doing the check.

Note: in the case of arm and arm64, what used to be sha256_blocks_arch()
is renamed back to its original name of sha256_block_data_order().
sha256_blocks_arch() is now used for the higher-level dispatch function.
This renaming also required an update to lib/crypto/arm64/sha512.h,
since sha2-armv8.pl is shared by both SHA-256 and SHA-512.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:18:53 -07:00
Eric Biggers
3135d5be7c lib/crypto: sha256: Reorder some code
First, move the declarations of sha224_init/update/final to be just
above the corresponding SHA-256 code, matching the order that I used for
SHA-384 and SHA-512.  In sha2.h, the end result is that SHA-224,
SHA-256, SHA-384, and SHA-512 are all in the logical order.

Second, move sha224_block_init() and sha256_block_init() to be just
below crypto_sha256_state.  In later changes, these functions as well as
struct crypto_sha256_state will no longer be used by the library
functions.  They'll remain just for some legacy offload drivers.  This
gets them into a logical place in the file for that.

No code changes other than reordering.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630160645.3198-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04 10:18:52 -07:00
Paolo Abeni
6b9fd8857b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc5).

No conflicts.

No adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-04 08:03:18 +02:00
Linus Torvalds
17bbde2e17 Merge tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth.

  Current release - new code bugs:

    - eth:
       - txgbe: fix the issue of TX failure
       - ngbe: specify IRQ vector when the number of VFs is 7

  Previous releases - regressions:

    - sched: always pass notifications when child class becomes empty

    - ipv4: fix stat increase when udp early demux drops the packet

    - bluetooth: prevent unintended pause by checking if advertising is active

    - virtio: fix error reporting in virtqueue_resize

    - eth:
       - virtio-net:
          - ensure the received length does not exceed allocated size
          - fix the xsk frame's length check
       - lan78xx: fix WARN in __netif_napi_del_locked on disconnect

  Previous releases - always broken:

    - bluetooth: mesh: check instances prior disabling advertising

    - eth:
       - idpf: convert control queue mutex to a spinlock
       - dpaa2: fix xdp_rxq_info leak
       - amd-xgbe: align CL37 AN sequence as per databook"

* tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  vsock/vmci: Clear the vmci transport packet properly when initializing it
  dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example
  net: ngbe: specify IRQ vector when the number of VFs is 7
  net: wangxun: revert the adjustment of the IRQ vector sequence
  net: txgbe: request MISC IRQ in ndo_open
  virtio_net: Enforce minimum TX ring size for reliability
  virtio_net: Cleanup '2+MAX_SKB_FRAGS'
  virtio_ring: Fix error reporting in virtqueue_resize
  virtio-net: xsk: rx: fix the frame's length check
  virtio-net: use the check_mergeable_len helper
  virtio-net: remove redundant truesize check with PAGE_SIZE
  virtio-net: ensure the received length does not exceed allocated size
  net: ipv4: fix stat increase when udp early demux drops the packet
  net: libwx: fix the incorrect display of the queue number
  amd-xgbe: do not double read link status
  net/sched: Always pass notifications when child class becomes empty
  nui: Fix dma_mapping_error() check
  rose: fix dangling neighbour pointers in rose_rt_device_down()
  enic: fix incorrect MTU comparison in enic_change_mtu()
  amd-xgbe: align CL37 AN sequence as per databook
  ...
2025-07-03 09:18:55 -07:00
Dan Carpenter
e6ed134a4e lib: test_objagg: Set error message in check_expect_hints_stats()
Smatch complains that the error message isn't set in the caller:

    lib/test_objagg.c:923 test_hints_case2()
    error: uninitialized symbol 'errmsg'.

This static checker warning only showed up after a recent refactoring
but the bug dates back to when the code was originally added.  This
likely doesn't affect anything in real life.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202506281403.DsuyHFTZ-lkp@intel.com/
Fixes: 0a020d416d ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/8548f423-2e3b-4bb7-b816-5041de2762aa@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 17:29:00 -07:00
Daniel Wagner
b6139a6abf lib/group_cpus: Let group_cpu_evenly() return the number of initialized masks
group_cpu_evenly() might have allocated less groups then requested:

group_cpu_evenly()
  __group_cpus_evenly()
    alloc_nodes_groups()
      # allocated total groups may be less than numgrps when
      # active total CPU number is less then numgrps

In this case, the caller will do an out of bound access because the
caller assumes the masks returned has numgrps.

Return the number of groups created so the caller can limit the access
range accordingly.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250617-isolcpus-queue-counters-v1-1-13923686b54b@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-01 10:24:11 -06:00
Eric Biggers
1a822ea52a lib/crc: Explicitly include <linux/export.h>
Fix build warnings with W=1 that started appearing after
commit a934a57a42 ("scripts/misc-check: check missing #include
<linux/export.h> when W=1").  While at it, sort the include lists
alphabetically.

Link: https://lore.kernel.org/r/20250612183852.114878-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
61d01fb7af lib/crc: Remove ARCH_HAS_* kconfig symbols
These symbols are no longer used, so remove them.

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
b10749d89f lib/crc: x86: Migrate optimized CRC code into lib/crc/
Move the x86-optimized CRC code from arch/x86/lib/crc* into its new
location in lib/crc/x86/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
9b2d720e8a lib/crc: sparc: Migrate optimized CRC code into lib/crc/
Move the sparc-optimized CRC code from arch/sparc/lib/crc* into its new
location in lib/crc/sparc/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
2374bf2386 lib/crc: s390: Migrate optimized CRC code into lib/crc/
Move the s390-optimized CRC code from arch/s390/lib/crc* into its new
location in lib/crc/s390/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
b5943815e6 lib/crc: riscv: Migrate optimized CRC code into lib/crc/
Move the riscv-optimized CRC code from arch/riscv/lib/crc* into its new
location in lib/crc/riscv/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
190c253d86 lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
Move the powerpc-optimized CRC code from arch/powerpc/lib/crc* into its
new location in lib/crc/powerpc/, and wire it up in the new way.  This
new way of organizing the CRC code eliminates the need to artificially
split the code for each CRC variant into separate arch and generic
modules, enabling better inlining and dead code elimination.  For more
details, see "lib/crc: Prepare for arch-optimized code in subdirs of
lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
da4fd65773 lib/crc: mips: Migrate optimized CRC code into lib/crc/
Move the mips-optimized CRC code from arch/mips/lib/crc* into its new
location in lib/crc/mips/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
b10d2d20d9 lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
Move the loongarch-optimized CRC code from arch/loongarch/lib/crc* into
its new location in lib/crc/loongarch/, and wire it up in the new way.
This new way of organizing the CRC code eliminates the need to
artificially split the code for each CRC variant into separate arch and
generic modules, enabling better inlining and dead code elimination.
For more details, see "lib/crc: Prepare for arch-optimized code in
subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
2b7531b2a2 lib/crc: arm64: Migrate optimized CRC code into lib/crc/
Move the arm64-optimized CRC code from arch/arm64/lib/crc* into its new
location in lib/crc/arm64/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
530b304f00 lib/crc: arm: Migrate optimized CRC code into lib/crc/
Move the arm-optimized CRC code from arch/arm/lib/crc* into its new
location in lib/crc/arm/, and wire it up in the new way.  This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination.  For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00
Eric Biggers
0bcfca5640 lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
Rework how lib/crc/ supports arch-optimized code.  First, instead of the
arch-optimized CRC code being in arch/$(SRCARCH)/lib/, it will now be in
lib/crc/$(SRCARCH)/.  Second, the API functions (e.g. crc32c()),
arch-optimized functions (e.g. crc32c_arch()), and generic functions
(e.g. crc32c_base()) will now be part of a single module for each CRC
type, allowing better inlining and dead code elimination.  The second
change is made possible by the first.

As an example, consider CONFIG_CRC32=m on x86.  We'll now have just
crc32.ko instead of both crc32-x86.ko and crc32.ko.  The two modules
were already coupled together and always both got loaded together via
direct symbol dependency, so the separation provided no benefit.

Note: later I'd like to apply the same design to lib/crypto/ too, where
often the API functions are out-of-line so this will work even better.
In those cases, for each algorithm we currently have 3 modules all
coupled together, e.g. libsha256.ko, libsha256-generic.ko, and
sha256-x86.ko.  We should have just one, inline things properly, and
rely on the compiler's dead code elimination to decide the inclusion of
the generic code instead of manually setting it via kconfig.

Having arch-specific code outside arch/ was somewhat controversial when
Zinc proposed it back in 2018.  But I don't think the concerns are
warranted.  It's better from a technical perspective, as it enables the
improvements mentioned above.  This model is already successfully used
in other places in the kernel such as lib/raid6/.  The community of each
architecture still remains free to work on the code, even if it's not in
arch/.  At the time there was also a desire to put the library code in
the same files as the old-school crypto API, but that was a mistake; now
that the library is separate, that's no longer a constraint either.

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-3-ebiggers@kernel.org
Link: https://lore.kernel.org/r/20250612054514.142728-1-ebiggers@kernel.org
Link: https://lore.kernel.org/r/20250621012221.4351-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30 09:31:57 -07:00