Commit Graph

10551 Commits

Author SHA1 Message Date
Eric Biggers
fa2297750c lib/crypto: arm/aes: Migrate optimized code into library
Move the ARM optimized single-block AES en/decryption code into
lib/crypto/, wire it up to the AES library API, and remove the
superseded "aes-arm" crypto_cipher algorithm.

The result is that both the AES library and crypto_cipher APIs are now
optimized for ARM, whereas previously only crypto_cipher was (and the
optimizations weren't enabled by default, which this fixes as well).

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:39:58 -08:00
Eric Biggers
a22fd0e3c4 lib/crypto: aes: Introduce improved AES library
The kernel's AES library currently has the following issues:

- It doesn't take advantage of the architecture-optimized AES code,
  including the implementations using AES instructions.

- It's much slower than even the other software AES implementations: 2-4
  times slower than "aes-generic", "aes-arm", and "aes-arm64".

- It requires that both the encryption and decryption round keys be
  computed and cached.  This is wasteful for users that need only the
  forward (encryption) direction of the cipher: the key struct is 484
  bytes when only 244 are actually needed.  This missed optimization is
  very common, as many AES modes (e.g. GCM, CFB, CTR, CMAC, and even the
  tweak key in XTS) use the cipher only in the forward (encryption)
  direction even when doing decryption.

- It doesn't provide the flexibility to customize the prepared key
  format.  The API is defined to do key expansion, and several callers
  in drivers/crypto/ use it specifically to expand the key.  This is an
  issue when integrating the existing powerpc, s390, and sparc code,
  which is necessary to provide full parity with the traditional API.
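
The 484-byte and 244-byte figures in the third item above can be reproduced with a small user-space sketch. It assumes the existing key struct holds 60 32-bit round-key words per direction plus a 32-bit key length; the encryption-only struct below is a hypothetical layout for illustration, not the actual definition of the new key types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AES-256 uses 15 round keys of 4 words each, so 60 words is the maximum. */
#define TOY_AES_MAX_ROUND_KEY_WORDS 60

/* Assumed shape of a key struct that caches both directions. */
struct toy_both_directions_key {
	uint32_t key_enc[TOY_AES_MAX_ROUND_KEY_WORDS];
	uint32_t key_dec[TOY_AES_MAX_ROUND_KEY_WORDS];
	uint32_t key_length;
};

/* Hypothetical encryption-only layout (illustration only). */
struct toy_enc_only_key {
	uint32_t key_enc[TOY_AES_MAX_ROUND_KEY_WORDS];
	uint32_t key_length;
};

static_assert(sizeof(struct toy_both_directions_key) == 484,
	      "two round-key arrays plus the key length");
static_assert(sizeof(struct toy_enc_only_key) == 244,
	      "one round-key array plus the key length");

/* Bytes saved per key by users that only need the forward direction. */
size_t toy_bytes_saved_per_key(void)
{
	return sizeof(struct toy_both_directions_key) -
	       sizeof(struct toy_enc_only_key);
}
```

Under these assumptions, the saving is exactly one round-key array, i.e. 240 bytes per key.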

To resolve these issues, I'm proposing the following changes:

1. New structs 'aes_key' and 'aes_enckey' are introduced, with
   corresponding functions aes_preparekey() and aes_prepareenckey().

   Generally these structs will include the encryption+decryption round
   keys and the encryption round keys, respectively.  However, the exact
   format will be under control of the architecture-specific AES code.

   (The verb "prepare" is chosen over "expand" since key expansion isn't
   necessarily done.  It's also consistent with hmac*_preparekey().)

2. aes_encrypt() and aes_decrypt() will be changed to operate on the new
   structs instead of struct crypto_aes_ctx.

3. aes_encrypt() and aes_decrypt() will use architecture-optimized code
   when available, or else fall back to a new generic AES implementation
   that unifies the existing two fragmented generic AES implementations.

   The new generic AES implementation uses tables for both SubBytes and
   MixColumns, making it almost as fast as "aes-generic".  However,
   instead of aes-generic's huge 8192-byte tables per direction, it uses
   only 1024 bytes for encryption and 1280 bytes for decryption (similar
   to "aes-arm").  The cost is just some extra rotations.

   The new generic AES implementation also includes table prefetching,
   making it have some "constant-time hardening".  That's an improvement
   from aes-generic which has no constant-time hardening.

   It does slightly regress in constant-time hardening vs. the old
   lib/crypto/aes.c which had smaller tables, and from aes-fixed-time
   which disabled IRQs on top of that.  But I think this is tolerable.
   The real solutions for constant-time AES are AES instructions or
   bit-slicing.  The table-based code remains a best-effort fallback for
   the increasingly-rare case where a real solution is unavailable.

4. crypto_aes_ctx and aes_expandkey() will remain for now, but only for
   callers that are using them specifically for the AES key expansion
   (as opposed to en/decrypting data with the AES library).

This commit begins the migration process by introducing the new structs
and functions, backed by the new generic AES implementation.

To allow callers to be incrementally converted, aes_encrypt() and
aes_decrypt() are temporarily changed into macros that use a _Generic
expression to call either the old functions (which take crypto_aes_ctx)
or the new functions (which take the new types).  Once all callers have
been updated, these macros will go away, the old functions will be
removed, and the "_new" suffix will be dropped from the new functions.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:39:58 -08:00
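
The temporary _Generic-based dispatch described in the commit above can be sketched in plain C11. All names here (toy_old_ctx, toy_new_key, toy_encrypt and the two helpers) are hypothetical stand-ins and are not the kernel's actual macros or functions; the point is only to show how one call-site name can select a function based on the argument's type.

```c
/* Stand-ins for the old context type and a new key type. */
struct toy_old_ctx { int unused; };
struct toy_new_key { int unused; };

/* Each helper returns a distinct value so the chosen path is observable. */
static inline int toy_encrypt_old(const struct toy_old_ctx *ctx)
{
	(void)ctx;
	return 1;
}

static inline int toy_encrypt_new(const struct toy_new_key *key)
{
	(void)key;
	return 2;
}

/*
 * Select the implementation at compile time from the pointer type.
 * Both const and non-const pointers are listed because _Generic
 * requires an exact type match.
 */
#define toy_encrypt(key)					\
	_Generic((key),						\
		 struct toy_old_ctx *: toy_encrypt_old,		\
		 const struct toy_old_ctx *: toy_encrypt_old,	\
		 struct toy_new_key *: toy_encrypt_new,		\
		 const struct toy_new_key *: toy_encrypt_new)(key)
```

With this pattern, callers can be converted one at a time by changing only the type of the key they pass. Passing any other type is a compile-time error rather than a silent mismatch.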
Eric Biggers
959a634ebc lib/crypto: mldsa: Add FIPS cryptographic algorithm self-test
Since ML-DSA is FIPS-approved, add the boot-time self-test which is
apparently required.

Just add a test vector manually for now, borrowed from
lib/crypto/tests/mldsa-testvecs.h (where in turn it's borrowed from
leancrypto).  The SHA-* FIPS test vectors are generated by
scripts/crypto/gen-fips-testvecs.py instead, but the common Python
libraries don't support ML-DSA yet.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20260107044215.109930-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:50 -08:00
Eric Biggers
0d92c55532 lib/crypto: nh: Restore dependency of arch code on !KMSAN
Since the architecture-specific implementations of NH initialize memory
in assembly code, they aren't compatible with KMSAN as-is.

Fixes: 382de740759a ("lib/crypto: nh: Add NH library")
Link: https://lore.kernel.org/r/20260105053652.1708299-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:50 -08:00
Rusydi H. Makarim
c8bf0b969d lib/crypto: md5: Use rol32() instead of open-coding it
For the bitwise left rotation in MD5STEP, use rol32() from
<linux/bitops.h> instead of open-coding it.

Signed-off-by: Rusydi H. Makarim <rusydi.makarim@kriptograf.id>
Link: https://lore.kernel.org/r/20251214-rol32_in_md5-v1-1-20f5f11a92b2@kriptograf.id
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:50 -08:00
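
For readers unfamiliar with the helper named in the commit above, here is a user-space sketch of a 32-bit left rotation next to an open-coded equivalent. The function is named toy_rol32 to make clear it is a local re-implementation written for illustration, and the open-coded macro is an assumed shape of what such code typically looks like, not a quote of the MD5 source.

```c
#include <stdint.h>

/* Rotate a 32-bit word left; masking keeps every shift count in range. */
static inline uint32_t toy_rol32(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/*
 * Typical open-coded rotation. It is only well defined for 0 < s < 32,
 * because shifting a 32-bit value by 32 is undefined behavior in C.
 */
#define TOY_OPEN_CODED_ROL(w, s) (((w) << (s)) | ((w) >> (32 - (s))))
```

Both forms agree for shift counts between 1 and 31. The helper additionally handles a shift of 0 without invoking undefined behavior, and compilers generally recognize either form as a single rotate instruction.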
Eric Biggers
a229d83235 lib/crypto: x86/nh: Migrate optimized code into library
Migrate the x86_64 implementations of NH into lib/crypto/.  This makes
the nh() function be optimized on x86_64 kernels.

Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code.  This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:50 -08:00
Eric Biggers
b4a8528d17 lib/crypto: arm64/nh: Migrate optimized code into library
Migrate the arm64 NEON implementation of NH into lib/crypto/.  This
makes the nh() function be optimized on arm64 kernels.

Note: this temporarily makes the adiantum template not utilize the arm64
optimized NH code.  This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:50 -08:00
Eric Biggers
29e39a11f5 lib/crypto: arm/nh: Migrate optimized code into library
Migrate the arm32 NEON implementation of NH into lib/crypto/.  This
makes the nh() function be optimized on arm32 kernels.

Note: this temporarily makes the adiantum template not utilize the arm32
optimized NH code.  This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:49 -08:00
Eric Biggers
7246fe6cd6 lib/crypto: tests: Add KUnit tests for NH
Add some simple KUnit tests for the nh() function.

These replace the test coverage which will be lost by removing the
nhpoly1305 crypto_shash.

Note that the NH code also continues to be tested indirectly as well,
via the tests for the "adiantum(xchacha12,aes)" crypto_skcipher.

Link: https://lore.kernel.org/r/20251211011846.8179-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:49 -08:00
Eric Biggers
14e15c71d7 lib/crypto: nh: Add NH library
Add support for the NH "almost-universal hash function" to lib/crypto/,
specifically the variant of NH used in Adiantum.

This will replace the need for the "nhpoly1305" crypto_shash algorithm.
All the implementations of "nhpoly1305" use architecture-optimized code
only for the NH stage; they just use the generic C Poly1305 code for the
Poly1305 stage.  We can achieve the same result in a simpler way using
an (architecture-optimized) nh() function combined with code in
crypto/adiantum.c that passes the results to the Poly1305 library.

This commit begins this cleanup by adding the nh() function.  The code
is derived from crypto/nhpoly1305.c and include/crypto/nhpoly1305.h.

Link: https://lore.kernel.org/r/20251211011846.8179-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:49 -08:00
Eric Biggers
ed894faccb lib/crypto: tests: Add KUnit tests for ML-DSA verification
Add a KUnit test suite for ML-DSA verification, including the following
for each ML-DSA parameter set (ML-DSA-44, ML-DSA-65, and ML-DSA-87):

- Positive test (valid signature), using vector imported from leancrypto
- Various negative tests:
    - Wrong length for signature, message, or public key
    - Out-of-range coefficients in z vector
    - Invalid encoded hint vector
    - Any bit flipped in signature, message, or public key
- Unit test for the internal function use_hint()
- A benchmark

ML-DSA inputs and outputs are very large.  To keep the size of the tests
down, use just one valid test vector per parameter set, and generate the
negative tests at runtime by mutating the valid test vector.

I also considered importing the test vectors from Wycheproof.  I've
tested that mldsa_verify() indeed passes all of Wycheproof's ML-DSA test
vectors that use an empty context string.  However, importing these
permanently would add over 6 MB of source.  That's too much to be a
reasonable addition to the Linux kernel tree for one algorithm.  It also
wouldn't actually provide much better test coverage than this commit.
Another potential issue is that Wycheproof uses the Apache license.

Similarly, this also differs from the earlier proposal to import a long
list of test vectors from leancrypto.  I retained only one valid
signature for each algorithm, and I also added (runtime-generated)
negative tests which were missing.  I think this is a better tradeoff.

Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20251214181712.29132-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:49 -08:00
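
The approach of generating negative tests at runtime by mutating one valid vector, as described in the commit above, can be sketched as follows. The verifier here is a toy stand-in that accepts exactly one byte string; it is not mldsa_verify(), and the 4-byte "signature" is invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Invented 4-byte "valid signature" for the toy verifier. */
static const uint8_t toy_valid_sig[4] = { 0xde, 0xad, 0xbe, 0xef };

/* Toy verifier: accepts only an exact copy of toy_valid_sig. */
bool toy_verify(const uint8_t *sig, size_t len)
{
	return len == sizeof(toy_valid_sig) &&
	       memcmp(sig, toy_valid_sig, len) == 0;
}

/* Positive test: the unmodified vector must be accepted. */
bool toy_valid_vector_accepted(void)
{
	return toy_verify(toy_valid_sig, sizeof(toy_valid_sig));
}

/*
 * Negative tests: flip each bit of the valid vector in turn and count
 * how many mutated copies are (wrongly) accepted. A correct verifier
 * yields zero.
 */
size_t toy_count_accepted_bit_flips(void)
{
	uint8_t buf[sizeof(toy_valid_sig)];
	size_t accepted = 0;

	for (size_t bit = 0; bit < 8 * sizeof(buf); bit++) {
		memcpy(buf, toy_valid_sig, sizeof(buf));
		buf[bit / 8] ^= (uint8_t)(1u << (bit % 8));
		if (toy_verify(buf, sizeof(buf)))
			accepted++;
	}
	return accepted;
}
```

Only one valid vector needs to be stored; every negative case is derived from it, which is what keeps the test data small.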
Eric Biggers
64edccea59 lib/crypto: Add ML-DSA verification support
Add support for verifying ML-DSA signatures.

ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is specified
in FIPS 204 and is the standard version of Dilithium.  Unlike RSA and
elliptic-curve cryptography, ML-DSA is believed to be secure even
against adversaries in possession of a large-scale quantum computer.

Compared to the earlier patch
(https://lore.kernel.org/r/20251117145606.2155773-3-dhowells@redhat.com/)
that was based on "leancrypto", this implementation:

  - Is about 700 lines of source code instead of 4800.

  - Generates about 4 KB of object code instead of 28 KB.

  - Uses 9-13 KB of memory to verify a signature instead of 31-84 KB.

  - Is at least about the same speed, with a microbenchmark showing 3-5%
    improvements on one x86_64 CPU and -1% to 1% changes on another.
    When memory is a bottleneck, it's likely much faster.

  - Correctly implements the RejNTTPoly step of the algorithm.

The API just consists of a single function mldsa_verify(), supporting
pure ML-DSA with any standard parameter set (ML-DSA-44, ML-DSA-65, or
ML-DSA-87) as selected by an enum.  That's all that's actually needed.

The following four potential features are unneeded and aren't included.
However, any that ever become needed could fairly easily be added later,
as they only affect how the message representative mu is calculated:

  - Nonempty context strings
  - Incremental message hashing
  - HashML-DSA
  - External mu

Signing support would, of course, be a larger and more complex addition.
However, the kernel doesn't, and shouldn't, need ML-DSA signing support.

Note that mldsa_verify() allocates memory, so it can sleep and can fail
with ENOMEM.  Unfortunately we don't have much choice about that, since
ML-DSA needs a lot of memory.  At least callers have to check for errors
anyway, since the signature could be invalid.

Note that verification doesn't require constant-time code, and in fact
some steps are inherently variable-time.  I've used constant-time
patterns in some places anyway, but technically they're not needed.

Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20251214181712.29132-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12 11:07:49 -08:00
Linus Torvalds
7143203341 Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:

 - A couple more fixes for the lib/crypto KUnit tests

 - Fix missing MMU protection for the AES S-box

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: aes: Fix missing MMU protection for AES S-box
  MAINTAINERS: add test vector generation scripts to "CRYPTO LIBRARY"
  lib/crypto: tests: Fix syntax error for old python versions
  lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
2026-01-11 15:07:56 -10:00
Thomas Gleixner
2e4b28c48f treewide: Update email address
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-11 06:09:11 -10:00
Andy Shevchenko
80c70bfb95 scatterlist: introduce sg_nents_for_dma() helper
Sometimes the user needs to split each entry on the mapped scatter list
due to DMA length constraints. This helper returns the number of entries,
assuming that each of them is no bigger than the supplied maximum length.

Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260108105619.3513561-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-01-09 08:36:00 +05:30
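
The counting rule described in the commit above amounts to a round-up division per entry. The sketch below models it in user space over a plain array of segment lengths; toy_count_dma_chunks is a hypothetical name, and the real helper operates on a mapped scatterlist rather than an array.

```c
#include <stddef.h>

/*
 * Count how many chunks result when every segment is split so that no
 * chunk exceeds max_len bytes. max_len must be nonzero.
 */
size_t toy_count_dma_chunks(const size_t *seg_len, size_t nsegs,
			    size_t max_len)
{
	size_t total = 0;

	for (size_t i = 0; i < nsegs; i++)
		total += (seg_len[i] + max_len - 1) / max_len;	/* round up */

	return total;
}
```

For example, segments of 4096, 100, and 9000 bytes with a 4096-byte limit need 1, 1, and 3 chunks respectively, for a total of 5.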
Eric Biggers
74d74bb78a lib/crypto: aes: Fix missing MMU protection for AES S-box
__cacheline_aligned puts the data in the ".data..cacheline_aligned"
section, which isn't marked read-only i.e. it doesn't receive MMU
protection.  Replace it with ____cacheline_aligned which does the right
thing and just aligns the data while keeping it in ".rodata".

Fixes: b5e0b032b6 ("crypto: aes - add generic time invariant AES cipher")
Cc: stable@vger.kernel.org
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://lore.kernel.org/r/20260105074712.498-1-dqfext@gmail.com/
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260107052023.174620-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-08 11:14:59 -08:00
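
The distinction drawn in the commit above is between an annotation that only aligns data and one that also moves it to a different section. Below is a user-space sketch of the alignment-only form, assuming a 64-byte cache line and a GCC-compatible compiler; the macro and table names are hypothetical, and the table holds only the first four S-box bytes followed by zeros.

```c
#include <stdint.h>

#define TOY_CACHE_BYTES 64	/* assumed cache line size */

/* Alignment only: no section override, so const data stays read-only. */
#define toy_aligned_only __attribute__((__aligned__(TOY_CACHE_BYTES)))

/* Truncated table for illustration; remaining entries default to zero. */
static const uint8_t toy_sbox[256] toy_aligned_only = {
	0x63, 0x7c, 0x77, 0x7b,
};

/* Report whether the table starts on a cache line boundary. */
int toy_sbox_is_aligned(void)
{
	return ((uintptr_t)toy_sbox % TOY_CACHE_BYTES) == 0;
}
```

Because no section is named, the compiler is free to place the const table with other read-only data, which is the property the fix relies on.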
Thomas Weißschuh
fcff71fd88 lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
On my development machine the generic, memcpy()-only implementation of
polyval_preparekey() is too fast for the IRQ workers to actually fire.
The test fails.

Increase the iterations to make the test more robust.
The test will run for a maximum of one second in any case.

[EB: This failure was already fixed by commit c31f4aa8fe ("kunit:
Enforce task execution in {soft,hard}irq contexts").  I'm still applying
this patch too, since the iteration count in this test made its running
time much shorter than the other similar ones.]

Fixes: b3aed551b3 ("lib/crypto: tests: Add KUnit tests for POLYVAL")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20260102-kunit-polyval-fix-v1-1-5313b5a65f35@linutronix.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-08 11:14:59 -08:00
Stefan Wiehler
1d1fd18869 Kconfig.ubsan: Remove CONFIG_UBSAN_REPORT_FULL from documentation
There is no indication in the history that such an option was merged to
mainline.

Fixes: c637693b20 ("ubsan: remove UBSAN_MISC in favor of individual options")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Link: https://patch.msgid.link/20260107114833.2030995-1-stefan.wiehler@nokia.com
Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-07 12:16:03 -08:00
Greg Kroah-Hartman
90b5f2dce9 test_list_sort: fix up const mismatch
In the internal cmp function, a const pointer is cast out to a non-const
pointer by using container_of().  This is probably not what is intended
at all, so fix up the const marking to properly preserve what is really
happening (i.e. the const should flow through the container_of() call)

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Cc: linux-kernel@vger.kernel.org

Link: https://lore.kernel.org/all/2025121751-backtrack-manifesto-7c57@gregkh/#r
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-05 15:32:03 -07:00
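
The const mismatch fixed in the commit above can be illustrated with a comparison callback that keeps const through the member-to-container conversion. This is a user-space sketch: toy_const_container_of and the two structs are hypothetical and much simpler than the kernel's container_of().

```c
#include <stddef.h>

struct toy_list_node { struct toy_list_node *next; };

struct toy_item {
	int value;
	struct toy_list_node node;
};

/* Const in, const out: the result cannot be used to modify the item. */
#define toy_const_container_of(ptr, type, member) \
	((const type *)((const char *)(ptr) - offsetof(type, member)))

/* Three-way comparison that never drops the const qualifier. */
int toy_item_cmp(const struct toy_list_node *a,
		 const struct toy_list_node *b)
{
	const struct toy_item *ia =
		toy_const_container_of(a, struct toy_item, node);
	const struct toy_item *ib =
		toy_const_container_of(b, struct toy_item, node);

	return (ia->value > ib->value) - (ia->value < ib->value);
}
```

Declaring ia and ib as pointers to const means an accidental write through them is rejected by the compiler instead of silently modifying data the caller promised not to change.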
Greg Kroah-Hartman
e70a307b85 kunit: fix up const mis-match in many assert functions
In many kunit assert functions a const pointer is passed to
container_of() and out pops a non-const pointer, which really isn't the
correct thing to do at all.  Fix this up by correctly marking the
casted-to pointer as const to preserve the marking.

Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <raemoar63@gmail.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com

Link: https://lore.kernel.org/r/2025121746-result-staleness-5a68@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-05 15:32:03 -07:00
Marco Elver
322366b8f1 rhashtable: Enable context analysis
Enable context analysis for rhashtable, which was used as an initial
test as it contains a combination of RCU, mutex, and bit_spinlock usage.

Users of rhashtable now also benefit from annotations on the API, which
will now warn if the RCU read lock is not held where required.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-33-elver@google.com
2026-01-05 16:43:35 +01:00
Marco Elver
c3d3023f1c stackdepot: Enable context analysis
Enable context analysis for stackdepot.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-32-elver@google.com
2026-01-05 16:43:35 +01:00
Marco Elver
3635ad8782 compiler: Let data_race() imply disabled context analysis
Many patterns that involve data-racy accesses often deliberately ignore
normal synchronization rules to avoid taking a lock.

If we have a lock-guarded variable on which we do a lock-less data-racy
access, rather than having to write context_unsafe(data_race(..)),
simply make the data_race(..) macro imply context-unsafety. The
data_race() macro already denotes the intent that something subtly
unsafe is about to happen, so it should be clear enough as-is.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-27-elver@google.com
2026-01-05 16:43:34 +01:00
Marco Elver
e4588c25c9 compiler-context-analysis: Remove __cond_lock() function-like helper
As discussed in [1], removing __cond_lock() will improve the readability
of trylock code. Now that Sparse context tracking support has been
removed, we can also remove __cond_lock().

Change existing APIs to either drop __cond_lock() completely, or make
use of the __cond_acquires() function attribute instead.

In particular, spinlock and rwlock implementations required switching
over to inline helpers rather than statement-expressions for their
trylock_* variants.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250207082832.GU7145@noisy.programming.kicks-ass.net/ [1]
Link: https://patch.msgid.link/20251219154418.3592607-25-elver@google.com
2026-01-05 16:43:33 +01:00
Marco Elver
47907461e4 locking/ww_mutex: Support Clang's context analysis
Add support for Clang's context analysis for ww_mutex.

The programming model for ww_mutex is subtly more complex than other
locking primitives when using ww_acquire_ctx. Encoding the respective
pre-conditions for ww_mutex lock/unlock based on ww_acquire_ctx state
using Clang's context analysis makes incorrect use of the API harder.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-21-elver@google.com
2026-01-05 16:43:32 +01:00
Marco Elver
d3febf16de locking/local_lock: Support Clang's context analysis
Add support for Clang's context analysis for local_lock_t and
local_trylock_t.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-20-elver@google.com
2026-01-05 16:43:31 +01:00
Marco Elver
e4fd3be884 locking/rwsem: Support Clang's context analysis
Add support for Clang's context analysis for rw_semaphore.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-18-elver@google.com
2026-01-05 16:43:31 +01:00
Marco Elver
f0b7ce22d7 srcu: Support Clang's context analysis
Add support for Clang's context analysis for SRCU.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://patch.msgid.link/20251219154418.3592607-16-elver@google.com
2026-01-05 16:43:30 +01:00
Marco Elver
fe00f6e846 rcu: Support Clang's context analysis
Improve the existing annotations to properly support Clang's context
analysis.

The old annotations distinguished between RCU, RCU_BH, and RCU_SCHED;
however, to more easily be able to express that "hold the RCU read lock"
without caring if the normal, _bh(), or _sched() variant was used we'd
have to remove the distinction of the latter variants: change the _bh()
and _sched() variants to also acquire "RCU".

When (and if) we introduce context locks to denote more generally that
"IRQ", "BH", "PREEMPT" contexts are disabled, it would make sense to
acquire these instead of RCU_BH and RCU_SCHED respectively.

The above change also simplified introducing __guarded_by support, where
only the "RCU" context lock needs to be held: introduce __rcu_guarded,
where Clang's context analysis warns if a pointer is dereferenced
without any of the RCU locks held, or updated without the appropriate
helpers.

The primitives rcu_assign_pointer() and friends are wrapped with
context_unsafe(), which enforces using them to update RCU-protected
pointers marked with __rcu_guarded.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://patch.msgid.link/20251219154418.3592607-15-elver@google.com
2026-01-05 16:43:30 +01:00
Marco Elver
eb7d96a13b bit_spinlock: Support Clang's context analysis
The annotations for bit_spinlock.h have simply been using "bitlock" as
the token. For Sparse, that was likely sufficient in most cases. But
Clang's context analysis is more precise, and we need to ensure we
can distinguish different bitlocks.

To do so, add a token context, and a macro __bitlock(bitnum, addr)
that is used to construct unique per-bitlock tokens.

Add the appropriate test.

<linux/list_bl.h> is implicitly included through other includes, and
requires 2 annotations to indicate that acquisition (without release)
and release (without prior acquisition) of its bitlock is intended.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-14-elver@google.com
2026-01-05 16:43:30 +01:00
Marco Elver
8f8a55f49c locking/seqlock: Support Clang's context analysis
Add support for Clang's context analysis for seqlock_t.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-12-elver@google.com
2026-01-05 16:43:29 +01:00
Marco Elver
370f0a345a locking/mutex: Support Clang's context analysis
Add support for Clang's context analysis for mutex.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-11-elver@google.com
2026-01-05 16:43:29 +01:00
Marco Elver
f16a802d40 locking/rwlock, spinlock: Support Clang's context analysis
Add support for Clang's context analysis for raw_spinlock_t,
spinlock_t, and rwlock. This wholesale conversion is required because
all three of them are interdependent.

To avoid warnings in constructors, the initialization functions mark a
lock as acquired when initialized before guarded variables.

The test verifies that common patterns do not generate false positives.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-9-elver@google.com
2026-01-05 16:43:28 +01:00
Marco Elver
9b00c1609d compiler-context-analysis: Add test stub
Add a simple test stub where we will add common supported patterns that
should not generate false positives for each new supported context lock.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-4-elver@google.com
2026-01-05 16:43:27 +01:00
Marco Elver
3269701cb2 compiler-context-analysis: Add infrastructure for Context Analysis with Clang
Context Analysis is a language extension, which enables statically
checking that required contexts are active (or inactive), by acquiring
and releasing user-definable "context locks". An obvious application is
lock-safety checking for the kernel's various synchronization primitives
(each of which represents a "context lock"), and checking that locking
rules are not violated.

Clang originally called the feature "Thread Safety Analysis" [1]. This
was later changed and the feature became more flexible, gaining the
ability to define custom "capabilities". Its foundations can be found in
"Capability Systems" [2], used to specify the permissibility of
operations to depend on some "capability" being held (or not held).

Because the feature is not just able to express "capabilities" related
to synchronization primitives, and "capability" is already overloaded in
the kernel, the naming chosen for the kernel departs from Clang's
"Thread Safety" and "capability" nomenclature; we refer to the feature
as "Context Analysis" to avoid confusion. The internal implementation
still makes references to Clang's terminology in a few places, such as
`-Wthread-safety` being the warning option that also still appears in
diagnostic messages.

 [1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
 [2] https://www.cs.cornell.edu/talc/papers/capabilities.pdf

See more details in the kernel-doc documentation added in this and
subsequent changes.

Clang version 22+ is required.

[peterz: disable the thing for __CHECKER__ builds]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-3-elver@google.com
2026-01-05 16:43:26 +01:00
Matthew Wilcox (Oracle)
c6e8e595a0 idr: fix idr_alloc() returning an ID out of range
If you use an IDR with a non-zero base, and specify a range that lies
entirely below the base, 'max - base' becomes very large and
idr_get_free() can return an ID that lies outside of the requested range.

Link: https://lkml.kernel.org/r/20251128161853.3200058-1-willy@infradead.org
Fixes: 6ce711f275 ("idr: Make 1-based IDRs more efficient")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6449
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-12-23 11:23:11 -08:00
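
The "'max - base' becomes very large" behavior in the commit above is ordinary unsigned wraparound. The sketch below models just that subtraction, assuming max is an unsigned long and base an unsigned int; the function names are hypothetical, and the guard shown is one possible check written for illustration, not necessarily the fix that was applied.

```c
#include <limits.h>
#include <stdbool.h>

/* Unchecked span: wraps to a huge value whenever max < base. */
unsigned long toy_span_unchecked(unsigned long max, unsigned int base)
{
	return max - base;
}

/* Hypothetical guard: detect a range that lies entirely below the base. */
bool toy_range_below_base(unsigned long max, unsigned int base)
{
	return max < base;
}
```

With max = 5 and base = 10, the unchecked span is ULONG_MAX - 4 rather than a negative number, so a search bounded by it effectively has no upper limit.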
Linus Torvalds
fa084c35af Merge tag 'linux_kselftest-kunit-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
 "Drop unused parameter from kunit_device_register_internal and make
  FAULT_TEST default to n when PANIC_ON_OOPS"

* tag 'linux_kselftest-kunit-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: make FAULT_TEST default to n when PANIC_ON_OOPS
  kunit: Drop unused parameter from kunit_device_register_internal
2025-12-20 11:59:06 -08:00
Ihor Solodrai
903922cfa0 lib/Kconfig.debug: Set the minimum required pahole version to v1.22
Subsequent patches in the series change vmlinux linking scripts to
unconditionally pass --btf_encode_detached to pahole, which was
introduced in v1.22 [1][2].

This change allows removing the PAHOLE_HAS_SPLIT_BTF Kconfig option and
other checks for older pahole versions.

[1] https://github.com/acmel/dwarves/releases/tag/v1.22
[2] https://lore.kernel.org/bpf/cbafbf4e-9073-4383-8ee6-1353f9e5869c@oracle.com/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-1-ihor.solodrai@linux.dev
2025-12-19 10:55:40 -08:00
Marc Zyngier
c119e66853 genirq: Remove IRQ timing tracking infrastructure
The IRQ timing tracking infrastructure was merged in 2019, but was never
plumbed in, is not selectable, and is therefore never used.

As Daniel agrees that there is little hope for this infrastructure to be
completed in the near term, drop it altogether.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/87zf7vex6h.wl-maz@kernel.org
Link: https://patch.msgid.link/20251210082242.360936-2-maz@kernel.org
2025-12-15 22:20:50 +01:00
Brendan Jackman
c33b68801f kunit: make FAULT_TEST default to n when PANIC_ON_OOPS
As described in the help string, the user might want to disable these
tests if they don't like to see stacktraces/BUG etc in their kernel log.

However, if they enable PANIC_ON_OOPS, these tests also crash the
machine, which it's safe to assume _almost_ nobody wants.

One might argue that _absolutely_ nobody ever wants their kernel to
crash so this should just be a hard dependency instead of a default.
However, since this is rather special code that's anyway concerned with
deliberately doing "bad" things, the normal rules don't seem to apply,
hence prefer flexibility and allow users to set up a crashing Kconfig if
they so choose.

Link: https://lore.kernel.org/r/20251207-kunit-fault-no-panic-v1-1-2ac932f26864@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-15 09:27:19 -07:00
Uwe Kleine-König
726c93b040 kunit: Drop unused parameter from kunit_device_register_internal
The passed driver isn't used, so just drop this parameter.

Link: https://lore.kernel.org/r/20251210065839.482608-2-u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-15 09:27:19 -07:00
Charles Mirabile
5a0b188250 lib/crypto: riscv: Add poly1305-core.S to .gitignore
poly1305-core.S is an auto-generated file, so it should be ignored.

Fixes: bef9c75598 ("lib/crypto: riscv/poly1305: Import OpenSSL/CRYPTOGAMS implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Link: https://lore.kernel.org/r/20251212184717.133701-1-cmirabil@redhat.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-14 10:18:22 -08:00
Linus Torvalds
edbe407235 Merge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:

 - Improve bug reporting

 - Suppress W=1 format warning

 - Improve rseq scalability on Clang builds

* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Always inline rseq_debug_syscall_return()
  bug: Hush suggest-attribute=format for __warn_printf()
  bug: Let report_bug_entry() provide the correct bugaddr
2025-12-14 06:04:16 +12:00
Brendan Jackman
d36067d6ea bug: Hush suggest-attribute=format for __warn_printf()
Recent additions to this function cause GCC 14.3.0 to get excited
(W=1) and suggest a missing attribute:

	lib/bug.c: In function '__warn_printf':
	lib/bug.c:187:25: error: function '__warn_printf' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
	  187 |                         vprintk(fmt, *args);
	      |                         ^~~~~~~

Disable the diagnostic locally, following the pattern used for stuff
like va_format().
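
For illustration, locally disabling this diagnostic can be sketched in
plain userspace C as below.  This is only a sketch under stated
assumptions: demo_warn_format() and demo_warn() are hypothetical
stand-ins, not the code in lib/bug.c, and the kernel may use its own
wrapper macros rather than raw pragmas.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a helper like __warn_printf(): it forwards a
 * format string and a va_list pointer to a v*printf() function, which is
 * the pattern that makes GCC suggest a format attribute.
 */
#pragma GCC diagnostic push
#ifndef __clang__
/* Only GCC knows this option; silence it for this one function. */
#pragma GCC diagnostic ignored "-Wsuggest-attribute=format"
#endif
static int demo_warn_format(char *buf, size_t len, const char *fmt,
			    va_list *args)
{
	return vsnprintf(buf, len, fmt, *args);
}
#pragma GCC diagnostic pop

/* The variadic wrapper can carry the format attribute directly. */
__attribute__((format(printf, 3, 4)))
static int demo_warn(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = demo_warn_format(buf, len, fmt, &args);
	va_end(args);
	return ret;
}
```

The push/pop pair keeps the suppression scoped to the one helper, so the
diagnostic stays active for the rest of the file.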

Fixes: 5c47b7f3d1 ("bug: Add BUG_FORMAT_ARGS infrastructure")
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251207-warn-printf-gcc-v1-1-b597d612b94b@google.com
2025-12-12 10:26:26 +01:00
Heiko Carstens
b5e51ef787 bug: Let report_bug_entry() provide the correct bugaddr
report_bug_entry() always provides zero for bugaddr but could easily
extract the correct address from the provided bug_entry. Just do that to
have proper warning messages.

E.g. adding an artificial:

  void foo(void) { WARN_ONCE(1, "bar"); }

function generates this warning message:

  WARNING: arch/s390/kernel/setup.c:1017 at 0x0, CPU#0: swapper/0/0
                                            ^^^

With the correct bug address this changes to:

  WARNING: arch/s390/kernel/setup.c:1017 at foo+0x1c/0x40, CPU#0: swapper/0/0
                                            ^^^^^^^^^^^^^
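
As a rough illustration of recovering the address from a bug entry, the
sketch below assumes a layout where the entry stores a 32-bit
displacement relative to its own field.  struct demo_bug_entry and
demo_bug_addr() are hypothetical names for this example; whether a given
architecture uses relative or absolute addresses is configuration
dependent, and this is not the code changed by this commit.

```c
#include <stdint.h>

/* Hypothetical, simplified bug table entry using a relative pointer. */
struct demo_bug_entry {
	int32_t addr_disp;	/* offset from this field to the bug site */
	uint16_t line;
	uint16_t flags;
};

/*
 * Resolve the bug address: the address of the displacement field plus the
 * (signed) displacement stored in it.
 */
static uintptr_t demo_bug_addr(const struct demo_bug_entry *bug)
{
	return (uintptr_t)&bug->addr_disp + (uintptr_t)(intptr_t)bug->addr_disp;
}
```

Passing the resolved address instead of zero is what lets the report
print a symbol+offset such as foo+0x1c/0x40.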

Fixes: 7d2c27a0ec ("bug: Add report_bug_entry()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251208200658.3431511-1-hca@linux.ibm.com
2025-12-12 10:26:13 +01:00
Eric Biggers
68b233b1d5 lib/crypto: blake2s: Replace manual unrolling with unrolled_full
As we're doing in the BLAKE2b code, use unrolled_full to make the
compiler handle the loop unrolling.  This simplifies the code slightly.
The generated object code is nearly the same with both gcc and clang.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251205051155.25274-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09 15:10:21 -08:00
Eric Biggers
2e8f7b170a lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit
BLAKE2b has a state of 16 64-bit words.  Add the message data in and
there are 32 64-bit words.  With the current code where all the rounds
are unrolled to enable constant-folding of the blake2b_sigma values,
this results in a very large code size on 32-bit kernels, including a
recurring issue where gcc uses a large amount of stack.

There's just not much benefit to this unrolling when the code is already
so large.  Let's roll up the rounds when !CONFIG_64BIT.

To avoid having to duplicate the code, just write the code once using a
loop, and conditionally use 'unrolled_full' from <linux/unroll.h>.

Then, fold the now-unneeded ROUND() macro into the loop.  Finally, also
remove the now-unneeded override of the stack frame size warning.
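
The "write it once as a loop, unroll conditionally" idea can be sketched
in standalone C as follows.  This is a toy under stated assumptions:
demo_unrolled_full is a hypothetical stand-in for 'unrolled_full', the
pointer-width check stands in for CONFIG_64BIT, and demo_sigma is a
made-up 4x4 schedule, not the real blake2b_sigma table.

```c
#include <stdint.h>

/* Unroll fully on 64-bit targets only; leave the loop rolled elsewhere. */
#if UINTPTR_MAX <= 0xffffffffu
#define demo_unrolled_full
#elif defined(__clang__)
#define demo_unrolled_full _Pragma("unroll")
#elif defined(__GNUC__)
#define demo_unrolled_full _Pragma("GCC unroll 65534")
#else
#define demo_unrolled_full
#endif

#define DEMO_ROUNDS 4

/* Toy message schedule, NOT the real blake2b_sigma table. */
static const uint8_t demo_sigma[DEMO_ROUNDS][4] = {
	{ 0, 1, 2, 3 }, { 2, 0, 3, 1 }, { 3, 2, 1, 0 }, { 1, 3, 0, 2 },
};

static inline uint64_t demo_ror64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/* The rounds are written once; the macro decides whether to unroll. */
static void demo_compress(uint64_t v[4], const uint64_t m[4])
{
	int r;

	demo_unrolled_full
	for (r = 0; r < DEMO_ROUNDS; r++) {
		const uint8_t *s = demo_sigma[r];

		v[0] += v[1] + m[s[0]];
		v[3] = demo_ror64(v[3] ^ v[0], 32);
		v[2] += v[3] + m[s[1]];
		v[1] = demo_ror64(v[1] ^ v[2], 24);
		v[0] += v[1] + m[s[2]];
		v[3] = demo_ror64(v[3] ^ v[0], 16);
		v[2] += v[3] + m[s[3]];
		v[1] = demo_ror64(v[1] ^ v[2], 63);
	}
}
```

When the loop is fully unrolled, the schedule lookups can be
constant-folded; when it stays rolled, they become table loads but the
code is much smaller.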

Code size improvements for blake2b_compress_generic():

                  Size before (bytes)    Size after (bytes)
                  -------------------    ------------------
    i386, gcc           27584                 3632
    i386, clang         18208                 3248
    arm32, gcc          19912                 2860
    arm32, clang        21336                 3344

Running the BLAKE2b benchmark on a !CONFIG_64BIT kernel on an x86_64
processor shows a 16384B throughput change of 351 => 340 MB/s (gcc) or
442 MB/s => 375 MB/s (clang).  So clearly not much of a slowdown either.
But that microbenchmark also effectively disregards cache usage,
which is important in practice and is far better in the smaller code.

Note: If we rolled up the loop on x86_64 too, the change would be
7024 bytes => 1584 bytes and 1960 MB/s => 1396 MB/s (gcc), or
6848 bytes => 1696 bytes and 1920 MB/s => 1263 MB/s (clang).
Maybe still worth it, though not quite as clearly beneficial.

Fixes: 91d689337f ("crypto: blake2b - add blake2b generic implementation")
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251205050330.89704-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09 15:10:21 -08:00
Eric Biggers
1cd5bb6e9e lib/crypto: riscv: Depend on RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
Replace the RISCV_ISA_V dependency of the RISC-V crypto code with
RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS, which implies RISCV_ISA_V as
well as vector unaligned accesses being efficient.

This is necessary because this code assumes that vector unaligned
accesses are supported and are efficient.  (It does so to avoid having
to use lots of extra vsetvli instructions to switch the element width
back and forth between 8 and either 32 or 64.)

This was omitted from the code originally just because the RISC-V kernel
support for detecting this feature didn't exist yet.  Support has now
been added, but it's fragmented into per-CPU runtime detection, a
command-line parameter, and a kconfig option.  The kconfig option is the
only reasonable way to do it, though, so let's just rely on that.

Fixes: eb24af5d7a ("crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}")
Fixes: bb54668837 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Fixes: 600a3853df ("crypto: riscv - add vector crypto accelerated GHASH")
Fixes: 8c8e40470f ("crypto: riscv - add vector crypto accelerated SHA-{256,224}")
Fixes: b3415925a0 ("crypto: riscv - add vector crypto accelerated SHA-{512,384}")
Fixes: 563a5255af ("crypto: riscv - add vector crypto accelerated SM3")
Fixes: b8d06352bb ("crypto: riscv - add vector crypto accelerated SM4")
Cc: stable@vger.kernel.org
Reported-by: Vivian Wang <wangruikang@iscas.ac.cn>
Closes: https://lore.kernel.org/r/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/
Reviewed-by: Jerry Shih <jerry.shih@sifive.com>
Link: https://lore.kernel.org/r/20251206213750.81474-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09 15:10:21 -08:00
Vivian Wang
43169328c7 lib/crypto: riscv/chacha: Avoid s0/fp register
In chacha_zvkb, avoid using the s0 register, which is the frame pointer,
by reallocating KEY0 to t5. This makes stack traces available if e.g. a
crash happens in chacha_zvkb.

No frame pointer maintenance is otherwise required since this is a leaf
function.

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Fixes: bb54668837 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20251202-riscv-chacha_zvkb-fp-v2-1-7bd00098c9dc@iscas.ac.cn
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09 15:10:20 -08:00
Christoph Hellwig
12eef14bcb lockref: add a __cond_lock annotation for lockref_put_or_lock
Add a __cond_lock annotation for lockref_put_or_lock to make sparse
happy with using it.  Note that for this the return value has to be
double-inverted as the return value convention of lockref_put_or_lock
is inverted compared to _trylock conventions expected by __cond_lock,
as lockref_put_or_lock returns true when it did not need to take the
lock.
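
The double inversion can be sketched with a userspace mock as below.
All names here (demo_lockref, demo_put_or_lock_raw, demo_cond_lock) are
hypothetical stand-ins, and demo_cond_lock is reduced to its
non-sparse form, which simply evaluates the condition; this is not the
actual lockref header.

```c
#include <stdbool.h>

struct demo_lockref {
	bool locked;	/* stand-in for the spinlock */
	int count;
};

/*
 * Mock of the underlying function: returns true if the count was
 * decremented without taking the lock, false if the count was <= 1 and
 * the lock is now held by the caller.
 */
static bool demo_put_or_lock_raw(struct demo_lockref *ref)
{
	if (ref->count > 1) {
		ref->count--;
		return true;
	}
	ref->locked = true;
	return false;
}

/* Outside a checker build, a cond_lock-style macro is just the condition. */
#define demo_cond_lock(x, c)	(c)

/*
 * Inner '!': turn "true == lock NOT taken" into the trylock-style
 * "true == lock taken" that the annotation expects.
 * Outer '!': restore the original return convention for callers.
 */
#define demo_put_or_lock(ref) \
	(!demo_cond_lock((ref)->locked, !demo_put_or_lock_raw(ref)))
```

Callers see the same return values as before; only the checker sees the
inverted condition.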

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-10 05:58:51 +09:00