Commit Graph

361 Commits

Author SHA1 Message Date
Linus Torvalds
37b33c68b0 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:

 - Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
   directly accessible via the library API, instead of requiring the
   crypto API. This is much simpler and more efficient.

 - Convert some users such as ext4 to use the CRC32 library API instead
   of the crypto API. More conversions like this will come later.

 - Add a KUnit test that tests and benchmarks multiple CRC variants.
   Remove older, less-comprehensive tests that are made redundant by
   this.

 - Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
   volunteering to maintain it. I have additional cleanups and
   optimizations planned for future cycles.

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits)
  MAINTAINERS: add entry for CRC library
  powerpc/crc: delete obsolete crc-vpmsum_test.c
  lib/crc32test: delete obsolete crc32test.c
  lib/crc16_kunit: delete obsolete crc16_kunit.c
  lib/crc_kunit.c: add KUnit test suite for CRC library functions
  powerpc/crc-t10dif: expose CRC-T10DIF function through lib
  arm64/crc-t10dif: expose CRC-T10DIF function through lib
  arm/crc-t10dif: expose CRC-T10DIF function through lib
  x86/crc-t10dif: expose CRC-T10DIF function through lib
  crypto: crct10dif - expose arch-optimized lib function
  lib/crc-t10dif: add support for arch overrides
  lib/crc-t10dif: stop wrapping the crypto API
  scsi: target: iscsi: switch to using the crc32c library
  f2fs: switch to using the crc32 library
  jbd2: switch to using the crc32c library
  ext4: switch to using the crc32c library
  lib/crc32: make crc32c() go directly to lib
  bcachefs: Explicitly select CRYPTO from BCACHEFS_FS
  x86/crc32: expose CRC32 functions through lib
  x86/crc32: update prototype for crc32_pclmul_le_16()
  ...
2025-01-22 19:55:08 -08:00
Peter Zijlstra
cdd30ebb1b module: Convert symbol namespace to string literal
Clean up the existing export namespace code along the same lines of
commit 33def8498f ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.

Scripted using

  git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file;
  do
    awk -i inplace '
      /^#define EXPORT_SYMBOL_NS/ {
        gsub(/__stringify\(ns\)/, "ns");
        print;
        next;
      }
      /^#define MODULE_IMPORT_NS/ {
        gsub(/__stringify\(ns\)/, "ns");
        print;
        next;
      }
      /MODULE_IMPORT_NS/ {
        $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g");
      }
      /EXPORT_SYMBOL_NS/ {
        if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) {
  	if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ &&
  	    $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ &&
  	    $0 !~ /^my/) {
  	  getline line;
  	  gsub(/[[:space:]]*\\$/, "");
  	  gsub(/[[:space:]]/, "", line);
  	  $0 = $0 " " line;
  	}

  	$0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/,
  		    "\\1(\\2, \"\\3\")", "g");
        }
      }
      { print }' $file;
  done

Requested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc
Acked-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-02 11:34:44 -08:00
Eric Biggers
2051da8585 arm64/crc-t10dif: expose CRC-T10DIF function through lib
Move the arm64 CRC-T10DIF assembly code into the lib directory and wire
it up to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Note: to see the diff from arch/arm64/crypto/crct10dif-ce-glue.c to
arch/arm64/lib/crc-t10dif-glue.c, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20241202012056.209768-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01 17:23:13 -08:00
Linus Torvalds
02b2f1a7b8 Merge tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add sig driver API
   - Remove signing/verification from akcipher API
   - Move crypto_simd_disabled_for_test to lib/crypto
   - Add WARN_ON for return values from driver that indicates memory
     corruption

  Algorithms:
   - Provide crc32-arch and crc32c-arch through Crypto API
   - Optimise crc32c code size on x86
   - Optimise crct10dif on arm/arm64
   - Optimise p10-aes-gcm on powerpc
   - Optimise aegis128 on x86
   - Output full sample from test interface in jitter RNG
   - Retry without padata when it fails in pcrypt

  Drivers:
   - Add support for Airoha EN7581 TRNG
   - Add support for STM32MP25x platforms in stm32
   - Enable iproc-r200 RNG driver on BCMBCA
   - Add Broadcom BCM74110 RNG driver"

* tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits)
  crypto: marvell/cesa - fix uninit value for struct mv_cesa_op_ctx
  crypto: cavium - Fix an error handling path in cpt_ucode_load_fw()
  crypto: aesni - Move back to module_init
  crypto: lib/mpi - Export mpi_set_bit
  crypto: aes-gcm-p10 - Use the correct bit to test for P10
  hwrng: amd - remove reference to removed PPC_MAPLE config
  crypto: arm/crct10dif - Implement plain NEON variant
  crypto: arm/crct10dif - Macroify PMULL asm code
  crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl
  crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code
  crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply
  crypto: arm64/crct10dif - Remove obsolete chunking logic
  crypto: bcm - add error check in the ahash_hmac_init function
  crypto: caam - add error check to caam_rsa_set_priv_key_form
  hwrng: bcm74110 - Add Broadcom BCM74110 RNG driver
  dt-bindings: rng: add binding for BCM74110 RNG
  padata: Clean up in padata_do_multithreaded()
  crypto: inside-secure - Fix the return value of safexcel_xcbcmac_cra_init()
  crypto: qat - Fix missing destroy_workqueue in adf_init_aer()
  crypto: rsassa-pkcs1 - Reinstate support for legacy protocols
  ...
2024-11-19 10:28:41 -08:00
Ard Biesheuvel
779cee8209 crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code
The only remaining user of the fallback implementation of 64x64
polynomial multiplication using 8x8 PMULL instructions is the final
reduction from a 16 byte vector to a 16-bit CRC.

The fallback code is complicated and messy, and this reduction has
little impact on the overall performance, so instead, let's calculate
the final CRC by passing the 16 byte vector to the generic CRC-T10DIF
implementation when running the fallback version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15 19:52:51 +08:00
Ard Biesheuvel
67dfb1b73f crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply
The CRC-T10DIF implementation for arm64 has a version that uses 8x8
polynomial multiplication, for cores that lack the crypto extensions,
which cover the 64x64 polynomial multiplication instruction that the
algorithm was built around.

This fallback version rather naively adopted the 64x64 polynomial
multiplication algorithm that I ported from ARM for the GHASH driver,
which needs 8 PMULL8 instructions to implement one PMULL64. This is
reasonable, given that each 8-bit vector element needs to be multiplied
with each element in the other vector, producing 8 vectors with partial
results that need to be combined to yield the correct result.

However, most PMULL64 invocations in the CRC-T10DIF code involve
multiplication by a pair of 16-bit folding coefficients, and so all the
partial results from higher order bytes will be zero, and there is no
need to calculate them to begin with.

Then, the CRC-T10DIF algorithm always XORs the output values of the
PMULL64 instructions being issued in pairs, and so there is no need to
faithfully implement each individual PMULL64 instruction, as long as
XORing the results pairwise produces the expected result.

Implementing these improvements results in a speedup of 3.3x on low-end
platforms such as Raspberry Pi 4 (Cortex-A72)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15 19:52:51 +08:00
Ard Biesheuvel
7048c21e6b crypto: arm64/crct10dif - Remove obsolete chunking logic
This is a partial revert of commit fc754c024a, which moved the logic
into C code which ensures that kernel mode NEON code does not hog the
CPU for too long.

This is no longer needed now that kernel mode NEON no longer disables
preemption, so we can drop this.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15 19:52:51 +08:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Jia He
9369693a2c crypto: arm64/poly1305 - move data to rodata section
When objtool gains support for ARM in the future, it may encounter issues
disassembling the following data in the .text section:
> .Lzeros:
> .long   0,0,0,0,0,0,0,0
> .asciz  "Poly1305 for ARMv8, CRYPTOGAMS by \@dot-asm"
> .align  2

Move it to .rodata which is a more appropriate section for read-only data.

There is a limit on how far the label can be from the instruction, hence
use "adrp" and low 12bits offset of the label to avoid the compilation
error.

Signed-off-by: Jia He <justin.he@arm.com>
Tested-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-17 13:55:49 +08:00
Herbert Xu
b0cd6f4c3f Revert "crypto: arm64/poly1305 - move data to rodata section"
This reverts commit 47d9625209.

It causes build issues as detected by the kernel test robot.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408040817.OWKXtCv6-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-06 13:45:59 +08:00
Jia He
47d9625209 crypto: arm64/poly1305 - move data to rodata section
When objtool gains support for ARM in the future, it may encounter issues
disassembling the following data in the .text section:
> .Lzeros:
> .long   0,0,0,0,0,0,0,0
> .asciz  "Poly1305 for ARMv8, CRYPTOGAMS by \@dot-asm"
> .align  2

Move it to .rodata which is a more appropriate section for read-only data.

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-02 21:11:20 +08:00
Jeff Johnson
b568826eff crypto: arm64 - add missing MODULE_DESCRIPTION() macros
With ARCH=arm64, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/crct10dif-ce.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/poly1305-neon.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/aes-neon-bs.o

Add the missing invocations of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-06-21 22:04:16 +10:00
Mark Brown
a720de9fba crypto: arm64/crc10dif - Raise priority of NEON crct10dif implementation
The NEON implementation of crctd10dif is registered with a priority of 100
which is identical to that used by the generic C implementation. Raise the
priority to 150, half way between the PMULL based implementation and the
NEON one, so that it will be preferred over the generic implementation.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-31 17:34:56 +08:00
Ard Biesheuvel
571e557cba crypto: arm64/aes-ce - Simplify round key load sequence
Tweak the round key logic so that they can be loaded using a single
branchless sequence using overlapping loads. This is shorter and
simpler, and puts the conditional branches based on the key size further
apart, which might benefit microarchitectures that cannot record taken
branches at every instruction. For these branches, use test-bit-branch
instructions that don't clobber the condition flags.

Note that none of this has any impact on performance, positive or
otherwise (and the branch prediction benefit would only benefit AES-192
which nobody uses). It does make for nicer code, though.

While at it, use \@ to generate the labels inside the macros, which is
more robust than using fixed numbers, which could clash inadvertently.
Also, bring aes-neon.S in line with these changes, including the switch
to test-and-branch instructions, to avoid surprises in the future when
we might start relying on the condition flags being preserved in the
chaining mode wrappers in aes-modes.S

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Linus Torvalds
c8e7699616 Merge tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:

   - Avoid unnecessary copying in scomp for trivial SG lists

  Algorithms:

   - Optimise NEON CCM implementation on ARM64

  Drivers:

   - Add queue stop/query debugfs support in hisilicon/qm

   - Intel qat updates and cleanups"

* tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (79 commits)
  Revert "crypto: remove CONFIG_CRYPTO_STATS"
  crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem
  crypto: tcrypt - add ffdhe2048(dh) test
  crypto: iaa - fix the missing CRYPTO_ALG_ASYNC in cra_flags
  crypto: hisilicon/zip - fix the missing CRYPTO_ALG_ASYNC in cra_flags
  hwrng: hisi - use dev_err_probe
  MAINTAINERS: Remove T Ambarus from few mchp entries
  crypto: iaa - Fix comp/decomp delay statistics
  crypto: iaa - Fix async_disable descriptor leak
  dt-bindings: rng: atmel,at91-trng: add sam9x7 TRNG
  dt-bindings: crypto: add sam9x7 in Atmel TDES
  dt-bindings: crypto: add sam9x7 in Atmel SHA
  dt-bindings: crypto: add sam9x7 in Atmel AES
  crypto: remove CONFIG_CRYPTO_STATS
  crypto: dh - Make public key test FIPS-only
  crypto: rockchip - fix to check return value
  crypto: jitter - fix CRYPTO_JITTERENTROPY help text
  crypto: qat - make ring to service map common for QAT GEN4
  crypto: qat - fix ring to service map for dcc in 420xx
  crypto: qat - fix ring to service map for dcc in 4xxx
  ...
2024-03-15 14:46:54 -07:00
Ard Biesheuvel
1c0cf6d196 crypto: arm64/neonbs - fix out-of-bounds access on short input
The bit-sliced implementation of AES-CTR operates on blocks of 128
bytes, and will fall back to the plain NEON version for tail blocks or
inputs that are shorter than 128 bytes to begin with.

It will call straight into the plain NEON asm helper, which performs all
memory accesses in granules of 16 bytes (the size of a NEON register).
For this reason, the associated plain NEON glue code will copy inputs
shorter than 16 bytes into a temporary buffer, given that this is a rare
occurrence and it is not worth the effort to work around this in the asm
code.

The fallback from the bit-sliced NEON version fails to take this into
account, potentially resulting in out-of-bounds accesses. So clone the
same workaround, and use a temp buffer for short in/outputs.

Fixes: fc074e1300 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk")
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:37:24 +08:00
Ard Biesheuvel
f691d444f9 crypto: arm64/aes-ccm - Merge finalization into en/decrypt asm helpers
The C glue code already infers whether or not the current iteration is
the final one, by comparing walk.nbytes with walk.total. This means we
can easily inform the asm helpers of this as well, by conditionally
passing a pointer to the original IV, which is used in the finalization
of the MAC. This removes the need for a separate call into the asm code
to perform the finalization.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
7150528849 crypto: arm64/aes-ccm - Merge encrypt and decrypt tail handling
The encryption and decryption code paths are mostly identical, except
for a small difference where the plaintext input into the MAC is taken
from either the input or the output block.

We can factor this in quite easily using a vector bit select, and a few
additional XORs, without the need for branches. This way, we can use the
same tail handling logic on the encrypt and decrypt code paths, allowing
further consolidation of the asm helpers in a subsequent patch.

(In the main loop, adding just a handful of ALU instructions results in
a noticeable performance hit [around 5% on Apple M2], so those routines
are kept separate)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
565def1542 crypto: arm64/aes-ccm - Cache round keys and unroll AES loops
The CCM code as originally written attempted to use as few NEON
registers as possible, to avoid having to eagerly preserve/restore the
entire NEON register file at every call to kernel_neon_begin/end. At
that time, this API took a number of NEON registers as a parameter, and
only preserved that many registers.

Today, the NEON register file is restored lazily, and the old API is
long gone. This means we can use as many NEON registers as we can make
meaningful use of, which means in the AES case that we can keep all
round keys in registers rather than reloading each of them for each AES
block processed.

On Cortex-A53, this results in a speedup of more than 50%. (From 4
cycles per byte to 2.6 cycles per byte)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
948ffc66e5 crypto: arm64/aes-ccm - Reuse existing MAC update for AAD input
CCM combines the counter (CTR) encryption mode with a MAC based on the
same block cipher. This MAC construction is a bit clunky: it invokes the
block cipher in a way that cannot be parallelized, resulting in poor CPU
pipeline efficiency.

The arm64 CCM code mitigates this by interleaving the encryption and MAC
at the AES round level, resulting in a substantial speedup. But this
approach does not apply to the additional authenticated data (AAD) which
is not encrypted.

This means the special asm routine dealing with the AAD is not any
better than the MAC update routine used by the arm64 AES block
encryption driver, so let's reuse that, and drop the special AES-CCM
version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
c131098d6d crypto: arm64/aes-ccm - Replace bytewise tail handling with NEON permute
Implement the CCM tail handling using a single sequence that uses
permute vectors and overlapping loads and stores, rather than going over
the tail byte by byte in a loop, and using scalar operations. This is
more efficient, even though the measured speedup is only around 1-2% on
the CPUs I have tried.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
97c4c10daf crypto: arm64/aes-ccm - Pass short inputs via stack buffer
In preparation for optimizing the CCM core asm code using permutation
vectors and overlapping loads and stores, ensure that inputs shorter
than the size of a AES block are passed via a buffer on the stack, in a
way that positions the data at the end of a 16 byte buffer. This removes
the need for the asm code to reason about a rare corner case where the
tail of the data cannot be read/written using a single NEON load/store
instruction.

While at it, tweak the copyright header and authorship to bring it up to
date.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
88c6d50f64 crypto: arm64/aes-ccm - Keep NEON enabled during skcipher walk
Now that kernel mode NEON no longer disables preemption, we no longer
have to take care to disable and re-enable use of the NEON when calling
into the skcipher walk API. So just keep it enabled until done.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
f722002441 crypto: arm64/aes-ccm - Revert "Rewrite skcipher walker loop"
This reverts commit 57ead1bf1c, which updated the CCM code to only
rely on walk.nbytes to check for failures returned from the skcipher
walk API, mostly for the common good rather than to fix a particular
problem in the code.

This change introduces a problem of its own: the skcipher walk is
started with the 'atomic' argument set to false, which means that the
skcipher walk API is permitted to sleep. Subsequently, it invokes
skcipher_walk_done() with preemption disabled on the final iteration of
the loop. This appears to work by accident, but it is arguably a bad
example, and providing a better example was the point of the original
patch.

Given that future changes to the CCM code will rely on the original
behavior of entering the loop even for zero sized inputs, let's just
revert this change entirely, and proceed from there.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Herbert Xu
29fa12e918 crypto: arm64/sm4 - Remove cfb(sm4)
Remove the unused CFB implementation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-08 11:59:45 +08:00
Eric Biggers
1be7505933 crypto: arm64/sha512 - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha512_block_data_order()
got this backwards.  Fix this, albeit without changing the name in the
perlasm since that is OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:26 +08:00
Eric Biggers
455951b5e1 crypto: arm64/sha256 - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha256_block_data_order()
and __sha256_block_neon() got this backwards.  Fix this, albeit without
changing the names in the perlasm since that is OpenSSL code.  No change
in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:26 +08:00
Eric Biggers
5f720a3df3 crypto: arm64/sha512-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha512_ce_transform() and
__sha512_block_data_order() got this backwards.  Fix this, albeit
without changing "sha512_block_data_order" in the perlasm since that is
OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
ba30d31121 crypto: arm64/sha2-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha2_ce_transform() and
__sha256_block_data_order() got this backwards.  Fix this, albeit
without changing "sha256_block_data_order" in the perlasm since that is
OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
1f9f3a5218 crypto: arm64/sha1-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha1_ce_transform() got this
backwards.  Fix this.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
ddefde7b2a crypto: arm64/nhpoly1305 - implement ->digest
Implement the ->digest method to improve performance on single-page
messages by reducing the number of indirect calls.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
1efcbf0eff crypto: arm64/sha2-ce - implement ->digest for sha256
Implement a ->digest function for sha256-ce.  This improves the
performance of crypto_shash_digest() with this algorithm by reducing the
number of indirect calls that are made.  This only adds ~112 bytes of
code, mostly for the inlined init, as the finup function is tail-called.

For now, don't bother with this for sha224, since sha224 is rarely used.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Masahiro Yamada
ac2d838fb7 crypto: arm64/aes - remove Makefile hack
Do it more simiply. This also fixes single target builds.

[before]

  $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i
    [snip]
  make[4]: *** No rule to make target 'arch/arm64/crypto/aes-glue-ce.i'.  Stop.

[after]

  $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i
    [snip]
    CPP     arch/arm64/crypto/aes-glue-ce.i

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-08-11 19:19:27 +08:00
Herbert Xu
f573db7aa5 crypto: arm64/sha256-glue - Include module.h
Include module.h in arch/arm64/crypto/sha256-glue.c as it uses
various macros (such as MODULE_AUTHOR) that are defined there.

Also fix the ordering of types.h.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305191953.PIB1w80W-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-19 20:56:59 +08:00
Eric Biggers
47446d7cd4 crypto: arm64/aes-neonbs - fix crash with CFI enabled
aesbs_ecb_encrypt(), aesbs_ecb_decrypt(), aesbs_xts_encrypt(), and
aesbs_xts_decrypt() are called via indirect function calls.  Therefore
they need to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause
their type hashes to be emitted when the kernel is built with
CONFIG_CFI_CLANG=y.  Otherwise, the code crashes with a CFI failure if
the compiler doesn't happen to optimize out the indirect calls.

Fixes: c50d32859e ("arm64: Add types to indirect called assembly functions")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-03-14 17:06:44 +08:00
Herbert Xu
4e4a08868f crypto: arm64/sm4-gcm - Fix possible crash in GCM cryption
An often overlooked aspect of the skcipher walker API is that an
error is not just indicated by a non-zero return value, but by the
fact that walk->nbytes is zero.

Thus it is an error to call skcipher_walk_done after getting back
walk->nbytes == 0 from the previous interaction with the walker.

This is because when walk->nbytes is zero the walker is left in
an undefined state and any further calls to it may try to free
uninitialised stack memory.

The sm4 arm64 ccm code gets this wrong and ends up calling
skcipher_walk_done even when walk->nbytes is zero.

This patch rewrites the loop in a form that resembles other callers.

Reported-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Tianjia Zhang
3b9d902153 crypto: arm64/sm4-ccm - Rewrite skcipher walker loop
The fact that an error in the skcipher walker API are indicated
not only by a non-zero return value, but also by the fact that
walk->nbytes is zero, causes the layout of the skcipher walker
loop to be sufficiently different from the usual layout, which
is not a problem in itself, but it is likely to cause reading
confusion and difficulty in code maintenance.

This patch rewrites skcipher walker loop, and separates the
last chunk cryption from the loop to avoid wrong calls to the
skcipher walker API. In addition to following the usual convention
of checking walk->nbytes, it also makes the loop execute logic
clearer and easier to understand.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Herbert Xu
57ead1bf1c crypto: arm64/aes-ccm - Rewrite skcipher walker loop
An often overlooked aspect of the skcipher walker API is that an
error is not just indicated by a non-zero return value, but by the
fact that walk->nbytes is zero.

Thus it is an error to call skcipher_walk_done after getting back
walk->nbytes == 0 from the previous interaction with the walker.

This is because when walk->nbytes is zero the walker is left in
an undefined state and any further calls to it may try to free
uninitialised stack memory.

The arm64 ccm code has to deal with zero-length messages, and
it needs to process data even when walk->nbytes == 0 is returned.
It doesn't have this bug because there is an explicit check for
walk->nbytes != 0 prior to the skcipher_walk_done call.

However, the loop is still sufficiently different from the usual
layout and it appears to have been copied into other code which
then ended up with this bug.  This patch rewrites it to follow the
usual convention of checking walk->nbytes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Ard Biesheuvel
9e3457112b crypto: arm64/gcm - add RFC4106 support
Add support for RFC4106 ESP encapsulation to the accelerated GCM
implementation. This results in a ~10% speedup for IPsec frames of
typical size (~1420 bytes) on Cortex-A53.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-20 18:29:31 +08:00
Tianjia Zhang
736f88689c crypto: arm64/sm4 - fix possible crash with CFI enabled
The SM4 CCM/GCM assembly functions for encryption and decryption is
called via indirect function calls.  Therefore they need to use
SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause its type hash
to be emitted when the kernel is built with CONFIG_CFI_CLANG=y.
Otherwise, the code crashes with a CFI failure (if the compiler didn't
happen to optimize out the indirect call).

Fixes: 67fa3a7fdf ("crypto: arm64/sm4 - add CE implementation for CCM mode")
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-30 17:57:42 +08:00
Ard Biesheuvel
a428636d4c crypto: arm64/ghash-ce - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to set up the stack frame so
that return address protections will be enabled automically when
configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
489a4a05fe crypto: arm64/crct10dif - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to set up the stack frame so
that return address protections will be enabled automically when
configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
7d709af180 crypto: arm64/aes-modes - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to create the stack frames in
the AES chaining mode wrappers so that they will get PAC and/or shadow
call stack protection when configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
67ab02dce3 crypto: arm64/aes-neonbs - use frame_push/pop consistently
Use the frame_push and frame_pop macros consistently to create the stack
frame, so that we will get PAC and/or shadow call stack handling as well
when enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Herbert Xu
14386d4713 crypto: Prepare to move crypto_tfm_ctx
The helper crypto_tfm_ctx is only used by the Crypto API algorithm
code and should really be in algapi.h.  However, for historical
reasons many files relied on it to be in crypto.h.  This patch
changes those files to use algapi.h instead in prepartion for a
move.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-02 18:12:40 +08:00
Eric Biggers
be8f6b6496 crypto: arm64/sm3 - fix possible crash with CFI enabled
sm3_neon_transform() is called via indirect function calls.  Therefore
it needs to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause
its type hash to be emitted when the kernel is built with
CONFIG_CFI_CLANG=y.  Otherwise, the code crashes with a CFI failure (if
the compiler didn't happen to optimize out the indirect call).

Fixes: c50d32859e ("arm64: Add types to indirect called assembly functions")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-25 17:39:19 +08:00
Eric Biggers
e5e1c67e2f crypto: arm64/nhpoly1305 - eliminate unnecessary CFI wrapper
Since the CFI implementation now supports indirect calls to assembly
functions, take advantage of that rather than use a wrapper function.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-25 17:39:19 +08:00
Tianjia Zhang
824db5cd1e crypto: arm64 - Fix unused variable compilation warnings of cpu_feature
The cpu feature defined by MODULE_DEVICE_TABLE is only referenced when
compiling as a module, and the warning of unused variable will be
encountered when compiling with intree. The warning can be removed by
adding the __maybe_unused flag.

Fixes: 03c9a333fe ("crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL")
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-18 16:59:34 +08:00
Ard Biesheuvel
61c581a46a crypto: move gf128mul library into lib/crypto
The gf128mul library does not depend on the crypto API at all, so it can
be moved into lib/crypto. This will allow us to use it in other library
code in a subsequent patch without having to depend on CONFIG_CRYPTO.

While at it, change the Kconfig symbol name to align with other crypto
library implementations. However, the source file name is retained, as
it is reflected in the module .ko filename, and changing this might
break things for users.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-11 18:14:59 +08:00
Tianjia Zhang
ae1b83c7d5 crypto: arm64/sm4 - add CE implementation for GCM mode
This patch is a CE-optimized assembly implementation for GCM mode.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 224 and 224
modes of tcrypt, and compared the performance before and after this patch (the
driver used before this patch is gcm_base(ctr-sm4-ce,ghash-generic)).
The abscissas are blocks of different lengths. The data is tabulated and the
unit is Mb/s:

Before (gcm_base(ctr-sm4-ce,ghash-generic)):

gcm(sm4)     |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    |  25.24   64.65   104.66   116.69   123.81   125.12   129.67   130.62
  GCM dec    |  25.40   64.80   104.74   116.70   123.81   125.21   129.68   130.59
  GCM mb enc |  24.95   64.06   104.20   116.38   123.55   124.97   129.63   130.61
  GCM mb dec |  24.92   64.00   104.13   116.34   123.55   124.98   129.56   130.48

After:

gcm-sm4-ce   |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    | 108.62  397.18   971.60  1283.92  1522.77  1513.39  1777.00  1806.96
  GCM dec    | 116.36  398.14  1004.27  1319.11  1624.21  1635.43  1932.54  1974.20
  GCM mb enc | 107.13  391.79   962.05  1274.94  1514.76  1508.57  1769.07  1801.58
  GCM mb dec | 113.40  389.36   988.51  1307.68  1619.10  1631.55  1931.70  1970.86

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:35:44 +08:00