linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-27 09:02:25 -04:00

Author	SHA1	Message	Date
Linus Torvalds	37b33c68b0	Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: - Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be directly accessible via the library API, instead of requiring the crypto API. This is much simpler and more efficient. - Convert some users such as ext4 to use the CRC32 library API instead of the crypto API. More conversions like this will come later. - Add a KUnit test that tests and benchmarks multiple CRC variants. Remove older, less-comprehensive tests that are made redundant by this. - Add an entry to MAINTAINERS for the kernel's CRC library code. I'm volunteering to maintain it. I have additional cleanups and optimizations planned for future cycles. * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits) MAINTAINERS: add entry for CRC library powerpc/crc: delete obsolete crc-vpmsum_test.c lib/crc32test: delete obsolete crc32test.c lib/crc16_kunit: delete obsolete crc16_kunit.c lib/crc_kunit.c: add KUnit test suite for CRC library functions powerpc/crc-t10dif: expose CRC-T10DIF function through lib arm64/crc-t10dif: expose CRC-T10DIF function through lib arm/crc-t10dif: expose CRC-T10DIF function through lib x86/crc-t10dif: expose CRC-T10DIF function through lib crypto: crct10dif - expose arch-optimized lib function lib/crc-t10dif: add support for arch overrides lib/crc-t10dif: stop wrapping the crypto API scsi: target: iscsi: switch to using the crc32c library f2fs: switch to using the crc32 library jbd2: switch to using the crc32c library ext4: switch to using the crc32c library lib/crc32: make crc32c() go directly to lib bcachefs: Explicitly select CRYPTO from BCACHEFS_FS x86/crc32: expose CRC32 functions through lib x86/crc32: update prototype for crc32_pclmul_le_16() ...	2025-01-22 19:55:08 -08:00
Peter Zijlstra	cdd30ebb1b	module: Convert symbol namespace to string literal Clean up the existing export namespace code along the same lines of commit `33def8498f` ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS \| while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify$ns$/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify$ns$/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS$([^)])$/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(])$([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(])\(([^,]+), ([^)]+)$/ && $0 !~ /(EXPORT_SYMBOL_NS[^(])/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(])$([^,]+), ([^)]+)$/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-12-02 11:34:44 -08:00
Eric Biggers	2051da8585	arm64/crc-t10dif: expose CRC-T10DIF function through lib Move the arm64 CRC-T10DIF assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It remains usable via the crypto API too via the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed. Note: to see the diff from arch/arm64/crypto/crct10dif-ce-glue.c to arch/arm64/lib/crc-t10dif-glue.c, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20241202012056.209768-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-12-01 17:23:13 -08:00
Linus Torvalds	02b2f1a7b8	Merge tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add sig driver API - Remove signing/verification from akcipher API - Move crypto_simd_disabled_for_test to lib/crypto - Add WARN_ON for return values from driver that indicates memory corruption Algorithms: - Provide crc32-arch and crc32c-arch through Crypto API - Optimise crc32c code size on x86 - Optimise crct10dif on arm/arm64 - Optimise p10-aes-gcm on powerpc - Optimise aegis128 on x86 - Output full sample from test interface in jitter RNG - Retry without padata when it fails in pcrypt Drivers: - Add support for Airoha EN7581 TRNG - Add support for STM32MP25x platforms in stm32 - Enable iproc-r200 RNG driver on BCMBCA - Add Broadcom BCM74110 RNG driver" * tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits) crypto: marvell/cesa - fix uninit value for struct mv_cesa_op_ctx crypto: cavium - Fix an error handling path in cpt_ucode_load_fw() crypto: aesni - Move back to module_init crypto: lib/mpi - Export mpi_set_bit crypto: aes-gcm-p10 - Use the correct bit to test for P10 hwrng: amd - remove reference to removed PPC_MAPLE config crypto: arm/crct10dif - Implement plain NEON variant crypto: arm/crct10dif - Macroify PMULL asm code crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply crypto: arm64/crct10dif - Remove obsolete chunking logic crypto: bcm - add error check in the ahash_hmac_init function crypto: caam - add error check to caam_rsa_set_priv_key_form hwrng: bcm74110 - Add Broadcom BCM74110 RNG driver dt-bindings: rng: add binding for BCM74110 RNG padata: Clean up in padata_do_multithreaded() crypto: inside-secure - Fix the return value of safexcel_xcbcmac_cra_init() crypto: qat - Fix missing destroy_workqueue in adf_init_aer() crypto: rsassa-pkcs1 - Reinstate support for legacy protocols ...	2024-11-19 10:28:41 -08:00
Ard Biesheuvel	779cee8209	crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code The only remaining user of the fallback implementation of 64x64 polynomial multiplication using 8x8 PMULL instructions is the final reduction from a 16 byte vector to a 16-bit CRC. The fallback code is complicated and messy, and this reduction has little impact on the overall performance, so instead, let's calculate the final CRC by passing the 16 byte vector to the generic CRC-T10DIF implementation when running the fallback version. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-11-15 19:52:51 +08:00
Ard Biesheuvel	67dfb1b73f	crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply The CRC-T10DIF implementation for arm64 has a version that uses 8x8 polynomial multiplication, for cores that lack the crypto extensions, which cover the 64x64 polynomial multiplication instruction that the algorithm was built around. This fallback version rather naively adopted the 64x64 polynomial multiplication algorithm that I ported from ARM for the GHASH driver, which needs 8 PMULL8 instructions to implement one PMULL64. This is reasonable, given that each 8-bit vector element needs to be multiplied with each element in the other vector, producing 8 vectors with partial results that need to be combined to yield the correct result. However, most PMULL64 invocations in the CRC-T10DIF code involve multiplication by a pair of 16-bit folding coefficients, and so all the partial results from higher order bytes will be zero, and there is no need to calculate them to begin with. Then, the CRC-T10DIF algorithm always XORs the output values of the PMULL64 instructions being issued in pairs, and so there is no need to faithfully implement each individual PMULL64 instruction, as long as XORing the results pairwise produces the expected result. Implementing these improvements results in a speedup of 3.3x on low-end platforms such as Raspberry Pi 4 (Cortex-A72) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-11-15 19:52:51 +08:00
Ard Biesheuvel	7048c21e6b	crypto: arm64/crct10dif - Remove obsolete chunking logic This is a partial revert of commit `fc754c024a`, which moved the logic into C code which ensures that kernel mode NEON code does not hog the CPU for too long. This is no longer needed now that kernel mode NEON no longer disables preemption, so we can drop this. Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-11-15 19:52:51 +08:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Jia He	9369693a2c	crypto: arm64/poly1305 - move data to rodata section When objtool gains support for ARM in the future, it may encounter issues disassembling the following data in the .text section: > .Lzeros: > .long 0,0,0,0,0,0,0,0 > .asciz "Poly1305 for ARMv8, CRYPTOGAMS by \@dot-asm" > .align 2 Move it to .rodata which is a more appropriate section for read-only data. There is a limit on how far the label can be from the instruction, hence use "adrp" and low 12bits offset of the label to avoid the compilation error. Signed-off-by: Jia He <justin.he@arm.com> Tested-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-08-17 13:55:49 +08:00
Herbert Xu	b0cd6f4c3f	Revert "crypto: arm64/poly1305 - move data to rodata section" This reverts commit `47d9625209`. It causes build issues as detected by the kernel test robot. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408040817.OWKXtCv6-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-08-06 13:45:59 +08:00
Jia He	47d9625209	crypto: arm64/poly1305 - move data to rodata section When objtool gains support for ARM in the future, it may encounter issues disassembling the following data in the .text section: > .Lzeros: > .long 0,0,0,0,0,0,0,0 > .asciz "Poly1305 for ARMv8, CRYPTOGAMS by \@dot-asm" > .align 2 Move it to .rodata which is a more appropriate section for read-only data. Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-08-02 21:11:20 +08:00
Jeff Johnson	b568826eff	crypto: arm64 - add missing MODULE_DESCRIPTION() macros With ARCH=arm64, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/crct10dif-ce.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/poly1305-neon.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/aes-neon-bs.o Add the missing invocations of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-06-21 22:04:16 +10:00
Mark Brown	a720de9fba	crypto: arm64/crc10dif - Raise priority of NEON crct10dif implementation The NEON implementation of crctd10dif is registered with a priority of 100 which is identical to that used by the generic C implementation. Raise the priority to 150, half way between the PMULL based implementation and the NEON one, so that it will be preferred over the generic implementation. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-05-31 17:34:56 +08:00
Ard Biesheuvel	571e557cba	crypto: arm64/aes-ce - Simplify round key load sequence Tweak the round key logic so that they can be loaded using a single branchless sequence using overlapping loads. This is shorter and simpler, and puts the conditional branches based on the key size further apart, which might benefit microarchitectures that cannot record taken branches at every instruction. For these branches, use test-bit-branch instructions that don't clobber the condition flags. Note that none of this has any impact on performance, positive or otherwise (and the branch prediction benefit would only benefit AES-192 which nobody uses). It does make for nicer code, though. While at it, use \@ to generate the labels inside the macros, which is more robust than using fixed numbers, which could clash inadvertently. Also, bring aes-neon.S in line with these changes, including the switch to test-and-branch instructions, to avoid surprises in the future when we might start relying on the condition flags being preserved in the chaining mode wrappers in aes-modes.S Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-04-26 17:26:09 +08:00
Linus Torvalds	c8e7699616	Merge tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Avoid unnecessary copying in scomp for trivial SG lists Algorithms: - Optimise NEON CCM implementation on ARM64 Drivers: - Add queue stop/query debugfs support in hisilicon/qm - Intel qat updates and cleanups" * tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (79 commits) Revert "crypto: remove CONFIG_CRYPTO_STATS" crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem crypto: tcrypt - add ffdhe2048(dh) test crypto: iaa - fix the missing CRYPTO_ALG_ASYNC in cra_flags crypto: hisilicon/zip - fix the missing CRYPTO_ALG_ASYNC in cra_flags hwrng: hisi - use dev_err_probe MAINTAINERS: Remove T Ambarus from few mchp entries crypto: iaa - Fix comp/decomp delay statistics crypto: iaa - Fix async_disable descriptor leak dt-bindings: rng: atmel,at91-trng: add sam9x7 TRNG dt-bindings: crypto: add sam9x7 in Atmel TDES dt-bindings: crypto: add sam9x7 in Atmel SHA dt-bindings: crypto: add sam9x7 in Atmel AES crypto: remove CONFIG_CRYPTO_STATS crypto: dh - Make public key test FIPS-only crypto: rockchip - fix to check return value crypto: jitter - fix CRYPTO_JITTERENTROPY help text crypto: qat - make ring to service map common for QAT GEN4 crypto: qat - fix ring to service map for dcc in 420xx crypto: qat - fix ring to service map for dcc in 4xxx ...	2024-03-15 14:46:54 -07:00
Ard Biesheuvel	1c0cf6d196	crypto: arm64/neonbs - fix out-of-bounds access on short input The bit-sliced implementation of AES-CTR operates on blocks of 128 bytes, and will fall back to the plain NEON version for tail blocks or inputs that are shorter than 128 bytes to begin with. It will call straight into the plain NEON asm helper, which performs all memory accesses in granules of 16 bytes (the size of a NEON register). For this reason, the associated plain NEON glue code will copy inputs shorter than 16 bytes into a temporary buffer, given that this is a rare occurrence and it is not worth the effort to work around this in the asm code. The fallback from the bit-sliced NEON version fails to take this into account, potentially resulting in out-of-bounds accesses. So clone the same workaround, and use a temp buffer for short in/outputs. Fixes: `fc074e1300` ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk") Cc: <stable@vger.kernel.org> Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-02-24 08:37:24 +08:00
Ard Biesheuvel	f691d444f9	crypto: arm64/aes-ccm - Merge finalization into en/decrypt asm helpers The C glue code already infers whether or not the current iteration is the final one, by comparing walk.nbytes with walk.total. This means we can easily inform the asm helpers of this as well, by conditionally passing a pointer to the original IV, which is used in the finalization of the MAC. This removes the need for a separate call into the asm code to perform the finalization. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Ard Biesheuvel	7150528849	crypto: arm64/aes-ccm - Merge encrypt and decrypt tail handling The encryption and decryption code paths are mostly identical, except for a small difference where the plaintext input into the MAC is taken from either the input or the output block. We can factor this in quite easily using a vector bit select, and a few additional XORs, without the need for branches. This way, we can use the same tail handling logic on the encrypt and decrypt code paths, allowing further consolidation of the asm helpers in a subsequent patch. (In the main loop, adding just a handful of ALU instructions results in a noticeable performance hit [around 5% on Apple M2], so those routines are kept separate) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Ard Biesheuvel	565def1542	crypto: arm64/aes-ccm - Cache round keys and unroll AES loops The CCM code as originally written attempted to use as few NEON registers as possible, to avoid having to eagerly preserve/restore the entire NEON register file at every call to kernel_neon_begin/end. At that time, this API took a number of NEON registers as a parameter, and only preserved that many registers. Today, the NEON register file is restored lazily, and the old API is long gone. This means we can use as many NEON registers as we can make meaningful use of, which means in the AES case that we can keep all round keys in registers rather than reloading each of them for each AES block processed. On Cortex-A53, this results in a speedup of more than 50%. (From 4 cycles per byte to 2.6 cycles per byte) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Ard Biesheuvel	948ffc66e5	crypto: arm64/aes-ccm - Reuse existing MAC update for AAD input CCM combines the counter (CTR) encryption mode with a MAC based on the same block cipher. This MAC construction is a bit clunky: it invokes the block cipher in a way that cannot be parallelized, resulting in poor CPU pipeline efficiency. The arm64 CCM code mitigates this by interleaving the encryption and MAC at the AES round level, resulting in a substantial speedup. But this approach does not apply to the additional authenticated data (AAD) which is not encrypted. This means the special asm routine dealing with the AAD is not any better than the MAC update routine used by the arm64 AES block encryption driver, so let's reuse that, and drop the special AES-CCM version. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Ard Biesheuvel	c131098d6d	crypto: arm64/aes-ccm - Replace bytewise tail handling with NEON permute Implement the CCM tail handling using a single sequence that uses permute vectors and overlapping loads and stores, rather than going over the tail byte by byte in a loop, and using scalar operations. This is more efficient, even though the measured speedup is only around 1-2% on the CPUs I have tried. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Ard Biesheuvel	97c4c10daf	crypto: arm64/aes-ccm - Pass short inputs via stack buffer In preparation for optimizing the CCM core asm code using permutation vectors and overlapping loads and stores, ensure that inputs shorter than the size of a AES block are passed via a buffer on the stack, in a way that positions the data at the end of a 16 byte buffer. This removes the need for the asm code to reason about a rare corner case where the tail of the data cannot be read/written using a single NEON load/store instruction. While at it, tweak the copyright header and authorship to bring it up to date. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Ard Biesheuvel	88c6d50f64	crypto: arm64/aes-ccm - Keep NEON enabled during skcipher walk Now that kernel mode NEON no longer disables preemption, we no longer have to take care to disable and re-enable use of the NEON when calling into the skcipher walk API. So just keep it enabled until done. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Ard Biesheuvel	f722002441	crypto: arm64/aes-ccm - Revert "Rewrite skcipher walker loop" This reverts commit `57ead1bf1c`, which updated the CCM code to only rely on walk.nbytes to check for failures returned from the skcipher walk API, mostly for the common good rather than to fix a particular problem in the code. This change introduces a problem of its own: the skcipher walk is started with the 'atomic' argument set to false, which means that the skcipher walk API is permitted to sleep. Subsequently, it invokes skcipher_walk_done() with preemption disabled on the final iteration of the loop. This appears to work by accident, but it is arguably a bad example, and providing a better example was the point of the original patch. Given that future changes to the CCM code will rely on the original behavior of entering the loop even for zero sized inputs, let's just revert this change entirely, and proceed from there. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-01-26 16:39:32 +08:00
Herbert Xu	29fa12e918	crypto: arm64/sm4 - Remove cfb(sm4) Remove the unused CFB implementation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-12-08 11:59:45 +08:00
Eric Biggers	1be7505933	crypto: arm64/sha512 - clean up backwards function names In the Linux kernel, a function whose name has two leading underscores is conventionally called by the same-named function without leading underscores -- not the other way around. __sha512_block_data_order() got this backwards. Fix this, albeit without changing the name in the perlasm since that is OpenSSL code. No change in behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-20 13:39:26 +08:00
Eric Biggers	455951b5e1	crypto: arm64/sha256 - clean up backwards function names In the Linux kernel, a function whose name has two leading underscores is conventionally called by the same-named function without leading underscores -- not the other way around. __sha256_block_data_order() and __sha256_block_neon() got this backwards. Fix this, albeit without changing the names in the perlasm since that is OpenSSL code. No change in behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-20 13:39:26 +08:00
Eric Biggers	5f720a3df3	crypto: arm64/sha512-ce - clean up backwards function names In the Linux kernel, a function whose name has two leading underscores is conventionally called by the same-named function without leading underscores -- not the other way around. __sha512_ce_transform() and __sha512_block_data_order() got this backwards. Fix this, albeit without changing "sha512_block_data_order" in the perlasm since that is OpenSSL code. No change in behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-20 13:39:25 +08:00
Eric Biggers	ba30d31121	crypto: arm64/sha2-ce - clean up backwards function names In the Linux kernel, a function whose name has two leading underscores is conventionally called by the same-named function without leading underscores -- not the other way around. __sha2_ce_transform() and __sha256_block_data_order() got this backwards. Fix this, albeit without changing "sha256_block_data_order" in the perlasm since that is OpenSSL code. No change in behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-20 13:39:25 +08:00
Eric Biggers	1f9f3a5218	crypto: arm64/sha1-ce - clean up backwards function names In the Linux kernel, a function whose name has two leading underscores is conventionally called by the same-named function without leading underscores -- not the other way around. __sha1_ce_transform() got this backwards. Fix this. No change in behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-20 13:39:25 +08:00
Eric Biggers	ddefde7b2a	crypto: arm64/nhpoly1305 - implement ->digest Implement the ->digest method to improve performance on single-page messages by reducing the number of indirect calls. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-20 13:39:25 +08:00
Eric Biggers	1efcbf0eff	crypto: arm64/sha2-ce - implement ->digest for sha256 Implement a ->digest function for sha256-ce. This improves the performance of crypto_shash_digest() with this algorithm by reducing the number of indirect calls that are made. This only adds ~112 bytes of code, mostly for the inlined init, as the finup function is tail-called. For now, don't bother with this for sha224, since sha224 is rarely used. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-10-20 13:39:25 +08:00
Masahiro Yamada	ac2d838fb7	crypto: arm64/aes - remove Makefile hack Do it more simiply. This also fixes single target builds. [before] $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i [snip] make[4]: *** No rule to make target 'arch/arm64/crypto/aes-glue-ce.i'. Stop. [after] $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i [snip] CPP arch/arm64/crypto/aes-glue-ce.i Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-08-11 19:19:27 +08:00
Herbert Xu	f573db7aa5	crypto: arm64/sha256-glue - Include module.h Include module.h in arch/arm64/crypto/sha256-glue.c as it uses various macros (such as MODULE_AUTHOR) that are defined there. Also fix the ordering of types.h. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202305191953.PIB1w80W-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-05-19 20:56:59 +08:00
Eric Biggers	47446d7cd4	crypto: arm64/aes-neonbs - fix crash with CFI enabled aesbs_ecb_encrypt(), aesbs_ecb_decrypt(), aesbs_xts_encrypt(), and aesbs_xts_decrypt() are called via indirect function calls. Therefore they need to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause their type hashes to be emitted when the kernel is built with CONFIG_CFI_CLANG=y. Otherwise, the code crashes with a CFI failure if the compiler doesn't happen to optimize out the indirect calls. Fixes: `c50d32859e` ("arm64: Add types to indirect called assembly functions") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-03-14 17:06:44 +08:00
Herbert Xu	4e4a08868f	crypto: arm64/sm4-gcm - Fix possible crash in GCM cryption An often overlooked aspect of the skcipher walker API is that an error is not just indicated by a non-zero return value, but by the fact that walk->nbytes is zero. Thus it is an error to call skcipher_walk_done after getting back walk->nbytes == 0 from the previous interaction with the walker. This is because when walk->nbytes is zero the walker is left in an undefined state and any further calls to it may try to free uninitialised stack memory. The sm4 arm64 ccm code gets this wrong and ends up calling skcipher_walk_done even when walk->nbytes is zero. This patch rewrites the loop in a form that resembles other callers. Reported-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Fixes: `ae1b83c7d5` ("crypto: arm64/sm4 - add CE implementation for GCM mode") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-02-10 17:20:19 +08:00
Tianjia Zhang	3b9d902153	crypto: arm64/sm4-ccm - Rewrite skcipher walker loop The fact that an error in the skcipher walker API are indicated not only by a non-zero return value, but also by the fact that walk->nbytes is zero, causes the layout of the skcipher walker loop to be sufficiently different from the usual layout, which is not a problem in itself, but it is likely to cause reading confusion and difficulty in code maintenance. This patch rewrites skcipher walker loop, and separates the last chunk cryption from the loop to avoid wrong calls to the skcipher walker API. In addition to following the usual convention of checking walk->nbytes, it also makes the loop execute logic clearer and easier to understand. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-02-10 17:20:19 +08:00
Herbert Xu	57ead1bf1c	crypto: arm64/aes-ccm - Rewrite skcipher walker loop An often overlooked aspect of the skcipher walker API is that an error is not just indicated by a non-zero return value, but by the fact that walk->nbytes is zero. Thus it is an error to call skcipher_walk_done after getting back walk->nbytes == 0 from the previous interaction with the walker. This is because when walk->nbytes is zero the walker is left in an undefined state and any further calls to it may try to free uninitialised stack memory. The arm64 ccm code has to deal with zero-length messages, and it needs to process data even when walk->nbytes == 0 is returned. It doesn't have this bug because there is an explicit check for walk->nbytes != 0 prior to the skcipher_walk_done call. However, the loop is still sufficiently different from the usual layout and it appears to have been copied into other code which then ended up with this bug. This patch rewrites it to follow the usual convention of checking walk->nbytes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-02-10 17:20:19 +08:00
Ard Biesheuvel	9e3457112b	crypto: arm64/gcm - add RFC4106 support Add support for RFC4106 ESP encapsulation to the accelerated GCM implementation. This results in a ~10% speedup for IPsec frames of typical size (~1420 bytes) on Cortex-A53. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2023-01-20 18:29:31 +08:00
Tianjia Zhang	736f88689c	crypto: arm64/sm4 - fix possible crash with CFI enabled The SM4 CCM/GCM assembly functions for encryption and decryption is called via indirect function calls. Therefore they need to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause its type hash to be emitted when the kernel is built with CONFIG_CFI_CLANG=y. Otherwise, the code crashes with a CFI failure (if the compiler didn't happen to optimize out the indirect call). Fixes: `67fa3a7fdf` ("crypto: arm64/sm4 - add CE implementation for CCM mode") Fixes: `ae1b83c7d5` ("crypto: arm64/sm4 - add CE implementation for GCM mode") Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-12-30 17:57:42 +08:00
Ard Biesheuvel	a428636d4c	crypto: arm64/ghash-ce - use frame_push/pop macros consistently Use the frame_push and frame_pop macros to set up the stack frame so that return address protections will be enabled automically when configured. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-12-09 18:45:00 +08:00
Ard Biesheuvel	489a4a05fe	crypto: arm64/crct10dif - use frame_push/pop macros consistently Use the frame_push and frame_pop macros to set up the stack frame so that return address protections will be enabled automically when configured. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-12-09 18:45:00 +08:00
Ard Biesheuvel	7d709af180	crypto: arm64/aes-modes - use frame_push/pop macros consistently Use the frame_push and frame_pop macros to create the stack frames in the AES chaining mode wrappers so that they will get PAC and/or shadow call stack protection when configured. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-12-09 18:45:00 +08:00
Ard Biesheuvel	67ab02dce3	crypto: arm64/aes-neonbs - use frame_push/pop consistently Use the frame_push and frame_pop macros consistently to create the stack frame, so that we will get PAC and/or shadow call stack handling as well when enabled. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-12-09 18:45:00 +08:00
Herbert Xu	14386d4713	crypto: Prepare to move crypto_tfm_ctx The helper crypto_tfm_ctx is only used by the Crypto API algorithm code and should really be in algapi.h. However, for historical reasons many files relied on it to be in crypto.h. This patch changes those files to use algapi.h instead in prepartion for a move. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-12-02 18:12:40 +08:00
Eric Biggers	be8f6b6496	crypto: arm64/sm3 - fix possible crash with CFI enabled sm3_neon_transform() is called via indirect function calls. Therefore it needs to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause its type hash to be emitted when the kernel is built with CONFIG_CFI_CLANG=y. Otherwise, the code crashes with a CFI failure (if the compiler didn't happen to optimize out the indirect call). Fixes: `c50d32859e` ("arm64: Add types to indirect called assembly functions") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:19 +08:00
Eric Biggers	e5e1c67e2f	crypto: arm64/nhpoly1305 - eliminate unnecessary CFI wrapper Since the CFI implementation now supports indirect calls to assembly functions, take advantage of that rather than use a wrapper function. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:19 +08:00
Tianjia Zhang	824db5cd1e	crypto: arm64 - Fix unused variable compilation warnings of cpu_feature The cpu feature defined by MODULE_DEVICE_TABLE is only referenced when compiling as a module, and the warning of unused variable will be encountered when compiling with intree. The warning can be removed by adding the __maybe_unused flag. Fixes: `03c9a333fe` ("crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL") Fixes: `ae1b83c7d5` ("crypto: arm64/sm4 - add CE implementation for GCM mode") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Ard Biesheuvel	61c581a46a	crypto: move gf128mul library into lib/crypto The gf128mul library does not depend on the crypto API at all, so it can be moved into lib/crypto. This will allow us to use it in other library code in a subsequent patch without having to depend on CONFIG_CRYPTO. While at it, change the Kconfig symbol name to align with other crypto library implementations. However, the source file name is retained, as it is reflected in the module .ko filename, and changing this might break things for users. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-11 18:14:59 +08:00
Tianjia Zhang	ae1b83c7d5	crypto: arm64/sm4 - add CE implementation for GCM mode This patch is a CE-optimized assembly implementation for GCM mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 224 and 224 modes of tcrypt, and compared the performance before and after this patch (the driver used before this patch is gcm_base(ctr-sm4-ce,ghash-generic)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before (gcm_base(ctr-sm4-ce,ghash-generic)): gcm(sm4) \| 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------------- GCM enc \| 25.24 64.65 104.66 116.69 123.81 125.12 129.67 130.62 GCM dec \| 25.40 64.80 104.74 116.70 123.81 125.21 129.68 130.59 GCM mb enc \| 24.95 64.06 104.20 116.38 123.55 124.97 129.63 130.61 GCM mb dec \| 24.92 64.00 104.13 116.34 123.55 124.98 129.56 130.48 After: gcm-sm4-ce \| 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------------- GCM enc \| 108.62 397.18 971.60 1283.92 1522.77 1513.39 1777.00 1806.96 GCM dec \| 116.36 398.14 1004.27 1319.11 1624.21 1635.43 1932.54 1974.20 GCM mb enc \| 107.13 391.79 962.05 1274.94 1514.76 1508.57 1769.07 1801.58 GCM mb dec \| 113.40 389.36 988.51 1307.68 1619.10 1631.55 1931.70 1970.86 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:35:44 +08:00

1 2 3 4 5 ...

361 Commits