Use cpu_to_be32 instead of be32_to_cpu in img_hash_read_result_queue
to silence sparse. The generated code should be identical.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Perform a cache flush on the SEV-ES TMR memory after allocation to prevent
any possibility of the firmware encountering an error should dirty cache
lines be present. Use clflush_cache_range() to flush the SEV-ES TMR memory.
Fixes: 97f9ac3db6 ("crypto: ccp - Add support for SEV-ES to the PSP driver")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ZLIB format (RFC 1950) is made of deflate compressed data surrounded
by a header and a footer. The QAT accelerators support only the deflate
algorithm, therefore the header and the footer need to be inserted in
software.
This adds logic in the QAT driver to support the ZLIB format. In
particular:
* Generalize the function qat_comp_alg_compress_decompress() to allow
skipping an initial region (header) of the source and/or destination
scatter lists.
* Add logic to register the qat_zlib_deflate algorithm into the acomp
framework.
* For ZLIB compression, skip the initial portion of the destination
buffer before sending the job to the QAT accelerator and insert the
ZLIB header and footer in the callback, after the QAT request has
been processed.
* For ZLIB decompression, parse the header in the input buffer
provided by the user and verify its validity before attempting the
decompression of the buffer with QAT. Then submit the buffer to QAT
for decompression. In the callback verify the correctness of the
footer by comparing the value of the ADLER produced by QAT with the
one in the destination buffer.
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Extend qat_bl_sgl_to_bufl() to allow skipping the mapping of a region
of the source and the destination scatter lists starting from byte
zero.
This is to support the ZLIB format (RFC 1950) in the qat driver.
The ZLIB format is made of deflate compressed data surrounded by a
header and a footer. The QAT accelerators support only the deflate
algorithm, therefore the header should not be mapped since it is
inserted in software.
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit 1ca2809897.
While the akcipher API as a whole is designed to be called only
from thread context, its completion path is still called from
softirq context as usual. Therefore we must not use GFP_KERNEL
on that path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently the ecb/cbc macros hold fpu context unnecessarily when using
scalar cipher routines (e.g. when handling odd sizes of blocks per walk).
Change the macros to drop fpu context as soon as the fpu is out of use.
No performance impact found (on Intel Haswell).
Signed-off-by: Peter Lafreniere <peter@n8pjl.ca>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As it is xts only handles the special return value of EINPROGRESS,
which means that in all other cases it will free data related to the
request.
However, as the caller of xts may specify MAY_BACKLOG, we also need
to expect EBUSY and treat it in the same way. Otherwise backlogged
requests will trigger a use-after-free.
Fixes: 8083b1bf81 ("crypto: xts - add support for ciphertext stealing")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of doing saving and restoring on the AEAD request object
for fallback processing, use a subrequest instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Previously the child skcipher request was stored on the stack and
therefore needed to be zeroed. As it is now dynamically allocated
we no longer need to do so.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
According to FIPS 140-3 IG, section D.R "Hash Functions Acceptable for
Use in the SP 800-90A DRBGs", modules certified after May 16th, 2023
must not support the use of: SHA-224, SHA-384, SHA512-224, SHA512-256,
SHA3-224, SHA3-384. Disallow HMAC and HASH DRBGs using SHA-384 in FIPS
mode.
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Reviewed-by: Stephan Müller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add comments next to the version data MMIO register values to identify
the register name being used.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The minimum version of binutils for kernel build is currently 2.23 and
it doesn't support GFNI.
So, it fails to build the aria-avx512 if the old binutils is used.
aria-avx512 requires GFNI, so it should not be allowed to build if the
old binutils is used.
The AS_AVX512 and AS_GFNI are added to the Kconfig to disable build
aria-avx512 if the old binutils is used.
Fixes: c970d42001 ("crypto: x86/aria - implement aria-avx512")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The minimum version of binutils for kernel build is currently 2.23 and
it doesn't support GFNI.
So, it fails to build the aria-avx2 if the old binutils is used.
The code using GFNI is an optional part of aria-avx2.
So, it disables GFNI part in it when the old binutils is used.
Fixes: 37d8d3ae7a ("crypto: x86/aria - implement aria-avx2")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The minimum version of binutils for kernel build is currently 2.23 and
it doesn't support GFNI.
So, it fails to build the aria-avx if the old binutils is used.
The code using GFNI is an optional part of aria-avx.
So, it disables GFNI part in it when the old binutils is used.
In order to check whether the using binutils is supporting GFNI or not,
AS_GFNI is added.
Fixes: ba3579e6e4 ("crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As it is seqiv only handles the special return value of EINPROGERSS,
which means that in all other cases it will free data related to the
request.
However, as the caller of seqiv may specify MAY_BACKLOG, we also need
to expect EBUSY and treat it in the same way. Otherwise backlogged
requests will trigger a use-after-free.
Fixes: 0a270321db ("[CRYPTO] seqiv: Add Sequence Number IV Generator")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As it is essiv only handles the special return value of EINPROGERSS,
which means that in all other cases it will free data related to the
request.
However, as the caller of essiv may specify MAY_BACKLOG, we also need
to expect EBUSY and treat it in the same way. Otherwise backlogged
requests will trigger a use-after-free.
Fixes: be1eb7f78a ("crypto: essiv - create wrapper template...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Defines prefixed with "CONFIG" should be limited to proper Kconfig options,
that are introduced in a Kconfig file.
Here, a definition for the driver's configuration zone is named
CONFIG_ZONE. Rename this local definition to CONFIGURATION_ZONE to avoid
defines prefixed with "CONFIG".
No functional change.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
While reviewing dependencies in some Kconfig files, I noticed the redundant
dependency "depends on PCI && PCI_MSI". The config PCI_MSI has always,
since its introduction, been dependent on the config PCI. So, it is
sufficient to just depend on PCI_MSI, and know that the dependency on PCI
is implicitly implied.
Reduce the dependencies of configs CRYPTO_DEV_HISI_SEC2,
CRYPTO_DEV_HISI_QM, CRYPTO_DEV_HISI_ZIP and CRYPTO_DEV_HISI_HPRE.
No functional change and effective change of Kconfig dependendencies.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When reading or writing crypto buffers the inner loops can
be replaced with readsl and writesl which will on ARM result
in a tight assembly loop, speeding up encryption/decryption
a little bit. This optimization was in the Ux500 driver so
let's carry it over to the STM32 driver.
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Lionel Debieve <lionel.debieve@foss.st.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AEAD documentation conflates associated data and authentication
tags: the former (along with the ciphertext) is authenticated by the
latter. Fix the doc accordingly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
RFC4106 wraps AES in GCM mode, and can be used with larger key sizes
than 128/160 bits, just like AES itself. So add these to the tcrypt
recipe so they will be benchmarked as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for RFC4106 ESP encapsulation to the accelerated GCM
implementation. This results in a ~10% speedup for IPsec frames of
typical size (~1420 bytes) on Cortex-A53.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Revert the changes that added p10-aes-gcm:
0781bbd7ea ("crypto: p10-aes-gcm - A perl script to process PowerPC assembler source")
41a6437ab4 ("crypto: p10-aes-gcm - Supporting functions for ghash")
3b47eccaaf ("crypto: p10-aes-gcm - Supporting functions for AES")
ca68a96c37 ("crypto: p10-aes-gcm - An accelerated AES/GCM stitched implementation")
cc40379b6e ("crypto: p10-aes-gcm - Glue code for AES/GCM stitched implementation")
3c657e8689 ("crypto: p10-aes-gcm - Update Kconfig and Makefile")
These changes fail to build in many configurations and are not ready
for prime time.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
At least the D1 variant requires a separate clock for the TRNG.
Without this clock enabled, reading from /dev/hwrng reports:
sun8i-ce 3040000.crypto: DMA timeout for TRNG (tm=96) on flow 3
Experimentation shows that the necessary clock is the SoC's internal
RC oscillator. This makes sense, as noise from the oscillator can be
used as a source of entropy.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
D1 has a crypto engine similar to the one in other Allwinner SoCs.
Like H6, it has a separate MBUS clock gate.
It also requires the internal RC oscillator to be enabled for the TRNG
to return data, presumably because noise from the oscillator is used as
an entropy source. This is likely the case for earlier variants as well,
but it really only matters for H616 and newer SoCs, as H6 provides no
way to disable the internal oscillator.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ACRY Engine is designed to accelerate the throughput of
ECDSA/RSA signature and verification.
This patch aims to add ACRY RSA engine driver for hardware
acceleration.
Signed-off-by: Neal Liu <neal_liu@aspeedtech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The skcipher walk API implementation avoids scatterwalk_map() for
mapping the source and destination buffers, and invokes kmap_atomic()
directly if the buffer in question is not in low memory (which can only
happen on 32-bit architectures). This avoids some overhead on 64-bit
architectures, and most notably, permits the skcipher code to run with
preemption enabled.
Now that scatterwalk_map() has been updated to use kmap_local(), none of
this is needed, so we can simply use scatterwalk_map/unmap instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This perl code is taken from the OpenSSL project and added gcm_init_htable function
used in the p10-aes-gcm-glue.c code to initialize hash table. gcm_hash_p8 is used
to hash encrypted data blocks.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This code is taken from CRYPTOGAMs[1]. The following functions are used,
aes_p8_set_encrypt_key is used to generate AES round keys and aes_p8_encrypt is used
to encrypt single block.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Improve overall performance of AES/GCM encrypt and decrypt operations
for Power10+ CPU.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Defined CRYPTO_P10_AES_GCM in Kconfig to support AES/GCM
stitched implementation for Power10+ CPU.
Added a new module driver p10-aes-gcm-crypto.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
aria-avx512 implementation uses AVX512 and GFNI.
It supports 64way parallel processing.
So, byteslicing code is changed to support 64way parallel.
And it exports some aria-avx2 functions such as encrypt() and decrypt().
AVX and AVX2 have 16 registers.
They should use memory to store/load state because of lack of registers.
But AVX512 supports 32 registers.
So, it doesn't require store/load in the s-box layer.
It means that it can reduce overhead of store/load in the s-box layer.
Also code become much simpler.
Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100:
ARIA-AVX512(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx512) encryption
tcrypt: 1 operation in 1504 cycles (1024 bytes)
tcrypt: 1 operation in 4595 cycles (4096 bytes)
tcrypt: 1 operation in 1763 cycles (1024 bytes)
tcrypt: 1 operation in 5540 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx512) decryption
tcrypt: 1 operation in 1502 cycles (1024 bytes)
tcrypt: 1 operation in 4615 cycles (4096 bytes)
tcrypt: 1 operation in 1759 cycles (1024 bytes)
tcrypt: 1 operation in 5554 cycles (4096 bytes)
ARIA-AVX2 with GFNI(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption
tcrypt: 1 operation in 2003 cycles (1024 bytes)
tcrypt: 1 operation in 5867 cycles (4096 bytes)
tcrypt: 1 operation in 2358 cycles (1024 bytes)
tcrypt: 1 operation in 7295 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption
tcrypt: 1 operation in 2004 cycles (1024 bytes)
tcrypt: 1 operation in 5956 cycles (4096 bytes)
tcrypt: 1 operation in 2409 cycles (1024 bytes)
tcrypt: 1 operation in 7564 cycles (4096 bytes)
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
aria-avx2 implementation uses AVX2, AES-NI, and GFNI.
It supports 32way parallel processing.
So, byteslicing code is changed to support 32way parallel.
And it exports some aria-avx functions such as encrypt() and decrypt().
There are two main logics, s-box layer and diffusion layer.
These codes are the same as aria-avx implementation.
But some instruction are exchanged because they don't support 256bit
registers.
Also, AES-NI doesn't support 256bit register.
So, aesenclast and aesdeclast are used twice like below:
vextracti128 $1, ymm0, xmm6;
vaesenclast xmm7, xmm0, xmm0;
vaesenclast xmm7, xmm6, xmm6;
vinserti128 $1, xmm6, ymm0, ymm0;
Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100:
ARIA-AVX2 with GFNI(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption
tcrypt: 1 operation in 2003 cycles (1024 bytes)
tcrypt: 1 operation in 5867 cycles (4096 bytes)
tcrypt: 1 operation in 2358 cycles (1024 bytes)
tcrypt: 1 operation in 7295 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption
tcrypt: 1 operation in 2004 cycles (1024 bytes)
tcrypt: 1 operation in 5956 cycles (4096 bytes)
tcrypt: 1 operation in 2409 cycles (1024 bytes)
tcrypt: 1 operation in 7564 cycles (4096 bytes)
ARIA-AVX with GFNI(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx) encryption
tcrypt: 1 operation in 2761 cycles (1024 bytes)
tcrypt: 1 operation in 9390 cycles (4096 bytes)
tcrypt: 1 operation in 3401 cycles (1024 bytes)
tcrypt: 1 operation in 11876 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx) decryption
tcrypt: 1 operation in 2735 cycles (1024 bytes)
tcrypt: 1 operation in 9424 cycles (4096 bytes)
tcrypt: 1 operation in 3369 cycles (1024 bytes)
tcrypt: 1 operation in 11954 cycles (4096 bytes)
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
aria-avx assembly code accesses members of aria_ctx with magic number
offset. If the shape of struct aria_ctx is changed carelessly,
aria-avx will not work.
So, we need to ensure accessing members of aria_ctx with correct
offset values, not with magic numbers.
It adds ARIA_CTX_enc_key, ARIA_CTX_dec_key, and ARIA_CTX_rounds in the
asm-offsets.c So, correct offset definitions will be generated.
aria-avx assembly code can access members of aria_ctx safely with
these definitions.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>