Commit Graph

338 Commits

Author SHA1 Message Date
Linus Torvalds
04446eee58 Merge tag 'v6.16-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "Fix a loongarch header regression and a module name collision on s390"

* tag 'v6.16-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  asm-generic: Add sched.h inclusion in simd.h
  crypto: s390/sha256 - rename module to sha256-s390
2025-06-03 08:03:45 -07:00
Eric Biggers
cddded9803 crypto: s390/sha256 - rename module to sha256-s390
When the s390 SHA-256 code is built as a loadable module, name it
sha256-s390.ko instead of sha256.ko.  This avoids a module name
collision with crypto/sha256.ko and makes it consistent with the other
architectures.

We should consider making a single module provide all the SHA-256
library code, which would prevent issues like this.  But for now this is
the fix that's needed.

Fixes: b9eac03edc ("crypto: s390/sha256 - implement library instead of shash")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/r/20250529110526.6d2959a9.alex.williamson@redhat.com/
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-30 20:56:48 +08:00
Linus Torvalds
d8cb068359 Merge tag 's390-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:

 - Large rework of the protected key crypto code to allow for
   asynchronous handling without memory allocation

 - Speed up system call entry/exit path by re-implementing lazy ASCE
   handling

 - Add module autoload support for the diag288_wdt watchdog device
   driver

 - Get rid of s390 specific strcpy() and strncpy() implementations, and
   switch all remaining users to strscpy() when possible

 - Various other small fixes and improvements

* tag 's390-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (51 commits)
  s390/pci: Serialize device addition and removal
  s390/pci: Allow re-add of a reserved but not yet removed device
  s390/pci: Prevent self deletion in disable_slot()
  s390/pci: Remove redundant bus removal and disable from zpci_release_device()
  s390/crypto: Extend protected key conversion retry loop
  s390/pci: Fix __pcilg_mio_inuser() inline assembly
  s390/ptrace: Always inline regs_get_kernel_stack_nth() and regs_get_register()
  s390/thread_info: Cleanup header includes
  s390/extmem: Add workaround for DCSS unload diag
  s390/crypto: Rework protected key AES for true asynch support
  s390/cpacf: Rework cpacf_pcc() to return condition code
  s390/mm: Fix potential use-after-free in __crst_table_upgrade()
  s390/mm: Add mmap_assert_write_locked() check to crst_table_upgrade()
  s390/string: Remove strcpy() implementation
  s390/con3270: Use strscpy() instead of strcpy()
  s390/boot: Use strspcy() instead of strcpy()
  s390: Simple strcpy() to strscpy() conversions
  s390/pkey/crypto: Introduce xflags param for pkey in-kernel API
  s390/pkey: Provide and pass xflags within pkey and zcrypt layers
  s390/uv: Remove uv_get_secret_metadata function
  ...
2025-05-26 14:36:05 -07:00
Linus Torvalds
14418ddcc2 Merge tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Fix memcpy_sglist to handle partially overlapping SG lists
   - Use memcpy_sglist to replace null skcipher
   - Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
   - Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
   - Hide CRYPTO_MANAGER
   - Add delayed freeing of driver crypto_alg structures

  Compression:
   - Allocate large buffers on first use instead of initialisation in scomp
   - Drop destination linearisation buffer in scomp
   - Move scomp stream allocation into acomp
   - Add acomp scatter-gather walker
   - Remove request chaining
   - Add optional async request allocation

  Hashing:
   - Remove request chaining
   - Add optional async request allocation
   - Move partial block handling into API
   - Add ahash support to hmac
   - Fix shash documentation to disallow usage in hard IRQs

  Algorithms:
   - Remove unnecessary SIMD fallback code on x86 and arm/arm64
   - Drop avx10_256 xts(aes)/ctr(aes) on x86
   - Improve avx-512 optimisations for xts(aes)
   - Move chacha arch implementations into lib/crypto
   - Move poly1305 into lib/crypto and drop unused Crypto API algorithm
   - Disable powerpc/poly1305 as it has no SIMD fallback
   - Move sha256 arch implementations into lib/crypto
   - Convert deflate to acomp
   - Set block size correctly in cbcmac

  Drivers:
   - Do not use sg_dma_len before mapping in sun8i-ss
   - Fix warm-reboot failure by making shutdown do more work in qat
   - Add locking in zynqmp-sha
   - Remove cavium/zip
   - Add support for PCI device 0x17D8 to ccp
   - Add qat_6xxx support in qat
   - Add support for RK3576 in rockchip-rng
   - Add support for i.MX8QM in caam

  Others:
   - Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
   - Add new SEV/SNP platform shutdown API in ccp"

* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
  x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
  crypto: qat - add missing header inclusion
  crypto: api - Redo lookup on EEXIST
  Revert "crypto: testmgr - Add hash export format testing"
  crypto: marvell/cesa - Do not chain submitted requests
  crypto: powerpc/poly1305 - add depends on BROKEN for now
  Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
  crypto: ccp - Add missing tee info reg for teev2
  crypto: ccp - Add missing bootloader info reg for pspv5
  crypto: sun8i-ce - move fallback ahash_request to the end of the struct
  crypto: octeontx2 - Use dynamic allocated memory region for lmtst
  crypto: octeontx2 - Initialize cptlfs device info once
  crypto: xts - Only add ecb if it is not already there
  crypto: lrw - Only add ecb if it is not already there
  crypto: testmgr - Add hash export format testing
  crypto: testmgr - Use ahash for generic tfm
  crypto: hmac - Add ahash support
  crypto: testmgr - Ignore EEXIST on shash allocation
  crypto: algapi - Add driver template support to crypto_inst_setname
  crypto: shash - Set reqsize in shash_alg
  ...
2025-05-26 13:47:28 -07:00
Eric Biggers
bdc2a55687 crypto: lib/chacha - add array bounds to function prototypes
Add explicit array bounds to the function prototypes for the parameters
that didn't already get handled by the conversion to use chacha_state:

- chacha_block_*():
  Change 'u8 *out' or 'u8 *stream' to u8 out[CHACHA_BLOCK_SIZE].

- hchacha_block_*():
  Change 'u32 *out' or 'u32 *stream' to u32 out[HCHACHA_OUT_WORDS].

- chacha_init():
  Change 'const u32 *key' to 'const u32 key[CHACHA_KEY_WORDS]'.
  Change 'const u8 *iv' to 'const u8 iv[CHACHA_IV_SIZE]'.

No functional changes.  This just makes it clear when fixed-size arrays
are expected.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:32:53 +08:00
Eric Biggers
98066f2f89 crypto: lib/chacha - strongly type the ChaCha state
The ChaCha state matrix is 16 32-bit words.  Currently it is represented
in the code as a raw u32 array, or even just a pointer to u32.  This
weak typing is error-prone.  Instead, introduce struct chacha_state:

    struct chacha_state {
            u32 x[16];
    };

Convert all ChaCha and HChaCha functions to use struct chacha_state.
No functional changes.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:32:53 +08:00
Herbert Xu
67488527af crypto: arch/sha256 - Export block functions as GPL only
Export the block functions as GPL only, there is no reason
to let arbitrary modules use these internal functions.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05 18:20:45 +08:00
Herbert Xu
ef93f15628 Revert "crypto: run initcalls for generic implementations earlier"
This reverts commit c4741b2305.

Crypto API self-tests no longer run at registration time and now
occur either at late_initcall or upon the first use.

Therefore the premise of the above commit no longer exists.  Revert
it and subsequent additions of subsys_initcall and arch_initcall.

Note that lib/crypto calls will stay at subsys_initcall (or rather
downgraded from arch_initcall) because they may need to occur
before Crypto API registration.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05 18:20:44 +08:00
Eric Biggers
b9eac03edc crypto: s390/sha256 - implement library instead of shash
Instead of providing crypto_shash algorithms for the arch-optimized
SHA-256 code, instead implement the SHA-256 library.  This is much
simpler, it makes the SHA-256 library functions be arch-optimized, and
it fixes the longstanding issue where the arch-optimized SHA-256 was
disabled by default.  SHA-256 still remains available through
crypto_shash, but individual architectures no longer need to handle it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05 18:20:43 +08:00
Heiko Carstens
de6b4f9901 s390/string: Remove strcpy() implementation
Remove the optimized strcpy() library implementation. This doesn't make any
difference since gcc recognizes all strcpy() usages anyway and uses the
builtin variant. There is not a single branch to strcpy() within the
generated kernel image, which also seems to be the reason why most other
architectures don't have a strcpy() implementation anymore.

Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30 11:41:28 +02:00
Eric Biggers
fa7ed85c9b s390/crc: drop "glue" from filenames
The use of the term "glue" in filenames is a Crypto API-ism that does
not show up elsewhere in lib/.  I think adopting it there was a mistake.
The library just uses standard functions, so the amount of code that
could be considered "glue" is quite small.  And while often the C
functions just wrap the assembly functions, there are also cases like
crc32c_arch() in arch/x86/lib/crc32-glue.c that blur the line by
in-lining the actual implementation into the C function.  That's not
"glue code", but rather the actual code.

Therefore, let's drop "glue" from the filenames and instead use e.g.
crc32.c instead of crc32-glue.c.

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424002038.179114-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-28 09:07:19 -07:00
Eric Biggers
fea9ad4dde s390/crc32: Remove no-op module init and exit functions
Now that the crc32-s390 module init function is a no-op, there is no
need to define it.  Remove it.  The removal of the init function also
makes the exit function unnecessary, so remove that too.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250417163829.4599-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-28 09:07:19 -07:00
Heiko Carstens
93b988cf8e s390/crc32: Remove have_vxrs static key
Replace the have_vxrs static key with a cpu_has_vx() call.  cpu_has_vx()
resolves into a compile time constant (true) if the kernel is compiled
for z13 or newer.  Otherwise it generates an unconditional one
instruction branch, which is patched based on CPU alternatives.

In any case the generated code is at least as good as before and avoids
static key handling.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250417125318.12521F12-hca@linux.ibm.com/
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-28 09:07:19 -07:00
Eric Biggers
7ef377c4d4 lib/crc: make the CPU feature static keys __ro_after_init
All of the CRC library's CPU feature static_keys are initialized by
initcalls and never change afterwards, so there's no need for them to be
in the regular .data section.  Put them in .data..ro_after_init instead.

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Link: https://lore.kernel.org/r/20250413154350.10819-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-28 09:07:19 -07:00
Eric Biggers
879f47548b crypto: lib/chacha - remove INTERNAL symbol and selection of CRYPTO
Now that the architecture-optimized ChaCha kconfig symbols are defined
regardless of CRYPTO, there is no need for CRYPTO_LIB_CHACHA to select
CRYPTO.  So, remove that.  This makes the indirection through the
CRYPTO_LIB_CHACHA_INTERNAL symbol unnecessary, so get rid of that and
just use CRYPTO_LIB_CHACHA directly.  Finally, make the fallback to the
generic implementation use a default value instead of a select; this
makes it consistent with how the arch-optimized code gets enabled and
also with how CRYPTO_LIB_BLAKE2S_GENERIC gets enabled.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:54 +08:00
Eric Biggers
3ea91323fe crypto: s390 - move library functions to arch/s390/lib/crypto/
Continue disentangling the crypto library functions from the generic
crypto infrastructure by moving the s390 ChaCha library functions into a
new directory arch/s390/lib/crypto/ that does not depend on CRYPTO.
This mirrors the distinction between crypto/ and lib/crypto/.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28 19:40:53 +08:00
Heiko Carstens
e7b3f9a058 s390/string: Remove optimized strncpy()
There are hardly any strncpy() users left, therefore drop the
optimized s390 variant.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-17 15:16:10 +02:00
Heiko Carstens
8b72f5a97b s390/mm: Reimplement lazy ASCE handling
Reduce system call overhead time (round trip time for invoking a
non-existent system call) by 25%.

With the removal of set_fs() [1] lazy control register handling was removed
in order to keep kernel entry and exit simple. However this made system
calls slower.

With the conversion to generic entry [2] and numerous follow up changes
which simplified the entry code significantly, adding support for lazy asce
handling doesn't add much complexity to the entry code anymore.

In particular this means:

- On kernel entry the primary asce is not modified and contains the user
  asce

- Kernel accesses which require secondary-space mode (for example futex
  operations) are surrounded by enable_sacf_uaccess() and
  disable_sacf_uaccess() calls. enable_sacf_uaccess() sets the primary asce
  to kernel asce so that the sacf instruction can be used to switch to
  secondary-space mode. The primary asce is changed back to user asce with
  disable_sacf_uaccess().

The state of the control register which contains the primary asce is
reflected with a new TIF_ASCE_PRIMARY bit. This is required on context
switch so that the correct asce is restored for the scheduled in process.

In result address spaces are now setup like this:

CPU running in               | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
-----------------------------|-----------|-----------|-----------
user space                   |  user     |  user     |  kernel
kernel (no sacf)             |  user     |  user     |  kernel
kernel (during sacf uaccess) |  kernel   |  user     |  kernel
kernel (kvm guest execution) |  guest    |  user     |  kernel

In result cr1 control register content is not changed except for:
- futex system calls
- legacy s390 PCI system calls
- the kvm specific cmpxchg_user_key() uaccess helper

This leads to faster system call execution.

[1] 87d5986345 ("s390/mm: remove set_fs / rework address space handling")
[2] 56e62a7370 ("s390: convert to generic entry")

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-14 11:23:21 +02:00
Linus Torvalds
f90f2145b2 Merge tag 's390-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:

 - Add sorting of mcount locations at build time

 - Rework uaccess functions with C exception handling to shorten inline
   assembly size and enable full inlining. This yields near-optimal code
   for small constant copies with a ~40kb kernel size increase

 - Add support for a configurable STRICT_MM_TYPECHECKS which allows to
   generate better code, but also allows to have type checking for debug
   builds

 - Optimize get_lowcore() for common callers with alternatives that
   nearly revert to the pre-relocated lowcore code, while also slightly
   reducing syscall entry and exit time

 - Convert MACHINE_HAS_* checks for single facility tests into cpu_has_*
   style macros that call test_facility(), and for features with
   additional conditions, add a new ALT_TYPE_FEATURE alternative to
   provide a static branch via alternative patching. Also, move machine
   feature detection to the decompressor for early patching and add
   debugging functionality to easily show which alternatives are patched

 - Add exception table support to early boot / startup code to get rid
   of the open coded exception handling

 - Use asm_inline for all inline assemblies with EX_TABLE or ALTERNATIVE
   to ensure correct inlining and unrolling decisions

 - Remove 2k page table leftovers now that s390 has been switched to
   always allocate 4k page tables

 - Split kfence pool into 4k mappings in arch_kfence_init_pool() and
   remove the architecture-specific kfence_split_mapping()

 - Use READ_ONCE_NOCHECK() in regs_get_kernel_stack_nth() to silence
   spurious KASAN warnings from opportunistic ftrace argument tracing

 - Force __atomic_add_const() variants on s390 to always return void,
   ensuring compile errors for improper usage

 - Remove s390's ioremap_wt() and pgprot_writethrough() due to
   mismatched semantics and lack of known users, relying on asm-generic
   fallbacks

 - Signal eventfd in vfio-ap to notify userspace when the guest AP
   configuration changes, including during mdev removal

 - Convert mdev_types from an array to a pointer in vfio-ccw and vfio-ap
   drivers to avoid fake flex array confusion

 - Cleanup trap code

 - Remove references to the outdated linux390@de.ibm.com address

 - Other various small fixes and improvements all over the code

* tag 's390-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (78 commits)
  s390: Use inline qualifier for all EX_TABLE and ALTERNATIVE inline assemblies
  s390/kfence: Split kfence pool into 4k mappings in arch_kfence_init_pool()
  s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()
  s390/boot: Ignore vmlinux.map
  s390/sysctl: Remove "vm/allocate_pgste" sysctl
  s390: Remove 2k vs 4k page table leftovers
  s390/tlb: Use mm_has_pgste() instead of mm_alloc_pgste()
  s390/lowcore: Use lghi instead llilh to clear register
  s390/syscall: Merge __do_syscall() and do_syscall()
  s390/spinlock: Implement SPINLOCK_LOCKVAL with inline assembly
  s390/smp: Implement raw_smp_processor_id() with inline assembly
  s390/current: Implement current with inline assembly
  s390/lowcore: Use inline qualifier for get_lowcore() inline assembly
  s390: Move s390 sysctls into their own file under arch/s390
  s390/syscall: Simplify syscall_get_arguments()
  s390/vfio-ap: Notify userspace that guest's AP config changed when mdev removed
  s390: Remove ioremap_wt() and pgprot_writethrough()
  s390/mm: Add configurable STRICT_MM_TYPECHECKS
  s390/mm: Convert pgste_val() into function
  s390/mm: Convert pgprot_val() into function
  ...
2025-03-29 11:59:43 -07:00
Heiko Carstens
b46525437e s390/spinlock: Implement SPINLOCK_LOCKVAL with inline assembly
Implement SPINLOCK_LOCKVAL with an inline assembly, which makes use of the
ALTERNATIVE macro, to read spinlock_lockval from lowcore. Provide an
alternative instruction with a different offset in case lowcore is
relocated.

This replaces sequences of two instructions with one instruction.

Before:
  10602a:       a7 78 00 00             lhi     %r7,0
  10602e:       a5 8e 00 00             llilh   %r8,0
  106032:       58 d0 83 ac             l       %r13,940(%r8)
  106036:       ba 7d b5 80             cs      %r7,%r13,1408(%r11)

After:
  10602a:       a7 88 00 00             lhi     %r8,0
  10602e:       e3 70 03 ac 00 58       ly      %r7,940
  106034:       ba 87 b5 80             cs      %r8,%r7,1408(%r11)

Kernel image size change:
add/remove: 756/750 grow/shrink: 646/3435 up/down: 30778/-46326 (-15548)

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-18 17:13:04 +01:00
joel granados
20de8f8d31 s390: Move s390 sysctls into their own file under arch/s390
Move s390 sysctls (spin_retry and userprocess_debug) into their own
files under arch/s390. Create two new sysctl tables
(2390_{fault,spin}_sysctl_table) which will be initialized with
arch_initcall placing them after their original place in proc_root_init.

This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Signed-off-by: joel granados <joel.granados@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250306-jag-mv_ctltables-v2-6-71b243c8d3f8@kernel.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-18 17:13:04 +01:00
Heiko Carstens
52109a067a s390: Convert MACHINE_IS_[LPAR|VM|KVM], etc, machine_is_[lpar|vm|kvm]()
Move machine type detection to the decompressor and use static branches
to implement and use machine_is_[lpar|vm|kvm]() instead of a runtime check
via MACHINE_IS_[LPAR|VM|KVM].

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04 17:18:07 +01:00
Heiko Carstens
ee487b0120 s390/uaccess: Inline __clear_user()
Rework __clear_user() similar to raw_copy_from_user() / raw_copy_to_user()
and inline the function saving the overhead of branches.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04 17:18:04 +01:00
Heiko Carstens
bc6029239c s390/uaccess: Separate key uaccess functions
Implement separate raw_copy_to_user_key() and raw_copy_from_user_key()
functions, which allows to remove the open-coded operand access control
handling from the normal raw_copy_to_user() / raw_copy_from_user()
functions - they are simplified to use immediate instructions to load
hard-coded operand access control values into register zero, which saves
one instruction.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04 17:18:03 +01:00
Eric Biggers
68ea3c2ae0 lib/crc32: remove "_le" from crc32c base and arch functions
Following the standardization on crc32c() as the lib entry point for the
Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(),
and __crc32c_le(), make the same change to the underlying base and arch
functions that implement it.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-08 20:06:30 -08:00
Linus Torvalds
b731bc5f49 Merge tag 's390-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:

 - The rework that uncoupled physical and virtual address spaces
   inadvertently prevented KASAN shadow mappings from using large pages.
   Restore large page mappings for KASAN shadows

 - Add decompressor routine physmem_alloc() that may fail, unlike
   physmem_alloc_or_die(). This allows callers to implement fallback
   paths

 - Allow falling back from large pages to smaller pages (1MB or 4KB) if
   the allocation of 2GB pages in the decompressor can not be fulfilled

 - Add to the decompressor boot print support of "%%" format string,
   width and padding hadnling, length modifiers and decimal conversion
   specifiers

 - Add to the decompressor message severity levels similar to kernel
   ones. Support command-line options that control console output
   verbosity

 - Replaces boot_printk() calls with appropriate loglevel- specific
   helpers such as boot_emerg(), boot_warn(), and boot_debug().

 - Collect all boot messages into a ring buffer independent of the
   current log level. This is particularly useful for early crash
   analysis

 - If 'earlyprintk' command line parameter is not specified, store
   decompressor boot messages in a ring buffer to be printed later by
   the kernel, once the console driver is registered

 - Add 'bootdebug' command line parameter to enable printing of
   decompressor debug messages when needed. That parameters allows
   message suppressing and filtering

 - Dump boot messages on a decompressor crash, but only if 'bootdebug'
   command line parameter is enabled

 - When CONFIG_PRINTK_TIME is enabled, add timestamps to boot messages
   in the same format as regular printk()

 - Dump physical memory tracking information on boot: online ranges,
   reserved areas and vmem allocations

 - Dump virtual memory layout and randomization details

 - Improve decompression error reporting and dump the message ring
   buffer in case the boot failed and system halted

 - Add an exception handler which handles exceptions when FPU control
   register is attempted to be set to an invalid value. Remove '.fixup'
   section as result of this change

 - Use 'A', 'O', and 'R' inline assembly format flags, which allows
   recent Clang compilers to generate better FPU code

 - Rework uaccess code so it reads better and generates more efficient
   code

 - Cleanup futex inline assembly code

 - Disable KMSAN instrumention for futex inline assemblies, which
   contain dereferenced user pointers. Otherwise, shadows for the user
   pointers would be accessed

 - PFs which are not initially configured but in standby create only a
   single-function PCI domain. If they are configured later on, sibling
   PFs and their child VFs will not be added to their PCI domain
   breaking SR-IOV expectations.

   Fix that by allowing initially configured but in standby PFs create
   multi-function PCI domains

 - Add '-std=gnu11' to decompressor and purgatory CFLAGS to avoid
   compile errors caused by kernel's own definitions of 'bool', 'false',
   and 'true' conflicting with the C23 reserved keywords

 - Fix sclp subsystem failure when a sclp console is not present

 - Fix misuse of non-NULL terminated strings in vmlogrdr driver

 - Various other small improvements, cleanups and fixes

* tag 's390-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (53 commits)
  s390/vmlogrdr: Use array instead of string initializer
  s390/vmlogrdr: Use internal_name for error messages
  s390/sclp: Initialize sclp subsystem via arch_cpu_finalize_init()
  s390/tools: Use array instead of string initializer
  s390/vmem: Fix null-pointer-arithmetic warning in vmem_map_init()
  s390: Add '-std=gnu11' to decompressor and purgatory CFLAGS
  s390/bitops: Use correct constraint for arch_test_bit() inline assembly
  s390/pci: Fix SR-IOV for PFs initially in standby
  s390/futex: Avoid KMSAN instrumention for user pointers
  s390/uaccess: Rename get_put_user_noinstr_attributes to uaccess_kmsan_or_inline
  s390/futex: Cleanup futex_atomic_cmpxchg_inatomic()
  s390/futex: Generate futex atomic op functions
  s390/uaccess: Remove INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER
  s390/uaccess: Use asm goto for put_user()/get_user()
  s390/uaccess: Remove usage of the oac specifier
  s390/uaccess: Replace EX_TABLE_UA_LOAD_MEM exception handling
  s390/uaccess: Cleanup noinstr __put_user()/__get_user() inline assembly constraints
  s390/uaccess: Remove __put_user_fn()/__get_user_fn() wrappers
  s390/uaccess: Move put_user() / __put_user() close to put_user() asm code
  s390/uaccess: Use asm goto for __mvc_kernel_nofault()
  ...
2025-01-30 10:48:17 -08:00
Heiko Carstens
884f0582b2 s390/uaccess: Remove INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER
The s390 implementations of raw_copy_from_user() and raw_copy_to_user() are
never inlined. However INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER are
still set. This leads to the odd situation that only the error handling
(memset to zero of the not copied bytes) of copy_from_user() is inlined,
while the actual fast path code is out-of-line.

This would make sense if raw_copy_from_user() and raw_copy_to_user() were
implemented in assembler files, where inlining is not possible. But the
current s390 setup does not make any sense.

Address this by moving the raw uaccess copy inline assemblies to the
uaccess header file, and remove INLINE_COPY_FROM_USER and
INLINE_COPY_TO_USER definitions. This way the uaccess code, but now
including error handling, is still out-of-line with the common code
_copy_from_user() and _copy_to_user() variants, which inline the raw
uaccess functions via _inline_copy_from_user() and _inline_copy_to_user().

This reduces the size of the kernel image by ~17kb.
(defconfig, gcc 14.2.0)

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-01-26 17:24:07 +01:00
Linus Torvalds
37b33c68b0 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:

 - Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
   directly accessible via the library API, instead of requiring the
   crypto API. This is much simpler and more efficient.

 - Convert some users such as ext4 to use the CRC32 library API instead
   of the crypto API. More conversions like this will come later.

 - Add a KUnit test that tests and benchmarks multiple CRC variants.
   Remove older, less-comprehensive tests that are made redundant by
   this.

 - Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
   volunteering to maintain it. I have additional cleanups and
   optimizations planned for future cycles.

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits)
  MAINTAINERS: add entry for CRC library
  powerpc/crc: delete obsolete crc-vpmsum_test.c
  lib/crc32test: delete obsolete crc32test.c
  lib/crc16_kunit: delete obsolete crc16_kunit.c
  lib/crc_kunit.c: add KUnit test suite for CRC library functions
  powerpc/crc-t10dif: expose CRC-T10DIF function through lib
  arm64/crc-t10dif: expose CRC-T10DIF function through lib
  arm/crc-t10dif: expose CRC-T10DIF function through lib
  x86/crc-t10dif: expose CRC-T10DIF function through lib
  crypto: crct10dif - expose arch-optimized lib function
  lib/crc-t10dif: add support for arch overrides
  lib/crc-t10dif: stop wrapping the crypto API
  scsi: target: iscsi: switch to using the crc32c library
  f2fs: switch to using the crc32 library
  jbd2: switch to using the crc32c library
  ext4: switch to using the crc32c library
  lib/crc32: make crc32c() go directly to lib
  bcachefs: Explicitly select CRYPTO from BCACHEFS_FS
  x86/crc32: expose CRC32 functions through lib
  x86/crc32: update prototype for crc32_pclmul_le_16()
  ...
2025-01-22 19:55:08 -08:00
Sven Schnelle
745600ed69 s390/lib: Use exrl instead of ex in xor functions
exrl is present in all machines currently supported, therefore prefer
it over ex. This saves one instruction and doesn't need an additional
register to hold the address of the target instruction.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-01-13 09:50:17 +01:00
Sven Schnelle
807e39ed4d s390/lib: Use exrl instead of ex in string functions
exrl is present in all machines currently supported in the linux
kernel, therefore prefer it over ex. This saves one instruction
and doesn't need an additional register to hold the address of the
target instruction.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-17 12:46:13 +01:00
Eric Biggers
008071917d s390/crc32: expose CRC32 functions through lib
Move the s390 CRC32 assembly code into the lib directory and wire it up
to the library interface.  This allows it to be used without going
through the crypto API.  It remains usable via the crypto API too via
the shash algorithms that use the library interface.  Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Note: to see the diff from arch/s390/crypto/crc32-vx.c to
arch/s390/lib/crc32-glue.c, view this commit with 'git show -M10'.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01 17:23:01 -08:00
Linus Torvalds
509f806f7f Merge tag 's390-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:

 - Add swap entry for hugetlbfs support

 - Add PTE_MARKER support for hugetlbs mappings; this fixes a regression
   (possible page fault loop) which was introduced when support for
   UFFDIO_POISON for hugetlbfs was added

 - Add ARCH_HAS_PREEMPT_LAZY and PREEMPT_DYNAMIC support

 - Mark IRQ entries in entry code, so that stack tracers can filter out
   the non-IRQ parts of stack traces. This fixes stack depot capacity
   limit warnings, since without filtering the number of unique stack
   traces is huge

 - In PCI code fix leak of struct zpci_dev object, and fix potential
   double remove of hotplug slot

 - Fix pagefault_disable() / pagefault_enable() unbalance in
   arch_stack_user_walk_common()

 - A couple of inline assembly optimizations, more cmpxchg() to
   try_cmpxchg() conversions, and removal of usages of xchg() and
   cmpxchg() on one and two byte memory areas

 - Various other small improvements and cleanups

* tag 's390-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (27 commits)
  Revert "s390/mm: Allow large pages for KASAN shadow mapping"
  s390/spinlock: Use flag output constraint for arch_cmpxchg_niai8()
  s390/spinlock: Use R constraint for arch_load_niai4()
  s390/spinlock: Generate shorter code for arch_spin_unlock()
  s390/spinlock: Remove condition code clobber from arch_spin_unlock()
  s390/spinlock: Use symbolic names in inline assemblies
  s390: Support PREEMPT_DYNAMIC
  s390/pci: Fix potential double remove of hotplug slot
  s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails
  s390/mm/hugetlbfs: Add missing includes
  s390/mm: Add PTE_MARKER support for hugetlbfs mappings
  s390/mm: Introduce region-third and segment table swap entries
  s390/mm: Introduce region-third and segment table entry present bits
  s390/mm: Rearrange region-third and segment table entry SW bits
  KVM: s390: Increase size of union sca_utility to four bytes
  KVM: s390: Remove one byte cmpxchg() usage
  KVM: s390: Use try_cmpxchg() instead of cmpxchg() loops
  s390/ap: Replace xchg() with WRITE_ONCE()
  s390/mm: Allow large pages for KASAN shadow mapping
  s390: Add ARCH_HAS_PREEMPT_LAZY support
  ...
2024-11-29 10:40:52 -08:00
Heiko Carstens
889221c4d7 s390/spinlock: Use flag output constraint for arch_cmpxchg_niai8()
Add a new variant of arch_cmpxchg_niai8() which makes use of the flag
output constraint, which allows the compiler to generate slightly better
code. Also rename arch_cmpxchg_niai8() to arch_try_cmpxchg_niai8() which
reflects the purpose of the function and makes it consistent with other
"try" variants.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28 14:14:34 +01:00
Heiko Carstens
84ac96587b s390/spinlock: Use R constraint for arch_load_niai4()
The load instruction used within arch_load_niai4() has a short displacement
and index register. Therefore use the R constraint to reflect this.
The used Q constraint does consider an index register.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28 14:12:05 +01:00
Heiko Carstens
78486ed9e7 s390/spinlock: Use symbolic names in inline assemblies
Improve readability and use symbolic names.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28 14:12:04 +01:00
Linus Torvalds
aad3a0d084 Merge tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:

 - Restructure the function graph shadow stack to prepare it for use
   with kretprobes

   With the goal of merging the shadow stack logic of function graph and
   kretprobes, some more restructuring of the function shadow stack is
   required.

   Move out function graph specific fields from the fgraph
   infrastructure and store it on the new stack variables that can pass
   data from the entry callback to the exit callback.

   Hopefully, with this change, the merge of kretprobes to use fgraph
   shadow stacks will be ready by the next merge window.

 - Make shadow stack 4k instead of using PAGE_SIZE.

   Some architectures have very large PAGE_SIZE values which make its
   use for shadow stacks waste a lot of memory.

 - Give shadow stacks its own kmem cache.

   When function graph is started, every task on the system gets a
   shadow stack. In the future, shadow stacks may not be 4K in size.
   Have it have its own kmem cache so that whatever size it becomes will
   still be efficient in allocations.

 - Initialize profiler graph ops as it will be needed for new updates to
   fgraph

 - Convert to use guard(mutex) for several ftrace and fgraph functions

 - Add more comments and documentation

 - Show function return address in function graph tracer

   Add an option to show the caller of a function at each entry of the
   function graph tracer, similar to what the function tracer does.

 - Abstract out ftrace_regs from being used directly like pt_regs

   ftrace_regs was created to store a partial pt_regs. It holds only the
   registers and stack information to get to the function arguments and
   return values. On several archs, it is simply a wrapper around
   pt_regs. But some users would access ftrace_regs directly to get the
   pt_regs which will not work on all archs. Make ftrace_regs an
   abstract structure that requires all access to its fields be through
   accessor functions.

 - Show how long it takes to do function code modifications

   When code modification for function hooks happen, it always had the
   time recorded in how long it took to do the conversion. But this
   value was never exported. Recently the code was touched due to new
   ROX modification handling that caused a large slow down in doing the
   modifications and had a significant impact on boot times.

   Expose the timings in the dyn_ftrace_total_info file. This file was
   created a while ago to show information about memory usage and such
   to implement dynamic function tracing. It's also an appropriate file
   to store the timings of this modification as well. This will make it
   easier to see the impact of changes to code modification on boot up
   timings.

 - Other clean ups and small fixes

* tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits)
  ftrace: Show timings of how long nop patching took
  ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash()
  ftrace: Use guard to take the ftrace_lock in release_probe()
  ftrace: Use guard to lock ftrace_lock in cache_mod()
  ftrace: Use guard for match_records()
  fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph()
  fgraph: Give ret_stack its own kmem cache
  fgraph: Separate size of ret_stack from PAGE_SIZE
  ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
  selftests/ftrace: Fix check of return value in fgraph-retval.tc test
  ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros
  ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs
  ftrace: Make ftrace_regs abstract from direct use
  fgragh: No need to invoke the function call_filter_check_discard()
  fgraph: Simplify return address printing in function graph tracer
  function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr()
  function_graph: Support recording and printing the function return address
  ftrace: Have calltime be saved in the fgraph storage
  ftrace: Use a running sleeptime instead of saving on shadow stack
  fgraph: Use fgraph data to store subtime for profiler
  ...
2024-11-20 11:34:10 -08:00
Heiko Carstens
0caf91f6d6 s390/string: Convert to use flag output macros
Use flag output macros in inline asm to allow for better code generation if
the compiler has support for the flag output constraint.

Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-13 14:31:33 +01:00
Heiko Carstens
d5fd93629a s390/locking: Use arch_try_cmpxchg() instead of __atomic_cmpxchg_bool()
Use arch_try_cmpxchg() instead of __atomic_cmpxchg_bool() everywhere.
This generates the same code like before, but uses the standard
cmpxchg() implementation instead of a custom __atomic_cmpxchg_bool().

Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-12 14:01:29 +01:00
Steven Rostedt
7888af4166 ftrace: Make ftrace_regs abstract from direct use
ftrace_regs was created to hold registers that store information to save
function parameters, return value and stack. Since it is a subset of
pt_regs, it should only be used by its accessor functions. But because
pt_regs can easily be taken from ftrace_regs (on most archs), it is
tempting to use it directly. But when running on other architectures, it
may fail to build or worse, build but crash the kernel!

Instead, make struct ftrace_regs an empty structure and have the
architectures define __arch_ftrace_regs and all the accessor functions
will typecast to it to get to the actual fields. This will help avoid
usage of ftrace_regs directly.

Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/

Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul  Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas  Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav  Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-10 20:18:01 -04:00
Heiko Carstens
b3e0c5f734 s390/alternatives: Rework to allow for callbacks
Rework alternatives to allow for callbacks. With this every
alternative entry has additional data encoded:

- When (aka context) an alternative is supposed to be applied

- The type of an alternative, which allows for type specific handling
  and callbacks

- Extra type specific payload (patch information), which can be passed
  to callbacks in order to decide if an alternative should be applied
  or not

With this only the "late" context is implemented, which means there is
no change to the previous behaviour. All code is just converted to the
more generic new infrastructure.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-23 16:02:31 +02:00
Jeff Johnson
4657a8a1c0 s390/lib: Add missing MODULE_DESCRIPTION() macros
With ARCH=s390, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/s390/lib/test_kprobes_s390.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/s390/lib/test_unwind.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/s390/lib/test_modules.o

Add the missing invocations of the MODULE_DESCRIPTION() macro.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240615-md-s390-arch-s390-lib-v1-1-d7424b943973@quicinc.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-28 14:52:30 +02:00
Sven Schnelle
208da1d5fc s390: Replace S390_lowcore by get_lowcore()
Replace all S390_lowcore usages in arch/s390/ by get_lowcore().

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-18 17:01:33 +02:00
Vasily Gorbik
ba05b39d54 s390/expoline: Make modules use kernel expolines
Currently, kernel modules contain their own set of expoline thunks. In
the case of EXPOLINE_EXTERN, this involves postlinking of precompiled
expoline.o. expoline.o is also necessary for out-of-source tree module
builds.

Now that the kernel modules area is less than 4 GB away from
kernel expoline thunks, make modules use kernel expolines. Also make
EXPOLINE_EXTERN the default if the compiler supports it. This simplifies
build and aligns with the approach adopted by other architectures.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-17 13:38:03 +02:00
Heiko Carstens
dcd3e1de9d s390/checksum: provide csum_partial_copy_nocheck()
With csum_partial(), which reads all bytes into registers it is easy to
also implement csum_partial_copy_nocheck() which copies the buffer while
calculating its checksum.

For a 512 byte buffer this reduces the runtime by 19%. Compared to the old
generic variant (memcpy() + cksm instruction) runtime is reduced by 42%).

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
cb2a1dd589 s390/checksum: provide vector register variant of csum_partial()
Provide a faster variant of csum_partial() which uses vector registers
instead of the cksm instruction.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
1c8b8cf28f s390/nmi: implement and use local_mcck_save() / local_mcck_restore()
Instead of using local_mcck_disable() / local_mcck_enable() implement and
use local_mcck_save() / local_mcck_restore() to disable machine checks, and
restoring the previous state.

The problem with using local_mcck_disable() / local_mcck_enable() is that
there is an assumption that machine checks are always enabled. While this
is currently the case the code still looks quite odd, readers need to
double check if the code is correct.

In order to increase readability save and then restore the old machine
check mask bit, instead of assuming that it must have been enabled.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-11 14:33:05 +01:00
Heiko Carstens
527618abb9 s390/ctlreg: add struct ctlreg
Add struct ctlreg to enforce strict type checking / usage for control
register functions.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
2372d39142 s390/ctlreg: use local_ctl_load() and local_ctl_store() where possible
Convert all single control register usages of __local_ctl_load() and
__local_ctl_store() to local_ctl_load() and local_ctl_store().

This also requires to change the type of some struct lowcore members
from __u64 to unsigned long.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
8d5e98f8d6 s390/ctlreg: add local and system prefix to some functions
Add local and system prefix to some functions to clarify they change
control register contents on either the local CPU or the on all CPUs.

This results in the following API:

Two defines which load and save multiple control registers.
The defines correlate with the following C prototypes:

void __local_ctl_load(unsigned long *, unsigned int cr_low, unsigned int cr_high);
void __local_ctl_store(unsigned long *, unsigned int cr_low, unsigned int cr_high);

Two functions which locally set or clear one bit for a specified
control register:

void local_ctl_set_bit(unsigned int cr, unsigned int bit);
void local_ctl_clear_bit(unsigned int cr, unsigned int bit);

Two functions which set or clear one bit for a specified control
register on all CPUs:

void system_ctl_set_bit(unsigned int cr, unsigned int bit);
void system_ctl_clear_bit(unsigend int cr, unsigned int bit);

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00
Heiko Carstens
ebe1cd530f s390/ctlreg: rename ctl_reg.h to ctlreg.h
Rename ctl_reg.h to ctlreg.h so it matches not only ctlreg.c but also
other control register related function, union, and structure names,
which all come with a ctlreg prefix.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19 13:26:56 +02:00