Files
linux/lib
Demian Shulhan 63432fd625 lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON
Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR
software implementation is slow, which creates a bottleneck in NVMe and
other storage subsystems.

The acceleration is implemented using C intrinsics (<arm_neon.h>) rather
than raw assembly for better readability and maintainability.

Key highlights of this implementation:
- Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency
  spikes on large buffers.
- Pre-calculates and loads fold constants via vld1q_u64() to minimize
  register spilling.
- Benchmarks show the break-even point against the generic implementation
  is around 128 bytes. The PMULL path is enabled only for len >= 128.

Performance results (kunit crc_benchmark on Cortex-A72):
- Generic (len=4096): ~268 MB/s
- PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)

Signed-off-by: Demian Shulhan <demyansh@gmail.com>
Link: https://lore.kernel.org/r/20260329074338.1053550-1-demyansh@gmail.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-29 13:22:13 -07:00
..
2025-03-16 22:30:49 -07:00
2025-04-11 17:32:37 -07:00
2021-08-19 09:02:55 +09:00
2023-02-02 22:50:01 -08:00
2023-02-02 22:50:01 -08:00
2021-01-03 20:05:18 -05:00
2025-01-12 20:21:15 -08:00
2021-08-19 09:02:55 +09:00
2026-01-26 19:07:13 -08:00
2025-10-24 21:39:27 +02:00
2024-10-14 16:33:24 -05:00
2026-01-11 06:09:11 -10:00
2025-11-27 14:24:30 -08:00
2021-07-08 11:48:20 -07:00
2024-02-15 12:17:28 -05:00
2021-06-18 11:43:09 +02:00
2025-03-25 10:18:31 -03:00
2024-12-09 13:48:29 -08:00
2025-09-13 16:54:46 -07:00