Allow the data to be verified in a PKCS#7 or CMS message to be passed
directly to an asymmetric cipher algorithm (e.g. ML-DSA) if it wants to do
whatever passes for hashing/digestion itself. The normal digestion of the
data is then skipped as that would be ignored unless another signed info in
the message has some other algorithm that needs it.
The 'data to be verified' may be the content of the PKCS#7 message or it
will be the authenticatedAttributes (signedAttrs if CMS), modified, if
those are present.
This is done by:
(1) Make ->m and ->m_size point to the data to be verified rather than
making public_key_verify_signature() access the data directly. This
is so that keyctl(KEYCTL_PKEY_VERIFY) will still work.
(2) Add a flag, ->algo_takes_data, to indicate that the verification
algorithm wants to access the data to be verified directly rather than
having it digested first.
(3) If the PKCS#7 message has authenticatedAttributes (or CMS
signedAttrs), then the digest contained therein will be validated as
now, and the modified attrs blob will either be digested or assigned
to ->m as appropriate.
(4) If present, always copy and modify the authenticatedAttributes (or
signedAttrs) then digest that in one go rather than calling the shash
update twice (once for the tag and once for the rest).
(5) For ML-DSA, point ->m to the TBSCertificate instead of digesting it
and using the digest.
Note that whilst ML-DSA does allow for an "external mu", CMS doesn't yet
have that standardised.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Lukas Wunner <lukas@wunner.de>
cc: Ignat Korchagin <ignat@cloudflare.com>
cc: Stephan Mueller <smueller@chronox.de>
cc: Eric Biggers <ebiggers@kernel.org>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: keyrings@vger.kernel.org
cc: linux-crypto@vger.kernel.org
Calculate the SHA256 hash for blacklisting purposes independently of the
signature hash (which may be something other than SHA256).
This is necessary because when ML-DSA is used, no digest is calculated.
Note that this represents a change of behaviour in that the hash used for
the blacklist check would previously have been whatever digest was used
for, say, RSA-based signatures. It may be that this is inadvisable.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
cc: Lukas Wunner <lukas@wunner.de>
cc: Ignat Korchagin <ignat@cloudflare.com>
cc: Stephan Mueller <smueller@chronox.de>
cc: Eric Biggers <ebiggers@kernel.org>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: keyrings@vger.kernel.org
cc: linux-crypto@vger.kernel.org
Since ML-DSA is FIPS-approved, add the boot-time self-test which is
apparently required.
Just add a test vector manually for now, borrowed from
lib/crypto/tests/mldsa-testvecs.h (where in turn it's borrowed from
leancrypto). The SHA-* FIPS test vectors are generated by
scripts/crypto/gen-fips-testvecs.py instead, but the common Python
libraries don't support ML-DSA yet.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20260107044215.109930-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
CONFIG_CRYPTO_NHPOLY1305_NEON, CONFIG_CRYPTO_NHPOLY1305_SSE2, and
CONFIG_CRYPTO_NHPOLY1305_AVX2 no longer exist. The equivalent
optimizations are now just enabled automatically when Adiantum support
is enabled. Update the fscrypt documentation accordingly.
Link: https://lore.kernel.org/r/20251211011846.8179-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
This feature isn't useful in practice. Simplify and streamline the code
in the synchronous case, i.e. the case that actually matters, instead.
For example, by no longer having to support resuming the calculation
after an asynchronous return of the xchacha cipher, we can just keep
more of the state on the stack instead of in the request context.
Link: https://lore.kernel.org/r/20251211011846.8179-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Make adiantum_hash_message() use the scatter_walk API instead of
sg_miter. scatter_walk is a bit simpler and also more efficient. For
example, unlike sg_miter, scatter_walk doesn't require that the number
of scatterlist entries be calculated up-front.
Link: https://lore.kernel.org/r/20251211011846.8179-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reimplement the Adiantum message hashing using the nh() library
function, combined with some code which directly handles the Poly1305
stage. The latter code is derived from crypto/nhpoly1305.c.
This eliminates the dependency on the "nhpoly1305" crypto_shash
algorithm, which existed only to fit Adiantum message hashing into the
traditional Linux crypto API paradigm. Now that simple,
architecture-optimized library functions are a well-established option
too, we can switch to this simpler implementation.
Note: I've dropped the support for the optional third parameter of the
adiantum template, which specified the nhpoly1305 implementation. We
could keep accepting some strings in this parameter for backwards
compatibility, but I don't think it's being used. I believe only
"adiantum(xchacha12,aes)" and "adiantum(xchacha20,aes)" are used.
Link: https://lore.kernel.org/r/20251211011846.8179-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Migrate the x86_64 implementations of NH into lib/crypto/. This makes
the nh() function be optimized on x86_64 kernels.
Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code. This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Migrate the arm64 NEON implementation of NH into lib/crypto/. This
makes the nh() function be optimized on arm64 kernels.
Note: this temporarily makes the adiantum template not utilize the arm64
optimized NH code. This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Migrate the arm32 NEON implementation of NH into lib/crypto/. This
makes the nh() function be optimized on arm32 kernels.
Note: this temporarily makes the adiantum template not utilize the arm32
optimized NH code. This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add some simple KUnit tests for the nh() function.
These replace the test coverage which will be lost by removing the
nhpoly1305 crypto_shash.
Note that the NH code also continues to be tested indirectly as well,
via the tests for the "adiantum(xchacha12,aes)" crypto_skcipher.
Link: https://lore.kernel.org/r/20251211011846.8179-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add support for the NH "almost-universal hash function" to lib/crypto/,
specifically the variant of NH used in Adiantum.
This will replace the need for the "nhpoly1305" crypto_shash algorithm.
All the implementations of "nhpoly1305" use architecture-optimized code
only for the NH stage; they just use the generic C Poly1305 code for the
Poly1305 stage. We can achieve the same result in a simpler way using
an (architecture-optimized) nh() function combined with code in
crypto/adiantum.c that passes the results to the Poly1305 library.
This commit begins this cleanup by adding the nh() function. The code
is derived from crypto/nhpoly1305.c and include/crypto/nhpoly1305.h.
Link: https://lore.kernel.org/r/20251211011846.8179-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add a KUnit test suite for ML-DSA verification, including the following
for each ML-DSA parameter set (ML-DSA-44, ML-DSA-65, and ML-DSA-87):
- Positive test (valid signature), using vector imported from leancrypto
- Various negative tests:
- Wrong length for signature, message, or public key
- Out-of-range coefficients in z vector
- Invalid encoded hint vector
- Any bit flipped in signature, message, or public key
- Unit test for the internal function use_hint()
- A benchmark
ML-DSA inputs and outputs are very large. To keep the size of the tests
down, use just one valid test vector per parameter set, and generate the
negative tests at runtime by mutating the valid test vector.
I also considered importing the test vectors from Wycheproof. I've
tested that mldsa_verify() indeed passes all of Wycheproof's ML-DSA test
vectors that use an empty context string. However, importing these
permanently would add over 6 MB of source. That's too much to be a
reasonable addition to the Linux kernel tree for one algorithm. It also
wouldn't actually provide much better test coverage than this commit.
Another potential issue is that Wycheproof uses the Apache license.
Similarly, this also differs from the earlier proposal to import a long
list of test vectors from leancrypto. I retained only one valid
signature for each algorithm, and I also added (runtime-generated)
negative tests which were missing. I think this is a better tradeoff.
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20251214181712.29132-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add support for verifying ML-DSA signatures.
ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is specified
in FIPS 204 and is the standard version of Dilithium. Unlike RSA and
elliptic-curve cryptography, ML-DSA is believed to be secure even
against adversaries in possession of a large-scale quantum computer.
Compared to the earlier patch
(https://lore.kernel.org/r/20251117145606.2155773-3-dhowells@redhat.com/)
that was based on "leancrypto", this implementation:
- Is about 700 lines of source code instead of 4800.
- Generates about 4 KB of object code instead of 28 KB.
- Uses 9-13 KB of memory to verify a signature instead of 31-84 KB.
- Is at least about the same speed, with a microbenchmark showing 3-5%
improvements on one x86_64 CPU and -1% to 1% changes on another.
When memory is a bottleneck, it's likely much faster.
- Correctly implements the RejNTTPoly step of the algorithm.
The API just consists of a single function mldsa_verify(), supporting
pure ML-DSA with any standard parameter set (ML-DSA-44, ML-DSA-65, or
ML-DSA-87) as selected by an enum. That's all that's actually needed.
The following four potential features are unneeded and aren't included.
However, any that ever become needed could fairly easily be added later,
as they only affect how the message representative mu is calculated:
- Nonempty context strings
- Incremental message hashing
- HashML-DSA
- External mu
Signing support would, of course, be a larger and more complex addition.
However, the kernel doesn't, and shouldn't, need ML-DSA signing support.
Note that mldsa_verify() allocates memory, so it can sleep and can fail
with ENOMEM. Unfortunately we don't have much choice about that, since
ML-DSA needs a lot of memory. At least callers have to check for errors
anyway, since the signature could be invalid.
Note that verification doesn't require constant-time code, and in fact
some steps are inherently variable-time. I've used constant-time
patterns in some places anyway, but technically they're not needed.
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20251214181712.29132-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Pull crypto library fixes from Eric Biggers:
- A couple more fixes for the lib/crypto KUnit tests
- Fix missing MMU protection for the AES S-box
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: aes: Fix missing MMU protection for AES S-box
MAINTAINERS: add test vector generation scripts to "CRYPTO LIBRARY"
lib/crypto: tests: Fix syntax error for old python versions
lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for some reported issues.
Included in here is:
- much reported rust_binder fix
- counter driver fixes
- new device ids for the mei driver
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
rust_binder: remove spin_lock() in rust_shrink_free_page()
mei: me: add nova lake point S DID
counter: 104-quad-8: Fix incorrect return value in IRQ handler
counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
Pull x86 fix from Ingo Molnar:
"Disable GCOV instrumentation in the SEV noinstr.c collection of SEV
noinstr methods, to further robustify the code"
* tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Disable GCOV on noinstr object
Pull scheduler fix from Ingo Molnar:
"Fix a crash in sched_mm_cid_after_execve()"
* tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/mm_cid: Prevent NULL mm dereference in sched_mm_cid_after_execve()
Pull misc irqchip fixes from Ingo Molnar:
- Fix an endianness bug in the gic-v5 irqchip driver
- Revert a broken commit from the riscv-imsic irqchip driver
* tag 'irq-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "irqchip/riscv-imsic: Embed the vector array in lpriv"
irqchip/gic-v5: Fix gicv5_its_map_event() ITTE read endianness
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RISC-V fixes from Paul Walmsley:
"Notable changes include a fix to close one common microarchitectural
attack vector for out-of-order cores. Another patch exposed an
omission in my boot test coverage, which is currently missing
relocatable kernels. Otherwise, the fixes seem to be settling down for
us.
- Fix CONFIG_RELOCATABLE=y boots by building Image files from
vmlinux, rather than vmlinux.unstripped, now that the .modinfo
section is included in vmlinux.unstripped
- Prevent branch predictor poisoning microarchitectural attacks that
use the syscall index as a vector by using array_index_nospec() to
clamp the index after the bounds check (as x86 and ARM64 already
do)
- Fix a crash in test_kprobes when building with Clang
- Fix a deadlock possible when tracing is enabled for SBI ecalls
- Fix the definition of the Zk standard RISC-V ISA extension bundle,
which was missing the Zknh extension
- A few other miscellaneous non-functional cleanups, removing unused
macros, fixing an out-of-date path in code comments, resolving a
compile-time warning for a type mismatch in a pr_crit(), and
removing an unnecessary header file inclusion"
* tag 'riscv-for-linus-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: trace: fix snapshot deadlock with sbi ecall
riscv: remove irqflags.h inclusion in asm/bitops.h
riscv: cpu_ops_sbi: smp_processor_id() returns int, not unsigned int
riscv: configs: Clean up references to non-existing configs
riscv: kexec_image: Fix dead link to boot-image-header.rst
riscv: pgtable: Cleanup useless VA_USER_XXX definitions
riscv: cpufeature: Fix Zk bundled extension missing Zknh
riscv: fix KUnit test_kprobes crash when building with Clang
riscv: Sanitize syscall table indexing under speculation
riscv: boot: Always make Image from vmlinux, not vmlinux.unstripped
Pull driver core fixes from Danilo Krummrich:
- Fix swapped example values for the `family` and `machine` attributes
in the sysfs SoC bus ABI documentation
- Fix Rust build and intra-doc issues when optional subsystems
(CONFIG_PCI, CONFIG_AUXILIARY_BUS, CONFIG_PRINTK) are disabled
- Fix typos and incorrect safety comments in Rust PCI, DMA, and
device ID documentation
* tag 'driver-core-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
rust: device: Remove explicit import of CStrExt
rust: pci: fix typos in Bar struct's comments
rust: device: fix broken intra-doc links
rust: dma: fix broken intra-doc links
rust: driver: fix broken intra-doc links to example driver types
rust: device_id: replace incorrect word in safety documentation
rust: dma: remove incorrect safety documentation
docs: ABI: sysfs-devices-soc: Fix swapped sample values
Pull kselftest fix from Shuah Khan:
"Fix tracing test_multiple_writes stalls when buffer_size_kb is less
than 12KB"
* tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/tracing: Fix test_multiple_writes stall
Pull iomu fixes from Joerg Roedel:
- several Kconfig-related build fixes
- fix for when gcc 8.5 on PPC refuses to inline a function from a
header file
* tag 'iommu-fixes-v6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommupt: Make pt_feature() always_inline
iommufd/selftest: Prevent module/builtin conflicts in kconfig
iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFER
iommupt: Fix the kunit building
Sheng Yong reported [1] that Android APEX images didn't work with commit
072a7c7cdb ("erofs: don't bother with s_stack_depth increasing for
now") because "EROFS-formatted APEX file images can be stored within an
EROFS-formatted Android system partition."
In response, I sent a quick fat-fingered [PATCH v3] to address the
report. Unfortunately, the updated condition was incorrect:
if (erofs_is_fileio_mode(sbi)) {
- sb->s_stack_depth =
- file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
- if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
- erofs_err(sb, "maximum fs stacking depth exceeded");
+ inode = file_inode(sbi->dif0.file);
+ if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
+ inode->i_sb->s_stack_depth) {
The condition `!sb->s_bdev` is always true for all file-backed EROFS
mounts, making the check effectively a no-op.
The real fix tested and confirmed by Sheng Yong [2] at that time was
[PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup
works:
EROFS (on a block device) + EROFS (file-backed mount)
But sadly I screwed it up again by upstreaming the outdated [PATCH v3].
This patch applies the same logic as the delta between the upstream
[PATCH v3] and the real fix [PATCH v3 RESEND].
Reported-by: Sheng Yong <shengyong1@xiaomi.com>
Closes: https://lore.kernel.org/r/3acec686-4020-4609-aee4-5dae7b9b0093@gmail.com [1]
Fixes: 072a7c7cdb ("erofs: don't bother with s_stack_depth increasing for now")
Link: https://lore.kernel.org/r/243f57b8-246f-47e7-9fb1-27a771e8e9e8@gmail.com [2]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc 8.5 on powerpc does not automatically inline these functions even
though they evaluate to constants in key cases. Since the constant
propagation is essential for some code elimination and built-time checks
this causes a build failure:
ERROR: modpost: "__pt_no_sw_bit" [drivers/iommu/generic_pt/fmt/iommu_amdv1.ko] undefined!
Caused by this:
if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) &&
!pt_test_sw_bit_acquire(&pts,
SW_BIT_CACHE_FLUSH_DONE))
flush_writes_item(&pts);
Where pts_feature() evaluates to a constant false. Mark them as
__always_inline to force it to evaluate to a constant and trigger the code
elimination.
Fixes: 7c5b184db7 ("genpt: Generic Page Table base API")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512230720.9y9DtWIo-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The selftest now depends on the AMDv1 page table, however the selftest
kconfig itself is just an sub-option of the main IOMMUFD module kconfig.
This means it cannot be modular and so kconfig allowed a modular
IOMMU_PT_AMDV1 with a built in IOMMUFD. This causes link failures:
ld: vmlinux.o: in function `mock_domain_alloc_pgtable.isra.0':
selftest.c:(.text+0x12e8ad3): undefined reference to `pt_iommu_amdv1_init'
ld: vmlinux.o: in function `BSWAP_SHUFB_CTL':
sha1-avx2-asm.o:(.rodata+0xaa36a8): undefined reference to `pt_iommu_amdv1_read_and_clear_dirty'
ld: sha1-avx2-asm.o:(.rodata+0xaa36f0): undefined reference to `pt_iommu_amdv1_map_pages'
ld: sha1-avx2-asm.o:(.rodata+0xaa36f8): undefined reference to `pt_iommu_amdv1_unmap_pages'
ld: sha1-avx2-asm.o:(.rodata+0xaa3720): undefined reference to `pt_iommu_amdv1_iova_to_phys'
Adjust the kconfig to disable IOMMUFD_TEST if IOMMU_PT_AMDV1 is incompatible.
Fixes: e93d5945ed ("iommufd: Change the selftest to use iommupt instead of xarray")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512210135.freQWpxa-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The test doesn't build without it, dma-buf.h does not provide stub
functions if it is not enabled. Compilation can fail with:
ERROR:root:ld: vmlinux.o: in function `iommufd_test':
(.text+0x3b1cdd): undefined reference to `dma_buf_get'
ld: (.text+0x3b1d08): undefined reference to `dma_buf_put'
ld: (.text+0x3b2105): undefined reference to `dma_buf_export'
ld: (.text+0x3b211f): undefined reference to `dma_buf_fd'
ld: (.text+0x3b2e47): undefined reference to `dma_buf_move_notify'
Add the missing select.
Fixes: d2041f1f11 ("iommufd/selftest: Add some tests for the dmabuf flow")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The kunit doesn't work since the below commit made GENERIC_PT
unselectable:
$ make ARCH=x86_64 O=build_kunit_x86_64 olddefconfig
ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
This is probably due to unsatisfied dependencies.
Missing: CONFIG_DEBUG_GENERIC_PT=y, CONFIG_IOMMUFD_TEST=y,
CONFIG_IOMMU_PT_X86_64=y, CONFIG_GENERIC_PT=y, CONFIG_IOMMU_PT_AMDV1=y,
CONFIG_IOMMU_PT_VTDSS=y, CONFIG_IOMMU_PT=y, CONFIG_IOMMU_PT_KUNIT_TEST=y
Also remove the unneeded CONFIG_IOMMUFD_TEST reference as the iommupt kunit
doesn't interact with iommufd, and it doesn't currently build for the
kunit due problems with DMA_SHARED buffer either.
Fixes: 01569c216d ("genpt: Make GENERIC_PT invisible")
Fixes: 1dd4187f53 ("iommupt: Add a kunit test for Generic Page Table")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Pull erofs fix from Gao Xiang:
- Don't increase s_stack_depth which caused regressions in some
composefs mount setups (EROFS + ovl^2)
Instead just allow one extra unaccounted fs stacking level for
straightforward cases.
* tag 'erofs-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: don't bother with s_stack_depth increasing for now
Previously, commit d53cd891f0 ("erofs: limit the level of fs stacking
for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
stack overflow when stacking an unlimited number of EROFS on top of
each other.
This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
(and such setups are already used in production for quite a long time).
One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
from 2 to 3, but proving that this is safe in general is a high bar.
After a long discussion on GitHub issues [1] about possible solutions,
one conclusion is that there is no need to support nesting file-backed
EROFS mounts on stacked filesystems, because there is always the option
to use loopback devices as a fallback.
As a quick fix for the composefs regression for this cycle, instead of
bumping `s_stack_depth` for file backed EROFS mounts, we disallow
nesting file-backed EROFS over EROFS and over filesystems with
`s_stack_depth` > 0.
This works for all known file-backed mount use cases (composefs,
containerd, and Android APEX for some Android vendors), and the fix is
self-contained.
Essentially, we are allowing one extra unaccounted fs stacking level of
EROFS below stacking filesystems, but EROFS can only be used in the read
path (i.e. overlayfs lower layers), which typically has much lower stack
usage than the write path.
We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
stack usage analysis or using alternative approaches, such as splitting
the `s_stack_depth` limitation according to different combinations of
stacking.
Fixes: d53cd891f0 ("erofs: limit the level of fs stacking for file-backed mounts")
Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
Reported-by: Timothée Ravier <tim@siosm.fr>
Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Alexander Larsson <alexl@redhat.com>
Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Pull block fixes from Jens Axboe:
- Kill unlikely checks for blk-rq-qos. These checks are really
all-or-nothing, either the branch is taken all the time, or it's not.
Depending on the configuration, either one of those cases may be
true. Just remove the annotation
- Fix for merging bios with different app tags set
- Fix for a recently introduced slowdown due to RCU synchronization
- Fix for a status change on loop while it's in use, and then a later
fix for that fix
- Fix for the async partition scanning in ublk
* tag 'block-6.19-20260109' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
ublk: fix use-after-free in ublk_partition_scan_work
blk-mq: avoid stall during boot due to synchronize_rcu_expedited
loop: add missing bd_abort_claiming in loop_set_status
block: don't merge bios with different app_tags
blk-rq-qos: Remove unlikely() hints from QoS checks
loop: don't change loop device under exclusive opener in loop_set_status
Pull io_uring fixes from Jens Axboe:
"A single fix for a regression introduced in 6.15, where a failure to
wake up idle io-wq workers at ring exit will wait for the timeout to
expire.
This isn't normally noticeable, as the exit is async.
But if a parent task created a thread that sets up a ring and uses
requests that cause io-wq threads to be created, and the parent task
then waits for the thread to exit, then it can take 5 seconds for that
pthread_join() to succeed as the child thread is waiting for its
children to exit.
On top of that, just a basic cleanup as well"
* tag 'io_uring-6.19-20260109' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/io-wq: remove io_wq_for_each_worker() return value
io_uring/io-wq: fix incorrect io_wq_for_each_worker() termination logic
Pull arm64 fixes from Catalin Marinas:
- Do not return false if !preemptible() in current_in_efi(). EFI
runtime services can now run with preemption enabled
- Fix uninitialised variable in the arm MPAM driver, reported by sparse
- Fix partial kasan_reset_tag() use in change_memory_common() when
calculating page indices or comparing ranges
- Save/restore TCR2_EL1 during suspend/resume, otherwise the E0POE bit
is lost
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix cleared E0POE bit after cpu_suspend()/resume()
arm64: mm: Fix incomplete tag reset in change_memory_common()
arm_mpam: Stop using uninitialized variables in __ris_msmon_read()
arm64/efi: Don't fail check current_in_efi() if preemptible
Pull SoC fixes from Arnd Bergmann:
"The main code change is a revert of the Raspberry Pi RP1 overlay
support that was decided to not be ready.
The other fixes are all for devicetree sources:
- ethernet configuration on ixp42x-actiontec-mi424wr is board
revision specific
- validation warning fixes for imx27/imx51/imx6, hikey960 and k3
- Minor corrections across imx8 boards, addressing all types of
issues with interrups, dma, ethernet and clock settings, all simple
one-line changes"
* tag 'soc-fixes-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits)
arm64: dts: hisilicon: hikey960: Drop "snps,gctl-reset-quirk" and "snps,tx_de_emphasis*" properties
Documentation/process: maintainer-soc: Mark 'make' as commands
Documentation/process: maintainer-soc: Be more explicit about defconfig
arm64: dts: mba8mx: Fix Ethernet PHY IRQ support
arm64: dts: imx8qm-ss-dma: correct the dma channels of lpuart
arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM
arm64: dts: freescale: tx8p-ml81: fix eqos nvmem-cells
arm64: dts: freescale: moduline-display: fix compatible
dt-bindings: arm: fsl: moduline-display: fix compatible
ARM: dts: imx6q-ba16: fix RTC interrupt level
arm64: dts: freescale: imx95-toradex-smarc: fix SMARC_SDIO_WP label position
arm64: dts: freescale: imx95-toradex-smarc: use edge trigger for ethphy1 interrupt
arm64: dts: add off-on-delay-us for usdhc2 regulator
arm64: dts: imx8qm-mek: correct the light sensor interrupt type to low level
ARM: dts: nxp: imx: Fix mc13xxx LED node names
arm64: dts: imx95: correct I3C2 pclk to IMX95_CLK_BUSWAKEUP
MAINTAINERS: Fix a linusw mail address
arm64: dts: broadcom: rp1: drop RP1 overlay
arm64: dts: broadcom: bcm2712: fix RP1 endpoint PCI topology
misc: rp1: drop overlay support
...
Pull ceph fixes from Ilya Dryomov:
"A bunch of libceph fixes split evenly between memory safety and
implementation correctness issues (all marked for stable) and a change
in maintainers for CephFS: Slava and Alex have formally taken over
Xiubo's role"
* tag 'ceph-for-6.19-rc5' of https://github.com/ceph/ceph-client:
libceph: make calc_target() set t->paused, not just clear it
libceph: reset sparse-read state in osd_fault()
libceph: return the handler error from mon_handle_auth_done()
libceph: make free_choose_arg_map() resilient to partial allocation
ceph: update co-maintainers list in MAINTAINERS
libceph: replace overzealous BUG_ON in osdmap_apply_incremental()
libceph: prevent potential out-of-bounds reads in handle_auth_done()
When /sys/kernel/tracing/buffer_size_kb is less than 12KB,
the test_multiple_writes test will stall and wait for more
input due to insufficient buffer space.
Check current buffer_size_kb value before the test. If it is
less than 12KB, it temporarily increase the buffer to 12KB,
and restore the original value after the tests are completed.
Link: https://lore.kernel.org/r/20260109033620.25727-1-fushuai.wang@linux.dev
Fixes: 37f4660138 ("selftests/tracing: Add basic test for trace_marker_raw file")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull btrfs fixes from David Sterba:
- fix potential NULL pointer dereference when replaying tree log after
an error
- release path before initializing extent tree to avoid potential
deadlock when allocating new inode
- on filesystems with block size > page size
- fix potential read out of bounds during encoded read of an inline
extent
- only enforce free space tree if v1 cache is required
- print correct tree id in error message
* tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: show correct warning if can't read data reloc tree
btrfs: fix NULL pointer dereference in do_abort_log_replay()
btrfs: force free space tree for bs > ps cases
btrfs: only enforce free space tree if v1 cache is required for bs < ps cases
btrfs: release path before initializing extent tree in btrfs_read_locked_inode()
btrfs: avoid access-beyond-folio for bs > ps encoded writes
Pull PCI fixes from Bjorn Helgaas:
- Remove ASPM L0s support for MSM8996 SoC since we now enable L0s when
advertised, and it caused random hangs on this device (Manivannan
Sadhasivam)
- Fix meson-pcie to report that the link is up while in ASPM L0s or L1,
since those are active states from the software point of view, and
treating the link as down caused config access failures (Bjorn
Helgaas)
- Fix up sparc DTS BAR descriptions that are above 4GB but not marked
as prefetchable, which caused resource assignment and driver probe
failures after we converted from the SPARC pcibios_enable_device() to
the generic version (Ilpo Järvinen)
* tag 'pci-v6.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
sparc/PCI: Correct 64-bit non-pref -> pref BAR resources
PCI: meson: Report that link is up while in ASPM L0s and L1 states
PCI: qcom: Remove ASPM L0s support for MSM8996 SoC
Pull ACPI support fix from Rafael Wysocki:
"This fixes the ACPI/PCI legacy interrupts (INTx) parsing in the case
when the ACPI Global System Interrupt (GSI) value is a 32-bit one with
the MSB set.
That was interpreted as a negative integer and caused
acpi_pci_link_allocate_irq() to fail and acpi_irq_get_penalty() to
trigger an out-of-bounds array dereference (Lorenzo Pieralisi)"
* tag 'acpi-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PCI: IRQ: Fix INTx GSIs signedness
Pull power management fix from Rafael Wysocki:
"This fixes a crash in the hibernation image saving code that can be
triggered when the given compression algorithm is unavailable (Malaya
Kumar Rout)"
* tag 'pm-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: Fix crash when freeing invalid crypto compressor