The struct pidfd_info currently exposes in a field called coredump_signal the
signal number (si_signo) that triggered the dump (for example, 11 for SIGSEGV).
However, it is also valuable to understand the reason why that signal was sent.
This additional context is provided by the signal code (si_code), such as 2 for
SEGV_ACCERR.
Add a new field to struct pidfd_info called coredump_code with the value of
si_code for the benefit of sysadmins who pipe core dumps to user-space programs
for later analysis. The following snippet illustrates a simplified C program
that consumes coredump_signal and coredump_code, and then logs core dump
signals and codes to a file:
int pidfd = (int)atoi(argv[1]);
struct pidfd_info info = {
.mask = PIDFD_INFO_EXIT | PIDFD_INFO_COREDUMP,
};
if (ioctl(pidfd, PIDFD_GET_INFO, &info) == 0)
if (info.mask & PIDFD_INFO_COREDUMP)
fprintf(f, "PID=%d, si_signo: %d si_code: %d\n",
info.pid, info.coredump_signal, info.coredump_code);
Assuming the program is installed under /usr/local/bin/core-logger, core dump
processing can be enabled by setting /proc/sys/kernel/core_pattern to
'|/usr/local/bin/dumpstuff %F'.
systemd-coredump(8) already uses pidfds to process core dumps, and it could be
extended to include the values of coredump_code too.
Signed-off-by: Emanuele Rocca <emanuele.rocca@arm.com>
Link: https://patch.msgid.link/acE52HIFivNZN3nE@NH27D9T0LF
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Commit 673a55cc49 replaced the null pointer dereference used in
crashing_child() with __builtin_trap to address the following LLVM warnings:
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
All coredump tests expect crashing_child() to result in a SIGSEGV. However, the
behavior of __builtin_trap is architecture-dependent. On x86 it yields SIGILL,
on aarch64 SIGTRAP. Given that neither of those signals are SIGSEGV, both
coredump_socket_test and coredump_socket_protocol_test are currently failing:
get_pidfd_info: mask=0xd7, coredump_mask=0x5, coredump_signal=5
socket_coredump_signal_sigsegv: coredump_signal=5, expected SIGSEGV=11
Qualify the pointer with volatile instead of calling __builtin_trap to fix the
tests.
Signed-off-by: Emanuele Rocca <emanuele.rocca@arm.com>
Link: https://patch.msgid.link/ab2kI0PI_Vk6bU88@NH27D9T0LF
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Christian Brauner <brauner@kernel.org> says:
Add three new clone3() flags for pidfd-based process lifecycle
management.
=== CLONE_AUTOREAP ===
CLONE_AUTOREAP makes a child process auto-reap on exit without ever
becoming a zombie. This is a per-process property in contrast to the
existing auto-reap mechanism via SA_NOCLDWAIT or SIG_IGN for SIGCHLD
which applies to all children of a given parent.
Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property
affecting all children which makes it unsuitable for libraries or
applications that need selective auto-reaping of specific children while
still being able to wait() on others.
CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct.
When the child exits do_notify_parent() checks this flag and causes
exit_notify() to transition the task directly to EXIT_DEAD. Since the
flag lives on the child it survives reparenting: if the original parent
exits and the child is reparented to a subreaper or init the child still
auto-reaps when it eventually exits. This is cleaner than forcing the
subreaper to get SIGCHLD and then reaping it. If the parent doesn't care
the subreaper won't care. If there's a subreaper that would care it
would be easy enough to add a prctl() that either just turns back on
SIGCHLD and turns off auto-reaping or a prctl() that just notifies the
subreaper whenever a child is reparented to it.
CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to
monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern. No exit signal is delivered so exit_signal must be zero.
CLONE_THREAD and CLONE_PARENT are rejected: CLONE_THREAD because
autoreap is a process-level property, and CLONE_PARENT because an
autoreap child reparented via CLONE_PARENT could become an invisible
zombie under a parent that never calls wait().
The flag is not inherited by the autoreap process's own children. Each
child that should be autoreaped must be explicitly created with
CLONE_AUTOREAP.
=== CLONE_NNP ===
CLONE_NNP sets no_new_privs on the child at clone time. Unlike
prctl(PR_SET_NO_NEW_PRIVS) which a process sets on itself, CLONE_NNP
allows the parent to impose no_new_privs on the child at creation
without affecting the parent's own privileges. CLONE_THREAD is rejected
because threads share credentials. CLONE_NNP is useful on its own for
any spawn-and-sandbox pattern but was specifically introduced to enable
unprivileged usage of CLONE_PIDFD_AUTOKILL.
=== CLONE_PIDFD_AUTOKILL ===
This flag ties a child's lifetime to the pidfd returned from clone3().
When the last reference to the struct file created by clone3() is closed
the kernel sends SIGKILL to the child. A pidfd obtained via pidfd_open()
for the same process does not keep the child alive and does not trigger
autokill - only the specific struct file from clone3() has this
property. This is useful for container runtimes, service managers, and
sandboxed subprocess execution - any scenario where the child must die
if the parent crashes or abandons the pidfd or just wants a throwaway
helper process.
CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD and CLONE_AUTOREAP. It
requires CLONE_PIDFD because the whole point is tying the child's
lifetime to the pidfd. It requires CLONE_AUTOREAP because a killed child
with no one to reap it would become a zombie - the primary use case is
the parent crashing or abandoning the pidfd so no one is around to call
waitpid(). CLONE_THREAD is rejected because autokill targets a process
not a thread.
If CLONE_NNP is specified together with CLONE_PIDFD_AUTOKILL an
unprivileged user may spawn a process that is autokilled. The child
cannot escalate privileges via setuid/setgid exec after being spawned.
If CLONE_PIDFD_AUTOKILL is specified without CLONE_NNP the caller must
have have CAP_SYS_ADMIN in its user namespace.
* patches from https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-0-d148b984a989@kernel.org:
selftests/pidfd: add CLONE_PIDFD_AUTOKILL tests
selftests/pidfd: add CLONE_NNP tests
selftests/pidfd: add CLONE_AUTOREAP tests
pidfd: add CLONE_PIDFD_AUTOKILL
clone: add CLONE_NNP
clone: add CLONE_AUTOREAP
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-0-d148b984a989@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add tests for CLONE_PIDFD_AUTOKILL:
- autokill_basic: Verify closing the clone3 pidfd kills the child.
- autokill_requires_pidfd: Verify AUTOKILL without CLONE_PIDFD fails.
- autokill_requires_autoreap: Verify AUTOKILL without CLONE_AUTOREAP
fails.
- autokill_rejects_thread: Verify AUTOKILL with CLONE_THREAD fails.
- autokill_pidfd_open_no_effect: Verify only the clone3 pidfd triggers
autokill, not pidfd_open().
- autokill_requires_cap_sys_admin: Verify AUTOKILL without CLONE_NNP
fails with -EPERM for an unprivileged caller.
- autokill_without_nnp_with_cap: Verify AUTOKILL without CLONE_NNP
succeeds with CAP_SYS_ADMIN.
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-6-d148b984a989@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add tests for the new CLONE_NNP flag:
- nnp_sets_no_new_privs: Verify a child created with CLONE_NNP has
no_new_privs set while the parent does not.
- nnp_rejects_thread: Verify CLONE_NNP | CLONE_THREAD is rejected
with -EINVAL since threads share credentials.
- autoreap_no_new_privs_unset: Verify a plain CLONE_AUTOREAP child
does not get no_new_privs.
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-5-d148b984a989@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add tests for the new CLONE_AUTOREAP clone3() flag:
- autoreap_without_pidfd: CLONE_AUTOREAP without CLONE_PIDFD works
(fire-and-forget)
- autoreap_rejects_exit_signal: CLONE_AUTOREAP with non-zero
exit_signal fails
- autoreap_rejects_parent: CLONE_AUTOREAP with CLONE_PARENT fails
- autoreap_rejects_thread: CLONE_AUTOREAP with CLONE_THREAD fails
- autoreap_basic: child exits, pidfd poll works, PIDFD_GET_INFO returns
correct exit code, waitpid() returns -ECHILD
- autoreap_signaled: child killed by signal, exit info correct via pidfd
- autoreap_reparent: autoreap grandchild reparented to subreaper still
auto-reaps
- autoreap_multithreaded: autoreap process with sub-threads auto-reaps
after last thread exits
- autoreap_no_inherit: grandchild forked without CLONE_AUTOREAP becomes
a regular zombie
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-4-d148b984a989@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add a new clone3() flag CLONE_PIDFD_AUTOKILL that ties a child's
lifetime to the pidfd returned from clone3(). When the last reference to
the struct file created by clone3() is closed the kernel sends SIGKILL
to the child. A pidfd obtained via pidfd_open() for the same process
does not keep the child alive and does not trigger autokill - only the
specific struct file from clone3() has this property.
This is useful for container runtimes, service managers, and sandboxed
subprocess execution - any scenario where the child must die if the
parent crashes or abandons the pidfd.
CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD (the whole point is tying
lifetime to the pidfd file) and CLONE_AUTOREAP (a killed child with no
one to reap it would become a zombie). CLONE_THREAD is rejected because
autokill targets a process not a thread.
The clone3 pidfd is identified by the PIDFD_AUTOKILL file flag set on
the struct file at clone3() time. The pidfs .release handler checks this
flag and sends SIGKILL via do_send_sig_info(SIGKILL, SEND_SIG_PRIV, ...)
only when it is set. Files from pidfd_open() or open_by_handle_at() are
distinct struct files that do not carry this flag. dup()/fork() share the
same struct file so they extend the child's lifetime until the last
reference drops.
CLONE_PIDFD_AUTOKILL uses a privilege model based on CLONE_NNP: without
CLONE_NNP the child could escalate privileges via setuid/setgid exec
after being spawned, so the caller must have CAP_SYS_ADMIN in its user
namespace. With CLONE_NNP the child can never gain new privileges so
unprivileged usage is allowed. This is a deliberate departure from the
pdeath_signal model which is reset during secureexec and commit_creds()
rendering it useless for container runtimes that need to deprivilege
themselves.
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-3-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add a new clone3() flag CLONE_NNP that sets no_new_privs on the child
process at clone time. This is analogous to prctl(PR_SET_NO_NEW_PRIVS)
but applied at process creation rather than requiring a separate step
after the child starts running.
CLONE_NNP is rejected with CLONE_THREAD. It's conceptually a lot simpler
if the whole thread-group is forced into NNP and not have single threads
running around with NNP.
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-2-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add a new clone3() flag CLONE_AUTOREAP that makes a child process
auto-reap on exit without ever becoming a zombie. This is a per-process
property in contrast to the existing auto-reap mechanism via
SA_NOCLDWAIT or SIG_IGN for SIGCHLD which applies to all children of a
given parent.
Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property
affecting all children which makes it unsuitable for libraries or
applications that need selective auto-reaping of specific children while
still being able to wait() on others.
CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct.
When the child exits do_notify_parent() checks this flag and causes
exit_notify() to transition the task directly to EXIT_DEAD. Since the
flag lives on the child it survives reparenting: if the original parent
exits and the child is reparented to a subreaper or init the child still
auto-reaps when it eventually exits.
CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to
monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern where the parent simply doesn't care about the child's exit
status. No exit signal is delivered so exit_signal must be zero.
CLONE_AUTOREAP is rejected in combination with CLONE_PARENT. If a
CLONE_AUTOREAP child were to clone(CLONE_PARENT) the new grandchild
would inherit exit_signal == 0 from the autoreap parent's group leader
but without signal->autoreap. This grandchild would become a zombie that
never sends a signal and is never autoreaped - confusing and arguably
broken behavior.
The flag is not inherited by the autoreap process's own children. Each
child that should be autoreaped must be explicitly created with
CLONE_AUTOREAP.
Link: https://github.com/uapi-group/kernel-features/issues/45
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-1-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Pull fsverity fixes from Eric Biggers:
- Fix a build error on parisc
- Remove the non-large-folio-aware function fsverity_verify_page()
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: fix build error by adding fsverity_readahead() stub
fsverity: remove fsverity_verify_page()
f2fs: make f2fs_verify_cluster() partially large-folio-aware
f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()
Pull crypto library fix from Eric Biggers:
"Fix a big endian specific issue in the PPC64-optimized AES code"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: powerpc/aes: Fix rndkey_from_vsx() on big endian CPUs
Stephen retired and stepped back from -next maintainership, update his
entry in CREDITS to recognise his 18 years of hard work making it what
it is today and all the impact it's had on our development process.
Also update to his current GnuPG key while we're here.
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The x509 public key code gained a dependency on the sha256 hash
implementation, causing a rare link time failure in randconfig
builds:
arm-linux-gnueabi-ld: crypto/asymmetric_keys/x509_public_key.o: in function `x509_get_sig_params':
x509_public_key.c:(.text.x509_get_sig_params+0x12): undefined reference to `sha256'
arm-linux-gnueabi-ld: (sha256): Unknown destination type (ARM/Thumb) in crypto/asymmetric_keys/x509_public_key.o
x509_public_key.c:(.text.x509_get_sig_params+0x12): dangerous relocation: unsupported relocation
Select the necessary library code from Kconfig.
Fixes: 2c62068ac8 ("x509: Separately calculate sha256 for blacklist")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Align to the commit bf4afc53b7 ("Convert 'alloc_obj' family to use the
new default GFP_KERNEL argument") update the 'kmalloc_obj' declaration
for userspace to fix below compile error:
In file included from arch/arm/boot/compressed/../../../../lib/decompress_unxz.c:241,
from arch/arm/boot/compressed/decompress.c:56:
arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c: In function 'xz_dec_init':
arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c:787:28: error: implicit declaration of function 'kmalloc_obj'; did you mean 'kmalloc'? [-Wimplicit-function-declaration]
787 | struct xz_dec *s = kmalloc_obj(*s);
| ^~~~~~~~~~~
| kmalloc
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Fixes: 69050f8d6d ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
Fixes: bf4afc53b7 ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument")
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RTC updates from Alexandre Belloni:
- loongson: Loongson-2K0300 support
- s35390a: nvmem support
- zynqmp: rework calibration
* tag 'rtc-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: ds1390: fix number of bytes read from RTC
rtc: class: Remove duplicate check for alarm
rtc: optee: simplify OP-TEE context match
rtc: interface: Alarm race handling should not discard preceding error
rtc: s35390a: implement nvmem support
rtc: loongson: Add Loongson-2K0300 support
dt-bindings: rtc: loongson: Document Loongson-2K0300 compatible
dt-bindings: rtc: loongson: Correct Loongson-1C interrupts property
dt-bindings: rtc: renesas,rz-rtca3: Add RZ/V2N support
dt-bindings: rtc: cpcap: convert to schema
rtc: zynqmp: use dynamic max and min offset ranges
rtc: zynqmp: rework set_offset
rtc: zynqmp: rework read_offset
rtc: zynqmp: check calibration max value
rtc: zynqmp: correct frequency value
rtc: amlogic-a4: Remove IRQF_ONESHOT
rtc: pcf8563: use correct of_node for output clock
rtc: max31335: use correct CONFIG symbol in IS_REACHABLE()
rtc: nvvrs: Add ARCH_TEGRA to the NV VRS RTC driver
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Pass '-Zunstable-options' flag required by the future Rust 1.95.0
- Fix 'objtool' warning for Rust 1.84.0
'kernel' crate:
- 'irq' module: add missing bound detected by the future Rust 1.95.0
- 'list' module: add missing 'unsafe' blocks and placeholder safety
comments to macros (an issue for future callers within the crate)
'pin-init' crate:
- Clean Clippy warning that changed behavior in the future Rust
1.95.0"
* tag 'rust-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: list: Add unsafe blocks for container_of and safety comments
rust: pin-init: replace clippy `expect` with `allow`
rust: irq: add `'static` bounds to irq callbacks
objtool/rust: add one more `noreturn` Rust function
rust: kbuild: pass `-Zunstable-options` for Rust 1.95.0
Pull runtime verifier fix from Steven Rostedt:
- Fix multiple definition of __pcpu_unique_da_mon_this
After refactoring monitors, we used static per-cpu variables with the
same names across different per-cpu monitors. This is explicitly
disallowed for modules on some architectures (alpha) or if
CONFIG_DEBUG_FORCE_WEAK_PER_CPU is enabled (e.g. Fedora's debug
kernel). Make sure all those variables have different names to avoid
compilation issues.
* tag 'trace-rv-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix multiple definition of __pcpu_unique_da_mon_this
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most simple allocations use GFP_KERNEL, and with the new allocation
helpers being introduced, let's just take advantage of that to simplify
that default case.
It's a numbers game:
git grep 'alloc_obj(' |
sed 's/.*\(GFP_[_A-Z]*\).*/\1/' |
sort | uniq -c | sort -n | tail
shows that about 90% of all those new allocator instances just use that
standard GFP_KERNEL.
Those helpers are already macros, and we can easily just make it be the
default case when the gfp argument is missing.
And yes, we could do that for all the legacy interfaces too, but let's
keep it to just the new ones at least for now, since those all got
converted recently anyway, so this is not any "extra" noise outside of
that limited conversion.
And, in fact, I want to do this before doing the -rc1 release, exactly
so that we don't get extra merge conflicts.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 69050f8d6d ("treewide: Replace kmalloc with kmalloc_obj for
non-scalar types") started using the new allocation helpers, and in the
process showed that they were completely non-working.
The overflow logic in overflows_flex_counter_type() is completely the
wrong way around, and that broke __alloc_flex() completely. By chance,
the resulting code was then such a mess that clang generated
sufficiently garbage code that objtool warned about it all. Which made
it somewhat quicker to narrow things down.
While fixing overflows_flex_counter_type() would presumably fix this
all, I'm excising the whole broken overflow logic from __alloc_flex(),
because we don't want that kind of code in basic allocation functions
anyway.
That (no longer) broken overflows_flex_counter_type() thing needs to be
inserted into the actual __set_flex_counter() logic in the unlikely case
that we ever want this at all. And made conditional.
Fixes: 81cee9166a ("compiler_types: Introduce __flex_counter() and family")
Fixes: 69050f8d6d ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
Cc: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/all/CAHk-=whEd020BYzGTzYrENjD9Z5_82xx6h8HsQvH5xDSnv0=Hw@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kmalloc_obj conversion from Kees Cook:
"This does the tree-wide conversion to kmalloc_obj() and friends using
coccinelle, with a subsequent small manual cleanup of whitespace
alignment that coccinelle does not handle.
This uncovered a clang bug in __builtin_counted_by_ref(), so the
conversion is preceded by disabling that for current versions of
clang. The imminent clang 22.1 release has the fix.
I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I
did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv,
s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc"
* tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kmalloc_obj: Clean up after treewide replacements
treewide: Replace kmalloc with kmalloc_obj for non-scalar types
compiler_types: Disable __builtin_counted_by_ref for Clang
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Introduce 'perf sched stats' tool with record/report/diff workflows
using schedstat counters
- Add a faster libdw based addr2line implementation and allow selecting
it or its alternatives via 'perf config addr2line.style='
- Data-type profiling fixes and improvements including the ability to
select fields using 'perf report''s -F/-fields, e.g.:
'perf report --fields overhead,type'
- Add 'perf test' regression tests for Data-type profiling with C and
Rust workloads
- Fix srcline printing with inlines in callchains, make sure this has
coverage in 'perf test'
- Fix printing of leaf IP in LBR callchains
- Fix display of metrics without sufficient permission in 'perf stat'
- Print all machines in 'perf kvm report -vvv', not just the host
- Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1
code
- Fix 'perf report's histogram entry collapsing with '-F' option
- Use system's cacheline size instead of a hardcoded value in 'perf
report'
- Allow filtering conversion by time range in 'perf data'
- Cover conversion to CTF using 'perf data' in 'perf test'
- Address newer glibc const-correctness (-Werror=discarded-qualifiers)
issues
- Fixes and improvements for ARM's CoreSight support, simplify ARM SPE
event config in 'perf mem', update docs for 'perf c2c' including the
ARM events it can be used with
- Build support for generating metrics from arch specific python
script, add extra AMD, Intel, ARM64 metrics using it
- Add AMD Zen 6 events and metrics
- Add JSON file with OpenHW Risc-V CVA6 hardware counters
- Add 'perf kvm' stats live testing
- Add more 'perf stat' tests to 'perf test'
- Fix segfault in `perf lock contention -b/--use-bpf`
- Fix various 'perf test' cases for s390
- Build system cleanups, bump minimum shellcheck version to 0.7.2
- Support building the capstone based annotation routines as a plugin
- Allow passing extra Clang flags via EXTRA_BPF_FLAGS
* tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (255 commits)
perf test script: Add python script testing support
perf test script: Add perl script testing support
perf script: Allow the generated script to be a path
perf test: perf data --to-ctf testing
perf test: Test pipe mode with data conversion --to-json
perf json: Pipe mode --to-ctf support
perf json: Pipe mode --to-json support
perf check: Add libbabeltrace to the listed features
perf build: Allow passing extra Clang flags via EXTRA_BPF_FLAGS
perf test data_type_profiling.sh: Skip just the Rust tests if code_with_type workload is missing
tools build: Fix feature test for rust compiler
perf libunwind: Fix calls to thread__e_machine()
perf stat: Add no-affinity flag
perf evlist: Reduce affinity use and move into iterator, fix no affinity
perf evlist: Missing TPEBS close in evlist__close()
perf evlist: Special map propagation for tool events that read on 1 CPU
perf stat-shadow: In prepare_metric fix guard on reading NULL perf_stat_evsel
Revert "perf tool_pmu: More accurately set the cpus for tool events"
tools build: Emit dependencies file for test-rust.bin
tools build: Make test-rust.bin be removed by the 'clean' target
...
Pull coccinelle updates from Julia Lawall:
"This simplifies and clarifies the handling of output generated by
Coccinelle that is sent to standard error.
By default, this goes to /dev/null. Remind the user of that and
encourage them to provide another file name (Benjamin Philip)"
* tag 'cocci-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
Documentation: Coccinelle: document debug log handling
scripts: coccicheck: warn on unset debug file
scripts: coccicheck: simplify debug file handling
Pull NTB (PCIe non-transparent bridge) updates from Jon Mason:
"NTB updates include debugfs improvements, correctness fixes, cleanups,
and new hardware support:
ntb_transport QP stats are converted to seq_file, a tx_memcpy_offload
module parameter is introduced with associated ordering fixes, and a
debugfs queue name truncation bug is corrected.
Additional fixes address format specifier mismatches in ntb_tool and
boundary conditions in the Switchtec driver, while unused MSI helpers
are removed and the codebase migrates to dma_map_phys().
Intel Gen6 (Diamond Rapids) NTB support is also added"
* tag 'ntb-7.0' of https://github.com/jonmason/ntb:
NTB: ntb_transport: Use seq_file for QP stats debugfs
NTB: ntb_transport: Fix too small buffer for debugfs_name
ntb/ntb_tool: correct sscanf format for u64 and size_t in tool_peer_mw_trans_write
ntb: intel: Add Intel Gen6 NTB support for DiamondRapids
NTB/msi: Remove unused functions
ntb: ntb_hw_switchtec: Increase MAX_MWS limit to 256
ntb: ntb_hw_switchtec: Fix array-index-out-of-bounds access
ntb: ntb_hw_switchtec: Fix shift-out-of-bounds for 0 mw lut
NTB: epf: allow built-in build
ntb: migrate to dma_map_phys instead of map_page
NTB: ntb_transport: Add 'tx_memcpy_offload' module option
NTB: ntb_transport: Remove unused 'retries' field from ntb_queue_entry
Pull io_uring fixes from Jens Axboe:
- A fix for a missing URING_CMD128 opcode check, fixing an issue with
the SQE mixed mode support introduced in 6.19. Merged late due to
having multiple dependencies
- Add sqe->cmd size checking for big SQEs, similar to what we have for
normal sized SQEs
- Fix a race condition in zcrx, that leads to a double free
* tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: Add size check for sqe->cmd
io_uring: add IORING_OP_URING_CMD128 to opcode checks
io_uring/zcrx: fix user_ref race between scrub and refill paths
Pull memblock fix from Mike Rapoport:
"Fix detection of NUMA node for CXL windows
phys_to_target_node() may assign a CXL Fixed Memory Window to the
wrong NUMA node when a CXL node resides in the gap of discontinuous
System RAM node.
Fix this by checking both numa_meminfo and numa_reserved_meminfo,
preferring the reserved NID when the address appears in both"
* tag 'fixes-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm: numa_memblks: Identify the accurate NUMA ID of CFMW
Pull sched_ext fixes from Tejun Heo:
- Various bug fixes for the example schedulers and selftests
* tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
tools/sched_ext: fix getopt not re-parsed on restart
tools/sched_ext: scx_userland: fix data races on shared counters
tools/sched_ext: scx_pair: fix stride == 0 crash on single-CPU systems
tools/sched_ext: scx_central: fix CPU_SET and skeleton leak on early exit
tools/sched_ext: scx_userland: fix stale data on restart
tools/sched_ext: scx_flatcg: fix potential stack overflow from VLA in fcg_read_stats
selftests/sched_ext: Fix rt_stall flaky failure
tools/sched_ext: scx_userland: fix restart and stats thread lifecycle bugs
tools/sched_ext: scx_central: fix sched_setaffinity() call with the set size
tools/sched_ext: scx_flatcg: zero-initialize stats counter array
Pull smb server fixes from Steve French:
"Two small fixes:
- fix potential deadlock
- minor cleanup"
* tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: call ksmbd_vfs_kern_path_end_removing() on some error paths
smb: server: Remove duplicate include of misc.h
The current debug documentation does not mention that logs are printed
to stdout unless DEBUG_FILE is set. It also doesn't mention that
Coccinelle cannot overwrite debug files.
Document this behaviour in the examples and reference it in the
debugging section.
Signed-off-by: Benjamin Philip <benjamin.philip495@gmail.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
coccicheck prints debug logs to stdout unless a debug file has been set.
This makes it hard to read coccinelle's suggested changes, especially
for someone new to coccicheck.
From this commit, we warn about this behaviour from within the script on
an unset debug file. Explicitly setting the debug file to /dev/null
suppresses the warning while keeping the default.
Signed-off-by: Benjamin Philip <benjamin.philip495@gmail.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
This commit separates handling unset files and pre-existing files. It
also eliminates a duplicated check for unset files in run_cmd_parmap().
Signed-off-by: Benjamin Philip <benjamin.philip495@gmail.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
Unfortunately, there is a corner case of __builtin_counted_by_ref()
usage that crashes[1] Clang since support was introduced in Clang 19.
Disable it prior to Clang 22. Found while tested kmalloc_obj treewide
refactoring (via kmalloc_flex() usage).
Link: https://github.com/llvm/llvm-project/issues/182575 [1]
Signed-off-by: Kees Cook <kees@kernel.org>
After goto restart, optind retains its advanced position from the
previous getopt loop, causing getopt() to immediately return -1.
This silently drops all command-line options on the restarted skeleton.
Reset optind to 1 at the restart label so options are re-parsed.
Affected schedulers: scx_simple, scx_central, scx_flatcg, scx_pair,
scx_sdt, scx_cpu0.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The stats thread reads nr_vruntime_enqueues, nr_vruntime_dispatches,
nr_vruntime_failed, and nr_curr_enqueued concurrently with the main
thread writing them, with no synchronization.
Use __atomic builtins with relaxed ordering for all accesses to these
counters to eliminate the data races.
Only display accuracy is affected, not scheduling correctness.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull spi fixes from Mark Brown:
"There's a relatively large but ultimately simple fix for spidev here
which addresses some ABBA races by simplifying down to just using a
single lock, it's not clear to me that there was ever any benefit in
having the two separate locks in the first place.
We also have simple missing error check fix in in the wpcm-fiu driver"
* tag 'spi-fix-v7.0-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spidev: fix lock inversion between spi_lock and buf_lock
spi: wpcm-fiu: Fix potential NULL pointer dereference in wpcm_fiu_probe()
Pull regulator fixes from Mark Brown:
"A few driver specific fixes, plus a patch from Bjorn which removes a
fixed limit on regulator names that was breaking some Qualcomm
systems"
* tag 'regulator-fix-v7.0-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: s2mps11: fix pctrlsel macro usage in s2mpg10_of_parse_cb()
regulator: s2mps11: drop redundant sanity checks in s2mpg10_of_parse_cb()
regulator: core: Remove regulator supply_name length limit
regulator: mt6363: Fix interrmittent timeout
Pull dmi update from Jean Delvare:
- include product_family info in dmi-id modalias
* tag 'dmi-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware/dmi: Include product_family info to modalias
Pull gpio fixes from Bartosz Golaszewski:
- add a missing IS_ERR() check in gpio-nomadik
- fix a NULL-pointer dereference in GPIO character device code
- restore label matching in swnode-lookup due to reported regressions
in existing users (this will get removed again once we audit and
update all drivers)
- fix remove path in GPIO sysfs code
- normalize the return value of gpio_chip::get() in gpio-amd-fch
* tag 'gpio-fixes-for-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: amd-fch: ionly return allowed values from amd_fch_gpio_get()
gpio: sysfs: fix chip removal with GPIOs exported over sysfs
gpio: swnode: restore the swnode-name-against-chip-label matching
gpio: cdev: Avoid NULL dereference in linehandle_create()
gpio: nomadik: Add missing IS_ERR() check
Pull more i2c updates from Wolfram Sang:
"Designware:
- refactor the transfer path to support I2C_M_STOP
- handle pm runtime by using the active auto try macros
- handle controllers lacking explicit START and STOP conditions
- general cleanups
Other i2c drivers:
- qualcomm: add support for qcs8300-cci
- amd8111: general cleanups
- cp2112: add DT bindings"
* tag 'i2c-for-7.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
dt-bindings: i2c: Add CP2112 HID USB to SMBus Bridge
i2c: amd8111: switch to devm_ functions
i2c: amd8111: Remove spaces in MODULE_* macros
i2c: designware-platdrv: fix cleanup on probe failure
i2c: designware-platdrv: simplify reset control
dt-bindings: i2c: qcom-cci: Document qcs8300 compatible
i2c: designware: Remove dead code in AMD ISP case
i2c: designware: Support of controller with IC_EMPTYFIFO_HOLD_MASTER disabled
i2c: designware: Use runtime PM macro for auto-cleanup
i2c: designware: Implement I2C_M_STOP support
Pull sound fixes from Takashi Iwai:
"Here are a bunch of updates, but there should be no big surprises;
mostly device-specific quirks and fix-ups or non-code changes:
- Quirks for ASoC AMD, HD-audio and USB-audio
- Fixes in ASoC fsl, rockchip, renesas, aw codecs
- Fixes for USB-audio packet handling in the implicit feedback mode
- Updates of SPDX license IDs in some files"
* tag 'sound-fix-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
ASoC: rockchip: i2s-tdm: Use param rate if not provided by set_sysclk
ALSA: hda/hdmi: Add quirk for TUXEDO IBS14G6
ASoC: dt-bindings: asahi-kasei,ak5558: Fix the supply names
ASoC: dt-bindings: asahi-kasei,ak4458: Fix the supply names
ASoC: dt-bindings: asahi-kasei,ak4458: set unevaluatedProperties:false
ASoC: amd: amd_sdw: add machine driver quirk for Lenovo models
ASoC: amd: acp: Add ACP7.0 match entries for Realtek parts
ALSA: echoaudio: Add SPDX ids to some files
ALSA: isa: Add SPDX id lines to some files
ALSA: core: Add SPDX license id to files
ASoC: tas2783A: add explicit port prepare handling
ASoC: renesas: rz-ssi: Fix playback and capture
ALSA: hda/realtek: Fix headset mic on ASUS Zenbook 14 UX3405MA
ALSA: hda/conexant: Fix headphone jack handling on Acer Swift SF314
ASoC: qcom: sm8250: Add quinary MI2S support
ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 15X M6501RR
ALSA: usb-audio: Avoid potentially repeated XRUN error messages
ALSA: usb-audio: Add sanity check for OOB writes at silencing
ALSA: usb-audio: Optimize the copy of packet sizes for implicit fb handling
ALSA: usb-audio: Update the number of packets properly at receiving
...
Pull more fbdev updates from Helge Deller:
"Code cleanups for the au1100fb fbdev driver (Uwe Kleine-König)"
* tag 'fbdev-for-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: au1100fb: Replace license boilerplate by SPDX header
fbdev: au1100fb: Fold au1100fb.h into its only user
fbdev: au1100fb: Replace custom printk wrappers by pr_*
fbdev: au1100fb: Make driver compilable on non-mips platforms
fbdev: au1100fb: Use proper conversion specifiers in printk formats
fbdev: au1100fb: Mark several local functions as static
fbdev: au1100fb: Don't store device specific data in global variables