Introduce SKIP_LLVM, SKIP_LIBBFD, and SKIP_CRYPTO build flags that let
users build bpftool without these optional dependencies.
SKIP_LLVM=1 skips LLVM even when detected. SKIP_LIBBFD=1 prevents the
libbfd JIT disassembly fallback when LLVM is absent. Together, they
produce a bpftool with no disassembly support.
SKIP_CRYPTO=1 excludes sign.c and removes the -lcrypto link dependency.
Inline stubs in main.h return errors with a clear message if signing
functions are called at runtime.
Use BPFTOOL_WITHOUT_CRYPTO (not HAVE_LIBCRYPTO_SUPPORT) as the C
define, following the BPFTOOL_WITHOUT_SKELETONS naming convention for
bpftool-internal build config, leaving HAVE_LIBCRYPTO_SUPPORT free for
proper feature detection in the future.
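The inline-stub pattern described above can be sketched as follows. This is an illustrative standalone model, not the actual bpftool code: the function name `sign_program`, the error message, and the explicit `#define` (which simulates a SKIP_CRYPTO=1 build) are assumptions for the sketch.

```c
#include <stdio.h>
#include <errno.h>

#define BPFTOOL_WITHOUT_CRYPTO 1	/* simulate a SKIP_CRYPTO=1 build for this sketch */

#ifdef BPFTOOL_WITHOUT_CRYPTO
/* Build excluded the crypto code: a static inline stub replaces the
 * real function and fails with a clear message at runtime. */
static inline int sign_program(const char *path)
{
	(void)path;
	fprintf(stderr, "program signing requires bpftool built with libcrypto\n");
	return -EOPNOTSUPP;
}
#else
int sign_program(const char *path);	/* real implementation lives in sign.c */
#endif
```

Callers need no `#ifdef` of their own; they simply get an error return when signing is unavailable.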
All three flags are propagated through the selftests Makefile to bpftool
sub-builds.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260312-b4-bpftool_build-v2-1-4c9d57133644@meta.com
Emil Tsalapatis says:
====================
bpf: Relax 8 frame limitation for global subprogs
The BPF verifier currently limits the maximum runtime call stack to
8 frames. Larger BPF programs like sched-ext schedulers routinely
fail verification because they exceed this limit, even as they use
very little actual stack space for each frame.
Relax the verifier to permit call stacks > 8 frames deep when the
call stacks include global subprogs. The old 8 stack frame limit now
only applies to call stacks composed entirely of static function calls.
This works because global functions are each verified in isolation, so
the verifier does not need to cross-reference verification state across
the function call boundary, which has been the reason for limiting the
call stack size in the first place.
This patch does not change the verification-time limit of 8 stack
frames. Static functions that are inlined for verification purposes
still only go 8 frames deep to avoid changing the verifier's internal
data structures used for verification. These data structures only
support holding information on up to 8 stack frames.
This patch also does not adjust the actual maximum stack size of 512.
CHANGELOG
=========
v5 -> v6 (https://lore.kernel.org/bpf/20260311182831.91219-1-emil@etsalapatis.com/)
- Make bpf_subprog_call_depth_info internal to verifier.c (Alexei)
v4 -> v5 (https://lore.kernel.org/bpf/20260309204430.201219-1-emil@etsalapatis.com/)
- Move depth tracking state to verifier (Eduard) and free it after verification (Alexei)
- Fix selftest patch title and formatting errors (Yonghong)
v3 -> v4 (https://lore.kernel.org/bpf/20260303043106.406099-1-emil@etsalapatis.com/)
- Factor out temp call depth tracking info into its own struct (Eduard)
- Bring depth calculation loop in line with the other instances (Mykyta)
- Add comment on why selftest call stack is 16 bytes/frame (Eduard)
- Rename "cidx" to "caller" for clarity (Mykyta, Eduard)
v2 -> v3 (https://lore.kernel.org/bpf/20260210213606.475415-1-emil@etsalapatis.com/)
- Change logic to remove arbitrary limit on call depth (Eduard)
- Add additional selftests (Eduard)
v1 -> v2 (https://lore.kernel.org/bpf/20260202233716.835638-1-emil@etsalapatis.com)
- Adjust patch to only increase the runtime stack depth, leaving the
verification-time stack depth unchanged (Alexei)
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
====================
Link: https://patch.msgid.link/20260316161225.128011-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF verifier currently enforces a call stack depth of 8 frames,
regardless of the actual stack space consumption of those frames. The
limit is necessary for static call stacks, because the bookkeeping data
structures used by the verifier when stepping into static functions
during verification only support 8 stack frames. However, this
limitation only matters for static stack frames: Global subprogs are
verified by themselves and do not require limiting the call depth.
Relax this limitation to only apply to static stack frames. Verification
now only fails when there is a sequence of 8 calls to non-global
subprogs. Calling into a global subprog resets the counter. This allows
deeper call stacks, provided all frames still fit in the stack.
The change does not increase the maximum size of the call stack, only
the maximum number of frames we can place in it.
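The counting rule above can be modeled in a few lines. This is an illustrative sketch of the idea, not the verifier's actual code: walking a call chain, the 8-frame limit applies only to consecutive static frames, and a call into a global subprog (verified in isolation) resets the counter.

```c
#include <stdbool.h>
#include <assert.h>

#define MAX_CALL_FRAMES 8

/* Toy model: return true if a call chain is acceptable under the
 * relaxed rule. frame_is_global[i] marks frames belonging to global
 * subprogs. */
static bool call_chain_ok(const bool *frame_is_global, int nframes)
{
	int static_run = 0;

	for (int i = 0; i < nframes; i++) {
		if (frame_is_global[i]) {
			static_run = 0;	/* global subprog: verified on its own */
			continue;
		}
		if (++static_run > MAX_CALL_FRAMES)
			return false;	/* too many consecutive static frames */
	}
	return true;
}
```

A 12-frame chain of static calls still fails, while the same depth passes once a global subprog appears in the middle.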
Also change the progs/test_global_func3.c selftest to use static
functions, since with the new patch it would otherwise unexpectedly
pass verification.
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260316161225.128011-2-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul Chaignon says:
====================
Optimize bounds refinement by reordering deductions
This patchset optimizes the bounds refinement (reg_bounds_sync) by
reordering deductions in __reg_deduce_bounds. This reordering allows us
to improve precision slightly while removing one call to
__reg_deduce_bounds.
The first patch from Eduard refactors the __reg_deduce_bounds
subfunctions, the second patch implements the reordering, and the last
one adds a selftest.
Changes in v3:
- Added first commit from Eduard that significantly helps with
readability of second commit.
- Reshuffled a bit more the functions in the second commit to improve
precision (Eduard).
- Rebased.
Changes in v2:
- Updated description to mention potential precision improvement and
to clarify the sequence of refinements (Shung-Hsi).
- Added the second patch.
- Rebased.
====================
Link: https://patch.msgid.link/cover.1773401138.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This new selftest demonstrates the improvement of bounds refinement from
the previous patch. It is inspired by a set of reg_bounds_sync inputs
generated using CBMC [1] by Shung-Hsi:
reg.smin_value=0x8000000000000002
reg.smax_value=2
reg.umin_value=2
reg.umax_value=19
reg.s32_min_value=2
reg.s32_max_value=3
reg.u32_min_value=2
reg.u32_max_value=3
reg_bounds_sync returns R=[2; 3] without the previous patch, and R=2
with it. __reg64_deduce_bounds is able to derive that u64=2, but before
the previous patch, those bounds are overwritten in
__reg_deduce_mixed_bounds using the 32-bit bounds.
To arrive at these reg_bounds_sync inputs, we bound the 32-bit value
first to [2; 3]. We can then upper-bound s64 without impacting u64. At
that point, the refinement to u64=2 doesn't happen because the ranges
still overlap in two points:
0 umin=2 umax=0xff..ff00..03 U64_MAX
| [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] |
|----------------------------|------------------------------|
|xx] [xxxxxxxxxxxxxxxxxxxxxxxxxxxx|
0 smax=2 smin=0x800..02 -1
With an upper-bound check at value 19, we can reach the above inputs for
reg_bounds_sync. At that point, the refinement to u64=2 happens and
because it isn't overwritten by __reg_deduce_mixed_bounds anymore,
reg_bounds_sync returns with reg=2.
The test validates this result by including an illegal instruction in
the (dead) branch reg != 2.
Link: https://github.com/shunghsiyu/reg_bounds_sync-review/ [1]
Co-developed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/622dc51c581cd4d652fff362188b2a5f73c1fe99.1773401138.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In commit 5dbb19b16a ("bpf: Add third round of bounds deduction"), I
added a new round of bounds deduction because two rounds were not enough
to converge to a fixed point. This commit slightly refactors the bounds
deduction logic such that two rounds are enough.
In [1], Eduard noticed that after we improved the refinement logic, a
third call to the bounds deduction (__reg_deduce_bounds) was needed to
converge to a fixed point. More specifically, we needed this third call
to improve the s64 range using the s32 range. We added the third call
and postponed a more detailed analysis of the refinement logic.
I've been looking into this more recently. The register refinement
consists of the following calls.
__update_reg_bounds();
3 x __reg_deduce_bounds() {
deduce_bounds_32_from_64();
deduce_bounds_32_from_32();
deduce_bounds_64_from_64();
deduce_bounds_64_from_32();
};
__reg_bound_offset();
__update_reg_bounds();
From this, we can observe that we first improve the 32bit ranges from
the 64bit ranges in deduce_bounds_32_from_64, then improve the 64bit
ranges on their own in deduce_bounds_64_from_64. Intuitively, if we
were to improve the 64bit ranges on their own *before* we use them to
improve the 32bit ranges, we may reach a fixed point earlier.
In a similar manner, using CBMC, Eduard found that it's best to improve
the 32bit ranges on their own *after* we've improved them using the 64bit
ranges. That is, running deduce_bounds_32_from_32 after
deduce_bounds_32_from_64.
These changes allow us to drop one call to __reg_deduce_bounds. Without
this reordering, the test "verifier_bounds/bounds deduction cross sign
boundary, negative overlap" fails when removing one call to
__reg_deduce_bounds. In some cases, this change can even improve
precision a little bit, as illustrated in the new selftest in the next
patch.
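The effect of the ordering can be shown with a toy model, much simpler than the verifier's real reg_bounds_sync: one deduction tightens the unsigned 64-bit bound from the signed one, another derives the 32-bit bound from the 64-bit one. Running the 64-from-64 step first reaches the tight result in a single pass. All names and rules here are illustrative, not the kernel's.

```c
#include <stdint.h>
#include <assert.h>

struct toy_bounds {
	int64_t  smin64, smax64;
	uint64_t umax64;
	uint32_t umax32;
};

/* If the signed range is entirely non-negative, it also upper-bounds
 * the unsigned 64-bit range. */
static void deduce_64_from_64(struct toy_bounds *b)
{
	if (b->smin64 >= 0 && (uint64_t)b->smax64 < b->umax64)
		b->umax64 = (uint64_t)b->smax64;
}

/* If the 64-bit value fits in 32 bits, its upper bound carries over
 * to the 32-bit range. */
static void deduce_32_from_64(struct toy_bounds *b)
{
	if (b->umax64 <= UINT32_MAX && (uint32_t)b->umax64 < b->umax32)
		b->umax32 = (uint32_t)b->umax64;
}
```

With smin64=0, smax64=5, umax64=100: deriving the 32-bit bound first leaves umax32=100 after one pass, whereas tightening the 64-bit bounds first yields umax32=5 immediately.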
As expected, this change didn't have any impact on the number of
instructions processed when running it through the Cilium complexity
test suite [2].
Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Link: https://pchaigno.github.io/test-verifier-complexity.html [2]
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/1b00d2749ec4c774c3ada84e265ac7fda72cfe56.1773401138.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On powerpc, immediate load instructions are sign extended. In case
of unsigned types, arguments should be explicitly zero-extended by
the caller. For kfunc call, this needs to be handled in the JIT code.
In bpf_kfunc_call_test4(), that tests for sign-extension of signed
argument types in kfunc calls, add some additional failure checks.
And add bpf_kfunc_call_test5() to test zero-extension of unsigned
argument types in kfunc calls.
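The underlying hazard can be illustrated in plain C, independent of the powerpc JIT itself: for the same 32-bit pattern, a sign-extending load and an explicit zero extension produce different 64-bit values, which is why unsigned arguments must be zero-extended before the kfunc call.

```c
#include <stdint.h>
#include <assert.h>

/* What a sign-extending immediate load yields for a 32-bit value. */
static uint64_t sign_extended(uint32_t v)
{
	return (uint64_t)(int64_t)(int32_t)v;
}

/* What an unsigned argument actually requires. */
static uint64_t zero_extended(uint32_t v)
{
	return (uint64_t)v;
}
```

For values with the top bit set the two differ in all upper 32 bits; for small positives they agree, which is why such bugs are easy to miss without targeted tests.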
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260312080113.843408-1-hbathini@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Building selftests with
clang 23.0.0 (6fae863eba8a72cdd82f37e7111a46a70be525e0) triggers
the following error:
tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c:117:12:
error: assigning to 'char *' from 'const char *' discards qualifiers
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]
The variable `tgt_name` is declared as `char *`, but it stores the
result of strstr(prog_name[i], "/"). Since `prog_name[i]` is a
`const char *`, the returned pointer should also be treated as
const-qualified.
Update `tgt_name` to `const char *` to match the type of the underlying
string and silence the compiler warning.
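The corrected pattern looks like the following sketch; the helper name `prog_target` is hypothetical, standing in for the test's actual code around the strstr() call.

```c
#include <string.h>
#include <assert.h>

/* Return the part of a SEC name after the '/', e.g. the target
 * function in "fexit/bpf_fentry_test1". The result of strstr() on a
 * const string is stored in a const-qualified pointer (was: char *). */
static const char *prog_target(const char *prog_name)
{
	const char *tgt_name = strstr(prog_name, "/");

	return tgt_name ? tgt_name + 1 : prog_name;
}
```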
Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Menglong Dong <menglong.dong@linux.dev>
Link: https://lore.kernel.org/bpf/20260305222132.470700-1-varunrmallya@gmail.com
livepatch_trampoline relies on livepatch sysfs and livepatch-sample.ko.
When CONFIG_LIVEPATCH is disabled or the samples module isn't built, the
test fails with ENOENT and causes false failures in minimal CI configs.
Skip the test when livepatch sysfs or the sample module is unavailable.
Also avoid writing to livepatch sysfs when it's not present.
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260309104448.817401-1-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Patch 1/2 added PID filtering to the probe_user BPF program to avoid
cross-test interference from the global connect() hooks.
With the interference removed, drop the serial_ prefix and remove the
stale TODO comment so the test can run in parallel.
Tested:
./test_progs -t probe_user -v
./test_progs -j$(nproc) -t probe_user
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260306083330.518627-2-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test installs a kprobe on __sys_connect and checks that
bpf_probe_write_user() can modify the syscall argument. However, any
concurrent thread in any other test that calls connect() will also
trigger the kprobe and have its sockaddr silently overwritten, causing
flaky failures in unrelated tests.
Constrain the hook to the current test process by filtering on a PID
stored as a global variable in .bss. Initialize the .bss value from
user space before bpf_object__load() using bpf_map__set_initial_value(),
and validate the bss map value size to catch layout mismatches.
No new map is introduced and the test keeps the existing non-skeleton
flow.
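The filtering idea reduces to an early-exit check, sketched here in plain C (the real change stores the PID in the BPF program's .bss and seeds it from user space with bpf_map__set_initial_value() before load; the names below are illustrative).

```c
#include <stdbool.h>
#include <assert.h>

static int filter_pid;	/* stands in for the global in the program's .bss */

/* The kprobe handler bails out for every process except the test's
 * own, so concurrent connect() calls from other tests are untouched. */
static bool should_handle(int current_pid)
{
	return current_pid == filter_pid;
}
```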
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260306083330.518627-1-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The module_attach test contains subtests which check that unloading a
module while there are BPF programs attached to its functions is not
possible because the module is still referenced.
The problem is that the test calls the generic unload_module() helper
function which is used for module cleanup after test_progs terminate and
tries to wait until all module references are released. This
unnecessarily slows down the module_attach subtests since each
unsuccessful call to unload_module() takes about 1 second.
Introduce try_unload_module() which takes the number of retries as a
parameter. Make unload_module() call it with the currently used amount
of 10000 retries but call it with just 1 retry from module_attach tests
as it is always expected to fail. This speeds up the module_attach()
test significantly.
Before:
# time ./test_progs -t module_attach
[...]
Summary: 1/14 PASSED, 0 SKIPPED, 0 FAILED
real 0m5.011s
user 0m0.293s
sys 0m0.108s
After:
# time ./test_progs -t module_attach
[...]
Summary: 1/14 PASSED, 0 SKIPPED, 0 FAILED
real 0m0.350s
user 0m0.197s
sys 0m0.063s
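The shape of the change can be sketched as follows. The function bodies are illustrative stand-ins (try_once() here simulates an "is the module gone yet" check that always fails, as expected in the module_attach case); only the overall split between the two entry points mirrors the commit.

```c
#include <stdbool.h>
#include <assert.h>

static int attempts_made;

static bool try_once(void)
{
	attempts_made++;
	return false;	/* module still referenced: always fails here */
}

/* New helper: number of retries is a parameter. */
static int try_unload_module(int retries)
{
	for (int i = 0; i < retries; i++)
		if (try_once())
			return 0;
	return -1;	/* still loaded after all retries */
}

/* Old entry point keeps its behavior for generic cleanup. */
static int unload_module(void)
{
	return try_unload_module(10000);
}
```

module_attach calls try_unload_module(1) directly, so the expected failure costs one attempt instead of ten thousand.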
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260306101628.3822284-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently BPF selftests will fail to compile if CONFIG_SMC
is not set.
Use BPF CO-RE to work around the case where CONFIG_SMC is
not set; use ___local variants of relevant structures and
utilize bpf_core_field_exists() for net->smc.
The test continues to pass where
CONFIG_SMC=y
CONFIG_SMC_HS_CTRL_BPF=y
but these changes allow the selftests to build in the absence
of CONFIG_SMC=y.
Also ensure that we get a pure skip rather than a skip+fail
by removing the "SMC is unsupported" part from the ASSERT_FALSE()
in get_smc_nl_family(); doing this means we get a skip without
a fail when CONFIG_SMC is not set:
$ sudo ./test_progs -t bpf_smc
Summary: 1/0 PASSED, 1 SKIPPED, 0 FAILED
Fixes: beb3c67297 ("bpf/selftests: Add selftest for bpf_smc_hs_ctrl")
Reported-by: Colm Harrington <colm.harrington@oracle.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://patch.msgid.link/20260310111330.601765-1-alan.maguire@oracle.com
Hui Zhu says:
====================
Fix test_cgroup_iter_memcg issues found during back-porting
While back-porting "mm: bpf kfuncs to access memcg data", I
encountered issues with test_cgroup_iter_memcg, specifically
in test_kmem.
The test_cgroup_iter_memcg test would falsely pass when
bpf_mem_cgroup_page_state() failed due to incompatible enum
values across kernel versions. Additionally, test_kmem would
fail on systems with cgroup.memory=nokmem enabled.
These patches are my fixes for the problems I encountered.
Changelog:
v5:
According to the comments of Emil Tsalapatis and JP Kobryn, dropped
"selftests/bpf: Check bpf_mem_cgroup_page_state return value".
v4:
Fixed wrong git commit log in "bpf: Use bpf_core_enum_value for stats in
cgroup_iter_memcg".
v3:
According to the comments of JP Kobryn, remove kmem subtest from
cgroup_iter_memcg and fix assertion string in test_pgfault.
v2:
According to the comments of JP Kobryn, added bpf_core_enum_value()
usage in the BPF program to handle cross-kernel enum value differences
at load-time instead of compile-time.
Dropped the mm/memcontrol.c patch.
Modified test_kmem handling: instead of skipping when nokmem is set,
verify that kmem value is zero as expected.
According to the comments of bot, fixed assertion message: changed
"bpf_mem_cgroup_page_state" to "bpf_mem_cgroup_vm_events" for PGFAULT
check.
====================
Link: https://patch.msgid.link/cover.1772505399.git.zhuhui@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replace hardcoded enum values with bpf_core_enum_value() calls in
cgroup_iter_memcg test to improve portability across different
kernel versions.
The change adds runtime enum value resolution for:
- node_stat_item: NR_ANON_MAPPED, NR_SHMEM, NR_FILE_PAGES,
NR_FILE_MAPPED
- vm_event_item: PGFAULT
This ensures the BPF program can adapt to enum value changes
between kernel versions.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: JP Kobryn <jp.kobryn@linux.dev>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Link: https://lore.kernel.org/r/ca6eb1a1a4fd7a17ffe995acf52c9a4ceb7bac13.1772505399.git.zhuhui@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When cgroup.memory=nokmem is set in the kernel command line, kmem
accounting is disabled. This causes the test_kmem subtest in
cgroup_iter_memcg to fail because it expects non-zero kmem values.
Remove the kmem subtest altogether since the remaining subtests
(shmem, file, pgfault) already provide sufficient coverage for
the cgroup iter memcg functionality.
Reviewed-by: JP Kobryn <jp.kobryn@linux.dev>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Link: https://lore.kernel.org/r/35fa32a019361ec26265c8a789ee31e448d4dbda.1772505399.git.zhuhui@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cupertino Miranda says:
====================
bpf: support for non_null ptr detection with JEQ/JNE with register operand
Changes from v1:
- Corrected typos in commit messages.
- Fixed indentation.
- Replaced text by simpler version suggested by Eduard.
Changes from v2:
- Small fixes after AI patch checker complaints.
Changes from v3:
- Removed log file. No idea how that got added.
====================
Link: https://patch.msgid.link/20260304195018.181396-1-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Viktor Malik says:
====================
Always allow sleepable and fmod_ret programs on syscalls
Both sleepable and fmod_ret programs are only allowed on selected
functions. For convenience, the error injection list was originally
used.
When error injection is disabled, that list is empty and sleepable
tracing programs, as well as fmod_ret programs, are effectively
unavailable.
This patch series addresses the issue by at least enabling sleepable and
fmod_ret programs on syscalls, if error injection is disabled. More
details on why syscalls are used can be found in [1].
[1] https://lore.kernel.org/bpf/CAADnVQK6qP8izg+k9yV0vdcT-+=axtFQ2fKw7D-2Ei-V6WS5Dw@mail.gmail.com/
Changes in v3:
- Handle LoongArch (Leon)
- Add Kumar's and Leon's acks
Changes in v2:
- Check "sys_" prefix instead of "sys" for powerpc syscalls (AI review)
- Add link to the original discussion (Kumar)
- Add explanation why arch syscall prefixes are hard-coded (Leon)
====================
Link: https://patch.msgid.link/cover.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that sleepable programs are always enabled on syscalls, let
refcounted_kptr tests use syscalls rather than bpf_testmod_test_read,
which is not sleepable with error injection disabled.
The tests just check that the verifier can handle usage of RCU locks in
sleepable programs and never actually attach. So, the attachment target
doesn't matter (as long as it is sleepable) and with syscalls, the tests
pass on kernels with disabled error injection.
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/8b6626eae384559855f7a0e846a16e83f25f06f6.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
fmod_ret BPF programs can only be attached to selected functions. For
convenience, the error injection list was originally used (along with
functions prefixed with "security_"), which contains syscalls and
several other functions.
When error injection is disabled (CONFIG_FUNCTION_ERROR_INJECTION=n),
that list is empty and fmod_ret programs are effectively unavailable for
most functions. In such a case, at least enable fmod_ret programs
on syscalls.
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/472310f9a5f4944ad03214e4d943a4830fd8eb76.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sleepable BPF programs can only be attached to selected functions. For
convenience, the error injection list was originally used, which
contains syscalls and several other functions.
When error injection is disabled (CONFIG_FUNCTION_ERROR_INJECTION=n),
that list is empty and sleepable tracing programs are effectively
unavailable. In such a case, at least enable sleepable programs on
syscalls. For discussion why syscalls were chosen, see [1].
To detect that a function is a syscall handler, we check for
arch-specific prefixes for the most common architectures. Unfortunately,
the prefixes are hard-coded in arch syscall code so we need to hard-code
them, too.
[1] https://lore.kernel.org/bpf/CAADnVQK6qP8izg+k9yV0vdcT-+=axtFQ2fKw7D-2Ei-V6WS5Dw@mail.gmail.com/
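The detection described above amounts to a prefix comparison, sketched here; the prefix list is illustrative (the kernel's actual, hard-coded list lives in the arch syscall wrapper code and covers more architectures).

```c
#include <string.h>
#include <stdbool.h>
#include <assert.h>

static bool looks_like_syscall(const char *name)
{
	static const char *prefixes[] = {
		"__x64_sys_", "__ia32_sys_",	/* x86 */
		"__arm64_sys_",			/* arm64 */
		"sys_",				/* generic / powerpc-style */
	};

	for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
		if (strncmp(name, prefixes[i], strlen(prefixes[i])) == 0)
			return true;
	return false;
}
```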
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/2704a8512746655037e3c02b471b31bd0d76c8db.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull EFI fix from Ard Biesheuvel:
"Fix for the x86 EFI workaround keeping boot services code and data
regions reserved until after SetVirtualAddressMap() completes:
deferred struct page initialization may result in some of this memory
being lost permanently"
* tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efi: defer freeing of boot services memory
Pull i2c fix from Wolfram Sang:
"A revert for the i801 driver restoring old locking behaviour"
* tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"
Pull x86 fixes from Ingo Molnar:
- Fix SEV guest boot failures in certain circumstances, due to
very early code relying on a BSS-zeroed variable that isn't
actually zeroed yet and may contain non-zero bootup values
Move the variable into the .data section to gain even earlier
zeroing
- Expose & allow the IBPB-on-Entry feature on SNP guests, which
was not properly exposed to guests due to initial implementational
caution
- Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative
file paths
- Fix the various SNC (Sub-NUMA Clustering) topology enumeration
bugs/artifacts (sched-domain build errors mostly).
SNC enumeration data got more complicated with Granite Rapids X
(GNR) and Clearwater Forest X (CWF), which exposed these bugs
and made their effects more serious
- Also use the now sane(r) SNC code to fix resctrl SNC detection bugs
- Work around a historic libgcc unwinder bug in the vdso32 sigreturn
code (again), which regressed during an overly aggressive recent
cleanup of DWARF annotations
* tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/vdso32: Work around libgcc unwinder bug
x86/resctrl: Fix SNC detection
x86/topo: Fix SNC topology mess
x86/topo: Replace x86_has_numa_in_package
x86/topo: Add topology_num_nodes_per_package()
x86/numa: Store extra copy of numa_nodes_parsed
x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
x86/sev: Allow IBPB-on-Entry feature for SNP guests
x86/boot/sev: Move SEV decompressor variables into the .data section
Pull timer fix from Ingo Molnar:
"Make clock_adjtime() syscall timex validation slightly more permissive
for auxiliary clocks, to not reject syscalls based on the status field
that do not try to modify the status field.
This makes the ABI behavior in clock_adjtime() consistent with
CLOCK_REALTIME"
* tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix timex status validation for auxiliary clocks
Pull scheduler fix from Ingo Molnar:
"Fix a DL scheduler bug that may corrupt internal metrics during PI and
setscheduler() syscalls, resulting in kernel warnings and misbehavior.
Found during stress-testing"
* tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
Pull SCSI fixes from James Bottomley:
"Two core changes and the rest in drivers, one core change to quirk the
behaviour of the Iomega Zip drive and one to fix a hang caused by tag
reallocation problems, which has mostly been seen by the iscsi client.
Note the latter fixes the problem but still has a slight sysfs memory
leak, so will be amended in the next pull request (once we've run the
fix for the fix through our testing)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: Fix recursive locking in __configfs_open_file()
scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP
scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
scsi: core: Fix refcount leak for tagset_refcnt
Pull fbdev fix from Helge Deller:
"Silence build error in au1100fb driver found by kernel test robot"
* tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: au1100fb: Fix build on MIPS64
Pull parisc fixes from Helge Deller:
"While testing Sasha Levin's 'kallsyms: embed source file:line info in
kernel stack traces' patch series, which increases the typical kernel
image size, I found some issues with the parisc initial kernel mapping
which may prevent the kernel from booting.
The three small patches here fix this"
* tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix initial page table creation for boot
parisc: Check kernel mapping earlier at bootup
parisc: Increase initial mapping to 64 MB with KALLSYMS
Pull RCU selftest fixes from Boqun Feng:
"Fix a regression in RCU torture test pre-defined scenarios caused by
commit 7dadeaa6e8 ("sched: Further restrict the preemption modes")
which limits PREEMPT_NONE to architectures that do not support
preemption at all and PREEMPT_VOLUNTARY to those architectures that do
not yet have PREEMPT_LAZY support.
Since major architectures (e.g. x86 and arm64) no longer support
CONFIG_PREEMPT_NONE and CONFIG_PREEMPT_VOLUNTARY, using them in
rcutorture, rcuscale, refscale, and scftorture pre-defined scenarios
causes config checking errors.
Switch these kconfigs to PREEMPT_LAZY"
* tag 'rcu-fixes.v7.0-20260307a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
scftorture: Update due to x86 not supporting none/voluntary preemption
refscale: Update due to x86 not supporting none/voluntary preemption
rcuscale: Update due to x86 not supporting none/voluntary preemption
rcutorture: Update due to x86 not supporting none/voluntary preemption
Pull tracing fixes from Steven Rostedt:
- Fix possible NULL pointer dereference in trace_data_alloc()
On the trace_data_alloc() error path, it can call trigger_data_free()
with a NULL pointer. This used to be a kfree() but was changed to
trigger_data_free() to clean up any partial initialization. The issue
is that trigger_data_free() does not expect a NULL pointer. Have
trigger_data_free() return safely on NULL pointer.
- Fix multiple events on the command line and bootconfig
If multiple events are enabled on the command line separately and not
grouped, only the last event gets enabled. That is:
trace_event=sched_switch trace_event=sched_waking
will only enable sched_waking whereas:
trace_event=sched_switch,sched_waking
will enable both.
The bootconfig makes it even worse as the second way is the more
common method.
The issue is that a temporary buffer is used to store the events to
enable later in boot. Each time the cmdline callback is called, it
overwrites what was previously there.
Have the callback append the next value (delimited by a comma) if the
temporary buffer already has content.
- Fix command line trace_buffer_size if >= 2G
The logic to allocate the trace buffer uses "int" for the size
parameter in the command line code, causing overflow issues if more
than 2G is specified.
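The append fix for the trace_event= case can be sketched in a few lines; the buffer size and names here are illustrative, not the tracing code's actual identifiers.

```c
#include <string.h>
#include <assert.h>

#define EVT_BUF_SIZE 256
static char evt_buf[EVT_BUF_SIZE];

/* Instead of overwriting the temporary buffer on each trace_event=
 * callback, append the new value after a comma when the buffer
 * already has content. */
static void trace_event_callback(const char *value)
{
	if (evt_buf[0] != '\0')
		strncat(evt_buf, ",", EVT_BUF_SIZE - strlen(evt_buf) - 1);
	strncat(evt_buf, value, EVT_BUF_SIZE - strlen(evt_buf) - 1);
}
```

With this, `trace_event=sched_switch trace_event=sched_waking` accumulates the same list as the comma-separated form.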
* tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G
tracing: Fix enabling multiple events on the kernel command line and bootconfig
tracing: Add NULL pointer check to trigger_data_free()
Pull hwmon fixes from Guenter Roeck:
- Fix initialization commands for AHT20
- Correct a malformed email address (emc1403)
- Check the it87_lock() return value
- Fix inverted polarity (max6639)
- Fix overflows, underflows, sign extension, and other problems in
macsmc
- Fix stack overflow in debugfs read (pmbus/q54sj108a2)
- Drop support for SMARC-sAM67 (discontinued and never released to
market)
* tag 'hwmon-for-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus/q54sj108a2) fix stack overflow in debugfs read
hwmon: (max6639) fix inverted polarity
dt-bindings: hwmon: sl28cpld: Drop sa67mcu compatible
hwmon: (it87) Check the it87_lock() return value
Revert "hwmon: add SMARC-sAM67 support"
hwmon: (aht10) Fix initialization commands for AHT20
hwmon: (emc1403) correct a malformed email address
hwmon: (macsmc) Fix overflows, underflows, and sign extension
hwmon: (macsmc) Fix regressions in Apple Silicon SMC hwmon driver
Pull driver core fix from Danilo Krummrich:
- Revert "driver core: enforce device_lock for driver_match_device()":
When a device is already present in the system and a driver is
registered on the same bus, we iterate over all devices registered on
this bus to see if one of them matches. If we come across an already
bound one where the corresponding driver crashed while holding the
device lock (e.g. in probe()) we can't make any progress anymore.
Thus, revert and clarify that an implementer of struct bus_type must
not expect match() to be called with the device lock held.
* tag 'driver-core-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
Revert "driver core: enforce device_lock for driver_match_device()"
Pull xen fixes from Juergen Gross:
- a cleanup of arch/x86/kernel/head_64.S removing the pre-built page
tables for Xen guests
- a small comment update
- another cleanup for Xen PVH guests mode
- fix an issue with Xen PV-devices backed by driver domains
* tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: better handle backend crash
xenbus: add xenbus_device parameter to xenbus_read_driver_state()
x86/PVH: Use boot params to pass RSDP address in start_info page
x86/xen: update outdated comment
xen/acpi-processor: fix _CST detection using undersized evaluation buffer
x86/xen: Build identity mapping page tables dynamically for XENPV
Eduard Zingerman says:
====================
bpf: Fix precision backtracking bug with linked registers
Emil Tsalapatis reported a verifier bug hit by the scx_lavd sched_ext
scheduler. The essential part of the verifier log looks as follows:
436: ...
// checkpoint hit for 438: (1d) if r7 == r8 goto ...
frame 3: propagating r2,r7,r8
frame 2: propagating r6
mark_precise: frame3: last_idx ...
mark_precise: frame3: regs=r2,r7,r8 stack= before 436: ...
mark_precise: frame3: regs=r2,r7 stack= before 435: ...
mark_precise: frame3: regs=r2,r7 stack= before 434: (85) call bpf_trace_vprintk#177
verifier bug: backtracking call unexpected regs 84
The log complains that registers r2 and r7 are tracked as precise
while processing the bpf_trace_vprintk() call in precision backtracking.
This can't be right, as r2 is reset by the call and there is nothing
to backtrack it to. The precision propagation is triggered when
a checkpoint is hit at instruction 438; r2 is dead at that instruction.
This happens because of the following sequence of events:
- Instruction 438 is first reached with registers r2 and r7 having
the same id via a path that does not call bpf_trace_vprintk():
- Checkpoint is created at 438.
- The jump at 438 is predicted, hence r7 and registers linked to it
(r2) are propagated as precise, marking r2 and r7 precise in the
checkpoint.
- Instruction 438 is reached a second time with r2 undefined and via
a path that calls bpf_trace_vprintk():
- Checkpoint is hit.
- propagate_precision() picks registers r2 and r7 and propagates
precision marks for those up to the helper call.
The root cause is the fact that states_equal() and
propagate_precision() assume that the precision flag can't be set for a
dead register (as computed by compute_live_registers()).
However, this is not the case when linked registers are at play.
Fix this by accounting for live register flags in
collect_linked_regs().
---
====================
Link: https://patch.msgid.link/20260306-linked-regs-and-propagate-precision-v1-0-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test for the scenario described in the previous commit:
an iterator loop with two paths where one ties r2/r7 via
a shared scalar id and skips a call, while the other goes
through the call. Precision marks from the linked registers
get spuriously propagated to the call path via
propagate_precision(), hitting "backtracking call unexpected
regs" in backtrack_insn().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-2-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix an inconsistency between func_states_equal() and
collect_linked_regs():
- regsafe() uses check_ids() to verify that cached and current states
have identical register id mapping.
- func_states_equal() calls regsafe() only for registers computed as
live by compute_live_registers().
- clean_live_states() is supposed to remove dead registers from cached
states, but it can skip states belonging to an iterator-based loop.
- collect_linked_regs() collects all registers sharing the same id,
ignoring the marks computed by compute_live_registers().
Linked registers are stored in the state's jump history.
- backtrack_insn() marks all linked registers for an instruction
as precise whenever one of the linked registers is precise.
The above might lead to a scenario:
- There is an instruction I with register rY known to be dead at I.
- Instruction I is reached via two paths: first A, then B.
- On path A:
- There is an id link between registers rX and rY.
- Checkpoint C is created at I.
- Linked register set {rX, rY} is saved to the jump history.
- rX is marked as precise at I, causing both rX and rY
to be marked precise at C.
- On path B:
- There is no id link between registers rX and rY; apart from
that, the register states are sub-states of those in C.
- Because rY is dead at I, check_ids() returns true.
- Current state is considered equal to checkpoint C,
propagate_precision() propagates spurious precision
mark for register rY along the path B.
- Depending on the program, this might hit verifier_bug()
in backtrack_insn(), e.g. if rY ∈ [r1..r5]
and backtrack_insn() spots a function call.
The reproducer program is in the next patch.
This was hit by sched_ext scx_lavd scheduler code.
Changes in tests:
- verifier_scalar_ids.c selftests need modification to preserve
some registers as live for __msg() checks.
- exceptions_assert.c adjusted to match changes in the verifier log:
R0 is dead after the conditional instruction and thus does not get
a range.
- precise.c adjusted to match changes in the verifier log: register r9
is dead after the comparison and its range is not important for the test.
Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Fixes: 0fb3cf6110 ("bpf: use register liveness information for func_states_equal")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-1-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull Kbuild fixes from Nathan Chancellor:
- Split out .modinfo section from ELF_DETAILS macro, as that macro may
be used in other areas that expect to discard .modinfo, breaking
certain image layouts
- Adjust genksyms parser to handle optional attributes in certain
declarations, necessary after commit 07919126ec ("netfilter:
annotate NAT helper hook pointers with __rcu")
- Include resolve_btfids in external module build created by
scripts/package/install-extmod-build when it may be run on external
modules
- Avoid removing objtool binary with 'make clean', as it is required
for external module builds
* tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Leave objtool binary around with 'make clean'
kbuild: install-extmod-build: Package resolve_btfids if necessary
genksyms: Fix parsing a declarator with a preceding attribute
kbuild: Split .modinfo out from ELF_DETAILS