Commit Graph

1427962 Commits

Author SHA1 Message Date
Ihor Solodrai
2364959abe libbpf: Start v1.8 development cycle
libbpf 1.7.0 has been released [1].

Update libbpf.map and libbpf_verson.h to start v1.8 development cycle.

[1] https://github.com/libbpf/libbpf/releases/tag/v1.7.0

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260316163916.2822081-1-ihor.solodrai@linux.dev
2026-03-16 14:15:15 -07:00
Ihor Solodrai
7e2f40ef0a selftests/bpf: Bump path and command buffer sizes in bpftool_helpers.c
The path length of 64 is way too low in some envirnoments, which leads
to subtle failures due to truncation [1].

Replace BPFTOOL_PATH_MAX_LEN with PATH_MAX, and set
BPFTOOL_FULL_CMD_MAX_LEN to double of PATH_MAX.

[1] https://github.com/libbpf/libbpf/actions/runs/22980753016/job/66719800527

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260312234820.439720-1-ihor.solodrai@linux.dev
2026-03-16 14:14:57 -07:00
Mykyta Yatsenko
c73a244366 bpftool: Allow explicitly skip llvm, libbfd and libcrypto dependencies
Introduce SKIP_LLVM, SKIP_LIBBFD, and SKIP_CRYPTO build flags that let
users build bpftool without these optional dependencies.

SKIP_LLVM=1 skips LLVM even when detected. SKIP_LIBBFD=1 prevents the
libbfd JIT disassembly fallback when LLVM is absent. Together, they
produce a bpftool with no disassembly support.

SKIP_CRYPTO=1 excludes sign.c and removes the -lcrypto link dependency.
Inline stubs in main.h return errors with a clear message if signing
functions are called at runtime.

Use BPFTOOL_WITHOUT_CRYPTO (not HAVE_LIBCRYPTO_SUPPORT) as the C
define, following the BPFTOOL_WITHOUT_SKELETONS naming convention for
bpftool-internal build config, leaving HAVE_LIBCRYPTO_SUPPORT free for
proper feature detection in the future.

All three flags are propagated through the selftests Makefile to bpftool
sub-builds.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260312-b4-bpftool_build-v2-1-4c9d57133644@meta.com
2026-03-16 14:14:14 -07:00
Alexei Starovoitov
6c8e1a9eee Merge branch 'bpf-relax-8-frame-limitation-for-global-subprogs'
Emil Tsalapatis says:

====================
bpf: Relax 8 frame limitation for global subprogs

The BPF verifier currently limits the maximum runtime call stack to
8 frames. Larger BPF programs like sched-ext schedulers routinely
fail verification because they exceed this limit, even as they use
very little actual stack space for each frame.

Relax the verifier to permit call stacks > 8 frames deep when the
call stacks include global subprogs. The old 8 stack frame limit now
only applies to call stacks composed entirely of static function calls.
This works because global functions are each verified in isolation, so
the verifier does not need to cross-reference verification state across
the function call boundary, which has been the reason for limiting the
call stack size in the first place.

This patch does not change the verification time limit of 8 stack
frames. Static functions that are inlined for verification purposes
still only go 8 frames deep to avoid changing the verifier's internal
data structures used for verification. These data structures only
support holding information on up to 8 stack frames.

This patch also does not adjust the actual maximum stack size of 512.

CHANGELOG
=========

v5 -> v6 (https://lore.kernel.org/bpf/20260311182831.91219-1-emil@etsalapatis.com/)
- Make bpf_subprog_call_depth_info internal to verifier.c (Alexei)

v4 -> v5 (https://lore.kernel.org/bpf/20260309204430.201219-1-emil@etsalapatis.com/)
- Move depth tracking state to verifier (Eduard) and free it after verification (Alexei)
- Fix selftest patch title and formatting errors (Yonghong)

v3 -> v4 (https://lore.kernel.org/bpf/20260303043106.406099-1-emil@etsalapatis.com/)
- Factor out temp call depth tracking info into its own struct (Eduard)
- Bring depth calculation loop in line with the other instances (Mykyta)
- Add comment on why selftest call stack is 16 bytes/frame (Eduard)
- Rename "cidx" to "caller" for clarity (Mykyta, Eduard)

v2 -> v3 (https://lore.kernel.org/bpf/20260210213606.475415-1-emil@etsalapatis.com/)
- Change logic to remove arbitrary limit on call depth (Eduard)
- Add additional selftests (Eduard)

v1 -> v2 (https://lore.kernel.org/bpf/20260202233716.835638-1-emil@etsalapatis.com)
- Adjust patch to only increase the runtime stack depth, leaving the
verification-time stack depth unchanged (Alexei)

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
====================

Link: https://patch.msgid.link/20260316161225.128011-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-16 11:26:42 -07:00
Emil Tsalapatis
01d5d2f7d9 selftests/bpf: Add deep call stack selftests
Add tests that demonstrate the verifier support for deep call stacks
while still enforcing maximum stack size limits.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260316161225.128011-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-16 11:26:41 -07:00
Emil Tsalapatis
ad95d3c758 bpf: Only enforce 8 frame call stack limit for all-static stacks
The BPF verifier currently enforces a call stack depth of 8 frames,
regardless of the actual stack space consumption of those frames. The
limit is necessary for static call stacks, because the bookkeeping data
structures used by the verifier when stepping into static functions
during verification only support 8 stack frames. However, this
limitation only matters for static stack frames: Global subprogs are
verified by themselves and do not require limiting the call depth.

Relax this limitation to only apply to static stack frames. Verification
now only fails when there is a sequence of 8 calls to non-global
subprogs. Calling into a global subprog resets the counter. This allows
deeper call stacks, provided all frames still fit in the stack.

The change does not increase the maximum size of the call stack, only
the maximum number of frames we can place in it.

Also change the progs/test_global_func3.c selftest to use static
functions, since with the new patch it would otherwise unexpectedly
pass verification.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260316161225.128011-2-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-16 11:26:41 -07:00
Ilya Leoshkevich
202e42e4aa s390/bpf: Zero-extend bpf prog return values and kfunc arguments
s390x ABI requires callers to zero-extend unsigned arguments and
sign-extend signed arguments, and callees to zero-extend unsigned
return values and sign-extend signed return values.

s390 BPF JIT currently implements only sign extension. Fix this
omission and implement zero extension too.

Fixes: 528eb2cb87 ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
Reported-by: Hari Bathini <hbathini@linux.ibm.com>
Closes: https://lore.kernel.org/bpf/20260312080113.843408-1-hbathini@linux.ibm.com/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260313174807.581826-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-16 09:21:35 -07:00
Alexei Starovoitov
bb41fcef5c Merge branch 'optimize-bounds-refinement-by-reordering-deductions'
Paul Chaignon says:

====================
Optimize bounds refinement by reordering deductions

This patchset optimizes the bounds refinement (reg_bounds_sync) by
reordering deductions in __reg_deduce_bounds. This reordering allows us
to improve precision slightly while losing one call to
__reg_deduce_bounds.

The first patch from Eduard refactors the __reg_deduce_bounds
subfunctions, the second patch implements the reordering, and the last
one adds a selftest.

Changes in v3:
  - Added first commit from Eduard that significantly helps with
    readability of second commit.
  - Reshuffled a bit more the functions in the second commit to improve
    precision (Eduard).
  - Rebased.
Changes in v2:
  - Updated description to mention potential precision improvement and
    to clarify the sequence of refinements (Shung-Hsi).
  - Added the second patch.
  - Rebased.
====================

Link: https://patch.msgid.link/cover.1773401138.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-13 19:09:35 -07:00
Paul Chaignon
0a753d8cd6 selftests/bpf: Test case for refinement improvement using 64b bounds
This new selftest demonstrates the improvement of bounds refinement from
the previous patch. It is inspired from a set of reg_bounds_sync inputs
generated using CBMC [1] by Shung-Hsi:

  reg.smin_value=0x8000000000000002
  reg.smax_value=2
  reg.umin_value=2
  reg.umax_value=19
  reg.s32_min_value=2
  reg.s32_max_value=3
  reg.u32_min_value=2
  reg.u32_max_value=3

reg_bounds_sync returns R=[2; 3] without the previous patch, and R=2
with it. __reg64_deduce_bounds is able to derive that u64=2, but before
the previous patch, those bounds are overwritten in
__reg_deduce_mixed_bounds using the 32bits bounds.

To arrive to these reg_bounds_sync inputs, we bound the 32bits value
first to [2; 3]. We can then upper-bound s64 without impacting u64. At
that point, the refinement to u64=2 doesn't happen because the ranges
still overlap in two points:

  0  umin=2                             umax=0xff..ff00..03   U64_MAX
  |  [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx]      |
  |----------------------------|------------------------------|
  |xx]                           [xxxxxxxxxxxxxxxxxxxxxxxxxxxx|
  0  smax=2                      smin=0x800..02               -1

With an upper-bound check at value 19, we can reach the above inputs for
reg_bounds_sync. At that point, the refinement to u64=2 happens and
because it isn't overwritten by __reg_deduce_mixed_bounds anymore,
reg_bounds_sync returns with reg=2.

The test validates this result by including an illegal instruction in
the (dead) branch reg != 2.

Link: https://github.com/shunghsiyu/reg_bounds_sync-review/ [1]
Co-developed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/622dc51c581cd4d652fff362188b2a5f73c1fe99.1773401138.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-13 19:09:35 -07:00
Paul Chaignon
9e5fcb003a bpf: Avoid one round of bounds deduction
In commit 5dbb19b16a ("bpf: Add third round of bounds deduction"), I
added a new round of bounds deduction because two rounds were not enough
to converge to a fixed point. This commit slightly refactor the bounds
deduction logic such that two rounds are enough.

In [1], Eduard noticed that after we improved the refinement logic, a
third call to the bounds deduction (__reg_deduce_bounds) was needed to
converge to a fixed point. More specifically, we needed this third call
to improve the s64 range using the s32 range. We added the third call
and postponed a more detailed analysis of the refinement logic.

I've been looking into this more recently. The register refinement
consists of the following calls.

    __update_reg_bounds();
    3 x __reg_deduce_bounds() {
        deduce_bounds_32_from_64();
        deduce_bounds_32_from_32();
        deduce_bounds_64_from_64();
        deduce_bounds_64_from_32();
    };
    __reg_bound_offset();
    __update_reg_bounds();

From this, we can observe that we first improve the 32bit ranges from
the 64bit ranges in deduce_bounds_32_from_64, then improve the 64bit
ranges on their own in deduce_bounds_64_from_64. Intuitively, if we
were to improve the 64bit ranges on their own *before* we use them to
improve the 32bit ranges, we may reach a fixed point earlier.

In a similar manner, using CBMC, Eduard found that it's best to improve
the 32bit ranges on their own *after* we've improve them using the 64bit
ranges. That is, running deduce_bounds_32_from_32 after
deduce_bounds_32_from_64.

These changes allow us to lose one call to __reg_deduce_bounds. Without
this reordering, the test "verifier_bounds/bounds deduction cross sign
boundary, negative overlap" fails when removing one call to
__reg_deduce_bounds. In some cases, this change can even improve
precision a little bit, as illustrated in the new selftest in the next
patch.

As expected, this change didn't have any impact on the number of
instructions processed when running it through the Cilium complexity
test suite [2].

Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Link: https://pchaigno.github.io/test-verifier-complexity.html [2]
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/1b00d2749ec4c774c3ada84e265ac7fda72cfe56.1773401138.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-13 19:09:35 -07:00
Eduard Zingerman
879cace976 bpf: better naming for __reg_deduce_bounds() parts
This renaming will also help reshuffle the different parts in the
subsequent patch.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/a988ecf2c57e265b97917136b14b421038534e8c.1773401138.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-13 19:09:35 -07:00
Hari Bathini
2af3aa702c selftests/bpf: Improve test coverage for kfunc call
On powerpc, immediate load instructions are sign extended. In case
of unsigned types, arguments should be explicitly zero-extended by
the caller. For kfunc call, this needs to be handled in the JIT code.
In bpf_kfunc_call_test4(), that tests for sign-extension of signed
argument types in kfunc calls, add some additional failure checks.
And add bpf_kfunc_call_test5() to test zero-extension of unsigned
argument types in kfunc calls.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260312080113.843408-1-hbathini@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-13 07:13:35 -07:00
Varun R Mallya
ca0f39a369 selftests/bpf: Fix const qualifier warning in fexit_bpf2bpf.c
Building selftests with
clang 23.0.0 (6fae863eba8a72cdd82f37e7111a46a70be525e0) triggers
the following error:

  tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c:117:12:
  error: assigning to 'char *' from 'const char *' discards qualifiers
  [-Werror,-Wincompatible-pointer-types-discards-qualifiers]

The variable `tgt_name` is declared as `char *`, but it stores the
result of strstr(prog_name[i], "/"). Since `prog_name[i]` is a
`const char *`, the returned pointer should also be treated as
const-qualified.

Update `tgt_name` to `const char *` to match the type of the underlying
string and silence the compiler warning.

Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Menglong Dong <menglong.dong@linux.dev>
Link: https://lore.kernel.org/bpf/20260305222132.470700-1-varunrmallya@gmail.com
2026-03-11 10:54:40 -07:00
Sun Jian
c02e0ab8ae selftests/bpf: Skip livepatch test when prerequisites are missing
livepatch_trampoline relies on livepatch sysfs and livepatch-sample.ko.
When CONFIG_LIVEPATCH is disabled or the samples module isn't built, the
test fails with ENOENT and causes false failures in minimal CI configs.

Skip the test when livepatch sysfs or the sample module is unavailable.
Also avoid writing to livepatch sysfs when it's not present.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260309104448.817401-1-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-11 09:36:37 -07:00
Sun Jian
aa181c7d64 selftests/bpf: drop serial restriction
Patch 1/2 added PID filtering to the probe_user BPF program to avoid
cross-test interference from the global connect() hooks.

With the interference removed, drop the serial_ prefix and remove the
stale TODO comment so the test can run in parallel.

Tested:
  ./test_progs -t probe_user -v
  ./test_progs -j$(nproc) -t probe_user

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260306083330.518627-2-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-11 09:34:22 -07:00
Sun Jian
70ce840d5f selftests/bpf: filter by pid to avoid cross-test interference
The test installs a kprobe on __sys_connect and checks that
bpf_probe_write_user() can modify the syscall argument. However, any
concurrent thread in any other test that calls connect() will also
trigger the kprobe and have its sockaddr silently overwritten, causing
flaky failures in unrelated tests.

Constrain the hook to the current test process by filtering on a PID
stored as a global variable in .bss. Initialize the .bss value from
user space before bpf_object__load() using bpf_map__set_initial_value(),
and validate the bss map value size to catch layout mismatches.

No new map is introduced and the test keeps the existing non-skeleton
flow.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260306083330.518627-1-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-11 09:34:22 -07:00
Viktor Malik
900b7cc73c selftests/bpf: Speed up module_attach test
The module_attach test contains subtests which check that unloading a
module while there are BPF programs attached to its functions is not
possible because the module is still referenced.

The problem is that the test calls the generic unload_module() helper
function which is used for module cleanup after test_progs terminate and
tries to wait until all module references are released. This
unnecessarily slows down the module_attach subtests since each
unsuccessful call to unload_module() takes about 1 second.

Introduce try_unload_module() which takes the number of retries as a
parameter. Make unload_module() call it with the currently used amount
of 10000 retries but call it with just 1 retry from module_attach tests
as it is always expected to fail. This speeds up the module_attach()
test significantly.

Before:

    # time ./test_progs -t module_attach
    [...]
    Summary: 1/14 PASSED, 0 SKIPPED, 0 FAILED

    real        0m5.011s
    user        0m0.293s
    sys         0m0.108s

After:

    # time ./test_progs -t module_attach
    [...]
    Summary: 1/14 PASSED, 0 SKIPPED, 0 FAILED

    real        0m0.350s
    user        0m0.197s
    sys         0m0.063s

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260306101628.3822284-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-11 09:29:09 -07:00
Alan Maguire
e95e85b891 selftests/bpf: Handle !CONFIG_SMC in bpf_smc.c
Currently BPF selftests will fail to compile if CONFIG_SMC
is not set.

Use BPF CO-RE to work around the case where CONFIG_SMC is
not set; use ___local variants of relevant structures and
utilize bpf_core_field_exists() for net->smc.

The test continues to pass where

  CONFIG_SMC=y
  CONFIG_SMC_HS_CTRL_BPF=y

but these changes allow the selftests to build in the absence
of CONFIG_SMC=y.

Also ensure that we get a pure skip rather than a skip+fail
by removing the SMC is unsupported part from the ASSERT_FALSE()
in get_smc_nl_family(); doing this means we get a skip without
a fail when CONFIG_SMC is not set:

$ sudo ./test_progs -t bpf_smc
Summary: 1/0 PASSED, 1 SKIPPED, 0 FAILED

Fixes: beb3c67297 ("bpf/selftests: Add selftest for bpf_smc_hs_ctrl")
Reported-by: Colm Harrington <colm.harrington@oracle.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://patch.msgid.link/20260310111330.601765-1-alan.maguire@oracle.com
2026-03-10 17:40:15 -07:00
Alexei Starovoitov
0c55d4817a Merge branch 'fix-test_cgroup_iter_memcg-issues-found-during-back-porting'
Hui Zhu says:

====================
Fix test_cgroup_iter_memcg issues found during back-porting

While back-porting "mm: bpf kfuncs to access memcg data", I
encountered issues with test_cgroup_iter_memcg, specifically
in test_kmem.
The test_cgroup_iter_memcg test would falsely pass when
bpf_mem_cgroup_page_state() failed due to incompatible enum
values across kernel versions. Additionally, test_kmem would
fail on systems with cgroup.memory=nokmem enabled.

These patches are my fixes for the problems I encountered.

Changelog:
v5:
According to the comments of Emil Tsalapatis and JP Kobryn, dropped
"selftests/bpf: Check bpf_mem_cgroup_page_state return value".
v4:
Fixed wrong git commit log in "bpf: Use bpf_core_enum_value for stats in
cgroup_iter_memcg".
v3:
According to the comments of JP Kobryn, remove kmem subtest from
cgroup_iter_memcg and fix assertion string in test_pgfault.
v2:
According to the comments of JP Kobryn, added bpf_core_enum_value()
usage in the BPF program to handle cross-kernel enum value differences
at load-time instead of compile-time.
Dropped the mm/memcontrol.c patch.
Modified test_kmem handling: instead of skipping when nokmem is set,
verify that kmem value is zero as expected.
According to the comments of bot, fixed assertion message: changed
"bpf_mem_cgroup_page_state" to "bpf_mem_cgroup_vm_events" for PGFAULT
check.
====================

Link: https://patch.msgid.link/cover.1772505399.git.zhuhui@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-10 11:54:05 -07:00
Hui Zhu
da99028c21 selftests/bpf: Use bpf_core_enum_value for stats in cgroup_iter_memcg
Replace hardcoded enum values with bpf_core_enum_value() calls in
cgroup_iter_memcg test to improve portability across different
kernel versions.

The change adds runtime enum value resolution for:
- node_stat_item: NR_ANON_MAPPED, NR_SHMEM, NR_FILE_PAGES,
  NR_FILE_MAPPED
- vm_event_item: PGFAULT

This ensures the BPF program can adapt to enum value changes
between kernel versions.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: JP Kobryn <jp.kobryn@linux.dev>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Link: https://lore.kernel.org/r/ca6eb1a1a4fd7a17ffe995acf52c9a4ceb7bac13.1772505399.git.zhuhui@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-10 11:53:48 -07:00
Hui Zhu
a8fce027e1 selftests/bpf: Remove kmem subtest from cgroup_iter_memcg
When cgroup.memory=nokmem is set in the kernel command line, kmem
accounting is disabled. This causes the test_kmem subtest in
cgroup_iter_memcg to fail because it expects non-zero kmem values.

Remove the kmem subtest altogether since the remaining subtests
(shmem, file, pgfault) already provide sufficient coverage for
the cgroup iter memcg functionality.

Reviewed-by: JP Kobryn <jp.kobryn@linux.dev>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Link: https://lore.kernel.org/r/35fa32a019361ec26265c8a789ee31e448d4dbda.1772505399.git.zhuhui@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-10 11:53:23 -07:00
Alexei Starovoitov
437350df86 Merge branch 'bpf-support-for-non_null-ptr-detection-with-jeq-jne-with-register-operand'
Cupertino Miranda says:

====================
bpf: support for non_null ptr detection with JEQ/JNE with register operand

Changes from v1:
 - Corrected typos in commit messages.
 - Fixed indentation.
 - Replaced text by simpler version suggested by Eduard.
Changes from v2:
 - Small fixes after AI patch checker complaints.
Changes from v3:
 - Removed log file. No idea how that got added.
====================

Link: https://patch.msgid.link/20260304195018.181396-1-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-10 11:51:27 -07:00
Cupertino Miranda
6a1c9a442f selftests/bpf: tests to non_null ptr detection using register operand in JEQ/JNE
This patch adds two tests to check non_null ptr detection when using JEQ and JNE
have a register in second operand, and its value is known to be 0.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Cc: David Faust  <david.faust@oracle.com>
Cc: Jose Marchesi  <jose.marchesi@oracle.com>
Cc: Elena Zannoni  <elena.zannoni@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260304195018.181396-4-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-10 11:51:18 -07:00
Cupertino Miranda
2f4cb53eed bpf: detect non null pointer with register operand in JEQ/JNE.
This patch adds support to validate a pointer as not null when its
value is compared to a register whose value the verifier knows to be
null.
Initial pattern only verifies against an immediate operand.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Cc: David Faust  <david.faust@oracle.com>
Cc: Jose Marchesi  <jose.marchesi@oracle.com>
Cc: Elena Zannoni  <elena.zannoni@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260304195018.181396-3-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-10 11:51:18 -07:00
Alexei Starovoitov
bd2e02e3c9 Merge branch 'always-allow-sleepable-and-fmod_ret-programs-on-syscalls'
Viktor Malik says:

====================
Always allow sleepable and fmod_ret programs on syscalls

Both sleepable and fmod_ret programs are only allowed on selected
functions. For convenience, the error injection list was originally
used.

When error injection is disabled, that list is empty and sleepable
tracing programs, as well as fmod_ret programs, are effectively
unavailable.

This patch series addresses the issue by at least enabling sleepable and
fmod_ret programs on syscalls, if error injection is disabled. More
details on why syscalls are used can be found in [1].

[1] https://lore.kernel.org/bpf/CAADnVQK6qP8izg+k9yV0vdcT-+=axtFQ2fKw7D-2Ei-V6WS5Dw@mail.gmail.com/

Changes in v3:
- Handle LoongArch (Leon)
- Add Kumar's and Leon's acks

Changes in v2:
- Check "sys_" prefix instead of "sys" for powerpc syscalls (AI review)
- Add link to the original discussion (Kumar)
- Add explanation why arch syscall prefixes are hard-coded (Leon)
====================

Link: https://patch.msgid.link/cover.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-09 09:28:43 -07:00
Viktor Malik
fcec7c66d6 selftests/bpf: Move sleepable refcounted_kptr tests to syscalls
Now that sleepable programs are always enabled on syscalls, let
refcounted_kptr tests use syscalls rather than bpf_testmod_test_read,
which is not sleepable with error injection disabled.

The tests just check that the verifier can handle usage of RCU locks in
sleepable programs and never actually attach. So, the attachment target
doesn't matter (as long as it is sleepable) and with syscalls, the tests
pass on kernels with disabled error injection.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/8b6626eae384559855f7a0e846a16e83f25f06f6.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-09 09:28:42 -07:00
Viktor Malik
20c2e102a2 bpf: Always allow fmod_ret programs on syscalls
fmod_ret BPF programs can only be attached to selected functions. For
convenience, the error injection list was originally used (along with
functions prefixed with "security_"), which contains syscalls and
several other functions.

When error injection is disabled (CONFIG_FUNCTION_ERROR_INJECTION=n),
that list is empty and fmod_ret programs are effectively unavailable for
most of the functions. In such a case, at least enable fmod_ret programs
on syscalls.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/472310f9a5f4944ad03214e4d943a4830fd8eb76.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-09 09:28:42 -07:00
Viktor Malik
16d9c56606 bpf: Always allow sleepable programs on syscalls
Sleepable BPF programs can only be attached to selected functions. For
convenience, the error injection list was originally used, which
contains syscalls and several other functions.

When error injection is disabled (CONFIG_FUNCTION_ERROR_INJECTION=n),
that list is empty and sleepable tracing programs are effectively
unavailable. In such a case, at least enable sleepable programs on
syscalls. For discussion why syscalls were chosen, see [1].

To detect that a function is a syscall handler, we check for
arch-specific prefixes for the most common architectures. Unfortunately,
the prefixes are hard-coded in arch syscall code so we need to hard-code
them, too.

[1] https://lore.kernel.org/bpf/CAADnVQK6qP8izg+k9yV0vdcT-+=axtFQ2fKw7D-2Ei-V6WS5Dw@mail.gmail.com/

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/2704a8512746655037e3c02b471b31bd0d76c8db.1773055375.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-09 09:28:42 -07:00
Alexei Starovoitov
099bded752 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.0-rc3
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-08 17:46:38 -07:00
Linus Torvalds
1f318b96cc Linux 7.0-rc3 v7.0-rc3 2026-03-08 16:56:54 -07:00
Linus Torvalds
fc9f248d8c Merge tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel:
 "Fix for the x86 EFI workaround keeping boot services code and data
  regions reserved until after SetVirtualAddressMap() completes:
  deferred struct page initialization may result in some of this memory
  being lost permanently"

* tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efi: defer freeing of boot services memory
2026-03-08 12:13:09 -07:00
Linus Torvalds
014441d1e4 Merge tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "A revert for the i801 driver restoring old locking behaviour"

* tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"
2026-03-08 10:17:05 -07:00
Linus Torvalds
c23719abc3 Merge tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Fix SEV guest boot failures in certain circumstances, due to
   very early code relying on a BSS-zeroed variable that isn't
   actually zeroed yet an may contain non-zero bootup values

   Move the variable into the .data section go gain even earlier
   zeroing

 - Expose & allow the IBPB-on-Entry feature on SNP guests, which
   was not properly exposed to guests due to initial implementational
   caution

 - Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative
   file paths

 - Fix the various SNC (Sub-NUMA Clustering) topology enumeration
   bugs/artifacts (sched-domain build errors mostly).

   SNC enumeration data got more complicated with Granite Rapids X
   (GNR) and Clearwater Forest X (CWF), which exposed these bugs
   and made their effects more serious

 - Also use the now sane(r) SNC code to fix resctrl SNC detection bugs

 - Work around a historic libgcc unwinder bug in the vdso32 sigreturn
   code (again), which regressed during an overly aggressive recent
   cleanup of DWARF annotations

* tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/vdso32: Work around libgcc unwinder bug
  x86/resctrl: Fix SNC detection
  x86/topo: Fix SNC topology mess
  x86/topo: Replace x86_has_numa_in_package
  x86/topo: Add topology_num_nodes_per_package()
  x86/numa: Store extra copy of numa_nodes_parsed
  x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
  x86/sev: Allow IBPB-on-Entry feature for SNP guests
  x86/boot/sev: Move SEV decompressor variables into the .data section
2026-03-07 17:12:06 -08:00
Linus Torvalds
6ff1020c2f Merge tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "Make clock_adjtime() syscall timex validation slightly more permissive
  for auxiliary clocks, to not reject syscalls based on the status field
  that do not try to modify the status field.

  This makes the ABI behavior in clock_adjtime() consistent with
  CLOCK_REALTIME"

* tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix timex status validation for auxiliary clocks
2026-03-07 17:09:15 -08:00
Linus Torvalds
b1b9a9d0b5 Merge tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix a DL scheduler bug that may corrupt internal metrics during PI and
  setscheduler() syscalls, resulting in kernel warnings and misbehavior.

  Found during stress-testing"

* tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
2026-03-07 17:07:13 -08:00
Eric Dumazet
1954c4f012 eventpoll: Convert epoll_put_uevent() to scoped user access
Saves two function calls, and one stac/clac pair.

stac/clac is rather expensive on older cpus like Zen 2.

A synthetic network stress test gives a ~1.5% increase of pps
on AMD Zen 2.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-03-07 15:03:14 -08:00
Linus Torvalds
3b5d535c63 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Two core changes and the rest in drivers, one core change to quirk the
  behaviour of the Iomega Zip drive and one to fix a hang caused by tag
  reallocation problems, which has mostly been seen by the iscsi client.

  Note the latter fixes the problem but still has a slight sysfs memory
  leak, so will be amended in the next pull request (once we've run the
  fix for the fix through our testing)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: Fix recursive locking in __configfs_open_file()
  scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP
  scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
  scsi: core: Fix refcount leak for tagset_refcnt
2026-03-07 14:04:50 -08:00
Linus Torvalds
fb07430e6f Merge tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fix from Helge Deller:
 "Silence build error in au1100fb driver found by kernel test robot"

* tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: au1100fb: Fix build on MIPS64
2026-03-07 13:21:43 -08:00
Linus Torvalds
6deccafcb4 Merge tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
 "While testing Sasha Levin's 'kallsyms: embed source file:line info in
  kernel stack traces' patch series, which increases the typical kernel
  image size, I found some issues with the parisc initial kernel mapping
  which may prevent the kernel to boot.

  The three small patches here fix this"

* tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix initial page table creation for boot
  parisc: Check kernel mapping earlier at bootup
  parisc: Increase initial mapping to 64 MB with KALLSYMS
2026-03-07 12:38:16 -08:00
Linus Torvalds
8b7f4cd3ac Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Fix u32/s32 bounds when ranges cross min/max boundary (Eduard
   Zingerman)

 - Fix precision backtracking with linked registers (Eduard Zingerman)

 - Fix linker flags detection for resolve_btfids (Ihor Solodrai)

 - Fix race in update_ftrace_direct_add/del (Jiri Olsa)

 - Fix UAF in bpf_trampoline_link_cgroup_shim (Lang Xu)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  resolve_btfids: Fix linker flags detection
  selftests/bpf: add reproducer for spurious precision propagation through calls
  bpf: collect only live registers in linked regs
  Revert "selftests/bpf: Update reg_bound range refinement logic"
  selftests/bpf: test refining u32/s32 bounds when ranges cross min/max boundary
  bpf: Fix u32/s32 bounds when ranges cross min/max boundary
  bpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim
  ftrace: Add missing ftrace_lock to update_ftrace_direct_add/del
2026-03-07 12:20:37 -08:00
Linus Torvalds
03dcad79ee Merge tag 'rcu-fixes.v7.0-20260307a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU selftest fixes from Boqun Feng:
 "Fix a regression in RCU torture test pre-defined scenarios caused by
  commit 7dadeaa6e8 ("sched: Further restrict the preemption modes")
  which limits PREEMPT_NONE to architectures that do not support
  preemption at all and PREEMPT_VOLUNTARY to those architectures that do
  not yet have PREEMPT_LAZY support.

  Since major architectures (e.g. x86 and arm64) no longer support
  CONFIG_PREEMPT_NONE and CONFIG_PREEMPT_VOLUNTARY, using them in
  rcutorture, rcuscale, refscale, and scftorture pre-defined scenarios
  causes config checking errors.

  Switch these kconfigs to PREEMPT_LAZY"

* tag 'rcu-fixes.v7.0-20260307a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  scftorture: Update due to x86 not supporting none/voluntary preemption
  refscale: Update due to x86 not supporting none/voluntary preemption
  rcuscale: Update due to x86 not supporting none/voluntary preemption
  rcutorture: Update due to x86 not supporting none/voluntary preemption
2026-03-07 11:56:55 -08:00
Linus Torvalds
aed0af05a8 Merge tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix possible NULL pointer dereference in trace_data_alloc()

   On the trace_data_alloc() error path, it can call trigger_data_free()
   with a NULL pointer. This used to be a kfree() but was changed to
   trigger_data_free() to clean up any partial initialization. The issue
   is that trigger_data_free() does not expect a NULL pointer. Have
   trigger_data_free() return safely on NULL pointer.

 - Fix multiple events on the command line and bootconfig

   If multiple events are enabled on the command line separately and not
   grouped, only the last event gets enabled. That is:

      trace_event=sched_switch trace_event=sched_waking

   will only enable sched_waking whereas:

      trace_event=sched_switch,sched_waking

   will enable both.

   The bootconfig makes it even worse as the second way is the more
   common method.

   The issue is that a temporary buffer is used to store the events to
   enable later in boot. Each time the cmdline callback is called, it
   overwrites what was previously there.

   Have the callback append the next value (delimited by a comma) if the
   temporary buffer already has content.

 - Fix command line trace_buffer_size if >= 2G

   The logic to allocate the trace buffer uses "int" for the size
   parameter in the command line code causing overflow issues if more
   that 2G is specified.

* tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G
  tracing: Fix enabling multiple events on the kernel command line and bootconfig
  tracing: Add NULL pointer check to trigger_data_free()
2026-03-07 09:50:54 -08:00
Ihor Solodrai
b0dcdcb9ae resolve_btfids: Fix linker flags detection
The "|| echo -lzstd" default makes zstd an unconditional link
dependency of resolve_btfids. On systems where libzstd-dev is not
installed and pkg-config fails, the linker fails:

  ld: cannot find -lzstd: No such file or directory

libzstd is a transitive dependency of libelf, so the -lzstd flag is
strictly necessary only for static builds [1].

Remove ZSTD_LIBS variable, and instead set LIBELF_LIBS depending on
whether the build is static or not. Use $(HOSTPKG_CONFIG) as primary
source of the flags list.

Also add a default value for HOSTPKG_CONFIG in case it's not built via
the toplevel Makefile. Pass it from selftests/bpf too.

[1] https://lore.kernel.org/bpf/4ff82800-2daa-4b9f-95a9-6f512859ee70@linux.dev/

Reported-by: BPF CI Bot (Claude Opus 4.6) <bot+bpf-ci@kernel.org>
Reported-by: Vitaly Chikunov <vt@altlinux.org>
Closes: https://lore.kernel.org/bpf/aaWqMcK-2AQw5dx8@altlinux.org/
Fixes: 4021848a90 ("selftests/bpf: Pass through build flags to bpftool and resolve_btfids")
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260305014730.3123382-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-07 08:51:51 -08:00
Linus Torvalds
7b6e48df88 Merge tag 'hwmon-for-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:

 - Fix initialization commands for AHT20

 - Correct a malformed email address (emc1403)

 - Check the it87_lock() return value

 - Fix inverted polarity (max6639)

 - Fix overflows, underflows, sign extension, and other problems in
   macsmc

 - Fix stack overflow in debugfs read (pmbus/q54sj108a2)

 - Drop support for SMARC-sAM67 (discontinued and never released to
   market)

* tag 'hwmon-for-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pmbus/q54sj108a2) fix stack overflow in debugfs read
  hwmon: (max6639) fix inverted polarity
  dt-bindings: hwmon: sl28cpld: Drop sa67mcu compatible
  hwmon: (it87) Check the it87_lock() return value
  Revert "hwmon: add SMARC-sAM67 support"
  hwmon: (aht10) Fix initialization commands for AHT20
  hwmon: (emc1403) correct a malformed email address
  hwmon: (macsmc) Fix overflows, underflows, and sign extension
  hwmon: (macsmc) Fix regressions in Apple Silicon SMC hwmon driver
2026-03-07 08:39:59 -08:00
Linus Torvalds
e33aafac04 Merge tag 'driver-core-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fix from Danilo Krummrich:

 - Revert "driver core: enforce device_lock for driver_match_device()":

   When a device is already present in the system and a driver is
   registered on the same bus, we iterate over all devices registered on
   this bus to see if one of them matches. If we come across an already
   bound one where the corresponding driver crashed while holding the
   device lock (e.g. in probe()) we can't make any progress anymore.

   Thus, revert and clarify that an implementer of struct bus_type must
   not expect match() to be called with the device lock held.

* tag 'driver-core-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  Revert "driver core: enforce device_lock for driver_match_device()"
2026-03-07 08:16:48 -08:00
Linus Torvalds
0f912c8917 Merge tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - a cleanup of arch/x86/kernel/head_64.S removing the pre-built page
   tables for Xen guests

 - a small comment update

 - another cleanup for Xen PVH guests mode

 - fix an issue with Xen PV-devices backed by driver domains

* tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: better handle backend crash
  xenbus: add xenbus_device parameter to xenbus_read_driver_state()
  x86/PVH: Use boot params to pass RSDP address in start_info page
  x86/xen: update outdated comment
  xen/acpi-processor: fix _CST detection using undersized evaluation buffer
  x86/xen: Build identity mapping page tables dynamically for XENPV
2026-03-07 07:44:32 -08:00
Alexei Starovoitov
325d1ba3ca Merge branch 'bpf-fix-precision-backtracking-bug-with-linked-registers'
Eduard Zingerman says:

====================
bpf: Fix precision backtracking bug with linked registers

Emil Tsalapatis reported a verifier bug hit by the scx_lavd sched_ext
scheduler. The essential part of the verifier log looks as follows:

  436: ...
  // checkpoint hit for 438: (1d) if r7 == r8 goto ...
  frame 3: propagating r2,r7,r8
  frame 2: propagating r6
  mark_precise: frame3: last_idx ...
  mark_precise: frame3: regs=r2,r7,r8 stack= before 436: ...
  mark_precise: frame3: regs=r2,r7 stack= before 435: ...
  mark_precise: frame3: regs=r2,r7 stack= before 434: (85) call bpf_trace_vprintk#177
  verifier bug: backtracking call unexpected regs 84

The log complains that registers r2 and r7 are tracked as precise
while processing the bpf_trace_vprintk() call in precision backtracking.
This can't be right, as r2 is reset by the call and there is nothing
to backtrack it to. The precision propagation is triggered when
a checkpoint is hit at instruction 438, r2 is dead at that instruction.

This happens because of the following sequence of events:
- Instruction 438 is first reached with registers r2 and r7 having
  the same id via a path that does not call bpf_trace_vprintk():
  - Checkpoint is created at 438.
  - The jump at 438 is predicted, hence r7 and registers linked to it
    (r2) are propagated as precise, marking r2 and r7 precise in the
    checkpoint.
- Instruction 438 is reached a second time with r2 undefined and via
  a path that calls bpf_trace_vprintk():
  - Checkpoint is hit.
  - propagate_precision() picks registers r2 and r7 and propagates
    precision marks for those up to the helper call.

The root cause is the fact that states_equal() and
propagate_precision() assume that the precision flag can't be set for a
dead register (as computed by compute_live_registers()).
However, this is not the case when linked registers are at play.
Fix this by accounting for live register flags in
collect_linked_regs().
---
====================

Link: https://patch.msgid.link/20260306-linked-regs-and-propagate-precision-v1-0-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:50:05 -08:00
Eduard Zingerman
223ffb6a3d selftests/bpf: add reproducer for spurious precision propagation through calls
Add a test for the scenario described in the previous commit:
an iterator loop with two paths where one ties r2/r7 via
shared scalar id and skips a call, while the other goes
through the call. Precision marks from the linked registers
get spuriously propagated to the call path via
propagate_precision(), hitting "backtracking call unexpected
regs" in backtrack_insn().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-2-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:50:05 -08:00
Eduard Zingerman
2658a1720a bpf: collect only live registers in linked regs
Fix an inconsistency between func_states_equal() and
collect_linked_regs():
- regsafe() uses check_ids() to verify that cached and current states
  have identical register id mapping.
- func_states_equal() calls regsafe() only for registers computed as
  live by compute_live_registers().
- clean_live_states() is supposed to remove dead registers from cached
  states, but it can skip states belonging to an iterator-based loop.
- collect_linked_regs() collects all registers sharing the same id,
  ignoring the marks computed by compute_live_registers().
  Linked registers are stored in the state's jump history.
- backtrack_insn() marks all linked registers for an instruction
  as precise whenever one of the linked registers is precise.

The above might lead to a scenario:
- There is an instruction I with register rY known to be dead at I.
- Instruction I is reached via two paths: first A, then B.
- On path A:
  - There is an id link between registers rX and rY.
  - Checkpoint C is created at I.
  - Linked register set {rX, rY} is saved to the jump history.
  - rX is marked as precise at I, causing both rX and rY
    to be marked precise at C.
- On path B:
  - There is no id link between registers rX and rY,
    otherwise register states are sub-states of those in C.
  - Because rY is dead at I, check_ids() returns true.
  - Current state is considered equal to checkpoint C,
    propagate_precision() propagates spurious precision
    mark for register rY along the path B.
  - Depending on a program, this might hit verifier_bug()
    in the backtrack_insn(), e.g. if rY ∈  [r1..r5]
    and backtrack_insn() spots a function call.

The reproducer program is in the next patch.
This was hit by sched_ext scx_lavd scheduler code.

Changes in tests:
- verifier_scalar_ids.c selftests need modification to preserve
  some registers as live for __msg() checks.
- exceptions_assert.c adjusted to match changes in the verifier log,
  R0 is dead after conditional instruction and thus does not get
  range.
- precise.c adjusted to match changes in the verifier log, register r9
  is dead after comparison and it's range is not important for test.

Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Fixes: 0fb3cf6110 ("bpf: use register liveness information for func_states_equal")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-1-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:49:40 -08:00
Linus Torvalds
4ae12d8bd9 Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:

 - Split out .modinfo section from ELF_DETAILS macro, as that macro may
   be used in other areas that expect to discard .modinfo, breaking
   certain image layouts

 - Adjust genksyms parser to handle optional attributes in certain
   declarations, necessary after commit 07919126ec ("netfilter:
   annotate NAT helper hook pointers with __rcu")

 - Include resolve_btfids in external module build created by
   scripts/package/install-extmod-build when it may be run on external
   modules

 - Avoid removing objtool binary with 'make clean', as it is required
   for external module builds

* tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: Leave objtool binary around with 'make clean'
  kbuild: install-extmod-build: Package resolve_btfids if necessary
  genksyms: Fix parsing a declarator with a preceding attribute
  kbuild: Split .modinfo out from ELF_DETAILS
2026-03-06 20:27:13 -08:00