linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-14 07:42:38 -04:00

Author	SHA1	Message	Date
Bastien Curutchet (eBPF Foundation)	157feaaf18	selftests/bpf: ns_current_pid_tgid: Use test_progs's ns_ feature Two subtests use the test_in_netns() function to run the test in a dedicated network namespace. This can now be done directly through the test_progs framework with a test name starting with 'ns_'. Replace the use of test_in_netns() by test_ns_* calls. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-4-14504db136b7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-19 09:46:02 -08:00
Bastien Curutchet (eBPF Foundation)	207cd7578a	selftests/bpf: tc_links/tc_opts: Unserialize tests Tests are serialized because they all use the loopback interface. Replace the 'serial_test_' prefixes with 'test_ns_' to benefit from the new test_prog feature which creates a dedicated namespace for each test, allowing them to run in parallel. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-3-14504db136b7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-19 09:46:02 -08:00
Bastien Curutchet (eBPF Foundation)	c047e0e0e4	selftests/bpf: Optionally open a dedicated namespace to run test in it Some tests are serialized to prevent interference with others. Open a dedicated network namespace when a test name starts with 'ns_' to allow more test parallelization. Use the test name as namespace name to avoid conflict between namespaces. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-2-14504db136b7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-19 09:46:02 -08:00
Bastien Curutchet (eBPF Foundation)	4a06c5251a	selftests/bpf: ns_current_pid_tgid: Rename the test function Next patch will add a new feature to test_prog to run tests in a dedicated namespace if the test name starts with 'ns_'. Here the test name already starts with 'ns_' and creates some namespaces which would conflict with the new feature. Rename the test to avoid this conflict. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-1-14504db136b7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-19 09:46:02 -08:00
Alexei Starovoitov	654765b5c6	Merge branch 'bpf-copy_verifier_state-should-copy-loop_entry-field' Eduard Zingerman says: ==================== This patch set fixes a bug in copy_verifier_state() where the loop_entry field was not copied. This omission led to incorrect loop_entry fields remaining in env->cur_state, causing incorrect decisions about loop entry assignments in update_loop_entry(). An example of an unsafe program accepted by the verifier due to this bug can be found in patch #2. This bug can also cause an infinite loop in the verifier, see patch #5. Structure of the patch set: - Patch #1 fixes the bug but has a significant negative impact on verification performance for sched_ext programs. - Patch #3 mitigates the verification performance impact of patch #1 by avoiding clean_live_states() for states whose loop_entry is still being verified. This reduces the number of processed instructions for sched_ext programs by 28–92% in some cases. - Patches #5-6 simplify {get,update}_loop_entry() logic (and are not strictly necessary). - Patches #7–10 mitigate the memory overhead introduced by patch #1 when a program with iterator-based loop hits the 1M instruction limit. This is achieved by freeing states in env->free_list when their branches and used_as_loop_entry counts reach zero. Patches #1-4 were previously sent as a part of [1]. [1] https://lore.kernel.org/bpf/20250122120442.3536298-1-eddyz87@gmail.com/ ==================== Link: https://patch.msgid.link/20250215110411.3236773-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:23:06 -08:00
Eduard Zingerman	574078b001	bpf: fix env->peak_states computation Compute env->peak_states as a maximum value of sum of env->explored_states and env->free_list size. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-11-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
Eduard Zingerman	408fcf946b	bpf: free verifier states when they are no longer referenced When fixes from patches 1 and 3 are applied, Patrick Somaru reported an increase in memory consumption for sched_ext iterator-based programs hitting the 1M instruction limit. For example, 2Gb VMs ran out of memory while verifying a program. Similar behaviour could be reproduced on current bpf-next master. Here is an example of such a program: /* verification completes if given 16G of RAM, * final env->free_list size is 369,960 entries. */ SEC("raw_tp") __flag(BPF_F_TEST_STATE_FREQ) __success int free_list_bomb(const void *ctx) { volatile char buf[48] = {}; unsigned i, j; j = 0; bpf_for(i, 0, 10) { /* this forks verifier state: * - verification of current path continues and * creates a checkpoint after 'if'; * - verification of forked path hits the * checkpoint and marks it as loop_entry. */ if (bpf_get_prandom_u32()) asm volatile (""); /* this marks 'j' as precise, thus any checkpoint * created on current iteration would not be matched * on the next iteration. */ buf[j++] = 42; j %= ARRAY_SIZE(buf); } asm volatile (""::"r"(buf)); return 0; } Memory consumption increased due to more states being marked as loop entries and eventually added to env->free_list. This commit introduces logic to free states from env->free_list during verification. A state in env->free_list can be freed if: - it has no child states; - it is not used as a loop_entry. This commit: - updates bpf_verifier_state->used_as_loop_entry to be a counter that tracks how many states use this one as a loop entry; - adds a function maybe_free_verifier_state(), which: - frees a state if its ->branches and ->used_as_loop_entry counters are both zero; - if the state is freed, state->loop_entry->used_as_loop_entry is decremented, and an attempt is made to free state->loop_entry. In the example above, this approach reduces the maximum number of states in the free list from 369,960 to 16,223. 
However, this approach has its limitations. If the buf size in the example above is modified to 64, state caching overflows: the state for j=0 is evicted from the cache before it can be used to stop traversal. As a result, states in the free list accumulate because their branch counters do not reach zero. The effect of this patch on the selftests looks as follows: File Program Max free list (A) Max free list (B) Max free list (DIFF) -------------------------------- ------------------------------------ ----------------- ----------------- -------------------- arena_list.bpf.o arena_list_add 17 3 -14 (-82.35%) bpf_iter_task_stack.bpf.o dump_task_stack 39 9 -30 (-76.92%) iters.bpf.o checkpoint_states_deletion 265 89 -176 (-66.42%) iters.bpf.o clean_live_states 19 0 -19 (-100.00%) profiler2.bpf.o tracepoint__syscalls__sys_enter_kill 102 1 -101 (-99.02%) profiler3.bpf.o tracepoint__syscalls__sys_enter_kill 144 0 -144 (-100.00%) pyperf600_iter.bpf.o on_event 15 0 -15 (-100.00%) pyperf600_nounroll.bpf.o on_event 1170 1158 -12 (-1.03%) setget_sockopt.bpf.o skops_sockopt 18 0 -18 (-100.00%) strobemeta_nounroll1.bpf.o on_event 147 83 -64 (-43.54%) strobemeta_nounroll2.bpf.o on_event 312 209 -103 (-33.01%) strobemeta_subprogs.bpf.o on_event 124 86 -38 (-30.65%) test_cls_redirect_subprogs.bpf.o cls_redirect 15 0 -15 (-100.00%) timer.bpf.o test1 30 15 -15 (-50.00%) Measured using "do-not-submit" patches from here: https://github.com/eddyz87/bpf/tree/get-loop-entry-hungup Reported-by: Patrick Somaru <patsomaru@meta.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-10-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
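The freeing rule described in the commit above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel implementation: `struct state`, `maybe_free_state()`, and the `in_free_list` and `freed` flags are hypothetical stand-ins for bpf_verifier_state and maybe_free_verifier_state(), and the list handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bpf_verifier_state. */
struct state {
	int branches;             /* child states still being explored */
	int used_as_loop_entry;   /* number of states using this one as loop entry */
	struct state *loop_entry; /* loop entry of this state, or NULL */
	bool in_free_list;        /* assumption: state was moved to the free list */
	bool freed;               /* model only: marks a released state */
};

/*
 * Release 'st' once both counters are zero, drop the reference it held
 * on its loop entry, then retry on that loop entry.
 */
void maybe_free_state(struct state *st)
{
	while (st && st->in_free_list && !st->freed &&
	       st->branches == 0 && st->used_as_loop_entry == 0) {
		struct state *entry = st->loop_entry;

		st->freed = true; /* real code would unlink and free here */
		if (entry) {
			assert(entry->used_as_loop_entry > 0);
			entry->used_as_loop_entry--;
		}
		st = entry; /* the entry may now be unreferenced as well */
	}
}
```

In this model, freeing a state whose loop entry has no other users cascades to the loop entry, which mirrors the "attempt is made to free state->loop_entry" step in the commit message.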
Eduard Zingerman	5564ee3abb	bpf: use list_head to track explored states and free list The next patch in the set needs the ability to remove individual states from env->free_list while only holding a pointer to the state. Which requires env->free_list to be a doubly linked list. This patch converts env->free_list and struct bpf_verifier_state_list to use struct list_head for this purpose. The change to env->explored_states is collateral. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
Eduard Zingerman	590eee4268	bpf: do not update state->loop_entry in get_loop_entry() Patch 9 is simpler if fewer places modify the loop_entry field. The loop deleted by this patch does not affect correctness, but is a performance optimization. However, measurements on selftests and sched_ext programs show that this optimization is unnecessary: - at most 2 steps are done in get_loop_entry(); - most of the time 0 or 1 steps are done in get_loop_entry(). Measured using "do-not-submit" patches from here: https://github.com/eddyz87/bpf/tree/get-loop-entry-hungup Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
Eduard Zingerman	bb7abf3049	bpf: make state->dfs_depth < state->loop_entry->dfs_depth an invariant For a generic loop detection algorithm a graph node can be a loop header for itself. However, state loop entries are computed for use in is_state_visited(), where get_loop_entry(state)->branches is checked. is_state_visited() also checks state->branches, thus the case when state == state->loop_entry is not interesting for is_state_visited(). This change does not affect correctness, but simplifies get_loop_entry() a bit and also simplifies change to update_loop_entry() in patch 9. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
Eduard Zingerman	c1ce66357f	bpf: detect infinite loop in get_loop_entry() Tejun Heo reported an infinite loop in get_loop_entry(), when verifying a sched_ext program layered_dispatch in [1]. After some investigation I'm sure that root cause is fixed by patches 1,3 in this patch-set. To err on the safe side, this commit modifies get_loop_entry() to detect infinite loops and abort verification in such cases. The number of steps get_loop_entry(S) can make while moving along the bpf_verifier_state->loop_entry chain is bounded by the DFS depth of state S. This fact is exploited to implement the check. To avoid dealing with the potential error code returned from get_loop_entry() in update_loop_entry(), remove the get_loop_entry() calls there: - This change does not affect correctness. Loop entries would still be updated during the backward DFS move in update_branch_counts(). - This change does not affect performance. Measurements show that get_loop_entry() performs at most 1 step on selftests and at most 2 steps on sched_ext programs (1 step in 17 cases, 2 steps in 3 cases, measured using "do-not-submit" patches from [2]). [1] https://github.com/sched-ext/scx/ commit f0b27038ea10 ("XXX - kernel stall") [2] https://github.com/eddyz87/bpf/tree/get-loop-entry-hungup Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
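The bounded walk described in the commit above (the number of steps along the loop_entry chain cannot exceed the DFS depth of the starting state) can be sketched as follows. This is a simplified userspace model under stated assumptions: `struct vstate`, `get_loop_entry_bounded()`, and the `-1` error convention are hypothetical and do not reproduce the kernel function's signature or error handling.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bpf_verifier_state. */
struct vstate {
	struct vstate *loop_entry; /* next element of the loop_entry chain */
	unsigned int dfs_depth;    /* depth of this state in the DFS traversal */
};

/*
 * Follow st->loop_entry to the topmost entry. A well-formed chain is no
 * longer than st->dfs_depth, so exceeding that bound indicates a cycle.
 * Returns 0 and stores the topmost entry in *out, or returns -1 on a cycle.
 */
int get_loop_entry_bounded(struct vstate *st, struct vstate **out)
{
	struct vstate *topmost = st->loop_entry;
	unsigned int steps = 0;

	*out = NULL;
	while (topmost && topmost->loop_entry) {
		if (steps++ > st->dfs_depth)
			return -1; /* chain longer than the DFS depth: give up */
		topmost = topmost->loop_entry;
	}
	*out = topmost;
	return 0;
}
```

A caller in this model would treat the `-1` result as a reason to abort verification rather than spin forever on a corrupted chain.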
Eduard Zingerman	6361cd26e4	selftests/bpf: check states pruning for deeply nested iterator A test case with ridiculously deep bpf_for() nesting and a conditional update of a stack location. Consider the innermost loop structure: 1: bpf_for(o, 0, 10) 2: if (unlikely(bpf_get_prandom_u32())) 3: buf[0] = 42; 4: <exit> Assuming that verifier.c:clean_live_states() operates w/o change from the previous patch (e.g. as on current master) verification would proceed as follows: - at (1) state {buf[0]=?,o=drained}: - checkpoint - push visit to (2) for later - at (4) {buf[0]=?,o=drained} - pop (2) {buf[0]=?,o=active}, push visit to (3) for later - at (1) {buf[0]=?,o=active} - checkpoint - push visit to (2) for later - at (4) {buf[0]=?,o=drained} - pop (2) {buf[0]=?,o=active}, push visit to (3) for later - at (1) {buf[0]=?,o=active}: - checkpoint reached, checkpoint's branch count becomes 0 - checkpoint is processed by clean_live_states() and becomes {o=active} - pop (3) {buf[0]=42,o=active} - at (1), {buf[0]=42,o=active} - checkpoint - push visit to (2) for later - at (4) {buf[0]=42,o=drained} - pop (2) {buf[0]=42,o=active}, push visit to (3) for later - at (1) {buf[0]=42,o=active}, checkpoint reached - pop (3) {buf[0]=42,o=active} - at (1) {buf[0]=42,o=active}: - checkpoint reached, checkpoint's branch count becomes 0 - checkpoint is processed by clean_live_states() and becomes {o=active} - ... Note how clean_live_states() converted the checkpoint {buf[0]=42,o=active} to {o=active} and it can no longer be matched against {buf[0]=<any>,o=active}, because iterator based states are compared using stacksafe(... RANGE_WITHIN), that requires stack slots to have same types. At the same time there are still states {buf[0]=42,o=active} pushed to DFS stack. 
This behaviour becomes exacerbated with multiple nesting levels, here are veristat results: - nesting level 1: 69 insns - nesting level 2: 258 insns - nesting level 3: 900 insns - nesting level 4: 4754 insns - nesting level 5: 35944 insns - nesting level 6: 312558 insns - nesting level 7: 1M limit Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
Eduard Zingerman	9e63fdb0cb	bpf: don't do clean_live_states when state->loop_entry->branches > 0 verifier.c:is_state_visited() uses RANGE_WITHIN states comparison rules for cached states that have loop_entry with non-zero branches count (meaning that loop_entry's verification is not yet done). The RANGE_WITHIN rules in regsafe()/stacksafe() require register and stack objects types to be identical in current and old states. verifier.c:clean_live_states() replaces registers and stack spills with NOT_INIT/STACK_INVALID marks, if these registers/stack spills are not read in any child state. This means that clean_live_states() works against loop convergence logic under some conditions. See selftest in the next patch for a specific example. Mitigate this by prohibiting clean_verifier_state() when state->loop_entry->branches > 0. This undoes negative verification performance impact of the copy_verifier_state() fix from the previous patch. Below is comparison between master and current patch. selftests: File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ---------------------------------- ---------------------------- --------- --------- --------------- ---------- ---------- -------------- arena_htab.bpf.o arena_htab_llvm 717 423 -294 (-41.00%) 57 37 -20 (-35.09%) arena_htab_asm.bpf.o arena_htab_asm 597 445 -152 (-25.46%) 47 37 -10 (-21.28%) arena_list.bpf.o arena_list_add 1493 1822 +329 (+22.04%) 30 37 +7 (+23.33%) arena_list.bpf.o arena_list_del 309 261 -48 (-15.53%) 23 15 -8 (-34.78%) iters.bpf.o checkpoint_states_deletion 18125 22154 +4029 (+22.23%) 818 918 +100 (+12.22%) iters.bpf.o iter_nested_deeply_iters 593 367 -226 (-38.11%) 67 43 -24 (-35.82%) iters.bpf.o iter_nested_iters 813 772 -41 (-5.04%) 79 72 -7 (-8.86%) iters.bpf.o iter_subprog_check_stacksafe 155 135 -20 (-12.90%) 15 14 -1 (-6.67%) iters.bpf.o iter_subprog_iters 1094 808 -286 (-26.14%) 88 68 -20 (-22.73%) iters.bpf.o loop_state_deps2 479 356 -123 (-25.68%) 46 35 -11 
(-23.91%) iters.bpf.o triple_continue 35 31 -4 (-11.43%) 3 3 +0 (+0.00%) kmem_cache_iter.bpf.o open_coded_iter 63 59 -4 (-6.35%) 7 6 -1 (-14.29%) mptcp_subflow.bpf.o _getsockopt_subflow 501 446 -55 (-10.98%) 25 23 -2 (-8.00%) pyperf600_iter.bpf.o on_event 12339 6379 -5960 (-48.30%) 441 286 -155 (-35.15%) verifier_bits_iter.bpf.o max_words 92 84 -8 (-8.70%) 8 7 -1 (-12.50%) verifier_iterating_callbacks.bpf.o cond_break2 113 192 +79 (+69.91%) 12 21 +9 (+75.00%) sched_ext: File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ----------------- ---------------------- --------- --------- ----------------- ---------- ---------- ---------------- bpf.bpf.o layered_dispatch 11485 9039 -2446 (-21.30%) 848 662 -186 (-21.93%) bpf.bpf.o layered_dump 7422 5022 -2400 (-32.34%) 681 298 -383 (-56.24%) bpf.bpf.o layered_enqueue 16854 13753 -3101 (-18.40%) 1611 1308 -303 (-18.81%) bpf.bpf.o layered_init 1000001 5549 -994452 (-99.45%) 84672 523 -84149 (-99.38%) bpf.bpf.o layered_runnable 3149 1899 -1250 (-39.70%) 288 151 -137 (-47.57%) bpf.bpf.o p2dq_init 2343 1936 -407 (-17.37%) 201 170 -31 (-15.42%) bpf.bpf.o refresh_layer_cpumasks 16487 1285 -15202 (-92.21%) 1770 120 -1650 (-93.22%) bpf.bpf.o rusty_select_cpu 1937 1386 -551 (-28.45%) 177 125 -52 (-29.38%) scx_central.bpf.o central_dispatch 636 600 -36 (-5.66%) 63 59 -4 (-6.35%) scx_central.bpf.o central_init 913 632 -281 (-30.78%) 48 39 -9 (-18.75%) scx_nest.bpf.o nest_init 636 601 -35 (-5.50%) 60 58 -2 (-3.33%) scx_pair.bpf.o pair_dispatch 1000001 1914 -998087 (-99.81%) 58169 142 -58027 (-99.76%) scx_qmap.bpf.o qmap_dispatch 2393 2187 -206 (-8.61%) 196 174 -22 (-11.22%) scx_qmap.bpf.o qmap_init 16367 22777 +6410 (+39.16%) 603 768 +165 (+27.36%) 'layered_init' and 'pair_dispatch' hit 1M on master, but are verified ok with this patch. 
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:59 -08:00
Eduard Zingerman	6da35da1a3	selftests/bpf: test correct loop_entry update in copy_verifier_state A somewhat cumbersome test case sensitive to correct copying of bpf_verifier_state->loop_entry fields in verifier.c:copy_verifier_state(). W/o the fix from a previous commit the program is accepted as safe. 1: /* poison block */ 2: if (random() != 24) { // assume false branch is placed first 3: i = iter_new(); 4: while (iter_next(i)); 5: iter_destroy(i); 6: return; 7: } 8: 9: /* dfs_depth block */ 10: for (i = 10; i > 0; i--); 11: 12: /* main block */ 13: i = iter_new(); // fp[-16] 14: b = -24; // r8 15: for (;;) { 16: if (iter_next(i)) 17: break; 18: if (random() == 77) { // assume false branch is placed first 19: *(u64 *)(r10 + b) = 7; // this is not safe when b == -25 20: iter_destroy(i); 21: return; 22: } 23: if (random() == 42) { // assume false branch is placed first 24: b = -25; 25: } 26: } 27: iter_destroy(i); The goal of this example is to: (a) poison env->cur_state->loop_entry with a state S, such that S->branches == 0; (b) set state S as a loop_entry for all checkpoints in /* main block */, thus forcing NOT_EXACT states comparisons; (c) exploit incorrect loop_entry set for checkpoint at line 18 by first creating a checkpoint with b == -24 and then pruning the state with b == -25 using that checkpoint. The /* poison block */ is responsible for goal (a). It forces verifier to first validate some unrelated iterator based loop, which leads to an update_loop_entry() call in is_state_visited(), which places checkpoint created at line 4 as env->cur_state->loop_entry. Starting from line 8, the branch count for that checkpoint is 0. The /* dfs_depth block */ is responsible for goal (b). It abuses the fact that update_loop_entry(cur, hdr) only updates cur->loop_entry when hdr->dfs_depth <= cur->dfs_depth. After line 12 every state has dfs_depth bigger than dfs_depth of poisoned env->cur_state->loop_entry. Thus the above condition is never true for lines 12-27. 
The /* main block */ is responsible for goal (c). Verification proceeds as follows: - checkpoint {b=-24,i=active} created at line 16; - jump 18->23 is verified first, jump to 19 pushed to stack; - jump 23->26 is verified first, jump to 24 pushed to stack; - checkpoint {b=-24,i=active} created at line 15; - current state is pruned by checkpoint created at line 16, this sets branches count for checkpoint at line 15 to 0; - jump to 24 is popped from stack; - line 16 is reached in state {b=-25,i=active}; - this is pruned by a previous checkpoint {b=-24,i=active}: - checkpoint's loop_entry is poisoned and has branch count of 0, hence states are compared using NOT_EXACT rules; - b is not marked precise yet. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:58 -08:00
Eduard Zingerman	bbbc02b744	bpf: copy_verifier_state() should copy 'loop_entry' field The bpf_verifier_state.loop_entry state should be copied by copy_verifier_state(). Otherwise, .loop_entry values from unrelated states would poison env->cur_state. Additionally, env->stack should not contain any states with .loop_entry != NULL. The states in env->stack are yet to be verified, while .loop_entry is set for states that reached an equivalent state. This means that env->cur_state->loop_entry should always be NULL after pop_stack(). See the selftest in the next commit for an example of the program that is not safe yet is accepted by verifier w/o this fix. This change has some verification performance impact for selftests: File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ---------------------------------- ---------------------------- --------- --------- -------------- ---------- ---------- ------------- arena_htab.bpf.o arena_htab_llvm 717 426 -291 (-40.59%) 57 37 -20 (-35.09%) arena_htab_asm.bpf.o arena_htab_asm 597 445 -152 (-25.46%) 47 37 -10 (-21.28%) arena_list.bpf.o arena_list_del 309 279 -30 (-9.71%) 23 14 -9 (-39.13%) iters.bpf.o iter_subprog_check_stacksafe 155 141 -14 (-9.03%) 15 14 -1 (-6.67%) iters.bpf.o iter_subprog_iters 1094 1003 -91 (-8.32%) 88 83 -5 (-5.68%) iters.bpf.o loop_state_deps2 479 725 +246 (+51.36%) 46 63 +17 (+36.96%) kmem_cache_iter.bpf.o open_coded_iter 63 59 -4 (-6.35%) 7 6 -1 (-14.29%) verifier_bits_iter.bpf.o max_words 92 84 -8 (-8.70%) 8 7 -1 (-12.50%) verifier_iterating_callbacks.bpf.o cond_break2 113 107 -6 (-5.31%) 12 12 +0 (+0.00%) And significant negative impact for sched_ext: File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ----------------- ---------------------- --------- --------- -------------------- ---------- ---------- ------------------ bpf.bpf.o lavd_init 7039 14723 +7684 (+109.16%) 490 1139 +649 (+132.45%) bpf.bpf.o layered_dispatch 11485 10548 -937 
(-8.16%) 848 762 -86 (-10.14%) bpf.bpf.o layered_dump 7422 1000001 +992579 (+13373.47%) 681 31178 +30497 (+4478.27%) bpf.bpf.o layered_enqueue 16854 71127 +54273 (+322.02%) 1611 6450 +4839 (+300.37%) bpf.bpf.o p2dq_dispatch 665 791 +126 (+18.95%) 68 78 +10 (+14.71%) bpf.bpf.o p2dq_init 2343 2980 +637 (+27.19%) 201 237 +36 (+17.91%) bpf.bpf.o refresh_layer_cpumasks 16487 674760 +658273 (+3992.68%) 1770 65370 +63600 (+3593.22%) bpf.bpf.o rusty_select_cpu 1937 40872 +38935 (+2010.07%) 177 3210 +3033 (+1713.56%) scx_central.bpf.o central_dispatch 636 2687 +2051 (+322.48%) 63 227 +164 (+260.32%) scx_nest.bpf.o nest_init 636 815 +179 (+28.14%) 60 73 +13 (+21.67%) scx_qmap.bpf.o qmap_dispatch 2393 3580 +1187 (+49.60%) 196 253 +57 (+29.08%) scx_qmap.bpf.o qmap_dump 233 318 +85 (+36.48%) 22 30 +8 (+36.36%) scx_qmap.bpf.o qmap_init 16367 17436 +1069 (+6.53%) 603 669 +66 (+10.95%) Note 'layered_dump' program, which now hits 1M instructions limit. This impact would be mitigated in the next patch. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250215110411.3236773-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-18 19:22:58 -08:00
Martin KaFai Lau	0fc6025c95	Merge branch 'selftests-bpf-migrate-test_xdp_redirect_multi-sh-to-test_progs' Bastien Curutchet says: ==================== This patch series continues the work to migrate the *.sh tests into prog_tests framework. test_xdp_redirect_multi.sh tests the XDP redirections done through bpf_redirect_map(). This is already partly covered by test_xdp_veth.c that already tests map redirections at XDP level. What isn't covered yet by test_xdp_veth is the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS) and XDP egress programs. Hence, this patch series add test cases to test_xdp_veth.c to get rid of the test_xdp_redirect_multi.sh: - PATCH 1 & 2 Rework test_xdp_veth.c to avoid using the root namespace - PATCH 3 and 4 cover the broadcast flags - PATCH 5 covers the XDP egress programs NOTE: While working on this iteration I ran into a memory leak in net/core/rtnetlink.c that leads to oom-kill when running ./test_progs in a loop. This leak has been fixed by commit `1438f5d07b` ("rtnetlink: fix netns leak with rtnl_setlink()") in the net tree. ==================== Link: https://patch.msgid.link/20250212-redirect-multi-v5-0-fd0d39fca6e6@bootlin.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-02-18 14:23:19 -08:00
Bastien Curutchet (eBPF Foundation)	e06f5bfd93	selftests/bpf: Remove test_xdp_redirect_multi.sh The tests done by test_xdp_redirect_multi.sh are now fully covered by the CI through test_xdp_veth.c. Remove test_xdp_redirect_multi.sh. Remove xdp_redirect_multi.c that was used by the script to load and attach the BPF programs. Remove their entries in the Makefile. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-6-fd0d39fca6e6@bootlin.com	2025-02-18 13:56:34 -08:00
Bastien Curutchet (eBPF Foundation)	a93bfd824d	selftests/bpf: test_xdp_veth: Add XDP program on egress test XDP programs loaded on egress are tested by test_xdp_redirect_multi.sh but not by the test_progs framework. Add a test case in test_xdp_veth.c to test the XDP program on egress. Use the same BPF program as test_xdp_redirect_multi.sh, which replaces the source MAC address with one provided through a BPF map. Use a BPF program that stores the source MAC of received packets in a map to check the test results. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-5-fd0d39fca6e6@bootlin.com	2025-02-18 13:56:34 -08:00
Bastien Curutchet (eBPF Foundation)	1e7e634542	selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests XDP redirections with BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS flags are tested by test_xdp_redirect_multi.sh but not within the test_progs framework. Add a broadcast test case in test_xdp_veth.c to test them. Use the same BPF programs as those used by test_xdp_redirect_multi.sh. Use a BPF map to select the broadcast flags. Use a BPF map with an entry per veth to check whether packets are received or not. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-4-fd0d39fca6e6@bootlin.com	2025-02-18 13:56:34 -08:00
Bastien Curutchet (eBPF Foundation)	09c8bb1fae	selftests/bpf: Optionally select broadcasting flags Broadcasting flags are hardcoded for each kind of protocol. Create a redirect_flags map that allows selecting the broadcasting flags to use in bpf_redirect_map(). The protocol ID is used as a key. Set the old hardcoded values as default if the map isn't filled by the BPF caller. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-3-fd0d39fca6e6@bootlin.com	2025-02-18 13:56:34 -08:00
Bastien Curutchet (eBPF Foundation)	19a9484c1b	selftests/bpf: test_xdp_veth: Use a dedicated namespace Tests use the root network namespace, so they aren't fully independent of each other. For instance, the index of the created veth interfaces is incremented every time a new test is launched. Wrap the network topology in a network namespace to ensure full isolation. Use the append_tid() helper to ensure the uniqueness of this namespace's name during parallel runs. Remove the use of the append_tid() on the veth names as they now belong to an already unique namespace. Simplify cleanup_network() by directly deleting the namespaces. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250212-redirect-multi-v5-2-fd0d39fca6e6@bootlin.com	2025-02-18 13:56:34 -08:00
Bastien Curutchet (eBPF Foundation)	6bdac0e317	selftests/bpf: test_xdp_veth: Create struct net_configuration The network configuration is defined by a table of struct veth_configuration. This isn't convenient if we want to add a network configuration that isn't linked to a veth pair. Create a struct net_configuration that holds the veth_configuration table to ease adding new configuration attributes in upcoming patch. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250212-redirect-multi-v5-1-fd0d39fca6e6@bootlin.com	2025-02-18 13:56:34 -08:00
Alexei Starovoitov	50b77eb514	Merge branch 'extend-struct_ops-support-for-operators' Amery Hung says: ==================== This patchset supports struct_ops operators that acquire kptrs through arguments and operators that return a kptr. A coming new struct_ops use case, bpf qdisc [0], has two operators that are not yet supported by current struct_ops infrastructure. Qdisc_ops::enqueue requires getting referenced skb kptr from the argument; Qdisc_ops::dequeue needs to return a referenced skb kptr. This patchset will allow bpf qdisc and other potential struct_ops implementers to do so. For struct_ops implementers: - To get a kptr from an argument, a struct_ops implementer needs to annotate the argument name in the stub function with "__ref" suffix. - The kptr return will automatically work as we now allow operators that return a struct pointer. - The verifier allows returning a null pointer. More control can be added later if there is a future struct_ops implementer only expecting valid pointers. For struct_ops users: - The referenced kptr acquired through the argument needs to be released or xchged into maps just like ones acquired via kfuncs. - To return a referenced kptr in struct_ops, 1) The type of the pointer must matches the return type 2) The pointer must comes from the kernel (not locally allocated), and 3) The pointer must be in its unmodified form [0] https://lore.kernel.org/bpf/20250210174336.2024258-1-ameryhung@gmail.com/ --- v2 - Replace kcalloc+memcpy with kmemdup_array in bpf_prog_ctx_arg_info_init() - Remove unnecessary checks when kfree-ing ctx_arg_info - Remove conditional assignment of ref_obj_id in btf_ctx_access() v1 - Link: https://lore.kernel.org/bpf/20250214164520.1001211-1-ameryhung@gmail.com/ - Fix missing kfree for ctx_arg_info ==================== Link: https://patch.msgid.link/20250217190640.1748177-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-17 18:47:37 -08:00
Amery Hung	af17bad9fb	selftests/bpf: Test returning referenced kptr from struct_ops programs Test struct_ops programs returning referenced kptr. When the return type of a struct_ops operator is pointer to struct, the verifier should only allow programs that return a scalar NULL or a non-local kptr with the correct type in its unmodified form. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-6-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-17 18:47:27 -08:00
Amery Hung	8d9f547f74	bpf: Allow struct_ops prog to return referenced kptr Allow a struct_ops program to return a referenced kptr if the struct_ops operator's return type is a struct pointer. To make sure the returned pointer continues to be valid in the kernel, several constraints are required: 1) The type of the pointer must match the return type 2) The pointer originally comes from the kernel (not locally allocated) 3) The pointer is in its unmodified form Implementation wise, a referenced kptr first needs to be allowed to _leak_ in check_reference_leak() if it is in the return register. Then, in check_return_code(), constraints 1-3 are checked. During struct_ops registration, a check is also added to warn about operators with non-struct pointer return. In addition, since the first user, Qdisc_ops::dequeue, allows a NULL pointer to be returned when there is no skb to be dequeued, we will allow a scalar value equal to NULL to be returned. In the future when there is a struct_ops user that always expects a valid pointer to be returned from an operator, we may extend tagging to the return value. We can tell the verifier to only allow NULL pointer return if the return value is tagged with MAY_BE_NULL. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-5-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-17 18:47:27 -08:00
Amery Hung	6991ec6beb	selftests/bpf: Test referenced kptr arguments of struct_ops programs Test referenced kptr acquired through struct_ops argument tagged with "__ref". The success case checks whether 1) a reference to the correct type is acquired, and 2) the referenced kptr argument can be accessed in multiple paths as long as it hasn't been released. In the fail cases, we first confirm that a referenced kptr acquired through a struct_ops argument is not allowed to be leaked. Then, we make sure this new referenced kptr acquiring mechanism does not accidentally allow referenced kptrs to flow into global subprograms through their arguments. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-17 18:47:27 -08:00
Amery Hung	a687df2008	bpf: Support getting referenced kptr from struct_ops argument Allows struct_ops programs to acquire referenced kptrs from arguments by directly reading the argument. The verifier will acquire a reference for a struct_ops argument tagged with "__ref" in the stub function in the beginning of the main program. The user will be able to access the referenced kptr directly by reading the context as long as it has not been released by the program. This new mechanism to acquire referenced kptr (compared to the existing "kfunc with KF_ACQUIRE") is introduced for ergonomic and semantic reasons. In the first use case, Qdisc_ops, an skb is passed to .enqueue in the first argument. This mechanism provides a natural way for users to get a referenced kptr in the .enqueue struct_ops programs and makes sure that a qdisc will always enqueue or drop the skb. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-17 18:47:27 -08:00
Amery Hung	432051806f	bpf: Make every prog keep a copy of ctx_arg_info Currently, ctx_arg_info is read-only in the view of the verifier since it is shared among programs of the same attach type. Make each program have their own copy of ctx_arg_info so that we can use it to store program specific information. In the next patch where we support acquiring a referenced kptr through a struct_ops argument tagged with "__ref", ctx_arg_info->ref_obj_id will be used to store the unique reference object id of the argument. This avoids creating a requirement in the verifier that "__ref" tagged arguments must be the first set of references acquired [0]. [0] https://lore.kernel.org/bpf/20241220195619.2022866-2-amery.hung@gmail.com/ Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-17 18:47:27 -08:00
Andrii Nakryiko	4eb93fea59	selftests/bpf: add test for LDX/STX/ST relocations over array field Add a simple repro for the issue of miscalculating LDX/STX/ST CO-RE relocation size adjustment when the CO-RE relocation target type is an ARRAY. We need to make sure that compiler generates LDX/STX/ST instruction with CO-RE relocation against entire ARRAY type, not ARRAY's element. With the code pattern in selftest, we get this: 59: 61 71 00 00 00 00 00 00 w1 = (u32 )(r7 + 0x0) 00000000000001d8: CO-RE <byte_off> [5] struct core_reloc_arrays::a (0:0) Where offset of `int a[5]` is embedded (through CO-RE relocation) into memory load instruction itself. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20250207014809.1573841-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-14 19:58:14 -08:00
Andrii Nakryiko	06096d19ee	libbpf: fix LDX/STX/ST CO-RE relocation size adjustment logic Libbpf has a somewhat obscure feature of automatically adjusting the "size" of LDX/STX/ST instruction (memory store and load instructions), based on originally recorded access size (u8, u16, u32, or u64) and the actual size of the field on target kernel. This is meant to facilitate using BPF CO-RE on 32-bit architectures (pointers are always 64-bit in BPF, but host kernel's BTF will have it as 32-bit type), as well as generally supporting safe type changes (unsigned integer type changes can be transparently "relocated"). One issue that surfaced only now, 5 years after this logic was implemented, is how this all works when dealing with fields that are arrays. This isn't all that easy and straightforward to hit (see selftests that reproduce this condition), but one of sched_ext BPF programs did hit it with an innocent-looking loop. Long story short, libbpf used to calculate entire array size, instead of making sure to only calculate array's element size. But it's the element that is loaded by LDX/STX/ST instructions (1, 2, 4, or 8 bytes), so that's what libbpf should check. This patch adjusts the logic for arrays and fixes the issue. Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250207014809.1573841-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-14 19:58:05 -08:00
Alexei Starovoitov	772b9b11e6	Merge branch 'bpf-fix-array-bounds-error-with-may_goto-and-add-selftest' Jiayuan Chen says: ==================== bpf: Fix array bounds error with may_goto and add selftest Syzbot caught an array out-of-bounds bug [1]. It turns out that when the BPF program runs through do_misc_fixups(), it allocates an extra 8 bytes on the call stack, which eventually causes stack_depth to exceed 512. I was able to reproduce this issue probabilistically by enabling CONFIG_UBSAN=y and disabling CONFIG_BPF_JIT_ALWAYS_ON with the selfttest I provide in second patch(although it doesn't happen every time - I didn't dig deeper into why UBSAN behaves this way). Furthermore, if I set /proc/sys/net/core/bpf_jit_enable to 0 to disable the jit, a panic occurs, and the reason is the same, that bpf_func is assigned an incorrect address. [---[ end trace ]--- [Oops: general protection fault, probably for non-canonical address 0x100f0e0e0d090808: 0000 [#1] PREEMPT SMP NOPTI [Tainted: [W]=WARN, [O]=OOT_MODULE [RIP: 0010:bpf_test_run+0x1d2/0x360 [RSP: 0018:ffffafc7955178a0 EFLAGS: 00010246 [RAX: 100f0e0e0d090808 RBX: ffff8e9fdb2c4100 RCX: 0000000000000018 [RDX: 00000000002b5b18 RSI: ffffafc780497048 RDI: ffff8ea04d601700 [RBP: ffffafc780497000 R08: ffffafc795517a0c R09: 0000000000000000 [R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8ea04d601700 [R13: ffffafc795517928 R14: ffffafc795517928 R15: 0000000000000000 [CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [CR2: 00007f181c064648 CR3: 00000001aa2be003 CR4: 0000000000770ef0 [DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [PKRU: 55555554 [Call Trace: [ <TASK> [ ? die_addr+0x36/0x90 [ ? exc_general_protection+0x237/0x430 [ ? asm_exc_general_protection+0x26/0x30 [ ? bpf_test_run+0x1d2/0x360 [ ? bpf_test_run+0x10d/0x360 [ ? __link_object+0x12a/0x1e0 [ ? slab_build_skb+0x23/0x130 [ ? kmem_cache_alloc_noprof+0x2ea/0x3f0 [ ? 
sk_prot_alloc+0xc2/0x120 [ bpf_prog_test_run_skb+0x21b/0x590 [ __sys_bpf+0x340/0xa80 [ __x64_sys_bpf+0x1e/0x30 --- v2 -> v3: Optimized some code naming and conditional judgment logic. https://lore.kernel.org/bpf/20250213131214.164982-1-mrpre@163.com/T/ v1 -> v2: Directly reject loading programs with a stack size greater than 512 when jit disabled.(Suggested by Alexei Starovoitov) https://lore.kernel.org/bpf/20250212135251.85487-1-mrpre@163.com/T/ ==================== Link: https://patch.msgid.link/20250214091823.46042-1-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-14 19:55:15 -08:00
Jiayuan Chen	72266ee83f	selftests/bpf: Add selftest for may_goto Added test cases to ensure that programs with stack sizes exceeding 512 bytes are restricted in non-JITed mode, and can be executed normally in JITed mode, even with stack sizes exceeding 512 bytes due to the presence of may_goto instructions. Test result: echo "0" > /proc/sys/net/core/bpf_jit_enable ./test_progs -t verifier_stack_ptr ... stack size 512 with may_goto with jit:SKIP stack size 512 with may_goto without jit:OK ... Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED echo "1" > /proc/sys/net/core/bpf_jit_enable ./test_progs -t verifier_stack_ptr ... stack size 512 with may_goto with jit:OK stack size 512 with may_goto without jit:SKIP ... Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20250214091823.46042-4-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-14 19:55:15 -08:00
Jiayuan Chen	b38c72ab80	selftests/bpf: Introduce __load_if_JITed annotation for tests In some cases, the verification logic under the interpreter and JIT differs, such as may_goto, and the test program behaves differently under different runtime modes, requiring separate verification logic for each result. Introduce __load_if_JITed and __load_if_no_JITed annotation for tests. Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20250214091823.46042-3-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-14 19:55:15 -08:00
Jiayuan Chen	6ebc5030e0	bpf: Fix array bounds error with may_goto may_goto uses an additional 8 bytes on the stack, which causes the interpreters[] array to go out of bounds when calculating index by stack_size. 1. If a BPF program is rewritten, re-evaluate the stack size. For non-JIT cases, reject loading directly. 2. For non-JIT cases, calculating interpreters[idx] may still cause out-of-bounds array access, and just warn about it. 3. For jit_requested cases, the execution of bpf_func also needs to be warned. So move the definition of function __bpf_prog_ret0_warn out of the macro definition CONFIG_BPF_JIT_ALWAYS_ON. Reported-by: syzbot+d2a2c639d03ac200a4f1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/0000000000000f823606139faa5d@google.com/ Fixes: `011832b97b` ("bpf: Introduce may_goto instruction") Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20250214091823.46042-2-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-14 19:55:15 -08:00
Rong Tao	a4585442ad	bpftool: Check map name length when map create The size of struct bpf_map::name is BPF_OBJ_NAME_LEN (16). bpf(2) { map_create() { bpf_obj_name_cpy(map->name, attr->map_name, sizeof(attr->map_name)); } } When specifying a map name using bpftool map create name, no error is reported if the name length is greater than 15. $ sudo bpftool map create /sys/fs/bpf/12345678901234567890 \ type array key 4 value 4 entries 5 name 12345678901234567890 Users will think that 12345678901234567890 is legal, but this name cannot be used to index a map. $ sudo bpftool map show name 12345678901234567890 Error: can't parse name $ sudo bpftool map show ... 1249: array name 123456789012345 flags 0x0 key 4B value 4B max_entries 5 memlock 304B $ sudo bpftool map show name 123456789012345 1249: array name 123456789012345 flags 0x0 key 4B value 4B max_entries 5 memlock 304B The map name provided in the command line is truncated, but no warning is reported. This submission checks the length of the map name. Reviewed-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Rong Tao <rongtao@cestc.cn> Link: https://lore.kernel.org/r/tencent_B44B3A95F0D7C2512DC40D831DA1FA2C9907@qq.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 20:11:38 -08:00
Alexei Starovoitov	68a4154490	Merge branch 'enable-writing-xattr-from-bpf-programs' Song Liu says: ==================== Enable writing xattr from BPF programs Add support to set and remove xattr from BPF program. Also add security.bpf. xattr name prefix. kfuncs are added to set and remove xattrs with security.bpf. name prefix. Update kfuncs bpf_get_[file\|dentry]_xattr to read xattrs with security.bpf. name prefix. Note that BPF programs can read user. xattrs, but not write and remove them. To pick the right version of kfunc to use, a remap logic is added to btf_kfunc_id_set. This helps move some kfunc specific logic off the verifier core code. Also use this remap logic to select bpf_dynptr_from_skb or bpf_dynptr_from_skb_rdonly. Cover letter of v1 and v2: Follow up discussion in LPC 2024 [1], that we need security.bpf xattr prefix. This set adds "security.bpf." xattr name prefix, and allows bpf kfuncs bpf_get_[file\|dentry]_xattr() to read these xattrs. [1] https://lpc.events/event/18/contributions/1940/ --- Changes v11 => v12: 1. Drop btf_kfunc_id_set.remap and changes for bpf_dynptr_from_skb. (Alexei) 2. Minor refactoring in patch 1. (Matt Bobrowski) v11: https://lore.kernel.org/bpf/20250129205957.2457655-1-song@kernel.org/ Changes v10 => v11: 1. Add Acked-by from Christian Brauner. 2. Fix selftests build error like this one: https://github.com/kernel-patches/bpf/actions/runs/13022268618/job/36325472992 3. Rename some variables in the selftests. v10: https://lore.kernel.org/bpf/20250124202911.3264715-1-song@kernel.org/ Changes v9 => v10: 1. Refactor bpf_[set\|remove]_dentry_xattr[_locked]. (Christian Brauner). v9: https://lore.kernel.org/bpf/20250110011342.2965136-1-song@kernel.org/ Changes v8 => v9 1. Fix build for CONFIG_DEBUG_INFO_BTF=n case. (kernel test robot) v8: https://lore.kernel.org/bpf/20250108225140.3467654-1-song@kernel.org/ Changes v7 => v8 1. Rebase and resolve conflicts. 
v7: https://lore.kernel.org/bpf/20241219221439.2455664-1-song@kernel.org/ Changes v6 => v7 1. Move btf_kfunc_id_remap() to the right place. (Bug reported by CI) v6: https://lore.kernel.org/bpf/20241219202536.1625216-1-song@kernel.org/ Changes v5 => v6 1. Hide _locked version of the kfuncs from vmlinux.h (Alexei) 2. Add remap logic to btf_kfunc_id_set and use that to pick the correct version of kfuncs to use. 3. Also use the remap logic for bpf_dynptr_from_skb[\|_rdonly]. v5: https://lore.kernel.org/bpf/20241218044711.1723221-1-song@kernel.org/ Changes v4 => v5 1. Let verifier pick proper kfunc (_locked or not _locked) based on the calling context. (Alexei) 2. Remove the __failure test (6/6 of v4). v4: https://lore.kernel.org/bpf/20241217063821.482857-1-song@kernel.org/ Changes v3 => v4 1. Do write permission check with inode locked. (Jan Kara) 2. Fix some source_inline warnings. v3: https://lore.kernel.org/bpf/20241210220627.2800362-1-song@kernel.org/ Changes v2 => v3 1. Add kfuncs to set and remove xattr from BPF programs. v2: https://lore.kernel.org/bpf/20241016070955.375923-1-song@kernel.org/ Changes v1 => v2 1. Update comment of bpf_get_[file\|dentry]_xattr. (Jiri Olsa) 2. Fix comment for return value of bpf_get_[file\|dentry]_xattr. v1: https://lore.kernel.org/bpf/20241002214637.3625277-1-song@kernel.org/ ==================== Link: https://patch.msgid.link/20250130213549.3353349-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 19:35:32 -08:00
Song Liu	60c2e1fa91	selftests/bpf: Test kfuncs that set and remove xattr from BPF programs Two sets of tests are added to exercise the not _locked and _locked version of the kfuncs. For both tests, user space accesses xattr security.bpf.foo on a testfile. The BPF program is triggered by user space access (on LSM hook inode_[set\|get]_xattr) and sets or removes xattr security.bpf.bar. User space then validates that xattr security.bpf.bar is set or removed as expected. Note that, in both tests, the BPF programs use the not _locked kfuncs. The verifier picks the proper kfuncs based on the calling context. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250130213549.3353349-6-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 19:35:32 -08:00
Song Liu	5646729279	bpf: fs/xattr: Add BPF kfuncs to set and remove xattrs Add the following kfuncs to set and remove xattrs from BPF programs: bpf_set_dentry_xattr bpf_remove_dentry_xattr bpf_set_dentry_xattr_locked bpf_remove_dentry_xattr_locked The _locked version of these kfuncs are called from hooks where dentry->d_inode is already locked. Instead of requiring the user to know which version of the kfuncs to use, the verifier will pick the proper kfunc based on the calling hook. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20250130213549.3353349-5-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 19:35:32 -08:00
Song Liu	7587d735b1	bpf: lsm: Add two more sleepable hooks Add bpf_lsm_inode_removexattr and bpf_lsm_inode_post_removexattr to list sleepable_lsm_hooks. These two hooks are always called from sleepable context. Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20250130213549.3353349-4-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 19:35:31 -08:00
Song Liu	ab39ad6796	selftests/bpf: Extend test fs_kfuncs to cover security.bpf. xattr names Extend test_progs fs_kfuncs to cover different xattr names. Specifically: xattr name "user.kfuncs" and "security.bpf.xxx" can be read from BPF program with kfuncs bpf_get_[file\|dentry]_xattr(); while "security.bpf" and "security.selinux" cannot be read. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250130213549.3353349-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 19:35:31 -08:00
Song Liu	531118f1cc	fs/xattr: bpf: Introduce security.bpf. xattr name prefix Introduce new xattr name prefix security.bpf., and enable reading these xattrs from bpf kfuncs bpf_get_[file\|dentry]_xattr(). While we are at it, correct the comments for return value of bpf_get_[file\|dentry]_xattr(), i.e. return length of the xattr value on success. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20250130213549.3353349-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 19:35:31 -08:00
Amery Hung	b99f27e902	selftests/bpf: Fix stdout race condition in traffic monitor Fix a race condition between the main test_progs thread and the traffic monitoring thread. The traffic monitor thread tries to print a line using multiple printf calls and uses flockfile() to prevent the line from being torn apart. Meanwhile, the main thread doing io redirection can reassign or close stdout when going through tests. A deadlock as shown below can happen. main traffic_monitor_thread ==== ====================== show_transport() -> flockfile(stdout) stdio_hijack_init() -> stdout = open_memstream(log_buf, log_cnt); ... env.subtest_state->stdout_saved = stdout; ... funlockfile(stdout) stdio_restore_cleanup() -> fclose(env.subtest_state->stdout_saved); After the traffic monitor thread locks stdout, a new memstream can be assigned to stdout by the main thread. Therefore, the traffic monitor thread later will not be able to unlock the original stdout. As the main thread tries to access the old stdout, it will hang indefinitely as it is still locked by the traffic monitor thread. The deadlock can be reproduced by running test_progs repeatedly with traffic monitor enabled: for ((i=1;i<=100;i++)); do ./test_progs -a flow_dissector_skb* -m '*' done Fix this by only calling printf once and removing flockfile()/funlockfile(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250213233217.553258-1-ameryhung@gmail.com	2025-02-13 17:06:25 -08:00
Jiri Olsa	c83e2d970b	bpf: Add tracepoints with null-able arguments Some of the tracepoints slipped when we did the first scan, adding them now. Fixes: `838a10bd2e` ("bpf: Augment raw_tp arguments with PTR_MAYBE_NULL") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250210175913.2893549-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-13 17:01:36 -08:00
Yonghong Song	f18169c89e	bpf: Sync uapi bpf.h header for the tooling infra Commit `0abff462d8` ("bpf: Add comment about helper freeze") missed the tooling header sync. Fix it. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250213050427.2788837-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-12 21:56:30 -08:00
Saket Kumar Bhaskar	4107a1aeb2	selftests/bpf: Select NUMA_NO_NODE to create map On powerpc, a CPU does not necessarily originate from NUMA node 0. This contrasts with architectures like x86, where CPU 0 is not hot-pluggable, making NUMA node 0 a consistently valid node. This discrepancy can lead to failures when creating a map on NUMA node 0, which is initialized by default, if no CPUs are allocated from NUMA node 0. This patch fixes the issue by setting NUMA_NO_NODE (-1) for map creation for this selftest. Fixes: `96eabe7a40` ("bpf: Allow selecting numa node during map creation") Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/cf1f61468b47425ecf3728689bc9636ddd1d910e.1738302337.git.skb99@linux.ibm.com	2025-02-11 16:41:25 -08:00
Saket Kumar Bhaskar	650f20bbd9	selftests/bpf: Define SYS_PREFIX for powerpc Since commit `7e92e01b72` ("powerpc: Provide syscall wrapper") landed in v6.1, syscall wrapper is enabled on powerpc. Commit `9474689020` ("powerpc: Don't add __powerpc_ prefix to syscall entry points") , that drops the prefix to syscall entry points, also landed in the same release. So, add the missing empty SYS_PREFIX prefix definition for powerpc, to fix some fentry and kprobe selftests. Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/7192d6aa9501115dc242435970df82b3d190f257.1738302337.git.skb99@linux.ibm.com	2025-02-11 16:41:25 -08:00
Jiayuan Chen	17c3dc5029	bpftool: Using the right format specifiers Fixed some formatting specifiers errors, such as using %d for int and %u for unsigned int, as well as other byte-length types. Signed-off-by: Jiayuan Chen <mrpre@163.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20250207123706.727928-2-mrpre@163.com	2025-02-10 16:08:13 -08:00
Bastien Curutchet (eBPF Foundation)	9b6cdaf2ac	selftests/bpf: Remove with_addr.sh and with_tunnels.sh Those two scripts were used by test_flow_dissector.sh to setup/cleanup the network topology before/after the tests. test_flow_dissector.sh has been deleted by commit `63b37657c5` ("selftests/bpf: remove test_flow_dissector.sh") so they aren't used anywhere now. Remove the two unused scripts and their Makefile entries. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250204-with-v1-1-387a42118cd4@bootlin.com	2025-02-07 18:30:41 -08:00
Ihor Solodrai	ea145d530a	bpf: define KF_ARENA_* flags for bpf_arena kfuncs bpf_arena_alloc_pages() and bpf_arena_free_pages() work with the bpf_arena pointers [1], which is indicated by the __arena macro in the kernel source code: #define __arena __attribute__((address_space(1))) However currently this information is absent from the debug data in the vmlinux binary. As a consequence, bpf_arena_* kfuncs declarations in vmlinux.h (produced by bpftool) do not match prototypes expected by the BPF programs attempting to use these functions. Introduce a set of kfunc flags to mark relevant types as bpf_arena pointers. The flags then can be detected by pahole when generating BTF from vmlinux's DWARF, allowing it to emit corresponding BTF type tags for the marked kfuncs. With recently proposed BTF extension [2], these type tags will be processed by bpftool when dumping vmlinux.h, and corresponding compiler attributes will be added to the declarations. [1] https://lwn.net/Articles/961594/ [2] https://lore.kernel.org/bpf/20250130201239.1429648-1-ihor.solodrai@linux.dev/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Link: https://lore.kernel.org/r/20250206003148.2308659-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-02-07 18:22:52 -08:00
Jason Xing	003be25ab9	selftests/bpf: Correct the check of join cgroup Use ASSERT_OK_FD to check the return value of join cgroup, or else this test will pass even if the fd < 0. ASSERT_OK_FD can print the error message to the console. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/all/6d62bd77-6733-40c7-b240-a1aeff55566c@linux.dev/ Link: https://patch.msgid.link/20250204051154.57655-1-kerneljasonxing@gmail.com	2025-02-06 21:18:04 -08:00

1335534 Commits