Tests use the root network namespace, so they aren't fully independent
of each other. For instance, the index of the created veth interfaces
is incremented every time a new test is launched.
Wrap the network topology in a network namespace to ensure full
isolation. Use the append_tid() helper to ensure the uniqueness of this
namespace's name during parallel runs.
Remove the use of the append_tid() on the veth names as they now belong
to an already unique namespace.
Simplify cleanup_network() by directly deleting the namespaces
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-2-fd0d39fca6e6@bootlin.com
The network configuration is defined by a table of struct
veth_configuration. This isn't convenient if we want to add a network
configuration that isn't linked to a veth pair.
Create a struct net_configuration that holds the veth_configuration
table to ease adding new configuration attributes in upcoming patch.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-1-fd0d39fca6e6@bootlin.com
Amery Hung says:
====================
This patchset supports struct_ops operators that acquire kptrs through
arguments and operators that return a kptr. A coming new struct_ops use
case, bpf qdisc [0], has two operators that are not yet supported by
current struct_ops infrastructure. Qdisc_ops::enqueue requires getting
referenced skb kptr from the argument; Qdisc_ops::dequeue needs to return
a referenced skb kptr. This patchset will allow bpf qdisc and other
potential struct_ops implementers to do so.
For struct_ops implementers:
- To get a kptr from an argument, a struct_ops implementer needs to
annotate the argument name in the stub function with "__ref" suffix.
- The kptr return will automatically work as we now allow operators that
return a struct pointer.
- The verifier allows returning a null pointer. More control can be
added later if there is a future struct_ops implementer only expecting
valid pointers.
For struct_ops users:
- The referenced kptr acquired through the argument needs to be released
or xchged into maps just like ones acquired via kfuncs.
- To return a referenced kptr in struct_ops,
1) The type of the pointer must matches the return type
2) The pointer must comes from the kernel (not locally allocated), and
3) The pointer must be in its unmodified form
[0] https://lore.kernel.org/bpf/20250210174336.2024258-1-ameryhung@gmail.com/
---
v2
- Replace kcalloc+memcpy with kmemdup_array in
bpf_prog_ctx_arg_info_init()
- Remove unnecessary checks when kfree-ing ctx_arg_info
- Remove conditional assignment of ref_obj_id in btf_ctx_access()
v1
- Link: https://lore.kernel.org/bpf/20250214164520.1001211-1-ameryhung@gmail.com/
- Fix missing kfree for ctx_arg_info
====================
Link: https://patch.msgid.link/20250217190640.1748177-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allow a struct_ops program to return a referenced kptr if the struct_ops
operator's return type is a struct pointer. To make sure the returned
pointer continues to be valid in the kernel, several constraints are
required:
1) The type of the pointer must matches the return type
2) The pointer originally comes from the kernel (not locally allocated)
3) The pointer is in its unmodified form
Implementation wise, a referenced kptr first needs to be allowed to _leak_
in check_reference_leak() if it is in the return register. Then, in
check_return_code(), constraints 1-3 are checked. During struct_ops
registration, a check is also added to warn about operators with
non-struct pointer return.
In addition, since the first user, Qdisc_ops::dequeue, allows a NULL
pointer to be returned when there is no skb to be dequeued, we will allow
a scalar value with value equals to NULL to be returned.
In the future when there is a struct_ops user that always expects a valid
pointer to be returned from an operator, we may extend tagging to the
return value. We can tell the verifier to only allow NULL pointer return
if the return value is tagged with MAY_BE_NULL.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250217190640.1748177-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test referenced kptr acquired through struct_ops argument tagged with
"__ref". The success case checks whether 1) a reference to the correct
type is acquired, and 2) the referenced kptr argument can be accessed in
multiple paths as long as it hasn't been released. In the fail cases,
we first confirm that a referenced kptr acquried through a struct_ops
argument is not allowed to be leaked. Then, we make sure this new
referenced kptr acquiring mechanism does not accidentally allow referenced
kptrs to flow into global subprograms through their arguments.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250217190640.1748177-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allows struct_ops programs to acqurie referenced kptrs from arguments
by directly reading the argument.
The verifier will acquire a reference for struct_ops a argument tagged
with "__ref" in the stub function in the beginning of the main program.
The user will be able to access the referenced kptr directly by reading
the context as long as it has not been released by the program.
This new mechanism to acquire referenced kptr (compared to the existing
"kfunc with KF_ACQUIRE") is introduced for ergonomic and semantic reasons.
In the first use case, Qdisc_ops, an skb is passed to .enqueue in the
first argument. This mechanism provides a natural way for users to get a
referenced kptr in the .enqueue struct_ops programs and makes sure that a
qdisc will always enqueue or drop the skb.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250217190640.1748177-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, ctx_arg_info is read-only in the view of the verifier since
it is shared among programs of the same attach type. Make each program
have their own copy of ctx_arg_info so that we can use it to store
program specific information.
In the next patch where we support acquiring a referenced kptr through a
struct_ops argument tagged with "__ref", ctx_arg_info->ref_obj_id will
be used to store the unique reference object id of the argument. This
avoids creating a requirement in the verifier that "__ref" tagged
arguments must be the first set of references acquired [0].
[0] https://lore.kernel.org/bpf/20241220195619.2022866-2-amery.hung@gmail.com/
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250217190640.1748177-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a simple repro for the issue of miscalculating LDX/STX/ST CO-RE
relocation size adjustment when the CO-RE relocation target type is an
ARRAY.
We need to make sure that compiler generates LDX/STX/ST instruction with
CO-RE relocation against entire ARRAY type, not ARRAY's element. With
the code pattern in selftest, we get this:
59: 61 71 00 00 00 00 00 00 w1 = *(u32 *)(r7 + 0x0)
00000000000001d8: CO-RE <byte_off> [5] struct core_reloc_arrays::a (0:0)
Where offset of `int a[5]` is embedded (through CO-RE relocation) into memory
load instruction itself.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250207014809.1573841-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Libbpf has a somewhat obscure feature of automatically adjusting the
"size" of LDX/STX/ST instruction (memory store and load instructions),
based on originally recorded access size (u8, u16, u32, or u64) and the
actual size of the field on target kernel. This is meant to facilitate
using BPF CO-RE on 32-bit architectures (pointers are always 64-bit in
BPF, but host kernel's BTF will have it as 32-bit type), as well as
generally supporting safe type changes (unsigned integer type changes
can be transparently "relocated").
One issue that surfaced only now, 5 years after this logic was
implemented, is how this all works when dealing with fields that are
arrays. This isn't all that easy and straightforward to hit (see
selftests that reproduce this condition), but one of sched_ext BPF
programs did hit it with innocent looking loop.
Long story short, libbpf used to calculate entire array size, instead of
making sure to only calculate array's element size. But it's the element
that is loaded by LDX/STX/ST instructions (1, 2, 4, or 8 bytes), so
that's what libbpf should check. This patch adjusts the logic for
arrays and fixed the issue.
Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250207014809.1573841-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiayuan Chen says:
====================
bpf: Fix array bounds error with may_goto and add selftest
Syzbot caught an array out-of-bounds bug [1]. It turns out that when the
BPF program runs through do_misc_fixups(), it allocates an extra 8 bytes
on the call stack, which eventually causes stack_depth to exceed 512.
I was able to reproduce this issue probabilistically by enabling
CONFIG_UBSAN=y and disabling CONFIG_BPF_JIT_ALWAYS_ON with the selfttest
I provide in second patch(although it doesn't happen every time - I didn't
dig deeper into why UBSAN behaves this way).
Furthermore, if I set /proc/sys/net/core/bpf_jit_enable to 0 to disable
the jit, a panic occurs, and the reason is the same, that bpf_func is
assigned an incorrect address.
[---[ end trace ]---
[Oops: general protection fault, probably for non-canonical address
0x100f0e0e0d090808: 0000 [#1] PREEMPT SMP NOPTI
[Tainted: [W]=WARN, [O]=OOT_MODULE
[RIP: 0010:bpf_test_run+0x1d2/0x360
[RSP: 0018:ffffafc7955178a0 EFLAGS: 00010246
[RAX: 100f0e0e0d090808 RBX: ffff8e9fdb2c4100 RCX: 0000000000000018
[RDX: 00000000002b5b18 RSI: ffffafc780497048 RDI: ffff8ea04d601700
[RBP: ffffafc780497000 R08: ffffafc795517a0c R09: 0000000000000000
[R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8ea04d601700
[R13: ffffafc795517928 R14: ffffafc795517928 R15: 0000000000000000
[CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[CR2: 00007f181c064648 CR3: 00000001aa2be003 CR4: 0000000000770ef0
[DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[PKRU: 55555554
[Call Trace:
[ <TASK>
[ ? die_addr+0x36/0x90
[ ? exc_general_protection+0x237/0x430
[ ? asm_exc_general_protection+0x26/0x30
[ ? bpf_test_run+0x1d2/0x360
[ ? bpf_test_run+0x10d/0x360
[ ? __link_object+0x12a/0x1e0
[ ? slab_build_skb+0x23/0x130
[ ? kmem_cache_alloc_noprof+0x2ea/0x3f0
[ ? sk_prot_alloc+0xc2/0x120
[ bpf_prog_test_run_skb+0x21b/0x590
[ __sys_bpf+0x340/0xa80
[ __x64_sys_bpf+0x1e/0x30
---
v2 -> v3:
Optimized some code naming and conditional judgment logic.
https://lore.kernel.org/bpf/20250213131214.164982-1-mrpre@163.com/T/
v1 -> v2:
Directly reject loading programs with a stack size greater than 512 when
jit disabled.(Suggested by Alexei Starovoitov)
https://lore.kernel.org/bpf/20250212135251.85487-1-mrpre@163.com/T/
====================
Link: https://patch.msgid.link/20250214091823.46042-1-mrpre@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Added test cases to ensure that programs with stack sizes exceeding 512
bytes are restricted in non-JITed mode, and can be executed normally in
JITed mode, even with stack sizes exceeding 512 bytes due to the presence
of may_goto instructions.
Test result:
echo "0" > /proc/sys/net/core/bpf_jit_enable
./test_progs -t verifier_stack_ptr
...
stack size 512 with may_goto with jit:SKIP
stack size 512 with may_goto without jit:OK
...
Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED
echo "1" > /proc/sys/net/core/bpf_jit_enable
./test_progs -t verifier_stack_ptr
...
stack size 512 with may_goto with jit:OK
stack size 512 with may_goto without jit:SKIP
...
Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Link: https://lore.kernel.org/r/20250214091823.46042-4-mrpre@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In some cases, the verification logic under the interpreter and JIT
differs, such as may_goto, and the test program behaves differently under
different runtime modes, requiring separate verification logic for each
result.
Introduce __load_if_JITed and __load_if_no_JITed annotation for tests.
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Link: https://lore.kernel.org/r/20250214091823.46042-3-mrpre@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
may_goto uses an additional 8 bytes on the stack, which causes the
interpreters[] array to go out of bounds when calculating index by
stack_size.
1. If a BPF program is rewritten, re-evaluate the stack size. For non-JIT
cases, reject loading directly.
2. For non-JIT cases, calculating interpreters[idx] may still cause
out-of-bounds array access, and just warn about it.
3. For jit_requested cases, the execution of bpf_func also needs to be
warned. So move the definition of function __bpf_prog_ret0_warn out of
the macro definition CONFIG_BPF_JIT_ALWAYS_ON.
Reported-by: syzbot+d2a2c639d03ac200a4f1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/0000000000000f823606139faa5d@google.com/
Fixes: 011832b97b ("bpf: Introduce may_goto instruction")
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Link: https://lore.kernel.org/r/20250214091823.46042-2-mrpre@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The size of struct bpf_map::name is BPF_OBJ_NAME_LEN (16).
bpf(2) {
map_create() {
bpf_obj_name_cpy(map->name, attr->map_name, sizeof(attr->map_name));
}
}
When specifying a map name using bpftool map create name, no error is
reported if the name length is greater than 15.
$ sudo bpftool map create /sys/fs/bpf/12345678901234567890 \
type array key 4 value 4 entries 5 name 12345678901234567890
Users will think that 12345678901234567890 is legal, but this name cannot
be used to index a map.
$ sudo bpftool map show name 12345678901234567890
Error: can't parse name
$ sudo bpftool map show
...
1249: array name 123456789012345 flags 0x0
key 4B value 4B max_entries 5 memlock 304B
$ sudo bpftool map show name 123456789012345
1249: array name 123456789012345 flags 0x0
key 4B value 4B max_entries 5 memlock 304B
The map name provided in the command line is truncated, but no warning is
reported. This submission checks the length of the map name.
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_B44B3A95F0D7C2512DC40D831DA1FA2C9907@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Two sets of tests are added to exercise the not _locked and _locked
version of the kfuncs. For both tests, user space accesses xattr
security.bpf.foo on a testfile. The BPF program is triggered by user
space access (on LSM hook inode_[set|get]_xattr) and sets or removes
xattr security.bpf.bar. Then user space then validates that xattr
security.bpf.bar is set or removed as expected.
Note that, in both tests, the BPF programs use the not _locked kfuncs.
The verifier picks the proper kfuncs based on the calling context.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250130213549.3353349-6-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the following kfuncs to set and remove xattrs from BPF programs:
bpf_set_dentry_xattr
bpf_remove_dentry_xattr
bpf_set_dentry_xattr_locked
bpf_remove_dentry_xattr_locked
The _locked version of these kfuncs are called from hooks where
dentry->d_inode is already locked. Instead of requiring the user
to know which version of the kfuncs to use, the verifier will pick
the proper kfunc based on the calling hook.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20250130213549.3353349-5-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend test_progs fs_kfuncs to cover different xattr names. Specifically:
xattr name "user.kfuncs" and "security.bpf.xxx" can be read from BPF
program with kfuncs bpf_get_[file|dentry]_xattr(); while "security.bpf"
and "security.selinux" cannot be read.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250130213549.3353349-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix a race condition between the main test_progs thread and the traffic
monitoring thread. The traffic monitor thread tries to print a line
using multiple printf and use flockfile() to prevent the line from being
torn apart. Meanwhile, the main thread doing io redirection can reassign
or close stdout when going through tests. A deadlock as shown below can
happen.
main traffic_monitor_thread
==== ======================
show_transport()
-> flockfile(stdout)
stdio_hijack_init()
-> stdout = open_memstream(log_buf, log_cnt);
...
env.subtest_state->stdout_saved = stdout;
...
funlockfile(stdout)
stdio_restore_cleanup()
-> fclose(env.subtest_state->stdout_saved);
After the traffic monitor thread lock stdout, A new memstream can be
assigned to stdout by the main thread. Therefore, the traffic monitor
thread later will not be able to unlock the original stdout. As the
main thread tries to access the old stdout, it will hang indefinitely
as it is still locked by the traffic monitor thread.
The deadlock can be reproduced by running test_progs repeatedly with
traffic monitor enabled:
for ((i=1;i<=100;i++)); do
./test_progs -a flow_dissector_skb* -m '*'
done
Fix this by only calling printf once and remove flockfile()/funlockfile().
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250213233217.553258-1-ameryhung@gmail.com
On powerpc, a CPU does not necessarily originate from NUMA node 0.
This contrasts with architectures like x86, where CPU 0 is not
hot-pluggable, making NUMA node 0 a consistently valid node.
This discrepancy can lead to failures when creating a map on NUMA
node 0, which is initialized by default, if no CPUs are allocated
from NUMA node 0.
This patch fixes the issue by setting NUMA_NO_NODE (-1) for map
creation for this selftest.
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/cf1f61468b47425ecf3728689bc9636ddd1d910e.1738302337.git.skb99@linux.ibm.com
Those two scripts were used by test_flow_dissector.sh to setup/cleanup
the network topology before/after the tests. test_flow_dissector.sh
have been deleted by commit 63b37657c5 ("selftests/bpf: remove
test_flow_dissector.sh") so they aren't used anywhere now.
Remove the two unused scripts and their Makefile entries.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250204-with-v1-1-387a42118cd4@bootlin.com
bpf_arena_alloc_pages() and bpf_arena_free_pages() work with the
bpf_arena pointers [1], which is indicated by the __arena macro in the
kernel source code:
#define __arena __attribute__((address_space(1)))
However currently this information is absent from the debug data in
the vmlinux binary. As a consequence, bpf_arena_* kfuncs declarations
in vmlinux.h (produced by bpftool) do not match prototypes expected by
the BPF programs attempting to use these functions.
Introduce a set of kfunc flags to mark relevant types as bpf_arena
pointers. The flags then can be detected by pahole when generating BTF
from vmlinux's DWARF, allowing it to emit corresponding BTF type tags
for the marked kfuncs.
With recently proposed BTF extension [2], these type tags will be
processed by bpftool when dumping vmlinux.h, and corresponding
compiler attributes will be added to the declarations.
[1] https://lwn.net/Articles/961594/
[2] https://lore.kernel.org/bpf/20250130201239.1429648-1-ihor.solodrai@linux.dev/
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20250206003148.2308659-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 13b25489b6 ("kbuild: change working directory to external
module directory with M=") changed kbuild working directory of bpf
sample programs to samples/bpf, which broke the vmlinux path for
VMLINUX_BTF, as the Makefiles assume the current work directory to be
the kernel output directory and use a relative path (i.e., ./vmlinux):
Makefile:316: *** Cannot find a vmlinux for VMLINUX_BTF at any of " /path/to/linux/samples/bpf/vmlinux", build the kernel or set VMLINUX_BTF like "VMLINUX_BTF=/sys/kernel/btf/vmlinux" or VMLINUX_H variable. Stop.
Correctly refer to the kernel output directory using $(objtree).
Fixes: 13b25489b6 ("kbuild: change working directory to external module directory with M=")
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Ruowen Qin <ruqin@redhat.com>
Link: https://lore.kernel.org/bpf/20250203085506.220297-3-jinghao7@illinois.edu
Ihor Solodrai says:
====================
BTF: arbitrary __attribute__ encoding
This patch series extends BPF Type Format (BTF) to support arbitrary
__attribute__ encoding.
Setting the kind_flag to 1 in BTF type tags and decl tags now changes
the meaning for the encoded tag, in particular with respect to
btf_dump in libbpf.
If the kflag is set, then the string encoded by the tag represents the
full attribute-list of an attribute specifier [1].
This feature will allow extending tools such as pahole and bpftool to
capture and use more granular type information, and make it easier to
manage compatibility between clang and gcc BPF compilers.
[1] https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Attribute-Syntax.html
v2->v3: nit fixes suggested by Andrii
v1->v2:
- When checking for specific BTF tags in the verifier, make sure the
tag's kflag is 0
- Split docs and libbpf changes into separate patches
- Various renames, as suggested by Andrii and Eduard
v2: https://lore.kernel.org/bpf/20250127233955.2275804-1-ihor.solodrai@linux.dev/
v1: https://lore.kernel.org/bpf/20250122025308.2717553-1-ihor.solodrai@pm.me
====================
Link: https://patch.msgid.link/20250130201239.1429648-1-ihor.solodrai@linux.dev
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
BTF type tags and decl tags now may have info->kflag set to 1,
changing the semantics of the tag.
Change BTF verification to permit BTF that makes use of this feature:
* remove kflag check in btf_decl_tag_check_meta(), as both values
are valid
* allow kflag to be set for BTF_KIND_TYPE_TAG type in
btf_ref_type_check_meta()
Make sure kind_flag is NOT set when checking for specific BTF tags,
such as "kptr", "user" etc.
Modify a selftest checking for kflag in decl_tag accordingly.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250130201239.1429648-6-ihor.solodrai@linux.dev
Add the following functions to libbpf API:
* btf__add_type_attr()
* btf__add_decl_attr()
These functions allow to add to BTF the type tags and decl tags with
info->kflag set to 1. The kflag indicates that the tag directly
encodes an __attribute__ and not a normal tag.
See Documentation/bpf/btf.rst changes in the subsequent patch for
details on the semantics.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250130201239.1429648-2-ihor.solodrai@linux.dev
Bastien Curutchet says:
====================
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
This series is the prep work that will be followed up adding test cases
to eventually cover the tests done in test_xdp_redirect_multi.sh:
- PATCH 1 Add an helper to generate unique names
- PATCH 2 to 9 rework test_xdp_veth to make it more generic and allow to
configure different test cases
- PATCH 10 adds test cases for 'classic' bpf_redirect_map()
====================
Link: https://patch.msgid.link/20250131-redirect-multi-v4-0-970b33678512@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The network namespaces and the veth used by the tests have hardcoded
names that can conflict with other tests during parallel runs.
Use the append_tid() helper to ensure the uniqueness of these names.
Use the static network configuration table as a template on which
thread IDs are appended in each test.
Set a fixed size to remote_addr field so the struct veth_configuration
can also have a fixed size.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250131-redirect-multi-v4-9-970b33678512@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF program attached to each veth is hardcoded through the
use of the struct skeletons. It prevents from re-using the initialization
code in new test cases.
Replace the struct skeletons by a bpf_object table.
Add a struct prog_configuration that holds the name of BPF program to
load on a given veth pair.
Use bpf_object__find_program_by_name() / bpf_xdp_attach() API instead of
bpf_program__attach_xdp() to retrieve the BPF programs from their names.
Detach BPF progs in the cleanup() as it's not automatically done by this
API.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250131-redirect-multi-v4-7-970b33678512@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In commit 1611603537 ("bpf: Create argument information for nullable arguments."),
it introduced a "__nullable" tagging at the argument name of a
stub function. Some background on the commit:
it requires to tag the stub function instead of directly tagging
the "ops" of a struct. This is because the btf func_proto of the "ops"
does not have the argument name and the "__nullable" is tagged at
the argument name.
To find the stub function of a "ops", it currently relies on a naming
convention on the stub function "st_ops__ops_name".
e.g. tcp_congestion_ops__ssthresh. However, the new kernel
sub system implementing bpf_struct_ops have missed this and
have been surprised that the "__nullable" and the to-be-landed
"__ref" tagging was not effective.
One option would be to give a warning whenever the stub function does
not follow the naming convention, regardless if it requires arg tagging
or not.
Instead, this patch uses the kallsyms_lookup approach and removes
the requirement on the naming convention. The st_ops->cfi_stubs has
all the stub function kernel addresses. kallsyms_lookup() is used to
lookup the function name. With the function name, BTF can be used to
find the BTF func_proto. The existing "__nullable" arg name searching
logic will then fall through.
One notable change is,
if it failed in kallsyms_lookup or it failed in looking up the stub
function name from the BTF, the bpf_struct_ops registration will fail.
This is different from the previous behavior that it silently ignored
the "st_ops__ops_name" function not found error.
The "tcp_congestion_ops", "sched_ext_ops", and "hid_bpf_ops" can still be
registered successfully after this patch. There is struct_ops_maybe_null
selftest to cover the "__nullable" tagging.
Other minor changes:
1. Removed the "%s__%s" format from the pr_warn because the naming
convention is removed.
2. The existing bpf_struct_ops_supported() is also moved earlier
because prepare_arg_info needs to use it to decide if the
stub function is NULL before calling the prepare_arg_info.
Cc: Tejun Heo <tj@kernel.org>
Cc: Benjamin Tissoires <bentiss@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20250127222719.2544255-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The `readlink(path, buf, sizeof(buf))` call reads at most sizeof(buf)
bytes and *does not* append null-terminator to buf. With respect to
that, fix two pieces in get_fd_type:
1. Change the truncation check to contain sizeof(buf) rather than
sizeof(path).
2. Append null-terminator to buf.
Reported by Coverity.
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20250129071857.75182-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
configure_network() does two things : it first creates the network
topology and then configures the BPF maps to fit the test needs. This
isn't convenient if we want to re-use the same network topology for
different test cases.
Rename configure_network() create_network().
Move the BPF configuration to the test itself.
Split the test description in two parts, first the description of the
network topology, then the description of the test case.
Remove the veth indexes from the ASCII art as dynamic ones are used
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250131-redirect-multi-v4-5-970b33678512@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>