linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 15:13:44 -04:00

Author	SHA1	Message	Date
Martin KaFai Lau	9621eb635b	Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master' Merge skb-meta-dynptr branch into master branch after fixing a compiler warning. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-09-04 21:23:47 -07:00
Jakub Sitnicki	54728bd535	bpf: Return an error pointer for skb metadata when CONFIG_NET=n Kernel Test Robot reported a compiler warning - a null pointer may be passed to memmove in __bpf_dynptr_{read,write} when building without networking support. The warning is correct from a static analysis standpoint, but not actually reachable. Without CONFIG_NET, creating dynptrs to skb metadata is impossible since the constructor kfunc is missing. Silence the false-postive diagnostic message by returning an error pointer from bpf_skb_meta_pointer stub when CONFIG_NET=n. Fixes: `6877cd392b` ("bpf: Enable read/write access to skb metadata through a dynptr") Closes: https://lore.kernel.org/oe-kbuild-all/202508212031.ir9b3B6Q-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250901-dynptr-skb-meta-no-net-v2-1-ce607fcb6091@cloudflare.com	2025-09-04 13:42:29 -07:00
Jiawei Zhao	b338cf849e	libbpf: Remove unused args in parse_usdt_note Remove unused 'elf' and 'path' parameters from parse_usdt_note function signature. These parameters are not referenced within the function body and only add unnecessary complexity. The function only requires the note header, data buffer, offsets, and output structure to perform USDT note parsing. Update function declaration, definition, and the single call site in collect_usdt_targets() to match the simplified signature. This is a safe internal cleanup as parse_usdt_note is a static function. Signed-off-by: Jiawei Zhao <phoenix500526@163.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250904030525.1932293-1-phoenix500526@163.com	2025-09-04 11:35:44 -07:00
Alexei Starovoitov	2d92ef7da2	Merge branch 'selftests-bpf-introduce-experimental-bpf_in_interrupt' Leon Hwang says: ==================== selftests/bpf: Introduce experimental bpf_in_interrupt() Filtering pid_tgid is meanlingless when the current task is preempted by an interrupt. To address this, introduce 'bpf_in_interrupt()' helper function, which allows BPF programs to determine whether they are executing in interrupt context. 'get_preempt_count()': * On x86, '(int ) bpf_this_cpu_ptr(&__preempt_count)'. * On arm64, 'bpf_get_current_task_btf()->thread_info.preempt.count'. Then 'bpf_in_interrupt()' will be: * If !PREEMPT_RT, 'get_preempt_count() & (NMI_MASK \| HARDIRQ_MASK \| SOFTIRQ_MASK)'. * If PREEMPT_RT, '(get_preempt_count() & (NMI_MASK \| HARDIRQ_MASK)) \| (bpf_get_current_task_btf()->softirq_disable_cnt & SOFTIRQ_MASK)'. 'bpf_in_interrupt()' runs well when PREEMPT_RT is enabled. But it's difficult for me to test it well because I'm not familiar with PREEMPT_RT. Changes: v2 -> v3: * Address comments from Alexei: * Move bpf_in_interrupt() to bpf_experimental.h. * Add support for arm64. v2: https://lore.kernel.org/bpf/20250825131502.54269-1-leon.hwang@linux.dev/ v1 -> v2: * Fix a build error reported by test bot. ==================== Link: https://patch.msgid.link/20250903140438.59517-1-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:05:58 -07:00
Leon Hwang	88a3bde432	selftests/bpf: Add case to test bpf_in_interrupt() Add a timer test case to test 'bpf_in_interrupt()'. cd tools/testing/selftests/bpf ./test_progs -t timer_interrupt 462 timer_interrupt:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20250903140438.59517-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:05:58 -07:00
Leon Hwang	4b69e31329	selftests/bpf: Introduce experimental bpf_in_interrupt() Filtering pid_tgid is meanlingless when the current task is preempted by an interrupt. To address this, introduce 'bpf_in_interrupt()' helper function, which allows BPF programs to determine whether they are executing in interrupt context. 'get_preempt_count()': * On x86, '(int ) bpf_this_cpu_ptr(&__preempt_count)'. * On arm64, 'bpf_get_current_task_btf()->thread_info.preempt.count'. Then 'bpf_in_interrupt()' will be: * If !PREEMPT_RT, 'get_preempt_count() & (NMI_MASK \| HARDIRQ_MASK \| SOFTIRQ_MASK)'. * If PREEMPT_RT, '(get_preempt_count() & (NMI_MASK \| HARDIRQ_MASK)) \| (bpf_get_current_task_btf()->softirq_disable_cnt & SOFTIRQ_MASK)'. As for other archs, it can be added support by updating 'get_preempt_count()'. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20250903140438.59517-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:05:58 -07:00
Hengqi Chen	929adf8838	bpf, arm64: Remove duplicated bpf_flush_icache() The bpf_flush_icache() is done by bpf_arch_text_copy() already. Remove the duplicated one in arch_prepare_bpf_trampoline(). Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Acked-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20250904075703.49404-1-hengqi.chen@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:02:23 -07:00
Rong Tao	abc8a952d4	selftests/bpf: Test kfunc bpf_strcasecmp Add testsuites for kfunc bpf_strcasecmp. Signed-off-by: Rong Tao <rongtao@cestc.cn> Link: https://lore.kernel.org/r/tencent_81A1A0ACC04B68158C57C4D151C46A832B07@qq.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:00:57 -07:00
Rong Tao	19559e8441	bpf: add bpf_strcasecmp kfunc bpf_strcasecmp() function performs same like bpf_strcmp() except ignoring the case of the characters. Signed-off-by: Rong Tao <rongtao@cestc.cn> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Viktor Malik <vmalik@redhat.com> Link: https://lore.kernel.org/r/tencent_292BD3682A628581AA904996D8E59F4ACD06@qq.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:00:57 -07:00
Alexei Starovoitov	525ac69b1e	Merge branch 'selftests-bpf-benchmark-all-symbols-for-kprobe-multi' Menglong Dong says: ==================== selftests/bpf: benchmark all symbols for kprobe-multi Add the benchmark testcase "kprobe-multi-all", which will hook all the kernel functions during the testing. This series is separated out from [1]. Changes since V2: * add some comment to attach_ksyms_all, which notes that don't run the testing on a debug kernel Changes since V1: * introduce trace_blacklist instead of copy-pasting strcmp in the 2nd patch * use fprintf() instead of printf() in 3rd patch Link: https://lore.kernel.org/bpf/20250817024607.296117-1-dongml2@chinatelecom.cn/ [1] ==================== Link: https://patch.msgid.link/20250904021011.14069-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:00:26 -07:00
Menglong Dong	a85d888768	selftests/bpf: add benchmark testing for kprobe-multi-all For now, the benchmark for kprobe-multi is single, which means there is only 1 function is hooked during testing. Add the testing "kprobe-multi-all", which will hook all the kernel functions during the benchmark. And the "kretprobe-multi-all" is added too. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250904021011.14069-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:00:25 -07:00
Menglong Dong	adf6b57ce4	selftests/bpf: skip recursive functions for kprobe_multi Some functions is recursive for the kprobe_multi and impact the benchmark results. So just skip them. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250904021011.14069-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:00:25 -07:00
Menglong Dong	8bad31edf5	selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c We need to get all the kernel function that can be traced sometimes, so we move the get_syms() and get_addrs() in kprobe_multi_test.c to trace_helpers.c and rename it to bpf_get_ksyms() and bpf_get_addrs(). Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250904021011.14069-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-04 09:00:25 -07:00
Ricardo B. Marlière	c9110e6f72	selftests/bpf: Fix count write in testapp_xdp_metadata_copy() Commit `4b30209255` ("selftests/xsk: Add tail adjustment tests and support check") added a new global to xsk_xdp_progs.c, but left out the access in the testapp_xdp_metadata_copy() function. Since bpf_map_update_elem() will write to the whole bss section, it gets truncated. Fix by writing to skel_rx->bss->count directly. Fixes: `4b30209255` ("selftests/xsk: Add tail adjustment tests and support check") Signed-off-by: Ricardo B. Marlière <rbm@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250829-selftests-bpf-xsk_regression_fix-v1-1-5f5acdb9fe6b@suse.com	2025-09-03 16:46:36 -07:00
Ricardo B. Marlière	2a912258c9	selftests/bpf: Upon failures, exit with code 1 in test_xsk.sh Currently, even if some subtests fails, the end result will still yield "ok 1 selftests: bpf: test_xsk.sh". Fix it by exiting with 1 if there are any failures. Signed-off-by: Ricardo B. Marlière <rbm@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-test_xsk_ret-v1-1-e6656c01f397@suse.com	2025-09-03 16:44:22 -07:00
Feng Yang	e4980fa646	bpf: Replace kvfree with kfree for kzalloc memory These pointers are allocated by kzalloc. Therefore, replace kvfree() with kfree() to avoid unnecessary is_vmalloc_addr() check in kvfree(). This is the remaining unmodified part from [1]. Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250811123949.552885-1-rongqianfeng@vivo.com [1] Link: https://lore.kernel.org/bpf/20250827032812.498216-1-yangfeng59949@163.com	2025-09-02 17:29:52 +02:00
Yuan Chen	6417ca8530	bpftool: Add CET-aware symbol matching for x86_64 architectures Adjust symbol matching logic to account for Control-flow Enforcement Technology (CET) on x86_64 systems. CET prefixes functions with a 4-byte 'endbr' instruction, shifting the actual hook entry point to symbol + 4. Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <qmo@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250829061107.23905-3-chenyuan_fl@163.com	2025-09-02 17:20:54 +02:00
Yuan Chen	70f32a10ad	bpftool: Refactor kernel config reading into common helper Extract the kernel configuration file parsing logic from feature.c into a new read_kernel_config() function in common.c. This includes: 1. Moving the config file handling and option parsing code 2. Adding required headers and struct definition 3. Keeping all existing functionality The refactoring enables sharing this logic with other components while maintaining current behavior. This will be used by subsequent patches that need to check kernel config options. Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <qmo@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250829061107.23905-2-chenyuan_fl@163.com	2025-09-02 17:20:54 +02:00
Ricardo B. Marlière	98857d111c	selftests/bpf: Fix bpf_prog_detach2 usage in test_lirc_mode2 Commit `e9fc3ce99b` ("libbpf: Streamline error reporting for high-level APIs") redefined the way that bpf_prog_detach2() returns. Therefore, adapt the usage in test_lirc_mode2_user.c. Signed-off-by: Ricardo B. Marlière <rbm@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-v1-1-c7811cd8b98c@suse.com	2025-08-28 09:49:46 -07:00
Matt Fleming	737433c6a5	selftests/bpf: Add LPM trie microbenchmarks Add benchmarks for the standard set of operations: LOOKUP, INSERT, UPDATE, DELETE. Also include benchmarks to measure the overhead of the bench framework itself (NOOP) as well as the overhead of generating keys (BASELINE). Lastly, this includes a benchmark for FREE (trie_free()) which is known to have terrible performance for maps with many entries. Benchmarks operate on tries without gaps in the key range, i.e. each test begins or ends with a trie with valid keys in the range [0, nr_entries). This is intended to cause maximum branching when traversing the trie. LOOKUP, UPDATE, DELETE, and FREE fill a BPF LPM trie from userspace using bpf_map_update_batch() and run the corresponding benchmark operation via bpf_loop(). INSERT starts with an empty map and fills it kernel-side from bpf_loop(). FREE records the time to free a filled LPM trie by attaching and destroying a BPF prog. NOOP measures the overhead of the test harness by running an empty function with bpf_loop(). BASELINE is similar to NOOP except that the function generates a key. Each operation runs 10,000 times using bpf_loop(). Note that this value is intentionally independent of the number of entries in the LPM trie so that the stability of the results isn't affected by the number of entries. For those benchmarks that need to reset the LPM trie once it's full (INSERT) or empty (DELETE), throughput and latency results are scaled by the fraction of a second the operation actually ran to ignore any time spent reinitialising the trie. By default, benchmarks run using sequential keys in the range [0, nr_entries). BASELINE, LOOKUP, and UPDATE can use random keys via the --random parameter but beware there is a runtime cost involved in generating random keys. Other benchmarks are prohibited from using random keys because it can skew the results, e.g. when inserting an existing key or deleting a missing one. All measurements are recorded from within the kernel to eliminate syscall overhead. Most benchmarks run an XDP program to generate stats but FREE needs to collect latencies using fentry/fexit on map_free_deferred() because it's not possible to use fentry directly on lpm_trie.c since commit `c83508da56` ("bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs") and there's no way to create/destroy a map from within an XDP program. Here is example output from an AMD EPYC 9684X 96-Core machine for each of the benchmarks using a trie with 10K entries and a 32-bit prefix length, e.g. $ ./bench lpm-trie-$op \ --prefix_len=32 \ --producers=1 \ --nr_entries=10000 noop: throughput 74.417 ± 0.032 M ops/s ( 74.417M ops/prod), latency 13.438 ns/op baseline: throughput 70.107 ± 0.171 M ops/s ( 70.107M ops/prod), latency 14.264 ns/op lookup: throughput 8.467 ± 0.047 M ops/s ( 8.467M ops/prod), latency 118.109 ns/op insert: throughput 2.440 ± 0.015 M ops/s ( 2.440M ops/prod), latency 409.290 ns/op update: throughput 2.806 ± 0.042 M ops/s ( 2.806M ops/prod), latency 356.322 ns/op delete: throughput 4.625 ± 0.011 M ops/s ( 4.625M ops/prod), latency 215.613 ns/op free: throughput 0.578 ± 0.006 K ops/s ( 0.578K ops/prod), latency 1.730 ms/op And the same benchmarks using random keys: $ ./bench lpm-trie-$op \ --prefix_len=32 \ --producers=1 \ --nr_entries=10000 \ --random noop: throughput 74.259 ± 0.335 M ops/s ( 74.259M ops/prod), latency 13.466 ns/op baseline: throughput 35.150 ± 0.144 M ops/s ( 35.150M ops/prod), latency 28.450 ns/op lookup: throughput 7.119 ± 0.048 M ops/s ( 7.119M ops/prod), latency 140.469 ns/op insert: N/A update: throughput 2.736 ± 0.012 M ops/s ( 2.736M ops/prod), latency 365.523 ns/op delete: N/A free: N/A Signed-off-by: Matt Fleming <mfleming@cloudflare.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20250827140149.1001557-1-matt@readmodwrite.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-27 17:28:14 -07:00
Alexei Starovoitov	bd27626f48	Merge branch 'bpf-arm64-support-for-timed-may_goto' Puranjay Mohan says: ==================== bpf, arm64: support for timed may_goto Changes in v2->v3: v2: https://lore.kernel.org/all/20250809204833.44803-1-puranjay@kernel.org/ - Rebased on bpf-next/master - Added Acked-by: tags from Xu and Kumar Changes in v1->v2: v1: https://lore.kernel.org/bpf/20250724125443.26182-1-puranjay@kernel.org/ - Added comment in arch_bpf_timed_may_goto() about BPF_REG_FP setup (Xu Kuohai) This set adds support for the timed may_goto instruction for the arm64. The timed may_goto instruction is implemented by the verifier by reserving 2 8byte slots in the program stack and then calling arch_bpf_timed_may_goto() in a loop with the stack offset of these two slots in BPF_REG_AX. It expects the function to put a timestamp in the first slot and the returned count in BPF_REG_AX is put into the second slot by a store instruction emitted by the verifier. arch_bpf_timed_may_goto() is special as it receives the parameter in BPF_REG_AX and is expected to return the result in BPF_REG_AX as well. It can't clobber any caller saved registers because verifier doesn't save anything before emitting the call. So, arch_bpf_timed_may_goto() is implemented in assembly so the exact registers that are stored/restored can be controlled (BPF caller saved registers here) and it also needs to take care of moving arguments and return values to and from BPF_REG_AX <-> arm64 R0. So, arch_bpf_timed_may_goto() acts as a trampoline to call bpf_check_timed_may_goto() which does the main logic of placing the timestamp and returning the count. All tests that use may_goto instruction pass after the changing some of them in patch 2 #404 stream_errors:OK [...] #406/2 stream_success/stream_cond_break:OK [...] #494/23 verifier_bpf_fastcall/may_goto_interaction_x86_64:SKIP #494/24 verifier_bpf_fastcall/may_goto_interaction_arm64:OK [...] #539/1 verifier_may_goto_1/may_goto 0:OK #539/2 verifier_may_goto_1/batch 2 of may_goto 0:OK #539/3 verifier_may_goto_1/may_goto batch with offsets 2/1/0:OK #539/4 verifier_may_goto_1/may_goto batch with offsets 2/0:OK #539 verifier_may_goto_1:OK #540/1 verifier_may_goto_2/C code with may_goto 0:OK #540 verifier_may_goto_2:OK Summary: 7/16 PASSED, 25 SKIPPED, 0 FAILED ==================== Link: https://patch.msgid.link/20250827113245.52629-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-27 17:16:23 -07:00
Puranjay Mohan	22b22bf9ee	selftests/bpf: Enable timed may_goto tests for arm64 As arm64 JIT now supports timed may_goto instruction, make sure all relevant tests run on this architecture. Some tests were enabled and other required modifications to work properly on arm64. $ ./test_progs -a "stream","may_goto*",verifier_bpf_fastcall #404 stream_errors:OK [...] #406/2 stream_success/stream_cond_break:OK [...] #494/23 verifier_bpf_fastcall/may_goto_interaction_x86_64:SKIP #494/24 verifier_bpf_fastcall/may_goto_interaction_arm64:OK [...] #539/1 verifier_may_goto_1/may_goto 0:OK #539/2 verifier_may_goto_1/batch 2 of may_goto 0:OK #539/3 verifier_may_goto_1/may_goto batch with offsets 2/1/0:OK #539/4 verifier_may_goto_1/may_goto batch with offsets 2/0:OK #539 verifier_may_goto_1:OK #540/1 verifier_may_goto_2/C code with may_goto 0:OK #540 verifier_may_goto_2:OK Summary: 7/16 PASSED, 25 SKIPPED, 0 FAILED Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20250827113245.52629-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-27 17:16:22 -07:00
Puranjay Mohan	16175375da	bpf, arm64: Add JIT support for timed may_goto When verifier sees a timed may_goto instruction, it emits a call to arch_bpf_timed_may_goto() with a stack offset in BPF_REG_AX (arm64 r9) and expects a count value to be returned in the same register. The verifier doesn't save or restore any registers before emitting this call. arch_bpf_timed_may_goto() should act as a trampoline to call bpf_check_timed_may_goto() with AAPCS64 calling convention. To support this custom calling convention, implement arch_bpf_timed_may_goto() in assembly and make sure BPF caller saved registers are saved and restored, call bpf_check_timed_may_goto with arm64 calling convention where first argument and return value both are in x0, then put the result back into BPF_REG_AX before returning. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20250827113245.52629-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-27 17:16:22 -07:00
Andrii Nakryiko	4c229f337e	Merge branch 'libbpf-fix-usdt-sib-argument-handling-causing-unrecognized-register-error' Jiawei Zhao says: ==================== libbpf: fix USDT SIB argument handling causing unrecognized register error When using GCC on x86-64 to compile an usdt prog with -O1 or higher optimization, the compiler will generate SIB addressing mode for global array, e.g. "1@-96(%rbp,%rax,8)". The current USDT implementation in libbpf cannot parse these two formats, causing `bpf_program__attach_usdt()` to fail with -ENOENT (unrecognized register). This patch series adds support for SIB addressing mode in USDT probes. The main changes include: - add correct handling logic for SIB-addressed arguments in `parse_usdt_arg`. - add an usdt_o2 test case to cover SIB addressing mode. Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8) argument spec and passes all validation checks. The modification history of this patch series: Change since v1: - refactor the code to make it more readable - modify the commit message to explain why and how Change since v2: - fix the `scale` uninitialized error Change since v3: - force -O2 optimization for usdt.test.o to generate SIB addressing usdt and pass all test cases. Change since v4: - split the patch into two parts, one for the fix and the other for the test Change since v5: - Only enable optimization for x86 architecture to generate SIB addressing usdt argument spec. Change since v6: - Add an usdt_o2 test case to cover SIB addressing mode. - Reinstate the usdt.c test case. Change since v7: - Refactor modifications to __bpf_usdt_arg_spec to avoid increasing its size, achieving better compatibility - Fix some minor code style issues - Refactor the usdt_o2 test case, removing semaphore and adding GCC attribute to force -O2 optimization Change since v8: - Refactor the usdt_o2 test case, using assembly to force SIB addressing mode. Change since v9: - Only enable the usdt_o2 test case on x86_64 and i386 architectures since the SIB addressing mode is only supported on x86_64 and i386. Change since v10: - Replace `__attribute__((optimize("O2")))` with `#pragma GCC optimize("O1")` to fix the issue where the optimized compilation condition works improperly. - Renamed test case usdt_o2 and relevant files name to usdt_o1 in that O1 level optimization is enough to generate SIB addressing usdt argument spec. Change since v11: - Replace `STAP_PROBE1` with `STAP_PROBE_ASM` - Use bit fields instead of bit shifting operations - Merge the usdt_o1 test case into the usdt test case Change since v12: - This patch is same with the v12 but with a new version number. Change since v13(resolve some review comments): - https://lore.kernel.org/bpf/CAEf4BzZWd2zUC=U6uGJFF3EMZ7zWGLweQAG3CJWTeHy-5yFEPw@mail.gmail.com/ - https://lore.kernel.org/bpf/CAEf4Bzbs3hV_Q47+d93tTX13WkrpkpOb4=U04mZCjHyZg4aVdw@mail.gmail.com/ Change since v14: - fix a typo in __bpf_usdt_arg_spec Change since v15(resolve some review comments): - https://lore.kernel.org/bpf/CAEf4BzaxuYijEfQMDFZ+CQdjxLuDZiesUXNA-SiopS+5+VxRaA@mail.gmail.com/ - https://lore.kernel.org/bpf/CAEf4BzaHi5kpuJ6OVvDU62LT5g0qHbWYMfb_FBQ3iuvvUF9fag@mail.gmail.com/ - https://lore.kernel.org/bpf/d438bf3a-a9c9-4d34-b814-63f2e9bb3a85@linux.dev/ ==================== Link: https://patch.msgid.link/20250827053128.1301287-1-phoenix500526@163.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2025-08-27 15:48:08 -07:00
Jiawei Zhao	69424097ee	selftests/bpf: Enrich subtest_basic_usdt case in selftests to cover SIB handling logic When using GCC on x86-64 to compile an usdt prog with -O1 or higher optimization, the compiler will generate SIB addressing mode for global array, e.g. "1@-96(%rbp,%rax,8)". In this patch: - enrich subtest_basic_usdt test case to cover SIB addressing usdt argument spec handling logic Signed-off-by: Jiawei Zhao <phoenix500526@163.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250827053128.1301287-3-phoenix500526@163.com	2025-08-27 15:48:05 -07:00
Jiawei Zhao	758acb9ccf	libbpf: Fix USDT SIB argument handling causing unrecognized register error On x86-64, USDT arguments can be specified using Scale-Index-Base (SIB) addressing, e.g. "1@-96(%rbp,%rax,8)". The current USDT implementation in libbpf cannot parse this format, causing `bpf_program__attach_usdt()` to fail with -ENOENT (unrecognized register). This patch fixes this by implementing the necessary changes: - add correct handling for SIB-addressed arguments in `bpf_usdt_arg`. - add adaptive support to `__bpf_usdt_arg_type` and `__bpf_usdt_arg_spec` to represent SIB addressing parameters. Signed-off-by: Jiawei Zhao <phoenix500526@163.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250827053128.1301287-2-phoenix500526@163.com	2025-08-27 15:44:25 -07:00
Shubham Sharma	d3abefe897	selftests/bpf: Fix typos and grammar in test sources Fix spelling typos and grammar errors in BPF selftests source code. Signed-off-by: Shubham Sharma <slopixelz@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250826125746.17983-1-slopixelz@gmail.com	2025-08-27 15:13:08 -07:00
Nandakumar Edamana	2660b9d477	bpf: Add selftest to check the verifier's abstract multiplication Add new selftest to test the abstract multiplication technique(s) used by the verifier, following the recent improvement in tnum multiplication (tnum_mul). One of the newly added programs, verifier_mul/mul_precise, results in a false positive with the old tnum_mul, while the program passes with the latest one. Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250826034524.2159515-2-nandakumar@nandakumar.co.in	2025-08-27 15:00:31 -07:00
Nandakumar Edamana	1df7dad4d5	bpf: Improve the general precision of tnum_mul Drop the value-mask decomposition technique and adopt straightforward long-multiplication with a twist: when LSB(a) is uncertain, find the two partial products (for LSB(a) = known 0 and LSB(a) = known 1) and take a union. Experiment shows that applying this technique in long multiplication improves the precision in a significant number of cases (at the cost of losing precision in a relatively lower number of cases). Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250826034524.2159515-1-nandakumar@nandakumar.co.in	2025-08-27 15:00:26 -07:00
Alexei Starovoitov	2465bb83e0	Merge branch 's390-bpf-add-s390-jit-support-for-timed-may_goto' Ilya Leoshkevich says: ==================== s390/bpf: Add s390 JIT support for timed may_goto v1: https://lore.kernel.org/bpf/20250821103256.291412-1-iii@linux.ibm.com/ v1 -> v2: Fix test_stream_errors (caught by CI). This series adds timed may_goto implementation to the s390x JIT. Patch 1 is the implementation itself, patches 2-5 are the associated test changes. ==================== Link: https://patch.msgid.link/20250821113339.292434-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-26 16:52:07 -07:00
Ilya Leoshkevich	21bce56940	selftests/bpf: Remove may_goto tests from DENYLIST.s390x The may_goto instruction is now fully supported on s390x, including the timed implementation, so remove the respective test from the denylist. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250821113339.292434-6-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-26 16:51:52 -07:00
Ilya Leoshkevich	7197dbcba2	selftests/bpf: Enable timed may_goto verifier tests on s390x Now that the timed may_goto implementation is available on s390x, enable the respective verifier tests. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250821113339.292434-5-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-26 16:51:52 -07:00
Ilya Leoshkevich	1e4e6b9e26	selftests/bpf: Add __arch_s390x macro Make it possible to limit certain tests to s390x, just like it's already done for x86_64, arm64, and riscv64. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250821113339.292434-4-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-26 16:51:52 -07:00
Ilya Leoshkevich	b68dfcc12a	selftests/bpf: Add a missing newline to the "bad arch spec" message Fix error messages like this one: parse_test_spec:FAIL:569 bad arch spec: 's390x'process_subtest:FAIL:1153 Can't parse test spec for program 'may_goto_simple' Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250821113339.292434-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-26 16:51:52 -07:00
Ilya Leoshkevich	b8efa810c1	s390/bpf: Add s390 JIT support for timed may_goto The verifier provides an architecture-independent implementation of the may_goto instruction, which is currently used on s390x, but it has a downside: there is no way to prevent progs using it from running for a very long time. The solution to this problem is an alternative timed implementation, which requires architecture-specific bits. Its availability is signaled to the verifier by bpf_jit_supports_timed_may_goto() returning true. The verifier then emits a call to arch_bpf_timed_may_goto() using a non-standard calling convention. This function must act as a trampoline for bpf_check_timed_may_goto(). Implement bpf_jit_supports_timed_may_goto(), account for the special calling convention in the BPF_CALL implementation, and implement arch_bpf_timed_may_goto(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250821113339.292434-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-26 16:51:52 -07:00
Tiezhu Yang	d0f27ff27c	selftests/bpf: Remove entries from config.{arch} already present in config `config.{arch}` had entries already present in `config`. When generating the config used by vmtest, concatenate the `config` file with the `config.{arch}` one, making those entries duplicated, so remove those duplications. Use the following command to get the differences: $ comm -1 -2 <(sort tools/testing/selftests/bpf/config.x86_64) <(sort tools/testing/selftests/bpf/config) $ comm -1 -2 <(sort tools/testing/selftests/bpf/config.aarch64) <(sort tools/testing/selftests/bpf/config) $ comm -1 -2 <(sort tools/testing/selftests/bpf/config.riscv64) <(sort tools/testing/selftests/bpf/config) $ comm -1 -2 <(sort tools/testing/selftests/bpf/config.ppc64el) <(sort tools/testing/selftests/bpf/config) $ comm -1 -2 <(sort tools/testing/selftests/bpf/config.s390x) <(sort tools/testing/selftests/bpf/config) This is similar with commit `7a42af4b94` ("selftests/bpf: Remove entries from config.s390x already present in config"). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250826065057.11415-1-yangtiezhu@loongson.cn	2025-08-26 13:47:03 +02:00
Alexei Starovoitov	f4c227cc97	Merge branch 'bpf-introduce-and-use-rcu_read_lock_dont_migrate' Menglong Dong says: ==================== bpf: introduce and use rcu_read_lock_dont_migrate migrate_disable() and rcu_read_lock() are used to together in many case in bpf. However, when PREEMPT_RCU is not enabled, rcu_read_lock() will disable preemption, which indicate migrate_disable(), so we don't need to call it in this case. In this series, we introduce rcu_read_lock_dont_migrate and rcu_read_unlock_migrate, which will call migrate_disable and migrate_enable only when PREEMPT_RCU enabled. And use rcu_read_lock_dont_migrate in bpf subsystem. Changes since V2: * make rcu_read_lock_dont_migrate() more compatible by using IS_ENABLED() Changes since V1: * introduce rcu_read_lock_dont_migrate() instead of rcu_migrate_disable() + rcu_read_lock() ==================== Link: https://patch.msgid.link/20250821090609.42508-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:17 -07:00
Menglong Dong	8e4f0b1ebc	bpf: use rcu_read_lock_dont_migrate() for trampoline.c Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in trampoline.c to obtain better performance when PREEMPT_RCU is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250821090609.42508-8-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:16 -07:00
Menglong Dong	427a36bb55	bpf: use rcu_read_lock_dont_migrate() for bpf_prog_run_array_cg() Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in bpf_prog_run_array_cg to obtain better performance when PREEMPT_RCU is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250821090609.42508-7-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:16 -07:00
Menglong Dong	cf4303b70d	bpf: use rcu_read_lock_dont_migrate() for bpf_task_storage_free() Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in bpf_task_storage_free to obtain better performance when PREEMPT_RCU is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250821090609.42508-6-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:16 -07:00
Menglong Dong	68748f0397	bpf: use rcu_read_lock_dont_migrate() for bpf_iter_run_prog() Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in bpf_iter_run_prog to obtain better performance when PREEMPT_RCU is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250821090609.42508-5-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:16 -07:00
Menglong Dong	f2fa9b9069	bpf: use rcu_read_lock_dont_migrate() for bpf_inode_storage_free() Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in bpf_inode_storage_free to obtain better performance when PREEMPT_RCU is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250821090609.42508-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:16 -07:00
Menglong Dong	8c0afc7c9c	bpf: use rcu_read_lock_dont_migrate() for bpf_cgrp_storage_free() Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in bpf_cgrp_storage_free to obtain better performance when PREEMPT_RCU is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20250821090609.42508-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:16 -07:00
Menglong Dong	1b93c03fb3	rcu: add rcu_read_lock_dont_migrate() migrate_disable() is called to disable migration in the kernel, and it is often used together with rcu_read_lock(). However, with PREEMPT_RCU disabled, it's unnecessary, as rcu_read_lock() will always disable preemption, which will also disable migration. Introduce rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate(), which will do the migration enable and disable only when PREEMPT_RCU. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20250821090609.42508-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-25 18:52:16 -07:00
Tao Chen	4223bf833c	bpf: Remove preempt_disable in bpf_try_get_buffers Now BPF program will run with migration disabled, so it is safe to access this_cpu_inc_return(bpf_bprintf_nest_level). Fixes: `d9c9e4db18` ("bpf: Factorize bpf_trace_printk and bpf_seq_printf") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250819125638.2544715-1-chen.dylane@linux.dev	2025-08-22 11:44:09 -07:00
Eric Biggers	d47cc4dea1	bpf: Use sha1() instead of sha1_transform() in bpf_prog_calc_tag() Now that there's a proper SHA-1 library API, just use that instead of the low-level SHA-1 compression function. This eliminates the need for bpf_prog_calc_tag() to implement the SHA-1 padding itself. No functional change; the computed tags remain the same. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250811201615.564461-1-ebiggers@kernel.org	2025-08-22 11:40:05 -07:00
Paul Chaignon	0780f54ab1	selftests/bpf: Tests for is_scalar_branch_taken tnum logic This patch adds tests for the new jeq and jne logic in is_scalar_branch_taken. The following shows the first test failing before the previous patch is applied. Once the previous patch is applied, the verifier can use the tnum values to deduce that instruction 7 is dead code. 0: call bpf_get_prandom_u32#7 ; R0_w=scalar() 1: w0 = w0 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) 2: r0 >>= 30 ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=3,var_off=(0x0; 0x3)) 3: r0 <<= 30 ; R0_w=scalar(smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000)) 4: r1 = r0 ; R0_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000)) R1_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000)) 5: r1 += 1024 ; R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000400,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000)) 6: if r1 != r0 goto pc+1 ; R0_w=scalar(id=1,smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000000,var_off=(0x400; 0xc0000000)) R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000)) 7: r10 = 0 frame pointer is read only Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/550004f935e2553bdb2fb1f09cbde7d0452112d0.1755694148.git.paul.chaignon@gmail.com	2025-08-22 18:12:34 +02:00
Paul Chaignon	f41345f47f	bpf: Use tnums for JEQ/JNE is_branch_taken logic In the following toy program (reg states minimized for readability), R0 and R1 always have different values at instruction 6. This is obvious when reading the program but cannot be guessed from ranges alone as they overlap (R0 in [0; 0xc0000000], R1 in [1024; 0xc0000400]). 0: call bpf_get_prandom_u32#7 ; R0_w=scalar() 1: w0 = w0 ; R0_w=scalar(var_off=(0x0; 0xffffffff)) 2: r0 >>= 30 ; R0_w=scalar(var_off=(0x0; 0x3)) 3: r0 <<= 30 ; R0_w=scalar(var_off=(0x0; 0xc0000000)) 4: r1 = r0 ; R1_w=scalar(var_off=(0x0; 0xc0000000)) 5: r1 += 1024 ; R1_w=scalar(var_off=(0x400; 0xc0000000)) 6: if r1 != r0 goto pc+1 Looking at tnums however, we can deduce that R1 is always different from R0 because their tnums don't agree on known bits. This patch uses this logic to improve is_scalar_branch_taken in case of BPF_JEQ and BPF_JNE. This change has a tiny impact on complexity, which was measured with the Cilium complexity CI test. That test covers 72 programs with various build and load time configurations for a total of 970 test cases. For 80% of test cases, the patch has no impact. On the other test cases, the patch decreases complexity by only 0.08% on average. In the best case, the verifier needs to walk 3% less instructions and, in the worst case, 1.5% more. Overall, the patch has a small positive impact, especially for our largest programs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/be3ee70b6e489c49881cb1646114b1d861b5c334.1755694147.git.paul.chaignon@gmail.com	2025-08-22 18:12:24 +02:00
Hengqi Chen	21aeabb682	selftests/bpf: Use vmlinux.h for BPF programs Some of the bpf test progs still use linux/libc headers. Let's use vmlinux.h instead like the rest of test progs. This will also ease cross compiling. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250821030254.398826-1-hengqi.chen@gmail.com	2025-08-21 11:32:25 -07:00
Cryolitia PukNgae	78e097fbca	libbpf: Add documentation to version and error API functions Add documentation for the following API functions: - libbpf_major_version() - libbpf_minor_version() - libbpf_version_string() - libbpf_strerror() Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250820-libbpf-doc-1-v1-1-13841f25a134@uniontech.com	2025-08-20 13:33:02 -07:00

1 2 3 4 5 ...

1381800 Commits