linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-12-27 12:21:22 -05:00

Author	SHA1	Message	Date
Menglong Dong	ae4a3160d1	bpf: specify the old and new poke_type for bpf_arch_text_poke In the origin logic, the bpf_arch_text_poke() assume that the old and new instructions have the same opcode. However, they can have different opcode if we want to replace a "call" insn with a "jmp" insn. Therefore, add the new function parameter "old_t" along with the "new_t", which are used to indicate the old and new poke type. Meanwhile, adjust the implement of bpf_arch_text_poke() for all the archs. "BPF_MOD_NOP" is added to make the code more readable. In bpf_arch_text_poke(), we still check if the new and old address is NULL to determine if nop insn should be used, which I think is more safe. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20251118123639.688444-6-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-24 09:47:03 -08:00
Menglong Dong	47c9214dcb	bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME Some places calculate the origin_call by checking if BPF_TRAMP_F_SKIP_FRAME is set. However, it should use BPF_TRAMP_F_ORIG_STACK for this propose. Just fix them. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20251118123639.688444-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-24 09:47:03 -08:00
Linus Torvalds	ae28ed4578	Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc (Amery Hung) Applied as a stable branch in bpf-next and net-next trees. - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki) Also a stable branch in bpf-next and net-next trees. - Enforce expected_attach_type for tailcall compatibility (Daniel Borkmann) - Replace path-sensitive with path-insensitive live stack analysis in the verifier (Eduard Zingerman) This is a significant change in the verification logic. More details, motivation, long term plans are in the cover letter/merge commit. - Support signed BPF programs (KP Singh) This is another major feature that took years to materialize. Algorithm details are in the cover letter/marge commit - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich) - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan) - Fix USDT SIB argument handling in libbpf (Jiawei Zhao) - Allow uprobe-bpf program to change context registers (Jiri Olsa) - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and Puranjay Mohan) - Allow access to union arguments in tracing programs (Leon Hwang) - Optimize rcu_read_lock() + migrate_disable() combination where it's used in BPF subsystem (Menglong Dong) - Introduce bpf_task_work_schedule() kfuncs to schedule deferred execution of BPF callback in the context of a specific task using the kernel’s task_work infrastructure (Mykyta Yatsenko) - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya Dwivedi) - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi) - Improve the precision of tnum multiplier verifier operation (Nandakumar Edamana) - Use tnums to improve is_branch_taken() logic (Paul Chaignon) - Add support for atomic operations in arena in riscv JIT (Pu Lehui) - Report arena faults to BPF error stream (Puranjay Mohan) - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin Monnet) - Add bpf_strcasecmp() kfunc (Rong Tao) - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao Chen) tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits) libbpf: Replace AF_ALG with open coded SHA-256 selftests/bpf: Add stress test for rqspinlock in NMI selftests/bpf: Add test case for different expected_attach_type bpf: Enforce expected_attach_type for tailcall compatibility bpftool: Remove duplicate string.h header bpf: Remove duplicate crypto/sha2.h header libbpf: Fix error when st-prefix_ops and ops from differ btf selftests/bpf: Test changing packet data from kfunc selftests/bpf: Add stacktrace map lookup_and_delete_elem test case selftests/bpf: Refactor stacktrace_map case with skeleton bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE selftests/bpf: Fix flaky bpf_cookie selftest selftests/bpf: Test changing packet data from global functions with a kfunc bpf: Emit struct bpf_xdp_sock type in vmlinux BTF selftests/bpf: Task_work selftest cleanup fixes MAINTAINERS: Delete inactive maintainers from AF_XDP bpf: Mark kfuncs as __noclone selftests/bpf: Add kprobe multi write ctx attach test selftests/bpf: Add kprobe write ctx attach test selftests/bpf: Add uprobe context ip register change test ...	2025-09-30 17:58:11 -07:00
Linus Torvalds	a5ba183bde	Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One notable addition is the creation of the 'transitional' keyword for kconfig so CONFIG renaming can go more smoothly. This has been a long-standing deficiency, and with the renaming of CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI support), this came up again. The breadth of the diffstat is mainly this renaming. - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva) - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure (Junjie Cao) - Add str_assert_deassert() helper (Lad Prabhakar) - gcc-plugins: Remove TODO_verify_il for GCC >= 16 - kconfig: Fix BrokenPipeError warnings in selftests - kconfig: Add transitional symbol attribute for migration support - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI" * tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/string_choices: Add str_assert_deassert() helper kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI kconfig: Add transitional symbol attribute for migration support kconfig: Fix BrokenPipeError warnings in selftests gcc-plugins: Remove TODO_verify_il for GCC >= 16 stddef: Introduce __TRAILING_OVERLAP() stddef: Remove token-pasting in TRAILING_OVERLAP() lkdtm: fortify: Fix potential NULL dereference on kmalloc failure	2025-09-29 17:48:27 -07:00
Kees Cook	23ef9d4397	kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI The kernel's CFI implementation uses the KCFI ABI specifically, and is not strictly tied to a particular compiler. In preparation for GCC supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with associated options). Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will enable CONFIG_CFI during olddefconfig. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>	2025-09-24 14:29:14 -07:00
Chenghao Duan	d0bf7cd5df	riscv: bpf: Fix uninitialized symbol 'retval_off' In the __arch_prepare_bpf_trampoline() function, retval_off is only meaningful when save_ret is true, so the current logic is correct. However, in the original logic, retval_off is only initialized under certain conditions; for example, in the fmod_ret logic, the compiler is not aware that the flags of the fmod_ret program (prog) have set BPF_TRAMP_F_CALL_ORIG, which results in an uninitialized symbol compilation warning. So initialize retval_off unconditionally to fix it. Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20250922062244.822937-2-duanchenghao@kylinos.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:29:03 -07:00
Kumar Kartikeya Dwivedi	a91ae3c893	bpf, x86: Add support for signed arena loads Currently, signed load instructions into arena memory are unsupported. The compiler is free to generate these, and on GCC-14 we see a corresponding error when it happens. The hurdle in supporting them is deciding which unused opcode to use to mark them for the JIT's own consumption. After much thinking, it appears 0xc0 / BPF_NOSPEC can be combined with load instructions to identify signed arena loads. Use this to recognize and JIT them appropriately, and remove the verifier side limitation on the program if the JIT supports them. Co-developed-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20250923110157.18326-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:00:22 -07:00
Hengqi Chen	fd2e081289	riscv, bpf: Sign extend struct ops return values properly The ns_bpf_qdisc selftest triggers a kernel panic: Unable to handle kernel paging request at virtual address ffffffffa38dbf58 Current test_progs pgtable: 4K pagesize, 57-bit VAs, pgdp=0x00000001109cc000 [ffffffffa38dbf58] pgd=000000011fffd801, p4d=000000011fffd401, pud=000000011fffd001, pmd=0000000000000000 Oops [#1] Modules linked in: bpf_testmod(OE) xt_conntrack nls_iso8859_1 [...] [last unloaded: bpf_testmod(OE)] CPU: 1 UID: 0 PID: 23584 Comm: test_progs Tainted: G W OE 6.17.0-rc1-g2465bb83e0b4 #1 NONE Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2024.01+dfsg-1ubuntu5.1 01/01/2024 epc : __qdisc_run+0x82/0x6f0 ra : __qdisc_run+0x6e/0x6f0 epc : ffffffff80bd5c7a ra : ffffffff80bd5c66 sp : ff2000000eecb550 gp : ffffffff82472098 tp : ff60000096895940 t0 : ffffffff8001f180 t1 : ffffffff801e1664 t2 : 0000000000000000 s0 : ff2000000eecb5d0 s1 : ff60000093a6a600 a0 : ffffffffa38dbee8 a1 : 0000000000000001 a2 : ff2000000eecb510 a3 : 0000000000000001 a4 : 0000000000000000 a5 : 0000000000000010 a6 : 0000000000000000 a7 : 0000000000735049 s2 : ffffffffa38dbee8 s3 : 0000000000000040 s4 : ff6000008bcda000 s5 : 0000000000000008 s6 : ff60000093a6a680 s7 : ff60000093a6a6f0 s8 : ff60000093a6a6ac s9 : ff60000093140000 s10: 0000000000000000 s11: ff2000000eecb9d0 t3 : 0000000000000000 t4 : 0000000000ff0000 t5 : 0000000000000000 t6 : ff60000093a6a8b6 status: 0000000200000120 badaddr: ffffffffa38dbf58 cause: 000000000000000d [<ffffffff80bd5c7a>] __qdisc_run+0x82/0x6f0 [<ffffffff80b6fe58>] __dev_queue_xmit+0x4c0/0x1128 [<ffffffff80b80ae0>] neigh_resolve_output+0xd0/0x170 [<ffffffff80d2daf6>] ip6_finish_output2+0x226/0x6c8 [<ffffffff80d31254>] ip6_finish_output+0x10c/0x2a0 [<ffffffff80d31446>] ip6_output+0x5e/0x178 [<ffffffff80d2e232>] ip6_xmit+0x29a/0x608 [<ffffffff80d6f4c6>] inet6_csk_xmit+0xe6/0x140 [<ffffffff80c985e4>] __tcp_transmit_skb+0x45c/0xaa8 [<ffffffff80c995fe>] tcp_connect+0x9ce/0xd10 [<ffffffff80d66524>] tcp_v6_connect+0x4ac/0x5e8 [<ffffffff80cc19b8>] __inet_stream_connect+0xd8/0x318 [<ffffffff80cc1c36>] inet_stream_connect+0x3e/0x68 [<ffffffff80b42b20>] __sys_connect_file+0x50/0x88 [<ffffffff80b42bee>] __sys_connect+0x96/0xc8 [<ffffffff80b42c40>] __riscv_sys_connect+0x20/0x30 [<ffffffff80e5bcae>] do_trap_ecall_u+0x256/0x378 [<ffffffff80e69af2>] handle_exception+0x14a/0x156 Code: 892a 0363 1205 489c 8bc1 c7e5 2d03 084a 2703 080a (2783) 0709 ---[ end trace 0000000000000000 ]--- The bpf_fifo_dequeue prog returns a skb which is a pointer. The pointer is treated as a 32bit value and sign extend to 64bit in epilogue. This behavior is right for most bpf prog types but wrong for struct ops which requires RISC-V ABI. So let's sign extend struct ops return values according to the function model and RISC-V ABI([0]). [0]: https://riscv.org/wp-content/uploads/2024/12/riscv-calling.pdf Fixes: `25ad10658d` ("riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework") Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/bpf/20250908012448.1695-1-hengqi.chen@gmail.com	2025-09-15 13:21:32 +02:00
Hengqi Chen	6798668ab1	riscv, bpf: Remove duplicated bpf_flush_icache() The bpf_flush_icache() is done by bpf_arch_text_copy() already. Remove the duplicated one in arch_prepare_bpf_trampoline(). Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/bpf/20250904105119.21861-1-hengqi.chen@gmail.com	2025-09-15 13:16:55 +02:00
Alexei Starovoitov	5d87e96a49	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5 Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-11 09:34:37 -07:00
Radim Krčmář	8a16586fa7	riscv, bpf: use lw when reading int cpu in bpf_get_smp_processor_id emit_ld is wrong, because thread_info.cpu is 32-bit, not xlen-bit wide. The struct currently has a hole after cpu, so little endian accesses seemed fine. Fixes: `2ddec2c80b` ("riscv, bpf: inline bpf_get_smp_processor_id()") Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20250812090256.757273-4-rkrcmar@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-05 15:31:20 -06:00
Radim Krčmář	ad5348c765	riscv, bpf: use lw when reading int cpu in BPF_MOV64_PERCPU_REG emit_ld is wrong, because thread_info.cpu is 32-bit, not xlen-bit wide. The struct currently has a hole after cpu, so little endian accesses seemed fine. Fixes: `19c56d4e5b` ("riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs") Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250812090256.757273-3-rkrcmar@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2025-09-05 15:31:17 -06:00
Pu Lehui	fb7cefabae	riscv, bpf: Add support arena atomics for RV64 Add arena atomics support for RMW atomics and load-acquire and store-release instructions. Non-Zacas cmpxchg is implemented via loop, which is not currently supported because it requires more complex extable and loop logic. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250719091730.2660197-10-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Pu Lehui	b18f4aae6a	riscv, bpf: Add ex_insn_off and ex_jmp_off for exception table handling Add ex_insn_off and ex_jmp_off fields to struct rv_jit_context so that add_exception_handler() does not need to be immediately followed by the instruction to add the exception table. ex_insn_off indicates the offset of the instruction to add the exception table, and ex_jmp_off indicates the offset to jump over the faulting instruction. This is to prepare for adding the exception table to atomic instructions later, because some atomic instructions need to perform zext or other operations. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250719091730.2660197-9-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Pu Lehui	1c0196b878	riscv, bpf: Optimize cmpxchg insn with Zacas support Optimize cmpxchg instruction with amocas.w and amocas.d Zacas instructions. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250719091730.2660197-8-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Pu Lehui	de39d2c4cd	riscv, bpf: Add Zacas instructions Add Zacas instructions introduced by [0] to reduce code size and improve performance of RV64 JIT. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://github.com/riscvarchive/riscv-zacas/releases/download/v1.0/riscv-zacas.pdf [0] Link: https://lore.kernel.org/bpf/20250719091730.2660197-7-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Pu Lehui	5090b339ee	riscv, bpf: Add rv_ext_enabled macro for runtime detection extentsion Add rv_ext_enabled macro to check whether the runtime detection extension is enabled. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250719091730.2660197-6-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Pu Lehui	01422a4f2c	riscv, bpf: Extract emit_ldx() helper There's a lot of redundant code related to load into register operations, let's extract emit_ldx() to make code more compact. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250719091730.2660197-4-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Pu Lehui	d92c11a6b5	riscv, bpf: Extract emit_st() helper There's a lot of redundant code related to store from immediate operations, let's extract emit_st() to make code more compact. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250719091730.2660197-3-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Pu Lehui	02fc01adec	riscv, bpf: Extract emit_stx() helper There's a lot of redundant code related to store from register operations, let's extract emit_stx() to make code more compact. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250719091730.2660197-2-pulehui@huaweicloud.com	2025-08-15 10:46:51 +02:00
Peilin Ye	db7a3822b5	bpf, riscv64: Skip redundant zext instruction after load-acquire Currently, the verifier inserts a zext instruction right after every 8-, 16- or 32-bit load-acquire, which is already zero-extending. Skip such redundant zext instructions. While we are here, update that already-obsolete comment about "skip the next instruction" in build_body(). Also change emit_atomic_rmw()'s parameters to keep it consistent with emit_atomic_ld_st(). Note that checking 'insn[1]' relies on 'insn' not being the last instruction, which should have been guaranteed by the verifier; we already use 'insn[1]' elsewhere in the file for similar purposes. Additionally, we don't check if 'insn[1]' is actually a zext for our load-acquire's dst_reg, or some other registers - in other words, here we are relying on the verifier to always insert a redundant zext right after a 8/16/32-bit load-acquire, for its dst_reg. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/10e90e0eab042f924d35ad0d1c1f7ca29f673152.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-05-09 10:05:27 -07:00
Andrea Parri	8afd3170d5	bpf, riscv64: Support load-acquire and store-release instructions Support BPF load-acquire (BPF_LOAD_ACQ) and store-release (BPF_STORE_REL) instructions in the riscv64 JIT compiler. For example, consider the following 64-bit load-acquire (assuming little-endian): db 10 00 00 00 01 00 00 r1 = load_acquire((u64 )(r1 + 0x0)) 95 00 00 00 00 00 00 00 exit opcode (0xdb): BPF_ATOMIC \| BPF_DW \| BPF_STX imm (0x00000100): BPF_LOAD_ACQ The JIT compiler will emit an LD instruction followed by a FENCE R,RW instruction for the above, e.g.: ld x7,0(x6) fence r,rw Similarly, consider the following 16-bit store-release: cb 21 00 00 10 01 00 00 store_release((u16 )(r1 + 0x0), w2) 95 00 00 00 00 00 00 00 exit opcode (0xcb): BPF_ATOMIC \| BPF_H \| BPF_STX imm (0x00000110): BPF_STORE_REL A FENCE RW,W instruction followed by an SH instruction will be emitted, e.g.: fence rw,w sh x2,0(x4) 8-bit and 16-bit load-acquires are zero-extending (cf., LBU, LHU). The verifier always rejects misaligned load-acquires/store-releases (even if BPF_F_ANY_ALIGNMENT is set), so the emitted load and store instructions are guaranteed to be single-copy atomic. Introduce primitives to emit the relevant (and the most common/used in the kernel) fences, i.e. fences with R -> RW, RW -> W and RW -> RW. Rename emit_atomic() to emit_atomic_rmw() to make it clear that it only handles RMW atomics, and replace its is64 parameter to allow to perform the required checks on the opsize (BPF_SIZE(code)). Acked-by: Björn Töpel <bjorn@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Co-developed-by: Peilin Ye <yepeilin@google.com> Signed-off-by: Peilin Ye <yepeilin@google.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/3059c560e537ad43ed19055d2ebbd970c698095a.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-05-09 10:05:27 -07:00
Andrea Parri	118ae46b79	bpf, riscv64: Introduce emit_load_() and emit_store_() We're planning to add support for the load-acquire and store-release BPF instructions. Define emit_load_<size>() and emit_store_<size>() to enable/facilitate the (re)use of their code. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Tested-by: Peilin Ye <yepeilin@google.com> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> [yepeilin@google.com: cosmetic change to commit title] Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/fce89473a5748e1631d18a5917d953460d1ae0d0.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-05-09 10:05:27 -07:00
Mike Rapoport (Microsoft)	0c3beacf68	asm-generic: introduce text-patching.h Several architectures support text patching, but they name the header files that declare patching functions differently. Make all such headers consistently named text-patching.h and add an empty header in asm-generic for architectures that do not support text patching. Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-07 14:25:15 -08:00
Andrea Parri	e59db0623f	riscv, bpf: Make BPF_CMPXCHG fully ordered According to the prototype formal BPF memory consistency model discussed e.g. in [1] and following the ordering properties of the C/in-kernel macro atomic_cmpxchg(), a BPF atomic operation with the BPF_CMPXCHG modifier is fully ordered. However, the current RISC-V JIT lowerings fail to meet such memory ordering property. This is illustrated by the following litmus test: BPF BPF__MP+success_cmpxchg+fence { 0:r1=x; 0:r3=y; 0:r5=1; 1:r2=y; 1:r4=f; 1:r7=x; } P0 \| P1 ; (u64 )(r1 + 0) = 1 \| r1 = (u64 )(r2 + 0) ; r2 = cmpxchg_64 (r3 + 0, r4, r5) \| r3 = atomic_fetch_add((u64 )(r4 + 0), r5) ; \| r6 = (u64 *)(r7 + 0) ; exists (1:r1=1 /\ 1:r6=0) whose "exists" clause is not satisfiable according to the BPF memory model. Using the current RISC-V JIT lowerings, the test can be mapped to the following RISC-V litmus test: RISCV RISCV__MP+success_cmpxchg+fence { 0:x1=x; 0:x3=y; 0:x5=1; 1:x2=y; 1:x4=f; 1:x7=x; } P0 \| P1 ; sd x5, 0(x1) \| ld x1, 0(x2) ; L00: \| amoadd.d.aqrl x3, x5, 0(x4) ; lr.d x2, 0(x3) \| ld x6, 0(x7) ; bne x2, x4, L01 \| ; sc.d x6, x5, 0(x3) \| ; bne x6, x4, L00 \| ; fence rw, rw \| ; L01: \| ; exists (1:x1=1 /\ 1:x6=0) where the two stores in P0 can be reordered. Update the RISC-V JIT lowerings/implementation of BPF_CMPXCHG to emit an SC with RELEASE ("rl") annotation in order to meet the expected memory ordering guarantees. The resulting RISC-V JIT lowerings of BPF_CMPXCHG match the RISC-V lowerings of the C atomic_cmpxchg(). Other lowerings were fixed via `20a759df3b` ("riscv, bpf: make some atomic operations fully ordered"). Fixes: `dd642ccb45` ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lpc.events/event/18/contributions/1949/attachments/1665/3441/bpfmemmodel.2024.09.19p.pdf [1] Link: https://lore.kernel.org/bpf/20241017143628.2673894-1-parri.andrea@gmail.com	2024-10-17 17:14:48 +02:00
Pu Lehui	30a59cc797	riscv, bpf: Fix possible infinite tailcall when CONFIG_CFI_CLANG is enabled When CONFIG_CFI_CLANG is enabled, the number of prologue instructions skipped by tailcall needs to include the kcfi instruction, otherwise the TCC will be initialized every tailcall is called, which may result in infinite tailcalls. Fixes: `e63985ecd2` ("bpf, riscv64/cfi: Support kCFI + BPF on riscv64") Signed-off-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20241008124544.171161-1-pulehui@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-10-09 18:23:06 -07:00
Linus Torvalds	f557af081d	Merge tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various new ISA extensions: * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector extension * Zimop and Zcmop for may-be-operations * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension * Zawrs - riscv,cpu-intc is now dtschema - A handful of performance improvements and cleanups to text patching - Support for memory hot{,un}plug - The highest user-allocatable virtual address is now visible in hwprobe * tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (58 commits) riscv: lib: relax assembly constraints in hweight riscv: set trap vector earlier KVM: riscv: selftests: Add Zawrs extension to get-reg-list test KVM: riscv: Support guest wrs.nto riscv: hwprobe: export Zawrs ISA extension riscv: Add Zawrs support for spinlocks dt-bindings: riscv: Add Zawrs ISA extension description riscv: Provide a definition for 'pause' riscv: hwprobe: export highest virtual userspace address riscv: Improve sbi_ecall() code generation by reordering arguments riscv: Add tracepoints for SBI calls and returns riscv: Optimize crc32 with Zbc extension riscv: Enable DAX VMEMMAP optimization riscv: mm: Add support for ZONE_DEVICE virtio-mem: Enable virtio-mem for RISC-V riscv: Enable memory hotplugging for RISC-V riscv: mm: Take memory hotplug read-lock during kernel page table dump riscv: mm: Add memory hotplugging support riscv: mm: Add pfn_to_kaddr() implementation riscv: mm: Refactor create_linear_mapping_range() for memory hot add ...	2024-07-20 09:11:27 -07:00
Puranjay Mohan	a5912c37fa	riscv, bpf: Optimize stack usage of trampoline When BPF_TRAMP_F_CALL_ORIG is not set, stack space for passing arguments on stack doesn't need to be reserved because the original function is not called. Only reserve space for stacked arguments when BPF_TRAMP_F_CALL_ORIG is set. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/bpf/20240708114758.64414-1-puranjay@kernel.org	2024-07-08 15:44:08 +02:00
Pu Lehui	6801b0aef7	riscv, bpf: Add 12-argument support for RV64 bpf trampoline This patch adds 12 function arguments support for riscv64 bpf trampoline. The current bpf trampoline supports <= sizeof(u64) bytes scalar arguments [0] and <= 16 bytes struct arguments [1]. Therefore, we focus on the situation where scalars are at most XLEN bits and aggregates whose total size does not exceed 2×XLEN bits in the riscv calling convention [2]. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Puranjay Mohan <puranjay@kernel.org> Link: https://elixir.bootlin.com/linux/v6.8/source/kernel/bpf/btf.c#L6184 [0] Link: https://elixir.bootlin.com/linux/v6.8/source/kernel/bpf/btf.c#L6769 [1] Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/draft-20230929-e5c800e661a53efe3c2678d71a306323b60eb13b/riscv-abi.pdf [2] Link: https://lore.kernel.org/bpf/20240702121944.1091530-2-pulehui@huaweicloud.com	2024-07-02 16:01:30 +02:00
Pu Lehui	2382a405c5	riscv, bpf: Use bpf_prog_pack for RV64 bpf trampoline We used bpf_prog_pack to aggregate bpf programs into huge page to relieve the iTLB pressure on the system. We can apply it to bpf trampoline, as Song had been implemented it in core and x86 [0]. This patch is going to use bpf_prog_pack to RV64 bpf trampoline. Since Song and Puranjay have done a lot of work for bpf_prog_pack on RV64, implementing this function will be easy. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> #riscv Link: https://lore.kernel.org/all/20231206224054.492250-1-song@kernel.org [0] Link: https://lore.kernel.org/bpf/20240622030437.3973492-4-pulehui@huaweicloud.com	2024-07-01 17:10:46 +02:00
Pu Lehui	9f1e16fb1f	riscv, bpf: Fix out-of-bounds issue when preparing trampoline image We get the size of the trampoline image during the dry run phase and allocate memory based on that size. The allocated image will then be populated with instructions during the real patch phase. But after commit `26ef208c20` ("bpf: Use arch_bpf_trampoline_size"), the `im` argument is inconsistent in the dry run and real patch phase. This may cause emit_imm in RV64 to generate a different number of instructions when generating the 'im' address, potentially causing out-of-bounds issues. Let's emit the maximum number of instructions for the "im" address during dry run to fix this problem. Fixes: `26ef208c20` ("bpf: Use arch_bpf_trampoline_size") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240622030437.3973492-3-pulehui@huaweicloud.com	2024-07-01 17:10:46 +02:00
Samuel Holland	51781ce8f4	riscv: Pass patch_text() the length in bytes patch_text_nosync() already handles an arbitrary length of code, so this removes a superfluous loop and reduces the number of icache flushes. Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240327160520.791322-6-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-06-26 07:36:31 -07:00
Rafael Passos	9919c5c98c	bpf: remove unused parameter in bpf_jit_binary_pack_finalize Fixes a compiler warning. the bpf_jit_binary_pack_finalize function was taking an extra bpf_prog parameter that went unused. This removves it and updates the callers accordingly. Signed-off-by: Rafael Passos <rafael@rcpassos.me> Link: https://lore.kernel.org/r/20240615022641.210320-2-rafael@rcpassos.me Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-06-20 19:50:26 -07:00
Xiao Wang	96a27ee76f	riscv, bpf: Introduce shift add helper with Zba optimization Zba extension is very useful for generating addresses that index into array of basic data types. This patch introduces sh2add and sh3add helpers for RV32 and RV64 respectively, to accelerate addressing for array of unsigned long data. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240524075543.4050464-3-xiao.w.wang@intel.com	2024-06-03 16:45:23 +02:00
Xiao Wang	99fa63d9ca	riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT We could try to emit compressed insn for reg move operation during CMPXCHG JIT, the instruction compression has no impact on the jump offsets of following forward and backward jump instructions. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240519050507.2217791-1-xiao.w.wang@intel.com	2024-05-24 17:40:33 +02:00
Xiao Wang	e944fc8152	riscv, bpf: Use STACK_ALIGN macro for size rounding up Use the macro STACK_ALIGN that is defined in asm/processor.h for stack size rounding up, just like bpf_jit_comp32.c does. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/bpf/20240523031835.3977713-1-xiao.w.wang@intel.com	2024-05-24 17:16:56 +02:00
Xiao Wang	c12603e76e	riscv, bpf: Optimize zextw insn with Zba extension The Zba extension provides add.uw insn which can be used to implement zext.w with rs2 set as ZERO. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/bpf/20240516090430.493122-1-xiao.w.wang@intel.com	2024-05-24 16:53:12 +02:00
Linus Torvalds	a49468240e	Merge tag 'modules-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: "Finally something fun. Mike Rapoport does some cleanup to allow us to take out module_alloc() out of modules into a new paint shedded execmem_alloc() and execmem_free() so to make emphasis these helpers are actually used outside of modules. It starts with a non-functional changes API rename / placeholders to then allow architectures to define their requirements into a new shiny struct execmem_info with ranges, and requirements for those ranges. Archs now can intitialize this execmem_info as the last part of mm_core_init() if they have to diverge from the norm. Each range is a known type clearly articulated and spelled out in enum execmem_type. Although a lot of this is major cleanup and prep work for future enhancements an immediate clear gain is we get to enable KPROBES without MODULES now. That is ultimately what motiviated to pick this work up again, now with smaller goal as concrete stepping stone" * tag 'modules-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of kprobes: remove dependency on CONFIG_MODULES powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate x86/ftrace: enable dynamic ftrace without CONFIG_MODULES arch: make execmem setup available regardless of CONFIG_MODULES powerpc: extend execmem_params for kprobes allocations arm64: extend execmem_info for generated code allocations riscv: extend execmem_params for generated code allocations mm/execmem, arch: convert remaining overrides of module_alloc to execmem mm/execmem, arch: convert simple overrides of module_alloc to execmem mm: introduce execmem_alloc() and execmem_free() module: make module_memory_{alloc,free} more self-contained sparc: simplify module_alloc() nios2: define virtual address space for modules mips: module: rename MODULE_START to MODULES_VADDR arm64: module: remove unneeded call to kasan_alloc_module_shadow() kallsyms: replace deprecated strncpy with strscpy module: allow UNUSED_KSYMS_WHITELIST to be relative against objtree.	2024-05-15 14:05:08 -07:00
Mike Rapoport (IBM)	4d7b321a9c	riscv: extend execmem_params for generated code allocations The memory allocations for kprobes and BPF on RISC-V are not placed in the modules area and these custom allocations are implemented with overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END for 32 bit and slightly reorder execmem_params initialization to support both 32 and 64 bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in riscv::execmem_params and drop overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2024-05-14 00:31:43 -07:00
Jakub Kicinski	6e62702feb	Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-05-13 We've added 119 non-merge commits during the last 14 day(s) which contain a total of 134 files changed, 9462 insertions(+), 4742 deletions(-). The main changes are: 1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi. 2) Add BPF range computation improvements to the verifier in particular around XOR and OR operators, refactoring of checks for range computation and relaxing MUL range computation so that src_reg can also be an unknown scalar, from Cupertino Miranda. 3) Add support to attach kprobe BPF programs through kprobe_multi link in a session mode, meaning, a BPF program is attached to both function entry and return, the entry program can decide if the return program gets executed and the entry program can share u64 cookie value with return program. Session mode is a common use-case for tetragon and bpftrace, from Jiri Olsa. 4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf as well as BPF selftest's struct_ops handling, from Andrii Nakryiko. 5) Improvements to BPF selftests in context of BPF gcc backend, from Jose E. Marchesi & David Faust. 6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test- -style in order to retire the old test, run it in BPF CI and additionally expand test coverage, from Jordan Rife. 7) Big batch for BPF selftest refactoring in order to remove duplicate code around common network helpers, from Geliang Tang. 8) Another batch of improvements to BPF selftests to retire obsolete bpf_tcp_helpers.h as everything is available vmlinux.h, from Martin KaFai Lau. 9) Fix BPF map tear-down to not walk the map twice on free when both timer and wq is used, from Benjamin Tissoires. 10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL, from Alexei Starovoitov. 11) Change BTF build scripts to using --btf_features for pahole v1.26+, from Alan Maguire. 12) Small improvements to BPF reusing struct_size() and krealloc_array(), from Andy Shevchenko. 13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions, from Ilya Leoshkevich. 14) Extend TCP ->cong_control() callback in order to feed in ack and flag parameters and allow write-access to tp->snd_cwnd_stamp from BPF program, from Miao Xu. 15) Add support for internal-only per-CPU instructions to inline bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs, from Puranjay Mohan. 16) Follow-up to remove the redundant ethtool.h from tooling infrastructure, from Tushar Vyavahare. 17) Extend libbpf to support "module:<function>" syntax for tracing programs, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) bpf: make list_for_each_entry portable bpf: ignore expected GCC warning in test_global_func10.c bpf: disable strict aliasing in test_global_func9.c selftests/bpf: Free strdup memory in xdp_hw_metadata selftests/bpf: Fix a few tests for GCC related warnings. bpf: avoid gcc overflow warning in test_xdp_vlan.c tools: remove redundant ethtool.h from tooling infra selftests/bpf: Expand ATTACH_REJECT tests selftests/bpf: Expand getsockname and getpeername tests sefltests/bpf: Expand sockaddr hook deny tests selftests/bpf: Expand sockaddr program return value tests selftests/bpf: Retire test_sock_addr.(c\|sh) selftests/bpf: Remove redundant sendmsg test cases selftests/bpf: Migrate ATTACH_REJECT test cases selftests/bpf: Migrate expected_attach_type tests selftests/bpf: Migrate wildcard destination rewrite test selftests/bpf: Migrate sendmsg6 v4 mapped address tests selftests/bpf: Migrate sendmsg deny test cases selftests/bpf: Migrate WILDCARD_IP test selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases ... ==================== Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-13 16:41:10 -07:00
Puranjay Mohan	20a759df3b	riscv, bpf: make some atomic operations fully ordered The BPF atomic operations with the BPF_FETCH modifier along with BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements all atomic operations except BPF_CMPXCHG with relaxed ordering. Section 8.1 of the "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic Instructions" says: \| To provide more efficient support for release consistency [5], each \| atomic instruction has two bits, aq and rl, used to specify additional \| memory ordering constraints as viewed by other RISC-V harts. and \| If only the aq bit is set, the atomic memory operation is treated as \| an acquire access. \| If only the rl bit is set, the atomic memory operation is treated as a \| release access. \| \| If both the aq and rl bits are set, the atomic memory operation is \| sequentially consistent. Fix this by setting both aq and rl bits as 1 for operations with BPF_FETCH and BPF_XCHG. [1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf Fixes: `dd642ccb45` ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-05-12 17:00:06 -07:00
Xiao Wang	80c5a07ae6	riscv, bpf: Fix typo in comment We can use either "instruction" or "insn" in the comment. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-05-12 16:56:43 -07:00
Puranjay Mohan	2ddec2c80b	riscv, bpf: inline bpf_get_smp_processor_id() Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit. RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) register. This makes it trivial to get the CPU's processor id. As thread_info is the first member of task_struct, we can read the processor id from TP + offsetof(struct thread_info, cpu). RISCV64 JIT output for `call bpf_get_smp_processor_id` ====================================================== Before After -------- ------- auipc t1,0x848c ld a5,32(tp) jalr 604(t1) mv a5,a0 Benchmark using [1] on Qemu. ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+------------------+------------------+--------------+ \| Name \| Before \| After \| % change \| \|---------------+------------------+------------------+--------------\| \| glob-arr-inc \| 1.077 ± 0.006M/s \| 1.336 ± 0.010M/s \| + 24.04% \| \| arr-inc \| 1.078 ± 0.002M/s \| 1.332 ± 0.015M/s \| + 23.56% \| \| hash-inc \| 0.494 ± 0.004M/s \| 0.653 ± 0.001M/s \| + 32.18% \| +---------------+------------------+------------------+--------------+ NOTE: This benchmark includes changes from this patch and the previous patch that implemented the per-cpu insn. [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-05-12 16:54:34 -07:00
Puranjay Mohan	19c56d4e5b	riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use them directly. They will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs. RISC-V uses generic per-cpu implementation where the offsets for CPUs are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores the address of the task_struct in TP register. The first element in task_struct is struct thread_info, and we can get the cpu number by reading from the TP register + offsetof(struct thread_info, cpu). Once we have the cpu number in a register we read the offset for that cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this offset to the destination register. To measure the improvement from this change, the benchmark in [1] was used on Qemu: Before: glob-arr-inc : 1.127 ± 0.013M/s arr-inc : 1.121 ± 0.004M/s hash-inc : 0.681 ± 0.052M/s After: glob-arr-inc : 1.138 ± 0.011M/s arr-inc : 1.366 ± 0.006M/s hash-inc : 0.676 ± 0.001M/s [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-05-12 16:54:34 -07:00
Jakub Kicinski	e958da0ddb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c `66e13b615a` ("bpf: verifier: prevent userspace memory access") `d503a04f8b` ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 12:06:25 -07:00
Xu Kuohai	10541b374a	riscv, bpf: Fix incorrect runtime stats When __bpf_prog_enter() returns zero, the s1 register is not set to zero, resulting in incorrect runtime stats. Fix it by setting s1 immediately upon the return of __bpf_prog_enter(). Fixes: `49b5e77ae3` ("riscv, bpf: Add bpf trampoline support for RV64") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240416064208.2919073-3-xukuohai@huaweicloud.com	2024-04-16 17:19:41 +02:00
Puranjay Mohan	21ab0b6d0c	bpf, riscv: Implement bpf_addr_space_cast instruction LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space. rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY rY = addr_space_cast(rX, 1, 0) has to be converted by JIT. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240404114203.105970-3-puranjay12@gmail.com	2024-04-04 16:48:42 +02:00
Puranjay Mohan	633a6e01d1	bpf, riscv: Implement PROBE_MEM32 pseudo instructions Add support for [LDX \| STX \| ST], PROBE_MEM32, [B \| H \| W \| DW] instructions. They are similar to PROBE_MEM instructions with the following differences: - PROBE_MEM32 supports store. - PROBE_MEM32 relies on the verifier to clear upper 32-bit of the src/dst register - PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in S7 in the prologue). Due to bpf_arena constructions such S7 + reg + off16 access is guaranteed to be within arena virtual range, so no address check at run-time. - S11 is a free callee-saved register, so it is used to store kern_vm_start - PROBE_MEM32 allows STX and ST. If they fault the store is a nop. When LDX faults the destination register is zeroed. To support these on riscv, we do tmp = S7 + src/dst reg and then use tmp2 as the new src/dst register. This allows us to reuse most of the code for normal [LDX \| STX \| ST]. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240404114203.105970-2-puranjay12@gmail.com	2024-04-04 16:48:10 +02:00
Pu Lehui	443574b033	riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi We encountered a failing case when running selftest in no_alu32 mode: The failure case is `kfunc_call/kfunc_call_test4` and its source code is like bellow: ``` long bpf_kfunc_call_test4(signed char a, short b, int c, long d) __ksym; int kfunc_call_test4(struct __sk_buff *skb) { ... tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000); ... } ``` And its corresponding asm code is: ``` 0: r1 = -3 1: r2 = -30 2: r3 = 0xffffff38 # opcode: 18 03 00 00 38 ff ff ff 00 00 00 00 00 00 00 00 4: r4 = -1000 5: call bpf_kfunc_call_test4 ``` insn 2 is parsed to ld_imm64 insn to emit 0x00000000ffffff38 imm, and converted to int type and then send to bpf_kfunc_call_test4. But since it is zero-extended in the bpf calling convention, riscv jit will directly treat it as an unsigned 32-bit int value, and then fails with the message "actual 4294966063 != expected -1234". The reason is the incompatibility between bpf and riscv abi, that is, bpf will do zero-extension on uint, but riscv64 requires sign-extension on int or uint. We can solve this problem by sign extending the 32-bit parameters in kfunc. The issue is related to [0], and thanks to Yonghong and Alexei. Link: https://github.com/llvm/llvm-project/pull/84874 [0] Fixes: `d40c3847b4` ("riscv, bpf: Add kfunc support for RV64") Signed-off-by: Pu Lehui <pulehui@huawei.com> Tested-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240324103306.2202954-1-pulehui@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-25 11:39:31 -07:00
Puranjay Mohan	e63985ecd2	bpf, riscv64/cfi: Support kCFI + BPF on riscv64 The riscv BPF JIT doesn't emit proper kCFI prologues for BPF programs and struct_ops trampolines when CONFIG_CFI_CLANG is enabled. This causes CFI failures when calling BPF programs and can even crash the kernel due to invalid memory accesses. Example crash: root@rv-selftester:~/bpf# ./test_progs -a dummy_st_ops Unable to handle kernel paging request at virtual address ffffffff78204ffc Oops [#1] Modules linked in: bpf_testmod(OE) [....] CPU: 3 PID: 356 Comm: test_progs Tainted: P OE 6.8.0-rc1 #1 Hardware name: riscv-virtio,qemu (DT) epc : bpf_struct_ops_test_run+0x28c/0x5fc ra : bpf_struct_ops_test_run+0x26c/0x5fc epc : ffffffff82958010 ra : ffffffff82957ff0 sp : ff200000007abc80 gp : ffffffff868d6218 tp : ff6000008d87b840 t0 : 000000000000000f t1 : 0000000000000000 t2 : 000000002005793e s0 : ff200000007abcf0 s1 : ff6000008a90fee0 a0 : 0000000000000000 a1 : 0000000000000000 a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 a5 : ffffffff868dba26 a6 : 0000000000000001 a7 : 0000000052464e43 s2 : 00007ffffc0a95f0 s3 : ff6000008a90fe80 s4 : ff60000084c24c00 s5 : ffffffff78205000 s6 : ff60000088750648 s7 : ff20000000035008 s8 : fffffffffffffff4 s9 : ffffffff86200610 s10: 0000000000000000 s11: 0000000000000000 t3 : ffffffff8483dc30 t4 : ffffffff8483dc10 t5 : ffffffff8483dbf0 t6 : ffffffff8483dbd0 status: 0000000200000120 badaddr: ffffffff78204ffc cause: 000000000000000d [<ffffffff82958010>] bpf_struct_ops_test_run+0x28c/0x5fc [<ffffffff805083ee>] bpf_prog_test_run+0x170/0x548 [<ffffffff805029c8>] __sys_bpf+0x2d2/0x378 [<ffffffff804ff570>] __riscv_sys_bpf+0x5c/0x120 [<ffffffff8000e8fe>] syscall_handler+0x62/0xe4 [<ffffffff83362df6>] do_trap_ecall_u+0xc6/0x27c [<ffffffff833822c4>] ret_from_exception+0x0/0x64 Code: b603 0109 b683 0189 b703 0209 8493 0609 157d 8d65 (a303) ffca ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Implement proper kCFI prologues for the BPF programs and callbacks and drop __nocfi for riscv64. Fix the trampoline generation code to emit kCFI prologue when a struct_ops trampoline is being prepared. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240303170207.82201-2-puranjay12@gmail.com	2024-03-06 15:18:16 -08:00

1 2 3

126 Commits