linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-06 23:09:24 -04:00

Author	SHA1	Message	Date
Eduard Zingerman	fc2915bb8b	selftests/bpf: More precise cpu_mitigations state detection test_progs and test_verifier binaries execute unpriv tests under the following conditions: - unpriv BPF is enabled; - CPU mitigations are enabled (see [1] for details). The detection of the "mitigations enabled" state is performed by unpriv_helpers.c:get_mitigations_off() via inspecting kernel boot command line, looking for a parameter "mitigations=off". Such detection scheme won't work for certain configurations, e.g. when CONFIG_CPU_MITIGATIONS is disabled and boot parameter is not supplied. Miss-detection leads to test_progs executing tests meant to be run only with mitigations enabled, e.g. verifier_and.c:known_subreg_with_unknown_reg(), and reporting false failures. Internally, verifier sets bpf_verifier_env->bypass_spec_{v1,v4} basing on the value returned by kernel/cpu.c:cpu_mitigations_off(). This function is backed by a variable kernel/cpu.c:cpu_mitigations. This state is not fully introspect-able via sysfs. The closest proxy is /sys/devices/system/cpu/vulnerabilities/spectre_v1, but it reports "vulnerable" state only if mitigations are disabled and current cpu is vulnerable, while verifier does not check cpu state. There are only two ways the kernel/cpu.c:cpu_mitigations can be set: - via boot parameter; - via CONFIG_CPU_MITIGATIONS option. This commit updates unpriv_helpers.c:get_mitigations_off() to scan /boot/config-$(uname -r) and /proc/config.gz for CONFIG_CPU_MITIGATIONS value in addition to boot command line check. Tested using the following configurations: - mitigations enabled (unpriv tests are enabled) - mitigations disabled via boot cmdline (unpriv tests skipped) - mitigations disabled via CONFIG_CPU_MITIGATIONS (unpriv tests skipped) [1] https://lore.kernel.org/bpf/20231025031144.5508-1-laoar.shao@gmail.com/ Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250617005710.1066165-2-eddyz87@gmail.com	2025-06-17 13:23:49 -07:00
Yonghong Song	a633dab4b4	selftests/bpf: Fix RELEASE build failure with gcc14 With gcc14, when building with RELEASE=1, I hit four below compilation failure: Error 1: In file included from test_loader.c:6: test_loader.c: In function ‘run_subtest’: test_progs.h:194:17: error: ‘retval’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 194 \| fprintf(stdout, ##format); \ \| ^~~~~~~ test_loader.c:958:13: note: ‘retval’ was declared here 958 \| int retval, err, i; \| ^~~~~~ The uninitialized var 'retval' actually could cause incorrect result. Error 2: In function ‘test_fd_array_cnt’: prog_tests/fd_array.c:71:14: error: ‘btf_id’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 71 \| fd = bpf_btf_get_fd_by_id(id); \| ^~~~~~~~~~~~~~~~~~~~~~~~ prog_tests/fd_array.c:302:15: note: ‘btf_id’ was declared here 302 \| __u32 btf_id; \| ^~~~~~ Changing ASSERT_GE to ASSERT_EQ can fix the compilation error. Otherwise, there is no functionality change. Error 3: prog_tests/tailcalls.c: In function ‘test_tailcall_hierarchy_count’: prog_tests/tailcalls.c:1402:23: error: ‘fentry_data_fd’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 1402 \| err = bpf_map_lookup_elem(fentry_data_fd, &i, &val); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The code is correct. The change intends to silence gcc errors. Error 4: (this error only happens on arm64) In file included from prog_tests/log_buf.c:4: prog_tests/log_buf.c: In function ‘bpf_prog_load_log_buf’: ./test_progs.h:390:22: error: ‘log_buf’ may be used uninitialized [-Werror=maybe-uninitialized] 390 \| int ___err = libbpf_get_error(___res); \ \| ^~~~~~~~~~~~~~~~~~~~~~~~ prog_tests/log_buf.c:158:14: note: in expansion of macro ‘ASSERT_OK_PTR’ 158 \| if (!ASSERT_OK_PTR(log_buf, "log_buf_alloc")) \| ^~~~~~~~~~~~~ In file included from selftests/bpf/tools/include/bpf/bpf.h:32, from ./test_progs.h:36: selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17: note: by argument 1 of type ‘const void ’ to ‘libbpf_get_error’ declared here 113 \| LIBBPF_API long libbpf_get_error(const void ptr); \| ^~~~~~~~~~~~~~~~ Adding a pragma to disable maybe-uninitialized fixed the issue. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250617044956.2686668-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-17 10:18:30 -07:00
Eduard Zingerman	4a4b84ba9e	selftests/bpf: verify jset handling in CFG computation A test case to check if both branches of jset are explored when computing program CFG. At 'if r1 & 0x7 ...': - register 'r2' is computed alive only if jump branch of jset instruction is followed; - register 'r0' is computed alive only if fallthrough branch of jset instruction is followed. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250613175331.3238739-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-13 11:51:19 -07:00
Eduard Zingerman	67cdcc405b	veristat: Memory accounting for bpf programs This commit adds a new field mem_peak / "Peak memory (MiB)" field to a set of gathered statistics. The field is intended as an estimate for peak verifier memory consumption for processing of a given program. Mechanically stat is collected as follows: - At the beginning of handle_verif_mode() a new cgroup is created and veristat process is moved into this cgroup. - At each program load: - bpf_object__load() is split into bpf_object__prepare() and bpf_object__load() to avoid accounting for memory allocated for maps; - before bpf_object__load(): - a write to "memory.peak" file of the new cgroup is used to reset cgroup statistics; - updated value is read from "memory.peak" file and stashed; - after bpf_object__load() "memory.peak" is read again and difference between new and stashed values is used as a metric. If any of the above steps fails veristat proceeds w/o collecting mem_peak information for a program, reporting mem_peak as -1. While memcg provides data in bytes (converted from pages), veristat converts it to megabytes to avoid jitter when comparing results of different executions. The change has no measurable impact on veristat running time. A correlation between "Peak states" and "Peak memory" fields provides a sanity check for gathered statistics, e.g. a sample of data for sched_ext programs: Program Peak states Peak memory (MiB) ------------------------ ----------- ----------------- lavd_select_cpu 2153 44 lavd_enqueue 1982 41 lavd_dispatch 3480 28 layered_dispatch 1417 17 layered_enqueue 760 11 lavd_cpu_offline 349 6 lavd_cpu_online 349 6 lavd_init 394 6 rusty_init 350 5 layered_select_cpu 391 4 ... rusty_stopping 134 1 arena_topology_node_init 170 0 Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250613072147.3938139-3-eddyz87@gmail.com	2025-06-13 10:44:12 -07:00
Song Liu	ccefa19335	bpf/veristat: Fix veristat for map type BPF_MAP_TYPE_CGRP_STORAGE BPF_MAP_TYPE_CGRP_STORAGE doesn't allow non-zero max_entries. So veristat should not set it to 1. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250613050001.1058733-1-song@kernel.org	2025-06-13 10:08:22 -07:00
Ruslan Semchenko	af91af33c1	tools/bpf_jit_disasm: Fix potential negative tpath index in get_exec_path() If readlink() fails, len will be -1, which can cause negative indexing and undefined behavior. This patch ensures that len is set to 0 on readlink failure, preventing such issues. Signed-off-by: Ruslan Semchenko <uncleruc2075@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250612131816.1870-1-uncleruc2075@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-12 19:11:13 -07:00
Yonghong Song	44df9e0d4e	selftests/bpf: Fix xdp_do_redirect failure with 64KB page size On arm64 with 64KB page size, the selftest xdp_do_redirect failed like below: ... test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec test_max_pkt_size:PASS:prog_run_max_size 0 nsec test_max_pkt_size:FAIL:prog_run_too_big unexpected prog_run_too_big: actual -28 != expected -22 With 64KB page size, the xdp frame size will be much bigger so the existing test will fail. Adjust various parameters so the test can also work on 64K page size. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612035042.2208630-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-12 19:07:51 -07:00
Yonghong Song	96fcf7e7a7	selftests/bpf: Fix two net related test failures with 64K page size When running BPF selftests on arm64 with a 64K page size, I encountered the following two test failures: sockmap_basic/sockmap skb_verdict change tail:FAIL tc_change_tail:FAIL With further debugging, I identified the root cause in the following kernel code within __bpf_skb_change_tail(): u32 max_len = BPF_SKB_MAX_LEN; u32 min_len = __bpf_skb_min_len(skb); int ret; if (unlikely(flags \|\| new_len > max_len \|\| new_len < min_len)) return -EINVAL; With a 4K page size, new_len = 65535 and max_len = 16064, the function returns -EINVAL. However, With a 64K page size, max_len increases to 261824, allowing execution to proceed further in the function. This is because BPF_SKB_MAX_LEN scales with the page size and larger page sizes result in higher max_len values. Updating the new_len parameter in both tests based on actual kernel page size resolved both failures. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612035037.2207911-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-12 19:07:51 -07:00
Yonghong Song	4fc012daf9	bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on arm64 with 64KB page: xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on when constructing frags, with 64K page size, the frag data_len could be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail(). To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel can test different page size properly. With the kernel change, the user space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default value is 17, so for 4K page, the maximum packet size will be less than 68K. To test 64K page, a bigger maximum packet size than 68K is desired. So two different functions are implemented for subtest xdp_adjust_frags_tail_grow. Depending on different page size, different data input/output sizes are used to adapt with different page size. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-12 19:07:51 -07:00
Fushuai Wang	6a4bd31f68	selftests/bpf: fix signedness bug in redir_partial() When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly treats it as success due to unsigned promotion. Explicitly check for -1 first. Fixes: `a4b7193d8e` ("selftests/bpf: Add sockmap test for redirecting partial skb data") Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Link: https://lore.kernel.org/r/20250612084208.27722-1-wangfushuai@baidu.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-12 16:52:43 -07:00
Eduard Zingerman	5159482fdb	selftests/bpf: tests with a loop state missing read/precision mark The test case absent_mark_in_the_middle_state is equivalent of the following C program: 1: r8 = bpf_get_prandom_u32(); 2: r6 = -32; 3: bpf_iter_num_new(&fp[-8], 0, 10); 4: if (unlikely(bpf_get_prandom_u32())) 5: r6 = -31; 6: for (;;) { 7: if (!bpf_iter_num_next(&fp[-8])) 8: break; 9: if (unlikely(bpf_get_prandom_u32())) 10: (u64 )(fp + r6) = 7; 11: } 12: bpf_iter_num_destroy(&fp[-8]); 13: return 0; W/o a fix that instructs verifier to ignore branches count for loop entries verification proceeds as follows: - 1-4, state is {r6=-32,fp-8=active}; - 6, checkpoint A is created with {r6=-32,fp-8=active}; - 7, checkpoint B is created with {r6=-32,fp-8=active}, push state {r6=-32,fp-8=active} from 7 to 9; - 8,12,13, {r6=-32,fp-8=drained}, exit; - pop state with {r6=-32,fp-8=active} from 7 to 9; - 9, push state {r6=-32,fp-8=active} from 9 to 10; - 6, checkpoint C is created with {r6=-32,fp-8=active}; - 7, checkpoint A is hit, no precision propagated for r6 to C; - pop state {r6=-32,fp-8=active} from 9 to 10; - 10, state is {r6=-31,fp-8=active}, r6 is marked as read and precise, these marks are propagated to checkpoints A and B (but not C, as it is not the parent of current state; - 6, {r6=-31,fp-8=active} checkpoint C is hit, because r6 is not marked precise for this checkpoint; - the program is accepted, despite a possibility of unaligned u64 stack access at offset -31. The test case absent_mark_in_the_middle_state2 is similar except the following change: r8 = bpf_get_prandom_u32(); r6 = -32; bpf_iter_num_new(&fp[-8], 0, 10); if (unlikely(bpf_get_prandom_u32())) { r6 = -31; + jump_into_loop: + goto +0; + goto loop; + } + if (unlikely(bpf_get_prandom_u32())) + goto jump_into_loop; + loop: for (;;) { if (!bpf_iter_num_next(&fp[-8])) break; if (unlikely(bpf_get_prandom_u32())) (u64 )(fp + r6) = 7; } bpf_iter_num_destroy(&fp[-8]) return 0 The goal is to check that read/precision marks are propagated to checkpoint created at 'goto +0' that resides outside of the loop. The test case absent_mark_in_the_middle_state3 is a bit different and is equivalent to the C program below: int absent_mark_in_the_middle_state3(void) { bpf_iter_num_new(&fp[-8], 0, 10) loop1(-32, &fp[-8]) loop1_wrapper(&fp[-8]) bpf_iter_num_destroy(&fp[-8]) } int loop1(num, iter) { while (bpf_iter_num_next(iter)) { if (unlikely(bpf_get_prandom_u32())) *(fp + num) = 7; } return 0 } int loop1_wrapper(iter) { r6 = -32; if (unlikely(bpf_get_prandom_u32())) r6 = -31; loop1(r6, iter); return 0; } The unsafe state is reached in a similar manner, but the loop is located inside a subprogram that is called from two locations in the main subprogram. This detail is important for exercising bpf_scc_visit->backedges memory management. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-11-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-12 16:52:43 -07:00
Yonghong Song	517b088a84	selftests/bpf: Fix cgroup_mprog_ordering failure due to uninitialized variable On arm64, the cgroup_mprog_ordering selftest failed with test_progs run when building with clang compiler. The reason is due to socklen_t optlen not initialized. In kernel function do_ip_getsockopt(), we have if (copy_from_sockptr(&len, optlen, sizeof(int))) return -EFAULT; if (len < 0) return -EINVAL; The above 'len' variable is a negative value and hence the test failed. But the test is okay on x86_64. I checked the x86_64 asm code and I didn't see explicit initialization of 'optlen' but its value is 0 so kernel didn't return error. This should be a pure luck. Fix the bug by initializing 'oplen' var properly. Fixes: `e422d5f118` ("selftests/bpf: Add two selftests for mprog API based cgroup progs") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250611162103.1623692-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-11 12:32:50 -07:00
Tobias Klauser	2d72dd14d7	bpf: adjust path to trace_output sample eBPF program The sample file was renamed from trace_output_kern.c to trace_output.bpf.c in commit `d4fffba4d0` ("samples/bpf: Change _kern suffix to .bpf with syscall tracing program"). Adjust the path in the documentation comment for bpf_perf_event_output. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Link: https://lore.kernel.org/r/20250610140756.16332-1-tklauser@distanz.ch Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-10 08:10:21 -07:00
Luis Gerhorst	4a8765d9a5	selftests/bpf: Add test for Spectre v1 mitigation This is based on the gadget from the description of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"). Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250603212814.338867-1-luis.gerhorst@fau.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-09 20:11:10 -07:00
Luis Gerhorst	d6f1c85f22	bpf: Fall back to nospec for Spectre v1 This implements the core of the series and causes the verifier to fall back to mitigating Spectre v1 using speculation barriers. The approach was presented at LPC'24 [1] and RAID'24 [2]. If we find any forbidden behavior on a speculative path, we insert a nospec (e.g., lfence speculation barrier on x86) before the instruction and stop verifying the path. While verifying a speculative path, we can furthermore stop verification of that path whenever we encounter a nospec instruction. A minimal example program would look as follows: A = true B = true if A goto e f() if B goto e unsafe() e: exit There are the following speculative and non-speculative paths (`cur->speculative` and `speculative` referring to the value of the push_stack() parameters): - A = true - B = true - if A goto e - A && !cur->speculative && !speculative - exit - !A && !cur->speculative && speculative - f() - if B goto e - B && cur->speculative && !speculative - exit - !B && cur->speculative && speculative - unsafe() If f() contains any unsafe behavior under Spectre v1 and the unsafe behavior matches `state->speculative && error_recoverable_with_nospec(err)`, do_check() will now add a nospec before f() instead of rejecting the program: A = true B = true if A goto e nospec f() if B goto e unsafe() e: exit Alternatively, the algorithm also takes advantage of nospec instructions inserted for other reasons (e.g., Spectre v4). Taking the program above as an example, speculative path exploration can stop before f() if a nospec was inserted there because of Spectre v4 sanitization. In this example, all instructions after the nospec are dead code (and with the nospec they are also dead code speculatively). For this, it relies on the fact that speculation barriers generally prevent all later instructions from executing if the speculation was not correct: * On Intel x86_64, lfence acts as full speculation barrier, not only as a load fence [3]: An LFENCE instruction or a serializing instruction will ensure that no later instructions execute, even speculatively, until all prior instructions complete locally. [...] Inserting an LFENCE instruction after a bounds check prevents later operations from executing before the bound check completes. This was experimentally confirmed in [4]. * On AMD x86_64, lfence is dispatch-serializing [5] (requires MSR C001_1029[1] to be set if the MSR is supported, this happens in init_amd()). AMD further specifies "A dispatch serializing instruction forces the processor to retire the serializing instruction and all previous instructions before the next instruction is executed" [8]. As dispatch is not specific to memory loads or branches, lfence therefore also affects all instructions there. Also, if retiring a branch means it's PC change becomes architectural (should be), this means any "wrong" speculation is aborted as required for this series. * ARM's SB speculation barrier instruction also affects "any instruction that appears later in the program order than the barrier" [6]. * PowerPC's barrier also affects all subsequent instructions [7]: [...] executing an ori R31,R31,0 instruction ensures that all instructions preceding the ori R31,R31,0 instruction have completed before the ori R31,R31,0 instruction completes, and that no subsequent instructions are initiated, even out-of-order, until after the ori R31,R31,0 instruction completes. The ori R31,R31,0 instruction may complete before storage accesses associated with instructions preceding the ori R31,R31,0 instruction have been performed Regarding the example, this implies that `if B goto e` will not execute before `if A goto e` completes. Once `if A goto e` completes, the CPU should find that the speculation was wrong and continue with `exit`. If there is any other path that leads to `if B goto e` (and therefore `unsafe()`) without going through `if A goto e`, then a nospec will still be needed there. However, this patch assumes this other path will be explored separately and therefore be discovered by the verifier even if the exploration discussed here stops at the nospec. This patch furthermore has the unfortunate consequence that Spectre v1 mitigations now only support architectures which implement BPF_NOSPEC. Before this commit, Spectre v1 mitigations prevented exploits by rejecting the programs on all architectures. Because some JITs do not implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's security to a limited extent: * The regression is limited to systems vulnerable to Spectre v1, have unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The latter is not the case for x86 64- and 32-bit, arm64, and powerpc 64-bit and they are therefore not affected by the regression. According to commit `a6f6a95f25` ("LoongArch, bpf: Fix jit to skip speculation barrier opcode"), LoongArch is not vulnerable to Spectre v1 and therefore also not affected by the regression. * To the best of my knowledge this regression may therefore only affect MIPS. This is deemed acceptable because unpriv BPF is still disabled there by default. As stated in a previous commit, BPF_NOSPEC could be implemented for MIPS based on GCC's speculation_barrier implementation. * It is unclear which other architectures (besides x86 64- and 32-bit, ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel are vulnerable to Spectre v1. Also, it is not clear if barriers are available on these architectures. Implementing BPF_NOSPEC on these architectures therefore is non-trivial. Searching GCC and the kernel for speculation barrier implementations for these architectures yielded no result. * If any of those regressed systems is also vulnerable to Spectre v4, the system was already vulnerable to Spectre v4 attacks based on unpriv BPF before this patch and the impact is therefore further limited. As an alternative to regressing security, one could still reject programs if the architecture does not emit BPF_NOSPEC (e.g., by removing the empty BPF_NOSPEC-case from all JITs except for LoongArch where it appears justified). However, this will cause rejections on these archs that are likely unfounded in the vast majority of cases. In the tests, some are now successful where we previously had a false-positive (i.e., rejection). Change them to reflect where the nospec should be inserted (using __xlated_unpriv) and modify the error message if the nospec is able to mitigate a problem that previously shadowed another problem (in that case __xlated_unpriv does not work, therefore just add a comment). Define SPEC_V1 to avoid duplicating this ifdef whenever we check for nospec insns using __xlated_unpriv, define it here once. This also improves readability. PowerPC can probably also be added here. However, omit it for now because the BPF CI currently does not include a test. Limit it to EPERM, EACCES, and EINVAL (and not everything except for EFAULT and ENOMEM) as it already has the desired effect for most real-world programs. Briefly went through all the occurrences of EPERM, EINVAL, and EACCESS in verifier.c to validate that catching them like this makes sense. Thanks to Dustin for their help in checking the vendor documentation. [1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF") [2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions") [3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html ("Managed Runtime Speculative Execution Side Channel Mitigations") [4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a tool to analyze speculative execution attacks and mitigations" - Section 4.6 "Stopping Speculative Execution") [5] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf ("White Paper - SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD PROCESSORS - REVISION 5.09.23") [6] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB--Speculation-Barrier- ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set Architecture (2020-12)") [7] https://wiki.raptorcs.com/w/images/5/5f/OPF_PowerISA_v3.1C.pdf ("Power ISA™ - Version 3.1C - May 26, 2024 - Section 9.2.1 of Book III") [8] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf ("AMD64 Architecture Programmer’s Manual Volumes 1–5 - Revision 4.08 - April 2024 - 7.6.4 Serializing Instructions") Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Henriette Herzog <henriette.herzog@rub.de> Cc: Dustin Nguyen <nguyen@cs.fau.de> Cc: Maximilian Ott <ott@cs.fau.de> Cc: Milan Stephan <milan.stephan@fau.de> Link: https://lore.kernel.org/r/20250603212428.338473-1-luis.gerhorst@fau.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-09 20:11:10 -07:00
Tao Chen	ad954cbe08	bpftool: Display cookie for tracing link probe Display cookie for tracing link probe, in plain mode: #bpftool link 5: tracing prog 34 prog_type tracing attach_type trace_fentry target_obj_id 1 target_btf_id 60355 cookie 4503599627370496 pids test_progs(176) And in json mode: #bpftool link -j \| jq { "id": 5, "type": "tracing", "prog_id": 34, "prog_type": "tracing", "attach_type": "trace_fentry", "target_obj_id": 1, "target_btf_id": 60355, "cookie": 4503599627370496, "pids": [ { "pid": 176, "comm": "test_progs" } ] } Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606165818.3394397-3-chen.dylane@linux.dev	2025-06-09 16:45:17 -07:00
Tao Chen	d77efc0ef5	selftests/bpf: Add cookies check for tracing fill_link_info test Adding tests for getting cookie with fill_link_info for tracing. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606165818.3394397-2-chen.dylane@linux.dev	2025-06-09 16:45:17 -07:00
Tao Chen	c7beb48344	bpf: Add cookie to tracing bpf_link_info bpf_tramp_link includes cookie info, we can add it in bpf_link_info. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606165818.3394397-1-chen.dylane@linux.dev	2025-06-09 16:45:17 -07:00
Ihor Solodrai	260b862918	selftests/bpf: Add test cases with CONST_PTR_TO_MAP null checks A test requires the following to happen: * CONST_PTR_TO_MAP value is checked for null * the code in the null branch fails verification Add test cases: * direct global map_ptr comparison to null * lookup inner map, then two checks (the first transforms map_value_or_null into map_ptr) * lookup inner map, spill-fill it, then check for null * use an array of ringbufs to recreate a common coding pattern [1] [1] https://lore.kernel.org/bpf/CAEf4BzZNU0gX_sQ8k8JaLe1e+Veth3Rk=4x7MDhv=hQxvO8EDw@mail.gmail.com/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250609183024.359974-4-isolodrai@meta.com	2025-06-09 16:42:04 -07:00
Ihor Solodrai	eb6c992784	selftests/bpf: Add cmp_map_pointer_with_const test Add a test for CONST_PTR_TO_MAP comparison with a non-0 constant. A BPF program with this code must not pass verification in unpriv. Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250609183024.359974-3-isolodrai@meta.com	2025-06-09 16:42:04 -07:00
Ihor Solodrai	5534e58f2e	bpf: Make reg_not_null() true for CONST_PTR_TO_MAP When reg->type is CONST_PTR_TO_MAP, it can not be null. However the verifier explores the branches under rX == 0 in check_cond_jmp_op() even if reg->type is CONST_PTR_TO_MAP, because it was not checked for in reg_not_null(). Fix this by adding CONST_PTR_TO_MAP to the set of types that are considered non nullable in reg_not_null(). An old "unpriv: cmp map pointer with zero" selftest fails with this change, because now early out correctly triggers in check_cond_jmp_op(), making the verification to pass. In practice verifier may allow pointer to null comparison in unpriv, since in many cases the relevant branch and comparison op are removed as dead code. So change the expected test result to __success_unpriv. Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250609183024.359974-2-isolodrai@meta.com	2025-06-09 16:42:04 -07:00
Yonghong Song	e422d5f118	selftests/bpf: Add two selftests for mprog API based cgroup progs Two tests are added: - cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link attach are tested. Some negative tests are also included. - cgroup_mprog_ordering, which actually runs the program with some mprog API flags. [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163156.2429955-1-yonghong.song@linux.dev	2025-06-09 16:28:31 -07:00
Yonghong Song	c1bb68656b	selftests/bpf: Move some tc_helpers.h functions to test_progs.h Move static inline functions id_from_prog_fd() and id_from_link_fd() from prog_tests/tc_helpers.h to test_progs.h so these two functions can be reused for later cgroup mprog selftests. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163151.2429325-1-yonghong.song@linux.dev	2025-06-09 16:28:30 -07:00
Yonghong Song	1d6711667c	libbpf: Support link-based cgroup attach with options Currently libbpf supports bpf_program__attach_cgroup() with signature: LIBBPF_API struct bpf_link * bpf_program__attach_cgroup(const struct bpf_program prog, int cgroup_fd); To support mprog style attachment, additionsl fields like flags, relative_{fd,id} and expected_revision are needed. Add a new API: LIBBPF_API struct bpf_link bpf_program__attach_cgroup_opts(const struct bpf_program prog, int cgroup_fd, const struct bpf_cgroup_opts opts); where bpf_cgroup_opts contains all above needed fields. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163146.2429212-1-yonghong.song@linux.dev	2025-06-09 16:28:30 -07:00
Yonghong Song	1209339844	bpf: Implement mprog API on top of existing cgroup progs Current cgroup prog ordering is appending at attachment time. This is not ideal. In some cases, users want specific ordering at a particular cgroup level. To address this, the existing mprog API seems an ideal solution with supporting BPF_F_BEFORE and BPF_F_AFTER flags. But there are a few obstacles to directly use kernel mprog interface. Currently cgroup bpf progs already support prog attach/detach/replace and link-based attach/detach/replace. For example, in struct bpf_prog_array_item, the cgroup_storage field needs to be together with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog as the member, which makes it difficult to use kernel mprog interface. In another case, the current cgroup prog detach tries to use the same flag as in attach. This is different from mprog kernel interface which uses flags passed from user space. So to avoid modifying existing behavior, I made the following changes to support mprog API for cgroup progs: - The support is for prog list at cgroup level. Cross-level prog list (a.k.a. effective prog list) is not supported. - Previously, BPF_F_PREORDER is supported only for prog attach, now BPF_F_PREORDER is also supported by link-based attach. - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported similar to kernel mprog but with different implementation. - For detach and replace, use the existing implementation. - For attach, detach and replace, the revision for a particular prog list, associated with a particular attach type, will be updated by increasing count by 1. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163141.2428937-1-yonghong.song@linux.dev	2025-06-09 16:28:28 -07:00
Yonghong Song	bbc7bd658d	selftests/bpf: Fix a user_ringbuf failure with arm64 64KB page size The ringbuf max_entries must be PAGE_ALIGNED. See kernel function ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries properly. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250607013626.1553001-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-06 19:21:43 -07:00
Yonghong Song	8c8c5e3c85	selftests/bpf: Fix ringbuf/ringbuf_write test failure with arm64 64KB page size The ringbuf max_entries must be PAGE_ALIGNED. See kernel function ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries and other related metrics properly. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250607013621.1552332-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-06 19:21:43 -07:00
Yonghong Song	377d371590	selftests/bpf: Fix bpf_mod_race test failure with arm64 64KB page size Currently, uffd_register.range.len is set to 4096 for command 'ioctl(uffd, UFFDIO_REGISTER, &uffd_register)'. For arm64 64KB page size, the len must be 64KB size aligned as page size alignment is required. See fs/userfaultfd.c:validate_unaligned_range(). Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250607013615.1551783-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-06 19:21:43 -07:00
Yonghong Song	ae8824037a	selftests/bpf: Reduce test_xdp_adjust_frags_tail_grow logs For selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow, if tested failure, I see a long list of log output like ... test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec ... There are total 7374 lines of the above which is too much. Let us only issue such logs when it is an assert failure. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250607013610.1551399-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-06 19:21:43 -07:00
Rong Tao	64a064ce33	selftests/bpf: rbtree: Fix incorrect global variable usage Within __add_three() function, should use function parameters instead of global variables. So that the variables groot_nested.inner.root and groot_nested.inner.glock in rbtree_add_nodes_nested() are tested correctly. Signed-off-by: Rong Tao <rongtao@cestc.cn> Link: https://lore.kernel.org/r/tencent_3DD7405C0839EBE2724AC5FA357B5402B105@qq.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-05 13:55:26 -07:00
Blake Jones	a570f386f3	Tests for the ".emit_strings" functionality in the BTF dumper. When this mode is turned on, "emit_zeroes" and "compact" have no effect, and embedded NUL characters always terminate printing of an array. Signed-off-by: Blake Jones <blakejones@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250603203701.520541-2-blakejones@google.com	2025-06-05 13:45:16 -07:00
Blake Jones	87c9c79a02	libbpf: Add support for printing BTF character arrays as strings The BTF dumper code currently displays arrays of characters as just that - arrays, with each character formatted individually. Sometimes this is what makes sense, but it's nice to be able to treat that array as a string. This change adds a special case to the btf_dump functionality to allow 0-terminated arrays of single-byte integer values to be printed as character strings. Characters for which isprint() returns false are printed as hex-escaped values. This is enabled when the new ".emit_strings" is set to 1 in the btf_dump_type_data_opts structure. As an example, here's what it looks like to dump the string "hello" using a few different field values for btf_dump_type_data_opts (.compact = 1): - .emit_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',] - .emit_strings = 0, .skip_names = 1: ['h','e','l','l','o',] - .emit_strings = 1, .skip_names = 0: (char[6])"hello" - .emit_strings = 1, .skip_names = 1: "hello" Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1: - .emit_strings = 0: ['h',-1,] - .emit_strings = 1: "h\xff" Signed-off-by: Blake Jones <blakejones@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250603203701.520541-1-blakejones@google.com	2025-06-05 13:45:16 -07:00
Jiawei Zhao	919319b4ed	libbpf: Correct some typos and syntax issues in usdt doc Fix some incorrect words, such as "and" -> "an", "it's" -> "its". Fix some grammar issues, such as removing redundant "will", "would complicated" -> "would complicate". Signed-off-by: Jiawei Zhao <Phoenix500526@163.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250531095111.57824-1-Phoenix500526@163.com	2025-06-05 11:45:48 -07:00
Tao Chen	9c8827d773	bpftool: Display cookie for raw_tp link probe Display cookie for raw_tp link probe, in plain mode: #bpftool link 22: raw_tracepoint prog 14 tp 'sys_enter' cookie 23925373020405760 pids test_progs(176) And in json mode: #bpftool link -j \| jq [ { "id": 47, "type": "raw_tracepoint", "prog_id": 79, "tp_name": "sys_enter", "cookie": 23925373020405760, "pids": [ { "pid": 274, "comm": "test_progs" } ] } ] Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20250603154309.3063644-3-chen.dylane@linux.dev	2025-06-05 11:44:52 -07:00
Tao Chen	25a0d04d38	selftests/bpf: Add cookies check for raw_tp fill_link_info test Adding tests for getting cookie with fill_link_info for raw_tp. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250603154309.3063644-2-chen.dylane@linux.dev	2025-06-05 11:44:52 -07:00
Tao Chen	2fe1c59347	bpf: Add cookie to raw_tp bpf_link_info After commit `68ca5d4eeb` ("bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs"), we can show the cookie in bpf_link_info like kprobe etc. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250603154309.3063644-1-chen.dylane@linux.dev	2025-06-05 11:44:52 -07:00
Linus Torvalds	ec7714e494	Merge tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - KUnit '#[test]'s: - Support KUnit-mapped 'assert!' macros. The support that landed last cycle was very basic, and the 'assert!' macros panicked since they were the standard library ones. Now, they are mapped to the KUnit ones in a similar way to how is done for doctests, reusing the infrastructure there. With this, a failing test like: #[test] fn my_first_test() { assert_eq!(42, 43); } will report: # my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251 Expected 42 == 43 to be true, but is false # my_first_test.speed: normal not ok 1 my_first_test - Support tests with checked 'Result' return types. The return value of test functions that return a 'Result' will be checked, thus one can now easily catch errors when e.g. using the '?' operator in tests. With this, a failing test like: #[test] fn my_test() -> Result { f()?; Ok(()) } will report: # my_test: ASSERTION FAILED at rust/kernel/lib.rs:321 Expected is_test_result_ok(my_test()) to be true, but is false # my_test.speed: normal not ok 1 my_test - Add 'kunit_tests' to the prelude. - Clarify the remaining language unstable features in use. - Compile 'core' with edition 2024 for Rust >= 1.87. - Workaround 'bindgen' issue with forward references to 'enum' types. - objtool: relax slice condition to cover more 'noreturn' functions. - Use absolute paths in macros referencing 'core' and 'kernel' crates. - Skip '-mno-fdpic' flag for bindgen in GCC 32-bit arm builds. - Clean some 'doc_markdown' lint hits -- we may enable it later on. 'kernel' crate: - 'alloc' module: - 'Box': support for type coercion, e.g. 'Box<T>' to 'Box<dyn U>' if 'T' implements 'U'. - 'Vec': implement new methods (prerequisites for nova-core and binder): 'truncate', 'resize', 'clear', 'pop', 'push_within_capacity' (with new error type 'PushError'), 'drain_all', 'retain', 'remove' (with new error type 'RemoveError'), insert_within_capacity' (with new error type 'InsertError'). In addition, simplify 'push' using 'spare_capacity_mut', split 'set_len' into 'inc_len' and 'dec_len', add type invariant 'len <= capacity' and simplify 'truncate' using 'dec_len'. - 'time' module: - Morph the Rust hrtimer subsystem into the Rust timekeeping subsystem, covering delay, sleep, timekeeping, timers. This new subsystem has all the relevant timekeeping C maintainers listed in the entry. - Replace 'Ktime' with 'Delta' and 'Instant' types to represent a duration of time and a point in time. - Temporarily add 'Ktime' to 'hrtimer' module to allow 'hrtimer' to delay converting to 'Instant' and 'Delta'. - 'xarray' module: - Add a Rust abstraction for the 'xarray' data structure. This abstraction allows Rust code to leverage the 'xarray' to store types that implement 'ForeignOwnable'. This support is a dependency for memory backing feature of the Rust null block driver, which is waiting to be merged. - Set up an entry in 'MAINTAINERS' for the XArray Rust support. Patches will go to the new Rust XArray tree and then via the Rust subsystem tree for now. - Allow 'ForeignOwnable' to carry information about the pointed-to type. This helps asserting alignment requirements for the pointer passed to the foreign language. - 'container_of!': retain pointer mut-ness and add a compile-time check of the type of the first parameter ('$field_ptr'). - Support optional message in 'static_assert!'. - Add C FFI types (e.g. 'c_int') to the prelude. - 'str' module: simplify KUnit tests 'format!' macro, convert 'rusttest' tests into KUnit, take advantage of the '-> Result' support in KUnit '#[test]'s. - 'list' module: add examples for 'List', fix path of 'assert_pinned!' (so far unused macro rule). - 'workqueue' module: remove 'HasWork::OFFSET'. - 'page' module: add 'inline' attribute. 'macros' crate: - 'module' macro: place 'cleanup_module()' in '.exit.text' section. 'pin-init' crate: - Add 'Wrapper<T>' trait for creating pin-initializers for wrapper structs with a structurally pinned value such as 'UnsafeCell<T>' or 'MaybeUninit<T>'. - Add 'MaybeZeroable' derive macro to try to derive 'Zeroable', but not error if not all fields implement it. This is needed to derive 'Zeroable' for all bindgen-generated structs. - Add 'unsafe fn cast_[pin_]init()' functions to unsafely change the initialized type of an initializer. These are utilized by the 'Wrapper<T>' implementations. - Add support for visibility in 'Zeroable' derive macro. - Add support for 'union's in 'Zeroable' derive macro. - Upstream dev news: streamline CI, fix some bugs. Add new workflows to check if the user-space version and the one in the kernel tree have diverged. Use the issues tab [1] to track them, which should help folks report and diagnose issues w.r.t. 'pin-init' better. [1] https://github.com/rust-for-linux/pin-init/issues Documentation: - Testing: add docs on the new KUnit '#[test]' tests. - Coding guidelines: explain that '///' vs. '//' applies to private items too. Add section on C FFI types. - Quick Start guide: update Ubuntu instructions and split them into "25.04" and "24.04 LTS and older". And a few other cleanups and improvements" * tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (78 commits) rust: list: Fix typo `much` in arc.rs rust: check type of `$ptr` in `container_of!` rust: workqueue: remove HasWork::OFFSET rust: retain pointer mut-ness in `container_of!` Documentation: rust: testing: add docs on the new KUnit `#[test]` tests Documentation: rust: rename `#[test]`s to "`rusttest` host tests" rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s rust: str: simplify KUnit tests `format!` macro rust: str: convert `rusttest` tests into KUnit rust: add `kunit_tests` to the prelude rust: kunit: support checked `-> Result`s in KUnit `#[test]`s rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s rust: make section names plural rust: list: fix path of `assert_pinned!` rust: compile libcore with edition 2024 for 1.87+ rust: dma: add missing Markdown code span rust: task: add missing Markdown code spans and intra-doc links rust: pci: fix docs related to missing Markdown code spans rust: alloc: add missing Markdown code span rust: alloc: add missing Markdown code spans ...	2025-06-04 21:18:37 -07:00
Linus Torvalds	64980441d2	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: "Two small fixes to selftests" * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Fix selftest btf_tag/btf_type_tag_percpu_vmlinux_helper failure selftests/bpf: Fix bpf selftest build error	2025-06-04 19:46:22 -07:00
Linus Torvalds	0939bd2fcf	Merge tag 'perf-tools-for-v6.16-1-2025-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "perf report/top/annotate TUI: - Accept the left arrow key as a Zoom out if done on the first column - Show if source code toggle status in title, to help spotting bugs with the various disassemblers (capstone, llvm, objdump) - Provide feedback on unhandled hotkeys Build: - Better inform when certain features are not available with warnings in the build process and in 'perf version --build-options' or 'perf -vv' perf record: - Improve the --off-cpu code by synthesizing events for switch-out -> switch-in intervals using a BPF program. This can be fine tuned using a --off-cpu-thresh knob perf report: - Add 'tgid' sort key perf mem/c2c: - Add 'op', 'cache', 'snoop', 'dtlb' output fields - Add support for 'ldlat' on AMD IBS (Instruction Based Sampling) perf ftrace: - Use process/session specific trace settings instead of messing with the global ftrace knobs perf trace: - Implement syscall summary in BPF - Support --summary-mode=cgroup - Always print return value for syscalls returning a pid - The rseq and set_robust_list don't return a pid, just -errno perf lock contention: - Symbolize zone->lock using BTF - Add -J/--inject-delay option to estimate impact on application performance by optimization of kernel locking behavior perf stat: - Improve hybrid support for the NMI watchdog warning Symbol resolution: - Handle 'u' and 'l' symbols in /proc/kallsyms, resolving some Rust symbols - Improve Rust demangler Hardware tracing: Intel PT: - Fix PEBS-via-PT data_src - Do not default to recording all switch events - Fix pattern matching with python3 on the SQL viewer script arm64: - Fixups for the hip08 hha PMU Vendor events: - Update Intel events/metrics files for alderlake, alderlaken, arrowlake, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, clearwaterforest, elkhartlake, emeraldrapids, grandridge, graniterapids, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, lunarlake, meteorlake, nehalemep, nehalemex, rocketlake, sandybridge, sapphirerapids, sierraforest, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp, westmereep-sx python support: - Add support for event counts in the python binding, add a counting.py example perf list: - Display the PMU name associated with a perf metric in JSON perf test: - Hybrid improvements for metric value validation test - Fix LBR test by ignoring idle task - Add AMD IBS sw filter ana d'ldlat' tests - Add 'perf trace --summary-mode=cgroup' test - Add tests for the various language symbol demanglers Miscellaneous: - Allow specifying the cpu an event will be tied using '-e event/cpu=N/' - Sync various headers with the kernel sources - Add annotations to use clang's -Wthread-safety and fix some problems it detected - Make dump_stack() use perf's symbol resolution to provide better backtraces - Intel TPEBS support cleanups and fixes. TPEBS stands for Timed PEBS (Precision Event-Based Sampling), that adds timing info, the retirement latency of instructions - Various memory allocation (some detected by ASAN) and reference counting fixes - Add a 8-byte aligned PERF_RECORD_COMPRESSED2 to replace PERF_RECORD_COMPRESSED - Skip unsupported event types in perf.data files, don't stop when finding one - Improve lookups using hashmaps and binary searches" * tag 'perf-tools-for-v6.16-1-2025-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (206 commits) perf callchain: Always populate the addr_location map when adding IP perf lock contention: Reject more than 10ms delays for safety perf trace: Set errpid to false for rseq and set_robust_list perf symbol: Move demangling code out of symbol-elf.c perf trace: Always print return value for syscalls returning a pid perf script: Print PERF_AUX_FLAG_COLLISION flag perf mem: Show absolute percent in mem_stat output perf mem: Display sort order only if it's available perf mem: Describe overhead calculation in brief perf record: Fix incorrect --user-regs comments Revert "perf thread: Ensure comm_lock held for comm_list" perf test trace_summary: Skip --bpf-summary tests if no libbpf perf test intel-pt: Skip jitdump test if no libelf perf intel-tpebs: Avoid race when evlist is being deleted perf test demangle-java: Don't segv if demangling fails perf symbol: Fix use-after-free in filename__read_build_id perf pmu: Avoid segv for missing name/alias_name in wildcarding perf machine: Factor creating a "live" machine out of dwarf-unwind perf test: Add AMD IBS sw filter test perf mem: Count L2 HITM for c2c statistic ...	2025-06-03 15:11:44 -07:00
Linus Torvalds	29e9359005	Merge tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) updates from Dave Jiang: - Remove always true condition in cxl features code - Add verification of CHBS length for CXL 2.0 - Ignore interleave granularity when interleave ways is 1 - Add update addressing mising MODULE_DESCRIPTION for cxl_test - A series of cleanups/refactor to prep for AMD Zen5 translate code - Clean %pa debug printk in core/hdm.c - Documentation updates: - Update to CXL Maturity Map - Fixes to source linking in CXL documentation - CXL documentation fixes, spelling corrections - A large collection of CXL documentation for the entire CXL subsystem, including documentation on CXL related platform and firmware notes - Remove redundant code of cxlctl_get_supported_features() - Series to support CXL RAS Features - Including "Patrol Scrub Control", "Error Check Scrub", "Performance Maitenance" and "Memory Sparing". The series connects CXL to EDAC. * tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (53 commits) cxl/edac: Add CXL memory device soft PPR control feature cxl/edac: Add CXL memory device memory sparing control feature cxl/edac: Support for finding memory operation attributes from the current boot cxl/edac: Add support for PERFORM_MAINTENANCE command cxl/edac: Add CXL memory device ECS control feature cxl/edac: Add CXL memory device patrol scrub control feature cxl: Update prototype of function get_support_feature_info() EDAC: Update documentation for the CXL memory patrol scrub control feature cxl/features: Remove the inline specifier from to_cxlfs() cxl/feature: Remove redundant code of get supported features docs: ABI: Fix "firwmare" to "firmware" cxl/Documentation: Fix typo in sysfs write_bandwidth attribute path cxl: doc/linux/access-coordinates Update access coordinates calculation methods cxl: docs/platform/acpi/srat Add generic target documentation cxl: docs/platform/cdat reference documentation Documentation: Update the CXL Maturity Map cxl: Sync up the driver-api/cxl documentation cxl: docs - add self-referencing cross-links cxl: docs/allocation/hugepages cxl: docs/allocation/reclaim ...	2025-06-03 13:24:14 -07:00
Linus Torvalds	c00b285024	Merge tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel) - Fixes for Hyper-V UIO driver (Long Li) - Fixes for Hyper-V PCI driver (Michael Kelley) - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley) - Documentation updates for Hyper-V VMBus (Michael Kelley) - Enhance logging for hv_kvp_daemon (Shradha Gupta) * tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits) Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir Documentation: hyperv: Update VMBus doc with new features and info PCI: hv: Remove unnecessary flex array in struct pci_packet Drivers: hv: Remove hv_alloc/free_* helpers Drivers: hv: Use kzalloc for panic page allocation uio_hv_generic: Align ring size to system page uio_hv_generic: Use correct size for interrupt and monitor pages Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary arch/x86: Provide the CPU number in the wakeup AP callback x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap() PCI: hv: Get vPCI MSI IRQ domain from DeviceTree ACPI: irq: Introduce acpi_get_gsi_dispatcher() Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device() Drivers: hv: vmbus: Get the IRQ number from DeviceTree dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties arm64, x86: hyperv: Report the VTL the system boots in arm64: hyperv: Initialize the Virtual Trust Level field Drivers: hv: Provide arch-neutral implementation of get_vtl() Drivers: hv: Enable VTL mode for arm64 ...	2025-06-03 08:39:20 -07:00
Linus Torvalds	546b1c9e93	Merge tag 'bootconfig-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig updates from Masami Hiramatsu: - Allow overriding CFLAGS and LDFLAGS for tools/bootconfig, for example making it a static binary. * tag 'bootconfig-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tools/bootconfig: specify LDFLAGS as an argument to CC tools/bootconfig: allow overriding CFLAGS assignment	2025-06-02 17:39:24 -07:00
Linus Torvalds	fd1f847350	Merge tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "zram: support algorithm-specific parameters" from Sergey Senozhatsky adds infrastructure for passing algorithm-specific parameters into zram. A single parameter `winbits' is implemented at this time. - "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg charging nmi-safe, which is required by BFP, which can operate in NMI context. - "Some random fixes and cleanup to shmem" from Kemeng Shi implements small fixes and cleanups in the shmem code. - "Skip mm selftests instead when kernel features are not present" from Zi Yan fixes some issues in the MM selftest code. - "mm/damon: build-enable essential DAMON components by default" from SeongJae Park reworks DAMON Kconfig to make it easier to enable CONFIG_DAMON. - "sched/numa: add statistics of numa balance task migration" from Libo Chen adds more info into sysfs and procfs files to improve visibility into the NUMA balancer's task migration activity. - "selftests/mm: cow and gup_longterm cleanups" from Mark Brown provides various updates to some of the MM selftests to make them play better with the overall containing framework. * tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits) mm/khugepaged: clean up refcount check using folio_expected_ref_count() selftests/mm: fix test result reporting in gup_longterm selftests/mm: report unique test names for each cow test selftests/mm: add helper for logging test start and results selftests/mm: use standard ksft_finished() in cow and gup_longterm selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled sched/numa: add statistics of numa balance task sched/numa: fix task swap by skipping kernel threads tools/testing: check correct variable in open_procmap() tools/testing/vma: add missing function stub mm/gup: update comment explaining why gup_fast() disables IRQs selftests/mm: two fixes for the pfnmap test mm/khugepaged: fix race with folio split/free using temporary reference mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order mmu_notifiers: remove leftover stub macros selftests/mm: deduplicate test names in madv_populate kcov: rust: add flags for KCOV with Rust mm: rust: make CONFIG_MMU ifdefs more narrow mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables() mm/damon/Kconfig: enable CONFIG_DAMON by default ...	2025-06-02 16:00:26 -07:00
Linus Torvalds	7f9039c524	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more kvm updates from Paolo Bonzini: Generic: - Clean up locking of all vCPUs for a VM by using the _nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra - Add MGLRU support to the access tracking perf test ARM fixes: - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet s390: - Fix interaction between some filesystems and Secure Execution - Some cleanups and refactorings, preparing for an upcoming big series x86: - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM - Refine and harden handling of spurious faults - Add support for ALLOWED_SEV_FEATURES - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits - Don't account temporary allocations in sev_send_update_data() - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX - Advertise support to userspace for WRMSRNS and PREFETCHI - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing - Add a module param to control and enumerate support for device posted interrupts - Fix a potential overflow with nested virt on Intel systems running 32-bit kernels - Flush shadow VMCSes on emergency reboot - Add support for SNP to the various SEV selftests - Add a selftest to verify fastops instructions via forced emulation - Refine and optimize KVM's software processing of the posted interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler" tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) rtmutex_api: provide correct extern functions KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race KVM: arm64: Unmap vLPIs affected by changes to GSI routing information KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock KVM: arm64: Use lock guard in vgic_v4_set_forwarding() KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue KVM: s390: Simplify and move pv code KVM: s390: Refactor and split some gmap helpers KVM: s390: Remove unneeded srcu lock s390: Remove unneeded includes s390/uv: Improve splitting of large folios that cannot be split while dirty s390/uv: Always return 0 from s390_wiggle_split_folio() if successful s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful rust: add helper for mutex_trylock RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation ...	2025-06-02 12:24:58 -07:00
Yonghong Song	baa39c169d	selftests/bpf: Fix selftest btf_tag/btf_type_tag_percpu_vmlinux_helper failure Ihor Solodrai reported selftest 'btf_tag/btf_type_tag_percpu_vmlinux_helper' failure ([1]) during 6.16 merge window. The failure log: ... 7: (15) if r0 == 0x0 goto pc+1 ; R0=ptr_css_rstat_cpu() ; (volatile int )rstat; @ btf_type_tag_percpu.c:68 8: (61) r1 = (u32 )(r0 +0) cannot access ptr member updated_children with moff 0 in struct css_rstat_cpu with off 0 size 4 Two changes are needed. First, 'struct cgroup_rstat_cpu' needs to be replaced with 'struct css_rstat_cpu' to be consistent with new data structure. Second, layout of 'css_rstat_cpu' is changed compared to 'cgroup_rstat_cpu'. The first member becomes a pointer so the bpf prog needs to do 8-byte load instead of 4-byte load. [1] https://lore.kernel.org/bpf/6f688f2e-7d26-423a-9029-d1b1ef1c938a@linux.dev/ Cc: Ihor Solodrai <ihor.solodrai@linux.dev> Cc: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: JP Kobryn <inwardvessel@gmail.com> Link: https://lore.kernel.org/r/20250529201151.1787575-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-01 13:07:47 -07:00
Saket Kumar Bhaskar	4b65d5ae97	selftests/bpf: Fix bpf selftest build error On linux-next, build for bpf selftest displays an error due to mismatch in the expected function signature of bpf_testmod_test_read and bpf_testmod_test_write. Commit `97d06802d1` ("sysfs: constify bin_attribute argument of bin_attribute::read/write()") changed the required type for struct bin_attribute to const struct bin_attribute. To resolve the error, update corresponding signature for the callback. Fixes: `97d06802d1` ("sysfs: constify bin_attribute argument of bin_attribute::read/write()") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/e915da49-2b9a-4c4c-a34f-877f378129f6@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Link: https://lore.kernel.org/r/20250512091108.2015615-1-skb99@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-01 12:57:41 -07:00
Mark Brown	66bce7afba	selftests/mm: fix test result reporting in gup_longterm The kselftest framework uses the string logged when a test result is reported as the unique identifier for a test, using it to track test results between runs. The gup_longterm test fails to follow this pattern, it runs a single test function repeatedly with various parameters but each result report is a string logging an error message which is fixed between runs. Since the code already logs each test uniquely before it starts refactor to also print this to a buffer, then use that name as the test result. This isn't especially pretty but is relatively straightforward and is a great help to tooling. Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-4-ff198df8e38e@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-31 22:46:16 -07:00
Mark Brown	3f2d9a9ac5	selftests/mm: report unique test names for each cow test The kselftest framework uses the string logged when a test result is reported as the unique identifier for a test, using it to track test results between runs. The cow test completely fails to follow this pattern, it runs test functions repeatedly with various parameters with each result report from those functions being a string logging an error message which is fixed between runs. Since the code already logs each test uniquely before it starts refactor to also print this to a buffer, then use that name as the test result. This isn't especially pretty but is relatively straightforward and is a great help to tooling. Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-3-ff198df8e38e@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-31 22:46:16 -07:00
Mark Brown	3f192afbed	selftests/mm: add helper for logging test start and results Several of the MM tests have a pattern of printing a description of the test to be run then reporting the actual TAP result using a generic string not connected to the specific test, often in a shared function used by many tests. The name reported typically varies depending on the specific result rather than the test too. This causes problems for tooling that works with test results, the names reported with the results are used to deduplicate tests and track them between runs so both duplicated names and changing names cause trouble for things like UIs and automated bisection. As a first step towards matching these tests better with the expectations of kselftest provide helpers which record the test name as part of the initial print and then use that as part of reporting a result. This is not added as a generic kselftest helper partly because the use of a variable to store the test name doesn't fit well with the header only implementation of kselftest.h and partly because it's not really an intended pattern. Ideally at some point the mm tests that use it will be updated to not need it. Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-2-ff198df8e38e@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-31 22:46:16 -07:00
Mark Brown	109364fce5	selftests/mm: use standard ksft_finished() in cow and gup_longterm Patch series "selftests/mm: cow and gup_longterm cleanups", v2. The bulk of these changes modify the cow and gup_longterm tests to report unique and stable names for each test, bringing them into line with the expectations of tooling that works with kselftest. The string reported as a test result is used by tooling to both deduplicate tests and track tests between test runs, using the same string for multiple tests or changing the string depending on test result causes problems for user interfaces and automation such as bisection. It was suggested that converting to use kselftest_harness.h would be a good way of addressing this, however that really wants the set of tests to run to be known at compile time but both test programs dynamically enumarate the set of huge page sizes the system supports and test each. Refactoring to handle this would be even more invasive than these changes which are large but straightforward and repetitive. A version of the main gup_longterm cleanup was previously sent separately, this version factors out the helpers for logging the start of the test since the cow test looks very similar. This patch (of 4): The cow and gup_longterm test programs open code something that looks a lot like the standard ksft_finished() helper to summarise the test results and provide an exit code, convert to use ksft_finished(). Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-0-ff198df8e38e@kernel.org Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-1-ff198df8e38e@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-31 22:46:15 -07:00

1 2 3 4 5 ...

47277 Commits