linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-17 12:30:29 -05:00

Author	SHA1	Message	Date
Linus Torvalds	22c5696e3f	Merge tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...	2025-07-29 12:15:39 -07:00
KaFai Wan	863aab3d4d	bpf: Add log for attaching tracing programs to functions in deny list Show the rejected function name when attaching tracing programs to functions in deny list. With this change, we know why tracing programs can't attach to functions like __rcu_read_lock() from log. $ ./fentry libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG -- Attaching tracing programs to function '__rcu_read_lock' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 19:39:29 -07:00
KaFai Wan	a5a6b29a70	bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions With this change, we know the precise rejected function name when attaching fexit/fmod_ret to __noreturn functions from log. $ ./fexit libbpf: prog 'fexit': BPF program load failed: -EINVAL libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG -- Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 19:39:29 -07:00
Linus Torvalds	13150742b0	Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request for 6.17. The main focus this cycle is on reorganizing the SHA-1 and SHA-2 code, providing high-quality library APIs for SHA-1 and SHA-2 including HMAC support, and establishing conventions for lib/crypto/ going forward: - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares most of the SHA-512 code) into lib/crypto/. This includes both the generic and architecture-optimized code. Greatly simplify how the architecture-optimized code is integrated. Add an easy-to-use library API for each SHA variant, including HMAC support. Finally, reimplement the crypto_shash support on top of the library API. - Apply the same reorganization to the SHA-256 code (and also SHA-224 which shares most of the SHA-256 code). This is a somewhat smaller change, due to my earlier work on SHA-256. But this brings in all the same additional improvements that I made for SHA-1 and SHA-512. There are also some smaller changes: - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For these algorithms it's just a move, not a full reorganization yet. - Fix the MIPS chacha-core.S to build with the clang assembler. - Fix the Poly1305 functions to work in all contexts. - Fix a performance regression in the x86_64 Poly1305 code. - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code. Note that since the new organization of the SHA code is much simpler, the diffstat of this pull request is negative, despite the addition of new fully-documented library APIs for multiple SHA and HMAC-SHA variants. These APIs will allow further simplifications across the kernel as users start using them instead of the old-school crypto API. (I've already written a lot of such conversion patches, removing over 1000 more lines of code. But most of those will target 6.18 or later)" * tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits) lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils lib/crypto: x86/sha1-ni: Convert to use rounds macros lib/crypto: x86/sha1-ni: Minor optimizations and cleanup crypto: sha1 - Remove sha1_base.h lib/crypto: x86/sha1: Migrate optimized code into library lib/crypto: sparc/sha1: Migrate optimized code into library lib/crypto: s390/sha1: Migrate optimized code into library lib/crypto: powerpc/sha1: Migrate optimized code into library lib/crypto: mips/sha1: Migrate optimized code into library lib/crypto: arm64/sha1: Migrate optimized code into library lib/crypto: arm/sha1: Migrate optimized code into library crypto: sha1 - Use same state format as legacy drivers crypto: sha1 - Wrap library and add HMAC support lib/crypto: sha1: Add HMAC support lib/crypto: sha1: Add SHA-1 library functions lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() crypto: x86/sha1 - Rename conflicting symbol lib/crypto: sha2: Add hmac_sha*_init_usingrawkey() lib/crypto: arm/poly1305: Remove unneeded empty weak function lib/crypto: x86/poly1305: Fix performance regression on short messages ...	2025-07-28 17:58:52 -07:00
Linus Torvalds	7e7bc8335b	Merge tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs bpf updates from Christian Brauner: "These changes allow bpf to read extended attributes from cgroupfs. This is useful in redirecting AF_UNIX socket connections based on cgroup membership of the socket. One use-case is the ability to implement log namespaces in systemd so services and containers are redirected to different journals" * tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/kernfs: test xattr retrieval selftests/bpf: Add tests for bpf_cgroup_read_xattr bpf: Mark cgroup_subsys_state->cgroup RCU safe bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node kernfs: remove iattr_mutex	2025-07-28 14:42:31 -07:00
Suchit Karunakaran	5b4c54ac49	bpf: Fix various typos in verifier.c comments This patch fixes several minor typos in comments within the BPF verifier. No changes in functionality. Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 10:02:57 -07:00
Paul Chaignon	5dbb19b16a	bpf: Add third round of bounds deduction Commit `d7f0087381` ("bpf: try harder to deduce register bounds from different numeric domains") added a second call to __reg_deduce_bounds in reg_bounds_sync because a single call wasn't enough to converge to a fixed point in terms of register bounds. With patch "bpf: Improve bounds when s64 crosses sign boundary" from this series, Eduard noticed that calling __reg_deduce_bounds twice isn't enough anymore to converge. The first selftest added in "selftests/bpf: Test cross-sign 64bits range refinement" highlights the need for a third call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync performs the following bounds deduction: reg_bounds_sync entry: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146) __update_reg_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146) __reg_deduce_bounds: __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e) __reg64_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_deduce_mixed_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e) __reg_deduce_bounds: __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e) __reg64_deduce_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_deduce_mixed_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_bound_offset: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff)) __update_reg_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff)) In particular, notice how: 1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds learns new u32 bounds. 2. __reg64_deduce_bounds is unable to improve bounds at this point. 3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds. 4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds improves the smax and umin bounds thanks to patch "bpf: Improve bounds when s64 crosses sign boundary" from this series. 5. Subsequent functions are unable to improve the ranges further (only tnums). Yet, a better smin32 bound could be learned from the smin bound. __reg32_deduce_bounds is able to improve smin32 from smin, but for that we need a third call to __reg_deduce_bounds. As discussed in [1], there may be a better way to organize the deduction rules to learn the same information with less calls to the same functions. Such an optimization requires further analysis and is orthogonal to the present patchset. Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1] Acked-by: Eduard Zingerman <eddyz87@gmail.com> Co-developed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 10:02:13 -07:00
Paul Chaignon	00bf8d0c6c	bpf: Improve bounds when s64 crosses sign boundary __reg64_deduce_bounds currently improves the s64 range using the u64 range and vice versa, but only if it doesn't cross the sign boundary. This patch improves __reg64_deduce_bounds to cover the case where the s64 range crosses the sign boundary but overlaps with the u64 range on only one end. In that case, we can improve both ranges. Consider the following example, with the s64 range crossing the sign boundary: 0 U64_MAX \| [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx] \| \|----------------------------\|----------------------------\| \|xxxxx s64 range xxxxxxxxx] [xxxxxxx\| 0 S64_MAX S64_MIN -1 The u64 range overlaps only with positive portion of the s64 range. We can thus derive the following new s64 and u64 ranges. 0 U64_MAX \| [xxxxxx u64 range xxxxx] \| \|----------------------------\|----------------------------\| \| [xxxxxx s64 range xxxxx] \| 0 S64_MAX S64_MIN -1 The same logic can probably apply to the s32/u32 ranges, but this patch doesn't implement that change. In addition to the selftests, the __reg64_deduce_bounds change was also tested with Agni, the formal verification tool for the range analysis [1]. Link: https://github.com/bpfverif/agni [1] Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 10:02:12 -07:00
Paul Chaignon	5345e64760	bpf: Simplify bounds refinement from s32 During the bounds refinement, we improve the precision of various ranges by looking at other ranges. Among others, we improve the following in this order (other things happen between 1 and 2): 1. Improve u32 from s32 in __reg32_deduce_bounds. 2. Improve s/u64 from u32 in __reg_deduce_mixed_bounds. 3. Improve s/u64 from s32 in __reg_deduce_mixed_bounds. In particular, if the s32 range forms a valid u32 range, we will use it to improve the u32 range in __reg32_deduce_bounds. In __reg_deduce_mixed_bounds, under the same condition, we will use the s32 range to improve the s/u64 ranges. If at (1) we were able to learn from s32 to improve u32, we'll then be able to use that in (2) to improve s/u64. Hence, as (3) happens under the same precondition as (1), it won't improve s/u64 ranges further than (1)+(2) did. Thus, we can get rid of (3). In addition to the extensive suite of selftests for bounds refinement, this patch was also tested with the Agni formal verification tool [1]. Additionally, Eduard mentioned: The argument appears to be as follows: Under precondition `(u32)reg->s32_min <= (u32)reg->s32_max` __reg32_deduce_bounds produces: reg->u32_min = max_t(u32, reg->s32_min, reg->u32_min); reg->u32_max = min_t(u32, reg->s32_max, reg->u32_max); And then first part of __reg_deduce_mixed_bounds assigns: a. reg->umin umax= (reg->umin & ~0xffffffffULL) \| max_t(u32, reg->s32_min, reg->u32_min); b. reg->umax umin= (reg->umax & ~0xffffffffULL) \| min_t(u32, reg->s32_max, reg->u32_max); And then second part of __reg_deduce_mixed_bounds assigns: c. reg->umin umax= (reg->umin & ~0xffffffffULL) \| (u32)reg->s32_min; d. reg->umax umin= (reg->umax & ~0xffffffffULL) \| (u32)reg->s32_max; But assignment (c) is a noop because: max_t(u32, reg->s32_min, reg->u32_min) >= (u32)reg->s32_min Hence RHS(a) >= RHS(c) and umin= does nothing. Also assignment (d) is a noop because: min_t(u32, reg->s32_max, reg->u32_max) <= (u32)reg->s32_max Hence RHS(b) <= RHS(d) and umin= does nothing. Plus the same reasoning for the part dealing with reg->s{min,max}_value: e. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) \| max_t(u32, reg->s32_min_value, reg->u32_min_value); f. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) \| min_t(u32, reg->s32_max_value, reg->u32_max_value); vs g. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) \| (u32)reg->s32_min_value; h. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) \| (u32)reg->s32_max_value; RHS(e) >= RHS(g) and RHS(f) <= RHS(h), hence smax=,smin= do nothing. This appears to be correct. Also, Shung-Hsi: Beside going through the reasoning, I also played with CBMC a bit to double check that as far as a single run of __reg_deduce_bounds() is concerned (and that the register state matches certain handwavy expectations), the change indeed still preserve the original behavior. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://github.com/bpfverif/agni [1] Link: https://lore.kernel.org/bpf/aIJwnFnFyUjNsCNa@mail.gmail.com	2025-07-27 19:23:29 +02:00
Puranjay Mohan	3ba58312e6	bpf: Move bpf_jit_get_prog_name() to core.c bpf_jit_get_prog_name() will be used by all JITs when enabling support for private stack. This function is currently implemented in the x86 JIT. Move the function to core.c so that other JITs can easily use it in their implementation of private stack. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250724120257.7299-2-puranjay@kernel.org	2025-07-26 21:26:51 +02:00
Thomas Weißschuh	b7b3500bd4	umd: Remove usermode driver framework The code is unused since `98e20e5e13` ("bpfilter: remove bpfilter"), therefore remove it. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-2-0d0083334382@linutronix.de	2025-07-26 21:03:04 +02:00
Thomas Weißschuh	2b03164eee	bpf/preload: Don't select USERMODE_DRIVER The usermode driver framework is not used anymore by the BPF preload code. Fixes: `cb80ddc671` ("bpf: Convert bpf_preload.ko to use light skeleton.") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-1-0d0083334382@linutronix.de	2025-07-26 21:02:48 +02:00
Samiullah Khawaja	71c52411c5	net: Create separate gro_flush_normal function Move multiple copies of same code snippet doing `gro_flush` and `gro_normal_list` into separate helper function. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250723013031.2911384-2-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-24 18:34:55 -07:00
Jakub Kicinski	a4f5759b6f	Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-07-24 We've added 3 non-merge commits during the last 3 day(s) which contain a total of 4 files changed, 40 insertions(+), 15 deletions(-). The main changes are: 1) Improved verifier error message for incorrect narrower load from pointer field in ctx, from Paul Chaignon. 2) Disabled migration in nf_hook_run_bpf to address a syzbot report, from Kuniyuki Iwashima. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Test invalid narrower ctx load bpf: Reject narrower access to pointer ctx fields bpf: Disable migration in nf_hook_run_bpf(). ==================== Link: https://patch.msgid.link/20250724173306.3578483-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-24 18:02:24 -07:00
Paul Chaignon	e09299225d	bpf: Reject narrower access to pointer ctx fields The following BPF program, simplified from a syzkaller repro, causes a kernel warning: r0 = (u8 )(r1 + 169); exit; With pointer field sk being at offset 168 in __sk_buff. This access is detected as a narrower read in bpf_skb_is_valid_access because it doesn't match offsetof(struct __sk_buff, sk). It is therefore allowed and later proceeds to bpf_convert_ctx_access. Note that for the "is_narrower_load" case in the convert_ctx_accesses(), the insn->off is aligned, so the cnt may not be 0 because it matches the offsetof(struct __sk_buff, sk) in the bpf_convert_ctx_access. However, the target_size stays 0 and the verifier errors with a kernel warning: verifier bug: error during ctx access conversion(1) This patch fixes that to return a proper "invalid bpf_context access off=X size=Y" error on the load instruction. The same issue affects multiple other fields in context structures that allow narrow access. Some other non-affected fields (for sk_msg, sk_lookup, and sockopt) were also changed to use bpf_ctx_range_ptr for consistency. Note this syzkaller crash was reported in the "Closes" link below, which used to be about a different bug, fixed in commit `fce7bd8e38` ("bpf/verifier: Handle BPF_LOAD_ACQ instructions in insn_def_regno()"). Because syzbot somehow confused the two bugs, the new crash and repro didn't get reported to the mailing list. Fixes: `f96da09473` ("bpf: simplify narrower ctx access") Fixes: `0df1a55afa` ("bpf: Warn on internal verifier errors") Reported-by: syzbot+0ef84a7bdf5301d4cbec@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0ef84a7bdf5301d4cbec Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/3b8dcee67ff4296903351a974ddd9c4dca768b64.1753194596.git.paul.chaignon@gmail.com	2025-07-23 19:33:49 -07:00
Yonghong Song	95993dc303	bpf: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...)) Intel linux test robot reported a warning that ERR_CAST can be used for error pointer casting instead of more-complicated/rarely-used ERR_PTR(PTR_ERR(...)) style. There is no functionality change, but still let us replace two such instances as it improves consistency and readability. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507201048.bceHy8zX-lkp@intel.com/ Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/20250720164754.3999140-1-yonghong.song@linux.dev	2025-07-21 17:27:09 -07:00
Alexei Starovoitov	beb1097ec8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6 Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-18 12:15:59 -07:00
Lorenz Bauer	2e2713ae1a	btf: Fix virt_to_phys() on arm64 when mmapping BTF Breno Leitao reports that arm64 emits the following warning with CONFIG_DEBUG_VIRTUAL: [ 58.896157] virt_to_phys used for non-linear address: 000000009fea9737 (__start_BTF+0x0/0x685530) [ 23.988669] WARNING: CPU: 25 PID: 1442 at arch/arm64/mm/physaddr.c:15 __virt_to_phys (arch/arm64/mm/physaddr.c:?) ... [ 24.075371] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST [ 24.080276] Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024 [ 24.088295] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 24.098440] pc : __virt_to_phys (arch/arm64/mm/physaddr.c:?) [ 24.105398] lr : __virt_to_phys (arch/arm64/mm/physaddr.c:?) ... [ 24.197257] Call trace: [ 24.199761] __virt_to_phys (arch/arm64/mm/physaddr.c:?) (P) [ 24.206883] btf_sysfs_vmlinux_mmap (kernel/bpf/sysfs_btf.c:27) [ 24.214264] sysfs_kf_bin_mmap (fs/sysfs/file.c:179) [ 24.218536] kernfs_fop_mmap (fs/kernfs/file.c:462) [ 24.222461] mmap_region (./include/linux/fs.h:? mm/internal.h:167 mm/vma.c:2405 mm/vma.c:2467 mm/vma.c:2622 mm/vma.c:2692) It seems that the memory layout on arm64 maps the kernel image in vmalloc space which is different than x86. This makes virt_to_phys emit the warning. Fix this by translating the address using __pa_symbol as suggested by Breno instead. Reported-by: Breno Leitao <leitao@debian.org> Closes: https://lore.kernel.org/bpf/g2gqhkunbu43awrofzqb4cs4sxkxg2i4eud6p4qziwrdh67q4g@mtw3d3aqfgmb/ Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Tested-by: Breno Leitao <leitao@debian> Fixes: `a539e2a6d5` ("btf: Allow mmap of vmlinux btf") Link: https://lore.kernel.org/r/20250717-vmlinux-mmap-pa-symbol-v1-1-970be6681158@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-17 11:33:52 -07:00
Tao Chen	19d18fdfc7	bpf: Add struct bpf_token_info The 'commit `35f96de041` ("bpf: Introduce BPF token object")' added BPF token as a new kind of BPF kernel object. And BPF_OBJ_GET_INFO_BY_FD already used to get BPF object info, so we can also get token info with this cmd. One usage scenario, when program runs failed with token, because of the permission failure, we can report what BPF token is allowing with this API for debugging. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:38:05 -07:00
Feng Yang	62ef449b8d	bpf: Clean up individual BTF_ID code Use BTF_ID_LIST_SINGLE(a, b, c) instead of BTF_ID_LIST(a) BTF_ID(b, c) Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Link: https://lore.kernel.org/r/20250710055419.70544-1-yangfeng59949@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:34:42 -07:00
Ilya Leoshkevich	1f489662fb	bpf: Update iterators.lskel-big-endian.h The last iterators update (commit `515ee52b22` ("bpf: make preloaded map iterators to display map elements count")) missed the big-endian skeleton. Update it by running "make big" with Debian clang version 21.0.0 (++20250706105601+01c97b4953e8-1~exp1~20250706225612.1558). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250710100907.45880-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:29:18 -07:00
Eric Biggers	9503ca2cca	lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() Rename the existing sha1_init() to sha1_init_raw(), since it conflicts with the upcoming library function. This will later be removed, but this keeps the kernel building for the introduction of the library. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-14 08:22:31 -07:00
Tao Chen	0eeeebdcc5	bpf: Remove attach_type in bpf_tracing_link Use attach_type in bpf_link, and remove it in bpf_tracing_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-7-chen.dylane@linux.dev	2025-07-11 11:01:08 -07:00
Tao Chen	2a76a80c7f	bpf: Remove attach_type in bpf_netns_link Use attach_type in bpf_link, and remove it in bpf_netns_link. And move netns_type field to the end to fill the byte hole. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20250710032038.888700-6-chen.dylane@linux.dev	2025-07-11 11:01:04 -07:00
Tao Chen	6e816e1c05	bpf: Remove location field in tcx_link Use attach_type in bpf_link to replace the location filed, and remove location field in tcx_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20250710032038.888700-5-chen.dylane@linux.dev	2025-07-11 11:00:57 -07:00
Tao Chen	9b8d543dc2	bpf: Remove attach_type in bpf_cgroup_link Use attach_type in bpf_link, and remove it in bpf_cgroup_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-3-chen.dylane@linux.dev	2025-07-11 10:51:55 -07:00
Tao Chen	b725441f02	bpf: Add attach_type field to bpf_link Attach_type will be set when a link is created by user. It is better to record attach_type in bpf_link generically and have it available universally for all link types. So add the attach_type field in bpf_link and move the sleepable field to avoid unnecessary gap padding. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-2-chen.dylane@linux.dev	2025-07-11 10:51:55 -07:00
Paul Chaignon	6279846b9b	bpf: Forget ranges when refining tnum after JSET Syzbot reported a kernel warning due to a range invariant violation on the following BPF program. 0: call bpf_get_netns_cookie 1: if r0 == 0 goto <exit> 2: if r0 & Oxffffffff goto <exit> The issue is on the path where we fall through both jumps. That path is unreachable at runtime: after insn 1, we know r0 != 0, but with the sign extension on the jset, we would only fallthrough insn 2 if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to figure this out, so the verifier walks all branches. The verifier then refines the register bounds using the second condition and we end up with inconsistent bounds on this unreachable path: 1: if r0 == 0 goto <exit> r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff) 2: if r0 & 0xffffffff goto <exit> r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0) r0 after reg_bounds_sync: u64=[0x1, 0] var_off=(0, 0) Improving the range refinement for JSET to cover all cases is tricky. We also don't expect many users to rely on JSET given LLVM doesn't generate those instructions. So instead of improving the range refinement for JSETs, Eduard suggested we forget the ranges whenever we're narrowing tnums after a JSET. This patch implements that approach. Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/9d4fd6432a095d281f815770608fdcd16028ce0b.1752171365.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-11 10:45:25 -07:00
Emil Tsalapatis	8fc3d2d8b5	bpf/arena: add bpf_arena_reserve_pages kfunc Add a new BPF arena kfunc for reserving a range of arena virtual addresses without backing them with pages. This prevents the range from being populated using bpf_arena_alloc_pages(). Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250709191312.29840-2-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-11 10:43:54 -07:00
Tao Chen	3413bc0cf1	bpf: Clean code with bpf_copy_to_user() No logic change, use bpf_copy_to_user() to clean code. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250703163700.677628-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:53:59 -07:00
Luis Gerhorst	dadb59104c	bpf: Fix aux usage after do_check_insn() We must terminate the speculative analysis if the just-analyzed insn had nospec_result set. Using cur_aux() here is wrong because insn_idx might have been incremented by do_check_insn(). Therefore, introduce and use insn_aux variable. Also change cur_aux(env)->nospec in case do_check_insn() ever manages to increment insn_idx but still fail. Change the warning to check the insn class (which prevents it from triggering for ldimm64, for which nospec_result would not be problematic) and use verifier_bug_if(). In line with Eduard's suggestion, do not introduce prev_aux() because that requires one to understand that after do_check_insn() call what was current became previous. This would at-least require a comment. Fixes: `d6f1c85f22` ("bpf: Fall back to nospec for Spectre v1") Reported-by: Paul Chaignon <paul.chaignon@gmail.com> Reported-by: Eduard Zingerman <eddyz87@gmail.com> Reported-by: syzbot+dc27c5fb8388e38d2d37@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/685b3c1b.050a0220.2303ee.0010.GAE@google.com/ Link: https://lore.kernel.org/bpf/4266fd5de04092aa4971cbef14f1b4b96961f432.camel@gmail.com/ Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250705190908.1756862-2-luis.gerhorst@fau.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:32:34 -07:00
Kumar Kartikeya Dwivedi	bfa2bb9abd	bpf: Fix improper int-to-ptr cast in dump_stack_cb On 32-bit platforms, we'll try to convert a u64 directly to a pointer type which is 32-bit, which causes the compiler to complain about cast from an integer of a different size to a pointer type. Cast to long before casting to the pointer type to match the pointer width. Reported-by: kernelci.org bot <bot@kernelci.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: `d7c431cafc` ("bpf: Add dump_stack() analogue to print to BPF stderr") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250705053035.3020320-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:30:15 -07:00
Kumar Kartikeya Dwivedi	116c8f4747	bpf: Fix bounds for bpf_prog_get_file_line linfo loop We may overrun the bounds because linfo and jited_linfo are already advanced to prog->aux->linfo_idx, hence we must only iterate the remaining elements until we reach prog->aux->nr_linfo. Adjust the nr_linfo calculation to fix this. Reported in [0]. [0]: https://lore.kernel.org/bpf/f3527af3b0620ce36e299e97e7532d2555018de2.camel@gmail.com Reported-by: Eduard Zingerman <eddyz87@gmail.com> Fixes: `0e521efaf3` ("bpf: Add function to extract program source info") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250705053035.3020320-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:30:15 -07:00
Eduard Zingerman	c4aa454c64	bpf: support for void/primitive __arg_untrusted global func params Allow specifying __arg_untrusted for void /char /int /long parameters. Treat such parameters as PTR_TO_MEM\|MEM_RDONLY\|PTR_UNTRUSTED of size zero. Intended usage is as follows: int memcmp(char a __arg_untrusted, char b __arg_untrusted, size_t n) { bpf_for(i, 0, n) { if (a[i] - b[i]) // load at any offset is allowed return a[i] - b[i]; } return 0; } Allocate register id for ARG_PTR_TO_MEM parameters only when PTR_MAYBE_NULL is set. Register id for PTR_TO_MEM is used only to propagate non-null status after conditionals. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:25:07 -07:00
Eduard Zingerman	182f7df704	bpf: attribute __arg_untrusted for global function parameters Add support for PTR_TO_BTF_ID \| PTR_UNTRUSTED global function parameters. Anything is allowed to pass to such parameters, as these are read-only and probe read instructions would protect against invalid memory access. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:25:06 -07:00
Eduard Zingerman	2d5c91e1cc	bpf: rdonly_untrusted_mem for btf id walk pointer leafs When processing a load from a PTR_TO_BTF_ID, the verifier calculates the type of the loaded structure field based on the load offset. For example, given the following types: struct foo { struct foo a; int b; } *p; The verifier would calculate the type of `p->a` as a pointer to `struct foo`. However, the type of `p->b` is currently calculated as a SCALAR_VALUE. This commit updates the logic for processing PTR_TO_BTF_ID to instead calculate the type of p->b as PTR_TO_MEM\|MEM_RDONLY\|PTR_UNTRUSTED. This change allows further dereferencing of such pointers (using probe memory instructions). Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:25:06 -07:00
Eduard Zingerman	b9d44bc9fd	bpf: make makr_btf_ld_reg return error for unexpected reg types Non-functional change: mark_btf_ld_reg() expects 'reg_type' parameter to be either SCALAR_VALUE or PTR_TO_BTF_ID. Next commit expands this set, so update this function to fail if unexpected type is passed. Also update callers to propagate the error. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:25:06 -07:00
Yonghong Song	82bc4abf28	bpf: Avoid putting struct bpf_scc_callchain variables on the stack Add a 'struct bpf_scc_callchain callchain_buf' field in bpf_verifier_env. This way, the previous bpf_scc_callchain local variables can be replaced by taking address of env->callchain_buf. This can reduce stack usage and fix the following error: kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [-Werror,-Wframe-larger-than] Reported-by: Arnd Bergmann <arnd@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250703141117.1485108-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:31:30 -07:00
Yonghong Song	45e9cd38aa	bpf: Reduce stack frame size by using env->insn_buf for bpf insns Arnd Bergmann reported an issue ([1]) where clang compiler (less than llvm18) may trigger an error where the stack frame size exceeds the limit. I can reproduce the error like below: kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than] kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [-Werror,-Wframe-larger-than] Use env->insn_buf for bpf insns instead of putting these insns on the stack. This can resolve the above 'bpf_check' error. The 'do_check' error will be resolved in the next patch. [1] https://lore.kernel.org/bpf/20250620113846.3950478-1-arnd@kernel.org/ Reported-by: Arnd Bergmann <arnd@kernel.org> Tested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250703141111.1484521-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:31:29 -07:00
Yonghong Song	3b87251439	bpf: Simplify assignment to struct bpf_insn pointer in do_misc_fixups() In verifier.c, the following code patterns (in two places) struct bpf_insn patch = &insn_buf[0]; can be simplified to struct bpf_insn patch = insn_buf; which is easier to understand. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250703141106.1483216-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:31:29 -07:00
Paul Chaignon	032547272e	bpf: Avoid warning on unexpected map for tail call Before handling the tail call in record_func_key(), we check that the map is of the expected type and log a verifier error if it isn't. Such an error however doesn't indicate anything wrong with the verifier. The check for map<>func compatibility is done after record_func_key(), by check_map_func_compatibility(). Therefore, this patch logs the error as a typical reject instead of a verifier error. Fixes: `d2e4c1e6c2` ("bpf: Constant map key tracking for prog array pokes") Fixes: `0df1a55afa` ("bpf: Warn on internal verifier errors") Reported-by: syzbot+efb099d5833bca355e51@syzkaller.appspotmail.com Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/1f395b74e73022e47e04a31735f258babf305420.1751578055.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:54 -07:00
Kumar Kartikeya Dwivedi	ecec5b5743	bpf: Report rqspinlock deadlocks/timeout to BPF stderr Begin reporting rqspinlock deadlocks and timeout to BPF program's stderr. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi	e8d0133022	bpf: Report may_goto timeout to BPF stderr Begin reporting may_goto timeouts to BPF program's stderr stream. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi	d7c431cafc	bpf: Add dump_stack() analogue to print to BPF stderr Introduce a kernel function which is the analogue of dump_stack() printing some useful information and the stack trace. This is not exposed to BPF programs yet, but can be made available in the future. When we have a program counter for a BPF program in the stack trace, also additionally output the filename and line number to make the trace helpful. The rest of the trace can be passed into ./decode_stacktrace.sh to obtain the line numbers for kernel symbols. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi	f0c53fd4a7	bpf: Add function to find program from stack trace In preparation of figuring out the closest program that led to the current point in the kernel, implement a function that scans through the stack trace and finds out the closest BPF program when walking down the stack trace. Special care needs to be taken to skip over kernel and BPF subprog frames. We basically scan until we find a BPF main prog frame. The assumption is that if a program calls into us transitively, we'll hit it along the way. If not, we end up returning NULL. Contextually the function will be used in places where we know the program may have called into us. Due to reliance on arch_bpf_stack_walk(), this function only works on x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from arch_bpf_stack_walk as well since we call it outside bpf_throw() context. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi	d090326860	bpf: Ensure RCU lock is held around bpf_prog_ksym_find Add a warning to ensure RCU lock is held around tree lookup, and then fix one of the invocations in bpf_stack_walker. The program has an active stack frame and won't disappear. Use the opportunity to remove unneeded invocation of is_bpf_text_address. Fixes: `f18b03faba` ("bpf: Implement BPF exceptions") Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi	0e521efaf3	bpf: Add function to extract program source info Prepare a function for use in future patches that can extract the file info, line info, and the source line number for a given BPF program provided it's program counter. Only the basename of the file path is provided, given it can be excessively long in some cases. This will be used in later patches to print source info to the BPF stream. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi	5ab154f146	bpf: Introduce BPF standard streams Add support for a stream API to the kernel and expose related kfuncs to BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These can be used for printing messages that can be consumed from user space, thus it's similar in spirit to existing trace_pipe interface. The kernel will use the BPF_STDERR stream to notify the program of any errors encountered at runtime. BPF programs themselves may use both streams for writing debug messages. BPF library-like code may use BPF_STDERR to print warnings or errors on misuse at runtime. The implementation of a stream is as follows. Everytime a message is emitted from the kernel (directly, or through a BPF program), a record is allocated by bump allocating from per-cpu region backed by a page obtained using alloc_pages_nolock(). This ensures that we can allocate memory from any context. The eventual plan is to discard this scheme in favor of Alexei's kmalloc_nolock() [0]. This record is then locklessly inserted into a list (llist_add()) so that the printing side doesn't require holding any locks, and works in any context. Each stream has a maximum capacity of 4MB of text, and each printed message is accounted against this limit. Messages from a program are emitted using the bpf_stream_vprintk kfunc, which takes a stream_id argument in addition to working otherwise similar to bpf_trace_vprintk. The bprintf buffer helpers are extracted out to be reused for printing the string into them before copying it into the stream, so that we can (with the defined max limit) format a string and know its true length before performing allocations of the stream element. For consuming elements from a stream, we expose a bpf(2) syscall command named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the stream of a given prog_fd into a user space buffer. The main logic is implemented in bpf_stream_read(). The log messages are queued in bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and ordered correctly in the stream backlog. For this purpose, we hold a lock around bpf_stream_backlog_peek(), as llist_del_first() (if we maintained a second lockless list for the backlog) wouldn't be safe from multiple threads anyway. Then, if we fail to find something in the backlog log, we splice out everything from the lockless log, and place it in the backlog log, and then return the head of the backlog. Once the full length of the element is consumed, we will pop it and free it. The lockless list bpf_stream::log is a LIFO stack. Elements obtained using a llist_del_all() operation are in LIFO order, thus would break the chronological ordering if printed directly. Hence, this batch of messages is first reversed. Then, it is stashed into a separate list in the stream, i.e. the backlog_log. The head of this list is the actual message that should always be returned to the caller. All of this is done in bpf_stream_backlog_fill(). From the kernel side, the writing into the stream will be a bit more involved than the typical printk. First, the kernel typically may print a collection of messages into the stream, and parallel writers into the stream may suffer from interleaving of messages. To ensure each group of messages is visible atomically, we can lift the advantage of using a lockless list for pushing in messages. To enable this, we add a bpf_stream_stage() macro, and require kernel users to use bpf_stream_printk statements for the passed expression to write into the stream. Underneath the macro, we have a message staging API, where a bpf_stream_stage object on the stack accumulates the messages being printed into a local llist_head, and then a commit operation splices the whole batch into the stream's lockless log list. This is especially pertinent for rqspinlock deadlock messages printed to program streams. After this change, we see each deadlock invocation as a non-interleaving contiguous message without any confusion on the reader's part, improving their user experience in debugging the fault. While programs cannot benefit from this staged stream writing API, they could just as well hold an rqspinlock around their print statements to serialize messages, hence this is kept kernel-internal for now. Overall, this infrastructure provides NMI-safe any context printing of messages to two dedicated streams. Later patches will add support for printing splats in case of BPF arena page faults, rqspinlock deadlocks, and cond_break timeouts, and integration of this facility into bpftool for dumping messages to user space. [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi	0426729f46	bpf: Refactor bprintf buffer support Refactor code to be able to get and put bprintf buffers and use bpf_printf_prepare independently. This will be used in the next patch to implement BPF streams support, particularly as a staging buffer for strings that need to be formatted and then allocated and pushed into a stream. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:06 -07:00
Tao Chen	803f0700a3	bpf: Show precise link_type for {uprobe,kprobe}_multi fdinfo Alexei suggested, 'link_type' can be more precise and differentiate for human in fdinfo. In fact BPF_LINK_TYPE_KPROBE_MULTI includes kretprobe_multi type, the same as BPF_LINK_TYPE_UPROBE_MULTI, so we can show it more concretely. link_type: kprobe_multi link_id: 1 prog_tag: d2b307e915f0dd37 ... link_type: kretprobe_multi link_id: 2 prog_tag: ab9ea0545870781d ... link_type: uprobe_multi link_id: 9 prog_tag: e729f789e34a8eca ... link_type: uretprobe_multi link_id: 10 prog_tag: 7db356c03e61a4d4 Co-developed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250702153958.639852-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:29:42 -07:00

1 2 3 4 5 ...

4147 Commits