linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-17 14:50:43 -05:00

Author	SHA1	Message	Date
Russell King (Oracle)	2dd63c3645	net: stmmac: ingenic: move ingenic_mac_init() Move ingenic_mac_init() to between variant specific set_mode() implementations and ingenic_mac_probe(). No code changes. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vHHpi-0000000Djqp-4910@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:30:39 -08:00
Simon Schippers	7ff14c5204	usbnet: Add support for Byte Queue Limits (BQL) In the current implementation, usbnet uses a fixed tx_qlen of: USB2: 60 * 1518 bytes = 91.08 KB USB3: 60 * 5 * 1518 bytes = 454.80 KB Such large transmit queues can be problematic, especially for cellular modems. For example, with a typical celluar link speed of 10 Mbit/s, a fully occupied USB3 transmit queue results in: 454.80 KB / (10 Mbit/s / 8 bit/byte) = 363.84 ms of additional latency. This patch adds support for Byte Queue Limits (BQL) [1] to dynamically manage the transmit queue size and reduce latency without sacrificing throughput. Testing was performed on various devices using the usbnet driver for packet transmission: - DELOCK 66045: USB3 to 2.5 GbE adapter (ax88179_178a) - DELOCK 61969: USB2 to 1 GbE adapter (asix) - Quectel RM520: 5G modem (qmi_wwan) - USB2 Android tethering (cdc_ncm) No performance degradation was observed for iperf3 TCP or UDP traffic, while latency for a prioritized ping application was significantly reduced. For example, using the USB3 to 2.5 GbE adapter, which was fully utilized by iperf3 UDP traffic, the prioritized ping was improved from 1.6 ms to 0.6 ms. With the same setup but with a 100 Mbit/s Ethernet connection, the prioritized ping was improved from 35 ms to 5 ms. [1] https://lwn.net/Articles/469652/ Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251106175615.26948-1-simon.schippers@tu-dortmund.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:28:38 -08:00
Zahari Doychev	41d0c31be2	tools: ynl: call nested attribute free function for indexed arrays When freeing indexed arrays, the corresponding free function should be called for each entry of the indexed array. For example, for for 'struct tc_act_attrs' 'tc_act_attrs_free(...)' needs to be called for each entry. Previously, memory leaks were reported when enabling the ASAN analyzer. ================================================================= ==874==ERROR: LeakSanitizer: detected memory leaks Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67 #1 0x55c98db048af in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813 #2 0x55c98db048af in main ./linux/tools/net/ynl/samples/tc-filter-add.c:71 Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67 #1 0x55c98db04a93 in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813 #2 0x55c98db04a93 in main ./linux/tools/net/ynl/samples/tc-filter-add.c:74 Direct leak of 10 byte(s) in 2 object(s) allocated from: #0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67 #1 0x55c98db0527d in tc_act_attrs_set_kind ../generated/tc-user.h:1622 SUMMARY: AddressSanitizer: 58 byte(s) leaked in 4 allocation(s). The following diff illustrates the changes introduced compared to the previous version of the code. void tc_flower_attrs_free(struct tc_flower_attrs *obj) { + unsigned int i; + free(obj->indev); + for (i = 0; i < obj->_count.act; i++) + tc_act_attrs_free(&obj->act[i]); free(obj->act); free(obj->key_eth_dst); free(obj->key_eth_dst_mask); Signed-off-by: Zahari Doychev <zahari.doychev@linux.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20251106151529.453026-3-zahari.doychev@linux.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:18:05 -08:00
Breno Leitao	23c52b58cc	tg3: Fix num of RX queues being reported by ethtool Using num_online_cpus() to report number of queues is actually not correct, as reported by Michael[1]. netif_get_num_default_rss_queues() was used to replace num_online_cpus() in the past, but tg3 ethtool callbacks didn't get converted. Doing it now. Link: https://lore.kernel.org/all/CACKFLim7ruspmqvjr6bNRq5Z_XXVk3vVaLZOons7kMCzsEG23A@mail.gmail.com/#t [1] Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20251107-tg3_counts-v1-1-337fe5c8ccb7@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:32 -08:00
Jakub Kicinski	f5d8ec838b	Merge branch 'net-dsa-b53-add-support-for-bcm5389-97-98-and-bcm63xx-arl-formats' Jonas Gorski says: ==================== net: dsa: b53: add support for BCM5389/97/98 and BCM63XX ARL formats Currently b53 assumes that all switches apart from BCM5325/5365 use the same ARL formats, but there are actually multiple formats in use. Older switches use a format apparently introduced with BCM5387/BCM5389, while newer chips use a format apparently introduced with BCM5395. Note that these numbers are not linear, BCM5397/BCM5398 use the older format. In addition to that the switches integrated into BCM63XX SoCs use their own format. While accessing these normal read/write ARL entries are the same format as BCM5389 one, the search format is different. So in order to support all these different format, split all code accessing these entries into chip-family specific functions, and collect them in appropriate arl ops structs to keep the code cleaner. Sent as net-next since the ARL accesses have never worked before, and the extensive refactoring might be too much to warrant a fix. ==================== Link: https://patch.msgid.link/20251107080749.26936-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:10 -08:00
Jonas Gorski	2b3013ac03	net: dsa: b53: add support for bcm63xx ARL entry format The ARL registers of BCM63XX embedded switches are somewhat unique. The normal ARL table access registers have the same format as BCM5389, but the ARL search registers differ: * SRCH_CTL is at the same offset of BCM5389, but 16 bits wide. It does not have more fields, just needs to be accessed by a 16 bit read. * SRCH_RSLT_MACVID and SRCH_RSLT are aligned to 32 bit, and have shifted offsets. * SRCH_RSLT has a different format than the normal ARL data entry register. * There is only one set of ENTRY_N registers, implying a 1 bin layout. So add appropriate ops for bcm63xx and let it use it. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-9-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:07 -08:00
Jonas Gorski	300f78e8b6	net: dsa: b53: add support for 5389/5397/5398 ARL entry format BCM5389, BCM5397 and BCM5398 use a different ARL entry format with just a 16 bit fwdentry register, as well as different search control and data offsets. So add appropriate ops for them and switch those chips to use them. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-8-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:07 -08:00
Jonas Gorski	a7e73339ad	net: dsa: b53: move ARL entry functions into ops struct Now that the differences in ARL entry formats are neatly contained into functions per chip family, wrap them into an ops struct and add wrapper functions to access them. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-7-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:07 -08:00
Jonas Gorski	e0c476f325	net: dsa: b53: split reading search entry into their own functions Split reading search entries into a function for each format. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-6-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:07 -08:00
Jonas Gorski	1716be6db0	net: dsa: b53: provide accessors for accessing ARL_SRCH_CTL In order to more easily support more formats, move accessing ARL_SRCH_CTL into helper functions to contain the differences. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-5-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:07 -08:00
Jonas Gorski	bf6e9d2ae1	net: dsa: b53: move writing ARL entries into their own functions Move writing ARL entries into individual functions for each format. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-4-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:06 -08:00
Jonas Gorski	4a291fe722	net: dsa: b53: move reading ARL entries into their own function Instead of duplicating the whole code iterating over all bins for BCM5325, factor out reading and parsing the entry into its own functions, and name it the modern one after the first chip with that ARL format, (BCM53)95. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-3-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:06 -08:00
Jonas Gorski	a6e4fd38bf	net: dsa: b53: b53_arl_read{,25}(): use the entry for comparision Align the b53_arl_read{,25}() functions by consistently using the parsed arl entry instead of parsing the raw registers again. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251107080749.26936-2-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:11:06 -08:00
Jonas Gorski	762e7e174d	net: dsa: tag_brcm: do not mark link local traffic as offloaded Broadcom switches locally terminate link local traffic and do not forward it, so we should not mark it as offloaded. In some situations we still want/need to flood this traffic, e.g. if STP is disabled, or it is explicitly enabled via the group_fwd_mask. But if the skb is marked as offloaded, the kernel will assume this was already done in hardware, and the packets never reach other bridge ports. So ensure that link local traffic is never marked as offloaded, so that the kernel can forward/flood these packets in software if needed. Since the local termination in not configurable, check the destination MAC, and never mark packets as offloaded if it is a link local ether address. While modern switches set the tag reason code to BRCM_EG_RC_PROT_TERM for trapped link local traffic, they also set it for link local traffic that is flooded (01:80:c2:00:00:10 to 01:80:c2:00:00:2f), so we cannot use it and need to look at the destination address for them as well. Fixes: `964dbf186e` ("net: dsa: tag_brcm: add support for legacy tags") Fixes: `0e62f543be` ("net: dsa: Fix duplicate frames flooded by learning") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251109134635.243951-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 17:04:19 -08:00
Victor Nogueira	60260ad935	selftests/tc-testing: Create tests trying to add children to clsact/ingress qdiscs In response to Wang's bug report [1], add the following test cases: - Try and fail to add an fq child to an ingress qdisc - Try and fail to add an fq child to a clsact qdisc [1] https://lore.kernel.org/netdev/20251105022213.1981982-1-wangliang74@huawei.com/ Reviewed-by: Pedro Tammela <pctammela@mojatatu.ai> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Cong Wang <cwang@multikernel.io> Link: https://patch.msgid.link/20251106205621.3307639-2-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:57:56 -08:00
Victor Nogueira	e781122d76	net/sched: Abort __tc_modify_qdisc if parent is a clsact/ingress qdisc Wang reported an illegal configuration [1] where the user attempts to add a child qdisc to the ingress qdisc as follows: tc qdisc add dev eth0 handle ffff:0 ingress tc qdisc add dev eth0 handle ffe0:0 parent ffff:a fq To solve this, we reject any configuration attempt to add a child qdisc to ingress or clsact. [1] https://lore.kernel.org/netdev/20251105022213.1981982-1-wangliang74@huawei.com/ Fixes: `5e50da01d0` ("[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs") Reported-by: Wang Liang <wangliang74@huawei.com> Closes: https://lore.kernel.org/netdev/20251105022213.1981982-1-wangliang74@huawei.com/ Reviewed-by: Pedro Tammela <pctammela@mojatatu.ai> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Cong Wang <cwang@multikernel.io> Link: https://patch.msgid.link/20251106205621.3307639-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:57:56 -08:00
Jakub Kicinski	7fc2bf8d30	Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-11-10 We've added 19 non-merge commits during the last 3 day(s) which contain a total of 22 files changed, 1345 insertions(+), 197 deletions(-). The main changes are: 1) Preserve skb metadata after a TC BPF program has changed the skb, from Jakub Sitnicki. This allows a TC program at the end of a TC filter chain to still see the skb metadata, even if another TC program at the front of the chain has changed the skb using BPF helpers. 2) Initial af_smc bpf_struct_ops support to control the smc specific syn/synack options, from D. Wythe. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: bpf/selftests: Add selftest for bpf_smc_hs_ctrl net/smc: bpf: Introduce generic hook for handshake flow bpf: Export necessary symbols for modules with struct_ops selftests/bpf: Cover skb metadata access after bpf_skb_change_proto selftests/bpf: Cover skb metadata access after change_head/tail helper selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room selftests/bpf: Cover skb metadata access after vlan push/pop helper selftests/bpf: Expect unclone to preserve skb metadata selftests/bpf: Dump skb metadata on verification failure selftests/bpf: Verify skb metadata in BPF instead of userspace bpf: Make bpf_skb_change_head helper metadata-safe bpf: Make bpf_skb_change_proto helper metadata-safe bpf: Make bpf_skb_adjust_room metadata-safe bpf: Make bpf_skb_vlan_push helper metadata-safe bpf: Make bpf_skb_vlan_pop helper metadata-safe vlan: Make vlan_remove_tag return nothing bpf: Unclone skb head on bpf_dynptr_write to skb metadata net: Preserve metadata on pskb_expand_head net: Helper to move packet data and metadata after skb_push/pull ==================== Link: https://patch.msgid.link/20251110232427.3929291-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:43:51 -08:00
Niklas Söderlund	38f073a71e	net: ravb: Correct bad check of timestamp control flags When converting the Renesas network drivers to use flags from enum hwtstamp_rx_filters to control when to timestamp packages instead of a driver specific schema with bit-wise flags an error was made. The bit-wise driver specific flags correct logic to set get_ts was: q: RAVB_BE + tstamp_rx_ctrl: 0 => 0 q: RAVB_NC + tstamp_rx_ctrl: 0 => 0 q: RAVB_BE + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_V2_L2_EVENT => 0 q: RAVB_NC + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_V2_L2_EVENT => 1 q: RAVB_BE + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_ALL => 1 q: RAVB_NC + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_ALL => 1 The converted logic to use enum flags mapped tstamp_rx_ctrl as 0 to HWTSTAMP_FILTER_NONE RAVB_RXTSTAMP_TYPE_V2_L2_EVENT to HWTSTAMP_FILTER_PTP_V2_L2_EVENT RAVB_RXTSTAMP_TYPE_ALL to HWTSTAMP_FILTER_ALL But the logic was incorrectly changed to: q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_NONE => 1 (error) q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_NONE => 0 q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_PTP_V2_L2_EVENT => 0 q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_PTP_V2_L2_EVENT => 1 q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_ALL => 1 q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_ALL => 0 (error) This change restores the converted flag check to the correct logic of the bit-wise driver specific flags. Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/linux-renesas-soc/aQ4xSv9629XF-Bt3@horms.kernel.org/ Fixes: `16e2e6cf75` ("net: ravb: Use common defines for time stamping control") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://patch.msgid.link/20251107200100.3637869-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:26:31 -08:00
Zhongqiu Han	5b9192c2c0	ptp: ocp: Document sysfs output format for backward compatibility Add a comment to ptp_ocp_tty_show() explaining that the sysfs output intentionally does not include a trailing newline. This is required for backward compatibility with existing userspace software that reads the sysfs attribute and uses the value directly as a device path. A previous attempt to add a newline to align with common kernel conventions broke userspace applications that were opening device paths like "/dev/ttyS4\n" instead of "/dev/ttyS4", resulting in ENOENT errors. This comment prevents future attempts to "fix" this behavior, which would break existing userspace applications. Link: https://lore.kernel.org/netdev/20251030124519.1828058-1-zhongqiu.han@oss.qualcomm.com/ Link: https://lore.kernel.org/netdev/aef3b850-5f38-4c28-a018-3b0006dc2f08@linux.dev/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251107074533.416048-1-zhongqiu.han@oss.qualcomm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:23:13 -08:00
Kuniyuki Iwashima	73edb26b06	sctp: Don't inherit do_auto_asconf in sctp_clone_sock(). syzbot reported list_del(&sp->auto_asconf_list) corruption in sctp_destroy_sock(). The repro calls setsockopt(SCTP_AUTO_ASCONF, 1) to a SCTP listener, calls accept(), and close()s the child socket. setsockopt(SCTP_AUTO_ASCONF, 1) sets sp->do_auto_asconf to 1 and links sp->auto_asconf_list to a per-netns list. Both fields are placed after sp->pd_lobby in struct sctp_sock, and sctp_copy_descendant() did not copy the fields before the cited commit. Also, sctp_clone_sock() did not set them explicitly. In addition, sctp_auto_asconf_init() is called from sctp_sock_migrate(), but it initialises the fields only conditionally. The two fields relied on __GFP_ZERO added in sk_alloc(), but sk_clone() does not use it. Let's clear newsp->do_auto_asconf in sctp_clone_sock(). [0]: list_del corruption. prev->next should be ffff8880799e9148, but was ffff8880799e8808. (prev=ffff88803347d9f8) kernel BUG at lib/list_debug.c:64! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6008 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 RIP: 0010:__list_del_entry_valid_or_report+0x15a/0x190 lib/list_debug.c:62 Code: e8 7b 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 7c ee 92 fd 49 8b 17 48 c7 c7 80 0a bf 8b 48 89 de 4c 89 f9 e8 07 c6 94 fc 90 <0f> 0b 4c 89 f7 e8 4c 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 4d RSP: 0018:ffffc90003067ad8 EFLAGS: 00010246 RAX: 000000000000006d RBX: ffff8880799e9148 RCX: b056988859ee6e00 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffc90003067807 R09: 1ffff9200060cf00 R10: dffffc0000000000 R11: fffff5200060cf01 R12: 1ffff1100668fb3f R13: dffffc0000000000 R14: ffff88803347d9f8 R15: ffff88803347d9f8 FS: 00005555823e5500(0000) GS:ffff88812613e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000000480 CR3: 00000000741ce000 CR4: 00000000003526f0 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:132 [inline] __list_del_entry include/linux/list.h:223 [inline] list_del include/linux/list.h:237 [inline] sctp_destroy_sock+0xb4/0x370 net/sctp/socket.c:5163 sk_common_release+0x75/0x310 net/core/sock.c:3961 sctp_close+0x77e/0x900 net/sctp/socket.c:1550 inet_release+0x144/0x190 net/ipv4/af_inet.c:437 __sock_release net/socket.c:662 [inline] sock_close+0xc3/0x240 net/socket.c:1455 __fput+0x44c/0xa70 fs/file_table.c:468 task_work_run+0x1d4/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline] do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: `16942cf4d3` ("sctp: Use sk_clone() in sctp_accept().") Reported-by: syzbot+ba535cb417f106327741@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/690d2185.a70a0220.22f260.000e.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251106223418.1455510-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:22:09 -08:00
Eric Dumazet	1534ff7775	sctp: prevent possible shift-out-of-bounds in sctp_transport_update_rto syzbot reported a possible shift-out-of-bounds [1] Blamed commit added rto_alpha_max and rto_beta_max set to 1000. It is unclear if some sctp users are setting very large rto_alpha and/or rto_beta. In order to prevent user regression, perform the test at run time. Also add READ_ONCE() annotations as sysctl values can change under us. [1] UBSAN: shift-out-of-bounds in net/sctp/transport.c:509:41 shift exponent 64 is too large for 32-bit type 'unsigned int' CPU: 0 UID: 0 PID: 16704 Comm: syz.2.2320 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:233 [inline] __ubsan_handle_shift_out_of_bounds+0x27f/0x420 lib/ubsan.c:494 sctp_transport_update_rto.cold+0x1c/0x34b net/sctp/transport.c:509 sctp_check_transmitted+0x11c4/0x1c30 net/sctp/outqueue.c:1502 sctp_outq_sack+0x4ef/0x1b20 net/sctp/outqueue.c:1338 sctp_cmd_process_sack net/sctp/sm_sideeffect.c:840 [inline] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1372 [inline] Fixes: `b58537a1f5` ("net: sctp: fix permissions for rto_alpha and rto_beta knobs") Reported-by: syzbot+f8c46c8b2b7f6e076e99@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/690c81ae.050a0220.3d0d33.014e.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251106111054.3288127-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-10 16:21:05 -08:00
Linus Torvalds	4427259cc7	Merge tag 'riscv-for-linus-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: - fix broken clang build on versions earlier than 19 and binutils versions earlier than 2.38. (This exposed that we're not properly testing earlier toolchain versions in our linux-next builds and PR submissions. This was fixed for this PR, and is being addressed more generally for -next builds.) - remove some redundant Makefile code - avoid building Canaan Kendryte K210-specific code on targets that don't build for the K210 * tag 'riscv-for-linus-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix CONFIG_AS_HAS_INSN for new .insn usage riscv: Remove redundant judgment for the default build target riscv: Build loader.bin exclusively for Canaan K210	2025-11-10 15:35:45 -08:00
Luiz Augusto von Dentz	41bf23338a	Bluetooth: hci_conn: Fix not cleaning up PA_LINK connections Contrary to what was stated on `d36349ea73` ("Bluetooth: hci_conn: Fix running bis_cleanup for hci_conn->type PA_LINK") the PA_LINK does in fact needs to run bis_cleanup in order to terminate the PA Sync, since that is bond to the listening socket which is the entity that controls the lifetime of PA Sync, so if it is closed/released the PA Sync shall be terminated, terminating the PA Sync shall not result in the BIG Sync being terminated since once the later is established it doesn't depend on the former anymore. If the use user wants to reconnect/rebind a number of BIS(s) it shall keep the socket open until it no longer needs the PA Sync, which means it retains full control of the lifetime of both PA and BIG Syncs. Fixes: `d36349ea73` ("Bluetooth: hci_conn: Fix running bis_cleanup for hci_conn->type PA_LINK") Fixes: `a7bcffc673` ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:09:00 -05:00
Pauli Virtanen	15f32cabf4	Bluetooth: 6lowpan: add missing l2cap_chan_lock() l2cap_chan_close() needs to be called in l2cap_chan_lock(), otherwise l2cap_le_sig_cmd() etc. may run concurrently. Add missing locks around l2cap_chan_close(). Fixes: `6b8d4a6a03` ("Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:08:41 -05:00
Pauli Virtanen	98454bc812	Bluetooth: 6lowpan: Don't hold spin lock over sleeping functions disconnect_all_peers() calls sleeping function (l2cap_chan_close) under spinlock. Holding the lock doesn't actually do any good -- we work on a local copy of the list, and the lock doesn't protect against peer->chan having already been freed. Fix by taking refcounts of peer->chan instead. Clean up the code and old comments a bit. Take devices_lock instead of RCU, because the kfree_rcu(); l2cap_chan_put(); construct in chan_close_cb() does not guarantee peer->chan is necessarily valid in RCU. Also take l2cap_chan_lock() which is required for l2cap_chan_close(). Log: (bluez 6lowpan-tester Client Connect - Disable) ------ BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575 ... <TASK> ... l2cap_send_disconn_req (net/bluetooth/l2cap_core.c:938 net/bluetooth/l2cap_core.c:1495) ... ? __pfx_l2cap_chan_close (net/bluetooth/l2cap_core.c:809) do_enable_set (net/bluetooth/6lowpan.c:1048 net/bluetooth/6lowpan.c:1068) ------ Fixes: `9030582963` ("Bluetooth: 6lowpan: Converting rwlocks to use RCU") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:08:21 -05:00
Pauli Virtanen	e060088db0	Bluetooth: L2CAP: export l2cap_chan_hold for modules l2cap_chan_put() is exported, so export also l2cap_chan_hold() for modules. l2cap_chan_hold() has use case in net/bluetooth/6lowpan.c Signed-off-by: Pauli Virtanen <pav@iki.fi> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:08:00 -05:00
Pauli Virtanen	b454505bf5	Bluetooth: 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion Bluetooth 6lowpan.c confuses BDADDR_LE and ADDR_LE_DEV address types, e.g. debugfs "connect" command takes the former, and "disconnect" and "connect" to already connected device take the latter. This is due to using same value both for l2cap_chan_connect and hci_conn_hash_lookup_le which take different dst_type values. Fix address type passed to hci_conn_hash_lookup_le(). Retain the debugfs API difference between "connect" and "disconnect" commands since it's been like this since 2015 and nobody apparently complained. Fixes: `f5ad4ffceb` ("Bluetooth: 6lowpan: Use hci_conn_hash_lookup_le() when possible") Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:07:41 -05:00
Pauli Virtanen	3b78f50918	Bluetooth: 6lowpan: reset link-local header on ipv6 recv path Bluetooth 6lowpan.c netdev has header_ops, so it must set link-local header for RX skb, otherwise things crash, eg. with AF_PACKET SOCK_RAW Add missing skb_reset_mac_header() for uncompressed ipv6 RX path. For the compressed one, it is done in lowpan_header_decompress(). Log: (BlueZ 6lowpan-tester Client Recv Raw - Success) ------ kernel BUG at net/core/skbuff.c:212! Call Trace: <IRQ> ... packet_rcv (net/packet/af_packet.c:2152) ... <TASK> __local_bh_enable_ip (kernel/softirq.c:407) netif_rx (net/core/dev.c:5648) chan_recv_cb (net/bluetooth/6lowpan.c:294 net/bluetooth/6lowpan.c:359) ------ Fixes: `18722c2470` ("Bluetooth: Enable 6LoWPAN support for BT LE devices") Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:07:21 -05:00
Raphael Pinsonneault-Thibeault	23d22f2f71	Bluetooth: btusb: reorder cleanup in btusb_disconnect to avoid UAF There is a KASAN: slab-use-after-free read in btusb_disconnect(). Calling "usb_driver_release_interface(&btusb_driver, data->intf)" will free the btusb data associated with the interface. The same data is then used later in the function, hence the UAF. Fix by moving the accesses to btusb data to before the data is free'd. Reported-by: syzbot+2fc81b50a4f8263a159b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2fc81b50a4f8263a159b Tested-by: syzbot+2fc81b50a4f8263a159b@syzkaller.appspotmail.com Fixes: `fd913ef7ce` ("Bluetooth: btusb: Add out-of-band wakeup support") Signed-off-by: Raphael Pinsonneault-Thibeault <rpthibeault@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:07:01 -05:00
Pauli Virtanen	55fb52ffdd	Bluetooth: MGMT: cancel mesh send timer when hdev removed mesh_send_done timer is not canceled when hdev is removed, which causes crash if the timer triggers after hdev is gone. Cancel the timer when MGMT removes the hdev, like other MGMT timers. Should fix the BUG: sporadically seen by BlueZ test bot (in "Mesh - Send cancel - 1" test). Log: ------ BUG: KASAN: slab-use-after-free in run_timer_softirq+0x76b/0x7d0 ... Freed by task 36: kasan_save_stack+0x24/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3a/0x60 __kasan_slab_free+0x43/0x70 kfree+0x103/0x500 device_release+0x9a/0x210 kobject_put+0x100/0x1e0 vhci_release+0x18b/0x240 ------ Fixes: `b338d91703` ("Bluetooth: Implement support for Mesh") Link: https://lore.kernel.org/linux-bluetooth/67364c09.0c0a0220.113cba.39ff@mx.google.com/ Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-11-10 16:00:44 -05:00
Martin KaFai Lau	67f4cfb530	Merge branch 'net-smc-introduce-smc_hs_ctrl' D. Wythe says: ==================== net/smc: Introduce smc_hs_ctrl This patch aims to introduce BPF injection capabilities for SMC and includes a self-test to ensure code stability. Since the SMC protocol isn't ideal for every situation, especially short-lived ones, most applications can't guarantee the absence of such scenarios. Consequently, applications may need specific strategies to decide whether to use SMC. For example, an application might limit SMC usage to certain IP addresses or ports. To maintain the principle of transparent replacement, we want applications to remain unaffected even if they need specific SMC strategies. In other words, they should not require recompilation of their code. Additionally, we need to ensure the scalability of strategy implementation. While using socket options or sysctl might be straightforward, it could complicate future expansions. Fortunately, BPF addresses these concerns effectively. Users can write their own strategies in eBPF to determine whether to use SMC, and they can easily modify those strategies in the future. This is a rework of the series from [1]. Changes since [1] are limited to the SMC parts: 1. Rename smc_ops to smc_hs_ctrl and change interface name. 2. Squash SMC patches, removing standalone non-BPF hook capability. 3. Fix typos [1]: https://lore.kernel.org/bpf/20250123015942.94810-1-alibuda@linux.alibaba.com/#t v2 -> v1: - Removed the fixes patch, which have already been merged on current branch. - Fixed compilation warning of smc_call_hsbpf() when CONFIG_SMC_HS_CTRL_BPF is not enabled. - Changed the default value of CONFIG_SMC_HS_CTRL_BPF to Y. - Fix typo and renamed some variables v3 -> v2: - Removed the libbpf patch, which have already been merged on current branch. - Fixed sparse warning of smc_call_hsbpf() and xchg(). v4 -> v3: - Rebased on latest bpf-next, updated SMC loopback config from SMC_LO to DIBS_LO per upstream changes. v5 -> v4: - Removed the redundant sk parameter from smc_call_hsbpf - Reject registration when bpf_link is set, link support will be added in the future. - Updated selftests with new test heplers. ==================== Link: https://patch.msgid.link/20251107035632.115950-1-alibuda@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-11-10 12:00:47 -08:00
D. Wythe	beb3c67297	bpf/selftests: Add selftest for bpf_smc_hs_ctrl This tests introduces a tiny smc_hs_ctrl for filtering SMC connections based on IP pairs, and also adds a realistic topology model to verify it. Also, we can only use SMC loopback under CI test, so an additional configuration needs to be enabled. Follow the steps below to run this test. make -C tools/testing/selftests/bpf cd tools/testing/selftests/bpf sudo ./test_progs -t smc Results shows: Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20251107035632.115950-4-alibuda@linux.alibaba.com	2025-11-10 12:00:45 -08:00
D. Wythe	15f295f556	net/smc: bpf: Introduce generic hook for handshake flow The introduction of IPPROTO_SMC enables eBPF programs to determine whether to use SMC based on the context of socket creation, such as network namespaces, PID and comm name, etc. As a subsequent enhancement, to introduce a new generic hook that allows decisions on whether to use SMC or not at runtime, including but not limited to local/remote IP address or ports. User can write their own implememtion via bpf_struct_ops now to choose whether to use SMC or not before TCP 3rd handshake to be comleted. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Link: https://patch.msgid.link/20251107035632.115950-3-alibuda@linux.alibaba.com	2025-11-10 11:19:41 -08:00
D. Wythe	07c428ece3	bpf: Export necessary symbols for modules with struct_ops Exports three necessary symbols for implementing struct_ops with tristate subsystem. To hold or release refcnt of struct_ops refcnt by inline funcs bpf_try_module_get and bpf_module_put which use bpf_struct_ops_get(put) conditionally. And to copy obj name from one to the other with effective checks by bpf_obj_name_cpy. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251107035632.115950-2-alibuda@linux.alibaba.com	2025-11-10 11:07:34 -08:00
Martin KaFai Lau	abd0c0f6aa	Merge branch 'make-tc-bpf-helpers-preserve-skb-metadata' Jakub Sitnicki says: ==================== Make TC BPF helpers preserve skb metadata Changes in v4: - Fix copy-paste bug in check_metadata() test helper (AI review) - Add "out of scope" section (at the bottom) - Link to v3: https://lore.kernel.org/r/20251026-skb-meta-rx-path-v3-0-37cceebb95d3@cloudflare.com Changes in v3: - Use the already existing BPF_STREAM_STDERR const in tests (Martin) - Unclone skb head on bpf_dynptr_write to skb metadata (patch 3) (Martin) - Swap order of patches 1 & 2 to refer to skb_postpush_data_move() in docs - Mention in skb_data_move() docs how to move just the metadata - Note in pskb_expand_head() docs to move metadata after skb_push() (Jakub) - Link to v2: https://lore.kernel.org/r/20251019-skb-meta-rx-path-v2-0-f9a58f3eb6d6@cloudflare.com Changes in v2: - Tweak WARN_ON_ONCE check in skb_data_move() (patch 2) - Convert all tests to verify skb metadata in BPF (patches 9-10) - Add test coverage for modified BPF helpers (patches 12-15) - Link to RFCv1: https://lore.kernel.org/r/20250929-skb-meta-rx-path-v1-0-de700a7ab1cb@cloudflare.com This patch set continues our work [1] to allow BPF programs and user-space applications to attach multiple bytes of metadata to packets via the XDP/skb metadata area. The focus of this patch set it to ensure that skb metadata remains intact when packets pass through a chain of TC BPF programs that call helpers which operate on skb head. Currently, several helpers that either adjust the skb->data pointer or reallocate skb->head do not preserve metadata at its expected location, that is immediately in front of the MAC header. These are: - bpf_skb_adjust_room - bpf_skb_change_head - bpf_skb_change_proto - bpf_skb_change_tail - bpf_skb_vlan_pop - bpf_skb_vlan_push In TC BPF context, metadata must be moved whenever skb->data changes to keep the skb->data_meta pointer valid. I don't see any way around it. Creative ideas how to avoid that would be very welcome. With that in mind, we can patch the helpers in at least two different ways: 1. Integrate metadata move into header move Replace the existing memmove, which follows skb_push/pull, with a helper that moves both headers and metadata in a single call. This avoids an extra memmove but reduces transparency. skb_pull(skb, len); - memmove(skb->data, skb->data - len, n); + skb_postpull_data_move(skb, len, n); skb->mac_header += len; skb_push(skb, len) - memmove(skb->data, skb->data + len, n); + skb_postpush_data_move(skb, len, n); skb->mac_header -= len; 2. Move metadata separately Add a dedicated metadata move after the header move. This is more explicit but costs an additional memmove. skb_pull(skb, len); memmove(skb->data, skb->data - len, n); + skb_metadata_postpull_move(skb, len); skb->mac_header += len; skb_push(skb, len) + skb_metadata_postpush_move(skb, len); memmove(skb->data, skb->data + len, n); skb->mac_header -= len; This patch set implements option (1), expecting that "you can have just one memmove" will be the most obvious feedback, while readability is a, somewhat subjective, matter of taste, which I don't claim to have ;-) The structure of the patch set is as follows: - patches 1-4 prepare ground for safe-proofing the BPF helpers - patches 5-9 modify the BPF helpers to preserve skb metadata - patches 10-11 prepare ground for metadata tests with BPF helper calls - patches 12-16 adapt and expand tests to cover the made changes Out of scope for this series: - safe-proofing tunnel & tagging devices - VLAN, GRE, ... (next in line, in development preview at [2]) - metadata access after packet foward (to do after Rx path - once metadata reliably reaches sk_filter) Thanks, -jkbs [1] https://lore.kernel.org/all/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com/ [2] https://github.com/jsitnicki/linux/commits/skb-meta/safeproof-netdevs/ ==================== Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-0-5ceb08a9b37b@cloudflare.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-11-10 10:52:34 -08:00
Jakub Sitnicki	d2c5cca3fb	selftests/bpf: Cover skb metadata access after bpf_skb_change_proto Add a test to verify that skb metadata remains accessible after calling bpf_skb_change_proto(), which modifies packet headroom to accommodate different IP header sizes. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-16-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:33 -08:00
Jakub Sitnicki	85d454afef	selftests/bpf: Cover skb metadata access after change_head/tail helper Add a test to verify that skb metadata remains accessible after calling bpf_skb_change_head() and bpf_skb_change_tail(), which modify packet headroom/tailroom and can trigger head reallocation. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-15-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:33 -08:00
Jakub Sitnicki	29960e635b	selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room Add a test to verify that skb metadata remains accessible after calling bpf_skb_adjust_room(), which modifies the packet headroom and can trigger head reallocation. The helper expects an Ethernet frame carrying an IP packet so switch test packet identification by source MAC address since we can no longer rely on Ethernet proto being set to zero. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-14-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:33 -08:00
Jakub Sitnicki	354d020c29	selftests/bpf: Cover skb metadata access after vlan push/pop helper Add a test to verify that skb metadata remains accessible after calling bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet headroom. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-13-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:32 -08:00
Jakub Sitnicki	1e1357fde8	selftests/bpf: Expect unclone to preserve skb metadata Since pskb_expand_head() no longer clears metadata on unclone, update tests for cloned packets to expect metadata to remain intact. Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests. Creating an r/w dynptr slice is sufficient to trigger an unclone in the prologue, so remove the extraneous writes to the data/meta slice. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-12-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:32 -08:00
Jakub Sitnicki	9ef9ac15a5	selftests/bpf: Dump skb metadata on verification failure Add diagnostic output when metadata verification fails to help with troubleshooting test failures. Introduce a check_metadata() helper that prints both expected and received metadata to the BPF program's stderr stream on mismatch. The userspace test reads and dumps this stream on failure. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-11-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:32 -08:00
Jakub Sitnicki	967534e57c	selftests/bpf: Verify skb metadata in BPF instead of userspace Move metadata verification into the BPF TC programs. Previously, userspace read metadata from a map and verified it once at test end. Now TC programs compare metadata directly using __builtin_memcmp() and set a test_pass flag. This enables verification at multiple points during test execution rather than a single final check. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-10-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:32 -08:00
Jakub Sitnicki	fb206fc312	bpf: Make bpf_skb_change_head helper metadata-safe Although bpf_skb_change_head() doesn't move packet data after skb_push(), skb metadata still needs to be relocated. Use the dedicated helper to handle it. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-9-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:32 -08:00
Jakub Sitnicki	8cfc172ce2	bpf: Make bpf_skb_change_proto helper metadata-safe bpf_skb_change_proto reuses the same headroom operations as bpf_skb_adjust_room, already updated to handle metadata safely. The remaining step is to ensure that there is sufficient headroom to accommodate metadata on skb_push(). Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-8-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:32 -08:00
Jakub Sitnicki	be83105d38	bpf: Make bpf_skb_adjust_room metadata-safe bpf_skb_adjust_room() may push or pull bytes from skb->data. In both cases, skb metadata must be moved accordingly to stay accessible. Replace existing memmove() calls, which only move payload, with a helper that also handles metadata. Reserve enough space for metadata to fit after skb_push. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-7-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:32 -08:00
Jakub Sitnicki	55ffc98b44	bpf: Make bpf_skb_vlan_push helper metadata-safe Use the metadata-aware helper to move packet bytes after skb_push(), ensuring metadata remains valid after calling the BPF helper. Also, take care to reserve sufficient headroom for metadata to fit. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-6-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:31 -08:00
Jakub Sitnicki	efd35c2623	bpf: Make bpf_skb_vlan_pop helper metadata-safe Use the metadata-aware helper to move packet bytes after skb_pull(), ensuring metadata remains valid after calling the BPF helper. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-5-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:31 -08:00
Jakub Sitnicki	b85be58e2f	vlan: Make vlan_remove_tag return nothing All callers ignore the return value. Prepare to reorder memmove() after skb_pull() which is a common pattern. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-4-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:31 -08:00
Jakub Sitnicki	f38499ff45	bpf: Unclone skb head on bpf_dynptr_write to skb metadata Currently bpf_dynptr_from_skb_meta() marks the dynptr as read-only when the skb is cloned, preventing writes to metadata. Remove this restriction and unclone the skb head on bpf_dynptr_write() to metadata, now that the metadata is preserved during uncloning. This makes metadata dynptr consistent with skb dynptr, allowing writes regardless of whether the skb is cloned. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-3-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:31 -08:00
Jakub Sitnicki	290fc0be09	net: Preserve metadata on pskb_expand_head pskb_expand_head() copies headroom, including skb metadata, into the newly allocated head, but then clears the metadata. As a result, metadata is lost when BPF helpers trigger an skb head reallocation. Let the skb metadata remain in the newly created copy of head. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-2-5ceb08a9b37b@cloudflare.com	2025-11-10 10:52:31 -08:00

... 5 6 7 8 9 ...

1399042 Commits