linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-05 05:57:16 -04:00

Author	SHA1	Message	Date
Linus Torvalds	ba36dd5ee6	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Mark migrate_disable/enable() as always_inline to avoid issues with partial inlining (Yonghong Song) - Fix powerpc stack register definition in libbpf bpf_tracing.h (Andrii Nakryiko) - Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann) - Conditionally include dynptr copy kfuncs (Malin Jonsson) - Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal) - Do not audit capability check in x86 do_jit() (Ondrej Mosnacek) - Fix arm64 JIT of BPF_ST insn when it writes into arena memory (Puranjay Mohan) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf/arm64: Fix BPF_ST into arena memory bpf: Make migrate_disable always inline to avoid partial inlining bpf: Reject negative head_room in __bpf_skb_change_head bpf: Conditionally include dynptr copy kfuncs libbpf: Fix powerpc's stack register definition in bpf_tracing.h bpf: Do not audit capability check in do_jit() bpf: Sync pending IRQ work before freeing ring buffer	2025-10-31 18:22:26 -07:00
Ranganath V N	51e5ad549c	net: sctp: fix KMSAN uninit-value in sctp_inq_pop Fix an issue detected by syzbot: KMSAN reported an uninitialized-value access in sctp_inq_pop BUG: KMSAN: uninit-value in sctp_inq_pop The issue is actually caused by skb trimming via sk_filter() in sctp_rcv(). In the reproducer, skb->len becomes 1 after sk_filter(), which bypassed the original check: if (skb->len < sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) + skb_transport_offset(skb)) To handle this safely, a new check should be performed after sk_filter(). Reported-by: syzbot+d101e12bccd4095460e7@syzkaller.appspotmail.com Tested-by: syzbot+d101e12bccd4095460e7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d101e12bccd4095460e7 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Ranganath V N <vnranganath.20@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251026-kmsan_fix-v3-1-2634a409fa5f@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-30 11:21:05 +01:00
Shivaji Kant	6a2108c780	net: devmem: refresh devmem TX dst in case of route invalidation The zero-copy Device Memory (Devmem) transmit path relies on the socket's route cache (`dst_entry`) to validate that the packet is being sent via the network device to which the DMA buffer was bound. However, this check incorrectly fails and returns `-ENODEV` if the socket's route cache entry (`dst`) is merely missing or expired (`dst == NULL`). This scenario is observed during network events, such as when flow steering rules are deleted, leading to a temporary route cache invalidation. This patch fixes -ENODEV error for `net_devmem_get_binding()` by doing the following: 1. It attempts to rebuild the route via `rebuild_header()` if the route is initially missing (`dst == NULL`). This allows the TCP/IP stack to recover from transient route cache misses. 2. It uses `rcu_read_lock()` and `dst_dev_rcu()` to safely access the network device pointer (`dst_dev`) from the route, preventing use-after-free conditions if the device is concurrently removed. 3. It maintains the critical safety check by validating that the retrieved destination device (`dst_dev`) is exactly the device registered in the Devmem binding (`binding->dev`). These changes prevent unnecessary ENODEV failures while maintaining the critical safety requirement that the Devmem resources are only used on the bound network device. Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: Vedant Mathur <vedantmathur@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: `bd61848900` ("net: devmem: Implement TX path") Signed-off-by: Shivaji Kant <shivajikant@google.com> Link: https://patch.msgid.link/20251029065420.3489943-1-shivajikant@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 19:23:21 -07:00
Shahar Shitrit	c15d5c62ab	net: tls: Cancel RX async resync request on rcd_delta overflow When a netdev issues a RX async resync request for a TLS connection, the TLS module handles it by logging record headers and attempting to match them to the tcp_sn provided by the device. If a match is found, the TLS module approves the tcp_sn for resynchronization. While waiting for a device response, the TLS module also increments rcd_delta each time a new TLS record is received, tracking the distance from the original resync request. However, if the device response is delayed or fails (e.g due to unstable connection and device getting out of tracking, hardware errors, resource exhaustion etc.), the TLS module keeps logging and incrementing, which can lead to a WARN() when rcd_delta exceeds the threshold. To address this, introduce tls_offload_rx_resync_async_request_cancel() to explicitly cancel resync requests when a device response failure is detected. Call this helper also as a final safeguard when rcd_delta crosses its threshold, as reaching this point implies that earlier cancellation did not occur. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761508983-937977-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 18:32:18 -07:00
Jakub Kicinski	e98cda764a	Merge tag 'nf-25-10-29' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net 1) its not possible to attach conntrack labels via ctnetlink unless one creates a dummy 'ct labels set' rule in nftables. This is an oversight, the 'ruleset tests presence, userspace (netlink) sets' use-case is valid and should 'just work'. Always broken since this got added in Linux 4.7. 2) nft_connlimit reads count value without holding the relevant lock, add a READ_ONCE annotation. From Fernando Fernandez Mancera. 3) There is a long-standing bug (since 4.12) in nftables helper infra when NAT is in use: if the helper gets assigned after the nat binding was set up, we fail to initialise the 'seqadj' extension, which is needed in case NAT payload rewrites need to add (or remove) from the packet payload. Fix from Andrii Melnychenko. * tag 'nf-25-10-29' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_ct: add seqadj extension for natted connections netfilter: nft_connlimit: fix possible data race on connection count netfilter: nft_ct: enable labels for get case too ==================== Link: https://patch.msgid.link/20251029135617.18274-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 18:25:12 -07:00
Paolo Abeni	fe11dfa109	mptcp: zero window probe mib Explicitly account for MPTCP-level zero windows probe, to catch hopefully earlier issues alike the one addressed by the previous patch. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251028-net-mptcp-send-timeout-v1-4-38ffff5a9ec8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 17:44:28 -07:00
Paolo Abeni	a824084b98	mptcp: restore window probe Since commit `72377ab2d6` ("mptcp: more conservative check for zero probes") the MPTCP-level zero window probe check is always disabled, as the TCP-level write queue always contains at least the newly allocated skb. Refine the relevant check tacking in account that the above condition and that such skb can have zero length. Fixes: `72377ab2d6` ("mptcp: more conservative check for zero probes") Cc: stable@vger.kernel.org Reported-by: Geliang Tang <geliang@kernel.org> Closes: https://lore.kernel.org/d0a814c364e744ca6b836ccd5b6e9146882e8d42.camel@kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251028-net-mptcp-send-timeout-v1-3-38ffff5a9ec8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 17:44:28 -07:00
Paolo Abeni	8e04ce45a8	mptcp: fix MSG_PEEK stream corruption If a MSG_PEEK \| MSG_WAITALL read operation consumes all the bytes in the receive queue and recvmsg() need to waits for more data - i.e. it's a blocking one - upon arrival of the next packet the MPTCP protocol will start again copying the oldest data present in the receive queue, corrupting the data stream. Address the issue explicitly tracking the peeked sequence number, restarting from the last peeked byte. Fixes: `ca4fb89257` ("mptcp: add MSG_PEEK support") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251028-net-mptcp-send-timeout-v1-2-38ffff5a9ec8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 17:44:28 -07:00
Paolo Abeni	27b0e701d3	mptcp: drop bogus optimization in __mptcp_check_push() Accessing the transmit queue without owning the msk socket lock is inherently racy, hence __mptcp_check_push() could actually quit early even when there is pending data. That in turn could cause unexpected tx lock and timeout. Dropping the early check avoids the race, implicitly relaying on later tests under the relevant lock. With such change, all the other mptcp_send_head() call sites are now under the msk socket lock and we can additionally drop the now unneeded annotation on the transmit head pointer accesses. Fixes: `6e628cd3a8` ("mptcp: use mptcp release_cb for delayed tasks") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251028-net-mptcp-send-timeout-v1-1-38ffff5a9ec8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 17:44:28 -07:00
Eric Dumazet	aa251c8463	tcp: fix too slow tcp_rcvbuf_grow() action While the blamed commits apparently avoided an overshoot, they also limited how fast a sender can increase BDP at each RTT. This is not exactly a revert, we do not add the 16 * tp->advmss cushion we had, and we are keeping the out_of_order_queue contribution. Do the same in mptcp_rcvbuf_grow(). Tested: emulated 50ms rtt (tcp_stream --tcp-tx-delay 50000), cubic 20 second flow. net.ipv4.tcp_rmem set to "4096 131072 67000000" perf record -a -e tcp:tcp_rcvbuf_grow sleep 20 perf script Before: We can see we fail to roughly double RWIN at each RTT. Sender is RWIN limited while CWND is ramping up (before getting tcp_wmem limited). tcp_stream 33793 [010] 825.717525: tcp:tcp_rcvbuf_grow: time=100869 rtt_us=50428 copied=49152 inq=0 space=40960 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=103970 window_clamp=112128 rcv_wnd=106496 tcp_stream 33793 [010] 825.768966: tcp:tcp_rcvbuf_grow: time=51447 rtt_us=50362 copied=86016 inq=0 space=49152 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=107474 window_clamp=112128 rcv_wnd=106496 tcp_stream 33793 [010] 825.821539: tcp:tcp_rcvbuf_grow: time=52577 rtt_us=50243 copied=114688 inq=0 space=86016 ooo=0 scaling_ratio=219 rcvbuf=201096 rcv_ssthresh=167377 window_clamp=172031 rcv_wnd=167936 tcp_stream 33793 [010] 825.871781: tcp:tcp_rcvbuf_grow: time=50248 rtt_us=50237 copied=167936 inq=0 space=114688 ooo=0 scaling_ratio=219 rcvbuf=268129 rcv_ssthresh=224722 window_clamp=229375 rcv_wnd=225280 tcp_stream 33793 [010] 825.922475: tcp:tcp_rcvbuf_grow: time=50698 rtt_us=50183 copied=241664 inq=0 space=167936 ooo=0 scaling_ratio=219 rcvbuf=392617 rcv_ssthresh=331217 window_clamp=335871 rcv_wnd=323584 tcp_stream 33793 [010] 825.973326: tcp:tcp_rcvbuf_grow: time=50855 rtt_us=50213 copied=339968 inq=0 space=241664 ooo=0 scaling_ratio=219 rcvbuf=564986 rcv_ssthresh=478674 window_clamp=483327 rcv_wnd=462848 tcp_stream 33793 [010] 826.023970: tcp:tcp_rcvbuf_grow: time=50647 rtt_us=50248 copied=491520 inq=0 space=339968 ooo=0 scaling_ratio=219 rcvbuf=794811 rcv_ssthresh=671778 window_clamp=679935 rcv_wnd=651264 tcp_stream 33793 [010] 826.074612: tcp:tcp_rcvbuf_grow: time=50648 rtt_us=50227 copied=700416 inq=0 space=491520 ooo=0 scaling_ratio=219 rcvbuf=1149124 rcv_ssthresh=974881 window_clamp=983039 rcv_wnd=942080 tcp_stream 33793 [010] 826.125452: tcp:tcp_rcvbuf_grow: time=50845 rtt_us=50225 copied=987136 inq=8192 space=700416 ooo=0 scaling_ratio=219 rcvbuf=1637502 rcv_ssthresh=1392674 window_clamp=1400831 rcv_wnd=1339392 tcp_stream 33793 [010] 826.175698: tcp:tcp_rcvbuf_grow: time=50250 rtt_us=50198 copied=1347584 inq=0 space=978944 ooo=0 scaling_ratio=219 rcvbuf=2288672 rcv_ssthresh=1949729 window_clamp=1957887 rcv_wnd=1945600 tcp_stream 33793 [010] 826.225947: tcp:tcp_rcvbuf_grow: time=50252 rtt_us=50240 copied=1945600 inq=0 space=1347584 ooo=0 scaling_ratio=219 rcvbuf=3150516 rcv_ssthresh=2687010 window_clamp=2695167 rcv_wnd=2691072 tcp_stream 33793 [010] 826.276175: tcp:tcp_rcvbuf_grow: time=50233 rtt_us=50224 copied=2691072 inq=0 space=1945600 ooo=0 scaling_ratio=219 rcvbuf=4548617 rcv_ssthresh=3883041 window_clamp=3891199 rcv_wnd=3887104 tcp_stream 33793 [010] 826.326403: tcp:tcp_rcvbuf_grow: time=50233 rtt_us=50229 copied=3887104 inq=0 space=2691072 ooo=0 scaling_ratio=219 rcvbuf=6291456 rcv_ssthresh=5370482 window_clamp=5382144 rcv_wnd=5373952 tcp_stream 33793 [010] 826.376723: tcp:tcp_rcvbuf_grow: time=50323 rtt_us=50218 copied=5373952 inq=0 space=3887104 ooo=0 scaling_ratio=219 rcvbuf=9087658 rcv_ssthresh=7755537 window_clamp=7774207 rcv_wnd=7757824 tcp_stream 33793 [010] 826.426991: tcp:tcp_rcvbuf_grow: time=50274 rtt_us=50196 copied=7757824 inq=180224 space=5373952 ooo=0 scaling_ratio=219 rcvbuf=12563759 rcv_ssthresh=10729233 window_clamp=10747903 rcv_wnd=10575872 tcp_stream 33793 [010] 826.477229: tcp:tcp_rcvbuf_grow: time=50241 rtt_us=50078 copied=10731520 inq=180224 space=7577600 ooo=0 scaling_ratio=219 rcvbuf=17715667 rcv_ssthresh=15136529 window_clamp=15155199 rcv_wnd=14983168 tcp_stream 33793 [010] 826.527482: tcp:tcp_rcvbuf_grow: time=50258 rtt_us=50153 copied=15138816 inq=360448 space=10551296 ooo=0 scaling_ratio=219 rcvbuf=24667870 rcv_ssthresh=21073410 window_clamp=21102591 rcv_wnd=20766720 tcp_stream 33793 [010] 826.577712: tcp:tcp_rcvbuf_grow: time=50234 rtt_us=50228 copied=21073920 inq=0 space=14778368 ooo=0 scaling_ratio=219 rcvbuf=34550339 rcv_ssthresh=29517041 window_clamp=29556735 rcv_wnd=29519872 tcp_stream 33793 [010] 826.627982: tcp:tcp_rcvbuf_grow: time=50275 rtt_us=50220 copied=29519872 inq=540672 space=21073920 ooo=0 scaling_ratio=219 rcvbuf=49268707 rcv_ssthresh=42090625 window_clamp=42147839 rcv_wnd=41627648 tcp_stream 33793 [010] 826.678274: tcp:tcp_rcvbuf_grow: time=50296 rtt_us=50185 copied=42053632 inq=761856 space=28979200 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57238168 window_clamp=57316406 rcv_wnd=56606720 tcp_stream 33793 [010] 826.728627: tcp:tcp_rcvbuf_grow: time=50357 rtt_us=50128 copied=43913216 inq=851968 space=41291776 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56524800 tcp_stream 33793 [010] 827.131364: tcp:tcp_rcvbuf_grow: time=50239 rtt_us=50127 copied=43843584 inq=655360 space=43061248 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56696832 tcp_stream 33793 [010] 827.181613: tcp:tcp_rcvbuf_grow: time=50254 rtt_us=50115 copied=43843584 inq=524288 space=43188224 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56807424 tcp_stream 33793 [010] 828.339635: tcp:tcp_rcvbuf_grow: time=50283 rtt_us=50110 copied=43843584 inq=458752 space=43319296 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56864768 tcp_stream 33793 [010] 828.440350: tcp:tcp_rcvbuf_grow: time=50404 rtt_us=50099 copied=43843584 inq=393216 space=43384832 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56922112 tcp_stream 33793 [010] 829.195106: tcp:tcp_rcvbuf_grow: time=50154 rtt_us=50077 copied=43843584 inq=196608 space=43450368 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=57090048 After: It takes few steps to increase RWIN. Sender is no longer RWIN limited. tcp_stream 50826 [010] 935.634212: tcp:tcp_rcvbuf_grow: time=100788 rtt_us=50315 copied=49152 inq=0 space=40960 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=103970 window_clamp=112128 rcv_wnd=106496 tcp_stream 50826 [010] 935.685642: tcp:tcp_rcvbuf_grow: time=51437 rtt_us=50361 copied=86016 inq=0 space=49152 ooo=0 scaling_ratio=219 rcvbuf=160875 rcv_ssthresh=132969 window_clamp=137623 rcv_wnd=131072 tcp_stream 50826 [010] 935.738299: tcp:tcp_rcvbuf_grow: time=52660 rtt_us=50256 copied=139264 inq=0 space=86016 ooo=0 scaling_ratio=219 rcvbuf=502741 rcv_ssthresh=411497 window_clamp=430079 rcv_wnd=413696 tcp_stream 50826 [010] 935.788544: tcp:tcp_rcvbuf_grow: time=50249 rtt_us=50233 copied=307200 inq=0 space=139264 ooo=0 scaling_ratio=219 rcvbuf=728690 rcv_ssthresh=618717 window_clamp=623371 rcv_wnd=618496 tcp_stream 50826 [010] 935.838796: tcp:tcp_rcvbuf_grow: time=50258 rtt_us=50202 copied=618496 inq=0 space=307200 ooo=0 scaling_ratio=219 rcvbuf=2450338 rcv_ssthresh=1855709 window_clamp=2096187 rcv_wnd=1859584 tcp_stream 50826 [010] 935.889140: tcp:tcp_rcvbuf_grow: time=50347 rtt_us=50166 copied=1261568 inq=0 space=618496 ooo=0 scaling_ratio=219 rcvbuf=4376503 rcv_ssthresh=3725291 window_clamp=3743961 rcv_wnd=3706880 tcp_stream 50826 [010] 935.939435: tcp:tcp_rcvbuf_grow: time=50300 rtt_us=50185 copied=2478080 inq=24576 space=1261568 ooo=0 scaling_ratio=219 rcvbuf=9082648 rcv_ssthresh=7733731 window_clamp=7769921 rcv_wnd=7692288 tcp_stream 50826 [010] 935.989681: tcp:tcp_rcvbuf_grow: time=50251 rtt_us=50221 copied=4915200 inq=114688 space=2453504 ooo=0 scaling_ratio=219 rcvbuf=16574936 rcv_ssthresh=14108110 window_clamp=14179339 rcv_wnd=14024704 tcp_stream 50826 [010] 936.039967: tcp:tcp_rcvbuf_grow: time=50289 rtt_us=50279 copied=9830400 inq=114688 space=4800512 ooo=0 scaling_ratio=219 rcvbuf=32695050 rcv_ssthresh=27896187 window_clamp=27969593 rcv_wnd=27815936 tcp_stream 50826 [010] 936.090172: tcp:tcp_rcvbuf_grow: time=50211 rtt_us=50200 copied=19841024 inq=114688 space=9715712 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57245176 window_clamp=57316406 rcv_wnd=57163776 tcp_stream 50826 [010] 936.140430: tcp:tcp_rcvbuf_grow: time=50262 rtt_us=50197 copied=39501824 inq=114688 space=19726336 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57245176 window_clamp=57316406 rcv_wnd=57163776 tcp_stream 50826 [010] 936.190527: tcp:tcp_rcvbuf_grow: time=50101 rtt_us=50071 copied=43655168 inq=262144 space=39387136 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57032704 tcp_stream 50826 [010] 936.240719: tcp:tcp_rcvbuf_grow: time=50197 rtt_us=50057 copied=43843584 inq=262144 space=43393024 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57032704 tcp_stream 50826 [010] 936.341271: tcp:tcp_rcvbuf_grow: time=50297 rtt_us=50123 copied=43843584 inq=131072 space=43581440 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57147392 tcp_stream 50826 [010] 936.642503: tcp:tcp_rcvbuf_grow: time=50131 rtt_us=50084 copied=43843584 inq=0 space=43712512 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57262080 Fixes: `65c5287892` ("tcp: fix sk_rcvbuf overshoot") Fixes: `e118cdc34d` ("mptcp: rcvbuf auto-tuning improvement") Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/589 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251028-net-tcp-recv-autotune-v3-4-74b43ba4c84c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 17:30:19 -07:00
Eric Dumazet	b1e014a1f3	tcp: add newval parameter to tcp_rcvbuf_grow() This patch has no functional change, and prepares the following one. tcp_rcvbuf_grow() will need to have access to tp->rcvq_space.space old and new values. Change mptcp_rcvbuf_grow() in a similar way. Signed-off-by: Eric Dumazet <edumazet@google.com> [ Moved 'oldval' declaration to the next patch to avoid warnings at build time. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251028-net-tcp-recv-autotune-v3-3-74b43ba4c84c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 17:30:19 -07:00
Paolo Abeni	a6f0459aad	mptcp: fix subflow rcvbuf adjust The mptcp PM can add subflow to the conn_list before tcp_init_transfer(). Calling tcp_rcvbuf_grow() on such subflow is not correct as later init will overwrite the update. Fix the issue calling tcp_rcvbuf_grow() only after init buffer initialization. Fixes: `e118cdc34d` ("mptcp: rcvbuf auto-tuning improvement") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251028-net-tcp-recv-autotune-v3-1-74b43ba4c84c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-29 17:30:18 -07:00
Andrii Melnychenko	90918e3b64	netfilter: nft_ct: add seqadj extension for natted connections Sequence adjustment may be required for FTP traffic with PASV/EPSV modes. due to need to re-write packet payload (IP, port) on the ftp control connection. This can require changes to the TCP length and expected seq / ack_seq. The easiest way to reproduce this issue is with PASV mode. Example ruleset: table inet ftp_nat { ct helper ftp_helper { type "ftp" protocol tcp l3proto inet } chain prerouting { type filter hook prerouting priority 0; policy accept; tcp dport 21 ct state new ct helper set "ftp_helper" } } table ip nat { chain prerouting { type nat hook prerouting priority -100; policy accept; tcp dport 21 dnat ip prefix to ip daddr map { 192.168.100.1 : 192.168.13.2/32 } } chain postrouting { type nat hook postrouting priority 100 ; policy accept; tcp sport 21 snat ip prefix to ip saddr map { 192.168.13.2 : 192.168.100.1/32 } } } Note that the ftp helper gets assigned after the dnat setup. The inverse (nat after helper assign) is handled by an existing check in nf_nat_setup_info() and will not show the problem. Topoloy: +-------------------+ +----------------------------------+ \| FTP: 192.168.13.2 \| <-> \| NAT: 192.168.13.3, 192.168.100.1 \| +-------------------+ +----------------------------------+ \| +-----------------------+ \| Client: 192.168.100.2 \| +-----------------------+ ftp nat changes do not work as expected in this case: Connected to 192.168.100.1. [..] ftp> epsv EPSV/EPRT on IPv4 off. ftp> ls 227 Entering passive mode (192,168,100,1,209,129). 421 Service not available, remote server has closed connection. Kernel logs: Missing nfct_seqadj_ext_add() setup call WARNING: CPU: 1 PID: 0 at net/netfilter/nf_conntrack_seqadj.c:41 [..] __nf_nat_mangle_tcp_packet+0x100/0x160 [nf_nat] nf_nat_ftp+0x142/0x280 [nf_nat_ftp] help+0x4d1/0x880 [nf_conntrack_ftp] nf_confirm+0x122/0x2e0 [nf_conntrack] nf_hook_slow+0x3c/0xb0 .. Fix this by adding the required extension when a conntrack helper is assigned to a connection that has a nat binding. Fixes: `1a64edf54f` ("netfilter: nft_ct: add helper set support") Signed-off-by: Andrii Melnychenko <a.melnychenko@vyos.io> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-10-29 14:47:59 +01:00
Fernando Fernandez Mancera	8d96dfdcab	netfilter: nft_connlimit: fix possible data race on connection count nft_connlimit_eval() reads priv->list->count to check if the connection limit has been exceeded. This value is being read without a lock and can be modified by a different process. Use READ_ONCE() for correctness. Fixes: `df4a902509` ("netfilter: nf_conncount: merge lookup and add functions") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-10-29 14:47:59 +01:00
Florian Westphal	514f1dc8f2	netfilter: nft_ct: enable labels for get case too conntrack labels can only be set when the conntrack has been created with the "ctlabel" extension. For older iptables (connlabel match), adding an "-m connlabel" rule turns on the ctlabel extension allocation for all future conntrack entries. For nftables, its only enabled for 'ct label set foo', but not for 'ct label foo' (i.e. check). But users could have a ruleset that only checks for presence, and rely on userspace to set a label bit via ctnetlink infrastructure. This doesn't work without adding a dummy 'ct label set' rule. We could also enable extension infra for the first (failing) ctnetlink request, but unlike ruleset we would not be able to disable the extension again. Therefore turn on ctlabel extension allocation if an nftables ruleset checks for a connlabel too. Fixes: `1ad8f48df6` ("netfilter: nftables: add connlabel set support") Reported-by: Antonio Ojea <aojea@google.com> Closes: https://lore.kernel.org/netfilter-devel/aPi_VdZpVjWujZ29@strlen.de/ Signed-off-by: Florian Westphal <fw@strlen.de>	2025-10-29 14:47:59 +01:00
Daniel Borkmann	2cbb259ec4	bpf: Reject negative head_room in __bpf_skb_change_head Yinhao et al. recently reported: Our fuzzing tool was able to create a BPF program which triggered the below BUG condition inside pskb_expand_head. [ 23.016047][T10006] kernel BUG at net/core/skbuff.c:2232! [...] [ 23.017301][T10006] RIP: 0010:pskb_expand_head+0x1519/0x1530 [...] [ 23.021249][T10006] Call Trace: [ 23.021387][T10006] <TASK> [ 23.021507][T10006] ? __pfx_pskb_expand_head+0x10/0x10 [ 23.021725][T10006] __bpf_skb_change_head+0x22a/0x520 [ 23.021939][T10006] bpf_skb_change_head+0x34/0x1b0 [ 23.022143][T10006] ___bpf_prog_run+0xf70/0xb670 [ 23.022342][T10006] __bpf_prog_run32+0xed/0x140 [...] The problem is that in __bpf_skb_change_head() we need to reject a negative head_room as otherwise this propagates all the way to the pskb_expand_head() from skb_cow(). For example, if the BPF test infra passes a skb with gso_skb:1 to the BPF helper with a negative head_room of -22, then this gets passed into skb_cow(). __skb_cow() in this example calculates a delta of -86 which gets aligned to -64, and then triggers BUG_ON(nhead < 0). Thus, reject malformed negative input. Fixes: `3a0af8fd61` ("bpf: BPF for lightweight tunnel infrastructure") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Link: https://patch.msgid.link/20251023125532.182262-1-daniel@iogearbox.net	2025-10-28 14:54:56 -07:00
Jakub Kicinski	855e43164e	Merge tag 'batadv-net-pullrequest-20251024' of https://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - release references to inactive interfaces, by Sven Eckelmann * tag 'batadv-net-pullrequest-20251024' of https://git.open-mesh.org/linux-merge: batman-adv: Release references to inactive interfaces ==================== Link: https://patch.msgid.link/20251024091150.231141-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-27 18:00:54 -07:00
Johan Hovold	91d35ec9b3	Bluetooth: rfcomm: fix modem control handling The RFCOMM driver confuses the local and remote modem control signals, which specifically means that the reported DTR and RTS state will instead reflect the remote end (i.e. DSR and CTS). This issue dates back to the original driver (and a follow-on update) merged in 2002, which resulted in a non-standard implementation of TIOCMSET that allowed controlling also the TS07.10 IC and DV signals by mapping them to the RI and DCD input flags, while TIOCMGET failed to return the actual state of DTR and RTS. Note that the bogus control of input signals in tiocmset() is just dead code as those flags will have been masked out by the tty layer since 2003. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:32:19 -04:00
Luiz Augusto von Dentz	751463ceef	Bluetooth: hci_core: Fix tracking of periodic advertisement Periodic advertising enabled flag cannot be tracked by the enabled flag since advertising and periodic advertising each can be enabled/disabled separately from one another causing the states to be inconsistent when for example an advertising set is disabled its enabled flag is set to false which is then used for periodic which has not being disabled. Fixes: `eca0ae4aea` ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:31:59 -04:00
Luiz Augusto von Dentz	857eb0fabc	Bluetooth: hci_conn: Fix connection cleanup with BIG with 2 or more BIS This fixes bis_cleanup not considering connections in BT_OPEN state before attempting to remove the BIG causing the following error: btproxy[20110]: < HCI Command: LE Terminate Broadcast Isochronous Group (0x08\|0x006a) plen 2 BIG Handle: 0x01 Reason: Connection Terminated By Local Host (0x16) > HCI Event: Command Status (0x0f) plen 4 LE Terminate Broadcast Isochronous Group (0x08\|0x006a) ncmd 1 Status: Unknown Advertising Identifier (0x42) Fixes: `fa224d0c09` ("Bluetooth: ISO: Reassociate a socket with an active BIS") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>	2025-10-24 10:31:41 -04:00
Luiz Augusto von Dentz	c403da5e98	Bluetooth: ISO: Fix another instance of dst_type handling Socket dst_type cannot be directly assigned to hci_conn->type since there domain is different which may lead to the wrong address type being used. Fixes: `6a5ad251b7` ("Bluetooth: ISO: Fix possible circular locking dependency") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:30:50 -04:00
Frédéric Danis	76e20da0bd	Revert "Bluetooth: L2CAP: convert timeouts to secs_to_jiffies()" This reverts commit `c9d84da18d`. It replaces in L2CAP calls to msecs_to_jiffies() to secs_to_jiffies() and updates the constants accordingly. But the constants are also used in LCAP Configure Request and L2CAP Configure Response which expect values in milliseconds. This may prevent correct usage of L2CAP channel. To fix it, keep those constants in milliseconds and so revert this change. Fixes: `c9d84da18d` ("Bluetooth: L2CAP: convert timeouts to secs_to_jiffies()") Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:30:30 -04:00
Pauli Virtanen	e8785404de	Bluetooth: MGMT: fix crash in set_mesh_sync and set_mesh_complete There is a BUG: KASAN: stack-out-of-bounds in set_mesh_sync due to memcpy from badly declared on-stack flexible array. Another crash is in set_mesh_complete() due to double list_del via mgmt_pending_valid + mgmt_pending_remove. Use DEFINE_FLEX to declare the flexible array right, and don't memcpy outside bounds. As mgmt_pending_valid removes the cmd from list, use mgmt_pending_free, and also report status on error. Fixes: `302a1f674c` ("Bluetooth: MGMT: Fix possible UAFs") Signed-off-by: Pauli Virtanen <pav@iki.fi> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:21:37 -04:00
Luiz Augusto von Dentz	0d92808024	Bluetooth: HCI: Fix tracking of advertisement set/instance 0x00 This fixes the state tracking of advertisement set/instance 0x00 which is considered a legacy instance and is not tracked individually by adv_instances list, previously it was assumed that hci_dev itself would track it via HCI_LE_ADV but that is a global state not specifc to instance 0x00, so to fix it a new flag is introduced that only tracks the state of instance 0x00. Fixes: `1488af7b8b` ("Bluetooth: hci_sync: Fix hci_resume_advertising_sync") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:21:07 -04:00
Luiz Augusto von Dentz	f0c200a4a5	Bluetooth: ISO: Fix BIS connection dst_type handling Socket dst_type cannot be directly assigned to hci_conn->type since there domain is different which may lead to the wrong address type being used. Fixes: `6a5ad251b7` ("Bluetooth: ISO: Fix possible circular locking dependency") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:20:34 -04:00
Cen Zhang	09b0cd1297	Bluetooth: hci_sync: fix race in hci_cmd_sync_dequeue_once hci_cmd_sync_dequeue_once() does lookup and then cancel the entry under two separate lock sections. Meanwhile, hci_cmd_sync_work() can also delete the same entry, leading to double list_del() and "UAF". Fix this by holding cmd_sync_work_lock across both lookup and cancel, so that the entry cannot be removed concurrently. Fixes: `505ea2b295` ("Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queue") Reported-by: Cen Zhang <zzzccc427@163.com> Signed-off-by: Cen Zhang <zzzccc427@163.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-10-24 10:20:15 -04:00
Jakub Kicinski	a83155cc4e	Merge tag 'wireless-2025-10-23' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== First set of fixes: - brcmfmac: long-standing crash when used w/o P2P - iwlwifi: fix for a use-after-free bug - mac80211: key tailroom accounting bug could leave allocation overhead and cause a warning - ath11k: add a missing platform, fix key flag operations - bcma: skip devices disabled in OF/DT - various (potential) memory leaks * tag 'wireless-2025-10-23' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: nl80211: call kfree without a NULL check wifi: mac80211: fix key tailroom accounting leak wifi: brcmfmac: fix crash while sending Action Frames in standalone AP Mode MAINTAINERS: wcn36xx: Add linux-wireless list bcma: don't register devices disabled in OF wifi: mac80211: reset FILS discovery and unsol probe resp intervals wifi: iwlwifi: fix potential use after free in iwl_mld_remove_link() wifi: ath11k: avoid bit operation on key flags wifi: ath12k: free skb during idr cleanup callback wifi: ath11k: Add missing platform IDs for quirk table wifi: ath10k: Fix memory leak on unsupported WMI command ==================== Link: https://patch.msgid.link/20251023180604.626946-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-23 17:15:48 -07:00
Linus Torvalds	ab431bc397	Merge tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can. Slim pickings, I'm guessing people haven't really started testing. Current release - new code bugs: - eth: mlx5e: - psp: avoid 'accel' NULL pointer dereference - skip PPHCR register query for FEC histogram if not supported Previous releases - regressions: - bonding: update the slave array for broadcast mode - rtnetlink: re-allow deleting FDB entries in user namespace - eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path Previous releases - always broken: - can: drop skb on xmit if device is in listen-only mode - gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb() - eth: mlx5e - RX, fix generating skb from non-linear xdp_buff if program trims frags - make devcom init failures non-fatal, fix races with IPSec Misc: - some documentation formatting 'fixes'" * tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) net/mlx5: Fix IPsec cleanup over MPV device net/mlx5: Refactor devcom to return NULL on failure net/mlx5e: Skip PPHCR register query if not supported by the device net/mlx5: Add PPHCR to PCAM supported registers mask virtio-net: zero unused hash fields net: phy: micrel: always set shared->phydev for LAN8814 vsock: fix lock inversion in vsock_assign_transport() ovpn: use datagram_poll_queue for socket readiness in TCP espintcp: use datagram_poll_queue for socket readiness net: datagram: introduce datagram_poll_queue for custom receive queues net: bonding: fix possible peer notify event loss or dup issue net: hsr: prevent creation of HSR device with slaves from another netns sctp: avoid NULL dereference when chunk data buffer is missing ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop net: ravb: Ensure memory write completes before ringing TX doorbell net: ravb: Enforce descriptor type ordering net: hibmcge: select FIXED_PHY net: dlink: use dev_kfree_skb_any instead of dev_kfree_skb Documentation: networking: ax25: update the mailing list info. net: gro_cells: fix lock imbalance in gro_cells_receive() ...	2025-10-23 07:03:18 -10:00
Stefano Garzarella	f7c877e753	vsock: fix lock inversion in vsock_assign_transport() Syzbot reported a potential lock inversion deadlock between vsock_register_mutex and sk_lock-AF_VSOCK when vsock_linger() is called. The issue was introduced by commit `687aa0c558` ("vsock: Fix transport_* TOCTOU") which added vsock_register_mutex locking in vsock_assign_transport() around the transport->release() call, that can call vsock_linger(). vsock_assign_transport() can be called with sk_lock held. vsock_linger() calls sk_wait_event() that temporarily releases and re-acquires sk_lock. During this window, if another thread hold vsock_register_mutex while trying to acquire sk_lock, a circular dependency is created. Fix this by releasing vsock_register_mutex before calling transport->release() and vsock_deassign_transport(). This is safe because we don't need to hold vsock_register_mutex while releasing the old transport, and we ensure the new transport won't disappear by obtaining a module reference first via try_module_get(). Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Fixes: `687aa0c558` ("vsock: Fix transport_* TOCTOU") Cc: mhal@rbox.co Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251021121718.137668-1-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-23 16:07:58 +02:00
Ralf Lici	0fc3e32c2c	espintcp: use datagram_poll_queue for socket readiness espintcp uses a custom queue (ike_queue) to deliver packets to userspace. The polling logic relies on datagram_poll, which checks sk_receive_queue, which can lead to false readiness signals when that queue contains non-userspace packets. Switch espintcp_poll to use datagram_poll_queue with ike_queue, ensuring poll only signals readiness when userspace data is actually available. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251021100942.195010-3-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-23 15:46:04 +02:00
Ralf Lici	f6ceec6434	net: datagram: introduce datagram_poll_queue for custom receive queues Some protocols using TCP encapsulation (e.g., espintcp, openvpn) deliver userspace-bound packets through a custom skb queue rather than the standard sk_receive_queue. Introduce datagram_poll_queue that accepts an explicit receive queue, and convert datagram_poll into a wrapper around datagram_poll_queue. This allows protocols with custom skb queues to reuse the core polling logic without relying on sk_receive_queue. Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20251021100942.195010-2-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-23 15:46:04 +02:00
Fernando Fernandez Mancera	c0178eec88	net: hsr: prevent creation of HSR device with slaves from another netns HSR/PRP driver does not handle correctly having slaves/interlink devices in a different net namespace. Currently, it is possible to create a HSR link in a different net namespace than the slaves/interlink with the following command: ip link add hsr0 netns hsr-ns type hsr slave1 eth1 slave2 eth2 As there is no use-case on supporting this scenario, enforce that HSR device link matches netns defined by IFLA_LINK_NETNSID. The iproute2 command mentioned above will throw the following error: Error: hsr: HSR slaves/interlink must be on the same net namespace than HSR link. Fixes: `f421436a59` ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20251020135533.9373-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-22 19:22:22 -07:00
Alexey Simakov	441f0647f7	sctp: avoid NULL dereference when chunk data buffer is missing chunk->skb pointer is dereferenced in the if-block where it's supposed to be NULL only. chunk->skb can only be NULL if chunk->head_skb is not. Check for frag_list instead and do it just before replacing chunk->skb. We're sure that otherwise chunk->skb is non-NULL because of outer if() condition. Fixes: `90017accff` ("sctp: Add GSO support") Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/20251021130034.6333-1-bigalex934@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-22 19:19:31 -07:00
Eric Dumazet	c5394b8b7a	net: gro_cells: fix lock imbalance in gro_cells_receive() syzbot found that the local_unlock_nested_bh() call was missing in some cases. WARNING: possible recursive locking detected syzkaller #0 Not tainted -------------------------------------------- syz.2.329/7421 is trying to acquire lock: ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:44 [inline] ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: gro_cells_receive+0x404/0x790 net/core/gro_cells.c:30 but task is already holding lock: ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:44 [inline] ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: gro_cells_receive+0x404/0x790 net/core/gro_cells.c:30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock((&cell->bh_lock)); lock((&cell->bh_lock)); * DEADLOCK * Given the introduction of @have_bh_lock variable, it seems the author intent was to have the local_unlock_nested_bh() after the @unlock label. Fixes: `25718fdcbd` ("net: gro_cells: Use nested-BH locking for gro_cell") Reported-by: syzbot+f9651b9a8212e1c8906f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68f65eb9.a70a0220.205af.0034.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251020161114.1891141-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-21 17:41:09 -07:00
Matthieu Baerts (NGI0)	e84cb860ac	mptcp: pm: in-kernel: C-flag: handle late ADD_ADDR The special C-flag case expects the ADD_ADDR to be received when switching to 'fully-established'. But for various reasons, the ADD_ADDR could be sent after the "4th ACK", and the special case doesn't work. On NIPA, the new test validating this special case for the C-flag failed a few times, e.g. 102 default limits, server deny join id 0 syn rx [FAIL] got 0 JOIN[s] syn rx expected 2 Server ns stats (...) MPTcpExtAddAddrTx 1 MPTcpExtEchoAdd 1 Client ns stats (...) MPTcpExtAddAddr 1 MPTcpExtEchoAddTx 1 synack rx [FAIL] got 0 JOIN[s] synack rx expected 2 ack rx [FAIL] got 0 JOIN[s] ack rx expected 2 join Rx [FAIL] see above syn tx [FAIL] got 0 JOIN[s] syn tx expected 2 join Tx [FAIL] see above I had a suspicion about what the issue could be: the ADD_ADDR might have been received after the switch to the 'fully-established' state. The issue was not easy to reproduce. The packet capture shown that the ADD_ADDR can indeed be sent with a delay, and the client would not try to establish subflows to it as expected. A simple fix is not to mark the endpoints as 'used' in the C-flag case, when looking at creating subflows to the remote initial IP address and port. In this case, there is no need to try. Note: newly added fullmesh endpoints will still continue to be used as expected, thanks to the conditions behind mptcp_pm_add_addr_c_flag_case. Fixes: `4b1ff850e0` ("mptcp: pm: in-kernel: usable client side with C-flag") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-1-8207030cb0e8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-21 17:36:45 -07:00
Wang Liang	f584239a9e	net/smc: fix general protection fault in __smc_diag_dump The syzbot report a crash: Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 Call Trace: <TASK> smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 netlink_dump_start include/linux/netlink.h:341 [inline] smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 __sock_diag_cmd net/core/sock_diag.c:249 [inline] sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] __sock_sendmsg net/socket.c:729 [inline] ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 __sys_sendmsg+0x16d/0x220 net/socket.c:2700 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> The process like this: (CPU1) \| (CPU2) ---------------------------------\|------------------------------- inet_create() \| // init clcsock to NULL \| sk = sk_alloc() \| \| // unexpectedly change clcsock \| inet_init_csk_locks() \| \| // add sk to hash table \| smc_inet_init_sock() \| smc_sk_init() \| smc_hash_sk() \| \| // traverse the hash table \| smc_diag_dump_proto \| __smc_diag_dump() \| // visit wrong clcsock \| smc_diag_msg_common_fill() // alloc clcsock \| smc_create_clcsk \| sock_create_kern \| With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed in inet_init_csk_locks(). The INET_PROTOSW_ICSK flag is no need by smc, just remove it. After removing the INET_PROTOSW_ICSK flag, this patch alse revert commit `6fd27ea183` ("net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC") to avoid casting smc_sock to inet_connection_sock. Reported-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f775be4458668f7d220e Tested-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com Fixes: `d25a92ccae` ("net/smc: Introduce IPPROTO_SMC") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://patch.msgid.link/20251017024827.3137512-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-20 17:46:06 -07:00
Emmanuel Grumbach	249e1443e3	wifi: nl80211: call kfree without a NULL check Coverity is unhappy because we may leak old_radio_rts_threshold. Since this pointer is only valid in the context of the function and kfree is NULL pointer safe, don't check and just call kfree. Note that somehow, we were checking old_rts_threshold to free old_radio_rts_threshold which is a bit odd. Fixes: `264637941c` ("wifi: cfg80211: Add Support to Set RTS Threshold for each Radio") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://patch.msgid.link/20251020075745.44168-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-10-20 13:57:26 +02:00
Johannes Berg	ed6a47346e	wifi: mac80211: fix key tailroom accounting leak For keys added by ieee80211_gtk_rekey_add(), we assume that they're already present in the hardware and set the flag KEY_FLAG_UPLOADED_TO_HARDWARE. However, setting this flag needs to be paired with decrementing the tailroom needed, which was missed. Fixes: `f52a0b408e` ("wifi: mac80211: mark keys as uploaded when added by the driver") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251019115358.c88eafb4083e.I69e9d4d78a756a133668c55b5570cf15a4b0e6a4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-10-20 13:57:10 +02:00
Aloka Dixit	6078447614	wifi: mac80211: reset FILS discovery and unsol probe resp intervals When ieee80211_stop_ap() deletes the FILS discovery and unsolicited broadcast probe response templates, the associated interval values are not reset. This can lead to drivers subsequently operating with the non-zero values, leading to unexpected behavior. Trigger repeated retrieval attempts of the FILS discovery template in ath12k, resulting in excessive log messages such as: mac vdev 0 failed to retrieve FILS discovery template mac vdev 4 failed to retrieve FILS discovery template Fix this by resetting the intervals in ieee80211_stop_ap() to ensure proper cleanup of FILS discovery and unsolicited broadcast probe response templates. Fixes: `295b02c4be` ("mac80211: Add FILS discovery support") Fixes: `632189a018` ("mac80211: Unsolicited broadcast probe response support") Signed-off-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com> Signed-off-by: Aaradhana Sahu <aaradhana.sahu@oss.qualcomm.com> Link: https://patch.msgid.link/20250924130014.2575533-1-aaradhana.sahu@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-10-20 13:53:06 +02:00
Linus Torvalds	d303caf5ca	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov) - Make selftests/bpf arg_parsing.c more robust to errors (Andrii Nakryiko) - Fix redefinition of 'off' as different kind of symbol when I40E driver is builtin (Brahmajit Das) - Do not disable preemption in bpf_test_run (Sahil Chandna) - Fix memory leak in __lookup_instance error path (Shardul Bankar) - Ensure test data is flushed to disk before reading it (Xing Guo) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Fix redefinition of 'off' as different kind of symbol bpf: Do not disable preemption in bpf_test_run(). bpf: Fix memory leak in __lookup_instance error path selftests: arg_parsing: Ensure data is flushed to disk before reading. bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures. selftests/bpf: make arg_parsing.c more robust to crashes bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path	2025-10-18 08:00:43 -10:00
Sahil Chandna	7c33e97a6e	bpf: Do not disable preemption in bpf_test_run(). The timer mode is initialized to NO_PREEMPT mode by default, this disables preemption and force execution in atomic context causing issue on PREEMPT_RT configurations when invoking spin_lock_bh(), leading to the following warning: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6107, name: syz.0.17 preempt_count: 1, expected: 0 RCU nest depth: 1, expected: 1 Preemption disabled at: [<ffffffff891fce58>] bpf_test_timer_enter+0xf8/0x140 net/bpf/test_run.c:42 Fix this, by removing NO_PREEMPT/NO_MIGRATE mode check. Also, the test timer context no longer needs explicit calls to migrate_disable()/migrate_enable() with rcu_read_lock()/rcu_read_unlock(). Use helpers rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() instead. Reported-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1f1fbecb9413cdbfbef8 Suggested-by: Yonghong Song <yonghong.song@linux.dev> Suggested-by: Menglong Dong <menglong.dong@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com Co-developed-by: Brahmajit Das <listout@listout.xyz> Signed-off-by: Brahmajit Das <listout@listout.xyz> Signed-off-by: Sahil Chandna <chandna.sahil@gmail.com> Link: https://lore.kernel.org/r/20251014185635.10300-1-chandna.sahil@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-17 11:29:35 -07:00
Johannes Wiesböck	bf29555f5b	rtnetlink: Allow deleting FDB entries in user namespace Creating FDB entries is possible from a non-initial user namespace when having CAP_NET_ADMIN, yet, when deleting FDB entries, processes receive an EPERM because the capability is always checked against the initial user namespace. This restricts the FDB management from unprivileged containers. Drop the netlink_capable check in rtnl_fdb_del as it was originally dropped in `c5c351088a` and reintroduced in `1690be63a2` without intention. This patch was tested using a container on GyroidOS, where it was possible to delete FDB entries from an unprivileged user namespace and private network namespace. Fixes: `1690be63a2` ("bridge: Add vlan support to static neighbors") Reviewed-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Tested-by: Harshal Gohel <hg@simonwunderlich.de> Signed-off-by: Johannes Wiesböck <johannes.wiesboeck@aisec.fraunhofer.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251015201548.319871-1-johannes.wiesboeck@aisec.fraunhofer.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-16 16:09:56 -07:00
Eric Dumazet	d0d3e9c286	net: gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb() Some network drivers assume this field is zero after napi_get_frags(). We must clear it in napi_reuse_skb() otherwise the following can happen: 1) A packet is received, and skb_shinfo(skb)->hwtstamps is populated because a bit in the receive descriptor announced hwtstamp availability for this packet. 2) Packet is given to gro layer via napi_gro_frags(). 3) Packet is merged to a prior one held in GRO queues. 4) skb is saved after some cleanup in napi->skb via a call to napi_reuse_skb(). 5) Next packet is received 10 seconds later, gets the recycled skb from napi_get_frags(). 6) The receive descriptor does not announce hwtstamp availability. Driver does not clear shinfo->hwtstamps. 7) We have in shinfo->hwtstamps an old timestamp. Fixes: `ac45f602ee` ("net: infrastructure for hardware time stamping") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251015063221.4171986-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-16 15:42:49 -07:00
Linus Torvalds	634ec1fc79	Merge tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN Current release - regressions: - udp: do not use skb_release_head_state() before skb_attempt_defer_free() - gro_cells: use nested-BH locking for gro_cell - dpll: zl3073x: increase maximum size of flash utility Previous releases - regressions: - core: fix lockdep splat on device unregister - tcp: fix tcp_tso_should_defer() vs large RTT - tls: - don't rely on tx_work during send() - wait for pending async decryptions if tls_strp_msg_hold fails - can: j1939: add missing calls in NETDEV_UNREGISTER notification handler - eth: lan78xx: fix lost EEPROM write timeout in lan78xx_write_raw_eeprom Previous releases - always broken: - ip6_tunnel: prevent perpetual tunnel growth - dpll: zl3073x: handle missing or corrupted flash configuration - can: m_can: fix pm_runtime and CAN state handling - eth: - ixgbe: fix too early devlink_free() in ixgbe_remove() - ixgbevf: fix mailbox API compatibility - gve: Check valid ts bit on RX descriptor before hw timestamping - idpf: cleanup remaining SKBs in PTP flows - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H" * tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) udp: do not use skb_release_head_state() before skb_attempt_defer_free() net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset netdevsim: set the carrier when the device goes up selftests: tls: add test for short splice due to full skmsg selftests: net: tls: add tests for cmsg vs MSG_MORE tls: don't rely on tx_work during send() tls: wait for pending async decryptions if tls_strp_msg_hold fails tls: always set record_type in tls_process_cmsg tls: wait for async encrypt in case of error during latter iterations of sendmsg tls: trim encrypted message to match the plaintext on short splice tg3: prevent use of uninitialized remote_adv and local_adv variables MAINTAINERS: new entry for IPv6 IOAM gve: Check valid ts bit on RX descriptor before hw timestamping net: core: fix lockdep splat on device unregister MAINTAINERS: add myself as maintainer for b53 selftests: net: check jq command is supported net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit() tcp: fix tcp_tso_should_defer() vs large RTT r8152: add error handling in rtl8152_driver_init usbnet: Fix using smp_processor_id() in preemptible code warnings ...	2025-10-16 09:41:21 -07:00
Eric Dumazet	6de1dec1c1	udp: do not use skb_release_head_state() before skb_attempt_defer_free() Michal reported and bisected an issue after recent adoption of skb_attempt_defer_free() in UDP. The issue here is that skb_release_head_state() is called twice per skb, one time from skb_consume_udp(), then a second time from skb_defer_free_flush() and napi_consume_skb(). As Sabrina suggested, remove skb_release_head_state() call from skb_consume_udp(). Add a DEBUG_NET_WARN_ON_ONCE(skb_nfct(skb)) in skb_attempt_defer_free() Many thanks to Michal, Sabrina, Paolo and Florian for their help. Fixes: `6471658dc6` ("udp: use skb_attempt_defer_free()") Reported-and-bisected-by: Michal Kubecek <mkubecek@suse.cz> Closes: https://lore.kernel.org/netdev/gpjh4lrotyephiqpuldtxxizrsg6job7cvhiqrw72saz2ubs3h@g6fgbvexgl3r/ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251015052715.4140493-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-10-16 16:03:07 +02:00
Jakub Kicinski	5e655aadda	Merge tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-10-14 The first 2 paches are by Celeste Liu and target the gS_usb driver. The first patch remove the limitation to 3 CAN interface per USB device. The second patch adds the missing population of net_device->dev_port. The next 4 patches are by me and fix the m_can driver. They add a missing pm_runtime_disable(), fix the CAN state transition back to Error Active and fix the state after ifup and suspend/resume. Another patch by me targets the m_can driver, too and replaces Dong Aisheng's old email address. The next 2 patches are by Vincent Mailhol and update the CAN networking Documentation. Tetsuo Handa contributes the last patch that add missing cleanup calls in the NETDEV_UNREGISTER notification handler. * tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: j1939: add missing calls in NETDEV_UNREGISTER notification handler can: add Transmitter Delay Compensation (TDC) documentation can: remove false statement about 1:1 mapping between DLC and length can: m_can: replace Dong Aisheng's old email address can: m_can: fix CAN state in system PM can: m_can: m_can_chip_config(): bring up interface in correct state can: m_can: m_can_handle_state_errors(): fix CAN state transition to Error Active can: m_can: m_can_plat_remove(): add missing pm_runtime_disable() can: gs_usb: gs_make_candev(): populate net_device->dev_port can: gs_usb: increase max interface to U8_MAX ==================== Link: https://patch.msgid.link/20251014122140.990472-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-15 17:56:20 -07:00
Sabrina Dubroca	7f846c65ca	tls: don't rely on tx_work during send() With async crypto, we rely on tx_work to actually transmit records once encryption completes. But while send() is running, both the tx_lock and socket lock are held, so tx_work_handler cannot process the queue of encrypted records, and simply reschedules itself. During a large send(), this could last a long time, and use a lot of memory. Transmit any pending encrypted records before restarting the main loop of tls_sw_sendmsg_locked. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/8396631478f70454b44afb98352237d33f48d34d.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-15 17:41:45 -07:00
Sabrina Dubroca	b8a6ff84ab	tls: wait for pending async decryptions if tls_strp_msg_hold fails Async decryption calls tls_strp_msg_hold to create a clone of the input skb to hold references to the memory it uses. If we fail to allocate that clone, proceeding with async decryption can lead to various issues (UAF on the skb, writing into userspace memory after the recv() call has returned). In this case, wait for all pending decryption requests. Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/b9fe61dcc07dab15da9b35cf4c7d86382a98caf2.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-15 17:41:45 -07:00
Sabrina Dubroca	b6fe4c29bb	tls: always set record_type in tls_process_cmsg When userspace wants to send a non-DATA record (via the TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a previous MSG_MORE send() as a separate DATA record. If that DATA record is encrypted asynchronously, tls_handle_open_record will return -EINPROGRESS. This is currently treated as an error by tls_process_cmsg, and it will skip setting record_type to the correct value, but the caller (tls_sw_sendmsg_locked) handles that return value correctly and proceeds with sending the new message with an incorrect record_type (DATA instead of whatever was requested in the cmsg). Always set record_type before handling the open record. If tls_handle_open_record returns an error, record_type will be ignored. If it succeeds, whether with synchronous crypto (returning 0) or asynchronous (returning -EINPROGRESS), the caller will proceed correctly. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/0457252e578a10a94e40c72ba6288b3a64f31662.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-15 17:41:45 -07:00
Sabrina Dubroca	b014a4e066	tls: wait for async encrypt in case of error during latter iterations of sendmsg If we hit an error during the main loop of tls_sw_sendmsg_locked (eg failed allocation), we jump to send_end and immediately return. Previous iterations may have queued async encryption requests that are still pending. We should wait for those before returning, as we could otherwise be reading from memory that userspace believes we're not using anymore, which would be a sort of use-after-free. This is similar to what tls_sw_recvmsg already does: failures during the main loop jump to the "wait for async" code, not straight to the unlock/return. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/c793efe9673b87f808d84fdefc0f732217030c52.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-15 17:41:45 -07:00

1 2 3 4 5 ...

82177 Commits