linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-03 23:12:57 -04:00

Author	SHA1	Message	Date
Johannes Berg	a74e893f30	wifi: mac80211: fix MLE defragmentation If either reconf or EPCS multi-link element (MLE) is contained in a non-transmitted profile, the defragmentation routine is called with a pointer to the defragmented copy, but the original elements. This is incorrect for two reasons: - if the original defragmentation was needed, it will not find the correct data - if the original frame is at a higher address, the parsing will potentially overrun the heap data (though given the layout of the buffers, only into the new defragmentation buffer, and then it has to stop and fail once that's filled with copied data). Fix it by tracking the container along with the pointer and in doing so also unify the two almost identical defragmentation routines. Fixes: `4d70e9c548` ("wifi: mac80211: defragment reconfiguration MLE when parsing") Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Link: https://patch.msgid.link/20260508091031.8a6c34613178.I4de16ebbce2d27f2f8f98fc49949c7a376c2fe8d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:19:52 +02:00
Emmanuel Grumbach	e1e83feb8e	wifi: mac80211: don't override max_amsdu_subframes In client mode, the extended capabilities are handled by the kernel looking at the association frame. When the supplicant installs the keys it calls sta_apply_parameters and it doesn't include the extended capabilities since those can't change after association. As a result, we overrode the max_amsdu_subframes that we set after association. Check that the ext_capa coming from the user space is valid before looking at it. If the ext_capa is NULL, it really means that the extended capabilities are not changed (as opposed to cleared). The default value for max_amsdu_subframes is 0, which means there is no limit. This value is valid and in case the association response frame does not have extended capabilities, this is the value we should use. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221079 Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260513170623.828dbb58c782.Ifd2bfc190c26140e919127adb02ffddd7b551499@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:14:41 +02:00
Alexandru Hossu	f718506edd	wifi: mac80211: bounds-check link_id in ieee80211_ml_epcs IEEE80211_MLE_STA_EPCS_CONTROL_LINK_ID is 0x000f, so link_id extracted from a PRIO_ACCESS ML element PER_STA_PROFILE subelement can be 0..15. sdata->link[] has IEEE80211_MLD_MAX_NUM_LINKS (15) entries (indices 0..14), making index 15 out-of-bounds. A connected WiFi 7 AP can trigger this by sending an EPCS Enable Response action frame with a PER_STA_PROFILE subelement where link_id = 15. The unsolicited-notification path (dialog_token = 0) is reachable any time EPCS is already enabled, without any prior client request. sdata->link[15] reads into the first word of sdata->activate_links_work (a wiphy_work whose embedded list_head is non-NULL after INIT_LIST_HEAD), so the NULL check on the result does not catch the invalid access. The garbage pointer is then passed to ieee80211_sta_wmm_params(), which dereferences link->sdata and crashes the kernel. The same class of bug was fixed for ieee80211_ml_reconfiguration() by commit `162d331d83` ("wifi: mac80211: bounds-check link_id in ieee80211_ml_reconfiguration"). Fixes: `de86c5f608` ("wifi: mac80211: Add support for EPCS configuration") Signed-off-by: Alexandru Hossu <hossu.alexandru@gmail.com> Link: https://patch.msgid.link/20260515102908.1653088-1-hossu.alexandru@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:04:17 +02:00
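The off-by-one described in the commit above can be illustrated with a minimal userspace sketch. The mask value 0x000f and the 15-entry array size are taken from the commit message; `struct fake_sdata` and `lookup_link()` are hypothetical names for illustration and are not the mac80211 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EPCS_CONTROL_LINK_ID 0x000f /* mask value quoted in the commit message */
#define MAX_NUM_LINKS        15     /* valid indices are 0..14 */

/* Hypothetical stand-in for sdata; only the link array matters here. */
struct fake_sdata {
	int *link[MAX_NUM_LINKS];
};

/* Return the link for the ID encoded in 'control', or NULL if the ID is out of range. */
static int *lookup_link(const struct fake_sdata *sdata, uint16_t control)
{
	unsigned int link_id = control & EPCS_CONTROL_LINK_ID; /* 0..15 */

	assert(sdata != NULL);

	if (link_id >= MAX_NUM_LINKS) /* reject 15 before indexing the array */
		return NULL;

	return sdata->link[link_id];
}
```

A 4-bit field can always encode one value more than a 15-entry array can hold, so the range check has to happen before the array access rather than relying on a NULL check of the result.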
John Walker	7666dbb1ba	wifi: cfg80211: advance loop vars in cfg80211_merge_profile() cfg80211_merge_profile() reassembles a Multi-BSSID non-transmitted BSS profile that has been split across multiple consecutive MBSSID elements. Its while-loop calls cfg80211_get_profile_continuation(ie, ielen, mbssid_elem, sub_elem) but never advances mbssid_elem or sub_elem inside the body. Each iteration therefore searches for a continuation that follows the same fixed pair; the helper returns the same next_mbssid; and the same next_sub bytes are memcpy()'d into merged_ie at a growing offset until the buffer fills. Advance both mbssid_elem and sub_elem to the just-consumed continuation so the next call to cfg80211_get_profile_continuation() searches for a further continuation beyond it (or returns NULL when none exists). A specially-crafted malicious beacon can take advantage of this bug to cause the kernel to spend an excessive amount of time in cfg80211_merge_profile (up to as much as 2ms per beacon received), which could theoretically be abused in some way. Cc: stable@vger.kernel.org Fixes: `fe806e4992` ("cfg80211: support profile split between elements") Signed-off-by: John Walker <johnwalker0@gmail.com> Link: https://patch.msgid.link/20260507230720.64783-1-johnwalker0@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-08 09:20:03 +02:00
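The loop bug described in the commit above can be sketched in userspace C. This is a simplified model under stated assumptions: `struct frag`, `next_continuation()` and `merge()` are hypothetical, and the real cfg80211 helper walks information elements rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment: 'cont' marks it as a continuation of the previous one. */
struct frag {
	const char *data;
	size_t len;
	int cont;
};

/* Return the fragment right after 'cur' if it is a continuation, else NULL. */
static const struct frag *next_continuation(const struct frag *frags, size_t n,
					    const struct frag *cur)
{
	size_t idx = (size_t)(cur - frags) + 1;

	return (idx < n && frags[idx].cont) ? &frags[idx] : NULL;
}

/* Merge frags[0] and its continuations into 'out'; return the bytes written. */
static size_t merge(const struct frag *frags, size_t n, char *out, size_t max)
{
	const struct frag *cur, *next;
	size_t copied;

	assert(n > 0);
	cur = &frags[0];
	if (cur->len > max)
		return 0;
	memcpy(out, cur->data, cur->len);
	copied = cur->len;

	while ((next = next_continuation(frags, n, cur)) != NULL) {
		if (copied + next->len > max)
			break;
		memcpy(out + copied, next->data, next->len);
		copied += next->len;
		/* Advance; without this the same fragment is copied until 'out' fills. */
		cur = next;
	}
	return copied;
}
```

Dropping the `cur = next` assignment reproduces the shape of the bug: the helper keeps returning the same continuation and the loop only ends when the output buffer is full.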
Kuniyuki Iwashima	ecddc523cf	tcp: Fix dst leak in tcp_v6_connect(). If a socket is bound to a wildcard address, tcp_v[46]_connect() updates it with a non-wildcard address based on the route lookup. After bhash2 was introduced in the cited commit, we must call inet_bhash2_update_saddr() to update the bhash2 entry as well. If inet_bhash2_update_saddr() fails, we must release the refcount for dst by ip_route_connect() or ip6_dst_lookup_flow(). While tcp_v4_connect() calls ip_rt_put() in the error path, tcp_v6_connect() does not call dst_release(). Let's call dst_release() when inet_bhash2_update_saddr() fails in tcp_v6_connect(). Fixes: `28044fc1d4` ("net: Add a bhash2 table hashed by port and address") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260506070443.1699879-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-07 08:39:15 -07:00
Kuniyuki Iwashima	019c892e46	ipmr: Call ipmr_fib_lookup() under RCU. Yi Lai reported RCU splat in reg_vif_xmit() below. [0] When CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup() uses rcu_dereference() without explicit rcu_read_lock(). Although rcu_read_lock_bh() is already held by the caller __dev_queue_xmit(), lockdep requires explicit rcu_read_lock() for rcu_dereference(). Let's move up rcu_read_lock() in reg_vif_xmit() to cover ipmr_fib_lookup(). [0]: WARNING: suspicious RCU usage 7.1.0-rc2-next-20260504-9d0d467c3572 #1 Not tainted ----------------------------- net/ipv4/ipmr.c:329 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by syz.2.17/1779: #0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: rcu_read_lock_bh include/linux/rcupdate.h:891 [inline] #0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x239/0x4140 net/core/dev.c:4792 #1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline] #1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __netif_tx_lock include/linux/netdevice.h:4795 [inline] #1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __dev_queue_xmit+0x1d5d/0x4140 net/core/dev.c:4865 stack backtrace: CPU: 1 UID: 0 PID: 1779 Comm: syz.2.17 Not tainted 7.1.0-rc2-next-20260504-9d0d467c3572 #1 PREEMPT(lazy) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x121/0x150 lib/dump_stack.c:120 dump_stack+0x19/0x20 lib/dump_stack.c:129 lockdep_rcu_suspicious+0x15b/0x1f0 kernel/locking/lockdep.c:6878 ipmr_fib_lookup net/ipv4/ipmr.c:329 [inline] reg_vif_xmit+0x2ee/0x3c0 net/ipv4/ipmr.c:540 __netdev_start_xmit 
include/linux/netdevice.h:5382 [inline] netdev_start_xmit include/linux/netdevice.h:5391 [inline] xmit_one net/core/dev.c:3889 [inline] dev_hard_start_xmit+0x170/0x700 net/core/dev.c:3905 __dev_queue_xmit+0x1df1/0x4140 net/core/dev.c:4871 dev_queue_xmit include/linux/netdevice.h:3423 [inline] packet_xmit+0x252/0x370 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3082 [inline] packet_sendmsg+0x39ad/0x5650 net/packet/af_packet.c:3114 sock_sendmsg_nosec net/socket.c:797 [inline] __sock_sendmsg net/socket.c:812 [inline] ____sys_sendmsg+0xa21/0xba0 net/socket.c:2716 ___sys_sendmsg+0x121/0x1c0 net/socket.c:2770 __sys_sendmsg+0x177/0x220 net/socket.c:2802 __do_sys_sendmsg net/socket.c:2807 [inline] __se_sys_sendmsg net/socket.c:2805 [inline] __x64_sys_sendmsg+0x80/0xc0 net/socket.c:2805 x64_sys_call+0x1d9c/0x21c0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xc1/0x1020 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f37e563ee5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe5caa7fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000005c5fa0 RCX: 00007f37e563ee5d RDX: 0000000000000000 RSI: 00002000000012c0 RDI: 0000000000000004 RBP: 00000000005c5fa0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00000000005c5fac R15: 00000000005c5fa0 </TASK> Fixes: `b3b6babf47` ("ipmr: Free mr_table after RCU grace period.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: Yi Lai <yi1.lai@intel.com> Closes: https://lore.kernel.org/netdev/afrY34dLXNUboevf@ly-workstation/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> 
Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260506065955.1695753-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-07 08:38:37 -07:00
D. Wythe	9032f76769	net/smc: fix missing sk_err when TCP handshake fails In smc_connect_work(), when the underlying TCP handshake fails, the error code (rc) must be propagated to sk_err to ensure userspace can correctly retrieve the error status via SO_ERROR. Currently, the code only handles a restricted set of error codes (e.g., EPIPE, ECONNREFUSED). If other errors occur, such as EHOSTUNREACH, sk_err remains unset (zero). This affects applications that rely on SO_ERROR to determine connect outcome. For example, newer versions of Go's netpoller treat SO_ERROR == 0 combined with a failed getpeername() as a spurious wakeup and re-enter epoll_wait(). Under ET mode, no further edge will be generated since the socket is already in a terminal state, causing the connect to hang indefinitely or until a user-specified timeout, if one is set. Fixes: `50717a37db` ("net/smc: nonblocking connect rework") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Link: https://patch.msgid.link/20260506014105.27093-1-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-07 08:36:34 -07:00
Jiexun Wang	d119775f2b	af_unix: Reject SIOCATMARK on non-stream sockets SIOCATMARK reports whether the receive queue is at the urgent mark for MSG_OOB. In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets. SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(), so they should not support SIOCATMARK either. Return -EOPNOTSUPP for non-stream sockets before checking the receive queue. Fixes: `314001f0bf` ("af_unix: Add OOB support") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-07 08:36:02 -07:00
Matthieu Baerts (NGI0)	166b783440	mptcp: pm: prio: skip closed subflows When sending an MP_PRIO, closed subflows need to be skipped. This fixes the case where the initial subflow got closed, re-opened later, then an MP_PRIO is needed for the same local address. Note that explicit MP_PRIO cannot be sent during the 3WHS, so it is fine to use __mptcp_subflow_active(). Fixes: `067065422f` ("mptcp: add the outgoing MP_PRIO support") Cc: stable@vger.kernel.org Fixes: `b29fcfb54c` ("mptcp: full disconnect implementation") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-9-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:45 -07:00
Matthieu Baerts (NGI0)	62a9b19dce	mptcp: pm: ADD_ADDR rtx: return early if no retrans No need to iterate over all subflows if there is no retransmission needed. Exit early in this case then. Fixes: `30549eebc4` ("mptcp: make ADD_ADDR retransmission timeout adaptive") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-8-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:45 -07:00
Matthieu Baerts (NGI0)	c6d395e2de	mptcp: pm: ADD_ADDR rtx: skip inactive subflows When looking at the maximum RTO amongst the subflows, inactive subflows were taken into account: that includes stale ones, and the initial one if it has already been closed. Unusable subflows are now simply skipped. Stale ones are used as an alternative: if there are only stale ones, their maximum RTO is taken, to avoid eventually falling back to net.mptcp.add_addr_timeout, which is set to 2 minutes by default. Fixes: `30549eebc4` ("mptcp: make ADD_ADDR retransmission timeout adaptive") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-7-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:45 -07:00
Matthieu Baerts (NGI0)	3cf1249289	mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker When an ADD_ADDR needs to be retransmitted and another one has already been prepared -- e.g. multiple ADD_ADDRs have been sent in a row and need to be retransmitted later -- this additional retransmission will need to wait. In this case, the timer was reset to TCP_RTO_MAX / 8, which is ~15 seconds. This delay is unnecessarily long: it should just be rescheduled at the next opportunity, e.g. after the retransmission timeout. Without this modification, some issues can be seen from time to time in the selftests when multiple ADD_ADDRs are sent, and the host takes time to process them, e.g. the "signal addresses, ADD_ADDR timeout" MPTCP Join selftest, especially with a debug kernel config. Note that on older kernels, 'timeout' is not available. It should be enough to replace it by one second (HZ). Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-6-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:45 -07:00
Matthieu Baerts (NGI0)	b7b9a46156	mptcp: pm: ADD_ADDR rtx: free sk if last When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(), and released at the end. If at that moment, it was the last reference being held, the sk would not be freed. sock_put() should then be called instead of __sock_put(). But that's not enough: if it is the last reference, sock_put() will call sk_free(), which will end up calling sk_stop_timer_sync() on the same timer, and waiting indefinitely to finish. So it is needed to mark that the timer is done at the end of the timer handler when it has not been rescheduled, not to call sk_stop_timer_sync() on "itself". Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-5-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:44 -07:00
Matthieu Baerts (NGI0)	9634cb35af	mptcp: pm: ADD_ADDR rtx: always decrease sk refcount When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(). It should then be released in all cases at the end. Some (unlikely) checks were returning directly instead of calling sock_put() to decrease the refcount. Jump to a new 'exit' label to call __sock_put() (which will become sock_put() in the next commit) to fix this potential leak. While at it, drop the '!msk' check which cannot happen because it is never reset, and explicitly mark the remaining one as "unlikely". Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-4-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:44 -07:00
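The single-exit pattern described in the commit above can be sketched in plain C. Everything here (`struct obj`, `obj_put()`, `timer_handler()`) is hypothetical and only mirrors the shape of the fix: every return path goes through one label that drops the reference taken when the timer was armed.

```c
#include <assert.h>

/* Hypothetical refcounted object standing in for the socket. */
struct obj {
	int refcnt;
	int closed;
};

/* Drop one reference; the caller must own one. */
static void obj_put(struct obj *o)
{
	assert(o->refcnt > 0);
	o->refcnt--;
}

/* Timer callback: a reference is assumed to have been taken when it was armed. */
static int timer_handler(struct obj *o)
{
	int retransmitted = 0;

	if (o->closed)
		goto exit; /* early exits still release the reference */

	retransmitted = 1;
exit:
	obj_put(o);
	return retransmitted;
}
```

Routing the unlikely early checks through the label means a later change to the release call only has to be made in one place.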
Matthieu Baerts (NGI0)	5cd6e0ad79	mptcp: pm: ADD_ADDR rtx: fix potential data-race This mptcp_pm_add_timer() helper is executed as a timer callback in softirq context. To avoid any data races, the socket lock needs to be held with bh_lock_sock(). If the socket is in use, retry again soon after, similar to what is done with the keepalive timer. Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-3-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:44 -07:00
Matthieu Baerts (NGI0)	03f324f3f1	mptcp: pm: ADD_ADDR rtx: allow ID 0 ADD_ADDR can be sent for the ID 0, which corresponds to the local address and port linked to the initial subflow. Indeed, this address could be removed, and re-added later on, e.g. what is done in the "delete re-add signal" MPTCP Join selftests. So no reason to ignore it. Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-2-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:44 -07:00
Matthieu Baerts (NGI0)	b12014d2d3	mptcp: pm: kernel: correctly retransmit ADD_ADDR ID 0 When adding the ADD_ADDR to the list, the address including the IP, port and ID are copied. On the other hand, when the endpoint corresponds to the one from the initial subflow, the ID is set to 0, as specified by the MPTCP protocol. The issue is that the ID was reset after having copied the ID in the ADD_ADDR entry. So the retransmission was done, but using a different ID than the initial one. Fixes: `8b8ed1b429` ("mptcp: pm: reuse ID 0 after delete and re-add") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-1-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:16:44 -07:00
Eric Dumazet	c8f7244c8c	tcp: tcp_child_process() related UAF tcp_child_process( .. child ...) currently calls sock_put(child). Unfortunately @child (named @nsk in callers) can be used after this point to send a RST packet. To fix this UAF, I remove the sock_put() from tcp_child_process() and let the callers handle this after it is safe. Remove @rsk variable in tcp_v4_do_rcv() and change tcp_v6_do_rcv() so that both functions look the same. Fixes: `cfb6eeb4c8` ("[TCP]: MD5 Signature Option (RFC2385) support.") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260505153927.3435532-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 18:11:33 -07:00
Eric Dumazet	770b136ff9	net/sched: sch_sfq: annotate data-races from sfq_dump_class_stats() sfq_dump_class_stats() runs locklessly, add needed READ_ONCE() and WRITE_ONCE() annotations. Fixes: `edb09eb17e` ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260505091133.2452510-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 17:46:05 -07:00
Eric Dumazet	67ef49047d	inetpeer: add a missing read_seqretry() in inet_getpeer() When performing a lockless lookup over the inet_peer rbtree, if a matching node is found, inet_getpeer() returns it immediately without validating the seqlock sequence. This missing check introduces a race condition: Trigger Path: When a host receives an incoming fragmented IPv4 packet, ip4_frag_init() (in net/ipv4/ip_fragment.c) calls inet_getpeer_v4() to track the peer. The Race: If the packet is from a new source IP, CPU A acquires the write_seqlock, allocates a new inet_peer node (p), sets its IP address (daddr), and links it to the rbtree (rb_link_node). Uninitialized Access: Due to the lack of memory barriers between rb_link_node and the initialization of the rest of the struct (like refcount_set(&p->refcnt, 1)), CPU A can make the node visible to readers before its refcnt is initialized. This is especially true on weakly-ordered architectures like ARM64 where the CPU can reorder the memory stores. Lockless Reader: Concurrently, CPU B processes a second fragmented packet from the same source IP. CPU B does a lockless lookup, finds the newly inserted node, and returns it immediately. Use-After-Free (UAF): CPU B reads p->refcnt as uninitialized garbage (left over from previous kmalloc-128/192 allocations). If the garbage is > 0, refcount_inc_not_zero(&p->refcnt) succeeds. CPU A then executes refcount_set(&p->refcnt, 1), overwriting CPU B's increment. When CPU B finishes with the fragment queue, it calls inet_putpeer(), which drops the refcount to 0 and frees the node via RCU. The node is now freed but remains linked in the rbtree, resulting in a Use-After-Free in the rbtree. 
Fixes: `b145425f26` ("inetpeer: remove AVL implementation in favor of RB tree") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260505133233.3039575-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 17:44:13 -07:00
Eric Dumazet	7aaa8f5e45	ipv6: fix potential UAF caused by ip6_forward_proxy_check() ip6_forward_proxy_check() calls pskb_may_pull() which might re-allocate skb->head. Reload ipv6_hdr() after the pskb_may_pull() call to avoid using the freed memory. Fixes: `e21e0b5f19` ("[IPV6] NDISC: Handle NDP messages to proxied addresses.") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260505130056.2927197-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 17:29:23 -07:00
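The stale-pointer hazard in the commit above has a close userspace analogue in `realloc()`. The sketch below is an illustration under assumptions, not the kernel code: `struct buf`, `ensure_room()` and `first_byte_after_grow()` are hypothetical, with `ensure_room()` playing the role of a call that may move the underlying storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer; 'head' may move whenever ensure_room() grows it. */
struct buf {
	unsigned char *head;
	size_t cap;
};

/* Make at least 'need' bytes addressable; returns 0 on allocation failure. */
static int ensure_room(struct buf *b, size_t need)
{
	unsigned char *p;

	if (need <= b->cap)
		return 1;
	p = realloc(b->head, need); /* may return a different address */
	if (!p)
		return 0;
	memset(p + b->cap, 0, need - b->cap);
	b->head = p;
	b->cap = need;
	return 1;
}

/* Read the first header byte after growing the buffer; -1 on failure. */
static int first_byte_after_grow(struct buf *b, size_t need)
{
	const unsigned char *hdr;

	assert(b->head != NULL && b->cap > 0);
	if (!ensure_room(b, need))
		return -1;
	hdr = b->head; /* load the pointer only after the call that may move it */
	return hdr[0];
}
```

Any pointer derived from `b->head` before `ensure_room()` has to be treated as invalid afterwards, which is the same rule the fix applies to the header pointer after `pskb_may_pull()`.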
Jakub Kicinski	dc61989e37	Merge tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-05-05 1. Fix an IPv6 encapsulation error path that leaked route references when UDPv6 ESP decapsulation resolved to an error route. From Yilin Zhu. 2. Fix AH with ESN on async crypto paths by accounting for the extra high-order sequence number when reconstructing the temporary authentication layout in the completion callbacks. From Michael Bommarito. 3. Fix XFRM output so it does not overwrite already-correct inner header pointers when a tunnel layer such as VXLAN has already saved them. The fix comes with new selftests. From Cosmin Ratiu. 4. Add the missing native payload size entry for XFRM_MSG_MAPPING in the compat translation path. From Ruijie Li. 5. Harden __xfrm_state_delete() against repeated or inconsistent unhashing of state list nodes by keying the removal on actual list membership and using delete-and-init helpers. From Michal Kosiorek. 6. Prevent ESP from decrypting shared splice-backed skb fragments in place by marking UDP splice frags as shared and forcing copy-on-write in ESP input when needed. From Kuan-Ting Chen. * tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: esp: avoid in-place decrypt on shared skb frags xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete xfrm: provide message size for XFRM_MSG_MAPPING xfrm: Don't clobber inner headers when already set tools/selftests: Add a VXLAN+IPsec traffic test tools/selftests: Use a sensible timeout value for iperf3 client xfrm: ah: account for ESN high bits in async callbacks ipv6: xfrm6: release dst on error in xfrm6_rcv_encap() ==================== Link: https://patch.msgid.link/20260505132326.1362733-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 16:49:42 -07:00
Michael Bommarito	c5d415596c	Bluetooth: HIDP: serialise l2cap_unregister_user via hidp_session_sem Commit `dbf666e4fc` ("Bluetooth: HIDP: Fix possible UAF") made hidp_session_remove() drop the L2CAP reference and set session->conn = NULL once the session is considered removed, and added a bare if (session->conn) guard around the kthread-exit l2cap_unregister_user() call in hidp_session_thread(). The sibling ioctl site in hidp_connection_del() still reads session->conn unlocked and unguarded, and the kthread-exit guard itself is a lockless double-read. hidp_session_find() drops hidp_session_sem before returning, so hidp_session_remove() can null session->conn between the lookup and the call in hidp_connection_del(). Worse, since commit `752a6c9596` ("Bluetooth: L2CAP: Fix use-after-free in l2cap_unregister_user") takes mutex_lock(&conn->lock) inside l2cap_unregister_user(), a stale non-NULL snapshot also UAFs on conn->lock. v1 only added an if (session->conn) guard at the ioctl site, which doesn't address either race; Luiz suggested snapshotting session->conn under the sem and clearing it before the call. Taking hidp_session_sem across l2cap_unregister_user() would be wrong: l2cap_conn_del() already establishes the lock order conn->lock -> hidp_session_sem via l2cap_unregister_all_users() -> user->remove == hidp_session_remove(), so taking hidp_session_sem before conn->lock would AB/BA deadlock. Factor a helper hidp_session_unregister_conn() that under down_write(&hidp_session_sem) snapshots session->conn and clears the member, then outside the sem calls l2cap_unregister_user() and l2cap_conn_put() on the snapshot. Call it from both hidp_connection_del() and hidp_session_thread()'s exit path. At most one consumer wins the write-sem; later callers observe session->conn == NULL and skip the unregister and put, so the reference hidp_session_new() took via l2cap_conn_get() is consumed exactly once. session_free() already tolerates a NULL session->conn. 
Fixes: `dbf666e4fc` ("Bluetooth: HIDP: Fix possible UAF") Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Link: https://lore.kernel.org/all/20260422011437.176643-1-michael.bommarito@gmail.com/ Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:27:53 -04:00
Jann Horn	72d97cae2a	Bluetooth: hci_event: fix memset typo hci_le_big_sync_established_evt() currently does: conn->num_bis = 0; memset(conn->bis, 0, sizeof(conn->num_bis)); sizeof(conn->num_bis) is wrong - it would make sense to either use conn->num_bis (before setting that to 0) or sizeof(conn->bis). Fix it by using sizeof(conn->bis), the least intrusive change. Luckily, nothing actually depends on this memset() working properly: Nothing seems to ever read from conn->bis beyond conn->num_bis, and when conn->num_bis is increased, the corresponding elements of conn->bis are initialized. So I think this line could also just be removed. This is a purely theoretical fix and should have no impact on actual behavior. Fixes: `42ecf19471` ("Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:27:29 -04:00
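The difference between the two `sizeof` expressions in the commit above is easy to see in a small standalone sketch. The struct layout below is an assumption made for illustration (the real field types are not shown in this log), and `struct fake_conn` and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a one-byte counter followed by an array. */
struct fake_conn {
	uint8_t num_bis;
	uint8_t bis[31];
};

/* Mirrors the typo: the length is the size of the counter, not of the array. */
static void clear_bis_buggy(struct fake_conn *c)
{
	assert(c != NULL);
	c->num_bis = 0;
	memset(c->bis, 0, sizeof(c->num_bis)); /* clears only sizeof(num_bis) bytes */
}

/* Mirrors the fix: the length is the size of the array being cleared. */
static void clear_bis_fixed(struct fake_conn *c)
{
	assert(c != NULL);
	c->num_bis = 0;
	memset(c->bis, 0, sizeof(c->bis)); /* clears the whole array */
}
```

With this layout the buggy variant zeroes a single byte of `bis`, while the fixed variant zeroes all 31.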
Pengpeng Hou	8f59d17b18	Bluetooth: RFCOMM: pull credit byte with skb_pull_data() rfcomm_recv_data() treats the first payload byte as a credit field when the UIH frame carries PF and credit-based flow control is enabled. After the header has been stripped, the PF/CFC path consumes that byte with a direct skb->data dereference followed by skb_pull(). A malformed short frame can reach this path without a byte available. Use skb_pull_data() so the length check and pull happen together before the returned credit byte is consumed. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:23:20 -04:00
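The "check and pull together" idea in the commit above can be sketched outside the kernel. This is a userspace model in the spirit of `skb_pull_data()`, not its implementation: `struct frame`, `frame_pull_data()` and `read_credits()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a received frame; not the kernel's struct sk_buff. */
struct frame {
	const uint8_t *data;
	size_t len;
};

/* Check the length and consume 'n' bytes in one step; NULL if too short. */
static const uint8_t *frame_pull_data(struct frame *f, size_t n)
{
	const uint8_t *p = f->data;

	if (f->len < n)
		return NULL;
	f->data += n;
	f->len -= n;
	return p;
}

/* Return the credit byte, or -1 for a frame with no byte available. */
static int read_credits(struct frame *f)
{
	const uint8_t *credit;

	assert(f != NULL);
	credit = frame_pull_data(f, 1);
	return credit ? *credit : -1;
}
```

Because the byte is only reachable through the pointer returned by the checked pull, a short frame cannot be dereferenced by accident.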
SeungJu Cheon	f958c7805b	Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths Several iso_pi(sk) fields (qos, qos_user_set, bc_sid, base, base_len, sync_handle, bc_num_bis) are written under lock_sock in iso_sock_setsockopt() and iso_sock_bind(), but read and written under hci_dev_lock only in two other paths: - iso_connect_bis() / iso_connect_cis(), invoked from connect(2), read qos/base/bc_sid and reset qos to default_qos on the qos_user_set validation failure -- all without lock_sock. - iso_connect_ind(), invoked from hci_rx_work, writes sync_handle, bc_sid, qos.bcast.encryption, bc_num_bis, base and base_len on PA_SYNC_ESTABLISHED / PAST_RECEIVED / BIG_INFO_ADV_REPORT / PER_ADV_REPORT events. The BIG_INFO handler additionally passes &iso_pi(sk)->qos together with sync_handle / bc_num_bis / bc_bis to hci_conn_big_create_sync() while setsockopt may be mutating them. Acquire lock_sock around the affected accesses in both paths. The locking order hci_dev_lock -> lock_sock matches the existing iso_conn_big_sync() precedent, whose comment documents the same requirement for hci_conn_big_create_sync(). The HCI connect/bind helpers do not wait for command completion -- they enqueue work via hci_cmd_sync_queue{,_once}() / hci_le_create_cis_pending() and return -- so the added hold time is comparable to iso_conn_big_sync(). 
KCSAN report: BUG: KCSAN: data-race in iso_connect_cis / iso_sock_setsockopt read to 0xffffa3ae8ce3cdc8 of 1 bytes by task 335 on cpu 0: iso_connect_cis+0x49f/0xa20 iso_sock_connect+0x60e/0xb40 __sys_connect_file+0xbd/0xe0 __sys_connect+0xe0/0x110 __x64_sys_connect+0x40/0x50 x64_sys_call+0xcad/0x1c60 do_syscall_64+0x133/0x590 entry_SYSCALL_64_after_hwframe+0x77/0x7f write to 0xffffa3ae8ce3cdc8 of 60 bytes by task 334 on cpu 1: iso_sock_setsockopt+0x69a/0x930 do_sock_setsockopt+0xc3/0x170 __sys_setsockopt+0xd1/0x130 __x64_sys_setsockopt+0x64/0x80 x64_sys_call+0x1547/0x1c60 do_syscall_64+0x133/0x590 entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported by Kernel Concurrency Sanitizer on: CPU: 1 UID: 0 PID: 334 Comm: iso_setup_race Not tainted 7.0.0-10949-g8541d8f725c6 #44 PREEMPT(lazy) The iso_connect_ind() races were found by inspection. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: SeungJu Cheon <suunj1331@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:22:05 -04:00
SeungJu Cheon	ca40d48107	Bluetooth: ISO: Fix data-race on dst in iso_sock_connect() iso_sock_connect() copies the destination address into iso_pi(sk)->dst under lock_sock, then releases the lock and reads it back with bacmp() to decide between the CIS and BIS connect paths: lock_sock(sk); bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr); iso_pi(sk)->dst_type = sa->iso_bdaddr_type; release_sock(sk); if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY)) // <- no lock held This read after release_sock() races with any concurrent write to iso_pi(sk)->dst on the same socket. Fix by reading the destination address directly from the local sockaddr argument (sa->iso_bdaddr) instead of iso_pi(sk)->dst. Since sa is a function-local argument, reading it requires no locking and avoids the race. This patch addresses only the bacmp() race in iso_sock_connect(); other unprotected iso_pi(sk) accesses are fixed separately in the next patch. KCSAN report: BUG: KCSAN: data-race in memcmp+0x39/0xb0 race at unknown origin, with read to 0xffff8f96ea66dde3 of 1 bytes by task 549 on cpu 1: memcmp+0x39/0xb0 iso_sock_connect+0x275/0xb40 __sys_connect_file+0xbd/0xe0 __sys_connect+0xe0/0x110 __x64_sys_connect+0x40/0x50 x64_sys_call+0xcad/0x1c60 do_syscall_64+0x133/0x590 entry_SYSCALL_64_after_hwframe+0x77/0x7f value changed: 0x00 -> 0xee Reported by Kernel Concurrency Sanitizer on: CPU: 1 UID: 0 PID: 549 Comm: iso_race_combin Not tainted 7.0.0-08391-g1d51b370a0f8 #40 PREEMPT(lazy) Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: SeungJu Cheon <suunj1331@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:21:58 -04:00
Pauli Virtanen	4e37f6452d	Bluetooth: SCO: hold sk properly in sco_conn_ready sk deref in sco_conn_ready must be done either under conn->lock, or holding a refcount, to avoid concurrent close. conn->sk and parent sk is currently accessed without either, and without checking parent->sk_state: [Task 1] [Task 2] sco_sock_release sco_conn_ready sk = conn->sk lock_sock(sk) conn->sk = NULL lock_sock(sk) release_sock(sk) sco_sock_kill(sk) UAF on sk deref and similarly for access to sco_get_sock_listen() return value. Fix possible UAF by holding sk refcount in sco_conn_ready() and making sco_get_sock_listen() increase refcount. Also recheck after lock_sock that the socket is still valid. Adjust conn->sk locking so it's protected also by lock_sock() of the associated socket if any. Fixes: `27c24fda62` ("Bluetooth: switch to lock_sock in SCO") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:21:25 -04:00
Siwei Zhang	0a120d9616	Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb() Add the same NULL guard already present in l2cap_sock_resume_cb() and l2cap_sock_ready_cb(). Fixes: `80808e431e` ("Bluetooth: Add l2cap_chan_ops abstraction") Cc: stable@kernel.org Signed-off-by: Siwei Zhang <oss@fourdim.xyz> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:21:09 -04:00
Siwei Zhang	78a88d43da	Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb() Add the same NULL guard already present in l2cap_sock_resume_cb() and l2cap_sock_ready_cb(). Fixes: `8d836d71e2` ("Bluetooth: Access sk_sndtimeo indirectly in l2cap_core.c") Cc: stable@kernel.org Signed-off-by: Siwei Zhang <oss@fourdim.xyz> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:21:07 -04:00
Siwei Zhang	2ff1a41a91	Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb() Add the same NULL guard already present in l2cap_sock_resume_cb() and l2cap_sock_ready_cb(). Fixes: `89bc500e41` ("Bluetooth: Add state tracking to struct l2cap_chan") Cc: stable@kernel.org Signed-off-by: Siwei Zhang <oss@fourdim.xyz> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:21:04 -04:00
Mikhail Gavrilov	91b5a598b5	Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion When a BLE peripheral sends an L2CAP Connection Parameter Update Request the processing path is: process_pending_rx() [takes conn->lock] l2cap_le_sig_channel() l2cap_conn_param_update_req() hci_le_conn_update() [takes hdev->lock] Meanwhile other code paths take the locks in the opposite order: l2cap_chan_connect() [takes hdev->lock] ... mutex_lock(&conn->lock) l2cap_conn_ready() [hdev->lock via hci_cb_list_lock] ... mutex_lock(&conn->lock) This is a classic AB/BA deadlock which lockdep reports as a circular locking dependency when connecting a BLE MIDI keyboard (Carry-On FC-49). Fix this by making hci_le_conn_update() defer the HCI command through hci_cmd_sync_queue() so it no longer needs to take hdev->lock in the caller context. The sync callback uses __hci_cmd_sync_status_sk() to wait for the HCI_EV_LE_CONN_UPDATE_COMPLETE event, then updates the stored connection parameters (hci_conn_params) and notifies userspace (mgmt_new_conn_param) only after the controller has confirmed the update. A reference on hci_conn is held via hci_conn_get()/hci_conn_put() for the lifetime of the queued work to prevent use-after-free, and hci_conn_valid() is checked before proceeding in case the connection was removed while the work was pending. The hci_dev_lock is held across hci_conn_valid() and all conn field accesses to prevent a concurrent disconnect from invalidating the connection mid-use. Fixes: `f044eb0524` ("Bluetooth: Store latency and supervision timeout in connection params") Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:20:51 -04:00
Dudu Lu	4f42363c81	Bluetooth: l2cap: fix MPS check in l2cap_ecred_reconf_req The L2CAP specification states that if more than one channel is being reconfigured, the MPS shall not be decreased. The current check has two issues: 1) The comparison uses >= (greater-than-or-equal), which incorrectly rejects reconfiguration requests where the MPS stays the same. Since the spec says MPS "shall be greater than or equal to the current MPS", only a strict decrease (remote_mps > mps) should be rejected. Keeping the same MPS is valid. 2) The multi-channel guard uses `&& i` (loop index) to approximate "more than one channel", but this incorrectly allows MPS decrease for the first channel (i==0) even when multiple channels are being reconfigured. Replace with `&& num_scid > 1` which correctly checks whether the request covers more than one channel. Fixes: `7accb1c432` ("Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ") Signed-off-by: Dudu Lu <phx0fer@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:20:38 -04:00
Dudu Lu	72b8deccff	Bluetooth: bnep: fix incorrect length parsing in bnep_rx_frame() extension handling In bnep_rx_frame(), the BNEP_FILTER_NET_TYPE_SET and BNEP_FILTER_MULTI_ADDR_SET extension header parsing has two bugs: 1) The 2-byte length field is read with *(u16 *)(skb->data + 1), which performs a native-endian read. The BNEP protocol specifies this field in big-endian (network byte order), and the same file correctly uses get_unaligned_be16() for the identical fields in bnep_ctrl_set_netfilter() and bnep_ctrl_set_mcfilter(). 2) The length is multiplied by 2, but unlike BNEP_SETUP_CONN_REQ where the length byte counts UUID pairs (requiring * 2 for two UUIDs per entry), the filter extension length field already represents the total data size in bytes. This is confirmed by bnep_ctrl_set_netfilter() which reads the same field as a byte count and divides by 4 to get the number of filter entries. The bogus * 2 means skb_pull advances twice as far as it should, either dropping valid data from the next header or causing the pull to fail entirely when the doubled length exceeds the remaining skb. Fix by splitting the pull into two steps: first use skb_pull_data() to safely pull and validate the 3-byte fixed header (ctrl type + length), then pull the variable-length data using the properly decoded length. Fixes: `bf8b9a9cb7` ("Bluetooth: bnep: Add support to extended headers of control frames") Signed-off-by: Dudu Lu <phx0fer@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:19:09 -04:00
Luiz Augusto von Dentz	5ddb801426	Bluetooth: hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt hci_le_create_big_complete_evt() iterates over BT_BOUND connections for a BIG handle using a while loop, accessing ev->bis_handle[i++] on each iteration. However, there is no check that i stays within ev->num_bis before the array access. When a controller sends a LE_Create_BIG_Complete event with fewer bis_handle entries than there are BT_BOUND connections for that BIG, or with num_bis=0, the loop reads beyond the valid bis_handle[] flex array into adjacent heap memory. Since the out-of-bounds values typically exceed HCI_CONN_HANDLE_MAX (0x0EFF), hci_conn_set_handle() rejects them and the connection remains in BT_BOUND state. The same connection is then found again by hci_conn_hash_lookup_big_state(), creating an infinite loop with hci_dev_lock held. Fix this by terminating the BIG if in case not all BIS could be setup properly. Fixes: `a0bfde167b` ("Bluetooth: ISO: Add support for connecting multiple BISes") Cc: stable@vger.kernel.org Signed-off-by: ZhiTao Ou <hkbinbinbin@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 16:18:22 -04:00
David Carlier	0beddb0c38	Bluetooth: hci_conn: fix potential UAF in create_big_sync Add hci_conn_valid() check in create_big_sync() to detect stale connections before proceeding with BIG creation. Handle the resulting -ECANCELED in create_big_complete() and re-validate the connection under hci_dev_lock() before dereferencing, matching the pattern used by create_le_conn_complete() and create_pa_complete(). Keep the hci_conn object alive across the async boundary by taking a reference via hci_conn_get() when queueing create_big_sync(), and dropping it in the completion callback. The refcount and the lock are complementary: the refcount keeps the object allocated, while hci_dev_lock() serializes hci_conn_hash_del()'s list_del_rcu() on hdev->conn_hash, as required by hci_conn_del(). hci_conn_put() is called outside hci_dev_unlock() so the final put (which resolves to kfree() via bt_link_release) does not run under hdev->lock, though the release path would be safe either way. Without this, create_big_complete() would unconditionally dereference the conn pointer on error, causing a use-after-free via hci_connect_cfm() and hci_conn_del(). Fixes: `eca0ae4aea` ("Bluetooth: Add initial implementation of BIS connections") Cc: stable@vger.kernel.org Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 15:58:40 -04:00
Pauli Virtanen	b819db93d7	Bluetooth: SCO: fix sleeping under spinlock in sco_conn_ready sco_conn_ready calls sleeping functions under conn->lock spinlock. The critical section can be reduced: conn->hcon is modified only with hdev->lock held. It is guaranteed to be held in sco_conn_ready, so conn->lock is not needed to guard it. Move taking conn->lock after lock_sock(parent). This also follows the lock ordering lock_sock() > conn->lock elsewhere in the file. Fixes: `27c24fda62` ("Bluetooth: switch to lock_sock in SCO") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-06 15:58:29 -04:00
Jakub Kicinski	b89e0100a5	Merge tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a number of fixes now: - mac80211 - remove HT NSS validation to work with broken APs (with a kunit fix now) - remove 'static' that could cause races - check station link lookup before further processing - fix use-after-free due to delete in list iteration - remove AP station on assoc failures to fix crashes - ath12k - fix OF node refcount imbalance - fix queue flush ("REO update") in MLO - fix RCU assert - ath12k: - fix Kconfig with POWER_SEQUENCING - fix WMI buffer leaks on error conditions - don't use uninitialized stack data when processing RSSI events - fix logic for determining the peer ID in the RX path - ath5k: fix a potential stack buffer overwrite - rsi: fix thread lifetime race - brcmfmac: fix potential UAF - nl80211: - stricter permissions/checks for PMK and netns - fix netlink policy vs. code type confusion - cw1200: revert a broken locking change - various fixes to not trust values from firmware * tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (25 commits) wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage wifi: mac80211: remove station if connection prep fails wifi: mac80211: use safe list iteration in radar detect work wifi: libertas: notify firmware load wait on disconnect wifi: ath5k: do not access array OOB wifi: ath12k: fix peer_id usage in normal RX path wifi: ath12k: initialize RSSI dBm conversion event state wifi: ath12k: fix leak in some ath12k_wmi_xxx() functions wifi: cw1200: Revert "Fix locking in error paths" wifi: mac80211: tests: mark HT check strict wifi: rsi: fix kthread lifetime race between self-exit and external-stop wifi: mac80211: drop stray 
'static' from fast-RX rx_result wifi: mac80211: check ieee80211_rx_data_set_link return in pubsta MLO path wifi: nl80211: require admin perm on SET_PMK / DEL_PMK wifi: libertas: fix integer underflow in process_cmdrequest() wifi: b43legacy: enforce bounds check on firmware key index in RX path wifi: b43: enforce bounds check on firmware key index in b43_rx() wifi: brcmfmac: Fix potential use-after-free issue when stopping watchdog task ... ==================== Link: https://patch.msgid.link/20260506110325.219675-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-06 07:29:31 -07:00
Maoyi Xie	79240f3f6d	wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation NL80211_CMD_GET_SCAN is implemented as a multi-call dumpit. The first invocation of nl80211_prepare_wdev_dump() validates the requested wdev against the caller's netns via __cfg80211_wdev_from_attrs(). Subsequent invocations look up the same wiphy by its global index and do not check that the wiphy is still in the caller's netns. Add the same filter to the continuation path. If the wiphy's netns no longer matches the caller's, return -ENODEV and the netlink dump machinery terminates the walk cleanly. Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260506064854.2207105-3-maoyixie.tju@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-06 11:08:41 +02:00
Maoyi Xie	15994bb0cb	wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS NL80211_CMD_SET_WIPHY_NETNS dispatches with GENL_UNS_ADMIN_PERM, which verifies that the caller has CAP_NET_ADMIN for the source netns. It doesn't verify that the caller has CAP_NET_ADMIN over the target netns selected by NL80211_ATTR_NETNS_FD or NL80211_ATTR_PID. This diverges from the convention enforced in net/core/rtnetlink.c::rtnl_get_net_ns_capable(): /* For now, the caller is required to have CAP_NET_ADMIN in * the user namespace owning the target net ns. */ if (!sk_ns_capable(sk, net->user_ns, CAP_NET_ADMIN)) return ERR_PTR(-EACCES); A user with CAP_NET_ADMIN in their own user namespace can therefore push a wiphy into an arbitrary netns (including init_net) over which they have no privilege. Mirror the rtnetlink convention by requiring CAP_NET_ADMIN in the target netns before calling cfg80211_switch_netns(). Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260506064854.2207105-2-maoyixie.tju@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-06 11:05:52 +02:00
Johannes Berg	0f3c0a1973	wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage This is documented as a u8 and has a policy of NLA_U8, but uses nla_get_u32() which means it's completely broken on big-endian. Fix it to use nla_get_u8(). Fixes: `9bb7e0f24e` ("cfg80211: add peer measurement with FTM initiator API") Link: https://patch.msgid.link/20260505113837.260159-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-06 11:03:21 +02:00
Johannes Berg	283fc9e44f	wifi: mac80211: remove station if connection prep fails If connection preparation fails for MLO connections, then the interface is completely reset to non-MLD. In this case, we must not keep the station since it's related to the link of the vif being removed. Delete an existing station. Any "new_sta" is already being removed, so that doesn't need changes. This fixes a use-after-free/double-free in debugfs if that's enabled, because a vif going from MLD (and to MLD, but that's not relevant here) recreates its entire debugfs. Cc: stable@vger.kernel.org Fixes: `81151ce462` ("wifi: mac80211: support MLO authentication/association with one link") Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260505151533.c4e52deb06ad.Iafe56cec7de8512626169496b134bce3a6c17010@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-06 11:02:57 +02:00
Jason Xing	203cee647f	xsk: fix u64 descriptor address truncation on 32-bit architectures In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a uintptr_t cast: skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL); On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned mode the chunk offset is encoded in bits 48-63 of the descriptor address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is lost entirely. The completion queue then returns a truncated address to userspace, making buffer recycling impossible. Fix this by handling the 32-bit case directly in xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an xsk_addrs struct (the same path already used for multi-descriptor SKBs) to store the full u64 address. The existing tagged-pointer logic in xsk_skb_destructor_is_addr() stays unchanged: slab pointers returned from kmem_cache_zalloc() are always word-aligned and therefore have bit 0 clear, which correctly identifies them as a struct pointer rather than an inline tagged address on every architecture. Factor the shared kmem_cache_zalloc + destructor_arg assignment into __xsk_addrs_alloc() and add a wrapper xsk_addrs_alloc() that handles the inline-to-list upgrade (is_addr check + get_addr + num_descs = 1). The three former open-coded kmem_cache_zalloc call sites now reduce to a single call each. Propagate the -ENOMEM from xsk_skb_destructor_set_addr() through xsk_skb_init_misc() so the caller can clean up the skb via kfree_skb() before skb->destructor is installed. The overhead is one extra kmem_cache_zalloc per first descriptor on 32-bit only; 64-bit builds are completely unchanged. 
Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/ Fixes: `0ebc27a4c6` ("xsk: avoid data corruption on cq descriptor number") Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-9-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:51 -07:00
Jason Xing	e0f229025a	xsk: fix xsk_addrs slab leak on multi-buffer error path When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first continuation descriptor, it promotes destructor_arg from an inlined address to a freshly allocated xsk_addrs (num_descs = 1). The counter is bumped to >= 2 only at the very end of a successful build (by calling xsk_inc_num_desc()). If the build fails in between (e.g. alloc_page() returns NULL with -EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip calling xsk_inc_num_desc() to increment num_descs and leave the half-built skb attached to xs->skb for the app to retry. The skb now has 1) destructor_arg = a real xsk_addrs pointer, 2) num_descs = 1 If the app never retries and just close()s the socket, xsk_release() calls xsk_drop_skb() -> xsk_consume_skb(), which decides whether to free xsk_addrs by testing num_descs > 1: if (unlikely(num_descs > 1)) kmem_cache_free(xsk_tx_generic_cache, destructor_arg); Because num_descs is exactly 1 the branch is skipped and the xsk_addrs object is leaked to the xsk_tx_generic_cache slab. Fix it by directly testing if destructor_arg is still addr. Or else it is modified and used to store the newly allocated memory from xsk_tx_generic_cache regardless of increment of num_desc, which we need to handle. Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/ Fixes: `0ebc27a4c6` ("xsk: avoid data corruption on cq descriptor number") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-8-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:51 -07:00
Jason Xing	8c2cff50af	xsk: avoid skb leak in XDP_TX_METADATA case Fix it by explicitly adding kfree_skb() before returning back to its caller. How to reproduce it in virtio_net: 1. the current skb is the first one (which means no frag and xs->skb is NULL) and users enable metadata feature. 2. xsk_skb_metadata() returns a error code. 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. 4. there is no chance to free this skb anymore. Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/ Fixes: `30c3055f9c` ("xsk: wrap generic metadata handling onto separate function") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-7-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:50 -07:00
Jason Xing	3dec153ae4	xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb() Once xsk_skb_init_misc() has been called on an skb, its destructor is set to xsk_destruct_skb(), which submits the descriptor address(es) to the completion queue and advances the CQ producer. If such an skb is subsequently freed via kfree_skb() along an error path - before the skb has ever been handed to the driver - the destructor still runs and submits a bogus, half-initialized address to the CQ. Postpone the init phase when we believe the allocation of first frag is successfully completed. Before this init, skb can be safely freed by kfree_skb(). Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/ Fixes: `c30d084960` ("xsk: avoid overwriting skb fields for multi-buffer traffic") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-6-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:50 -07:00
Jason Xing	0f3776583d	xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path When xsk_build_skb() processes multi-buffer packets in copy mode, the first descriptor stores data into the skb linear area without adding any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb to accumulate subsequent descriptors. If a continuation descriptor fails (e.g. alloc_page returns NULL with -EAGAIN), we jump to free_err where the condition: if (skb && !skb_shinfo(skb)->nr_frags) kfree_skb(skb); evaluates to true because nr_frags is still 0 (the first descriptor used the linear area, not frags). This frees the skb while xs->skb still points to it, creating a dangling pointer. On the next transmit attempt or socket close, xs->skb is dereferenced, causing a use-after-free or double-free. Fix by using a !xs->skb check to handle first frag situation, ensuring we only free skbs that were freshly allocated in this call (xs->skb is NULL) and never free an in-progress multi-buffer skb that the caller still references. Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/ Fixes: `6b9c129c2f` ("xsk: remove @first_frag from xsk_build_skb()") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-5-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:50 -07:00
Jason Xing	8cd3c1c6e7	xsk: handle NULL dereference of the skb without frags issue When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in xsk_build_skb_zerocopy() (e.g., MAX_SKB_FRAGS exceeded), the free_err -EOVERFLOW handler unconditionally dereferences xs->skb via xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL pointer dereference. Fix this by guarding the existing xsk_inc_num_desc()/xsk_drop_skb() calls with an xs->skb check (for the continuation case), and add an else branch for the first-descriptor case that manually cancels the one reserved CQ slot and increments invalid_descs by one to account for the single invalid descriptor. Fixes: `cf24f5a5fe` ("xsk: add support for AF_XDP multi-buffer on Tx path") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-4-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:50 -07:00
Jason Xing	0bb7a9caf5	xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Fix it by explicitly adding kfree_skb() before returning back to its caller. How to reproduce it in virtio_net: 1. the current skb is the first one (which means xs->skb is NULL) and hit the limit MAX_SKB_FRAGS. 2. xsk_build_skb_zerocopy() returns -EOVERFLOW. 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This is why bug can be triggered. 4. there is no chance to free this skb anymore. Note that if in this case the xs->skb is not NULL, xsk_build_skb() will call xsk_drop_skb(xs->skb) to do the right thing. Fixes: `cf24f5a5fe` ("xsk: add support for AF_XDP multi-buffer on Tx path") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:50 -07:00
Jason Xing	d73a9a63f9	xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices skb_checksum_help() is a common helper that writes the folded 16-bit checksum back via skb->data + csum_start + csum_offset, i.e. it relies on the skb's linear head and fails (with WARN_ONCE and -EINVAL) when skb_headlen() is 0. AF_XDP generic xmit takes two very different paths depending on the netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net) skip the "copy payload into a linear head" step on purpose as a performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM pages as frags and never calls skb_put(), so skb_headlen() stays 0 for the whole skb. For these skbs there is simply no linear area for skb_checksum_help() to write the csum into - the sw-csum fallback is structurally inapplicable. The patch tries to catch this and reject the combination with error at setup time. Rejecting at bind() converts this silent per-packet failure into a synchronous, actionable -EOPNOTSUPP at setup time. HW csum and launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected because they do not call skb_checksum_help(). Without the patch, every descriptor carrying 'XDP_TX_METADATA | XDP_TXMD_FLAGS_CHECKSUM' produces: 1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(), 2) sendmsg() returning -EINVAL without consuming the descriptor (invalid_descs is not incremented), 3) a wedged TX ring: __xsk_generic_xmit() does not advance the consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads the same descriptor and re-hits the same WARN until the socket is closed. 
Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Fixes: `30c3055f9c` ("xsk: wrap generic metadata handling onto separate function") Link: https://patch.msgid.link/20260502200722.53960-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-05 19:27:49 -07:00
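
The address truncation described in the xsk entry 203cee647f above can be illustrated with a small userspace sketch. This is not the kernel code: `tag_addr()`, `slot_is_addr()`, `untag_addr()`, `roundtrip_32bit()` and `check_examples()` are hypothetical names chosen for illustration, and the 32-bit case is simulated with an explicit `uint32_t` so the effect is visible on a 64-bit host. The sketch also assumes that bit 0 of the descriptor address is free to carry the tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the tagged-pointer idea; not kernel helpers. */

/* Pack a 64-bit address into a pointer-sized slot and set bit 0 as a tag. */
static void *tag_addr(uint64_t addr)
{
	return (void *)((uintptr_t)addr | 0x1UL);
}

/* Bit 0 set means the slot holds an inline address, not a struct pointer. */
static bool slot_is_addr(const void *slot)
{
	return ((uintptr_t)slot & 0x1UL) != 0;
}

/* Recover the address by clearing the tag bit. */
static uint64_t untag_addr(const void *slot)
{
	return (uint64_t)((uintptr_t)slot & ~(uintptr_t)0x1UL);
}

/* Same round trip, but through a slot that is only 32 bits wide. */
static uint64_t roundtrip_32bit(uint64_t addr)
{
	uint32_t slot = (uint32_t)addr | 0x1U;

	return (uint64_t)(slot & ~0x1U);
}

static void check_examples(void)
{
	/* Chunk offset 5 in bits 48-63, base address 0x2000 in the low bits. */
	uint64_t addr = ((uint64_t)5 << 48) | 0x2000;

	assert(slot_is_addr(tag_addr(addr)));
	/* A 32-bit slot keeps only the low word, so the offset is lost. */
	assert(roundtrip_32bit(addr) == 0x2000);
	/* A 64-bit slot preserves the whole address. */
	if (sizeof(uintptr_t) == 8)
		assert(untag_addr(tag_addr(addr)) == addr);
}
```

Calling `check_examples()` from a `main()` exercises both cases. A word-aligned heap pointer has bit 0 clear, which is why the same tag test can tell an inline address apart from a pointer to an allocated struct.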


84193 Commits