In smc_connect_work(), when the underlying TCP handshake fails, the error
code (rc) must be propagated to sk_err to ensure userspace can correctly
retrieve the error status via SO_ERROR. Currently, the code only handles
a restricted set of error codes (e.g., EPIPE, ECONNREFUSED). If other
errors occurs, such as EHOSTUNREACH, sk_err remains unset (zero).
This affects applications that rely on SO_ERROR to determine connect
outcome. For example, higher versions of Go's netpoller treats
SO_ERROR == 0 combined with a failed getpeername() as a spurious wakeup
and re-enters epoll_wait(). Under ET mode, no further edge will be
generated since the socket is already in a terminal state, causing the
connect to hang indefinitely or until a user-specified timeout, if one
is set.
Fixes: 50717a37db ("net/smc: nonblocking connect rework")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20260506014105.27093-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
XDP redirect into a veth device (via bpf_redirect()) calls
veth_xdp_xmit(), which enqueues frames into the peer's ptr_ring using
smp_processor_id() % peer->real_num_rx_queues
as the ring index. With an asymmetric veth pair where the peer has
fewer TX queues than RX queues, that index can exceed
peer->real_num_tx_queues.
veth_poll() then resolves peer_txq for the ring via:
peer_txq = peer_dev ? netdev_get_tx_queue(peer_dev, queue_idx) : NULL;
where queue_idx = rq->xdp_rxq.queue_index. When queue_idx exceeds
peer_dev->real_num_tx_queues this is an out-of-bounds (OOB) access
into the peer's netdev_queue array, triggering DEBUG_NET_WARN_ON_ONCE
in netdev_get_tx_queue().
The normal ndo_start_xmit path is not affected: the stack clamps
skb->queue_mapping via netdev_cap_txqueue() before invoking
ndo_start_xmit, so rxq in veth_xmit() never exceeds real_num_tx_queues.
Fix veth_poll() by clamping: only dereference peer_txq when queue_idx is
within bounds, otherwise set it to NULL. The out-of-range rings are fed
exclusively via XDP redirect (veth_xdp_xmit), never via ndo_start_xmit
(veth_xmit), so the peer txq was never stopped and there is nothing to
wake; NULL is the correct fallback.
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/all/20260502071828.616C3C19425@smtp.kernel.org/
Fixes: dc82a33297 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20260505132159.241305-2-hawk@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In gmac_rx() (drivers/net/ethernet/cortina/gemini.c), when
gmac_get_queue_page() returns NULL for the second page of a multi-page
fragment, the driver logs an error and continues — but does not free the
partially assembled skb that was being assembled via napi_build_skb() /
napi_get_frags().
Free the in-progress partially assembled skb via napi_free_frags()
and increase the number of dropped frames appropriately
and assign the skb pointer NULL to make sure it is not lingering
around, matching the pattern already used elsewhere in the driver.
Fixes: 4d5ae32f5e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Andreas Haarmann-Thiemann <eitschman@nebelreich.de>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260505-gemini-ethernet-fix-v2-1-997c31d06079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: pm: misc. fixes for v7.1-rc3
Here are various fixes, mainly related to ADD_ADDRs:
- Patch 1: save ADD_ADDR for rtx with ID0 when needed. A fix for v6.1.
- Patch 2: remove unneeded exception for ID 0. A fix for v5.10.
- Patches 3-5: fix potential data-race and leaks during ADD_ADDR rtx. A
fix for v5.10.
- Patch 6: resched blocked ADD_ADDR rtx after a more appropriated
timeout, not after 15 seconds. A fix for v5.10.
- Patch 7: skip inactive subflows when when looking at the max RTO. A
fix for v6.18.
- Patch 8: avoid iterating over all subflows when there is no need to. A
fix for v6.18.
- Patch 9: skip closed subflows when looking at sending MP_PRIO. A fix
for v5.17.
- Patch 10: properly catch errors when using check_output() in the
selftests. A fix for v6.9.
- Patch 11: skip the 'unknown' flag test when 'ip mptcp' is used. A fix
for v6.10.
====================
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-0-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When pm_netlink.sh is executed with '-i', 'ip mptcp' is used instead of
'pm_nl_ctl'. IPRoute2 doesn't support the 'unknown' flag, which has only
been added to 'pm_nl_ctl' for this specific check: to ensure that the
kernel ignores such unsupported flag.
No reason to add this flag to 'ip mptcp'. Then, this check should be
skipped when 'ip mptcp' is used.
Fixes: 0cef6fcac2 ("selftests: mptcp: ip_mptcp option for more scripts")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-11-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using '${?}' inside the if-statement to check the returned value from
the command that was evaluated as part of the if-statement is not
correct: here, '${?}' will be linked to the previous instruction, not
the one that is expected here (${cmd}).
Instead, simply mark the error, except if an error is expected. If
that's the case, 1 can be passed as the 4th argument of this helper.
Three checks from pm_netlink.sh expect an error.
While at it, improve the error message when the command unexpectedly
fails or succeeds.
Note that we could expect a specific returned value, but the checks
currently expecting an error can be used with 'ip mptcp' or 'pm_nl_ctl',
and these two tools don't return the same error code.
Fixes: 2d0c1d27ea ("selftests: mptcp: add mptcp_lib_check_output helper")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-10-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When looking at the maximum RTO amongst the subflows, inactive subflows
were taken into account: that includes stale ones, and the initial one
if it has been already been closed.
Unusable subflows are now simply skipped. Stale ones are used as an
alternative: if there are only stale ones, to take their maximum RTO and
avoid to eventually fallback to net.mptcp.add_addr_timeout, which is set
to 2 minutes by default.
Fixes: 30549eebc4 ("mptcp: make ADD_ADDR retransmission timeout adaptive")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-7-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When an ADD_ADDR needs to be retransmitted and another one has already
been prepared -- e.g. multiple ADD_ADDRs have been sent in a row and
need to be retransmitted later -- this additional retransmission will
need to wait.
In this case, the timer was reset to TCP_RTO_MAX / 8, which is ~15
seconds. This delay is unnecessary long: it should just be rescheduled
at the next opportunity, e.g. after the retransmission timeout.
Without this modification, some issues can be seen from time to time in
the selftests when multiple ADD_ADDRs are sent, and the host takes time
to process them, e.g. the "signal addresses, ADD_ADDR timeout" MPTCP
Join selftest, especially with a debug kernel config.
Note that on older kernels, 'timeout' is not available. It should be
enough to replace it by one second (HZ).
Fixes: 00cfd77b90 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-6-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(),
and released at the end.
If at that moment, it was the last reference being held, the sk would
not be freed. sock_put() should then be called instead of __sock_put().
But that's not enough: if it is the last reference, sock_put() will call
sk_free(), which will end up calling sk_stop_timer_sync() on the same
timer, and waiting indefinitely to finish. So it is needed to mark that
the timer is done at the end of the timer handler when it has not been
rescheduled, not to call sk_stop_timer_sync() on "itself".
Fixes: 00cfd77b90 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-5-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer().
It should then be released in all cases at the end.
Some (unlikely) checks were returning directly instead of calling
sock_put() to decrease the refcount. Jump to a new 'exit' label to call
__sock_put() (which will become sock_put() in the next commit) to fix
this potential leak.
While at it, drop the '!msk' check which cannot happen because it is
never reset, and explicitly mark the remaining one as "unlikely".
Fixes: 00cfd77b90 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-4-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When adding the ADD_ADDR to the list, the address including the IP, port
and ID are copied. On the other hand, when the endpoint corresponds to
the one from the initial subflow, the ID is set to 0, as specified by
the MPTCP protocol.
The issue is that the ID was reset after having copied the ID in the
ADD_ADDR entry. So the retransmission was done, but using a different ID
than the initial one.
Fixes: 8b8ed1b429 ("mptcp: pm: reuse ID 0 after delete and re-add")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-1-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tcp_child_process( .. child ...) currently calls sock_put(child).
Unfortunately @child (named @nsk in callers) can be used after
this point to send a RST packet.
To fix this UAF, I remove the sock_put() from tcp_child_process()
and let the callers handle this after it is safe.
Remove @rsk variable in tcp_v4_do_rcv() and change tcp_v6_do_rcv()
so that both functions look the same.
Fixes: cfb6eeb4c8 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260505153927.3435532-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When performing a lockless lookup over the inet_peer rbtree,
if a matching node is found, inet_getpeer() returns it immediately
without validating the seqlock sequence.
This missing check introduces a race condition:
Trigger Path: When a host receives an incoming fragmented IPv4 packet,
ip4_frag_init() (in net/ipv4/ip_fragment.c) calls inet_getpeer_v4()
to track the peer.
The Race: If the packet is from a new source IP, CPU A acquires the
write_seqlock, allocates a new inet_peer node (p), sets its IP address
(daddr), and links it to the rbtree (rb_link_node).
Uninitialized Access: Due to the lack of memory barriers between
rb_link_node and the initialization of the rest of the struct
(like refcount_set(&p->refcnt, 1)), CPU A can make the node visible
to readers before its refcnt is initialized.
This is especially true on weakly-ordered architectures like ARM64
where the CPU can reorder the memory stores.
Lockless Reader: Concurrently, CPU B processes a second fragmented packet
from the same source IP. CPU B does a lockless lookup, finds the newly
inserted node, and returns it immediately.
Use-After-Free (UAF): CPU B reads p->refcnt as uninitialized garbage
(left over from previous kmalloc-128/192 allocations).
If the garbage is > 0, refcount_inc_not_zero(&p->refcnt) succeeds.
CPU A then executes refcount_set(&p->refcnt, 1), overwriting CPU B's increment.
When CPU B finishes with the fragment queue, it calls inet_putpeer(),
which drops the refcount to 0 and frees the node via RCU.
The node is now freed but remains linked in the rbtree,
resulting in a Use-After-Free in the rbtree.
Fixes: b145425f26 ("inetpeer: remove AVL implementation in favor of RB tree")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260505133233.3039575-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Zahka says:
====================
netdevsim: psp: fix init and uninit bugs
This series has three fixes. The first is a straightforward NULL
pointer dereference that is reachable by creating and destroying some
vfs on a kernel with INET_PSP enabled.
The last two patches deal with nsim_psp_rereg_write(), which is a
debugfs handler that reregisters netdevsim's psp_dev without
aquiescing and disabling tx/rx processing. This was added to enable
some tests in psp.py where a psp device is unregistered while it still
referenced by tcp socket state.
There are two issues with this code:
1. Calls to nsim_psp_uninit() are not properly serialized
2. netdevsim's psp_dev refcount can be released while nsim_do_psp() is
reading from it.
====================
Link: https://patch.msgid.link/20260505-psd-rcu-v1-0-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are two issues with the way psp_dev is used in nsim_do_psp():
1. There is no check for IS_ERR() on the peers psp_dev, before
dereferencing.
2. The refcount on this psp_dev can be dropped by
nsim_psp_rereg_write()
To fix this, we can make netdevsim's reference to its psp_dev an rcu
reference, and then nsim_do_psp() can read the fields it needs from an
rcu critical section.
Fixes: f857478d62 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-3-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The debugfs write handler, nsim_psp_rereg_write(), can race against
nsim_destroy() and against itself, causing nsim_psp_uninit() to run
more than once concurrently. Two complementary changes serialize all
callers:
1. Delete the psp_rereg debugfs file from nsim_psp_uninit() before
doing the actual teardown. debugfs_remove() drains any in-flight
writers and prevents new ones from starting.
2. Add a mutex around the body of nsim_psp_rereg_write() so that two
concurrent userspace writers cannot both enter the teardown path
at once.
The teardown work itself is moved into a new __nsim_psp_uninit() that
the rereg handler calls under the mutex, while the public
nsim_psp_uninit() wraps it with the debugfs_remove()/mutex_destroy()
pair so nsim_destroy() doesn't have to know about the psp internals.
Fixes: f857478d62 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-2-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VFs go through nsim_init_netdevsim_vf() which never calls
nsim_psp_init(), so ns->psp.dev stays NULL. nsim_psp_uninit() guards
with !IS_ERR(ns->psp.dev), so destroying a VF reaches
psp_dev_unregister(NULL) and dereferences NULL on the first
mutex_lock(&psd->lock):
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:mutex_lock+0x1c/0x30
Call Trace:
psp_dev_unregister+0x2a/0x1a0
nsim_psp_uninit+0x1f/0x40 [netdevsim]
nsim_destroy+0x61/0x1e0 [netdevsim]
__nsim_dev_port_del+0x47/0x90 [netdevsim]
nsim_drv_configure_vfs+0xc9/0x130 [netdevsim]
nsim_bus_dev_numvfs_store+0x79/0xb0 [netdevsim]
Gate nsim_psp_uninit() on nsim_dev_port_is_pf(), matching the pattern
already used for nsim_exit_netdevsim() and the bpf/ipsec/macsec/queue
teardowns.
Reproducer:
modprobe netdevsim
echo "10 1" > /sys/bus/netdevsim/new_device
echo 1 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs
devlink dev eswitch set netdevsim/netdevsim10 mode switchdev
echo 0 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs
Fixes: f857478d62 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-1-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Klassert says:
====================
pull request (net): ipsec 2026-05-05
1. Fix an IPv6 encapsulation error path that leaked route references
when UDPv6 ESP decapsulation resolved to an error route.
From Yilin Zhu.
2. Fix AH with ESN on async crypto paths by accounting for the extra
high-order sequence number when reconstructing the temporary
authentication layout in the completion callbacks.
From Michael Bomarito.
3. Fix XFRM output so it does not overwrite already-correct inner header
pointers when a tunnel layer such as VXLAN has already saved them.
The fix comes with new selftests. From Cosmin Ratiu.
4. Add the missing native payload size entry for XFRM_MSG_MAPPING in the
compat translation path. From Ruijie Li.
5. Harden __xfrm_state_delete() against repeated or inconsistent unhashing
of state list nodes by keying the removal on actual list membership and
using delete-and-init helpers. From Michal Kosiorek.
6. Prevent ESP from decrypting shared splice-backed skb fragments in place
by marking UDP splice frags as shared and forcing copy-on-write in ESP
input when needed. From Kuan-Ting Chen.
* tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: esp: avoid in-place decrypt on shared skb frags
xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
xfrm: provide message size for XFRM_MSG_MAPPING
xfrm: Don't clobber inner headers when already set
tools/selftests: Add a VXLAN+IPsec traffic test
tools/selftests: Use a sensible timeout value for iperf3 client
xfrm: ah: account for ESN high bits in async callbacks
ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()
====================
Link: https://patch.msgid.link/20260505132326.1362733-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Antonio Quartulli says:
====================
Includes changes:
* ensure MAC header offset is reset before delivering packet
* ensure gro_cells_receive() and dstats_dev_add() are called
with BH disabled
* reduce ping count in selftest to ensure it completes within
timeout
* tag 'ovpn-net-20260504' of https://github.com/OpenVPN/ovpn-net-next:
selftests: ovpn: reduce ping count in test.sh
ovpn: ensure packet delivery happens with BH disabled
ovpn: reset MAC header before passing skb up
====================
Link: https://patch.msgid.link/20260504230305.2681646-1-antonio@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit dbf666e4fc ("Bluetooth: HIDP: Fix possible UAF") made
hidp_session_remove() drop the L2CAP reference and set
session->conn = NULL once the session is considered removed, and
added a bare if (session->conn) guard around the kthread-exit
l2cap_unregister_user() call in hidp_session_thread(). The sibling
ioctl site in hidp_connection_del() still reads session->conn
unlocked and unguarded, and the kthread-exit guard itself is a
lockless double-read.
hidp_session_find() drops hidp_session_sem before returning, so
hidp_session_remove() can null session->conn between the lookup and
the call in hidp_connection_del(). Worse, since commit 752a6c9596
("Bluetooth: L2CAP: Fix use-after-free in l2cap_unregister_user")
takes mutex_lock(&conn->lock) inside l2cap_unregister_user(), a
stale non-NULL snapshot also UAFs on conn->lock. v1 only added an
if (session->conn) guard at the ioctl site, which doesn't address
either race; Luiz suggested snapshotting session->conn under the
sem and clearing it before the call.
Taking hidp_session_sem across l2cap_unregister_user() would be
wrong: l2cap_conn_del() already establishes the lock order
conn->lock -> hidp_session_sem
via l2cap_unregister_all_users() -> user->remove ==
hidp_session_remove(), so taking hidp_session_sem before conn->lock
would AB/BA deadlock.
Factor a helper hidp_session_unregister_conn() that under
down_write(&hidp_session_sem) snapshots session->conn and clears
the member, then outside the sem calls l2cap_unregister_user() and
l2cap_conn_put() on the snapshot. Call it from both
hidp_connection_del() and hidp_session_thread()'s exit path. At
most one consumer wins the write-sem; later callers observe
session->conn == NULL and skip the unregister and put, so the
reference hidp_session_new() took via l2cap_conn_get() is consumed
exactly once. session_free() already tolerates a NULL session->conn.
Fixes: dbf666e4fc ("Bluetooth: HIDP: Fix possible UAF")
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Link: https://lore.kernel.org/all/20260422011437.176643-1-michael.bommarito@gmail.com/
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_le_big_sync_established_evt() currently does:
conn->num_bis = 0;
memset(conn->bis, 0, sizeof(conn->num_bis));
sizeof(conn->num_bis) is wrong - it would make sense to either use
conn->num_bis (before setting that to 0) or sizeof(conn->bis).
Fix it by using sizeof(conn->bis), the least intrusive change.
Luckily, nothing actually depends on this memset() working properly:
Nothing seems to ever read from conn->bis beyond conn->num_bis, and when
conn->num_bis is increased, the corresponding elements of conn->bis are
initialized. So I think this line could also just be removed.
This is a purely theoretical fix and should have no impact on actual
behavior.
Fixes: 42ecf19471 ("Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
rfcomm_recv_data() treats the first payload byte as a credit field when
the UIH frame carries PF and credit-based flow control is enabled.
After the header has been stripped, the PF/CFC path consumes that byte
with a direct skb->data dereference followed by skb_pull(). A malformed
short frame can reach this path without a byte available.
Use skb_pull_data() so the length check and pull happen together before
the returned credit byte is consumed.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
virtbt_rx_handle() reads the leading pkt_type byte from the RX skb
and forwards the remainder to hci_recv_frame() for every
event/ACL/SCO/ISO type, without checking that the remaining payload
is at least the fixed HCI header for that type.
After the preceding patch bounds the backend-supplied used.len to
[1, VIRTBT_RX_BUF_SIZE], a one-byte completion still reaches
hci_recv_frame() with skb->len already pulled to 0. If the byte
happened to be HCI_ACLDATA_PKT, the ACL-vs-ISO classification
fast-path in hci_dev_classify_pkt_type() dereferences
hci_acl_hdr(skb)->handle whenever the HCI device has an active
CIS_LINK, BIS_LINK, or PA_LINK connection, reading two bytes of
uninitialized RX-buffer data. The same hazard exists for every
packet type the driver accepts because none of the switch cases in
virtbt_rx_handle() check skb->len against the per-type minimum HCI
header size before handing the frame to the core.
After stripping pkt_type, require skb->len to cover the fixed
header size for the selected type (event 2, ACL 4, SCO 3, ISO 4)
before calling hci_recv_frame(); drop ratelimited otherwise.
Unknown pkt_type values still take the original kfree_skb() default
path.
Use bt_dev_err_ratelimited() because both the length and pkt_type
values come from an untrusted backend that can otherwise flood the
kernel log.
Fixes: 160fbcf3bf ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
virtbt_rx_work() calls skb_put(skb, len) where len comes directly
from virtqueue_get_buf() with no validation against the buffer we
posted to the device. The RX skb is allocated in virtbt_add_inbuf()
and exposed to virtio as exactly 1000 bytes via sg_init_one().
Checking len against skb_tailroom(skb) is not sufficient because
alloc_skb() can leave more tailroom than the 1000 bytes actually
handed to the device. A malicious or buggy backend can therefore
report used.len between 1001 and skb_tailroom(skb), causing skb_put()
to include uninitialized kernel heap bytes that were never written by
the device.
The same path also accepts len == 0, in which case skb_put(skb, 0)
leaves the skb empty but virtbt_rx_handle() still reads the pkt_type
byte from skb->data, consuming uninitialized memory.
Define VIRTBT_RX_BUF_SIZE once and reuse it in alloc_skb() and
sg_init_one(), and gate virtbt_rx_work() on that same constant so
the bound checked matches the buffer actually exposed to the device.
Reject used.len == 0 in the same gate so an empty completion can
no longer reach virtbt_rx_handle().
Use bt_dev_err_ratelimited() because the length value comes from an
untrusted backend that can otherwise flood the kernel log.
Same class of bug as commit c04db81cd0 ("net/9p: Fix buffer
overflow in USB transport layer"), which hardened the USB 9p
transport against unchecked device-reported length.
Fixes: 160fbcf3bf ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
btmtk_usb_hci_wmt_sync() casts the WMT event response SKB data to
struct btmtk_hci_wmt_evt (7 bytes) and struct btmtk_hci_wmt_evt_funcc
(9 bytes) without first checking that the SKB contains enough data.
A short firmware response causes out-of-bounds reads from SKB tailroom.
Use skb_pull_data() to validate and advance past the base WMT event
header. For the FUNC_CTRL case, pull the additional status field bytes
before accessing them.
Fixes: d019930b00 ("Bluetooth: btmtk: move btusb_mtk_hci_wmt_sync to btmtk.c")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Several iso_pi(sk) fields (qos, qos_user_set, bc_sid, base, base_len,
sync_handle, bc_num_bis) are written under lock_sock in
iso_sock_setsockopt() and iso_sock_bind(), but read and written under
hci_dev_lock only in two other paths:
- iso_connect_bis() / iso_connect_cis(), invoked from connect(2),
read qos/base/bc_sid and reset qos to default_qos on the
qos_user_set validation failure -- all without lock_sock.
- iso_connect_ind(), invoked from hci_rx_work, writes sync_handle,
bc_sid, qos.bcast.encryption, bc_num_bis, base and base_len on
PA_SYNC_ESTABLISHED / PAST_RECEIVED / BIG_INFO_ADV_REPORT /
PER_ADV_REPORT events. The BIG_INFO handler additionally passes
&iso_pi(sk)->qos together with sync_handle / bc_num_bis / bc_bis
to hci_conn_big_create_sync() while setsockopt may be mutating
them.
Acquire lock_sock around the affected accesses in both paths.
The locking order hci_dev_lock -> lock_sock matches the existing
iso_conn_big_sync() precedent, whose comment documents the same
requirement for hci_conn_big_create_sync(). The HCI connect/bind
helpers do not wait for command completion -- they enqueue work via
hci_cmd_sync_queue{,_once}() / hci_le_create_cis_pending() and
return -- so the added hold time is comparable to iso_conn_big_sync().
KCSAN report:
BUG: KCSAN: data-race in iso_connect_cis / iso_sock_setsockopt
read to 0xffffa3ae8ce3cdc8 of 1 bytes by task 335 on cpu 0:
iso_connect_cis+0x49f/0xa20
iso_sock_connect+0x60e/0xb40
__sys_connect_file+0xbd/0xe0
__sys_connect+0xe0/0x110
__x64_sys_connect+0x40/0x50
x64_sys_call+0xcad/0x1c60
do_syscall_64+0x133/0x590
entry_SYSCALL_64_after_hwframe+0x77/0x7f
write to 0xffffa3ae8ce3cdc8 of 60 bytes by task 334 on cpu 1:
iso_sock_setsockopt+0x69a/0x930
do_sock_setsockopt+0xc3/0x170
__sys_setsockopt+0xd1/0x130
__x64_sys_setsockopt+0x64/0x80
x64_sys_call+0x1547/0x1c60
do_syscall_64+0x133/0x590
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 334 Comm: iso_setup_race Not tainted 7.0.0-10949-g8541d8f725c6 #44 PREEMPT(lazy)
The iso_connect_ind() races were found by inspection.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
iso_sock_connect() copies the destination address into
iso_pi(sk)->dst under lock_sock, then releases the lock and reads
it back with bacmp() to decide between the CIS and BIS connect
paths:
lock_sock(sk);
bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr);
iso_pi(sk)->dst_type = sa->iso_bdaddr_type;
release_sock(sk);
if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY)) // <- no lock held
This read after release_sock() races with any concurrent write to
iso_pi(sk)->dst on the same socket.
Fix by reading the destination address directly from the local
sockaddr argument (sa->iso_bdaddr) instead of iso_pi(sk)->dst.
Since sa is a function-local argument, reading it requires no
locking and avoids the race.
This patch addresses only the bacmp() race in iso_sock_connect();
other unprotected iso_pi(sk) accesses are fixed separately in the
next patch.
KCSAN report:
BUG: KCSAN: data-race in memcmp+0x39/0xb0
race at unknown origin, with read to 0xffff8f96ea66dde3 of 1 bytes by task 549 on cpu 1:
memcmp+0x39/0xb0
iso_sock_connect+0x275/0xb40
__sys_connect_file+0xbd/0xe0
__sys_connect+0xe0/0x110
__x64_sys_connect+0x40/0x50
x64_sys_call+0xcad/0x1c60
do_syscall_64+0x133/0x590
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x00 -> 0xee
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 549 Comm: iso_race_combin Not tainted 7.0.0-08391-g1d51b370a0f8 #40 PREEMPT(lazy)
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When a fault is injected during hci_uart line discipline setup, the
proto open() callback may fail leaving hu->priv as NULL. A subsequent
TIOCSTI ioctl can trigger the recv() callback before priv is
initialized, causing a NULL pointer dereference.
Fix all four affected HCI UART protocol drivers by adding a NULL check
on hu->priv at the start of their recv() callbacks: h4, h5, ath and
bcsp.
Reported-by: syzbot+ff30eeab8e07b37d524e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ff30eeab8e07b37d524e
Signed-off-by: Aurelien DESBRIERES <aurelien@hackers.camp>
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
CSR boot stage register bit 12 is documented as a device warning,
not a fatal error. Rename the bit definition accordingly and stop
including it in btintel_pcie_in_error().
This keeps warning-only boot stage values from being classified as
errors while preserving abort-handler state as the actual error
condition.
Fixes: 190377500f ("Bluetooth: btintel_pcie: Dump debug registers on error")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Sai Teja Aluvala <aluvala.sai.teja@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
sk deref in sco_conn_ready must be done either under conn->lock, or
holding a refcount, to avoid concurrent close. conn->sk and parent sk is
currently accessed without either, and without checking parent->sk_state:
[Task 1] [Task 2]
sco_sock_release
sco_conn_ready
sk = conn->sk
lock_sock(sk)
conn->sk = NULL
lock_sock(sk)
release_sock(sk)
sco_sock_kill(sk)
UAF on sk deref
and similarly for access to sco_get_sock_listen() return value.
Fix possible UAF by holding sk refcount in sco_conn_ready() and making
sco_get_sock_listen() increase refcount. Also recheck after lock_sock
that the socket is still valid. Adjust conn->sk locking so it's
protected also by lock_sock() of the associated socket if any.
Fixes: 27c24fda62 ("Bluetooth: switch to lock_sock in SCO")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().
Fixes: 80808e431e ("Bluetooth: Add l2cap_chan_ops abstraction")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().
Fixes: 8d836d71e2 ("Bluetooth: Access sk_sndtimeo indirectly in l2cap_core.c")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().
Fixes: 89bc500e41 ("Bluetooth: Add state tracking to struct l2cap_chan")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When a BLE peripheral sends an L2CAP Connection Parameter Update Request
the processing path is:
process_pending_rx() [takes conn->lock]
l2cap_le_sig_channel()
l2cap_conn_param_update_req()
hci_le_conn_update() [takes hdev->lock]
Meanwhile other code paths take the locks in the opposite order:
l2cap_chan_connect() [takes hdev->lock]
...
mutex_lock(&conn->lock)
l2cap_conn_ready() [hdev->lock via hci_cb_list_lock]
...
mutex_lock(&conn->lock)
This is a classic AB/BA deadlock which lockdep reports as a circular
locking dependency when connecting a BLE MIDI keyboard (Carry-On FC-49).
Fix this by making hci_le_conn_update() defer the HCI command through
hci_cmd_sync_queue() so it no longer needs to take hdev->lock in the
caller context. The sync callback uses __hci_cmd_sync_status_sk() to
wait for the HCI_EV_LE_CONN_UPDATE_COMPLETE event, then updates the
stored connection parameters (hci_conn_params) and notifies userspace
(mgmt_new_conn_param) only after the controller has confirmed the update.
A reference on hci_conn is held via hci_conn_get()/hci_conn_put() for
the lifetime of the queued work to prevent use-after-free, and
hci_conn_valid() is checked before proceeding in case the connection was
removed while the work was pending. The hci_dev_lock is held across
hci_conn_valid() and all conn field accesses to prevent a concurrent
disconnect from invalidating the connection mid-use.
Fixes: f044eb0524 ("Bluetooth: Store latency and supervision timeout in connection params")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The L2CAP specification states that if more than one channel is being
reconfigured, the MPS shall not be decreased. The current check has
two issues:
1) The comparison uses >= (greater-than-or-equal), which incorrectly
rejects reconfiguration requests where the MPS stays the same.
Since the spec says MPS "shall be greater than or equal to the
current MPS", only a strict decrease (remote_mps > mps) should be
rejected. Keeping the same MPS is valid.
2) The multi-channel guard uses `&& i` (loop index) to approximate
"more than one channel", but this incorrectly allows MPS decrease
for the first channel (i==0) even when multiple channels are being
reconfigured. Replace with `&& num_scid > 1` which correctly
checks whether the request covers more than one channel.
Fixes: 7accb1c432 ("Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ")
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
In bnep_rx_frame(), the BNEP_FILTER_NET_TYPE_SET and
BNEP_FILTER_MULTI_ADDR_SET extension header parsing has two bugs:
1) The 2-byte length field is read with *(u16 *)(skb->data + 1), which
performs a native-endian read. The BNEP protocol specifies this field
in big-endian (network byte order), and the same file correctly uses
get_unaligned_be16() for the identical fields in
bnep_ctrl_set_netfilter() and bnep_ctrl_set_mcfilter().
2) The length is multiplied by 2, but unlike BNEP_SETUP_CONN_REQ where
the length byte counts UUID pairs (requiring * 2 for two UUIDs per
entry), the filter extension length field already represents the total
data size in bytes. This is confirmed by bnep_ctrl_set_netfilter()
which reads the same field as a byte count and divides by 4 to get
the number of filter entries.
The bogus * 2 means skb_pull advances twice as far as it should,
either dropping valid data from the next header or causing the pull
to fail entirely when the doubled length exceeds the remaining skb.
Fix by splitting the pull into two steps: first use skb_pull_data() to
safely pull and validate the 3-byte fixed header (ctrl type + length),
then pull the variable-length data using the properly decoded length.
Fixes: bf8b9a9cb7 ("Bluetooth: bnep: Add support to extended headers of control frames")
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_le_create_big_complete_evt() iterates over BT_BOUND connections for
a BIG handle using a while loop, accessing ev->bis_handle[i++] on each
iteration. However, there is no check that i stays within ev->num_bis
before the array access.
When a controller sends a LE_Create_BIG_Complete event with fewer
bis_handle entries than there are BT_BOUND connections for that BIG,
or with num_bis=0, the loop reads beyond the valid bis_handle[] flex
array into adjacent heap memory. Since the out-of-bounds values
typically exceed HCI_CONN_HANDLE_MAX (0x0EFF), hci_conn_set_handle()
rejects them and the connection remains in BT_BOUND state. The same
connection is then found again by hci_conn_hash_lookup_big_state(),
creating an infinite loop with hci_dev_lock held.
Fix this by terminating the BIG if in case not all BIS could be setup
properly.
Fixes: a0bfde167b ("Bluetooth: ISO: Add support for connecting multiple BISes")
Cc: stable@vger.kernel.org
Signed-off-by: ZhiTao Ou <hkbinbinbin@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Add hci_conn_valid() check in create_big_sync() to detect stale
connections before proceeding with BIG creation. Handle the
resulting -ECANCELED in create_big_complete() and re-validate the
connection under hci_dev_lock() before dereferencing, matching the
pattern used by create_le_conn_complete() and create_pa_complete().
Keep the hci_conn object alive across the async boundary by taking
a reference via hci_conn_get() when queueing create_big_sync(), and
dropping it in the completion callback. The refcount and the lock
are complementary: the refcount keeps the object allocated, while
hci_dev_lock() serializes hci_conn_hash_del()'s list_del_rcu() on
hdev->conn_hash, as required by hci_conn_del().
hci_conn_put() is called outside hci_dev_unlock() so the final put
(which resolves to kfree() via bt_link_release) does not run under
hdev->lock, though the release path would be safe either way.
Without this, create_big_complete() would unconditionally
dereference the conn pointer on error, causing a use-after-free
via hci_connect_cfm() and hci_conn_del().
Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Cc: stable@vger.kernel.org
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>