linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-04 19:04:46 -04:00

Author	SHA1	Message	Date
Eric Dumazet	e4bdef4d32	ipv4: use WARN_ON_ONCE() in ip_rt_bug() It turns out ip_rt_bug() can be called more than expected. syzbot will still panic (because of panic_on_warn=1), but non debug kernels will no longer die while repeating stack traces on the console. Fixes: `c378a9c019` ("ipv4: Give backtrace in ip_rt_bug().") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260519193248.4018872-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 19:00:36 -07:00
Eric Dumazet	7eb72c1e39	ipv4: icmp: reject broadcast/multicast routes syzbot was able to trigger ip_rt_bug() in a loop, using an IPv4 packet with a crafted IPOPT_SSRR option: options: ipv4_options { options: array[ipv4_option] { union ipv4_option { ssrr: ipv4_option_route[IPOPT_SSRR] { type: const = 0x89 (1 bytes) length: len = 0x7 (1 bytes) pointer: int8 = 0xa2 (1 bytes) data: array[ipv4_addr] { union ipv4_addr { broadcast: const = 0xffffffff (4 bytes) } } } } Change __icmp_send() to not send ICMP to broadcast/multicast destinations. Fixes: `c378a9c019` ("ipv4: Give backtrace in ip_rt_bug().") Reported-by: syzbot+c13a57c2639c2c0d03a6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a0cc169.170a0220.1f6c2d.0004.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260519200836.4141061-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 19:00:02 -07:00
David Carlier	4eb82ba543	net: devmem: reject dma-buf bind with non-page-aligned size or SG length net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be PAGE_SIZE multiples without checking: - tx_vec is sized dmabuf->size / PAGE_SIZE, and net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = NPAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = NPAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() covers the full byte len, so a non-page-multiple non-final sg desyncs num_niovs from the gen_pool region for every later sg, on both RX and TX. dma-buf does not require page-aligned sizes, so the bind path has to enforce what its own indexing assumes. Reject both with -EINVAL. The size check is TX-only (only tx_vec is sized off dmabuf->size); the SG-length check covers both directions. Fixes: `bd61848900` ("net: devmem: Implement TX path") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20260519203530.66310-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 18:59:01 -07:00
Jakub Kicinski	5027c886e2	Merge tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE - L2CAP: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del() - ISO: drop ISO_END frames received without prior ISO_START - MGMT: validate Add Extended Advertising Data length - bnep: Fix UAF read of dev->name - btmtk: fix urb->setup_packet leak in error paths - btintel_pcie: Fix incorrect MAC access programming - hci_uart: fix UAFs and race conditions in close and init paths * tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del() Bluetooth: hci_uart: fix UAFs and race conditions in close and init paths Bluetooth: MGMT: validate Add Extended Advertising Data length Bluetooth: btmtk: fix urb->setup_packet leak in error paths Bluetooth: ISO: drop ISO_END frames received without prior ISO_START Bluetooth: btintel_pcie: Fix incorrect MAC access programming Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE Bluetooth: bnep: Fix UAF read of dev->name ==================== Link: https://patch.msgid.link/20260520204959.2902497-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 17:26:56 -07:00
Jakub Kicinski	4a2844dcc0	Merge branch 'bpf-skmsg-fix-verdict-sk_data_ready-racing-with-ktls-rx' Xingwang Xiang says: ==================== bpf, skmsg: fix verdict sk_data_ready racing with ktls rx sk_psock_verdict_data_ready() lacks the tls_sw_has_ctx_rx() guard that sk_psock_strp_data_ready() gained in `e91de6afa8`. When a socket is inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is configured, the missing guard causes tcp_read_skb() to drain sk_receive_queue without advancing copied_seq, leaving a dangling frag_list pointer that tls_decrypt_sg() walks — a use-after-free. Patch 1 mirrors the fix from `e91de6afa8`: add the tls_sw_has_ctx_rx() check to sk_psock_verdict_data_ready() so that when a TLS RX context is present the function defers to psock->saved_data_ready (sock_def_readable) instead of calling tcp_read_skb(). Patch 2 adds a selftest that drives the vulnerable sequence end-to-end and verifies recv() returns the correct decrypted data. ==================== Link: https://patch.msgid.link/20260517145630.20521-1-v3rdant.xiang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 17:23:56 -07:00
Xingwang Xiang	33644bd38a	selftests/bpf: add regression test for ktls+sockmap verdict UAF Test the scenario where a socket is inserted into a sockmap with a BPF_SK_SKB_VERDICT program before TLS RX is configured. Previously sk_psock_verdict_data_ready() would call tcp_read_skb() and drain the receive queue without advancing copied_seq, causing tls_decrypt_sg() to walk a dangling frag_list pointer (use-after-free). The test drives the full vulnerable sequence and verifies that after the fix recv() returns the correct decrypted data. Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com> Link: https://patch.msgid.link/20260517145630.20521-3-v3rdant.xiang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 17:23:56 -07:00
Xingwang Xiang	ddf8029623	bpf, skmsg: fix verdict sk_data_ready racing with ktls rx sk_psock_strp_data_ready() already checks tls_sw_has_ctx_rx() and defers to psock->saved_data_ready when a TLS RX context is present, avoiding a conflict with the TLS strparser's ownership of the receive queue (commit `e91de6afa8`, "bpf: Fix running sk_skb program types with ktls"). sk_psock_verdict_data_ready() has no equivalent guard. When a socket is inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is configured, tls_sw_strparser_arm() saves sk_psock_verdict_data_ready as rx_ctx->saved_data_ready. On data arrival: tls_data_ready -> tls_strp_data_ready -> tls_rx_msg_ready -> saved_data_ready() = sk_psock_verdict_data_ready() -> tcp_read_skb() drains sk_receive_queue via __skb_unlink() without calling tcp_eat_skb(), so copied_seq is not advanced. tls_strp_msg_load() then finds tcp_inq() >= full_len (stale), calls tcp_recv_skb() on the now-empty queue, hits WARN_ON_ONCE(!first), and returns with rx_ctx->strp.anchor.frag_list pointing at a psock-owned (potentially freed) skb. tls_decrypt_sg() subsequently walks that frag_list: use-after-free. Apply the same fix as sk_psock_strp_data_ready(): if a TLS RX context is present, call psock->saved_data_ready (sock_def_readable) to wake recv() waiters and return immediately, leaving the receive queue untouched. TLS retains sole ownership of the queue and decrypts the record normally through tls_sw_recvmsg(). Fixes: `ef5659280e` ("bpf, sockmap: Allow skipping sk_skb parser program") Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com> Link: https://patch.msgid.link/20260517145630.20521-2-v3rdant.xiang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 17:21:21 -07:00
Rosen Penev	e7c70bf97e	net: ag71xx: check error for platform_get_irq Complete error handling for a failed platform_get_irq() call Fixes: `d51b6ce441` ("net: ethernet: add ag71xx driver") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260516212616.11758-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:50:47 -07:00
Jakub Kicinski	7769d17e02	Merge branch 'rxrpc-better-fix-for-data-response-decrypt-vs-splice' David Howells says: ==================== rxrpc: Better fix for DATA/RESPONSE decrypt vs splice() Here are two patches containing better fixes for the in-place decryption of DATA and RESPONSE packets that can corrupt pagecache spliced into UDP packets and sent to an AF_RXRPC server [CVE-2026-43500], plus a patch to precheck the length of rxgk-secured DATA packets. Of the main patches, one patch fixes DATA decryption by having recvmsg unconditionally extract the data into a flat bounce buffer and, if need be, decrypt it there. It doesn't seem to cause a performance problem to do this even on unencrypted packets; for encrypted packets it makes sure the content is correctly aligned for crypto which seems to get a small performance gain. Further, it means that DATA packets are no longer copied in the I/O thread, avoiding a slowdown of the protocol engine that runs there. The other main patch fixes RESPONSE decryption by having the connection event handler worker copy the data to a flat buffer and, again, decrypt it there. This simplifies RESPONSE handling. With these two fixes, the data content of the received sk_buff no longer gets altered. ==================== Link: https://patch.msgid.link/20260515230516.2718212-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:36:47 -07:00
David Howells	8bfab4b6ff	rxrpc: Fix RESPONSE packet verification to extract skb to a linear buffer This improves the fix for CVE-2026-43500. Fix the verification of RESPONSE packets to avoid the problem of overwriting a RESPONSE packet sent via splice to a local address by extracting the contents of the UDP packet into a kmalloc'd linear buffer rather than decrypting the data in place in the sk_buff (which may corrupt the original buffer). Fixes: `24481a7f57` ("rxrpc: Fix conn-level packet handling to unshare RESPONSE packets") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Simon Horman <horms@kernel.org> cc: Jiayuan Chen <jiayuan.chen@linux.dev> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Link: https://patch.msgid.link/20260515230516.2718212-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:36:45 -07:00
David Howells	d2bc90cf6c	rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg This improves the fix for CVE-2026-43500. Fix the pagecache corruption from in-place decryption of a DATA packet transmitted locally by splice() by getting rid of the packet sharing in the I/O thread and unconditionally extracting the packet content into a bounce buffer in which the buffer is decrypted. recvmsg() (or the kernel equivalent) then copies the data from the bounce buffer to the destination buffer. The sk_buff then remains unmodified. This has an additional advantage in that the packet is then arranged in the buffer with the correct alignment required for the crypto algorithms to process directly. The performance of the crypto does seem to be a little faster and, surprisingly, the unencrypted performance doesn't seem to change much - possibly due to removing complexity from the I/O thread. Yet another advantage is that the I/O thread doesn't have to copy packets which would slow down packet distribution, ACK generation, etc.. The buffer belongs to the call and is allocated initially at 2K, sufficiently large to hold a whole jumbo subpacket, but the buffer will be increased in size if needed. However, to take this work, MSG_PEEK may cause a later packet to be decrypted into the buffer, in which case the earlier one will need re-decrypting for a subsequent recvmsg(). Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so switch to using USHRT_MAX to indicate an invalid offset. Note also that I would generally prefer to replace the buffers of the current sk_buff with a new kmalloc'd buffer of the right size, ditching the old data and frags as this makes the handling of MSG_PEEK easier and removes the re-decryption issue, but this looks like quite a complicated thing to achieve. skb_morph() looks half way to what I want, but I don't want to have to allocate a new sk_buff. Fixes: `d0d5c0cd1e` ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Simon Horman <horms@kernel.org> cc: Jiayuan Chen <jiayuan.chen@linux.dev> cc: linux-afs@lists.infradead.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Link: https://patch.msgid.link/20260515230516.2718212-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:36:45 -07:00
David Howells	2b50aceafe	crypto/krb5, rxrpc: Fix lack of pre-decrypt/pre-verify length checks Change the krb5 crypto library to provide facilities to precheck the length of the message about to be decrypted or verified. Fix AF_RXRPC to make use of this to validate DATA packets secured with RxGK. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260511160753.607296-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: Simon Horman <horms@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: linux-afs@lists.infradead.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Link: https://patch.msgid.link/20260515230516.2718212-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:36:45 -07:00
Jakub Kicinski	b1a736f8bc	Merge branch 'net-shaper-fix-valid-confusion-even-more' Jakub Kicinski says: ==================== net: shaper: fix VALID confusion even more Sashiko reported another pre-exising issue in the previous batch of fixes: https://sashiko.dev/#/patchset/20260510192904.3987113-7-kuba@kernel.org Turns out I over-esitmated the guarantees of the XArray flags. Stop using them completely. ==================== Link: https://patch.msgid.link/20260515221325.1685455-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:34:22 -07:00
Jakub Kicinski	b8d7519352	net: shaper: rework the VALID marking (again) Recent commit changed the semantics from NOT_VALID to VALID. I didn't realize that the flags are not stored atomically with the entry in XArray. There's still a race of reader observing a VALID mark for a slot, getting interrupted, writer replacing the entry with a different one, reader continuing, fetching the entry which is now a different pointer than the pointer for which VALID was meant. The biggest consequence of this is that we may see a UAF since net_shaper_rollback() assumed that entries without VALID can be freed without observing RCU. Looks like the XArray marks are buying us nothing at this point. Let's convert the code to an explicit valid field. The smp_load_acquire() / smp_store_release() barriers are marginally cleaner. Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: `93954b40f6` ("net-shapers: implement NL set and delete operations") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260515221325.1685455-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:34:20 -07:00
Jakub Kicinski	a3442936dd	net: shaper: annotate the data races As previously discussed we don't care about making the shaper state fully RCU-compliant because the hierarchy itself can't be dumped in one go over Netlink. Let's annotate the reads and writes to make that clear. The field-by-field assignments will also be useful for the next commit which adds explicit "valid" field (which we don't want to override with the current full struct assignment). Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260515221325.1685455-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:34:20 -07:00
Prathamesh Deshpande	abe003b332	net/mlx5e: Fix eswitch mode block underflow on IPsec acquire SA mlx5e_xfrm_add_state() handles acquire-flow temporary SAs by allocating software state and skipping hardware offload setup. That path jumps to the common success label before taking the eswitch mode block. After tunnel-mode validation was moved earlier, the common success label unconditionally calls mlx5_eswitch_unblock_mode(). For acquire SAs, this decrements esw->offloads.num_block_mode without a matching increment. Return directly after installing the acquire SA offload handle, so only the paths that successfully called mlx5_eswitch_block_mode() call the matching unblock. Fixes: `22239eb258` ("net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed") Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260510225903.13184-1-prathameshdeshpande7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 15:19:29 -07:00
Jakub Kicinski	9a8e01c500	Merge branch 'udp-gso-fix-__udp_gso_segment-after-gso_partial-udp-length-change' Gal Pressman says: ==================== udp: gso: Fix __udp_gso_segment() after GSO_PARTIAL UDP length change This series fixes two issues introduced by commit `b10b446ce7` ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL"), which switched __udp_gso_segment() to write the single MSS length into the UDP header for GSO_PARTIAL skbs. Patch 1 (from Alice) fixes the UDP checksum adjustment in __udp_gso_segment(). The patch adjusts the checksum by the correct delta, and since msslen and newlen become equivalent before the loop, drops one of the two variables to simplify the code. Patch 2 handles the case where the last segment of a GSO_PARTIAL skb is itself a GSO skb. This happens when the original packet size is an exact multiple of MSS, so the post-loop segment is not a remainder skb but a full GSO chunk and must also carry the single MSS length in its UDP header. ==================== Link: https://patch.msgid.link/20260518062250.3019914-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 15:04:17 -07:00
Gal Pressman	78effd896e	udp: Fix UDP length on last GSO_PARTIAL segment Following the cited commit, __udp_gso_segment() writes single MSS length in the UDP header. The cited patch doesn't account for the fact that the last segment could be a GSO skb by itself. This could happen when the size of the packet is a multiple of MSS, hence the first segment is also the last one (there is no need for a remainder skb). When the post-loop segment is a GSO skb, assign the single MSS length in the UDP header. Fixes: `b10b446ce7` ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL") Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Closes: https://lore.kernel.org/all/6c3fb15e-711d-4b8d-b152-e03d9b05293f@linux.dev/ Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260518062250.3019914-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 15:03:47 -07:00
Alice Mikityanska	5f17ae0f59	udp: gso: Fix handling checksum in __udp_gso_segment The cited commit started using msslen for uh->len, but still uses newlen to adjust uh->check. Although the checksum is ignored in most cases due to the hardware offload, __udp_gso_segment attempts to maintain the correct one. Fix uh->check and adjust it by the right value. Additionally, after the fix, newlen becomes assigned and unused before the loop. The code can be simplified a bit if mss adjustment is dropped, so that newlen becomes equal to msslen before the loop, and msslen can be also dropped, saving a few lines of code. This brings us back to one variable, drops an unneeded arithmetic for mss, and fixes the UDP checksum. Fixes: `b10b446ce7` ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL") Signed-off-by: Alice Mikityanska <alice@isovalent.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260518062250.3019914-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 15:02:47 -07:00
Safa Karakuş	ab1513597c	Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del() bt_accept_dequeue() unlinks a not-yet-accepted child from the parent accept queue and release_sock()s it before returning, so the returned sk has no caller reference and is unlocked. l2cap_sock_cleanup_listen() walks these children on listening-socket close. A concurrent HCI disconnect drives hci_rx_work -> l2cap_conn_del() which runs l2cap_chan_del() + l2cap_sock_kill() and frees the child sk and its l2cap_chan; cleanup_listen() then uses both: BUG: KASAN: slab-use-after-free in l2cap_sock_kill l2cap_sock_kill / l2cap_sock_cleanup_listen / __x64_sys_close Freed by: l2cap_conn_del -> l2cap_sock_close_cb -> l2cap_sock_kill This is distinct from the two fixes already in this area: commit `e83f5e24da` ("Bluetooth: serialize accept_q access") serialises the accept_q list/poll and takes temporary refs inside bt_accept_dequeue(), and CVE-2025-39860 serialises the userspace close()/accept() race by calling cleanup_listen() under lock_sock() in l2cap_sock_release(). Neither covers l2cap_conn_del() running from hci_rx_work, so this UAF still reproduces on current bluetooth/master. Take the reference at the source: bt_accept_dequeue() does sock_hold() while sk is still locked, before release_sock(); callers sock_put(). cleanup_listen() pins the chan with l2cap_chan_hold_unless_zero() under a brief child sk lock (serialising vs l2cap_sock_teardown_cb()), drops it before l2cap_chan_lock(), and skips a duplicate l2cap_sock_kill() on SOCK_DEAD. conn->lock is not taken here: cleanup_listen() runs under the parent sk lock and that would invert conn->lock -> chan->lock -> sk_lock (lockdep). KASAN/SMP: an unprivileged listen/close vs HCI-disconnect race produced 12 use-after-free reports per run before this change; 0, and no lockdep report, over 1600+ raced iterations after it on bluetooth/master. Fixes: `15f02b9105` ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Cc: stable@vger.kernel.org Reported-by: Siwei Zhang <oss@fourdim.xyz> Reviewed-by: Siwei Zhang <oss@fourdim.xyz> Signed-off-by: Safa Karakuş <safa.karakus@secunnix.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Mingyu Wang	c1bb9336ae	Bluetooth: hci_uart: fix UAFs and race conditions in close and init paths Vulnerabilities leading to Use-After-Free (UAF) and Null Pointer Dereference (NPD) conditions were observed in the lifecycle management of hci_uart. The primary issue arises because the workqueues (init_ready and write_work) are only flushed/cancelled if the HCI_UART_PROTO_READY flag is set during TTY close. If a hangup occurs before setup completes, hci_uart_tty_close() skips the teardown of these workqueues and proceeds to free the `hu` struct. When the scheduled work executes later, it blindly dereferences the freed `hu` struct. Furthermore, several data races and UAFs were identified in the teardown sequence: 1. Calling hci_uart_flush() from hci_uart_close() without effectively disabling write_work causes a race condition where both can concurrently double-free hu->tx_skb. This happens because protocol timers can concurrently invoke hci_uart_tx_wakeup() and requeue write_work. 2. Calling hci_free_dev(hdev) before hu->proto->close(hu) causes a UAF when vendor specific protocol close callbacks dereference hu->hdev. 3. In the initialization error paths, failing to take the proto_lock write lock before clearing PROTO_READY leads to races with active readers. Additionally, hci_uart_tty_receive() accesses hu->hdev outside the read lock, leading to UAFs if the initialization error path frees hdev concurrently. Fix these synchronization and lifecycle issues by: 1. Re-ordering hci_uart_tty_close() to clear HCI_UART_PROTO_READY first, followed immediately by a cancel_work_sync(&hu->write_work). Clearing the flag locks out concurrent protocol timers from successfully invoking hci_uart_tx_wakeup(), effectively rendering the cancellation permanent and preventing the tx_skb double-free. 2. Note: Clearing PROTO_READY early causes hci_uart_close() to skip hu->proto->flush(). This is perfectly safe in the tty_close path because hu->proto->close() executes shortly after, which intrinsically purges all protocol SKB queues and tears down the state. 3. Relocating hu->proto->close(hu) strictly prior to hci_free_dev(hdev) across all close and error paths to prevent vendor-level UAFs. 4. Moving the hdev->stat.byte_rx increment in hci_uart_tty_receive() inside the proto_lock read-side critical section to safely synchronize with device unregistration. 5. Adding cancel_work_sync(&hu->write_work) to hci_uart_close() to safely flush the workqueue before hci_uart_flush() is invoked via the HCI core. 6. Utilizing cancel_work_sync() instead of disable_work_sync() across all paths to prevent permanently breaking user-space retry capabilities. Fixes: `3b799254cf` ("Bluetooth: hci_uart: Cancel init work before unregistering") Cc: stable@vger.kernel.org Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Michael Bommarito	d3f7d17960	Bluetooth: MGMT: validate Add Extended Advertising Data length MGMT_OP_ADD_EXT_ADV_DATA is registered as a variable-length command, with MGMT_ADD_EXT_ADV_DATA_SIZE as the fixed header size. The handler then uses cp->adv_data_len and cp->scan_rsp_len to validate and copy cp->data, but it never checks that those bytes are part of the mgmt command payload. A short command can therefore make add_ext_adv_data() pass an out-of-bounds pointer into tlv_data_is_valid(). If the bytes beyond the command buffer are addressable, they can also be copied into the advertising instance as scan response data, where the caller can read them back via MGMT_OP_GET_ADV_INSTANCE. The trigger requires CAP_NET_ADMIN in the initial user namespace; KASAN reports an 8-byte slab-out-of-bounds read. Reject commands whose length does not match the fixed header plus both advertising data lengths before parsing cp->data. Fixes: `1241057283` ("Bluetooth: Break add adv into two mgmt commands") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Jiajia Liu	dd1dda6b8d	Bluetooth: btmtk: fix urb->setup_packet leak in error paths The setup_packet of control urb is not freed if usb_submit_urb fails or the submitted urb is killed. Add free in these two paths. Fixes: `a1c49c434e` ("Bluetooth: btusb: Add protocol support for MediaTek MT7668U USB devices") Signed-off-by: Jiajia Liu <liujiajia@kylinos.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
David Carlier	84c24fb151	Bluetooth: ISO: drop ISO_END frames received without prior ISO_START ISO data PDUs carry a packet-boundary flag indicating START, CONT, END or SINGLE. The ISO_CONT branch of iso_recv() guards against a missing ISO_START by checking conn->rx_len before touching conn->rx_skb, but ISO_END does not. If a peer sends an ISO_END as the first packet on a fresh ISO connection, conn->rx_skb is still NULL and conn->rx_len is zero, so skb_put(conn->rx_skb, ...) dereferences NULL and oopses. For BIS, where receivers sync to a broadcaster without pairing, any broadcaster on the air can trigger this. Mirror the ISO_CONT check at the top of ISO_END so a stray end fragment is logged and dropped instead of crashing the host. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Kiran K	88365d04fd	Bluetooth: btintel_pcie: Fix incorrect MAC access programming btintel_pcie_get_mac_access() and btintel_pcie_release_mac_access() were programming STOP_MAC_ACCESS_DIS and XTAL_CLK_REQ in addition to the MAC_ACCESS_REQ handshake. These bits are not part of the host MAC-access handshake on the supported parts; the driver was programming them incorrectly. Drop the writes so the register update contains only the bits the controller actually consumes. Fixes: `b9465e6670` ("Bluetooth: btintel_pcie: Read hardware exception data") Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Luiz Augusto von Dentz	23d528d817	Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE This fixes not setting the bit for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE when extended features bit is set otherwise the controller may not generate HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE causing hci_le_read_all_remote_features_sync to timeout waiting for it. Also remove dead code. Fixes: `a106e50be7` ("Bluetooth: HCI: Add support for LL Extended Feature Set") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Jann Horn	59e932ded9	Bluetooth: bnep: Fix UAF read of dev->name bnep_add_connection() needs to keep holding the bnep_session_sem while reading dev->name (just like bnep_get_connlist() does); otherwise the bnep_session() thread can concurrently free the net_device, which can for example be triggered by a concurrent bnep_del_connection(). (This UAF is fairly uninteresting from a security perspective; calling bnep_add_connection() requires passing a capable(CAP_NET_ADMIN) check. It also requires completely tearing down a netdev during a fairly tight race window.) Cc: stable@vger.kernel.org Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Nikhil P. Rao	dc416e32ba	pds_core: fix debugfs_lookup dentry leak and error handling debugfs_lookup() returns a dentry with an elevated reference count that must be released with dput(). The current code discards the returned dentry without calling dput(), causing a reference leak on every firmware reset recovery. Additionally, when CONFIG_DEBUG_FS is disabled, debugfs_lookup() returns ERR_PTR(-ENODEV), not NULL. The current check passes for error pointers and would call dput() on an invalid pointer, causing a crash. Fixes: `bc90fbe0c3` ("pds_core: Rework teardown/setup flow to be more common") Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260515212907.998028-3-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 19:18:33 -07:00
Nikhil P. Rao	0e46b6635b	pds_core: fix error handling in pdsc_devcmd_wait Fix two cases where pdsc_devcmd_wait() returns stale success from the completion register instead of an error: 1. FW crash: If firmware stops running, the wait loop breaks early with running=false. The condition "if ((!done \|\| timeout) && running)" is false, so error handling is bypassed and stale status is returned. Check !running first and return -ENXIO. 2. Timeout: If a command times out, err is set to -ETIMEDOUT but then overwritten by pdsc_err_to_errno(status) which reads stale status. Return -ETIMEDOUT immediately after cleaning up. Both errors now propagate to pdsc_devcmd_locked() which queues health_work for recovery. Fixes: `45d76f4929` ("pds_core: set up device and adminq") Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260515212907.998028-1-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 19:12:42 -07:00
Christian Marangi	0cb5a74faa	net: airoha: Fix NPU RX DMA descriptor bits In an internal review from Airoha, it was notice that the RX DMA descriptor bits and mask are wrong. These values probably refer to an old NPU firmware never published. The previous value works correctly but it was reported that in some specific condition in mixed scenario with both Ethernet and WiFi offload it's possible that RX DMA descriptor signal wrong value with the problem to the RX ring or packets getting dropped. To handle these specific scenario, apply the new suggested bits mask from Airoha. Correct functionality of both AN7581 NPU and MT7996 variant were verified and confirmed working. Fixes: `a7fc8c641c` ("net: airoha: Fix npu rx DMA definitions") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260518134530.3683-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 19:05:20 -07:00
Jann Horn	be309f8eae	af_unix: Fix UAF read of tail->len in unix_stream_data_wait() unix_stream_data_wait() does skb_peek_tail(&sk->sk_receive_queue) without holding any lock that prevents SKBs on that queue from being dequeued and freed. This has been the case since commit `79f632c71b` ("unix/stream: fix peeking with an offset larger than data in queue"). The first consequence of this is that the pointer comparison `tail != last` can be false even if `last` semantically refers to an already-freed SKB while `tail` is a new SKB allocated at the same address; which can cause unix_stream_data_wait() to wrongly keep blocking after new data has arrived, but only in a weird scenario where a peeking recv() and a normal recv() on the same socket are racing, which is probably not a real problem. But since commit `2b514574f7` ("net: af_unix: implement splice for stream af_unix sockets"), `tail` is actually dereferenced, which can cause UAF in the following race scenario (where test_setup() runs single-threaded, and afterwards, test_thread1() and test_thread2() run concurrently in two threads: ``` static int socks[2]; void test_setup(void) { socketpair(AF_UNIX, SOCK_STREAM, 0, socks); send(socks[1], "A", 1, 0); int peekoff = 1; setsockopt(socks[0], SOL_SOCKET, SO_PEEK_OFF, &peekoff, sizeof(peekoff)); } void test_thread1(void) { char dummy; recv(socks[0], &dummy, 1, MSG_PEEK); } void test_thread2(void) { char dummy; recv(socks[0], &dummy, 1, 0); shutdown(socks[1], SHUT_WR); } ``` when racing like this: ``` thread1 thread2 unix_stream_read_generic mutex_lock(&u->iolock) skb_peek(&sk->sk_receive_queue) skb_peek_next(skb, &sk->sk_receive_queue) mutex_unlock(&u->iolock) unix_stream_read_generic unix_state_lock(sk) skb_peek(&sk->sk_receive_queue) unix_state_unlock(sk) unix_stream_data_wait unix_state_lock(sk) tail = skb_peek_tail(&sk->sk_receive_queue) spin_lock(&sk->sk_receive_queue.lock) __skb_unlink(skb, &sk->sk_receive_queue) spin_unlock(&sk->sk_receive_queue.lock) consume_skb(skb) [frees the SKB] `tail != last`: false `tail`: true `tail->len != last_len` *UAF* ``` Fix the UAF by removing the read of tail->len; checking tail->len would only make sense if SKBs in the receive queue of a UNIX socket could grow, which can no longer happen. Kuniyuki explained: > When commit `869e7c6248` ("net: af_unix: implement stream sendpage > support") added sendpage() support, data could be appended to the last > skb in the receiver's queue. > > That's why we needed to check if the length of the last skb was changed > while waiting for new data in unix_stream_data_wait(). > > However, commit `a0dbf5f818` ("af_unix: Support MSG_SPLICE_PAGES") and > commit `57d44a354a` ("unix: Convert unix_stream_sendpage() to use > MSG_SPLICE_PAGES") refactored sendmsg(), and now data is always added > to a new skb. That means this fix is not suitable for kernels before 6.5. Fixes: `2b514574f7` ("net: af_unix: implement splice for stream af_unix sockets") Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260518-b4-unix-recv-wait-hotfix-v2-1-83e29ce8ad31@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:53:56 -07:00
Justin Iurman	d4ea0dfd75	ipv6: ioam: add NULL check for idev in ipv6_hop_ioam() Reported by Sashiko: The function ipv6_hop_ioam() accesses __in6_dev_get(skb->dev)->cnf.ioam6_enabled without validating the returned idev pointer. Because addrconf_ifdown() can concurrently clear dev->ip6_ptr via RCU, __in6_dev_get() can return NULL during interface teardown, which could cause a NULL pointer dereference when processing an IOAM Hop-by-Hop option. Let's add a check and use SKB_DROP_REASON_IPV6DISABLED accordingly. Fixes: `9ee11f0fff` ("ipv6: ioam: Data plane support for Pre-allocated Trace") Cc: stable@vger.kernel.org Signed-off-by: Justin Iurman <justin.iurman@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260517183059.29140-1-justin.iurman@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:46:44 -07:00
Jakub Kicinski	d03e6124c7	Merge branch 'net-phy-honor-eee_disabled_modes-when-advertising-eee' Nicolai Buchwitz says: ==================== net: phy: honor eee_disabled_modes when advertising EEE While debugging why ethtool --show-eee reports "not supported" on a Raspberry Pi CM4 with eee-broken-1000t / eee-broken-100tx set on the PHY node, I noticed two phylib helpers copy phydev->supported_eee into phydev->advertising_eee without applying phydev->eee_disabled_modes: phy_support_eee() and phy_advertise_eee_all(). That undoes the filtering phy_probe() set up after of_set_phy_eee_broken(), so the PHY ends up advertising EEE for modes that were marked broken in DT (or by the driver via eee_disabled_modes). The visible effect on MAC drivers that call phy_support_eee() after probe (bcmgenet, fec, lan743x, lan78xx, r8169) is that ethtool on the local interface reports "not supported" (because supported is masked by eee_disabled_modes and ends up empty), while the link partner happily sees EEE negotiated and active. Patch 1 fixes phy_support_eee(). Patch 2 fixes phy_advertise_eee_all(), which is also reached from genphy_c45_ethtool_set_eee() when user space passes an empty advertisement. I went through the other users of supported_eee as suggested by Andrew and they look fine: - phy_probe() already masks via eee_disabled_modes after of_set_phy_eee_broken(). - genphy_c45_ethtool_get_eee() masks supported_eee with eee_disabled_modes when reporting to user space. - genphy_c45_ethtool_set_eee() masks user-supplied adv against eee_disabled_modes, and the empty-adv path is now covered by patch 2. - genphy_c45_read_eee_abilities(), read_eee_cap1/cap2 populate supported_eee from PHY registers (source of truth). - genphy_c45_read_eee_adv(), read_eee_lpa() and write_eee_adv() use supported_eee only to gate which MMD registers to access, not to construct an advertisement. ==================== Link: https://patch.msgid.link/20260518-devel-phy-support-eee-fix-v2-0-05b52626fa68@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:45:27 -07:00
Nicolai Buchwitz	8baa7506d7	net: phy: honor eee_disabled_modes in phy_advertise_eee_all() phy_advertise_eee_all() copies supported_eee into advertising_eee unconditionally, overwriting any filtering applied during phy_probe() based on DT eee-broken-* properties or driver-populated eee_disabled_modes. genphy_c45_ethtool_set_eee() calls this helper when user space passes an empty advertisement, undoing the filtering. Apply the same eee_disabled_modes mask in phy_advertise_eee_all() so the filtering survives the copy, matching the pattern in phy_probe() and phy_support_eee(). Fixes: `b64691274f` ("net: phy: add helper phy_advertise_eee_all") Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260518-devel-phy-support-eee-fix-v2-2-05b52626fa68@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:45:26 -07:00
Nicolai Buchwitz	3655063e08	net: phy: honor eee_disabled_modes in phy_support_eee() phy_support_eee() copies supported_eee into advertising_eee unconditionally, overwriting any filtering applied during phy_probe() based on DT eee-broken-* properties or driver-populated eee_disabled_modes. MAC drivers that call phy_support_eee() after probe (e.g. bcmgenet, fec, lan743x, lan78xx, r8169) then cause the PHY to advertise EEE for modes the user marked as broken. The symptom is that ethtool --show-eee on the local interface reports "not supported" (supported & ~eee_disabled_modes is empty) while the link partner sees EEE negotiated and active. phy_probe() already filters advertising_eee via eee_disabled_modes after calling of_set_phy_eee_broken(). Apply the same mask in phy_support_eee() so the filtering survives the copy. Fixes: `49168d1980` ("net: phy: Add phy_support_eee() indicating MAC support EEE") Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260518-devel-phy-support-eee-fix-v2-1-05b52626fa68@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:45:26 -07:00
Nerijus Bendžiūnas	960e77ce14	net: phy: skip EEE advertisement write when autoneg is disabled genphy_c45_an_config_eee_aneg() writes the EEE advertisement to the auto-negotiation device's MMD register space (MDIO_MMD_AN, register MDIO_AN_EEE_ADV). These registers are read by the link partner only during auto-negotiation, so writing them while autoneg is disabled cannot influence the link. On some PHYs (e.g. Broadcom BCM54213PE) the write nevertheless reaches the chip and disturbs the receive datapath. Concretely, running ethtool -s eth0 speed 100 duplex full autoneg off ethtool --set-eee eth0 eee off leaves eth0 with TX working and RX completely silent on a Raspberry Pi 4 / CM4 board (bcmgenet + BCM54213PE in rgmii-rxid). Switching back to autoneg recovers the link. Prior to commit `f26a29a038` ("net: phy: ensure that genphy_c45_an_config_eee_aneg() sees new value of phydev->eee_cfg.eee_enabled"), the disable path was effectively a no-op because the helper read the stale eee_cfg.eee_enabled, so the underlying PHY behavior never surfaced. Bisected on rpi-6.12.y between commits 83943264 (good) and effcbc88 (bad) to `f26a29a038`. Fixes: `f26a29a038` ("net: phy: ensure that genphy_c45_an_config_eee_aneg() sees new value of phydev->eee_cfg.eee_enabled") Cc: stable@vger.kernel.org Signed-off-by: Nerijus Bendžiūnas <nerijus.bendziunas@gmail.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Tested-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260516150251.879680-1-nerijus.bendziunas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:45:16 -07:00
Jakub Kicinski	90fc1a3937	Merge branch 'bridge-mcast-fix-a-possible-use-after-free-when-removing-a-bridge-port' Ido Schimmel says: ==================== bridge: mcast: Fix a possible use-after-free when removing a bridge port Patch #1 fixes a possible use-after-free when removing a bridge port. Patch #2 adds a test case that triggers the problem. In net-next we can: 1. Add DEBUG_NET_WARN_ON_ONCE() when a port multicast context is de-initialized while enabled. 2. When de-initializing a port multicast context, synchronously shutdown all the timers that were initialized when the context was initialized. ==================== Link: https://patch.msgid.link/20260517121122.188333-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:15:24 -07:00
Ido Schimmel	ae743a8ca8	selftests: bridge_vlan_mcast: Test toggling of multicast snooping Test toggling of multicast snooping when per-VLAN multicast snooping is enabled. The test always passes, but without "bridge: mcast: Fix possible use-after-free when removing a bridge port" it results in a splat. Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260517121122.188333-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:15:22 -07:00
Ido Schimmel	4df78ff026	bridge: mcast: Fix a possible use-after-free when removing a bridge port When per-VLAN multicast snooping is enabled, the bridge iterates over all the bridge ports, disables the per-port multicast context on each port and enables the per-{port, VLAN} multicast contexts instead. The reverse happens when per-VLAN multicast snooping is disabled. When global multicast snooping is enabled, the bridge iterates over all the bridge ports and enables the per-port multicast context on each port. The reverse happens when multicast snooping is disabled. The above scheme can result in a situation where both types of contexts (per-port and per-{port, VLAN}) are enabled on a single bridge port: # ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1 vlan_filtering 1 # ip link add name dummy1 up master br1 type dummy # ip link set dev br1 type bridge mcast_vlan_snooping 1 # ip link set dev br1 type bridge mcast_snooping 0 # ip link set dev br1 type bridge mcast_snooping 1 This is not intended and it is a problem since the commit cited below. Prior to this commit, when removing a bridge port, br_multicast_disable_port() would disable the per-port multicast context and the per-{port, VLAN} multicast contexts would get disabled when flushing VLANs. After this commit, br_multicast_disable_port() only disables the per-port multicast context if per-VLAN multicast snooping is disabled. If both types of contexts were enabled on the port when it was removed, the per-port multicast context would remain enabled when freeing the bridge port, leading to a use-after-free [1]. Fix by preventing the bridge from enabling / disabling the per-port multicast contexts when toggling global multicast snooping if per-VLAN multicast snooping is enabled. [1] ODEBUG: free active (active state 0) object: ffff88810f8bda78 object type: timer_list hint: br_ip6_multicast_port_query_expired (net/bridge/br_multicast.c:1927) WARNING: lib/debugobjects.c:629 at debug_print_object+0x1b1/0x3e0, CPU#5: swapper/5/0 [...] Call Trace: <IRQ> __debug_check_no_obj_freed (lib/debugobjects.c:1116) kfree (mm/slub.c:2620 mm/slub.c:6250 mm/slub.c:6565) kobject_cleanup (lib/kobject.c:689) rcu_do_batch (kernel/rcu/tree.c:2617) rcu_core (kernel/rcu/tree.c:2869) handle_softirqs (kernel/softirq.c:622) __irq_exit_rcu (kernel/softirq.c:656 kernel/softirq.c:496 kernel/softirq.c:735) irq_exit_rcu (kernel/softirq.c:752) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1061 (discriminator 47) arch/x86/kernel/apic/apic.c:1061 (discriminator 47)) </IRQ> Fixes: `4b30ae9adb` ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions") Reported-by: syzbot+ae231e0552fa77b26ea1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/87qznowlfs.ffs@tglx/ Reported-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260517121122.188333-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:15:22 -07:00
Dawei Feng	9b244c242b	octeontx2-pf: avoid double free of pool->stack on AQ init failure otx2_pool_aq_init() frees pool->stack when mailbox sync or retry allocation fails, but leaves the pointer unchanged. Later, otx2_sq_aura_pool_init() unwinds the partial setup through otx2_aura_pool_free(), which frees pool->stack again. The CN20K-specific cn20k_pool_aq_init() implementation has the same bug in its corresponding error path. Set pool->stack to NULL immediately after the local free so the shared cleanup path does not free the same stack again while cleaning up partially initialized pool state. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. Manual inspection confirms that the bug is still present in v7.1-rc3. Runtime validation was not performed because reproducing this path requires OcteonTX2/CN20K hardware. Fixes: `caa2da34fd` ("octeontx2-pf: Initialize and config queues") Fixes: `d322fbd172` ("octeontx2-pf: Initialize cn20k specific aura and pool contexts") Cc: stable@vger.kernel.org Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260515151826.1005397-1-dawei.feng@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 17:49:49 -07:00
Jonas Jelonek	33d35975cb	net: pse-pd: fix sign on -ENOENT check in of_load_pse_pis() of_count_phandle_with_args() returns the count on success and a negative errno on failure, including -ENOENT when the "pairsets" property is absent. The existing comparison in of_load_pse_pis() checks against ENOENT (positive 2) instead of -ENOENT, so the branch is taken for any error return: legitimate DTs that omit "pairsets" trigger a spurious "wrong number of pairsets" error and probe fails with -EINVAL. Compare against -ENOENT so a missing "pairsets" property is correctly treated as "this PI has no pairsets, continue". Fixes: `9be9567a7c` ("net: pse-pd: Add support for PSE PIs") Cc: stable@vger.kernel.org Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260515143103.1721888-1-jelonek.jonas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 17:44:27 -07:00
Paolo Abeni	edc502717b	Merge branch 'mptcp-misc-fixes-for-v7-1-rc4' Matthieu Baerts says: ==================== mptcp: misc fixes for v7.1-rc4 Here are various unrelated fixes: - Patch 1: avoid dropping partial packets. A previous version has been sent a few week ago. A fix for 5.10. - Patches 2-3: stop ADD_ADDR timer when an ADD_ADDR can never been sent due to insufficient option space. A fix for v5.10. - Patch 4: reset rcv_wnd_sent on disconnect, just in case the next connection falls back to TCP. A fix for 5.17. - Patch 5: update window_clamp when SO_RCVBUF is set during the connection. A fix similar to a recent one on TCP side, for v6.6. - Patch 6: avoid wrong time being displayed in the selftests when using uutils 0.8.0 which contains a regression with 'date +%3N'. It doesn't fix an issue in the kernel selftests, but having the fix is helpful for those using uutils 0.8.0. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> ==================== Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-0-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:39 +02:00
Matthieu Baerts (NGI0)	01ff78e4b3	selftests: mptcp: drop nanoseconds width specifier Using the format specifier +%s%3N with GNU date is honoured, and only prints 3 digits of the nanoseconds portion of the seconds since epoch, which corresponds to the milliseconds. The uutils implementation of date currently does not honour this, and always prints all 9 digits. This is a known issue [1], but can be worked around by adapting this test to use nanoseconds instead of microseconds, and then divide it by 1e6. This fix is similar to what has been done on systemd side [2], and it is needed to run the selftests on Ubuntu 26.04, containing uutils 0.8.0. Note that the Fixes tag is there even if this patch doesn't fix an issue in the kernel selftests, but it is useful for those using uutils 0.8.0. Fixes: `048d19d444` ("mptcp: add basic kselftest for mptcp") Cc: stable@vger.kernel.org Link: https://github.com/uutils/coreutils/issues/11658 [1] Link: https://github.com/systemd/systemd/pull/41627 [2] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-6-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Gang Yan	3a543ae0e2	mptcp: update window_clamp on subflows when SO_RCVBUF is set Add __mptcp_subflow_set_rcvbuf() helper to write the subflow sk_rcvbuf, but also to call the recently added tcp_set_rcvbuf() helper to update window_clamp. This is needed because the window clap is updated when scaling_ratio changes, in tcp_measure_rcv_mss(). Until scaling_ratio changes, the subflow is stuck with the old window clamp which may be based on a small initial buffer. Use this new helper in both mptcp_sol_socket_sync_intval() (setsockopt path) and sync_socket_options() (new subflow creation path). Note that this patch depends on commit `b025461303` ("tcp: update window_clamp when SO_RCVBUF is set"): it fixes the issue on TCP side, but the same fix is needed on MPTCP side as well. Fixes: `a2cbb16039` ("tcp: Update window clamping condition") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/619 Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-5-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Paolo Abeni	0981f90e1a	mptcp: reset rcv wnd on disconnect If the MPTCP socket fallback to TCP before the MP handshake completion, the IASN remain 0, and the rcv_wnd_sent field is not explicitly initialized, just incremented over time with the data transfer. At disconnect time such value is not cleared. If the next connection falls back to TCP before the MP handshake completion, the data transfer will keep incrementing the receive window end sequence starting from the last value used in the previous connection: the announced window will be unrelated from the actual receiver buffer size and likely too big. Address the issue zeroing the field at disconnect time. Fixes: `b29fcfb54c` ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-4-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Li Xiasong	fc5ef43318	selftests: mptcp: join: cover ADD_ADDR tx drop and list progress Extend add_addr_ports_tests with IPv6 signaling cases that exercise ADD_ADDR tx-space shortage when tcp_timestamps are enabled. Add one case to verify PM still progresses to later signal endpoints after the first one is dropped. This covers both failure accounting and the non-blocking behavior of the announce list after a tx-space drop on pure ACK. Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-3-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Li Xiasong	51e398a3b8	mptcp: pm: fix ADD_ADDR timer infinite retry on option space insufficient When TCP option space is insufficient (e.g., when sending ADD_ADDR with an IPv6 address and port while tcp_timestamps is enabled), the original code jumped to out_unlock without clearing the addr_signal flag. This caused mptcp_pm_add_timer to keep rescheduling indefinitely, not sending ADD_ADDR, preventing subsequent addresses in the endpoint list from being announced. Handle this case by clearing the ADD_ADDR signal and skipping the matching ADD_ADDR retransmission entry. The skip path cancels the matching timer (with id check) and advances PM state progression, preserving forward progress to subsequent PM work. This cancellation is inherently best-effort. A concurrent add_timer callback may already be running and may acquire pm.lock before the cancel path updates entry state. In that case, one final ADD_ADDR transmit attempt can still be executed. Once the cancel path sets entry->retrans_times to ADD_ADDR_RETRANS_MAX, the callback-side retrans_times check suppresses further ADD_ADDR retransmissions. Note that when an ADD_ADDR is being prepared, a pure-ACK is queued. On the output side, it means that it is fine to skip non-pure-ACK packets, when drop_other_suboptions is set: a pure-ACK will be processed soon after. Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-2-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Shardul Bankar	50c2d91c5d	mptcp: do not drop partial packets When a packet arrives with map_seq < ack_seq < end_seq, the beginning of the packet has already been acknowledged but the end contains new data. Currently the entire packet is dropped as "old data," forcing the sender to retransmit. Instead, skip the already-acked bytes by adjusting the skb offset and enqueue only the new portion. Update bytes_received and ack_seq to reflect the new data consumed. A previous attempt at this fix has been sent by Paolo Abeni [1], but had issues [2]: it also added a zero-window check and changed rcv_wnd_sent initialization, which caused test regressions. This version addresses only the partial packet handling without modifying receive window accounting. Fixes: `ab174ad8ef` ("mptcp: move ooo skbs into msk out of order queue.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/c9b426a4e163aa3c4fe8b80c79f1a610f47ae7d8.1763075056.git.pabeni@redhat.com [1] Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/600 [2] Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> [pabeni@redhat.com: update map] Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-1-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Paolo Abeni	956aa1ec97	Merge tag 'ovpn-net-20260514' of https://github.com/OpenVPN/ovpn-net-next Antonio Quartulli says: ==================== Included fixes: * fix TCP selftest failures by reducing number of attempted pings * fix RCU ptr deref outside of RCU read section * fix UAF in case of TCP peer failed to be added to hashtable * fix race condition between iface teardown and new peer being added * ensure dstats are updated with BH disabled to avoid concurrency * tag 'ovpn-net-20260514' of https://github.com/OpenVPN/ovpn-net-next: ovpn: disable BHs when updating device stats ovpn: fix race between deleting interface and adding new peer ovpn: respect peer refcount in CMD_NEW_PEER error path ovpn: tcp - use cached peer pointer in ovpn_tcp_close() selftests: ovpn: reduce remaining ping flood counts ==================== Link: https://patch.msgid.link/20260514231544.795993-1-antonio@openvpn.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 13:51:08 +02:00
Erni Sri Satya Vennela	35f0f0a253	net: mana: Fix TOCTOU double-fetch of hwc_msg_id from DMA buffer In mana_hwc_rx_event_handler(), resp->response.hwc_msg_id is read from DMA-coherent memory and bounds-checked, then mana_hwc_handle_resp() re-reads the same field from the same DMA buffer for test_bit() and pointer arithmetic. DMA-coherent memory is mapped uncacheable on x86 and is shared, unencrypted, in Confidential VMs (SEV-SNP/TDX), so each load goes directly to host-visible memory. A H/W can modify the value between the check and the use, bypassing the bounds validation. Fix this by reading hwc_msg_id exactly once using READ_ONCE() into a stack-local variable in mana_hwc_rx_event_handler(), and passing the validated value as a parameter to mana_hwc_handle_resp(). Fixes: `ca9c54d2d6` ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Link: https://patch.msgid.link/20260514194156.466823-1-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 13:00:28 +02:00

1 2 3 4 5 ...

1445367 Commits