Pull bpf fixes from Alexei Starovoitov:
- Mark migrate_disable/enable() as always_inline to avoid issues with
partial inlining (Yonghong Song)
- Fix powerpc stack register definition in libbpf bpf_tracing.h (Andrii
Nakryiko)
- Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann)
- Conditionally include dynptr copy kfuncs (Malin Jonsson)
- Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal)
- Do not audit capability check in x86 do_jit() (Ondrej Mosnacek)
- Fix arm64 JIT of BPF_ST insn when it writes into arena memory
(Puranjay Mohan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf/arm64: Fix BPF_ST into arena memory
bpf: Make migrate_disable always inline to avoid partial inlining
bpf: Reject negative head_room in __bpf_skb_change_head
bpf: Conditionally include dynptr copy kfuncs
libbpf: Fix powerpc's stack register definition in bpf_tracing.h
bpf: Do not audit capability check in do_jit()
bpf: Sync pending IRQ work before freeing ring buffer
The zero-copy Device Memory (Devmem) transmit path
relies on the socket's route cache (`dst_entry`) to
validate that the packet is being sent via the network
device to which the DMA buffer was bound.
However, this check incorrectly fails and returns `-ENODEV`
if the socket's route cache entry (`dst`) is merely missing
or expired (`dst == NULL`). This scenario is observed during
network events, such as when flow steering rules are deleted,
leading to a temporary route cache invalidation.
This patch fixes -ENODEV error for `net_devmem_get_binding()`
by doing the following:
1. It attempts to rebuild the route via `rebuild_header()`
if the route is initially missing (`dst == NULL`). This
allows the TCP/IP stack to recover from transient route
cache misses.
2. It uses `rcu_read_lock()` and `dst_dev_rcu()` to safely
access the network device pointer (`dst_dev`) from the
route, preventing use-after-free conditions if the
device is concurrently removed.
3. It maintains the critical safety check by validating
that the retrieved destination device (`dst_dev`) is
exactly the device registered in the Devmem binding
(`binding->dev`).
These changes prevent unnecessary ENODEV failures while
maintaining the critical safety requirement that the
Devmem resources are only used on the bound network device.
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: Vedant Mathur <vedantmathur@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: bd61848900 ("net: devmem: Implement TX path")
Signed-off-by: Shivaji Kant <shivajikant@google.com>
Link: https://patch.msgid.link/20251029065420.3489943-1-shivajikant@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a netdev issues a RX async resync request for a TLS connection,
the TLS module handles it by logging record headers and attempting to
match them to the tcp_sn provided by the device. If a match is found,
the TLS module approves the tcp_sn for resynchronization.
While waiting for a device response, the TLS module also increments
rcd_delta each time a new TLS record is received, tracking the distance
from the original resync request.
However, if the device response is delayed or fails (e.g due to
unstable connection and device getting out of tracking, hardware
errors, resource exhaustion etc.), the TLS module keeps logging and
incrementing, which can lead to a WARN() when rcd_delta exceeds the
threshold.
To address this, introduce tls_offload_rx_resync_async_request_cancel()
to explicitly cancel resync requests when a device response failure is
detected. Call this helper also as a final safeguard when rcd_delta
crosses its threshold, as reaching this point implies that earlier
cancellation did not occur.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761508983-937977-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
netfilter: updates for net
1) its not possible to attach conntrack labels via ctnetlink
unless one creates a dummy 'ct labels set' rule in nftables.
This is an oversight, the 'ruleset tests presence, userspace
(netlink) sets' use-case is valid and should 'just work'.
Always broken since this got added in Linux 4.7.
2) nft_connlimit reads count value without holding the relevant
lock, add a READ_ONCE annotation. From Fernando Fernandez Mancera.
3) There is a long-standing bug (since 4.12) in nftables helper infra
when NAT is in use: if the helper gets assigned after the nat binding
was set up, we fail to initialise the 'seqadj' extension, which is
needed in case NAT payload rewrites need to add (or remove) from the
packet payload. Fix from Andrii Melnychenko.
* tag 'nf-25-10-29' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_ct: add seqadj extension for natted connections
netfilter: nft_connlimit: fix possible data race on connection count
netfilter: nft_ct: enable labels for get case too
====================
Link: https://patch.msgid.link/20251029135617.18274-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Accessing the transmit queue without owning the msk socket lock is
inherently racy, hence __mptcp_check_push() could actually quit early
even when there is pending data.
That in turn could cause unexpected tx lock and timeout.
Dropping the early check avoids the race, implicitly relaying on later
tests under the relevant lock. With such change, all the other
mptcp_send_head() call sites are now under the msk socket lock and we
can additionally drop the now unneeded annotation on the transmit head
pointer accesses.
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251028-net-mptcp-send-timeout-v1-1-38ffff5a9ec8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sequence adjustment may be required for FTP traffic with PASV/EPSV modes.
due to need to re-write packet payload (IP, port) on the ftp control
connection. This can require changes to the TCP length and expected
seq / ack_seq.
The easiest way to reproduce this issue is with PASV mode.
Example ruleset:
table inet ftp_nat {
ct helper ftp_helper {
type "ftp" protocol tcp
l3proto inet
}
chain prerouting {
type filter hook prerouting priority 0; policy accept;
tcp dport 21 ct state new ct helper set "ftp_helper"
}
}
table ip nat {
chain prerouting {
type nat hook prerouting priority -100; policy accept;
tcp dport 21 dnat ip prefix to ip daddr map {
192.168.100.1 : 192.168.13.2/32 }
}
chain postrouting {
type nat hook postrouting priority 100 ; policy accept;
tcp sport 21 snat ip prefix to ip saddr map {
192.168.13.2 : 192.168.100.1/32 }
}
}
Note that the ftp helper gets assigned *after* the dnat setup.
The inverse (nat after helper assign) is handled by an existing
check in nf_nat_setup_info() and will not show the problem.
Topoloy:
+-------------------+ +----------------------------------+
| FTP: 192.168.13.2 | <-> | NAT: 192.168.13.3, 192.168.100.1 |
+-------------------+ +----------------------------------+
|
+-----------------------+
| Client: 192.168.100.2 |
+-----------------------+
ftp nat changes do not work as expected in this case:
Connected to 192.168.100.1.
[..]
ftp> epsv
EPSV/EPRT on IPv4 off.
ftp> ls
227 Entering passive mode (192,168,100,1,209,129).
421 Service not available, remote server has closed connection.
Kernel logs:
Missing nfct_seqadj_ext_add() setup call
WARNING: CPU: 1 PID: 0 at net/netfilter/nf_conntrack_seqadj.c:41
[..]
__nf_nat_mangle_tcp_packet+0x100/0x160 [nf_nat]
nf_nat_ftp+0x142/0x280 [nf_nat_ftp]
help+0x4d1/0x880 [nf_conntrack_ftp]
nf_confirm+0x122/0x2e0 [nf_conntrack]
nf_hook_slow+0x3c/0xb0
..
Fix this by adding the required extension when a conntrack helper is assigned
to a connection that has a nat binding.
Fixes: 1a64edf54f ("netfilter: nft_ct: add helper set support")
Signed-off-by: Andrii Melnychenko <a.melnychenko@vyos.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
nft_connlimit_eval() reads priv->list->count to check if the connection
limit has been exceeded. This value is being read without a lock and can
be modified by a different process. Use READ_ONCE() for correctness.
Fixes: df4a902509 ("netfilter: nf_conncount: merge lookup and add functions")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
conntrack labels can only be set when the conntrack has been created
with the "ctlabel" extension.
For older iptables (connlabel match), adding an "-m connlabel" rule
turns on the ctlabel extension allocation for all future conntrack
entries.
For nftables, its only enabled for 'ct label set foo', but not for
'ct label foo' (i.e. check).
But users could have a ruleset that only checks for presence, and rely
on userspace to set a label bit via ctnetlink infrastructure.
This doesn't work without adding a dummy 'ct label set' rule.
We could also enable extension infra for the first (failing) ctnetlink
request, but unlike ruleset we would not be able to disable the
extension again.
Therefore turn on ctlabel extension allocation if an nftables ruleset
checks for a connlabel too.
Fixes: 1ad8f48df6 ("netfilter: nftables: add connlabel set support")
Reported-by: Antonio Ojea <aojea@google.com>
Closes: https://lore.kernel.org/netfilter-devel/aPi_VdZpVjWujZ29@strlen.de/
Signed-off-by: Florian Westphal <fw@strlen.de>
Yinhao et al. recently reported:
Our fuzzing tool was able to create a BPF program which triggered
the below BUG condition inside pskb_expand_head.
[ 23.016047][T10006] kernel BUG at net/core/skbuff.c:2232!
[...]
[ 23.017301][T10006] RIP: 0010:pskb_expand_head+0x1519/0x1530
[...]
[ 23.021249][T10006] Call Trace:
[ 23.021387][T10006] <TASK>
[ 23.021507][T10006] ? __pfx_pskb_expand_head+0x10/0x10
[ 23.021725][T10006] __bpf_skb_change_head+0x22a/0x520
[ 23.021939][T10006] bpf_skb_change_head+0x34/0x1b0
[ 23.022143][T10006] ___bpf_prog_run+0xf70/0xb670
[ 23.022342][T10006] __bpf_prog_run32+0xed/0x140
[...]
The problem is that in __bpf_skb_change_head() we need to reject a
negative head_room as otherwise this propagates all the way to the
pskb_expand_head() from skb_cow(). For example, if the BPF test infra
passes a skb with gso_skb:1 to the BPF helper with a negative head_room
of -22, then this gets passed into skb_cow(). __skb_cow() in this
example calculates a delta of -86 which gets aligned to -64, and then
triggers BUG_ON(nhead < 0). Thus, reject malformed negative input.
Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://patch.msgid.link/20251023125532.182262-1-daniel@iogearbox.net
The RFCOMM driver confuses the local and remote modem control signals,
which specifically means that the reported DTR and RTS state will
instead reflect the remote end (i.e. DSR and CTS).
This issue dates back to the original driver (and a follow-on update)
merged in 2002, which resulted in a non-standard implementation of
TIOCMSET that allowed controlling also the TS07.10 IC and DV signals by
mapping them to the RI and DCD input flags, while TIOCMGET failed to
return the actual state of DTR and RTS.
Note that the bogus control of input signals in tiocmset() is just
dead code as those flags will have been masked out by the tty layer
since 2003.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Periodic advertising enabled flag cannot be tracked by the enabled
flag since advertising and periodic advertising each can be
enabled/disabled separately from one another causing the states to be
inconsistent when for example an advertising set is disabled its
enabled flag is set to false which is then used for periodic which has
not being disabled.
Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes bis_cleanup not considering connections in BT_OPEN state
before attempting to remove the BIG causing the following error:
btproxy[20110]: < HCI Command: LE Terminate Broadcast Isochronous Group (0x08|0x006a) plen 2
BIG Handle: 0x01
Reason: Connection Terminated By Local Host (0x16)
> HCI Event: Command Status (0x0f) plen 4
LE Terminate Broadcast Isochronous Group (0x08|0x006a) ncmd 1
Status: Unknown Advertising Identifier (0x42)
Fixes: fa224d0c09 ("Bluetooth: ISO: Reassociate a socket with an active BIS")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Socket dst_type cannot be directly assigned to hci_conn->type since
there domain is different which may lead to the wrong address type being
used.
Fixes: 6a5ad251b7 ("Bluetooth: ISO: Fix possible circular locking dependency")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This reverts commit c9d84da18d. It
replaces in L2CAP calls to msecs_to_jiffies() to secs_to_jiffies()
and updates the constants accordingly. But the constants are also
used in LCAP Configure Request and L2CAP Configure Response which
expect values in milliseconds.
This may prevent correct usage of L2CAP channel.
To fix it, keep those constants in milliseconds and so revert this
change.
Fixes: c9d84da18d ("Bluetooth: L2CAP: convert timeouts to secs_to_jiffies()")
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
There is a BUG: KASAN: stack-out-of-bounds in set_mesh_sync due to
memcpy from badly declared on-stack flexible array.
Another crash is in set_mesh_complete() due to double list_del via
mgmt_pending_valid + mgmt_pending_remove.
Use DEFINE_FLEX to declare the flexible array right, and don't memcpy
outside bounds.
As mgmt_pending_valid removes the cmd from list, use mgmt_pending_free,
and also report status on error.
Fixes: 302a1f674c ("Bluetooth: MGMT: Fix possible UAFs")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes the state tracking of advertisement set/instance 0x00 which
is considered a legacy instance and is not tracked individually by
adv_instances list, previously it was assumed that hci_dev itself would
track it via HCI_LE_ADV but that is a global state not specifc to
instance 0x00, so to fix it a new flag is introduced that only tracks the
state of instance 0x00.
Fixes: 1488af7b8b ("Bluetooth: hci_sync: Fix hci_resume_advertising_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Socket dst_type cannot be directly assigned to hci_conn->type since
there domain is different which may lead to the wrong address type being
used.
Fixes: 6a5ad251b7 ("Bluetooth: ISO: Fix possible circular locking dependency")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_cmd_sync_dequeue_once() does lookup and then cancel
the entry under two separate lock sections. Meanwhile,
hci_cmd_sync_work() can also delete the same entry,
leading to double list_del() and "UAF".
Fix this by holding cmd_sync_work_lock across both
lookup and cancel, so that the entry cannot be removed
concurrently.
Fixes: 505ea2b295 ("Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queue")
Reported-by: Cen Zhang <zzzccc427@163.com>
Signed-off-by: Cen Zhang <zzzccc427@163.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Johannes Berg says:
====================
First set of fixes:
- brcmfmac: long-standing crash when used w/o P2P
- iwlwifi: fix for a use-after-free bug
- mac80211: key tailroom accounting bug could leave
allocation overhead and cause a warning
- ath11k: add a missing platform,
fix key flag operations
- bcma: skip devices disabled in OF/DT
- various (potential) memory leaks
* tag 'wireless-2025-10-23' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: nl80211: call kfree without a NULL check
wifi: mac80211: fix key tailroom accounting leak
wifi: brcmfmac: fix crash while sending Action Frames in standalone AP Mode
MAINTAINERS: wcn36xx: Add linux-wireless list
bcma: don't register devices disabled in OF
wifi: mac80211: reset FILS discovery and unsol probe resp intervals
wifi: iwlwifi: fix potential use after free in iwl_mld_remove_link()
wifi: ath11k: avoid bit operation on key flags
wifi: ath12k: free skb during idr cleanup callback
wifi: ath11k: Add missing platform IDs for quirk table
wifi: ath10k: Fix memory leak on unsupported WMI command
====================
Link: https://patch.msgid.link/20251023180604.626946-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from can. Slim pickings, I'm guessing people haven't
really started testing.
Current release - new code bugs:
- eth: mlx5e:
- psp: avoid 'accel' NULL pointer dereference
- skip PPHCR register query for FEC histogram if not supported
Previous releases - regressions:
- bonding: update the slave array for broadcast mode
- rtnetlink: re-allow deleting FDB entries in user namespace
- eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path
Previous releases - always broken:
- can: drop skb on xmit if device is in listen-only mode
- gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb()
- eth: mlx5e
- RX, fix generating skb from non-linear xdp_buff if program
trims frags
- make devcom init failures non-fatal, fix races with IPSec
Misc:
- some documentation formatting 'fixes'"
* tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5: Fix IPsec cleanup over MPV device
net/mlx5: Refactor devcom to return NULL on failure
net/mlx5e: Skip PPHCR register query if not supported by the device
net/mlx5: Add PPHCR to PCAM supported registers mask
virtio-net: zero unused hash fields
net: phy: micrel: always set shared->phydev for LAN8814
vsock: fix lock inversion in vsock_assign_transport()
ovpn: use datagram_poll_queue for socket readiness in TCP
espintcp: use datagram_poll_queue for socket readiness
net: datagram: introduce datagram_poll_queue for custom receive queues
net: bonding: fix possible peer notify event loss or dup issue
net: hsr: prevent creation of HSR device with slaves from another netns
sctp: avoid NULL dereference when chunk data buffer is missing
ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop
net: ravb: Ensure memory write completes before ringing TX doorbell
net: ravb: Enforce descriptor type ordering
net: hibmcge: select FIXED_PHY
net: dlink: use dev_kfree_skb_any instead of dev_kfree_skb
Documentation: networking: ax25: update the mailing list info.
net: gro_cells: fix lock imbalance in gro_cells_receive()
...
Syzbot reported a potential lock inversion deadlock between
vsock_register_mutex and sk_lock-AF_VSOCK when vsock_linger() is called.
The issue was introduced by commit 687aa0c558 ("vsock: Fix
transport_* TOCTOU") which added vsock_register_mutex locking in
vsock_assign_transport() around the transport->release() call, that can
call vsock_linger(). vsock_assign_transport() can be called with sk_lock
held. vsock_linger() calls sk_wait_event() that temporarily releases and
re-acquires sk_lock. During this window, if another thread hold
vsock_register_mutex while trying to acquire sk_lock, a circular
dependency is created.
Fix this by releasing vsock_register_mutex before calling
transport->release() and vsock_deassign_transport(). This is safe
because we don't need to hold vsock_register_mutex while releasing the
old transport, and we ensure the new transport won't disappear by
obtaining a module reference first via try_module_get().
Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
Fixes: 687aa0c558 ("vsock: Fix transport_* TOCTOU")
Cc: mhal@rbox.co
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251021121718.137668-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
espintcp uses a custom queue (ike_queue) to deliver packets to
userspace. The polling logic relies on datagram_poll, which checks
sk_receive_queue, which can lead to false readiness signals when that
queue contains non-userspace packets.
Switch espintcp_poll to use datagram_poll_queue with ike_queue, ensuring
poll only signals readiness when userspace data is actually available.
Fixes: e27cca96cd ("xfrm: add espintcp (RFC 8229)")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20251021100942.195010-3-ralf@mandelbit.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Some protocols using TCP encapsulation (e.g., espintcp, openvpn) deliver
userspace-bound packets through a custom skb queue rather than the
standard sk_receive_queue.
Introduce datagram_poll_queue that accepts an explicit receive queue,
and convert datagram_poll into a wrapper around datagram_poll_queue.
This allows protocols with custom skb queues to reuse the core polling
logic without relying on sk_receive_queue.
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Antonio Quartulli <antonio@openvpn.net>
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20251021100942.195010-2-ralf@mandelbit.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
HSR/PRP driver does not handle correctly having slaves/interlink devices
in a different net namespace. Currently, it is possible to create a HSR
link in a different net namespace than the slaves/interlink with the
following command:
ip link add hsr0 netns hsr-ns type hsr slave1 eth1 slave2 eth2
As there is no use-case on supporting this scenario, enforce that HSR
device link matches netns defined by IFLA_LINK_NETNSID.
The iproute2 command mentioned above will throw the following error:
Error: hsr: HSR slaves/interlink must be on the same net namespace than HSR link.
Fixes: f421436a59 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20251020135533.9373-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
chunk->skb pointer is dereferenced in the if-block where it's supposed
to be NULL only.
chunk->skb can only be NULL if chunk->head_skb is not. Check for frag_list
instead and do it just before replacing chunk->skb. We're sure that
otherwise chunk->skb is non-NULL because of outer if() condition.
Fixes: 90017accff ("sctp: Add GSO support")
Signed-off-by: Alexey Simakov <bigalex934@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://patch.msgid.link/20251021130034.6333-1-bigalex934@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot found that the local_unlock_nested_bh() call was
missing in some cases.
WARNING: possible recursive locking detected
syzkaller #0 Not tainted
--------------------------------------------
syz.2.329/7421 is trying to acquire lock:
ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:44 [inline]
ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: gro_cells_receive+0x404/0x790 net/core/gro_cells.c:30
but task is already holding lock:
ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:44 [inline]
ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: gro_cells_receive+0x404/0x790 net/core/gro_cells.c:30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock((&cell->bh_lock));
lock((&cell->bh_lock));
*** DEADLOCK ***
Given the introduction of @have_bh_lock variable, it seems the author
intent was to have the local_unlock_nested_bh() after the @unlock label.
Fixes: 25718fdcbd ("net: gro_cells: Use nested-BH locking for gro_cell")
Reported-by: syzbot+f9651b9a8212e1c8906f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68f65eb9.a70a0220.205af.0034.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20251020161114.1891141-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The special C-flag case expects the ADD_ADDR to be received when
switching to 'fully-established'. But for various reasons, the ADD_ADDR
could be sent after the "4th ACK", and the special case doesn't work.
On NIPA, the new test validating this special case for the C-flag failed
a few times, e.g.
102 default limits, server deny join id 0
syn rx [FAIL] got 0 JOIN[s] syn rx expected 2
Server ns stats
(...)
MPTcpExtAddAddrTx 1
MPTcpExtEchoAdd 1
Client ns stats
(...)
MPTcpExtAddAddr 1
MPTcpExtEchoAddTx 1
synack rx [FAIL] got 0 JOIN[s] synack rx expected 2
ack rx [FAIL] got 0 JOIN[s] ack rx expected 2
join Rx [FAIL] see above
syn tx [FAIL] got 0 JOIN[s] syn tx expected 2
join Tx [FAIL] see above
I had a suspicion about what the issue could be: the ADD_ADDR might have
been received after the switch to the 'fully-established' state. The
issue was not easy to reproduce. The packet capture shown that the
ADD_ADDR can indeed be sent with a delay, and the client would not try
to establish subflows to it as expected.
A simple fix is not to mark the endpoints as 'used' in the C-flag case,
when looking at creating subflows to the remote initial IP address and
port. In this case, there is no need to try.
Note: newly added fullmesh endpoints will still continue to be used as
expected, thanks to the conditions behind mptcp_pm_add_addr_c_flag_case.
Fixes: 4b1ff850e0 ("mptcp: pm: in-kernel: usable client side with C-flag")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-1-8207030cb0e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Coverity is unhappy because we may leak old_radio_rts_threshold. Since
this pointer is only valid in the context of the function and kfree is
NULL pointer safe, don't check and just call kfree.
Note that somehow, we were checking old_rts_threshold to free
old_radio_rts_threshold which is a bit odd.
Fixes: 264637941c ("wifi: cfg80211: Add Support to Set RTS Threshold for each Radio")
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20251020075745.44168-1-emmanuel.grumbach@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When ieee80211_stop_ap() deletes the FILS discovery and unsolicited
broadcast probe response templates, the associated interval values
are not reset. This can lead to drivers subsequently operating with
the non-zero values, leading to unexpected behavior.
Trigger repeated retrieval attempts of the FILS discovery template in
ath12k, resulting in excessive log messages such as:
mac vdev 0 failed to retrieve FILS discovery template
mac vdev 4 failed to retrieve FILS discovery template
Fix this by resetting the intervals in ieee80211_stop_ap() to ensure
proper cleanup of FILS discovery and unsolicited broadcast probe
response templates.
Fixes: 295b02c4be ("mac80211: Add FILS discovery support")
Fixes: 632189a018 ("mac80211: Unsolicited broadcast probe response support")
Signed-off-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com>
Signed-off-by: Aaradhana Sahu <aaradhana.sahu@oss.qualcomm.com>
Link: https://patch.msgid.link/20250924130014.2575533-1-aaradhana.sahu@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull bpf fixes from Alexei Starovoitov:
- Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak
imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov)
- Make selftests/bpf arg_parsing.c more robust to errors (Andrii
Nakryiko)
- Fix redefinition of 'off' as different kind of symbol when I40E
driver is builtin (Brahmajit Das)
- Do not disable preemption in bpf_test_run (Sahil Chandna)
- Fix memory leak in __lookup_instance error path (Shardul Bankar)
- Ensure test data is flushed to disk before reading it (Xing Guo)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Fix redefinition of 'off' as different kind of symbol
bpf: Do not disable preemption in bpf_test_run().
bpf: Fix memory leak in __lookup_instance error path
selftests: arg_parsing: Ensure data is flushed to disk before reading.
bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
selftests/bpf: make arg_parsing.c more robust to crashes
bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
Creating FDB entries is possible from a non-initial user namespace when
having CAP_NET_ADMIN, yet, when deleting FDB entries, processes receive
an EPERM because the capability is always checked against the initial
user namespace. This restricts the FDB management from unprivileged
containers.
Drop the netlink_capable check in rtnl_fdb_del as it was originally
dropped in c5c351088a and reintroduced in 1690be63a2 without
intention.
This patch was tested using a container on GyroidOS, where it was
possible to delete FDB entries from an unprivileged user namespace and
private network namespace.
Fixes: 1690be63a2 ("bridge: Add vlan support to static neighbors")
Reviewed-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Tested-by: Harshal Gohel <hg@simonwunderlich.de>
Signed-off-by: Johannes Wiesböck <johannes.wiesboeck@aisec.fraunhofer.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251015201548.319871-1-johannes.wiesboeck@aisec.fraunhofer.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some network drivers assume this field is zero after napi_get_frags().
We must clear it in napi_reuse_skb() otherwise the following can happen:
1) A packet is received, and skb_shinfo(skb)->hwtstamps is populated
because a bit in the receive descriptor announced hwtstamp
availability for this packet.
2) Packet is given to gro layer via napi_gro_frags().
3) Packet is merged to a prior one held in GRO queues.
4) skb is saved after some cleanup in napi->skb via a call
to napi_reuse_skb().
5) Next packet is received 10 seconds later, gets the recycled skb
from napi_get_frags().
6) The receive descriptor does not announce hwtstamp availability.
Driver does not clear shinfo->hwtstamps.
7) We have in shinfo->hwtstamps an old timestamp.
Fixes: ac45f602ee ("net: infrastructure for hardware time stamping")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251015063221.4171986-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN
Current release - regressions:
- udp: do not use skb_release_head_state() before
skb_attempt_defer_free()
- gro_cells: use nested-BH locking for gro_cell
- dpll: zl3073x: increase maximum size of flash utility
Previous releases - regressions:
- core: fix lockdep splat on device unregister
- tcp: fix tcp_tso_should_defer() vs large RTT
- tls:
- don't rely on tx_work during send()
- wait for pending async decryptions if tls_strp_msg_hold fails
- can: j1939: add missing calls in NETDEV_UNREGISTER notification
handler
- eth: lan78xx: fix lost EEPROM write timeout in
lan78xx_write_raw_eeprom
Previous releases - always broken:
- ip6_tunnel: prevent perpetual tunnel growth
- dpll: zl3073x: handle missing or corrupted flash configuration
- can: m_can: fix pm_runtime and CAN state handling
- eth:
- ixgbe: fix too early devlink_free() in ixgbe_remove()
- ixgbevf: fix mailbox API compatibility
- gve: Check valid ts bit on RX descriptor before hw timestamping
- idpf: cleanup remaining SKBs in PTP flows
- r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"
* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
udp: do not use skb_release_head_state() before skb_attempt_defer_free()
net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
netdevsim: set the carrier when the device goes up
selftests: tls: add test for short splice due to full skmsg
selftests: net: tls: add tests for cmsg vs MSG_MORE
tls: don't rely on tx_work during send()
tls: wait for pending async decryptions if tls_strp_msg_hold fails
tls: always set record_type in tls_process_cmsg
tls: wait for async encrypt in case of error during latter iterations of sendmsg
tls: trim encrypted message to match the plaintext on short splice
tg3: prevent use of uninitialized remote_adv and local_adv variables
MAINTAINERS: new entry for IPv6 IOAM
gve: Check valid ts bit on RX descriptor before hw timestamping
net: core: fix lockdep splat on device unregister
MAINTAINERS: add myself as maintainer for b53
selftests: net: check jq command is supported
net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
tcp: fix tcp_tso_should_defer() vs large RTT
r8152: add error handling in rtl8152_driver_init
usbnet: Fix using smp_processor_id() in preemptible code warnings
...
Marc Kleine-Budde says:
====================
pull-request: can 2025-10-14
The first 2 paches are by Celeste Liu and target the gS_usb driver.
The first patch remove the limitation to 3 CAN interface per USB
device. The second patch adds the missing population of
net_device->dev_port.
The next 4 patches are by me and fix the m_can driver. They add a
missing pm_runtime_disable(), fix the CAN state transition back to
Error Active and fix the state after ifup and suspend/resume.
Another patch by me targets the m_can driver, too and replaces Dong
Aisheng's old email address.
The next 2 patches are by Vincent Mailhol and update the CAN
networking Documentation.
Tetsuo Handa contributes the last patch that add missing cleanup calls
in the NETDEV_UNREGISTER notification handler.
* tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
can: add Transmitter Delay Compensation (TDC) documentation
can: remove false statement about 1:1 mapping between DLC and length
can: m_can: replace Dong Aisheng's old email address
can: m_can: fix CAN state in system PM
can: m_can: m_can_chip_config(): bring up interface in correct state
can: m_can: m_can_handle_state_errors(): fix CAN state transition to Error Active
can: m_can: m_can_plat_remove(): add missing pm_runtime_disable()
can: gs_usb: gs_make_candev(): populate net_device->dev_port
can: gs_usb: increase max interface to U8_MAX
====================
Link: https://patch.msgid.link/20251014122140.990472-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With async crypto, we rely on tx_work to actually transmit records
once encryption completes. But while send() is running, both the
tx_lock and socket lock are held, so tx_work_handler cannot process
the queue of encrypted records, and simply reschedules itself. During
a large send(), this could last a long time, and use a lot of memory.
Transmit any pending encrypted records before restarting the main
loop of tls_sw_sendmsg_locked.
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/8396631478f70454b44afb98352237d33f48d34d.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Async decryption calls tls_strp_msg_hold to create a clone of the
input skb to hold references to the memory it uses. If we fail to
allocate that clone, proceeding with async decryption can lead to
various issues (UAF on the skb, writing into userspace memory after
the recv() call has returned).
In this case, wait for all pending decryption requests.
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/b9fe61dcc07dab15da9b35cf4c7d86382a98caf2.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When userspace wants to send a non-DATA record (via the
TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a
previous MSG_MORE send() as a separate DATA record. If that DATA record
is encrypted asynchronously, tls_handle_open_record will return
-EINPROGRESS. This is currently treated as an error by
tls_process_cmsg, and it will skip setting record_type to the correct
value, but the caller (tls_sw_sendmsg_locked) handles that return
value correctly and proceeds with sending the new message with an
incorrect record_type (DATA instead of whatever was requested in the
cmsg).
Always set record_type before handling the open record. If
tls_handle_open_record returns an error, record_type will be
ignored. If it succeeds, whether with synchronous crypto (returning 0)
or asynchronous (returning -EINPROGRESS), the caller will proceed
correctly.
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/0457252e578a10a94e40c72ba6288b3a64f31662.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we hit an error during the main loop of tls_sw_sendmsg_locked (eg
failed allocation), we jump to send_end and immediately
return. Previous iterations may have queued async encryption requests
that are still pending. We should wait for those before returning, as
we could otherwise be reading from memory that userspace believes
we're not using anymore, which would be a sort of use-after-free.
This is similar to what tls_sw_recvmsg already does: failures during
the main loop jump to the "wait for async" code, not straight to the
unlock/return.
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/c793efe9673b87f808d84fdefc0f732217030c52.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>