Fix a use-after-free which happens due to enslave failure after the new
slave has been added to the array. Since the new slave can be used for Tx
immediately, we can use it after it has been freed by the enslave error
cleanup path which frees the allocated slave memory. Slave update array is
supposed to be called last when further enslave failures are not expected.
Move it after xdp setup to avoid any problems.
It is very easy to reproduce the problem with a simple xdp_pass prog:
ip l add bond1 type bond mode balance-xor
ip l set bond1 up
ip l set dev bond1 xdp object xdp_pass.o sec xdp_pass
ip l add dumdum type dummy
Then run in parallel:
while :; do ip l set dumdum master bond1 1>/dev/null 2>&1; done;
mausezahn bond1 -a own -b rand -A rand -B 1.1.1.1 -c 0 -t tcp "dp=1-1023, flags=syn"
The crash happens almost immediately:
[ 605.602850] Oops: general protection fault, probably for non-canonical address 0xe0e6fc2460000137: 0000 [#1] SMP KASAN NOPTI
[ 605.602916] KASAN: maybe wild-memory-access in range [0x07380123000009b8-0x07380123000009bf]
[ 605.602946] CPU: 0 UID: 0 PID: 2445 Comm: mausezahn Kdump: loaded Tainted: G B 6.19.0-rc6+ #21 PREEMPT(voluntary)
[ 605.602979] Tainted: [B]=BAD_PAGE
[ 605.602998] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 605.603032] RIP: 0010:netdev_core_pick_tx+0xcd/0x210
[ 605.603063] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 3e 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 6b 08 49 8d 7d 30 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 25 01 00 00 49 8b 45 30 4c 89 e2 48 89 ee 48 89
[ 605.603111] RSP: 0018:ffff88817b9af348 EFLAGS: 00010213
[ 605.603145] RAX: dffffc0000000000 RBX: ffff88817d28b420 RCX: 0000000000000000
[ 605.603172] RDX: 00e7002460000137 RSI: 0000000000000008 RDI: 07380123000009be
[ 605.603199] RBP: ffff88817b541a00 R08: 0000000000000001 R09: fffffbfff3ed8c0c
[ 605.603226] R10: ffffffff9f6c6067 R11: 0000000000000001 R12: 0000000000000000
[ 605.603253] R13: 073801230000098e R14: ffff88817d28b448 R15: ffff88817b541a84
[ 605.603286] FS: 00007f6570ef67c0(0000) GS:ffff888221dfa000(0000) knlGS:0000000000000000
[ 605.603319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 605.603343] CR2: 00007f65712fae40 CR3: 000000011371b000 CR4: 0000000000350ef0
[ 605.603373] Call Trace:
[ 605.603392] <TASK>
[ 605.603410] __dev_queue_xmit+0x448/0x32a0
[ 605.603434] ? __pfx_vprintk_emit+0x10/0x10
[ 605.603461] ? __pfx_vprintk_emit+0x10/0x10
[ 605.603484] ? __pfx___dev_queue_xmit+0x10/0x10
[ 605.603507] ? bond_start_xmit+0xbfb/0xc20 [bonding]
[ 605.603546] ? _printk+0xcb/0x100
[ 605.603566] ? __pfx__printk+0x10/0x10
[ 605.603589] ? bond_start_xmit+0xbfb/0xc20 [bonding]
[ 605.603627] ? add_taint+0x5e/0x70
[ 605.603648] ? add_taint+0x2a/0x70
[ 605.603670] ? end_report.cold+0x51/0x75
[ 605.603693] ? bond_start_xmit+0xbfb/0xc20 [bonding]
[ 605.603731] bond_start_xmit+0x623/0xc20 [bonding]
Fixes: 9e2ee5c7e7 ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reported-by: Chen Zhen <chenzhen126@huawei.com>
Closes: https://lore.kernel.org/netdev/fae17c21-4940-5605-85b2-1d5e17342358@huawei.com/
CC: Jussi Maki <joamaki@gmail.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260123120659.571187-1-razor@blackwall.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Some PHYs stop the refclk for power saving, usually while link down.
This causes reading stats to time out.
Therefore, in emac_stats_update(), also don't update and reschedule if
!netif_carrier_ok(). But that means we could be missing later updates if
the link comes back up, so also reschedule when link up is detected in
emac_adjust_link().
While we're at it, improve the comments and error message prints around
this to reflect the better understanding of how this could happen.
Hopefully if this happens again on new hardware, these comments will
direct towards a solution.
Closes: https://lore.kernel.org/r/20260119141620.1318102-1-amadeus@jmu.edu.cn/
Fixes: bfec6d7f20 ("net: spacemit: Add K1 Ethernet MAC")
Co-developed-by: Chukun Pan <amadeus@jmu.edu.cn>
Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20260123-k1-ethernet-clarify-stat-timeout-v3-1-93b9df627e87@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rocker_world_port_pre_init(), rocker_port->wpriv is allocated with
kzalloc(wops->port_priv_size, GFP_KERNEL). However, in
rocker_world_port_post_fini(), the memory is only freed when
wops->port_post_fini callback is set:
if (!wops->port_post_fini)
return;
wops->port_post_fini(rocker_port);
kfree(rocker_port->wpriv);
Since rocker_ofdpa_ops does not implement port_post_fini callback
(it is NULL), the wpriv memory allocated for each port is never freed
when ports are removed. This leads to a memory leak of
sizeof(struct ofdpa_port) bytes per port on every device removal.
Fix this by always calling kfree(rocker_port->wpriv) regardless of
whether the port_post_fini callback exists.
Fixes: e420114eef ("rocker: introduce worlds infrastructure")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260123211030.2109-2-qikeyu2017@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function mlx5_esw_vport_vhca_id() is declared to return bool,
but returns -EOPNOTSUPP (-45), which is an int error code. This
causes a signedness bug as reported by smatch.
This patch fixes this smatch report:
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:981 mlx5_esw_vport_vhca_id()
warn: signedness bug returning '(-45)'
Fixes: 1baf304265 ("net/mlx5: E-Switch, Set/Query hca cap via vhca id")
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Zeng Chi <zengchi@kylinos.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260123085749.1401969-1-zeng_chi911@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When receiving data in the DPMAIF RX path,
the t7xx_dpmaif_set_frag_to_skb() function adds
page fragments to an skb without checking if the number of
fragments has exceeded MAX_SKB_FRAGS. This could lead to a buffer overflow
in skb_shinfo(skb)->frags[] array, corrupting adjacent memory and
potentially causing kernel crashes or other undefined behavior.
This issue was identified through static code analysis by comparing with a
similar vulnerability fixed in the mt76 driver commit b102f0c522 ("mt76:
fix array overflow on receiving too many fragments for a packet").
The vulnerability could be triggered if the modem firmware sends packets
with excessive fragments. While under normal protocol conditions (MTU 3080
bytes, BAT buffer 3584 bytes),
a single packet should not require additional
fragments, the kernel should not blindly trust firmware behavior.
Malicious, buggy, or compromised firmware could potentially craft packets
with more fragments than the kernel expects.
Fix this by adding a bounds check before calling skb_add_rx_frag() to
ensure nr_frags does not exceed MAX_SKB_FRAGS.
The check must be performed before unmapping to avoid a page leak
and double DMA unmap during device teardown.
Fixes: d642b012df ("net: wwan: t7xx: Add data path interface")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Link: https://patch.msgid.link/20260122170401.1986-2-qikeyu2017@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When replying to a ICMPv6 echo request that comes from localhost address
the right output ifindex is 1 (lo) and not rt6i_idev dev index. Use the
skb device ifindex instead. This fixes pinging to a local address from
localhost source address.
$ ping6 -I ::1 2001:1:1::2 -c 3
PING 2001:1:1::2 (2001:1:1::2) from ::1 : 56 data bytes
64 bytes from 2001:1:1::2: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 2001:1:1::2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 2001:1:1::2: icmp_seq=3 ttl=64 time=0.122 ms
2001:1:1::2 ping statistics
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.037/0.076/0.122/0.035 ms
Fixes: 1b70d792cf ("ipv6: Use rt6i_idev index for echo replies to a local address")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260121194409.6749-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In mvpp2_ethtool_cls_rule_ins(), the ethtool_rule is allocated by
ethtool_rx_flow_rule_create(). If the subsequent conversion to flow
type fails, the function jumps to the clean_rule label.
However, the clean_rule label only frees efs, skipping the cleanup
of ethtool_rule, which leads to a memory leak.
Fix this by jumping to the clean_eth_rule label, which properly calls
ethtool_rx_flow_rule_destroy() before freeing efs.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: f4f1ba1819 ("net: mvpp2: cls: Report an error for unsupported flow types")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260123065716.2248324-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2026-01-23
The first patch is by Zilin Guan and fixes a memory leak in the error
path of the at91_can driver's probe function.
The last patch is by me and fixes yet another error in the gs_usb's
gs_usb_receive_bulk_callback() function.
* tag 'linux-can-fixes-for-6.19-20260123' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: gs_usb: gs_usb_receive_bulk_callback(): fix error message
can: at91_can: Fix memory leak in at91_can_probe()
====================
Link: https://patch.msgid.link/20260123173241.1026226-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In octep_device_setup(), if octep_ctrl_net_init() fails, the function
returns directly without unmapping the mapped resources and freeing the
allocated configuration memory.
Fix this by jumping to the unsupported_dev label, which performs the
necessary cleanup. This aligns with the error handling logic of other
paths in this function.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: 577f0d1b1c ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260121130551.3717090-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_uart: fix null-ptr-deref in hci_uart_write_work
- MGMT: Fix memory leak in set_ssp_complete
* tag 'for-net-2026-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix memory leak in set_ssp_complete
Bluetooth: hci_uart: fix null-ptr-deref in hci_uart_write_work
====================
Link: https://patch.msgid.link/20260122200751.2950279-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In esw_acl_ingress_lgcy_setup(), if esw_acl_table_create() fails,
the function returns directly without releasing the previously
created counter, leading to a memory leak.
Fix this by jumping to the out label instead of returning directly,
which aligns with the error handling logic of other paths in this
function.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260120134640.2717808-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix memory leak in set_ssp_complete() where mgmt_pending_cmd structures
are not freed after being removed from the pending list.
Commit 302a1f674c ("Bluetooth: MGMT: Fix possible UAFs") replaced
mgmt_pending_foreach() calls with individual command handling but missed
adding mgmt_pending_free() calls in both error and success paths of
set_ssp_complete(). Other completion functions like set_le_complete()
were fixed correctly in the same commit.
This causes a memory leak of the mgmt_pending_cmd structure and its
associated parameter data for each SSP command that completes.
Add the missing mgmt_pending_free(cmd) calls in both code paths to fix
the memory leak. Also fix the same issue in set_advertising_complete().
Fixes: 302a1f674c ("Bluetooth: MGMT: Fix possible UAFs")
Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_uart_set_proto() sets HCI_UART_PROTO_INIT before calling
hci_uart_register_dev(), which calls proto->open() to initialize
hu->priv. However, if a TTY write wakeup occurs during this window,
hci_uart_tx_wakeup() may schedule write_work before hu->priv is
initialized, leading to a NULL pointer dereference in
hci_uart_write_work() when proto->dequeue() accesses hu->priv.
The race condition is:
CPU0 CPU1
---- ----
hci_uart_set_proto()
set_bit(HCI_UART_PROTO_INIT)
hci_uart_register_dev()
tty write wakeup
hci_uart_tty_wakeup()
hci_uart_tx_wakeup()
schedule_work(&hu->write_work)
proto->open(hu)
// initializes hu->priv
hci_uart_write_work()
hci_uart_dequeue()
proto->dequeue(hu)
// accesses hu->priv (NULL!)
Fix this by moving set_bit(HCI_UART_PROTO_INIT) after proto->open()
succeeds, ensuring hu->priv is initialized before any work can be
scheduled.
Fixes: 5df5dafc17 ("Bluetooth: hci_uart: Fix another race during initialization")
Link: https://lore.kernel.org/linux-bluetooth/6969764f.170a0220.2b9fc4.35a7@mx.google.com/
Signed-off-by: Jia-Hong Su <s11242586@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and wireless.
Pretty big, but hard to make up any cohesive story that would explain
it, a random collection of fixes. The two reverts of bad patches from
this release here feel like stuff that'd normally show up by rc5 or
rc6. Perhaps obvious thing to say, given the holiday timing.
That said, no active investigations / regressions. Let's see what the
next week brings.
Current release - fix to a fix:
- can: alloc_candev_mqs(): add missing default CAN capabilities
Current release - regressions:
- usbnet: fix crash due to missing BQL accounting after resume
- Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...
Previous releases - regressions:
- Revert "nfc/nci: Add the inconsistency check between the input ...
Previous releases - always broken:
- number of driver fixes for incorrect use of seqlocks on stats
- rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue
when MSG_PEEK was set
- ipvlan: make the addrs_lock be per port avoid races in the port
hash table
- sched: enforce that teql can only be used as root qdisc
- virtio: coalesce only linear skb
- wifi: ath12k: fix dead lock while flushing management frames
- eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue"
* tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
Octeontx2-af: Add proper checks for fwdata
dpll: Prevent duplicate registrations
net/sched: act_ife: avoid possible NULL deref
hinic3: Fix netif_queue_set_napi queue_index input parameter error
vsock/test: add stream TX credit bounds test
vsock/virtio: cap TX credit to local buffer size
vsock/test: fix seqpacket message bounds test
vsock/virtio: fix potential underflow in virtio_transport_get_credit()
net: fec: account for VLAN header in frame length calculations
net: openvswitch: fix data race in ovs_vport_get_upcall_stats
octeontx2-af: Fix error handling
net: pcs: pcs-mtk-lynxi: report in-band capability for 2500Base-X
rxrpc: Fix data-race warning and potential load/store tearing
net: dsa: fix off-by-one in maximum bridge ID determination
net: bcmasp: Fix network filter wake for asp-3.0
bonding: provide a net pointer to __skb_flow_dissect()
selftests: net: amt: wait longer for connection before sending packets
be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning"
netrom: fix double-free in nr_route_frame()
...
Pull LED fix from Lee Jones:
- Fix race condition leading to null pointer dereference on ThinkPad
* tag 'leds-fixes-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds:
leds: led-class: Only Add LED to leds_list when it is fully ready
Modify the internal registration helpers dpll_xa_ref_{dpll,pin}_add()
to reject duplicate registration attempts.
Previously, if a caller attempted to register the same pin multiple
times (with the same ops, priv, and cookie) on the same device, the core
silently increments the reference count and return success. This behavior
is incorrect because if the caller makes these duplicate registrations
then for the first one dpll_pin_registration is allocated and for others
the associated dpll_pin_ref.refcount is incremented. During the first
unregistration the associated dpll_pin_registration is freed and for
others WARN is fired.
Fix this by updating the logic to return `-EEXIST` if a matching
registration is found to enforce a strict "register once" policy.
Fixes: 9431063ad3 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260121130012.112606-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In at91_can_probe(), the dev structure is allocated via alloc_candev().
However, if the subsequent call to devm_phy_optional_get() fails, the
code jumps directly to exit_iounmap, missing the call to free_candev().
This results in a memory leak of the allocated net_device structure.
Fix this by jumping to the exit_free label instead, which ensures that
free_candev() is called to properly release the memory.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: 3ecc09856a ("can: at91_can: add CAN transceiver support")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Link: https://patch.msgid.link/20260122114128.643752-1-zilin@seu.edu.cn
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Johannes Berg says:
====================
Another set of updates:
- various small fixes for ath10k/ath12k/mwifiex/rsi
- cfg80211 fix for HE bitrate overflow
- mac80211 fixes
- S1G beacon handling in scan
- skb tailroom handling for HW encryption
- CSA fix for multi-link
- handling of disabled links during association
* tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: ignore link disabled flag from userspace
wifi: mac80211: apply advertised TTLM from association response
wifi: mac80211: parse all TTLM entries
wifi: mac80211: don't increment crypto_tx_tailroom_needed_cnt twice
wifi: mac80211: don't perform DA check on S1G beacon
wifi: ath12k: Fix wrong P2P device link id issue
wifi: ath12k: fix dead lock while flushing management frames
wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel
wifi: ath12k: cancel scan only on active scan vdev
wifi: mwifiex: Fix a loop in mwifiex_update_ampdu_rxwinsize()
wifi: mac80211: correctly check if CSA is active
wifi: cfg80211: Fix bitrate calculation overflow for HE rates
wifi: rsi: Fix memory corruption due to not set vif driver data size
wifi: ath12k: don't force radio frequency check in freq_to_idx()
wifi: ath12k: fix dma_free_coherent() pointer
wifi: ath10k: fix dma_free_coherent() pointer
====================
Link: https://patch.msgid.link/20260122110248.15450-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefano Garzarella says:
====================
vsock/virtio: fix TX credit handling
The original series was posted by Melbin K Mathew <mlbnkm1@gmail.com> till v4.
Since it's a real issue and the original author seems busy, I'm sending
the new version fixing my comments but keeping the authorship (and restoring
mine on patch 2 as reported on v4).
v5: https://lore.kernel.org/netdev/20260116201517.273302-1-sgarzare@redhat.com/
v4: https://lore.kernel.org/netdev/20251217181206.3681159-1-mlbnkm1@gmail.com/
From Melbin K Mathew <mlbnkm1@gmail.com>:
This series fixes TX credit handling in virtio-vsock:
Patch 1: Fix potential underflow in get_credit() using s64 arithmetic
Patch 2: Fix vsock_test seqpacket bounds test
Patch 3: Cap TX credit to local buffer size (security hardening)
Patch 4: Add stream TX credit bounds regression test
The core issue is that a malicious guest can advertise a huge buffer
size via SO_VM_SOCKETS_BUFFER_SIZE, causing the host to allocate
excessive sk_buff memory when sending data to that guest.
On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
32 guest vsock connections advertising 2 GiB each and reading slowly
drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB; the system only
recovered after killing the QEMU process.
With this series applied, the same PoC shows only ~35 MiB increase in
Slab/SUnreclaim, no host OOM, and the guest remains responsive.
====================
Link: https://patch.msgid.link/20260121093628.9941-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a regression test for the TX credit bounds fix. The test verifies
that a sender with a small local buffer size cannot queue excessive
data even when the peer advertises a large receive buffer.
The client:
- Sets a small buffer size (64 KiB)
- Connects to server (which advertises 2 MiB buffer)
- Sends in non-blocking mode until EAGAIN
- Verifies total queued data is bounded
This guards against the original vulnerability where a remote peer
could cause unbounded kernel memory allocation by advertising a large
buffer and reading slowly.
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: use sock_buf_size to check the bytes sent + small fixes]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-5-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The virtio transports derives its TX credit directly from peer_buf_alloc,
which is set from the remote endpoint's SO_VM_SOCKETS_BUFFER_SIZE value.
On the host side this means that the amount of data we are willing to
queue for a connection is scaled by a guest-chosen buffer size, rather
than the host's own vsock configuration. A malicious guest can advertise
a large buffer and read slowly, causing the host to allocate a
correspondingly large amount of sk_buff memory.
The same thing would happen in the guest with a malicious host, since
virtio transports share the same code base.
Introduce a small helper, virtio_transport_tx_buf_size(), that
returns min(peer_buf_alloc, buf_alloc), and use it wherever we consume
peer_buf_alloc.
This ensures the effective TX window is bounded by both the peer's
advertised buffer and our own buf_alloc (already clamped to
buffer_max_size via SO_VM_SOCKETS_BUFFER_MAX_SIZE), so a remote peer
cannot force the other to queue more data than allowed by its own
vsock settings.
On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
32 guest vsock connections advertising 2 GiB each and reading slowly
drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB; the system only
recovered after killing the QEMU process. That said, if QEMU memory is
limited with cgroups, the maximum memory used will be limited.
With this patch applied:
Before:
MemFree: ~61.6 GiB
Slab: ~142 MiB
SUnreclaim: ~117 MiB
After 32 high-credit connections:
MemFree: ~61.5 GiB
Slab: ~178 MiB
SUnreclaim: ~152 MiB
Only ~35 MiB increase in Slab/SUnreclaim, no host OOM, and the guest
remains responsive.
Compatibility with non-virtio transports:
- VMCI uses the AF_VSOCK buffer knobs to size its queue pairs per
socket based on the local vsk->buffer_* values; the remote side
cannot enlarge those queues beyond what the local endpoint
configured.
- Hyper-V's vsock transport uses fixed-size VMBus ring buffers and
an MTU bound; there is no peer-controlled credit field comparable
to peer_buf_alloc, and the remote endpoint cannot drive in-flight
kernel memory above those ring sizes.
- The loopback path reuses virtio_transport_common.c, so it
naturally follows the same semantics as the virtio transport.
This change is limited to virtio_transport_common.c and thus affects
virtio-vsock, vhost-vsock, and loopback, bringing them in line with the
"remote window intersected with local policy" behaviour that VMCI and
Hyper-V already effectively have.
Fixes: 06a8fc7836 ("VSOCK: Introduce virtio_vsock_common.ko")
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: small adjustments after changing the previous patch]
[Stefano: tweak the commit message]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-4-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test requires the sender (client) to send all messages before waking
up the receiver (server).
Since virtio-vsock had a bug and did not respect the size of the TX
buffer, this test worked, but now that we are going to fix the bug, the
test hangs because the sender would fill the TX buffer before waking up
the receiver.
Set the buffer size in the sender (client) as well, as we already do for
the receiver (server).
Fixes: 5c338112e4 ("test/vsock: rework message bounds test")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-3-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The credit calculation in virtio_transport_get_credit() uses unsigned
arithmetic:
ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
If the peer shrinks its advertised buffer (peer_buf_alloc) while bytes
are in flight, the subtraction can underflow and produce a large
positive value, potentially allowing more data to be queued than the
peer can handle.
Reuse virtio_transport_has_space() which already handles this case and
add a comment to make it clear why we are doing that.
Fixes: 06a8fc7836 ("VSOCK: Introduce virtio_vsock_common.ko")
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: use virtio_transport_has_space() instead of duplicating the code]
[Stefano: tweak the commit message]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-2-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The MAX_FL (maximum frame length) and related calculations used ETH_HLEN,
which does not account for the 4-byte VLAN tag in tagged frames. This
caused the hardware to reject valid VLAN frames as oversized, resulting
in RX errors and dropped packets.
Use VLAN_ETH_HLEN instead of ETH_HLEN in the MAX_FL register setup,
cut-through mode threshold, buffer allocation, and max_mtu calculation.
Cc: stable@kernel.org # v6.18+
Fixes: 62b5bb7be7 ("net: fec: update MAX_FL based on the current MTU")
Fixes: d466c16026 ("net: fec: enable the Jumbo frame support for i.MX8QM")
Fixes: 59e9bf037d ("net: fec: add change_mtu to support dynamic buffer allocation")
Fixes: ec2a1681ed ("net: fec: use a member variable for maximum buffer size")
Signed-off-by: Clemens Gruber <mail@clemensgruber.at>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260121083751.66997-1-mail@clemensgruber.at
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull hyperv fixes from Wei Liu:
- Fix ARM64 port of the MSHV driver (Anirudh Rayabharam)
- Fix huge page handling in the MSHV driver (Stanislav Kinsburskii)
- Minor fixes to driver code (Julia Lawall, Michael Kelley)
* tag 'hyperv-fixes-signed-20260121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: handle gpa intercepts for arm64
mshv: add definitions for arm64 gpa intercepts
mshv: Add __user attribute to argument passed to access_ok()
mshv: Store the result of vfs_poll in a variable of type __poll_t
mshv: Align huge page stride with guest mapping
Drivers: hv: Always do Hyper-V panic notification in hv_kmsg_dump()
Drivers: hv: vmbus: fix typo in function name reference
Pull perf-tools fix from Namhyung Kim:
"A minor fix for error handling in the event parser"
* tag 'perf-tools-fixes-for-v6.19-2026-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf parse-events: Fix evsel allocation failure
Fix the following:
BUG: KCSAN: data-race in rxrpc_peer_keepalive_worker / rxrpc_send_data_packet
which is reporting an issue with the reads and writes to ->last_tx_at in:
conn->peer->last_tx_at = ktime_get_seconds();
and:
keepalive_at = peer->last_tx_at + RXRPC_KEEPALIVE_TIME;
The lockless accesses to these to values aren't actually a problem as the
read only needs an approximate time of last transmission for the purposes
of deciding whether or not the transmission of a keepalive packet is
warranted yet.
Also, as ->last_tx_at is a 64-bit value, tearing can occur on a 32-bit
arch.
Fix both of these by switching to an unsigned int for ->last_tx_at and only
storing the LSW of the time64_t. It can then be reconstructed at need
provided no more than 68 years has elapsed since the last transmission.
Fixes: ace45bec6d ("rxrpc: Fix firewall route keepalive")
Reported-by: syzbot+6182afad5045e6703b3d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/695e7cfb.050a0220.1c677c.036b.GAE@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Link: https://patch.msgid.link/1107124.1768903985@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-20 (ice, idpf)
For ice:
Cody Haas breaks dependency of needing both RSS key and LUT for
ice_get_rxfh() as ethtool ioctls do not always supply both.
Paul fixes issues related to devlink reload; adding missing deinit HW
call and moving hwmon exit function to the proper call chain.
For idpf:
Mina Almasry moves a register read call into the time sandwich to ensure
the register is properly flushed.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: read lower clock bits inside the time sandwich
ice: fix devlink reload call trace
ice: add missing ice_deinit_hw() in devlink reinit path
ice: Fix persistent failure in ice_get_rxfh
====================
Link: https://patch.msgid.link/20260120224430.410377-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prior to the blamed commit, the bridge_num range was from
0 to ds->max_num_bridges - 1. After the commit, it is from
1 to ds->max_num_bridges.
So this check:
if (bridge_num >= max)
return 0;
must be updated to:
if (bridge_num > max)
return 0;
in order to allow the last bridge_num value (==max) to be used.
This is easiest visible when a driver sets ds->max_num_bridges=1.
The observed behaviour is that even the first created bridge triggers
the netlink extack "Range of offloadable bridges exceeded" warning, and
is handled in software rather than being offloaded.
Fixes: 3f9bb0301d ("net: dsa: make dp->bridge_num one-based")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260120211039.3228999-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After 3cbf4ffba5 ("net: plumb network namespace into __skb_flow_dissect")
we have to provide a net pointer to __skb_flow_dissect(),
either via skb->dev, skb->sk, or a user provided pointer.
In the following case, syzbot was able to cook a bare skb.
WARNING: net/core/flow_dissector.c:1131 at __skb_flow_dissect+0xb57/0x68b0 net/core/flow_dissector.c:1131, CPU#1: syz.2.1418/11053
Call Trace:
<TASK>
bond_flow_dissect drivers/net/bonding/bond_main.c:4093 [inline]
__bond_xmit_hash+0x2d7/0xba0 drivers/net/bonding/bond_main.c:4157
bond_xmit_hash_xdp drivers/net/bonding/bond_main.c:4208 [inline]
bond_xdp_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5139 [inline]
bond_xdp_get_xmit_slave+0x1fd/0x710 drivers/net/bonding/bond_main.c:5515
xdp_master_redirect+0x13f/0x2c0 net/core/filter.c:4388
bpf_prog_run_xdp include/net/xdp.h:700 [inline]
bpf_test_run+0x6b2/0x7d0 net/bpf/test_run.c:421
bpf_prog_test_run_xdp+0x795/0x10e0 net/bpf/test_run.c:1390
bpf_prog_test_run+0x2c7/0x340 kernel/bpf/syscall.c:4703
__sys_bpf+0x562/0x860 kernel/bpf/syscall.c:6182
__do_sys_bpf kernel/bpf/syscall.c:6274 [inline]
__se_sys_bpf kernel/bpf/syscall.c:6272 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6272
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
Fixes: 58deb77cc5 ("bonding: balance ICMP echoes in layer3+4 mode")
Reported-by: syzbot+c46409299c70a221415e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/696faa23.050a0220.4cb9c.001f.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matteo Croce <mcroce@redhat.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260120161744.1893263-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both send_mcast4() and send_mcast6() use sleep 2 to wait for the tunnel
connection between the gateway and the relay, and for the listener
socket to be created in the LISTENER namespace.
However, tests sometimes fail because packets are sent before the
connection is fully established.
Increase the waiting time to make the tests more reliable, and use
wait_local_port_listen() to explicitly wait for the listener socket.
Fixes: c08e8baea7 ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20260120133930.863845-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the parameter pmac_id_valid argument of be_cmd_get_mac_from_list() is
set to false, the driver may request the PMAC_ID from the firmware of the
network card, and this function will store that PMAC_ID at the provided
address pmac_id. This is the contract of this function.
However, there is a location within the driver where both
pmac_id_valid == false and pmac_id == NULL are being passed. This could
result in dereferencing a NULL pointer.
To resolve this issue, it is necessary to pass the address of a stub
variable to the function.
Fixes: 95046b927a ("be2net: refactor MAC-addr setup code")
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Link: https://patch.msgid.link/20260120113734.20193-1-a.vatoropin@crpt.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit eeecf5d3a3.
This change lead to MHI WWAN device can't connect to internet.
I found a netwrok issue with kernel 6.19-rc4, but network works
well with kernel 6.18-rc1. After checking, this commit is the
root cause.
Before appliing this serial changes on MHI WWAN network, we shall
revert this change in case of v6.19 being impacted.
Fixes: eeecf5d3a3 ("net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning")
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Link: https://patch.msgid.link/20260120072018.29375-1-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull SoC fixes from Arnd Bergmann:
"The main changes are devicetree updates for qualcomm and rockchips
arm64 platforms, fixing minor mistakes in SoC and board specific
settings:
- GPIO settings for Pinephone Pro buttons
- Register ranges for rk3576 GPU
- Power domains on sc8280xp
- Clocks on qcom talos
- dtc warnings for extraneous properties, nonstandard node names and
undocument identifiers
The Tegra210 platform gets a single revert for a devicetree change
that caused a 6.19 regression.
On 32-bit Arm, we have trivial fixes for Microchip SAMA7 devicetree
files and NPCM Kconfig, as well as Andrew Jeffery being officially
listed as MAINTAINER for NPCM.
A single driver fix is for Qualcomm RPMHD power domains, bringing the
driver up to date with a devicetree change that added additional power
domains to be enabled"
* tag 'soc-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits)
MAINTAINERS: Add Andrew as M: to ARM/NUVOTON NPCM ARCHITECTURE
MAINTAINERS: update email address for Yixun Lan
Revert "arm64: tegra: Add interconnect properties for Tegra210"
arm64: dts: rockchip: Drop unsupported properties
arm64: dts: rockchip: Fix gpio pinctrl node names
arm64: dts: rockchip: Fix pinctrl property typo on rk3326-odroid-go3
arm64: dts: rockchip: Drop "sitronix,st7789v" fallback compatible from rk3568-wolfvision
ARM: dts: microchip: sama7d65: fix size-cells property for i2c3
ARM: dts: microchip: sama7d65: fix the ranges property for flx9
arm: npcm: drop unused Kconfig ERRATA symbol
arm64: dts: rockchip: Fix wrong register range of rk3576 gpu
arm64: dts: rockchip: Configure MCLK for analog sound on NanoPi M5
arm64: dts: rockchip: Fix headphones widget name on NanoPi M5
ARM: dts: microchip: lan966x: Fix the access to the PHYs for pcb8290
arm64: dts: rockchip: remove redundant max-link-speed from nanopi-r4s
arm64: dts: rockchip: remove dangerous max-link-speed from helios64
arm64: dts: rockchip: fix unit-address for RK3588 NPU's core1 and core2's IOMMU
arm64: dts: rockchip: Fix wifi interrupts flag on Sakura Pi RK3308B
arm64: dts: qcom: sm8650: Fix compile warnings in USB controller node
arm64: dts: qcom: sm8550: Fix compile warnings in USB controller node
...