Commit Graph

1445403 Commits

Author SHA1 Message Date
Dawei Feng
2bccfb8476 qed: fix double free in qed_cxt_tables_alloc()
If one of the later PF or VF CID bitmap allocations fails,
qed_cid_map_alloc() jumps to cid_map_fail and frees the previously
allocated CID bitmaps before returning an error. qed_cxt_tables_alloc()
then calls qed_cxt_mngr_free(), which invokes qed_cid_map_free()
again.

Fix this by setting each CID bitmap pointer to NULL after bitmap_free()
to avoid double free.

The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc3.

Runtime reproduction was not attempted because exercising the failing
allocation path requires device-specific setup.

Fixes: fe56b9e6a8 ("qed: Add module with basic common support")
Cc: stable@vger.kernel.org
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Link: https://patch.msgid.link/20260520070323.2762379-1-dawei.feng@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:15:00 -07:00
Aditya Garg
b809d04099 net: mana: validate rx_req_idx to prevent out-of-bounds array access
In mana_hwc_rx_event_handler(), rx_req_idx is derived from
sge->address in DMA-coherent memory. In Confidential VMs
(SEV-SNP/TDX), this memory is shared unencrypted and HW can modify
WQE contents at any time. No bounds check exists on rx_req_idx,
which can lead to an out-of-bounds access into reqs[].

Add bounds check on rx_req_idx in mana_hwc_rx_event_handler() before
using it to index the reqs[] array.

Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20260520051553.857120-1-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:14:11 -07:00
Ratheesh Kannoth
9eddc819f0 octeontx2-af: npc: Fix allmulticast skip logic for LBK and SDP VFs
When installing the allmulticast NPC rule, rvu_npc_install_allmulti_entry()
should skip LBK and SDP VFs (only CGX PF/VF may add the entry).  The
code combined is_lbk_vf() and is_sdp_vf() with logical AND, which is
never true for a single pcifunc, so the intended early return never ran.

Use logical OR instead.

Cc: Geetha sowjanya <gakula@marvell.com>
Fixes: ae703539f4 ("octeontx2-af: Cleanup loopback device checks")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260520043036.1523798-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:12:59 -07:00
Zhang Cen
c367b90821 netpoll: normalize skb->dev to the netpoll device
__netpoll_send_skb() always transmits through np->dev and queues busy
packets on np->dev->npinfo->txq, but it leaves skb->dev unchanged.
Stacked callers such as DSA and macvlan can reach netpoll with skb->dev
still naming the upper device while np->dev is the lower device that
owns the netpoll state.

If the skb has to be deferred, queue_process() later dequeues it from
the lower device's txq but retries it through skb->dev. That can
re-enter the upper ndo_start_xmit path on an already transformed skb,
and if the upper device disappears before the lower txq drains the
workqueue can dereference a stale skb->dev pointer.

The buggy scenario involves two paths, with each column showing the
order within that path:

path A label: netpoll enqueue path   path B label: upper-device teardown
1. Stacked xmit calls netpoll        1. Teardown unregisters the upper
   with lower np->dev and upper         net_device while lower npinfo
   skb->dev.                            stays alive.
2. __netpoll_send_skb() uses         2. netdev_release() runs for the
   np->dev->npinfo as the txq           upper net_device.
   owner.
3. Busy transmit queues the skb      3. The lower txq still owns the
   on that lower txq with upper         deferred skb.
   skb->dev.
4. queue_process() drains the        4. queue_process() dereferences
   lower txq and reads skb->dev.        that stale upper skb->dev.

Normalize skb->dev to np->dev after loading np->dev from the netpoll
instance, before either the direct transmit path or the fallback enqueue.
This keeps the queued skb in the same device and txq domain as the
netpoll state that owns it.

KASAN report as below:

KASAN slab-use-after-free in queue_process+0x7c/0x480
Workqueue: events queue_process
The buggy address belongs to the object at ffff88810906c000 which belongs
to the cache kmalloc-4k of size 4096
The buggy address is located 168 bytes inside of freed 4096-byte region
[ffff88810906c000, ffff88810906d000)
Read of size 8
Call trace:
  dump_stack_lvl+0x73/0xb0 (?:?)
  print_report+0xd1/0x620 (?:?)
  srso_alias_return_thunk+0x5/0xfbef5 (?:?)
  __virt_addr_valid+0x215/0x420 (?:?)
  kasan_complete_mode_report_info+0x64/0x200 (?:?)
  kasan_report+0xf7/0x130 (?:?)
  queue_process+0x7c/0x480 (net/core/netpoll.c:88)
  kasan_check_range+0x10c/0x1c0 (?:?)
  __kasan_check_read+0x15/0x20 (?:?)
  process_one_work+0x8b7/0x1af0 (kernel/workqueue.c:3200)
  assign_work+0x170/0x3f0 (?:?)
  worker_thread+0x574/0xf10 (?:?)
  _raw_spin_unlock_irqrestore+0x4b/0x60 (?:?)
  trace_hardirqs_on+0x2a/0x180 (?:?)
  kthread+0x2fc/0x3f0 (?:?)
  ret_from_fork+0x58b/0x830 (?:?)
  __switch_to+0x58e/0xe90 (?:?)
  __switch_to_asm+0x39/0x70 (?:?)
  ret_from_fork_asm+0x1a/0x30 (?:?)
Freed by task stack:
  kasan_save_stack+0x3d/0x60 (?:?)
  kasan_save_track+0x18/0x40 (?:?)
  kasan_save_free_info+0x3f/0x60 (?:?)
  __kasan_slab_free+0x48/0x70 (?:?)
  kfree+0x20e/0x4e0 (?:?)
  kvfree+0x31/0x40 (?:?)
  netdev_release+0x71/0x90 (net/core/net-sysfs.c:2227)
  device_release+0xd2/0x250 (?:?)
  kobject_put+0x181/0x4c0 (lib/kobject.c:730)
  netdev_run_todo+0x700/0x1000 (net/core/dev.c:11666)
  rtnl_dellink+0x396/0xc00 (net/core/rtnetlink.c:3558)
  rtnetlink_rcv_msg+0x740/0xc20 (net/core/rtnetlink.c:6897)
  netlink_rcv_skb+0x147/0x3a0 (?:?)
  rtnetlink_rcv+0x19/0x20 (net/core/rtnetlink.c:7021)
  netlink_unicast+0x4d1/0x830 (net/netlink/af_netlink.c:1327)
  netlink_sendmsg+0x840/0xe10 (net/netlink/af_netlink.c:1812)
  ____sys_sendmsg+0x8a7/0xb50 (?:?)
  ___sys_sendmsg+0x104/0x190 (?:?)
  __sys_sendmsg+0x135/0x1d0 (?:?)
  __x64_sys_sendmsg+0x7b/0xc0 (?:?)
  x64_sys_call+0x205c/0x2130 (?:?)
  do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87)
  entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?)

Fixes: 5de4a473bd ("netpoll queue cleanup")
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
Link: https://patch.msgid.link/20260519104647.3517990-1-rollkingzzc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:10:18 -07:00
Abdun Nihaal
c5d93b2c40 net: wwan: iosm: fix potential memory leaks in ipc_imem_init()
The memory allocated in ipc_protocol_init() is not freed on the error
paths that follow in ipc_imem_init(). Fix that by calling the
corresponding release function ipc_protocol_deinit() in the error path.

Fixes: 3670970dd8 ("net: iosm: shared memory IPC interface")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Link: https://patch.msgid.link/20260519062815.55545-1-nihaal@cse.iitm.ac.in
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:06:02 -07:00
Michael Grzeschik
85fac50b58 MAINTAINERS: Update address for Michael Grzeschik
Since I am moving from Pengutronix update my email address for the
ARCNET subsystems to point to my kernel.org address.

Also update .mailmap.

Signed-off-by: Michael Grzeschik <mgr@kernel.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Markus Schneider-Pargmann <mail@markussp.com>
Link: https://patch.msgid.link/20260521-maintainer-v1-1-29b5e106682d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:47:49 -07:00
Jakub Kicinski
099258bde1 MAINTAINERS: add missing entry for Bluetooth include files
We X-out net/bluetooth/ from "NETWORKING [GENERAL]" so that only
the dedicated list is CCed on patches, and networking gets them
once already processed by Luiz. We missed include/net/bluetooth.

Link: https://patch.msgid.link/20260521004151.625049-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:47:40 -07:00
Nimrod Oren
dfc0770433 selftests: net: Fix checksums in xdp_native
Data adjustment cases failed with "Data exchange failed" when using IPv4
because the program did not update the IP and UDP checksums in the IPv4
branch. The issue was masked when both IPv4 and IPv6 were configured,
since the test harness prefers IPv6.

While here, generalize csum_fold_helper() to fold twice so it works for
any 32-bit input.

Fixes: 0b65cfcef9 ("selftests: drv-net: Test tail-adjustment support")
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20260520153928.3371765-1-noren@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:47:00 -07:00
Yuho Choi
1341db3224 ipv6: route: Unregister netdevice notifier on BPF init failure
ip6_route_init() registers ip6_route_dev_notifier before registering the
IPv6 route BPF iterator target. If bpf_iter_register() fails after the
notifier has been registered, the error path currently jumps to
out_register_late_subsys and unwinds the RTNL handlers and pernet route
state without removing the notifier from the netdevice notifier chain.

This leaves ip6_route_dev_notify() callable after the IPv6 route state it
uses has been torn down. Add a separate unwind label for the BPF iterator
failure path and unregister the netdevice notifier before continuing with
the existing cleanup.

Fixes: 138d0be35b ("net: bpf: Add netlink and ipv6_route bpf_iter targets")
Signed-off-by: Yuho Choi <dbgh9129@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260520030329.1061183-1-dbgh9129@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:43:15 -07:00
Matthieu Baerts (NGI0)
92cc6708f4 selftests: rds: config: disable modules
The run.sh script explicitly checks that CONFIG_MODULES is disabled.

By default, this config option is enabled. Explicitly disable it to be
able to run the RDS tests.

Note that writing '# CONFIG_(...) is not set' is usually recommended to
disable an option in the .config, but it looks like selftests usually
set 'CONFIG_(...)=n', which looks clearer.

Fixes: 0f5d680047 ("selftests: rds: add tools/testing/selftests/net/rds/config")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260520-net-rds-config-modules-v1-1-2100df02fe9a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:39:20 -07:00
Zijing Yin
dbc81608e3 phonet/pep: disable BH around forwarded sk_receive_skb()
The networking receive path is usually run from softirq context, but
protocols that take the socket lock may have packets stored in the
backlog and processed later from process context. In that case
release_sock() -> __release_sock() drops the slock with spin_unlock_bh()
and then calls sk->sk_backlog_rcv() with bottom halves enabled.

Typical sk_backlog_rcv handlers process the socket whose backlog is
being drained, so the BH state at entry is irrelevant for the slocks
they touch. pep_do_rcv() is different: when the inbound skb targets an
existing PEP pipe, it forwards the skb to a different *child* socket
via sk_receive_skb(). That helper takes the child slock with
bh_lock_sock_nested(), which is just spin_lock_nested() and assumes BH
is already off. The same child slock therefore ends up acquired with
BH on (process path) and with BH off (softirq path):

  process context                   softirq context
  ---------------                   ---------------
  release_sock(listener)            __netif_receive_skb()
   __release_sock()                  phonet_rcv()
    spin_unlock_bh()                  __sk_receive_skb(listener)
    [BH now ENABLED]                  [BH already disabled]
    sk_backlog_rcv:                   sk_backlog_rcv:
     pep_do_rcv()                      pep_do_rcv()
      sk_receive_skb(child)             sk_receive_skb(child)
       bh_lock_sock_nested(child)        bh_lock_sock_nested(child)
       => SOFTIRQ-ON-W                   => IN-SOFTIRQ-W

Lockdep flags this as inconsistent lock state, and it can become a real
self-deadlock if a softirq on the same CPU tries to receive to the same
child socket while its slock is held in the BH-enabled path:

  WARNING: inconsistent lock state
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
   (slock-AF_PHONET/1){+.?.}-{3:3}, at: __sk_receive_skb+0x1cf/0x900
    __sk_receive_skb              net/core/sock.c:563
    sk_receive_skb                include/net/sock.h:2022 [inline]
    pep_do_rcv                    net/phonet/pep.c:675
    sk_backlog_rcv                include/net/sock.h:1190
    __release_sock                net/core/sock.c:3216
    release_sock                  net/core/sock.c:3815
    pep_sock_accept               net/phonet/pep.c:879

Wrap the forwarded sk_receive_skb() in local_bh_disable() /
local_bh_enable() so the child slock is always acquired with BH off.
local_bh_disable() nests safely on the softirq path.

Discovered via in-house syzkaller fuzzing; the same root cause also
on the linux-6.1.y syzbot dashboard as extid 44f0626dd6284f02663c.
Reproduced under KASAN + LOCKDEP + PROVE_LOCKING, reproducer:
https://pastebin.com/A3t8xzCR

Fixes: 9641458d3e ("Phonet: Pipe End Point for Phonet Pipes protocol")
Link: https://syzkaller.appspot.com/bug?extid=44f0626dd6284f02663c
Cc: stable@vger.kernel.org
Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Reported-by: syzbot+9f4a135646b66c509935@syzkaller.appspotmail.com
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260519172635.86304-1-yzjaurora@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:38:21 -07:00
Paolo Abeni
42734af663 Merge tag 'batadv-net-pullrequest-20260520' of https://git.open-mesh.org/batadv
Simon Wunderlich says:

====================
Here are batman-adv bugfixes, all by by Sven Eckelmann.

 - fix batadv_skb_is_frag() kernel-doc

 - BATMAN V: stop OGMv2 on disabled interface

 - BATMAN IV: abort OGM send on tvlv append failure

 - BATMAN IV: reject oversized TVLV packets

 - tp_meter: fix race condition in send error reporting

 - tp_meter: avoid role confusion in tp_list

 - mcast: fix use-after-free in orig_node RCU release

 - BATMAN IV: recover OGM scheduling after forward packet error

 - bla: fix report_work leak on backbone_gw purge

 - bla: avoid double decrement of bla.num_requests

 - bla: avoid NULL-ptr deref for claim via dropped interface

* tag 'batadv-net-pullrequest-20260520' of https://git.open-mesh.org/batadv:
  batman-adv: bla: avoid NULL-ptr deref for claim via dropped interface
  batman-adv: bla: avoid double decrement of bla.num_requests
  batman-adv: bla: fix report_work leak on backbone_gw purge
  batman-adv: iv: recover OGM scheduling after forward packet error
  batman-adv: mcast: fix use-after-free in orig_node RCU release
  batman-adv: tp_meter: avoid role confusion in tp_list
  batman-adv: tp_meter: fix race condition in send error reporting
  batman-adv: tvlv: reject oversized TVLV packets
  batman-adv: tvlv: abort OGM send on tvlv append failure
  batman-adv: v: stop OGMv2 on disabled interface
  batman-adv: fix batadv_skb_is_frag() kernel-doc
====================

Link: https://patch.msgid.link/20260520115422.53552-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 15:59:11 +02:00
Paolo Abeni
94e3dd6874 Merge branch 'vsock-virtio-fix-skb-overhead-accounting-to-preserve-full-buf_alloc'
Stefano Garzarella says:

====================
vsock/virtio: fix skb overhead accounting to preserve full buf_alloc

Patch 1 resets the connection when we can no longer queue packets,
this prevents silent data loss, and both peers are notified.

Patch 2 increases the total budget to `buf_alloc * 2` for payload
plus skb overhead similar to how SO_RCVBUF is doubled to reserve
space for sk_buff metadata. This preserves the full buf_alloc for
payload under normal operation, while still bounding the skb queue
growth.

In the future, we plan to improve how we handle the merging of packets
to minimize overhead and avoid closing connections.

v3: https://lore.kernel.org/netdev/20260513105417.56761-1-sgarzare@redhat.com/
v2: https://lore.kernel.org/netdev/20260512080737.36787-1-sgarzare@redhat.com/
v1: https://lore.kernel.org/netdev/20260508092330.69690-1-sgarzare@redhat.com/
====================

Link: https://patch.msgid.link/20260518090656.134588-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 13:14:04 +02:00
Stefano Garzarella
c6087c5aaa vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
After commit 059b7dbd20 ("vsock/virtio: fix potential unbounded skb
queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
buf_alloc when checking whether a new packet fits. This reduces the
effective receive buffer below what the user configured via
SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
silently dropped and applications that rely on the full buffer size
to deadlock.

Also, the reduced space is not communicated to the remote peer, so
its credit calculation accounts more credit than the receiver will
actually accept, causing data loss (there is no retransmission).

With this approach we currently have failures in
tools/testing/vsock/vsock_test.c. Test 18 sometimes fails, while
test 22 always fails in this way:
    18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch

    22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
    Resource temporarily unavailable

Fix by allowing at most `buf_alloc * 2` as the total budget for payload
plus skb overhead in virtio_transport_inc_rx_pkt(), similar to how
SO_RCVBUF is doubled to reserve space for sk_buff metadata.
This preserves the full buf_alloc for payload under normal operation,
while still bounding the skb queue growth.

With this patch, all tests in tools/testing/vsock/vsock_test.c are
now passing again.

Fixes: 059b7dbd20 ("vsock/virtio: fix potential unbounded skb queue")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260518090656.134588-3-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 13:14:01 +02:00
Stefano Garzarella
a4f0b00178 vsock/virtio: reset connection on receiving queue overflow
When there is no more space to queue an incoming packet, the packet is
silently dropped. This causes data loss without any notification to
either peer, since there is no retransmission.

Under normal circumstances, this should never happen. However, it could
happen if the other peer doesn't respect the credit, or if the skb
overhead, which we recently began to take into account with commit
059b7dbd20 ("vsock/virtio: fix potential unbounded skb queue"),
is too high.

Fix this by resetting the connection and setting the local socket error
to ENOBUFS when virtio_transport_recv_enqueue() can no longer queue a
packet, so both peers are explicitly notified of the failure rather than
silently losing data.

Fixes: ae6fcfbf5f ("vsock/virtio: discard packets if credit is not respected")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260518090656.134588-2-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 13:14:01 +02:00
Paolo Abeni
0377bd2722 Merge branch 'net-stmmac-eic7700-fix-delay-calculation-and-initialization-ordering'
Zhi Li says:

====================
net: stmmac: eic7700: fix delay calculation and initialization ordering

From: Zhi Li <lizhi2@eswincomputing.com>
====================

Link: https://patch.msgid.link/20260518021919.404-1-lizhi2@eswincomputing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:58:19 +02:00
Zhi Li
c2e152f7ce net: stmmac: eswin: validate RGMII delay values
Validate rx-internal-delay-ps and tx-internal-delay-ps against the
hardware capabilities of the EIC7700 MAC.

The programmable RGMII delay supports 20 ps steps and a maximum value of
2540 ps. The driver previously accepted arbitrary values and silently
truncated unsupported settings when converting them to hardware units.

As a result, invalid device tree values could lead to unexpected delay
programming and incorrect RGMII timing.

Reject delay values that are not multiples of 20 ps or exceed the
supported hardware range.

Fixes: ea77dbbdbc ("net: stmmac: add Eswin EIC7700 glue driver")
Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
Link: https://patch.msgid.link/20260518022214.507-1-lizhi2@eswincomputing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:58:17 +02:00
Zhi Li
6ffcef9bc1 net: stmmac: eswin: correct RGMII delay granularity to 20 ps
The EIC7700 MAC implements programmable RGMII delay adjustment with a
granularity of 20 ps per hardware step.

The driver previously converted rx-internal-delay-ps and
tx-internal-delay-ps values using a 100 ps step size, resulting in
incorrect delay programming.

Update the conversion to use the correct 20 ps granularity so the
programmed delay matches the values described in the device tree.

Fixes: ea77dbbdbc ("net: stmmac: add Eswin EIC7700 glue driver")
Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
Link: https://patch.msgid.link/20260518022156.484-1-lizhi2@eswincomputing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:58:17 +02:00
Zhi Li
6872fb088e net: stmmac: eswin: clear TXD and RXD delay registers during initialization
Clear the TXD and RXD delay control registers during EIC7700 DWMAC
initialization.

These registers may retain values programmed by the bootloader. If left
unchanged, residual delays can alter the effective RGMII timing seen by
the MAC and override the configuration described by the device tree.

This may violate the expected RGMII timing model and can cause link
instability or prevent the Ethernet controller from operating correctly.

Explicitly clearing these registers ensures that the MAC delay settings
are determined solely by the kernel configuration.

The corresponding register offsets are optional, and the registers are
only cleared when the offsets are provided in the device tree.

Fixes: ea77dbbdbc ("net: stmmac: add Eswin EIC7700 glue driver")
Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
Link: https://patch.msgid.link/20260518022137.464-1-lizhi2@eswincomputing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:58:17 +02:00
Zhi Li
23386defe9 net: stmmac: eswin: fix HSP CSR init ordering after clock enable
Fix the initialization ordering of the HSP CSR configuration in the
EIC7700 DWMAC glue driver.

The HSP CSR registers control MAC-side RGMII delay behavior and must
only be accessed after the corresponding clocks are enabled. The
previous implementation could trigger register access before clock
enablement, leading to undefined behavior depending on boot state.

Move the HSP CSR configuration into the post-clock-enable initialization
path to ensure all register accesses occur under valid clock domains.

This change ensures deterministic initialization and prevents
clock-dependent register access failures during probe or resume.

Fixes: ea77dbbdbc ("net: stmmac: add Eswin EIC7700 glue driver")
Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
Link: https://patch.msgid.link/20260518022055.444-1-lizhi2@eswincomputing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:58:16 +02:00
Zhi Li
c36069c6f4 dt-bindings: ethernet: eswin: add optional TXD and RXD delay register offsets
Document two optional cells in eswin,hsp-sp-csr for the TXD and RXD
delay control register offsets.

These registers are used by the driver to clear any residual delay
configuration left by the bootloader, ensuring that MAC-side RGMII delay
settings are applied solely according to the kernel configuration.

Add a reference to the EIC7700X SoC Technical Reference Manual for
background information about the HSP CSR block.

Fixes: 888bd0eca9 ("dt-bindings: ethernet: eswin: Document for EIC7700 SoC")
Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260518022023.427-1-lizhi2@eswincomputing.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:58:16 +02:00
Hyunwoo Kim
48f6a5356a net: skbuff: propagate shared-frag marker through frag-transfer helpers
Two frag-transfer helpers (__pskb_copy_fclone() and skb_shift()) fail
to propagate the SKBFL_SHARED_FRAG bit in skb_shinfo()->flags when
moving frags from source to destination.  __pskb_copy_fclone() defers
the rest of the shinfo metadata to skb_copy_header() after copying
frag descriptors, but that helper only carries over gso_{size,segs,
type} and never touches skb_shinfo()->flags; skb_shift() moves frag
descriptors directly and leaves flags untouched.  As a result, the
destination skb keeps a reference to the same externally-owned or
page-cache-backed pages while reporting skb_has_shared_frag() as
false.

The mismatch is harmful in any in-place writer that uses
skb_has_shared_frag() to decide whether shared pages must be detoured
through skb_cow_data().  ESP input is one such writer (esp4.c,
esp6.c), and a single nft 'dup to <local>' rule -- or any other
nf_dup_ipv4() / xt_TEE caller -- is enough to land a pskb_copy()'d
skb in esp_input() with the marker stripped, letting an unprivileged
user write into the page cache of a root-owned read-only file via
authencesn-ESN stray writes.

Set SKBFL_SHARED_FRAG on the destination whenever frag descriptors
were actually moved from the source.  skb_copy() and skb_copy_expand()
share skb_copy_header() too but linearize all paged data into freshly
allocated head storage and emerge with nr_frags == 0, so
skb_has_shared_frag() returns false on its own; they need no change.

The same omission exists in skb_gro_receive() and skb_gro_receive_list().
The former moves the incoming skb's frag descriptors into the
accumulator's last sub-skb via two paths (a direct frag-move loop and
the head_frag + memcpy path); the latter chains the incoming skb whole
onto p's frag_list.  Downstream skb_segment() reads only
skb_shinfo(p)->flags, and skb_segment_list() reuses each sub-skb's
shinfo as the nskb -- both p and lp must carry the marker.

The same omission also exists in tcp_clone_payload(), which builds an
MTU probe skb by moving frag descriptors from skbs on sk_write_queue
into a freshly allocated nskb.  The helper falls into the same family
and warrants the same fix for consistency; no TCP TX-side in-place
writer is currently known to reach a user page through this gap, but
a future consumer depending on the marker would regress silently.

The same omission exists in skb_segment(): the per-iteration flag
merge takes only head_skb's flag, and the inner switch that rebinds
frag_skb to list_skb on head_skb-frags exhaustion does not fold the
new frag_skb's flag into nskb.  Fold frag_skb's flag at both sites
so segments drawing frags from frag_list members carry the marker.

Fixes: cef401de7b ("net: fix possible wrong checksum generation")
Fixes: f4c50a4034 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Suggested-by: Lin Ma <malin89@huawei.com>
Suggested-by: Jingguo Tan <tanjingguo@huawei.com>
Suggested-by: Aaron Esau <aaron1esau@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Tested-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Link: https://patch.msgid.link/ageeJfJHwgzmKXbh@v4bel
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:31:05 +02:00
Eric Dumazet
1bbf0ced1d tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction
Blamed commit moved the TIME_WAIT-derived ISN from the skb control
block to a per-CPU variable, assuming the value would always be consumed
by tcp_conn_request() for the same packet that wrote it. That assumption
is violated by multiple drop paths between the producer
(__this_cpu_write(tcp_tw_isn, isn) in tcp_v{4,6}_rcv()) and the consumer
(tcp_conn_request()):

 - min_ttl / min_hopcount check
 - xfrm policy check
 - tcp_inbound_hash() MD5/AO mismatch
 - tcp_filter() eBPF/SO_ATTACH_FILTER drop
 - th->syn && th->fin discard in tcp_rcv_state_process() TCP_LISTEN
 - psp_sk_rx_policy_check() in tcp_v{4,6}_do_rcv()
 - tcp_checksum_complete() in tcp_v{4,6}_do_rcv()
 - tcp_v{4,6}_cookie_check() returning NULL

When a packet is dropped on any of these paths, tcp_tw_isn is left set.

The next SYN processed on the same CPU then consumes the non zero value in
tcp_conn_request(), receiving a potentially predictable ISN.

This patch moves back tcp_tw_isn to skb->cb[], getting rid of the per-cpu
variable.

Note that tcp_v{4,6}_fill_cb() do not set it.

Very litle impact on overall code size/complexity:

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7)
Function                                     old     new   delta
tcp_v6_rcv                                  3038    3042      +4
tcp_v4_rcv                                  3035    3039      +4
tcp_conn_request                            2938    2923     -15
Total: Before=24436060, After=24436053, chg -0.00%

Fixes: 41eecbd712 ("tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260519084611.2485277-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:14:06 -07:00
Minh Nguyen
99e22ddf4e vsock/vmci: fix UAF when peer resets connection during handshake
vmci_transport_recv_connecting_server() returned err = 0 for a peer
RST in its default switch arm:

	err = pkt->type == VMCI_TRANSPORT_PACKET_TYPE_RST ? 0 : -EINVAL;

That made vmci_transport_recv_listen() skip vsock_remove_pending(),
leaving the pending socket on the listener's pending_links with
sk_state = TCP_CLOSE while destroy: still dropped the explicit
reference taken before schedule_delayed_work().

One second later vsock_pending_work() observed is_pending=true and
performed full cleanup: vsock_remove_pending() then the two trailing
sock_put(sk) calls -- the first reached refcount 0 and __sk_freed
the socket, and the second wrote into the freed object:

  BUG: KASAN: slab-use-after-free in refcount_warn_saturate
  Write of size 4 at addr ffff88800b1cac80 by task kworker
  Workqueue: events vsock_pending_work

Treat peer RST like any other unexpected packet type (err = -EINVAL).
All destroy: arms now return err < 0, so vmci_transport_recv_listen()
removes pending from pending_links synchronously and
vsock_pending_work() takes the is_pending=false / !rejected branch,
dropping only its own work reference.  This also closes the
multi-packet race Sashiko reported on v2: pending is removed from
the list before any subsequent packet can find it.

The pre-existing sk_acceptq_removed() gap on the err < 0 path of
vmci_transport_recv_listen() that Sashiko also noted is not
introduced or changed by this patch.

Tested on lts-6.12.79 with KASAN: 52/100 unpatched -> 0/100 patched.

Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Cc: stable@vger.kernel.org
Signed-off-by: Minh Nguyen <minhnguyen.080505@gmail.com>
Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com>
Link: https://patch.msgid.link/20260519102310.237181-1-minhnguyen.080505@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:11:18 -07:00
Ivan Vecera
fa997ddef5 dpll: zl3073x: fix memory leak on pin registration failure
If zl3073x_dpll_pin_register() fails, the allocated pin is not yet
added to zldpll->pins list. The error path calls
zl3073x_dpll_pins_unregister() which only iterates pins on the list,
so the current pin is leaked. Free the pin before jumping to the error
label.

Additionally move the pin->dpll_pin = NULL assignment in
zl3073x_dpll_pin_register() from err_register to the common
err_pin_get path. When dpll_pin_get() fails, pin->dpll_pin holds an
ERR_PTR value. Without this fix the subsequent zl3073x_dpll_pin_free()
would trigger a spurious WARN because it checks pin->dpll_pin for
non-NULL.

Fixes: 75a71ecc24 ("dpll: zl3073x: Register DPLL devices and pins")
Reviewed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260519132205.161847-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:02:01 -07:00
Eric Dumazet
e4bdef4d32 ipv4: use WARN_ON_ONCE() in ip_rt_bug()
It turns out ip_rt_bug() can be called more than expected.

syzbot will still panic (because of panic_on_warn=1), but non debug
kernels will no longer die while repeating stack traces on the console.

Fixes: c378a9c019 ("ipv4: Give backtrace in ip_rt_bug().")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260519193248.4018872-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:00:36 -07:00
Eric Dumazet
7eb72c1e39 ipv4: icmp: reject broadcast/multicast routes
syzbot was able to trigger ip_rt_bug() in a loop, using an IPv4 packet
with a crafted IPOPT_SSRR option:

  options: ipv4_options {
    options: array[ipv4_option] {
      union ipv4_option {
        ssrr: ipv4_option_route[IPOPT_SSRR] {
         type: const = 0x89 (1 bytes)
         length: len = 0x7 (1 bytes)
         pointer: int8 = 0xa2 (1 bytes)
         data: array[ipv4_addr] {
           union ipv4_addr {
             broadcast: const = 0xffffffff (4 bytes)
           }
         }
       }
     }

Change __icmp_send() to not send ICMP to broadcast/multicast destinations.

Fixes: c378a9c019 ("ipv4: Give backtrace in ip_rt_bug().")
Reported-by: syzbot+c13a57c2639c2c0d03a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a0cc169.170a0220.1f6c2d.0004.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260519200836.4141061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:00:02 -07:00
David Carlier
4eb82ba543 net: devmem: reject dma-buf bind with non-page-aligned size or SG length
net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
PAGE_SIZE multiples without checking:

  - tx_vec is sized dmabuf->size / PAGE_SIZE, and
    net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
    before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
    N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
    N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.

  - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
    covers the full byte len, so a non-page-multiple non-final sg
    desyncs num_niovs from the gen_pool region for every later sg, on
    both RX and TX.

dma-buf does not require page-aligned sizes, so the bind path has to
enforce what its own indexing assumes. Reject both with -EINVAL.

The size check is TX-only (only tx_vec is sized off dmabuf->size); the
SG-length check covers both directions.

Fixes: bd61848900 ("net: devmem: Implement TX path")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20260519203530.66310-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 18:59:01 -07:00
Jakub Kicinski
5027c886e2 Merge tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
 - L2CAP: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
 - ISO: drop ISO_END frames received without prior ISO_START
 - MGMT: validate Add Extended Advertising Data length
 - bnep: Fix UAF read of dev->name
 - btmtk: fix urb->setup_packet leak in error paths
 - btintel_pcie: Fix incorrect MAC access programming
 - hci_uart: fix UAFs and race conditions in close and init paths

* tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
  Bluetooth: hci_uart: fix UAFs and race conditions in close and init paths
  Bluetooth: MGMT: validate Add Extended Advertising Data length
  Bluetooth: btmtk: fix urb->setup_packet leak in error paths
  Bluetooth: ISO: drop ISO_END frames received without prior ISO_START
  Bluetooth: btintel_pcie: Fix incorrect MAC access programming
  Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
  Bluetooth: bnep: Fix UAF read of dev->name
====================

Link: https://patch.msgid.link/20260520204959.2902497-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 17:26:56 -07:00
Jakub Kicinski
4a2844dcc0 Merge branch 'bpf-skmsg-fix-verdict-sk_data_ready-racing-with-ktls-rx'
Xingwang Xiang says:

====================
bpf, skmsg: fix verdict sk_data_ready racing with ktls rx

sk_psock_verdict_data_ready() lacks the tls_sw_has_ctx_rx() guard that
sk_psock_strp_data_ready() gained in e91de6afa8.  When a socket is
inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is configured,
the missing guard causes tcp_read_skb() to drain sk_receive_queue without
advancing copied_seq, leaving a dangling frag_list pointer that
tls_decrypt_sg() walks — a use-after-free.

Patch 1 mirrors the fix from e91de6afa8: add the tls_sw_has_ctx_rx()
check to sk_psock_verdict_data_ready() so that when a TLS RX context is
present the function defers to psock->saved_data_ready (sock_def_readable)
instead of calling tcp_read_skb().

Patch 2 adds a selftest that drives the vulnerable sequence end-to-end
and verifies recv() returns the correct decrypted data.
====================

Link: https://patch.msgid.link/20260517145630.20521-1-v3rdant.xiang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 17:23:56 -07:00
Xingwang Xiang
33644bd38a selftests/bpf: add regression test for ktls+sockmap verdict UAF
Test the scenario where a socket is inserted into a sockmap with a
BPF_SK_SKB_VERDICT program before TLS RX is configured.  Previously
sk_psock_verdict_data_ready() would call tcp_read_skb() and drain the
receive queue without advancing copied_seq, causing tls_decrypt_sg()
to walk a dangling frag_list pointer (use-after-free).

The test drives the full vulnerable sequence and verifies that after
the fix recv() returns the correct decrypted data.

Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com>
Link: https://patch.msgid.link/20260517145630.20521-3-v3rdant.xiang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 17:23:56 -07:00
Xingwang Xiang
ddf8029623 bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
sk_psock_strp_data_ready() already checks tls_sw_has_ctx_rx() and
defers to psock->saved_data_ready when a TLS RX context is present,
avoiding a conflict with the TLS strparser's ownership of the receive
queue (commit e91de6afa8, "bpf: Fix running sk_skb program types
with ktls").

sk_psock_verdict_data_ready() has no equivalent guard.  When a socket
is inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is
configured, tls_sw_strparser_arm() saves sk_psock_verdict_data_ready
as rx_ctx->saved_data_ready.  On data arrival:

  tls_data_ready -> tls_strp_data_ready -> tls_rx_msg_ready
    -> saved_data_ready() = sk_psock_verdict_data_ready()
      -> tcp_read_skb() drains sk_receive_queue via __skb_unlink()
         without calling tcp_eat_skb(), so copied_seq is not advanced.

tls_strp_msg_load() then finds tcp_inq() >= full_len (stale), calls
tcp_recv_skb() on the now-empty queue, hits WARN_ON_ONCE(!first), and
returns with rx_ctx->strp.anchor.frag_list pointing at a psock-owned
(potentially freed) skb.  tls_decrypt_sg() subsequently walks that
frag_list: use-after-free.

Apply the same fix as sk_psock_strp_data_ready(): if a TLS RX context
is present, call psock->saved_data_ready (sock_def_readable) to wake
recv() waiters and return immediately, leaving the receive queue
untouched.  TLS retains sole ownership of the queue and decrypts the
record normally through tls_sw_recvmsg().

Fixes: ef5659280e ("bpf, sockmap: Allow skipping sk_skb parser program")
Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com>
Link: https://patch.msgid.link/20260517145630.20521-2-v3rdant.xiang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 17:21:21 -07:00
Rosen Penev
e7c70bf97e net: ag71xx: check error for platform_get_irq
Complete error handling for a failed platform_get_irq() call

Fixes: d51b6ce441 ("net: ethernet: add ag71xx driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260516212616.11758-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:50:47 -07:00
Jakub Kicinski
7769d17e02 Merge branch 'rxrpc-better-fix-for-data-response-decrypt-vs-splice'
David Howells says:

====================
rxrpc: Better fix for DATA/RESPONSE decrypt vs splice()

Here are two patches containing better fixes for the in-place decryption of
DATA and RESPONSE packets that can corrupt pagecache spliced into UDP
packets and sent to an AF_RXRPC server [CVE-2026-43500], plus a patch to
precheck the length of rxgk-secured DATA packets.

Of the main patches, one patch fixes DATA decryption by having recvmsg
unconditionally extract the data into a flat bounce buffer and, if need be,
decrypt it there.  It doesn't seem to cause a performance problem to do
this even on unencrypted packets; for encrypted packets it makes sure the
content is correctly aligned for crypto which seems to get a small
performance gain.

Further, it means that DATA packets are no longer copied in the I/O thread,
avoiding a slowdown of the protocol engine that runs there.

The other main patch fixes RESPONSE decryption by having the connection
event handler worker copy the data to a flat buffer and, again, decrypt it
there.  This simplifies RESPONSE handling.

With these two fixes, the data content of the received sk_buff no longer
gets altered.
====================

Link: https://patch.msgid.link/20260515230516.2718212-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:36:47 -07:00
David Howells
8bfab4b6ff rxrpc: Fix RESPONSE packet verification to extract skb to a linear buffer
This improves the fix for CVE-2026-43500.

Fix the verification of RESPONSE packets to avoid the problem of
overwriting a RESPONSE packet sent via splice to a local address by
extracting the contents of the UDP packet into a kmalloc'd linear buffer
rather than decrypting the data in place in the sk_buff (which may corrupt
the original buffer).

Fixes: 24481a7f57 ("rxrpc: Fix conn-level packet handling to unshare RESPONSE packets")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: Jiayuan Chen <jiayuan.chen@linux.dev>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260515230516.2718212-4-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:36:45 -07:00
David Howells
d2bc90cf6c rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
This improves the fix for CVE-2026-43500.

Fix the pagecache corruption from in-place decryption of a DATA packet
transmitted locally by splice() by getting rid of the packet sharing in the
I/O thread and unconditionally extracting the packet content into a bounce
buffer in which the buffer is decrypted.  recvmsg() (or the kernel
equivalent) then copies the data from the bounce buffer to the destination
buffer.  The sk_buff then remains unmodified.

This has an additional advantage in that the packet is then arranged in the
buffer with the correct alignment required for the crypto algorithms to
process directly.  The performance of the crypto does seem to be a little
faster and, surprisingly, the unencrypted performance doesn't seem to
change much - possibly due to removing complexity from the I/O thread.

Yet another advantage is that the I/O thread doesn't have to copy packets
which would slow down packet distribution, ACK generation, etc..

The buffer belongs to the call and is allocated initially at 2K,
sufficiently large to hold a whole jumbo subpacket, but the buffer will be
increased in size if needed.  However, to take this work, MSG_PEEK may
cause a later packet to be decrypted into the buffer, in which case the
earlier one will need re-decrypting for a subsequent recvmsg().

Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
switch to using USHRT_MAX to indicate an invalid offset.

Note also that I would generally prefer to replace the buffers of the
current sk_buff with a new kmalloc'd buffer of the right size, ditching the
old data and frags as this makes the handling of MSG_PEEK easier and
removes the re-decryption issue, but this looks like quite a complicated
thing to achieve.  skb_morph() looks half way to what I want, but I don't
want to have to allocate a new sk_buff.

Fixes: d0d5c0cd1e ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: Jiayuan Chen <jiayuan.chen@linux.dev>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260515230516.2718212-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:36:45 -07:00
David Howells
2b50aceafe crypto/krb5, rxrpc: Fix lack of pre-decrypt/pre-verify length checks
Change the krb5 crypto library to provide facilities to precheck the length
of the message about to be decrypted or verified.

Fix AF_RXRPC to make use of this to validate DATA packets secured with
RxGK.

Fixes: 9d1d2b5934 ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Closes: https://sashiko.dev/#/patchset/20260511160753.607296-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Simon Horman <horms@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260515230516.2718212-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:36:45 -07:00
Jakub Kicinski
b1a736f8bc Merge branch 'net-shaper-fix-valid-confusion-even-more'
Jakub Kicinski says:

====================
net: shaper: fix VALID confusion even more

Sashiko reported another pre-exising issue in the previous
batch of fixes:
https://sashiko.dev/#/patchset/20260510192904.3987113-7-kuba@kernel.org

Turns out I over-esitmated the guarantees of the XArray flags.
Stop using them completely.
====================

Link: https://patch.msgid.link/20260515221325.1685455-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:34:22 -07:00
Jakub Kicinski
b8d7519352 net: shaper: rework the VALID marking (again)
Recent commit changed the semantics from NOT_VALID to VALID.
I didn't realize that the flags are not stored atomically
with the entry in XArray. There's still a race of reader
observing a VALID mark for a slot, getting interrupted,
writer replacing the entry with a different one, reader
continuing, fetching the entry which is now a different
pointer than the pointer for which VALID was meant.

The biggest consequence of this is that we may see a UAF
since net_shaper_rollback() assumed that entries without
VALID can be freed without observing RCU.

Looks like the XArray marks are buying us nothing at this
point. Let's convert the code to an explicit valid field.
The smp_load_acquire() / smp_store_release() barriers are
marginally cleaner.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: 93954b40f6 ("net-shapers: implement NL set and delete operations")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260515221325.1685455-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:34:20 -07:00
Jakub Kicinski
a3442936dd net: shaper: annotate the data races
As previously discussed we don't care about making the shaper
state fully RCU-compliant because the hierarchy itself can't
be dumped in one go over Netlink. Let's annotate the reads
and writes to make that clear.

The field-by-field assignments will also be useful for the
next commit which adds explicit "valid" field (which we don't
want to override with the current full struct assignment).

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260515221325.1685455-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:34:20 -07:00
Prathamesh Deshpande
abe003b332 net/mlx5e: Fix eswitch mode block underflow on IPsec acquire SA
mlx5e_xfrm_add_state() handles acquire-flow temporary SAs by allocating
software state and skipping hardware offload setup.

That path jumps to the common success label before taking the eswitch mode
block. After tunnel-mode validation was moved earlier, the common success
label unconditionally calls mlx5_eswitch_unblock_mode(). For acquire SAs,
this decrements esw->offloads.num_block_mode without a matching increment.

Return directly after installing the acquire SA offload handle, so only the
paths that successfully called mlx5_eswitch_block_mode() call the matching
unblock.

Fixes: 22239eb258 ("net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed")
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260510225903.13184-1-prathameshdeshpande7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 15:19:29 -07:00
Jakub Kicinski
9a8e01c500 Merge branch 'udp-gso-fix-__udp_gso_segment-after-gso_partial-udp-length-change'
Gal Pressman says:

====================
udp: gso: Fix __udp_gso_segment() after GSO_PARTIAL UDP length change

This series fixes two issues introduced by commit b10b446ce7 ("udp:
gso: Use single MSS length in UDP header for GSO_PARTIAL"), which
switched __udp_gso_segment() to write the single MSS length into the UDP
header for GSO_PARTIAL skbs.

Patch 1 (from Alice) fixes the UDP checksum adjustment in
__udp_gso_segment().
The patch adjusts the checksum by the correct delta, and since msslen
and newlen become equivalent before the loop, drops one of the two
variables to simplify the code.

Patch 2 handles the case where the last segment of a GSO_PARTIAL skb is
itself a GSO skb. This happens when the original packet size is an exact
multiple of MSS, so the post-loop segment is not a remainder skb but a
full GSO chunk and must also carry the single MSS length in its UDP
header.
====================

Link: https://patch.msgid.link/20260518062250.3019914-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 15:04:17 -07:00
Gal Pressman
78effd896e udp: Fix UDP length on last GSO_PARTIAL segment
Following the cited commit, __udp_gso_segment() writes single MSS length
in the UDP header.
The cited patch doesn't account for the fact that the last segment could
be a GSO skb by itself. This could happen when the size of the packet is
a multiple of MSS, hence the first segment is also the last one (there
is no need for a remainder skb).

When the post-loop segment is a GSO skb, assign the single MSS length in
the UDP header.

Fixes: b10b446ce7 ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Closes: https://lore.kernel.org/all/6c3fb15e-711d-4b8d-b152-e03d9b05293f@linux.dev/
Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260518062250.3019914-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 15:03:47 -07:00
Alice Mikityanska
5f17ae0f59 udp: gso: Fix handling checksum in __udp_gso_segment
The cited commit started using msslen for uh->len, but still uses newlen
to adjust uh->check. Although the checksum is ignored in most cases due
to the hardware offload, __udp_gso_segment attempts to maintain the
correct one. Fix uh->check and adjust it by the right value.

Additionally, after the fix, newlen becomes assigned and unused before
the loop. The code can be simplified a bit if mss adjustment is dropped,
so that newlen becomes equal to msslen before the loop, and msslen can
be also dropped, saving a few lines of code.

This brings us back to one variable, drops an unneeded arithmetic for
mss, and fixes the UDP checksum.

Fixes: b10b446ce7 ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260518062250.3019914-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 15:02:47 -07:00
Safa Karakuş
ab1513597c Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
bt_accept_dequeue() unlinks a not-yet-accepted child from the parent
accept queue and release_sock()s it before returning, so the returned
sk has no caller reference and is unlocked.

l2cap_sock_cleanup_listen() walks these children on listening-socket
close.  A concurrent HCI disconnect drives hci_rx_work ->
l2cap_conn_del() which runs l2cap_chan_del() + l2cap_sock_kill() and
frees the child sk and its l2cap_chan; cleanup_listen() then uses both:

  BUG: KASAN: slab-use-after-free in l2cap_sock_kill
    l2cap_sock_kill / l2cap_sock_cleanup_listen / __x64_sys_close
  Freed by: l2cap_conn_del -> l2cap_sock_close_cb -> l2cap_sock_kill

This is distinct from the two fixes already in this area: commit
e83f5e24da ("Bluetooth: serialize accept_q access") serialises the
accept_q list/poll and takes temporary refs inside bt_accept_dequeue(),
and CVE-2025-39860 serialises the userspace close()/accept() race by
calling cleanup_listen() under lock_sock() in l2cap_sock_release().
Neither covers l2cap_conn_del() running from hci_rx_work, so this UAF
still reproduces on current bluetooth/master.

Take the reference at the source: bt_accept_dequeue() does sock_hold()
while sk is still locked, before release_sock(); callers sock_put().
cleanup_listen() pins the chan with l2cap_chan_hold_unless_zero() under
a brief child sk lock (serialising vs l2cap_sock_teardown_cb()), drops
it before l2cap_chan_lock(), and skips a duplicate l2cap_sock_kill() on
SOCK_DEAD.  conn->lock is not taken here: cleanup_listen() runs under
the parent sk lock and that would invert
conn->lock -> chan->lock -> sk_lock (lockdep).

KASAN/SMP: an unprivileged listen/close vs HCI-disconnect race produced
12 use-after-free reports per run before this change; 0, and no lockdep
report, over 1600+ raced iterations after it on bluetooth/master.

Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Cc: stable@vger.kernel.org
Reported-by: Siwei Zhang <oss@fourdim.xyz>
Reviewed-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Safa Karakuş <safa.karakus@secunnix.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Mingyu Wang
c1bb9336ae Bluetooth: hci_uart: fix UAFs and race conditions in close and init paths
Vulnerabilities leading to Use-After-Free (UAF) and Null Pointer
Dereference (NPD) conditions were observed in the lifecycle management
of hci_uart.

The primary issue arises because the workqueues (init_ready and
write_work) are only flushed/cancelled if the HCI_UART_PROTO_READY
flag is set during TTY close. If a hangup occurs before setup completes,
hci_uart_tty_close() skips the teardown of these workqueues and
proceeds to free the `hu` struct. When the scheduled work executes
later, it blindly dereferences the freed `hu` struct.

Furthermore, several data races and UAFs were identified in the teardown
sequence:
1. Calling hci_uart_flush() from hci_uart_close() without effectively
   disabling write_work causes a race condition where both can concurrently
   double-free hu->tx_skb. This happens because protocol timers can
   concurrently invoke hci_uart_tx_wakeup() and requeue write_work.
2. Calling hci_free_dev(hdev) before hu->proto->close(hu) causes a UAF
   when vendor specific protocol close callbacks dereference hu->hdev.
3. In the initialization error paths, failing to take the proto_lock
   write lock before clearing PROTO_READY leads to races with active
   readers. Additionally, hci_uart_tty_receive() accesses hu->hdev
   outside the read lock, leading to UAFs if the initialization error
   path frees hdev concurrently.

Fix these synchronization and lifecycle issues by:
1. Re-ordering hci_uart_tty_close() to clear HCI_UART_PROTO_READY first,
   followed immediately by a cancel_work_sync(&hu->write_work). Clearing
   the flag locks out concurrent protocol timers from successfully invoking
   hci_uart_tx_wakeup(), effectively rendering the cancellation permanent
   and preventing the tx_skb double-free.
2. Note: Clearing PROTO_READY early causes hci_uart_close() to skip
   hu->proto->flush(). This is perfectly safe in the tty_close path
   because hu->proto->close() executes shortly after, which intrinsically
   purges all protocol SKB queues and tears down the state.
3. Relocating hu->proto->close(hu) strictly prior to hci_free_dev(hdev)
   across all close and error paths to prevent vendor-level UAFs.
4. Moving the hdev->stat.byte_rx increment in hci_uart_tty_receive()
   inside the proto_lock read-side critical section to safely synchronize
   with device unregistration.
5. Adding cancel_work_sync(&hu->write_work) to hci_uart_close() to safely
   flush the workqueue before hci_uart_flush() is invoked via the HCI core.
6. Utilizing cancel_work_sync() instead of disable_work_sync() across
   all paths to prevent permanently breaking user-space retry capabilities.

Fixes: 3b799254cf ("Bluetooth: hci_uart: Cancel init work before unregistering")
Cc: stable@vger.kernel.org
Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Michael Bommarito
d3f7d17960 Bluetooth: MGMT: validate Add Extended Advertising Data length
MGMT_OP_ADD_EXT_ADV_DATA is registered as a variable-length command,
with MGMT_ADD_EXT_ADV_DATA_SIZE as the fixed header size.  The handler
then uses cp->adv_data_len and cp->scan_rsp_len to validate and copy
cp->data, but it never checks that those bytes are part of the mgmt
command payload.

A short command can therefore make add_ext_adv_data() pass an
out-of-bounds pointer into tlv_data_is_valid().  If the bytes beyond
the command buffer are addressable, they can also be copied into the
advertising instance as scan response data, where the caller can read
them back via MGMT_OP_GET_ADV_INSTANCE.  The trigger requires
CAP_NET_ADMIN in the initial user namespace; KASAN reports an 8-byte
slab-out-of-bounds read.

Reject commands whose length does not match the fixed header plus both
advertising data lengths before parsing cp->data.

Fixes: 1241057283 ("Bluetooth: Break add adv into two mgmt commands")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Jiajia Liu
dd1dda6b8d Bluetooth: btmtk: fix urb->setup_packet leak in error paths
The setup_packet of control urb is not freed if usb_submit_urb fails or
the submitted urb is killed. Add free in these two paths.

Fixes: a1c49c434e ("Bluetooth: btusb: Add protocol support for MediaTek MT7668U USB devices")
Signed-off-by: Jiajia Liu <liujiajia@kylinos.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
David Carlier
84c24fb151 Bluetooth: ISO: drop ISO_END frames received without prior ISO_START
ISO data PDUs carry a packet-boundary flag indicating START, CONT, END
or SINGLE. The ISO_CONT branch of iso_recv() guards against a missing
ISO_START by checking conn->rx_len before touching conn->rx_skb, but
ISO_END does not.

If a peer sends an ISO_END as the first packet on a fresh ISO
connection, conn->rx_skb is still NULL and conn->rx_len is zero, so
skb_put(conn->rx_skb, ...) dereferences NULL and oopses. For BIS,
where receivers sync to a broadcaster without pairing, any broadcaster
on the air can trigger this.

Mirror the ISO_CONT check at the top of ISO_END so a stray end fragment
is logged and dropped instead of crashing the host.

Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Kiran K
88365d04fd Bluetooth: btintel_pcie: Fix incorrect MAC access programming
btintel_pcie_get_mac_access() and btintel_pcie_release_mac_access()
were programming STOP_MAC_ACCESS_DIS and XTAL_CLK_REQ in addition to
the MAC_ACCESS_REQ handshake. These bits are not part of the host
MAC-access handshake on the supported parts; the driver was
programming them incorrectly. Drop the writes so the register update
contains only the bits the controller actually consumes.

Fixes: b9465e6670 ("Bluetooth: btintel_pcie: Read hardware exception data")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00