Justin Chen says:
====================
net: bcmgenet: fix queue lock up
We have been seeing reports of logs like this.
[ 41.761198] bcmgenet 1001300000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 2 timed out 10039 ms
[ 43.745198] bcmgenet 1001300000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 2 timed out 12023 ms
[ 45.729198] bcmgenet 1001300000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 2 timed out 14007 ms
We have two issues. The persistent queue timeouts and the eventual
lock up of the entire transmit.
We address the lock up issue first. The queue timeouts are due to
a fundamental design issue not a bug perse. Timeouts still persist,
but we should no longer lock up.
====================
Link: https://patch.msgid.link/20260406175756.134567-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cosmin Ratiu says:
====================
macsec: Add support for VLAN filtering in offload mode
This short series adds support for VLANs in MACsec devices when offload
mode is enabled. This allows VLAN netdevs on top of MACsec netdevs to
function, which accidentally used to be the case in the past, but was
broken. This series adds back proper support.
As part of this, the existing nsim-only MACsec offload tests were
translated to Python so they can run against real HW and new
traffic-based tests were added for VLAN filter propagation, since
there's currently no uAPI to check VLAN filters.
====================
Link: https://patch.msgid.link/20260408115240.1636047-1-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VLAN-filtering is done through two netdev features
(NETIF_F_HW_VLAN_CTAG_FILTER and NETIF_F_HW_VLAN_STAG_FILTER) and two
netdev ops (ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid).
Implement these and advertise the features if the lower device supports
them. This allows proper VLAN filtering to work on top of MACsec
devices, when the lower device is capable of VLAN filtering.
As a concrete example, having this chain of interfaces now works:
vlan_filtering_capable_dev(1) -> macsec_dev(2) -> macsec_vlan_dev(3)
Before the mentioned commit this used to accidentally work because the
MACsec device (and thus the lower device) was put in promiscuous mode
and the VLAN filter was not used. But after commit [1] correctly made
the macsec driver expose the IFF_UNICAST_FLT flag, promiscuous mode was
no longer used and VLAN filters on dev 1 kicked in. Without support in
dev 2 for propagating VLAN filters down, the register_vlan_dev ->
vlan_vid_add -> __vlan_vid_add -> vlan_add_rx_filter_info call from dev
3 is silently eaten (because vlan_hw_filter_capable returns false and
vlan_add_rx_filter_info silently succeeds).
For MACsec, VLAN filters are only relevant for offload, otherwise
the VLANs are encrypted and the lower devices don't care about them. So
VLAN filters are only passed on to lower devices in offload mode.
Flipping between offload modes now needs to offload/unoffload the
filters with vlan_{get,drop}_rx_*_filter_info().
To avoid the back-and-forth filter updating during rollback, the setting
of macsec->offload is moved after the add/del secy ops. This is safe
since none of the code called from those requires macsec->offload.
In case adding the filters fails, the added ones are rolled back and an
error is returned to the operation toggling the offload state.
Fixes: 0349659fd7 ("macsec: set IFF_UNICAST_FLT priv flag")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260408115240.1636047-5-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add VLAN filter propagation tests through offloaded MACsec devices via
actual traffic.
The tests create MACsec tunnels with matching SAs on both endpoints,
stack VLANs on top, and verify connectivity with ping. Covered:
- Offloaded MACsec with VLAN (filters propagate to HW)
- Software MACsec with VLAN (no HW filter propagation)
- Offload on/off toggle and verifying traffic still works
On netdevsim this makes use of the VLAN filter debugfs file to actually
validate that filters are applied/removed correctly.
On real hardware the traffic should validate actual VLAN filter
propagation.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260408115240.1636047-4-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move MACsec offload API and ethtool feature tests from
tools/testing/selftests/drivers/net/netdevsim/macsec-offload.sh to
tools/testing/selftests/drivers/net/macsec.py using the NetDrvEnv
framework so tests can run against both netdevsim (default) and real
hardware (NETIF=ethX). As some real hardware requires MACsec to use
encryption, add that to the tests.
Netdevsim-specific limit checks (max SecY, max RX SC) were moved into
separate test cases to avoid failures on real hardware.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260408115240.1636047-2-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter, IPsec and wireless. This is again
considerably bigger than the old average. No known outstanding
regressions.
Current release - regressions:
- net: increase IP_TUNNEL_RECURSION_LIMIT to 5
- eth: ice: fix PTP timestamping broken by SyncE code on E825C
Current release - new code bugs:
- eth: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure
Previous releases - regressions:
- core: fix cross-cache free of KFENCE-allocated skb head
- sched: act_csum: validate nested VLAN headers
- rxrpc: fix call removal to use RCU safe deletion
- xfrm:
- wait for RCU readers during policy netns exit
- fix refcount leak in xfrm_migrate_policy_find
- wifi: rt2x00usb: fix devres lifetime
- mptcp: fix slab-use-after-free in __inet_lookup_established
- ipvs: fix NULL deref in ip_vs_add_service error path
- eth:
- airoha: fix memory leak in airoha_qdma_rx_process()
- lan966x: fix use-after-free and leak in lan966x_fdma_reload()
Previous releases - always broken:
- ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data()
- ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group
dump
- bridge: guard local VLAN-0 FDB helpers against NULL vlan group
- xsk: tailroom reservation and MTU validation
- rxrpc:
- fix to request an ack if window is limited
- fix RESPONSE authenticator parser OOB read
- netfilter: nft_ct: fix use-after-free in timeout object destroy
- batman-adv: hold claim backbone gateways by reference
- eth:
- stmmac: fix PTP ref clock for Tegra234
- idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling
- ipa: fix GENERIC_CMD register field masks for IPA v5.0+"
* tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (104 commits)
net: lan966x: fix use-after-free and leak in lan966x_fdma_reload()
net: lan966x: fix page pool leak in error paths
net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool()
nfc: pn533: allocate rx skb before consuming bytes
l2tp: Drop large packets with UDP encap
net: ipa: fix event ring index not programmed for IPA v5.0+
net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+
MAINTAINERS: Add Prashanth as additional maintainer for amd-xgbe driver
devlink: Fix incorrect skb socket family dumping
af_unix: read UNIX_DIAG_VFS data under unix_state_lock
Revert "mptcp: add needs_id for netlink appending addr"
mptcp: fix slab-use-after-free in __inet_lookup_established
net: txgbe: leave space for null terminators on property_entry
net: ioam6: fix OOB and missing lock
rxrpc: proc: size address buffers for %pISpc output
rxrpc: only handle RESPONSE during service challenge
rxrpc: Fix buffer overread in rxgk_do_verify_authenticator()
rxrpc: Fix leak of rxgk context in rxgk_verify_response()
rxrpc: Fix integer overflow in rxgk_verify_response()
rxrpc: Fix missing error checks for rxkad encryption/decryption failure
...
Pull IOMMU fix from Will Deacon:
- Fix regression introduced by the empty MMU gather fix in -rc7, where
the ->iotlb_sync() callback can be elided incorrectly, resulting in
boot failures (hangs), crashes and potential memory corruption.
* tag 'iommu-fixes-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu: Ensure .iotlb_sync is called correctly
Pull x86 platform drivers fixes from Ilpo Järvinen:
- amd/pmc: Add Thinkpad L14 Gen3 to quirk_s2idle_bug
- asus-armoury: Add support for FA607NU, GU605MU, and GV302XU.
- intel-uncore-freq: Handle autonomous UFS status bit
- ISST: Handle cases with less than max buckets correctly
- intel-uncore-freq & ISST: Mark minor version 3 supported (no
additional driver changes required)
* tag 'platform-drivers-x86-v7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-armoury: add support for GU605MU
platform/x86: asus-armoury: add support for FA607NU
platform/x86: asus-armoury: add support for GV302XU
platform/x86/amd: pmc: Add Thinkpad L14 Gen3 to quirk_s2idle_bug
platform/x86/intel-uncore-freq: Increase minor version
platform/x86: ISST: Increase minor version
platform/x86/intel-uncore-freq: Handle autonomous UFS status bit
platform/x86: ISST: Reset core count to 0
David Carlier says:
====================
net: lan966x: fix page_pool error handling and error paths
This series fixes error handling around the lan966x page pool:
1/3 adds the missing IS_ERR check after page_pool_create(), preventing
a kernel oops when the error pointer flows into
xdp_rxq_info_reg_mem_model().
2/3 plugs page pool leaks in the lan966x_fdma_rx_alloc() and
lan966x_fdma_init() error paths, now reachable after 1/3.
3/3 fixes a use-after-free and page pool leak in the
lan966x_fdma_reload() restore path, where the hardware could
resume DMA into pages already returned to the page pool.
====================
Link: https://patch.msgid.link/20260405055241.35767-1-devnexen@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When lan966x_fdma_reload() fails to allocate new RX buffers, the restore
path restarts DMA using old descriptors whose pages were already freed
via lan966x_fdma_rx_free_pages(). Since page_pool_put_full_page() can
release pages back to the buddy allocator, the hardware may DMA into
memory now owned by other kernel subsystems.
Additionally, on the restore path, the newly created page pool (if
allocation partially succeeded) is overwritten without being destroyed,
leaking it.
Fix both issues by deferring the release of old pages until after the
new allocation succeeds. Save the old page array before the allocation
so old pages can be freed on the success path. On the failure path, the
old descriptors, pages and page pool are all still valid, making the
restore safe. Also ensure the restore path re-enables NAPI and wakes
the netdev, matching the success path.
Fixes: 89ba464fcf ("net: lan966x: refactor buffer reload function")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260405055241.35767-4-devnexen@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
lan966x_fdma_rx_alloc() creates a page pool but does not destroy it if
the subsequent fdma_alloc_coherent() call fails, leaking the pool.
Similarly, lan966x_fdma_init() frees the coherent DMA memory when
lan966x_fdma_tx_alloc() fails but does not destroy the page pool that
was successfully created by lan966x_fdma_rx_alloc(), leaking it.
Add the missing page_pool_destroy() calls in both error paths.
Fixes: 11871aba19 ("net: lan96x: Use page_pool API")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260405055241.35767-3-devnexen@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
page_pool_create() can return an ERR_PTR on failure. The return value
is used unconditionally in the loop that follows, passing the error
pointer through xdp_rxq_info_reg_mem_model() into page_pool_use_xdp_mem(),
which dereferences it, causing a kernel oops.
Add an IS_ERR check after page_pool_create() to return early on failure.
Fixes: 11871aba19 ("net: lan96x: Use page_pool API")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260405055241.35767-2-devnexen@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Many drivers have no reason to use the iotlb_gather mechanism, but do
still depend on .iotlb_sync being called to properly complete an unmap.
Since the core code is now relying on the gather to detect when there
is legitimately something to sync, it should also take care of encoding
a successful unmap when the driver does not touch the gather itself.
Fixes: 90c5def10b ("iommu: Do not call drivers for empty gathers")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Will Deacon <will@kernel.org>
pn532_receive_buf() reports the number of accepted bytes to the serdev
core. The current code consumes bytes into recv_skb and may already hand
a complete frame to pn533_recv_frame() before allocating a fresh receive
buffer.
If that alloc_skb() fails, the callback returns 0 even though it has
already consumed bytes, and it leaves recv_skb as NULL for the next
receive callback. That breaks the receive_buf() accounting contract and
can also lead to a NULL dereference on the next skb_put_u8().
Allocate the receive skb lazily before consuming the next byte instead.
If allocation fails, return the number of bytes already accepted.
Fixes: c656aa4c27 ("nfc: pn533: add UART phy driver")
Cc: stable@vger.kernel.org
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Link: https://patch.msgid.link/20260405094003.3-pn533-v2-pengpeng@iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For IPA v5.0+, the event ring index field moved from CH_C_CNTXT_0 to
CH_C_CNTXT_1. The v5.0 register definition intended to define this
field in the CH_C_CNTXT_1 fmask array but used the old identifier of
ERINDEX instead of CH_ERINDEX.
Without a valid event ring, GSI channels could never signal transfer
completions. This caused gsi_channel_trans_quiesce() to block
forever in wait_for_completion().
At least for IPA v5.2 this resolves an issue seen where runtime
suspend, system suspend, and remoteproc stop all hanged forever. It
also meant the IPA data path was completely non functional.
Fixes: faf0678ec8 ("net: ipa: add IPA v5.0 GSI register definitions")
Signed-off-by: Alexander Koskovich <akoskovich@pm.me>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260403-milos-ipa-v1-2-01e9e4e03d3e@fairphone.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-04-06 (idpf, ice, ixgbe, ixgbevf, igb, e1000)
Emil converts to use spinlock_t for virtchnl transactions to make
consistent use of the xn_bm_lock when accessing the free_xn_bm bitmap,
while also avoiding nested raw/bh spinlock issue on PREEMPT_RT kernels.
He also sets payload size before calling the async handler, to make sure
it doesn't error out prematurely due to invalid size check for idpf.
Kohei Enju changes WARN_ON for missing PTP control PF to a dev_info() on
ice as there are cases where this is expected and acceptable.
Petr Oros fixes conditions in which error paths failed to call
ice_ptp_port_phy_restart() breaking PTP functionality on ice.
Alex significantly reduces reporting of driver information, and time
under RTNL locl, on ixgbe e610 devices by reducing reads of flash info
only on events that could change it.
Michal Schmidt adds missing Hyper-V op on ixgbevf.
Alex Dvoretsky removes call to napi_synchronize() in igb_down() to
resolve a deadlock.
Agalakov Daniil adds error check on e1000 for failed EEPROM read.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000: check return value of e1000_read_eeprom
igb: remove napi_synchronize() in igb_down()
ixgbevf: add missing negotiate_features op to Hyper-V ops table
ixgbe: stop re-reading flash on every get_drvinfo for e610
ice: fix PTP timestamping broken by SyncE code on E825C
ice: ptp: don't WARN when controlling PF is unavailable
idpf: set the payload size before calling the async handler
idpf: improve locking around idpf_vc_xn_push_free()
idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling
====================
Link: https://patch.msgid.link/20260406213038.444732-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The devlink_fmsg_dump_skb function was incorrectly using the socket
type (sk->sk_type) instead of the socket family (sk->sk_family)
when filling the "family" field in the fast message dump.
This patch fixes this to properly display the socket family.
Fixes: 3dbfde7f6b ("devlink: add devlink_fmsg_dump_skb() function")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://patch.msgid.link/20260407022730.2393-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit was originally adding the ability to add MPTCP endpoints
with ID 0 by accident. The in-kernel PM, handling MPTCP endpoints at the
net namespace level, is not supposed to handle endpoints with such ID,
because this ID 0 is reserved to the initial subflow, as mentioned in
the MPTCPv1 protocol [1], a per-connection setting.
Note that 'ip mptcp endpoint add id 0' stops early with an error, but
other tools might still request the in-kernel PM to create MPTCP
endpoints with this restricted ID 0.
In other words, it was wrong to call the mptcp_pm_has_addr_attr_id
helper to check whether the address ID attribute is set: if it was set
to 0, a new MPTCP endpoint would be created with ID 0, which is not
expected, and might cause various issues later.
Fixes: 584f389426 ("mptcp: add needs_id for netlink appending addr")
Cc: stable@vger.kernel.org
Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.2-9 [1]
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260407-net-mptcp-revert-pm-needs-id-v2-1-7a25cbc324f8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ehash table lookups are lockless and rely on
SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability
during RCU read-side critical sections. Both tcp_prot and
tcpv6_prot have their slab caches created with this flag
via proto_register().
However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into
tcpv6_prot_override during inet_init() (fs_initcall, level 5),
before inet6_init() (module_init/device_initcall, level 6) has
called proto_register(&tcpv6_prot). At that point,
tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab
remains NULL permanently.
This causes MPTCP v6 subflow child sockets to be allocated via
kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab
cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so
when these sockets are freed without SOCK_RCU_FREE (which is
cleared for child sockets by design), the memory can be
immediately reused. Concurrent ehash lookups under
rcu_read_lock can then access freed memory, triggering a
slab-use-after-free in __inet_lookup_established.
Fix this by splitting the IPv6-specific initialization out of
mptcp_subflow_init() into a new mptcp_subflow_v6_init(), called
from mptcp_proto_v6_init() before protocol registration. This
ensures tcpv6_prot_override.slab correctly inherits the
SLAB_TYPESAFE_BY_RCU slab cache.
Fixes: b19bc2945b ("mptcp: implement delegated actions")
Cc: stable@vger.kernel.org
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260406031512.189159-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When trace->type.bit6 is set:
if (trace->type.bit6) {
...
queue = skb_get_tx_queue(dev, skb);
qdisc = rcu_dereference(queue->qdisc);
This code can lead to an out-of-bounds access of the dev->_tx[] array
when is_input is true. In such a case, the packet is on the RX path and
skb->queue_mapping contains the RX queue index of the ingress device. If
the ingress device has more RX queues than the egress device (dev) has
TX queues, skb_get_queue_mapping(skb) will exceed dev->num_tx_queues.
Add a check to avoid this situation since skb_get_tx_queue() does not
clamp the index. This issue has also revealed that per queue visibility
cannot be accurate and will be replaced later as a new feature.
While at it, add missing lock around qdisc_qstats_qlen_backlog(). The
function __ioam6_fill_trace_data() is called from both softirq and
process contexts, hence the use of spin_lock_bh() here.
Fixes: b63c5478e9 ("ipv6: ioam: Support for Queue depth data field")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260403214418.2233266-2-kuba@kernel.org/
Signed-off-by: Justin Iurman <justin.iurman@gmail.com>
Link: https://patch.msgid.link/20260404134137.24553-1-justin.iurman@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
A few last-minute fixes:
- rfkill: prevent boundless event list
- rt2x00: fix USB resource management
- brcmfmac: validate firmware IDs
- brcmsmac: fix DMA free size
* tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
net: rfkill: prevent unlimited numbers of rfkill events from being created
wifi: rt2x00usb: fix devres lifetime
wifi: brcmfmac: validate bsscfg indices in IF events
wifi: brcmsmac: Fix dma_free_coherent() size
====================
Link: https://patch.msgid.link/20260408081802.111623-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Klassert says:
====================
pull request (net): ipsec 2026-04-08
1) Clear trailing padding in build_polexpire() to prevent
leaking unititialized memory. From Yasuaki Torimaru.
2) Fix aevent size calculation when XFRMA_IF_ID is used.
From Keenan Dong.
3) Wait for RCU readers during policy netns exit before
freeing the policy hash tables.
4) Fix dome too eaerly dropped references on the netdev
when uding transport mode. From Qi Tang.
5) Fix refcount leak in xfrm_migrate_policy_find().
From Kotlyarov Mihail.
6) Fix two fix info leaks in build_report() and
in build_mapping(). From Greg Kroah-Hartman.
7) Zero aligned sockaddr tail in PF_KEY exports.
From Zhengchuan Liang.
* tag 'ipsec-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
net: af_key: zero aligned sockaddr tail in PF_KEY exports
xfrm_user: fix info leak in build_report()
xfrm_user: fix info leak in build_mapping()
xfrm: fix refcount leak in xfrm_migrate_policy_find
xfrm: hold dev ref until after transport_finish NF_HOOK
xfrm: Wait for RCU readers during policy netns exit
xfrm: account XFRMA_IF_ID in aevent size calculation
xfrm: clear trailing padding in build_polexpire()
====================
Link: https://patch.msgid.link/20260408095925.253681-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Wunderlich says:
====================
Here are two batman-adv bugfixes:
- reject oversized global TT response buffers, by Ruide Cao
- hold claim backbone gateways by reference, by Haoze Xie
* tag 'batadv-net-pullrequest-20260408' of https://git.open-mesh.org/linux-merge:
batman-adv: hold claim backbone gateways by reference
batman-adv: reject oversized global TT response buffers
====================
Link: https://patch.msgid.link/20260408110255.976389-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
netfilter updates for net
I only included crash fixes, as we're closer to a release, rest will
be handled via -next.
1) Fix a NULL pointer dereference in ip_vs_add_service error path, from
Weiming Shi, bug added in 6.2 development cycle.
2) Don't leak kernel data bytes from allocator to userspace: nfnetlink_log
needs to init the trailing NLMSG_DONE terminator. From Xiang Mei.
3) xt_multiport match lacks range validation, bogus userspace request will
cause out-of-bounds read. From Ren Wei.
4) ip6t_eui64 match must reject packets with invalid mac header before
calling eth_hdr. Make existing check unconditional. From Zhengchuan
Liang.
5) nft_ct timeout policies are free'd via kfree() while they may still
be reachable by other cpus that process a conntrack object that
uses such a timeout policy. Existing reaping of entries is not
sufficient because it doesn't wait for a grace period. Use kfree_rcu().
From Tuan Do.
6/7) Make nfnetlink_queue hash table per queue. As-is we can hit a page
fault in case underlying page of removed element was free'd. Per-queue
hash prevents parallel lookups. This comes with a test case that
demonstrates the bug, from Fernando Fernandez Mancera.
* tag 'nf-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: nft_queue.sh: add a parallel stress test
netfilter: nfnetlink_queue: make hash table per queue
netfilter: nft_ct: fix use-after-free in timeout object destroy
netfilter: ip6t_eui64: reject invalid MAC header for all packets
netfilter: xt_multiport: validate range encoding in checkentry
netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
ipvs: fix NULL deref in ip_vs_add_service error path
====================
Link: https://patch.msgid.link/20260408163512.30537-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here are some fixes for rxrpc:
(1) Fix key quota calculation.
(2) Fix a memory leak.
(3) Fix rxrpc_new_client_call_for_sendmsg() to substitute NULL for an
empty key.
Might want to remove this substitution entirely or handle it in
rxrpc_init_client_call_security() instead.
(4) Fix deletion of call->link to be RCU safe.
(5) Fix missing bounds checks when parsing RxGK tickets.
(6) Fix use of wrong skbuff to get challenge serial number. Also actually
substitute the newer response skbuff and release the older one.
(7) Fix unexpected RACK timer warning to report old mode.
(8) Fix call key refcount leak.
(9) Fix the interaction of jumbograms with Tx window space, setting the
request-ack flag when the window space is getting low, typically
because each jumbogram take a big bite out of the window and fewer UDP
packets get traded.
(10) Don't call rxrpc_put_call() with a NULL pointer.
(11) Reject undecryptable rxkad response tickets by checking result of
decryption.
(12) Fix buffer bounds calculation in the RESPONSE authenticator parser.
(13) Fix oversized response length check.
(14) Fix refcount leak on multiple setting of server keyring.
(15) Fix checks made by RXRPC_SECURITY_KEY and RXRPC_SECURITY_KEYRING (both
should be allowed).
(16) Fix lack of result checking on calls to crypto_skcipher_en/decrypt().
(17) Fix token_len limit check in rxgk_verify_response().
(18) Fix rxgk context leak in rxgk_verify_response().
(19) Fix read beyond end of buffer in rxgk_do_verify_authenticator().
(20) Fix parsing of RESPONSE packet on a connection that has already been set
from a prior response.
(21) Fix size of buffers used for rendering addresses into for procfiles.
====================
Link: https://patch.msgid.link/20260408121252.2249051-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The AF_RXRPC procfs helpers format local and remote socket addresses into
fixed 50-byte stack buffers with "%pISpc".
That is too small for the longest current-tree IPv6-with-port form the
formatter can produce. In lib/vsprintf.c, the compressed IPv6 path uses a
dotted-quad tail not only for v4mapped addresses, but also for ISATAP
addresses via ipv6_addr_is_isatap().
As a result, a case such as
[ffff:ffff:ffff:ffff:0:5efe:255.255.255.255]:65535
is possible with the current formatter. That is 50 visible characters, so
51 bytes including the trailing NUL, which does not fit in the existing
char[50] buffers used by net/rxrpc/proc.c.
Size the buffers from the formatter's maximum textual form and switch the
call sites to scnprintf().
Changes since v1:
- correct the changelog to cite the actual maximum current-tree case
explicitly
- frame the proof around the ISATAP formatting path instead of the earlier
mapped-v4 example
Fixes: 75b54cb57c ("rxrpc: Add IPv6 support")
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Anderson Nascimento <anderson@allelesecurity.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Link: https://patch.msgid.link/20260408121252.2249051-22-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An AF_RXRPC socket can be both client and server at the same time. When
sending new calls (ie. it's acting as a client), it uses rx->key to set the
security, and when accepting incoming calls (ie. it's acting as a server),
it uses rx->securities.
setsockopt(RXRPC_SECURITY_KEY) sets rx->key to point to an rxrpc-type key
and setsockopt(RXRPC_SECURITY_KEYRING) sets rx->securities to point to a
keyring of rxrpc_s-type keys.
Now, it should be possible to use both rx->key and rx->securities on the
same socket - but for userspace AF_RXRPC sockets rxrpc_setsockopt()
prevents that.
Fix this by:
(1) Remove the incorrect check rxrpc_setsockopt(RXRPC_SECURITY_KEYRING)
makes on rx->key.
(2) Move the check that rxrpc_setsockopt(RXRPC_SECURITY_KEY) makes on
rx->key down into rxrpc_request_key().
(3) Remove rxrpc_request_key()'s check on rx->securities.
This (in combination with a previous patch) pushes the checks down into the
functions that set those pointers and removes the cross-checks that prevent
both key and keyring being set.
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Anderson Nascimento <anderson@allelesecurity.com>
cc: Luxiao Xu <rakukuip@gmail.com>
cc: Yuan Tan <yuantan098@gmail.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Link: https://patch.msgid.link/20260408121252.2249051-16-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxgk_verify_response() decodes auth_len from the packet and is supposed
to verify that it fits in the remaining bytes. The existing check is
inverted, so oversized RESPONSE authenticators are accepted and passed
to rxgk_decrypt_skb(), which can later reach skb_to_sgvec() with an
impossible length and hit BUG_ON(len).
Decoded from the original latest-net reproduction logs with
scripts/decode_stacktrace.sh:
RIP: __skb_to_sgvec()
[net/core/skbuff.c:5285 (discriminator 1)]
Call Trace:
skb_to_sgvec() [net/core/skbuff.c:5305]
rxgk_decrypt_skb() [net/rxrpc/rxgk_common.h:81]
rxgk_verify_response() [net/rxrpc/rxgk.c:1268]
rxrpc_process_connection()
[net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364
net/rxrpc/conn_event.c:386]
process_one_work() [kernel/workqueue.c:3281]
worker_thread()
[kernel/workqueue.c:3353 kernel/workqueue.c:3440]
kthread() [kernel/kthread.c:436]
ret_from_fork() [arch/x86/kernel/process.c:164]
Reject authenticator lengths that exceed the remaining packet payload.
Fixes: 9d1d2b5934 ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Signed-off-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: Willy Tarreau <w@1wt.eu>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Link: https://patch.msgid.link/20260408121252.2249051-14-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxgk_verify_authenticator() copies auth_len bytes into a temporary
buffer and then passes p + auth_len as the parser limit to
rxgk_do_verify_authenticator(). Since p is a __be32 *, that inflates the
parser end pointer by a factor of four and lets malformed RESPONSE
authenticators read past the kmalloc() buffer.
Decoded from the original latest-net reproduction logs with
scripts/decode_stacktrace.sh:
BUG: KASAN: slab-out-of-bounds in rxgk_verify_response()
Call Trace:
dump_stack_lvl() [lib/dump_stack.c:123]
print_report() [mm/kasan/report.c:379 mm/kasan/report.c:482]
kasan_report() [mm/kasan/report.c:597]
rxgk_verify_response()
[net/rxrpc/rxgk.c:1103 net/rxrpc/rxgk.c:1167
net/rxrpc/rxgk.c:1274]
rxrpc_process_connection()
[net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364
net/rxrpc/conn_event.c:386]
process_one_work() [kernel/workqueue.c:3281]
worker_thread()
[kernel/workqueue.c:3353 kernel/workqueue.c:3440]
kthread() [kernel/kthread.c:436]
ret_from_fork() [arch/x86/kernel/process.c:164]
Allocated by task 54:
rxgk_verify_response()
[include/linux/slab.h:954 net/rxrpc/rxgk.c:1155
net/rxrpc/rxgk.c:1274]
rxrpc_process_connection()
[net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364
net/rxrpc/conn_event.c:386]
Convert the byte count to __be32 units before constructing the parser
limit.
Fixes: 9d1d2b5934 ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Signed-off-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: Willy Tarreau <w@1wt.eu>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Link: https://patch.msgid.link/20260408121252.2249051-13-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>