Commit Graph

1430677 Commits

Author SHA1 Message Date
Jakub Kicinski
166b0cc6df selftests: drv-net: gro: remove TOTAL_HDR_LEN
Willem points out TOTAL_HDR_LEN is identical to MAX_HDR_LEN.
This seems to have been the case ever since the test was added.
Replace the uses of TOTAL_HDR_LEN with MAX_HDR_LEN, MAX seems
more common for what this value is.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:44 -07:00
Jakub Kicinski
5469b695f2 selftests: drv-net: gro: prepare for ip6ip6 support
Try to use already calculated offsets and not depend on the ipip
flag as much. This patch should not change any functionality,
it's just a cleanup to make ip6ip6 support easier.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:43 -07:00
Jakub Kicinski
d973484747 selftests: drv-net: gro: always wait for FIN in the capacity test
The new capacity/order test exits as soon as it sees the expected
packet sequence. This may allow the "flushing" FIN packet to spill
over to the next test. Let's always wait for the FIN before exiting.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:43 -07:00
Jakub Kicinski
436ea8a1b7 selftests: drv-net: gro: add 1 byte payload test
Small IPv4 packets get padded to 60B, this may break / confuse
some buggy implementations. Add a test to coalesce a 1B payload.
Keep this separate from the lrg_sml test because I suspect some
implementations may not handle this case (treat padded frames
as ineligible for coalescing).

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:43 -07:00
Jakub Kicinski
30f831b44a selftests: drv-net: gro: add data burst test case
Add a test trying to induce a GRO context timeout followed
by another sequence of packets for the same flow. The second
burst arrives 100ms after the first one so any implementation
(SW or HW) must time out waiting at that point. We expect both
bursts to be aggregated successfully but separately.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:42 -07:00
Russell King (Oracle)
789ec16eb3 net: stmmac: qcom-ethqos: set clk_csr
The clocks for qcom-ethqos return a rate of zero as firmware manages
their rate. According to hardware documentation, the clock which is
fed to the slave AHB interface can range between 50 to 100MHz for
non-RGMII and 30 to 75MHz for boards with a RGMII interfaces.

Currently, stmmac uses an undefined divisor value. Instead, use
STMMAC_CSR_60_100M which will mean we meet IEEE 802.3 specification
since this will generate:

  714kHz  @ 30MHz
  1.19MHz @ 50MHz
  1.79MHz @ 75MHz
  2.42MHz @ 100MHz

This gives MDC rates within the IEEE 802.3 specification, although the
30MHz case is particularly slow.

Selecting the next lowest divisor, STMMAC_CSR_35_60M, which is /26
will give:

   1.15MHz @ 30MHz
   1.92MHz @ 50MHz
   2.88MHz @ 75MHz (exceeding 802.3 spec)
   3.85MHz @ 100MHz (exceeding 802.3 spec)

Unfortunately, this divisor makes the upper bound of both ranges exeed
the IEEE 802.3 specification, and thus we can not use it without knowing
for certain what the current CSR clock rate actually is.

So, STMMAC_CSR_60_100M is the best fit for all boards based on the
information provided thus far.

Link: https://lore.kernel.org/r/acGhQ0oui+dVRdLY@oss.qualcomm.com
Link: https://lore.kernel.org/r/acw1habUsiSqlrky@oss.qualcomm.com
Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w8JKr-0000000EdLC-41Bt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 14:39:04 -07:00
Julian Braha
e2f152c822 stmmac: cleanup dead dependencies on STMMAC_PLATFORM and STMMAC_ETH in Kconfig
There are already 'if STMMAC_ETH' and 'STMMAC_PLATFORM'
conditions wrapping these config options, making the
'depends on' statements duplicate dependencies (dead code).

I propose leaving the outer 'if STMMAC_PLATFORM...endif' and
'if STMMAC_ETH...endif' conditions, and removing the
individual 'depends on' statements.

This dead code was found by kconfirm, a static analysis tool for Kconfig.

Signed-off-by: Julian Braha <julianbraha@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260402145858.240231-1-julianbraha@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 14:37:31 -07:00
Eric Dumazet
a9b460225e net: always inline some skb helpers
Some performance critical helpers from include/linux/skbuff.h
are not inlined by clang.

Use __always_inline hint for:

 - __skb_fill_netmem_desc()
 - __skb_fill_page_desc()
 - skb_fill_netmem_desc()
 - skb_fill_page_desc()
 - __skb_pull()
 - pskb_may_pull_reason()
 - pskb_may_pull()
 - pskb_pull()
 - pskb_trim()
 - skb_orphan()
 - skb_postpull_rcsum()
 - skb_header_pointer()
 - skb_clear_delivery_time()
 - skb_tstamp_cond()
 - skb_warn_if_lro()

This increases performance and saves ~1200 bytes of text.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 4/24 grow/shrink: 66/12 up/down: 4104/-5306 (-1202)
Function                                     old     new   delta
ip_multipath_l3_keys                           -     303    +303
tcp_sendmsg_locked                          4560    4848    +288
xfrm_input                                  6240    6455    +215
esp_output_head                             1516    1711    +195
skb_try_coalesce                             696     866    +170
bpf_prog_test_run_skb                       1951    2091    +140
tls_strp_read_copy                           528     667    +139
gue_udp_recv                                 738     871    +133
__ip6_append_data                           4159    4279    +120
__bond_xmit_hash                            1019    1122    +103
ip6_multipath_l3_keys                        394     495    +101
bpf_lwt_seg6_action                         1096    1197    +101
input_action_end_dx2                         344     442     +98
vxlan_remcsum                                487     581     +94
udpv6_queue_rcv_skb                          393     480     +87
udp_queue_rcv_skb                            385     471     +86
gue_remcsum                                  453     539     +86
udp_lib_checksum_complete                     84     168     +84
vxlan_xmit                                  2777    2857     +80
nf_reset_ct                                  456     532     +76
igmp_rcv                                    1902    1978     +76
mpls_forward                                1097    1169     +72
tcp_add_backlog                             1226    1292     +66
nfulnl_log_packet                           3091    3156     +65
tcp_rcv_established                         1966    2026     +60
__strp_recv                                 1547    1603     +56
eth_type_trans                               357     411     +54
bond_flow_ip                                 392     444     +52
__icmp_send                                 1584    1630     +46
ip_defrag                                   1636    1681     +45
tpacket_rcv                                 2793    2837     +44
refcount_add                                 132     176     +44
nf_ct_frag6_gather                          1959    2003     +44
napi_skb_free_stolen_head                    199     240     +41
__pskb_trim                                    -      41     +41
napi_reuse_skb                               319     358     +39
icmpv6_rcv                                  1877    1916     +39
br_handle_frame_finish                      1672    1711     +39
ip_rcv_core                                  841     879     +38
ip_check_defrag                              377     415     +38
br_stp_rcv                                   909     947     +38
qdisc_pkt_len_segs_init                      366     399     +33
mld_query_work                              2945    2975     +30
bpf_sk_assign_tcp_reqsk                      607     637     +30
udp_gro_receive                             1657    1686     +29
ip6_rcv_core                                1170    1193     +23
ah_input                                    1176    1197     +21
tun_get_user                                5174    5194     +20
llc_rcv                                      815     834     +19
__pfx_udp_lib_checksum_complete               16      32     +16
__pfx_refcount_add                            48      64     +16
__pfx_nf_reset_ct                             96     112     +16
__pfx_ip_multipath_l3_keys                     -      16     +16
__pfx___pskb_trim                              -      16     +16
packet_sendmsg                              5771    5781     +10
esp_output_tail                             1460    1470     +10
alloc_skb_with_frags                         433     443     +10
xsk_generic_xmit                            3477    3486      +9
mptcp_sendmsg_frag                          2250    2259      +9
__ip_append_data                            4166    4175      +9
__ip6_tnl_rcv                               1159    1168      +9
skb_zerocopy                                1215    1220      +5
gre_parse_header                            1358    1362      +4
__iptunnel_pull_header                       405     407      +2
skb_vlan_untag                               692     693      +1
psp_dev_rcv                                  701     702      +1
netkit_xmit                                 1263    1264      +1
gre_rcv                                     2776    2777      +1
gre_gso_segment                             1521    1522      +1
bpf_skb_net_hdr_pop                          535     536      +1
udp6_ufo_fragment                            888     884      -4
br_multicast_rcv                            9154    9148      -6
snap_rcv                                     312     305      -7
skb_copy_ubufs                              1841    1834      -7
__pfx_skb_tstamp_cond                         16       -     -16
__pfx_skb_clear_delivery_time                 16       -     -16
__pfx_pskb_trim                               16       -     -16
__pfx_pskb_pull                               16       -     -16
ipv6_gso_segment                            1400    1383     -17
ipv6_frag_rcv                               2511    2492     -19
erspan_xmit                                 1221    1190     -31
__pfx_skb_warn_if_lro                         32       -     -32
__pfx___skb_fill_page_desc                    32       -     -32
skb_tstamp_cond                               42       -     -42
pskb_trim                                     46       -     -46
__pfx_skb_postpull_rcsum                      48       -     -48
tcp_gso_segment                             1524    1475     -49
skb_clear_delivery_time                       54       -     -54
__pfx_skb_fill_page_desc                      64       -     -64
__pfx_skb_header_pointer                      80       -     -80
pskb_pull                                     91       -     -91
skb_warn_if_lro                              110       -    -110
tcp_v6_rcv                                  3288    3170    -118
__pfx___skb_pull                             128       -    -128
__pfx_skb_orphan                             144       -    -144
__pfx_pskb_may_pull                          160       -    -160
tcp_v4_rcv                                  3334    3153    -181
__skb_fill_page_desc                         231       -    -231
udp_rcv                                     1809    1553    -256
skb_postpull_rcsum                           318       -    -318
skb_header_pointer                           367       -    -367
fib_multipath_hash                          3399    3018    -381
skb_orphan                                   513       -    -513
skb_fill_page_desc                           534       -    -534
__skb_pull                                   568       -    -568
pskb_may_pull                                604       -    -604
Total: Before=29652698, After=29651496, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260402152654.1720627-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 14:36:48 -07:00
Jakub Kicinski
8b0e64d6c9 Merge branch 'enic-sr-iov-v2-preparatory-infrastructure'
Satish Kharat says:

====================
enic: SR-IOV V2 preparatory infrastructure

This is the first of four series adding SR-IOV V2 support to the enic
driver for Cisco VIC 14xx/15xx adapters.

The existing V1 SR-IOV implementation has VFs that interact directly
with the VIC firmware, leaving the PF driver with no visibility or
control over VF behavior. V2 introduces a PF-mediated model where VFs
communicate with the PF through a mailbox over a dedicated admin
channel. This brings enic in line with the standard Linux SR-IOV
model, enabling full PF management of VFs via ip link (MAC, VLAN,
link state, spoofchk, trust, and per-VF statistics).

This preparatory series adds detection and resource helper code with
no functional change to existing driver behavior:

  - Extend BAR resource discovery for admin channel resources
  - Register the V2 VF PCI device ID
  - Detect VF type (V1/V2/usNIC) from SR-IOV PCI capability
  - Make enic_dev_enable/disable ref-counted for shared use by data
    path and admin channel
  - Add type-aware resource allocation for admin WQ/RQ/CQ/INTR
  - Detect presence of admin channel resources at probe time

Tested on VIC 14xx and 15xx series adapters with V2 VFs under KVM
(sriov_numvfs, VF passthrough, ip link VF configuration, VF traffic).

Based in part on initial work by Christian Benvenuti.
====================

Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:05:08 -07:00
Satish Kharat
4368f5fab4 enic: detect admin channel resources for SR-IOV
Check for the presence of admin channel BAR resources
(RES_TYPE_ADMIN_WQ, ADMIN_RQ, ADMIN_CQ, SRIOV_INTR) during resource
discovery. Set has_admin_channel when all four are available.

Use ARRAY_SIZE(enic->admin_cq) for the admin CQ count check since the
driver allocates two admin CQs (one for WQ completions, one for RQ
completions) and both must be backed by hardware resources.

Add admin WQ, RQ, CQ and INTR fields to struct enic for use by the
upcoming admin channel open/close paths.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-6-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:05:06 -07:00
Satish Kharat
730ce15d44 enic: add type-aware alloc for WQ, RQ, CQ and INTR resources
The existing vnic_wq_alloc(), vnic_rq_alloc(), vnic_cq_alloc() and
vnic_intr_alloc() hardcode data-path resource types (RES_TYPE_WQ,
RES_TYPE_RQ, RES_TYPE_CQ, RES_TYPE_INTR_CTRL). The upcoming admin
channel uses different BAR resource types (RES_TYPE_ADMIN_WQ/RQ/CQ,
RES_TYPE_SRIOV_INTR) for its queues.

Add _with_type() variants that accept an explicit resource type
parameter. Refactor the original functions as thin wrappers that
pass the default data-path type. No functional change.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-5-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:05:06 -07:00
Satish Kharat
0266ecb59d enic: make enic_dev_enable/disable ref-counted
Both the data path (ndo_open/ndo_stop) and the upcoming admin channel
need to enable and disable the vNIC device independently. Without
reference counting, closing the admin channel while the netdev is up
would inadvertently disable the entire device.

Add an enable_count to struct enic, protected by the existing
devcmd_lock. enic_dev_enable() issues CMD_ENABLE_WAIT only on the
first caller (0 -> 1 transition), and enic_dev_disable() issues
CMD_DISABLE only when the last caller releases (1 -> 0 transition).

Also check the return value of enic_dev_enable() in enic_open() and
fail the open if the firmware enable command fails. Without this check,
a failed enable leaves enable_count at zero while the interface appears
up, which can cause a later admin channel enable/disable cycle to
incorrectly disable the hardware under the active data path.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-4-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:05:06 -07:00
Satish Kharat
56a4d7a865 enic: detect SR-IOV VF type from PCI capability
Read the VF device ID from the SR-IOV PCI capability at probe time to
determine whether the PF is configured for V1, USNIC, or V2 virtual
functions. Store the result in enic->vf_type for use by subsequent
SR-IOV operations.

The VF type is a firmware-configured property (set via UCSM, CIMC,
Intersight etc) that is immutable from the driver's perspective. Only
PFs are probed for this capability; VFs and dynamic vnics skip
detection.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-3-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:05:06 -07:00
Satish Kharat
803a1b0202 enic: add V2 SR-IOV VF device ID
Register the V2 VF PCI device ID (0x02b7) so the driver binds to V2
virtual functions created via sriov_configure. Update enic_is_sriov_vf()
to recognize V2 VFs alongside the existing V1 type.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-2-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:05:06 -07:00
Satish Kharat
74fb32ed73 enic: extend resource discovery for SR-IOV admin channel
VIC firmware exposes admin channel resources (WQ, RQ, CQ) for PF-VF
communication when SR-IOV is active. Add the corresponding resource
type definitions and teach the discovery and access functions to
handle them.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-1-d5834b2ef1b9@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:05:05 -07:00
Jakub Kicinski
0ea7e61f65 Merge branch 'net-phy-microchip-add-downshift-support-for-lan88xx'
Nicolai Buchwitz says:

====================
net: phy: microchip: add downshift support for LAN88xx

Add standard ETHTOOL_PHY_DOWNSHIFT tunable support for the Microchip
LAN88xx PHY, following the same pattern used by Marvell and other PHY
drivers.

Ethernet cables with faulty or missing pairs (specifically C and D)
can successfully auto-negotiate 1000BASE-T but fail to establish a
stable link. The LAN88xx PHY supports automatic downshift to
100BASE-TX after a configurable number of failed attempts (2-5).

Patch 1 adds the get/set tunable implementation.
Patch 2 enables downshift by default with a count of 2. The setting is
stored in the driver's private data so that user changes via ethtool are
preserved across suspend/resume cycles.

Based on an earlier downstream implementation by Phil Elwell.

Tested on Raspberry Pi 3B+ (LAN7515/LAN88xx).
====================

Link: https://patch.msgid.link/20260401123848.696766-1-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:03:07 -07:00
Nicolai Buchwitz
70180f72d9 net: phy: microchip: enable downshift by default on LAN88xx
Enable auto-downshift from 1000BASE-T to 100BASE-TX after 2 failed
auto-negotiation attempts by default. This ensures that links with
faulty or missing cable pairs (C and D) fall back to 100Mbps without
requiring userspace configuration.

The downshift count is stored in the driver's private data and applied
in config_init, so user changes via ethtool are preserved across
suspend/resume cycles.

Users can override or disable downshift at runtime:

  ethtool --set-phy-tunable eth0 downshift off

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260401123848.696766-3-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:03:03 -07:00
Nicolai Buchwitz
e417ac73d2 net: phy: microchip: add downshift tunable support for LAN88xx
Implement the standard ETHTOOL_PHY_DOWNSHIFT tunable for the LAN88xx
PHY. This allows runtime configuration of the auto-downshift feature
via ethtool:

  ethtool --set-phy-tunable eth0 downshift on count 3

The LAN88xx PHY supports downshifting from 1000BASE-T to 100BASE-TX
after 2-5 failed auto-negotiation attempts. Valid count values are
2, 3, 4 and 5.

This is based on an earlier downstream implementation by Phil Elwell.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260401123848.696766-2-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:03:03 -07:00
Chih Kai Hsu
86f5dd4e0f r8152: Add helper functions for SRAM2
Add the following helper functions for SRAM2 access to simplify the code
and improve readability:

- sram2_write() - write data to SRAM2 address
- sram2_read() - read data from SRAM2 address
- sram2_write_w0w1() - read-modify-write operation

Signed-off-by: Chih Kai Hsu <hsu.chih.kai@realtek.com>
Reviewed-by: Hayes Wang <hayeswang@realtek.com>
Link: https://patch.msgid.link/20260401115542.34601-1-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 18:01:06 -07:00
Daniel Wagner
7eaff1eff0 net: phy: bcm84881: add LED framework support for BCM84891/BCM84892
Expose LED1 and LED2 pins via the PHY LED framework. Each pin has a
source mask (MASK_LOW + MASK_EXT registers) selecting which hardware
events light it, plus a CTL field in the shared 0xA83B register
(RMW; LED4 is firmware-controlled per the datasheet).

Hardware can offload per-speed link triggers (1000/2500/5000/10000),
RX/TX activity, and force-on. LINK_100 is accepted only alongside
LINK_1000: source bit 4 lights at both speeds and 100-alone isn't
representable, so the unrepresentable case falls to software.

The chip has five LED pins; only LED1/LED2 are exposed here as those
are the only ones characterized on tested hardware. LED4 is firmware-
controlled regardless of strap configuration.

Tested on TRENDnet TEG-S750 (LED1/LED2 wired to an antiparallel
bicolor LED): brightness_set via sysfs; netdev trigger offloaded=1
with amber lit at 100M/1G/2.5G and green lit at 10G via respective
link_* modes; LED off immediately on cable unplug with no software
involvement.

Signed-off-by: Daniel Wagner <wagner.daniel.t@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260401114931.3091818-1-wagner.daniel.t@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 17:59:43 -07:00
Jakub Kicinski
9b79da5d69 Merge branch 'macvlan-broadcast-delivery-changes'
Eric Dumazet says:

====================
macvlan: broadcast delivery changes

First patch adds data-race annotations.

Second patch changes macvlan_broadcast_enqueue() to return
early if the queue is full.
====================

Link: https://patch.msgid.link/20260401103809.3038139-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 17:56:37 -07:00
Eric Dumazet
0d5dc1d7aa macvlan: avoid spinlock contention in macvlan_broadcast_enqueue()
Under high stress, we spend a lot of time cloning skbs,
then acquiring a spinlock, then freeing the clone because
the queue is full.

Add a shortcut to avoid these costs under pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260401103809.3038139-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 17:56:35 -07:00
Eric Dumazet
1ef5789d99 macvlan: annotate data-races around port->bc_queue_len_used
port->bc_queue_len_used is read and written locklessly,
add READ_ONCE()/WRITE_ONCE() annotations.

While WRITE_ONCE() in macvlan_fill_info() is not yet needed,
it is a prereq for future RTNL avoidance.

Fixes: d4bff72c84 ("macvlan: Support for high multicast packet rate")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260401103809.3038139-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 17:56:35 -07:00
Jakub Kicinski
f35340f2d6 Merge branch 'net-stmmac-tso-fixes-cleanups'
Russell King says:

====================
net: stmmac: TSO fixes/cleanups

This is a more refined version of the previous patch series fixing
and cleaning up the TSO code.

I'm not sure whether "TSO" or "GSO" should be used to describe this
feature - although it primarily handles TCP, dwmac4 appears to also
be able to handle UDP.

In essence, this series adds a .ndo_features_check() method to handle
whether TSO/GSO can be used for a particular skbuff - checking which
queue the skbuff is destined for and whether that has TBS available
which precludes TSO being enabled on that channel.

I'm also adding a check that the header is smaller than 1024 bytes,
as documented in those sources which have TSO support - this is due
to the hardware buffering the header in "TSO memory" which I guess
is limited to 1KiB. I expect this test never to trigger, but if
the headers ever exceed that size, the hardware will likely fail.
While IPv4 headers are unlikely to be anywhere near this, there is
nothing in the protocol which prevents IPv6 headers up to 64KiB.

As we now have a .ndo_features_check() method, I'm moving the VLAN
insertion for TSO packets into core code by unpublishing the VLAN
insertion features when we use TSO. Another move is for checksumming,
which is required for TSO, but stmmac's requirements for offloading
checksums are more strict - and this seems to be a bug in the TSO
path.

I've changed the hardware initialisation to always enable TSO support
on the channels even if the user requests TSO/GSO to be disabled -
this fixes another issue as pointed out by Jakub in a previous review.

I'm moving the setup of the GSO features, cleaning those up, and
adding a warning if platform glue requests this to be enabled but the
hardware has no support. Hopefully this will never trigger if everyone
got the STMMAC_FLAG_TSO_EN flag correct. Also adding a check for TxPBL
value.

Finally, moving the "TSO supported" message to the new
stmmac_set_gso_features() function so keep all this TSO stuff together.
====================

Link: https://patch.msgid.link/aczHVF04LIGq_lYO@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:23 -07:00
Russell King (Oracle)
0f96212a51 net: stmmac: move "TSO supported" message to stmmac_set_gso_features()
Move the "TSO supported" message to stmmac_set_gso_features() so that
we group all probe-time TSO stuff in one place.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu8-0000000Eau5-3Zne@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:20 -07:00
Russell King (Oracle)
33f5cc83bb net: stmmac: check txpbl for TSO
Documentation states that TxPBL must be >= 4 to allow TSO support, but
the driver doesn't check this. TxPBL comes from the platform glue code
or DT. Add a check with a warning if platform glue code attempts to
enable TSO support with TxPBL too low.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu3-0000000Eatz-39ts@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:20 -07:00
Russell King (Oracle)
f8c70ab540 net: stmmac: add warning when TSO is requested but unsupported
Add a warning message if TSO is requested by the platform glue code but
the core wasn't configured for TSO.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pty-0000000Eatt-2TjZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:20 -07:00
Russell King (Oracle)
6ad0044428 net: stmmac: make stmmac_set_gso_features() more readable
Make stmmac_set_gso_features() more readable by adding some whitespace
and getting rid of the indentation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptt-0000000Eatn-1ziK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:20 -07:00
Russell King (Oracle)
c04939cb98 net: stmmac: split out gso features setup
Move the GSO features setup into a separate function, co-loated with
other GSO/TSO support.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pto-0000000Eath-1VDH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:19 -07:00
Russell King (Oracle)
2e4082e4b7 net: stmmac: simplify GSO/TSO test in stmmac_xmit()
The test in stmmac_xmit() to see whether we should pass the skbuff to
stmmac_tso_xmit() is more complex than it needs to be. This test can
be simplified by storing the mask of GSO types that we will pass, and
setting it according to the enabled features.

Note that "tso" is a mis-nomer since commit b776620651 ("net:
stmmac: Implement UDP Segmentation Offload"). Also note that this
commit controls both via the TSO feature. We preserve this behaviour
in this commit.

Also, this commit unconditionally accessed skb_shinfo(skb)->gso_type
for all frames, even when skb_is_gso() was false. This access is
eliminated.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptj-0000000Eatb-11zK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:19 -07:00
Russell King (Oracle)
b55dfb173c net: stmmac: move check for hardware checksum supported
Add a check in .ndo_features_check() to indicate whether hardware
checksum can be performed on the skbuff. Where hardware checksum is
not supported - either because the channel does not support Tx COE
or the skb isn't suitable (stmmac uses a tighter test than
can_checksum_protocol()) we also need to disable TSO, which will be
done by harmonize_features() in net/core/dev.c

This fixes a bug where a channel which has COE disabled may still
receive TSO skbuffs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pte-0000000EatU-0ILt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:19 -07:00
Russell King (Oracle)
3f6a6eb9ef net: stmmac: move TSO VLAN tag insertion to core code
stmmac_tso_xmit() checks whether the skbuff is trying to offload
vlan tag insertion to hardware, which from the comment in the code
appears to be buggy when the TSO feature is used.

Rather than stmmac_tso_xmit() inserting the VLAN tag, handle this
in stmmac_features_check() which will then use core net code to
handle this. See net/core/dev.c::validate_xmit_skb()

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptY-0000000EatO-42Qv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:19 -07:00
Russell King (Oracle)
c05a81cbee net: stmmac: add GSO MSS checks
Add GSO MSS checks to stmmac_features_check().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptT-0000000EatI-3feh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:19 -07:00
Russell King (Oracle)
6732e474f8 net: stmmac: add TSO check for header length
According to the STM32MP151 documentation which covers dwmac v4.2, the
hardware TSO feature can handle header lengths up to a maximum of 1023
bytes.

Add a .ndo_features_check() method implementation to check the header
length meets these requirements, otherwise fall back to software GSO.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptO-0000000EatC-39il@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:18 -07:00
Russell King (Oracle)
f799b5dab9 net: stmmac: add stmmac_tso_header_size()
We will need to compute the size of the protocol headers in two places,
so move this into a separate function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptJ-0000000Eat5-2ZlA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:18 -07:00
Russell King (Oracle)
e32820264c net: stmmac: fix TSO support when some channels have TBS available
According to the STM32MP25xx manual, which is dwmac v5.3, TBS (time
based scheduling) is not permitted for channels which have hardware
TSO enabled. Intel's commit 5e6038b88a ("net: stmmac: fix TSO and
TBS feature enabling during driver open") concurs with this, but it
is incomplete.

This commit avoids enabling TSO support on the channels which have
TBS available, which, as far as the hardware is concerned, means we
do not set the TSE bit in the DMA channel's transmit control register.

However, the net device's features apply to all queues(channels), which
means these channels may still be handed TSO skbs to transmit, and the
driver will pass them to stmmac_tso_xmit(). This will generate the
descriptors for TSO, even though the channel has the TSE bit clear.

Fix this by checking whether the queue(channel) has TBS available,
and if it does, fall back to software GSO support.

Fixes: 5e6038b88a ("net: stmmac: fix TSO and TBS feature enabling during driver open")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptE-0000000Easz-28tv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:18 -07:00
Russell King (Oracle)
afe840ddf1 net: stmmac: fix .ndo_fix_features()
netdev features documentation requires that .ndo_fix_features() is
stateless: it shouldn't modify driver state. Yet, stmmac_fix_features()
does exactly that, changing whether GSO frames are processed by the
driver.

Move this code to stmmac_set_features() instead, which is the correct
place for it. We don't need to check whether TSO is supported; this
is already handled via the setup of netdev->hw_features, and we are
guaranteed that if netdev->hw_features indicates that a feature is
not supported, .ndo_set_features() won't be called with it set.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pt9-0000000East-1YAO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:18 -07:00
Russell King (Oracle)
989a9c20f6 net: stmmac: fix channel TSO enable on resume
Rather than configuring the channels depending on whether GSO/TSO is
currently enabled by the user, always enable if the hardware has TSO
support and the platform wants TSO to be enabled.

This avoids the channel TSO enable bit being disabled after a resume
when the user has disabled TSO features. This will cause problems when
the user re-enables TSO.

This bug goes back to commit f748be531d ("stmmac: support new GMAC4")

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pt4-0000000Easn-14WL@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:28:18 -07:00
Jakub Kicinski
8ffb33d770 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-7.0-rc7).

Conflicts:

net/vmw_vsock/af_vsock.c
  b18c833888 ("vsock: initialize child_ns_mode_locked in vsock_net_init()")
  0de607dc4f ("vsock: add G2H fallback for CIDs not owned by H2G transport")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
  ceee35e567 ("bnxt_en: Refactor some basic ring setup and adjustment logic")
  57cdfe0dc7 ("bnxt_en: Resize RSS contexts on channel count change")

drivers/net/wireless/intel/iwlwifi/mld/mac80211.c
  4d56037a02 ("wifi: iwlwifi: mld: block EMLSR during TDLS connections")
  687a95d204 ("wifi: iwlwifi: mld: correctly set wifi generation data")

drivers/net/wireless/intel/iwlwifi/mld/scan.h
  b6045c899e ("wifi: iwlwifi: mld: Refactor scan command handling")
  ec66ec6a5a ("wifi: iwlwifi: mld: Fix MLO scan timing")

drivers/net/wireless/intel/iwlwifi/mvm/fw.c
  078df640ef ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v
2")
  323156c354 ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:03:13 -07:00
Linus Torvalds
f8f5627a8a Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "With fixes from wireless, bluetooth and netfilter included we're back
  to each PR carrying 30%+ more fixes than in previous era.

  The good news is that so far none of the "extra" fixes are themselves
  causing real regressions. Not sure how much comfort that is.

  Current release - fix to a fix:

   - netdevsim: fix build if SKB_EXTENSIONS=n

   - eth: stmmac: skip VLAN restore when VLAN hash ops are missing

  Previous releases - regressions:

   - wifi: iwlwifi: mvm: don't send a 6E related command when
     not supported

  Previous releases - always broken:

   - some info leak fixes

   - add missing clearing of skb->cb[] on ICMP paths from tunnels

   - ipv6:
      - flowlabel: defer exclusive option free until RCU teardown
      - avoid overflows in ip6_datagram_send_ctl()

   - mpls: add seqcount to protect platform_labels from OOB access

   - bridge: improve safety of parsing ND options

   - bluetooth: fix leaks, overflows and races in hci_sync

   - netfilter: add more input validation, some to address bugs directly
     some to prevent exploits from cooking up broken configurations

   - wifi:
      - ath: avoid poor performance due to stopping the wrong
        aggregation session
      - virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free

   - eth:
      - fec: fix the PTP periodic output sysfs interface
      - enetc: safely reinitialize TX BD ring when it has unsent frames"

* tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
  ipv6: avoid overflows in ip6_datagram_send_ctl()
  net: hsr: fix VLAN add unwind on slave errors
  net: hsr: serialize seq_blocks merge across nodes
  vsock: initialize child_ns_mode_locked in vsock_net_init()
  selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks
  net/sched: cls_flow: fix NULL pointer dereference on shared blocks
  net/sched: cls_fw: fix NULL pointer dereference on shared blocks
  net/x25: Fix overflow when accumulating packets
  net/x25: Fix potential double free of skb
  bnxt_en: Restore default stat ctxs for ULP when resource is available
  bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()
  bnxt_en: Refactor some basic ring setup and adjustment logic
  net/mlx5: Fix switchdev mode rollback in case of failure
  net/mlx5: Avoid "No data available" when FW version queries fail
  net/mlx5: lag: Check for LAG device before creating debugfs
  net: macb: properly unregister fixed rate clocks
  net: macb: fix clk handling on PCI glue driver removal
  virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN
  net/sched: sch_netem: fix out-of-bounds access in packet corruption
  ...
2026-04-02 09:57:06 -07:00
Linus Torvalds
4c2c526b5a Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:

 - IOMMU-PT related compile breakage in for AMD driver

 - IOTLB flushing behavior when unmapped region is larger than requested
   due to page-sizes

 - Fix IOTLB flush behavior with empty gathers

* tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline
  iommupt: Fix short gather if the unmap goes into a large mapping
  iommu: Do not call drivers for empty gathers
2026-04-02 09:53:16 -07:00
Linus Torvalds
2ec9074b28 Merge tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "People have been so busy for hunting and we're still getting more
  changes than wished for, but it doesn't look too scary; almost all
  changes are device-specific small fixes.

  I guess it's rather a casual bump, and no more Easter eggs are left
  for 7.0 (hopefully)...

   - Fixes for the recent regression on ctxfi driver

   - Fix missing INIT_LIST_HEAD() for ASoC card_aux_list

   - Usual HD- and USB-audio, and ASoC AMD quirk updates

   - ASoC fixes for AMD and Intel"

* tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
  ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log
  ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP
  ALSA: hda/realtek: add quirk for Acer Swift SFG14-73
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9
  ASoC: Intel: boards: fix unmet dependency on PINCTRL
  ASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_params
  ALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initialization
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10
  ALSA: hda/realtek: add quirk for HP Laptop 15-fc0xxx
  ASoC: ep93xx: Fix unchecked clk_prepare_enable() and add rollback on failure
  ASoC: soc-core: call missing INIT_LIST_HEAD() for card_aux_list
  ALSA: hda/realtek: Add quirk for Samsung Book2 Pro 360 (NP950QED)
  ASoC: amd: yc: Add DMI entry for HP Laptop 15-fc0xxx
  ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 16X OLED M7601RM
  ALSA: hda/realtek: Add quirk for ASUS ROG Strix SCAR 15
  ALSA: usb-audio: Exclude Scarlett Solo 1st Gen from SKIP_IFACE_SETUP
  ALSA: caiaq: fix stack out-of-bounds read in init_card
  ALSA: ctxfi: Check the error for index mapping
  ALSA: ctxfi: Fix missing SPDIFI1 index handling
  ALSA: hda/realtek: add quirk for HP Victus 15-fb0xxx
  ...
2026-04-02 09:41:21 -07:00
Linus Torvalds
2064d7784e Merge tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay
Pull auxdisplay fixes from Andy Shevchenko:

 - Fix NULL dereference in linedisp_release()

 - Fix ht16k33 DT bindings to avoid warnings

 - Handle errors in I²C transfers in lcd2s driver

* tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
  auxdisplay: line-display: fix NULL dereference in linedisp_release
  auxdisplay: lcd2s: add error handling for i2c transfers
  dt-bindings: auxdisplay: ht16k33: Use unevaluatedProperties to fix common property warning
2026-04-02 09:34:22 -07:00
Dimitri Daskalakis
ec7067e661 eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
On systems with 64K pages, RX queues will be wedged if users set the
descriptor count to the current minimum (16). Fbnic fragments large
pages into 4K chunks, and scales down the ring size accordingly. With
64K pages and 16 descriptors, the ring size mask is 0 and will never
be filled.

32 descriptors is another special case that wedges the RX rings.
Internally, the rings track pages for the head/tail pointers, not page
fragments. So with 32 descriptors, there's only 1 usable page as one
ring slot is kept empty to disambiguate between an empty/full ring.
As a result, the head pointer never advances and the HW stalls after
consuming 16 page fragments.

Fixes: 0cb4c0a137 ("eth: fbnic: Implement Rx queue alloc/start/stop/free")
Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260401162848.2335350-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 08:38:34 -07:00
Eric Dumazet
4e45337556 ipv6: avoid overflows in ip6_datagram_send_ctl()
Yiming Qian reported :
<quote>
 I believe I found a locally triggerable kernel bug in the IPv6 sendmsg
 ancillary-data path that can panic the kernel via `skb_under_panic()`
 (local DoS).

 The core issue is a mismatch between:

 - a 16-bit length accumulator (`struct ipv6_txoptions::opt_flen`, type
 `__u16`) and
 - a pointer to the *last* provided destination-options header (`opt->dst1opt`)

 when multiple `IPV6_DSTOPTS` control messages (cmsgs) are provided.

 - `include/net/ipv6.h`:
   - `struct ipv6_txoptions::opt_flen` is `__u16` (wrap possible).
 (lines 291-307, especially 298)
 - `net/ipv6/datagram.c:ip6_datagram_send_ctl()`:
   - Accepts repeated `IPV6_DSTOPTS` and accumulates into `opt_flen`
 without rejecting duplicates. (lines 909-933)
 - `net/ipv6/ip6_output.c:__ip6_append_data()`:
   - Uses `opt->opt_flen + opt->opt_nflen` to compute header
 sizes/headroom decisions. (lines 1448-1466, especially 1463-1465)
 - `net/ipv6/ip6_output.c:__ip6_make_skb()`:
   - Calls `ipv6_push_frag_opts()` if `opt->opt_flen` is non-zero.
 (lines 1930-1934)
 - `net/ipv6/exthdrs.c:ipv6_push_frag_opts()` / `ipv6_push_exthdr()`:
   - Push size comes from `ipv6_optlen(opt->dst1opt)` (based on the
 pointed-to header). (lines 1179-1185 and 1206-1211)

 1. `opt_flen` is a 16-bit accumulator:

 - `include/net/ipv6.h:298` defines `__u16 opt_flen; /* after fragment hdr */`.

 2. `ip6_datagram_send_ctl()` accepts *repeated* `IPV6_DSTOPTS` cmsgs
 and increments `opt_flen` each time:

 - In `net/ipv6/datagram.c:909-933`, for `IPV6_DSTOPTS`:
   - It computes `len = ((hdr->hdrlen + 1) << 3);`
   - It checks `CAP_NET_RAW` using `ns_capable(net->user_ns,
 CAP_NET_RAW)`. (line 922)
   - Then it does:
     - `opt->opt_flen += len;` (line 927)
     - `opt->dst1opt = hdr;` (line 928)

 There is no duplicate rejection here (unlike the legacy
 `IPV6_2292DSTOPTS` path which rejects duplicates at
 `net/ipv6/datagram.c:901-904`).

 If enough large `IPV6_DSTOPTS` cmsgs are provided, `opt_flen` wraps
 while `dst1opt` still points to a large (2048-byte)
 destination-options header.

 In the attached PoC (`poc.c`):

 - 32 cmsgs with `hdrlen=255` => `len = (255+1)*8 = 2048`
 - 1 cmsg with `hdrlen=0` => `len = 8`
 - Total increment: `32*2048 + 8 = 65544`, so `(__u16)opt_flen == 8`
 - The last cmsg is 2048 bytes, so `dst1opt` points to a 2048-byte header.

 3. The transmit path sizes headers using the wrapped `opt_flen`:

- In `net/ipv6/ip6_output.c:1463-1465`:
  - `headersize = sizeof(struct ipv6hdr) + (opt ? opt->opt_flen +
 opt->opt_nflen : 0) + ...;`

 With wrapped `opt_flen`, `headersize`/headroom decisions underestimate
 what will be pushed later.

 4. When building the final skb, the actual push length comes from
 `dst1opt` and is not limited by wrapped `opt_flen`:

 - In `net/ipv6/ip6_output.c:1930-1934`:
   - `if (opt->opt_flen) proto = ipv6_push_frag_opts(skb, opt, proto);`
 - In `net/ipv6/exthdrs.c:1206-1211`, `ipv6_push_frag_opts()` pushes
 `dst1opt` via `ipv6_push_exthdr()`.
 - In `net/ipv6/exthdrs.c:1179-1184`, `ipv6_push_exthdr()` does:
   - `skb_push(skb, ipv6_optlen(opt));`
   - `memcpy(h, opt, ipv6_optlen(opt));`

 With insufficient headroom, `skb_push()` underflows and triggers
 `skb_under_panic()` -> `BUG()`:

 - `net/core/skbuff.c:2669-2675` (`skb_push()` calls `skb_under_panic()`)
 - `net/core/skbuff.c:207-214` (`skb_panic()` ends in `BUG()`)

 - The `IPV6_DSTOPTS` cmsg path requires `CAP_NET_RAW` in the target
 netns user namespace (`ns_capable(net->user_ns, CAP_NET_RAW)`).
 - Root (or any task with `CAP_NET_RAW`) can trigger this without user
 namespaces.
 - An unprivileged `uid=1000` user can trigger this if unprivileged
 user namespaces are enabled and it can create a userns+netns to obtain
 namespaced `CAP_NET_RAW` (the attached PoC does this).

 - Local denial of service: kernel BUG/panic (system crash).
 - Reproducible with a small userspace PoC.
</quote>

This patch does not reject duplicated options, as this might break
some user applications.

Instead, it makes sure to adjust opt_flen and opt_nflen to correctly
reflect the size of the current option headers, preventing the overflows
and the potential for panics.

This applies to IPV6_DSTOPTS, IPV6_HOPOPTS, and IPV6_RTHDR.

Specifically:

When a new IPV6_DSTOPTS is processed, the length of the old opt->dst1opt
is subtracted from opt->opt_flen before adding the new length.

When a new IPV6_HOPOPTS is processed, the length of the old opt->dst0opt
is subtracted from opt->opt_nflen.

When a new Routing Header (IPV6_RTHDR or IPV6_2292RTHDR) is processed,
the length of the old opt->srcrt is subtracted from opt->opt_nflen.

In the special case within IPV6_2292RTHDR handling where dst1opt is moved
to dst0opt, the length of the old opt->dst0opt is subtracted from
opt->opt_nflen before the new one is added.

Fixes: 333fad5364 ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Closes: https://lore.kernel.org/netdev/CAL_bE8JNzawgr5OX5m+3jnQDHry2XxhQT5=jThW1zDPtUikRYA@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260401154721.3740056-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 08:25:22 -07:00
Jakub Kicinski
be193568be Merge branch 'net-hsr-fixes-for-prp-duplication-and-vlan-unwind'
Luka Gejak says:

====================
net: hsr: fixes for PRP duplication and VLAN unwind

This series addresses two logic bugs in the HSR/PRP implementation
identified during a protocol audit. These are targeted for the 'net'
tree as they fix potential memory corruption and state inconsistency.

The primary change resolves a race condition in the node merging path by
implementing address-based lock ordering. This ensures that concurrent
mutations of sequence blocks do not lead to state corruption or
deadlocks.

An additional fix corrects asymmetric VLAN error unwinding by
implementing a centralized unwind path on slave errors.
====================

Link: https://patch.msgid.link/20260401092243.52121-1-luka.gejak@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 08:23:55 -07:00
Luka Gejak
2e3514e63b net: hsr: fix VLAN add unwind on slave errors
When vlan_vid_add() fails for a secondary slave, the error path calls
vlan_vid_del() on the failing port instead of the peer slave that had
already succeeded. This results in asymmetric VLAN state across the HSR
pair.

Fix this by switching to a centralized unwind path that removes the VID
from any slave device that was already programmed.

Fixes: 1a8a63a530 ("net: hsr: Add VLAN CTAG filter support")
Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
Link: https://patch.msgid.link/20260401092243.52121-3-luka.gejak@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 08:23:49 -07:00
Luka Gejak
f5df2990c3 net: hsr: serialize seq_blocks merge across nodes
During node merging, hsr_handle_sup_frame() walks node_curr->seq_blocks
to update node_real without holding node_curr->seq_out_lock. This
allows concurrent mutations from duplicate registration paths, risking
inconsistent state or XArray/bitmap corruption.

Fix this by locking both nodes' seq_out_lock during the merge.
To prevent ABBA deadlocks, locks are acquired in order of memory
address.

Reviewed-by: Felix Maurer <fmaurer@redhat.com>
Fixes: 415e636751 ("hsr: Implement more robust duplicate discard for PRP")
Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
Link: https://patch.msgid.link/20260401092243.52121-2-luka.gejak@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 08:23:49 -07:00
Stefano Garzarella
b18c833888 vsock: initialize child_ns_mode_locked in vsock_net_init()
The `child_ns_mode_locked` field lives in `struct net`, which persists
across vsock module reloads. When the module is unloaded and reloaded,
`vsock_net_init()` resets `mode` and `child_ns_mode` back to their
default values, but does not reset `child_ns_mode_locked`.

The stale lock from the previous module load causes subsequent writes
to `child_ns_mode` to silently fail: `vsock_net_set_child_mode()` sees
the old lock, skips updating the actual value, and returns success
when the requested mode matches the stale lock. The sysctl handler
reports no error, but `child_ns_mode` remains unchanged.

Steps to reproduce:
    $ modprobe vsock
    $ echo local > /proc/sys/net/vsock/child_ns_mode
    $ cat /proc/sys/net/vsock/child_ns_mode
    local
    $ modprobe -r vsock
    $ modprobe vsock
    $ echo local > /proc/sys/net/vsock/child_ns_mode
    $ cat /proc/sys/net/vsock/child_ns_mode
    global    <--- expected "local"

Fix this by initializing `child_ns_mode_locked` to 0 (unlocked) in
`vsock_net_init()`, so the write-once mechanism works correctly after
module reload.

Fixes: 102eab95f0 ("vsock: lock down child_ns_mode as write-once")
Reported-by: Jin Liu <jinl@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260401092153.28462-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 08:18:56 -07:00
Lorenzo Bianconi
269389ba53 net: airoha: Set REG_RX_CPU_IDX() once in airoha_qdma_fill_rx_queue()
It is not necessary to update REG_RX_CPU_IDX register for each iteration
of the descriptor loop in airoha_qdma_fill_rx_queue routine.
Move REG_RX_CPU_IDX configuration out of the descriptor loop and rely on
the last queue head value updated in the descriptor loop.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260331-airoha-cpu-idx-out-off-loop-v1-1-75c66b428f50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-02 15:13:22 +02:00