linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 06:41:39 -04:00

Author	SHA1	Message	Date
Jakub Kicinski	166b0cc6df	selftests: drv-net: gro: remove TOTAL_HDR_LEN Willem points out TOTAL_HDR_LEN is identical to MAX_HDR_LEN. This seems to have been the case ever since the test was added. Replace the uses of TOTAL_HDR_LEN with MAX_HDR_LEN, MAX seems more common for what this value is. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260402210000.1512696-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:05:44 -07:00
Jakub Kicinski	5469b695f2	selftests: drv-net: gro: prepare for ip6ip6 support Try to use already calculated offsets and not depend on the ipip flag as much. This patch should not change any functionality, it's just a cleanup to make ip6ip6 support easier. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260402210000.1512696-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:05:43 -07:00
Jakub Kicinski	d973484747	selftests: drv-net: gro: always wait for FIN in the capacity test The new capacity/order test exits as soon as it sees the expected packet sequence. This may allow the "flushing" FIN packet to spill over to the next test. Let's always wait for the FIN before exiting. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260402210000.1512696-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:05:43 -07:00
Jakub Kicinski	436ea8a1b7	selftests: drv-net: gro: add 1 byte payload test Small IPv4 packets get padded to 60B, this may break / confuse some buggy implementations. Add a test to coalesce a 1B payload. Keep this separate from the lrg_sml test because I suspect some implementations may not handle this case (treat padded frames as ineligible for coalescing). Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260402210000.1512696-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:05:43 -07:00
Jakub Kicinski	30f831b44a	selftests: drv-net: gro: add data burst test case Add a test trying to induce a GRO context timeout followed by another sequence of packets for the same flow. The second burst arrives 100ms after the first one so any implementation (SW or HW) must time out waiting at that point. We expect both bursts to be aggregated successfully but separately. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260402210000.1512696-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:05:42 -07:00
Russell King (Oracle)	789ec16eb3	net: stmmac: qcom-ethqos: set clk_csr The clocks for qcom-ethqos return a rate of zero as firmware manages their rate. According to hardware documentation, the clock which is fed to the slave AHB interface can range between 50 to 100MHz for non-RGMII and 30 to 75MHz for boards with an RGMII interface. Currently, stmmac uses an undefined divisor value. Instead, use STMMAC_CSR_60_100M which will mean we meet IEEE 802.3 specification since this will generate: 714kHz @ 30MHz 1.19MHz @ 50MHz 1.79MHz @ 75MHz 2.38MHz @ 100MHz This gives MDC rates within the IEEE 802.3 specification, although the 30MHz case is particularly slow. Selecting the next lowest divisor, STMMAC_CSR_35_60M, which is /26 will give: 1.15MHz @ 30MHz 1.92MHz @ 50MHz 2.88MHz @ 75MHz (exceeding 802.3 spec) 3.85MHz @ 100MHz (exceeding 802.3 spec) Unfortunately, this divisor makes the upper bound of both ranges exceed the IEEE 802.3 specification, and thus we can not use it without knowing for certain what the current CSR clock rate actually is. So, STMMAC_CSR_60_100M is the best fit for all boards based on the information provided thus far. Link: https://lore.kernel.org/r/acGhQ0oui+dVRdLY@oss.qualcomm.com Link: https://lore.kernel.org/r/acw1habUsiSqlrky@oss.qualcomm.com Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w8JKr-0000000EdLC-41Bt@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:39:04 -07:00
Julian Braha	e2f152c822	stmmac: cleanup dead dependencies on STMMAC_PLATFORM and STMMAC_ETH in Kconfig There are already 'if STMMAC_ETH' and 'STMMAC_PLATFORM' conditions wrapping these config options, making the 'depends on' statements duplicate dependencies (dead code). I propose leaving the outer 'if STMMAC_PLATFORM...endif' and 'if STMMAC_ETH...endif' conditions, and removing the individual 'depends on' statements. This dead code was found by kconfirm, a static analysis tool for Kconfig. Signed-off-by: Julian Braha <julianbraha@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260402145858.240231-1-julianbraha@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:37:31 -07:00
Eric Dumazet	a9b460225e	net: always inline some skb helpers Some performance critical helpers from include/linux/skbuff.h are not inlined by clang. Use __always_inline hint for: - __skb_fill_netmem_desc() - __skb_fill_page_desc() - skb_fill_netmem_desc() - skb_fill_page_desc() - __skb_pull() - pskb_may_pull_reason() - pskb_may_pull() - pskb_pull() - pskb_trim() - skb_orphan() - skb_postpull_rcsum() - skb_header_pointer() - skb_clear_delivery_time() - skb_tstamp_cond() - skb_warn_if_lro() This increases performance and saves ~1200 bytes of text. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 4/24 grow/shrink: 66/12 up/down: 4104/-5306 (-1202) Function old new delta ip_multipath_l3_keys - 303 +303 tcp_sendmsg_locked 4560 4848 +288 xfrm_input 6240 6455 +215 esp_output_head 1516 1711 +195 skb_try_coalesce 696 866 +170 bpf_prog_test_run_skb 1951 2091 +140 tls_strp_read_copy 528 667 +139 gue_udp_recv 738 871 +133 __ip6_append_data 4159 4279 +120 __bond_xmit_hash 1019 1122 +103 ip6_multipath_l3_keys 394 495 +101 bpf_lwt_seg6_action 1096 1197 +101 input_action_end_dx2 344 442 +98 vxlan_remcsum 487 581 +94 udpv6_queue_rcv_skb 393 480 +87 udp_queue_rcv_skb 385 471 +86 gue_remcsum 453 539 +86 udp_lib_checksum_complete 84 168 +84 vxlan_xmit 2777 2857 +80 nf_reset_ct 456 532 +76 igmp_rcv 1902 1978 +76 mpls_forward 1097 1169 +72 tcp_add_backlog 1226 1292 +66 nfulnl_log_packet 3091 3156 +65 tcp_rcv_established 1966 2026 +60 __strp_recv 1547 1603 +56 eth_type_trans 357 411 +54 bond_flow_ip 392 444 +52 __icmp_send 1584 1630 +46 ip_defrag 1636 1681 +45 tpacket_rcv 2793 2837 +44 refcount_add 132 176 +44 nf_ct_frag6_gather 1959 2003 +44 napi_skb_free_stolen_head 199 240 +41 __pskb_trim - 41 +41 napi_reuse_skb 319 358 +39 icmpv6_rcv 1877 1916 +39 br_handle_frame_finish 1672 1711 +39 ip_rcv_core 841 879 +38 ip_check_defrag 377 415 +38 br_stp_rcv 909 947 +38 qdisc_pkt_len_segs_init 366 399 +33 mld_query_work 2945 2975 +30 bpf_sk_assign_tcp_reqsk 607 637 +30 
udp_gro_receive 1657 1686 +29 ip6_rcv_core 1170 1193 +23 ah_input 1176 1197 +21 tun_get_user 5174 5194 +20 llc_rcv 815 834 +19 __pfx_udp_lib_checksum_complete 16 32 +16 __pfx_refcount_add 48 64 +16 __pfx_nf_reset_ct 96 112 +16 __pfx_ip_multipath_l3_keys - 16 +16 __pfx___pskb_trim - 16 +16 packet_sendmsg 5771 5781 +10 esp_output_tail 1460 1470 +10 alloc_skb_with_frags 433 443 +10 xsk_generic_xmit 3477 3486 +9 mptcp_sendmsg_frag 2250 2259 +9 __ip_append_data 4166 4175 +9 __ip6_tnl_rcv 1159 1168 +9 skb_zerocopy 1215 1220 +5 gre_parse_header 1358 1362 +4 __iptunnel_pull_header 405 407 +2 skb_vlan_untag 692 693 +1 psp_dev_rcv 701 702 +1 netkit_xmit 1263 1264 +1 gre_rcv 2776 2777 +1 gre_gso_segment 1521 1522 +1 bpf_skb_net_hdr_pop 535 536 +1 udp6_ufo_fragment 888 884 -4 br_multicast_rcv 9154 9148 -6 snap_rcv 312 305 -7 skb_copy_ubufs 1841 1834 -7 __pfx_skb_tstamp_cond 16 - -16 __pfx_skb_clear_delivery_time 16 - -16 __pfx_pskb_trim 16 - -16 __pfx_pskb_pull 16 - -16 ipv6_gso_segment 1400 1383 -17 ipv6_frag_rcv 2511 2492 -19 erspan_xmit 1221 1190 -31 __pfx_skb_warn_if_lro 32 - -32 __pfx___skb_fill_page_desc 32 - -32 skb_tstamp_cond 42 - -42 pskb_trim 46 - -46 __pfx_skb_postpull_rcsum 48 - -48 tcp_gso_segment 1524 1475 -49 skb_clear_delivery_time 54 - -54 __pfx_skb_fill_page_desc 64 - -64 __pfx_skb_header_pointer 80 - -80 pskb_pull 91 - -91 skb_warn_if_lro 110 - -110 tcp_v6_rcv 3288 3170 -118 __pfx___skb_pull 128 - -128 __pfx_skb_orphan 144 - -144 __pfx_pskb_may_pull 160 - -160 tcp_v4_rcv 3334 3153 -181 __skb_fill_page_desc 231 - -231 udp_rcv 1809 1553 -256 skb_postpull_rcsum 318 - -318 skb_header_pointer 367 - -367 fib_multipath_hash 3399 3018 -381 skb_orphan 513 - -513 skb_fill_page_desc 534 - -534 __skb_pull 568 - -568 pskb_may_pull 604 - -604 Total: Before=29652698, After=29651496, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260402152654.1720627-1-edumazet@google.com 
Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:36:48 -07:00
Jakub Kicinski	8b0e64d6c9	Merge branch 'enic-sr-iov-v2-preparatory-infrastructure' Satish Kharat says: ==================== enic: SR-IOV V2 preparatory infrastructure This is the first of four series adding SR-IOV V2 support to the enic driver for Cisco VIC 14xx/15xx adapters. The existing V1 SR-IOV implementation has VFs that interact directly with the VIC firmware, leaving the PF driver with no visibility or control over VF behavior. V2 introduces a PF-mediated model where VFs communicate with the PF through a mailbox over a dedicated admin channel. This brings enic in line with the standard Linux SR-IOV model, enabling full PF management of VFs via ip link (MAC, VLAN, link state, spoofchk, trust, and per-VF statistics). This preparatory series adds detection and resource helper code with no functional change to existing driver behavior: - Extend BAR resource discovery for admin channel resources - Register the V2 VF PCI device ID - Detect VF type (V1/V2/usNIC) from SR-IOV PCI capability - Make enic_dev_enable/disable ref-counted for shared use by data path and admin channel - Add type-aware resource allocation for admin WQ/RQ/CQ/INTR - Detect presence of admin channel resources at probe time Tested on VIC 14xx and 15xx series adapters with V2 VFs under KVM (sriov_numvfs, VF passthrough, ip link VF configuration, VF traffic). Based in part on initial work by Christian Benvenuti. ==================== Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:05:08 -07:00
Satish Kharat	4368f5fab4	enic: detect admin channel resources for SR-IOV Check for the presence of admin channel BAR resources (RES_TYPE_ADMIN_WQ, ADMIN_RQ, ADMIN_CQ, SRIOV_INTR) during resource discovery. Set has_admin_channel when all four are available. Use ARRAY_SIZE(enic->admin_cq) for the admin CQ count check since the driver allocates two admin CQs (one for WQ completions, one for RQ completions) and both must be backed by hardware resources. Add admin WQ, RQ, CQ and INTR fields to struct enic for use by the upcoming admin channel open/close paths. Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-6-d5834b2ef1b9@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:05:06 -07:00
Satish Kharat	730ce15d44	enic: add type-aware alloc for WQ, RQ, CQ and INTR resources The existing vnic_wq_alloc(), vnic_rq_alloc(), vnic_cq_alloc() and vnic_intr_alloc() hardcode data-path resource types (RES_TYPE_WQ, RES_TYPE_RQ, RES_TYPE_CQ, RES_TYPE_INTR_CTRL). The upcoming admin channel uses different BAR resource types (RES_TYPE_ADMIN_WQ/RQ/CQ, RES_TYPE_SRIOV_INTR) for its queues. Add _with_type() variants that accept an explicit resource type parameter. Refactor the original functions as thin wrappers that pass the default data-path type. No functional change. Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-5-d5834b2ef1b9@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:05:06 -07:00
Satish Kharat	0266ecb59d	enic: make enic_dev_enable/disable ref-counted Both the data path (ndo_open/ndo_stop) and the upcoming admin channel need to enable and disable the vNIC device independently. Without reference counting, closing the admin channel while the netdev is up would inadvertently disable the entire device. Add an enable_count to struct enic, protected by the existing devcmd_lock. enic_dev_enable() issues CMD_ENABLE_WAIT only on the first caller (0 -> 1 transition), and enic_dev_disable() issues CMD_DISABLE only when the last caller releases (1 -> 0 transition). Also check the return value of enic_dev_enable() in enic_open() and fail the open if the firmware enable command fails. Without this check, a failed enable leaves enable_count at zero while the interface appears up, which can cause a later admin channel enable/disable cycle to incorrectly disable the hardware under the active data path. Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-4-d5834b2ef1b9@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:05:06 -07:00
Satish Kharat	56a4d7a865	enic: detect SR-IOV VF type from PCI capability Read the VF device ID from the SR-IOV PCI capability at probe time to determine whether the PF is configured for V1, USNIC, or V2 virtual functions. Store the result in enic->vf_type for use by subsequent SR-IOV operations. The VF type is a firmware-configured property (set via UCSM, CIMC, Intersight etc) that is immutable from the driver's perspective. Only PFs are probed for this capability; VFs and dynamic vnics skip detection. Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-3-d5834b2ef1b9@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:05:06 -07:00
Satish Kharat	803a1b0202	enic: add V2 SR-IOV VF device ID Register the V2 VF PCI device ID (0x02b7) so the driver binds to V2 virtual functions created via sriov_configure. Update enic_is_sriov_vf() to recognize V2 VFs alongside the existing V1 type. Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-2-d5834b2ef1b9@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:05:06 -07:00
Satish Kharat	74fb32ed73	enic: extend resource discovery for SR-IOV admin channel VIC firmware exposes admin channel resources (WQ, RQ, CQ) for PF-VF communication when SR-IOV is active. Add the corresponding resource type definitions and teach the discovery and access functions to handle them. Signed-off-by: Satish Kharat <satishkh@cisco.com> Link: https://patch.msgid.link/20260401-enic-sriov-v2-prep-v4-1-d5834b2ef1b9@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:05:05 -07:00
Jakub Kicinski	0ea7e61f65	Merge branch 'net-phy-microchip-add-downshift-support-for-lan88xx' Nicolai Buchwitz says: ==================== net: phy: microchip: add downshift support for LAN88xx Add standard ETHTOOL_PHY_DOWNSHIFT tunable support for the Microchip LAN88xx PHY, following the same pattern used by Marvell and other PHY drivers. Ethernet cables with faulty or missing pairs (specifically C and D) can successfully auto-negotiate 1000BASE-T but fail to establish a stable link. The LAN88xx PHY supports automatic downshift to 100BASE-TX after a configurable number of failed attempts (2-5). Patch 1 adds the get/set tunable implementation. Patch 2 enables downshift by default with a count of 2. The setting is stored in the driver's private data so that user changes via ethtool are preserved across suspend/resume cycles. Based on an earlier downstream implementation by Phil Elwell. Tested on Raspberry Pi 3B+ (LAN7515/LAN88xx). ==================== Link: https://patch.msgid.link/20260401123848.696766-1-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:03:07 -07:00
Nicolai Buchwitz	70180f72d9	net: phy: microchip: enable downshift by default on LAN88xx Enable auto-downshift from 1000BASE-T to 100BASE-TX after 2 failed auto-negotiation attempts by default. This ensures that links with faulty or missing cable pairs (C and D) fall back to 100Mbps without requiring userspace configuration. The downshift count is stored in the driver's private data and applied in config_init, so user changes via ethtool are preserved across suspend/resume cycles. Users can override or disable downshift at runtime: ethtool --set-phy-tunable eth0 downshift off Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260401123848.696766-3-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:03:03 -07:00
Nicolai Buchwitz	e417ac73d2	net: phy: microchip: add downshift tunable support for LAN88xx Implement the standard ETHTOOL_PHY_DOWNSHIFT tunable for the LAN88xx PHY. This allows runtime configuration of the auto-downshift feature via ethtool: ethtool --set-phy-tunable eth0 downshift on count 3 The LAN88xx PHY supports downshifting from 1000BASE-T to 100BASE-TX after 2-5 failed auto-negotiation attempts. Valid count values are 2, 3, 4 and 5. This is based on an earlier downstream implementation by Phil Elwell. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260401123848.696766-2-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:03:03 -07:00
Chih Kai Hsu	86f5dd4e0f	r8152: Add helper functions for SRAM2 Add the following helper functions for SRAM2 access to simplify the code and improve readability: - sram2_write() - write data to SRAM2 address - sram2_read() - read data from SRAM2 address - sram2_write_w0w1() - read-modify-write operation Signed-off-by: Chih Kai Hsu <hsu.chih.kai@realtek.com> Reviewed-by: Hayes Wang <hayeswang@realtek.com> Link: https://patch.msgid.link/20260401115542.34601-1-nic_swsd@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 18:01:06 -07:00
Daniel Wagner	7eaff1eff0	net: phy: bcm84881: add LED framework support for BCM84891/BCM84892 Expose LED1 and LED2 pins via the PHY LED framework. Each pin has a source mask (MASK_LOW + MASK_EXT registers) selecting which hardware events light it, plus a CTL field in the shared 0xA83B register (RMW; LED4 is firmware-controlled per the datasheet). Hardware can offload per-speed link triggers (1000/2500/5000/10000), RX/TX activity, and force-on. LINK_100 is accepted only alongside LINK_1000: source bit 4 lights at both speeds and 100-alone isn't representable, so the unrepresentable case falls to software. The chip has five LED pins; only LED1/LED2 are exposed here as those are the only ones characterized on tested hardware. LED4 is firmware-controlled regardless of strap configuration. Tested on TRENDnet TEG-S750 (LED1/LED2 wired to an antiparallel bicolor LED): brightness_set via sysfs; netdev trigger offloaded=1 with amber lit at 100M/1G/2.5G and green lit at 10G via respective link_* modes; LED off immediately on cable unplug with no software involvement. Signed-off-by: Daniel Wagner <wagner.daniel.t@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260401114931.3091818-1-wagner.daniel.t@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 17:59:43 -07:00
Jakub Kicinski	9b79da5d69	Merge branch 'macvlan-broadcast-delivery-changes' Eric Dumazet says: ==================== macvlan: broadcast delivery changes First patch adds data-race annotations. Second patch changes macvlan_broadcast_enqueue() to return early if the queue is full. ==================== Link: https://patch.msgid.link/20260401103809.3038139-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 17:56:37 -07:00
Eric Dumazet	0d5dc1d7aa	macvlan: avoid spinlock contention in macvlan_broadcast_enqueue() Under high stress, we spend a lot of time cloning skbs, then acquiring a spinlock, then freeing the clone because the queue is full. Add a shortcut to avoid these costs under pressure. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260401103809.3038139-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 17:56:35 -07:00
Eric Dumazet	1ef5789d99	macvlan: annotate data-races around port->bc_queue_len_used port->bc_queue_len_used is read and written locklessly, add READ_ONCE()/WRITE_ONCE() annotations. While WRITE_ONCE() in macvlan_fill_info() is not yet needed, it is a prereq for future RTNL avoidance. Fixes: `d4bff72c84` ("macvlan: Support for high multicast packet rate") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260401103809.3038139-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 17:56:35 -07:00
Jakub Kicinski	f35340f2d6	Merge branch 'net-stmmac-tso-fixes-cleanups' Russell King says: ==================== net: stmmac: TSO fixes/cleanups This is a more refined version of the previous patch series fixing and cleaning up the TSO code. I'm not sure whether "TSO" or "GSO" should be used to describe this feature - although it primarily handles TCP, dwmac4 appears to also be able to handle UDP. In essence, this series adds a .ndo_features_check() method to handle whether TSO/GSO can be used for a particular skbuff - checking which queue the skbuff is destined for and whether that has TBS available which precludes TSO being enabled on that channel. I'm also adding a check that the header is smaller than 1024 bytes, as documented in those sources which have TSO support - this is due to the hardware buffering the header in "TSO memory" which I guess is limited to 1KiB. I expect this test never to trigger, but if the headers ever exceed that size, the hardware will likely fail. While IPv4 headers are unlikely to be anywhere near this, there is nothing in the protocol which prevents IPv6 headers up to 64KiB. As we now have a .ndo_features_check() method, I'm moving the VLAN insertion for TSO packets into core code by unpublishing the VLAN insertion features when we use TSO. Another move is for checksumming, which is required for TSO, but stmmac's requirements for offloading checksums are more strict - and this seems to be a bug in the TSO path. I've changed the hardware initialisation to always enable TSO support on the channels even if the user requests TSO/GSO to be disabled - this fixes another issue as pointed out by Jakub in a previous review. I'm moving the setup of the GSO features, cleaning those up, and adding a warning if platform glue requests this to be enabled but the hardware has no support. Hopefully this will never trigger if everyone got the STMMAC_FLAG_TSO_EN flag correct. Also adding a check for TxPBL value. 
Finally, moving the "TSO supported" message to the new stmmac_set_gso_features() function to keep all this TSO stuff together. ==================== Link: https://patch.msgid.link/aczHVF04LIGq_lYO@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:23 -07:00
Russell King (Oracle)	0f96212a51	net: stmmac: move "TSO supported" message to stmmac_set_gso_features() Move the "TSO supported" message to stmmac_set_gso_features() so that we group all probe-time TSO stuff in one place. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7pu8-0000000Eau5-3Zne@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:20 -07:00
Russell King (Oracle)	33f5cc83bb	net: stmmac: check txpbl for TSO Documentation states that TxPBL must be >= 4 to allow TSO support, but the driver doesn't check this. TxPBL comes from the platform glue code or DT. Add a check with a warning if platform glue code attempts to enable TSO support with TxPBL too low. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7pu3-0000000Eatz-39ts@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:20 -07:00
Russell King (Oracle)	f8c70ab540	net: stmmac: add warning when TSO is requested but unsupported Add a warning message if TSO is requested by the platform glue code but the core wasn't configured for TSO. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7pty-0000000Eatt-2TjZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:20 -07:00
Russell King (Oracle)	6ad0044428	net: stmmac: make stmmac_set_gso_features() more readable Make stmmac_set_gso_features() more readable by adding some whitespace and getting rid of the indentation. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7ptt-0000000Eatn-1ziK@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:20 -07:00
Russell King (Oracle)	c04939cb98	net: stmmac: split out gso features setup Move the GSO features setup into a separate function, co-located with other GSO/TSO support. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7pto-0000000Eath-1VDH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:19 -07:00
Russell King (Oracle)	2e4082e4b7	net: stmmac: simplify GSO/TSO test in stmmac_xmit() The test in stmmac_xmit() to see whether we should pass the skbuff to stmmac_tso_xmit() is more complex than it needs to be. This test can be simplified by storing the mask of GSO types that we will pass, and setting it according to the enabled features. Note that "tso" is a misnomer since commit `b776620651` ("net: stmmac: Implement UDP Segmentation Offload"). Also note that that commit controls both via the TSO feature. We preserve this behaviour in this commit. Also, that commit unconditionally accessed skb_shinfo(skb)->gso_type for all frames, even when skb_is_gso() was false. This access is eliminated. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7ptj-0000000Eatb-11zK@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:19 -07:00
Russell King (Oracle)	b55dfb173c	net: stmmac: move check for hardware checksum supported Add a check in .ndo_features_check() to indicate whether hardware checksum can be performed on the skbuff. Where hardware checksum is not supported - either because the channel does not support Tx COE or the skb isn't suitable (stmmac uses a tighter test than can_checksum_protocol()) we also need to disable TSO, which will be done by harmonize_features() in net/core/dev.c This fixes a bug where a channel which has COE disabled may still receive TSO skbuffs. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7pte-0000000EatU-0ILt@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:19 -07:00
Russell King (Oracle)	3f6a6eb9ef	net: stmmac: move TSO VLAN tag insertion to core code stmmac_tso_xmit() checks whether the skbuff is trying to offload vlan tag insertion to hardware, which from the comment in the code appears to be buggy when the TSO feature is used. Rather than stmmac_tso_xmit() inserting the VLAN tag, handle this in stmmac_features_check() which will then use core net code to handle this. See net/core/dev.c::validate_xmit_skb() Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7ptY-0000000EatO-42Qv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:19 -07:00
Russell King (Oracle)	c05a81cbee	net: stmmac: add GSO MSS checks Add GSO MSS checks to stmmac_features_check(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7ptT-0000000EatI-3feh@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:19 -07:00
Russell King (Oracle)	6732e474f8	net: stmmac: add TSO check for header length According to the STM32MP151 documentation which covers dwmac v4.2, the hardware TSO feature can handle header lengths up to a maximum of 1023 bytes. Add a .ndo_features_check() method implementation to check the header length meets these requirements, otherwise fall back to software GSO. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7ptO-0000000EatC-39il@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:18 -07:00
Russell King (Oracle)	f799b5dab9	net: stmmac: add stmmac_tso_header_size() We will need to compute the size of the protocol headers in two places, so move this into a separate function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7ptJ-0000000Eat5-2ZlA@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:18 -07:00
Russell King (Oracle)	e32820264c	net: stmmac: fix TSO support when some channels have TBS available According to the STM32MP25xx manual, which is dwmac v5.3, TBS (time based scheduling) is not permitted for channels which have hardware TSO enabled. Intel's commit `5e6038b88a` ("net: stmmac: fix TSO and TBS feature enabling during driver open") concurs with this, but it is incomplete. This commit avoids enabling TSO support on the channels which have TBS available, which, as far as the hardware is concerned, means we do not set the TSE bit in the DMA channel's transmit control register. However, the net device's features apply to all queues(channels), which means these channels may still be handed TSO skbs to transmit, and the driver will pass them to stmmac_tso_xmit(). This will generate the descriptors for TSO, even though the channel has the TSE bit clear. Fix this by checking whether the queue(channel) has TBS available, and if it does, fall back to software GSO support. Fixes: `5e6038b88a` ("net: stmmac: fix TSO and TBS feature enabling during driver open") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7ptE-0000000Easz-28tv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:18 -07:00
Russell King (Oracle)	afe840ddf1	net: stmmac: fix .ndo_fix_features() netdev features documentation requires that .ndo_fix_features() is stateless: it shouldn't modify driver state. Yet, stmmac_fix_features() does exactly that, changing whether GSO frames are processed by the driver. Move this code to stmmac_set_features() instead, which is the correct place for it. We don't need to check whether TSO is supported; this is already handled via the setup of netdev->hw_features, and we are guaranteed that if netdev->hw_features indicates that a feature is not supported, .ndo_set_features() won't be called with it set. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7pt9-0000000East-1YAO@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:18 -07:00
Russell King (Oracle)	989a9c20f6	net: stmmac: fix channel TSO enable on resume Rather than configuring the channels depending on whether GSO/TSO is currently enabled by the user, always enable if the hardware has TSO support and the platform wants TSO to be enabled. This avoids the channel TSO enable bit being disabled after a resume when the user has disabled TSO features. This will cause problems when the user re-enables TSO. This bug goes back to commit `f748be531d` ("stmmac: support new GMAC4") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1w7pt4-0000000Easn-14WL@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:28:18 -07:00
Jakub Kicinski	8ffb33d770	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c `b18c833888` ("vsock: initialize child_ns_mode_locked in vsock_net_init()") `0de607dc4f` ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c `ceee35e567` ("bnxt_en: Refactor some basic ring setup and adjustment logic") `57cdfe0dc7` ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c `4d56037a02` ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") `687a95d204` ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h `b6045c899e` ("wifi: iwlwifi: mld: Refactor scan command handling") `ec66ec6a5a` ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c `078df640ef` ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") `323156c354` ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:03:13 -07:00
Linus Torvalds	f8f5627a8a	Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "With fixes from wireless, bluetooth and netfilter included we're back to each PR carrying 30%+ more fixes than in previous era. The good news is that so far none of the "extra" fixes are themselves causing real regressions. Not sure how much comfort that is. Current release - fix to a fix: - netdevsim: fix build if SKB_EXTENSIONS=n - eth: stmmac: skip VLAN restore when VLAN hash ops are missing Previous releases - regressions: - wifi: iwlwifi: mvm: don't send a 6E related command when not supported Previous releases - always broken: - some info leak fixes - add missing clearing of skb->cb[] on ICMP paths from tunnels - ipv6: - flowlabel: defer exclusive option free until RCU teardown - avoid overflows in ip6_datagram_send_ctl() - mpls: add seqcount to protect platform_labels from OOB access - bridge: improve safety of parsing ND options - bluetooth: fix leaks, overflows and races in hci_sync - netfilter: add more input validation, some to address bugs directly some to prevent exploits from cooking up broken configurations - wifi: - ath: avoid poor performance due to stopping the wrong aggregation session - virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free - eth: - fec: fix the PTP periodic output sysfs interface - enetc: safely reinitialize TX BD ring when it has unsent frames" * tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64 ipv6: avoid overflows in ip6_datagram_send_ctl() net: hsr: fix VLAN add unwind on slave errors net: hsr: serialize seq_blocks merge across nodes vsock: initialize child_ns_mode_locked in vsock_net_init() selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks net/sched: cls_flow: fix NULL pointer dereference on shared blocks net/sched: cls_fw: fix NULL pointer dereference on 
shared blocks net/x25: Fix overflow when accumulating packets net/x25: Fix potential double free of skb bnxt_en: Restore default stat ctxs for ULP when resource is available bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode() bnxt_en: Refactor some basic ring setup and adjustment logic net/mlx5: Fix switchdev mode rollback in case of failure net/mlx5: Avoid "No data available" when FW version queries fail net/mlx5: lag: Check for LAG device before creating debugfs net: macb: properly unregister fixed rate clocks net: macb: fix clk handling on PCI glue driver removal virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN net/sched: sch_netem: fix out-of-bounds access in packet corruption ...	2026-04-02 09:57:06 -07:00
Linus Torvalds	4c2c526b5a	Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - IOMMU-PT related compile breakage in the AMD driver - IOTLB flushing behavior when unmapped region is larger than requested due to page-sizes - Fix IOTLB flush behavior with empty gathers * tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline iommupt: Fix short gather if the unmap goes into a large mapping iommu: Do not call drivers for empty gathers	2026-04-02 09:53:16 -07:00
Linus Torvalds	2ec9074b28	Merge tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "People have been so busy for hunting and we're still getting more changes than wished for, but it doesn't look too scary; almost all changes are device-specific small fixes. I guess it's rather a casual bump, and no more Easter eggs are left for 7.0 (hopefully)... - Fixes for the recent regression on ctxfi driver - Fix missing INIT_LIST_HEAD() for ASoC card_aux_list - Usual HD- and USB-audio, and ASoC AMD quirk updates - ASoC fixes for AMD and Intel" * tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits) ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP ALSA: hda/realtek: add quirk for Acer Swift SFG14-73 ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9 ASoC: Intel: boards: fix unmet dependency on PINCTRL ASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_params ALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initialization ALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10 ALSA: hda/realtek: add quirk for HP Laptop 15-fc0xxx ASoC: ep93xx: Fix unchecked clk_prepare_enable() and add rollback on failure ASoC: soc-core: call missing INIT_LIST_HEAD() for card_aux_list ALSA: hda/realtek: Add quirk for Samsung Book2 Pro 360 (NP950QED) ASoC: amd: yc: Add DMI entry for HP Laptop 15-fc0xxx ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 16X OLED M7601RM ALSA: hda/realtek: Add quirk for ASUS ROG Strix SCAR 15 ALSA: usb-audio: Exclude Scarlett Solo 1st Gen from SKIP_IFACE_SETUP ALSA: caiaq: fix stack out-of-bounds read in init_card ALSA: ctxfi: Check the error for index mapping ALSA: ctxfi: Fix missing SPDIFI1 index handling ALSA: hda/realtek: add quirk for HP Victus 15-fb0xxx ...	2026-04-02 09:41:21 -07:00
Linus Torvalds	2064d7784e	Merge tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay Pull auxdisplay fixes from Andy Shevchenko: - Fix NULL dereference in linedisp_release() - Fix ht16k33 DT bindings to avoid warnings - Handle errors in I²C transfers in lcd2s driver * tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay: auxdisplay: line-display: fix NULL dereference in linedisp_release auxdisplay: lcd2s: add error handling for i2c transfers dt-bindings: auxdisplay: ht16k33: Use unevaluatedProperties to fix common property warning	2026-04-02 09:34:22 -07:00
Dimitri Daskalakis	ec7067e661	eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64 On systems with 64K pages, RX queues will be wedged if users set the descriptor count to the current minimum (16). Fbnic fragments large pages into 4K chunks, and scales down the ring size accordingly. With 64K pages and 16 descriptors, the ring size mask is 0 and will never be filled. 32 descriptors is another special case that wedges the RX rings. Internally, the rings track pages for the head/tail pointers, not page fragments. So with 32 descriptors, there's only 1 usable page as one ring slot is kept empty to disambiguate between an empty/full ring. As a result, the head pointer never advances and the HW stalls after consuming 16 page fragments. Fixes: `0cb4c0a137` ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260401162848.2335350-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:38:34 -07:00
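The ring arithmetic in the fbnic message above can be followed with a small model. This is only a sketch under stated assumptions: `ring_geometry` and its return values are hypothetical names, not driver symbols, and the model assumes the ring is counted in whole pages, the mask is the page count minus one, and one page slot stays empty to tell a full ring from an empty one.

```python
FRAG_SIZE = 4096  # the message says large pages are split into 4K chunks


def ring_geometry(descriptors, page_size):
    """Return (ring_pages, ring_mask, usable_pages) for a hypothetical ring.

    Assumption: the descriptor count is scaled down by the number of 4K
    fragments per page, and one page slot is always kept empty.
    """
    frags_per_page = page_size // FRAG_SIZE
    ring_pages = descriptors // frags_per_page
    ring_mask = ring_pages - 1
    usable_pages = ring_pages - 1
    return ring_pages, ring_mask, usable_pages


for count in (16, 32, 64):
    print(count, ring_geometry(count, 64 * 1024))
```

Under these assumptions, 16 descriptors on 64K pages leave a mask of 0 and no usable page, 32 descriptors leave a single usable page, and 64 descriptors leave three.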
Eric Dumazet	4e45337556	ipv6: avoid overflows in ip6_datagram_send_ctl() Yiming Qian reported : <quote> I believe I found a locally triggerable kernel bug in the IPv6 sendmsg ancillary-data path that can panic the kernel via `skb_under_panic()` (local DoS). The core issue is a mismatch between: - a 16-bit length accumulator (`struct ipv6_txoptions::opt_flen`, type `__u16`) and - a pointer to the last provided destination-options header (`opt->dst1opt`) when multiple `IPV6_DSTOPTS` control messages (cmsgs) are provided. - `include/net/ipv6.h`: - `struct ipv6_txoptions::opt_flen` is `__u16` (wrap possible). (lines 291-307, especially 298) - `net/ipv6/datagram.c:ip6_datagram_send_ctl()`: - Accepts repeated `IPV6_DSTOPTS` and accumulates into `opt_flen` without rejecting duplicates. (lines 909-933) - `net/ipv6/ip6_output.c:__ip6_append_data()`: - Uses `opt->opt_flen + opt->opt_nflen` to compute header sizes/headroom decisions. (lines 1448-1466, especially 1463-1465) - `net/ipv6/ip6_output.c:__ip6_make_skb()`: - Calls `ipv6_push_frag_opts()` if `opt->opt_flen` is non-zero. (lines 1930-1934) - `net/ipv6/exthdrs.c:ipv6_push_frag_opts()` / `ipv6_push_exthdr()`: - Push size comes from `ipv6_optlen(opt->dst1opt)` (based on the pointed-to header). (lines 1179-1185 and 1206-1211) 1. `opt_flen` is a 16-bit accumulator: - `include/net/ipv6.h:298` defines `__u16 opt_flen; /* after fragment hdr */`. 2. `ip6_datagram_send_ctl()` accepts *repeated* `IPV6_DSTOPTS` cmsgs and increments `opt_flen` each time: - In `net/ipv6/datagram.c:909-933`, for `IPV6_DSTOPTS`: - It computes `len = ((hdr->hdrlen + 1) << 3);` - It checks `CAP_NET_RAW` using `ns_capable(net->user_ns, CAP_NET_RAW)`. (line 922) - Then it does: - `opt->opt_flen += len;` (line 927) - `opt->dst1opt = hdr;` (line 928) There is no duplicate rejection here (unlike the legacy `IPV6_2292DSTOPTS` path which rejects duplicates at `net/ipv6/datagram.c:901-904`). 
If enough large `IPV6_DSTOPTS` cmsgs are provided, `opt_flen` wraps while `dst1opt` still points to a large (2048-byte) destination-options header. In the attached PoC (`poc.c`): - 32 cmsgs with `hdrlen=255` => `len = (255+1)*8 = 2048` - 1 cmsg with `hdrlen=0` => `len = 8` - Total increment: `32*2048 + 8 = 65544`, so `(__u16)opt_flen == 8` - The last cmsg is 2048 bytes, so `dst1opt` points to a 2048-byte header. 3. The transmit path sizes headers using the wrapped `opt_flen`: - In `net/ipv6/ip6_output.c:1463-1465`: - `headersize = sizeof(struct ipv6hdr) + (opt ? opt->opt_flen + opt->opt_nflen : 0) + ...;` With wrapped `opt_flen`, `headersize`/headroom decisions underestimate what will be pushed later. 4. When building the final skb, the actual push length comes from `dst1opt` and is not limited by wrapped `opt_flen`: - In `net/ipv6/ip6_output.c:1930-1934`: - `if (opt->opt_flen) proto = ipv6_push_frag_opts(skb, opt, proto);` - In `net/ipv6/exthdrs.c:1206-1211`, `ipv6_push_frag_opts()` pushes `dst1opt` via `ipv6_push_exthdr()`. - In `net/ipv6/exthdrs.c:1179-1184`, `ipv6_push_exthdr()` does: - `skb_push(skb, ipv6_optlen(opt));` - `memcpy(h, opt, ipv6_optlen(opt));` With insufficient headroom, `skb_push()` underflows and triggers `skb_under_panic()` -> `BUG()`: - `net/core/skbuff.c:2669-2675` (`skb_push()` calls `skb_under_panic()`) - `net/core/skbuff.c:207-214` (`skb_panic()` ends in `BUG()`) - The `IPV6_DSTOPTS` cmsg path requires `CAP_NET_RAW` in the target netns user namespace (`ns_capable(net->user_ns, CAP_NET_RAW)`). - Root (or any task with `CAP_NET_RAW`) can trigger this without user namespaces. - An unprivileged `uid=1000` user can trigger this if unprivileged user namespaces are enabled and it can create a userns+netns to obtain namespaced `CAP_NET_RAW` (the attached PoC does this). - Local denial of service: kernel BUG/panic (system crash). - Reproducible with a small userspace PoC. 
</quote> This patch does not reject duplicated options, as this might break some user applications. Instead, it makes sure to adjust opt_flen and opt_nflen to correctly reflect the size of the current option headers, preventing the overflows and the potential for panics. This applies to IPV6_DSTOPTS, IPV6_HOPOPTS, and IPV6_RTHDR. Specifically: When a new IPV6_DSTOPTS is processed, the length of the old opt->dst1opt is subtracted from opt->opt_flen before adding the new length. When a new IPV6_HOPOPTS is processed, the length of the old opt->dst0opt is subtracted from opt->opt_nflen. When a new Routing Header (IPV6_RTHDR or IPV6_2292RTHDR) is processed, the length of the old opt->srcrt is subtracted from opt->opt_nflen. In the special case within IPV6_2292RTHDR handling where dst1opt is moved to dst0opt, the length of the old opt->dst0opt is subtracted from opt->opt_nflen before the new one is added. Fixes: `333fad5364` ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8JNzawgr5OX5m+3jnQDHry2XxhQT5=jThW1zDPtUikRYA@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260401154721.3740056-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:25:22 -07:00
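The 16-bit wrap described in the report, and the effect of subtracting the old header length before adding the new one, can be reproduced with a small model. This is a sketch, not kernel code: `optlen`, `accumulate_old`, and `accumulate_fixed` are hypothetical names, only the `IPV6_DSTOPTS` contribution to `opt_flen` is modelled, and the cmsg order (one `hdrlen=0` cmsg followed by 32 `hdrlen=255` cmsgs) is an assumption chosen so that the last header is the 2048-byte one, as in the report.

```python
U16_MASK = 0xFFFF  # opt_flen is a __u16, so every update wraps at 65536


def optlen(hdrlen):
    """Header length in bytes, mirroring len = (hdr->hdrlen + 1) << 3."""
    return (hdrlen + 1) << 3


def accumulate_old(hdrlens):
    """Pre-fix model: each cmsg adds to opt_flen; only the last header is kept."""
    opt_flen, dst1opt_len = 0, 0
    for hdrlen in hdrlens:
        opt_flen = (opt_flen + optlen(hdrlen)) & U16_MASK
        dst1opt_len = optlen(hdrlen)
    return opt_flen, dst1opt_len


def accumulate_fixed(hdrlens):
    """Post-fix model: drop the old header's length before adding the new one."""
    opt_flen, dst1opt_len = 0, 0
    for hdrlen in hdrlens:
        opt_flen = (opt_flen - dst1opt_len + optlen(hdrlen)) & U16_MASK
        dst1opt_len = optlen(hdrlen)
    return opt_flen, dst1opt_len


cmsgs = [0] + [255] * 32  # assumed order; the last header is 2048 bytes
print(accumulate_old(cmsgs))    # accounted length vs. length actually pushed
print(accumulate_fixed(cmsgs))
```

In the pre-fix model the accounted length ends at 8 while the header that will be pushed is 2048 bytes; in the post-fix model the two values agree.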
Jakub Kicinski	be193568be	Merge branch 'net-hsr-fixes-for-prp-duplication-and-vlan-unwind' Luka Gejak says: ==================== net: hsr: fixes for PRP duplication and VLAN unwind This series addresses two logic bugs in the HSR/PRP implementation identified during a protocol audit. These are targeted for the 'net' tree as they fix potential memory corruption and state inconsistency. The primary change resolves a race condition in the node merging path by implementing address-based lock ordering. This ensures that concurrent mutations of sequence blocks do not lead to state corruption or deadlocks. An additional fix corrects asymmetric VLAN error unwinding by implementing a centralized unwind path on slave errors. ==================== Link: https://patch.msgid.link/20260401092243.52121-1-luka.gejak@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:23:55 -07:00
Luka Gejak	2e3514e63b	net: hsr: fix VLAN add unwind on slave errors When vlan_vid_add() fails for a secondary slave, the error path calls vlan_vid_del() on the failing port instead of the peer slave that had already succeeded. This results in asymmetric VLAN state across the HSR pair. Fix this by switching to a centralized unwind path that removes the VID from any slave device that was already programmed. Fixes: `1a8a63a530` ("net: hsr: Add VLAN CTAG filter support") Signed-off-by: Luka Gejak <luka.gejak@linux.dev> Link: https://patch.msgid.link/20260401092243.52121-3-luka.gejak@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:23:49 -07:00
Luka Gejak	f5df2990c3	net: hsr: serialize seq_blocks merge across nodes During node merging, hsr_handle_sup_frame() walks node_curr->seq_blocks to update node_real without holding node_curr->seq_out_lock. This allows concurrent mutations from duplicate registration paths, risking inconsistent state or XArray/bitmap corruption. Fix this by locking both nodes' seq_out_lock during the merge. To prevent ABBA deadlocks, locks are acquired in order of memory address. Reviewed-by: Felix Maurer <fmaurer@redhat.com> Fixes: `415e636751` ("hsr: Implement more robust duplicate discard for PRP") Signed-off-by: Luka Gejak <luka.gejak@linux.dev> Link: https://patch.msgid.link/20260401092243.52121-2-luka.gejak@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:23:49 -07:00
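The address-based lock ordering mentioned in the hsr message above is a general technique, sketched here in user space. This is an illustration under assumptions, not the kernel change: `Node` and `merge_seq_blocks` are hypothetical, a Python set stands in for the sequence-block state, and `id()` stands in for the memory address used to order the two locks.

```python
import threading


class Node:
    """Hypothetical stand-in for an HSR node with its own seq_out_lock."""

    def __init__(self, name):
        self.name = name
        self.seq_out_lock = threading.Lock()
        self.seq_blocks = set()


def merge_seq_blocks(node_real, node_curr):
    """Merge node_curr's blocks into node_real while holding both locks.

    Both locks are always taken in the same global order (by id()), so two
    threads merging in opposite directions cannot deadlock ABBA-style.
    """
    if node_real is node_curr:
        return  # nothing to merge; also avoids taking the same lock twice
    first, second = sorted((node_real, node_curr), key=id)
    with first.seq_out_lock, second.seq_out_lock:
        node_real.seq_blocks |= node_curr.seq_blocks


a, b = Node("a"), Node("b")
a.seq_blocks.add(1)
b.seq_blocks.add(2)
merge_seq_blocks(a, b)
print(sorted(a.seq_blocks))
```

Taking the locks in argument order instead would allow one thread to hold `a` while waiting for `b` and another to hold `b` while waiting for `a`; sorting by a stable key removes that possibility.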
Stefano Garzarella	b18c833888	vsock: initialize child_ns_mode_locked in vsock_net_init() The `child_ns_mode_locked` field lives in `struct net`, which persists across vsock module reloads. When the module is unloaded and reloaded, `vsock_net_init()` resets `mode` and `child_ns_mode` back to their default values, but does not reset `child_ns_mode_locked`. The stale lock from the previous module load causes subsequent writes to `child_ns_mode` to silently fail: `vsock_net_set_child_mode()` sees the old lock, skips updating the actual value, and returns success when the requested mode matches the stale lock. The sysctl handler reports no error, but `child_ns_mode` remains unchanged. Steps to reproduce: $ modprobe vsock $ echo local > /proc/sys/net/vsock/child_ns_mode $ cat /proc/sys/net/vsock/child_ns_mode local $ modprobe -r vsock $ modprobe vsock $ echo local > /proc/sys/net/vsock/child_ns_mode $ cat /proc/sys/net/vsock/child_ns_mode global <--- expected "local" Fix this by initializing `child_ns_mode_locked` to 0 (unlocked) in `vsock_net_init()`, so the write-once mechanism works correctly after module reload. Fixes: `102eab95f0` ("vsock: lock down child_ns_mode as write-once") Reported-by: Jin Liu <jinl@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260401092153.28462-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:18:56 -07:00
Lorenzo Bianconi	269389ba53	net: airoha: Set REG_RX_CPU_IDX() once in airoha_qdma_fill_rx_queue() It is not necessary to update REG_RX_CPU_IDX register for each iteration of the descriptor loop in airoha_qdma_fill_rx_queue routine. Move REG_RX_CPU_IDX configuration out of the descriptor loop and rely on the last queue head value updated in the descriptor loop. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260331-airoha-cpu-idx-out-off-loop-v1-1-75c66b428f50@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-02 15:13:22 +02:00


1430677 Commits