Report device_stall_critical_watermark_cnt as tx_pause_storm_events in
the ethtool_pause_stats struct. This counter tracks pause storm error
events which indicate the NIC has been sending pause frames for an
extended period due to a stall.
The ethtool_pause_stats struct reports these stalls as a single value,
whereas the device supports tracking them per priority. Aggregate the
counter across all priority classes to capture stalls on all priorities.
Note that the stats are fetched from the device for each priority via
mlx5_core_access_reg().
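The aggregation step can be sketched in userspace (the helper name and the fixed priority count are illustrative assumptions; the real driver reads each priority's counters via mlx5_core_access_reg()):

```c
#include <stdint.h>

/* Hypothetical helper: sum device_stall_critical_watermark_cnt across
 * all priority classes into the single tx_pause_storm_events value.
 * NUM_PRIOS and the array layout are assumptions for illustration. */
#define NUM_PRIOS 8

static uint64_t aggregate_pause_storm_events(const uint64_t per_prio[NUM_PRIOS])
{
	uint64_t total = 0;

	for (int i = 0; i < NUM_PRIOS; i++)
		total += per_prio[i];
	return total;
}
```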
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With pause storm protection in place, track the occurrence of pause
storm events. Since there is a one-to-one mapping between pause storm
interrupts and events, use the interrupt count to track this metric.
./ethtool -I -a eth0
Pause parameters for eth0:
Autonegotiate: off
RX: off
TX: on
Statistics:
tx_pause_frames: 759657
rx_pause_frames: 0
tx_pause_storm_events: 219
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-5-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add protection against TX pause storms. A pause storm occurs when a
device fails to send received packets up to the stack. When a pause
storm is detected (pause state persists beyond the configured timeout),
the device stops sending the pause frames and begins dropping packets
instead of back-pressuring.
The timeout is configurable via ethtool tunable (pfc-prevention-tout)
with a maximum value of 10485ms and a default value of 500ms.
Once the device transitions to the storm-detected state, the service
task periodically attempts recovery, returning the device to normal
operation to handle any subsequent pause storm episodes.
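The detect/recover cycle described above can be modeled as a small state machine (names and the polling model are my assumptions, not the driver's actual implementation):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the storm detection and recovery logic. */
enum tx_pause_state {
	TX_PAUSE_NORMAL,	/* back-pressure via pause frames */
	TX_PAUSE_STORM,		/* timeout exceeded: drop instead */
};

struct tx_pause_ctx {
	enum tx_pause_state state;
	uint64_t pause_start_ms;	/* when the current pause began */
	uint64_t timeout_ms;		/* pfc-prevention-tout */
	uint64_t storm_events;		/* reported as tx_pause_storm_events */
};

/* Called periodically (e.g. from a service task) with the current time
 * and whether the device is still asserting pause. */
static void tx_pause_poll(struct tx_pause_ctx *ctx, uint64_t now_ms,
			  bool pause_asserted)
{
	switch (ctx->state) {
	case TX_PAUSE_NORMAL:
		if (pause_asserted &&
		    now_ms - ctx->pause_start_ms > ctx->timeout_ms) {
			/* Storm detected: stop pausing, start dropping. */
			ctx->state = TX_PAUSE_STORM;
			ctx->storm_events++;
		} else if (!pause_asserted) {
			ctx->pause_start_ms = now_ms;
		}
		break;
	case TX_PAUSE_STORM:
		if (!pause_asserted) {
			/* Recovered: return to normal so a later storm
			 * episode can be detected (and counted) again. */
			ctx->state = TX_PAUSE_NORMAL;
			ctx->pause_start_ms = now_ms;
		}
		break;
	}
}
```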
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-4-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With TX pause enabled, if a device is unable to pass packets up to the
stack (e.g., the CPU is hung), the device can cause a pause storm. Given
that devices can have native support to protect the neighbor from such
flooding, such events need some tracking. Add support for tracking TX
pause storm events for better observability.
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20260302230149.1580195-2-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ankit Garg says:
====================
gve: optimize and enable HW GRO for DQO
The DQO device has always performed HW GRO, not LRO. This series updates
the feature bit and modifies the RX path to enhance support. It sets
gso_segs correctly so the software stack can continue coalescing, and
pulls network headers into the skb linear space to avoid multiple small
memory copies when header-split is disabled.
We also enable HW GRO by default on supported devices.
====================
Link: https://patch.msgid.link/20260303195549.2679070-1-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, in DQO mode with hw-gro enabled, the entire received packet
is placed into skb fragments when header-split is disabled. This leaves
the skb linear part empty, forcing the networking stack to do multiple
small memory copies to access the Ethernet, IP and TCP headers.
This patch adds a single memcpy to put all headers into linear portion
before packet reaches the SW GRO stack; thus eliminating multiple
smaller memcpy calls.
Additionally, the criteria for calling napi_gro_frags() was updated.
Since skb->head is now populated, we instead check if the SKB is the
cached NAPI scratchpad to ensure we continue using the zero-allocation
path.
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-4-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Leaving gso_segs unpopulated on hardware GRO packets prevents further
coalescing by the software stack: the kernel's GRO logic marks the
SKB for flush because the expected total length of all segments doesn't
match the actual payload length.
Setting gso_segs correctly results in significantly more segments being
coalesced as measured by the result of dev_gro_receive().
gso_segs is derived from the payload length. When header-split is
enabled, the payload is in the non-linear portion of the skb. And when
header-split is disabled, we have to parse the headers to determine the
payload length.
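A sketch of the two cases under my own naming (not gve's actual code): gso_segs is conventionally the payload length divided by the MSS (gso_size), rounded up.

```c
#include <stdint.h>

/* DIV_ROUND_UP, as in the kernel */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical: gso_segs from payload length and MSS (gso_size). */
static uint32_t gso_segs_for(uint32_t payload_len, uint32_t mss)
{
	return DIV_ROUND_UP(payload_len, mss);
}

/* header-split on: payload sits in the frags, so payload_len is the
 * non-linear length (data_len); header-split off: payload_len is the
 * total length minus the header length obtained by parsing. */
static uint32_t payload_len_for(uint32_t total_len, uint32_t data_len,
				uint32_t parsed_hdr_len, int header_split)
{
	return header_split ? data_len : total_len - parsed_hdr_len;
}
```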
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-3-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, ppp->xmit_pending is used in ppp_send_frame() to pass an skb
to ppp_push(), and holds the skb when a PPP channel cannot immediately
transmit it. This state is redundant because the transmit queue
(ppp->file.xq) can already handle the backlog. Furthermore, during
normal operation, an skb is queued in file.xq only to be immediately
dequeued, causing unnecessary overhead.
Refactor the transmit path to avoid stashing the skb when possible:
- Remove ppp->xmit_pending.
- Rename ppp_send_frame() to ppp_prepare_tx_skb(), and don't call
ppp_push() in it. It returns 1 if the skb is consumed
(dropped/handled) or 0 if it can be passed to ppp_push().
- Update ppp_push() to accept the skb. It returns 1 if the skb is
consumed, or 0 if the channel is busy.
- Optimize __ppp_xmit_process():
- Fastpath: If the queue is empty, attempt to send the skb directly
via ppp_push(). If busy, queue it.
- Slowpath: If the queue is not empty, process the backlog in
file.xq. Split dequeuing loop into a separate function
ppp_xmit_flush() so ppp_channel_push() uses that directly instead of
passing a NULL skb to __ppp_xmit_process().
This simplifies the states and reduces locking in the fastpath.
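The fastpath/slowpath split above can be modeled with a toy FIFO and channel (names and the fixed-size ring are mine, not the driver's); the point is that an empty backlog lets packets bypass the queue entirely while ordering is preserved once a backlog exists:

```c
#include <stdbool.h>
#include <stddef.h>

#define QLEN 8

struct txq { int pkts[QLEN]; size_t head, tail; };
struct channel { bool busy; int sent[QLEN]; size_t nsent; };

static bool q_empty(const struct txq *q) { return q->head == q->tail; }
static void q_put(struct txq *q, int p)  { q->pkts[q->tail++ % QLEN] = p; }
static int  q_peek(const struct txq *q)  { return q->pkts[q->head % QLEN]; }
static void q_drop(struct txq *q)        { q->head++; }

/* ppp_push() analogue: true if consumed, false if the channel is busy */
static bool chan_push(struct channel *c, int pkt)
{
	if (c->busy)
		return false;
	c->sent[c->nsent++] = pkt;
	return true;
}

/* ppp_xmit_flush() analogue: drain the backlog until the channel stalls */
static void xmit_flush(struct txq *q, struct channel *c)
{
	while (!q_empty(q) && chan_push(c, q_peek(q)))
		q_drop(q);
}

/* __ppp_xmit_process() analogue */
static void xmit(struct txq *q, struct channel *c, int pkt)
{
	if (q_empty(q) && chan_push(c, pkt))
		return;		/* fastpath: sent without queueing */
	q_put(q, pkt);		/* channel busy or backlog exists */
	xmit_flush(q, c);	/* slowpath: process the backlog */
}
```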
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260303093219.234403-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Florian Westphal says:
====================
netfilter: updates for net-next
The following patchset contains Netfilter updates for *net-next*,
including changes to IPv6 stack and updates to IPVS from Julian Anastasov.
1) ipv6: export fib6_lookup for nft_fib_ipv6 module
2) factor out ipv6_anycast_destination logic so it's usable without
dst_entry. These are dependencies for patch 3.
3) switch nft_fib_ipv6 module to no longer need temporary dst_entry
object allocations by using fib6_lookup() + RCU.
This gets us ~13% higher packet rate in my tests.
Patches 4 to 8, from Eric Dumazet, zap sk_callback_lock usage in
netfilter. Patch 9 removes another sk_callback_lock instance.
Remaining patches, from Julian Anastasov, improve IPVS. Quoting Julian:
* Add infrastructure for resizable hash tables based on hlist_bl.
* Change the 256-bucket service hash table to be resizable.
* Change the global connection table to be per-net and resizable.
* Make connection hashing more secure for setups with multiple services.
netfilter pull request nf-next-26-03-04
* tag 'nf-next-26-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
ipvs: use more keys for connection hashing
ipvs: switch to per-net connection table
ipvs: use resizable hash table for services
ipvs: add resizable hash tables
rculist_bl: add hlist_bl_for_each_entry_continue_rcu
netfilter: nfnetlink_queue: remove locking in nfqnl_get_sk_secctx
netfilter: nfnetlink_queue: no longer acquire sk_callback_lock
netfilter: nfnetlink_log: no longer acquire sk_callback_lock
netfilter: nft_meta: no longer acquire sk_callback_lock in nft_meta_get_eval_skugid()
netfilter: xt_owner: no longer acquire sk_callback_lock in mt_owner()
netfilter: nf_log_syslog: no longer acquire sk_callback_lock in nf_log_dump_sk_uid_gid()
netfilter: nft_fib_ipv6: switch to fib6_lookup
ipv6: make ipv6_anycast_destination logic usable without dst_entry
ipv6: export fib6_lookup for nft_fib_ipv6
====================
Link: https://patch.msgid.link/20260304114921.31042-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju says:
====================
amd-xgbe: add support for P100a platform
This patch series adds support for the AMD P100a platform featuring
the ethernet controller PCI device ID 0x1122.
The P100a platform uses different register access patterns and speed
encoding compared to previous generation hardware (Yellow Carp, etc.).
Key differences include:
1. Different XPCS window offset calculation due to changed memory mapping
2. 2.5G speed uses XGMII mode (ss=0x06) instead of GMII (ss=0x02)
3. Extended port speed bits (6-bit instead of 5-bit) for 5G support
The series is organized as follows:
Patch 1: Defines macros for MAC version numbers and speed select values
to replace hardcoded magic numbers
Patch 2: Adds the core P100a platform support with PCI ID,
register configuration, and version-specific behavior
Tested on AMD P100a platform verifying:
- 10G/2.5G/1G/100M link establishment
- PHY initialization and auto-negotiation
- No register access errors
====================
Link: https://patch.msgid.link/20260302044409.1388430-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add hardware support for the AMD P100a platform featuring the ethernet
controller PCI device ID 0x1122.
Platform-specific changes include:
1. PCI device ID and register configuration:
- Add XGBE_P100a_PCI_DEVICE_ID (0x1122) for recognition
- Configure platform-specific XPCS window registers
- Disable CDR workaround and RRC for this platform
2. XPCS window offset calculation fix:
The P100a platform uses a different memory mapping scheme for XPCS
register access. The offset calculation differs between platforms:
- Older platforms (YC): offset = base + (addr & mask)
The address is masked first, then added to the window base.
- P100a: offset = (base + addr) & mask
The full address is added to base first, then masked.
This is critical because using the wrong calculation causes register
reads/writes to access incorrect addresses, leading to incorrect
behaviour.
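The two formulas can be written side by side (variable names are illustrative, not the driver's); for the same base, address and mask they can produce different offsets, which is why using the wrong one misdirects register access:

```c
#include <stdint.h>

/* Older platforms (Yellow Carp): mask the address first, then add it
 * to the window base. */
static uint32_t xpcs_offset_yc(uint32_t base, uint32_t addr, uint32_t mask)
{
	return base + (addr & mask);
}

/* P100a: add the full address to the base first, then mask. */
static uint32_t xpcs_offset_p100a(uint32_t base, uint32_t addr, uint32_t mask)
{
	return (base + addr) & mask;
}
```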
3. 2.5G speed mode handling:
P100a uses XGMII mode (ss=0x06) for 2.5G instead of GMII mode
(ss=0x02) used by older platforms. The MAC version check determines
which mode to use.
4. Port speed bits extended:
Extend XP_PROP_0_PORT_SPEEDS from 5 bits to 6 bits to support the
additional 5G speed capability.
5. Rx adaptation disabled:
Rx adaptation is disabled for P100a (MAC version 0x33) as this
feature requires further development for this platform.
6. Rate change command for 2.5G:
Use XGBE_MB_SUBCMD_2_5G_KX subcommand for 2.5G mode on P100a
instead of XGBE_MB_SUBCMD_NONE used on older platforms.
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260302044634.1388661-2-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-03-02 (ice, i40e, ixgbe)
For ice:
Simon Horman adds const modifier to read only member of a struct.
For i40e:
Yury Norov removes an unneeded check of bitmap_weight().
Andy Shevchenko adds a missing include.
For ixgbe:
Aleksandr changes declaration of a bitmap to utilize DECLARE_BITMAP()
macro.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: refactor: use DECLARE_BITMAP for ring state field
i40e: Add missing wordpart.h header
i40e: drop useless bitmap_weight() call in i40e_set_rxfh_fields()
ice: Make name member of struct ice_cgu_pin_desc const
====================
Link: https://patch.msgid.link/20260304000800.3536872-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of using struct timespec64 in scm_timestamping_internal,
use ktime_t, saving 24 bytes of kernel stack.
This makes tcp_update_recv_tstamps() small enough to be inlined.
The ktime_t -> timespec64 conversions happen after socket lock
has been released in tcp_recvmsg(), and only if the application
requested them.
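The 24-byte saving follows from the type sizes on 64-bit; a userspace mirror of the layouts (these struct definitions are simplified stand-ins for the kernel's, assuming a 64-bit ABI):

```c
#include <stdint.h>

typedef int64_t ktime_t;				/* 8 bytes */
struct timespec64 { int64_t tv_sec; long tv_nsec; };	/* 16 bytes on 64-bit */

/* scm_timestamping_internal holds three timestamps */
struct tstamps_old { struct timespec64 ts[3]; };	/* 48 bytes */
struct tstamps_new { ktime_t ts[3]; };			/* 24 bytes */
```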
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/2 grow/shrink: 5/4 up/down: 146/-277 (-131)
Function old new delta
tcp_zerocopy_receive 2383 2425 +42
mptcp_recvmsg 1565 1607 +42
tcp_recvmsg_locked 3797 3823 +26
put_cmsg_scm_timestamping64 131 149 +18
put_cmsg_scm_timestamping 131 149 +18
__pfx_tcp_update_recv_tstamps 16 - -16
do_tcp_getsockopt 4024 4006 -18
tcp_recv_timestamp 474 430 -44
tcp_zc_handle_leftover 417 371 -46
__sock_recv_timestamp 1087 1031 -56
tcp_update_recv_tstamps 97 - -97
Total: Before=25223788, After=25223657, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20260304012747.881644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix some kernel-doc warnings in openvswitch.h:
Mark enum placeholders that are not used as "private" so that kernel-doc
comments are not needed for them.
Correct names for 2 enum values:
Warning: include/uapi/linux/openvswitch.h:300 Excess enum value
'@OVS_VPORT_UPCALL_SUCCESS' description in 'ovs_vport_upcall_attr'
Warning: include/uapi/linux/openvswitch.h:300 Excess enum value
'@OVS_VPORT_UPCALL_FAIL' description in 'ovs_vport_upcall_attr'
Convert one comment from "/**" kernel-doc to a plain C "/*" comment:
Warning: include/uapi/linux/openvswitch.h:638 This comment starts with
'/**', but isn't a kernel-doc comment.
* Omit attributes for notifications.
Add more kernel-doc:
- add kernel-doc for kernel-only enums;
- add missing kernel-doc for enum ovs_datapath_attr;
- add missing kernel-doc for enum ovs_flow_attr;
- add missing kernel-doc for enum ovs_sample_attr;
- add kernel-doc for enum ovs_check_pkt_len_attr;
- add kernel-doc for enum ovs_action_attr;
- add kernel-doc for enum ovs_action_push_eth;
- add kernel-doc for enum ovs_vport_attr;
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/20260304012437.469151-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tcp_do_parse_auth_options() fast path user is tcp_inbound_hash().
Move tcp_do_parse_auth_options() right before tcp_inbound_hash()
so that it can be (auto)inlined by the compiler.
As a bonus, stack canary is removed from tcp_inbound_hash().
Also use EXPORT_IPV6_MOD(tcp_do_parse_auth_options).
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/0 grow/shrink: 1/0 up/down: 131/0 (131)
Function old new delta
tcp_inbound_hash 565 696 +131
Total: Before=25223788, After=25223919, chg +0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260303191243.557245-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says:
====================
rfs: use high-order allocations for hash tables
This series adds rps_tag_ptr which encodes both a pointer
and a size of a power-of-two hash table in a single long word.
RFS hash tables (global and per rx-queue) are converted to rps_tag_ptr.
This removes a cache line miss, and allows high-order allocations.
The global hash table can benefit from huge pages.
====================
Link: https://patch.msgid.link/20260302181432.1836150-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of storing the @log at the beginning of rps_dev_flow_table
use 5 low order bits of the rps_tag_ptr to store the log of the size.
This removes a potential cache line miss (for light traffic).
This allows us to switch to one high-order allocation instead of vmalloc()
when CONFIG_RFS_ACCEL is not set.
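The tag scheme can be sketched as follows (names and layout are my assumptions, not the kernel's actual rps_tag_ptr helpers): pack the table pointer and log2(size) into one unsigned long, relying on the allocation being at least 32-byte aligned so the 5 low bits are free.

```c
#include <stdint.h>

#define RPS_LOG_BITS 5
#define RPS_LOG_MASK ((1UL << RPS_LOG_BITS) - 1)

static unsigned long rps_tag_encode(void *table, unsigned int log)
{
	/* assumes table is at least 32-byte aligned */
	return (unsigned long)table | (log & RPS_LOG_MASK);
}

static void *rps_tag_table(unsigned long tag)
{
	return (void *)(tag & ~RPS_LOG_MASK);
}

static unsigned long rps_tag_mask(unsigned long tag)
{
	/* the table has 1 << log entries; the lookup mask is size - 1 */
	return (1UL << (tag & RPS_LOG_MASK)) - 1;
}
```

Both the pointer and the table size come out of a single load, which is what removes the extra cache line miss.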
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of storing the @mask at the beginning of rps_sock_flow_table,
use 5 low order bits of the rps_tag_ptr to store the log of the size.
This removes a potential cache line miss to fetch @mask.
More importantly, we can switch to vmalloc_huge() without wasting memory.
Tested with:
numactl --interleave=all bash -c "echo 4194304 >/proc/sys/net/core/rps_sock_flow_entries"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
udp_flow_src_port() and psp_write_headers() use ip_local_port_range.
ip_local_port_range is inclusive: all ports between min and max
can be used.
Before this patch, if ip_local_port_range was set to 40000-40001,
40001 would not be used as a source port.
Use reciprocal_scale() to help code readability.
Not tagged for stable trees, as this change could break user
expectations.
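For reference, reciprocal_scale(x, n) maps a 32-bit value into [0, n) as (u64)x * n >> 32, so an inclusive range [min, max] needs n = max - min + 1. A sketch (the port-picking helper name is mine):

```c
#include <stdint.h>

/* Same definition as the kernel's reciprocal_scale(). */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Inclusive range: max - min + 1 possible ports, so both ends
 * of the range are reachable. */
static uint16_t pick_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
	return min + reciprocal_scale(hash, max - min + 1);
}
```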
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260302163933.1754393-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
tools: ynl: tests: adjust Makefile to mimic ksft
Make a few minor adjustments to tools/net/ynl/tests/Makefile
to align its behavior more with how real kselftests behave.
This series allows running the YNL tests in NIPA with little
extra integration effort.
If anyone already integrated these tests into their CI minor
adjustments to the integration may be needed (due to patch 2).
====================
Link: https://patch.msgid.link/20260303163504.2084981-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Until commit 790792ebc9 ("tools: ynl: don't install tests")
YNL selftests were installed with all the other YNL outputs.
That's no longer the case, as tests are not really production
artifacts. Let's not install them in /usr/bin at all, and
mirror kselftest format more closely:
For: make -C tools/net/ynl/tests/ install DESTDIR=tmp
tmp/usr/share/kselftest
├── ktap_helpers.sh
└── ynl
├── test_ynl_cli.sh
└── test_ynl_ethtool.sh
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260303163504.2084981-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Notable features this time:
- cfg80211/mac80211
- finished assoc frame encryption/EPPKE/802.1X-over-auth
(also hwsim)
- radar detection improvements
- 6 GHz incumbent signal detection APIs
- multi-link support for FILS, probe response
templates and client probing
- ath12k:
- monitor mode support on IPQ5332
- basic hwmon temperature reporting
* tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (38 commits)
wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing
wifi: mac80211_hwsim: change hwsim_class to a const struct
wifi: mac80211: give the AP more time for EPPKE as well
wifi: ath12k: Remove the unused argument from the Rx data path
wifi: ath12k: Enable monitor mode support on IPQ5332
wifi: ath12k: Set up MLO after SSR
wifi: ath11k: Silence remoteproc probe deferral prints
wifi: cfg80211: support key installation on non-netdev wdevs
wifi: cfg80211: make cluster id an array
wifi: mac80211: update outdated comment
wifi: mac80211: Advertise IEEE 802.1X authentication support
wifi: mac80211: Add support for IEEE 802.1X authentication protocol in non-AP STA mode
wifi: cfg80211: add support for IEEE 802.1X Authentication Protocol
wifi: mac80211: Advertise EPPKE support based on driver capabilities
wifi: mac80211_hwsim: Advertise support for (Re)Association frame encryption
wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO
wifi: rt2x00: use generic nvmem_cell_get
wifi: mac80211: fetch unsolicited probe response template by link ID
wifi: mac80211: fetch FILS discovery template by link ID
wifi: nl80211: don't allow DFS channels for NAN
...
====================
Link: https://patch.msgid.link/20260304113707.175181-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add UHR Operation and Capability definitions and parsing helpers:
- Define ieee80211_uhr_dps_info, ieee80211_uhr_dbe_info,
ieee80211_uhr_p_edca_info with masks.
- Update ieee80211_uhr_oper_size_ok() to account for optional
DPS/DBE/P-EDCA blocks.
- Move NPCA pointer position after DPS Operation Parameter if it is
present in ieee80211_uhr_oper_size_ok().
- Move NPCA pointer position after DPS info if it is present in
ieee80211_uhr_npca_info().
Signed-off-by: Karthikeyan Kathirvel <karthikeyan.kathirvel@oss.qualcomm.com>
Link: https://patch.msgid.link/20260304085343.1093993-2-karthikeyan.kathirvel@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Simon Kirby reported a long time ago that IPVS connection hashing
based only on the client address/port (caddr, cport) as hash keys
is not suitable for setups that accept traffic on multiple virtual
IPs and ports. It can happen for multiple VIP:VPORT services, for
single or many fwmark service(s) that match multiple virtual IPs
and ports, or even for passive FTP with persistence in DR/TUN mode
where we expect traffic on multiple ports for the virtual IP.
Fix it by adding virtual addresses and ports to the hash function.
This causes the traffic from NAT real servers to clients to use a
second hashing for the in->out direction.
As a result:
- the IN direction from client will use hash node hn0 where
the source/dest addresses and ports used by client will be used
as hash keys
- the OUT direction from NAT real servers will use hash node hn1
for the traffic from real server to client
- the persistence templates are hashed only with parameters based on
the IN direction, so they now will also use the virtual address,
port and fwmark from the service.
OLD:
- all methods: c_list node: proto, caddr:cport
- persistence templates: c_list node: proto, caddr_net:0
- persistence engine templates: c_list node: per-PE, PE-SIP uses jhash
NEW:
- all methods: hn0 node (dir 0): proto, caddr:cport -> vaddr:vport
- MASQ method: hn1 node (dir 1): proto, daddr:dport -> caddr:cport
- persistence templates: hn0 node (dir 0):
proto, caddr_net:0 -> vaddr:vport_or_0
proto, caddr_net:0 -> fwmark:0
- persistence engine templates: hn0 node (dir 0): as before
Also reorder the ip_vs_conn fields, so that hash nodes are on same
read-mostly cache line while write-mostly fields are on separate
cache line.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Use per-net resizable hash table for connections. The global table
is slow to walk when using many namespaces.
The table can be resized in the range of [256 - ip_vs_conn_tab_size].
Table is attached only while services are present. Resizing is done
by delayed work based on load (the number of connections).
Add a hash_key field into the connection to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as the bucket ID; the
remaining part is used to filter the entries in the bucket before
matching the keys and, as a result, helps the lookup access only
one cache line. By knowing the table ID and bucket ID for an entry,
we can unlink it without calculating the hash value and doing
lookup by keys. We need only to validate the saved hash_key under
lock.
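The hash_key layout described above can be sketched like this (names are mine, not IPVS's; the point is that the saved value both selects the bucket and cheaply pre-filters entries before the cache-cold key comparison):

```c
#include <stdbool.h>
#include <stdint.h>

#define HK_TABLE_ID (1U << 31)	/* table ID in the highest bit */

static uint32_t hash_key_make(bool table_id, uint32_t hash)
{
	return (table_id ? HK_TABLE_ID : 0) | (hash & ~HK_TABLE_ID);
}

/* The lowest log bits of the hash select the bucket (log <= 30, so
 * the table-ID bit never leaks into the bucket index). */
static uint32_t hash_key_bucket(uint32_t hk, unsigned int log)
{
	return hk & ((1U << log) - 1);
}

/* Cheap compare of the full saved value before touching the entry's
 * keys; also lets us unlink by saved hash_key without a re-lookup. */
static bool hash_key_prefilter(uint32_t saved_hk, uint32_t lookup_hk)
{
	return saved_hk == lookup_hk;
}
```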
For better security switch from jhash to siphash for the default
connection hashing but the persistence engines may use their own
function. Keeping the hash table loaded with entries below the
size (12%) allows avoiding collisions for 96+% of the conns.
ip_vs_conn_fill_cport() now will rehash the connection with proper
locking because unhash+hash is not safe for RCU readers.
To invalidate the templates, just setting dport to 0xffff is enough;
no need to rehash them. As a result, ip_vs_conn_unhash() is now
unused and removed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Make the hash table for services resizable in the bit range of 4-20.
Table is attached only while services are present. Resizing is done
by delayed work based on load (the number of hashed services).
Table grows when load increases 2+ times (above 12.5% with lfactor=-3)
and shrinks 8+ times when load decreases 16+ times (below 0.78%).
Switch to jhash hashing to reduce the collisions for multiple
services.
Add a hash_key field into the service to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as the bucket ID; the
remaining part is used to filter the entries in the bucket before
matching the keys and, as a result, helps the lookup access only
one cache line. By knowing the table ID and bucket ID for an entry,
we can unlink it without calculating the hash value and doing
lookup by keys. We need only to validate the saved hash_key under
lock.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Add infrastructure for resizable hash tables based on hlist_bl
which we will use in followup patches.
The tables allow RCU lookups during resizing, bucket modifications
are protected with per-bucket bit lock and additional custom locking,
the tables are resized when load reaches thresholds determined based
on load factor parameter.
Compared to other implementations we rely on:
* fast entry removal by using node unlinking without pre-lookup
* entry rehashing when hash key changes
* entries can contain multiple hash nodes
* custom locking depending on different contexts
* adjustable load factor to customize the grow/shrink process
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Change the old hlist_bl_first_rcu to hlist_bl_first_rcu_dereference
to indicate that it is an RCU dereference.
Add hlist_bl_next_rcu and hlist_bl_first_rcu to use RCU pointers
and use them to fix sparse warnings.
Add hlist_bl_for_each_entry_continue_rcu.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
After commit 983512f3a8 ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
nfqnl_put_sk_uidgid() to avoid touching sk->sk_callback_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
After commit 983512f3a8 ("net: Drop the lock in skb_may_tx_timestamp()")
from Sebastian Andrzej Siewior, apply the same logic in
__build_packet_message() to avoid touching sk->sk_callback_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>