linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-06 00:47:56 -04:00

Author	SHA1	Message	Date
David Arinzon	51d58804a5	net: ena: PHC silent reset Each PHC device kernel registration receives a unique kernel index, which is associated with a new PHC device file located at "/dev/ptp<index>". This device file serves as an interface for obtaining PHC timestamps. Examples of tools that use "/dev/ptp" include testptp [1] and chrony [2]. A reset flow may occur in the ENA driver while PHC is active. During a reset, the driver will unregister and then re-register the PHC device with the kernel. Under race conditions, particularly during heavy PHC loads, the driver’s reset flow might complete faster than the kernel’s PHC unregister/register process. This can result in the PHC index being different from what it was prior to the reset, as the PHC index is selected using kernel ID allocation [3]. While driver rmmod/insmod are done by the user, a reset may occur at any time without the user's awareness. Consequently, the driver might receive a new PHC index after the reset, potentially compromising the user experience. To prevent this issue, the PHC flow will detect the reset during PHC destruction and will skip the PHC unregister/register calls to preserve the kernel PHC index. During the reset flow, any attempt to get a PHC timestamp will fail as expected, but the kernel PHC index will remain unchanged. [1]: https://github.com/torvalds/linux/blob/v6.1/tools/testing/selftests/ptp/testptp.c [2]: https://github.com/mlichvar/chrony [3]: https://www.kernel.org/doc/html/latest/core-api/idr.html Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:57:28 -07:00
David Arinzon	e0ea34158e	net: ena: Add PHC support in the ENA driver The ENA driver will be extended to support the new PHC feature using the ptp_clock interface [1]. This will provide a timestamp reference for user space, allowing it to measure the time offset between the PHC and the system clock in order to achieve nanosecond accuracy. [1] - https://www.kernel.org/doc/html/latest/driver-api/ptp.html Signed-off-by: Amit Bernstein <amitbern@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617110545.5659-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:57:28 -07:00
Jakub Kicinski	253833da4e	Merge branch 'udp_tunnel-remove-rtnl_lock-dependency' Stanislav Fomichev says: ==================== udp_tunnel: remove rtnl_lock dependency Recently bnxt had to grow back a bunch of rtnl dependencies because of udp_tunnel's infra. Add a separate (global) mutex to protect udp_tunnel state. ==================== Link: https://patch.msgid.link/20250616162117.287806-1-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:53 -07:00
Stanislav Fomichev	850d9248d2	Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path" This reverts commit `325eb217e4`. udp_tunnel infra doesn't need RTNL, should be safe to get back to only netdev instance lock. Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-7-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
Stanislav Fomichev	e054c8ba3b	netdevsim: remove udp_ports_sleep Now that there is only one path in udp_tunnel, there is no need to have udp_ports_sleep knob. Remove it and adjust the test. Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-6-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
Stanislav Fomichev	3a321b6b1f	net: remove redundant ASSERT_RTNL() in queue setup functions The existing netdev_ops_assert_locked() already asserts that either the RTNL lock or the per-device lock is held, making the explicit ASSERT_RTNL() redundant. Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-5-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
Stanislav Fomichev	1ead750109	udp_tunnel: remove rtnl_lock dependency Drivers that are using ops lock and don't depend on RTNL lock still need to manage it because of udp_tunnel's RTNL dependency. Introduce a new udp_tunnel_nic_lock and use it instead of rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to grab udp_tunnel_nic_lock mutex and might sleep). Cover more places in v4: - netlink - udp_tunnel_notify_add_rx_port (ndo_open) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_notify_del_rx_port (ndo_stop) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_get_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - udp_tunnel_drop_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_DROP_INFO - udp_tunnel_nic_reset_ntf (ndo_open) - notifiers - udp_tunnel_nic_netdevice_event, depending on the event: - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - triggers NETDEV_UDP_TUNNEL_DROP_INFO - ethnl_tunnel_info_reply_size - udp_tunnel_nic_set_port_priv (two intel drivers) Cc: Michael Chan <michael.chan@broadcom.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
Stanislav Fomichev	df5425b3bd	vxlan: drop sock_lock We won't be able to sleep soon in vxlan_offload_rx_ports and won't be able to grab sock_lock. Instead of having separate spinlock to manage sockets, rely on rtnl lock. This is similar to how geneve manages its sockets. Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250616162117.287806-3-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
Stanislav Fomichev	3e14960f3b	geneve: rely on rtnl lock in geneve_offload_rx_ports udp_tunnel_push_rx_port will grab mutex in the next patch so we can't use rcu. geneve_offload_rx_ports is called from geneve_netdevice_event for NETDEV_UDP_TUNNEL_PUSH_INFO and NETDEV_UDP_TUNNEL_DROP_INFO which both have ASSERT_RTNL. Entries are added to and removed from the sock_list under rtnl lock as well (when adding or removing a tunneling device). Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250616162117.287806-2-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
Yue Haibing	a33556940b	tcp: Remove inet_hashinfo2_free_mod() DCCP was removed, inet_hashinfo2_free_mod() is unused now. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250617130613.498659-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:29:58 -07:00
Heiner Kallweit	d8155c1df5	dpaa_eth: don't use fixed_phy_change_carrier This effectively reverts `6e8b0ff1ba` ("dpaa_eth: Add change_carrier() for Fixed PHYs"). Usage of fixed_phy_change_carrier() requires that fixed_phy_register() has been called before, directly or indirectly. And that's not the case in this driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/7eb189b3-d5fd-4be6-8517-a66671a4e4e3@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:28:54 -07:00
Mark Zhang	e0e3265acf	net/mlx4e: Don't redefine IB_MTU_XXX enum Rely on existing IB_MTU_XXX definitions which exist in ib_verbs.h. Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/382c91ee506e7f1f3c1801957df6b28963484b7d.1750147222.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 14:17:56 -07:00
Simon Horman	a9874d961e	nfc: Remove checks for nla_data returning NULL The implementation of nla_data is as follows: static inline void *nla_data(const struct nlattr *nla) { return (char *) nla + NLA_HDRLEN; } Excluding the case where nla is exactly -NLA_HDRLEN, it will not return NULL. And it seems misleading to assume that it can, other than in this corner case. So drop checks for this condition. Flagged by Smatch. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250617-nfc-null-data-v1-1-c7525ead2e95@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 14:17:32 -07:00
Jakub Kicinski	4f451b977e	Merge branch 'eth-migrate-more-drivers-to-new-rxfh-callbacks' Jakub Kicinski says: ==================== eth: migrate more drivers to new RXFH callbacks Migrate a batch of drivers to the recently added dedicated .get_rxfh_fields and .set_rxfh_fields ethtool callbacks. ==================== Link: https://patch.msgid.link/20250617014848.436741-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:19:08 -07:00
Jakub Kicinski	c2cd2f6125	eth: sxgbe: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). RXFH is all this driver supports in RXNFC so old callbacks are completely removed. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:19:01 -07:00
Jakub Kicinski	20ffe3bbc2	eth: dpaa2: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:19:00 -07:00
Jakub Kicinski	17da66f140	eth: dpaa: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). RXFH is all this driver supports in RXNFC so old callbacks are completely removed. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:19:00 -07:00
Jakub Kicinski	b6f7e4fafe	eth: mvpp2: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:19:00 -07:00
Jakub Kicinski	b82d92dd71	eth: niu: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250617014848.436741-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:19:00 -07:00
Jakub Kicinski	2fca0d1277	Merge branch 'eth-migrate-some-drivers-to-new-rxfh-callbacks' Jakub Kicinski says: ==================== eth: migrate some drivers to new RXFH callbacks Migrate a batch of drivers to the recently added dedicated .get_rxfh_fields and .set_rxfh_fields ethtool callbacks. ==================== Link: https://patch.msgid.link/20250617014555.434790-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:17:52 -07:00
Jakub Kicinski	f99ff3c2a3	eth: otx2: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Link: https://patch.msgid.link/20250617014555.434790-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:17:50 -07:00
Jakub Kicinski	e8b8738439	eth: thunder: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). The driver has no other RXNFC functionality so the SET callback can be now removed. Link: https://patch.msgid.link/20250617014555.434790-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:17:47 -07:00
Jakub Kicinski	e7860a6e18	eth: ena: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). The driver has no other RXNFC functionality so the SET callback can be now removed. Reviewed-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20250617014555.434790-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:17:43 -07:00
Jakub Kicinski	82113468a0	eth: bnxt: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250617014555.434790-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:17:38 -07:00
Jakub Kicinski	f1a6fcc454	eth: bnx2x: migrate to new RXFH callbacks Migrate to new callbacks added by commit `9bb00786fc` ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). The driver has no other RXNFC functionality so the SET callback can be now removed. Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/20250617014555.434790-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 13:17:32 -07:00
David S. Miller	fc4842cd0f	Merge branch 'netconsole-msgid' into main Gustavo Luiz Duarte says: ==================== netconsole: Add support for msgid in sysdata This patch series introduces a new feature to netconsole which allows appending a message ID to the userdata dictionary. If the msgid feature is enabled, the message ID is built from a per-target 32 bit counter that is incremented and appended to every message sent to the target. Example:: echo 1 > "/sys/kernel/config/netconsole/cmdline0/userdata/msgid_enabled" echo "This is message #1" > /dev/kmsg echo "This is message #2" > /dev/kmsg 13,434,54928466,-;This is message #1 msgid=1 13,435,54934019,-;This is message #2 msgid=2 This feature can be used by the target to detect if messages were dropped or reordered before reaching the target. This allows system administrators to assess the reliability of their netconsole pipeline and detect loss of messages due to network contention or temporary unavailability. --- Changes in v3: - Add kdoc documentation for msgcounter. - Link to v2: https://lore.kernel.org/r/20250612-netconsole-msgid-v2-0-d4c1abc84bac@gmail.com Changes in v2: - Use wrapping_assign_add() to avoid warnings in UBSAN and friends. - Improve documentation to clarify wrapping and distinguish msgid from sequnum. - Rebase and fix conflict in prepare_extradata(). - Link to v1: https://lore.kernel.org/r/20250611-netconsole-msgid-v1-0-1784a51feb1e@gmail.com ==================== Suggested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-18 10:46:31 +01:00
Gustavo Luiz Duarte	8c587aa3fa	docs: netconsole: document msgid feature Add documentation explaining the msgid feature in netconsole. This feature appends unique id to the userdata dictionary. The message ID is populated from a per-target 32 bit counter which is incremented for each message sent to the target. This allows a target to detect if messages are dropped before reaching the target. Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-18 10:46:10 +01:00
Gustavo Luiz Duarte	68707c079e	selftests: netconsole: Add tests for 'msgid' feature in sysdata Extend the self-tests to cover the 'msgid' feature in sysdata. Verify that msgid is appended to the message when the feature is enabled and that it is not appended when the feature is disabled. Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-18 10:46:10 +01:00
Gustavo Luiz Duarte	c5efaabd45	netconsole: append msgid to sysdata Add msgcounter to the netconsole_target struct to generate message IDs. If the msgid_enabled attribute is true, increment msgcounter and append msgid=<msgcounter> to sysdata buffer before sending the message. Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-18 10:46:10 +01:00
Gustavo Luiz Duarte	53def0c4c8	netconsole: implement configfs for msgid_enabled Implement the _show and _store functions for the msgid_enabled configfs attribute under userdata. Set the sysdata_fields bit accordingly. Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-18 10:46:10 +01:00
Gustavo Luiz Duarte	15b3c930a2	netconsole: introduce 'msgid' as a new sysdata field This adds a new sysdata field to enable assigning a per-target unique id to each message sent to that target. This id can later be appended as part of sysdata, allowing targets to detect dropped netconsole messages. Update count_extradata_entries() to take the new field into account. Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-18 10:46:10 +01:00
Simon Horman	ec315832f6	dpll: remove documentation of rclk_dev_name Remove documentation of rclk_dev_name member of dpll_device which doesn't exist. Flagged by ./scripts/kernel-doc -none Introduced by commit `9431063ad3` ("dpll: core: Add DPLL framework base functions") Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250616-dpll-member-v1-1-8c9e6b8e1fd4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:53:37 -07:00
Jakub Kicinski	189bd9c873	Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== libeth: add libeth_xdp helper lib Alexander Lobakin says: Time to add XDP helpers infra to libeth to greatly simplify adding XDP to idpf and iavf, as well as improve and extend XDP in ice and i40e. Any vendor is free to reuse helpers. If this happens, I'm fine with moving the folder out of intel/. The helpers greatly simplify building xdp_buff, running a prog, handling the verdict, implement XDP_TX, .ndo_xdp_xmit, XDP buffer completion. Same applies to XSk (with XSk xmit instead of .ndo_xdp_xmit, plus stuff like XSk wakeup). They are entirely generic with no HW definitions or assumptions. HW-specific stuff like parsing Rx desc / filling Tx desc is passed from the driver as inline callbacks. For now, key assumptions that optimize performance / avoid code bloat, but might not fit every driver in drivers/net/: * netmems holding the buffers are always order-0; * driver has separate XDP Tx queues, doesn't use stack queues for that. For best efficiency, you may want to have nr_cpu_ids XDP queues, but less (queue sharing) is also supported; * XDP Tx queues are interrupt-less and use "lazy" cleaning only when there are fewer than 1/4 free Tx descriptors of the queue size; * main target platforms are 64-bit, although 32-bit is also fully supported, but the code might be not as optimized for them. Library code already supports multi-buffer for all kinds of Tx and both header split and no split for Rx and Tx. Frags can come from devmem/io_uring etc., direct `struct page *` is used only for header buffers for which it's always true. Drivers are free to pass their own Rx hints and XSK xmit hints ops. XDP_TX and ndo_xdp_xmit use onstack bulk for the frames to be sent and send them in batches of 16 buffers.
This eats ~280 bytes on the stack, but gives good boosts and allows greatly optimizing the main sending function, leaving it without any error/exception paths. XSk xmit fills Tx descriptors in the loop unrolled by 8. This was proven to improve perf on ice and i40e. XDP_TX and ndo_xdp_xmit don't use unrolling as I wasn't able to get any improvements in those scenarios from this, while +1 Kb for their sending functions for nothing doesn't sound reasonable. XSk wakeup, instead of traditionally used "SW interrupts" provided by NICs, uses IPI to schedule NAPI on the CPU corresponding to the given queue pair. It gives better control over CPU distribution and in general performs way better than "SW interrupts", plus allows us to not pass any HW-specific callbacks there. The code is built the way that all callbacks passed from drivers get inlined; in general, most of hotpath gets inlined. Everything slow/exception lands to .c files in the libeth folder, doesn't create copies in the drivers themselves and doesn't bloat the hotpath. Sure, inlining means that hotpath will be compiled into every driver that uses the lib, but the core code is written in one place, so no copying of bugs happens. Fixed once -- works everywhere. The last commit might look like sort of a hack, but it gives really good boosts and decreases object code size, plus there are checks that all those wider accesses are fully safe, so I don't feel anything bad about it. An example of using libeth_xdp can be found either on my GitHub or on the mailing lists here ("XDP for idpf"). Macros for building driver XDP functions mean that some implementations (XDP_TX, ndo_xdp_xmit etc.) consist of only a few lines.
'200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: libeth: xdp, xsk: access adjacent u32s as u64 where applicable libeth: xsk: add XSkFQ refill and XSk wakeup helpers libeth: xsk: add XSk Rx processing support libeth: xsk: add XSk xmit functions libeth: xsk: add XSk XDP_TX sending helpers libeth: xdp: add RSS hash hint and XDP features setup helpers libeth: xdp: add templates for building driver-side callbacks libeth: xdp: add XDP prog run and verdict result handling libeth: xdp: add helpers for preparing/processing &libeth_xdp_buff libeth: xdp: add XDPSQ cleanup timers libeth: xdp: add XDPSQ locking helpers libeth: xdp: add XDPSQE completion helpers libeth: xdp: add .ndo_xdp_xmit() helpers libeth: xdp: add XDP_TX buffers sending libeth: support native XDP and register memory model libeth: convert to netmem libeth, libie: clean symbol exports up a little ==================== Link: https://patch.msgid.link/20250616201639.710420-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:50:57 -07:00
Jakub Kicinski	8152c4028c	Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy' Mark Bloch says: ==================== net/mlx5e: Add support for devmem and io_uring TCP zero-copy This series adds support for zerocopy rx TCP with devmem and io_uring for ConnectX7 NICs and above. For performance reasons and simplicity HW-GRO will also be turned on when header-data split mode is on. Performance =========== Test setup: * CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (single NUMA) * NIC: ConnectX7 * Benchmarking tool: kperf [0] * Single TCP flow * Test duration: 60s With application thread and interrupts pinned to the same core: \|------+-----------+----------\| \| MTU \| epoll \| io_uring \| \|------+-----------+----------\| \| 1500 \| 61.6 Gbps \| 114 Gbps \| \| 4096 \| 69.3 Gbps \| 151 Gbps \| \| 9000 \| 67.8 Gbps \| 187 Gbps \| \|------+-----------+----------\| The CPU usage for io_uring is 95%. Reproduction steps for io_uring: server --no-daemon -a 2001:db8::1 --no-memcmp --iou --iou_sendzc \ --iou_zcrx --iou_dev_name eth2 --iou_zcrx_queue_id 2 server --no-daemon -a 2001:db8::2 --no-memcmp --iou --iou_sendzc client --src 2001:db8::2 --dst 2001:db8::1 \ --msg-zerocopy -t 60 --cpu-min=2 --cpu-max=2 Patch overview: ================ First, a netmem API for skb_can_coalesce is added to the core to be able to do skb fragment coalescing on netmems. The next patches introduce some cleanups in the internal SHAMPO code and improvements to hw gro capability checks in FW. A separate page_pool is introduced for headers, to be used only when the rxq has a memory provider. Then the driver is converted to use the netmem API and to allow support for unreadable netmem page pool. The queue management ops are implemented. Finally, the tcp-data-split ring parameter is exposed. 
References ========== [0] kperf: git://git.kernel.dk/kperf.git v1: https://lore.kernel.org/20250116215530.158886-1-saeed@kernel.org v2: https://lore.kernel.org/1747950086-1246773-1-git-send-email-tariqt@nvidia.com v3: https://lore.kernel.org/20250609145833.990793-1-mbloch@nvidia.com v4: https://lore.kernel.org/20250610150950.1094376-1-mbloch@nvidia.com v5: https://lore.kernel.org/20250612154648.1161201-1-mbloch@nvidia.com ==================== Link: https://patch.msgid.link/20250616141441.1243044-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:15 -07:00
Dragos Tatulea	5a842c288c	net/mlx5e: Add TX support for netmems Declare netmem TX support in netdev. As required, use the netmem aware dma unmapping APIs for unmapping netmems in tx completion path. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-13-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:13 -07:00
Saeed Mahameed	46bcce5dfd	net/mlx5e: Support ethtool tcp-data-split settings In mlx5, tcp header-data split requires HW GRO to be on. Enabling it fails when HW GRO is off. mlx5e_fix_features now keeps HW GRO on when tcp data split is enabled. Finally, when tcp data split is disabled, features are updated to maybe remove the forced HW GRO. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-12-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:13 -07:00
Saeed Mahameed	b2588ea40e	net/mlx5e: Implement queue mgmt ops and single channel swap The bulk of the work is done in mlx5e_queue_mem_alloc, where we allocate and create the new channel resources, similar to mlx5e_safe_switch_params, but here we do it for a single channel using existing params, sort of a clone channel. To swap the old channel with the new one, we deactivate and close the old channel then replace it with the new one, since the swap procedure doesn't fail in mlx5, we do it all in one place (mlx5e_queue_start). Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250616141441.1243044-11-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:13 -07:00
Saeed Mahameed	db3010bb5a	net/mlx5e: Add support for UNREADABLE netmem page pools On netdev_rx_queue_restart, a special type of page pool may be expected. This patch declares support for UNREADABLE netmem iov pages in the pool params only when header data split shampo RQ mode is enabled, and also sets the queue index in the page pool params struct. Shampo mode requirement: Without header split, rx needs to peek at the data, so we can't do UNREADABLE_NETMEM. The patch also enables the use of a separate page pool for headers when a memory provider is installed for the queue, otherwise the same common page pool continues to be used. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-10-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:12 -07:00
Saeed Mahameed	d1668f1199	net/mlx5e: Convert over to netmem mlx5e_page_frag holds the physical page itself, to naturally support zc page pools, remove physical page reference from mlx5 and replace it with netmem_ref, to avoid internal handling in mlx5 for net_iov backed pages. SHAMPO can issue packets that are not split into header and data. These packets will be dropped if the data part resides in a net_iov as the driver can't read into this area. No performance degradation observed. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:12 -07:00
Saeed Mahameed	e225d9bd93	net/mlx5e: SHAMPO: Separate pool for headers Allow allocating a separate page pool for headers when SHAMPO is on. This will be useful for adding support to zc page pool, which has to be different from the headers page pool. For now, the pools are the same. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:12 -07:00
Saeed Mahameed	d2760abded	net/mlx5e: SHAMPO: Improve hw gro capability checking Add missing HW capabilities, declare the feature in netdev->vlan_features, similar to other features in mlx5e_build_nic_netdev. No functional change here as all by default disabled features are explicitly disabled at the bottom of the function. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:12 -07:00
Saeed Mahameed	16142defd3	net/mlx5e: SHAMPO: Remove redundant params Two SHAMPO params are static and always the same, remove them from the global mlx5e_params struct. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:12 -07:00
Saeed Mahameed	af4312c4c9	net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc Drop redundant SHAMPO structure alloc/free functions. Gather together the function calls pertaining to header split info, and pass headers per WQE (hd_per_wqe) as a parameter to those functions to avoid future use-before-initialization mistakes. Allocate HW GRO related info outside of the header related info scope. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:12 -07:00
Dragos Tatulea	a202f24b08	page_pool: Add page_pool_dev_alloc_netmems helper This is the netmem counterpart of page_pool_dev_alloc_pages() which uses the default GFP flags for RX. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:12 -07:00
Dragos Tatulea	1cbb49f85b	net: Add skb_can_coalesce for netmem Allow drivers that have moved over to netmem to do fragment coalescing. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:11 -07:00
Dragos Tatulea	c9e1225352	net: Allow const args for page_to_netmem() This allows calling page_to_netmem() with a const page * argument. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:34:11 -07:00
Tejun Heo	fd0406e5ca	net: tcp: tsq: Convert from tasklet to BH workqueue The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts TCP Small Queues implementation from tasklet to BH workqueue. Semantically, this is an equivalent conversion and there shouldn't be any user-visible behavior changes. While workqueue's queueing and execution paths are a bit heavier than tasklet's, unless the work item is being queued every packet, the difference hopefully shouldn't matter. My experience with the networking stack is very limited and this patch definitely needs attention from someone who actually understands networking. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/aFBeJ38AS1ZF3Dq5@slm.duckdns.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:29:21 -07:00
Jakub Kicinski	e15962ae74	Merge branch 'ipmr-ip6mr-allow-mc-routing-locally-generated-mc-packets' Petr Machata says: ==================== ipmr, ip6mr: Allow MC-routing locally-generated MC packets Multicast routing is today handled in the input path. Locally generated MC packets don't hit the IPMR code. Thus if a VXLAN remote address is multicast, the driver needs to set an OIF during route lookup. In practice that means that MC routing configuration needs to be kept in sync with the VXLAN FDB and MDB. Ideally, the VXLAN packets would be routed by the MC routing code instead. To that end, this patchset adds support to route locally generated multicast packets. However, an installation that uses a VXLAN underlay netdevice for which it also has matching MC routes, would get a different routing with this patch. Previously, the MC packets would be delivered directly to the underlay port, whereas now they would be MC-routed. In order to avoid this change in behavior, introduce an IPCB/IP6CB flag. Unless the flag is set, the new MC-routing code is skipped. All this is keyed to a new VXLAN attribute, IFLA_VXLAN_MC_ROUTE. Only when it is set does any of the above engage. In addition to that, and as is the case today with MC forwarding, IPV4_DEVCONF_MC_FORWARDING must be enabled for the netdevice that acts as a source of MC traffic (i.e. the VXLAN PHYS_DEV), so an MC daemon must be attached to the netdevice. When a VXLAN netdevice with a MC remote is brought up, the physical netdevice joins the indicated MC group. This is important for local delivery of MC packets, so it is still necessary to configure a physical netdevice -- the parameter cannot go away. The netdevice would however typically not be a front panel port, but a dummy. An MC daemon would then sit on top of that netdevice as well as any front panel ports that it needs to service, and have routes set up between the two. 
A way to configure the VXLAN netdevice to take advantage of the new MC routing would be: # ip link add name d up type dummy # ip link add name vx10 up type vxlan id 1000 dstport 4789 \ local 192.0.2.1 group 225.0.0.1 ttl 16 dev d mcroute # ip link set dev vx10 master br # plus vlans etc. With the following MC routes: (192.0.2.1, 225.0.0.1) iif=d oil=swp1,swp2 # TX route (*, 225.0.0.1) iif=swp1 oil=d,swp2 # RX route (*, 225.0.0.1) iif=swp2 oil=d,swp1 # RX route The RX path has not changed, with the exception of an extra MC hop. Packets are delivered to the front panel port and MC-forwarded to the VXLAN physical port, here "d". Since the port has joined the multicast group, the packets are locally delivered, and end up being processed by the VXLAN netdevice. This patchset is based on earlier patches from Nikolay Aleksandrov and Roopa Prabhu, though it underwent significant changes. Roopa broadly presented the topic at LPC 2019 [0]. Patchset progression: - Patches #1 to #4 add ip_mr_output() - Patches #5 to #10 add ip6_mr_output() - Patch #11 adds the VXLAN bits to enable MR engagement - Patches #12 to #14 prepare selftest libraries - Patch #15 includes a new test suite [0] https://www.youtube.com/watch?v=xlReECfi-uo ==================== Link: https://patch.msgid.link/cover.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:49 -07:00
Petr Machata	e3180379e2	selftests: forwarding: Add a test for verifying VXLAN MC underlay Add tests for MC-routing underlay VXLAN traffic. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/eecd2c0fefc754182e74be8e8e65751bf5749c21.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:46 -07:00
Petr Machata	237f84a6d2	selftests: forwarding: adf_mcd_start(): Allow configuring custom interfaces Tests may wish to add other interfaces to listen on. Notably locally generated traffic uses dummy interfaces. The multicast daemon needs to know about these so that it allows forming rules that involve these interfaces, and so that net.ipv4.conf.X.mc_forwarding is set for the interfaces. To that end, allow passing in a list of interfaces to configure in addition to all the physical ones. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/2e8d83297985933be4850f2b9f296b3c27110388.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:46 -07:00

1367723 Commits