linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 15:13:44 -04:00

Author	SHA1	Message	Date
Csókás, Bence	67800d2961	net: fec: Refactor MAC reset to function The core is reset both in `fec_restart()` (called on link-up) and `fec_stop()` (going to sleep, driver remove etc.). These two functions had their separate implementations, which was at first only a register write and a `udelay()` (and the accompanying block comment). However, since then we got soft-reset (MAC disable) and Wake-on-LAN support, which meant that these implementations diverged, often causing bugs. For instance, as of now, `fec_stop()` does not check for `FEC_QUIRK_NO_HARD_RESET`, meaning the MII/RMII mode is cleared on e.g. a PM power-down event; and `fec_restart()` missed the refactor renaming the "magic" constant `1` to `FEC_ECR_RESET`. To harmonize current implementations, and eliminate this source of potential future bugs, refactor implementation to a common function. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Link: https://patch.msgid.link/20250207121255.161146-2-csokas.bence@prolan.hu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-02-11 10:55:25 +01:00
Ido Schimmel	907dd32b4a	mlxsw: Enable Tx checksum offload The device is able to checksum plain TCP / UDP packets over IPv4 / IPv6 when the 'ipcs' bit in the send descriptor is set. Advertise support for the 'NETIF_F_IP{,6}_CSUM' features in net devices registered by the driver and VLAN uppers and set the 'ipcs' bit when the stack requests Tx checksum offload. Note that the device also calculates the IPv4 checksum, but it first zeroes the current checksum so there should not be any difference compared to the checksum calculated by the kernel. On SN5600 (Spectrum-4) there is about 10% improvement in Tx packet rate with 1400 byte packets when using pktgen. Tested on Spectrum-{1,2,3,4} with all the combinations of IPv4 / IPv6, TCP / UDP, with and without VLAN. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8dc86c95474ce10572a0fa83b8adb0259558e982.1738950446.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:10:51 -08:00
Jakub Kicinski	3337064f42	selftests: drv-net: add helper for path resolution Referring to C binaries from Python code is going to be a common need. Add a helper to convert from path in relation to the test. Meaning, if the test is in the same directory as the binary, the call would be simply: cfg.rpath("binary"). The helper name "rpath" is not great. I can't think of a better name that would be accurate yet concise. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250207184140.1730466-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:10:10 -08:00
Jakub Kicinski	29604bc2aa	selftests: drv-net: factor out a DrvEnv base class We have separate Env classes for local tests and tests with a remote endpoint. Make it easier to share the code by creating a base class. Make env loading a method of this class. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250207184140.1730466-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:10:10 -08:00
Jakub Kicinski	a980da54b6	selftests: drv-net: remove an unnecessary libmnl include ncdevmem doesn't need libmnl, remove the unnecessary include. Since YNL doesn't depend on libmnl either, any more, it's actually possible to build selftests without having libmnl installed. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250207183119.1721424-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:09:58 -08:00
Jakub Kicinski	f3737edbc9	Merge branch 'fib-rules-convert-rtm_newrule-and-rtm_delrule-to-per-netns-rtnl' Kuniyuki Iwashima says: ==================== fib: rules: Convert RTM_NEWRULE and RTM_DELRULE to per-netns RTNL. Patch 1 ~ 2 are small cleanup, and patch 3 ~ 8 make fib_nl_newrule() and fib_nl_delrule() hold per-netns RTNL. v1: https://lore.kernel.org/20250206084629.16602-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250207072502.87775-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:55 -08:00
Kuniyuki Iwashima	88b9cfca8d	net: fib_rules: Convert RTM_DELRULE to per-netns RTNL. fib_nl_delrule() is the doit() handler for RTM_DELRULE but also called from vrf_newlink() in case something fails in vrf_add_fib_rules(). In the latter case, RTNL is already held and the 4th arg is true. Let's hold per-netns RTNL in fib_delrule() if rtnl_held is false. Now we can place ASSERT_RTNL_NET() in call_fib_rule_notifiers(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:53 -08:00
Kuniyuki Iwashima	1cf770da01	net: fib_rules: Add error_free label in fib_delrule(). We will hold RTNL just before calling fib_nl2rule_rtnl() in fib_delrule() and release it before kfree(nlrule). Let's add a new label to make the following change cleaner. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:52 -08:00
Kuniyuki Iwashima	98d3a6f681	net: fib_rules: Convert RTM_NEWRULE to per-netns RTNL. fib_nl_newrule() is the doit() handler for RTM_NEWRULE but also called from vrf_newlink(). In the latter case, RTNL is already held and the 4th arg is true. Let's hold per-netns RTNL in fib_newrule() if rtnl_held is false. Note that we call fib_rule_get() before releasing per-netns RTNL to call notify_rule_change() without RTNL and prevent freeing the new rule. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:52 -08:00
Kuniyuki Iwashima	a0596c2c63	net: fib_rules: Factorise fib_newrule() and fib_delrule(). fib_nl_newrule() / fib_nl_delrule() is the doit() handler for RTM_NEWRULE / RTM_DELRULE but also called from vrf_newlink(). Currently, we hold RTNL on both paths but will not on the former. Also, we set dev_net(dev)->rtnl to skb->sk in vrf_fib_rule() because fib_nl_newrule() / fib_nl_delrule() fetch net as sock_net(skb->sk). Let's factorise the two functions and pass net and rtnl_held flag. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:52 -08:00
Kuniyuki Iwashima	5a1ccffd30	ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure(). The following patch will not set skb->sk from VRF path. Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk) in fib[46]_rule_configure(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:52 -08:00
Kuniyuki Iwashima	8b498773c8	net: fib_rules: Split fib_nl2rule(). We will move RTNL down to fib_nl_newrule() and fib_nl_delrule(). Some operations in fib_nl2rule() require RTNL: fib_default_rule_pref() and __dev_get_by_name(). Let's split the RTNL parts as fib_nl2rule_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:52 -08:00
Kuniyuki Iwashima	a9ffd24b55	net: fib_rules: Pass net to fib_nl2rule() instead of skb. skb is not used in fib_nl2rule() other than sock_net(skb->sk), which is already available in callers, fib_nl_newrule() and fib_nl_delrule(). Let's pass net directly to fib_nl2rule(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:52 -08:00
Kuniyuki Iwashima	7b7df666a2	net: fib_rules: Don't check net in rule_exists() and rule_find(). fib_nl_newrule() / fib_nl_delrule() looks up struct fib_rules_ops in sock_net(skb->sk) and calls rule_exists() / rule_find() respectively. fib_nl_newrule() creates a new rule and links it to the found ops, so struct fib_rule never belongs to a different netns's ops->rules_list. Let's remove redundant netns check in rule_exists() and rule_find(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:08:51 -08:00
Jakub Kicinski	51b2483b08	Merge branch 'tun-unify-vnet-implementation' Akihiko Odaki says: ==================== tun: Unify vnet implementation When I implemented virtio's hash-related features to tun/tap [1], I found tun/tap does not fill the entire region reserved for the virtio header, leaving some uninitialized hole in the middle of the buffer after read()/recvmsg(). This series fills the uninitialized hole. More concretely, the num_buffers field will be initialized with 1, and the other fields will be initialized with 0. Setting the num_buffers field to 1 is mandated by virtio 1.0 [2]. The change to virtio header is preceded by another change that refactors tun and tap to unify their virtio-related code. [1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com [2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/ ==================== Link: https://patch.msgid.link/20250207-tun-v6-0-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:13 -08:00
Akihiko Odaki	6a53fc5a87	tap: Use tun's vnet-related code tun and tap implement the same vnet-related features, so reuse the code. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-7-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:11 -08:00
Akihiko Odaki	74212f20f3	tap: Keep hdr_len in tap_get_user() hdr_len is repeatedly used so keep it in a local variable. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-6-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:11 -08:00
Akihiko Odaki	1d41e2fa93	tun: Extract the vnet handling code The vnet handling code will be reused by tap. Functions are renamed to ensure that their names contain "vnet" to clarify that they are part of the decoupled vnet handling code. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-5-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:11 -08:00
Akihiko Odaki	2506251e81	tun: Decouple vnet handling Decouple the vnet handling code so that we can reuse it for tap. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-4-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:11 -08:00
Akihiko Odaki	60df67b948	tun: Decouple vnet from tun_struct Decouple vnet-related functions from tun_struct so that we can reuse them for tap in the future. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-3-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:10 -08:00
Akihiko Odaki	07e8b3bae2	tun: Keep hdr_len in tun_get_user() hdr_len is repeatedly used so keep it in a local variable. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-2-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:10 -08:00
Akihiko Odaki	5a9c5e5d8a	tun: Refactor CONFIG_TUN_VNET_CROSS_LE Check IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE) to save some lines and make future changes easier. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-1-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 19:07:10 -08:00
Jakub Kicinski	d28e2d7f5d	Merge branch 'net-xilinx-axienet-enable-adaptive-irq-coalescing-with-dim' Sean Anderson says: ==================== net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM To improve performance without sacrificing latency under low load, enable DIM. While I appreciate not having to write the library myself, I do think there are many unusual aspects to DIM, as detailed in the last patch. ==================== Link: https://patch.msgid.link/20250206201036.1516800-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 18:53:44 -08:00
Sean Anderson	e1d27d29db	net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM The default RX IRQ coalescing settings of one IRQ per packet can represent a significant CPU load. However, increasing the coalescing unilaterally can result in undesirable latency under low load. Adaptive IRQ coalescing with DIM offers a way to adjust the coalescing settings based on load. This device only supports "CQE" mode [1], where each packet resets the timer. Therefore, an interrupt is fired either when we receive coalesce_count_rx packets or when the interface is idle for coalesce_usec_rx. With this in mind, consider the following scenarios: Link saturated Here we want to set coalesce_count_rx to a large value, in order to coalesce more packets and reduce CPU load. coalesce_usec_rx should be set to at least the time for one packet. Otherwise the link will be "idle" and we will get an interrupt for each packet anyway. Bursts of packets Each burst should be coalesced into a single interrupt, although it may be prudent to reduce coalesce_count_rx for better latency. coalesce_usec_rx should be set to at least the time for one packet so bursts are coalesced. However, additional time beyond the packet time will just increase latency at the end of a burst. Sporadic packets Due to low load, we can set coalesce_count_rx to 1 in order to reduce latency to the minimum. coalesce_usec_rx does not matter in this case. Based on this analysis, I expected the CQE profiles to look something like usec = 0, pkts = 1 // Low load usec = 16, pkts = 4 usec = 16, pkts = 16 usec = 16, pkts = 64 usec = 16, pkts = 256 // High load Where usec is set to 16 to be a few us greater than the 12.3 us packet time of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is instead usec = 2, pkts = 256 // Low load usec = 8, pkts = 128 usec = 16, pkts = 64 usec = 32, pkts = 64 usec = 64, pkts = 64 // High load I found this very surprising. The number of coalesced packets decreases as load increases. 
But as load increases we have more opportunities to coalesce packets without affecting latency as much. Additionally, the profile increases the usec as the load increases. But as load increases, the gaps between packets will tend to become smaller, making it possible to decrease usec for better latency at the end of a "burst". I consider the default CQE profile unsuitable for this NIC. Therefore, we use the first profile outlined in this commit instead. coalesce_usec_rx is set to 16 by default, but the user can customize it. This may be necessary if they are using jumbo frames. I think adjusting the profile times based on the link speed/mtu would be a good improvement for generic DIM. In addition to the above profile problems, I noticed the following additional issues with DIM while testing: - DIM tends to "wander" when at low load, since the performance gradient is pretty flat. If you only have 10p/ms anyway then adjusting the coalescing settings will not affect throughput very much. - DIM takes a long time to adjust back to low indices when load is decreased following a period of high load. This is because it only re-evaluates its settings once every 64 interrupts. However, at low load 64 interrupts can be several seconds. Finally: performance. This patch increases receive throughput with iperf3 from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from 69920/sec to 316/sec, and decreases CPU utilization (4x Cortex-A53) from 43% to 9%. [1] Who names this stuff? Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-5-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 18:53:40 -08:00
Sean Anderson	eb80520e8a	net: xilinx: axienet: Get coalesce parameters from driver state The cr variables now contain the same values as the control registers themselves. Extract/calculate the values from the variables instead of saving the user-specified values. This allows us to remove some bookkeeping, and also lets the user know what the actual coalesce settings are. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-4-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 18:53:40 -08:00
Sean Anderson	d048c717df	net: xilinx: axienet: Support adjusting coalesce settings while running In preparation for adaptive IRQ coalescing, we first need to support adjusting the settings at runtime. The existing code doesn't require any locking because - dma_start is the only function that modifies rx/tx_dma_cr. It is always called with IRQs and NAPI disabled, so nothing else is touching the hardware. - The IRQs don't race with poll, since the latter is a softirq. - The IRQs don't race with dma_stop since they both just clear the control registers. - dma_stop doesn't race with poll since the former is called with NAPI disabled. However, once we introduce another function that modifies rx/tx_dma_cr, we need to have some locking to prevent races. Introduce two locks to protect these variables and their registers. The control register values are now generated where the coalescing settings are set. Converting coalescing settings to control register values may require sleeping because of clk_get_rate. However, the read/modify/write of the control registers themselves can't sleep because it needs to happen in IRQ context. By pre-calculating the control register values, we avoid introducing an additional mutex. Since axienet_dma_start writes the control settings when it runs, we don't bother updating the CR registers when rx/tx_dma_started is false. This prevents any issues from writing to the control registers in the middle of a reset sequence. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 18:53:40 -08:00
Sean Anderson	e76d1ea8cb	net: xilinx: axienet: Combine CR calculation Combine the common parts of the CR calculations for better code reuse. While we're at it, simplify the code a bit. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 18:53:39 -08:00
Aleksander Jan Bajkowski	848b09d53d	r8152: add vendor/device ID pair for Dell Alienware AW1022z The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 17:57:35 -08:00
Jakub Kicinski	7aba666429	Merge branch 'xsk-the-lost-bits-from-chapter-iii' Alexander Lobakin says: ==================== xsk: the lost bits from Chapter III Before introducing libeth_xdp, we need to add a couple more generic helpers. Notably: * 01: add generic loop unrolling hint helpers; * 04: add helper to get both xdp_desc's DMA address and metadata pointer in one go, saving several cycles and hotpath object code size in drivers (especially when unrolling). Bonus: * 02, 03: convert two drivers which were using custom macros to generic unrolled_count() (trivial, no object code changes). ==================== Link: https://patch.msgid.link/20250206182630.3914318-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 17:54:46 -08:00
Alexander Lobakin	23d9324a27	xsk: add helper to get &xdp_desc's DMA and meta pointer in one go Currently, when your driver supports XSk Tx metadata and you want to send an XSk frame, you need to do the following: * call external xsk_buff_raw_get_dma(); * call inline xsk_buff_get_metadata(), which calls external xsk_buff_raw_get_data() and then do some inline checks. This effectively means that the following piece: addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr; is done twice per frame, plus you have 2 external calls per frame, plus this: meta = pool->addrs + addr - pool->tx_metadata_len; if (unlikely(!xsk_buff_valid_tx_metadata(meta))) is always inlined, even if there's no meta or it's invalid. Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that in one go. It returns a small structure with 2 fields: DMA address, filled unconditionally, and metadata pointer, non-NULL only if it's present and valid. The address correction is performed only once and you also have only 1 external call per XSk frame, which does all the calculations and checks outside of your hotpath. You only need to check `if (ctx.meta)` for the metadata presence. To not copy any existing code, derive address correction and getting virtual and DMA address into small helpers. bloat-o-meter reports no object code changes for the existing functionality. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-5-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 17:54:43 -08:00
Alexander Lobakin	2fc6b26ac8	ice: use generic unrolled_count() macro ice, same as i40e, has custom loop unrolling macros for unrolling Tx descriptors filling on XSk xmit. Replace ice defs with generic unrolled_count(), which is also more convenient as it allows passing defines as its argument, not hardcoded values, while the loop declaration will still be a usual for-loop. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-4-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 17:54:43 -08:00
Alexander Lobakin	9144e6f404	i40e: use generic unrolled_count() macro i40e, as well as ice, has a custom loop unrolling macro for unrolling Tx descriptors filling on XSk xmit. Replace i40e defs with generic unrolled_count(), which is also more convenient as it allows passing defines as its argument, not hardcoded values, while the loop declaration will still be a usual for-loop. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 17:54:43 -08:00
Alexander Lobakin	c6594d6427	unroll: add generic loop unroll helpers There are cases when we need to explicitly unroll loops. For example, cache operations, filling DMA descriptors on very high speeds etc. Add compiler-specific attribute macros to give the compiler a hint that we'd like to unroll a loop. Example usage: #define UNROLL_BATCH 8 unrolled_count(UNROLL_BATCH) for (u32 i = 0; i < UNROLL_BATCH; i++) op(priv, i); Note that sometimes the compilers won't unroll loops if they think this would have worse optimization and perf than without unrolling, and that unroll attributes are available only starting GCC 8. For older compiler versions, no hints/attributes will be applied. For better unrolling/parallelization, don't have any variables that interfere between iterations except for the iterator itself. Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-2-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 17:54:43 -08:00
Oleksij Rempel	5b281fe7e3	net: phy: dp83td510: introduce LED framework support Add LED brightness, mode, HW control and polarity functions to enable external LED control in the TI DP83TD510 PHY. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250205103846.2273833-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 17:49:19 -08:00
Jakub Kicinski	39f54262ba	Merge branch 'eth-fbnic-support-rss-contexts-and-ntuple-filters' Jakub Kicinski says: ==================== eth: fbnic: support RSS contexts and ntuple filters Add support for RSS contexts and ntuple filters in fbnic. The device has only one context, intended for use by TCP zero-copy Rx. First two patches add a check we seem to be missing in the core, to avoid having to copy it to all drivers. $ ./drivers/net/hw/rss_ctx.py KTAP version 1 1..16 ok 1 rss_ctx.test_rss_key_indir ok 2 rss_ctx.test_rss_queue_reconfigure ok 3 rss_ctx.test_rss_resize ok 4 rss_ctx.test_hitless_key_update ok 5 rss_ctx.test_rss_context # Failed to create context 2, trying to test what we got ok 6 rss_ctx.test_rss_context4 # SKIP Tested only 1 contexts, wanted 4 # Increasing queue count 44 -> 66 # Failed to create context 2, trying to test what we got ok 7 rss_ctx.test_rss_context32 # SKIP Tested only 1 contexts, wanted 32 # Added only 1 out of 3 contexts ok 8 rss_ctx.test_rss_context_dump # Driver does not support rss + queue offset ok 9 rss_ctx.test_rss_context_queue_reconfigure ok 10 rss_ctx.test_rss_context_overlap ok 11 rss_ctx.test_rss_context_overlap2 # SKIP Test requires at least 2 contexts, but device only has 1 ok 12 rss_ctx.test_rss_context_out_of_order # SKIP Test requires at least 4 contexts, but device only has 1 # Failed to create context 2, trying to test what we got ok 13 rss_ctx.test_rss_context4_create_with_cfg # SKIP Tested only 1 contexts, wanted 4 ok 14 rss_ctx.test_flow_add_context_missing ok 15 rss_ctx.test_delete_rss_context_busy ok 16 rss_ctx.test_rss_ntuple_addition # SKIP Ntuple filter with RSS and nonzero action not supported # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:6 error:0 ==================== Link: https://patch.msgid.link/20250206235334.1425329-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:55 -08:00
Alexander Duyck	5797d3c62d	eth: fbnic: support listing tcam content via debugfs The device has a handful of relatively small TCAM tables, support dumping the driver state via debugfs. # ethtool -N eth0 flow-type tcp6 \ dst-ip 1111::2222 dst-port $((0x1122)) \ src-ip 3333::4444 src-port $((0x3344)) \ action 2 Added rule with ID 47 # cd $dbgfs # cat ip_src Idx S TCAM Bitmap V Addr/Mask ------------------------------------ 00 1 00020000,00000000 6 33330000000000000000000000004444 00000000000000000000000000000000 ... # cat ip_dst Idx S TCAM Bitmap V Addr/Mask ------------------------------------ 00 1 00020000,00000000 6 11110000000000000000000000002222 00000000000000000000000000000000 ... # cat act_tcam Idx S Value/Mask RSS Dest ------------------------------------------------------------------------ ... 49 1 0000 0000 0000 0000 0000 0000 1122 3344 0000 9c00 0088 000f 00000212 ffff ffff ffff ffff ffff ffff 0000 0000 ffff 23ff ff00 ... The ipo_* tables are for outer IP addresses. The tce_* table is for directing/stealing traffic to NC-SI. Signed-off-by: Alexander Duyck <alexanderduyck@meta.com> Link: https://patch.msgid.link/20250206235334.1425329-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:51 -08:00
Jakub Kicinski	d2348b4bf7	selftests: drv-net: rss_ctx: skip tests which need multiple contexts cleanly There's no good API to check how many contexts a device supports. But initial tests sense the context count already, so just store that number and skip tests which we know need more. Link: https://patch.msgid.link/20250206235334.1425329-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:51 -08:00
Alexander Duyck	2230035439	eth: fbnic: support n-tuple filters Add ethtool -n / -N support. Support only "un-ordered" rule sets (RX_CLS_LOC_ANY), just for simplicity of the code. It's unclear anyone actually cares about the rule ordering. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20250206235334.1425329-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:51 -08:00
Alexander Duyck	3a265bd6a3	eth: fbnic: add IP TCAM programming IPv6 addresses are huge so the device has 4 TCAMs used for narrowing them down to a smaller key before the main match / action engine. Add the tables in which we'll keep the IP addresses used by ethtool n-tuple rules. Add the code for programming them into the device, and code for allocating and freeing entries. A bit of copy / paste here as we need to support IPv4 and IPv6 in the same tables, and there are four of them. But it makes the code easier to match up with the device. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20250206235334.1425329-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:50 -08:00
Daniel Zahka	260676ebb1	eth: fbnic: support an additional RSS context Add support for an extra RSS context. The device has a primary and a secondary context. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250206235334.1425329-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:50 -08:00
Jakub Kicinski	23bac39910	selftests: net-drv: test adding flow rule to invalid RSS context Check that adding Rx flow steering rules pointing to an RSS context which does not exist is prevented. Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250206235334.1425329-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:50 -08:00
Jakub Kicinski	de7f7582df	net: ethtool: prevent flow steering to RSS contexts which don't exist Since commit `42dc431f5d` ("ethtool: rss: prevent rss ctx deletion when in use") we prevent removal of RSS contexts pointed to by existing flow rules. Core should also prevent creation of rules which point to RSS contexts which don't exist in the first place. Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250206235334.1425329-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-10 08:26:50 -08:00
David S. Miller	34c84b3948	Merge branch 'netconsole-cpu-population' Breno Leitao says: ==================== netconsole: Add support for CPU population The current implementation of netconsole sends all log messages in parallel, which can lead to an intermixed and interleaved output on the receiving side. This makes it challenging to demultiplex the messages and attribute them to their originating CPUs. As a result, users and developers often struggle to effectively analyze and debug the parallel log output received through netconsole. Example of a message from production hosts: ------------[ cut here ]------------ ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0 refcount_t: addition on 0; use-after-free. WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0 Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E) This series of patches introduces a new feature to the netconsole subsystem that allows the automatic population of the CPU number in the userdata field for each log message. This enhancement provides several benefits: * Improved demultiplexing of parallel log output: When multiple CPUs are sending messages concurrently, the added CPU number in the userdata makes it easier to differentiate and attribute the messages to their originating CPUs. * Better visibility into message sources: The CPU number information gives users and developers more insight into which specific CPU a particular log message came from, which can be valuable for debugging and analysis. The changes in this series are as follows Patches:: Patch "consolidate send buffers into netconsole_target struct" ================================================= Move the static buffers to netconsole target, from static declaration in send_msg_no_fragmentation() and send_msg_fragmented(). 
Patch "netconsole: Rename userdata to extradata" ================================================= Create the concept of extradata, which encompasses the concept of userdata and the upcoming sysdata. Sysdata is a new concept being added, which is basically fields that are populated by the kernel. At this time only the CPU#, but there is a desire to add current task name, kernel release version, etc. Patch "netconsole: Helper to count number of used entries" =========================================================== Create a simple helper to count number of entries in extradata. I am separating this in a function since it will need to count userdata and sysdata. For instance, when the user adds an extra userdata, we need to check if there is space, counting the previous data entries (from userdata and cpu data) Patch "Introduce configfs helpers for sysdata features" ====================================================== Create the concept of sysdata feature in the netconsole target, and create the configfs helpers to enable the bit in nt->sysdata Patch "Include sysdata in extradata entry count" ================================================ Add the concept of sysdata when counting for available space in the buffer. This will protect users from creating new userdata/sysdata if there is no more space Patch "netconsole: add support for sysdata and CPU population" =============================================================== This is the core patch. Basically add a new option to enable automatic CPU number population in the netconsole userdata Provides a new "cpu_nr" sysfs attribute to control this feature Patch "netconsole: selftest: test CPU number auto-population" ============================================================= Expands the existing netconsole selftest to verify the CPU number auto-population functionality Ensures the received netconsole messages contain the expected "cpu=<CPU>" entry in the message. 
Test different permutations with userdata Patch "netconsole: docs: Add documentation for CPU number auto-population" ============================================================================= Updates the netconsole documentation to explain the new CPU number auto-population feature Provides instructions on how to enable and use the feature I believe these changes will be a valuable addition to the netconsole subsystem, enhancing its usefulness for kernel developers and users. PS: This patchset is on top of the patch that created netcons_fragmented_msg selftest: https://lore.kernel.org/all/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@debian.org/ --- Changes in v5: - Fixed a kernel doc syntax error (Simon) - Link to v4: https://lore.kernel.org/r/20250204-netcon_cpu-v4-0-9480266ef556@debian.org Changes in v4: - Fixed Kernel doc for netconsole_target (Simon) - Fixed a typo in disable_sysdata_feature (Simon) - Improved sysdata_cpu_nr_show() to return !! in a bit-wise operation - Link to v3: https://lore.kernel.org/r/20250124-netcon_cpu-v3-0-12a0d286ba1d@debian.org Changes in v3: - Moved the buffer into netconsole_target, avoiding static functions in the send path (Jakub). - Fix a documentation error (Randy Dunlap) - Created a function that handles all the extradata, consolidating it in a single place (Jakub) - Split the patch even more, trying to simplify the review. - Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org Changes in v2: - Create the concept of extradata and sysdata. This will make the design easier to understand, and the code easier to read. * Basically extradata encompasses userdata and the new sysdata. Userdata originates from user, and sysdata originates in kernel. - Improved the test to send from a very specific CPU, which can be checked to be correct on the other side, as suggested by Jakub. 
- Fixed a bug where CPU # was populated at the wrong place - Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org ==================== Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:18 +00:00
Breno Leitao	a7aec70a90	netconsole: docs: Add documentation for CPU number auto-population Update the netconsole documentation to explain the new feature that allows automatic population of the CPU number. The key changes include introducing a new section titled "CPU number auto population in userdata", explaining how to enable the CPU number auto-population feature by writing to the "populate_cpu_nr" file in the netconsole configfs hierarchy. This documentation update ensures users are aware of the new CPU number auto-population functionality and how to leverage it for better demultiplexing and visibility of parallel netconsole output. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:18 +00:00
Breno Leitao	12fd83ca44	netconsole: selftest: test for sysdata CPU Add a new selftest to verify that the netconsole module correctly handles CPU runtime data in sysdata. The test validates three scenarios: 1. Basic CPU sysdata functionality - verifies that cpu=X is appended to messages 2. CPU sysdata with userdata - ensures CPU data works alongside userdata 3. Disabled CPU sysdata - confirms no CPU data is included when disabled The test uses taskset to control which CPU sends messages and verifies the reported CPU matches the one used. This helps ensure that netconsole accurately tracks and reports the originating CPU of messages. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:18 +00:00
Breno Leitao	ec15bc46c6	netconsole: add support for sysdata and CPU population Add infrastructure to automatically append kernel-generated data (sysdata) to netconsole messages. As the first use case, implement CPU number population, which adds the CPU that sent the message. This change introduces three distinct data types: - extradata: The complete set of appended data (sysdata + userdata) - userdata: User-provided key-value pairs from userspace - sysdata: Kernel-populated data (e.g. cpu=XX) The implementation adds a new configfs attribute 'cpu_nr' to control CPU number population per target. When enabled, each message is tagged with its originating CPU. The sysdata is dynamically updated at message time and appended after any existing userdata. The CPU number is formatted as "cpu=XX" and is added to the extradata buffer, respecting the existing size limits. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:18 +00:00
Breno Leitao	2bae25b16a	netconsole: Include sysdata in extradata entry count Modify count_extradata_entries() to include sysdata fields when calculating the total number of extradata entries. This change ensures that the sysdata feature, specifically the CPU number field, is correctly counted against the MAX_EXTRADATA_ITEMS limit. The modification adds a simple check for the CPU_NR flag in the sysdata_fields, incrementing the entry count accordingly. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:18 +00:00
Breno Leitao	364f67837e	netconsole: Introduce configfs helpers for sysdata features This patch introduces a bitfield to store sysdata features in the netconsole_target struct. It also adds configfs helpers to enable or disable the CPU_NR feature, which populates the CPU number in sysdata. The patch provides the necessary infrastructure to set or unset the CPU_NR feature, but does not modify the message itself. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:18 +00:00
Breno Leitao	563fe939a8	netconsole: Helper to count number of used entries Add a helper function nr_extradata_entries() to count the number of used extradata entries in a netconsole target. This refactors the duplicate code for counting entries into a single function, which will be reused by upcoming CPU sysdata changes. The helper uses list_count_nodes() to count the number of children in the userdata group configfs hierarchy. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:18 +00:00
Breno Leitao	4205f6495e	netconsole: Rename userdata to extradata Rename "userdata" to "extradata" since this structure will hold both user and system data in future patches. Keep "userdata" term only for data that comes from userspace (configfs), while "extradata" encompasses both userdata and future kerneldata. These are the rules of the design 1. extradata_complete will hold userdata and sysdata (coming) 2. sysdata will come after userdata_length 3. extradata_complete[userdata_length] string will be replaced at every message 5. userdata is replaced when configfs changes (update_userdata()) 6. sysdata is replaced at every message Example: extradata_complete = "userkey=uservalue cpu=42" userdata_length = 17 sysdata_length = 7 (space (" ") is part of sysdata) Since sysdata is still not available, you will see the following in the send functions: extradata_len = nt->userdata_length; The upcoming patches, which will add support for sysdata, will change it to: extradata_len = nt->userdata_length + sysdata_len; Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-10 15:04:17 +00:00
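
The extradata layout described in the netconsole commits above (userdata first, sysdata appended at offset `userdata_length` and rewritten for every message) can be modelled in a few lines. This is only an illustrative sketch in Python, not the kernel's C implementation: the function names `build_extradata` and `refresh_sysdata` are hypothetical, and only the `cpu=XX` format and the length arithmetic are taken from the commit messages.

```python
from typing import Optional, Tuple


def build_extradata(userdata: str, cpu: Optional[int]) -> Tuple[str, int, int]:
    """Model of the layout: returns (extradata_complete, userdata_length, sysdata_length).

    Hypothetical helper; the leading space is counted as part of sysdata,
    as stated in the "Rename userdata to extradata" commit message.
    """
    userdata_length = len(userdata)
    # sysdata is empty when CPU population is disabled (cpu is None)
    sysdata = f" cpu={cpu}" if cpu is not None else ""
    return userdata + sysdata, userdata_length, len(sysdata)


def refresh_sysdata(extradata_complete: str, userdata_length: int, cpu: int) -> str:
    """Replace everything from userdata_length onward, as happens per message."""
    return extradata_complete[:userdata_length] + f" cpu={cpu}"


if __name__ == "__main__":
    complete, ulen, slen = build_extradata("userkey=uservalue", 42)
    print(repr(complete), ulen, slen)
    # The same target sending a later message from another CPU
    print(repr(refresh_sysdata(complete, ulen, 7)))
```

With the example from the commit message, the userdata part is 17 bytes and the sysdata part is 7 bytes, so the total extradata length is `userdata_length + sysdata_len`, which is 24.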

1335706 Commits