linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 15:13:44 -04:00

Author	SHA1	Message	Date
Yevgeny Kliteynik	9d4024edce	net/mlx5: HWS, force rehash when rule insertion failed Rules are inserted into hash table in accordance with their hash index. When a certain number of rules is reached, the table is rehashed: a bigger new table is allocated and all the rules are moved there. But sometimes a new rule can't be inserted into the hash table because its index is full, even though the number of rules in the table is well below the threshold. The hash function is not perfect, so such cases are not rare. When that happens, we want to do the same rehash, in order to increase the table size and lower the probability for such cases. This patch fixes the usecase where rule insertion was failing, but rehash couldn't be initiated due to low number of rules: it adds flag that denotes that rehash is required, even if the number of rules in the table is below the rehash threshold. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-7-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-13 15:30:25 -07:00
Yevgeny Kliteynik	17e0accac5	net/mlx5: HWS, support complex matchers This patch adds support for Complex Matchers/Rules Overview: -------- A matcher can match on a certain set of match parameters. However, the number and size of match params for a single matcher are limited: all the parameters must fit within a single definer. A common example of this limitation is IPv6 address matching, where matching both source and destination IPs requires more bits than a single definer can support. SW Steering addresses this limitation by chaining multiple Steering Table Entries (STEs) within the same matcher, where each STE matches on a subset of the parameters. In HW Steering, such chaining is not possible — the matcher's STEs are managed in a hash table, and a single definer is used to calculate the hash index for STEs. To address this limitation in HW Steering, we introduce Complex Matchers, which consist of two chained matchers. This allows matching on twice as many parameters. Complex Matchers are filled with Complex Rules — rules that are split into two parts and inserted into their respective matchers. The first half of the Complex Matcher is a regular matcher and points to the second half, which is an Isolated Matcher. An Isolated Matcher has its own isolated table and is accessible only by traffic coming from the first half of the Complex Matcher. This splitting of matchers/rules into multiple parts is transparent to users. It is hidden under the BWC HWS API. It becomes visible only when dumping steering debug information, where the Complex Matcher appears as two separate matchers: one in the user-created table and another in its isolated table. Some implementation details: --------------------------- All user actions are performed on the second part of the rules only. The first part handles matching and applies two actions: modify header (set metadata, see details below) and go-to-table (directing traffic to the isolated table containing the isolated matcher). Rule updates (updating rule actions) are applied to the second part of the rule since user-provided actions are not executed in the first matcher. We use REG_C_6 metadata register to set and match on unique per-rule tag (see details below). Splitting rules into two parts introduces new challenges: 1. Invalid Combinations Consider two rules with different matching values: - Rule 1: A+B - Rule 2: C+D Let's split the rules into two parts as follows: \|---\| \|---\| \| A \| \| B \| \|---\| --> \|---\| \| C \| \| D \| \|---\| \|---\| Splitting these rules results in invalid combinations like A+D and C+B. To resolve this, we assign unique tags to each rule on the first matcher and match these tags on the second matcher (the tag is implemented through modify_hdr action that sets value to metadata register REG_C_6): \|----------\| \|---------\| \| A \| \| B, TagA \| \| action: \| \| \| \| set TagA \| \| \| \|----------\| --> \|---------\| \| C \| \| D, TagB \| \| action: \| \| \| \| set TagB \| \| \| \|----------\| \|---------\| 2. Duplicated Entries: Consider two rules with overlapping values: - Rule 1: A+B - Rule 2: A+D Let's split the rules into two parts as follows: \|---\| \|---\| \| A \| \| B \| \|---\| --> \|---\| \| \| \| D \| \|---\| \|---\| This leads to the duplicated entries on the first matcher, which HWS doesn't allow: subsequent delete of either of the rules will delete the only entry in the first matcher, leaving the remaining rule broken. To address this, we use a reference count for entries in the first matcher and delete STEs only when their refcount reaches zero. Both challenges are resolved by having a per-matcher data structure (implemented with rhashtable) that manages refcounts for the first part of the rules and holds unique tags (managed via IDA) for these rules to set and to match on the second matcher. Limitations: ----------- We utilize metadata register REG_C_6 in this implementation, so its usage anywhere along the steering of the flow that might include the need for Complex Matcher is prohibited. The number and size of match parameters remain limited — now it is constrained by what can be represented by two definers instead of one. This architectural limitation arises from the structure of Complex Matchers. If future requirements demand more parameters, Complex Matchers can be extended beyond two matchers. Additionally, there is an implementation limit of 32 match parameters per rule (disregarding parameter size). This limit can be lifted if needed. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-13 15:30:25 -07:00
Yevgeny Kliteynik	b816743a18	net/mlx5: HWS, introduce isolated matchers In preparation for complex matcher support, introduce the isolated matcher. Isolated matcher is a matcher that has its own isolated table. It is used as the second half of the complex matcher: when the rule is split into two parts (complex rule), then matching on the first part will send the packet to the isolated matcher that will try to match on the second part. In case of miss, the packet goes back to the matcher's end flow table. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-13 15:30:25 -07:00
Yevgeny Kliteynik	3c739d1624	net/mlx5: HWS, expose polling function in header file In preparation for complex matcher, expose the function that is polling queue for completion (mlx5hws_bwc_queue_poll) in header file, so that it will be used by complex matcher code. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-13 15:30:25 -07:00
Yevgeny Kliteynik	fed5f48312	net/mlx5: HWS, add definer function to get field name str In preparation for complex matcher support, add function for converting definer fname to str, which will be used in following patches. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-13 15:30:25 -07:00
Yevgeny Kliteynik	d2338a27fc	net/mlx5: HWS, expose function mlx5hws_table_ft_set_next_ft in header In preparation for complex matcher support, make function mlx5hws_table_ft_set_next_ft() non-static and expose it in header. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1746992290-568936-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-13 15:30:25 -07:00
Paolo Abeni	9f607dc39b	Merge branch 'amd-xgbe-add-support-for-amd-renoir' Raju Rangoju says: ==================== amd-xgbe: add support for AMD Renoir Add support for a new AMD Ethernet device called "Renoir". It has a new PCI ID, add this to the current list of supported devices in the amd-xgbe devices. Also, the BAR1 addresses cannot be used to access the PCS registers on Renoir platform, use the indirect addressing via SMN instead. ==================== Link: https://patch.msgid.link/20250509155325.720499-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:29:43 +02:00
Raju Rangoju	795f86ff05	amd-xgbe: add support for new pci device id 0x1641 Add support for new pci device id 0x1641 to register Renoir device with PCIe. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509155325.720499-6-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:29:41 +02:00
Raju Rangoju	ab95bc9aa7	amd-xgbe: Add XGBE_XPCS_ACCESS_V3 support to xgbe_pci_probe() A new version of XPCS access routines have been introduced, add the support to xgbe_pci_probe() to use these routines. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509155325.720499-5-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:29:41 +02:00
Raju Rangoju	e49479f30e	amd-xgbe: add support for new XPCS routines Add the necessary support to enable Renoir ethernet device. Since the BAR1 address cannot be used to access the XPCS registers on Renoir, use the smn functions. Some of the ethernet add-in-cards have dual PHY but share a single MDIO line (between the ports). In such cases, link inconsistencies are noticed during the heavy traffic and during reboot stress tests. Using smn calls helps avoid such race conditions. Suggested-by: Sudheesh Mavila <sudheesh.mavila@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509155325.720499-4-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:29:41 +02:00
Raju Rangoju	bbbd7303ea	amd-xgbe: reorganize the xgbe_pci_probe() code path Reorganize the xgbe_pci_probe() code path to convert if/else statements to switch case to help add future code. This helps code look cleaner. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509155325.720499-3-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:29:40 +02:00
Raju Rangoju	2d4407160f	amd-xgbe: reorganize the code of XPCS access The xgbe_{read/write}_mmd_regs_v* functions have common code which can be moved to helper functions. Add new helper functions to calculate the mmd_address for v1/v2 of xpcs access. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509155325.720499-2-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:29:40 +02:00
Paolo Abeni	42bd96cb9e	Merge branch 'tools-ynl-gen-support-sub-types-for-binary-attributes' Jakub Kicinski says: ==================== tools: ynl-gen: support sub-types for binary attributes Binary attributes have sub-type annotations which either indicate that the binary object should be interpreted as a raw / C array of a simple type (e.g. u32), or that it's a struct. Use this information in the C codegen instead of outputting void * for all binary attrs. It doesn't make a huge difference in the genl families, but in classic Netlink there is a lot more structs. v1: https://lore.kernel.org/20250508022839.1256059-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250509154213.1747885-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:22:34 +02:00
Jakub Kicinski	25e37418c8	tools: ynl-gen: support struct for binary attributes Support using a struct pointer for binary attrs. Len field is maintained because the structs may grow with newer kernel versions. Or, which matters more, be shorter if the binary is built against newer uAPI than kernel against which it's executed. Since we are storing a pointer to a struct type - always allocate at least the amount of memory needed by the struct per current uAPI headers (unused mem is zeroed). Technically users should check the length field but per modern ASAN checks storing a short object under a pointer seems like a bad idea. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250509154213.1747885-4-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:22:32 +02:00
Jakub Kicinski	9ba8e351ef	tools: ynl-gen: auto-indent else We auto-indent if statements (increase the indent of the subsequent line by 1), do the same thing for else branches without a block. There hasn't been any else branches before but we're about to add one. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250509154213.1747885-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:22:32 +02:00
Jakub Kicinski	02a562bb2b	tools: ynl-gen: support sub-type for binary attributes Sub-type annotation on binary attributes may indicate that the attribute carries an array of simple types (also referred to as "C array" in docs). Support rendering them as such in the C user code. For example for u32, instead of: struct { u32 arr; } _len; void arr; render: struct { u32 arr; } _count; __u32 arr; Note that count is the number of elements while len was the length in bytes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250509154213.1747885-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 13:22:32 +02:00
Paolo Abeni	ac4d1baf97	Merge branch 'device-memory-tcp-tx' Mina Almasry says: ==================== Device memory TCP TX The TX path had been dropped from the Device Memory TCP patch series post RFCv1 [1], to make that series slightly easier to review. This series rebases the implementation of the TX path on top of the net_iov/netmem framework agreed upon and merged. The motivation for the feature is thoroughly described in the docs & cover letter of the original proposal, so I don't repeat the lengthy descriptions here, but they are available in [1]. Full outline on usage of the TX path is detailed in the documentation included with this series. Test example is available via the kselftest included in the series as well. The series is relatively small, as the TX path for this feature largely piggybacks on the existing MSG_ZEROCOPY implementation. Patch Overview: --------------- 1. Documentation & tests to give high level overview of the feature being added. 1. Add netmem refcounting needed for the TX path. 2. Devmem TX netlink API. 3. Devmem TX net stack implementation. 4. Make dma-buf unbinding scheduled work to handle TX cases where it gets freed from contexts where we can't sleep. 5. Add devmem TX documentation. 6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver feature flag, and docs to enable drivers to declare netmem_tx support. 7. Guard netmem_tx against being enabled against drivers that don't support it. 8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to devmem.py. Testing: -------- Testing is very similar to devmem TCP RX path. The ncdevmem test used for the RX path is now augemented with client functionality to test TX path. * Test Setup: Kernel: net-next with this RFC and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Performance results are not included with this version, unfortunately. I'm having issues running the dma-buf exporter driver against the upstream kernel on my test setup. The issues are specific to that dma-buf exporter and do not affect this patch series. I plan to follow up this series with perf fixes if the tests point to issues once they're up and running. Special thanks to Stan who took a stab at rebasing the TX implementation on top of the netmem/net_iov framework merged. Parts of his proposal [2] that are reused as-is are forked off into their own patches to give full credit. [1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.com/ [2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#m066dd407fbed108828e2c40ae50e3f4376ef57fd Cc: sdf@fomichev.me Cc: asml.silence@gmail.com Cc: dw@davidwei.uk Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Victor Nogueira <victor@mojatatu.com> Cc: Pedro Tammela <pctammela@mojatatu.com> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> v14: https://lore.kernel.org/netdev/20250429032645.363766-1-almasrymina@google.com/ v13: https://lore.kernel.org/netdev/20250425204743.617260-1-almasrymina@google.com/ v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.com/ v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.com/ v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.com/ v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.com/ v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.com/ v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.com/ v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.com/ v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.com/ v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.com/ v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=* RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=* ==================== Link: https://patch.msgid.link/20250508004830.4100853-1-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:14:46 +02:00
Mina Almasry	2f1a805f32	selftests: ncdevmem: Implement devmem TCP TX Add support for devmem TX in ncdevmem. This is a combination of the ncdevmem from the devmem TCP series RFCv1 which included the TX path, and work by Stan to include the netlink API and refactored on top of his generic memory_provider support. Signed-off-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508004830.4100853-10-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:49 +02:00
Mina Almasry	ae28cb1147	net: check for driver support in netmem TX We should not enable netmem TX for drivers that don't declare support. Check for driver netmem TX support during devmem TX binding and fail if the driver does not have the functionality. Check for driver support in validate_xmit_skb as well. Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508004830.4100853-9-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:49 +02:00
Mina Almasry	c32532670c	gve: add netmem TX support to GVE DQO-RDA mode Use netmem_dma_*() helpers in gve_tx_dqo.c DQO-RDA paths to enable netmem TX support in that mode. Declare support for netmem TX in GVE DQO-RDA mode. Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250508004830.4100853-8-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:49 +02:00
Mina Almasry	383faec0fd	net: enable driver support for netmem TX Drivers need to make sure not to pass netmem dma-addrs to the dma-mapping API in order to support netmem TX. Add helpers and netmem_dma_*() helpers that enables special handling of netmem dma-addrs that drivers can use. Document in netmem.rst what drivers need to do to support netmem TX. Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508004830.4100853-7-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:49 +02:00
Mina Almasry	17af8cc06a	net: add devmem TCP TX documentation Add documentation outlining the usage and details of the devmem TCP TX API. Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508004830.4100853-6-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:48 +02:00
Mina Almasry	bd61848900	net: devmem: Implement TX path Augment dmabuf binding to be able to handle TX. Additional to all the RX binding, we also create tx_vec needed for the TX path. Provide API for sendmsg to be able to send dmabufs bound to this device: - Provide a new dmabuf_tx_cmsg which includes the dmabuf to send from. - MSG_ZEROCOPY with SCM_DEVMEM_DMABUF cmsg indicates send from dma-buf. Devmem is uncopyable, so piggyback off the existing MSG_ZEROCOPY implementation, while disabling instances where MSG_ZEROCOPY falls back to copying. We additionally pipe the binding down to the new zerocopy_fill_skb_from_devmem which fills a TX skb with net_iov netmems instead of the traditional page netmems. We also special case skb_frag_dma_map to return the dma-address of these dmabuf net_iovs instead of attempting to map pages. The TX path may release the dmabuf in a context where we cannot wait. This happens when the user unbinds a TX dmabuf while there are still references to its netmems in the TX path. In that case, the netmems will be put_netmem'd from a context where we can't unmap the dmabuf, Resolve this by making __net_devmem_dmabuf_binding_free schedule_work'd. Based on work by Stanislav Fomichev <sdf@fomichev.me>. A lot of the meat of the implementation came from devmem TCP RFC v1[1], which included the TX path, but Stan did all the rebasing on top of netmem/net_iov. Cc: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508004830.4100853-5-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:48 +02:00
Stanislav Fomichev	8802087d20	net: devmem: TCP tx netlink api Add bind-tx netlink call to attach dmabuf for TX; queue is not required, only ifindex and dmabuf fd for attachment. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:48 +02:00
Mina Almasry	e9f3d61db5	net: add get_netmem/put_netmem support Currently net_iovs support only pp ref counts, and do not support a page ref equivalent. This is fine for the RX path as net_iovs are used exclusively with the pp and only pp refcounting is needed there. The TX path however does not use pp ref counts, thus, support for get_page/put_page equivalent is needed for netmem. Support get_netmem/put_netmem. Check the type of the netmem before passing it to page or net_iov specific code to obtain a page ref equivalent. For dmabuf net_iovs, we obtain a ref on the underlying binding. This ensures the entire binding doesn't disappear until all the net_iovs have been put_netmem'ed. We do not need to track the refcount of individual dmabuf net_iovs as we don't allocate/free them from a pool similar to what the buddy allocator does for pages. This code is written to be extensible by other net_iov implementers. get_netmem/put_netmem will check the type of the netmem and route it to the correct helper: pages -> [get\|put]_page() dmabuf net_iovs -> net_devmem_[get\|put]_net_iov() new net_iovs -> new helpers Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508004830.4100853-3-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:48 +02:00
Mina Almasry	03e96b8c11	netmem: add niov->type attribute to distinguish different net_iov types Later patches in the series adds TX net_iovs where there is no pp associated, so we can't rely on niov->pp->mp_ops to tell what is the type of the net_iov. Add a type enum to the net_iov which tells us the net_iov type. Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250508004830.4100853-2-almasrymina@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-13 11:12:48 +02:00
Jonas Gorski	e39d14a760	net: dsa: b53: implement setting ageing time b53 supported switches support configuring ageing time between 1 and 1,048,575 seconds, so add an appropriate setter. This allows b53 to pass the FDB learning test for both vlan aware and vlan unaware bridges. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250510092211.276541-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:51:09 -07:00
Jason Xing	b86bcfee30	net: mlx4: add SOF_TIMESTAMPING_TX_SOFTWARE flag when getting ts info As mlx4 has implemented skb_tx_timestamp() in mlx4_en_xmit(), the SOFTWARE flag is surely needed when users are trying to get timestamp information. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250510093442.79711-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:50:30 -07:00
Jakub Kicinski	a96876057b	netlink: fix policy dump for int with validation callback Recent devlink change added validation of an integer value via NLA_POLICY_VALIDATE_FN, for sparse enums. Handle this in policy dump. We can't extract any info out of the callback, so report only the type. Fixes: `429ac62114` ("devlink: define enum for attr types of dynamic attributes") Reported-by: syzbot+01eb26848144516e7f0a@syzkaller.appspotmail.com Link: https://patch.msgid.link/20250509212751.1905149-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:50:09 -07:00
Jakub Kicinski	cc42263172	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Tony Nguyen says: ==================== Prepare for Intel IPU E2000 (GEN3) This is the first part in introducing RDMA support for idpf. ---------------------------------------------------------------- Tatyana Nikolova says: To align with review comments, the patch series introducing RDMA RoCEv2 support for the Intel Infrastructure Processing Unit (IPU) E2000 line of products is going to be submitted in three parts: 1. Modify ice to use specific and common IIDC definitions and pass a core device info to irdma. 2. Add RDMA support to idpf and modify idpf to use specific and common IIDC definitions and pass a core device info to irdma. 3. Add RDMA RoCEv2 support for the E2000 products, referred to as GEN3 to irdma. This first part is a 5 patch series based on the original "iidc/ice/irdma: Update IDC to support multiple consumers" patch to allow for multiple CORE PCI drivers, using the auxbus. Patches: 1) Move header file to new name for clarity and replace ice specific DSCP define with a kernel equivalent one in irdma 2) Unify naming convention 3) Separate header file into common and driver specific info 4) Replace ice specific DSCP define with a kernel equivalent one in ice 5) Implement core device info struct and update drivers to use it ---------------------------------------------------------------- v1: https://lore.kernel.org/20250505212037.2092288-1-anthony.l.nguyen@intel.com IWL reviews: [v5] https://lore.kernel.org/20250416021549.606-1-tatyana.e.nikolova@intel.com [v4] https://lore.kernel.org/20250225050428.2166-1-tatyana.e.nikolova@intel.com [v3] https://lore.kernel.org/20250207194931.1569-1-tatyana.e.nikolova@intel.com [v2] https://lore.kernel.org/20240824031924.421-1-tatyana.e.nikolova@intel.com [v1] https://lore.kernel.org/20240724233917.704-1-tatyana.e.nikolova@intel.com * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: iidc/ice/irdma: Update IDC to support multiple consumers ice: Replace ice specific DSCP mapping num with a kernel define iidc/ice/irdma: Break iidc.h into two headers iidc/ice/irdma: Rename to iidc_* convention iidc/ice/irdma: Rename IDC header file ==================== Link: https://patch.msgid.link/20250509200712.2911060-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:48:27 -07:00
Jakub Kicinski	908aef9a71	Merge branch 'net-vertexcom-mse102x-improve-rx-handling' Stefan Wahren says: ==================== net: vertexcom: mse102x: Improve RX handling This series is the second part of two series for the Vertexcom driver. It contains some improvements for the RX handling of the Vertexcom MSE102x. ==================== Link: https://patch.msgid.link/20250509120435.43646-1-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:46:46 -07:00
Stefan Wahren	8ea6e51e54	net: vertexcom: mse102x: Simplify mse102x_rx_pkt_spi The function mse102x_rx_pkt_spi is used in two cases: * initial polling to re-arm RX interrupt * level based RX interrupt handler Both of them doesn't need an open-coded retry mechanism. In the first case the function can be called again, if the return code is IRQ_NONE. This keeps the error behavior during netdev open. In the second case the proper retry would be handled implicit by the SPI interrupt. So drop the retry code and simplify the receive path. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://patch.msgid.link/20250509120435.43646-7-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:46:44 -07:00
Stefan Wahren	4ecf56f4b6	net: vertexcom: mse102x: Return code for mse102x_rx_pkt_spi The MSE102x doesn't provide any interrupt register, so the only way to handle the level interrupt is to fetch the whole packet from the MSE102x internal buffer via SPI. So in cases the interrupt handler fails to do this, it should return IRQ_NONE. This allows the core to disable the interrupt in case the issue persists and prevent an interrupt storm. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://patch.msgid.link/20250509120435.43646-6-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:46:44 -07:00
Stefan Wahren	6ce9348468	net: vertexcom: mse102x: Implement flag for valid CMD After removal of the invalid command counter only a relevant debug message is left, which can be cumbersome. So add a new flag to debugfs, which indicates whether the driver has ever received a valid CMD. This helps to differentiate between general and temporary receive issues. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250509120435.43646-5-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:46:44 -07:00
Stefan Wahren	aeb90c40ee	net: vertexcom: mse102x: Drop invalid cmd stats There are several reasons for an invalid command response by the MSE102x: * SPI line interferences * MSE102x is in reset or has no firmware * MSE102x is busy * no packet in MSE102x receive buffer So the counter for invalid command isn't very helpful without further context. So drop the confusing statistics counter, but keep the debug messages about "unexpected response" in order to debug possible hardware issues. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250509120435.43646-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:46:44 -07:00
Stefan Wahren	fed56943a8	net: vertexcom: mse102x: Add warning about IRQ trigger type The example of the initial DT binding of the Vertexcom MSE 102x suggested a IRQ_TYPE_EDGE_RISING, which is wrong. So warn everyone to fix their device tree to level based IRQ. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250509120435.43646-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:46:44 -07:00
Stefan Wahren	a29a728666	dt-bindings: vertexcom-mse102x: Fix IRQ type in example According to the MSE102x documentation the trigger type is a high level. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250509120435.43646-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:46:44 -07:00
Matthias Schiffer	6bf7884937	net: phy: dp83867: use 2ns delay if not specified in DTB Most PHY drivers default to a 2ns delay if internal delay is requested and no value is specified. Having a default value makes sense, as it allows a Device Tree to only care about board design (whether there are delays on the PCB or not), and not whether the delay is added on the MAC or the PHY side when needed. Whether the delays are actually applied is controlled by the DP83867_RGMII_*_CLK_DELAY_EN flags, so the behavior is only changed in configurations that would previously be rejected with -EINVAL. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Link: https://patch.msgid.link/e2509b248a11ee29ea408a50c231da4c1fa0863b.1746612711.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:43:35 -07:00
Matthias Schiffer	cc7734e03e	net: phy: dp83867: remove check of delay strap configuration The check that intended to handle "rgmii" PHY mode differently to the RGMII modes with internal delay never worked as intended: - added in commit `2a10154abc` ("net: phy: dp83867: Add TI dp83867 phy"): logic error caused the condition to always evaluate to true - changed in commit `a46fa260f6` ("net: phy: dp83867: Fix warning check for setting the internal delay"): now the condition incorrectly evaluates to false for rgmii-txid - removed in commit `2b89264925` ("net: phy: dp83867: Set up RGMII TX delay") Around the time of the removal, commit `c11669a275` ("net: phy: dp83867: Rework delay rgmii delay handling") started clearing the delay enable flags in RGMIICTL. The change attempted to preserve the historical behavior of not disabling internal delays with "rgmii" PHY mode and also documented this in a comment, but due to a conflict between "Set up RGMII TX delay" and "Rework delay rgmii delay handling", the behavior dp83867_verify_rgmii_cfg() warned about (and that was also described in a comment in dp83867_config_init()) disappeared in the following merge of net into net-next in commit `b4b12b0d2f` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net"). While is doesn't appear that this breaking change was intentional, it has been like this since 2019, and the new behavior to disable the delays with "rgmii" PHY mode is generally desirable - in particular with MAC drivers that have to fix up the delay mode, resulting in the PHY driver not even seeing the same mode that was specified in the Device Tree. Remove the obsolete check and comment. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Link: https://patch.msgid.link/8a286207cd11b460bb0dbd27931de3626b9d7575.1746612711.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:43:35 -07:00
Lad Prabhakar	6b466efc63	dt-bindings: net: renesas-gbeth: Add support for RZ/V2N (R9A09G056) SoC Document support for the GBETH IP found on the Renesas RZ/V2N (R9A09G056) SoC. The GBETH controller on the RZ/V2N SoC is functionally identical to the one found on the RZ/V2H(P) (R9A09G057) SoC, so `renesas,rzv2h-gbeth` will be used as a fallback compatible. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20250507173551.100280-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:12:40 -07:00
Jakub Kicinski	e9c392a155	Merge branch 'selftests-net-configure-rp_filter-in-setup_ns' Hangbin Liu says: ==================== selftests: net: configure rp_filter in setup_ns Some distributions enable rp_filter globally by default, which can interfere with various test cases. To address this, many tests explicitly disable rp_filter within their scripts. To avoid duplication and ensure consistent behavior across tests, this patch moves the rp_filter configuration into setup_ns, applied immediately after a new namespace is created. This change ensures that all namespace-based tests inherit the appropriate rp_filter settings, simplifying individual test scripts and improving maintainability. ==================== Link: https://patch.msgid.link/20250508081910.84216-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:57 -07:00
Hangbin Liu	b83d98c1db	selftests: mptcp: remove rp_filter configuration Remove the rp_filter configuration from MPTCP tests, as it is now handled by setup_ns. Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250508081910.84216-7-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:56 -07:00
Hangbin Liu	7c8b89ec50	selftests: netfilter: remove rp_filter configuration Remove the rp_filter configuration in netfilter lib, as setup_ns already sets it appropriately by default Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250508081910.84216-6-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:56 -07:00
Hangbin Liu	3f68f59e95	selftests: net: use setup_ns for SRv6 tests and remove rp_filter configuration Some SRv6 tests manually set up network namespaces and disable rp_filter. Since the setup_ns library function already handles rp_filter configuration, convert these SRv6 tests to use setup_ns and remove the redundant rp_filter settings. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20250508081910.84216-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:55 -07:00
Hangbin Liu	69ea46e7d0	selftests: net: use setup_ns for bareudp testing Switch bareudp testing to use setup_ns, which sets up rp_filter by default. This allows us to remove the manual rp_filter configuration from the script. Additionally, since setup_ns handles namespace naming and cleanup, we no longer need a separate cleanup function. We also move the trap setup earlier in the script, before the test setup begins. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250508081910.84216-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:55 -07:00
Hangbin Liu	50ad88d576	selftests: net: remove redundant rp_filter configuration The following tests use setup_ns to create a network namespace, which will disables rp_filter immediately after namespace creation. Therefore, it is no longer necessary to disable rp_filter again within these individual tests. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250508081910.84216-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:55 -07:00
Hangbin Liu	ce17831f8e	selftests: net: disable rp_filter after namespace initialization Some distributions enable rp_filter globally by default. To ensure consistent behavior across environments, we explicitly disable it in several test cases. This patch moves the rp_filter disabling logic to immediately after the network namespace is initialized. With this change, individual test cases with creating namespace via setup_ns no longer need to disable rp_filter again. This helps avoid redundancy and ensures test consistency. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250508081910.84216-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:55 -07:00
Vladimir Oltean	c14e1ecefd	net: ixp4xx_eth: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() New timestamping API was introduced in commit `66f7223039` ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the intel ixp4xx ethernet driver to the new API, so that the ndo_eth_ioctl() path can be removed completely. hwtstamp_get() and hwtstamp_set() are only called if netif_running() when the code path is engaged through the legacy ioctl. As I don't want to make an unnecessary functional change which I can't test, preserve that restriction when going through the new operations. When cpu_is_ixp46x() is false, the execution of SIOCGHWTSTAMP and SIOCSHWTSTAMP falls through to phy_mii_ioctl(), which may process it in case of a timestamping PHY, or may return -EOPNOTSUPP. In the new API, the core handles timestamping PHYs directly and does not call the netdev driver, so just return -EOPNOTSUPP directly for equivalent logic. A gratuitous change I chose to do anyway is prefixing hwtstamp_get() and hwtstamp_set() with the driver name, ipx4xx. This reflects modern coding sensibilities, and we are touching the involved lines anyway. The remainder of eth_ioctl() is exactly equivalent to phy_do_ioctl_running(), so use that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20250508211043.3388702-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:10:24 -07:00
Jakub Kicinski	ef5224ed25	selftests: drv-net: ping: make sure the ping test restores checksum offload The ping test flips checksum offload on and off. Make sure the original value is restored if test fails. Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250508214005.1518013-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:08:13 -07:00
Stanislav Fomichev	2451d3fb38	net/mlx5: support software TX timestamp Having a software timestamp (along with existing hardware one) is useful to trace how the packets flow through the stack. mlx5e_tx_skb_update_hwts_flags is called from tx paths to setup HW timestamp; extend it to add software one as well. Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250508235109.585096-1-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-12 18:07:43 -07:00

1 2 3 4 5 ...

1353678 Commits