Commit Graph

1352101 Commits

Author SHA1 Message Date
Jakub Kicinski
a1980cc962 Merge branch 'implement-udp-tunnel-port-for-txgbe'
Jiawen Wu says:

====================
Implement udp tunnel port for txgbe

v3: https://lore.kernel.org/20250417080328.426554-1-jiawenwu@trustnetic.com
v2: https://lore.kernel.org/20250414091022.383328-1-jiawenwu@trustnetic.com
v1: https://lore.kernel.org/20250410074456.321847-1-jiawenwu@trustnetic.com
====================

Link: https://patch.msgid.link/20250421022956.508018-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:53:40 -07:00
Jiawen Wu
3b05aa997c net: wangxun: restrict feature flags for tunnel packets
Implement ndo_features_check to restrict Tx checksum offload flags, since
there are some inner layer length and protocols unsupported.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20250421022956.508018-3-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:53:35 -07:00
Jiawen Wu
f294516f1f net: txgbe: Support to set UDP tunnel port
Tunnel types VXLAN/VXLAN_GPE/GENEVE are supported for txgbe devices. The
hardware supports to set only one port for each tunnel type.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250421022956.508018-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:53:35 -07:00
Jakub Kicinski
84ee6e5040 Merge branch 'net-followup-series-for-exit_rtnl'
Kuniyuki Iwashima says:

====================
net: Followup series for ->exit_rtnl().

Patch 1 drops the hold_rtnl arg in ops_undo_list() as suggested by Jakub.
Patch 2 & 3 apply ->exit_rtnl() to pfcp and ppp.

v1: https://lore.kernel.org/20250415022258.11491-1-kuniyu@amazon.com
====================

Link: https://patch.msgid.link/20250418003259.48017-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:07:45 -07:00
Kuniyuki Iwashima
7ee32072c7 ppp: Split ppp_exit_net() to ->exit_rtnl().
ppp_exit_net() unregisters devices related to the netns under
RTNL and destroys lists and IDR.

Let's use ->exit_rtnl() for the device unregistration part to
save RTNL dances for each netns.

Note that we delegate the for_each_netdev_safe() part to
default_device_exit_batch() and replace unregister_netdevice_queue()
with ppp_nl_dellink() to align with bond, geneve, gtp, and pfcp.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418003259.48017-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:07:41 -07:00
Kuniyuki Iwashima
81eccc131b pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
pfcp_net_exit() holds RTNL and cleans up all devices in the netns
and other devices tied to sockets in the netns.

We can use ->exit_rtnl() to save RTNL dance for all dying netns.

Note that we delegate the for_each_netdev() part to
default_device_exit_batch() to avoid a list corruption splat
like the one reported in commit 4ccacf8649 ("gtp: Suppress
list corruption splat in gtp_net_exit_batch_rtnl().")

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418003259.48017-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:07:41 -07:00
Kuniyuki Iwashima
434efd3d0c net: Drop hold_rtnl arg from ops_undo_list().
ops_undo_list() first iterates over ops_list for ->pre_exit().

Let's check if any of the ops has ->exit_rtnl() there and drop
the hold_rtnl argument.

Note that nexthop uses ->exit_rtnl() and is built-in, so hold_rtnl
is always true for setup_net() and cleanup_net() for now.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250414170148.21f3523c@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418003259.48017-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:07:41 -07:00
Russell King (Oracle)
21b01cb8e8 net: stmmac: visconti: convert to set_clk_tx_rate() method
Convert visconti to use the set_clk_tx_rate() method. By doing so,
the GMAC control register will already have been updated (unlike with
the fix_mac_speed() method) so this code can be removed while porting
to the set_clk_tx_rate() method.

There is also no need for the spinlock, and has never been - neither
fix_mac_speed() nor set_clk_tx_rate() can be called by more than one
thread at a time, so the lock does nothing useful.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u5SiQ-001I0B-OQ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:06:02 -07:00
Colin Ian King
f7ca612018 net: dsa: rzn1_a5psw: Make the read-only array offsets static const
Don't populate the read-only array offsets on the stack at run time,
instead make it static const.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://patch.msgid.link/20250417161353.490219-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 19:05:04 -07:00
Jakub Kicinski
eff59eb102 Merge branch 'add-gbeth-glue-layer-driver-for-renesas-rz-v2h-p-soc'
Lad Prabhakar says:

====================
Add GBETH glue layer driver for Renesas RZ/V2H(P) SoC

This patch series adds support for the GBETH (Gigabit Ethernet) glue layer
driver for the Renesas RZ/V2H(P) SoC. The GBETH IP is integrated with
the Synopsys DesignWare MAC (version 5.20). The changes include updating
the device tree bindings, documenting the GBETH bindings, and adding the
DWMAC glue layer for the Renesas GBETH.

v1: https://lore.kernel.org/20250302181808.728734-1-prabhakar.mahadev-lad.rj@bp.renesas.com
====================

Link: https://patch.msgid.link/20250417084015.74154-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 18:49:27 -07:00
Lad Prabhakar
326976b055 MAINTAINERS: Add entry for Renesas RZ/V2H(P) DWMAC GBETH glue layer driver
Add a new MAINTAINERS entry for the Renesas RZ/V2H(P) DWMAC GBETH
glue layer driver.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250417084015.74154-5-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 18:49:20 -07:00
Lad Prabhakar
461f6529e5 net: stmmac: Add DWMAC glue layer for Renesas GBETH
Add the DWMAC glue layer for the GBETH IP found in the Renesas RZ/V2H(P)
SoC.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patch.msgid.link/20250417084015.74154-4-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 18:48:47 -07:00
Lad Prabhakar
8fff7ae84d dt-bindings: net: Document support for Renesas RZ/V2H(P) GBETH
GBETH IP on the Renesas RZ/V2H(P) SoC is integrated with Synopsys
DesignWare MAC (version 5.20). Document the device tree bindings for
the GBETH glue layer.

Generic compatible string 'renesas,rzv2h-gbeth' is added since this
module is identical on both the RZ/V2H(P) and RZ/G3E SoCs.

The Rx/Tx clocks supplied for GBETH on the RZ/V2H(P) SoC is depicted
below:

                      Rx / Tx
-------+------------- on / off -------
       |
       |            Rx-180 / Tx-180
       +---- not ---- on / off -------

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250417084015.74154-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 18:48:47 -07:00
Lad Prabhakar
8c989368c0 dt-bindings: net: dwmac: Increase 'maxItems' for 'interrupts' and 'interrupt-names'
Increase the `maxItems` value for the `interrupts` and `interrupt-names`
properties to 11 to support additional per-channel Tx/Rx completion
interrupts on the Renesas RZ/V2H(P) SoC, which features the
`snps,dwmac-5.20` IP.

Refactor the `interrupt-names` property by replacing repeated `enum`
entries with a `oneOf` list. Add support for per-channel receive and
transmit completion interrupts using regex patterns.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250417084015.74154-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 18:48:47 -07:00
Krzysztof Kozlowski
a7696fb251 ptp: Do not enable by default during compile testing
Enabling the compile test should not cause automatic enabling of all
drivers, but only allow to choose to compile them.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417074643.81448-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 18:43:10 -07:00
Kees Cook
f0f149d974 emulex/benet: Annotate flash_cookie as nonstring
GCC 15's new -Wunterminated-string-initialization notices that the 32
character "flash_cookie" (which is not used as a C-String)
needs to be marked as "nonstring":

drivers/net/ethernet/emulex/benet/be_cmds.c:2618:51: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Wunterminated-string-initialization]
 2618 | static char flash_cookie[2][16] = {"*** SE FLAS", "H DIRECTORY *** "};
      |                                                   ^~~~~~~~~~~~~~~~~~

Add this annotation, avoid using a multidimensional array, but keep the
string split (with a comment about why). Additionally mark it const
and annotate the "cookie" member that is being memcmp()ed against as
nonstring too.

Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250416221028.work.967-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 18:22:12 -07:00
Jakub Kicinski
044412d9b6 Merge branch 'net-phy-dp83822-add-support-for-changing-the-mac-series-termination'
Dimitri Fedrau says:

====================
net: phy: dp83822: Add support for changing the MAC series termination

The dp83822 provides the possibility to set the resistance value of the
the MAC series termination. Modifying the resistance to an appropriate
value can reduce signal reflections and therefore improve signal quality.

v2: https://lore.kernel.org/20250408-dp83822-mac-impedance-v2-0-fefeba4a9804@liebherr.com
v1: https://lore.kernel.org/20250307-dp83822-mac-impedance-v1-0-bdd85a759b45@liebherr.com
====================

Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-0-028ac426cddb@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 17:50:01 -07:00
Dimitri Fedrau
6c3c3c230a net: phy: dp83822: Add support for changing the MAC termination
The dp83822 provides the possibility to set the resistance value of the
the MAC termination. Modifying the resistance to an appropriate value can
reduce signal reflections and therefore improve signal quality.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-4-028ac426cddb@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 17:49:48 -07:00
Dimitri Fedrau
145436ae01 net: phy: Add helper for getting MAC termination resistance
Add helper which returns the MAC termination resistance value. Modifying
the resistance to an appropriate value can reduce signal reflections and
therefore improve signal quality.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-3-028ac426cddb@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 17:49:48 -07:00
Dimitri Fedrau
1de1390ee0 dt-bindings: net: dp83822: add constraints for mac-termination-ohms
Property mac-termination-ohms is defined in ethernet-phy.yaml. Add allowed
values for the property.

Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-2-028ac426cddb@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 17:49:47 -07:00
Dimitri Fedrau
4cb6316d33 dt-bindings: net: ethernet-phy: add property mac-termination-ohms
Add property mac-termination-ohms in the device tree bindings for selecting
the resistance value of the builtin series termination resistors of the
PHY. Changing the resistance to an appropriate value can reduce signal
reflections and therefore improve signal quality.

Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-1-028ac426cddb@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 17:49:47 -07:00
Heiner Kallweit
b7ed5d5a78 r8169: use pci_prepare_to_sleep in rtl_shutdown
Use pci_prepare_to_sleep() like PCI core does in pci_pm_suspend_noirq.
This aligns setting a low-power mode during shutdown with the handling
of the transition to system suspend. Also the transition to runtime
suspend uses pci_target_state() instead of setting D3hot unconditionally.

Note: pci_prepare_to_sleep() uses device_may_wakeup() to check whether
      device may generate wakeup events. So we don't lose anything by
      not passing tp->saved_wolopts any longer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/f573fdbd-ba6d-41c1-b68f-311d3c88db2c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 17:22:02 -07:00
Dr. David Alan Gilbert
67b083f14f octeontx2-af: Remove unused rvu_npc_enable_bcast_entry
The last use of rvu_npc_enable_bcast_entry() was removed in 2021 by
commit 967db3529e ("octeontx2-af: add support for multicast/promisc
packet replication feature")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250420225810.171852-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 17:20:23 -07:00
Dr. David Alan Gilbert
45bd443bfd net: 802: Remove unused p8022 code
p8022.c defines two external functions, register_8022_client()
and unregister_8022_client(), the last use of which was removed in
2018 by
commit 7a2e838d28 ("staging: ipx: delete it from the tree")

Remove the p8022.c file, it's corresponding header, and glue
surrounding it.  There was one place the header was included in vlan.c
but it didn't use the functions it declared.

There was a comment in net/802/Makefile about checking
against net/core/Makefile, but that's at least 20 years old and
there's no sign of net/core/Makefile mentioning it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250418011519.145320-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22 07:04:02 -07:00
Justin Lai
37f2f2fe26 rtase: Add ndo_setup_tc support for CBS offload in traffic control setup
Add support for ndo_setup_tc to enable CBS offload functionality as
part of traffic control configuration for network devices, where CBS
is applied from the CPU to the switch. More specifically, CBS is
applied at the GMAC in the topmost architecture diagram.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250416115757.28156-1-justinlai0215@realtek.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 13:48:16 +02:00
Dan Carpenter
b77ad30c42 rxrpc: rxgk: Set error code in rxgk_yfs_decode_ticket()
Propagate the error code if key_alloc() fails.  Don't return
success.

Fixes: 9d1d2b5934 ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/Z_-P_1iLDWksH1ik@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 13:42:27 +02:00
Paolo Abeni
a86aa9c247 Merge branch 'ionic-support-qsfp-cmis'
Shannon Nelson says:

====================
ionic: support QSFP CMIS

This patchset sets up support for additional pages and better
handling of the QSFP CMIS data.

v1: https://lore.kernel.org/netdev/20250411182140.63158-1-shannon.nelson@amd.com/
====================

Link: https://patch.msgid.link/20250415231317.40616-1-shannon.nelson@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 12:27:32 +02:00
Shannon Nelson
0651c83ea9 ionic: add module eeprom channel data to ionic_if and ethtool
Make the CMIS module type's page 17 channel data available for
ethtool to request.  As done previously, carve space for this
data from the port_info reserved space.

In the future, if additional pages are needed, a new firmware
AdminQ command will be added for accessing random pages.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250415231317.40616-4-shannon.nelson@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 12:27:30 +02:00
Shannon Nelson
9c2e17d30b ionic: support ethtool get_module_eeprom_by_page
Add support for the newer get_module_eeprom_by_page interface.
Only the upper half of the 256 byte page is available for
reading, and the firmware puts the two sections into the
extended sprom buffer, so a union is used over the extended
sprom buffer to make clear which page is to be accessed.

With get_module_eeprom_by_page implemented there is no need
for the older get_module_info or git_module_eeprom interfaces,
so remove them.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250415231317.40616-3-shannon.nelson@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 12:27:30 +02:00
Shannon Nelson
c51ab838f5 ionic: extend the QSFP module sprom for more pages
Some QSFP modules have more eeprom to be read by ethtool than
the initial high and low page 0 that is currently available
in the DSC's ionic sprom[] buffer.  Since the current sprom[]
is baked into the middle of an existing API struct, to make
the high end of page 1 and page 2 available a block is carved
from a reserved space of the existing port_info struct and the
ionic_get_module_eeprom() service is taught how to get there.

Newer firmware writes the additional QSFP page info here,
yet this remains backward compatible because older firmware
sets this space to all 0 and older ionic drivers do not use
the reserved space.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250415231317.40616-2-shannon.nelson@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 12:27:30 +02:00
Paolo Abeni
30af0cb310 Merge branch 'vxlan-convert-fdb-table-to-rhashtable'
Ido Schimmel says:

====================
vxlan: Convert FDB table to rhashtable

The VXLAN driver currently stores FDB entries in a hash table with a
fixed number of buckets (256), resulting in reduced performance as the
number of entries grows. This patchset solves the issue by converting
the driver to use rhashtable which maintains a more or less constant
performance regardless of the number of entries.

Measured transmitted packets per second using a single pktgen thread
with varying number of entries when the transmitted packet always hits
the default entry (worst case):

Number of entries | Improvement
------------------|------------
1k                | +1.12%
4k                | +9.22%
16k               | +55%
64k               | +585%
256k              | +2460%

The first patches are preparations for the conversion in the last patch.
Specifically, the series is structured as follows:

Patch #1 adds RCU read-side critical sections in the Tx path when
accessing FDB entries. Targeting at net-next as I am not aware of any
issues due to this omission despite the code being structured that way
for a long time. Without it, traces will be generated when converting
FDB lookup to rhashtable_lookup().

Patch #2-#5 simplify the creation of the default FDB entry (all-zeroes).
Current code assumes that insertion into the hash table cannot fail,
which will no longer be true with rhashtable.

Patches #6-#10 add FDB entries to a linked list for entry traversal
instead of traversing over them using the fixed size hash table which is
removed in the last patch.

Patches #11-#12 add wrappers for FDB lookup that make it clear when each
should be used along with lockdep annotations. Needed as a preparation
for rhashtable_lookup() that must be called from an RCU read-side
critical section.

Patch #13 treats dst cache initialization errors as non-fatal. See more
info in the commit message. The current code happens to work because
insertion into the fixed size hash table is slow enough for the per-CPU
allocator to be able to create new chunks of per-CPU memory.

Patch #14 adds an FDB key structure that includes the MAC address and
source VNI. To be used as rhashtable key.

Patch #15 does the conversion to rhashtable.
====================

Link: https://patch.msgid.link/20250415121143.345227-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:28 +02:00
Ido Schimmel
1f763fa808 vxlan: Convert FDB table to rhashtable
FDB entries are currently stored in a hash table with a fixed number of
buckets (256), resulting in performance degradation as the number of
entries grows. Solve this by converting the driver to use rhashtable
which maintains more or less constant performance regardless of the
number of entries.

Measured transmitted packets per second using a single pktgen thread
with varying number of entries when the transmitted packet always hits
the default entry (worst case):

Number of entries | Improvement
------------------|------------
1k                | +1.12%
4k                | +9.22%
16k               | +55%
64k               | +585%
256k              | +2460%

In addition, the change reduces the size of the VXLAN device structure
from 2584 bytes to 672 bytes.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-16-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:16 +02:00
Ido Schimmel
f13f3b4157 vxlan: Introduce FDB key structure
In preparation for converting the FDB table to rhashtable, introduce a
key structure that includes the MAC address and source VNI.

No functional changes intended.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-15-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:16 +02:00
Ido Schimmel
20c76dadc7 vxlan: Do not treat dst cache initialization errors as fatal
FDB entries are allocated in an atomic context as they can be added from
the data path when learning is enabled.

After converting the FDB hash table to rhashtable, the insertion rate
will be much higher (*) which will entail a much higher rate of per-CPU
allocations via dst_cache_init().

When adding a large number of entries (e.g., 256k) in a batch, a small
percentage (< 0.02%) of these per-CPU allocations will fail [1]. This
does not happen with the current code since the insertion rate is low
enough to give the per-CPU allocator a chance to asynchronously create
new chunks of per-CPU memory.

Given that:

a. Only a small percentage of these per-CPU allocations fail.

b. The scenario where this happens might not be the most realistic one.

c. The driver can work correctly without dst caches. The dst_cache_*()
APIs first check that the dst cache was properly initialized.

d. The dst caches are not always used (e.g., 'tos inherit').

It seems reasonable to not treat these allocation failures as fatal.

Therefore, do not bail when dst_cache_init() fails and suppress warnings
by specifying '__GFP_NOWARN'.

[1] percpu: allocation failed, size=40 align=8 atomic=1, atomic alloc failed, no space left

(*) 97% reduction in average latency of vxlan_fdb_update() when adding
256k entries in a batch.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-14-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:16 +02:00
Ido Schimmel
ebe6420674 vxlan: Create wrappers for FDB lookup
__vxlan_find_mac() is called from both the data path (e.g., during
learning) and the control path (e.g., when replacing an entry). The
function is missing lockdep annotations to make sure that the FDB hash
lock is held during FDB updates.

Rename __vxlan_find_mac() to vxlan_find_mac_rcu() to reflect the fact
that it should be called from an RCU read-side critical section and call
it from vxlan_find_mac() which checks that the FDB hash lock is held.
Change callers to invoke the appropriate function.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-13-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:16 +02:00
Ido Schimmel
5cde39ea38 vxlan: Rename FDB Tx lookup function
vxlan_find_mac() is only expected to be called from the Tx path as it
updates the 'used' timestamp. Rename it to vxlan_find_mac_tx() to
reflect that and to avoid incorrect updates of this timestamp like those
addressed by commit 9722f834fe ("vxlan: Avoid unnecessary updates to
FDB 'used' time").

No functional changes intended.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-12-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:16 +02:00
Ido Schimmel
54f45187b6 vxlan: Convert FDB flushing to RCU
Instead of holding the FDB hash lock when traversing the FDB linked list
during flushing, use RCU and only acquire the lock for entries that need
to be flushed.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-11-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:16 +02:00
Ido Schimmel
a6d04f8937 vxlan: Convert FDB garbage collection to RCU
Instead of holding the FDB hash lock when traversing the FDB linked list
during garbage collection, use RCU and only acquire the lock for entries
that need to be removed (aged out).

Avoid races by using hlist_unhashed() to check that the entry has not
been removed from the list by another thread.

Note that vxlan_fdb_destroy() uses hlist_del_init_rcu() to remove an
entry from the list which should cause list_unhashed() to return true.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-10-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
7aa0dc750d vxlan: Use linked list to traverse FDB entries
In preparation for removing the fixed size hash table, convert FDB entry
traversal to use the newly added FDB linked list.

No functional changes intended.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-9-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
8d45673d2d vxlan: Add a linked list of FDB entries
Currently, FDB entries are stored in a hash table with a fixed number of
buckets. The table is used for both lookups and entry traversal.
Subsequent patches will convert the table to rhashtable which is not
suitable for entry traversal.

In preparation for this conversion, add FDB entries to a linked list.
Subsequent patches will convert the driver to use this list when
traversing entries during dump, flush, etc.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-8-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
094adad913 vxlan: Use a single lock to protect the FDB table
Currently, the VXLAN driver stores FDB entries in a hash table with a
fixed number of buckets (256). Subsequent patches are going to convert
this table to rhashtable with a linked list for entry traversal, as
rhashtable is more scalable.

In preparation for this conversion, move from a per-bucket spin lock to
a single spin lock that protects the entire FDB table.

The per-bucket spin locks were introduced by commit fe1e0713bb
("vxlan: Use FDB_HASH_SIZE hash_locks to reduce contention") citing
"huge contention when inserting/deleting vxlan_fdbs into the fdb_head".

It is not clear from the commit message which code path was holding the
spin lock for long periods of time, but the obvious suspect is the FDB
cleanup routine (vxlan_cleanup()) that periodically traverses the entire
table in order to delete aged-out entries.

This will be solved by subsequent patches that will convert the FDB
cleanup routine to traverse the linked list of FDB entries using RCU,
only acquiring the spin lock when deleting an aged-out entry.

The change reduces the size of the VXLAN device structure from 3600
bytes to 2576 bytes.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-7-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
6ba480cca2 vxlan: Relocate assignment of default remote device
The default FDB entry can be associated with a net device if a physical
device (i.e., 'dev PHYS_DEV') was specified during the creation of the
VXLAN device.

The assignment of the net device pointer to 'dst->remote_dev' logically
belongs in the if block that resolves the pointer from the specified
ifindex, so move it there.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-6-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
ccc203b9a8 vxlan: Unsplit default FDB entry creation and notification
Commit 0241b83673 ("vxlan: fix default fdb entry netlink notify
ordering during netdev create") split the creation of the default FDB
entry from its notification to avoid sending a RTM_NEWNEIGH notification
before RTM_NEWLINK.

Previous patches restructured the code so that the default FDB entry is
created after registering the VXLAN device and the notification about
the new entry immediately follows its creation.

Therefore, simplify the code and revert back to vxlan_fdb_update() which
takes care of both creating the FDB entry and notifying user space
about it.

Hold the FDB hash lock when calling vxlan_fdb_update() like it expects.
A subsequent patch will add a lockdep assertion to make sure this is
indeed the case.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-5-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
69281e0fe1 vxlan: Insert FDB into hash table in vxlan_fdb_create()
Commit 7c31e54aee ("vxlan: do not destroy fdb if register_netdevice()
is failed") split the insertion of FDB entries into the FDB hash table
from the function where they are created.

This was done in order to work around a problem that is no longer
possible after the previous patch. Simplify the code and move the body
of vxlan_fdb_insert() back into vxlan_fdb_create().

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-4-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
884dd448f1 vxlan: Simplify creation of default FDB entry
There is asymmetry in how the default FDB entry (all-zeroes) is created
and destroyed in the VXLAN driver. It is created as part of the driver's
newlink() routine, but destroyed as part of its ndo_uninit() routine.

This caused multiple problems in the past. First, commit 0241b83673
("vxlan: fix default fdb entry netlink notify ordering during netdev
create") split the notification about the entry from its creation so
that it will not be notified to user space before the VXLAN device is
registered.

Then, commit 6db9246871 ("vxlan: Fix error path in
__vxlan_dev_create()") made the error path in __vxlan_dev_create()
asymmetric by destroying the FDB entry before unregistering the net
device. Otherwise, the FDB entry would have been freed twice: By
ndo_uninit() as part of unregister_netdevice() and by
vxlan_fdb_destroy() in the error path.

Finally, commit 7c31e54aee ("vxlan: do not destroy fdb if
register_netdevice() is failed") split the insertion of the FDB entry
into the hash table from its creation, moving the insertion after the
registration of the net device. Otherwise, like before, the FDB entry
would have been freed twice: By ndo_uninit() as part of
register_netdevice()'s error path and by vxlan_fdb_destroy() in the
error path of __vxlan_dev_create().

The end result is that the code is unnecessarily complex. In addition,
the fixed size hash table cannot be converted to rhashtable as
vxlan_fdb_insert() cannot fail, which will no longer be true with
rhashtable.

Solve this by making the addition and deletion of the default FDB entry
completely symmetric. Namely, as part of newlink() routine, create the
entry, insert it into to the hash table and send a notification to user
space after the net device was registered. Note that at this stage the
net device is still administratively down and cannot transmit / receive
packets.

Move the deletion from ndo_uninit() to the dellink routine(): Flush the
default entry together with all the other entries, before unregistering
the net device.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-3-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Ido Schimmel
804b09be09 vxlan: Add RCU read-side critical sections in the Tx path
The Tx path does not run from an RCU read-side critical section which
makes the current lockless accesses to FDB entries invalid. As far as I
am aware, this has not been a problem in practice, but traces will be
generated once we transition the FDB lookup to rhashtable_lookup().

Add rcu_read_{lock,unlock}() around the handling of FDB entries in the
Tx path. Remove the RCU read-side critical section from vxlan_xmit_nh()
as now the function is always called from an RCU read-side critical
section.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-2-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22 11:11:15 +02:00
Jakub Kicinski
07e32237ed Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-04-17

We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 18 files changed, 1748 insertions(+), 19 deletions(-).

The main changes are:

1) bpf qdisc support, from Amery Hung.
   A qdisc can be implemented in bpf struct_ops programs and
   can be used the same as other existing qdiscs in the
   "tc qdisc" command.

2) Add xsk tail adjustment tests, from Tushar Vyavahare.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Test attaching bpf qdisc to mq and non root
  selftests/bpf: Add a bpf fq qdisc to selftest
  selftests/bpf: Add a basic fifo qdisc test
  libbpf: Support creating and destroying qdisc
  bpf: net_sched: Disable attaching bpf qdisc to non root
  bpf: net_sched: Support updating bstats
  bpf: net_sched: Add a qdisc watchdog timer
  bpf: net_sched: Add basic bpf qdisc kfuncs
  bpf: net_sched: Support implementation of Qdisc_ops in bpf
  bpf: Prepare to reuse get_ctx_arg_idx
  selftests/xsk: Add tail adjustment tests and support check
  selftests/xsk: Add packet stream replacement function
====================

Link: https://patch.msgid.link/20250417184338.3152168-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21 18:51:08 -07:00
Jakub Kicinski
59af38cada Merge branch 'bnxt_en-update-for-net-next'
Michael Chan says:

====================
bnxt_en: Update for net-next

The first patch changes the FW message timeout threshold for a warning
message.  The second patch adjusts the ethtool -w coredump length to
suppress a warning.  The last 2 patches are small cleanup patches for
the bnxt_ulp RoCE auxbus code.

v1: https://lore.kernel.org/netdev/20250415174818.1088646-1-michael.chan@broadcom.com/
====================

Link: https://patch.msgid.link/20250417172448.1206107-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21 18:50:37 -07:00
Kalesh AP
76a69f360a bnxt_en: Remove unused macros in bnxt_ulp.h
BNXT_ROCE_ULP and BNXT_MAX_ULP are no longer used.  Remove them to
clean up the code.

Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250417172448.1206107-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21 18:50:34 -07:00
Kalesh AP
5bccacb4cc bnxt_en: Remove unused field "ref_count" in struct bnxt_ulp
The "ref_count" field in struct bnxt_ulp is unused after
commit a43c26fa2e ("RDMA/bnxt_re: Remove the sriov config callback").
So we can just remove it now.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250417172448.1206107-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21 18:50:34 -07:00