Commit Graph

1324987 Commits

Author SHA1 Message Date
Parav Pandit
bb70b0d48d devlink: Improve the port attributes description
Current PF number description is vague, sometimes interpreted as
some PF index. VF number in the PCI specification starts at 1; however
in kernel, it starts at 0 for representor model.

Improve the description of devlink port attributes PF, VF and SF
numbers with these details.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/20241224183706.26571-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 17:10:57 -08:00
Thomas Weißschuh
be16b46f9e ptp: ocp: constify 'struct bin_attribute'
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20241222-sysfs-const-bin_attr-ptp-v1-1-5c1f3ee246fb@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 17:10:27 -08:00
Heiner Kallweit
c83ca5a4df net: phy: fix phy_disable_eee
genphy_c45_write_eee_adv() becomes a no-op if phydev->supported_eee
is cleared. That's not what we want because this function is still
needed to clear the EEE advertisement register(s).
Fill phydev->eee_broken_modes instead to ensure that userspace
can't re-enable EEE advertising.

Fixes: b55498ff14 ("net: phy: add phy_disable_eee")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/57e2ae5f-4319-413c-b5c4-ebc8d049bc23@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 17:09:09 -08:00
Jakub Kicinski
9268abe611 Merge branch 'net-lan969x-add-rgmii-support'
Daniel Machon says:

====================
net: lan969x: add RGMII support

== Description:

This series is the fourth of a multi-part series, that prepares and adds
support for the new lan969x switch driver.

The upstreaming efforts is split into multiple series (might change a
bit as we go along):

        1) Prepare the Sparx5 driver for lan969x (merged)

        2) Add support for lan969x (same basic features as Sparx5
           provides excl. FDMA and VCAP, merged).

        3) Add lan969x VCAP functionality (merged).

    --> 4) Add RGMII support.

        5) Add FDMA support.

== RGMII support:

The lan969x switch device includes two RGMII port interfaces (port 28
and 29) supporting data speeds of 1 Gbps, 100 Mbps and 10 Mbps.

== Patch breakdown:

Patch #1 does some preparation work.

Patch #2 adds new function: is_port_rgmii() to the match data ops.

Patch #3 uses the is_port_rgmii() in a number of places.

Patch #4 makes sure that we do not configure an RGMII device as a
         low-speed device, when doing a port config.

Patch #5 makes sure we only return the PCS if the port mode requires
         it.

Patch #6 adds checks for RGMII PHY modes in sparx5_verify_speeds().

Patch #7 adds registers required to configure RGMII.

Patch #8 adds RGMII implementation.

Patch #9 documents RGMII delays in the dt-bindings.

Details are in the commit description of the individual patches

v4: https://lore.kernel.org/20241213-sparx5-lan969x-switch-driver-4-v4-0-d1a72c9c4714@microchip.com
v3: https://lore.kernel.org/20241118-sparx5-lan969x-switch-driver-4-v3-0-3cefee5e7e3a@microchip.com
v2: https://lore.kernel.org/20241113-sparx5-lan969x-switch-driver-4-v2-0-0db98ac096d1@microchip.com
v1: https://lore.kernel.org/20241106-sparx5-lan969x-switch-driver-4-v1-0-f7f7316436bd@microchip.com
====================

Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-0-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:58:08 -08:00
Daniel Machon
f0706c0472 dt-bindings: net: sparx5: document RGMII delays
The lan969x switch device supports two RGMII port interfaces that can be
configured for MAC level rx and tx delays. Document two new properties
{rx,tx}-internal-delay-ps in the bindings, used to select these delays.

Tested-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-9-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:56 -08:00
Daniel Machon
010fe5dff1 net: lan969x: add RGMII implementation
The lan969x switch device includes two RGMII port interfaces (port 28
and 29) supporting data speeds of 1 Gbps, 100 Mbps and 10 Mbps. MAC
level delays are configurable through the HSIO_WRAP target, by choosing
a phase shift selector, corresponding to a certain time delay in nano
seconds.

Add new file: lan969x_rgmii.c that contains the implementation for
configuring the RGMII port devices. MAC level delays are configured
using the "{rx,tx}-internal-delay-ps" properties. These properties must
be specified independently of the phy-mode. If missing, or set to zero,
the MAC will not apply any delay.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-8-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:56 -08:00
Daniel Machon
fb6ac1829b net: lan969x: add RGMII registers
Configuration of RGMII is done by configuring the GPIO and clock
settings in the HSIOWRAP target, and configuring the RGMII port devices
in the DEVRGMII target. Both targets contain registers replicated for
the number of RGMII port devices, which is two.

Add said targets and register macros required to configure RGMII.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-7-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:55 -08:00
Daniel Machon
95e467b85e net: sparx5: verify RGMII speeds
When doing a port config, we verify the port speed against the PHY mode
and supported speeds of that PHY mode. Add checks for the four RGMII phy
modes: RGMII, RGMII_ID, RGMII_TXID and RGMII_RXID.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-6-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:55 -08:00
Daniel Machon
9b8d70ecfe net: sparx5: only return PCS for modes that require it
The RGMII ports have no PCS to configure. Make sure we only return the
PCS for port modes that require it.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-5-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:55 -08:00
Daniel Machon
d9450934f9 net: sparx5: skip low-speed configuration when port is RGMII
When doing a port config, we configure low-speed port devices, among
other things. We have a check to ensure, that the device is indeed a
low-speed device, an not a high-speed device. Add an additional check,
to ensure that the device is not an RGMII device.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-4-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:54 -08:00
Daniel Machon
05bda8a1bd net: sparx5: use is_port_rgmii() throughout
Now that we can check if a given port is an RGMII port, use it in the
following cases:

 - To set RGMII PHY modes for RGMII port devices.

 - To avoid checking for a SerDes node in the devicetree, when the port
   is an RGMII port.

 - To bail out of sparx5_port_init() when the common configuration is
   done.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-3-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:54 -08:00
Daniel Machon
dd2baee108 net: sparx5: add function for RGMII port check
The lan969x device contains two RGMII port interfaces, sitting at port
28 and 29. Add function: is_port_rgmii() to the match data ops, that
checks if a given port is an RGMII port or not. For Sparx5, this
function always returns false.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-2-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:54 -08:00
Daniel Machon
c71b59690a net: sparx5: do some preparation work
The sparx5_port_init() does initial configuration of a variety of
different features and options for each port. Some are shared for all
types of devices, some are not. As it is now, common configuration is
done after configuration of low-speed devices. This will not work when
adding RGMII support in a subsequent patch.

In preparation for lan969x RGMII support, move a block of code, that
configures 2g5 devices, down. This ensures that the configuration common
to all devices is done before configuration of 2g5, 5g, 10g and 25g
devices.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-1-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:57:53 -08:00
Jakub Kicinski
847cf3b9c3 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
ixgbe, ixgbevf: Add support for Intel(R) E610 device

Piotr Kwapulinski says:

Add initial support for Intel(R) E610 Series of network devices. The E610
is based on X550 but adds firmware managed link, enhanced security
capabilities and support for updated server manageability.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbevf: Add support for Intel(R) E610 device
  PCI: Add PCI_VDEVICE_SUB helper macro
  ixgbe: Enable link management in E610 device
  ixgbe: Clean up the E610 link management related code
  ixgbe: Add ixgbe_x540 multiple header inclusion protection
  ixgbe: Add support for EEPROM dump in E610 device
  ixgbe: Add support for NVM handling in E610 device
  ixgbe: Add link management support for E610 device
  ixgbe: Add support for E610 device capabilities detection
  ixgbe: Add support for E610 FW Admin Command Interface
====================

Link: https://patch.msgid.link/20241220201521.3363985-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:46:49 -08:00
Kory Maincent
4c61d809cf net: ethtool: Fix suspicious rcu_dereference usage
The __ethtool_get_ts_info function can be called with or without the
rtnl lock held. When the rtnl lock is not held, using rtnl_dereference()
triggers a warning due to the lack of lock context.

Add an rcu_read_lock() to ensure the lock is acquired and to maintain
synchronization.

Reported-by: syzbot+a344326c05c98ba19682@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/676147f8.050a0220.37aaf.0154.GAE@google.com/
Fixes: b9e3f7dc9e ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241220083741.175329-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:38:46 -08:00
Jakub Kicinski
3f8f2e93cd Merge branch 'eth-fbnic-support-basic-rss-config-and-setting-channel-count'
Jakub Kicinski says:

====================
eth: fbnic: support basic RSS config and setting channel count

Add support for basic RSS config (indirection table, key get and set),
and changing the number of channels.

  # ./ksft-net-drv/run_kselftest.sh -t drivers/net/hw:rss_ctx.py
  TAP version 13
  1..1
  # timeout set to 0
  # selftests: drivers/net/hw: rss_ctx.py
  # KTAP version 1
  # 1..15
  # ok 1 rss_ctx.test_rss_key_indir
  # ok 2 rss_ctx.test_rss_queue_reconfigure
  # ok 3 rss_ctx.test_rss_resize
  # ok 4 rss_ctx.test_hitless_key_update

  .. the rest of the tests are for additional contexts so they
  get skipped..

The slicing of the patches (and bugs) are mine, but I'm keeping
Alex as the author on the patches where he wrote 100% of the code.
====================

Link: https://patch.msgid.link/20241220025241.1522781-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:59 -08:00
Jakub Kicinski
52dc722db0 eth: fbnic: support ring channel set while up
Implement the channel count changes. Copy the netdev priv,
allocate new channels using it. Stop, swap, start.
Then free the copy of the priv along with the channels it
holds, which are now the channels that used to be on the
real priv.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241220025241.1522781-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:56 -08:00
Jakub Kicinski
3a481cc726 eth: fbnic: support ring channel get and set while down
Trivial implementation of ethtool channel get and set. Set is only
supported when device is closed, next patch will add code for
live reconfig.

Asymmetric configurations are supported (combined + extra Tx or Rx),
so are configurations with independent IRQs for Rx and Tx.
Having all 3 NAPI types (combined, Tx, Rx) is not supported.

We used to only call fbnic_reset_indir_tbl() during init.
Now that we call it after device had been register must
be careful not to override user config.

Link: https://patch.msgid.link/20241220025241.1522781-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:56 -08:00
Alexander Duyck
557d02238e eth: fbnic: centralize the queue count and NAPI<>queue setting
To simplify dealing with RTNL_ASSERT() requirements further
down the line, move setting queue count and NAPI<>queue
association to their own helpers.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20241220025241.1522781-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:56 -08:00
Jakub Kicinski
3a856ab347 eth: fbnic: add IRQ reuse support
Change our method of swapping NAPIs without disturbing existing config.
This is primarily needed for "live reconfiguration" such as changing
the channel count when interface is already up.

Previously we were planning to use a trick of using shared interrupts.
We would install a second IRQ handler for the new NAPI, and make it
return IRQ_NONE until we were ready for it to take over. This works fine
functionally but breaks IRQ naming. The IRQ subsystem uses the IRQ name
to create the procfs entry, since both handlers used the same name
the second handler wouldn't get a proc directory registered.
When first one gets removed on success full ring count change
it would remove its directory and we would be left with none.

New approach uses a double pointer to the NAPI. The IRQ handler needs
to know how to locate the NAPI to schedule. We register a single IRQ handler
and give it a pointer to a pointer. We can then change what it points to
without re-registering. This may have a tiny perf impact, but really
really negligible.

Link: https://patch.msgid.link/20241220025241.1522781-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:55 -08:00
Jakub Kicinski
db7159c400 eth: fbnic: store NAPIs in an array instead of the list
We will need an array for storing NAPIs in the upcoming IRQ handler
reuse rework. Replace the current list we have, so that we are able
to reuse it later.

In a few places replace i as the iterator with t when we iterate
over triads, this seems slightly less confusing than having
i, j, k variables.

Link: https://patch.msgid.link/20241220025241.1522781-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:55 -08:00
Alexander Duyck
c23a1461bf eth: fbnic: let user control the RSS hash fields
Support setting the fields over which RSS computes its hash.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20241220025241.1522781-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:55 -08:00
Alexander Duyck
31ab733e99 eth: fbnic: support setting RSS configuration
Let the user program the RSS indirection table and the RSS key.
Straightforward implementation. Track the changes and don't bother
poking the HW if user asked for a config identical to what's already
programmed. The device only supports Toeplitz hash.

Similarly to the GET support - all the real code that does the programming
was part of initial driver submission, already.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20241220025241.1522781-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:54 -08:00
Jakub Kicinski
ef1c28817b eth: fbnic: don't reset the secondary RSS indir table
Secondary RSS indirection table is for additional contexts.
It can / should be initialized when such context is created.
Since we don't support creating RSS contexts, yet, this change
has no user visible effect.

Link: https://patch.msgid.link/20241220025241.1522781-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:54 -08:00
Alexander Duyck
7cb06a6a77 eth: fbnic: support querying RSS config
The initial driver submission already added all the RSS state,
as part of multi-queue support. Expose the configuration via
the ethtool APIs.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241220025241.1522781-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:54 -08:00
Jakub Kicinski
7d0bf493b1 eth: fbnic: reorder ethtool code
Define ethtool callback handlers in order in which they are defined
in the ops struct. It doesn't really matter what the order is,
but it's good to have an order.

Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20241220025241.1522781-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:35:53 -08:00
Jakub Kicinski
f6f1795d0c Merge branch 'mlx5-misc-changes-2024-12-19'
Tariq Toukan says:

====================
mlx5 misc changes 2024-12-19

The first two patches by Rongwei add support for multi-host LAG. The new
multi-host NICs provide each host with partial ports, allowing each host
to maintain its unique LAG configuration.

Patches 3-7 by Moshe, Mark and Yevgeny are enhancements and preparations
in fs_core and HW steering, in preparation for future patchsets.

Patches 8-9 by Itamar add SW Steering support for ConnectX-8. They are
moved here after being part of previous submissions, yet to be accepted.

Patch 10 by Carolina cleans up an unnecessary log message.

Patch 11 by Patrisious allows RDMA RX steering creation over devices
with IB link layer.
====================

Link: https://patch.msgid.link/20241219175841.1094544-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:48 -08:00
Patrisious Haddad
ef1749d506 net/mlx5: fs, Add support for RDMA RX steering over IB link layer
Relax the capability check for creating the RDMA RX steering domain
by considering only the capabilities reported by the firmware
as necessary for its creation, which in turn allows RDMA RX creation
over devices with IB link layer as well.

The table_miss_action_domain capability is required only for a specific
priority, which is handled in mlx5_rdma_enable_roce_steering().
The additional capability check for this case is already in place.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:46 -08:00
Carolina Jubran
f440d69a21 net/mlx5: Remove PTM support log message
The absence of Precision Time Measurement support should not emit a
message, as it can be misleading in contexts where PTM is not required.

Remove the log message indicating the lack of PCIe PTM support.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:46 -08:00
Itamar Gozlan
4d617b5757 net/mlx5: DR, add support for ConnectX-8 steering
Add support for a new steering format version that is implemented by
ConnectX-8.
Except for several differences, the STEv3 is identical to STEv2, so
for most callbacks STEv3 context struct will call STEv2 functions.

Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:46 -08:00
Itamar Gozlan
aa90a30804 net/mlx5: DR, expand SWS STE callbacks and consolidate common structs
Expand SWS STE callbacks to support ConnectX-8 hardware.
Move common enums and structures to a shared header file.

Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:46 -08:00
Yevgeny Kliteynik
429776b601 net/mlx5: HWS, do not initialize native API queues
HWS has two types of APIs:
 - Native: fastest and slimmest, async API.
   The user of this API is required to manage rule handles memory,
   and to poll for completion for each rule.
 - BWC: backward compatible API, similar semantics to SWS API.
   This layer is implemented above native API and it does all
   the work for the user, so that it is easy to switch between
   SWS and HWS.

Right now the existing users of HWS require only BWC API.
Therefore, in order to not waste resources, this patch disables
send queues allocation for native API.

If in the future support for faster HWS rule insertion will be required
(such as for Connection Tracking), native queues can be enabled.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:45 -08:00
Yevgeny Kliteynik
9a0155a709 net/mlx5: HWS, no need to expose mlx5hws_send_queues_open/close
No need to have mlx5hws_send_queues_open/close in header.
Make them static and remove from header.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:45 -08:00
Mark Bloch
586face881 net/mlx5: fs, retry insertion to hash table on EBUSY
When inserting into an rhashtable faster than it can grow, an -EBUSY error
may be encountered. Modify the insertion logic to retry on -EBUSY until
either a successful insertion or a genuine error is returned.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241219175841.1094544-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:45 -08:00
Moshe Shemesh
31d1356b8f net/mlx5: fs, add mlx5_fs_pool API
Refactor fc_pool API to create generic fs_pool API, as HW steering has
more flow steering elements which can take advantage of the same pool of
bulks API. Change fs_counters code to use the fs_pool API.

Note, removed __counted_by from struct mlx5_fc_bulk as bulk_len is now
inner struct member. It will be added back once __counted_by can support
inner struct members.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:45 -08:00
Moshe Shemesh
95f68e06b4 net/mlx5: fs, add counter object to flow destination
Currently mlx5_flow_destination includes counter_id which is assigned in
case we use flow counter on the flow steering rule. However, counter_id
is not enough data in case of using HW Steering. Thus, have mlx5_fc
object as part of mlx5_flow_destination instead of counter_id and assign
it where needed.

In case counter_id is received from user space, create a local counter
object to represent it.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:45 -08:00
Rongwei Liu
60d01cc468 net/mlx5: LAG, Support LAG over Multi-Host NICs
New multi-host NICs provide each host with partial ports,
allowing each host to maintain its unique LAG configuration.

On these multi-host NICs, the 'native_port_num' capability
is no longer continuous on each host and can exceed the
'num_lag_ports' capability. Therefore, it is necessary to
skip the PFs with ldev->pf[i].dev == NULL when querying/modifying
the lag devices' information.
There is no need to check dev.native_port_num against ldev->ports.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:45 -08:00
Rongwei Liu
ddbb5ddc43 net/mlx5: LAG, Refactor lag logic
Wrap the lag pf access into two new macros:
1. ldev_for_each()
2. ldev_for_each_reverse()
The maximum number of lag ports and the index to `natvie_port_num`
mapping will be handled by the two new macros.
Users shouldn't use the for loop anymore.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:34:44 -08:00
Jakub Kicinski
8d94a744e1 Merge branch 'add-rds-ptp-library-for-microchip-phys'
Divya Koppera says:

====================
Add rds ptp library for Microchip phys

Adds support for rds ptp library in Microchip phys, where rds is internal
code name for ptp IP or hardware. This library will be re-used in
Microchip phys where same ptp hardware is used. Register base addresses
and mmd may changes, due to which base addresses and mmd is made variable
in this library.
====================

Link: https://patch.msgid.link/20241219123311.30213-1-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:31:01 -08:00
Divya Koppera
9fc3d6fe80 net: phy: microchip_t1 : Add initialization of ptp for lan887x
Add initialization of ptp for lan887x.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Link: https://patch.msgid.link/20241219123311.30213-6-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:30:59 -08:00
Divya Koppera
85b39f7593 net: phy: Makefile: Add makefile support for rds ptp in Microchip phys
Add makefile support for rds ptp library.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Link: https://patch.msgid.link/20241219123311.30213-5-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:30:59 -08:00
Divya Koppera
2550afc61e net: phy: Kconfig: Add rds ptp library support and 1588 optional flag in Microchip phys
Add ptp library support in Kconfig
As some of Microchip T1 phys support ptp, add dependency
of 1588 optional flag in Kconfig

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Link: https://patch.msgid.link/20241219123311.30213-4-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:30:59 -08:00
Divya Koppera
fa51199c5f net: phy: microchip_rds_ptp : Add rds ptp library for Microchip phys
Add rds ptp library for Microchip phys
1-step and 2-step modes are supported, over Ethernet and UDP(ipv4, ipv6)

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Link: https://patch.msgid.link/20241219123311.30213-3-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:30:58 -08:00
Divya Koppera
d46ef4ee38 net: phy: microchip_rds_ptp: Add header file for Microchip rds ptp library
This rds ptp header file will cover ptp macros for future phys in
Microchip where addresses will be same but base offset and mmd address
may changes.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Link: https://patch.msgid.link/20241219123311.30213-2-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:30:58 -08:00
Jakub Kicinski
b4cbbf078c Merge branch 'vsock-test-tests-for-memory-leaks'
Michal Luczaj says:

====================
vsock/test: Tests for memory leaks

Series adds tests for recently fixed memory leaks[1]:

commit d7b0ff5a86 ("virtio/vsock: Fix accept_queue memory leak")
commit fbf7085b3a ("vsock: Fix sk_error_queue memory leak")
commit 60cf6206a1 ("virtio/vsock: Improve MSG_ZEROCOPY error handling")

Patch 1 is a non-functional preparatory cleanup.
Patch 2 is a test suite extension for picking specific tests.
Patch 3 explains the need of kmemleak scans.
Patch 4 adapts utility functions to handle MSG_ZEROCOPY.
Patches 5-6-7 add the tests.

NOTE: Test in the last patch ("vsock/test: Add test for MSG_ZEROCOPY
completion memory leak") may stop working even before this series is
merged. See changes proposed in [2]. The failslab variant would be
unaffected.

[1] https://lore.kernel.org/20241107-vsock-mem-leaks-v2-0-4e21bfcfc818@rbox.co
[2] https://lore.kernel.org/CANn89i+oL+qoPmbbGvE_RT3_3OWgeck7cCPcTafeehKrQZ8kyw@mail.gmail.com

v3: https://lore.kernel.org/20241218-test-vsock-leaks-v3-0-f1a4dcef9228@rbox.co
v2: https://lore.kernel.org/20241216-test-vsock-leaks-v2-0-55e1405742fc@rbox.co
v1: https://lore.kernel.org/20241206-test-vsock-leaks-v1-0-c31e8c875797@rbox.co
====================

Link: https://patch.msgid.link/20241219-test-vsock-leaks-v4-0-a416e554d9d7@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:29:01 -08:00
Michal Luczaj
d127ac8b1d vsock/test: Add test for MSG_ZEROCOPY completion memory leak
Exercise the ENOMEM error path by attempting to hit net.core.optmem_max
limit on send().

Test aims to create a memory leak, kmemleak should be employed.

Fixed by commit 60cf6206a1 ("virtio/vsock: Improve MSG_ZEROCOPY error
handling").

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20241219-test-vsock-leaks-v4-7-a416e554d9d7@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:28:01 -08:00
Michal Luczaj
ec50efee8c vsock/test: Add test for sk_error_queue memory leak
Ask for MSG_ZEROCOPY completion notification, but do not recv() it.
Test attempts to create a memory leak, kmemleak should be employed.

Fixed by commit fbf7085b3a ("vsock: Fix sk_error_queue memory leak").

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20241219-test-vsock-leaks-v4-6-a416e554d9d7@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:28:01 -08:00
Michal Luczaj
f66ef469a7 vsock/test: Add test for accept_queue memory leak
Attempt to enqueue a child after the queue was flushed, but before
SOCK_DONE flag has been set.

Test tries to produce a memory leak, kmemleak should be employed. Dealing
with a race condition, test by its very nature may lead to a false
negative.

Fixed by commit d7b0ff5a86 ("virtio/vsock: Fix accept_queue memory
leak").

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20241219-test-vsock-leaks-v4-5-a416e554d9d7@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:28:01 -08:00
Michal Luczaj
f52e7f593b vsock/test: Adapt send_byte()/recv_byte() to handle MSG_ZEROCOPY
For a zerocopy send(), buffer (always byte 'A') needs to be preserved (thus
it can not be on the stack) or the data recv()ed check in recv_byte() might
fail.

While there, change the printf format to 0x%02x so the '\0' bytes can be
seen.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20241219-test-vsock-leaks-v4-4-a416e554d9d7@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:28:01 -08:00
Michal Luczaj
50f9434463 vsock/test: Add README blurb about kmemleak usage
Document the suggested use of kmemleak for memory leak detection.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20241219-test-vsock-leaks-v4-3-a416e554d9d7@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:28:00 -08:00