Commit Graph

1309076 Commits

Author SHA1 Message Date
Russell King (Oracle)
acb5fb5a42 net: pcs: xpcs: use dev_*() to print messages
Use the dev_*() family of functions to print all messages from the XPCS
driver so we know which instance issues the messages.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
d69908faf1 net: pcs: xpcs: convert to use read_poll_timeout()
Convert the xpcs driver to use read_poll_timeout() when waiting for
reset to complete, rather than open-coding this.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
ce8d6081fc net: pcs: xpcs: add _modify() accessors
The xpcs driver does a lot of read-modify-write operations on
registers, which leads to long-winded code to read the register, check
whether the read was successful, modify the value in some way, and then
write it back.

We have a mdiodev _modify() accessor that encapsulates this, and does
the register modification under the MDIO bus lock ensuring that the
modification is atomic with respect to other bus operations. Convert
the xpcs driver to use this accessor.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
f681891810 net: pcs: xpcs: use FIELD_PREP() and FIELD_GET()
Convert xpcs to use the bitfield macros rather than definining the
bitfield shifts and open-coding the insertion and extraction of these
bitfields.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
7921d3e602 net: pcs: xpcs: move searching ID list out of line
Move the searching of the physical ID out of xpcs_create() and into
its own xpcs_identify() function, which makes it self contained.
This reduces the complexity in xpcs_craete(), making it easier to
follow, rather than having a lot of once-run code in the big for()
loop.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
135d118bfd net: pcs: xpcs: rename xpcs_get_id()
Rename xpcs_get_id() to xpcs_read_id() which more closely reflects
the purpose of this function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
accd5f5cd2 net: pcs: xpcs: move definition of struct dw_xpcs to private header
There should be no reason for anything outside the XPCS code to know
the contents of struct dw_xpcs - this is a private structure to XPCS.
Move the definition to the private pcs-xpcs.h header, leaving a
declaration in the global pcs/pcs-xpcs.h

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
f042365a26 net: pcs: xpcs: provide a helper to get the phylink pcs given xpcs
Provide a helper to provide the pointer to the phylink_pcs struct
given a valid xpcs pointer. This will be necessary when we make
struct dw_xpcs private to pcs-xpcs.c

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
4490f5669b net: pcs: xpcs: pass xpcs instead of xpcs->id to xpcs_find_compat()
xpcs_find_compat() is now always passed xpcs->id. Rather than always
dereferencing this in the caller, move it into xpcs_find_compat(),
thus making this function consistent with most of the other xpcs
functions in taking an xpcs pointer.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
0397212f93 net: pcs: xpcs: don't use array for interface
Currently, xpcs uses an array of interfaces that each "compat" entry
supports. When looking up the compat entry for an interface, we
iterate over the compat entries and then over each interface.

Since each compat entry only has a single interface in its interfaces
array, replace the array with a single member in the compat structure.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Russell King (Oracle)
e30993a9ab net: pcs: xpcs: remove dw_xpcs_compat enum
There is no reason for the struct dw_xpcs_compat arrays to be a fixed
size other than the way we iterate over them. The index into the array
isn't used for anything, and having them fixed size needlessly wastes
space.

Remove the enum that defines their size, and instead use an empty
array entry (with NULL ->supported) to mark the end of the array.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09 12:13:11 +01:00
Tarun Alle
36efaca9cb net: phy: microchip_t1: SQI support for LAN887x
Add support for measuring Signal Quality Index for LAN887x T1 PHY.
Signal Quality Index (SQI) is measure of Link Channel Quality from
0 to 7, with 7 as the best. By default, a link loss event shall
indicate an SQI of 0.

Signed-off-by: Tarun Alle <Tarun.Alle@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241007063943.3233-1-tarun.alle@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 18:24:16 -07:00
Jakub Kicinski
3a04f87127 Merge branch 'net-phy-marvell-88q2xxx-enable-auto-negotiation-for-mv88q2110'
Niklas Söderlund says:

====================
net: phy: marvell-88q2xxx: Enable auto negotiation for mv88q2110

This series enables auto negotiation for the mv88q2110 device.
Previously this feature have been disabled for mv88q2110, while enabled
for other devices supported by this driver.

The initial driver implementation states this is due to the
configuration sequence provided by the vendor did not work. By comparing
the initialization sequence of other devices this driver supports and
the out-of-tree PHY driver for mv88q2110 found in the Renesas BSP [1]
I was able to figure out a working configuration.

As I have no access to the datasheets of either of these devices it
would be super if someone who has could sanity check the initialization
sequence.

With this series I'm able to auto negotiate both 1000Mbps and 100Mbps
links without issue.

    # ethtool eth0
    Settings for eth0:
            Supported ports: [  ]
            Supported link modes:   100baseT1/Full
                                    1000baseT1/Full
            Supported pause frame use: Symmetric Receive-only
            Supports auto-negotiation: Yes
            Supported FEC modes: Not reported
            Advertised link modes:  100baseT1/Full
                                    1000baseT1/Full
            Advertised pause frame use: No
            Advertised auto-negotiation: Yes
            Advertised FEC modes: Not reported
            Link partner advertised link modes:  100baseT1/Full
                                                 1000baseT1/Full
            Link partner advertised pause frame use: No
            Link partner advertised auto-negotiation: Yes
            Link partner advertised FEC modes: Not reported
            Speed: 1000Mb/s
            Duplex: Full
            Auto-negotiation: on
            master-slave cfg: preferred master
            master-slave status: slave
            Port: Twisted Pair
            PHYAD: 0
            Transceiver: external
            MDI-X: Unknown
            Link detected: yes
            SQI: 15/15

And the performance is good too. Without this change I was not able to
manually configure a 1000Mbps link, only 100Mbps ones. So this gives a
huge performance boost for my use-case.

    [  5] local 10.1.0.2 port 5201 connected to 10.1.0.1 port 38346
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  96.8 MBytes   812 Mbits/sec    0    469 KBytes
    [  5]   1.00-2.00   sec  94.3 MBytes   791 Mbits/sec    0    469 KBytes
    [  5]   2.00-3.00   sec  96.1 MBytes   806 Mbits/sec    0    469 KBytes
    [  5]   3.00-4.00   sec  98.3 MBytes   825 Mbits/sec    0    469 KBytes
    [  5]   4.00-5.00   sec  98.4 MBytes   825 Mbits/sec    0    469 KBytes
    [  5]   5.00-6.00   sec  98.4 MBytes   826 Mbits/sec    0    469 KBytes
    [  5]   6.00-7.00   sec  98.9 MBytes   830 Mbits/sec    0    469 KBytes
    [  5]   7.00-8.00   sec  91.7 MBytes   769 Mbits/sec    0    469 KBytes
    [  5]   8.00-9.00   sec  99.4 MBytes   834 Mbits/sec    0    747 KBytes
    [  5]   9.00-10.00  sec   101 MBytes   851 Mbits/sec    0    747 KBytes

Patch 1/3 and 2/3 are preparation patches that align and move functions
around as the mv88q2110 code paths can now reuses much of what is done
for mv88q2220. While patch 3/3 adds the new initialization sequence and
removes the auto negotiation limit for mv88q2110.

1.  2a1f07d0e7
====================

Link: https://patch.msgid.link/20241005112412.544360-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 18:18:21 -07:00
Niklas Söderlund
20c7722a7a net: phy: marvell-88q2xxx: Enable auto negotiation for mv88q2110
The initial marvell-88q2xxx driver only supported the Marvell 88Q2110
PHY without auto negotiation support. The reason documented states that
the provided initialization sequence did not to work. Now a method to
enable auto negotiation have been found by comparing the initialization
of other supported devices and an out-of-tree PHY driver.

Perform the minimal needed initialization of the PHY to get auto
negotiation working and remove the limitation that disables the auto
negotiation feature for the mv88q2110 device.

With this change a 1000Mbps full duplex link is able to be negotiated
between two mv88q2110 and the link works perfectly. The other side also
reflects the manually configure settings of the master device.

    # ethtool eth0
    Settings for eth0:
            Supported ports: [  ]
            Supported link modes:   100baseT1/Full
                                    1000baseT1/Full
            Supported pause frame use: Symmetric Receive-only
            Supports auto-negotiation: Yes
            Supported FEC modes: Not reported
            Advertised link modes:  100baseT1/Full
                                    1000baseT1/Full
            Advertised pause frame use: No
            Advertised auto-negotiation: Yes
            Advertised FEC modes: Not reported
            Link partner advertised link modes:  100baseT1/Full
                                                 1000baseT1/Full
            Link partner advertised pause frame use: No
            Link partner advertised auto-negotiation: Yes
            Link partner advertised FEC modes: Not reported
            Speed: 1000Mb/s
            Duplex: Full
            Auto-negotiation: on
            master-slave cfg: preferred master
            master-slave status: slave
            Port: Twisted Pair
            PHYAD: 0
            Transceiver: external
            MDI-X: Unknown
            Link detected: yes
            SQI: 15/15

Before this change I was not able to manually configure 1000Mbps link,
only a 100Mpps link so this change providers an improvement in
performance for this device.

    [  5] local 10.1.0.2 port 5201 connected to 10.1.0.1 port 38346
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  96.8 MBytes   812 Mbits/sec    0    469 KBytes
    [  5]   1.00-2.00   sec  94.3 MBytes   791 Mbits/sec    0    469 KBytes
    [  5]   2.00-3.00   sec  96.1 MBytes   806 Mbits/sec    0    469 KBytes
    [  5]   3.00-4.00   sec  98.3 MBytes   825 Mbits/sec    0    469 KBytes
    [  5]   4.00-5.00   sec  98.4 MBytes   825 Mbits/sec    0    469 KBytes
    [  5]   5.00-6.00   sec  98.4 MBytes   826 Mbits/sec    0    469 KBytes
    [  5]   6.00-7.00   sec  98.9 MBytes   830 Mbits/sec    0    469 KBytes
    [  5]   7.00-8.00   sec  91.7 MBytes   769 Mbits/sec    0    469 KBytes
    [  5]   8.00-9.00   sec  99.4 MBytes   834 Mbits/sec    0    747 KBytes
    [  5]   9.00-10.00  sec   101 MBytes   851 Mbits/sec    0    747 KBytes

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Tested-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241005112412.544360-4-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 18:18:16 -07:00
Niklas Söderlund
0e58c18871 net: phy: marvell-88q2xxx: Make register writer function generic
In preparation to adding auto negotiation support to mv88q2110 move and
rename the helper function used to write an array of register values to
the PHY.

Just as for mv88q2220 devices this helper will be needed to for the
initial configuration of the mv88q2110 to support auto negotiation.

The function is moved verbatim, there is no change in behavior.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Tested-by: Stefan Eichenberger <eichest@gmail.com>
Link: https://patch.msgid.link/20241005112412.544360-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 18:18:16 -07:00
Niklas Söderlund
21185019aa net: phy: marvell-88q2xxx: Align soft reset for mv88q2110 and mv88q2220
The soft reset implementations for mv88q2110 and mv88q2220 differ as the
later need to consider that auto negation is supported on mv88q2220
devices. In preparation of enabling auto negotiation on mv88q2110 merge
the two rest functions into a device generic one.

The mv88q2220 behavior is kept as is but extended to wait for the reset
bit to be clears before continuing, as was done previously on mv88q2220.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Tested-by: Stefan Eichenberger <eichest@gmail.com>
Link: https://patch.msgid.link/20241005112412.544360-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 18:18:16 -07:00
Andrew Kreimer
ed1f3b7f15 fsl/fman: Fix a typo
Fix a typo in comments: bellow -> below.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241006130829.13967-1-algonell@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 18:14:47 -07:00
Daniel Golle
a2e1ba275e net: phy: aquantia: allow forcing order of MDI pairs
Despite supporting Auto MDI-X, it looks like Aquantia only supports
swapping pair (1,2) with pair (3,6) like it used to be for MDI-X on
100MBit/s networks.

When all 4 pairs are in use (for 1000MBit/s or faster) the link does not
come up with pair order is not configured correctly, either using
MDI_CFG pin or using the "PMA Receive Reserved Vendor Provisioning 1"
register.

Normally, the order of MDI pairs being either ABCD or DCBA is configured
by pulling the MDI_CFG pin.

However, some hardware designs require overriding the value configured
by that bootstrap pin. The PHY allows doing that by setting a bit in
"PMA Receive Reserved Vendor Provisioning 1" register which allows
ignoring the state of the MDI_CFG pin and another bit configuring
whether the order of MDI pairs should be normal (ABCD) or reverse
(DCBA). Pair polarity is not affected and remains identical in both
settings.

Introduce property "marvell,mdi-cfg-order" which allows forcing either
normal or reverse order of the MDI pairs from DT.

If the property isn't present, the behavior is unchanged and MDI pair
order configuration is untouched (ie. either the result of MDI_CFG pin
pull-up/pull-down, or pair order override already configured by the
bootloader before Linux is started).

Forcing normal pair order is required on the Adtran SDG-8733A Wi-Fi 7
residential gateway.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/9ed760ff87d5fc456f31e407ead548bbb754497d.1728058550.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 17:16:35 -07:00
Daniel Golle
1432965bf5 dt-bindings: net: marvell,aquantia: add property to override MDI_CFG
Usually the MDI pair order reversal configuration is defined by
bootstrap pin MDI_CFG. Some designs, however, require overriding the MDI
pair order and force either normal or reverse order.

Add property 'marvell,mdi-cfg-order' to allow forcing either normal or
reverse order of the MDI pairs.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/7ccf25d6d7859f1ce9983c81a2051cfdfb0e0a99.1728058550.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 17:16:35 -07:00
Jakub Kicinski
33019c70ae Merge branch 'selftests-mlxsw-stabilize-red-tests'
Petr Machata says:

====================
selftests: mlxsw: Stabilize RED tests

Tweak the mlxsw-specific RED selftests to increase stability on
Spectrum-3 and Spectrum-4 machines.
====================

Link: https://patch.msgid.link/cover.1728316370.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:37:26 -07:00
Petr Machata
501fa2426b selftests: mlxsw: sch_red_core: Lower TBF rate
The RED test uses a pair of TBF shapers. The first to get predictably-sized
stream of traffic, and second to get a 100% saturated chokepoint. To this
chokepoint it injects individual packets. Because the chokepoint is
saturated, these additional packets go straight to the backlog. This allows
the test to check RED behavior across various queue sizes.

The shapers are rated at 1Gbps, for historical reasons (before mlxsw
supported TBF offload, the test used port speed to create the chokepoints).
Machines with a low-power CPU may have trouble consistently generating
1Gbps of traffic, and the test then spuriously fails.

Instead, drop the rate to 200Mbps (Spectrum has a guaranteed shaper rate
granularity of 200Mbps, so anything lower is not guaranteed to work well).
Because that means fewer packets will be mirrored in the ECN-mark test,
adjust the passing condition accordingly.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/c6712f9c5de75ae0bc2ab3d8ea7d92aaaf93af95.1728316370.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:37:24 -07:00
Petr Machata
7049166e51 selftests: mlxsw: sch_red_core: Send more packets for drop tests
This test works by injecting into a port with a maxed-out queue a couple
packets and checks if a corresponding number of packets were dropped. This
has worked well on Spectrum<4, but on Spectrum-4 it has been noisy. This
is in line with the observation that on Spectrum-4, queue size tends to
fluctuate more. A handful of packets could then still be accepted to the
queue even though it was nominally full just recently.

In order to accommodate this behavior, send many more packets. The buffer
can fit N extra packets, but not N% packets. This therefore allows us to
set wider absolute margins, while actually narrowing them relatively.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/abc869b9f6003d400d6293ddd5edb2f4517f44d5.1728316370.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:37:24 -07:00
Petr Machata
787f148cec selftests: mlxsw: sch_red_core: Sleep before querying queue depth
The qdisc stats are taken from the port's periodic HW stats, which are
updated once a second. We try to accommodate the latency by using busywait
in build_backlog().

The issue in that seems to be that when do_mark_test() builds the backlog,
it makes the decision whether to send more packets based on the first
instance of the queue depth stat exceeding the current value, when in fact
more traffic is on the way and the queue depth would increase further. This
leads to failures in TC 1 of mark-mirror test, where we see the following
failure:

TEST: TC 0: marked packets mirror'd                                 [ OK ]
TEST: TC 1: marked packets mirror'd                                 [FAIL]
        Spurious packets (1680 -> 2290) observed without buffer pressure

Fix by waiting for the full second before reading the queue depth for the
first time, to make sure it reflects all in-flight traffic.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Link: https://patch.msgid.link/321dcf8b3e9a1f0766429c8cf3e3f1746f1bc375.1728316370.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:37:24 -07:00
Petr Machata
8fb5b60734 selftests: mlxsw: sch_red_core: Increase backlog size tolerance
Backlog fluctuates on Spectrum-4 much more than on <4. In practice we can
sample queue depth values going from about -12% to about +7% of the
configured RED limit. The test which checks the queue size has a limit of
+-10%, and as a result often fails. We attempted to fix the issue by
busywaiting for several seconds hoping to get within the bounds, but that
still proved to be too noisy (or the wait time would be impractically
long). Unfortunately we have to bump the value tolerance from 10% to 15%,
which in this patch do.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Link: https://patch.msgid.link/f54950df2a8fcba46c3ddc1053376352fa2e592b.1728316370.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:37:24 -07:00
Petr Machata
870dd51117 selftests: mlxsw: sch_red_ets: Increase required backlog
Backlog fluctuates on Spectrum-4 much more than on <4. Increasing the
desired backlog seems to help, as the constant fluctuations do not overlap
into the territory where packets are marked.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Link: https://patch.msgid.link/0821fb3aa8bb6a6c0d3000baab04995517c9a0cc.1728316370.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:37:23 -07:00
Bartosz Golaszewski
881c98f44f net: phy: smsc: use devm_clk_get_optional_enabled_with_rate()
Fold the separate call to clk_set_rate() into the clock getter.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241007134100.107921-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:36:28 -07:00
Dr. David Alan Gilbert
35213cfeef chelsio/chtls: Remove unused chtls_set_tcb_tflag
chtls_set_tcb_tflag() has been unused since 2021's commit
827d329105 ("chtls: Remove invalid set_tcb call")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241007004652.150065-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:34:38 -07:00
Dr. David Alan Gilbert
3fe3dbaf26 caif: Remove unused cfsrvl_getphyid
cfsrvl_getphyid() has been unused since 2011's commit
f362144084 ("caif: Use RCU and lists in cfcnfg.c for managing caif link layers")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241007004456.149899-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:33:49 -07:00
Jason Xing
da5e06dee5 net-timestamp: namespacify the sysctl_tstamp_allow_data
Let it be tuned in per netns by admins.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241005222609.94980-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:33:11 -07:00
Aryan Srivastava
ada5c3229b net: dsa: mv88e6xxx: Add FID map cache
Add a cached FID bitmap. This mitigates the need to walk all VTU entries
to find the next free FID.

When flushing the VTU (during init), zero the FID bitmap. Use and
manipulate this bitmap from now on, instead of reading HW for the FID
map.

The repeated VTU walks are costly and can take ~40 mins if ~4000 vlans
are added. Caching the FID map reduces this time to <2 mins.

Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241006212905.3142976-1-aryan.srivastava@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 15:30:55 -07:00
Jakub Kicinski
42b2331081 tools: ynl-gen: refactor check validation for TypeBinary
We only support a single check at a time for TypeBinary.
Refactor the code to cover 'exact-len' and make adding
new checks easier.

Link: https://lore.kernel.org/20241004063855.1a693dd1@kernel.org
Link: https://patch.msgid.link/20241007155311.1193382-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 08:22:38 -07:00
Joe Damato
49717ef01c idpf: Don't hard code napi_struct size
The sizeof(struct napi_struct) can change. Don't hardcode the size to
400 bytes and instead use "sizeof(struct napi_struct)".

Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20241004105407.73585-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08 08:14:50 -07:00
Paolo Abeni
489cee4cae Merge branch 'rtnetlink-per-netns-rtnl'
Kuniyuki Iwashima says:

====================
rtnetlink: Per-netns RTNL.

rtnl_lock() is a "Big Kernel Lock" in the networking slow path and
serialised all rtnetlink requests until 4.13.

Since RTNL_FLAG_DOIT_UNLOCKED and RTNL_FLAG_DUMP_UNLOCKED have been
introduced in 4.14 and 6.9, respectively, rtnetlink message handlers
are ready to be converted to RTNL-less/free.

15 out of 44 dumpit()s have been converted to RCU so far, and the
progress is pretty good.  We can now dump various major network
resources without RTNL.

12 out of 87 doit()s have been converted, but most of the converted
doit()s are also on the reader side of RTNL; their message types are
RTM_GET*.

So, most of RTM_(NEW|DEL|SET)* operations are still serialised by RTNL.

For example, one of our services creates 2K netns and a small number
of network interfaces in each netns that require too many writer-side
rtnetlink requests, and setting up a single host takes 10+ minutes.

RTNL is still a huge pain for network configuration paths, and we need
more granular locking, given converting all doit()s would be unfeasible.

Actually, most RTNL users do not need to freeze multiple netns, and such
users can be protected by per-netns RTNL mutex.  The exceptions would be
RTM_NEWLINK, RTM_DELLINK, and RTM_SETLINK.  (See [0] and [1])

This series is the first step of the per-netns RTNL conversion that
gradually replaces rtnl_lock() with rtnl_net_lock(net) under
CONFIG_DEBUG_NET_SMALL_RTNL.

[0]: https://netdev.bots.linux.dev/netconf/2024/index.html
[1]: https://lpc.events/event/18/contributions/1959/
====================

Link: https://patch.msgid.link/20241004221031.77743-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 15:17:02 +02:00
Kuniyuki Iwashima
03fa534856 rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.
The global and per-netns netdev notifier depend on RTNL, and its
dependency is not so clear due to nested calls.

Let's add a placeholder to place ASSERT_RTNL_NET() for each event.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 15:16:59 +02:00
Kuniyuki Iwashima
844e5e7e65 rtnetlink: Add assertion helpers for per-netns RTNL.
Once an RTNL scope is converted with rtnl_net_lock(), we will replace
RTNL helper functions inside the scope with the following per-netns
alternatives:

  ASSERT_RTNL()           -> ASSERT_RTNL_NET(net)
  rcu_dereference_rtnl(p) -> rcu_dereference_rtnl_net(net, p)

Note that the per-netns helpers are equivalent to the conventional
helpers unless CONFIG_DEBUG_NET_SMALL_RTNL is enabled.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 15:16:59 +02:00
Kuniyuki Iwashima
76aed95319 rtnetlink: Add per-netns RTNL.
The goal is to break RTNL down into per-netns mutex.

This patch adds per-netns mutex and its helper functions, rtnl_net_lock()
and rtnl_net_unlock().

rtnl_net_lock() acquires the global RTNL and per-netns RTNL mutex, and
rtnl_net_unlock() releases them.

We will replace 800+ rtnl_lock() with rtnl_net_lock() and finally removes
rtnl_lock() in rtnl_net_lock().

When we need to nest per-netns RTNL mutex, we will use __rtnl_net_lock(),
and its locking order is defined by rtnl_net_lock_cmp_fn() as follows:

  1. init_net is first
  2. netns address ascending order

Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL
with LOCKDEP so that we can carefully add the extra mutex without slowing
down RTNL operations during conversion.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 15:16:59 +02:00
Kuniyuki Iwashima
ec763c234d Revert "rtnetlink: add guard for RTNL"
This reverts commit 464eb03c4a.

Once we have a per-netns RTNL, we won't use guard(rtnl).

Also, there's no users for now.

  $ grep -rnI "guard(rtnl" || true
  $

Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/CANn89i+KoYzUH+VPLdGmLABYf5y4TW0hrM4UAeQQJ9AREty0iw@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 15:16:59 +02:00
Paolo Abeni
f178812d74 Merge branch 'net-fec-add-pps-channel-configuration'
Francesco Dolcini says:

====================
net: fec: add PPS channel configuration

Make the FEC Ethernet PPS channel configurable from device tree.
====================

Link: https://patch.msgid.link/20241004152419.79465-1-francesco@dolcini.it
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:29:37 +02:00
Francesco Dolcini
566c2d8388 net: fec: make PPS channel configurable
Depending on the SoC where the FEC is integrated into the PPS channel
might be routed to different timer instances. Make this configurable
from the devicetree.

When the related DT property is not present fallback to the previous
default and use channel 0.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Tested-by: Rafael Beims <rafael.beims@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Csókás, Bence <csokas.bence@prolan.hu>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:29:34 +02:00
Francesco Dolcini
bf8ca67e21 net: fec: refactor PPS channel configuration
Preparation patch to allow for PPS channel configuration, no functional
change intended.

Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Csókás, Bence <csokas.bence@prolan.hu>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:29:34 +02:00
Francesco Dolcini
1aa772be04 dt-bindings: net: fec: add pps channel property
Add fsl,pps-channel property to select where to connect the PPS signal.
This depends on the internal SoC routing and on the board, for example
on the i.MX8 SoC it can be connected to an external pin (using channel 1)
or to internal eDMA as DMA request (channel 0).

Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:29:34 +02:00
Paolo Abeni
5d6a8aeabd Merge branch 'net-sparx5-prepare-for-lan969x-switch-driver'
Daniel Machon says:

====================
net: sparx5: prepare for lan969x switch driver

== Description:

This series is the first of a multi-part series, that prepares and adds
support for the new lan969x switch driver.

The upstreaming efforts is split into multiple series (might change a
bit as we go along):

    1) Prepare the Sparx5 driver for lan969x (this series)
    2) Add support lan969x (same basic features as Sparx5 provides +
       RGMII, excl.  FDMA and VCAP)
    3) Add support for lan969x FDMA
    4) Add support for lan969x VCAP

== Lan969x in short:

The lan969x Ethernet switch family [1] provides a rich set of
switching features and port configurations (up to 30 ports) from 10Mbps
to 10Gbps, with support for RGMII, SGMII, QSGMII, USGMII, and USXGMII,
ideal for industrial & process automation infrastructure applications,
transport, grid automation, power substation automation, and ring &
intra-ring topologies. The LAN969x family is hardware and software
compatible and scalable supporting 46Gbps to 102Gbps switch bandwidths.

== Preparing Sparx5 for lan969x:

The lan969x switch chip reuses many of the IP's of the Sparx5 switch
chip, therefore it has been decided to add support through the existing
Sparx5 driver, in order to avoid a bunch of duplicate code. However, in
order to reuse the Sparx5 switch driver, we have to introduce some
mechanisms to handle the chip differences that are there.  These
mechanisms are:

    - Platform match data to contain all the differences that needs to
      be handled (constants, ops etc.)

    - Register macro indirection layer so that we can reuse the existing
      register macros.

    - Function for branching out on platform type where required.

In some places we ops out functions and in other places we branch on the
chip type. Exactly when we choose one over the other, is an estimate in
each case.

After this series is applied, the Sparx5 driver will be prepared for
lan969x and still function exactly as before.

== Patch breakdown:

Patch #1        adds private match data

Patch #2        adds register macro indirection layer

Patch #3-#4     does some preparation work

Patch #5-#7     adds chip constants and updates the code to use them

Patch #8-#13    adds and uses ops for handling functions differently on the
                two platforms.

Patch #14       adds and uses a macro for branching out on the chip type.

Patch #15 (NEW) redefines macros for internal ports and PGID's.

[1] https://www.microchip.com/en-us/product/lan9698

To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Lars Povlsen <lars.povlsen@microchip.com>
To: Steen Hegelund <Steen.Hegelund@microchip.com>
To: horatiu.vultur@microchip.com
To: jensemil.schulzostergaard@microchip.com
To: UNGLinuxDriver@microchip.com
To: Richard Cochran <richardcochran@gmail.com>
To: horms@kernel.org
To: justinstitt@google.com
To: gal@nvidia.com
To: aakash.r.menon@gmail.com
To: jacob.e.keller@intel.com
To: ast@fiberby.net
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================

Link: https://patch.msgid.link/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:07 +02:00
Daniel Machon
8cc4102363 net: sparx5: redefine internal ports and PGID's as offsets
Internal ports and PGID's are both defined relative to the number of
front ports on Sparx5. This will not work on lan969x. Instead make them
offsets to the number of front ports and add two helpers to retrieve
them. Use the helpers throughout.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:02 +02:00
Daniel Machon
4b67bcb909 net: sparx5: add is_sparx5 macro and use it throughout
We dont want to ops out each time a function needs to do some platform
specifics. In particular we have a few places, where it would be
convenient to just branch out on the platform type. Add the function
is_sparx5() and, initially, use it for:

    - register writes that should only be done on Sparx5 (QSYS_CAL_CTRL,
      CLKGEN_LCPLL1_CORE_CLK).

    - function calls that should only be done on Sparx5
      (ethtool_op_get_ts_info())

    - register writes that are chip-exclusive (MASK_CFG1/2, PGID_CFG1/2,
      these are replicated for n_ports >32 on Sparx5).

The is_sparx5() function simply checks the target chip type, to
determine if this is a Sparx5 SKU or not.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:01 +02:00
Daniel Machon
a0dd890682 net: sparx5: ops out function for DSM calendar calculation
The DSM (Disassembler) calendar grants each port access to internal
busses. The configuration of the calendar is done differently on Sparx5
and lan969x. Therefore ops out the function that calculates the
calendar.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:01 +02:00
Daniel Machon
8c274d6909 net: sparx5: ops out PTP IRQ handler
The PTP registers are located in two different register targets on
Sparx5 and lan969x. We can't handle this with the register macros, so
ops out the handler.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:01 +02:00
Daniel Machon
b7e09ddb67 net: sparx5: ops out function for setting the port mux
Port muxing is configured based on the supported port modes. As these
modes can differ on Sparx5 and lan969x we ops out the port muxing
function.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:01 +02:00
Daniel Machon
beb36b5071 net: sparx5: ops out functions for getting certain array values
Add getters for getting values in arrays: sdlb_groups and
sparx5_hsch_max_group_rate and ops out the getters, as these arrays will
differ on lan969x.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:01 +02:00
Daniel Machon
20f8bc8755 net: sparx5: ops out chip port to device index/bit functions
The chip port device index and mode bit can be obtained using the port
number.  However the mapping of port number to chip device index and
mode bit differs on Sparx5 and lan969x. Therefore ops out the function.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:01 +02:00
Daniel Machon
048c96907c net: sparx5: add ops to match data
Add new struct sparx5_ops, containing functions that needs to be
different as the implementation differs on Sparx5 and lan969x. Initially
we add functions for checking the port type (2g5, 5g, 10g or 25g) based
on the port number. Update the code to use the ops instead of the
platform specific functions.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08 12:07:01 +02:00