Commit Graph

1383511 Commits

Author SHA1 Message Date
Heiner Kallweit
0625b3bfbb net: phy: fixed_phy: remove member no_carrier from struct fixed_phy
After the recent removal of GPIO support, member no_carrier is no
longer needed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:11:50 -07:00
Heiner Kallweit
fecf7087f0 net: phy: fixed_phy: remove unused interrupt support
The two callers of __fixed_phy_add() both pass PHY_POLL, so we can
remove the irq argument to simplify the function.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:11:47 -07:00
Alok Tiwari
d436b5abba ipv4: udp: fix typos in comments
Correct typos in ipv4/udp.c comments for clarity:
"Encapulation" -> "Encapsulation"
"measureable" -> "measurable"
"tacking care" -> "taking care"

No functional changes.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907192535.3610686-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 16:29:05 -07:00
Jakub Kicinski
1c0353a6df selftests: net: speed up pmtu.sh by avoiding unnecessary cleanup
The pmtu test takes nearly an hour when run on a debug kernel
(10 minutes on a normal kernel, so the debug slowdown is quite significant).
NIPA tries to ensure all results are delivered by a certain deadline,
so this runtime prevents it from retrying the test in case of a flake.

It looks like one of the slowest operations in the test is calling out
to ./openvswitch/ovs-dpctl.py to remove potential leftover OvS interfaces.
First check in sysfs whether the interfaces exist at all; this can be
done directly in bash, so it is very fast.

This should save us around 20-30% of the test runtime.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250906214535.3204785-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 16:26:44 -07:00
Jakub Kicinski
a12fd5c31b selftests: net: run groups from fcnal-test in parallel
fcnal-test.sh takes almost an hour and a half to finish.
The tests are already grouped into ipv4, ipv6 and other.
Run those groups separately.

Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250908201021.270681-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 15:34:11 -07:00
Jakub Kicinski
3b4296f589 Merge tag 'mlx5-rs-fec-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:

====================
mlx5-next updates 2025-09-09

The following pull-request contains a common mlx5 update.

* tag 'mlx5-rs-fec-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add RS FEC histogram infrastructure
====================

Link: https://patch.msgid.link/1757413460-539097-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 09:03:51 -07:00
Jakub Kicinski
0574c27cbe eth: fbnic: support persistent NAPI config
There are no shenanigans in this driver, AFAIU, so just pass the vector
index to NAPI registration.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250905022254.2635707-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 16:03:27 +02:00
Brett A C Sheffield
aeb8d48ea9 selftests: net: add test for ipv6 fragmentation
Add selftest for the IPv6 fragmentation regression which affected
several stable kernels.

Commit a18dfa9925 ("ipv6: save dontfrag in cork") was backported to
stable without some prerequisite commits.  This caused a regression when
sending IPv6 UDP packets by preventing fragmentation and instead
returning -1 (EMSGSIZE).

Add selftest to check for this issue by attempting to send a packet
larger than the interface MTU. The packet will be fragmented on a
working kernel, with sendmsg(2) correctly returning the expected number
of bytes sent.  When the regression is present, sendmsg returns -1 and
sets errno to EMSGSIZE.

Link: https://lore.kernel.org/stable/aElivdUXqd1OqgMY@karahi.gladserv.com
Signed-off-by: Brett A C Sheffield <bacs@librecast.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250903154925.13481-1-bacs@librecast.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 11:43:59 +02:00
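The check described in the IPv6 fragmentation selftest above boils down to comparing the return value of sendmsg(2) with the requested length. The C sketch below is illustrative only and is not the selftest's own code: `send_one_datagram()` and `enum send_result` are hypothetical names, and it assumes the caller has already created and connected the UDP socket (for the regression itself, an IPv6 UDP socket whose path MTU is smaller than `len`).

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Outcome of sending one datagram (hypothetical classification). */
enum send_result {
	SEND_OTHER = -1,    /* short send or any other error */
	SEND_OK = 0,        /* sendmsg() returned the full length */
	SEND_EMSGSIZE = 1,  /* sendmsg() returned -1 with errno == EMSGSIZE */
};

/* Send len bytes from buf as a single datagram on an already connected
 * socket and classify the result. */
static enum send_result send_one_datagram(int fd, const void *buf, size_t len)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg;
	ssize_t ret;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	ret = sendmsg(fd, &msg, 0);
	if (ret >= 0 && (size_t)ret == len)
		return SEND_OK;
	if (ret == -1 && errno == EMSGSIZE)
		return SEND_EMSGSIZE;
	return SEND_OTHER;
}
```

On a fixed kernel the oversized IPv6 datagram is fragmented and the call yields SEND_OK; with the regression present it yields SEND_EMSGSIZE.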
Hangbin Liu
d67ca09ca3 hsr: use netdev_master_upper_dev_link() when linking lower ports
Unlike VLAN devices, HSR changes the lower device’s rx_handler, which
prevents the lower device from being attached to another master.
Switch to using netdev_master_upper_dev_link() when setting up the lower
device.

This should improve the user experience, since ip link will now display
the HSR device as the master for its ports.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902065558.360927-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 11:27:37 +02:00
Paolo Abeni
c27334aef6 Merge branch 'bonding-support-aggregator-selection-based-on-port-priority'
Hangbin Liu says:

====================
bonding: support aggregator selection based on port priority

This patchset introduces a new per-port bonding option: `ad_actor_port_prio`.

It allows users to configure the actor's port priority, which can then be used
by the bonding driver for aggregator selection based on port priority.

This provides finer control over LACP aggregator choice, especially in setups
with multiple eligible aggregators over 2 switches.
====================

Link: https://patch.msgid.link/20250902064501.360822-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:05 +02:00
Hangbin Liu
c2377f1763 selftests: bonding: add test for LACP actor port priority
Add a comprehensive selftest to verify:
- Per-port actor priority setting via ad_actor_port_prio
- Aggregator selection behavior with port_priority ad_select policy

Also move cmd_jq helper from forwarding/lib.sh to net/lib.sh for
broader reusability across network selftests.

Here is the test output:
  # ./bond_lacp_prio.sh
  TEST: bond 802.3ad (ad_actor_port_prio setting)                     [ OK ]
  TEST: bond 802.3ad (ad_actor_port_prio select)                      [ OK ]
  TEST: bond 802.3ad (ad_actor_port_prio switch)                      [ OK ]

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-4-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:02 +02:00
Hangbin Liu
e5a6643435 bonding: support aggregator selection based on port priority
Add a new ad_select policy 'port_priority' that uses the per-port
actor priority values (set via ad_actor_port_prio) to determine
aggregator selection.

This allows administrators to influence which ports are preferred
for aggregation by assigning different priority values, providing
more flexible load balancing control in LACP configurations.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:02 +02:00
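To make the idea of priority-based aggregator selection concrete, here is a toy model in C. It is not the bonding driver's algorithm: the rule used (each aggregator is ranked by its numerically lowest actor port priority, with lower values assumed to win as is usual for LACP priorities, and ties keeping the earlier aggregator) and all names (`struct toy_agg`, `toy_select_agg()`) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy aggregator: the actor port priorities of its member ports. */
struct toy_agg {
	const uint16_t *port_prio;
	size_t nports;
};

/* Best (numerically lowest) port priority in an aggregator; UINT32_MAX
 * if it has no ports, so an empty aggregator never wins. */
static uint32_t toy_agg_best_prio(const struct toy_agg *agg)
{
	uint32_t best = UINT32_MAX;

	for (size_t i = 0; i < agg->nports; i++)
		if (agg->port_prio[i] < best)
			best = agg->port_prio[i];
	return best;
}

/* Index of the selected aggregator, or -1 if none has ports.
 * Ties keep the earlier aggregator. */
static int toy_select_agg(const struct toy_agg *aggs, size_t naggs)
{
	int sel = -1;
	uint32_t sel_prio = UINT32_MAX;

	for (size_t i = 0; i < naggs; i++) {
		uint32_t p = toy_agg_best_prio(&aggs[i]);

		if (p < sel_prio) {
			sel_prio = p;
			sel = (int)i;
		}
	}
	return sel;
}
```

Lowering one port's priority value is enough, in this model, to steer selection toward the aggregator that contains it.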
Hangbin Liu
6b6dc81ee7 bonding: add support for per-port LACP actor priority
Introduce a new netlink attribute 'actor_port_prio' to allow setting
the LACP actor port priority on a per-slave basis. This extends the
existing bonding infrastructure to support more granular control over
LACP negotiations.

The priority value is embedded in LACPDU packets and will be used by
subsequent patches to influence aggregator selection policies.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:02 +02:00
Carolina Jubran
ff97bc38be net/mlx5: Add RS FEC histogram infrastructure
Define the Ports Phy Histogram Configuration Register (PPHCR) to expose
RS-FEC histogram bin ranges, and expose a new counter group in the Ports
Performance Counters Register (PPCNT) to report the corresponding
histogram values.

Co-developed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1756884600-520195-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-09 04:18:19 -04:00
Paolo Abeni
389cca2bde Merge branch 'support-exposing-raw-cycle-counters-in-ptp-and-mlx5'
Tariq Toukan says:

====================
Support exposing raw cycle counters in PTP and mlx5

This series by Carolina adds support in ptp and usage in mlx5 for
exposing the raw free-running cycle counter of PTP hardware clocks.

This is V2. Find previous one here:
https://lore.kernel.org/all/1752556533-39218-1-git-send-email-tariqt@nvidia.com/

Find detailed description by Carolina below [1].

[1]
This patch series introduces support for exposing the raw free-running
cycle counter of PTP hardware clocks. When the device is in free-running
mode, it emits timestamps as raw cycle values instead of nanoseconds.
These values may be passed directly to user space through:

- fwctl: exposes internal device event records that include raw
         cycle-based timestamps.

- DPDK: retrieves CQEs that contain raw cycle counters, which are passed
        to user space unmodified.

To address this, the series introduces two new ioctl commands that allow
userspace to query the device's raw cycle counter together with host
time:

 - PTP_SYS_OFFSET_PRECISE_CYCLES

 - PTP_SYS_OFFSET_EXTENDED_CYCLES

These commands work like their existing counterparts but return the
device timestamp in cycle units instead of real-time nanoseconds.  This
allows user space to collect (cycle, time) pairs and build a mapping
between the device’s free-running clock and host time.

This can also be useful in the XDP fast path: if a driver inserts the
raw cycle value into metadata instead of a real-time timestamp, it can
avoid the overhead of converting cycles to time in the kernel. Then
userspace can resolve the cycle-to-time mapping using this ioctl when
needed.

The ioctl enables user space to correlate these raw cycle values with
host time, without requiring the PHC to be synchronized, as long as the
drift remains stable during collection.

Adds the new PTP ioctls and integrates support in ptp_ioctl():
- ptp: Add ioctl commands to expose raw cycle counter values

Support for exposing raw cycles in mlx5:
- net/mlx5: Extract MTCTR register read logic into helper function
- net/mlx5: Support getcyclesx and getcrosscycles
====================

Link: https://patch.msgid.link/1755008228-88881-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:26 +02:00
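The cover letter above describes collecting (cycle, time) pairs and building a mapping between the free-running device clock and host time. A minimal C sketch of one way to do that is a least-squares line fit; this is an assumption for illustration, not what the series or any user-space tool necessarily does, and obtaining the samples (for example from repeated PTP_SYS_OFFSET_EXTENDED_CYCLES queries) is left out. `struct cyc_sample`, `cyc_map_fit()` and `cyc_to_ns()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One (device cycles, host nanoseconds) pair. */
struct cyc_sample {
	uint64_t cycles;
	int64_t host_ns;
};

/* host_ns ~= base_ns + offset_ns + slope * (cycles - base_cycles) */
struct cyc_map {
	uint64_t base_cycles;
	int64_t base_ns;
	double offset_ns;
	double slope; /* nanoseconds per cycle */
};

/* Least-squares line through the samples, computed relative to the first
 * sample so the doubles stay small. Returns 0 on success, -1 otherwise. */
static int cyc_map_fit(const struct cyc_sample *s, size_t n, struct cyc_map *m)
{
	double sx = 0, sy = 0, sxx = 0, sxy = 0, den;

	if (n < 2)
		return -1;
	for (size_t i = 0; i < n; i++) {
		double x = (double)(int64_t)(s[i].cycles - s[0].cycles);
		double y = (double)(s[i].host_ns - s[0].host_ns);

		sx += x;
		sy += y;
		sxx += x * x;
		sxy += x * y;
	}
	den = (double)n * sxx - sx * sx;
	if (den == 0)
		return -1; /* all samples at the same cycle value */
	m->base_cycles = s[0].cycles;
	m->base_ns = s[0].host_ns;
	m->slope = ((double)n * sxy - sx * sy) / den;
	m->offset_ns = (sy - m->slope * sx) / (double)n;
	return 0;
}

/* Convert a raw cycle value (e.g. from a CQE or XDP metadata) to host ns. */
static int64_t cyc_to_ns(const struct cyc_map *m, uint64_t cycles)
{
	double delta = (double)(int64_t)(cycles - m->base_cycles);

	return m->base_ns + (int64_t)(m->offset_ns + m->slope * delta);
}
```

The fit is only as good as the assumption that drift stays stable while samples are collected, so in practice the map would be refreshed periodically.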
Carolina Jubran
a3fb485505 net/mlx5: Support getcyclesx and getcrosscycles
Implement the getcyclesx64 and getcrosscycles callbacks in ptp_info to
expose the device’s raw free-running counter.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1755008228-88881-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:24 +02:00
Carolina Jubran
96c345c3c5 net/mlx5: Extract MTCTR register read logic into helper function
Refactor the MTCTR register reading logic into a dedicated helper to
lay the groundwork for the next patch.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1755008228-88881-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:24 +02:00
Carolina Jubran
faf23f54d3 ptp: Add ioctl commands to expose raw cycle counter values
Introduce two new ioctl commands, PTP_SYS_OFFSET_PRECISE_CYCLES and
PTP_SYS_OFFSET_EXTENDED_CYCLES, to allow user space to access the
raw free-running cycle counter from PTP devices.

These ioctls are variants of the existing PRECISE and EXTENDED
offset queries, but instead of returning device time in realtime,
they return the raw cycle counter value.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/1755008228-88881-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:24 +02:00
Håkon Bugge
9f0730b063 rds: ib: Remove unused extern definition
In the old days, RDS used FMR (Fast Memory Registration) to register
IB MRs to be used by RDMA. A newer and better verbs based
registration/de-registration method called FRWR (Fast Registration
Work Request) was added to RDS by commit 1659185fb4 ("RDS: IB:
Support Fastreg MR (FRMR) memory registration mode") in 2016.

Detection and enablement of FRWR was done in commit 2cb2912d65
("RDS: IB: add Fastreg MR (FRMR) detection support"). However, that
commit added an extern bool prefer_frmr, which was used neither by that
commit nor by any later commit. Hence, remove it.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250905101958.4028647-1-haakon.bugge@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:16:49 -07:00
Jakub Kicinski
6e0cca6ba3 Merge branch 'net-stmmac-mdio-cleanups'
Russell King says:

====================
net: stmmac: mdio cleanups

Clean up the stmmac MDIO code:
- provide an address register formatter to avoid repeated code
- provide a common function to wait for the busy bit to clear
- pre-compute the CR field (mdio clock divider)
- move address formatter into read/write functions
- combine the read/write functions into a common accessor function
- move runtime PM handling into common accessor function
- rename register constants to better reflect manufacturer names
- move stmmac_clk_csr_set() into stmmac_mdio
- make stmmac_clk_csr_set() return the CR field value and remove
  priv->clk_csr
- clean up if() range tests in stmmac_clk_csr_set()
- use STMMAC_CSR_xxx definitions in initialisers

For Qualcomm QCS9100 Ride R3 board with the AQR115C PHY:

Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
====================

Link: https://patch.msgid.link/aLmBwsMdW__XBv7g@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:10 -07:00
Russell King (Oracle)
fc8f62c827 net: stmmac: use STMMAC_CSR_xxx definitions in platform glue
Use the STMMAC_CSR_xxx definitions to initialise plat->clk_csr in the
platform glue drivers to make the integer values meaningful.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oh-00000001vpT-0vk2@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:03 -07:00
Russell King (Oracle)
78c91bec8f net: stmmac: mdio: remove redundant clock rate tests
The pattern:

	... if (v < A)
		...
	else if (v >= A && v < B)
		...

can be simplified to:

	... if (v < A)
		...
	else if (v < B)
		...

which makes the if/else chain more readable.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oc-00000001vpN-0S1A@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:03 -07:00
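A small C illustration of the simplified chain from the stmmac commit above: because each `else if` runs only when all earlier conditions were false, the lower bound is already implied and need not be retested. The band names, limits and `pick_csr()` are hypothetical and do not reproduce the driver's STMMAC_CSR_xxx values.

```c
/* Hypothetical clock bands; not the driver's real divider codes. */
enum csr_band {
	CSR_BAND_LT_35M,
	CSR_BAND_35_60M,
	CSR_BAND_60_100M,
	CSR_BAND_100_150M,
	CSR_BAND_GE_150M,
};

static enum csr_band pick_csr(unsigned long clk_rate)
{
	enum csr_band band;

	if (clk_rate < 35000000)
		band = CSR_BAND_LT_35M;
	else if (clk_rate < 60000000)	/* clk_rate >= 35 MHz is implied */
		band = CSR_BAND_35_60M;
	else if (clk_rate < 100000000)	/* clk_rate >= 60 MHz is implied */
		band = CSR_BAND_60_100M;
	else if (clk_rate < 150000000)	/* clk_rate >= 100 MHz is implied */
		band = CSR_BAND_100_150M;
	else
		band = CSR_BAND_GE_150M;

	return band;
}
```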
Russell King (Oracle)
231e2b016f net: stmmac: mdio: return clk_csr value from stmmac_clk_csr_set()
Return the clk_csr value from stmmac_clk_csr_set() rather than
using priv->clk_csr, as this struct member now serves very little
purpose. This allows us to remove priv->clk_csr.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oW-00000001vpH-46zf@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:03 -07:00
Russell King (Oracle)
661a868937 net: stmmac: mdio: move initialisation of priv->clk_csr to stmmac_mdio
The only user of priv->clk_csr is the MDIO code, so move its
initialisation to stmmac_mdio.c.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oR-00000001vpB-3fbY@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:03 -07:00
Russell King (Oracle)
3581acbb78 net: stmmac: mdio: improve mdio register field definitions
Include the register name in the definitions, and use a name which
more closely resembles that used in documentation, while still being
descriptive.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oM-00000001vp4-3DC5@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:03 -07:00
Russell King (Oracle)
9b88194a3b net: stmmac: mdio: move runtime PM into stmmac_mdio_access()
Move the runtime PM handling into the common stmmac_mdio_access()
function, rather than having it in the four top-level bus access
functions.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oH-00000001voy-2jfU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:02 -07:00
Russell King (Oracle)
9b0ed33a42 net: stmmac: mdio: merge stmmac_mdio_read() and stmmac_mdio_write()
stmmac_mdio_read() and stmmac_mdio_write() are virtually identical
except for the final read in stmmac_mdio_read(). Handle this
difference with a flag.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oC-00000001vos-2JnA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:02 -07:00
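The merged accessor can be sketched in C against a simulated bus. Everything here is a toy (`struct toy_mdio`, `toy_mdio_access()`, and the instantly completing "hardware" in `toy_mdio_execute()` are assumptions); it only shows the structure of one function where a `read` flag selects whether the data register is loaded before the transaction or read back after it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy MAC-side data register plus the register file of one simulated PHY. */
struct toy_mdio {
	uint32_t data;
	uint16_t phy_regs[32];
};

/* Stand-in for the hardware completing a transaction instantly. */
static void toy_mdio_execute(struct toy_mdio *bus, int regnum, bool write)
{
	if (write)
		bus->phy_regs[regnum] = (uint16_t)bus->data;
	else
		bus->data = bus->phy_regs[regnum];
}

/* Common accessor: reads and writes share everything except loading the
 * data register first (write) or reading it back at the end (read). */
static int toy_mdio_access(struct toy_mdio *bus, int regnum, bool read,
			   uint16_t value)
{
	if (regnum < 0 || regnum > 31)
		return -EINVAL;

	if (!read)
		bus->data = value;

	/* Shared steps (address formatting, busy wait, runtime PM) would
	 * live here once instead of in each top-level function. */
	toy_mdio_execute(bus, regnum, !read);

	if (read)
		return (int)(bus->data & 0xffff);
	return 0;
}

static int toy_mdio_read(struct toy_mdio *bus, int regnum)
{
	return toy_mdio_access(bus, regnum, true, 0);
}

static int toy_mdio_write(struct toy_mdio *bus, int regnum, uint16_t value)
{
	return toy_mdio_access(bus, regnum, false, value);
}
```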
Russell King (Oracle)
6cb3d67ad6 net: stmmac: mdio: move stmmac_mdio_format_addr() into read/write
Move stmmac_mdio_format_addr() into stmmac_mdio_read() and
stmmac_mdio_write().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8o7-00000001vom-1pN8@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:02 -07:00
Russell King (Oracle)
6717746f33 net: stmmac: mdio: provide priv->gmii_address_bus_config
Provide a pre-formatted value for the MDIO address register fields
which remain constant across the various different transactions
rather than recreating the register value from scratch every time.
Currently, we only do this for the CR (clock range) field.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8o2-00000001vog-1LyK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:02 -07:00
Russell King (Oracle)
9eb633ad1d net: stmmac: mdio: provide stmmac_mdio_wait()
All the readl_poll_timeout()s follow the same pattern: test a register
for a bit being clear every 100us, and time out after 10ms, returning
-EBUSY. Wrap this up into a function to avoid duplicating it.

This slightly changes the return value for stmmac_mdio_write() if the
second readl_poll_timeout() fails - rather than returning -ETIMEDOUT
we return -EBUSY matching the stmmac_mdio_read() behaviour.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8nx-00000001voa-0tJ0@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:02 -07:00
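A user-space C sketch of the wait helper's polling pattern (check every `interval_us`, give up after `timeout_us` with -EBUSY). It is an approximation, not stmmac_mdio_wait() itself: elapsed time is estimated by counting sleep intervals rather than reading a clock, and the register is reached through a callback so the helper can be tried without hardware. `mdio_wait_bit_clear()` and `struct fake_busy_reg` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll until (read_reg(ctx) & mask) == 0, sleeping interval_us between
 * polls. Returns 0 once clear, or -EBUSY after about timeout_us.
 * Elapsed time is approximated by counting sleeps, not by reading a clock. */
static int mdio_wait_bit_clear(reg_read_fn read_reg, void *ctx, uint32_t mask,
			       unsigned int interval_us, unsigned int timeout_us)
{
	struct timespec delay;
	unsigned int waited = 0;

	if (interval_us == 0)
		interval_us = 1; /* avoid polling without making progress */
	delay.tv_sec = interval_us / 1000000;
	delay.tv_nsec = (long)(interval_us % 1000000) * 1000;

	for (;;) {
		if (!(read_reg(ctx) & mask))
			return 0;
		if (waited >= timeout_us)
			return -EBUSY;
		nanosleep(&delay, NULL);
		waited += interval_us;
	}
}

/* Simulated register whose busy bit (bit 0) clears after N reads. */
struct fake_busy_reg {
	unsigned int reads_until_clear;
};

static uint32_t fake_busy_read(void *ctx)
{
	struct fake_busy_reg *r = ctx;

	if (r->reads_until_clear == 0)
		return 0;
	r->reads_until_clear--;
	return 1;
}
```

Calling it with `interval_us = 100` and `timeout_us = 10000` mirrors the 100us/10ms values from the commit message.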
Russell King (Oracle)
16e03235d5 net: stmmac: mdio: provide address register formatter
Rather than duplicating the logic for filling the PA (MDIO address),
GR (MDIO register/devad), CR (clock range) and GB (busy) fields of the
address register in four locations, provide a helper to do this.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8ns-00000001voU-0S7b@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:02 -07:00
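The formatter idea, combined with the pre-computed bus configuration described earlier in this log, can be sketched in C as follows. The field layout (GB at bit 0, CR at bits 5:2, GR at bits 10:6, PA at bits 15:11) is assumed from the classic GMAC GMII address register; other stmmac cores differ, and the helper and macro names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of the GMII address register (classic GMAC). */
#define MII_ADDR_GB        (1u << 0)                    /* busy */
#define MII_ADDR_CR_SHIFT  2
#define MII_ADDR_CR_MASK   (0xfu << MII_ADDR_CR_SHIFT)  /* clock range */
#define MII_ADDR_GR_SHIFT  6
#define MII_ADDR_GR_MASK   (0x1fu << MII_ADDR_GR_SHIFT) /* register/devad */
#define MII_ADDR_PA_SHIFT  11
#define MII_ADDR_PA_MASK   (0x1fu << MII_ADDR_PA_SHIFT) /* PHY address */

/* Computed once: fields that are the same for every transaction
 * (here only CR). */
static uint32_t mii_bus_config(uint32_t clk_csr)
{
	return (clk_csr << MII_ADDR_CR_SHIFT) & MII_ADDR_CR_MASK;
}

/* Per-transaction formatter: add PA, GR and the busy bit to the
 * pre-computed bus configuration. */
static uint32_t mii_format_addr(uint32_t bus_config, unsigned int pa,
				unsigned int gr)
{
	uint32_t v = bus_config;

	v |= ((uint32_t)pa << MII_ADDR_PA_SHIFT) & MII_ADDR_PA_MASK;
	v |= ((uint32_t)gr << MII_ADDR_GR_SHIFT) & MII_ADDR_GR_MASK;
	return v | MII_ADDR_GB;
}
```

For example, with a CR value of 4, PHY address 1 and register 2, the formatted value is 0x891.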
Jakub Kicinski
144d0b1c45 Merge branch 'ipv6-snmp-avoid-performance-issue-with-ratelimithost'
Eric Dumazet says:

====================
ipv6: snmp: avoid performance issue with RATELIMITHOST

Addition of ICMP6_MIB_RATELIMITHOST in commit d0941130c9
("icmp: Add counters for rate limits") introduced a performance drop
in case of DoS (like receiving UDP packets to closed ports).

Per-netns ICMP6_MIB_RATELIMITHOST tracking uses per-cpu storage and
is enough; we do not need slow per-device tracking for this metric.

In v2 of this series, I completed the removal of SNMP_MIB_SENTINEL
across the whole kernel for consistency.

v1: https://lore.kernel.org/20250904092432.113c4940@kernel.org
====================

Link: https://patch.msgid.link/20250905165813.1470708-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:24 -07:00
Eric Dumazet
20d3d26815 net: snmp: remove SNMP_MIB_SENTINEL
SNMP_MIB_SENTINEL has no users left, so we can remove it.

Also remove snmp_get_cpu_field[64]_batch() helpers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:21 -07:00
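The SNMP_MIB_SENTINEL removals in this log replace "walk until a NULL name" with ARRAY_SIZE(). The user-space C mirror below contrasts the two; `struct snmp_mib`, `SNMP_MIB_ITEM()` and `ARRAY_SIZE()` are redefined locally to resemble the kernel's, and the item names and indices are illustrative only. The `_Static_assert` shows the practical benefit: the length is a compile-time constant.

```c
#include <stddef.h>

/* Local mirror of the kernel's snmp_mib item: a name and a counter index. */
struct snmp_mib {
	const char *name;
	int entry;
};

#define SNMP_MIB_ITEM(_name, _entry) { .name = _name, .entry = _entry }
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Old style: a NULL-name sentinel marks the end, so the length is only
 * discovered by walking the array at run time. */
static const struct snmp_mib with_sentinel[] = {
	SNMP_MIB_ITEM("InDatagrams", 1),
	SNMP_MIB_ITEM("NoPorts", 2),
	SNMP_MIB_ITEM("InErrors", 3),
	{ .name = NULL, .entry = 0 },
};

static size_t count_until_sentinel(const struct snmp_mib *m)
{
	size_t n = 0;

	while (m[n].name)
		n++;
	return n;
}

/* New style: no sentinel; ARRAY_SIZE() gives the bound at compile time. */
static const struct snmp_mib no_sentinel[] = {
	SNMP_MIB_ITEM("InDatagrams", 1),
	SNMP_MIB_ITEM("NoPorts", 2),
	SNMP_MIB_ITEM("InErrors", 3),
};

_Static_assert(ARRAY_SIZE(no_sentinel) == 3, "length known at compile time");

static int sum_entries(const struct snmp_mib *m, size_t n)
{
	int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += m[i].entry;
	return sum;
}
```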
Eric Dumazet
c73d583e70 xfrm: snmp: do not use SNMP_MIB_SENTINEL anymore
Use ARRAY_SIZE(), so that we know the limit at compile time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:21 -07:00
Eric Dumazet
3a951f9520 tls: snmp: do not use SNMP_MIB_SENTINEL anymore
Use ARRAY_SIZE(), so that we know the limit at compile time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:21 -07:00
Eric Dumazet
52a33cae6a sctp: snmp: do not use SNMP_MIB_SENTINEL anymore
Use ARRAY_SIZE(), so that we know the limit at compile time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20250905165813.1470708-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:21 -07:00
Eric Dumazet
35cb2da0ab mptcp: snmp: do not use SNMP_MIB_SENTINEL anymore
Use ARRAY_SIZE(), so that we know the limit at compile time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mat Martineau <martineau@kernel.org>
Cc: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250905165813.1470708-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:20 -07:00
Eric Dumazet
b7b74953f8 ipv4: snmp: do not use SNMP_MIB_SENTINEL anymore
Use ARRAY_SIZE(), so that we know the limit at compile time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:20 -07:00
Eric Dumazet
2fab94bcf3 ipv6: snmp: do not track per idev ICMP6_MIB_RATELIMITHOST
The blamed commit added critical false sharing on a single
atomic_long_t under DoS, such as when receiving UDP packets
to closed ports.

Per-netns ICMP6_MIB_RATELIMITHOST tracking uses per-cpu
storage and is enough; we do not need slow per-device tracking.

Fixes: d0941130c9 ("icmp: Add counters for rate limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Cc: Abhishek Rawal <rawal.abhishek92@gmail.com>
Link: https://patch.msgid.link/20250905165813.1470708-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:20 -07:00
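The fix above relies on per-CPU storage instead of a single shared atomic. A simplified C sketch of why that avoids false sharing: each CPU increments its own cache-line-aligned slot, and only readers pay for summing all slots. The 64-byte line size, `NR_TOY_CPUS` and the `toy_*` names are assumptions; unlike real per-CPU counters, the caller passes the CPU index explicitly and there is no concurrency here.

```c
#include <stdalign.h>
#include <stdint.h>

#define NR_TOY_CPUS 4
#define TOY_CACHE_LINE 64 /* assumed cache-line size in bytes */

/* One counter slot per CPU, each on its own cache line, so increments
 * from different CPUs never write to the same line. */
struct toy_percpu_counter {
	struct {
		alignas(TOY_CACHE_LINE) uint64_t val;
	} slot[NR_TOY_CPUS];
};

/* Hot path: touch only the local CPU's slot, no shared atomic. */
static void toy_counter_inc(struct toy_percpu_counter *c, unsigned int cpu)
{
	c->slot[cpu % NR_TOY_CPUS].val++;
}

/* Slow path, taken only by readers: fold all slots together. */
static uint64_t toy_counter_read(const struct toy_percpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < NR_TOY_CPUS; i++)
		sum += c->slot[i].val;
	return sum;
}
```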
Eric Dumazet
ceac1fb229 ipv6: snmp: do not use SNMP_MIB_SENTINEL anymore
Use ARRAY_SIZE(), so that we know the limit at compile time.

Following patch needs this preliminary change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:20 -07:00
Eric Dumazet
b7fe8c1be7 ipv6: snmp: remove icmp6type2name[]
This 2KB array can be replaced by a switch() to save space.

Before:
$ size net/ipv6/proc.o
   text	   data	    bss	    dec	    hex	filename
   6410	    624	      0	   7034	   1b7a	net/ipv6/proc.o

After:
$ size net/ipv6/proc.o
   text	   data	    bss	    dec	    hex	filename
   5516	    592	      0	   6108	   17dc	net/ipv6/proc.o

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:06:20 -07:00
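Replacing the table with a switch trades a fixed 256-entry pointer array (2 KB with 8-byte pointers) for code that only covers the types that have names. A C sketch of the pattern; the subset of ICMPv6 types and the returned strings are illustrative and may not match net/ipv6/proc.c exactly.

```c
#include <stddef.h>
#include <stdint.h>

/* Map an ICMPv6 type to a counter-name fragment, or NULL if the type has
 * no dedicated name. Only a few types are shown. */
static const char *icmp6_type_name(uint8_t type)
{
	switch (type) {
	case 1:		/* destination unreachable */
		return "DestUnreachs";
	case 2:		/* packet too big */
		return "PktTooBigs";
	case 3:		/* time exceeded */
		return "TimeExcds";
	case 4:		/* parameter problem */
		return "ParmProblems";
	case 128:	/* echo request */
		return "Echos";
	case 129:	/* echo reply */
		return "EchoReplies";
	default:
		return NULL;
	}
}
```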
Alok Tiwari
abcf9f662b ixgbe: fix typo in function comment for ixgbe_get_num_per_func()
Correct a typo in the comment where "PH" was used instead of "PF".
The function returns the number of resources per PF or 0 if no PFs
are available.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Qiang Liu <liuqiang@kylinos.cn>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20250905163353.3031910-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 17:51:50 -07:00
Alok Tiwari
bd64723327 net: mctp: fix typo in comment
Correct a typo in af_mctp.c: "fist" -> "first".

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20250905165006.3032472-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 17:47:57 -07:00
Jakub Kicinski
f3883b1ea5 selftests: net: move netlink-dumps back to progs
Commit 9bb88c6596 ("selftests: net: test extacks in netlink dumps")
moved netlink-dumps from TEST_GEN_PROGS to YNL_GEN_FILES.
But _FILES are not for tests, rather for utilities / helpers.
Create YNL_GEN_PROGS and include netlink-dumps there.
This makes netlink-dumps part of executed tests, again.

Link: https://patch.msgid.link/20250906211351.3192412-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 12:57:05 -07:00
Jakub Kicinski
27bc5eaf00 selftests: net: make the dump test less sensitive to mem accounting
Recent changes to netlink socket memory accounting must
have broken the implicit assumption of the netlink-dump test
that we can fit exactly 64 dumps into the socket. Handle the
failure mode properly, and increase the dump count to 80
to make sure we still run into the error condition if
the default buffer size increases in the future.

Link: https://patch.msgid.link/20250906211351.3192412-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 12:57:05 -07:00
Jakub Kicinski
c6142e1913 Merge branch '10g-qxgmii-for-aqr412c-felix-dsa-and-lynx-pcs-driver'
Vladimir Oltean says:

====================
10G-QXGMII for AQR412C, Felix DSA and Lynx PCS driver

Introduce the first user of the "10g-qxgmii" phy-mode since it was
introduced in commit 5dfabcdd76 ("dt-bindings: net:
ethernet-controller: add 10g-qxgmii mode").

The arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso already
exists upstream, but has phy-mode = "usxgmii", which comes from the fact
that the AQR412(C) PHY does not distinguish between the two modes.
Yet, the distinction is crucial for the upcoming SerDes driver for the
LS1028A platform.

The series consists of:
- preliminary patches to the Lynx PCS and Felix DSA driver which accept
  the phy-mode and treat it like "usxgmii"
- an ad-hoc whitelisting mechanism in the Aquantia PHY driver based on
  firmware version, which was agreed upon with Marvell, and which serves
  as "detection"
- in-band auto-negotiation capability reporting and configuration. This
  makes sure this feature is enabled in the PHY, because the Lynx PCS
  only works with USXGMII/10G-QXGMII in-band autoneg enabled.

Notably, it lacks a device tree update, which will come later, but
should not be strictly necessary. The expectation is for the Aquantia
PHY driver to pick up "10g-qxgmii" with existing device trees as well,
which it does, except for the slightly confusing "configuring for
inband/usxgmii link mode" initial message. This changes to "configuring
for inband/10g-qxgmii link mode" once phylink gets a chance to pick up
the phydev->interface in its pl->link_config.interface.

$ ip link set swp3 up
mscc_felix 0000:00:00.5 swp3: configuring for inband/usxgmii link mode
mscc_felix 0000:00:00.5 swp3: phylink_mac_config: mode=inband/usxgmii/none adv=0000000,00000000,00008000,0002606c pause=04
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 0
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 1
mscc_felix 0000:00:00.5 swp3: phylink_mac_config: mode=inband/10g-qxgmii/none adv=0000000,00000000,00008000,0002606c pause=00
mscc_felix 0000:00:00.5 swp3: Link is Up - 2.5Gbps/Full - flow control off

$ ip link set swp3 down
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 0
mscc_felix 0000:00:00.5 swp3: Link is Down

$ ip link set swp3 up
mscc_felix 0000:00:00.5 swp3: configuring for inband/10g-qxgmii link mode
mscc_felix 0000:00:00.5 swp3: phylink_mac_config: mode=inband/10g-qxgmii/none adv=0000000,00000000,00008000,0002606c pause=04
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 0
mscc_felix 0000:00:00.5 swp3: phylink_phy_change: phy interface 10g-qxgmii link 1
mscc_felix 0000:00:00.5 swp3: Link is Up - 2.5Gbps/Full - flow control off
====================

Link: https://patch.msgid.link/20250903130730.2836022-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-05 19:03:42 -07:00
Vladimir Oltean
a76f26f7a8 net: phy: aquantia: support phy-mode = "10g-qxgmii" on NXP SPF-30841 (AQR412C)
The quad port PHYs (AQR4*) have 4 system interfaces, and some of them,
like AQR412C, can be used with a special firmware provisioning which
multiplexes all ports over a single host-side SerDes lane. The protocol
used over this lane is Cisco 10G-QXGMII, or "MUSX", as Aquantia
seems to call it.

One such example is the AQR412C PHY from the NXP SPF-30841 10G-QXGMII
add-in card, which uses this firmware file:
https://github.com/nxp-qoriq/qoriq-firmware-aquantia/blob/master/AQR-G3_v4.3.C-AQR_NXP_SPF-30841_MUSX_ID40019_VER1198.cld

There seems to be no disagreement, including from the Marvell FAE, that
10G-QXGMII is reported to the host over MDIO as USXGMII and is
indistinguishable from it. The same goes for the provisioning registers
based on which the firmware configures a single system interface (lane C
in the case of SPF-30841) to multiplex all ports: they are only
accessible from the firmware, or over I2C (?!).

However, the Linux MAC and especially SerDes drivers may need to know whether
the link uses 1 port per lane (USXGMII) or 4 ports per lane (10G-QXGMII).

In the downstream Layerscape SDK we have previously implemented a
simpler scheme where for certain PHY interface modes, we trust the
device tree and never let the PHY driver overwrite phydev->interface:
862694a496

but for upstream, a nicer detection method is implemented: although we
cannot distinguish USXGMII from 10G-QXGMII per se, we
create a whitelist of firmware fingerprints for which USXGMII is
translated into 10G-QXGMII. At the time of writing, it is expected that
this should only happen for the NXP SPF-30841 card, although extending
for more is trivial - just uncomment the phydev_dbg() in
aqr_build_fingerprint().
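The whitelist idea can be sketched in userspace C, assuming the 64-bit
fingerprint has already been computed. The enum, the table and
translate_iface() are hypothetical stand-ins, not the driver's
aqr_translate_interface(), and the fingerprint value is a placeholder
rather than the one for the real SPF-30841 firmware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's phy_interface_t values. */
enum iface { IFACE_SGMII, IFACE_USXGMII, IFACE_10G_QXGMII };

/* Placeholder fingerprint: NOT the value of any real firmware image. */
#define FAKE_MUSX_FINGERPRINT 0x0123456789abcdefULL

/* Firmware builds known to multiplex all ports over one SerDes lane. */
static const uint64_t qxgmii_fingerprints[] = {
	FAKE_MUSX_FINGERPRINT,
};

/* Report 10G-QXGMII instead of USXGMII only for whitelisted firmware;
 * other interface modes, and unknown firmware, pass through unchanged. */
static enum iface translate_iface(enum iface reported, uint64_t fingerprint)
{
	size_t i;

	if (reported != IFACE_USXGMII)
		return reported;

	for (i = 0; i < sizeof(qxgmii_fingerprints) / sizeof(qxgmii_fingerprints[0]); i++)
		if (qxgmii_fingerprints[i] == fingerprint)
			return IFACE_10G_QXGMII;

	return reported;
}
```

In this sketch, supporting another firmware build amounts to adding one
entry to the table, and a zero fingerprint (no firmware) never matches.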

An advantage of this method is that it doesn't strictly require updates
to arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso, since the
PHY driver will transition from "usxgmii" to "10g-qxgmii".

All aqr_translate_interface() callers have also previously called
aqr107_probe(), so dereferencing phydev->priv is safe.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250903130730.2836022-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-05 19:03:40 -07:00
Vladimir Oltean
dda916111e net: phy: aquantia: create and store a 64-bit firmware image fingerprint
Some PHY features cannot be queried through MDIO registers and require
alternative driver detection methods.

One such feature is 10G-QXGMII (4 ports of up to 2.5G multiplexed over
a single SerDes lane), or "MUSX" as it is called by Aquantia/Marvell.
The firmware has provisioning to modify some registers which seem
inaccessible for read or write over MDIO, which configure an internal
mux for MUSX. To the host, over MDIO, the system interface appears
indistinguishable from single-port-per-lane USXGMII.

Marvell FAE Ziang You recommended a detection method for this feature
based on a tuple which should hopefully identify the firmware build
uniquely. Most of the tuple items are already printed by
aqr107_chip_info(); the additional items are the misc ID (reg 1.c41d) and the
misc version (reg 1.c41e). These are auto-generated by the Marvell
firmware tool for formal builds, and should be unique (not my claim).

In addition, at least for the builds provided to NXP and redistributed
here:
https://github.com/nxp-qoriq/qoriq-firmware-aquantia/tree/master
these registers are part of the name, for example in
AQR-G3_v4.3.C-AQR_NXP_SPF-30841_MUSX_ID40019_VER1198.cld, reg 1.c41d
will contain 40019 and reg 1.c41e will contain 1198.

Note that according to commit 43429a0353 ("net: phy: aquantia: report
PHY details like firmware version"), the "chip may be functional even
w/o firmware image." In that case, we can't construct a fingerprint and
it will remain zero. That shouldn't impact the use case though.
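One way to fold such a tuple into a single 64-bit value is sketched below.
The struct, the field widths and the packing order are assumptions for
illustration only; the commit message does not spell out the driver's
actual layout. The fw_loaded flag is likewise a hypothetical way of
modelling the "no firmware image" case, where the fingerprint stays zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical firmware identification tuple (widths are assumptions). */
struct aqr_fw_id {
	bool fw_loaded;     /* false if the chip runs without a firmware image */
	uint8_t fw_major;
	uint8_t fw_minor;
	uint8_t build_id;
	uint8_t prov_id;
	uint16_t misc_id;   /* reg 1.c41d, e.g. 40019 */
	uint16_t misc_ver;  /* reg 1.c41e, e.g. 1198 */
};

/* Pack the tuple into one 64-bit value so that a whitelist lookup is a
 * single integer comparison. */
static uint64_t fw_fingerprint(const struct aqr_fw_id *id)
{
	if (!id->fw_loaded)
		return 0; /* no firmware image: fingerprint stays zero */

	return ((uint64_t)id->fw_major << 56) |
	       ((uint64_t)id->fw_minor << 48) |
	       ((uint64_t)id->build_id << 40) |
	       ((uint64_t)id->prov_id << 32) |
	       ((uint64_t)id->misc_id << 16) |
	       (uint64_t)id->misc_ver;
}
```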

Dereferencing phydev->priv should be ok in all cases: all
aqr_gen1_config_init() callers have also previously called
aqr107_probe().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250903130730.2836022-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-05 19:03:40 -07:00
Vladimir Oltean
5d59109d47 net: phy: aquantia: report and configure in-band autoneg capabilities
The Global System Configuration registers for each media-side link speed
contain bit 3, which controls auto-negotiation for the system interface.
Since bits 2:0 of the same register indicate the SerDes protocol for the
same system interface, it makes sense to filter these registers for the
SerDes protocol matching phydev->interface, and to read/write the
auto-negotiation bit.

However, experimentally, USXGMII in-band auto-negotiation is unaffected
by this bit, and instead reacts to bit 3 of register 4.C441 (PHY XS
Transmit Reserved Vendor Provisioning 2).
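The field layout described above can be decoded as in the following
sketch. The macro and function names are illustrative rather than the
driver's. Only the bit positions come from the text: bits 2:0 hold the
SerDes protocol, bit 3 the system-side autoneg enable, and bit 3 of
4.C441 governs USXGMII in-band autoneg, which was observed
experimentally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the commit; names are hypothetical. */
#define SYSCFG_SERDES_MODE_MASK  0x0007u /* bits 2:0: SerDes protocol */
#define SYSCFG_AN_BIT            0x0008u /* bit 3: system-side autoneg */
#define PHYXS_PROV2_USX_AN_BIT   0x0008u /* bit 3 of reg 4.C441 */

struct syscfg {
	unsigned int serdes_mode;
	bool aneg;
};

/* Split one per-speed Global System Configuration value into fields. */
static struct syscfg decode_syscfg(uint16_t val)
{
	struct syscfg s = {
		.serdes_mode = val & SYSCFG_SERDES_MODE_MASK,
		.aneg = (val & SYSCFG_AN_BIT) != 0,
	};
	return s;
}

/* For USXGMII, in-band autoneg follows 4.C441 bit 3 rather than the
 * per-speed syscfg bit; other protocols use the syscfg bit. */
static bool effective_inband_aneg(bool is_usxgmii, uint16_t syscfg_val,
				  uint16_t c441_val)
{
	if (is_usxgmii)
		return (c441_val & PHYXS_PROV2_USX_AN_BIT) != 0;
	return (syscfg_val & SYSCFG_AN_BIT) != 0;
}
```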

Both the Global System Configuration registers and the aforementioned
register 4.C441 are documented as PD (Provisioning Defaults), i.e. each
PHY firmware may provision its own values.

I was initially planning to only read these values and not support
changing them (letting only the MAC PCS reconfigure itself, if it can).
But there is one problem: Linux expects that the in-band capability is
configured the same for all speeds where a given SerDes protocol is used.
I was going to add logic that detects mismatched vendor provisioning
(in-band autoneg enabled for speed X, disabled for speed Y), warns
about it and returns 0 (unknown capabilities).

Funnily enough, there is already a known instance where speed 2500 has
"autoneg 1" and the lower speeds have "autoneg 0":
https://lore.kernel.org/netdev/aJH8n0zheqB8tWzb@FUE-ALEWI-WINX/
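The mismatch described above, and the alternative of rewriting the
per-speed fields to a common value, can be modelled as in this sketch.
The struct, the enum and both helpers are hypothetical. They operate on
an in-memory copy of the per-speed settings rather than on real PHY
registers, and the SerDes mode value 2 in the usage example is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-speed view of the Global System Configuration. */
struct speed_cfg {
	int speed_mbps;
	unsigned int serdes_mode; /* bits 2:0 */
	bool aneg;                /* bit 3 */
};

enum inband_state { INBAND_UNUSED, INBAND_ALL_OFF, INBAND_ALL_ON, INBAND_MISMATCH };

/* Check whether all speeds using @mode agree on in-band autoneg. */
static enum inband_state inband_consistency(const struct speed_cfg *cfg,
					    size_t n, unsigned int mode)
{
	enum inband_state st = INBAND_UNUSED;
	size_t i;

	for (i = 0; i < n; i++) {
		enum inband_state cur;

		if (cfg[i].serdes_mode != mode)
			continue;
		cur = cfg[i].aneg ? INBAND_ALL_ON : INBAND_ALL_OFF;
		if (st == INBAND_UNUSED)
			st = cur;
		else if (st != cur)
			return INBAND_MISMATCH;
	}
	return st;
}

/* Instead of warning, force every speed using @mode to one setting. */
static void inband_force(struct speed_cfg *cfg, size_t n,
			 unsigned int mode, bool aneg)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cfg[i].serdes_mode == mode)
			cfg[i].aneg = aneg;
}
```

With 2500 provisioned as "autoneg 1" and the lower speeds as "autoneg 0",
the check reports a mismatch until the fields are forced to a common value.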

I don't think it's worth fighting the battle with inconsistent firmware
images built by Aquantia/Marvell, and reporting that to the user, when
we have the ability to modify these fields to values that make sense to
us. We see the same situation with all the aqr*_get_features() functions
which fix up nonsensical supported link modes.

Furthermore, altering the in-band auto-negotiation setting can be
considered a minor change, compared to changing the SerDes protocol in
its entirety, for which we are still not prepared.

Testing was done on:
- AQR107 (Gen2) in USXGMII mode, as found on the NXP LX2160A-RDB.
- AQR112 (Gen3) in USXGMII mode, as found on the NXP SCH-30842 riser
  card, plugged into the LS1028A-QDS.
- AQR412C (Gen3) in 10G-QXGMII mode, as found on the NXP SCH-30841 riser
  card, plugged into the LS1028A-QDS.
- AQR115 (Gen4) in SGMII mode, as found on the NXP LS1046A-RDB rev E.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250903130730.2836022-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-05 19:03:40 -07:00
Vladimir Oltean
7b0376d0e0 net: phy: aquantia: print global syscfg registers
Sometimes people with unknown firmware provisioning post on the mailing
lists asking for support. The information collected by
aqr_gen2_read_global_syscfg() is sufficiently important to warrant a
phydev_dbg() that can easily be turned into a verbose print by the
system owner in case some debugging is needed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250903130730.2836022-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-05 19:03:40 -07:00