Commit Graph

1383552 Commits

Author SHA1 Message Date
Matthieu Baerts (NGI0)
287bc89bb4 tools: ynl: avoid bare except
This 'except' was used without specifying the exception class according
to Ruff. Here, only the ValueError class is expected and handled.

This is linked to Ruff error E722 [1]:

  A bare except catches BaseException which includes KeyboardInterrupt,
  SystemExit, Exception, and others. Catching BaseException can make it
  hard to interrupt the program (e.g., with Ctrl-C) and can disguise
  other problems.

Link: https://docs.astral.sh/ruff/rules/bare-except/ [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://patch.msgid.link/20250909-net-next-ynl-ruff-v1-2-238c2bccdd99@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-10 18:08:59 -07:00
Matthieu Baerts (NGI0)
7a3aaaa9fc tools: ynl: fix undefined variable name
This variable used in the error path was not defined according to Ruff.
msg_format.attr_set is used instead, presumably the one that was
supposed to be used originally.

This is linked to Ruff error F821 [1]:

  An undefined name is likely to raise NameError at runtime.

Fixes: 1769e2be4b ("tools/net/ynl: Add 'sub-message' attribute decoding to ynl")
Link: https://docs.astral.sh/ruff/rules/undefined-name/ [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://patch.msgid.link/20250909-net-next-ynl-ruff-v1-1-238c2bccdd99@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-10 18:08:59 -07:00
Russell King (Oracle)
724b22d38a net: stmmac: dwc-qos: use PHY WoL
Mark Tegra platforms to use PHY's wake-on-Lan capabilities rather than
the stmmac wake-on-Lan.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uw0ff-00000004IQJ-3AMp@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-10 18:06:54 -07:00
Niklas Söderlund
9c02ea544a net: sh_eth: Disable WoL if system can not suspend
The MAC can't facilitate WoL if the system can't go to sleep. Gate the
WoL support callbacks in ethtool at compile time using CONFIG_PM_SLEEP.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250909085849.3808169-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-10 17:58:35 -07:00
Saurabh Sengar
38611e5ada net: mana: Remove redundant netdev_lock_ops_to_full() calls
NET_SHAPER is always selected for MANA driver. When NET_SHAPER is enabled,
netdev_lock_ops_to_full() reduces effectively to only an assert for lock,
which is always held in the path when NET_SHAPER is enabled.

Remove the redundant netdev_lock_ops_to_full() call.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://patch.msgid.link/1757393830-20837-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-10 17:57:09 -07:00
Rohan G Thomas
deb105f498 net: phy: marvell: Fix 88e1510 downshift counter errata
The 88e1510 PHY has an erratum where the phy downshift counter is not
cleared after phy being suspended(BMCR_PDOWN set) and then later
resumed(BMCR_PDOWN cleared). This can cause the gigabit link to
intermittently downshift to a lower speed.

Disabling and re-enabling the downshift feature clears the counter,
allowing the PHY to retry gigabit link negotiation up to the programmed
retry count times before downshifting. This behavior has been observed
on copper links.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250906-marvell_fix-v2-1-f6efb286937f@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:30:51 -07:00
Jakub Kicinski
214da63451 Merge branch 'ptp-add-pulse-signal-loopback-support-for-debugging'
Wei Fang says:

====================
ptp: add pulse signal loopback support for debugging

Some PTP devices support looping back the periodic pulse signal for
debugging. For example, the PTP device of QorIQ platform and the NETC v4
Timer has the ability to loop back the pulse signal and record the extts
events for the loopback signal. So we can make sure that the pulse
intervals and their phase alignment are correct from the perspective of
the emitting PHC's time base. In addition, we can use this loopback
feature as a built-in extts event generator when we have no external
equipment which does that. Therefore, add the generic debugfs interfaces
to the ptp_clock driver. The first two patch are separated from the
previous patch set [1]. The third patch is new added.

[1]: https://lore.kernel.org/imx/20250827063332.1217664-1-wei.fang@nxp.com/ #patch 3 and 9
====================

Link: https://patch.msgid.link/20250905030711.1509648-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:28:55 -07:00
Wei Fang
f3164840a1 ptp: qoriq: convert to use generic interfaces to set loopback mode
Since the generic debugfs interfaces for setting the periodic pulse
signal loopback have been added to the ptp_clock driver, so convert
the vendor-defined debugfs interfaces to the generic interfaces.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250905030711.1509648-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:28:52 -07:00
Wei Fang
67ac836373 ptp: netc: add the periodic output signal loopback support
The NETC Timer supports looping back the output pulse signal of Fiper-n
into Trigger-n input, so that users can leverage this feature to validate
some other features without external hardware support. For example, users
can use it to test external trigger stamp (EXTTS). And users can combine
EXTTS with loopback mode to check whether the generation time of PPS is
aligned with an integral second of PHC, or the periodic output signal
(PTP_CLK_REQ_PEROUT) whether is generated at the specified time.

Since ptp_clock_info::perout_loopback() has been added to the ptp_clock
driver as a generic interface to enable or disable the periodic output
signal loopback, therefore, netc_timer_perout_loopback() is added as a
callback of ptp_clock_info::perout_loopback().

Test the generation time of PPS event:

$ echo 0 1 > /sys/kernel/debug/ptp0/perout_loopback
$ echo 1 > /sys/class/ptp/ptp0/pps_enable
$ testptp -d /dev/ptp0 -e 3
external time stamp request okay
event index 0 at 63.000000017
event index 0 at 64.000000017
event index 0 at 65.000000017

Test the generation time of the periodic output signal:

$ echo 0 1 > /sys/kernel/debug/ptp0/perout_loopback
$ echo 0 150 0 1 500000000 > /sys/class/ptp/ptp0/period
$ testptp -d /dev/ptp0 -e 3
external time stamp request okay
event index 0 at 150.000000014
event index 0 at 151.500000015
event index 0 at 153.000000014

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250905030711.1509648-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:28:52 -07:00
Wei Fang
e096a7cc0b ptp: add debugfs interfaces to loop back the periodic output signal
For some PTP devices, they have the capability to loop back the periodic
output signal for debugging, such as the ptp_qoriq device. So add the
generic interfaces to set the periodic output signal loopback, rather
than each vendor having a different implementation.

Show how many channels support the periodic output signal loopback:
$ cat /sys/kernel/debug/ptp<N>/n_perout_loopback

Enable the loopback of the periodic output signal of channel X:
$ echo <X> 1 > /sys/kernel/debug/ptp<N>/perout_loopback

Disable the loopback of the periodic output signal of channel X:
$ echo <X> 0 > /sys/kernel/debug/ptp<N>/perout_loopback

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250905030711.1509648-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:28:52 -07:00
Jakub Kicinski
cf71bdf686 Merge branch 'net-mlx5e-add-pcie-congestion-event-extras'
Tariq Toukan says:

====================
net/mlx5e: Add pcie congestion event extras

This small series by Dragos covers gaps requested in the initial pcie
congestion series [1]:
- Make pcie congestion thresholds configurable via devlink.
- Add a counter for stale pcie congestion events.

[1] https://lore.kernel.org/1752130292-22249-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/1757237976-531416-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:21:32 -07:00
Dragos Tatulea
cdc492746e net/mlx5e: Add stale counter for PCIe congestion events
This ethtool counter is meant to help with observing how many times the
congestion event was triggered but on query there was no state change.

This would help to indicate when a work item was scheduled to run too
late and in the meantime the congestion state changed back to previous
state.

While at it, do a driveby typo fix in documentation for
pci_bw_inbound_high.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1757237976-531416-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:21:30 -07:00
Dragos Tatulea
f4053490a6 net/mlx5e: Make PCIe congestion event thresholds configurable
Add devlink driverinit parameters for configuring the thresholds for
PCIe congestion events. These parameters are registered only when the
firmware supports this feature.

Update the mlx5 devlink docs as well on these new params.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1757237976-531416-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:21:30 -07:00
Jakub Kicinski
04d1ff1d75 Merge branch 'devlink-mlx5-add-new-parameters-for-link-management-and-sriov-eswitch-configurations'
Saeed Mahameed says:

====================
devlink, mlx5: Add new parameters for link management and SRIOV/eSwitch configurations [part]

This patch series introduces several devlink parameters improving device
configuration capabilities, link management, and SRIOV/eSwitch, by adding
NV config boot time parameters.

Implement the following parameters:

   a) total_vfs Parameter:
   -----------------------

Adds support for managing the number of VFs (total_vfs) and enabling
SR-IOV (enable_sriov for mlx5) through devlink. These additions enhance
user control over virtualization features directly from standard kernel
interfaces without relying on additional external tools. total_vfs
functionality is critical for environments that require flexible num VF
configuration.

   b) CQE Compression Type:
   ------------------------

Introduces a new devlink parameter, cqe_compress_type, to configure the
rate of CQE compression based on PCIe bus conditions. This setting
provides a balance between compression efficiency and overall NIC
performance under different traffic loads.
====================

Link: https://patch.msgid.link/20250907012953.301746-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:14:32 -07:00
Vlad Dumitrescu
a4c49611cf net/mlx5: Implement devlink total_vfs parameter
Some devices support both symmetric (same value for all PFs) and
asymmetric, while others only support symmetric configuration. This
implementation prefers asymmetric, since it is closer to the devlink
model (per function settings), but falls back to symmetric when needed.

Example usage:
  devlink dev param set pci/0000:01:00.0 name total_vfs value <u16> cmode permanent
  devlink dev reload pci/0000:01:00.0 action fw_activate
  echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove
  echo 1 >/sys/bus/pci/rescan
  cat /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs

Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:14:24 -07:00
Vlad Dumitrescu
95a0af146d net/mlx5: Implement devlink enable_sriov parameter
Example usage:
  devlink dev param set pci/0000:01:00.0 name enable_sriov value {true, false} cmode permanent
  devlink dev reload pci/0000:01:00.0 action fw_activate
  echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove
  echo 1 >/sys/bus/pci/rescan
  grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_*

Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:14:24 -07:00
Saeed Mahameed
bf2da4799f net/mlx5: Implement cqe_compress_type via devlink params
Selects which algorithm should be used by the NIC in order to decide rate of
CQE compression dependeng on PCIe bus conditions.

Supported values:

1) balanced, merges fewer CQEs, resulting in a moderate compression ratio
   but maintaining a balance between bandwidth savings and performance
2) aggressive, merges more CQEs into a single entry, achieving a higher
   compression rate and maximizing performance, particularly under high
   traffic loads.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:14:23 -07:00
Vlad Dumitrescu
ce0b015e26 devlink: Add 'total_vfs' generic device param
NICs are typically configured with total_vfs=0, forcing users to rely
on external tools to enable SR-IOV (a widely used and essential feature).

Add total_vfs parameter to devlink for SR-IOV max VF configurability.
Enables standard kernel tools to manage SR-IOV, addressing the need for
flexible VF configuration.

Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 19:14:23 -07:00
Jakub Kicinski
b90c7ca4f9 Merge branch 'mptcp-make-add_addr-retransmission-timeout-adaptive'
Matthieu Baerts says:

====================
mptcp: make ADD_ADDR retransmission timeout adaptive

Currently, the MPTCP ADD_ADDR notifications are retransmitted after a
fixed timeout controlled by the net.mptcp.add_addr_timeout sysctl knob,
if the corresponding "echo" packets are not received before. This can be
too slow (or too quick), especially with a too cautious default value
set to 2 minutes.

- Patch 1: make ADD_ADDR retransmission timeout adaptive, using the
  TCP's retransmission timeout. The corresponding sysctl knob is now
  used as a maximum value.

- Patch 2: now that these ADD_ADDR retransmissions can happen faster,
  all MPTCP Join subtests checking ADD_ADDR counters accept more
  ADD_ADDR than expected (if any). This is aligned with the previous
  behaviour, when the ADD_ADDR RTO was lowered down to 1 second.

- Patch 3: Some CIs have reported that some MPTCP Join signalling tests
  were unstable. It seems that it is due to the time it can take in slow
  environments to send a bunch of ADD_ADDR notifications and wait each
  time for their echo reply. Use a longer transfer to avoid such errors.

v1: https://lore.kernel.org/d5397026-92eb-4a43-9534-954b43ab9305@kernel.org
====================

Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-0-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:57:49 -07:00
Matthieu Baerts (NGI0)
e2cda6343b selftests: mptcp: join: allow more time to send ADD_ADDR
When many ADD_ADDR need to be sent, it can take some time to send each
of them, and create new subflows. Some CIs seem to occasionally have
issues with these tests, especially with "debug" kernels.

Two subtests will now run for a slightly longer time: the last two where
3 or more ADD_ADDR are sent during the test.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-3-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:57:45 -07:00
Matthieu Baerts (NGI0)
63c31d42cf selftests: mptcp: join: tolerate more ADD_ADDR
ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.

To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received, except
if no ADD_ADDR are expected.

Before the parent commit, the tolerance was present for each tests where
the ADD_ADDR could be retransmitted in a reasonable time (1 sec). Now
that all tests can have retransmitted ADD_ADDR, it is normal to apply
the same tolerance for all tests.

An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to many tests, it seems better to increase the tolerance.

Later, we could add a new MIB counter to identify the ADD_ADDR
retransmissions, and remove the tolerance when this counter is
available.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-2-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:57:45 -07:00
Geliang Tang
30549eebc4 mptcp: make ADD_ADDR retransmission timeout adaptive
Currently the ADD_ADDR option is retransmitted with a fixed timeout. This
patch makes the retransmission timeout adaptive by using the maximum RTO
among all the subflows, while still capping it at the configured maximum
value (add_addr_timeout_max). This improves responsiveness when
establishing new subflows.

Specifically:
1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive
timeout.
2. Uses maximum subflow RTO (icsk_rto) when available.
3. Applies exponential backoff based on retransmission count.
4. Maintains fallback to configured max timeout when no RTO data exists.

This slightly changes the behaviour of the MPTCP "add_addr_timeout"
sysctl knob to be used as a maximum instead of a fixed value. But this
is seen as an improvement: the ADD_ADDR might be sent quicker than
before to improve the overall MPTCP connection. Also, the default
value is set to 2 min, which was already way too long, and caused the
ADD_ADDR not to be retransmitted for connections shorter than 2 minutes.

Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576
Reviewed-by: Christoph Paasch <cpaasch@openai.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-1-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:57:45 -07:00
Jakub Kicinski
4ea83b7573 Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
idpf: add XDP support

Alexander Lobakin says:

Add XDP support (w/o XSk for now) to the idpf driver using the libeth_xdp
sublib. All possible verdicts, .ndo_xdp_xmit(), multi-buffer etc. are here.
In general, nothing outstanding comparing to ice, except performance --
let's say, up to 2x for .ndo_xdp_xmit() on certain platforms and
scenarios.
idpf doesn't support VLAN Rx offload, so only the hash hint is
available for now.

Patches 1-7 are prereqs, without which XDP would either not work at all
or work slower/worse/...

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  idpf: add XDP RSS hash hint
  idpf: add support for .ndo_xdp_xmit()
  idpf: add support for XDP on Rx
  idpf: use generic functions to build xdp_buff and skb
  idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq
  idpf: prepare structures to support XDP
  idpf: add support for nointerrupt queues
  idpf: remove SW marker handling from NAPI
  idpf: add 4-byte completion descriptor definition
  idpf: link NAPIs to queues
  idpf: use a saner limit for default number of queues to allocate
  idpf: fix Rx descriptor ready check barrier in splitq
  xdp, libeth: make the xdp_init_buff() micro-optimization generic
====================

Link: https://patch.msgid.link/20250908195748.1707057-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:44:07 -07:00
Ido Schimmel
ce6adea19a vxlan: Make vxlan_fdb_find_uc() more robust against NPDs
first_remote_rcu() can return NULL if the FDB entry points to an FDB
nexthop group instead of a remote destination. However, unlike other
users of first_remote_rcu(), NPD cannot currently happen in
vxlan_fdb_find_uc() as it is only invoked by one driver which vetoes the
creation of FDB nexthops.

Make the function more robust by making sure the remote destination is
only dereferenced if it is not NULL.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250908075141.125087-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:27:33 -07:00
Vladimir Oltean
051b62b71e net: phy: aquantia: delete aqr_firmware_read_fingerprint() prototype
This is a development artifact of commit a76f26f7a8 ("net: phy:
aquantia: support phy-mode = "10g-qxgmii" on NXP SPF-30841 (AQR412C)").
This function name isn't used. Instead we have aqr_build_fingerprint()
in aquantia_main.c.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250908134313.315406-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:24:17 -07:00
Jakub Kicinski
0d0766a47c Merge branch 'net-phy-fixed_phy-improvements'
Heiner Kallweit says:

====================
net: phy: fixed_phy: improvements

This series contains a number of improvements.
No functional change intended.
====================

Link: https://patch.msgid.link/e81be066-cc23-4055-aed7-2fbc86da1ff7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:11:58 -07:00
Heiner Kallweit
2983825579 net: phy: fixed_phy: remove struct fixed_mdio_bus
Use two separate static variables instead of the struct, this allows
to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:11:55 -07:00
Heiner Kallweit
f8db55c8eb net: phy: fixed_phy: add helper fixed_phy_find
Factor out the functionality to search for a fixed_phy matching an
address. This improves readability of the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:11:53 -07:00
Heiner Kallweit
0625b3bfbb net: phy: fixed_phy: remove member no_carrier from struct fixed_phy
After the recent removal of gpio support member no_carrier isn't
needed any longer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:11:50 -07:00
Heiner Kallweit
fecf7087f0 net: phy: fixed_phy: remove unused interrupt support
The two callers of __fixed_phy_add() both pass PHY_POLL, so we can
remove the irq argument to simplify the function.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 18:11:47 -07:00
Alok Tiwari
d436b5abba ipv4: udp: fix typos in comments
Correct typos in ipv4/udp.c comments for clarity:
"Encapulation" -> "Encapsulation"
"measureable" -> "measurable"
"tacking care" -> "taking care"

No functional changes.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907192535.3610686-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 16:29:05 -07:00
Jakub Kicinski
1c0353a6df selftests: net: speed up pmtu.sh by avoiding unnecessary cleanup
The pmtu test takes nearly an hour when run on a debug kernel
(10min on a normal kernel, so the debug slow down is quite significant).
NIPA tries to ensure all results are delivered by a certain deadline
so this prevents it from retrying the test in case of a flake.

Looks like one of the slowest operations in the test is calling out
to ./openvswitch/ovs-dpctl.py to remove potential leftover OvS interfaces.
Check whether the interfaces exist in the first place in sysfs,
since it can be done directly in bash it is very fast.

This should save us around 20-30% of the test runtime.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250906214535.3204785-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 16:26:44 -07:00
Jakub Kicinski
a12fd5c31b selftests: net: run groups from fcnal-test in parallel
fcnal-test.sh takes almost hour and a half to finish.
The tests are already grouped into ipv4, ipv6 and other.
Run those groups separately.

Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250908201021.270681-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 15:34:11 -07:00
Jakub Kicinski
3b4296f589 Merge tag 'mlx5-rs-fec-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:

====================
mlx5-next updates 2025-09-09

The following pull-request contains a common mlx5 update.

* tag 'mlx5-rs-fec-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add RS FEC histogram infrastructure
====================

Link: https://patch.msgid.link/1757413460-539097-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09 09:03:51 -07:00
Jakub Kicinski
0574c27cbe eth: fbnic: support persistent NAPI config
No shenanigans in this driver, AFAIU, pass the vector index to NAPI
registration.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250905022254.2635707-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 16:03:27 +02:00
Brett A C Sheffield
aeb8d48ea9 selftests: net: add test for ipv6 fragmentation
Add selftest for the IPv6 fragmentation regression which affected
several stable kernels.

Commit a18dfa9925 ("ipv6: save dontfrag in cork") was backported to
stable without some prerequisite commits.  This caused a regression when
sending IPv6 UDP packets by preventing fragmentation and instead
returning -1 (EMSGSIZE).

Add selftest to check for this issue by attempting to send a packet
larger than the interface MTU. The packet will be fragmented on a
working kernel, with sendmsg(2) correctly returning the expected number
of bytes sent.  When the regression is present, sendmsg returns -1 and
sets errno to EMSGSIZE.

Link: https://lore.kernel.org/stable/aElivdUXqd1OqgMY@karahi.gladserv.com
Signed-off-by: Brett A C Sheffield <bacs@librecast.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250903154925.13481-1-bacs@librecast.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 11:43:59 +02:00
Hangbin Liu
d67ca09ca3 hsr: use netdev_master_upper_dev_link() when linking lower ports
Unlike VLAN devices, HSR changes the lower device’s rx_handler, which
prevents the lower device from being attached to another master.
Switch to using netdev_master_upper_dev_link() when setting up the lower
device.

This could improves user experience, since ip link will now display the
HSR device as the master for its ports.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902065558.360927-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 11:27:37 +02:00
Paolo Abeni
c27334aef6 Merge branch 'bonding-support-aggregator-selection-based-on-port-priority'
Hangbin Liu says:

====================
bonding: support aggregator selection based on port priority

This patchset introduces a new per-port bonding option: `ad_actor_port_prio`.

It allows users to configure the actor's port priority, which can then be used
by the bonding driver for aggregator selection based on port priority.

This provides finer control over LACP aggregator choice, especially in setups
with multiple eligible aggregators over 2 switches.
====================

Link: https://patch.msgid.link/20250902064501.360822-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:05 +02:00
Hangbin Liu
c2377f1763 selftests: bonding: add test for LACP actor port priority
Add comprehensive selftest to verify:
- Per-port actor priority setting via ad_actor_port_prio
- Aggregator selection behavior with port_priority ad_select policy

Also move cmd_jq helper from forwarding/lib.sh to net/lib.sh for
broader reusability across network selftests.

Here is the result output
  # ./bond_lacp_prio.sh
  TEST: bond 802.3ad (ad_actor_port_prio setting)                     [ OK ]
  TEST: bond 802.3ad (ad_actor_port_prio select)                      [ OK ]
  TEST: bond 802.3ad (ad_actor_port_prio switch)                      [ OK ]

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-4-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:02 +02:00
Hangbin Liu
e5a6643435 bonding: support aggregator selection based on port priority
Add a new ad_select policy 'port_priority' that uses the per-port
actor priority values (set via ad_actor_port_prio) to determine
aggregator selection.

This allows administrators to influence which ports are preferred
for aggregation by assigning different priority values, providing
more flexible load balancing control in LACP configurations.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:02 +02:00
Hangbin Liu
6b6dc81ee7 bonding: add support for per-port LACP actor priority
Introduce a new netlink attribute 'actor_port_prio' to allow setting
the LACP actor port priority on a per-slave basis. This extends the
existing bonding infrastructure to support more granular control over
LACP negotiations.

The priority value is embedded in LACPDU packets and will be used by
subsequent patches to influence aggregator selection policies.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 10:56:02 +02:00
Carolina Jubran
ff97bc38be net/mlx5: Add RS FEC histogram infrastructure
Define the Ports Phy Histogram Configuration Register (PPHCR) to expose
RS-FEC histogram bin ranges, and expose a new counter group in the Ports
Performance Counters Register (PPCNT) to report the corresponding
histogram values.

Co-developed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1756884600-520195-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-09 04:18:19 -04:00
Paolo Abeni
389cca2bde Merge branch 'support-exposing-raw-cycle-counters-in-ptp-and-mlx5'
Tariq Toukan says:

====================
Support exposing raw cycle counters in PTP and mlx5

This series by Carolina adds support in ptp and usage in mlx5 for
exposing the raw free-running cycle counter of PTP hardware clocks.

This is V2. Find previous one here:
https://lore.kernel.org/all/1752556533-39218-1-git-send-email-tariqt@nvidia.com/

Find detailed description by Carolina below [1].

[1]
This patch series introduces support for exposing the raw free-running
cycle counter of PTP hardware clocks. When the device is in free-running
mode, it emits timestamps as raw cycle values instead of nanoseconds.
These values may be passed directly to user space through:

- fwctl: exposes internal device event records that include raw
         cycle-based timestamps.

- DPDK: retrieves CQEs that contain raw cycle counters, which are passed
        to user space unmodified.

To address this, the series introduces two new ioctl commands that allow
userspace to query the device's raw cycle counter together with host
time:

 - PTP_SYS_OFFSET_PRECISE_CYCLES

 - PTP_SYS_OFFSET_EXTENDED_CYCLES

These commands work like their existing counterparts but return the
device timestamp in cycle units instead of real-time nanoseconds.  This
allows user space to collect (cycle, time) pairs and build a mapping
between the device’s free-running clock and host time.

This can also be useful in the XDP fast path: if a driver inserts the
raw cycle value into metadata instead of a real-time timestamp, it can
avoid the overhead of converting cycles to time in the kernel. Then
userspace can resolve the cycle-to-time mapping using this ioctl when
needed.

The ioctl enables user space to correlate those with host time, without
requiring the PHC to be synchronized, so long as the drift remains
stable during collection.

Adds the new PTP ioctls and integrates support in ptp_ioctl():
- ptp: Add ioctl commands to expose raw cycle counter values

Support for exposing raw cycles in mlx5:
- net/mlx5: Extract MTCTR register read logic into helper function
- net/mlx5: Support getcyclesx and getcrosscycles
====================

Link: https://patch.msgid.link/1755008228-88881-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:26 +02:00
Carolina Jubran
a3fb485505 net/mlx5: Support getcyclesx and getcrosscycles
Implement the getcyclesx64 and getcrosscycles callbacks in ptp_info to
expose the device’s raw free-running counter.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1755008228-88881-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:24 +02:00
Carolina Jubran
96c345c3c5 net/mlx5: Extract MTCTR register read logic into helper function
Refactor the MTCTR register reading logic into a dedicated helper to
lay the groundwork for the next patch.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1755008228-88881-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:24 +02:00
Carolina Jubran
faf23f54d3 ptp: Add ioctl commands to expose raw cycle counter values
Introduce two new ioctl commands, PTP_SYS_OFFSET_PRECISE_CYCLES and
PTP_SYS_OFFSET_EXTENDED_CYCLES, to allow user space to access the
raw free-running cycle counter from PTP devices.

These ioctls are variants of the existing PRECISE and EXTENDED
offset queries, but instead of returning device time in realtime,
they return the raw cycle counter value.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/1755008228-88881-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09 09:33:24 +02:00
Håkon Bugge
9f0730b063 rds: ib: Remove unused extern definition
In the old days, RDS used FMR (Fast Memory Registration) to register
IB MRs to be used by RDMA. A newer and better verbs based
registration/de-registration method called FRWR (Fast Registration
Work Request) was added to RDS by commit 1659185fb4 ("RDS: IB:
Support Fastreg MR (FRMR) memory registration mode") in 2016.

Detection and enablement of FRWR was done in commit 2cb2912d65
("RDS: IB: add Fastreg MR (FRMR) detection support"). But said commit
added an extern bool prefer_frmr, which was not used by said commit -
nor used by later commits. Hence, remove it.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250905101958.4028647-1-haakon.bugge@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:16:49 -07:00
Jakub Kicinski
6e0cca6ba3 Merge branch 'net-stmmac-mdio-cleanups'
Russell King says:

====================
net: stmmac: mdio cleanups

Clean up the stmmac MDIO code:
- provide an address register formatter to avoid repeated code
- provide a common function to wait for the busy bit to clear
- pre-compute the CR field (mdio clock divider)
- move address formatter into read/write functions
- combine the read/write functions into a common accessor function
- move runtime PM handling into common accessor function
- rename register constants to better reflect manufacturer names
- move stmmac_clk_csr_set() into stmmac_mdio
- make stmmac_clk_csr_set() return the CR field value and remove
  priv->clk_csr
- clean up if() range tests in stmmac_clk_csr_set()
- use STMMAC_CSR_xxx definitions in initialisers

For Qualcomm QCS9100 Ride R3 board with the AQR115C PHY:

Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
====================

Link: https://patch.msgid.link/aLmBwsMdW__XBv7g@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:10 -07:00
Russell King (Oracle)
fc8f62c827 net: stmmac: use STMMAC_CSR_xxx definitions in platform glue
Use the STMMAC_CSR_xxx definitions to initialise plat->clk_csr in the
platform glue drivers to make the integer values meaningful.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oh-00000001vpT-0vk2@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:03 -07:00
Russell King (Oracle)
78c91bec8f net: stmmac: mdio: remove redundant clock rate tests
The pattern:

	... if (v < A)
		...
	else if (v >= A && v < B)
		...

can be simplified to:

	... if (v < A)
		...
	else if (v < B)
		...

which makes the string of ifelse more readable.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Mohd Ayaan Anwar <quic_mohdayaa@quicinc.com>
Link: https://patch.msgid.link/E1uu8oc-00000001vpN-0S1A@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-08 18:12:03 -07:00