Commit Graph

1336786 Commits

Author SHA1 Message Date
Xiao Liang
9c0fc091dc rtnetlink: Remove "net" from newlink params
Now that devices have been converted to use the specific netns instead
of ambiguous "net", let's remove it from newlink parameters.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-11-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang
5314e3d684 net: xfrm: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-10-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang
5e72ce3e39 net: ipv6: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-9-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
db014522f3 net: ipv6: Init tunnel link-netns before registering dev
Currently some IPv6 tunnel drivers set tnl->net to dev_net(dev) in
ndo_init(), which is called in register_netdevice(). However, it lacks
the context of link-netns when we enable cross-net tunnels at device
registration time.

Let's move the init of tunnel link-netns before register_netdevice().

ip6_gre has already initialized netns, so just remove the redundant
assignment.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-8-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
eacb116053 net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Convert common ip_tunnel_newlink() to accept an extra link netns
argument.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-7-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
9e17b2a1a0 net: ip_tunnel: Don't set tunnel->net in ip_tunnel_init()
ip_tunnel_init() is called from register_netdevice(). In all code paths
reaching here, tunnel->net should already have been set (either in
ip_tunnel_newlink() or __ip_tunnel_create()). So don't set it again.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-6-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
3533717581 ieee802154: 6lowpan: Validate link netns in newlink() of rtnl_link_ops
Device denoted by IFLA_LINK is in link_net (IFLA_LINK_NETNSID) or
source netns by design, but 6lowpan uses dev_net.

Note dev->netns_local is set to true and currently link_net is
implemented via a netns change. These together effectively reject
IFLA_LINK_NETNSID.

This patch adds a validation to ensure link_net is either NULL or
identical to dev_net. Thus it would be fine to continue using dev_net
when rtnetlink core begins to create devices directly in target netns.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-5-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
cf517ac16a net: Use link/peer netns in newlink() of rtnl_link_ops
Add two helper functions - rtnl_newlink_link_net() and
rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back
to link netns, and link netns falls back to source netns.

Convert the use of params->net in netdevice drivers to one of the helper
functions for clarity.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
69c7be1b90 rtnetlink: Pack newlink() params into struct
There are 4 net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns - netns associated with the device (backend),
 - peer netns - netns of peer device.

Currently, two nets are passed to newlink() callback - "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as
follows, depending on netlink attributes in the request.

 +------------+-------------------+---------+---------+
 | peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
 +------------+-------------------+---------+---------+
 |            | absent            | source  | target  |
 | absent     +-------------------+---------+---------+
 |            | present           | link    | link    |
 +------------+-------------------+---------+---------+
 |            | absent            | peer    | target  |
 | present    +-------------------+---------+---------+
 |            | present           | peer    | link    |
 +------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns
first and then moved to target netns. This has some side effects,
including extra ifindex allocation, ifname validation and link events.
These could be avoided if we create it in target netns from
the beginning.

On the other hand, the meaning of src_net parameter is ambiguous. It
varies depending on how parameters are passed. It is the effective
link (or peer netns) by design, but some drivers ignore it and use
dev_net instead.

To provide more netns context for drivers, this patch packs existing
newlink() parameters, along with the source netns, link netns and peer
netns, into a struct. The old "src_net" is renamed to "net" to avoid
confusion with real source netns, and will be deprecated later. The use
of src_net are converted to params->net trivially.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
ec061546c6 rtnetlink: Lookup device in target netns when creating link
When creating link, lookup for existing device in target net namespace
instead of current one.
For example, two links created by:

  # ip link add dummy1 type dummy
  # ip link add netns ns1 dummy1 type dummy

should have no conflict since they are in different namespaces.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-2-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:01 -08:00
Jakub Kicinski
4fe67dd2d5 Merge branch 'dt-bindings-net-realtek-rtl9301-switch'
Chris Packham says:

====================
dt-bindings: net: realtek,rtl9301-switch (schema part)

This is my attempt at trying to sort out the mess I've created with the RTL9300
switch dt-bindings. Some context is available on [1] and [2].

The first patch just moves the binding from mfd/ to net/ (with an adjustment of
the internal path name). The next two patches are successors to patches already
sent as part of the series [3].

[1] - https://lore.kernel.org/lkml/20250204-eccentric-deer-of-felicity-02b7ee@krzk-bin/
[2] - https://lore.kernel.org/lkml/4e3c5d83-d215-4eff-bf02-6d420592df8f@alliedtelesis.co.nz/
[3] - https://lore.kernel.org/lkml/20250204030249.1965444-1-chris.packham@alliedtelesis.co.nz/
====================

Link: https://patch.msgid.link/20250218195216.1034220-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:08:05 -08:00
Chris Packham
96757457da dt-bindings: net: Add Realtek MDIO controller
Add dtschema for the MDIO controller found in the RTL9300 Ethernet
switch. The controller is slightly unusual in that direct MDIO
communication is not possible. We model the MDIO controller with the
MDIO buses as child nodes and the PHYs as children of the buses. The
mapping of switch port number to MDIO bus/addr requires the
ethernet-ports sibling to provide the mapping via the phy-handle
property.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-4-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:07:15 -08:00
Chris Packham
92575a2182 dt-bindings: net: Add switch ports and interrupts to RTL9300
Add bindings for the ethernet-switch and interrupt properties for the
RTL9300.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:07:15 -08:00
Chris Packham
3fa337651d dt-bindings: net: Move realtek,rtl9301-switch to net
Initially realtek,rtl9301-switch was placed under mfd/ because it had
some non-switch related blocks (specifically i2c and reset) but with a
bit more review it has become apparent that this was wrong and the
binding should live under net/.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-2-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:07:15 -08:00
Birger Koblitz
a850355610 net: sfp: add quirk for 2.5G OEM BX SFP
The OEM SFP-2.5G-BX10-D/U SFP module pair is meant to operate with
2500Base-X. However, in their EEPROM they incorrectly specify:
Transceiver codes   : 0x00 0x12 0x00 0x00 0x12 0x00 0x01 0x05 0x00
BR, Nominal         : 2500MBd

Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.
Tested on BananaPi R3.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20250218-b4-lkmsub-v1-1-1e51dcabed90@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:04:56 -08:00
Heiner Kallweit
bb3bb6c92e net: phy: remove unused feature array declarations
After 12d5151be0 ("net: phy: remove leftovers from switch to linkmode
bitmaps") the following declarations are unused and can be removed too.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/b2883c75-4108-48f2-ab73-e81647262bc2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 18:16:21 -08:00
Jakub Kicinski
56b06a71fc Merge branch 'selftests-drv-net-improve-the-queue-test-for-xsk'
Jakub Kicinski says:

====================
selftests: drv-net: improve the queue test for XSK

We see some flakes in the the XSK test:

   Exception| Traceback (most recent call last):
   Exception|   File "/home/virtme/testing-18/tools/testing/selftests/net/lib/py/ksft.py", line 218, in ksft_run
   Exception|     case(*args)
   Exception|   File "/home/virtme/testing-18/tools/testing/selftests/drivers/net/./queues.py", line 53, in check_xdp
   Exception|     ksft_eq(q['xsk'], {})
   Exception| KeyError: 'xsk'

I think it's because the method of running the helper in the background
is racy. Add more solid infra for waiting for a background helper to be
initialized.

v1: https://lore.kernel.org/20250218195048.74692-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250219234956.520599-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
932a9249f7 selftests: drv-net: rename queues check_xdp to check_xsk
The test is for AF_XDP, we refer to AF_XDP as XSK.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
4fde839846 selftests: drv-net: improve the use of ksft helpers in XSK queue test
Avoid exceptions when xsk attr is not present, and add a proper ksft
helper for "not in" condition.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
7147713799 selftests: drv-net: add a way to wait for a local process
We use wait_port_listen() extensively to wait for a process
we spawned to be ready. Not all processes will open listening
sockets. Add a method of explicitly waiting for a child to
be ready. Pass a FD to the spawned process and wait for it
to write a message to us. FD number is passed via KSFT_READY_FD
env variable.

Similarly use KSFT_WAIT_FD to let the child process for a sign
that we are done and child should exit. Sending a signal to
a child with shell=True can get tricky.

Make use of this method in the queues test to make it less flaky.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
d3726ab45c selftests: drv-net: probe for AF_XDP sockets more explicitly
Separate the support check from socket binding for easier refactoring.
Use: ./helper - - just to probe if we can open the socket.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
bab59dcf71 selftests: drv-net: add missing new line in xdp_helper
Kurt and Joe report missing new line at the end of Usage.

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
dabd31baa3 selftests: drv-net: use cfg.rpath() in netlink xsk attr test
The cfg.rpath() helper was been recently added to make formatting
paths for helper binaries easier.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
846742f7e3 selftests: drv-net: add a warning for bkg + shell + terminate
Joe Damato reports that some shells will fork before running
the command when python does "sh -c $cmd", while bash on my
machine does an exec of $cmd directly.

This will have implications for our ability to terminate
the child process on various configurations of bash and
other shells. Warn about using

	bkg(... shell=True, termininate=True)

most background commands can hopefully exit cleanly (exit_wait).

Link: https://lore.kernel.org/Z7Yld21sv_Ip3gQx@LQ3V64L9R2
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:57:29 -08:00
Arnd Bergmann
ca57d1c56f octeontx2: hide unused label
A previous patch introduces a build-time warning when CONFIG_DCB
is disabled:

drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c: In function 'otx2_probe':
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c:3217:1: error: label 'err_free_zc_bmap' defined but not used [-Werror=unused-label]
drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c: In function 'otx2vf_probe':
drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c:740:1: error: label 'err_free_zc_bmap' defined but not used [-Werror=unused-label]

Add the same #ifdef check around it.

Fixes: efabce2901 ("octeontx2-pf: AF_XDP zero copy receive support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Suman Ghosh <sumang@marvell.com>
Link: https://patch.msgid.link/20250219162239.1376865-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:38:02 -08:00
Charalampos Mitrodimas
8279a8dacf net: phy: qt2025: Fix hardware revision check comment
Correct the hardware revision check comment in the QT2025 driver. The
revision value was documented as 0x3b instead of the correct 0xb3,
which matches the actual comparison logic in the code.

Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Link: https://patch.msgid.link/20250219-qt2025-comment-fix-v2-1-029f67696516@posteo.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:37:32 -08:00
Jakub Kicinski
ac8f0aff41 Merge branch 'mlx5-misc-enhancements-2025-02-19'
Tariq Toukan says:

====================
mlx5 misc enhancements 2025-02-19

This small series enhances the mlx5 ethtool link speed code (no
functional change), in addition to a Kconfig description enhancement.
====================

Link: https://patch.msgid.link/20250219114112.403808-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:12 -08:00
Shahar Shitrit
9c362aafda net/mlx5e: Separate extended link modes request from link modes type selection
The function ext_requested() serves two distinct purposes: it checks
if extended link modes were requested, and it selects whether to use
extended or legacy link modes.

This change separates these two purposes. Now, ext_link_mode_requested()
is used directly for checking if extended link modes are requested,
while the selection of extended modes is handled independently based on
the autonegotiation status.

By making this distinction, the logic for determining whether to select
extended or legacy link modes is clearer.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219114112.403808-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:09 -08:00
Shahar Shitrit
9ca3bf013a net/mlx5e: Change eth_proto parameter naming
eth_proto_cap parameter represents the supported link modes,
while eth_proto_admin refers to the configured ones.

The function get_advertising() retrieves the configured link
modes, thus we update its parameter name to eth_proto_admin.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219114112.403808-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:09 -08:00
Shahar Shitrit
64d97f8919 net/mlx5e: Introduce ptys2ethtool_process_link()
The functions ptys2ethtool_supported_link(), ptys2ethtool_adver_link()
share the same code, thus, in order to remove code duplication we
introduce a new function ptys2ethtool_process_link() to handle the
processing of both supported and advertised link modes.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219114112.403808-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:09 -08:00
Shahar Shitrit
5246fd3fc2 net/mlx5e: Refactor ptys2ethtool_adver_link()
The function ptys2ethtool_adver_link() contains duplicated code that
is found in mlx5e_ethtool_get_speed_arr().

To eliminate this redundancy, we update mlx5e_ethtool_get_speed_arr()
to select the appropriate table based on the ext argument passed by
the caller, rather than querying the supported mode locally.

This allows us to replace the current logic in ptys2ethtool_adver_link()
with a call to mlx5e_ethtool_get_speed_arr().

This adjustment aligns with the ptys2ethtool_supported_link() function
and prepares for an upcoming patch that reduces code duplication.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219114112.403808-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:08 -08:00
Cosmin Ratiu
3fe090ad02 net/mlx5: Bridge, correct config option description
The implication of the previous help text was that without this option
enabled, representor devices couldn't be added to a bridge device, while
in fact that was possible, just that rules didn't get offloaded to hw.

This commit clarifies the help text.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219114112.403808-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:08 -08:00
Kohei Enju
ef75d8343b neighbour: Replace kvzalloc() with kzalloc() when GFP_ATOMIC is specified
kzalloc() uses page allocator when size is larger than
KMALLOC_MAX_CACHE_SIZE, so the intention of commit ab101c553b
("neighbour: use kvzalloc()/kvfree()") can be achieved by using kzalloc().

When using GFP_ATOMIC, kvzalloc() only tries the kmalloc path,
since the vmalloc path does not support the flag.
In this case, kvzalloc() is equivalent to kzalloc() in that neither try
the vmalloc path, so this replacement brings no functional change.
This is primarily a cleanup change, as the original code functions
correctly.

This patch replaces kvzalloc() introduced by commit 41b3caa7c0
("neighbour: Add hlist_node to struct neighbour"), which is called in
the same context and with the same gfp flag as the aforementioned commit
ab101c553b ("neighbour: use kvzalloc()/kvfree()").

Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20250219102227.72488-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:28:01 -08:00
Jakub Kicinski
5225861b5c Merge branch 'some-pktgen-fixes-improvments-part-i'
Peter Seiderer says:

====================
Some pktgen fixes/improvments (part I)

While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds
access in get_imix_entries' ([2], [3]) and doing some tests and code review
I detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).

This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):

        $ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0

        $ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000  min_pkt_size: 12345  max_pkt_size: 0
Result: OK: min_pkt_size=12345

        $ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000  min_pkt_size: 12345  max_pkt_size: 0
Result: OK: min_pkt_size=12345

        $ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000  min_pkt_size: 123  max_pkt_size: 0
Result: OK: min_pkt_size=123

So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...

[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@red-soft.ru/
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76201b5979768500bca362871db66d77cb4c225e
====================

Link: https://patch.msgid.link/20250219084527.20488-1-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:58 -08:00
Peter Seiderer
425e64440a net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
Honour the user given buffer size for the strn_len() calls (otherwise
strn_len() will access memory outside of the user given buffer).

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-8-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:56 -08:00
Peter Seiderer
1e5e511373 net: pktgen: fix ctrl interface command parsing
Enable command writing without trailing '\n':

- the good case

	$ echo "reset" > /proc/net/pktgen/pgctrl

- the bad case (before the patch)

	$ echo -n "reset" > /proc/net/pktgen/pgctrl
	-bash: echo: write error: Invalid argument

- with patch applied

	$ echo -n "reset" > /proc/net/pktgen/pgctrl

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-7-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:56 -08:00
Peter Seiderer
1c3bc2c325 net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
Given an invalid 'ratep' command e.g. 'ratep 0' the return value is '1',
leading to the following misleading output:

- the good case

	$ echo "ratep 100" > /proc/net/pktgen/lo\@0
	$ grep "Result:" /proc/net/pktgen/lo\@0
	Result: OK: ratep=100

- the bad case (before the patch)

	$ echo "ratep 0" > /proc/net/pktgen/lo\@0"
	-bash: echo: write error: Invalid argument
	$ grep "Result:" /proc/net/pktgen/lo\@0
	Result: No such parameter "atep"

- with patch applied

	$ echo "ratep 0" > /proc/net/pktgen/lo\@0
	-bash: echo: write error: Invalid argument
	$ grep "Result:" /proc/net/pktgen/lo\@0
	Result: Idle

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-6-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:56 -08:00
Peter Seiderer
3ba38c25a8 net: pktgen: fix 'rate 0' error handling (return -EINVAL)
Given an invalid 'rate' command e.g. 'rate 0' the return value is '1',
leading to the following misleading output:

- the good case

	$ echo "rate 100" > /proc/net/pktgen/lo\@0
	$ grep "Result:" /proc/net/pktgen/lo\@0
	Result: OK: rate=100

- the bad case (before the patch)

	$ echo "rate 0" > /proc/net/pktgen/lo\@0"
	-bash: echo: write error: Invalid argument
	$ grep "Result:" /proc/net/pktgen/lo\@0
	Result: No such parameter "ate"

- with patch applied

	$ echo "rate 0" > /proc/net/pktgen/lo\@0
	-bash: echo: write error: Invalid argument
	$ grep "Result:" /proc/net/pktgen/lo\@0
	Result: Idle

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-5-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:56 -08:00
Peter Seiderer
b38504346a net: pktgen: fix hex32_arg parsing for short reads
Fix hex32_arg parsing for short reads (here 7 hex digits instead of the
expected 8), shift result only on successful input parsing.

- before the patch

	$ echo "mpls 0000123" > /proc/net/pktgen/lo\@0
	$ grep mpls /proc/net/pktgen/lo\@0
	     mpls: 00001230
	Result: OK: mpls=00001230

- with patch applied

	$ echo "mpls 0000123" > /proc/net/pktgen/lo\@0
	$ grep mpls /proc/net/pktgen/lo\@0
	     mpls: 00000123
	Result: OK: mpls=00000123

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-4-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:55 -08:00
Peter Seiderer
80604d19b5 net: pktgen: enable 'param=value' parsing
Enable more flexible parameters syntax, allowing 'param=value' in
addition to the already supported 'param value' pattern (additional
this gives the skipping '=' in count_trail_chars() a purpose).

Tested with:

	$ echo "min_pkt_size 999" > /proc/net/pktgen/lo\@0
	$ echo "min_pkt_size=999" > /proc/net/pktgen/lo\@0
	$ echo "min_pkt_size =999" > /proc/net/pktgen/lo\@0
	$ echo "min_pkt_size= 999" > /proc/net/pktgen/lo\@0
	$ echo "min_pkt_size = 999" > /proc/net/pktgen/lo\@0

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-3-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:55 -08:00
Peter Seiderer
802fb6db9f net: pktgen: replace ENOTSUPP with EOPNOTSUPP
Replace ENOTSUPP with EOPNOTSUPP, fixes checkpatch hint

  WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP

and e.g.

  $ echo "clone_skb 1" > /proc/net/pktgen/lo\@0
  -bash: echo: write error: Unknown error 524

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:24:55 -08:00
Purva Yeshi
1340461e51 af_unix: Fix undefined 'other' error
Fix an issue with the sparse static analysis tool where an
"undefined 'other'" error occurs due to `__releases(&unix_sk(other)->lock)`
being placed before 'other' is in scope.

Remove the `__releases()` annotation from the `unix_wait_for_peer()`
function to eliminate the sparse error. The annotation references `other`
before it is declared, leading to a false positive error during static
analysis.

Since AF_UNIX does not use sparse annotations, this annotation is
unnecessary and does not impact functionality.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Purva Yeshi <purvayeshi550@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250218141045.38947-1-purvayeshi550@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 15:28:46 -08:00
Mohsin Bashir
7b5b7a597f eth: fbnic: Add ethtool support for IRQ coalescing
Add ethtool support to configure the IRQ coalescing behavior. Support
separate timers for Rx and Tx for time based coalescing. For frame based
configuration, currently we only support the Rx side.

The hardware allows configuration of descriptor count instead of frame
count requiring conversion between the two. We assume 2 descriptors
per frame, one for the metadata and one for the data segment.

When rx-frames are not configured, we set the RX descriptor count to
half the ring size as a fail safe.

Default configuration:
ethtool -c eth0 | grep -E "rx-usecs:|tx-usecs:|rx-frames:"
rx-usecs:       30
rx-frames:      0
tx-usecs:       35

IRQ rate test:
With single iperf flow we monitor IRQ rate while changing the tx-usesc and
rx-usecs to high and low values.

ethtool -C eth0 rx-frames 8192 rx-usecs 150 tx-usecs 150
irq/sec   13k
irq/sec   14k
irq/sec   14k

ethtool -C eth0 rx-frames 8192 rx-usecs 10 tx-usecs 10
irq/sec  27k
irq/sec  28k
irq/sec  28k

Validating the use of extack:
ethtool -C eth0 rx-frames 16384
netlink error: fbnic: rx_frames is above device max
netlink error: Invalid argument

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250218023520.2038010-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 15:00:43 -08:00
Jakub Kicinski
f438d8da3c Merge branch 'support-ptp-clock-for-wangxun-nics'
Jiawen Wu says:

====================
Support PTP clock for Wangxun NICs

Implement support for PTP clock on Wangxun NICs.

v7: https://lore.kernel.org/20250213083041.78917-1-jiawenwu@trustnetic.com
v6: https://lore.kernel.org/20250208031348.4368-1-jiawenwu@trustnetic.com
v5: https://lore.kernel.org/20250117062051.2257073-1-jiawenwu@trustnetic.com
v4: https://lore.kernel.org/20250114084425.2203428-1-jiawenwu@trustnetic.com
v3: https://lore.kernel.org/20250110031716.2120642-1-jiawenwu@trustnetic.com
v2: https://lore.kernel.org/20250106084506.2042912-1-jiawenwu@trustnetic.com
v1: https://lore.kernel.org/20250102103026.1982137-1-jiawenwu@trustnetic.com
====================

Link: https://patch.msgid.link/20250218023432.146536-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 14:59:40 -08:00
Jiawen Wu
2d8967e86c net: ngbe: Add support for 1PPS and TOD
Implement support for generating a 1pps output signal on SDP0.
And support custom firmware to output TOD.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250218023432.146536-5-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 14:59:37 -08:00
Jiawen Wu
704145a854 net: wangxun: Add periodic checks for overflow and errors
Implement watchdog task to detect SYSTIME overflow and error cases of
Rx/Tx timestamp.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250218023432.146536-4-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 14:59:37 -08:00
Jiawen Wu
ce114069a6 net: wangxun: Support to get ts info
Implement the function get_ts_info and get_ts_stats in ethtool_ops to
get the HW capabilities and statistics for timestamping.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250218023432.146536-3-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 14:59:37 -08:00
Jiawen Wu
06e75161b9 net: wangxun: Add support for PTP clock
Implement support for PTP clock on Wangxun NICs.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250218023432.146536-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 14:59:36 -08:00
Akihiko Odaki
4adf749710 tun: Pad virtio headers
tun simply advances iov_iter when it needs to pad virtio header,
which leaves the garbage in the buffer as is. This will become
especially problematic when tun starts to allow enabling the hash
reporting feature; even if the feature is enabled, the packet may lack a
hash value and may contain a hole in the virtio header because the
packet arrived before the feature gets enabled or does not contain the
header fields to be hashed. If the hole is not filled with zero, it is
impossible to tell if the packet lacks a hash value.

In theory, a user of tun can fill the buffer with zero before calling
read() to avoid such a problem, but leaving the garbage in the buffer is
awkward anyway so replace advancing the iterator with writing zeros.

A user might have initialized the buffer to some non-zero value,
expecting tun to skip writing it. As this was never a documented
feature, this seems unlikely.

The overhead of filling the hole in the header is negligible when the
header size is specified according to the specification as doing so will
not make another cache line dirty under a reasonable assumption. Below
is a proof of this statement:

The first 10 bytes of the header is always written and tun also writes
the packet itself immediately after the packet unless the packet is
empty. This makes a hole between these writes whose size is: sz - 10
where sz is the specified header size.

Therefore, we will never make another cache line dirty when:
sz < L1_CACHE_BYTES + 10
where L1_CACHE_BYTES is the cache line size. Assuming
L1_CACHE_BYTES >= 16, this inequation holds when: sz < 26.

sz <= 20 according to the current specification so we even have a
margin of 5 bytes in case that the header size grows in a future version
of the specification.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://patch.msgid.link/20250215-buffers-v2-1-1fbc6aaf8ad6@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 14:16:52 -08:00
Breno Leitao
bf3624cf1c netdevsim: call napi_schedule from a timer context
The netdevsim driver was experiencing NOHZ tick-stop errors during packet
transmission due to pending softirq work when calling napi_schedule().
This issue was observed when running the netconsole selftest, which
triggered the following error message:

  NOHZ tick-stop error: local softirq work is pending, handler #08!!!

To fix this issue, introduce a timer that schedules napi_schedule()
from a timer context instead of calling it directly from the TX path.

Create an hrtimer for each queue and kick it from the TX path,
which then schedules napi_schedule() from the timer context.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250219-netdevsim-v3-1-811e2b8abc4c@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 13:18:31 -08:00