Commit Graph

1336830 Commits

Author SHA1 Message Date
Russell King (Oracle)
98f9928843 net: stmmac: qcom-ethqos: use rgmii_clock() to set the link clock
The link clock operates at twice the RGMII clock rate. Therefore, we
can use the rgmii_clock() helper to set this clock rate.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tlRMK-004Vsx-Ss@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24 12:53:48 -08:00
Jason Wang
e13b6da704 virtio-net: tweak for better TX performance in NAPI mode
There are several issues existed in start_xmit():

- Transmitted packets need to be freed before sending a packet, this
  introduces delay and increases the average packets transmit
  time. This also increase the time that spent in holding the TX lock.
- Notification is enabled after free_old_xmit_skbs() which will
  introduce unnecessary interrupts if TX notification happens on the
  same CPU that is doing the transmission now (actually, virtio-net
  driver are optimized for this case).

So this patch tries to avoid those issues by not cleaning transmitted
packets in start_xmit() when TX NAPI is enabled and disable
notifications even more aggressively. Notification will be since the
beginning of the start_xmit(). But we can't enable delayed
notification after TX is stopped as we will lose the
notifications. Instead, the delayed notification needs is enabled
after the virtqueue is kicked for best performance.

Performance numbers:

1) single queue 2 vcpus guest with pktgen_sample03_burst_single_flow.sh
   (burst 256) + testpmd (rxonly) on the host:

- When pinning TX IRQ to pktgen VCPU: split virtqueue PPS were
  increased 55% from 6.89 Mpps to 10.7 Mpps and 32% TX interrupts were
  eliminated. Packed virtqueue PPS were increased 50% from 7.09 Mpps to
  10.7 Mpps, 99% TX interrupts were eliminated.

- When pinning TX IRQ to VCPU other than pktgen: split virtqueue PPS
  were increased 96% from 5.29 Mpps to 10.4 Mpps and 45% TX interrupts
  were eliminated; Packed virtqueue PPS were increased 78% from 6.12
  Mpps to 10.9 Mpps and 99% TX interrupts were eliminated.

2) single queue 1 vcpu guest + vhost-net/TAP on the host: single
   session netperf from guest to host shows 82% improvement from
   31Gb/s to 58Gb/s, %stddev were reduced from 34.5% to 1.9% and 88%
   of TX interrupts were eliminated.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-24 08:10:24 +00:00
Jakub Kicinski
b66e19dcf6 Merge branch 'mctp-add-mctp-over-usb-hardware-transport-binding'
Jeremy Kerr says:

====================
mctp: Add MCTP-over-USB hardware transport binding

Add an implementation of the DMTF standard DSP0283, providing an MCTP
channel over high-speed USB.

This is a fairly trivial first implementation, in that we only submit
one tx and one rx URB at a time. We do accept multi-packet transfers,
but do not yet generate them on transmit.

Of course, questions and comments are most welcome, particularly on the
USB interfaces.

v2: https://lore.kernel.org/20250212-dev-mctp-usb-v2-0-76e67025d764@codeconstruct.com.au
v1: https://lore.kernel.org/20250206-dev-mctp-usb-v1-0-81453fe26a61@codeconstruct.com.au
====================

Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-0-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:45:26 -08:00
Jeremy Kerr
0791c0327a net: mctp: Add MCTP USB transport driver
Add an implementation for DMTF DSP0283, which defines a MCTP-over-USB
transport. As per that spec, we're restricted to full speed mode,
requiring 512-byte transfers.

Each MCTP-over-USB interface is a peer-to-peer link to a single MCTP
endpoint, so no physical addressing is required (of course, that MCTP
endpoint may then bridge to further MCTP endpoints). Consequently,
interfaces will report with no lladdr data:

    # mctp link
    dev lo index 1 address 00:00:00:00:00:00 net 1 mtu 65536 up
    dev mctpusb0 index 6 address none net 1 mtu 68 up

This is a simple initial implementation, with single rx & tx urbs, and
no multi-packet tx transfers - although we do accept multi-packet rx
from the device.

Includes suggested fixes from Santosh Puranik <spuranik@nvidia.com>.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Cc: Santosh Puranik <spuranik@nvidia.com>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-2-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:45:21 -08:00
Jeremy Kerr
dcc35baae7 usb: Add base USB MCTP definitions
Upcoming changes will add a USB host (and later gadget) driver for the
MCTP-over-USB protocol. Add a header that provides common definitions
for protocol support: the packet header format and a few framing
definitions. Add a define for the MCTP class code, as per
https://usb.org/defined-class-codes.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:45:21 -08:00
Sean Anderson
e6a532185d net: cadence: macb: Implement BQL
Implement byte queue limits to allow queuing disciplines to account for
packets enqueued in the ring buffer but not yet transmitted. There are a
separate set of transmit functions for AT91 that I haven't touched since
I don't have hardware to test on.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250220164257.96859-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:39:08 -08:00
Russell King (Oracle)
3e401818c8 net: stmmac: print stmmac_init_dma_engine() errors using netdev_err()
stmmac_init_dma_engine() uses dev_err() which leads to errors being
reported as e.g:

dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma
dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed

stmmac_init_dma_engine() is only called from stmmac_hw_setup() which
itself uses netdev_err(), and we will have a net_device setup. So,
change the dev_err() to netdev_err() to give consistent error
messages.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tl5y1-004UgG-8X@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:29:05 -08:00
Hangbin Liu
465b210fdc selftests: fib_nexthops: do not mark skipped tests as failed
The current test marks all unexpected return values as failed and sets ret
to 1. If a test is skipped, the entire test also returns 1, incorrectly
indicating failure.

To fix this, add a skipped variable and set ret to 4 if it was previously
0. Otherwise, keep ret set to 1.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250220085326.1512814-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:23:29 -08:00
Jakub Kicinski
27422c3738 Merge branch 'net-fib_rules-add-dscp-mask-support'
Ido Schimmel says:

====================
net: fib_rules: Add DSCP mask support

In some deployments users would like to encode path information into
certain bits of the IPv6 flow label, the UDP source port and the DSCP
field and use this information to route packets accordingly.

Redirecting traffic to a routing table based on specific bits in the
DSCP field is not currently possible. Only exact match is currently
supported by FIB rules.

This patchset extends FIB rules to match on the DSCP field with an
optional mask.

Patches #1-#5 gradually extend FIB rules to match on the DSCP field with
an optional mask.

Patch #6 adds test cases for the new functionality.

iproute2 support can be found here [1].

[1] https://github.com/idosch/iproute2/tree/submit/fib_rule_mask_v1
====================

Link: https://patch.msgid.link/20250220080525.831924-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:55 -08:00
Ido Schimmel
e818d1d1a6 selftests: fib_rule_tests: Add DSCP mask match tests
Add tests for FIB rules that match on DSCP with a mask. Test both good
and bad flows and both the input and output paths.

 # ./fib_rule_tests.sh
 IPv6 FIB rule tests
 [...]
    TEST: rule6 check: dscp redirect to table                           [ OK ]
    TEST: rule6 check: dscp no redirect to table                        [ OK ]
    TEST: rule6 del by pref: dscp redirect to table                     [ OK ]
    TEST: rule6 check: iif dscp redirect to table                       [ OK ]
    TEST: rule6 check: iif dscp no redirect to table                    [ OK ]
    TEST: rule6 del by pref: iif dscp redirect to table                 [ OK ]
    TEST: rule6 check: dscp masked redirect to table                    [ OK ]
    TEST: rule6 check: dscp masked no redirect to table                 [ OK ]
    TEST: rule6 del by pref: dscp masked redirect to table              [ OK ]
    TEST: rule6 check: iif dscp masked redirect to table                [ OK ]
    TEST: rule6 check: iif dscp masked no redirect to table             [ OK ]
    TEST: rule6 del by pref: iif dscp masked redirect to table          [ OK ]
 [...]

 Tests passed: 316
 Tests failed:   0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-7-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:49 -08:00
Ido Schimmel
0df1328eaf netlink: specs: Add FIB rule DSCP mask attribute
Add new DSCP mask attribute to the spec. Example:

 # ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
	 --do newrule \
	 --json '{"family": 2, "dscp": 10, "dscp-mask": 63, "action": 1, "table": 1}'
 None
 $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
	 --dump getrule --json '{"family": 2}' --output-json | jq '.[]'
 [...]
 {
   "table": 1,
   "suppress-prefixlen": "0xffffffff",
   "protocol": 0,
   "priority": 32765,
   "dscp": 10,
   "dscp-mask": "0x3f",
   "family": 2,
   "dst-len": 0,
   "src-len": 0,
   "tos": 0,
   "action": "to-tbl",
   "flags": 0
 }
 [...]

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:48 -08:00
Ido Schimmel
ea8af1affd net: fib_rules: Enable DSCP mask usage
Allow user space to configure FIB rules that match on DSCP with a mask,
now that support has been added to the IPv4 and IPv6 address families.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:48 -08:00
Ido Schimmel
c29165c272 ipv6: fib_rules: Add DSCP mask matching
Extend IPv6 FIB rules to match on DSCP using a mask. Unlike IPv4, also
initialize the DSCP mask when a non-zero 'tos' is specified as there is
no difference in matching between 'tos' and 'dscp'. As a side effect,
this makes it possible to match on 'dscp 0', like in IPv4.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:48 -08:00
Ido Schimmel
2ae00699b3 ipv4: fib_rules: Add DSCP mask matching
Extend IPv4 FIB rules to match on DSCP using a mask. The mask is only
set in rules that match on DSCP (not TOS) and initialized to cover the
entire DSCP field if the mask attribute is not specified.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:47 -08:00
Ido Schimmel
ca4edd969a net: fib_rules: Add DSCP mask attribute
Add an attribute that allows matching on DSCP with a mask. Matching on
DSCP with a mask is needed in deployments where users encode path
information into certain bits of the DSCP field.

Temporarily set the type of the attribute to 'NLA_REJECT' while support
is being added.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:47 -08:00
Jakub Kicinski
e87700965a Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-02-20

We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).

The main changes are:

1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing

2) Add network TX timestamping support to BPF sock_ops, from Jason Xing

3) Add TX metadata Launch Time support, from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  igc: Add launch time support to XDP ZC
  igc: Refactor empty frame insertion for launch time support
  net: stmmac: Add launch time support to XDP ZC
  selftests/bpf: Add launch time request to xdp_hw_metadata
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
  bpf: Support selective sampling for bpf timestamping
  bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
  net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
  bpf: Disable unsafe helpers in TX timestamping callbacks
  bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
  bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
  bpf: Add networking timestamping support to bpf_get/setsockopt()
  selftests/bpf: Add rto max for bpf_setsockopt test
  bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================

Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:59:47 -08:00
Ziwei Xiao
4b9c7d8fa1 gve: Add RSS cache for non RSS device option scenario
Not all the devices have the capability for the driver to query for the
registered RSS configuration. The driver can discover this by checking
the relevant device option during setup. If it cannot, the driver needs
to store the RSS config cache and directly return such cache when
queried by the ethtool. RSS config is inited when driver probes. Also the
default RSS config will be adjusted when there is RX queue count change.

At this point, only keys of GVE_RSS_KEY_SIZE and indirection tables of
GVE_RSS_INDIR_SIZE are supported.

Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250219200451.3348166-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:55:39 -08:00
Thorsten Blum
c451715d78 net/rds: Replace deprecated strncpy() with strscpy_pad()
strncpy() is deprecated for NUL-terminated destination buffers. Use
strscpy_pad() instead and remove the manual NUL-termination.

Compile-tested only.

Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Tested-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250219224730.73093-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:51:26 -08:00
Jakub Kicinski
376cd9a2ab Merge branch 'net-improve-netns-handling-in-rtnetlink'
Xiao Liang says:

====================
net: Improve netns handling in rtnetlink

This patch series includes some netns-related improvements and fixes for
rtnetlink, to make link creation more intuitive:

 1) Creating link in another net namespace doesn't conflict with link
    names in current one.
 2) Refector rtnetlink link creation. Create link in target namespace
    directly.

So that

  # ip link add netns ns1 link-netns ns2 tun0 type gre ...

will create tun0 in ns1, rather than create it in ns2 and move to ns1.
And don't conflict with another interface named "tun0" in current netns.

Patch 01 avoids link name conflict in different netns.

To achieve 2), there're mainly 3 steps:

 - Patch 02 packs newlink() parameters into a struct, including
   the original "src_net" along with more netns context. No semantic
   changes are introduced.
 - Patch 03 ~ 09 converts device drivers to use the explicit netns
   extracted from params.
 - Patch 10 ~ 11 removes the old netns parameter, and converts
   rtnetlink to create device in target netns directly.

Patch 12 ~ 13 adds some tests for link name and link netns.

---

Please note there're some issues found in current code:

- In amt_newlink() drivers/net/amt.c:

    amt->net = net;
    ...
    amt->stream_dev = dev_get_by_index(net, ...

  Uses net, but amt_lookup_upper_dev() only searches in dev_net.
  So the AMT device may not be properly deleted if it's in a different
  netns from lower dev.

- In lowpan_newlink() in net/ieee802154/6lowpan/core.c:

    wdev = dev_get_by_index(dev_net(ldev), nla_get_u32(tb[IFLA_LINK]));

  Looks for IFLA_LINK in dev_net, but in theory the ifindex is defined
  in link netns.

And thanks to Kuniyuki for fixing related issues in gtp and pfcp:
https://lore.kernel.org/netdev/20250110014754.33847-1-kuniyu@amazon.com/

v9: https://lore.kernel.org/20250210133002.883422-1-shaw.leon@gmail.com
v8: https://lore.kernel.org/20250113143719.7948-1-shaw.leon@gmail.com
v7: https://lore.kernel.org/20250104125732.17335-1-shaw.leon@gmail.com
v6: https://lore.kernel.org/20241218130909.2173-1-shaw.leon@gmail.com
v5: https://lore.kernel.org/20241209140151.231257-1-shaw.leon@gmail.com
v4: https://lore.kernel.org/20241118143244.1773-1-shaw.leon@gmail.com
v3: https://lore.kernel.org/20241113125715.150201-1-shaw.leon@gmail.com
v2: https://lore.kernel.org/20241107133004.7469-1-shaw.leon@gmail.com
v1: https://lore.kernel.org/20241023023146.372653-1-shaw.leon@gmail.com
====================

Link: https://patch.msgid.link/20250219125039.18024-1-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:07 -08:00
Xiao Liang
85cb3711ac selftests: net: Add test cases for link and peer netns
- Add test for creating link in another netns when a link of the same
   name and ifindex exists in current netns.
 - Add test to verify that link is created in target netns directly -
   no link new/del events should be generated in link netns or current
   netns.
 - Add test cases to verify that link-netns is set as expected for
   various drivers and combination of namespace-related parameters.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Link: https://patch.msgid.link/20250219125039.18024-14-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang
0303294162 selftests: net: Add python context manager for netns entering
Change netns of current thread and switch back on context exit.
For example:

    with NetNSEnter("ns1"):
        ip("link add dummy0 type dummy")

The command be executed in netns "ns1".

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Link: https://patch.msgid.link/20250219125039.18024-13-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang
7ca486d08a rtnetlink: Create link directly in target net namespace
Make rtnl_newlink_create() create device in target namespace directly.
Avoid extra netns change when link netns is provided.

Device drivers has been converted to be aware of link netns, that is not
assuming device netns is and link netns is the same when ops->newlink()
is called.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-12-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang
9c0fc091dc rtnetlink: Remove "net" from newlink params
Now that devices have been converted to use the specific netns instead
of ambiguous "net", let's remove it from newlink parameters.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-11-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang
5314e3d684 net: xfrm: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-10-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang
5e72ce3e39 net: ipv6: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-9-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
db014522f3 net: ipv6: Init tunnel link-netns before registering dev
Currently some IPv6 tunnel drivers set tnl->net to dev_net(dev) in
ndo_init(), which is called in register_netdevice(). However, it lacks
the context of link-netns when we enable cross-net tunnels at device
registration time.

Let's move the init of tunnel link-netns before register_netdevice().

ip6_gre has already initialized netns, so just remove the redundant
assignment.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-8-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
eacb116053 net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops
When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Convert common ip_tunnel_newlink() to accept an extra link netns
argument.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-7-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
9e17b2a1a0 net: ip_tunnel: Don't set tunnel->net in ip_tunnel_init()
ip_tunnel_init() is called from register_netdevice(). In all code paths
reaching here, tunnel->net should already have been set (either in
ip_tunnel_newlink() or __ip_tunnel_create()). So don't set it again.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-6-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
3533717581 ieee802154: 6lowpan: Validate link netns in newlink() of rtnl_link_ops
Device denoted by IFLA_LINK is in link_net (IFLA_LINK_NETNSID) or
source netns by design, but 6lowpan uses dev_net.

Note dev->netns_local is set to true and currently link_net is
implemented via a netns change. These together effectively reject
IFLA_LINK_NETNSID.

This patch adds a validation to ensure link_net is either NULL or
identical to dev_net. Thus it would be fine to continue using dev_net
when rtnetlink core begins to create devices directly in target netns.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-5-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
cf517ac16a net: Use link/peer netns in newlink() of rtnl_link_ops
Add two helper functions - rtnl_newlink_link_net() and
rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back
to link netns, and link netns falls back to source netns.

Convert the use of params->net in netdevice drivers to one of the helper
functions for clarity.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
69c7be1b90 rtnetlink: Pack newlink() params into struct
There are 4 net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns - netns associated with the device (backend),
 - peer netns - netns of peer device.

Currently, two nets are passed to newlink() callback - "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as
follows, depending on netlink attributes in the request.

 +------------+-------------------+---------+---------+
 | peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
 +------------+-------------------+---------+---------+
 |            | absent            | source  | target  |
 | absent     +-------------------+---------+---------+
 |            | present           | link    | link    |
 +------------+-------------------+---------+---------+
 |            | absent            | peer    | target  |
 | present    +-------------------+---------+---------+
 |            | present           | peer    | link    |
 +------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns
first and then moved to target netns. This has some side effects,
including extra ifindex allocation, ifname validation and link events.
These could be avoided if we create it in target netns from
the beginning.

On the other hand, the meaning of src_net parameter is ambiguous. It
varies depending on how parameters are passed. It is the effective
link (or peer netns) by design, but some drivers ignore it and use
dev_net instead.

To provide more netns context for drivers, this patch packs existing
newlink() parameters, along with the source netns, link netns and peer
netns, into a struct. The old "src_net" is renamed to "net" to avoid
confusion with real source netns, and will be deprecated later. The use
of src_net are converted to params->net trivially.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:02 -08:00
Xiao Liang
ec061546c6 rtnetlink: Lookup device in target netns when creating link
When creating link, lookup for existing device in target net namespace
instead of current one.
For example, two links created by:

  # ip link add dummy1 type dummy
  # ip link add netns ns1 dummy1 type dummy

should have no conflict since they are in different namespaces.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-2-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:01 -08:00
Jakub Kicinski
4fe67dd2d5 Merge branch 'dt-bindings-net-realtek-rtl9301-switch'
Chris Packham says:

====================
dt-bindings: net: realtek,rtl9301-switch (schema part)

This is my attempt at trying to sort out the mess I've created with the RTL9300
switch dt-bindings. Some context is available on [1] and [2].

The first patch just moves the binding from mfd/ to net/ (with an adjustment of
the internal path name). The next two patches are successors to patches already
sent as part of the series [3].

[1] - https://lore.kernel.org/lkml/20250204-eccentric-deer-of-felicity-02b7ee@krzk-bin/
[2] - https://lore.kernel.org/lkml/4e3c5d83-d215-4eff-bf02-6d420592df8f@alliedtelesis.co.nz/
[3] - https://lore.kernel.org/lkml/20250204030249.1965444-1-chris.packham@alliedtelesis.co.nz/
====================

Link: https://patch.msgid.link/20250218195216.1034220-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:08:05 -08:00
Chris Packham
96757457da dt-bindings: net: Add Realtek MDIO controller
Add dtschema for the MDIO controller found in the RTL9300 Ethernet
switch. The controller is slightly unusual in that direct MDIO
communication is not possible. We model the MDIO controller with the
MDIO buses as child nodes and the PHYs as children of the buses. The
mapping of switch port number to MDIO bus/addr requires the
ethernet-ports sibling to provide the mapping via the phy-handle
property.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-4-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:07:15 -08:00
Chris Packham
92575a2182 dt-bindings: net: Add switch ports and interrupts to RTL9300
Add bindings for the ethernet-switch and interrupt properties for the
RTL9300.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:07:15 -08:00
Chris Packham
3fa337651d dt-bindings: net: Move realtek,rtl9301-switch to net
Initially realtek,rtl9301-switch was placed under mfd/ because it had
some non-switch related blocks (specifically i2c and reset) but with a
bit more review it has become apparent that this was wrong and the
binding should live under net/.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250218195216.1034220-2-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:07:15 -08:00
Birger Koblitz
a850355610 net: sfp: add quirk for 2.5G OEM BX SFP
The OEM SFP-2.5G-BX10-D/U SFP module pair is meant to operate with
2500Base-X. However, in their EEPROM they incorrectly specify:
Transceiver codes   : 0x00 0x12 0x00 0x00 0x12 0x00 0x01 0x05 0x00
BR, Nominal         : 2500MBd

Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.
Tested on BananaPi R3.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20250218-b4-lkmsub-v1-1-1e51dcabed90@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:04:56 -08:00
Heiner Kallweit
bb3bb6c92e net: phy: remove unused feature array declarations
After 12d5151be0 ("net: phy: remove leftovers from switch to linkmode
bitmaps") the following declarations are unused and can be removed too.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/b2883c75-4108-48f2-ab73-e81647262bc2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 18:16:21 -08:00
Jakub Kicinski
56b06a71fc Merge branch 'selftests-drv-net-improve-the-queue-test-for-xsk'
Jakub Kicinski says:

====================
selftests: drv-net: improve the queue test for XSK

We see some flakes in the the XSK test:

   Exception| Traceback (most recent call last):
   Exception|   File "/home/virtme/testing-18/tools/testing/selftests/net/lib/py/ksft.py", line 218, in ksft_run
   Exception|     case(*args)
   Exception|   File "/home/virtme/testing-18/tools/testing/selftests/drivers/net/./queues.py", line 53, in check_xdp
   Exception|     ksft_eq(q['xsk'], {})
   Exception| KeyError: 'xsk'

I think it's because the method of running the helper in the background
is racy. Add more solid infra for waiting for a background helper to be
initialized.

v1: https://lore.kernel.org/20250218195048.74692-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250219234956.520599-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
932a9249f7 selftests: drv-net: rename queues check_xdp to check_xsk
The test is for AF_XDP, we refer to AF_XDP as XSK.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
4fde839846 selftests: drv-net: improve the use of ksft helpers in XSK queue test
Avoid exceptions when xsk attr is not present, and add a proper ksft
helper for "not in" condition.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
7147713799 selftests: drv-net: add a way to wait for a local process
We use wait_port_listen() extensively to wait for a process
we spawned to be ready. Not all processes will open listening
sockets. Add a method of explicitly waiting for a child to
be ready. Pass a FD to the spawned process and wait for it
to write a message to us. FD number is passed via KSFT_READY_FD
env variable.

Similarly use KSFT_WAIT_FD to let the child process for a sign
that we are done and child should exit. Sending a signal to
a child with shell=True can get tricky.

Make use of this method in the queues test to make it less flaky.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
d3726ab45c selftests: drv-net: probe for AF_XDP sockets more explicitly
Separate the support check from socket binding for easier refactoring.
Use: ./helper - - just to probe if we can open the socket.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
bab59dcf71 selftests: drv-net: add missing new line in xdp_helper
Kurt and Joe report missing new line at the end of Usage.

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
dabd31baa3 selftests: drv-net: use cfg.rpath() in netlink xsk attr test
The cfg.rpath() helper was been recently added to make formatting
paths for helper binaries easier.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski
846742f7e3 selftests: drv-net: add a warning for bkg + shell + terminate
Joe Damato reports that some shells will fork before running
the command when python does "sh -c $cmd", while bash on my
machine does an exec of $cmd directly.

This will have implications for our ability to terminate
the child process on various configurations of bash and
other shells. Warn about using

	bkg(... shell=True, termininate=True)

most background commands can hopefully exit cleanly (exit_wait).

Link: https://lore.kernel.org/Z7Yld21sv_Ip3gQx@LQ3V64L9R2
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:57:29 -08:00
Arnd Bergmann
ca57d1c56f octeontx2: hide unused label
A previous patch introduces a build-time warning when CONFIG_DCB
is disabled:

drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c: In function 'otx2_probe':
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c:3217:1: error: label 'err_free_zc_bmap' defined but not used [-Werror=unused-label]
drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c: In function 'otx2vf_probe':
drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c:740:1: error: label 'err_free_zc_bmap' defined but not used [-Werror=unused-label]

Add the same #ifdef check around it.

Fixes: efabce2901 ("octeontx2-pf: AF_XDP zero copy receive support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Suman Ghosh <sumang@marvell.com>
Link: https://patch.msgid.link/20250219162239.1376865-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:38:02 -08:00
Charalampos Mitrodimas
8279a8dacf net: phy: qt2025: Fix hardware revision check comment
Correct the hardware revision check comment in the QT2025 driver. The
revision value was documented as 0x3b instead of the correct 0xb3,
which matches the actual comparison logic in the code.

Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Link: https://patch.msgid.link/20250219-qt2025-comment-fix-v2-1-029f67696516@posteo.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:37:32 -08:00
Jakub Kicinski
ac8f0aff41 Merge branch 'mlx5-misc-enhancements-2025-02-19'
Tariq Toukan says:

====================
mlx5 misc enhancements 2025-02-19

This small series enhances the mlx5 ethtool link speed code (no
functional change), in addition to a Kconfig description enhancement.
====================

Link: https://patch.msgid.link/20250219114112.403808-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:12 -08:00
Shahar Shitrit
9c362aafda net/mlx5e: Separate extended link modes request from link modes type selection
The function ext_requested() serves two distinct purposes: it checks
if extended link modes were requested, and it selects whether to use
extended or legacy link modes.

This change separates these two purposes. Now, ext_link_mode_requested()
is used directly for checking if extended link modes are requested,
while the selection of extended modes is handled independently based on
the autonegotiation status.

By making this distinction, the logic for determining whether to select
extended or legacy link modes is clearer.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219114112.403808-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:36:09 -08:00