Some distributions may not enable MPTCP by default. All other MPTCP tests
source mptcp_lib.sh to ensure MPTCP is enabled before testing. However,
the ip_local_port_range test is the only one that does not include this
step.
Let's also ensure MPTCP is enabled in netns for ip_local_port_range so
that it passes on all distributions.
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250224094013.13159-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Although the PCI device ID and Vendor ID for the RPM (MAC) block
have remained the same across Octeon CN10K and the next-generation
CN20K silicon, Hardware architecture has changed (NIX mapped RPMs
and RFOE Mapped RPMs).
Add PCI Subsystem IDs to the device table to ensure that this driver
can be probed from NIX mapped RPM devices only.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250224035603.1220913-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mohsin Bashir says:
====================
eth: fbnic: Update fbnic driver
This patchset makes following trivial changes to the fbnic driver:
1) Add coverage for PCIe CSRs in the ethtool register dump.
2) Consolidate the PUL_USER CSR section, update the end boundary,
and remove redundant definition of the end boundary.
3) Update the return value in kdoc for fbnic_netdev_alloc().
====================
Link: https://patch.msgid.link/20250221201813.2688052-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move PUL_USER CSRs in the relevant section, update the end boundary
address, and remove the redundant definition of end boundary.
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Provide coverage to PCIe registers in ethtool register dump
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Matthieu Baerts says:
====================
mptcp: pm: misc cleanups, part 3
These cleanups lead the way to the unification of the path-manager
interfaces, and allow future extensions. The following patches are not
all linked to each others, but are all related to the path-managers,
except the last three.
- Patch 1: remove unused returned value in mptcp_nl_set_flags().
- Patch 2: new flag: avoid iterating over all connections if not needed.
- Patch 3: add a build check making sure there is enough space in cb-ctx.
- Patch 4: new mptcp_pm_genl_fill_addr helper to reduce duplicated code.
- Patch 5: simplify userspace_pm_append_new_local_addr helper.
- Patch 6: drop unneeded inet6_sk().
- Patch 7: use ipv6_addr_equal() instead of !ipv6_addr_cmp()
- Patch 8: scheduler: split an interface in two.
- Patch 9: scheduler: save 64 bytes of currently unused data.
- Patch 10: small optimisation to exit early in case of retransmissions.
====================
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-0-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks for the previous commit ("mptcp: sched: split get_subflow
interface into two"), the mptcp_sched_data structure is now currently
unused.
This structure has been added to allow future extensions that are not
ready yet. At the end, this structure will not even be used at all when
mptcp_subflow bpf_iter will be supported [1].
Here is a first step to save 64 bytes on the stack for each scheduling
operation. The structure is not removed yet not to break the WIP work on
these extensions, but will be done when [1] will be ready and applied.
Link: https://lore.kernel.org/6645ad6e-8874-44c5-8730-854c30673218@linux.dev [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-9-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
get_retrans() interface of the burst packet scheduler invokes a sleeping
function mptcp_pm_subflow_chk_stale(), which calls __lock_sock_fast().
So get_retrans() interface should be set with BPF_F_SLEEPABLE flag in
BPF. But get_send() interface of this scheduler can't be set with
BPF_F_SLEEPABLE flag since it's invoked in ack_update_msk() under mptcp
data lock.
So this patch has to split get_subflow() interface of packet scheduer into
two interfaces: get_send() and get_retrans(). Then we can set get_retrans()
interface alone with BPF_F_SLEEPABLE flag.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-8-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The variable 'match' in mptcp_userspace_pm_append_new_local_addr() is a
redundant one, and this patch drops it.
No need to define 'match' as 'struct mptcp_pm_addr_entry *' type. In this
function, it's only used to check whether it's NULL. It can be defined as
a Boolean one.
Also other variables 'addr_match' and 'id_match' make 'match' a redundant
one, which can be replaced by directly checking 'addr_match && id_match'.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-5-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If an endpoint doesn't have the 'subflow' flag -- in fact, has no type,
so not 'subflow', 'signal', nor 'implicit' -- there are then no subflows
created from this local endpoint to at least the initial destination
address. In this case, no need to call mptcp_pm_nl_fullmesh() which is
there to recreate the subflows to reflect the new value of the fullmesh
attribute.
Similarly, there is then no need to iterate over all connections to do
nothing, if only the 'fullmesh' flag has been changed, and the endpoint
doesn't have the 'subflow' one. So stop early when dealing with this
specific case.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-2-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan says:
====================
net/mlx5e: Move IPSec policy check after decryption
This series by Jianbo adds IPsec policy check after decryption.
In current mlx5 driver, the policy check is done before decryption for
IPSec crypto and packet offload. This series changes that order to
make it consistent with the processing in kernel xfrm. Besides, RX
state with UPSPEC selector is supported correctly after new steering
table is added after decryption and before the policy check.
====================
Link: https://patch.msgid.link/20250220213959.504304-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, the upper layer matches are added for the decryption rule
when xfrm selector's UPSPEC is specified in the command. However, it's
impossible as packets are not decrypted, and there is no way to do
match on the upper protocol (TCP/UDP) with specific source/destination
port. The result is that packets are not decrypted by hardware because
of this mismatch. Instead, they are forwarded to kernel, and
decryption is done by software.
To resolve this issue, this patch adds new table (sa_sel) after status
table and before policy table. When UPSPEC's proto is specified in
xfrm state's selector, a rule is added in status table to forward the
decrypted packets to sa_sel table, where the corresponding rule for
selector's UPSPEC is added, and packet's upper headers are checked
there. If matched, they will be forward to policy table to do policy
check. Otherwise, they are dropped immediately.
Besides, add a global count for this kind of packet drop.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For crypto offload, there is no xfrm policy rule offloaded to
hardware, so no need to continue with policy check for it.
Previously, for crypto offload, the hardware metadata reg c4 is not
used and not changed, but set to ASO_OK(0) before decryption to avoid
garbage data. Then a default rule is added to check ipsec_syndrome and
this register. Packets are forwarded to policy table if succeed, or
drop if fails.
According to hardware document, this register value could be 0, 1.
So a special value (0xAA), which is not used by hardware, is chosen as
an indication for crypto offload. It is set to c4 before decryption.
Then a default rule, which matches on 0xAA (and ipsec_syndrome on 0),
is added, which means packets are done by crypto offload, and
sends them to kernel directly, thus skips the policy check.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, xfrm policy check is done before decryption in mlx5 driver.
If matching any policy, packets are forwarded to xfrm state table for
decryption. But this is exact opposite to what software does. For
kernel implementation, xfrm decode is unconditionally activated
whenever an IPSec packet reaches the input flow if there’s a matching
state rule.
This patch changes the order, move policy check after decryption.
Besides, a miss flow table is added at the end for legacy mode, to
make it easier to update the default destination of the steering rules.
So ESP packets are firstly forwarded to SA table for decryption, then
the result is checked in status table. If the decryption succeeds,
packets are forwarded to another table to check xfrm policy rules.
When a policy with allow action is matched, if in legacy mode packets
are forwarded to miss flow table with one rule to forward them to RoCE
tables, if in switchdev mode they are forwarded directly to TC root
chain instead.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In commit dddb49b63d ("net/mlx5e: Add IPsec and ASO syndromes check
in HW"), IPSec and ASO syndromes checks after decryption for the
specified ASO object were added. But they are correct only for eswith
in legacy mode. For switchdev mode, metadata register c1 is used to
save the mapped id (not ASO object id). So, need to change the match
accordingly for the check rules in status table.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For eswitch in legacy mode, the packets decrypted in RX SA table will
continue to be processed for RoCE. But this is not necessary for the
un-decrypted packets, which don't match any decryption rules but hit
the miss rule at the end of the table. So, change the destination of
miss rule to TTC default one and skip RoCE.
For eswitch in switchdev mode, the destination is unchanged.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a new 40/25/10 Gigabit Ethernet device.
To support basic functions, PHYLINK is temporarily skipped as it is
intended to implement these configurations in the firmware. And the
associated link IRQ is also skipped.
And Implement the new SW-FW interaction interface, which use 64 Byte
message buffer.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250221065718.197544-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a shadow variable warning in net/core/dev.c when compiled with
CONFIG_LOCKDEP enabled. The warning occurs because 'dev' is redeclared
inside the while loop, shadowing the outer scope declaration.
net/core/dev.c:11211:22: warning: declaration shadows a local variable [-Wshadow]
struct net_device *dev = list_first_entry(&unlink_list,
net/core/dev.c:11202:21: note: previous declaration is here
struct net_device *dev, *tmp;
Remove the redundant declaration since the variable is already defined
in the outer scope and will be overwritten in the subsequent
list_for_each_entry_safe() loop anyway.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250221-netcons_fix_shadow-v1-1-dee20c8658dd@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: stmmac: thead: clean up clock rate setting
This series cleans up the thead clock rate setting to use the
rgmii_clock() helper function added to phylib.
The first patch switches over to using the rgmii_clock() helper,
and the second patch cleans up the verification that the desired
clock rate is achievable, allowing the private clock rate
definitions to be removed.
====================
Tested-by: Drew Fustini <drew@pdp7.com>
Link: https://patch.msgid.link/Z7iKdaCp4hLWWgJ2@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
thead was checking that the stmmac_clk rate was a multiple of the
RGMII rates for 1G and 100M, but didn't check for 10M. Rather than
use this with hard-coded speeds, check that the calculated divisor
gives the required rate by multplying the transmit clock rate back
up to the stmmac clock rate and checking that it agrees.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Drew Fustini <drew@pdp7.com>
Link: https://patch.msgid.link/E1tlToD-004W3g-HB@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
modprobe dummy dumdummies=1
Old behavior :
$ cat /sys/class/net/dummy0/carrier
cat: /sys/class/net/dummy0/carrier: Invalid argument
After blamed commit, an empty string is reported.
$ cat /sys/class/net/dummy0/carrier
$
In this commit, I restore the old behavior for carrier,
speed and duplex attributes.
Fixes: 79c61899b5 ("net-sysfs: remove rtnl_trylock from device attributes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Leogrande <leogrande@google.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250221051223.576726-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are several issues existed in start_xmit():
- Transmitted packets need to be freed before sending a packet, this
introduces delay and increases the average packets transmit
time. This also increase the time that spent in holding the TX lock.
- Notification is enabled after free_old_xmit_skbs() which will
introduce unnecessary interrupts if TX notification happens on the
same CPU that is doing the transmission now (actually, virtio-net
driver are optimized for this case).
So this patch tries to avoid those issues by not cleaning transmitted
packets in start_xmit() when TX NAPI is enabled and disable
notifications even more aggressively. Notification will be since the
beginning of the start_xmit(). But we can't enable delayed
notification after TX is stopped as we will lose the
notifications. Instead, the delayed notification needs is enabled
after the virtqueue is kicked for best performance.
Performance numbers:
1) single queue 2 vcpus guest with pktgen_sample03_burst_single_flow.sh
(burst 256) + testpmd (rxonly) on the host:
- When pinning TX IRQ to pktgen VCPU: split virtqueue PPS were
increased 55% from 6.89 Mpps to 10.7 Mpps and 32% TX interrupts were
eliminated. Packed virtqueue PPS were increased 50% from 7.09 Mpps to
10.7 Mpps, 99% TX interrupts were eliminated.
- When pinning TX IRQ to VCPU other than pktgen: split virtqueue PPS
were increased 96% from 5.29 Mpps to 10.4 Mpps and 45% TX interrupts
were eliminated; Packed virtqueue PPS were increased 78% from 6.12
Mpps to 10.9 Mpps and 99% TX interrupts were eliminated.
2) single queue 1 vcpu guest + vhost-net/TAP on the host: single
session netperf from guest to host shows 82% improvement from
31Gb/s to 58Gb/s, %stddev were reduced from 34.5% to 1.9% and 88%
of TX interrupts were eliminated.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an implementation for DMTF DSP0283, which defines a MCTP-over-USB
transport. As per that spec, we're restricted to full speed mode,
requiring 512-byte transfers.
Each MCTP-over-USB interface is a peer-to-peer link to a single MCTP
endpoint, so no physical addressing is required (of course, that MCTP
endpoint may then bridge to further MCTP endpoints). Consequently,
interfaces will report with no lladdr data:
# mctp link
dev lo index 1 address 00:00:00:00:00:00 net 1 mtu 65536 up
dev mctpusb0 index 6 address none net 1 mtu 68 up
This is a simple initial implementation, with single rx & tx urbs, and
no multi-packet tx transfers - although we do accept multi-packet rx
from the device.
Includes suggested fixes from Santosh Puranik <spuranik@nvidia.com>.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Cc: Santosh Puranik <spuranik@nvidia.com>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-2-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>