Ido Schimmel says:
====================
ipv4: icmp: Fix source IP derivation in presence of VRFs
Align IPv4 with IPv6 and in the presence of VRFs generate ICMP error
messages with a source IP that is derived from the receiving interface
and not from its VRF master. This is especially important when the error
messages are "Time Exceeded" messages as it means that utilities like
traceroute will show an incorrect packet path.
Patches #1-#2 are preparations.
Patch #3 is the actual change.
Patches #4-#7 make small improvements in the existing traceroute test.
Patch #8 extends the traceroute test with VRF test cases for both IPv4
and IPv6.
Changes since v1 [1]:
* Rebase.
[1] https://lore.kernel.org/netdev/20250901083027.183468-1-idosch@nvidia.com/
====================
Link: https://patch.msgid.link/20250908073238.119240-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Create versions of the existing test cases where the routers generating
the ICMP error messages are using VRFs. Check that the source IPs of
these messages do not change in the presence of VRFs.
IPv6 always behaved correctly, but IPv4 fails when reverting "ipv4:
icmp: Fix source IP derivation in presence of VRFs".
Without IPv4 change:
# ./traceroute.sh
TEST: IPv6 traceroute [ OK ]
TEST: IPv6 traceroute with VRF [ OK ]
TEST: IPv4 traceroute [ OK ]
TEST: IPv4 traceroute with VRF [FAIL]
traceroute did not return 1.0.3.1
$ echo $?
1
The test fails because the ICMP error message is sent with the VRF
device's IP (1.0.4.1):
# traceroute -n -s 1.0.1.3 1.0.2.4
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.4.1 0.165 ms 0.110 ms 0.103 ms
2 1.0.2.4 0.098 ms 0.085 ms 0.078 ms
# traceroute -n -s 1.0.3.3 1.0.2.4
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.4.1 0.201 ms 0.138 ms 0.129 ms
2 1.0.2.4 0.123 ms 0.105 ms 0.098 ms
With IPv4 change:
# ./traceroute.sh
TEST: IPv6 traceroute [ OK ]
TEST: IPv6 traceroute with VRF [ OK ]
TEST: IPv4 traceroute [ OK ]
TEST: IPv4 traceroute with VRF [ OK ]
$ echo $?
0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-9-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When generating ICMP error messages, the kernel will prefer a source IP
that is on the same subnet as the destination IP (see
inet_select_addr()). Test this behavior by invoking traceroute with
different source IPs and checking that the ICMP error message is
generated with a source IP in the same subnet.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-8-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Both of the addresses are configured as primary addresses, but the
kernel is expected to choose 10.0.1.1/24 as the source IP of the ICMP
error message since it is on the same subnet as the destination IP of
the message (10.0.1.3/24). Reword the comment to reflect that.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-7-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Use require_command() so that the test will return SKIP (4) when a
required command is not present.
Before:
# ./traceroute.sh
SKIP: Could not run IPV6 test without traceroute6
SKIP: Could not run IPV4 test without traceroute
$ echo $?
0
After:
# ./traceroute.sh
TEST: traceroute6 not installed [SKIP]
$ echo $?
4
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-6-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test always returns success even if some tests were modified to
fail. Fix by converting the test to use the appropriate library
functions instead of using its own functions.
Before:
# ./traceroute.sh
TEST: IPV6 traceroute [FAIL]
TEST: IPV4 traceroute [ OK ]
Tests passed: 1
Tests failed: 1
$ echo $?
0
After:
# ./traceroute.sh
TEST: IPv6 traceroute [FAIL]
traceroute6 did not return 2000:102::2
TEST: IPv4 traceroute [ OK ]
$ echo $?
1
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-5-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the "icmp_errors_use_inbound_ifaddr" sysctl is enabled, the source
IP of ICMP error messages should be the "primary address of the
interface that received the packet that caused the icmp error".
The IPv4 ICMP code determines this interface using inet_iif() which in
the input path translates to skb->skb_iif. If the interface that
received the packet is a VRF port, skb->skb_iif will contain the ifindex
of the VRF device and not that of the receiving interface. This is
because in the input path the VRF driver overrides skb->skb_iif with the
ifindex of the VRF device itself (see vrf_ip_rcv()).
As such, the source IP that will be chosen for the ICMP error message is
either an address assigned to the VRF device itself (if present) or an
address assigned to some VRF port, not necessarily the input or output
interface.
This behavior is especially problematic when the error messages are
"Time Exceeded" messages as it means that utilities like traceroute will
show an incorrect packet path.
Solve this by determining the input interface based on the iif field in
the control block, if present. This field is set in the input path to
skb->skb_iif and is not later overridden by the VRF driver, unlike
skb->skb_iif.
This behavior is consistent with the IPv6 counterpart that already uses
the iif from the control block.
Reported-by: Andy Roulin <aroulin@nvidia.com>
Reported-by: Rajkumar Srinivasan <rajsrinivasa@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-4-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
__icmp_send() is used to generate ICMP error messages in response to
various situations such as MTU errors (i.e., "Fragmentation Required")
and too many hops (i.e., "Time Exceeded").
The skb that generated the error does not necessarily come from the IPv4
layer and does not always have a valid IPv4 control block in skb->cb.
Therefore, commit 9ef6b42ad6 ("net: Add __icmp_send helper.") changed
the function to take the IP options structure as argument instead of
deriving it from the skb's control block. Some callers of this function
such as icmp_send() pass the IP options structure from the skb's control
block as in these call paths the control block is known to be valid, but
other callers simply pass a zeroed structure.
A subsequent patch will need __icmp_send() to access more information
from the IPv4 control block (specifically, the ifindex of the input
interface). As a preparation for this change, change the function to
take the IPv4 control block structure as an argument instead of the IP
options structure. This makes the function similar to its IPv6
counterpart that already takes the IPv6 control block structure as an
argument.
No functional changes intended.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250908073238.119240-3-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When __ip_options_compile() is called with an skb, the IP options are
parsed from the skb data into the provided IP option argument. This is
in contrast to the case where the skb argument is NULL and the options
are parsed from opt->__data.
Given that cipso_v4_error() always passes an skb to
__ip_options_compile(), there is no need to allocate an extra 40 bytes
(maximum IP options size).
Therefore, simplify the function by removing these extra bytes and make
the function similar to ipv4_send_dest_unreach() which also calls both
__ip_options_compile() and __icmp_send().
This is a preparation for changing the arguments being passed to
__icmp_send().
No functional changes intended.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250908073238.119240-2-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski says:
====================
net: xdp: handle frags with unreadable memory
Make XDP helpers compatible with unreadable memory. This is very
similar to how we handle pfmemalloc frags today. Record the info
in xdp_buf flags as frags get added and then update the skb once
allocated.
This series adds the unreadable memory metadata tracking to drivers
using xdp_build_skb_from*() with no changes on the driver side - hence
the only driver changes here are refactoring. Obviously, unreadable memory
is incompatible with XDP today, but thanks to xdp_build_skb_from_buf()
increasing number of drivers have a unified datapath, whether XDP is
enabled or not.
RFC: https://lore.kernel.org/20250812161528.835855-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250905221539.2930285-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
xdp_update_skb_shared_info() needs to update skb state which
was maintained in xdp_buff / frame. Pass full flags into it,
instead of breaking it out bit by bit. We will need to add
a bit for unreadable frags (even tho XDP doesn't support
those the driver paths may be common), at which point almost
all call sites would become:
xdp_update_skb_shared_info(skb, num_frags,
sinfo->xdp_frags_size,
MY_PAGE_SIZE * num_frags,
xdp_buff_is_frag_pfmemalloc(xdp),
xdp_buff_is_frag_unreadable(xdp));
Keep a helper for accessing the flags, in case we need to
transform them somehow in the future (e.g. to cover up xdp_buff
vs xdp_frame differences).
While we are touching call callers - rename the helper to
xdp_update_skb_frags_info(), previous name may have implied that
it's shinfo that's updated. We are updating flags in struct sk_buff
based on frags that got attched.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20250905221539.2930285-2-kuba@kernel.org
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There are currently no kernel tests that verify setting and getting
options of the team driver.
In the future, options may be added that implicitly change other
options, which will make it useful to have tests like these that show
nothing breaks. There will be a follow up patch to this that adds new
"rx_enabled" and "tx_enabled" options, which will implicitly affect the
"enabled" option value and vice versa.
The tests use teamnl to first set options to specific values and then
gets them to compare to the set values.
Signed-off-by: Marc Harvey <marcharvey@google.com>
Link: https://patch.msgid.link/20250905040441.2679296-1-marcharvey@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Matthieu Baerts says:
====================
tools: ynl: fix errors reported by Ruff
When looking at the YNL code to add a new feature, my text editor
automatically executed 'ruff check', and found out at least one
interesting error: one variable was used while not being defined.
I then decided to fix this error, and all the other ones reported by
Ruff. After this series, 'ruff check' reports no more errors with
version 0.12.12.
====================
Link: https://patch.msgid.link/20250909-net-next-ynl-ruff-v1-0-238c2bccdd99@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 88e1510 PHY has an erratum where the phy downshift counter is not
cleared after phy being suspended(BMCR_PDOWN set) and then later
resumed(BMCR_PDOWN cleared). This can cause the gigabit link to
intermittently downshift to a lower speed.
Disabling and re-enabling the downshift feature clears the counter,
allowing the PHY to retry gigabit link negotiation up to the programmed
retry count times before downshifting. This behavior has been observed
on copper links.
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250906-marvell_fix-v2-1-f6efb286937f@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Fang says:
====================
ptp: add pulse signal loopback support for debugging
Some PTP devices support looping back the periodic pulse signal for
debugging. For example, the PTP device of QorIQ platform and the NETC v4
Timer has the ability to loop back the pulse signal and record the extts
events for the loopback signal. So we can make sure that the pulse
intervals and their phase alignment are correct from the perspective of
the emitting PHC's time base. In addition, we can use this loopback
feature as a built-in extts event generator when we have no external
equipment which does that. Therefore, add the generic debugfs interfaces
to the ptp_clock driver. The first two patch are separated from the
previous patch set [1]. The third patch is new added.
[1]: https://lore.kernel.org/imx/20250827063332.1217664-1-wei.fang@nxp.com/ #patch 3 and 9
====================
Link: https://patch.msgid.link/20250905030711.1509648-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NETC Timer supports looping back the output pulse signal of Fiper-n
into Trigger-n input, so that users can leverage this feature to validate
some other features without external hardware support. For example, users
can use it to test external trigger stamp (EXTTS). And users can combine
EXTTS with loopback mode to check whether the generation time of PPS is
aligned with an integral second of PHC, or the periodic output signal
(PTP_CLK_REQ_PEROUT) whether is generated at the specified time.
Since ptp_clock_info::perout_loopback() has been added to the ptp_clock
driver as a generic interface to enable or disable the periodic output
signal loopback, therefore, netc_timer_perout_loopback() is added as a
callback of ptp_clock_info::perout_loopback().
Test the generation time of PPS event:
$ echo 0 1 > /sys/kernel/debug/ptp0/perout_loopback
$ echo 1 > /sys/class/ptp/ptp0/pps_enable
$ testptp -d /dev/ptp0 -e 3
external time stamp request okay
event index 0 at 63.000000017
event index 0 at 64.000000017
event index 0 at 65.000000017
Test the generation time of the periodic output signal:
$ echo 0 1 > /sys/kernel/debug/ptp0/perout_loopback
$ echo 0 150 0 1 500000000 > /sys/class/ptp/ptp0/period
$ testptp -d /dev/ptp0 -e 3
external time stamp request okay
event index 0 at 150.000000014
event index 0 at 151.500000015
event index 0 at 153.000000014
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250905030711.1509648-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For some PTP devices, they have the capability to loop back the periodic
output signal for debugging, such as the ptp_qoriq device. So add the
generic interfaces to set the periodic output signal loopback, rather
than each vendor having a different implementation.
Show how many channels support the periodic output signal loopback:
$ cat /sys/kernel/debug/ptp<N>/n_perout_loopback
Enable the loopback of the periodic output signal of channel X:
$ echo <X> 1 > /sys/kernel/debug/ptp<N>/perout_loopback
Disable the loopback of the periodic output signal of channel X:
$ echo <X> 0 > /sys/kernel/debug/ptp<N>/perout_loopback
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250905030711.1509648-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This ethtool counter is meant to help with observing how many times the
congestion event was triggered but on query there was no state change.
This would help to indicate when a work item was scheduled to run too
late and in the meantime the congestion state changed back to previous
state.
While at it, do a driveby typo fix in documentation for
pci_bw_inbound_high.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1757237976-531416-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
devlink, mlx5: Add new parameters for link management and SRIOV/eSwitch configurations [part]
This patch series introduces several devlink parameters improving device
configuration capabilities, link management, and SRIOV/eSwitch, by adding
NV config boot time parameters.
Implement the following parameters:
a) total_vfs Parameter:
-----------------------
Adds support for managing the number of VFs (total_vfs) and enabling
SR-IOV (enable_sriov for mlx5) through devlink. These additions enhance
user control over virtualization features directly from standard kernel
interfaces without relying on additional external tools. total_vfs
functionality is critical for environments that require flexible num VF
configuration.
b) CQE Compression Type:
------------------------
Introduces a new devlink parameter, cqe_compress_type, to configure the
rate of CQE compression based on PCIe bus conditions. This setting
provides a balance between compression efficiency and overall NIC
performance under different traffic loads.
====================
Link: https://patch.msgid.link/20250907012953.301746-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some devices support both symmetric (same value for all PFs) and
asymmetric, while others only support symmetric configuration. This
implementation prefers asymmetric, since it is closer to the devlink
model (per function settings), but falls back to symmetric when needed.
Example usage:
devlink dev param set pci/0000:01:00.0 name total_vfs value <u16> cmode permanent
devlink dev reload pci/0000:01:00.0 action fw_activate
echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove
echo 1 >/sys/bus/pci/rescan
cat /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selects which algorithm should be used by the NIC in order to decide rate of
CQE compression dependeng on PCIe bus conditions.
Supported values:
1) balanced, merges fewer CQEs, resulting in a moderate compression ratio
but maintaining a balance between bandwidth savings and performance
2) aggressive, merges more CQEs into a single entry, achieving a higher
compression rate and maximizing performance, particularly under high
traffic loads.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NICs are typically configured with total_vfs=0, forcing users to rely
on external tools to enable SR-IOV (a widely used and essential feature).
Add total_vfs parameter to devlink for SR-IOV max VF configurability.
Enables standard kernel tools to manage SR-IOV, addressing the need for
flexible VF configuration.
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: make ADD_ADDR retransmission timeout adaptive
Currently, the MPTCP ADD_ADDR notifications are retransmitted after a
fixed timeout controlled by the net.mptcp.add_addr_timeout sysctl knob,
if the corresponding "echo" packets are not received before. This can be
too slow (or too quick), especially with a too cautious default value
set to 2 minutes.
- Patch 1: make ADD_ADDR retransmission timeout adaptive, using the
TCP's retransmission timeout. The corresponding sysctl knob is now
used as a maximum value.
- Patch 2: now that these ADD_ADDR retransmissions can happen faster,
all MPTCP Join subtests checking ADD_ADDR counters accept more
ADD_ADDR than expected (if any). This is aligned with the previous
behaviour, when the ADD_ADDR RTO was lowered down to 1 second.
- Patch 3: Some CIs have reported that some MPTCP Join signalling tests
were unstable. It seems that it is due to the time it can take in slow
environments to send a bunch of ADD_ADDR notifications and wait each
time for their echo reply. Use a longer transfer to avoid such errors.
v1: https://lore.kernel.org/d5397026-92eb-4a43-9534-954b43ab9305@kernel.org
====================
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-0-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.
To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received, except
if no ADD_ADDR are expected.
Before the parent commit, the tolerance was present for each tests where
the ADD_ADDR could be retransmitted in a reasonable time (1 sec). Now
that all tests can have retransmitted ADD_ADDR, it is normal to apply
the same tolerance for all tests.
An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to many tests, it seems better to increase the tolerance.
Later, we could add a new MIB counter to identify the ADD_ADDR
retransmissions, and remove the tolerance when this counter is
available.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-2-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the ADD_ADDR option is retransmitted with a fixed timeout. This
patch makes the retransmission timeout adaptive by using the maximum RTO
among all the subflows, while still capping it at the configured maximum
value (add_addr_timeout_max). This improves responsiveness when
establishing new subflows.
Specifically:
1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive
timeout.
2. Uses maximum subflow RTO (icsk_rto) when available.
3. Applies exponential backoff based on retransmission count.
4. Maintains fallback to configured max timeout when no RTO data exists.
This slightly changes the behaviour of the MPTCP "add_addr_timeout"
sysctl knob to be used as a maximum instead of a fixed value. But this
is seen as an improvement: the ADD_ADDR might be sent quicker than
before to improve the overall MPTCP connection. Also, the default
value is set to 2 min, which was already way too long, and caused the
ADD_ADDR not to be retransmitted for connections shorter than 2 minutes.
Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576
Reviewed-by: Christoph Paasch <cpaasch@openai.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-1-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
idpf: add XDP support
Alexander Lobakin says:
Add XDP support (w/o XSk for now) to the idpf driver using the libeth_xdp
sublib. All possible verdicts, .ndo_xdp_xmit(), multi-buffer etc. are here.
In general, nothing outstanding comparing to ice, except performance --
let's say, up to 2x for .ndo_xdp_xmit() on certain platforms and
scenarios.
idpf doesn't support VLAN Rx offload, so only the hash hint is
available for now.
Patches 1-7 are prereqs, without which XDP would either not work at all
or work slower/worse/...
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: add XDP RSS hash hint
idpf: add support for .ndo_xdp_xmit()
idpf: add support for XDP on Rx
idpf: use generic functions to build xdp_buff and skb
idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq
idpf: prepare structures to support XDP
idpf: add support for nointerrupt queues
idpf: remove SW marker handling from NAPI
idpf: add 4-byte completion descriptor definition
idpf: link NAPIs to queues
idpf: use a saner limit for default number of queues to allocate
idpf: fix Rx descriptor ready check barrier in splitq
xdp, libeth: make the xdp_init_buff() micro-optimization generic
====================
Link: https://patch.msgid.link/20250908195748.1707057-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
first_remote_rcu() can return NULL if the FDB entry points to an FDB
nexthop group instead of a remote destination. However, unlike other
users of first_remote_rcu(), NPD cannot currently happen in
vxlan_fdb_find_uc() as it is only invoked by one driver which vetoes the
creation of FDB nexthops.
Make the function more robust by making sure the remote destination is
only dereferenced if it is not NULL.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250908075141.125087-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use two separate static variables instead of the struct, this allows
to simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>