In the SRIOV LAG Active-Active, the functions ice_lag_cfg_pf_fltr's
and ice_lag_config_eswitch's content are moved to earlier locations
in the source file. Also, ice_lag_cfg_pf_fltr is renamed, and its
flow is changed.
To reduce the delta in the larger patch, move the original functions
to their new location so that only functional changes are needed in
the larger patch.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation for the new LAG functionality implementation, there are
a couple of existing LAG elements of the capabilities struct that should
be bool instead of u8. Since we are adding a new element to this struct
that should also be a bool, fix the existing LAG u8 in this patch and
eliminate !! operators where possible.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a simple test for checking that RSS on flow label works,
and that its rejected for IPv4 flows.
# ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py
TAP version 13
1..2
ok 1 rss_flow_label.test_rss_flow_label
ok 2 rss_flow_label.test_rss_flow_label_6only
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250811234212.580748-5-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Support IPv6 Flow Label hashing. Use both inner and outer IPv6
header's Flow Label if both headers are detected. Flow Label
is unlike normal header fields, by enabling it user accepts
the unstable hash and possible reordering. Because of that
I think it's reasonable to hash over all Flow Labels we can
find, even tho we don't hash over all L3 addresses.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250811234212.580748-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Some modern NICs support including the IPv6 Flow Label in
the flow hash for RSS queue selection. This is outside
the old "Microsoft spec", but was included in the OCP NIC spec:
[ ] RSS include flow label in the hash (configurable)
https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling
RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
to recover from receiver congestion / overload.
Rx CPU/queue hotspots are relatively common for data ingest
workloads, and so far we had to try to detect the condition
at the RPC layer and reopen the connection. PLB lets us change
the Flow Label and therefore Rx CPU on RTO, with minimal packet
reordering. PLB reaction times are much faster, and can happen
at any point in the connection, not just at RPC boundaries.
Due to the nature of host processing (relatively long queues,
other kernel subsystems masking IRQs for 100s of msecs)
the risk of reordering within the host is higher than in
the network. But for applications which need it - it is far
preferable to potentially persistent overload of subset of
queues.
It is expected that the hash communicated to the host
may change if the Flow Label changes. This may be surprising
to some host software, but I don't expect the devices
can compute two Toeplitz hashes, one with the Flow Label
for queue selection and one without for the rx hash
communicated to the host. Besides, changing the hash
may potentially help to change the path thru host queues.
User can disable NETIF_F_RXHASH if they require a stable
flow hash.
The name RXH_IP6_FL was chosen based on what we call
Flow Label variables in IPv6 processing (fl). I prefer
fl_lbl but that appears to be an fbnic-only spelling.
We could spell out RXH_IP6_FLOW_LABEL but existing
RXH_ defines are a lot more terse.
Willem notes [1] that Flow Label is defined as identifying the flow
and therefore including both the flow label _and_ the L4 header
fields is not generally necessary. But it should not hurt so
it's not explicitly prevented if the driver supports hashing
on both at the same time.
Link: https://lore.kernel.org/68483433b45e2_3cd66f29440@willemb.c.googlers.com.notmuch [1]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250811234212.580748-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Thomas Weißschuh says:
====================
net: Don't use %pK through printk or tracepoints
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
====================
Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-0-2e2fdc7d3f2c@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through tracepoints. They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-2-2e2fdc7d3f2c@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250811-restricted-pointers-net-v5-1-2e2fdc7d3f2c@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When `devm_add_action_or_reset()` fails, it is due to a failed memory
allocation and will thus return `-ENOMEM`. `dev_err_probe()` doesn't do
anything when error is `-ENOMEM`. Therefore, remove the useless call to
`dev_err_probe()` when `devm_add_action_or_reset()` fails, and just
return the value instead.
Signed-off-by: Waqar Hameed <waqar.hameed@axis.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/pnd1ppghh4p.a.out@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace the strcpy() calls that copy the device name into ifr->ifr_name
with strscpy() to avoid potential overflows and guarantee NULL termination.
Destination is ifr->ifr_name (size IFNAMSIZ).
Tested in QEMU (BusyBox rootfs):
- Created TUN devices via TUNSETIFF helper
- Set addresses and brought links up
- Verified long interface names are safely truncated (IFNAMSIZ-1)
Signed-off-by: Miguel García <miguelgarciaroman8@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250812082244.60240-1-miguelgarciaroman8@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test various aspects of FDB activity notification control:
* Transitioning of an FDB entry from inactive to active state.
* Transitioning of an FDB entry from active to inactive state.
* Avoiding the resetting of an FDB entry's last activity time (i.e.,
"updated" time) using the "norefresh" keyword.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250812071810.312346-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previous commit 230b183921 ("net: Use standard structures for generic
socket address structures.") use 'struct sockaddr_storage address;'
to replace 'char address[MAX_SOCK_ADDR];'.
The macro MAX_SOCK_ADDR is removed by commit 01893c82b4 ("net: Remove
MAX_SOCK_ADDR constant").
The comment in vsock_getname() is outdated, use sizeof(struct
sockaddr_storage) instead of magic value 128.
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20250812015929.1419896-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the MAC controller does not connect to any PHY interface, there is a
missing clock, then the DMA reset fails.
For this case, the DMA_BUS_MODE_SFT_RESET bit is 1 before software reset,
just print an error message which gives a hint the PHY clock is missing,
and then return -EINVAL immediately to avoid waiting for the timeout when
the DMA reset fails in loongson_dwmac_fix_reset().
With this patch, for the normal end user, the computer start faster with
reducing boot time for 2 seconds on the specified mainboard.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://patch.msgid.link/20250811073506.27513-4-yangtiezhu@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi says:
====================
net: airoha: Introduce NPU callbacks for wlan offloading
Similar to wired traffic, EN7581 SoC allows to offload traffic to/from
the MT76 wireless NIC configuring the NPU module via the Netfilter
flowtable. This series introduces the necessary NPU callback used by
the MT7996 driver in order to enable the offloading.
MT76 support has been posted as RFC in [0] in order to show how the
APIs are consumed.
====================
Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-0-58823603bb4e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Read all NPU wlan IRQ lines from the NPU device-tree node.
NPU module fires wlan irq lines when the traffic to/from the WiFi NIC is
not hw accelerated (these interrupts will be consumed by the MT76 driver
in subsequent patches).
This is a preliminary patch to enable wlan flowtable offload for EN7581
SoC.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-5-58823603bb4e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Document memory regions used by Airoha EN7581 NPU for wlan traffic
offloading. The brand new added memory regions do not introduce any
backward compatibility issues since they will be used just to offload
traffic to/from the MT76 wireless NIC and the MT76 probing will not fail
if these memory regions are not provide, it will just disable offloading
via the NPU module.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250811-airoha-en7581-wlan-offlaod-v7-1-58823603bb4e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
selftests: drv-net: improve zerocopy tests
A few tweaks to the devmem test to make it more "NIPA-compatible".
We still need a fix to make sure that the test sets hds threshold
to 0. Taehee is presumably already/still working on that:
https://lore.kernel.org/20250702104249.1665034-1-ap420073@gmail.com
so I'm not including my version.
# ./tools/testing/selftests/drivers/net/hw/devmem.py
TAP version 13
1..3
ok 1 devmem.check_rx
ok 2 devmem.check_tx
ok 3 devmem.check_tx_chunks
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
====================
Link: https://patch.msgid.link/20250811231334.561137-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Device Under Test should always be the local system.
While the Rx test gets this right the Tx test is sending
from remote to local. So Tx of DMABUF memory happens on remote.
These tests never run in NIPA since we don't have a compatible
device so we haven't caught this.
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a number of:
with bkg("socat ..LISTEN..", exit_wait=True)
uses in the tests. If whatever is supposed to send the traffic
fails we will get stuck in the bkg(). Try to kill the process
in case of exception, to avoid the long wait.
A specific example where this happens is the devmem Tx tests.
Reviewed-by: Joe Damato <joe@dama.to>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The general expectations for network HW selftests is that they
will be run as root. sudo doesn't seem to work on NIPA VMs.
While it's probably something solvable in the setup I think we should
remove the sudos. devmem is the only networking test using sudo.
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: stmmac: improbe suspend/resume architecture
This series improves the stmmac suspend/resume architecture by
providing a couple of method hooks in struct plat_stmmacenet_data which
are called by core code, and thus are available for any of the
platform glue drivers, whether using a platform or PCI device.
As these methods are called by core code, we can also provide a simple
PM ops structure also in the core code for converted glue drivers to
use.
The remainder of the patches convert the various drivers.
====================
Link: https://patch.msgid.link/aJo7kvoub5voHOUQ@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert mediatek to use the resume() platform method rather than the
init() platform method as mediatek_dwmac_init() is only called from
the resume paths.
This will ensure that in a future commit, mediatek_dwmac_init() won't
be called when probing the main part of the stmmac driver.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1ulXcC-008grN-Hc@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert rk to use the new suspend() and resume() methods rather than
implementing these in custom wrappers around the main driver's
suspend/resume methods. This allows this driver to use the simmac
simple PM ops structure.
We can further simplify the driver as there is no need to track whether
the device was suspended, we only need to check whether the device is
wakeup capable in the resume method. This is because the resume method
will only be called after the suspend method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1ulXc2-008grB-9k@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add suspend/resume platform operations, which, when populated, override
the init/exit platform operations when we suspend and resume. These
suspend()/resume() methods are called by core code, and thus are
designed to support any struct device, not just platform devices. This
allows them to be used by the PCI drivers we have.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1ulXbX-008gqZ-Bb@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima says:
====================
selftest: af_unix: Enable -Wall and -Wflex-array-member-not-at-end.
This series fix 4 warnings caught by -Wall and
-Wflex-array-member-not-at-end.
====================
Link: https://patch.msgid.link/20250811215432.3379570-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>