Commit Graph

1383034 Commits

Author SHA1 Message Date
Gatien Chevallier
adbe2cfd8a drivers: net: stmmac: handle start time set in the past for flexible PPS
In case the time arguments used for flexible PPS signal generation are in
the past, consider the arguments to be a time offset relative to the MAC
system time.

This way, past time use case is handled and it avoids the tedious work
of passing an absolute time value for the flexible PPS signal generation
while not breaking existing scripts that may rely on this behavior.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://patch.msgid.link/20250901-relative_flex_pps-v4-2-b874971dfe85@foss.st.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 16:51:08 -07:00
Gatien Chevallier
96c88268b7 time: export timespec64_add_safe() symbol
Export the timespec64_add_safe() symbol so that this function can be used
in modules where computation of time related is done.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250901-relative_flex_pps-v4-1-b874971dfe85@foss.st.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 16:51:08 -07:00
Joy Zou
59aec9138f net: stmmac: imx: add i.MX91 support
Add i.MX91 specific settings for EQoS.

Reviewed-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Joy Zou <joy.zou@nxp.com>
Link: https://patch.msgid.link/20250901103632.3409896-7-joy.zou@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 16:36:47 -07:00
Jakub Kicinski
24ee9feeb3 Merge tag 'nf-next-25-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:

====================
netfilter: updates for net-next

1) prefer vmalloc_array in ebtables, from  Qianfeng Rong.

2) Use csum_replace4 instead of open-coding it, from Christophe Leroy.

3+4) Get rid of GFP_ATOMIC in transaction object allocations, those
     cause silly failures with large sets under memory pressure, from
     myself.

5) Remove test for AVX cpu feature in nftables pipapo set type,
   testing for AVX2 feature is sufficient.

6) Unexport a few function in nf_reject infra: no external callers.

7) Extend payload offset to u16, this was restricted to values <=255
   so far, from Fernando Fernandez Mancera.

* tag 'nf-next-25-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nft_payload: extend offset to 65535 bytes
  netfilter: nf_reject: remove unneeded exports
  netfilter: nft_set_pipapo: remove redundant test for avx feature bit
  netfilter: nf_tables: all transaction allocations can now sleep
  netfilter: nf_tables: allow iter callbacks to sleep
  netfilter: nft_payload: Use csum_replace4() instead of opencoding
  netfilter: ebtables: Use vmalloc_array() to improve code
====================

Link: https://patch.msgid.link/20250902133549.15945-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 16:06:45 -07:00
Onur Özkan
a7ddedc84c rust: phy: use to_result for error handling
Simplifies error handling by replacing the manual check
of the return value with the `to_result` helper.

Signed-off-by: Onur Özkan <work@onurozkan.dev>
Reviewed-by: Elle Rhumsaa <elle@weathered-steel.dev>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Link: https://patch.msgid.link/20250821091235.800-1-work@onurozkan.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:35:42 -07:00
Krzysztof Kozlowski
69cd993507 dt-bindings: net: renesas,rzn1-gmac: Constrain interrupts
Renesas RZN1 GMAC uses three interrupts in in-kernel DTS and common
snps,dwmac.yaml binding is flexible, so define precise constraint for
this device.

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250902154051.263156-4-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:19:32 -07:00
Krzysztof Kozlowski
f672fcd8e6 dt-bindings: net: altr,socfpga-stmmac: Constrain interrupts
STMMAC on SoCFPGA uses exactly one interrupt in in-kernel DTS and common
snps,dwmac.yaml binding is flexible, so define precise constraint for
this device.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Matthew Gerlach <matthew.gerlach@altera.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250902154051.263156-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:19:32 -07:00
Jakub Kicinski
f38b9334bb Merge branch 'tools-ynl-gen-misc-changes'
Asbjørn Sloth Tønnesen says:

====================
tools: ynl-gen: misc changes

Misc changes to ahead of wireguard ynl conversion, as announced in v1.

v1: https://lore.kernel.org/20250901145034.525518-1-ast@fiberby.net
====================

Link: https://patch.msgid.link/20250902154640.759815-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:16:54 -07:00
Asbjørn Sloth Tønnesen
017bda80fd genetlink: fix typo in comment
In this context "not that ..." should properly be "note that ...".

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250902154640.759815-4-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:16:49 -07:00
Asbjørn Sloth Tønnesen
5fece05445 tools: ynl-gen: use macro for binary min-len check
This patch changes the generated min-len check for binary
attributes to use the NLA_POLICY_MIN_LEN() macro, thereby the
generated code supports strict policy validation.

With this change TypeBinary will always generate a NLA_BINARY
attribute policy.

This doesn't change any currently generated code, as it isn't
used in any specs currently used for generating code.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250902154640.759815-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:16:49 -07:00
Asbjørn Sloth Tønnesen
9f9581ba74 netlink: specs: fou: change local-v6/peer-v6 check
While updating the binary min-len implementation, I noticed that
the only user, should AFAICT be using exact-len instead.

In net/ipv4/fou_core.c FOU_ATTR_LOCAL_V6 and FOU_ATTR_PEER_V6
are only used for singular IPv6 addresses, and there are AFAICT
no known implementations trying to send more, it therefore
appears safe to change it to an exact-len policy.

This patch therefore changes the local-v6/peer-v6 attributes to
use an exact-len check, instead of a min-len check.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250902154640.759815-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:16:49 -07:00
Jakub Kicinski
a229866f7d Merge branch 'mptcp-misc-features-for-v6-18'
Matthieu Baerts says:

====================
mptcp: misc. features for v6.18

This series contains 4 independent new features:

- Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.

- Patch 2: selftests: check for unexpected fallback counter increments.

- Patches 3-4: record subflows in RPS table, for aRFS support.

v1: https://lore.kernel.org/20250901-net-next-mptcp-misc-feat-6-18-v1-0-80ae80d2b903@kernel.org
====================

Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-0-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:08:22 -07:00
Christoph Paasch
3bd4f98a4e mptcp: record subflows in RPS table
Accelerated Receive Flow Steering (aRFS) relies on sockets recording
their RX flow hash into the rps_sock_flow_table so that incoming packets
are steered to the CPU where the application runs.

With MPTCP, the application interacts with the parent MPTCP socket while
data is carried over per-subflow TCP sockets. Without recording these
subflows, aRFS cannot steer interrupts and RX processing for the flows
to the desired CPU.

Record all subflows in the RPS table by calling sock_rps_record_flow()
for each subflow at the start of mptcp_sendmsg(), mptcp_recvmsg() and
mptcp_stream_accept(), by using the new helper
mptcp_rps_record_subflows().

It does not by itself improve throughput, but ensures that IRQ and RX
processing are directed to the right CPU, which is a
prerequisite for effective aRFS.

Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-4-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:08:20 -07:00
Christoph Paasch
929324913e net: Add rfs_needed() helper
Add a helper to check if RFS is needed or not. Allows to make the code a
bit cleaner and the next patch to have MPTCP use this helper to decide
whether or not to iterate over the subflows.

tun_flow_update() was calling sock_rps_record_flow_hash() regardless of
the state of rfs_needed. This was not really a bug as sock_flow_table
simply ends up being NULL and thus everything will be fine.
This commit here thus also implicitly makes tun_flow_update() respect
the state of rfs_needed.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-3-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:08:20 -07:00
Gang Yan
3fff72f827 selftests: mptcp: add checks for fallback counters
Recently, some mib counters about fallback has been added, this patch
provides a method to check the expected behavior of these mib counters
during the test execution.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/571
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-2-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:08:20 -07:00
Eric Biggers
2d5be5629c mptcp: use HMAC-SHA256 library instead of open-coded HMAC
Now that there are easy-to-use HMAC-SHA256 library functions, use these
in net/mptcp/crypto.c instead of open-coding the HMAC algorithm.

Remove the WARN_ON_ONCE() for messages longer than SHA256_DIGEST_SIZE.
The new implementation handles all message lengths correctly.

The mptcp-crypto KUnit test still passes after this change.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-1-fa02bb3188b1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 15:08:20 -07:00
Jakub Kicinski
0e2a5208cc Merge tag 'mlx5-psp-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5 PSP IFC bits

This PR has a single patch to add mlx5_ifc PSP related capabilities structures
and HW definitions needed for PSP support in mlx5.

Link: https://lore.kernel.org/20250828162953.2707727-1-daniel.zahka@gmail.com

* tag 'mlx5-psp-ifc' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add PSP capabilities structures and bits
====================

Link: https://patch.msgid.link/20250903063050.668442-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03 14:59:41 -07:00
Saeed Mahameed
04a3134f88 net/mlx5: Add PSP capabilities structures and bits
Add mlx5_ifc PSP related capabilities structures and HW definitions
needed for PSP support in mlx5.

Link: https://lore.kernel.org/netdev/20250828162953.2707727-1-daniel.zahka@gmail.com/
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2025-09-02 23:08:13 -07:00
Jakub Kicinski
1d8f005909 Merge branch 'net-dsa-lantiq_gswip-prepare-for-supporting-maxlinear-gsw1xx'
Daniel Golle says:

====================
net: dsa: lantiq_gswip: prepare for supporting MaxLinear GSW1xx

Continue to prepare for supporting the newer standalone MaxLinear GSW1xx
switch family by extending the existing lantiq_gswip driver to allow it
to support MII interfaces and MDIO bus of the GSW1xx.

This series has been preceded by an RFC series which covers everything
needed to support the MaxLinear GSW1xx family of switches. Andrew Lunn
had suggested to split it into a couple of smaller series and start
with the changes which don't yet make actual functional changes or
support new features.

Everything has been compile and runtime tested on AVM Fritz!Box 7490
(GSWIP version 2.1, VR9 v1.2)

Link: https://lore.kernel.org/netdev/aKDhFCNwjDDwRKsI@pidgin.makrotopia.org/
====================

Link: https://patch.msgid.link/cover.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:45:45 -07:00
Daniel Golle
0dc602a3c7 net: dsa: lantiq_gswip: move MDIO bus registration to .setup()
Instead of registering the switch MDIO bus in the probe() function, move
the call to gswip_mdio() into the .setup() DSA switch op, so it can be
reused independently of the probe() function.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/2650602042c0bfdc5664b88d59071ed4dca96c26.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:45:43 -07:00
Daniel Golle
720412c4ae net: dsa: lantiq_gswip: support standard MDIO node name
Instead of matching against the child node's compatible string also
support locating the node of the device tree node of the MDIO bus
in the standard way by referencing the node name ("mdio").

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/5a9a3d659ef0d8b7eca37fb69ec87ff5a3192820.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:45:43 -07:00
Daniel Golle
5157820326 net: dsa: lantiq_gswip: support offset of MII registers
The MaxLinear GSW1xx family got a single (R)(G)MII port at index 5 but
the registers MII_PCDU and MII_CFG are those of port 0.
Allow applying an offset for the port index to access those registers.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/88145164c1f948e4ae9b04706f408359cf54223c.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:45:42 -07:00
Daniel Golle
17420a7fe5 net: dsa: lantiq_gswip: ignore SerDes modes in phylink_mac_config()
We can safely ignore SerDes interface modes 1000Base-X, 2500Base-X and
SGMII in phylink_mac_config() as they are being taken care of by the PCS
and the SGMII port anyway doesn't have MII_CFG and MII_PCDU registers
and hence gswip_phylink_mac_config() is already a no-op apart from
outputing a misleading error message.

Return early in case of SerDes interface modes to avoid printing that
error message.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/dcb066d6a02e6340314b5ff4f73937757a4f8eb3.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:45:42 -07:00
Daniel Golle
7a1eaef0a7 net: dsa: lantiq_gswip: support model-specific mac_select_pcs()
Call mac_select_pcs() function if provided in struct gswip_hwinfo.
The MaxLinear GSW1xx series got one port wired to a SerDes PCS and
PHY which can do 1000Base-X, 2500Base-X and SGMII. Support for the
SerDes port will be provided using phylink_pcs, so provide a
convenient way for mac_select_pcs() to differ based on the hardware
model.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/7668666aa51e43e7f2a6cbcf36eb5a0a3020998f.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:45:42 -07:00
Daniel Golle
cb477c3051 net: dsa: lantiq_gswip: move to dedicated folder
Move the lantiq_gswip driver to its own folder and update
MAINTAINERS file accordingly.
This is done ahead of extending the driver to support the MaxLinear
GSW1xx series of standalone switch ICs, which includes adding a bunch
of files.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://patch.msgid.link/a5923dee9a174501b284dc473bdec9dd89c68de1.1756520811.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:45:42 -07:00
Yue Haibing
3d95261eeb ipv6: Add sanity checks on ipv6_devconf.rpl_seg_enabled
In ipv6_rpl_srh_rcv() we use min(net->ipv6.devconf_all->rpl_seg_enabled,
idev->cnf.rpl_seg_enabled) is intended to return 0 when either value is
zero, but if one of the values is negative it will in fact return non-zero.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250901123726.1972881-3-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 17:01:14 -07:00
Zongmin Zhou
b0bc645122 selftests: net: avoid memory leak
The buffer be used without free,fix it to avoid memory leak.

Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901054557.32811-1-min_halo@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 16:55:00 -07:00
Chandra Mohan Sundar
3586018d5c net: macb: Validate the value of base_time properly
In macb_taprio_setup_replace(), the value of start_time is being
compared against zero which would never be true since start_time
is an unsigned value. Due to this there is a chance that an
incorrect config base time value can be used for computation.

Fix by checking the value of conf->base_time directly.
This issue was reported by static coverity analyzer.

Fixes: 89934dbf16 ("net: macb: Add TAPRIO traffic scheduling support")
Signed-off-by: Chandra Mohan Sundar <chandramohan.explore@gmail.com>
Reviewed-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com>
Link: https://patch.msgid.link/20250901162923.627765-1-chandramohan.explore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 16:39:29 -07:00
Jakub Kicinski
e2cf2d5baa selftests: drv-net: rss_ctx: make the test pass with few queues
rss_ctx.test_rss_key_indir implicitly expects at least 5 queues,
as it checks that the traffic on first 2 queues is lower than
the remaining queues when we use all queues. Special case fewer
queues.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901173139.881070-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 16:38:26 -07:00
Jakub Kicinski
4022f92a2e selftests: drv-net: rss_ctx: use Netlink for timed reconfig
The rss_ctx test has gotten pretty flaky after I increased
the queue count in NIPA 2->3. Not 100% clear why. We get
a lot of failures in the rss_ctx.test_hitless_key_update case.

Looking closer it appears that the failures are mostly due
to startup costs. I measured the following timing for ethtool -X:
 - python cmd(shell=True)  : 150-250msec
 - python cmd(shell=False) :  50- 70msec
 - timed in bash           :  45- 55msec
 - YNL Netlink call        :   2-  4msec
 - .set_rxfh callback      :   1-  2msec

The target in the test was set to 200msec. We were mostly measuring
ethtool startup cost it seems. Switch to YNL since it's 100x faster.

Lower the pass criteria to 150msec, no real science behind this number
but we removed some overhead, drivers which previously passed 200msec
should easily pass 150msec now.

Separately we should probably follow up on defaulting to shell=False,
when script doesn't explicitly ask for True, because the overhead
is rather significant.

Switch from _rss_key_rand() to random.randbytes(), YNL takes a binary
array rather than array of ints.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901173139.881070-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 16:38:26 -07:00
Jakub Kicinski
bc1a767f69 selftests: net: py: don't default to shell=True
Overhead of using shell=True is quite significant.
Micro-benchmark of running ethtool --help shows that
non-shell run is 2x faster.

Runtime of the XDP tests also shows improvement:
this patch: 2m34s 2m21s 2m18s 2m18s
    before:     2m54s 2m36s 2m34s

Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250830184317.696121-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 16:09:36 -07:00
Jakub Kicinski
c2e5108649 selftests: drv-net: adjust tests before defaulting to shell=False
Clean up tests which expect shell=True without explicitly passing
that param to cmd(). There seems to be only one such case, and
in fact it's better converted to a direct write.

Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250830184317.696121-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 16:09:36 -07:00
Eric Dumazet
5d14bbf9d1 net_sched: act: remove tcfa_qstats
tcfa_qstats is currently only used to hold drops and overlimits counters.

tcf_action_inc_drop_qstats() and tcf_action_inc_overlimit_qstats()
currently acquire a->tcfa_lock to increment these counters.

Switch to two atomic_t to get lock-free accounting.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250901093141.2093176-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 15:52:24 -07:00
Eric Dumazet
3016024d75 net_sched: add back BH safety to tcf_lock
Jamal reported that we had to use BH safety after all,
because stats can be updated from BH handler.

Fixes: 3133d5c15c ("net_sched: remove BH blocking in eight actions")
Fixes: 53df77e785 ("net_sched: act_skbmod: use RCU in tcf_skbmod_dump()")
Fixes: e97ae74297 ("net_sched: act_tunnel_key: use RCU in tunnel_key_dump()")
Fixes: 48b5e5dbdb ("net_sched: act_vlan: use RCU in tcf_vlan_dump()")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Closes: https://lore.kernel.org/netdev/CAM0EoMmhq66EtVqDEuNik8MVFZqkgxFbMu=fJtbNoYD7YXg4bA@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250901092608.2032473-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 15:51:45 -07:00
Breno Leitao
23313771c7 net: selftests: clean up tools/testing/selftests/net/lib/py/utils.py
This patch improves the utils.py module by removing unused imports
(errno, random), simplifying the fd_read_timeout() function by
eliminating unnecessary else clause, and cleaning up code style in the
defer class constructor.

Additionally, it renames the parameter in rand_port() from 'type' to
'stype' to avoid shadowing the built-in Python name 'type', improving
code clarity and preventing potential issues.

These changes enhance code readability and maintainability without
affecting functionality.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901-fix-v1-1-df0abb67481e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 15:51:16 -07:00
James Flowers
d250f14f5f net/smc: Replace use of strncpy on NUL-terminated string with strscpy
strncpy is deprecated for use on NUL-terminated strings, as indicated in
Documentation/process/deprecated.rst. strncpy NUL-pads the destination
buffer and doesn't guarantee the destination buffer will be NUL
terminated.

Signed-off-by: James Flowers <bold.zone2373@fastmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Link: https://patch.msgid.link/20250901030512.80099-1-bold.zone2373@fastmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 14:05:49 -07:00
Russell King (Oracle)
99502c61e8 net: mvpp2: add xlg pcs inband capabilities
Add PCS inband capabilities for XLG in the Marvell PP2 driver, so
phylink knows that 5G and 10G speeds have no inband capabilities.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/E1uslR9-00000001OxL-44CD@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 14:04:53 -07:00
Jay Vosburgh
23a6037ce7 bonding: Remove support for use_carrier
Remove the implementation of use_carrier, the link monitoring
method that utilizes ethtool or ioctl to determine the link state of an
interface in a bond.  Bonding will always behaves as if use_carrier=1,
which relies on netif_carrier_ok() to determine the link state of
interfaces.

	To avoid acquiring RTNL many times per second, bonding inspects
link state under RCU, but not under RTNL.  However, ethtool
implementations in drivers may sleep, and therefore this strategy is
unsuitable for use with calls into driver ethtool functions.

	The use_carrier option was introduced in 2003, to provide
backwards compatibility for network device drivers that did not support
the then-new netif_carrier_ok/on/off system.  Device drivers are now
expected to support netif_carrier_*, and the use_carrier backwards
compatibility logic is no longer necessary.

	The option itself remains, but when queried always returns 1,
and may only be set to 1.

Link: https://lore.kernel.org/000000000000eb54bf061cfd666a@google.com
Link: https://lore.kernel.org/20240718122017.d2e33aaac43a.I10ab9c9ded97163aef4e4de10985cd8f7de60d28@changeid
Signed-off-by: Jay Vosburgh <jv@jvosburgh.net>
Reported-by: syzbot+b8c48ea38ca27d150063@syzkaller.appspotmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/2029487.1756512517@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02 14:01:54 -07:00
Fernando Fernandez Mancera
077dc4a275 netfilter: nft_payload: extend offset to 65535 bytes
In some situations 255 bytes offset is not enough to match or manipulate
the desired packet field. Increase the offset limit to 65535 or U16_MAX.

In addition, the nla policy maximum value is not set anymore as it is
limited to s16. Instead, the maximum value is checked during the payload
expression initialization function.

Tested with the nft command line tool.

table ip filter {
	chain output {
		@nh,2040,8 set 0xff
		@nh,524280,8 set 0xff
		@nh,524280,8 0xff
		@nh,2040,8 0xff
	}
}

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-02 15:28:18 +02:00
Florian Westphal
f4f9e05904 netfilter: nf_reject: remove unneeded exports
These functions have no external callers and can be static.

Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-02 15:28:17 +02:00
Florian Westphal
8959f27d39 netfilter: nft_set_pipapo: remove redundant test for avx feature bit
Sebastian points out that avx2 depends on avx, see check_cpufeature_deps()
in arch/x86/kernel/cpu/cpuid-deps.c:
avx2 feature bit will be cleared when avx isn't available.

No functional change intended.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-02 15:28:17 +02:00
Florian Westphal
3d95a2e016 netfilter: nf_tables: all transaction allocations can now sleep
Now that nft_setelem_flush is not called with rcu read lock held or
disabled softinterrupts anymore this can now use GFP_KERNEL too.

This is the last atomic allocation of transaction elements, so remove
all gfp_t arguments and the wrapper function.

This makes attempts to delete large sets much more reliable, before
this was prone to transient memory allocation failures.

Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-02 15:28:17 +02:00
Florian Westphal
a60a5abe19 netfilter: nf_tables: allow iter callbacks to sleep
Quoting Sven Auhagen:
  we do see on occasions that we get the following error message, more so on
  x86 systems than on arm64:

  Error: Could not process rule: Cannot allocate memory delete table inet filter

  It is not a consistent error and does not happen all the time.
  We are on Kernel 6.6.80, seems to me like we have something along the lines
  of the nf_tables: allow clone callbacks to sleep problem using GFP_ATOMIC.

As hinted at by Sven, this is because of GFP_ATOMIC allocations during
set flush.

When set is flushed, all elements are deactivated. This triggers a set
walk and each element gets added to the transaction list.

The rbtree and rhashtable sets don't allow the iter callback to sleep:
rbtree walk acquires read side of an rwlock with bh disabled, rhashtable
walk happens with rcu read lock held.

Rbtree is simple enough to resolve:
When the walk context is ITER_READ, no change is needed (the iter
callback must not deactivate elements; we're not in a transaction).

When the iter type is ITER_UPDATE, the rwlock isn't needed because the
caller holds the transaction mutex, this prevents any and all changes to
the ruleset, including add/remove of set elements.

Rhashtable is slightly more complex.
When the iter type is ITER_READ, no change is needed, like rbtree.

For ITER_UPDATE, we hold transaction mutex which prevents elements from
getting free'd, even outside of rcu read lock section.

So build a temporary list of all elements while doing the rcu iteration
and then call the iterator in a second pass.

The disadvantage is the need to iterate twice, but this cost comes with
the benefit to allow the iter callback to use GFP_KERNEL allocations in
a followup patch.

The new list based logic makes it necessary to catch recursive calls to
the same set earlier.

Such walk -> iter -> walk recursion for the same set can happen during
ruleset validation in case userspace gave us a bogus (cyclic) ruleset
where verdict map m jumps to chain that sooner or later also calls
"vmap @m".

Before the new ->in_update_walk test, the ruleset is rejected because the
infinite recursion causes ctx->level to exceed the allowed maximum.

But with the new logic added here, elements would get skipped:

nft_rhash_walk_update would see elements that are on the walk_list of
an older stack frame.

As all recursive calls into same map results in -EMLINK, we can avoid this
problem by using the new in_update_walk flag and reject immediately.

Next patch converts the problematic GFP_ATOMIC allocations.

Reported-by: Sven Auhagen <Sven.Auhagen@belden.com>
Closes: https://lore.kernel.org/netfilter-devel/BY1PR18MB5874110CAFF1ED098D0BC4E7E07BA@BY1PR18MB5874.namprd18.prod.outlook.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-02 15:28:17 +02:00
Christophe Leroy
c015e17ba1 netfilter: nft_payload: Use csum_replace4() instead of opencoding
Open coded calculation can be avoided and replaced by the
equivalent csum_replace4() in nft_csum_replace().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-02 15:28:17 +02:00
Qianfeng Rong
46015e6b3e netfilter: ebtables: Use vmalloc_array() to improve code
Remove array_size() calls and replace vmalloc() with vmalloc_array() to
simplify the code.  vmalloc_array() is also optimized better, uses fewer
instructions, and handles overflow more concisely[1].

[1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-02 15:28:17 +02:00
Paolo Abeni
cd8a4cfa6b Merge branch 'e-switch-vport-sharing-delegation'
Saeed Mahameed says:

====================
E-Switch vport sharing & delegation

An mlx5 E-Switch FDB table can manage vports belonging to other sibling
physical functions, such as ECPF (ARM embedded cores) and Host PF (x86).
This enables a single source of truth for SDN software to manage network
pipelines from one host. While such functionality already exists in mlx5,
it is currently limited by static vport allocation,
meaning the number of vports shared between multi-host functions
must be known pre-boot.

This patchset enables delegated/external vports to be discovered
dynamically when switchdev mode is enabled, leveraging new firmware
capabilities for dynamic vport creation.

Adjacent functions that delegate their SR-IOV VFs to sibling PFs, can be
dynamically discovered on the sibling PF's switchdev mode enabling,
after sriov was enabled on the originating PF, allowing for more
flexible and scalable management in multi-host and ECPF-to-host
scenarios.

The patchset consists of the following changes:

- Refactoring of ACL root namespace handling: The storage of vport ACL root
  namespaces is converted from a linear array to an xarray, allowing dynamic
  creation of ACLs per individual vport.
- Improvements for vhca_id to vport mapping.
- Dynamic querying and creation of delegated functions/vports.
====================

Link: https://patch.msgid.link/20250829223722.900629-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02 15:18:19 +02:00
Saeed Mahameed
0c2a02f3c0 net/mlx5: {DR,HWS}, Use the cached vhca_id for this device
The mlx5 driver caches many capabilities to be used by mlx5 layers.

In SW and HW steering we can use the cached vhca_id instead of invoking
FW commands.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-8-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02 15:18:16 +02:00
Adithya Jayachandran
5d8ae2c2cf net/mlx5: E-switch, Set representor attributes for adjacent VFs
Adjacent vfs get their devlink port information from firmware,
use the information (pfnum, function id) from FW when populating the
devlink port attributes.

Before:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev eth0 flavour pcivf controller 0 pfnum 0 vfnum 49152 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00

After:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev enp0s3npf0vf2 flavour pcivf controller 0 pfnum 0 vfnum 2 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00

Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-7-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02 15:18:16 +02:00
Saeed Mahameed
a0a7002b94 net/mlx5: E-Switch, Register representors for adjacent vports
Register representors for adjacent vports dynamically when they are
discovered. Dynamically added representors state will now be set to
'REGISTERED' when the representor type was already registered,
otherwise they won't be loaded.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-6-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02 15:18:16 +02:00
Saeed Mahameed
9984ec9f1f net/mlx5: E-Switch, Create acls root namespace for adjacent vports
Use the new vport acl root namespace add/remove API to create the
missing acl root name spaces per each new adjacent function vport.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829223722.900629-5-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02 15:18:16 +02:00