Commit Graph

1430747 Commits

Author SHA1 Message Date
Xiang Mei
f81f4e79b1 bonding: remove unused bond_is_first_slave and bond_is_last_slave macros
Since commit 2884bf72fb ("net: bonding: fix use-after-free in
bond_xmit_broadcast()"), bond_is_last_slave() was only used in
bond_xmit_broadcast().  After the recent fix replaced that usage with
a simple index comparison, bond_is_last_slave() has no remaining
callers.  bond_is_first_slave() likewise has no callers.

Remove both unused macros.

Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/20260404220412.444753-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:07:08 -07:00
Jakub Kicinski
bd5c24e400 docs: netdev: improve wording of reviewer guidance
Reword the reviewer guidance based on behavior we see on the list.
Steer folks:
 - towards sending tags
 - away from process issues.

Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260406175334.3153451-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:03:00 -07:00
Jakub Kicinski
1795654f00 Merge tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:

====================
netfilter: updates for net-next

1) Fix ancient sparse warnings in nf conntrack nat modules, from
   Sun Jian.

2) Fix typo in enum description, from Jelle van der Waa.

3) remove redundant refetch of netns pointer in nf_conntrack_sip.

4) add a deprecation warning for dccp match.
   We can extend the deadline later if needed, but plan atm is to
   remove the feature.

5) remove nf_conntrack_h323 debug code that can read out-of-bounds
   with malformed messages. This code was commented out, but better
   remove this.

6+7) add more netlink policy validations in netfilter.
   This could theoretically cause issues when a client sends e.g.
   unsupported feature flags that were previously ignored, so we
   may have to relax some changes. For now, try to be stricter and
   reject upfront.

8+9) minor code cleanup in nft_set_pipapo (an nftables set backend).

10) Add nftables matching support for double-tagged vlan and pppoe
    frames, from Pablo Neira Ayuso.

11) Fix up indentation of debug messages in nf_conntrack_h323 conntrack
    helper, from David Laight.

12) Add a helper to iterate to next flow action and bail out if the
    maximum number of actions is reached, also from Pablo.

* tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it
  netfilter: nf_conntrack_h323: Correct indentation when H323_TRACE defined
  netfilter: nft_meta: add double-tagged vlan and pppoe support
  netfilter: nft_set_pipapo_avx2: remove redundant loop in lookup_slow
  netfilter: nft_set_pipapo: increment data in one step
  netfilter: nf_tables: add netlink policy based cap on registers
  netfilter: add more netlink-based policy range checks
  netfilter: nf_conntrack_h323: remove unreliable debug code in decode_octstr
  netfilter: add deprecation warning for dccp support
  netfilter: nf_conntrack_sip: remove net variable shadowing
  netfilter: nf_tables: Fix typo in enum description
  netfilter: use function typedefs for __rcu NAT helper hook pointers
====================

Link: https://patch.msgid.link/20260408060419.25258-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 18:58:08 -07:00
Jakub Kicinski
ea0f90d1ed Merge tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2026-04-08

1) Update outdated comment in xfrm_dst_check().
   From kexinsun.

2) Drop support for HMAC-RIPEMD-160 from IPsec.
   From Eric Biggers.

* tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Drop support for HMAC-RIPEMD-160
  xfrm: update outdated comment
====================

Link: https://patch.msgid.link/20260408094258.148555-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 18:51:54 -07:00
Pablo Neira Ayuso
c6f8557758 netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it
Add a new helper function to retrieve the next action entry in flow
rule, check if the maximum number of actions is reached, bail out in
such case.

Replace existing opencoded iteration on the action array by this
helper function.
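
The pattern described above can be sketched in plain C. This is a minimal illustration, not the kernel code: the struct layouts, the MAX_ACTIONS limit, and the names flow_action_entry_next() and count_granted() are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 4	/* hypothetical cap, not the kernel's value */

/* Hypothetical stand-ins for the flow rule and its action array. */
struct action_entry {
	int id;
};

struct flow {
	struct action_entry entries[MAX_ACTIONS];
	unsigned int num_actions;
};

/* Return the next free action slot, or NULL once the cap is reached. */
static struct action_entry *flow_action_entry_next(struct flow *f)
{
	if (f->num_actions >= MAX_ACTIONS)
		return NULL;

	return &f->entries[f->num_actions++];
}

/* Request 'requests' slots and report how many were actually granted. */
static unsigned int count_granted(unsigned int requests)
{
	struct flow f = { .num_actions = 0 };
	unsigned int granted = 0;

	while (requests--) {
		struct action_entry *e = flow_action_entry_next(&f);

		if (!e)
			break;	/* bail out instead of writing past the array */
		e->id = (int)granted++;
	}
	assert(f.num_actions <= MAX_ACTIONS);
	return granted;
}
```

Keeping the bounds check inside the helper means every caller gets it, which is the advantage over open-coded indexing of the array.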

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:31 +02:00
David Laight
f33fad8dbf netfilter: nf_conntrack_h323: Correct indentation when H323_TRACE defined
The trace lines are indented using PRINT("%*.s", xx, " ").
Userspace will treat this as "%*.0s" and will output no characters
when 'xx' is zero, the kernel treats it as "%*s" and will output
a single ' ' - which is probably what is intended.

Change all the formats to "%*s" removing the default precision.
This gives a single space indent when level is zero.
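
The userspace half of this difference can be checked with a standard C library. The sketch below only exercises libc snprintf(), where a bare '.' means precision zero; the kernel behaviour is as stated in the commit message and is not reproduced here. The helper names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Length produced by the old format: width from 'level', precision 0. */
static int indent_len_old(int level)
{
	char buf[64];
	int n = snprintf(buf, sizeof(buf), "%*.s", level, " ");

	assert(n >= 0);
	return n;
}

/* Length produced by the new format: width from 'level', no precision. */
static int indent_len_new(int level)
{
	char buf[64];
	int n = snprintf(buf, sizeof(buf), "%*s", level, " ");

	assert(n >= 0);
	return n;
}
```

With level 0 the old format yields an empty string in userspace while the new one yields a single space; for any level of 1 or more both produce 'level' characters.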

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:31 +02:00
Pablo Neira Ayuso
3785091c6c netfilter: nft_meta: add double-tagged vlan and pppoe support
Currently:

  add rule netdev x y ip saddr 1.1.1.1

works with neither double-tagged vlan nor pppoe packets. This is
because the network and transport header offsets are not pointing to the
IP and transport protocol headers in the stack.

This patch expands NFT_META_PROTOCOL and NFT_META_L4PROTO to parse
double-tagged vlan and pppoe packets so matching network and transport
header fields becomes possible with the existing userspace generated
bytecode. Note that this parser only supports double-tagged vlan which
is composed of vlan offload + vlan header in the skb payload area for
simplicity.

NFT_META_PROTOCOL is used by bridge and netdev family as an implicit
dependency in the bytecode to match on network header fields.
Similarly, there is also NFT_META_L4PROTO, which is also used as an
implicit dependency when matching on the transport protocol header
fields.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:31 +02:00
Florian Westphal
a3f1e6a19a netfilter: nft_set_pipapo_avx2: remove redundant loop in lookup_slow
nft_pipapo_avx2_lookup_slow will never be used in reality, because the
common sizes are handled by avx2 optimized versions.

However, nft_pipapo_avx2_lookup_slow loops over the data just like the
avx2 functions, even though _slow doesn't need to do that.

As-is, first loop sets all the right result bits and the next iterations
boil down to 'x = x & x'.  Remove the loop.
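
Why the extra iterations are redundant can be seen in a toy model. This is only a sketch of the idempotence argument, with a hypothetical helper and 64-bit words standing in for the real bitmaps; it is not the pipapo lookup code.

```c
#include <assert.h>
#include <stdint.h>

/* AND 'bucket' into 'res' the given number of times (at least once). */
static uint64_t and_repeated(uint64_t res, uint64_t bucket, unsigned int times)
{
	assert(times >= 1);
	while (times--)
		res &= bucket;	/* after the first pass this no longer changes res */
	return res;
}
```

Because (res & bucket) & bucket equals res & bucket, one pass and eight passes return the same value.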

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:31 +02:00
Florian Westphal
04e1ca21a5 netfilter: nft_set_pipapo: increment data in one step
Since commit e807b13cb3 ("nft_set_pipapo: Generalise group size for buckets")
there is no longer a need to increment the data pointer in two steps.
Switch to a single invocation of NFT_PIPAPO_GROUPS_PADDED_SIZE() helper,
like the avx2 implementation.

[ Stefano: Improve commit message ]

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:31 +02:00
Florian Westphal
8e57338c36 netfilter: nf_tables: add netlink policy based cap on registers
Should have no effect in practice; all of these use the
nft_parse_register_load/store APIs, which are mandatory anyway due
to the need to further validate the register load/store, e.g.
that the size argument doesn't result in an out-of-bounds load/store.

OTOH this is a simple method to reject obviously wrong input
at an earlier stage.

Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:31 +02:00
Florian Westphal
66b75e6bbe netfilter: add more netlink-based policy range checks
These spots either already check the attribute range manually
before use or the consuming functions tolerate unexpected values.

Nevertheless, add more range checks via netlink policy so we gain
more users and avoid possible re-use in other places that might
not have the required manual checks.  This also improves error
reporting: netlink core can generate extack errors.

Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:30 +02:00
Florian Westphal
390a57dd61 netfilter: nf_conntrack_h323: remove unreliable debug code in decode_octstr
The debug code (not enabled in any build) reads up to 6 octets of
the input buffer, but does so without bounds checks.  Zap this.

Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:27 +02:00
Florian Westphal
606bd17ef0 netfilter: add deprecation warning for dccp support
Add a deprecation warning for the xt_dccp match and the
nft exthdr code.

Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:27 +02:00
Florian Westphal
7970d6aaf7 netfilter: nf_conntrack_sip: remove net variable shadowing
net is already set, derived from nf_conn.
I don't see how the device could be living in a different netns
than the conntrack entry.

Remove the extra variable and re-use existing one.

Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:27 +02:00
Jelle van der Waa
1f290c497c netfilter: nf_tables: Fix typo in enum description
Fix the spelling of "options".

Signed-off-by: Jelle van der Waa <jelle@vdwaa.nl>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:26 +02:00
Sun Jian
6e6f2b9b33 netfilter: use function typedefs for __rcu NAT helper hook pointers
After commit 07919126ec ("netfilter: annotate NAT helper hook pointers
with __rcu"), sparse can warn about type/address-space mismatches when
RCU-dereferencing NAT helper hook function pointers.

The hooks are __rcu-annotated and accessed via rcu_dereference(), but the
combination of complex function pointer declarators and the WRITE_ONCE()
machinery used by RCU_INIT_POINTER()/rcu_assign_pointer() can confuse
sparse and trigger false positives.

Introduce typedefs for the NAT helper function types, so __rcu applies to
a simple "fn_t __rcu *" pointer form. Also replace local typeof(hook)
variables with "fn_t *" to avoid propagating __rcu address space into
temporaries.
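
The declarator change can be illustrated outside the kernel. In this sketch __rcu is defined away (in the kernel it is a sparse annotation), and the type name nat_hook_fn_t, the hook variable, and the helpers are hypothetical; the comments mark where the kernel would use its RCU accessors.

```c
#include <assert.h>
#include <stddef.h>

/* sparse-only annotation in the kernel; expands to nothing in this sketch */
#ifndef __rcu
#define __rcu
#endif

/* Name the function type once ... */
typedef unsigned int nat_hook_fn_t(unsigned int len);

/* ... so the annotated hook is a plain "fn_t __rcu *" pointer. */
static nat_hook_fn_t __rcu *nat_hook;

static unsigned int double_len(unsigned int len)
{
	return 2 * len;
}

/* Optionally install the hook, then call through a plain local pointer. */
static unsigned int run_hook(int install, unsigned int len)
{
	nat_hook_fn_t *fn;	/* "fn_t *": no __rcu on the temporary */

	nat_hook = install ? double_len : NULL;	/* kernel: rcu_assign_pointer() */
	fn = nat_hook;				/* kernel: rcu_dereference() */
	assert(!install || fn != NULL);
	return fn ? fn(len) : 0;
}
```

Declaring the local as "fn_t *" rather than typeof(hook) keeps the address-space annotation off the temporary, which is the second half of the change described above.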

No functional change intended.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603022359.3dGE9fwI-lkp@intel.com/
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 07:51:26 +02:00
Jakub Kicinski
b3e69fc319 Merge branch 'net-pull-gso-packet-headers-in-core-stack'
Eric Dumazet says:

====================
net: pull gso packet headers in core stack

Most ndo_start_xmit() methods expect headers of gso packets
to be already in skb->head.

net/core/tso.c users are particularly at risk, because tso_build_hdr()
does a memcpy(hdr, skb->data, hdr_len);

qdisc_pkt_len_segs_init() already does a dissection of gso packets.

Use pskb_may_pull() instead of skb_header_pointer() to make
sure drivers do not have to reimplement this.

First patch is a small cleanup to ease second patch review.
====================

Link: https://patch.msgid.link/20260403221540.3297753-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07 19:02:18 -07:00
Eric Dumazet
7fb4c19670 net: pull headers in qdisc_pkt_len_segs_init()
Most ndo_start_xmit() methods expect headers of gso packets
to be already in skb->head.

net/core/tso.c users are particularly at risk, because tso_build_hdr()
does a memcpy(hdr, skb->data, hdr_len);

qdisc_pkt_len_segs_init() already does a dissection of gso packets.

Use pskb_may_pull() instead of skb_header_pointer() to make
sure drivers do not have to reimplement this.

Some malicious packets could be fed, detect them so that we can
drop them sooner with a new SKB_DROP_REASON_SKB_BAD_GSO drop_reason.

Fixes: e876f208af ("net: Add a software TSO helper API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260403221540.3297753-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07 19:02:13 -07:00
Eric Dumazet
30e02ec3b4 net: qdisc_pkt_len_segs_init() cleanup
Reduce indentation level by returning early if the transport header
was not set.

Add an unlikely() clause as this is not the common case.

No functional change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260403221540.3297753-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07 19:02:13 -07:00
Jakub Kicinski
e65d8b6f30 selftests: drv-net: adjust to socat changes
socat v1.8.1.0 now defaults to shut-null, it sends an extra
0-length UDP packet when sender disconnects. This breaks
our tests which expect the exact packet sequence.

Add shut-none which was the old default where necessary.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260404230103.2719103-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07 18:54:03 -07:00
Fernando Fernandez Mancera
2ce8a41113 net: hsr: emit notification for PRP slave2 changed hw addr on port deletion
On PRP protocol, when deleting the port the MAC address change
notification was missing. In addition to that, make sure to only perform
the MAC address change on slave2 deletion and PRP protocol as the
operation isn't necessary for HSR nor slave1.

Note that the eth_hw_addr_set() call is correct in the PRP context, as the
slaves are either in promiscuous mode or have forward offload enabled.

Reported-by: Luka Gejak <luka.gejak@linux.dev>
Closes: https://lore.kernel.org/netdev/DHFCZEM93FTT.1RWFBIE32K7OT@linux.dev/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Felix Maurer <fmaurer@redhat.com>
Link: https://patch.msgid.link/20260403123928.4249-2-fmancera@suse.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 17:06:16 +02:00
Paolo Abeni
97a8355b6a Merge branch 'net-mlx5e-xdp-add-support-for-multi-packet-per-page'
Tariq Toukan says:

====================
net/mlx5e: XDP, Add support for multi-packet per page

This series removes the limitation of having one packet per page in XDP
mode. This has the following implications:

- XDP in Striding RQ mode can now be used on 64K page systems.

- XDP in Legacy RQ mode was using a single packet per page which on 64K
  page systems is quite inefficient. The improvement can be observed
  with an XDP_DROP test when running in Legacy RQ mode on an ARM
  Neoverse-N1 system with a 64K page size:
  +-----------------------------------------------+
  | MTU  | baseline   | this change | improvement |
  |------+------------+-------------+-------------|
  | 1500 | 15.55 Mpps | 18.99 Mpps  | 22.0 %      |
  | 9000 | 15.53 Mpps | 18.24 Mpps  | 17.5 %      |
  +-----------------------------------------------+

After lifting this limitation, the series switches to using fragments
for the side page in non-linear mode. This small improvement is at most
visible for XDP_DROP tests with small 64B packets and a large enough MTU
for Striding RQ to be in non-linear mode:
+----------------------------------------------------------------------+
| System               | MTU  | baseline   | this change | improvement |
|----------------------+------+------------+-------------+-------------|
| 4K page x86_64 [1]   | 9000 | 26.30 Mpps | 30.45 Mpps  | 15.80 %     |
| 64K page aarch64 [2] | 9000 | 15.27 Mpps | 20.10 Mpps  | 31.62 %     |
+----------------------------------------------------------------------+

This series does not cover the xsk (AF_XDP) paths for 64K page systems.

[1] https://lore.kernel.org/all/20260324024235.929875-1-kuba@kernel.org/
====================

Link: https://patch.msgid.link/20260403090927.139042-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 13:34:08 +02:00
Dragos Tatulea
25b8c9b6d7 net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode
Currently in XDP multi-buffer mode for striding rq a whole page is
allocated for the linear part of the XDP buffer. This is wasteful,
especially on systems with larger page sizes.

This change splits the page into fixed sized fragments. The page is
replenished when the maximum number of allowed fragments is reached.
When a fragment is not used, it will be simply recycled on next packet.
This is great for XDP_DROP as the fragment can be recycled for the next
packet. In the most extreme case (XDP_DROP everything), there will be 0
fragments used => only one linear page allocation for the lifetime of
the XDP program.

The previous page_pool size increase was too conservative (doubling the
size) and now there are much fewer allocations (1/8 for a 4K page). So
drop the page_pool size extension altogether when the linear side page
is used.

This small improvement is at most visible for XDP_DROP tests with small
64B packets and a large enough MTU for Striding RQ to be in non-linear
mode:
+----------------------------------------------------------------------+
| System               | MTU  | baseline   | this change | improvement |
|----------------------+------+------------+-------------+-------------|
| 4K page x86_64 [1]   | 9000 | 26.30 Mpps | 30.45 Mpps  | 15.80 %     |
| 64K page aarch64 [2] | 9000 | 15.27 Mpps | 20.10 Mpps  | 31.62 %     |
+----------------------------------------------------------------------+

[1] Intel Xeon Platinum 8580
[2] ARM Neoverse-N1

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260403090927.139042-6-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 13:34:04 +02:00
Dragos Tatulea
ebd4ad29cc net/mlx5e: XDP, Use a single linear page per rq
Currently in striding rq there is one mlx5e_frag_page member per WQE for
the linear page. This linear page is used only in XDP multi-buffer mode.
This is wasteful because only one linear page is needed per rq: the page
gets refreshed on every packet, regardless of WQE. Furthermore, it is
not needed in other modes (non-XDP, XDP single-buffer).

This change moves the linear page into its own structure (struct
mlx5_mpw_linear_info) and allocates it only when necessary.

A special structure is created because an upcoming patch will extend
this structure to support fragmentation of the linear page.

This patch has no functional changes.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260403090927.139042-5-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 13:34:04 +02:00
Dragos Tatulea
2dfaa02387 net/mlx5e: XDP, Remove stride size limitation
Currently XDP mode always uses PAGE_SIZE strides. This limitation
existed because page fragment counting was not implemented when XDP was
added. Furthermore, due to this limitation there were other issues as
well on systems with larger pages (e.g. 64K):

- XDP for Striding RQ was effectively disabled on such systems.

- Legacy RQ allows the configuration but uses a fixed scheme of one XDP
  buffer per page which is inefficient.

As fragment counting was added during the driver conversion to
page_pool and the support for XDP multi-buffer, it is now possible
to remove this stride size limitation. This patch does just that.

Now it is possible to use XDP on systems with higher page sizes (e.g.
64K):

- For Striding RQ, loading the program is no longer blocked.
  Although a 64K page can fit any packet, MTUs that result in
  stride > 8K will still put the RQ in non-linear mode. That's
  because the HW doesn't support a higher than 8K stride.

- For Legacy RQ, the stride size was PAGE_SIZE which was very
  inefficient. Now the stride size will be calculated relative to MTU.
  Legacy RQ will always be in linear mode for larger system pages.

  This can be observed with an XDP_DROP test [1] when running
  in Legacy RQ mode on an ARM Neoverse-N1 system with a 64K
  page size:
  +-----------------------------------------------+
  | MTU  | baseline   | this change | improvement |
  |------+------------+-------------+-------------|
  | 1500 | 15.55 Mpps | 18.99 Mpps  | 22.0 %      |
  | 9000 | 15.53 Mpps | 18.24 Mpps  | 17.5 %      |
  +-----------------------------------------------+

There are performance benefits for Striding RQ mode as well:

- Striding RQ non-linear mode now uses 256B strides, just like
  non-XDP mode.

- Striding RQ linear mode can now fit a number of XDP buffers per page
  that is relative to the MTU size. That means that on 4K page systems
  and a small enough MTU, 2 XDP buffers can fit in one page.

The above benefits for Striding RQ can be observed with an
XDP_DROP test [1] when running on a 4K page x86_64 system
(Intel Xeon Platinum 8580):
  +-----------------------------------------------+
  | MTU  | baseline   | this change | improvement |
  |------+------------+-------------+-------------|
  | 1000 | 28.36 Mpps | 33.98 Mpps  | 19.82 %     |
  | 9000 | 20.76 Mpps | 26.30 Mpps  | 26.70 %     |
  +-----------------------------------------------+

[1] Test description:
- xdp-bench with XDP_DROP
- RX: single queue
- TX: sends 64B packets to saturate CPU on RX side

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260403090927.139042-4-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 13:34:04 +02:00
Dragos Tatulea
833e72645a net/mlx5e: XDP, Improve dma address calculation of linear part for XDP_TX
When calculating the dma address of the linear part of an XDP frame, the
formula assumes that there is a single XDP buffer per page. Extend the
formula to allow multiple XDP buffers per page by calculating the data
offset in the page.

This is a preparation for the upcoming removal of a single XDP buffer
per page limitation when the formula will no longer be correct.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260403090927.139042-3-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 13:34:04 +02:00
Dragos Tatulea
1047e14b44 net/mlx5e: XSK, Increase size for chunk_size param
When 64K pages are used, chunk_size can take the 64K value
which doesn't fit in u16. This results in overflows that
are detected in mlx5e_mpwrq_log_wqe_sz().

Increase the type to u32 to fix this.
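
The truncation is easy to reproduce in isolation. This is a minimal sketch with made-up helper names; it shows only the integer narrowing, not the driver's parameter structures.

```c
#include <assert.h>
#include <stdint.h>

/* Value read back after storing 'chunk_size' in a 16-bit field. */
static uint32_t stored_in_u16(uint32_t chunk_size)
{
	uint16_t field = (uint16_t)chunk_size;	/* keeps only the low 16 bits */

	assert(field == (chunk_size & 0xffffu));
	return field;
}

/* Value read back after storing 'chunk_size' in a 32-bit field. */
static uint32_t stored_in_u32(uint32_t chunk_size)
{
	uint32_t field = chunk_size;

	return field;
}
```

A 64K chunk size (65536) is exactly 2^16, so the 16-bit field reads back as 0, while a 4K value survives in both widths.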

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260403090927.139042-2-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 13:34:04 +02:00
Qingfang Deng
dfecb0c5af selftests: net: add tests for PPP
Add ping and iperf3 tests for ppp_async.c and pppoe.c.

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260403034908.30017-1-qingfang.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 12:08:46 +02:00
Eric Biggers
05d42dc8ab xfrm: Drop support for HMAC-RIPEMD-160
Drop support for HMAC-RIPEMD-160 from IPsec to reduce the UAPI surface
and simplify future maintenance.  It's almost certainly unused.

RIPEMD-160 received some attention in the early 2000s when SHA-* weren't
quite as well established.  But it never received much adoption outside
of certain niches such as Bitcoin.

It's actually unclear that Linux + IPsec + HMAC-RIPEMD-160 has *ever*
been used, even historically.  When support for it was added in 2003, it
was done so in a "cleanup" commit without any justification [1].  It
didn't actually work until someone happened to fix it 5 years later [2].
That person didn't use or test it either [3].  Finally, also note that
"hmac(rmd160)" is by far the slowest of the algorithms in aalg_list[].

Of course, today IPsec is usually used with an AEAD, such as AES-GCM.
But even for IPsec users still using a dedicated auth algorithm, they
almost certainly aren't using, and shouldn't use, HMAC-RIPEMD-160.

Thus, let's just drop support for it.  Note: no kconfig update is
needed, since CRYPTO_RMD160 wasn't actually being selected anyway.

References:
  [1] linux-history commit d462985fc1941a47
      ("[IPSEC]: Clean up key manager algorithm handling.")
  [2] linux commit a13366c632
      ("xfrm: xfrm_algo: correct usage of RIPEMD-160")
  [3] https://lore.kernel.org/all/1212340578-15574-1-git-send-email-rueegsegger@swiss-it.ch

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-07 10:47:58 +02:00
Jakub Kicinski
c149d90e26 Merge branch 'mptcp-support-msg_eor-and-small-cleanups'
Matthieu Baerts says:

====================
mptcp: support MSG_EOR and small cleanups

This series contains various unrelated patches:

- Patches 1 & 2: support MSG_EOR instead of ignoring it.

- Patch 3: avoid duplicated code in TCP and MPTCP by using a new helper.

- Patch 4: adapt test to reproduce bug and increase code coverage.
====================

Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-0-b0b33bea3fed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:14:31 -07:00
Matthieu Baerts (NGI0)
c4a5cb2f00 selftests: mptcp: join: recreate signal endp with same ID
In this "delete re-add signal" MPTCP Join subtest, the endpoint linked
to the initial subflow is removed, but re-added once with a different ID.

It appears that there was an issue when reusing the same ID, recently
fixed by commit d191101dee ("mptcp: pm: in-kernel: always set ID as
avail when rm endp"). The test now reuses the same ID the first
time, but continues to use another one (88) the second time.

This should then cover more cases.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/615
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-5-b0b33bea3fed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:14:30 -07:00
Geliang Tang
eb477fdd68 tcp: add recv_should_stop helper
Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked()
and tcp_splice_read() to check whether to stop receiving. And use this
helper in mptcp_recvmsg() and mptcp_splice_read() to reduce redundant code.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-3-b0b33bea3fed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:14:27 -07:00
Gang Yan
7fb2f5f964 mptcp: preserve MSG_EOR semantics in sendmsg path
Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
which marks the end of a record for application-level message boundaries.

Data fragments tagged with MSG_EOR are explicitly marked in the
mptcp_data_frag structure and skb context to prevent unintended
coalescing with subsequent data chunks. This ensures the intent of
applications using MSG_EOR is preserved across MPTCP subflows,
maintaining consistent message segmentation behavior.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-2-b0b33bea3fed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:14:26 -07:00
Gang Yan
00d46be3c3 mptcp: reduce 'overhead' from u16 to u8
The 'overhead' in struct mptcp_data_frag can safely use u8, as it
represents 'alignment + sizeof(mptcp_data_frag)'. With a maximum
alignment of 7('ALIGN(1, sizeof(long)) - 1'), the overhead is at most
47, well below U8_MAX and validated with BUILD_BUG_ON().
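
The bound quoted above can be rechecked with a few lines of C. The sketch assumes the 40-byte struct size implied by 47 - 7 on a 64-bit build; that constant, the ALIGN_UP macro, and the helper names are illustrative and not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((size_t)(x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Assumed struct size, inferred from the commit message (47 - 7). */
#define ASSUMED_FRAG_SIZE 40u

/* Worst-case padding: ALIGN(1, sizeof(long)) - 1. */
static size_t max_alignment(void)
{
	return ALIGN_UP(1, sizeof(long)) - 1;
}

/* Worst-case overhead: padding plus the assumed struct size. */
static size_t max_overhead(void)
{
	size_t overhead = max_alignment() + ASSUMED_FRAG_SIZE;

	assert(overhead <= UINT8_MAX);	/* runtime analogue of BUILD_BUG_ON() */
	return overhead;
}
```

Where sizeof(long) is 8, max_alignment() is 7 and max_overhead() is 47, which leaves ample room below 255.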

This patch also adds a field named 'unused' for further extensions.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-1-b0b33bea3fed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:14:26 -07:00
Arnd Bergmann
ede3136e56 dpaa2: avoid linking objects into multiple modules
Each object file contains information about which module it gets linked
into, so linking the same file into multiple modules now causes a warning:

scripts/Makefile.build:254: drivers/net/ethernet/freescale/dpaa2/Makefile: dpaa2-mac.o is added to multiple modules: fsl-dpaa2-eth fsl-dpaa2-switch
scripts/Makefile.build:254: drivers/net/ethernet/freescale/dpaa2/Makefile: dpmac.o is added to multiple modules: fsl-dpaa2-eth fsl-dpaa2-switch

Change the way that dpaa2 is built by moving the two common files into a
separate module with exported symbols instead.

Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260402184726.3746487-3-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:03:49 -07:00
Arnd Bergmann
df75bd552a net: ethernet: ti-cpsw: fix linking built-in code to modules
There are six variants of the cpsw driver, sharing various parts of
the code: davinci-emac, cpsw, cpsw-switchdev, netcp, netcp_ethss and
am65-cpsw-nuss.

I noticed that this means some files can be linked into more than
one loadable module, or even part of vmlinux but also linked into
a loadable module, both of which mess up assumptions of the build
system and cause warnings:

scripts/Makefile.build:279: cpsw_ale.o is added to multiple modules: ti-am65-cpsw-nuss ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_priv.o is added to multiple modules: ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_sl.o is added to multiple modules: ti-am65-cpsw-nuss ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_ethtool.o is added to multiple modules: ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: davinci_cpdma.o is added to multiple modules: ti_cpsw ti_cpsw_new ti_davinci_emac

Change this back to having separate modules for each portion that
can be linked standalone, exporting symbols as needed:

 - ti-cpsw-common.ko now contains both cpsw-common.o and
   davinci_cpdma.o as they are always used together

 - ti-cpsw-priv.ko contains cpsw_priv.o, cpsw_sl.o and cpsw_ethtool.o,
   which are the core of the cpsw and cpsw-new drivers.

 - ti-cpsw-sl.ko contains the cpsw-sl.o object and is used on
   ti-am65-cpsw-nuss.ko in addition to the two other cpsw variants.

 - ti-cpsw-ale.o is the one standalone module that is used by all
   except davinci_emac.

Each of these will be built-in if any of its users are built-in, otherwise
it's a loadable module if there is at least one module using it. I did
not bring back the separate Kconfig symbols for this, but just handle
it using Makefile logic.

Note: ideally this is something that Kbuild complains about, but usually
we just notice when something using THIS_MODULE misbehaves in a way that
a user notices.

Fixes: 99f6297182 ("net: ethernet: ti: cpsw: drop TI_DAVINCI_CPDMA config option")
Link: https://lore.kernel.org/lkml/20240417084400.3034104-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260402184726.3746487-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:03:49 -07:00
Arnd Bergmann
961f3c5356 net: ethernet: ti-cpsw: rename soft_reset() function
While looking at the global symbols shared between the cpsw drivers,
I noticed that soft_reset() is the only one that is missing a proper
namespace prefix, and will pollute the kernel namespace, so rename
it to be consistent with the other symbols.

Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260402184726.3746487-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:03:46 -07:00
Jakub Kicinski
e6b7e1a10c eth: remove the driver for acenic / tigon1&2
The entire git history for this driver looks like tree-wide
and automated cleanups. There's even more coming now with
AI, so let's try to delete it instead.

Acked-by: Jes Sorensen <jes@trained-monkey.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20260403220501.2263835-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:52:27 -07:00
Kevin Hao
c321b5676d net: macb: Use netif_napi_add_tx() instead of netif_napi_add() for TX NAPI
The TX NAPI should be registered via netif_napi_add_tx() to avoid
unnecessarily polluting the napi_hash table.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://patch.msgid.link/20260403-macb-napi-tx-v1-1-08126a60c65e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:51:57 -07:00
Jakub Kicinski
646dbda284 Merge branch 'nfc-support-for-five-qualcomm-sdm845-phones'
David Heidelberg says:

====================
NFC support for five Qualcomm SDM845 phones

- OnePlus 6 / 6T
- Pixel 3 / 3 XL
- SHIFT 6MQ

Verified with NFC card using neard:

systemctl enable --now neard
nfctool --device nfc0 -1
nfctool -d nfc0 -p
gdbus introspect --system --dest org.neard --object-path /org/neard/nfc0/tag0/record0

or use gNFC:
  https://gitlab.gnome.org/dh/gnfc/

successfully detecting and reading a tag.
====================

Link: https://patch.msgid.link/20260403-oneplus-nfc-v3-0-fbdce57d63c1@ixit.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:50:51 -07:00
David Heidelberg
e72058a4be dt-bindings: nfc: nxp,nci: Document PN557 compatible
The PN557 uses the same hardware as the PN553 but ships with
firmware compliant with NCI 2.0.

Document PN557 as a compatible device.

Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260403-oneplus-nfc-v3-1-fbdce57d63c1@ixit.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:50:46 -07:00
Yue Haibing
2f60df9e61 ip6_tunnel: use generic for_each_ip_tunnel_rcu macro
Remove the locally defined for_each_ip6_tunnel_rcu macro and use
the generic for_each_ip_tunnel_rcu from linux/if_tunnel.h instead.

This eliminates code duplication and ensures consistency across
the kernel tunnel implementations.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260403084619.4107978-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:41:03 -07:00
Jason Xing
8a4e3ab61d net: advance skb_defer_disable_key check in napi_consume_skb
When net.core.skb_defer_max is set to zero, napi_consume_skb()
shouldn't descend into skb_attempt_defer_free(), because doing so adds
an extra local_bh_enable/disable() pair which is evidently not
needed. Checking the static key earlier saves cycles and
benefits the non-defer case.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20260402034114.65766-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:32:04 -07:00
Jakub Kicinski
1ef05ed263 Merge branch 'net-dsa-mxl862xx-add-support-for-bridge-offloading'
Daniel Golle says:

====================
net: dsa: mxl862xx: add support for bridge offloading

As a next step to complete the mxl862xx DSA driver, add support for
offloading forwarding between bridged ports to the switch hardware.

This works pretty much without any big surprises, apart from two
subtleties:
 * per-port control over flooding behavior has to be implemented by
   (ab)using a 0-rate QoS meter as a stopper for lack of any better
   option.
 * STP state transition unconditionally enables learning on a port
   even if it was previously explicitly disabled (a firmware bug)

Note that as the driver is still lacking all VLAN features (which
are going to be added next), at this point some of the
bridge_vlan_aware.sh tests are failing after applying this series.

This is expected and cannot be avoided without implementing
port_vlan_filtering + port_vlan_add/del. And adding both bridge and
VLAN offloading at the same time would be too much for anyone to
review, so VLAN support is going to be submitted in a follow-up
series immediately after this series has been accepted.

All other relevant selftests (including bridge_vlan_unaware.sh) are
still passing.

Inspired by the comments received from Paolo Abeni as reply to v5
the driver now no longer caches bridge port membership in the
driver, but instead imports an existing helper from yt921x.c to dsa.h
in order to allow the driver to easily iterate over bridge members.
The mapping between DSA bridge num and firmware bridge ID is done
using a simple fixed-size array in mxl862xx_priv.
====================

Link: https://patch.msgid.link/cover.1775049897.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:30:35 -07:00
Daniel Golle
340bdf9846 net: dsa: mxl862xx: implement bridge offloading
Implement joining and leaving bridges as well as add, delete and dump
operations on isolated FDBs, port MDB membership management, and
setting a port's STP state.

The switch supports a maximum of 63 bridges, however, up to 12 may
be used as "single-port bridges" to isolate standalone ports.
Allowing up to 48 bridges to be offloaded seems more than enough on
that hardware, hence that is set as max_num_bridges.

A total of 128 bridge ports are supported in the bridge portmap, and
virtual bridge ports have to be used e.g. for link-aggregation, hence
potentially exceeding the number of hardware ports.

The firmware-assigned bridge identifier (FID) for each offloaded bridge
is stored in an array used to map DSA bridge num to firmware bridge ID,
avoiding the need for a driver-private bridge tracking structure.
Bridge member portmaps are rebuilt on join/leave using
dsa_switch_for_each_bridge_member().

As there are now more users of the BRIDGEPORT_CONFIG_SET API and the
state of each port is cached locally, introduce a helper function
mxl862xx_set_bridge_port(struct dsa_switch *ds, int port) which
applies the cached per-port state to hardware. For standalone user
ports (dp->bridge == NULL), it additionally resets the port to
single-port bridge state: CPU-only portmap, learning and flooding
disabled. The CPU port path sets its state explicitly before calling
this helper and is therefore not affected by the reset.

Note that MASK_VLAN_BASED_MAC_LEARNING is intentionally absent from
the firmware write mask. After mxl862xx_reset(), the firmware
initialises all VLAN-based MAC learning fields to 0 (disabled), so
SVL is the active mode by default without having to set it explicitly.

Note that there is no convenient way to control flooding at per-port
level, so the driver uses a 0-rate QoS meter as a stopper for
lack of any better option. To make the stopper exact, the firmware-enforced
minimum bucket size is bypassed by directly writing 0s to the relevant
registers; without that, at least one 64-byte packet could still
pass before the meter changes from 'yellow' to 'red' state.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/dd079180e2098e5f9626fcd149b9bad9a1b5a1b2.1775049897.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:30:33 -07:00
Daniel Golle
4250ff1640 dsa: tag_mxl862xx: set dsa_default_offload_fwd_mark()
The MxL862xx offloads bridge forwarding in hardware, so set
dsa_default_offload_fwd_mark() to avoid duplicate forwarding of
(e.g. flooded) frames arriving at the CPU port.

Link-local frames are directly trapped to the CPU port only, so don't
set dsa_default_offload_fwd_mark() on those.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/e1161c90894ddc519c57dc0224b3a0f6bfa1d2d6.1775049897.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:30:33 -07:00
Daniel Golle
f259e08494 net: dsa: add bridge member iteration macro
Drivers that offload bridges need to iterate over the ports that are
members of a given bridge, for example to rebuild per-port forwarding
bitmaps when membership changes. Currently drivers typically open-code
this by combining dsa_switch_for_each_user_port() with a
dsa_port_offloads_bridge_dev() check, or cache bridge membership
within the driver.

Add dsa_switch_for_each_bridge_member() macro to express this pattern
directly, and use it for the existing dsa_bridge_ports() inline
helper.
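
The pattern described above can be sketched outside the kernel as follows. This is a simplified model under stated assumptions: the `mock_*` structures, the macro and the helper are hypothetical stand-ins, not the definitions added to dsa.h, and the real macro walks the DSA port list rather than a plain array:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layout. */
struct mock_port {
    int index;
    const void *bridge_dev; /* NULL when the port is standalone */
};

struct mock_switch {
    struct mock_port *ports;
    size_t num_ports;
};

/* Hypothetical macro: visit only the ports that are members of _br. */
#define mock_for_each_bridge_member(_p, _sw, _br)               \
    for ((_p) = (_sw)->ports;                                   \
         (_p) < (_sw)->ports + (_sw)->num_ports; (_p)++)        \
        if ((_p)->bridge_dev == (_br))

/* Build a bitmap of member ports, in the spirit of dsa_bridge_ports(). */
static unsigned long mock_bridge_ports(struct mock_switch *sw, const void *br)
{
    struct mock_port *p;
    unsigned long mask = 0;

    mock_for_each_bridge_member(p, sw, br)
        mask |= 1UL << p->index;
    return mask;
}
```

In this sketch a driver would call mock_bridge_ports() again whenever a port joins or leaves a bridge, so no membership cache has to be kept in driver-private data. Callers are expected to pass a non-NULL bridge, since NULL marks standalone ports here.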

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/e7136aaa26773f39e805a00fe4ecf13cd2b83fc0.1775049897.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:30:33 -07:00
Daniel Golle
b0a79590d1 net: dsa: move dsa_bridge_ports() helper to dsa.h
The yt921x driver contains a helper to create a bitmap of ports
which are members of a bridge.

Move the helper as a static inline function into dsa.h, so other drivers
can make use of it as well.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/4f8bbfce3e4e3a02064fc4dc366263136c6e0383.1775049897.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:30:33 -07:00
Laurence Rowe
98f28d8d6e vsock: avoid timeout for non-blocking accept() with empty backlog
A common pattern in epoll network servers is to eagerly accept all
pending connections from the non-blocking listening socket after
epoll_wait indicates the socket is ready, by calling accept in a loop
until EAGAIN is returned, indicating that the backlog is empty.

Scheduling a timeout for a non-blocking accept with an empty backlog
meant AF_VSOCK sockets used by epoll network servers incurred hundreds
of microseconds of additional latency per accept loop compared to
AF_INET or AF_UNIX sockets.
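
For reference, the accept loop described above might look roughly like the following userspace sketch. The function name is hypothetical, and closing each accepted socket right away is a simplification; a real server would register the new descriptor with epoll instead:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Drain the backlog of a non-blocking listening socket.
 * Returns the number of connections accepted, or -1 on an
 * unexpected error.
 */
static int drain_accept_backlog(int listen_fd)
{
    int accepted = 0;

    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);

        if (fd >= 0) {
            close(fd); /* simplification, see the note above */
            accepted++;
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return accepted; /* backlog is empty */
        if (errno == EINTR)
            continue; /* interrupted, retry */
        return -1;
    }
}
```

Every pass through such a loop ends with one accept() call on an empty backlog, which is the call this patch is concerned with.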

Signed-off-by: Laurence Rowe <laurencerowe@gmail.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260402204918.130395-1-laurencerowe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:29:01 -07:00
Daniel Zahka
c8eee00c0f psp: add missing device stats to get-stats reply attributes
Commit f05d26198c ("psp: add stats from psp spec to driver facing
api") added device statistics (rx-packets, rx-bytes, rx-auth-fail,
rx-error, rx-bad, tx-packets, tx-bytes, tx-error) to the stats
attribute-set but did not add them to the get-stats operation reply
attributes. The kernel reports these attributes in the reply, so
list them in the spec to match.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260403-psp-yaml-fix-v1-1-dacee0663903@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:12:34 -07:00