Commit Graph

1384564 Commits

Author SHA1 Message Date
Breno Leitao
b34df17d58 net: netpoll: remove unused netpoll pointer from netpoll_info
The netpoll_info structure contains an useless pointer back to its
associated netpoll. This field is never used, and the assignment in
__netpoll_setup() is does not comtemplate multiple instances, as
reported by Jay[1].

Drop both the member and its initialization to simplify the structure.

Link: https://lore.kernel.org/all/2930648.1757463506@famine/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250918-netpoll_jv-v1-1-67d50eeb2c26@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:50:59 -07:00
Jakub Kicinski
4d3c5db44c Merge branch 'net-ipv4-some-drop-reason-cleanup-and-improvements'
Antoine Tenart says:

====================
net: ipv4: some drop reason cleanup and improvements

A few patches that were laying around cleaning up and improving drop
reasons in net/ipv4.
====================

Link: https://patch.msgid.link/20250915091958.15382-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:35:54 -07:00
Antoine Tenart
9e1e2f4ebf net: ipv4: convert ip_rcv_options to drop reasons
This converts the only path not returning drop reasons in
ip_rcv_finish_core.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250915091958.15382-4-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:35:52 -07:00
Antoine Tenart
dcc0e68ed3 net: ipv4: simplify drop reason handling in ip_rcv_finish_core
Instead of setting the drop reason to SKB_DROP_REASON_NOT_SPECIFIED
early and having to reset it each time it is overridden by a function
returned value, just set the drop reason to the expected value before
returning from ip_rcv_finish_core.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250915091958.15382-3-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:35:51 -07:00
Antoine Tenart
1c7e4a6185 net: ipv4: make udp_v4_early_demux explicitly return drop reason
udp_v4_early_demux already returns drop reasons as it either returns 0
or ip_mc_validate_source, which itself returns drop reasons. Its return
value is also already used as a drop reason itself.

Makes this explicit by making it return drop reasons.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250915091958.15382-2-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:35:51 -07:00
Alasdair McWilliam
b73b8146d7 rtnetlink: add needed_{head,tail}room attributes
Various network interface types make use of needed_{head,tail}room values
to efficiently reserve buffer space for additional encapsulation headers,
such as VXLAN, Geneve, IPSec, etc. However, it is not currently possible
to query these values in a generic way.

Introduce ability to query the needed_{head,tail}room values of a network
device via rtnetlink, such that applications that may wish to use these
values can do so.

For example, Cilium agent iterates over present devices based on user config
(direct routing, vxlan, geneve, wireguard etc.) and in future will configure
netkit in order to expose the needed_{head,tail}room into K8s pods. See
b9ed315d3c ("netkit: Allow for configuring needed_{head,tail}room").

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alasdair McWilliam <alasdair@mcwilliam.dev>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20250917095543.14039-1-alasdair@mcwilliam.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:21:55 -07:00
Jakub Kicinski
0c2a4d304c Merge branch 'net-stmmac-remove-mac_interface'
Russell King says:

====================
net: stmmac: remove mac_interface

The dwmac core supports a range of interfaces, but when it comes to
SerDes interfaces, the core itself does not include the SerDes block.
Consequently, it has to provide an interface suitable to interface such
a block to, and uses TBI for this.

The driver also uses "PCS" for RGMII, even though the dwmac PCS block
is not present for this interface type - it was a convenice for the
code structure as RGMII includes inband signalling of the PHY state,
much like Cisco SGMII does at a high level.

As such, the code refers to RGMII and SGMII modes for the PCS, and
there used to be STMMAC_PCS_TBI and STMMAC_PCS_RTBI constants as well
but these were never set, although they were used in the code.

The selection of the PCS mode was from mac_interface. Thus, it seems
that the original intention was for mac_interface to describe the
interface mode used within the dwmac core, and phy_interface to
describe the external world-accessible interface (e.g. which would
connect to a PHY or SFP cage.)

It appears that many glue drivers misinterpret these. A good exmple
is socfpga. This supports SGMII and 1000BASE-X, but does not include
the dwmac PCS, relying on the Lynx PCS instead. However, it makes use
of mac_interface to configure the dwmac core to its GMII/MII mode.

So, when operating in either of these modes, the dwmac is configured
for GMII mode to communicate with the Lynx PCS which then provides
the SGMII or 1000BASE-X interface mode to the external world.

Given that phy_interface is the external world interface, and
mac_interface is the dwmac core interface, selecting the interface
mode based on mac_interface being 1000BASEX makes no sense.

Moreover, mac_interface is initialised by the stmmac core code. If
the "mac-mode" property is set in DT, this will be used. Otherwise,
it will reflect the "phy-mode" property - meaning that it defaults
to phy_interface. As no in-kernel DT makes reference to a "mac-mode"
property, we can assume that for all in-kernel platforms, these two
interface variables are the same. The exception are those platform
glues which I reviwed and suggested they use phy_interface, setting
mac_interface to PHY_INTERFACE_MODE_NA.

The conclusion to all of this is that mac_interface serves no useful
purpose, and causes confusion as the platform glue code doesn't seem
to know which one should be used.

Thus, let's get rid of mac_interface, which makes all this code more
understandable. It also helps to untangle some of the questions such
as:
- should this be using the interface passed from phylink
- should we set the range of phylink supported interfaces to be
  more than one
- when we get phylink PCS support for the dwmac PCS, should we be
  selecting it based on mac_interface or phy_interface, and how
  should we populate the PCS' supported_interface bitmap.

Having converted socfpga to use phy_interface, this turns out to
feel like "the right way" to do this - convert the external world
"phy_interface" to whatever phy_intf_sel value that the dwmac core
needs to achieve the connection to whatever hardware blocks are
integrated inside the SoC to achieve the requested external world
interface.

As an illustration why - consider that in the case of socfpga, it
_could_ have been implemented such that the dwmac PCS was used for
SGMII, and the Lynx PCS for 1000BASE-X, or vice versa. Only the
platform glue would know which it is.

I will also note that several cores provide their currently configured
interface mode via the ACTPHYIF field of Hardware Feature 0, and thus
can be read back in the platform-independent parts of the driver to
decide whether the internal PCS or the RGMII (or actually SMII) "PCS"
should be used.

This is hot-off-the-press, and has only been build tested. As I have
none of these platforms, this series has not been run-tested, so
please test on your hardware.
====================

Link: https://patch.msgid.link/aMrPpc8oRxqGtVPJ@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:47 -07:00
Russell King (Oracle)
6b0ed6a3a8 net: stmmac: remove mac_interface
mac_interface has served little purpose, and has only caused confusion.
Now that we have cleaned up all platform glue drivers which should not
have been using mac_interface, there are no users remaining. Remove
mac_interface.

This results in the special dwmac specific "mac-mode" DT property
becoming redundant, and an in case, no DTS files in the kernel make use
of this property. Add a warning if the property is set, and it is
different from the "phy-mode".

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Link: https://patch.msgid.link/E1uytpv-00000006H2x-196h@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:45 -07:00
Russell King (Oracle)
3a94ecdf1a net: stmmac: thead: convert to use phy_interface
dwmac-thead supports either MII or RGMII interface modes only.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-thead to use phy_interface when determining the
interface mode rather than mac_interface.

Also convert the error prints to use phy_modes() so that we get a
meaningful string rather than a number for the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpq-00000006H2q-0ajY@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:45 -07:00
Russell King (Oracle)
0fe080fa88 net: stmmac: sun8i: convert to use phy_interface
dwmac-sun8i supports MII, RMII and RGMII interface modes only. It
is unclear whether the dwmac core interface is different from the
one presented to the outside world.

However, as none of the DTS files set "mac-mode", mac_interface will
be identical to phy_interface.

Convert dwmac-sun8i to use phy_interface when determining the
interface mode rather than mac_interface.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Link: https://patch.msgid.link/E1uytpl-00000006H2k-08pH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:45 -07:00
Russell King (Oracle)
0ca60c26f6 net: stmmac: stm32: convert to use phy_interface
dwmac-stm32 supports MII, RMII, GMII and RGMII interface modes,
selecting the dwmac core interface mode via bits 23:21 of the
SYSCFG register. The bit combinations are identical to the
dwmac phy_intf_sel_i signals.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-stm32 to use phy_interface when determining the
interface mode rather than mac_interface.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpf-00000006H2c-3hiU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:45 -07:00
Russell King (Oracle)
6cb2b69c34 net: stmmac: starfive: convert to use phy_interface
dwmac-starfive uses RMII or RGMII interface modes without any PCS,
and selects the dwmac core accordingly using a register field with
the same bit encoding as the core's phy_intf_sel_i signals.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-starfive to use phy_interface when determining the
interface mode rather than mac_interface. Also convert the error
prints to use phy_modes() so that we get a meaningful string rather
than a number for the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpa-00000006H2X-3GWx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:45 -07:00
Russell King (Oracle)
de696c63c1 net: stmmac: socfpga: convert to use phy_interface
dwmac-socfpga uses MII, RMII, GMII, RGMII, SGMII and 1000BASE-X
interface modes, and supports the Lynx PCS. The Lynx PCS will only be
used for SGMII and 1000BASE-X modes, with the MAC programmed to use
GMII or MII mode to talk to the PCS. This suggests that the Synopsys
optional dwmac PCS is not present.

None of the DTS files set "mac-mode", so mac_interface will be
identical to phy_interface.

Convert dwmac-socfpga to use phy_interface when determining the
interface mode rather than mac_interface.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpV-00000006H2R-2nA6@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:44 -07:00
Russell King (Oracle)
9ff682b4a2 net: stmmac: ingenic: convert to use phy_interface
dwmac-ingenic uses only MII, RMII, GMII or RGMII interface modes, none
of which require any kind of conversion between the MAC and external
world. Thus, mac_interface and phy_interface will be the same.

Convert dwmac-ingenic to use phy_interface when determining the
interface mode that the dwmac core should be configured to at reset,
rather than mac_interface.

Also convert the error prints to use phy_modes() and terminate with a
newline so that we get a human readable string rather than a number for
the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpQ-00000006H2L-2Jzb@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:44 -07:00
Russell King (Oracle)
db1948da68 net: stmmac: imx: convert to use phy_interface
Checking the IMX8MP documentation, there is no requirement for a
separate mac_interface mode definition. As mac_interface and
phy_interface will be the same, use phy_interface internally rather
than mac_interface.

Also convert the error prints to use phy_modes() so that we get a
meaningful string rather than a number for the interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpL-00000006H2F-1o6b@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:44 -07:00
Russell King (Oracle)
0522f152a2 net: stmmac: use phy_interface in stmmac_check_pcs_mode()
In the majority, if not all cases, mac_interface and phy_interface
are the same with the exception of some drivers that I have suggested
only use phy_interface and set mac_interface to PHY_INTERFACE_MODE_NA.

The only two that currently set mac_interface to PHY_INTERFACE_MODE_NA
are dwmac-loongson and dwmac-lpc18xx, neither of which use RGMII nor
SGMII.

In order to phase out the use of mac_interface, we need to have a path
for existing drivers so they can update to only using phy_interface
without causing regressions.

Therefore, in order to keep the "pcs" code working, we need to choose
the STMMAC integrated PCS mode based on phy_interface if mac_interface
is PHY_INTERFACE_MODE_NA.

This will allow more drivers to set mac_interface to
PHY_INTERFACE_MODE_NA without risking regressions.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpG-00000006H29-1Ltk@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:44 -07:00
Russell King (Oracle)
32a8d2a197 net: stmmac: rework mac_interface and phy_interface documentation
Based on new research, it has come to light that the comment that I
added in a014c35556 ("net: stmmac: clarify difference between
"interface" and "phy_interface"") is not fully correct.

Update the comment to properly describe the difference between the two.

All of the DTS files in the kernel tree do not mention the "mac-mode"
property, which results in mac_interface and phy_interface being the
same. Also, none of the platform glue drivers set mac_interface to
anything but PHY_INTERFACE_MODE_NA. This means that for all the
platforms known to mainline, mac_interface is either the same as
phy_interface, or it is PHY_INTERFACE_MODE_NA.

Thus, updating the definition for mac_interface in stmmac.h has no
material effect on current uses known to mainline, but the change opens
the door to cleaning up all uses.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uytpB-00000006H23-0pRi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:19:44 -07:00
Vlad Dumitrescu
6a46e4faa8 net/mlx5: Remove dead code from total_vfs setter
The mlx5_devlink_total_vfs_set function branches based on per_pf_support
twice. Remove the second branch as the first one exits the function when
per_pf_support is false.

Accidentally added as part of commit a4c49611cf ("net/mlx5: Implement
devlink total_vfs parameter").

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-rdma/aMQWenzpdjhAX4fm@stanley.mountain/
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/a6142a60-1948-439a-b0ae-ff1df26a37f8@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:03:50 -07:00
Daniel Zahka
85c7333c35 psp: clarify checksum behavior of psp_dev_rcv()
psp_dev_rcv() decapsulates psp headers from a received frame. This
will make any csum complete computed by the device inaccurate. Rather
than attempt to patch up skb->csum in psp_dev_rcv() just make it clear
to callers what they can expect regarding checksum complete.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918212723.17495-1-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:02:57 -07:00
Kuniyuki Iwashima
f1bf77491d psp: Fix typo in kdoc for struct psp_dev_caps.assoc_drv_spc.
assoc_drv_spc is the size of psp_assoc.drv_data[].

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250918192539.1587586-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:02:27 -07:00
Jakub Kicinski
c3bef01f0a net: phy: micrel: use %pe in print format
New cocci check complains:

  drivers/net/phy/micrel.c:4308:6-13: WARNING: Consider using %pe to print PTR_ERR()
  drivers/net/phy/micrel.c:5742:6-13: WARNING: Consider using %pe to print PTR_ERR()

Link: https://lore.kernel.org/1758192227-701925-1-git-send-email-tariqt@nvidia.com
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250918183119.2396019-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:01:57 -07:00
Jakub Kicinski
d373176425 Merge branch 'address-miscellaneous-issues-with-psp_sk_get_assoc_rcu'
Daniel Zahka says:

====================
address miscellaneous issues with psp_sk_get_assoc_rcu()

There were a few minor issues with psp_sk_get_assoc_rcu() identified
by Eric in his review of the initial psp series. This series addresses
them.
====================

Link: https://patch.msgid.link/20250918155205.2197603-1-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:01:22 -07:00
Daniel Zahka
28bb24dadd psp: don't use flags for checking sk_state
Using flags to check sk_state only makes sense to check for a subset
of states in parallel e.g. sk_fullsock(). We are not doing that
here. Compare for individual states directly.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918155205.2197603-4-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:01:20 -07:00
Daniel Zahka
803cdb6ddc psp: fix preemptive inet_twsk() cast in psp_sk_get_assoc_rcu()
It is weird to cast to a timewait_sock before checking sk_state, even
if the use is after such a check. Remove the tw local variable, and
use inet_twsk() directly in the timewait branch.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918155205.2197603-3-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:01:20 -07:00
Daniel Zahka
f8d2f8205b psp: make struct sock argument const in psp_sk_get_assoc_rcu()
This function does not need a mutable reference to its argument.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250918155205.2197603-2-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:01:20 -07:00
Eric Dumazet
b02c123010 tcp: prefer sk_skb_reason_drop()
Replace two calls to kfree_skb_reason() with sk_skb_reason_drop().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250918132007.325299-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:00:23 -07:00
Vadim Fedorenko
d3ca2ef0c9 ptp_ocp: make ptp_ocp driver compatible with PTP_EXTTS_REQUEST2
Originally ptp_ocp driver was not strictly checking flags for external
timestamper and was always activating rising edge timestamping as it's
the only supported mode. Recent changes to ptp made it incompatible with
PTP_EXTTS_REQUEST2 ioctl. Adjust ptp_clock_info to provide supported
mode and be compatible with new infra.

While at here remove explicit check of periodic output flags from the
driver and provide supported flags for ptp core to check.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250918131146.651468-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 17:00:06 -07:00
Dan Carpenter
5fc7fa743d net: ti: icssm-prueth: unwind cleanly in probe()
This error handling triggers a Smatch warning:

    drivers/net/ethernet/ti/icssm/icssm_prueth.c:1574 icssm_prueth_probe()
    warn: 'prueth->pru1' is an error pointer or valid

The warning is harmless because the pru_rproc_put() function has an
IS_ERR_OR_NULL() check built in.  However, there is a small bug if
syscon_regmap_lookup_by_phandle() fails.  In that case we should call
of_node_put() on eth0_node and eth1_node.

It's a little bit easier to re-write this code to only free things which
we know have been allocated successfully.

Fixes: 511f6c1ae0 ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://patch.msgid.link/aMvVagz8aBRxMvFn@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 16:58:25 -07:00
Jakub Kicinski
b1e5dfa6d8 Merge branch 'net-mlx5e-support-rss-for-ipsec-offload'
Tariq Toukan says:

====================
net/mlx5e: Support RSS for IPSec offload

The series by Jianbo uses a new firmware feature to identify the inner
protocol of decrypted packets, adding new flow groups and steering rules
to redirect them for proper L4-based RSS. This ensures traffic is spread
across multiple CPU cores.
====================

Link: https://patch.msgid.link/1758179963-649455-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 16:48:40 -07:00
Jianbo Liu
72ed3ebf95 net/mlx5e: Add flow rules for the decrypted ESP packets
The previous commit introduced two new flow groups to enable L4 RSS
for decrypted IPsec traffic. This commit implements the logic to
populate these groups with the necessary steering rules.

The rules are created dynamically whenever the first IPSec offload
rule is configured via the xfrm subsystem and the decryption tables
for RX are created. Each rule matches a specific decrypted traffic
type based on its ip version (or ethertype) and outer/inner
l4_type_ext, directing it to the appropriate L4 RSS-enabled TIR.

The lifecycle of these steering rules is tied directly to the RX
tables. They are deleted when the RX tables are destroyed.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 16:48:37 -07:00
Jianbo Liu
d8693cac22 net/mlx5e: Add flow groups for the packets decrypted by crypto offload
When using IPsec crypto offload, the hardware decrypts the packet
payload but preserves the ESP header. This prevents the standard RSS
mechanism from accessing the inner L4 (TCP/UDP) headers. As a result,
the RSS hash is calculated only on the outer L3 IP headers, causing
all traffic for a given IPsec tunnel to be directed to a single queue,
leading to poor traffic distribution.

Newer firmware introduces the ability to match on l4_type_ext, which
exposes the L4 protocol type following an ESP header. This allows the
driver to create steering rules that can identify the inner protocols
of decrypted packets.

This commit leverages this new capability to improve traffic
distribution. It adds two new flow groups to steer decrypted packets
to dedicated TIRs that was configured to perform RSS on the inner L4
headers.

These groups are inserted after the standard L4 group and before the
group that handles undecrypted ESP packets added in this series. The
first new group matches decrypted packets based on the outer IP
version (or ethertype) and l4_type_ext. The second new group matches
decrypted tunneled packets based on the inner IP version and
l4_type_ext. Eight new traffic types are also defined to support this
functionality.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 16:48:36 -07:00
Jianbo Liu
c69ac57199 net/mlx5e: Recirculate decrypted packets into TTC table
In the commit 5e46634529 ("net/mlx5e: IPsec: Add IPsec steering in
local NIC RX"), the decrypted packets are handled in RX error flow
table. There is only one rule in the table, which forwards packets to
the default ESP TIR.

This patch updates the design to allow RSS after decryption. For ESP
traffic, SPI and IP addresses are the fields selected for RSS hash,
and it's common that only one SPI is configured in RX direction, so
RSS can't work properly as all the packets are hashed to one key in
this case. To take advantage of RSS and improve performance, the
decrypted packets need to be forwarded back to TTC table, where RSS
can work based on the decrypted packet types.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 16:48:36 -07:00
Jianbo Liu
9f24f0c4d4 net/mlx5: Change TTC rules to match on undecrypted ESP packets
The TTC (Traffic Type Classifier) table classifies the traffic and
steers packet to TIRs, where RSS works based on the hash calculated
from the selected packet fields. For AH/ESP packets, SPI and IP
addresses are the fields used to calculate the hash value for RSS. So,
it's hard to distribute packets to different receiving queues as there
is usually only one SPI in that direction.

IPSec hardware offloads, crypto offload and full (packet) offload were
introduced later. For crypto offload, hardware does encryption,
decryption and authentication, kernel does the others. Kernel always
sends/receives formatted ESP packets with plaintext data instead of
the ciphertext data, all other fields are unmodified. For full
offload, hardware will take care of almost everything, kernel just
sends/receives packets without any IPSec headers.

Currently, all packets with ESP protocols are forwarded to IPSec
offload tables if IPSec rules are configured. In a downstream patch,
the decrypted packets will be recirculated to TTC table, in order to
use RSS, which does the hash on L4 fields after IPSec headers are
stripped by full offload. So those packets handled by crypto offload
must filtered out, as they still have the ESP headers, but apparently
no need to be decrypted again. To do that, ipsec_next_header is added
for the packet matching, as it is valid only after passing through
IPSec decryption.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758179963-649455-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 16:48:36 -07:00
Jakub Kicinski
3fb4f35a75 wan: framer: pef2256: use %pe in print format
New cocci check complains:

  drivers/net/wan/framer/pef2256/pef2256.c:733:3-10: WARNING: Consider using %pe to print PTR_ERR()

Link: https://lore.kernel.org/1758192227-701925-1-git-send-email-tariqt@nvidia.com
Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://patch.msgid.link/20250918134637.2226614-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 07:27:24 -07:00
Lorenzo Bianconi
e156dd6b85 net: airoha: Fix PPE_IP_PROTO_CHK register definitions
Fix typo in PPE_IP_PROTO_CHK_IPV4_MASK and PPE_IP_PROTO_CHK_IPV6_MASK
register mask definitions. This is not a real problem since this
register is not actually used in the current codebase.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 07:09:18 -07:00
ChunHao Lin
bf7154ffb1 r8169: set EEE speed down ratio to 1
EEE speed down means speed down MAC MCU clock. It is not from spec.
It is kind of Realtek specific power saving feature. But enable it
may cause some issues, like packet drop or interrupt loss. Different
hardware may have different issues.

EEE speed down ratio (mac ocp 0xe056[7:4]) is used to set EEE speed
down rate. The larger this value is, the more power can save. But it
actually save less power then we expected. And, as mentioned above,
will impact compatibility. So set it to 1 (mac ocp 0xe056[7:4] = 0)
, which means not to speed down, to improve compatibility.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250918023425.3463-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 07:07:17 -07:00
Heiner Kallweit
a346e48c17 net: dsa: dsa_loop: remove duplicated definition of NUM_FIXED_PHYS
Remove duplicated definition of NUM_FIXED_PHYS. This was a leftover from
41357bc7b9 ("net: dsa: dsa_loop: remove usage of mdio_board_info").

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/67a3b7df-c967-4431-86b6-a836dc46a4ef@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 07:06:42 -07:00
Matthieu Baerts (NGI0)
833d4313bc mptcp: reset blackhole on success with non-loopback ifaces
When a first MPTCP connection gets successfully established after a
blackhole period, 'active_disable_times' was supposed to be reset when
this connection was done via any non-loopback interfaces.

Unfortunately, the opposite condition was checked: only reset when the
connection was established via a loopback interface. Fixing this by
simply looking at the opposite.

This is similar to what is done with TCP FastOpen, see
tcp_fastopen_active_disable_ofo_check().

This patch is a follow-up of a previous discussion linked to commit
893c49a78d ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in
mptcp_active_enable()."), see [1].

Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/4209a283-8822-47bd-95b7-87e96d9b7ea3@kernel.org [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250918-net-next-mptcp-blackhole-reset-loopback-v1-1-bf5818326639@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 07:06:19 -07:00
Dan Carpenter
c4bdef8b3d hinic3: Fix NULL vs IS_ERR() check in hinic3_alloc_rxqs_res()
The page_pool_create() function never returns NULL, it returns
error pointers.  Update the check to match.

Fixes: 73f37a7e19 ("hinic3: Queue pair resource initialization")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/aMvUywhgbmO1kH3Z@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 07:05:51 -07:00
Eric Dumazet
17f1b7711e psp: do not use sk_dst_get() in psp_dev_get_for_sock()
Use __sk_dst_get() and dst_dev_rcu(), because dst->dev could
be changed under us.

Fixes: 6b46ca260e ("net: psp: add socket security association code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250918115238.237475-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19 07:05:26 -07:00
Daniel Machon
315f423be0 net: sparx5/lan969x: Add support for ethtool pause parameters
Implement get_pauseparam() and set_pauseparam() ethtool operations for
Sparx5 ports.  This allows users to query and configure IEEE 802.3x
pause frame settings via:

ethtool -a ethX
ethtool -A ethX rx on|off tx on|off autoneg on|off

The driver delegates pause parameter handling to phylink through
phylink_ethtool_get_pauseparam() and phylink_ethtool_set_pauseparam().

The underlying configuration of pause frame generation and reception is
already implemented in the driver; this patch only wires it up to the
standard ethtool interface, making the feature accessible to userspace.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250917-802-3x-pause-v1-1-3d1565a68a96@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 15:55:54 -07:00
Robert Marko
6287982aa5 net: ethernet: microchip: sparx5: make it selectable for ARCH_LAN969X
LAN969x switchdev support depends on the SparX-5 core,so make it selectable
for ARCH_LAN969X.

Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20250917110106.55219-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 15:53:47 -07:00
Horatiu Vultur
5a26346e62 net: phy: micrel: Add Fast link failure support for lan8842
Add support for fast link failure for lan8842, when this is enabled the
PHY will detect link down immediately (~1ms). The disadvantage of this
is that also small instability might be reported as link down.
Therefore add this feature as a tunable configuration and the user will
know when to enable or not. By default it is not enabled.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250917104630.3931969-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 15:50:28 -07:00
Jakub Kicinski
38c5b9c38b Merge tag 'mlx5-next-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:

====================
mlx5-next updates 2025-09-17

This series by Carolina contains cleanups significantly touching shared
mlx5 net and rdma headers.

* tag 'mlx5-next-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5e: Prevent WQE metadata conflicts between timestamping and offloads
  net/mlx5: Refactor MACsec WQE metadata shifts
  net/mlx5: Remove VLAN insertion fields from WQE Ether segment
====================

Link: https://patch.msgid.link/1757574619-604874-1-git-send-email-tariqt@nvidia.com
Link: https://patch.msgid.link/1758104780-642426-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 15:47:35 -07:00
Oleksij Rempel
60f887b129 net: phy: clear link parameters on admin link down
When a PHY is halted (e.g. `ip link set dev lan2 down`), several
fields in struct phy_device may still reflect the last active
connection. This leads to ethtool showing stale values even though
the link is down.

Reset selected fields in _phy_state_machine() when transitioning
to PHY_HALTED and the link was previously up:

- speed/duplex -> UNKNOWN, but only in autoneg mode (in forced mode
  these fields carry configuration, not status)
- master_slave_state -> UNKNOWN if previously supported
- mdix -> INVALID (state only, same meaning as "unknown")
- lp_advertising -> always cleared

The cleanup is skipped if the PHY is in PHY_ERROR state, so the
last values remain available for diagnostics.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250917094751.2101285-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 15:43:26 -07:00
Vishnu Singh
97248adb5a net: ti: am65-cpsw: Update hw timestamping filter for PTPv1 RX packets
CPTS module of CPSW supports hardware timestamping of PTPv1 packets.Update
the "hwtstamp_rx_filters" of CPSW driver to enable timestamping of received
PTPv1 packets. Also update the advertised capability to include PTPv1.

Signed-off-by: Vishnu Singh <v-singh1@ti.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20250917041455.1815579-1-v-singh1@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 15:42:28 -07:00
Jakub Kicinski
f2cdc4c22b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc7).

No conflicts.

Adjacent changes:

drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
  9536fbe10c ("net/mlx5e: Add PSP steering in local NIC RX")
  7601a0a462 ("net/mlx5e: Add a miss level for ipsec crypto offload")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 11:26:06 -07:00
Linus Torvalds
cbf658dd09 Merge tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless. No known regressions at this point.

  Current release - fix to a fix:

   - eth: Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"

   - wifi: iwlwifi: pcie: fix byte count table for 7000/8000 devices

   - net: clear sk->sk_ino in sk_set_socket(sk, NULL), fix CRIU

  Previous releases - regressions:

   - bonding: set random address only when slaves already exist

   - rxrpc: fix untrusted unsigned subtract

   - eth:
       - ice: fix Rx page leak on multi-buffer frames
       - mlx5: don't return mlx5_link_info table when speed is unknown

  Previous releases - always broken:

   - tls: make sure to abort the stream if headers are bogus

   - tcp: fix null-deref when using TCP-AO with TCP_REPAIR

   - dpll: fix skipping last entry in clock quality level reporting

   - eth: qed: don't collect too many protection override GRC elements,
     fix memory corruption"

* tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
  cnic: Fix use-after-free bugs in cnic_delete_task
  devlink rate: Remove unnecessary 'static' from a couple places
  MAINTAINERS: update sundance entry
  net: liquidio: fix overflow in octeon_init_instr_queue()
  net: clear sk->sk_ino in sk_set_socket(sk, NULL)
  Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
  selftests: tls: test skb copy under mem pressure and OOB
  tls: make sure to abort the stream if headers are bogus
  selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
  tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
  octeon_ep: fix VF MAC address lifecycle handling
  selftests: bonding: add vlan over bond testing
  bonding: don't set oif to bond dev when getting NS target destination
  net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
  net/mlx5e: Add a miss level for ipsec crypto offload
  net/mlx5e: Harden uplink netdev access against device unbind
  MAINTAINERS: make the DPLL entry cover drivers
  doc/netlink: Fix typos in operation attributes
  igc: don't fail igc_probe() on LED setup error
  ...
2025-09-18 10:22:02 -07:00
Linus Torvalds
86cc796e5e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "These are mostly Oliver's Arm changes: lock ordering fixes for the
  vGIC, and reverts for a buggy attempt to avoid RCU stalls on large
  VMs.

  Arm:

   - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
     visiting from an MMU notifier

   - Fixes to the TLB match process and TLB invalidation range for
     managing the VCNR pseudo-TLB

   - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
     values in PMSCR_EL1

   - Fix save/restore of host MDCR_EL2 to account for eagerly
     programming at vcpu_load() on VHE systems

   - Correct lock ordering when dealing with VGIC LPIs, avoiding
     scenarios where an xarray's spinlock was nested with a *raw*
     spinlock

   - Permit stage-2 read permission aborts which are possible in the
     case of NV depending on the guest hypervisor's stage-2 translation

   - Call raw_spin_unlock() instead of the internal spinlock API

   - Fix parameter ordering when assigning VBAR_EL1

   - Reverted a couple of fixes for RCU stalls when destroying a stage-2
     page table.

     There appears to be some nasty refcounting / UAF issues lurking in
     those patches and the band-aid we tried to apply didn't hold.

  s390:

   - mm fixes, including userfaultfd bug fix

  x86:

   - Sync the vTPR from the local APIC to the VMCB even when AVIC is
     active.

     This fixes a bug where host updates to the vTPR, e.g. via
     KVM_SET_LAPIC or emulation of a guest access, are lost and result
     in interrupt delivery issues in the guest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
  Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()"
  Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables"
  KVM: arm64: vgic: fix incorrect spinlock API usage
  KVM: arm64: Remove stage 2 read fault check
  KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment
  KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
  KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
  KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
  KVM: arm64: Spin off release helper from vgic_put_irq()
  KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
  KVM: arm64: vgic: Drop stale comment on IRQ active state
  KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly
  KVM: arm64: Initialize PMSCR_EL1 when in VHE
  KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
  KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
  KVM: s390: Fix incorrect usage of mmu_notifier_register()
  KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
  KVM: arm64: Mark freed S2 MMUs as invalid
2025-09-18 09:42:55 -07:00
Linus Torvalds
604530cd9a Merge tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes and new HW support:

   - amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list

   - amd/pmf: Support new ACPI ID AMDI0108

   - asus-wmi: Re-add extra keys to ignore_key_wlan quirk

   - oxpec: Add support for AOKZOE A1X and OneXPlayer X1Pro EVA-02"

* tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
  platform/x86/amd/pmf: Support new ACPI ID AMDI0108
  platform/x86: oxpec: Add support for AOKZOE A1X
  platform/x86: oxpec: Add support for OneXPlayer X1Pro EVA-02
  platform/x86/amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
2025-09-18 09:22:34 -07:00