Commit Graph

1397413 Commits

Author SHA1 Message Date
Maxime Chevallier
792000fbcd net: stmmac: Move subsecond increment configuration in dedicated helper
In preparation for fine/coarse support, let's move the subsecond increment
and addend configuration in a dedicated helper.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20251024070720.71174-2-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 15:34:34 +01:00
Paolo Abeni
d7d5eca4de Merge branch 'net-macb-eyeq5-support'
says:

====================
net: macb: EyeQ5 support

This series' goal is adding support to the MACB driver for EyeQ5 GEM.
The specifics for this compatible are:

 - HW cannot add dummy bytes at the start of IP packets for alignment
   purposes. The behavior can be detected using DCFG6 so it isn't
   attached to compatible data.

 - The hardware LSO/TSO is known to be buggy: add a compatible
   capability flag to force disable it.

 - At init, we have to wiggle two syscon registers that configure the
   PHY integration.

   In past attempts [0] we did it in macb_config->init() using a syscon
   regmap. That was far from ideal so now a generic PHY driver
   abstracts that away. We reuse the bp->sgmii_phy field used by some
   compatibles.

   We have to add a phy_set_mode() call as the PHY power on sequence
   depends on whether we do RGMII or SGMII.

[0]: https://lore.kernel.org/lkml/20250627-macb-v2-15-ff8207d0bb77@bootlin.com/

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
Changes in v3:
- Drop Fixes: trailer on [2/5]. We don't fix any platform using the
  driver currently.
- Improve [5/5] commit message; add info about how an unconditional
  phy_set_mode_ext() won't break existing platforms.
- Hardbreak 82 characters line in [2/5]; warning by patchwork.
- Trailers:
  - 1x Acked-by: Conor Dooley on [1/5].
  - 2x Reviewed-by: Andrew Lunn on [1/5] and [4/5].
  - 2x Reviewed-by: Maxime Chevallier on [4/5] and [5/5].
- Link to v2: https://lore.kernel.org/r/20251022-macb-eyeq5-v2-0-7c140abb0581@bootlin.com

Changes in v2:
- Drop non net-next patches.
- Re-run get_maintainers.pl to shorten the To/Cc list.
- Rebase upon latest net-next; no changes. Tested on HW.
- Link to v1: https://lore.kernel.org/r/20251021-macb-eyeq5-v1-0-3b0b5a9d2f85@bootlin.com

Past versions of the MACB EyeQ5 patches:
 - March 2025: [PATCH net-next 00/13] Support the Cadence MACB/GEM
   instances on Mobileye EyeQ5 SoCs
   https://lore.kernel.org/lkml/20250321-macb-v1-0-537b7e37971d@bootlin.com/
 - June 2025: [PATCH net-next v2 00/18] Support the Cadence MACB/GEM
   instances on Mobileye EyeQ5 SoCs
   https://lore.kernel.org/lkml/20250627-macb-v2-0-ff8207d0bb77@bootlin.com/
 - August 2025: [PATCH net v3 00/16] net: macb: various fixes & cleanup
   https://lore.kernel.org/lkml/20250808-macb-fixes-v3-0-08f1fcb5179f@bootlin.com/

---
Théo Lebrun (5):
      dt-bindings: net: cdns,macb: add Mobileye EyeQ5 ethernet interface
      net: macb: match skb_reserve(skb, NET_IP_ALIGN) with HW alignment
      net: macb: add no LSO capability (MACB_CAPS_NO_LSO)
      net: macb: rename bp->sgmii_phy field to bp->phy
      net: macb: Add "mobileye,eyeq5-gem" compatible

 .../devicetree/bindings/net/cdns,macb.yaml         | 10 +++
 drivers/net/ethernet/cadence/macb.h                |  6 +-
 drivers/net/ethernet/cadence/macb_main.c           | 94 +++++++++++++++++-----
 3 files changed, 91 insertions(+), 19 deletions(-)
---
base-commit: 61b7ade9ba
change-id: 20251020-macb-eyeq5-fe2c0d1edc75

Best regards,
====================

Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-0-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 15:17:56 +01:00
Théo Lebrun
48cf0be9b9 net: macb: Add "mobileye,eyeq5-gem" compatible
Add support for the two GEM instances inside Mobileye EyeQ5 SoCs, using
compatible "mobileye,eyeq5-gem". With it, add a custom init sequence
that must grab a generic PHY and initialise it.

We use bp->phy in both RGMII and SGMII cases. Tell our mode by adding a
phy_set_mode_ext() during macb_open(), before phy_power_on(). We are
the first users of bp->phy that use it in non-SGMII cases.

The phy_set_mode_ext() call is made unconditionally. It cannot cause
issues on platforms where !bp->phy or !bp->phy->ops->set_mode as, in
those cases, the call is a no-op (returning zero). From reading
upstream DTS, we can figure out that no platform has a bp->phy and a
PHY driver that has a .set_mode() implementation:
 - cdns,zynqmp-gem: no DTS upstream.
 - microchip,mpfs-macb: microchip/mpfs.dtsi, &mac0..1, no PHY attached.
 - xlnx,versal-gem: xilinx/versal-net.dtsi, &gem0..1, no PHY attached.
 - xlnx,zynqmp-gem: xilinx/zynqmp.dtsi, &gem0..3, PHY attached to
   drivers/phy/xilinx/phy-zynqmp.c which has no .set_mode().

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-5-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 15:17:54 +01:00
Théo Lebrun
3f7e51cd5f net: macb: rename bp->sgmii_phy field to bp->phy
The bp->sgmii_phy field is initialised at probe by init_reset_optional()
if bp->phy_interface == PHY_INTERFACE_MODE_SGMII. It gets used by:
 - zynqmp_config: "cdns,zynqmp-gem" or "xlnx,zynqmp-gem" compatibles.
 - mpfs_config: "microchip,mpfs-macb" compatible.
 - versal_config: "xlnx,versal-gem" compatible.

Make name more generic as EyeQ5 requires the PHY in SGMII & RGMII cases.

Drop "for ZynqMP SGMII mode" comment that is already a lie, as it gets
used on Microchip platforms as well. And soon it won't be SGMII-only.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-4-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 15:17:54 +01:00
Théo Lebrun
7a3d209145 net: macb: add no LSO capability (MACB_CAPS_NO_LSO)
LSO is runtime-detected using the PBUF_LSO field inside register DCFG6.
Allow disabling that feature if it is broken by using bp->caps coming
from match data.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-3-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 15:17:54 +01:00
Théo Lebrun
ae7a9585ea net: macb: match skb_reserve(skb, NET_IP_ALIGN) with HW alignment
If HW is RSC capable, it cannot add dummy bytes at the start of IP
packets. Alignment (ie number of dummy bytes) is configured using the
RBOF field inside the NCFGR register.

On the software side, the skb_reserve(skb, NET_IP_ALIGN) call must only
be done if those dummy bytes are added by the hardware; notice the
skb_reserve() is done AFTER writing the address to the device.

We cannot do the skb_reserve() call BEFORE writing the address because
the address field ignores the low 2/3 bits. Conclusion: in some cases,
we risk not being able to respect the NET_IP_ALIGN value (which is
picked based on unaligned CPU access performance).

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-2-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 15:17:54 +01:00
Théo Lebrun
c51aa14be9 dt-bindings: net: cdns,macb: add Mobileye EyeQ5 ethernet interface
Add "cdns,eyeq5-gem" as compatible for the integrated GEM block inside
Mobileye EyeQ5 SoCs. It is different from other compatibles in two main
ways: (1) it requires a generic PHY and (2) it is better to keep TCP
Segmentation Offload (TSO) disabled.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20251023-macb-eyeq5-v3-1-af509422c204@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 15:17:54 +01:00
Alexandra Winter
968822086b dibs: Use subsys_initcall()
In the case of built-in modules, the order of module_init() calls are
derived from the Makefiles.

Use subsys_initcall() for the dibs module, to make sure dibs_init() is
executed before dibs clients like smc and dibs devices like ism are
initialized. So future dibs client or dibs device modules can use
module_init() without the risk of getting the order in the Makefiles wrong.

Reported-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251023150636.3995476-2-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 13:42:31 +01:00
Alexandra Winter
182663bbff dibs: Remove reset of static vars in dibs_init()
'clients' and 'max_client' are static variables and therefore don't need to
be initialized.

Reported-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20251023150636.3995476-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 13:42:31 +01:00
Paolo Abeni
51f322550b Merge branch 'net-mlx5-add-balance-id-support-for-lag-multiplane-groups'
Tariq Toukan says:

====================
net/mlx5: Add balance ID support for LAG multiplane groups

This series adds balance ID support for MLX5 LAG in multiplane
configurations.

See detailed description by Mark below [1].

[1]
The problem: In complex multiplane LAG setups, we need finer control over LAG
groups. Currently, devices with the same system image GUID are treated
identically, but hardware now supports per-multiplane-group balance IDs that
let us differentiate between them. On such systems image system guid
isn't enough to decide which devices should be part of which LAG.

The solution: Extend the system image GUID with a balance ID byte when the
hardware supports it. This gives us the granularity we need without breaking
existing deployments.

What this series does:

1. Clean up some duplicate code while we're here
2. Rework the system image GUID infrastructure to handle variable lengths
3. Update PTP clock pairing to use the new approach
4. Restructure capability setting to make room for the new feature
5. Actually implement the balance ID support

The key insight is in patch 5: we only append the balance ID when both
capabilities are present, so older hardware and software continue to work
exactly as before. For newer setups, you get the extra byte that enables
per-multiplane-group load balancing.

This has been tested with both old and new hardware configurations.
====================

Link: https://patch.msgid.link/1761211020-925651-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 11:11:39 +01:00
Mark Bloch
20d78ead94 net/mlx5: Add balance ID support for LAG multiplane groups
Implement balance ID support for multiplane LAG configurations. This
feature enables per-multiplane group load balancing by extending the
software system image GUID with a balance ID component.

Key implementations:
- Enable lag_per_mp_group capability when supported by hardware.
- Append load_balance_id to software system image GUID when conditions
  are met.
- Increase MLX5_SW_IMAGE_GUID_MAX_BYTES from 8 to 9 to accommodate the
  extra byte.

The balance ID is appended to the system image GUID only when both
load_balance_id and lag_per_mp_group capabilities are available, ensuring
backward compatibility while enabling enhanced LAG functionality.

This enhancement allows for more granular load balancing control in complex
multi-plane LAG deployments, improving network performance and flexibility.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 11:11:27 +01:00
Mark Bloch
075e85a126 net/mlx5: Refactor HCA cap 2 setting
Refactor HCA capability 2 setting logic to be more structured and
conditional. Move the sw_vhca_id_valid setting inside proper conditional
checks and prepare the function for additional capability settings.

The refactoring:
- Always copy current capabilities to set_hca_cap buffer.
- Apply sw_vhca_id_valid setting only when conditions are met.
- Improve code readability and maintainability.

This cleanup prepares the handle_hca_cap_2() function for the upcoming
balance ID capability setting.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 11:11:27 +01:00
Mark Bloch
cd36818c34 net/mlx5: Refactor PTP clock devcom pairing
Refactor PTP clock device component pairing to use the clock identity
buffer instead of casting it to a u64 key. This change leverages the new
software system image GUID infrastructure.

Changes include:
- Pass identity buffer to mlx5_shared_clock_register().
- Use memcpy for identity buffer in devcom matching attributes.
- Remove intermediate u64 key conversion.
- Add BUILD_BUG_ON to ensure identity size fits in match key.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 11:11:27 +01:00
Mark Bloch
7718f2a8b8 net/mlx5: Add software system image GUID infrastructure
Replace direct hardware system image GUID usage with a new software
system image GUID function that supports variable-length identifiers.

Key changes:
- Add mlx5_query_nic_sw_system_image_guid() function with length parameter.
- Update all callsites to use the new function and buffer/length approach.
- Modify mapping contexts to use byte arrays instead of u64 keys.
- Update devcom matching to support variable-length keys.
- Change mlx5_same_hw_devs() to use buffer comparison instead of u64.

This refactoring prepares the infrastructure for balance ID support,
which requires extending the system image GUID with additional data.
The change maintains backward compatibility while enabling future
enhancements.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 11:11:27 +01:00
Mark Bloch
211de28b1c net/mlx5: Use common mlx5_same_hw_devs function
Refactor duplicate hardware device comparison code to use the common
mlx5_same_hw_devs() function instead of reimplementing system GUID
comparison logic in multiple places.

This cleanup eliminates code duplication in:
- Bridge representor device comparison.
- TC hardware device comparison.

Using the centralized function improves maintainability and ensures
consistent behavior across the driver.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1761211020-925651-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 11:11:27 +01:00
Paolo Abeni
0bc4059cc5 Merge branch 'implement-more-features-for-txgbe-devices'
Jiawen Wu says:

====================
Implement more features for txgbe devices

Based on the features of hardware support, implement RX desc merge and
TX head write-back for AML devices, support RSC offload for AML and SP
devices.
====================

Link: https://patch.msgid.link/20251023014538.12644-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 10:44:19 +01:00
Jiawen Wu
eaed177706 net: txgbe: support RSC offload
Support to enable and disable RSC for txgbe devices.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251023014538.12644-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 10:44:17 +01:00
Jiawen Wu
eb57b16d90 net: txgbe: support TX head write-back mode
TX head write-back mode is supported on AML devices. When it is enabled,
the hardware no longer writes the descriptors DD one by one, but write
back pointer of completion descriptor to the head_wb address.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251023014538.12644-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 10:44:17 +01:00
Jiawen Wu
a71e367773 net: txgbe: support RX desc merge mode
RX descriptor merge mode is supported on AML devices. When it is
enabled, the hardware process the RX descriptors in batches.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251023014538.12644-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28 10:44:16 +01:00
Lad Prabhakar
19ab0a22ef dt-bindings: net: phy: vsc8531: Convert to DT schema
Convert VSC8531 Gigabit ethernet phy binding to DT schema format. While
at it add compatible string for VSC8541 PHY which is very much similar
to the VSC8531 PHY and is already supported in the kernel. VSC8541 PHY
is present on the Renesas RZ/T2H EVK.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251025064850.393797-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:21:32 -07:00
Eric Dumazet
0ae1ac7335 tcp: remove one ktime_get() from recvmsg() fast path
Each time some payload is consumed by user space (recvmsg() and friends),
TCP calls tcp_rcv_space_adjust() to run DRS algorithm to check
if an increase of sk->sk_rcvbuf is needed.

This function is based on time sampling, and currently calls
tcp_mstamp_refresh(tp), which is a wrapper around ktime_get_ns().

ktime_get_ns() has a high cost on some platforms.
100+ cycles for rdtscp on AMD EPYC Turin for instance.

We do not have to refresh tp->tcp_mpstamp, using the last cached value
is enough. We only need to refresh it from __tcp_cleanup_rbuf()
if an ACK must be sent (this is a rare event).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251024120707.3516550-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:15:38 -07:00
Yue Haibing
6f147c8328 net/sched: Remove unused typedef psched_tdiff_t
Since commit 051d442098 ("net/sched: Retire CBQ qdisc")
this is not used anymore.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20251024025145.4069583-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:05:54 -07:00
Jakub Kicinski
c5a644d254 Merge branch 'sctp-avoid-redundant-initialisation-in-sctp_accept-and-sctp_do_peeloff'
Kuniyuki Iwashima says:

====================
sctp: Avoid redundant initialisation in sctp_accept() and sctp_do_peeloff().

When sctp_accept() and sctp_do_peeloff() allocates a new socket,
somehow sk_alloc() is used, and the new socket goes through full
initialisation, but most of the fields are overwritten later.

  1)
  sctp_accept()
  |- sctp_v[46]_create_accept_sk()
  |  |- sk_alloc()
  |  |- sock_init_data()
  |  |- sctp_copy_sock()
  |  `- newsk->sk_prot->init() / sctp_init_sock()
  |
  `- sctp_sock_migrate()
     `- sctp_copy_descendant(newsk, oldsk)

  sock_init_data() initialises struct sock, but many fields are
  overwritten by sctp_copy_sock(), which inherits fields of struct
  sock and inet_sock from the parent socket.

  sctp_init_sock() fully initialises struct sctp_sock, but later
  sctp_copy_descendant() inherits most fields from the parent's
  struct sctp_sock by memcpy().

  2)
  sctp_do_peeloff()
  |- sock_create()
  |  |
  |  ...
  |      |- sk_alloc()
  |      |- sock_init_data()
  |  ...
  |    `- newsk->sk_prot->init() / sctp_init_sock()
  |
  |- sctp_copy_sock()
  `- sctp_sock_migrate()
     `- sctp_copy_descendant(newsk, oldsk)

  sock_create() creates a brand new socket, but sctp_copy_sock()
  and sctp_sock_migrate() overwrite most of the fields.

So, sk_alloc(), sock_init_data(), sctp_copy_sock(), and
sctp_copy_descendant() can be replaced with a single function
like sk_clone_lock().

This series does the conversion and removes TODO comment added
by commit 4a997d49d9 ("tcp: Save lock_sock() for memcg in
inet_csk_accept().").

Tested accept() and SCTP_SOCKOPT_PEELOFF and both work properly.

  socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP) = 3
  bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  listen(3, -1)                           = 0
  getsockname(3, {sa_family=AF_INET, sin_port=htons(49460), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
  socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP) = 4
  connect(4, {sa_family=AF_INET, sin_port=htons(49460), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  accept(3, NULL, NULL)                   = 5
  ...
  socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
  bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  listen(3, -1)                           = 0
  getsockname(3, {sa_family=AF_INET, sin_port=htons(48240), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
  socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 4
  connect(4, {sa_family=AF_INET, sin_port=htons(48240), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  getsockopt(3, SOL_SCTP, SCTP_SOCKOPT_PEELOFF, "*\0\0\0\5\0\0\0", [8]) = 5

v1: https://lore.kernel.org/20251021214422.1941691-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20251023231751.4168390-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:05:02 -07:00
Kuniyuki Iwashima
71068e2e1b sctp: Remove sctp_copy_sock() and sctp_copy_descendant().
Now, sctp_accept() and sctp_do_peeloff() use sk_clone(), and
we no longer need sctp_copy_sock() and sctp_copy_descendant().

Let's remove them.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:59 -07:00
Kuniyuki Iwashima
b7ddb55f31 sctp: Use sctp_clone_sock() in sctp_do_peeloff().
sctp_do_peeloff() calls sock_create() to allocate and initialise
struct sock, inet_sock, and sctp_sock, but later sctp_copy_sock()
and sctp_sock_migrate() overwrite most fields.

What sctp_do_peeloff() does is more like accept().

Let's use sock_create_lite() and sctp_clone_sock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:58 -07:00
Kuniyuki Iwashima
c49ed521f1 sctp: Remove sctp_pf.create_accept_sk().
sctp_v[46]_create_accept_sk() are no longer used.

Let's remove sctp_pf.create_accept_sk().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:58 -07:00
Kuniyuki Iwashima
16942cf4d3 sctp: Use sk_clone() in sctp_accept().
sctp_accept() calls sctp_v[46]_create_accept_sk() to allocate a new
socket and calls sctp_sock_migrate() to copy fields from the parent
socket to the new socket.

sctp_v4_create_accept_sk() allocates sk by sk_alloc(), initialises
it by sock_init_data(), and copy a bunch of fields from the parent
socekt by sctp_copy_sock().

sctp_sock_migrate() calls sctp_copy_descendant() to copy most fields
in sctp_sock from the parent socket by memcpy().

These can be simply replaced by sk_clone().

Let's consolidate sctp_v[46]_create_accept_sk() to sctp_clone_sock()
with sk_clone().

We will reuse sctp_clone_sock() for sctp_do_peeloff() and then remove
sctp_copy_descendant().

Note that sock_reset_flag(newsk, SOCK_ZAPPED) is not copied to
sctp_clone_sock() as sctp does not use SOCK_ZAPPED at all.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:58 -07:00
Kuniyuki Iwashima
151b98d10e net: Add sk_clone().
sctp_accept() will use sk_clone_lock(), but it will be called
with the parent socket locked, and sctp_migrate() acquires the
child lock later.

Let's add no lock version of sk_clone_lock().

Note that lockdep complains if we simply use bh_lock_sock_nested().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:57 -07:00
Kuniyuki Iwashima
b7185792f8 sctp: Don't call sk->sk_prot->init() in sctp_v[46]_create_accept_sk().
sctp_accept() calls sctp_v[46]_create_accept_sk() to allocate a new
socket and calls sctp_sock_migrate() to copy fields from the parent
socket to the new socket.

sctp_v[46]_create_accept_sk() calls sctp_init_sock() to initialise
sctp_sock, but most fields are overwritten by sctp_copy_descendant()
called from sctp_sock_migrate().

Things done in sctp_init_sock() but not in sctp_sock_migrate() are
the following:

  1. Copy sk->sk_gso
  2. Copy sk->sk_destruct (sctp_v6_init_sock())
  3. Allocate sctp_sock.ep
  4. Initialise sctp_sock.pd_lobby
  5. Count sk_sockets_allocated_inc(), sock_prot_inuse_add(),
     and SCTP_DBG_OBJCNT_INC()

Let's do these in sctp_copy_sock() and sctp_sock_migrate() and avoid
calling sk->sk_prot->init() in sctp_v[46]_create_accept_sk().

Note that sk->sk_destruct is already copied in sctp_copy_sock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:57 -07:00
Kuniyuki Iwashima
2d4df59aae sctp: Don't copy sk_sndbuf and sk_rcvbuf in sctp_sock_migrate().
sctp_sock_migrate() is called from 2 places.

1) sctp_accept() calls sp->pf->create_accept_sk() before
   sctp_sock_migrate(), and sp->pf->create_accept_sk() calls
   sctp_copy_sock().

2) sctp_do_peeloff() also calls sctp_copy_sock() before
   sctp_sock_migrate().

sctp_copy_sock() copies sk_sndbuf and sk_rcvbuf from the
parent socket.

Let's not copy the two fields in sctp_sock_migrate().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:57 -07:00
Kuniyuki Iwashima
622e8838a2 sctp: Defer SCTP_DBG_OBJCNT_DEC() to sctp_destroy_sock().
SCTP_DBG_OBJCNT_INC() is called only when sctp_init_sock()
returns 0 after successfully allocating sctp_sk(sk)->ep.

OTOH, SCTP_DBG_OBJCNT_DEC() is called in sctp_close().

The code seems to expect that the socket is always exposed
to userspace once SCTP_DBG_OBJCNT_INC() is incremented, but
there is a path where the assumption is not true.

In sctp_accept(), sctp_sock_migrate() could fail after
sctp_init_sock().

Then, sk_common_release() does not call inet_release() nor
sctp_close().  Instead, it calls sk->sk_prot->destroy().

Let's move SCTP_DBG_OBJCNT_DEC() from sctp_close() to
sctp_destroy_sock().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251023231751.4168390-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:56 -07:00
Jakub Kicinski
8b2ee2df6a Merge branch 'convert-net-drivers-to-ndo_hwtstamp-api-part-2'
Vadim Fedorenko says:

====================
convert net drivers to ndo_hwtstamp API part 2

This is part 2 of patchset to convert drivers which support HW
timestamping to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
The new API uses netlink to communicate with user-space and have some
test coverage.
====================

Link: https://patch.msgid.link/20251023220457.3201122-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:39 -07:00
Vadim Fedorenko
329021eeae net: hns3: add hwtstamp_get/hwtstamp_set ops
And .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks to HNS3 framework
to support HW timestamp configuration via netlink and adopt hns3pf to
use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-7-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:36 -07:00
Vadim Fedorenko
87e1b590f7 net: renesas: rswitch: convert to ndo_hwtstamp API
Convert driver to use .ndo_hwtstamp_set()/.ndo_hwtstamp_get() callbacks.
rswitch_eth_ioctl() becomes phy_do_ioctl_running(), remove it and
replace .ndo_eth_ioctl callback with phy_do_ioctl_running().

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-6-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:36 -07:00
Vadim Fedorenko
faac57cddf net: ravb: convert to ndo_hwtstamp API
Convert driver to use .ndo_hwtstamp_set()/.ndo_hwtstamp_get callbacks.
ravb_do_ioctl() becomes pure phy_do_ioctl_running(), remove it and
replace in callbacks.

Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-5-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:36 -07:00
Vadim Fedorenko
38efb0ba3c ionic: convert to ndo_hwtstamp API
Convert driver to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
ionic_eth_ioctl() becomes empty, remove it.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-4-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:36 -07:00
Vadim Fedorenko
7a07dc723f mlx4: convert to ndo_hwtstamp API
Convert driver to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
mlx4_en_ioctl() becomes empty, remove it.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-3-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:36 -07:00
Vadim Fedorenko
a5c12b060e octeontx2: convert to ndo_hwtstamp API
Convert driver to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
otx2_ioctl() becomes empty, remove it.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251023220457.3201122-2-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:04:36 -07:00
Dan Carpenter
05e090620b net: airoha: Fix a copy and paste bug in probe()
This code has a copy and paste bug where it accidentally checks "if (err)"
instead of checking if "xsi_rsts" is NULL.  Also, as a free bonus, I
changed the allocation from kzalloc() to  kcalloc() which is a kernel
hardening measure to protect against integer overflows.

Fixes: 5863b4e065 ("net: airoha: Add airoha_eth_soc_data struct")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/aPtht6y5DRokn9zv@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:03:30 -07:00
Jakub Kicinski
bbfa5e7c8d Merge tag 'batadv-next-pullrequest-20251024' of https://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - use skb_crc32c() instead of skb_seq_read(), by Sven Eckelmann

* tag 'batadv-next-pullrequest-20251024' of https://git.open-mesh.org/linux-merge:
  batman-adv: use skb_crc32c() instead of skb_seq_read()
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20251024092315.232636-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 18:02:38 -07:00
Jakub Kicinski
f4e52b326e Merge branch 'phy-mscc-fix-ptp-for-vsc8574-and-vsc8572'
Horatiu Vultur says:

====================
phy: mscc: Fix PTP for VSC8574 and VSC8572

The first patch will update the PHYs VSC8584, VSC8582, VSC8575 and VSC856X
to use PHY_ID_MATCH_EXACT because only rev B exists for these PHYs.
But for the PHYs VSC8574 and VSC8572 exists rev A, B, C, D and E.
This is just a preparation for the second patch to allow the VSC8574 and
VSC8572 to use the function vsc8584_probe().

We want to use vsc8584_probe() for VSC8574 and VSC8572 because this
function does the correct PTP initialization. This change is in the second
patch.
====================

Link: https://patch.msgid.link/20251023191350.190940-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 17:58:05 -07:00
Horatiu Vultur
ea5df88aec phy: mscc: Fix PTP for VSC8574 and VSC8572
The PTP initialization is two-step. First part are the function
vsc8584_ptp_probe_once() and vsc8584_ptp_probe() at probe time which
initialize the locks, queues, creates the PTP device. The second part is
the function vsc8584_ptp_init() at config_init() time which initialize
PTP in the HW.

For VSC8574 and VSC8572, the PTP initialization is incomplete. It is
missing the first part but it makes the second part. Meaning that the
ptp_clock_register() is never called.

There is no crash without the first part when enabling PTP but this is
unexpected because some PHys have PTP functionality exposed by the
driver and some don't even though they share the same PTP clock PTP.

Fixes: 774626fa44 ("net: phy: mscc: Add PTP support for 2 more VSC PHYs")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20251023191350.190940-3-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 17:58:02 -07:00
Horatiu Vultur
1bc80d6730 phy: mscc: Use PHY_ID_MATCH_EXACT for VSC8584, VSC8582, VSC8575, VSC856X
As the PHYs VSC8584, VSC8582, VSC8575 and VSC856X exists only as rev B,
we can use PHY_ID_MATCH_EXACT to match exactly on revision B of the PHY.
Because of this change then there is not need the check if it is a
different revision than rev B in the function vsc8584_probe() as we
already know that this will never happen.
These changes are a preparation for the next patch because in that patch
we will make the PHYs VSC8574 and VSC8572 to use vsc8584_probe() and
these PHYs have multiple revision.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20251023191350.190940-2-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 17:58:01 -07:00
Petr Machata
d10920607f selftests: bridge_mdb: Add a test for MDB flush on snooping disable
Check that non-permanent MDB entries are removed as IGMP / MLD snooping is
disabled.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/9420dfbcf26c8e1134d31244e9e7d6a49d677a69.1761228273.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 17:57:21 -07:00
Petr Machata
68800bbf58 net: bridge: Flush multicast groups when snooping is disabled
When forwarding multicast packets, the bridge takes MDB into account when
IGMP / MLD snooping is enabled. Currently, when snooping is disabled, the
MDB is retained, even though it is not used anymore.

At the same time, during the time that snooping is disabled, the IGMP / MLD
control packets are obviously ignored, and after the snooping is reenabled,
the administrator has to assume it is out of sync. In particular, missed
join and leave messages would lead to traffic being forwarded to wrong
interfaces.

Keeping the MDB entries around thus serves no purpose, and just takes
memory. Note also that disabling per-VLAN snooping does actually flush the
relevant MDB entries.

This patch flushes non-permanent MDB entries as global snooping is
disabled.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/5e992df1bb93b88e19c0ea5819e23b669e3dde5d.1761228273.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 17:57:21 -07:00
Wilfred Mallawa
5f30bc4706 selftests: tls: add tls record_size_limit test
Test that outgoing plaintext records respect the tls TLS_TX_MAX_PAYLOAD_LEN
set using setsockopt(). The limit is set to be 128, thus, in all received
records, the plaintext must not exceed this amount.

Also test that setting a new record size limit whilst a pending open
record exists is handled correctly by discarding the request.

Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20251022001937.20155-2-wilfred.opensource@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 16:13:43 -07:00
Wilfred Mallawa
82cb5be6ad net/tls: support setting the maximum payload size
During a handshake, an endpoint may specify a maximum record size limit.
Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the
maximum record size. Meaning that, the outgoing records from the kernel
can exceed a lower size negotiated during the handshake. In such a case,
the TLS endpoint must send a fatal "record_overflow" alert [1], and
thus the record is discarded.

Upcoming Western Digital NVMe-TCP hardware controllers implement TLS
support. For these devices, supporting TLS record size negotiation is
necessary because the maximum TLS record size supported by the controller
is less than the default 16KB currently used by the kernel.

Currently, there is no way to inform the kernel of such a limit. This patch
adds support to a new setsockopt() option `TLS_TX_MAX_PAYLOAD_LEN` that
allows for setting the maximum plaintext fragment size. Once set, outgoing
records are no larger than the size specified. This option can be used to
specify the record size limit.

[1] https://www.rfc-editor.org/rfc/rfc8449

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20251022001937.20155-1-wilfred.opensource@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 16:13:42 -07:00
Jakub Kicinski
bfe62db542 Merge branch 'dwmac-support-for-rockchip-rk3506'
Heiko Stuebner says:

====================
DWMAC support for Rockchip RK3506

Some cleanups to the DT binding for Rockchip variants of the dwmac
and adding the RK3506 support on top.

As well as the driver-glue needed for setting up the correct RMII
speed seitings.
====================

Link: https://patch.msgid.link/20251023111213.298860-1-heiko@sntech.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24 19:07:48 -07:00
Heiko Stuebner
384d842632 MAINTAINERS: add dwmac-rk glue driver to the main Rockchip entry
The dwmac-rk glue driver is currently not caught by the general maintainer
entry for Rockchip SoCs, so add it explicitly, similar to the i2c driver.

The binding document in net/rockchip-dwmac.yaml already gets caught by
the wildcard match.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251023111213.298860-6-heiko@sntech.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24 19:07:37 -07:00
David Wu
2010163a8e ethernet: stmmac: dwmac-rk: Add RK3506 GMAC support
Add the needed glue blocks for the RK3506-specific setup.

The RK3506 dwmac only supports up to 100MBit with a RMII PHY,
but no RGMII.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251023111213.298860-5-heiko@sntech.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24 19:07:37 -07:00