Commit Graph

1335777 Commits

Author SHA1 Message Date
Jakub Kicinski
93d2f2f36e eth: fbnic: wrap tx queue stats in a struct
The queue stats struct is used for Rx and Tx queues. Wrap
the Tx stats in a struct and a union, so that we can reuse
the same space for Rx stats on Rx queues.

This also makes it easy to add an assert to the stat handling
code to catch new stats not being aggregated on shutdown.

Acked-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 16:38:51 -08:00
Jakub Kicinski
34eea78a11 net: report csum_complete via qstats
Commit 13c7c941e7 ("netdev: add qstat for csum complete") reserved
the entry for csum complete in the qstats uAPI. Start reporting this
value now that we have a driver which needs it.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250211181356.580800-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 16:37:35 -08:00
Jakub Kicinski
fea5d56282 Merge branch 'use-phylib-for-reset-randomization-and-adjustable-polling'
Oleksij Rempel says:

====================
Use PHYlib for reset randomization and adjustable polling

This patch set tackles a DP83TG720 reset lock issue and improves PHY
polling. Rather than adding a separate polling worker to randomize PHY
resets, I chose to extend the PHYlib framework - which already handles
most of the needed functionality - with adjustable polling. This
approach not only addresses the DP83TG720-specific problem (where
synchronized resets can lock the link) but also lays the groundwork for
optimizing PHY stats polling across all PHY drivers. With generic PHY
stats coming in, we can adjust the polling interval based on hardware
characteristics, such as using longer intervals for PHYs with stable HW
counters or shorter ones for high-speed links prone to counter
overflows.

Patch version changes are tracked in separate patches.
====================

Link: https://patch.msgid.link/20250210082358.200751-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:49:15 -08:00
Oleksij Rempel
e252af1a67 net: phy: dp83tg720: Add randomized polling intervals for link detection
Address the limitations of the DP83TG720 PHY, which cannot reliably
detect or report a stable link state. To handle this, the PHY must be
periodically reset when the link is down. However, synchronized reset
intervals between the PHY and its link partner can result in a deadlock,
preventing the link from re-establishing.

This change introduces a randomized polling interval when the link is
down to desynchronize resets between link partners.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250210082358.200751-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:49:03 -08:00
Oleksij Rempel
8bf47e4d7b net: phy: Add support for driver-specific next update time
Introduce the `phy_get_next_update_time` function to allow PHY drivers
to dynamically determine the time (in jiffies) until the next state
update event. This enables more flexible and adaptive polling intervals
based on the link state or other conditions.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250210082358.200751-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:49:03 -08:00
Jakub Kicinski
12739192b1 Merge branch 'rate-management-on-traffic-classes-misc'
Tariq Toukan says:

====================
mlx5 misc

Patches 1-3 by William reduce the memory consumption for representors to
achieve better scalability.

Patches 4-5 by Akiva expose ICM memory consumption per function.

Patches 6-8 expose helpful information on RSS resources in devlink RX
reporter diagnose.

Patches 9-10 are simple enhancements by Alex Lazar.
====================

Link: https://patch.msgid.link/20250209101716.112774-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:19 -08:00
Alexei Lazar
1a9304859b net/mlx5: XDP, Enable TX side XDP multi-buffer support
In XDP scenarios, fragmented packets can occur if the MTU is larger
than the page size, even when the packet size fits within the linear
part.
If XDP multi-buffer support is disabled, the fragmented part won't be
handled in the TX flow, leading to packet drops.

Since XDP multi-buffer support is always available, this commit removes
the conditional check for enabling it.
This ensures that XDP multi-buffer support is always enabled,
regardless of the `is_xdp_mb` parameter, and guarantees the handling of
fragmented packets in such scenarios.

Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-16-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:18 -08:00
Alexei Lazar
95b9606b15 net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
Current loopback test validation ignores non-linear SKB case in
the SKB access, which can lead to failures in scenarios such as
when HW GRO is enabled.
Linearize the SKB so both cases will be handled.

Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-15-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
Amir Tzin
896c92aa74 net/mlx5e: Expose RSS via devlink rx reporter diagnose
Underneath "rx resources" tag expose RSS diagnostic information. For
each RSS expose its rqtn, TIRs and inner TIRs.

$ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx
 .......
RSS:
    Index: 0 rqtn: 0
    TIRs Numbers:
        tt: TT_IPV4_TCP tirn: 0
        tt: TT_IPV6_TCP tirn: 1
        tt: TT_IPV4_UDP tirn: 2
        tt: TT_IPV6_UDP tirn: 3
        tt: TT_IPV4_IPSEC_AH tirn: 4
        tt: TT_IPV6_IPSEC_AH tirn: 5
        tt: TT_IPV4_IPSEC_ESP tirn: 6
        tt: TT_IPV6_IPSEC_ESP tirn: 7
        tt: TT_IPV4 tirn: 8
        tt: TT_IPV6 tirn: 9
    Inner TIRs Numbers:
        tt: TT_IPV4_TCP tirn: 10
        tt: TT_IPV6_TCP tirn: 11
        tt: TT_IPV4_UDP tirn: 12
        tt: TT_IPV6_UDP tirn: 13
        tt: TT_IPV4_IPSEC_AH tirn: 14
        tt: TT_IPV6_IPSEC_AH tirn: 15
        tt: TT_IPV4_IPSEC_ESP tirn: 16
        tt: TT_IPV6_IPSEC_ESP tirn: 17
        tt: TT_IPV4 tirn: 18
        tt: TT_IPV6 tirn: 19
    Index: 2 rqtn: 27
    TIRs Numbers:
        tt: TT_IPV6_TCP tirn: 46

Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
Amir Tzin
99c55284e8 net/mlx5e: Add direct TIRs to devlink rx reporter diagnose
Add "RX resources" tag to the output of rx reporter diagnose callback.
Underneath add tag for direct TIRs, for each TIR expose its tirn and
the corresponding rqtn.

$ sudo devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx
 ....
 rx resources:
   Direct TIRs:
       ix: 0 tirn: 20 rqtn: 1
       ix: 1 tirn: 21 rqtn: 2
       ix: 2 tirn: 22 rqtn: 3
       ix: 3 tirn: 23 rqtn: 4
       ix: 4 tirn: 24 rqtn: 5
       ix: 5 tirn: 25 rqtn: 6
       ix: 6 tirn: 26 rqtn: 7
       ix: 7 tirn: 27 rqtn: 8
       ix: 8 tirn: 28 rqtn: 9
       ix: 9 tirn: 29 rqtn: 10
       ix: 10 tirn: 30 rqtn: 11
       ix: 11 tirn: 31 rqtn: 12
       ix: 12 tirn: 32 rqtn: 13
       ix: 13 tirn: 33 rqtn: 14
       ix: 14 tirn: 34 rqtn: 15
       ix: 15 tirn: 35 rqtn: 16
       ix: 16 tirn: 36 rqtn: 17
       ix: 17 tirn: 37 rqtn: 18
       ix: 18 tirn: 38 rqtn: 19
       ix: 19 tirn: 39 rqtn: 20
       ix: 20 tirn: 40 rqtn: 21
       ix: 21 tirn: 41 rqtn: 22
       ix: 22 tirn: 42 rqtn: 23
       ix: 23 tirn: 43 rqtn: 24

Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
Amir Tzin
913175b3f9 net/mlx5e: Move RQs diagnose to a dedicated function
Move rx reporter RQs diagnose from mlx5e_rx_reporter_diagnose() to a
dedicated function. This change is a preparation for the following
series which extends diagnose output for the rx reporter. While at it,
also pass a mlx5e_priv pointer to
mlx5e_rx_reporter_diagnose_common_config() as this is the argument the
latter actually needs.

Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
Akiva Goldberger
b820864335 net/mlx5: Expose ICM consumption per function
ICM is a portion of the host's memory assigned to a function by the OS
through requests made by the NIC's firmware.

PF ICM consumption can be accessed directly, while VF/SF ICM consumption
can be accessed through their representors in switchdev mode.

The value is exposed to the user in granularity of 4KB through the vnic
health reporter as follows:

$ devlink health diagnose pci/0000:08:00.0 reporter vnic
 vNIC env counters:
     total_error_queues: 0 send_queue_priority_update_flow: 0
     comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
     invalid_command: 0 quota_exceeded_command: 0
     nic_receive_steering_discard: 0 icm_consumption: 1032

Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
Akiva Goldberger
38b3d42e5a net/mlx5: Rename and move mlx5_esw_query_vport_vhca_id
Rename mlx5_esw_query_vport_vhca_id to mlx5_vport_get_vhca_id and move
it to vport file. Also, add function declaration to mlx5_core header
file. This better represents the function's usage and allows for it to
be called from other parts of the mlx5_core driver.

Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
William Tu
a38cc5706f net/mlx5e: set the tx_queue_len for pfifo_fast
By default, the mq netdev creates a pfifo_fast qdisc. On a
system with 16 core, the pfifo_fast with 3 bands consumes
16 * 3 * 8 (size of pointer) * 1024 (default tx queue len)
= 393KB. The patch sets the tx qlen to representor default
value, 128 (1<<MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE), which
consumes 16 * 3 * 8 * 128 = 49KB, saving 344KB for each
representor at ECPF.

Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
William Tu
b9cc8f9d70 net/mlx5e: reduce rep rxq depth to 256 for ECPF
By experiments, a single queue representor netdev consumes kernel
memory around 2.8MB, and 1.8MB out of the 2.8MB is due to page
pool for the RXQ. Scaling to a thousand representors consumes 2.8GB,
which becomes a memory pressure issue for embedded devices such as
BlueField-2 16GB / BlueField-3 32GB memory.

Since representor netdevs mostly handles miss traffic, and ideally,
most of the traffic will be offloaded, reduce the default non-uplink
rep netdev's RXQ default depth from 1024 to 256 if mdev is ecpf eswitch
manager. This saves around 1MB of memory per regular RQ,
(1024 - 256) * 2KB, allocated from page pool.

With rxq depth of 256, the netlink page pool tool reports
$./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
	 --dump page-pool-get
 {'id': 277,
  'ifindex': 9,
  'inflight': 128,
  'inflight-mem': 786432,
  'napi-id': 775}]

This is due to mtu 1500 + headroom consumes half pages, so 256 rxq
entries consumes around 128 pages (thus create a page pool with
size 128), shown above at inflight.

Note that each netdev has multiple types of RQs, including
Regular RQ, XSK, PTP, Drop, Trap RQ. Since non-uplink representor
only supports regular rq, this patch only changes the regular RQ's
default depth.

Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
William Tu
e1d68ea58c net/mlx5e: reduce the max log mpwrq sz for ECPF and reps
For the ECPF and representors, reduce the max MPWRQ size from 256KB (18)
to 128KB (17). This prepares the later patch for saving representor
memory.

With Striding RQ, there is a minimum of 4 MPWQEs. So with 128KB of max
MPWRQ size, the minimal memory is 4 * 128KB = 512KB. When creating page
pool, consider 1500 mtu, the minimal page pool size will be 512KB/4KB =
128 pages = 256 rx ring entries (2 entries per page).

Before this patch, setting RX ringsize (ethtool -G rx) to 256 causes
driver to allocate page pool size more than it needs due to max MPWRQ
is 256KB (18). Ex: 4 * 256KB = 1MB, 1MB/4KB = 256 pages, but actually
128 pages is good enough. Reducing the max MPWRQ to 128KB fixes the
limitation.

Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12 10:46:17 -08:00
Jakub Kicinski
4e41231249 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-02-10 (ice, igc, e1000e)

For ice:

Karol, Jake, and Michal add PTP support for E830 devices. Karol
refactors and cleans up PTP code. Jake allows for a common
cross-timestamp implementation to be shared for all devices and
Michal adds E830 support.

Mateusz cleans up initial Flow Director rule creation to loop rather
than duplicate repeated similar calls.

For igc:

Siang adjust calls to remove need for close and open calls on loading
XDP program.

For e1000e:

Gerhard Engleder batches register writes for writing multicast table
on real-time kernels.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  e1000e: Fix real-time violations on link up
  igc: Avoid unnecessary link down event in XDP_SETUP_PROG process
  ice: refactor ice_fdir_create_dflt_rules() function
  ice: Implement PTP support for E830 devices
  ice: Refactor ice_ptp_init_tx_*
  ice: Add unified ice_capture_crosststamp
  ice: Process TSYN IRQ in a separate function
  ice: Use FIELD_PREP for timestamp values
  ice: Remove unnecessary ice_is_e8xx() functions
  ice: Don't check device type when checking GNSS presence
====================

Link: https://patch.msgid.link/20250210192352.3799673-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 19:51:16 -08:00
Jakub Kicinski
be1d2a1b15 Merge branch 'sfc-support-devlink-flash'
Edward Cree says:

====================
sfc: support devlink flash

Allow upgrading device firmware on Solarflare NICs through standard tools.
====================

Link: https://patch.msgid.link/cover.1739186252.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:45:45 -08:00
Edward Cree
5ea73bf3c4 sfc: document devlink flash support
Update the information in sfc's devlink documentation including
 support for firmware update with devlink flash.
Also update the help text for CONFIG_SFC_MTD, as it is no longer
 strictly required for firmware updates.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/3476b0ef04a0944f03e0b771ec8ed1a9c70db4dc.1739186253.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:12:17 -08:00
Edward Cree
3ed63980ae sfc: deploy devlink flash images to NIC over MCDI
Use MC_CMD_NVRAM_* wrappers to write the firmware to the partition
 identified from the image header.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/a746335876b621b3e54cf4e49948148e349a1745.1739186253.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:12:17 -08:00
Edward Cree
d41987e906 sfc: extend NVRAM MCDI handlers
Support variable write-alignment, and background updates.  The latter
 allows other MCDI to continue while the device is processing an
 MC_CMD_NVRAM_UPDATE_FINISH, since this can take a long time owing to
 e.g. cryptographic signature verification.
Expose these handlers in mcdi.h, and build them even when
 CONFIG_SFC_MTD=n, so they can be used for devlink flash in a
 subsequent patch.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/de3d9e14fee69e15d95b46258401a93b75659f78.1739186253.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:12:17 -08:00
Edward Cree
fd118a77ed sfc: parse headers of devlink flash images
This parsing is necessary to obtain the metadata which will be
 used in a subsequent patch to write the image to the device.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/65318300f3f1b1462925f917f7c0d0ac833955ae.1739186253.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:12:17 -08:00
Jakub Kicinski
f4b87edbe0 Merge branch 'use-hwmon_channel_info-macro-to-simplify-code'
Huisong Li says:

====================
Use HWMON_CHANNEL_INFO macro to simplify code

The HWMON_CHANNEL_INFO macro is provided by hwmon.h and used widely by many
other drivers. This series use HWMON_CHANNEL_INFO macro to simplify code
in net subsystem.

Note: These patches do not depend on each other. Put them togeter just for
belonging to the same subsystem.
====================

Link: https://patch.msgid.link/20250210054710.12855-1-lihuisong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:07:05 -08:00
Huisong Li
d6085a23b3 net: phy: aquantia: Use HWMON_CHANNEL_INFO macro to simplify code
Use HWMON_CHANNEL_INFO macro to simplify code.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Link: https://patch.msgid.link/20250210054710.12855-6-lihuisong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:07:03 -08:00
Huisong Li
4798f4834b net: phy: marvell10g: Use HWMON_CHANNEL_INFO macro to simplify code
Use HWMON_CHANNEL_INFO macro to simplify code.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Link: https://patch.msgid.link/20250210054710.12855-5-lihuisong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:07:02 -08:00
Huisong Li
0cb595e80e net: phy: marvell: Use HWMON_CHANNEL_INFO macro to simplify code
Use HWMON_CHANNEL_INFO macro to simplify code.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Link: https://patch.msgid.link/20250210054710.12855-4-lihuisong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:07:02 -08:00
Huisong Li
e05427c4d1 net: nfp: Use HWMON_CHANNEL_INFO macro to simplify code
Use HWMON_CHANNEL_INFO macro to simplify code.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Link: https://patch.msgid.link/20250210054710.12855-3-lihuisong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:07:02 -08:00
Huisong Li
43a0d7f26a net: aquantia: Use HWMON_CHANNEL_INFO macro to simplify code
Use HWMON_CHANNEL_INFO macro to simplify code.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Link: https://patch.msgid.link/20250210054710.12855-2-lihuisong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 17:07:02 -08:00
Wolfram Sang
4d3f687e24 net: wwan: t7xx: don't include '<linux/pm_wakeup.h>' directly
The header clearly states that it does not want to be included directly,
only via '<linux/(platform_)?device.h>'. Which is already present, so
delete the superfluous include.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://patch.msgid.link/20250210113710.52071-2-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 16:37:30 -08:00
Wolfram Sang
ad30ee8013 net: phy: broadcom: don't include '<linux/pm_wakeup.h>' directly
The header clearly states that it does not want to be included directly,
only via '<linux/(platform_)?device.h>'. Replace the include accordingly.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250210113658.52019-2-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 16:36:48 -08:00
Heiner Kallweit
8729a9bd6e net: freescale: ucc_geth: remove unused PHY_INIT_TIMEOUT and PHY_CHANGE_TIME
Both definitions are unused. Last users have been removed with:

1577ecef76 ("netdev: Merge UCC and gianfar MDIO bus drivers")
728de4c927 ("ucc_geth: migrate ucc_geth to phylib")

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/62e9429b-57e0-42ec-96a5-6a89553f441d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 16:34:58 -08:00
Heiner Kallweit
16d11fdaeb net: phy: remove unused PHY_INIT_TIMEOUT and PHY_FORCE_TIMEOUT
Both definitions are unused. Last users have been removed with:

f3ba9d490d ("net: s6gmac: remove driver")
2bd229df5e ("net: phy: remove state PHY_FORCING")

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/f8e7b8ed-a665-41ad-b0ce-cbfdb65262ef@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 16:34:03 -08:00
Ethan Carter Edwards
3b147be9ef hamradio: baycom: replace strcpy() with strscpy()
The strcpy() function has been deprecated and replaced with strscpy().
There is an effort to make this change treewide:
https://github.com/KSPP/linux/issues/88.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/3qo3fbrak7undfgocsi2s74v4uyjbylpdqhie4dohfoh4welfn@joq7up65ug6v
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 16:23:38 -08:00
Tamir Duberstein
b341f6fd45 blackhole_dev: convert self-test to KUnit
Convert this very simple smoke test to a KUnit test.

Add a missing `htons` call that was spotted[0] by kernel test robot
<lkp@intel.com> after initial conversion to KUnit.

Link: https://lore.kernel.org/oe-kbuild-all/202502090223.qCYMBjWT-lkp@intel.com/ [0]
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20250208-blackholedev-kunit-convert-v2-1-182db9bd56ec@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 16:21:42 -08:00
Jakub Kicinski
b6df0523ec Merge branch 'net-phy-rename-eee_broken_mode'
Heiner Kallweit says:

====================
net: phy: rename eee_broken_mode

eee_broken_mode is used also if an EEE mode isn't actually broken
but e.g. not supported by MAC. So rename it.

This is split out from a bigger series that needs more rework.
====================

Link: https://patch.msgid.link/d7924d4e-49b0-4182-831f-73c558d4425e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 15:19:13 -08:00
Heiner Kallweit
5e7a74b6a3 net: phy: rename phy_set_eee_broken to phy_disable_eee_mode
Consider that an EEE mode may not be broken but simply not supported
by the MAC, and rename function phy_set_eee_broken().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/30deb630-3f6b-4ffb-a1e6-a9736021f43a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 15:19:10 -08:00
Heiner Kallweit
8eb0d381be net: phy: rename eee_broken_modes to eee_disabled_modes
This bitmap is used also if the MAC doesn't support an EEE mode.
So the mode isn't necessarily broken in the PHY. Therefore rename
the bitmap.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/6cd11422-dd67-4c87-a642-308de694af92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11 15:19:10 -08:00
Paolo Abeni
ae9b3c0e79 Merge branch 'tcp-allow-to-reduce-max-rto'
Eric Dumazet says:

====================
tcp: allow to reduce max RTO

This is a followup of a discussion started 6 months ago
by Jason Xing.

Some applications want to lower the time between each
retransmit attempts.

TCP_KEEPINTVL and TCP_KEEPCNT socket options don't
work around the issue.

This series adds:

- a new TCP level socket option (TCP_RTO_MAX_MS)
- a new sysctl (/proc/sys/net/ipv4/tcp_rto_max_ms)

Admins and/or applications can now change the max rto value
at their own risk.

Link: https://lore.kernel.org/netdev/20240715033118.32322-1-kerneljasonxing@gmail.com/T/
====================

Link: https://patch.msgid.link/20250207152830.2527578-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:08:02 +01:00
Eric Dumazet
1280c26228 tcp: add tcp_rto_max_ms sysctl
Previous patch added a TCP_RTO_MAX_MS socket option
to tune a TCP socket max RTO value.

Many setups prefer to change a per netns sysctl.

This patch adds /proc/sys/net/ipv4/tcp_rto_max_ms

Its initial value is 120000 (120 seconds).

Keep in mind that a decrease of tcp_rto_max_ms
means shorter overall timeouts, unless tcp_retries2
sysctl is increased.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:08:00 +01:00
Eric Dumazet
54a378f434 tcp: add the ability to control max RTO
Currently, TCP stack uses a constant (120 seconds)
to limit the RTO value exponential growth.

Some applications want to set a lower value.

Add TCP_RTO_MAX_MS socket option to set a value (in ms)
between 1 and 120 seconds.

It is discouraged to change the socket rto max on a live
socket, as it might lead to unexpected disconnects.

Following patch is adding a netns sysctl to control the
default value at socket creation time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:08:00 +01:00
Eric Dumazet
48b69b4c7e tcp: use tcp_reset_xmit_timer()
In order to reduce TCP_RTO_MAX occurrences, replace:

    inet_csk_reset_xmit_timer(sk, what, when, TCP_RTO_MAX)

With:

    tcp_reset_xmit_timer(sk, what, when, false);

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:07:59 +01:00
Eric Dumazet
7baa030155 tcp: add a @pace_delay parameter to tcp_reset_xmit_timer()
We want to factorize calls to inet_csk_reset_xmit_timer(),
to ease TCP_RTO_MAX change.

Current users want to add tcp_pacing_delay(sk)
to the timeout.

Remaining calls to inet_csk_reset_xmit_timer()
do not add the pacing delay. Following patch
will convert them, passing false for @pace_delay.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:07:59 +01:00
Eric Dumazet
0fed463777 tcp: remove tcp_reset_xmit_timer() @max_when argument
All callers use TCP_RTO_MAX, we can factorize this constant,
becoming a variable soon.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 13:07:59 +01:00
Paolo Abeni
812122783a Merge branch 'mptcp-pm-misc-cleanups-part-2'
Matthieu Baerts says:

====================
mptcp: pm: misc cleanups, part 2

These cleanups lead the way to the unification of the path-manager
interfaces, and allow future extensions. The following patches are not
all linked to each others, but are all related to the path-managers.

- Patch 1: drop unneeded parameter in a function helper.

- Patch 2: clearer NL error message when an NL attribute is missing.

- Patch 3: more precise NL messages by avoiding 'this or that is NOK'.

- Patch 4: improve too vague or missing NL err messages.

- Patch 5: use GENL_REQ_ATTR_CHECK to look for mandatory NL attributes.

- Patch 6: avoid overriding the error message.

- Patch 7: check all mandatory NL attributes with GENL_REQ_ATTR_CHECK.

- Patch 8: use NL_SET_ERR_MSG_ATTR instead of GENL_SET_ERR_MSG

- Patch 9: move doit callbacks used for both PM to pm.c.

- Patch 10: drop another unneeded parameter in a function helper.

- Patch 11: share the ID parsing code for the 'get_addr' callback.

- Patch 12: share sending NL code for the 'get_addr' callback.

- Patch 13: drop yet another unneeded parameter in a function helper.

- Patch 14: pick the usual structure type for the remote address.

- Patch 15: share the local addr parsing code for the 'set_flags' cb.

The behaviour when there are no errors should then not be modified.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================

v2: https://lore.kernel.org/r/20250117-net-next-mptcp-pm-misc-cleanup-2-v2-0-61d4fe0586e8@kernel.org
v1: https://lore.kernel.org/r/20250116-net-next-mptcp-pm-misc-cleanup-2-v1-0-c0b43f18fe06@kernel.org

Link: https://patch.msgid.link/20250207-net-next-mptcp-pm-misc-cleanup-2-v3-0-71753ed957de@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 12:46:41 +01:00
Geliang Tang
c7f25f7987 mptcp: pm: add local parameter for set_flags
This patch updates the interfaces set_flags to reduce repetitive
code, adds a new parameter 'local' for them.

The local address is parsed in public helper mptcp_pm_nl_set_flags_doit(),
then pass it to mptcp_pm_nl_set_flags() and mptcp_userspace_pm_set_flags().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 12:46:38 +01:00
Geliang Tang
ab5723599c mptcp: pm: change rem type of set_flags
Generally, in the path manager interfaces, the local address is defined
as an mptcp_pm_addr_entry type address, while the remote address is
defined as an mptcp_addr_info type one:

    (struct mptcp_pm_addr_entry *local, struct mptcp_addr_info *remote)

But the set_flags() interface uses two mptcp_pm_addr_entry type
parameters.

This patch changes the second one to mptcp_addr_info type and use helper
mptcp_pm_parse_addr() to parse it instead of using mptcp_pm_parse_entry().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 12:46:37 +01:00
Geliang Tang
2c8971c04f mptcp: pm: drop skb parameter of set_flags
The first parameter 'skb' in mptcp_pm_nl_set_flags() is only used to
obtained the network namespace, which can also be obtained through the
second parameters 'info' by using genl_info_net() helper.

This patch drops these useless parameters 'skb' in all three set_flags()
interfaces.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 12:46:37 +01:00
Geliang Tang
8556f4aecc mptcp: pm: reuse sending nlmsg code in get_addr
The netlink messages are sent both in mptcp_pm_nl_get_addr() and
mptcp_userspace_pm_get_addr(), this makes the code somewhat repetitive.
This is because the netlink PM and userspace PM use different locks to
protect the address entry that needs to be sent via the netlink message.
The former uses rcu read lock, and the latter uses msk->pm.lock.

The current get_addr() flow looks like this:

	lock();
	entry = get_entry();
	send_nlmsg(entry);
	unlock();

After holding the lock, get the entry from the list, send the entry, and
finally release the lock.

This patch changes the process by getting the entry while holding the lock,
then making a copy of the entry so that the lock can be released. Finally,
the copy of the entry is sent without locking:

	lock();
	entry = get_entry();
	*copy = *entry;
	unlock();

	send_nlmsg(copy);

This way we can reuse the send_nlmsg() code in get_addr() interfaces
between the netlink PM and userspace PM. They only need to implement their
own get_addr() interfaces to hold the different locks, get the entry from
the different lists, then release the locks.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 12:46:37 +01:00
Geliang Tang
d47b80758f mptcp: pm: add id parameter for get_addr
The address id is parsed both in mptcp_pm_nl_get_addr() and
mptcp_userspace_pm_get_addr(), this makes the code somewhat repetitive.

So this patch adds a new parameter 'id' for all get_addr() interfaces.
The address id is only parsed in mptcp_pm_nl_get_addr_doit(), then pass
it to both mptcp_pm_nl_get_addr() and mptcp_userspace_pm_get_addr().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 12:46:37 +01:00
Geliang Tang
67dcf65925 mptcp: pm: drop skb parameter of get_addr
The first parameters 'skb' of get_addr() interfaces are now useless
since mptcp_userspace_pm_get_sock() helper is used. This patch drops
these useless parameters of them.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11 12:46:37 +01:00