Konstantin Khorenko says:
====================
net: fix skb_ext BUILD_BUG_ON failures with GCOV
This mini-series fixes build failures in net/core/skbuff.c when the
kernel is built with CONFIG_GCOV_PROFILE_ALL=y.
This is part of a larger effort to add -fprofile-update=atomic to
global CFLAGS_GCOV (posted earlier as a combined series):
https://lore.kernel.org/lkml/20260401142020.1434243-1-khorenko@virtuozzo.com/T/#t
That combined series was split per subsystem as requested by Jakub.
The companion patches are:
- iommu: use __always_inline for amdv1pt_install_leaf_entry()
(sent to iommu maintainers)
- gcov: add -fprofile-update=atomic globally (sent to gcov/kbuild
maintainers, depends on this series and the iommu patch)
Patch 1/2 fixes a pre-existing build failure with CONFIG_GCOV_PROFILE_ALL:
GCOV counters prevent GCC from constant-folding the skb_ext_total_length()
loop. It also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
from d6e5794b06: that guard was a precaution in case KCOV instrumentation
also prevented constant folding, but KCOV's -fsanitize-coverage=trace-pc
does not interfere with GCC's constant folding (verified experimentally
with GCC 14.2 and GCC 16.0.1), so the guard is unnecessary.
Patch 2/2 is an additional fix needed when -fprofile-update=atomic is
added to CFLAGS_GCOV: __no_profile on the __always_inline function alone
is insufficient because after inlining, the code resides in the caller's
profiled body. The caller (skb_extensions_init) needs __no_profile and
noinline to prevent re-exposure to GCOV instrumentation.
====================
Link: https://patch.msgid.link/20260410162150.3105738-1-khorenko@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With -fprofile-update=atomic in global CFLAGS_GCOV, GCC still cannot
constant-fold the skb_ext_total_length() loop when it is inlined into a
profiled caller. The existing __no_profile on skb_ext_total_length()
itself is insufficient because after __always_inline expansion the code
resides in the caller's body, which still carries GCOV instrumentation.
Mark skb_extensions_init() with __no_profile so the BUILD_BUG_ON checks
can be evaluated at compile time. Also mark it noinline to prevent the
compiler from inlining it into skb_init() (which lacks __no_profile),
which would re-expose the function body to GCOV instrumentation.
Add __init since skb_extensions_init() is only called from __init
skb_init(). Previously it was implicitly inlined into the .init.text
section; with noinline it would otherwise remain in permanent .text,
wasting memory after boot.
Build-tested with both CONFIG_GCOV_PROFILE_ALL=y and
CONFIG_KCOV_INSTRUMENT_ALL=y.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Link: https://patch.msgid.link/20260410162150.3105738-3-khorenko@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When CONFIG_GCOV_PROFILE_ALL=y is enabled, the kernel fails to build:
In file included from <command-line>:
In function 'skb_extensions_init',
inlined from 'skb_init' at net/core/skbuff.c:5214:2:
././include/linux/compiler_types.h:706:45: error: call to
'__compiletime_assert_1490' declared with attribute error:
BUILD_BUG_ON failed: skb_ext_total_length() > 255
CONFIG_GCOV_PROFILE_ALL adds -fprofile-arcs -ftest-coverage
-fno-tree-loop-im to CFLAGS globally. GCC inserts branch profiling
counters into the skb_ext_total_length() loop and, combined with
-fno-tree-loop-im (which disables loop invariant motion), cannot
constant-fold the result.
BUILD_BUG_ON requires a compile-time constant and fails.
The issue manifests in kernels with 5+ SKB extension types enabled
(e.g., after addition of SKB_EXT_CAN, SKB_EXT_PSP). With 4 extensions
GCC can still unroll and fold the loop despite GCOV instrumentation;
with 5+ it gives up.
Mark skb_ext_total_length() with __no_profile to prevent GCOV from
inserting counters into this function. Without counters the loop is
"clean" and GCC can constant-fold it even with -fno-tree-loop-im active.
This allows BUILD_BUG_ON to work correctly while keeping GCOV profiling
for the rest of the kernel.
This also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
introduced by d6e5794b06. That guard was added as a precaution because
KCOV instrumentation was also suspected of inhibiting constant folding.
However, KCOV uses -fsanitize-coverage=trace-pc, which inserts
lightweight trace callbacks that do not interfere with GCC's constant
folding or loop optimization passes. Only GCOV's -fprofile-arcs combined
with -fno-tree-loop-im actually prevents the compiler from evaluating
the loop at compile time. The guard is therefore unnecessary and can be
safely removed.
Fixes: 96ea3a1e2d ("can: add CAN skb extension infrastructure")
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Reviewed-by: Thomas Weissschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260410162150.3105738-2-khorenko@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan says:
====================
mlx5-next updates 2026-04-09
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Add icm_mng_function_id_mode cap bit
net/mlx5: Rename MLX5_PF page counter type to MLX5_SELF
net/mlx5: Add vhca_id_type bit to alias context
mlx5: Remove redundant iseg base
====================
Link: https://patch.msgid.link/20260409110431.154894-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sk_filter_trim_cap() will soon return the reason by value,
do the same for tcp_filter().
Note:
tcp_filter() is no longer inlined. Following patch will inline it again.
$ scripts/bloat-o-meter -t vmlinux.4 vmlinux.5
add/remove: 2/0 grow/shrink: 0/2 up/down: 186/-43 (143)
Function old new delta
tcp_filter - 154 +154
__pfx_tcp_filter - 32 +32
tcp_v4_rcv 3152 3143 -9
tcp_v6_rcv 3169 3135 -34
Total: Before=29722640, After=29722783, chg +0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sk_filter_trim_cap will soon return the reason by value,
do the same for sk_filter_reason().
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-21 (-21)
Function old new delta
sock_queue_rcv_skb_reason 128 126 -2
tun_net_xmit 1146 1127 -19
Total: Before=29722661, After=29722640, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sk_filter_trim_cap() will soon return the drop reason by value.
Make sure *reason is cleared when no error is returned,
to ease this conversion.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-7 (-7)
Function old new delta
sk_filter_trim_cap 889 882 -7
Total: Before=29722668, After=29722661, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Charles Perry says:
====================
Add support for PIC64-HPSC/HX MDIO controller
This series adds a driver for the two MDIO controllers of PIC64-HPSC/HX.
The hardware supports C22 and C45 but only C22 is implemented for now.
This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely different
with pic64hpsc, hence the need for a separate driver.
The documentation recommends an input clock of 156.25MHz and a prescaler of
39, which yields an MDIO clock of 1.95MHz.
This was tested on Microchip HB1301 evalkit which has a VSC8574 and a
VSC8541. I've tested with bus frequencies of 0.6, 1.95 and 2.5 MHz.
This series also adds a PHY write barrier when disabling PHY interrupts as
discussed in: https://lore.kernel.org/acvUqDgepCIScs8M@shell.armlinux.org.uk
====================
Link: https://patch.msgid.link/20260408131821.1145334-1-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MDIO bus controllers are not required to wait for write transactions to
complete before returning as synchronization is often achieved by polling
status bits.
This can cause issues when disabling interrupts since an interrupt could
fire before the interrupt handler is unregistered and there's no status
bit to poll.
Add a phy_write_barrier() function and use it in phy_disable_interrupts()
to fix this issue. The write barrier just reads an MII register and
discards the value, which is enough to guarantee that previous writes have
completed.
Signed-off-by: Charles Perry <charles.perry@microchip.com>
Link: https://patch.msgid.link/20260408131821.1145334-4-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds an MDIO driver for PIC64-HPSC/HX. The hardware supports C22
and C45 but only C22 is implemented in this commit.
This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely
different with pic64hpsc, hence the need for a separate driver.
The documentation recommends an input clock of 156.25MHz and a prescaler
of 39, which yields an MDIO clock of 1.95MHz.
The hardware supports an interrupt pin or a "TRIGGER" bit that can be
polled to signal transaction completion. This commit uses polling.
This was tested on Microchip HB1301 evalkit with a VSC8574 and a
VSC8541.
Signed-off-by: Charles Perry <charles.perry@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260408131821.1145334-3-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely different
with pic64hpsc, hence the need for separate documentation.
The hardware supports C22 and C45.
The documentation recommends an input clock of 156.25MHz and a prescaler
of 39, which yields an MDIO clock of 1.95MHz.
The hardware supports an interrupt pin to signal transaction completion
which is not strictly needed as the software can also poll a "TRIGGER"
bit for this.
Signed-off-by: Charles Perry <charles.perry@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260408131821.1145334-2-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the binding requires 'spi-cpha' for SJA1105 and 'spi-cpol'
for SJA1110.
However, the SJA1110 supports both SPI modes 0 and 2. Mode 2
(cpha=0, cpol=1) is used by the NXP LX2160 Bluebox 3.
On the SolidRun i.MX8DXL HummingBoard Telematics, mode 0 is stable,
while forcing mode 2 introduces CRC errors especially during bursts.
Drop the requirement on spi-cpol for SJA1110.
Fixes: af2eab1a82 ("dt-bindings: net: nxp,sja1105: document spi-cpol/cpha")
Signed-off-by: Josua Mayer <josua@solid-run.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260409-imx8dxl-sr-som-v2-1-83ff20629ba0@solid-run.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Fang says:
====================
net: enetc: improve statistics for v1 and add statistics for v4
For ENETC v1, some standardized statistics were redundantly included in
the unstructured statistics, so remove these duplicated entries.
Previously, the unstructured statistics only contained eMAC data and
did not include pMAC data; add pMAC statistics to ensure completeness.
For ENETC v4, the driver previously reported MAC statistics only for the
internal ENETC (Pseudo MAC). Extend the implementation to provide
additional statistics for both the internal ENETC and the standalone
ENETC.
====================
Link: https://patch.msgid.link/20260408055849.1314033-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The standardized counters are already exposed via the get_pause_stats(),
get_rmon_stats(), get_eth_ctrl_stats() and get_eth_mac_stats()
interfaces. Keeping the same counters in enetc_pm_counters results in
redundant output.
Remove these standardized counters from enetc_pm_counters and rely on
the existing statistics interfaces to report them.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For ENETC v1, each SI provides 16 RBDCR registers for RX ring drop
counters, but this does not imply that an SI actually owns 16 RX rings.
The ENETC hardware supports a total of 16 RX rings, which are assigned
to 3 SIs (1 PSI and 2 VSIs), so each SI is assigned fewer than 16 RX
rings.
The current implementation always reports 16 RX drop counters per SI,
leading to redundant output for SIs with fewer RX rings. Update the
logic to display drop counters only for the RX rings that are actually
assigned to the SI.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ENETC v4 provides 64-bit counters for IEEE 802.3 basic and mandatory
managed objects, the IETF Management Information Database (MIB) package
(RFC2665), and Remote Network Monitoring (RMON) statistics. In addition,
some ENETCs support preemption, so these ENETCs have two MACs: MAC 0 is
the express MAC (eMAC), MAC 1 is the preemptible MAC (pMAC). Both MACs
support these statistics.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260408055849.1314033-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nicolai Buchwitz says:
====================
net: phy: add support for disabling autonomous EEE
Some PHYs implement autonomous EEE where the PHY manages EEE
independently, preventing the MAC from controlling LPI signaling.
This conflicts with MACs that implement their own LPI control.
This series adds a .disable_autonomous_eee callback to struct phy_driver
and calls it from phy_support_eee(). When a MAC indicates it supports
EEE, the PHY's autonomous EEE is automatically disabled. The setting is
persisted across suspend/resume by re-applying it in phy_init_hw() after
soft reset, following the same pattern suggested by Russell King for PHY
tunables [1].
Patch 1 adds the phylib infrastructure.
Patch 2 implements it for Broadcom BCM54xx (AutogrEEEn).
Patch 3 converts the Realtek RTL8211F, which previously unconditionally
disabled PHY-mode EEE in config_init.
This came up while adding EEE support to the Cadence macb driver (used
on Raspberry Pi 5 with a BCM54210PE PHY). The PHY's AutogrEEEn mode
prevented the MAC from tracking LPI state. The Realtek RTL8211F has
the same pattern, unconditionally disabling PHY-mode EEE with the
comment "Disable PHY-mode EEE so LPI is passed to the MAC".
Other BCM54xx PHYs likely have the same AutogrEEEn register layout,
but I only have access to the BCM54210PE/BCM54213PE datasheets. It
would be appreciated if Florian or others could confirm which other
BCM54xx variants share this register so we can wire them up too.
Tested on Raspberry Pi CM4 (bcmgenet + BCM54210PE),
Raspberry Pi CM5 (Cadence GEM + BCM54210PE) and
Raspberry Pi 5 (Cadence GEM + BCM54213PE).
[1] https://lore.kernel.org/netdev/acuwvoydmJusuj9x@shell.armlinux.org.uk/
====================
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-0-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RTL8211F previously unconditionally disabled PHY-mode EEE in
config_init. Convert this to use the new .disable_autonomous_eee
callback so it is only disabled when the MAC indicates EEE support
via phy_support_eee().
This preserves PHY-autonomous EEE for MACs that do not support EEE,
while still disabling it when the MAC manages LPI.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-3-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some PHYs (e.g. Broadcom BCM54xx, Realtek RTL8211F) implement
autonomous EEE where the PHY manages LPI signaling without forwarding
it to the MAC. This conflicts with MAC drivers that implement their own
LPI control.
Add a .disable_autonomous_eee callback to struct phy_driver and call it
from phy_support_eee(). When a MAC driver indicates it supports EEE via
phy_support_eee(), the PHY's autonomous EEE is automatically disabled so
the MAC can manage LPI entry/exit.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-1-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu says:
====================
ynl/ethtool/netlink: fix nla_len overflow for large string sets
This series addresses a silent data corruption issue triggered when ynl
retrieves string sets from NICs with a large number of statistics entries
(e.g. mlx5_core with thousands of ETH_SS_STATS strings).
The root cause is that struct nlattr.nla_len is a __u16 (max 65535
bytes). When a NIC exports enough statistics strings, the
ETHTOOL_A_STRINGSET_STRINGS nest built by strset_fill_set() exceeds
this limit. nla_nest_end() silently truncates the length on assignment,
producing a corrupted netlink message.
Patch 1 moves ethtool.py to selftest.
Patch 2 improves the ethtool tool: rename the doit/dumpit helpers
to do_set/do_get and convert do_get to use ynl.do() with an
explicit device header instead of a full dump with client-side filtering.
Patch 3 adds a --dbg-small-recv option to the YNL ethtool tool,
matching the same option already present in cli.py, to help debug netlink
message size issues
Patch 4 adds a new helper nla_nest_end_safe() to check whether the nla_len
is overflow and return -EMSGSIZE early if so.
Patch 5 uses the new helper in ethtool to make sure the ethtool doesn't
reply a corrupted netlink message.
====================
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-0-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netlink attribute length field nla_len is a __u16, which can only
represent values up to 65535 bytes. NICs with a large number of
statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS
entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds
this limit.
When nla_nest_end() writes the actual nest size back to nla_len, the
value is silently truncated. This results in a corrupted netlink message
being sent to userspace: the parser reads a wrong (truncated) attribute
length and misaligns all subsequent attribute boundaries, causing decode
errors.
Fix this by using the new helper nla_nest_end_safe and error out if
the size exceeds U16_MAX.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-5-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The nla_len field in struct nlattr is a __u16, which can only hold
values up to 65535. If a nested attribute grows beyond this limit,
nla_nest_end() silently truncates the length, producing a corrupted
netlink message with no indication of the problem.
Since nla_nest_end() is used everywhere and this issue rarely happens,
let's add a new helper to check the length.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-4-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename the local helper doit() to do_set() and dumpit() to do_get() to
better reflect their purpose.
Convert do_get() to use ynl.do() with an explicit device header instead
of ynl.dump() followed by client-side filtering. This is more efficient
as the kernel only processes and returns data for the requested device,
rather than dumping all devices across the netns.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-2-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bhargava Marreddy says:
====================
bng_en: add link management and statistics support
This series enhances the bng_en driver by adding:
1. Link/PHY support
a. Link query
b. Async Link events
c. Ethtool link set/get functionality
2. Hardware statistics reporting via ethtool -S
This version incorporates feedback received prior to splitting the
original series into two parts.
====================
Link: https://patch.msgid.link/20260406180420.279470-1-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement the hardware-level statistics foundation and modern structured
ethtool operations.
1. Infrastructure: Add HWRM firmware wrappers (FUNC_QSTATS_EXT,
PORT_QSTATS_EXT, and PORT_QSTATS) to query ring and port counters.
2. Structured ops: Implement .get_eth_phy_stats, .get_eth_mac_stats,
.get_eth_ctrl_stats, .get_pause_stats, and .get_rmon_stats.
Stats are initially reported as 0; accumulation logic is added
in a subsequent patch.
Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-7-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a dedicated single-thread workqueue and a timer for each PF
to drive deferred slow-path work such as link event handling and
stats collection. The timer is stopped via timer_delete_sync()
when interrupts are disabled and restarted on open.
While the close path stops the timer to prevent new tasks from
being scheduled, the sp_task and workqueue are preserved to
maintain state continuity. Final draining and destruction of
the workqueue are handled during PCI remove.
Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-2-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>