After commit 7642cc28fd ("net: phylink: fix PHY validation with rate
adaption"), the API contract changed and PHY drivers are no longer
required to respond to the .get_rate_matching() method for
PHY_INTERFACE_MODE_NA. This was later followed up by documentation
commit 6d4cfcf979 ("net: phy: Update documentation for
get_rate_matching").
As such, handling PHY_INTERFACE_MODE_NA in the Aquantia PHY driver
implementation of this method is unnecessary and confusing. Remove it.
Cc: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-8-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, aqr_gen2_fill_interface_modes() reads VEND1_GLOBAL_CFG_*
registers to populate phydev->supported_interfaces. But this is not
the only place which needs to read these registers. There is also
aqr107_read_rate().
Based on the premise that these values are statically set by firmware
and the driver only needs to read them, the proposal is to read them
only once, at config_init() time, and use the cached values also in
aqr107_read_rate().
This patch only refactors the aqr_gen2_fill_interface_modes() code to
save the registers to driver memory, and to populate supported_interfaces
based on that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since aqr_gen2_config_init() and aqr_gen2_fill_interface_modes() refer to
the feature set common to the same generation, it means all callers of
aqr_gen2_config_init() also support the Global System Configuration
registers at addresses 1E.31B -> 1E.31F, and these should be read by the
driver to figure out the list of supported interfaces for phylink.
This affects the following PHYs supported by this driver:
- Gen2: AQR107
- Gen3: AQR111, AQR111B0
- Gen4: AQR114C, AQR813.
AQR113C, a Gen4 PHY, has unmodified logic after this change, because
currently, the aqr_gen2_fill_interface_modes() call is chained after
aqr_gen2_config_init(), and after this patch, it is tail-called from the
latter function, leading to the same code flow.
At the same time, move aqr_gen2_fill_interface_modes() upwards of its
new caller, aqr_gen2_config_init(), to avoid a forward declaration.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Establish a more intuitive function naming convention in this driver.
A GenX PHY must only call aqr_genY_ functions, where Y <= X.
Loosely speaking, aqr107_ is representative of Gen2 and above, except for:
- aqr107_config_init()
- aqr107_suspend()
- aqr107_resume()
- aqr107_wait_processor_intensive_op()
which are also called by AQR105, so these are renamed to Gen1.
Actually aqr107_config_init() is renamed to aqr_gen1_config_init() when
called by AQR105, and aqr_gen2_config_init() when called by all other
PHYs. The Gen2 function calls the Gen1 function, so there is no
functional change. This prefaces further Gen2-specific initialization
steps which must be omitted for AQR105. These will be added to
aqr_gen2_config_init().
In fact, many PHY drivers call an aqr*_config_init() beneath their
generation's feature set: AQR114C is a Gen4 PHY which calls
aqr_gen2_config_init(), even though AQR113C, also a Gen4 PHY which
differs only in maximum link speed, calls the richer
aqr113c_config_init() which also sets phydev->possible_interfaces.
Many of the more subtle inconsistencies of this kind will be fixed up in
later changes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduced in commit bed90b06b6 ("net: phy: aquantia: clear PMD Global
Transmit Disable bit during init"), the clearing of MDIO_PMA_TXDIS plus
the call to aqr107_wait_processor_intensive_op() are only by chance
placed between aqr107_config_init() and aqr107_fill_interface_modes().
In other words, aqr107_fill_interface_modes() does not depend in any way
on these 2 operations.
I am only 90% sure of that, and I intend to move aqr107_fill_interface_modes()
to be a part of aqr107_config_init() in the future. So to isolate the
issue for blame attribution purposes, make these 2 functions adjacent to
each other again.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I'm unsure whether intentionate or not, but I think the (partially
observed) naming convention in this driver is that function prefixes
denote the earliest generation when a feature is available. In case of
aqr107_fill_interface_modes(), that means that the GLOBAL_CFG registers
are a Gen2 feature. Supporting evidence: the AQR105, a Gen1 PHY, does
not have these registers, thus the function is not named aqr105_*.
Based on this inferred naming scheme, I am proposing a refinement of
commit a7f3abcf63 ("net: phy: aquantia: only poll GLOBAL_CFG regs on
aqr113, aqr113c and aqr115c") which introduced aqr113c_fill_interface_modes(),
suggesting this may be a Gen4 PHY feature.
The long-term goal is for aqr107_config_init() to tail-call
aqr107_fill_interface_modes(), such that the latter function is also
called by AQR107 itself, and many other PHY drivers. Currently it can't,
because aqr113c_config_init() calls aqr107_config_init() and then
aqr113c_fill_interface_modes(). So this would lead to a duplicate call
to aqr107_fill_interface_modes() for AQR113C.
Centralize the reading of GLOBAL_CFG registers in the AQR107 method, and
create a boolean, set to true by AQR113C, which tests whether waiting
for a non-zero value in the GLOBAL_CFG_100M register is necessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I have noticed from schematics and firmware images that the PHY for
which I've previously added support in commit 973fbe68df ("net: phy:
aquantia: add AQR112 and AQR412 PHY IDs") is actually an AQR412C, not
AQR412.
These are actually PHYs from the same generation, and Marvell documents
them as differing only in the size of the FCCSP package: 19x19 mm for
the AQR412, vs 14x12mm for the Compact AQR412C.
I don't think there is any point in backporting this to stable kernels,
since the PHYs are identical in capabilities, and no functional
difference is expected regardless of how the PHY is identified.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu says:
====================
net: wangxun: complete ethtool coalesce options
Support to use adaptive RX coalescing. Change the default RX coalesce
usecs and limit the range of parameters for various types of devices,
according to their hardware design.
====================
Link: https://patch.msgid.link/20250821023408.53472-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change the default RX/TX ITR for wx_mac_em devices from 20K to 7K, which
is an experience value from out-of-tree ngbe driver, to get higher
performance on some platforms with weak single-core performance.
TCP_SRTEAM test on Phytium 2000+ shows that the throughput of 64-Byte
packets is increased from 350.53Mbits/s to 395.92Mbits/s.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250821023408.53472-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fan Gong says:
====================
net: hinic3: Add a driver for Huawei 3rd gen NIC - management interfaces
This is the 2/3 patch of the patch-set described below.
The patch-set contains driver for Huawei's 3rd generation HiNIC
Ethernet device that will be available in the future.
This is an SRIOV device, designed for data centers.
Initially, the driver only supports VFs.
Following the discussion over RFC01, the code will be submitted in
separate smaller patches where until the last patch the driver is
non-functional. The RFC02 submission contains overall view of the entire
driver but every patch will be posted as a standalone submission.
====================
Link: https://patch.msgid.link/cover.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dimitri Daskalakis says:
====================
selftests: Test XDP_TX for single-buffer
Ensure single buffer XDP functions correctly by covering the following cases:
1) Zero size payload
2) Full MTU
3) Single buffer packets through a multi-buffer XDP program
These changes were tested with netdevsim and fbnic.
# ./ksft-net-drv/drivers/net/xdp.py
TAP version 13
1..10
ok 1 xdp.test_xdp_native_pass_sb
ok 2 xdp.test_xdp_native_pass_mb
ok 3 xdp.test_xdp_native_drop_sb
ok 4 xdp.test_xdp_native_drop_mb
ok 5 xdp.test_xdp_native_tx_sb
ok 6 xdp.test_xdp_native_tx_mb
# Failed run: pkt_sz 2048, offset 1. Last successful run: pkt_sz 1024, offset 256. Reason: Adjustment failed
ok 7 xdp.test_xdp_native_adjst_tail_grow_data
ok 8 xdp.test_xdp_native_adjst_tail_shrnk_data
# Failed run: pkt_sz 512, offset -256. Last successful run: pkt_sz 512, offset -128. Reason: Adjustment failed
ok 9 xdp.test_xdp_native_adjst_head_grow_data
# Failed run: pkt_sz (2048) > HDS threshold (1536) and offset 64 > 48
ok 10 xdp.test_xdp_native_adjst_head_shrnk_data
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0
====================
Link: https://patch.msgid.link/20250821014023.1481662-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NIX block receives traffic from multiple channels, including:
MAC block (RPM)
Loopback module (LBK)
CPT block
RPM
|
-----------------
LBK --| NIX |
-----------------
|
CPT
Due to a hardware errata, CN10k and earlier Octeon silicon series,
the hardware may incorrectly assert XOFF on certain channels during
reset. As a workaround, a write operation to the NIX_AF_RX_CHANX_CFG
register can be performed to broadcast XON signals on the affected
channels
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250820064625.1464361-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Declare PP_FLAG_ALLOW_UNREADABLE_NETMEM to turn on unreadable netmem
support in GVE.
We also drop any net_iov packets where header split is not enabled.
We're unable to process packets where the header landed in unreadable
netmem.
Use page_pool_dma_sync_netmem_for_cpu in lieu of
dma_sync_single_range_for_cpu to correctly handle unreadable netmem
that should not be dma-sync'd.
Disable rx_copybreak optimization if payload is unreadable netmem as
that needs access to the payload.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250818210507.3781705-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
netfilter: updates for net-next
First patch gets rid of refcounting for dying list dumping,
use a cookie value instead of keeping the object around.
Remaining patches extend nftables pipapo (concatenated ranges) set type.
Make the AVX2 optimized version available from the control plane as
well, then use it during insert. This gives a nice speedup for large
sets. All from myself.
On PREEMPT_RT, we can't rely on local_bh_disable to protect the
access to the percpu scratch maps. Use nested-BH locking for this,
From Sebastian Siewior.
* tag 'nf-next-25-08-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch
netfilter: nft_set_pipapo: Store real pointer, adjust later.
netfilter: nft_set_pipapo: use avx2 algorithm for insertions too
netfilter: nft_set_pipapo_avx2: split lookup function in two parts
netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection
netfilter: ctnetlink: remove refcounting in dying list dumping
====================
Link: https://patch.msgid.link/20250820144738.24250-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xichao Zhao says:
====================
net: Remove the use of dev_err_probe()
The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.
====================
Link: https://patch.msgid.link/20250820085749.397586-1-zhao.xichao@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-08-21
We've added 9 non-merge commits during the last 3 day(s) which contain
a total of 13 files changed, 1027 insertions(+), 27 deletions(-).
The main changes are:
1) Added bpf dynptr support for accessing the metadata of a skb,
from Jakub Sitnicki.
The patches are merged from a stable branch bpf-next/skb-meta-dynptr.
The same patches have also been merged into bpf-next/master.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Cover metadata access from a modified skb clone
selftests/bpf: Cover read/write to skb metadata at an offset
selftests/bpf: Cover write access to skb metadata via dynptr
selftests/bpf: Cover read access to skb metadata via dynptr
selftests/bpf: Parametrize test_xdp_context_tuntap
selftests/bpf: Pass just bpf_map to xdp_context_test helper
selftests/bpf: Cover verifier checks for skb_meta dynptr type
bpf: Enable read/write access to skb metadata through a dynptr
bpf: Add dynptr type for skb metadata
====================
Link: https://patch.msgid.link/20250821191827.2099022-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth.
Current release - fix to a fix:
- usb: asix_devices: fix PHY address mask in MDIO bus initialization
Current release - regressions:
- Bluetooth: fixes for the split between BIS_LINK and PA_LINK
- Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN
flag", breaks compatibility with some existing device tree blobs
- dsa: b53: fix reserved register access in b53_fdb_dump()
Current release - new code bugs:
- sched: dualpi2: run probability update timer in BH to avoid
deadlock
- eth: libwx: fix the size in RSS hash key population
- pse-pd: pd692x0: improve power budget error paths and handling
Previous releases - regressions:
- tls: fix handling of zero-length records on the rx_list
- hsr: reject HSR frame if skb can't hold tag
- bonding: fix negotiation flapping in 802.3ad passive mode
Previous releases - always broken:
- gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM
- sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit,
avoid packet drops with low buffer_limit, remove unnecessary WARN()
- sched: fix backlog accounting after modifying config of a qdisc in
the middle of the hierarchy
- mptcp: improve handling of skb extension allocation failures
- eth: mlx5:
- fixes for the "HW Steering" flow management method
- fixes for QoS and device buffer management"
* tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
netfilter: nf_reject: don't leak dst refcount for loopback packets
net/mlx5e: Preserve shared buffer capacity during headroom updates
net/mlx5e: Query FW for buffer ownership
net/mlx5: Restore missing scheduling node cleanup on vport enable failure
net/mlx5: Fix QoS reference leak in vport enable error path
net/mlx5: Destroy vport QoS element when no configuration remains
net/mlx5e: Preserve tc-bw during parent changes
net/mlx5: Remove default QoS group and attach vports directly to root TSAR
net/mlx5: Base ECVF devlink port attrs from 0
net: pse-pd: pd692x0: Skip power budget configuration when undefined
net: pse-pd: pd692x0: Fix power budget leak in manager setup error path
Octeontx2-af: Skip overlap check for SPI field
selftests: tls: add tests for zero-length records
tls: fix handling of zero-length records on the rx_list
net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision
selftests: bonding: add test for passive LACP mode
bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU
bonding: update LACP activity flag after setting lacp_active
Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag"
ipv6: sr: Fix MAC comparison to be constant-time
...
recent patches to add a WARN() when replacing skb dst entry found an
old bug:
WARNING: include/linux/skbuff.h:1165 skb_dst_check_unset include/linux/skbuff.h:1164 [inline]
WARNING: include/linux/skbuff.h:1165 skb_dst_set include/linux/skbuff.h:1210 [inline]
WARNING: include/linux/skbuff.h:1165 nf_reject_fill_skb_dst+0x2a4/0x330 net/ipv4/netfilter/nf_reject_ipv4.c:234
[..]
Call Trace:
nf_send_unreach+0x17b/0x6e0 net/ipv4/netfilter/nf_reject_ipv4.c:325
nft_reject_inet_eval+0x4bc/0x690 net/netfilter/nft_reject_inet.c:27
expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline]
..
This is because blamed commit forgot about loopback packets.
Such packets already have a dst_entry attached, even at PRE_ROUTING stage.
Instead of checking hook just check if the skb already has a route
attached to it.
Fixes: f53b9b0bdc ("netfilter: introduce support for reject at prerouting stage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250820123707.10671-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When port buffer headroom changes, port_update_shared_buffer()
recalculates the shared buffer size and splits it in a 3:1 ratio
(lossy:lossless) - Currently, the calculation is:
lossless = shared / 4;
lossy = (shared / 4) * 3;
Meaning, the calculation dropped the remainder of shared % 4 due to
integer division, unintentionally reducing the total shared buffer
by up to three cells on each update. Over time, this could shrink
the buffer below usable size.
Fix it by changing the calculation to:
lossless = shared / 4;
lossy = shared - lossless;
This retains all buffer cells while still approximating the
intended 3:1 split, preventing capacity loss over time.
While at it, perform headroom calculations in units of cells rather than
in bytes for more accurate calculations avoiding extra divisions.
Fixes: a440030d89 ("net/mlx5e: Update shared buffer along with device buffer changes")
Signed-off-by: Armen Ratner <armeng@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250820133209.389065-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The SW currently saves local buffer ownership when setting
the buffer.
This means that the SW assumes it has ownership of the buffer
after the command is set.
If setting the buffer fails and we remain in FW ownership,
the local buffer ownership state incorrectly remains as SW-owned.
This leads to incorrect behavior in subsequent PFC commands,
causing failures.
Instead of saving local buffer ownership in SW,
query the FW for buffer ownership when setting the buffer.
This ensures that the buffer ownership state is accurately
reflected, avoiding the issues caused by incorrect ownership
states.
Fixes: ecdf2dadee ("net/mlx5e: Receive buffer support for DCBX")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>