Dynamically allocate async ICOSQ. ICO (Internal Communication
Operations) is for driver to communicate with the HW, and it's
not used for traffic. Currently mlx5 driver has sync and async
ICO send queues. The async ICOSQ means that it's not necessarily
under NAPI context protection. The patch is in preparation for
the later patch to detect its usage and enable it when necessary.
Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768376800-1607672-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before the cited commit, ICOSQ is used to post NOP WQE to trigger
hardware interrupt and start NAPI, but this mechanism suffers from
a race condition: mlx5e_alloc_rx_mpwqe may post UMR WQEs to ICOSQ
_before_ NOP WQE is posted. The cited commit fixes the issue by
replacing ICOSQ with async ICOSQ, as a new way to post the NOP WQE
to trigger the hardware interrupt and NAPI.
The patch changes it back by replacing async ICOSQ with regular
ICOSQ, for the purpose of saving memory in later patches, and solves
the issue by adding a new SQ state, MLX5E_SQ_STATE_LOCK_NEEDED
for syncing the start of NAPI.
What it does:
- Switch trigger path from async ICOSQ to regular ICOSQ to reduce
need for async SQ.
- Introduce MLX5E_SQ_STATE_LOCK_NEEDED and mlx5e_icosq_sync_lock(),
unlock() to prevent the race where UMR WQEs could be posted before
the NOP WQE used to trigger NAPI.
- Use synchronize_net() once per trigger cycle to quiesce in-flight
softirqs before serializing the NOP WQE and any UMR postings via
the ICOSQ lock.
- Wrap ICOSQ UMR posting in en_rx.c and xsk/rx.c with the new
conditional lock.
The conditional locking approach is critical for performance: always
locking would impose unnecessary overhead. Synchronization is not needed
between regular NAPI cycles once the channel is activated and running.
The lock is only required to protect against the race during channel
activation—specifically, when the very first NOP WQE is posted to trigger
NAPI. After that initial trigger, normal NAPI polling handles subsequent
work without contention. The MLX5E_SQ_STATE_LOCK_NEEDED flag ensures we
pay the synchronization cost only when necessary.
Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768376800-1607672-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the async_icosq spinlock from the mlx5e_channel structure into
the mlx5e_icosq structure itself for better encapsulation and for
later patch to also use it for other icosq use cases.
Changes:
- Add spinlock_t lock field to struct mlx5e_icosq
- Remove async_icosq_lock field from struct mlx5e_channel
- Initialize the new lock in mlx5e_open_icosq()
- Update all lock usage in ktls_rx.c and en_main.c to use sq->lock
instead of c->async_icosq_lock
Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768376800-1607672-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera says:
====================
dpll: support mode switching
This series adds support for switching the working mode (automatic vs
manual) of a DPLL device via netlink.
Currently, the DPLL subsystem allows userspace to retrieve the current
working mode but lacks the mechanism to configure it. Userspace is also
unaware of which modes a specific device actually supports, as it
currently assumes only the active mode is supported.
The series addresses these limitations by:
1. Introducing .supported_modes_get() callback to allow drivers to report
all modes capable of running on the device.
2. Introducing .mode_set() callback and updating the netlink policy
to allow userspace to request a mode change.
3. Implementing these callbacks in the zl3073x driver, enabling dynamic
switching between automatic and manual modes.
====================
Link: https://patch.msgid.link/20260114122726.120303-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for .supported_modes_get() and .mode_set() callbacks
to enable switching between manual and automatic modes via netlink.
Implement .supported_modes_get() to report available modes based
on the current hardware configuration:
* manual mode is always supported
* automatic mode is supported unless the dpll channel is configured
in NCO (Numerically Controlled Oscillator) mode
Implement .mode_set() to handle the specific logic required when
transitioning between modes:
1) Transition to manual:
* If a valid reference is currently active, switch the hardware
to ref-lock mode (force lock to that reference).
* If no reference is valid and the DPLL is unlocked, switch to freerun.
* Otherwise, switch to Holdover.
2) Transition to automatic:
* If the currently selected reference pin was previously marked
as non-selectable (likely during a previous manual forcing
operation), restore its priority and selectability in the hardware.
* Switch the hardware to Automatic selection mode.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Link: https://patch.msgid.link/20260114122726.120303-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, userspace can retrieve the DPLL working mode but cannot
configure it. This prevents changing the device operation, such as
switching from manual to automatic mode and vice versa.
Add a new callback .mode_set() to struct dpll_device_ops. Extend
the netlink policy and device-set command handling to process
the DPLL_A_MODE attribute. Update the netlink YAML specification
to include the mode attribute in the device-set operation.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260114122726.120303-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the DPLL subsystem assumes that the only supported mode is
the one currently active on the device. When dpll_msg_add_mode_supported()
is called, it relies on ops->mode_get() and reports that single mode
to userspace. This prevents users from discovering other modes the device
might be capable of.
Add a new callback .supported_modes_get() to struct dpll_device_ops. This
allows drivers to populate a bitmap indicating all modes supported by
the hardware.
Update dpll_msg_add_mode_supported() to utilize this new callback:
* if ops->supported_modes_get is defined, use it to retrieve the full
bitmap of supported modes.
* if not defined, fall back to the existing behavior: retrieve
the current mode via ops->mode_get and set the corresponding bit
in the bitmap.
Finally, iterate over the bitmap and add a DPLL_A_MODE_SUPPORTED netlink
attribute for every set bit, accurately reporting the device's capabilities
to userspace.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260114122726.120303-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Following warning is encountered when building selftests on powerpc/32.
CC csum
csum.c: In function 'recv_get_packet_csum_status':
csum.c:710:50: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
710 | error(1, 0, "cmsg: len=%lu expected=%lu",
| ~~^
| |
| long unsigned int
| %u
711 | cm->cmsg_len, CMSG_LEN(sizeof(struct tpacket_auxdata)));
| ~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
csum.c:710:63: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
710 | error(1, 0, "cmsg: len=%lu expected=%lu",
| ~~^
| |
| long unsigned int
| %u
cm->cmsg_len has type __kernel_size_t and CMSG() macro has the type
returned by sizeof() which is size_t.
size_t is 'unsigned int' on some platforms and 'unsigned long' on
other ones so use %zu instead of %lu.
The code in question was introduced by
commit 91a7de8560 ("selftests/net: add csum offload test").
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/8b69b40826553c1dd500d9d25e45883744f3f348.1768556791.git.chleroy@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Sverdlin says:
====================
dsa: mxl-gsw1xx: Support R(G)MII slew rate configuration
Maxlinear GSW1xx switches offer slew rate configuration bits for R(G)MII
interface. The default state of the configuration bits is "normal", while
"slow" can be used to reduce the radiated emissions. Add the support for
the latter option into the driver as well as the new DT bindings.
====================
Link: https://patch.msgid.link/20260114104509.618984-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says:
====================
ipv6: more data-race annotations
Inspired by one unrelated syzbot report.
This series adds missing (and boring) data-race annotations in IPv6.
Only the first patch adds sysctl_ipv6_flowlabel group
to speedup ip6_make_flowlabel() a bit.
====================
Link: https://patch.msgid.link/20260115094141.3124990-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Group together following struct netns_sysctl_ipv6 fields:
- flowlabel_consistency
- auto_flowlabels
- flowlabel_state_ranges
After this patch, ip6_make_flowlabel() uses a single cache line to fetch
auto_flowlabels and flowlabel_state_ranges (instead of two before the patch).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao says:
====================
net: convert drivers to .get_rx_ring_count (part 2)
Commit 84eaf4359c ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.
Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().
This simplifies the RX ring count retrieval and aligns the following
drivers with the new ethtool API for querying RX ring parameters.
* engleder/tsnep
* mediatek
* amazon/ena
* microchip/lan743x
* amd/xgbe
* chelsio/cxgb4
* wangxun/txgbe
* cadence/macb
All of these change were compile-tested only.
====================
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-0-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yohei Kojima says:
====================
selftests: net: improve error handling in passive TFO test
This series improves error handling in the passive TFO test by (1)
fixing a broken behavior when the child processes failed (or timed out),
and (2) adding more error handlng code in the test program.
The first patch fixes the behavior that the test didn't report failure
even if the server or the client process exited with non-zero status.
The second patch adds error handling code in the test program to improve
reliability of the test.
====================
Link: https://patch.msgid.link/cover.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle says:
====================
net: phy: realtek: simplify and reunify C22/C45 drivers
The RTL8221B PHY variants (VB-CG and VM-CG) were previously split into
separate C22 and C45 driver instances to support copper SFP modules
using the RollBall MDIO-over-I2C protocol, which only supports Clause-45
access. However, this split created significant code duplication and
complexity.
Commit 8af2136e77 ("net: phy: realtek: add helper
RTL822X_VND2_C22_REG") exposed that RealTek PHYs map all standard
Clause-22 registers into MDIO_MMD_VEND2 at offset 0xa400.
With commit 1850ec20d6 ("net: phy: realtek: use paged access for
MDIO_MMD_VEND2 in C22 mode") it is now possible to access all MMD
registers transparently, regardless of whether the PHY is accessed via
C22 or C45 MDIO.
Further improve the translation logic for this register mapping, so a
single unified driver works efficiently with both access methods,
reducing code duplication.
The series also includes cleanup to remove unnecessary paged operations
on registers that aren't actually affected by page selection.
Testing was done on RTL8211F and RTL8221B-VB-CG (the latter in both
C22 and C45 modes).
====================
Link: https://patch.msgid.link/cover.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reunify the split C22/C45 drivers for the RTL8221B-VB-CG 2.5Gbps and
RTL8221B-VM-CG 2.5Gbps PHYs back into a single driver.
This is possible now by using all the driver operations previously used
by the C45 driver, as transparent access to all MMDs including
MDIO_MMD_VEND2 is now possible also over Clause-22 MDIO.
The unified driver will still only use Clause-45 access on any Clause-45
capable busses while still working fine on Clause-22 busses.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/bffcb85fdc20e07056976962d3caaa1be5d0ddb0.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RealTek 2.5GE PHYs have all standard Clause-22 registers mapped also
inside MDIO_MMD_VEND2 at offset 0xa400. This is used mainly in case the
PHY is connected to a Clause-45-only bus. The RTL8221B is frequently
used in copper SFP module which uses the RollBall MDIO-over-I2C
method which *only* supports Clause-45, for example.
In order to support using the PHY on Clause-45-only busses, the PHY
driver has previously been split into a C22-only and C45-only instances,
creating quite a bit of redundancy and confusion.
In preparation of reunifying the two driver instances, add support for
translating MDIO_MMD_VEND2 registers 0xa400 to 0xa43c back to Clause-22
registers 0 to 30 in case the PHY is accessed on a Clause-22 bus.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/fd49d86bd0445b76269fd3ea456c709c2066683f.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dwmac4's transmit performance dropped by a factor of four due to an
incorrect assumption about which definitions are for what. This
highlights the need for sane register macros.
Commit 8409495bf6 ("net: stmmac: cores: remove many xxx_SHIFT
definitions") changed the way the txpbl value is merged into the
register:
value = readl(ioaddr + DMA_CHAN_TX_CONTROL(dwmac4_addrs, chan));
- value = value | (txpbl << DMA_BUS_MODE_PBL_SHIFT);
+ value = value | FIELD_PREP(DMA_BUS_MODE_PBL, txpbl);
With the following in the header file:
#define DMA_BUS_MODE_PBL BIT(16)
-#define DMA_BUS_MODE_PBL_SHIFT 16
The assumption here was that DMA_BUS_MODE_PBL was the mask for
DMA_BUS_MODE_PBL_SHIFT, but this turns out not to be the case.
The field is actually six bits wide, buts 21:16, and is called
TXPBL.
What's even more confusing is, there turns out to be a PBLX8
single bit in the DMA_CHAN_CONTROL register (0x1100 for channel 0),
and DMA_BUS_MODE_PBL seems to be used for that. However, this bit
et.al. was listed under a comment "/* DMA SYS Bus Mode bitmap */"
which is for register 0x1004.
Fix this up by adding an appropriately named field definition under
the DMA_CHAN_TX_CONTROL() register address definition.
Move the RPBL mask definition under DMA_CHAN_RX_CONTROL(), correctly
renaming it as well.
Also move the PBL bit definition under DMA_CHAN_CONTROL(), correctly
renaming it.
This removes confusion over the PBL fields.
Fixes: 8409495bf6 ("net: stmmac: cores: remove many xxx_SHIFT definitions")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Bisected-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://lore.kernel.org/51859704-57fd-4913-b09d-9ac58a57f185@bootlin.com
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vgY1k-00000003vOC-0Z1H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- ALE_VERSION_MAJOR/MINOR are no longer used following the transition to
regmaps in commit bbfc7e2b9e ("net: ethernet: ti: cpsw_ale: use
regfields for ALE registers")
- ALE_VERSION_IR3 is unused since entry mask bits are no longer
hardcoded with commit b5d31f2940 ("net: ethernet: ti: ale: optimize
ale entry mask bits configuartion")
- ALE_VERSION_IR4 has never been used since its introduction in commit
ca47130a74 ("net: netcp: ale: update to support unknown vlan
controls for NU switch")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Link: https://patch.msgid.link/20260114144425.3973272-1-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo pointed out that we can avoid separately allocating struct
cake_sched_config even in the non-mq case, by embedding it into struct
cake_sched_data. This reduces the complexity of the logic that swaps the
pointers and frees the old value, at the cost of adding 56 bytes to the
latter. Since cake_sched_data is already almost 17k bytes, this seems
like a reasonable tradeoff.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Fixes: bc0ce2bad3 ("net/sched: sch_cake: Factor out config variables into separate struct")
Link: https://patch.msgid.link/20260113143157.2581680-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expand the tls-offload.rst documentation to provide a more detailed
explanation of the asynchronous resync process, including the role
of struct tls_offload_resync_async in managing resync requests on
the kernel side.
Also, add documentation for helper functions
tls_offload_rx_resync_async_request_start/ _end/ _cancel.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1768298883-1602599-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thomas Weißschuh says:
====================
uapi: Use UAPI definitions of INT_MAX and INT_MIN
Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a
dependency on a libc, which UAPI headers should not do.
Introduce and use equivalent UAPI constants.
====================
Link: https://patch.msgid.link/20260113-uapi-limits-v2-0-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>