Dong Yibo says:
====================
Add driver for 1Gbe network chips from MUCSE
This patch series adds support for MUCSE RNPGBE 1Gbps PCIe Ethernet controllers
(N500/N210 series), including build infrastructure, hardware initialization,
mailbox (MBX) communication with firmware, and basic netdev registration
(Can show mac witch is got from firmware, and tx/rx will be added later).
Series breakdown (5 patches):
01/05: net: ethernet/mucse: Add build support for rnpgbe
- Kconfig/Makefile for MUCSE vendor, basic PCI probe (no netdev)
02/05: net: ethernet/mucse: Add N500/N210 chip support
- netdev allocation, BAR mapping
03/05: net: ethernet/mucse: Add basic MBX ops for PF-FW communication
- base read/write, write with poll ack, poll and read data
04/05: net: ethernet/mucse: Add FW commands (sync, reset, MAC query)
- FW sync retry logic, MAC address retrieval, reset hw with
base mbx ops in patch4
05/05: net: ethernet/mucse: Complete netdev registration
- HW reset, MAC setup, netdev_ops registration
====================
Link: https://patch.msgid.link/20251101013849.120565-1-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Complete the network device (netdev) registration flow for Mucse Gbe
Ethernet chips, including:
1. Hardware state initialization:
- Send powerup notification to firmware (via echo_fw_status)
- Sync with firmware
- Reset hardware
2. MAC address handling:
- Retrieve permanent MAC from firmware (via mucse_mbx_get_macaddr)
- Fallback to random valid MAC (eth_random_addr) if not valid mac
from Fw
Signed-off-by: Dong Yibo <dong100@mucse.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251101013849.120565-6-dong100@mucse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert TI NetCP driver to use ndo_hwtstamp_get()/ndo_hwtstamp_set()
callbacks. The logic is slightly changed, because I believe the original
logic was not really correct. Config reading part is using the very
first module to get the configuration instead of iterating over all of
them and keep the last one as the configuration is supposed to be identical
for all modules. HW timestamp config set path is now trying to configure
all modules, but in case of error from one module it adds extack
message. This way the configuration will be as synchronized as possible.
There are only 2 modules using netcp core infrastructure, and both use
the very same function to configure HW timestamping, so no actual
difference in behavior is expected.
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251103172902.3538392-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vadim Fedorenko says:
====================
convert drivers to use ndo_hwtstamp callbacks part 3 [part]
This patchset converts the rest of ethernet drivers to use ndo callbacks
instead ioctl to configure and report time stamping. The drivers in part
3 originally implemented only SIOCSHWTSTAMP command, but converted to
also provide configuration back to users.
====================
Link: https://patch.msgid.link/20251103150952.3538205-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver implemented SIOCSHWTSTAMP ioctl command only, but it stores
configuration in the private data, so it is possible to report it back
to users. Implement both ndo_hwtstamp_set and ndo_hwtstamp_get
callbacks. To properly report RX filter type, store it in hwts_rx_en
instead of using this field as a simple flag. The logic didn't change
because receive path used this field as boolean flag.
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251103150952.3538205-7-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: pm: in-kernel: fullmesh endp nb + bind cases
Here is a small optimisation for the in-kernel PM, joined by a small
behavioural change to avoid confusions, and followed by a few more
tests.
- Patch 1: record fullmesh endpoints numbers, not to iterate over all
endpoints to check if one is marked as fullmesh.
- Patch 2: when at least one endpoint is marked as fullmesh, only use
these endpoints when reacting to an ADD_ADDR, even if there are no
endpoints for this IP family: this is less confusing.
- Patch 3: reduce duplicated code to prepare the next patch.
- Patch 4: extra "bind" cases: the listen socket restrict the bind to
one IP address, not allowing MP_JOIN to extra IP addresses, except if
another listening socket accepts them.
====================
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-0-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By design, an MPTCP connection will not accept extra subflows where no
MPTCP listening sockets can accept such requests.
In other words, it means that if the 'server' listens on a specific
address / device, it cannot accept MP_JOIN sent to a different address /
device. Except if there is another MPTCP listening socket accepting
them.
This is what the new tests are validating:
- Forcing a bind on the main v4/v6 address, and checking that MP_JOIN
to announced addresses are not accepted.
- Also forcing a bind on the main v4/v6 address, but before, another
listening socket is created to accept additional subflows. Note that
'mptcpize run nc -l' -- or something else only doing: socket(MPTCP),
bind(<IP>), listen(0) -- would be enough, but here mptcp_connect is
reused not to depend on another tool just for that.
- Same as the previous one, but using v6 link-local addresses: this is
a bit particular because it is required to specify the outgoing
network interface when connecting to a link-local address announced
by the other peer. When using the routing rules, this doesn't work
(the outgoing interface is not known) ; but it does work with a
'laminar' endpoint having a specified interface.
Note that extra small modifications are needed for these tests to work:
- mptcp_connect's check_getpeername_connect() check should strip the
specified interface when comparing addresses.
- With IPv6 link-local addresses, it is required to wait for them to
be ready (no longer in 'tentative' mode) before using them, otherwise
the bind() will not be allowed.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/591
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-4-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Our documentation is saying that the in-kernel PM is only using fullmesh
endpoints to establish subflows to announced addresses when at least one
endpoint has a fullmesh flag. But this was not totally correct: only
fullmesh endpoints were used if at least one endpoint *from the same
address family as the received ADD_ADDR* has the fullmesh flag.
This is confusing, and it seems clearer not to have differences
depending on the address family.
So, now, when at least one MPTCP endpoint has a fullmesh flag, the local
addresses are picked from all fullmesh endpoints, which might be 0 if
there are no endpoints for the correct address family.
One selftest needs to be adapted for this behaviour change.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-2-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of iterating over all endpoints, under RCU read lock, just to
check if one of them as the fullmesh flag, we can keep a counter of
fullmesh endpoint, similar to what is done with the other flags.
This counter is now checked, before iterating over all endpoints.
Similar to the other counters, this new one is also exposed. A userspace
app can then know when it is being used in a fullmesh mode, with
potentially (too) many subflows.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-1-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan says:
====================
net/mlx5e: Reduce interface downtime on configuration change
This series significantly reduces the interface downtime while swapping
channels during a configuration change, on capable devices.
Here we remove an old requirement on operations ordering that became
obsolete on recent capable devices. This helps cutting the downtime by a
factor of magnitude, ~80% in our example.
Perf numbers:
Measured the number of dropped packets in a simple ping flood test,
during a configuration change operation, that switches the number of
channels from 247 to 248.
Before: 71 packets lost
After: 15 packets lost, ~80% saving.
====================
Link: https://patch.msgid.link/1761831159-1013140-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cap bit tis_tir_td_order=1 indicates that an old firmware requirement /
limitation no longer exists. When unset, the latency of several firmware
commands significantly increases with the presence of high number of
co-existing channels (both old and new sets). Hence, we used to close
unneeded old channels before invoking those firmware commands.
Today, on capable devices, this is no longer the case. Minimize the
interface down time by deferring the old channels closure, after the
activation of the new ones.
Perf numbers:
Measured the number of dropped packets in a simple ping flood test,
during a configuration change operation, that switches the number of
channels from 247 to 248.
Before: 71 packets lost
After: 15 packets lost, ~80% saving.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On old firmware, (tis_tir_td_order=0), TIR of a transport domain should
either be created after all SQs of the same domain, or TIR.self_lb_en
should be reapplied using MODIFY_TIR, for self loopback filtering to
function correctly.
This is not necessary anymnore on new FW (tis_tir_td_order=1), thus
there's no need for calling modify_tir operations after creating a new
set of SQs to maintain the self loopback prevention functional.
Skip these operations.
This saves O(max_num_channels) MODIFY_TIR firmware commands in
operations like interface up or channels configuration change.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In IPoIB, the self loopback prevention configuration apply in activation
stage has two roles: fulfill a firmware requirement for old firmware
(tis_tir_td_order=0), and update the proper configuration as it was not
set in init.
Here we set the proper configuration in init, to allow skipping the
modify_tirs commands on new firmware in a downstream patch.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The re-application of self loopback prevention attributes in TIRs is
necessary in old firmwares (where tis_tir_td_order cap is cleared) after
recreation of SQs.
However, this is not needed in new firmware with tis_tir_td_order=1.
As a preparation patch, enhance the function structures to differentiate
between an explicit loopback prevention configuration apply, and the
re-apply operation required by old firmware.
Loopback selftests should now call mlx5e_modify_tirs_lb() directly, as
their use case is not related to the firmware limitation.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1761831159-1013140-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original comments contained spelling errors and incomplete logical
descriptions, which could easily lead to misunderstandings of the code
logic. The specific modifications are as follows:
Correct the spelling error by changing "inut max" to "but not exceed the
maximum limit";
Add the note "If the user has not specified a value, the default maximum
limit is 8" to clarify the default value logic;
Improve the coherence of the statement to make the queue quantity rules
clearer.
After the modification, the comments can accurately reflect the code
behavior of "taking the smaller value between the number of CPUs and the
default maximum limit of 8 for the number of queues", enhancing code
maintainability.
Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://patch.msgid.link/20251103032212.2462-1-chuguangqing@inspur.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: stmmac: multi-interface stmmac
This series adds a callback for platform glue to configure the stmmac
core interface mode depending on the PHY interface mode that is being
used. This is currently only called just before the dwmac core is reset
since these signals are latched on reset.
Included in this series are changes to s32 to move its PHY_INTF_SEL_x
definitions out of the way of the dwmac core's signals which has more
entitlement to use this name. We convert dwmac-imx as an example.
Including other platform glue would make this series excessively large,
but once this core code is merged, the individual platform glue updates
can be posted one after another as they will be independent of each
other.
It is hoped that this callback can be used in future to reconfigure the
dwmac core when the interface mode changes to support PHYs that change
their interface mode, but we're nowhere near being able to do that yet.
====================
Link: https://patch.msgid.link/aQiWzyrXU_2hGJ4j@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
i.MX implementations other than IMX8DXL involve setting the dwmac core
phy_intf_sel input. Use stmmac_get_phy_intf_sel() to decode the PHY
interface mode to the phy_intf_sel value, validating the result, and
passing it into the implementation specific .set_intf_mode() method
rather than each .set_intf_mode() method doing this.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4x-0000000ChpA-1Edr@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When dwmac is synthesised with support for multiple PHY interfaces, the
core provides phy_intf_sel inputs, sampled on reset, to configure the
PHY facing interface. Use stmmac_get_phy_intf_sel() in core code to
determine the dwmac phy_intf_sel input value, and provide a new
platform method called with this value just before we issue a soft
reset to the dwmac core.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4h-0000000Chos-3wxX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a function to translate the PHY interface mode to the
phy_intf_sel pin configuration for dwmac1000 and dwmac4 cores that
support multiple interfaces. We currently handle MII, GMII, RGMII,
SGMII, RMII and REVMII, but not TBI, RTBI nor SMII as drivers do not
appear to use these three and the driver doesn't currently support
these.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4c-0000000Choe-3SII@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
deliver_skb() should not be inlined as is it not called
in the fast path.
Add unlikely() clauses giving hints to the compiler about this fact.
Before this patch:
size net/core/dev.o
text data bss dec hex filename
121794 13330 176 135300 21084 net/core/dev.o
__netif_receive_skb_core() size on x86_64 : 4080 bytes.
After:
size net/core/dev.o
text data bss dec hex filenamee
120330 13338 176 133844 20ad4 net/core/dev.o
__netif_receive_skb_core() size on x86_64 : 2781 bytes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251103165256.1712169-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing says:
====================
xsk: minor optimizations around locks
Two optimizations regarding xsk_tx_list_lock and cq_lock can yield a
performance increase because of avoiding disabling and enabling
interrupts frequently.
====================
Link: https://patch.msgid.link/20251030000646.18859-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
- Split cq_lock into two smaller locks: cq_prod_lock and
cq_cached_prod_lock
- Avoid disabling/enabling interrupts in the hot xmit path
In either xsk_cq_cancel_locked() or xsk_cq_reserve_locked() function,
the race condition is only between multiple xsks sharing the same
pool. They are all in the process context rather than interrupt context,
so now the small lock named cq_cached_prod_lock can be used without
handling interrupts.
While cq_cached_prod_lock ensures the exclusive modification of
@cached_prod, cq_prod_lock in xsk_cq_submit_addr_locked() only cares
about @producer and corresponding @desc. Both of them don't necessarily
be consistent with @cached_prod protected by cq_cached_prod_lock.
That's the reason why the previous big lock can be split into two
smaller ones. Please note that SPSC rule is all about the global state
of producer and consumer that can affect both layers instead of local
or cached ones.
Frequently disabling and enabling interrupt are very time consuming
in some cases, especially in a per-descriptor granularity, which now
can be avoided after this optimization, even when the pool is shared by
multiple xsks.
With this patch, the performance number[1] could go from 1,872,565 pps
to 1,961,009 pps. It's a minor rise of around 5%.
[1]: taskset -c 1 ./xdpsock -i enp2s0f1 -q 0 -t -S -s 64
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20251030000646.18859-3-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The commit ac98d8aab6 ("xsk: wire upp Tx zero-copy functions")
originally introducing this lock put the deletion process in the
sk_destruct which can run in irq context obviously, so the
xxx_irqsave()/xxx_irqrestore() pair was used. But later another
commit 541d7fdd76 ("xsk: proper AF_XDP socket teardown ordering")
moved the deletion into xsk_release() that only happens in process
context. It means that since this commit, it doesn't necessarily
need that pair.
Now, there are two places that use this xsk_tx_list_lock and only
run in the process context. So avoid manipulating the irq then.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20251030000646.18859-2-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>