Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-26 (ice, idpf)
For ice:
Jake converts ring stats to utilize u64_stats APIs and performs some
cleanups along the way.
Alexander reorganizes layout of Tx and Rx rings for cacheline
locality and utilizes __cacheline_group* macros on the new layouts.
For idpf:
YiFei Zhu adds support for BPF kfunc reporting of hardware Rx timestamps.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: export RX hardware timestamping information to XDP
ice: reshuffle and group Rx and Tx queue fields by cachelines
ice: convert all ring stats to u64_stats_t
ice: shorten ring stat names and add accessors
ice: use u64_stats API to access pkts/bytes in dim sample
ice: remove ice_q_stats struct and use struct_group
ice: pass pointer to ice_fetch_u64_stats_per_ring
====================
Link: https://patch.msgid.link/20260126224313.3847849-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All drivers that need to report the RX ring count now implement the
get_rx_ring_count callback directly. Remove the legacy fallback path
that obtained this information by calling get_rxnfc with ETHTOOL_GRXRINGS.
This simplifies the code and makes get_rx_ring_count the only way
to retrieve the RX ring count.
Note: ethtool_get_rx_ring_count() returns int to allow returning
-EOPNOTSUPP, while the callback returns u32. The implicit conversion
is safe since RX ring counts will not exceed INT_MAX while we are still
alive.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260126-grxring_final-v1-1-0981cb24512e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman says:
====================
Single MSS length in UDP GSO_PARTIAL
This series addresses an inconsistency in how UDP GSO_PARTIAL handles
the UDP header length field.
Currently, when GSO_PARTIAL segmentation is used, the UDP header length
contains the large MSS size, requiring drivers to manually adjust it
before transmitting. This is inconsistent with how tunnel GSO_PARTIAL
handles outer headers in UDP tunnels, where the length is set to the
single segment size.
This was originally suggested by Alexander Duyck back in 2018:
https://lore.kernel.org/netdev/CAKgT0UcdnUWgr3KQ=RnLKigokkiUuYefmL-ePpDvJOBNpKScFA@mail.gmail.com/
The first patch fixes the core UDP offload code to set the UDP length
field to the single segment size (gso_size + UDP header) instead of the
large MSS size. This provides hardware with a proper template length
value for final segmentation.
The subsequent patches remove the now redundant UDP header length
adjustments from the mlx5e and aquantia drivers, as the core code now
handles this correctly.
I couldn't find any other drivers that support UDP GSO_PARTIAL; idpf
supports UDP segmentation, but it does not use GSO_PARTIAL.
====================
Link: https://patch.msgid.link/20260125121649.778086-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first byte of the Rx frame is a copy of the Rx status register, so
0x40 corresponds to RSR_MF (meaning the frame is multicast). Replace
0x40 with RSR_MF for clarity. (All other bits of the RSR indicate
errors. The fact that the driver ignores these errors will be fixed by
a later patch.)
The first byte of the status URB is a copy of the NSR, so 0x40
corresponds to NSR_LINKST. Replace 0x40 with NSR_LINKST for clarity.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Peter Korsgaard <peter@korsgaard.com>
Link: https://patch.msgid.link/20260124032248.26807-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that there are no users of the low-level SHA-1 interface, remove it.
Specifically:
- Remove SHA1_DIGEST_WORDS (no longer used)
- Remove sha1_init_raw() (no longer used)
- Rename sha1_transform() to sha1_block_generic() and make it static
- Move SHA1_WORKSPACE_WORDS into lib/crypto/sha1.c
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260123051656.396371-3-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's now a proper SHA-1 API that follows the usual conventions for
hash function APIs: sha1_init(), sha1_update(), sha1_final(), sha1().
The only remaining user of the older low-level SHA-1 API,
sha1_init_raw() and sha1_transform(), is ipv6_generate_stable_address().
I'd like to remove this older API, which is too low-level.
Unfortunately, ipv6_generate_stable_address() does in fact skip the
SHA-1 finalization for some reason. So the values it computes are not
standard SHA-1 values, and it sort of does want the low-level API.
Still, it's still possible to use the higher-level functions sha1_init()
and sha1_update() to get the same result, provided that the resulting
state is used directly, skipping sha1_final().
So, let's do that instead. This will allow removing the low-level API.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260123051656.396371-2-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rk3328 contains two GMAC instances - gmac2io and gmac2phy. While the
gmac2io instance may be connected to an external PHY, the gmac2phy
instance is permanently connected via RMII to an on-SoC integrated PHY.
The driver currently tests for the gmac2phy instance by checking
bsp_priv->integrated_phy (determined from the PHY's phy-is-integrated
property) and sometimes that the interface mode is RMII. This works
because the rk3328.dtsi has:
gmac2phy: ethernet@ff550000 {
compatible = "rockchip,rk3328-gmac";
phy-mode = "rmii";
phy-handle = <&phy>;
mdio {
phy: ethernet-phy@0 {
phy-is-integrated;
};
};
};
The driver contains a mechanism to look up the MMIO address in a table
to determine bsp_priv->id, which is used for every other Rockchip
device. Switch rk3328 to use this mechanism to determine bsp_priv->id
and use that to select which GRF register is used for configuration,
similarly to how the other Rockchip SoCs handle such differences.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vkL1t-00000005usQ-0vjt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In https://lore.kernel.org/netdev/aDne1Ybuvbk0AwG0@shell.armlinux.org.uk/
I requested that a follow-up patch to change the name of dwmac-rk's
phy_power_on() function, which clashes with the drivers/phy function
of the same name. This can cause confusion when grepping for this
function name, or when reviewing code. Thankfully, stmmac doesn't make
use of drivers/phy which saves this from compile errors.
However, as is the usual case when a request is made as part of a
review, if the review leads to successful application of the patch the
author doesn't bother following up with any such requests, and so the
problem falls back onto the reviewer to address... so here is the
solution.
Rename dwmac-rk's function to rk_phy_power_ctl(), as the function both
powers up and down.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vkL1i-00000005usD-3lhz@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TCP fast path can (auto)inline this helper, instead
of (auto)inling it from tcp_send_fin().
No change of overall code size, but tcp_sendmsg() is faster.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 1/1 up/down: 141/-140 (1)
Function old new delta
tcp_stream_alloc_skb 216 357 +141
tcp_send_fin 688 548 -140
Total: Before=22236729, After=22236730, chg +0.00%
BTW, we might change tcp_send_fin() to use tcp_stream_alloc_skb().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260123111605.4089200-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao says:
====================
extend bit width in the flow director of HNS3 driver
The bit widths of HCLGE_FD_AD_QID and HCLGE_FD_AD_COUNTER_NUM are
increased to support higher specifications.
Note: The hardware already supports the specifications.
====================
Link: https://patch.msgid.link/20260123094756.3718516-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, HCLGE_FD_AD_COUNTER_NUM has only 7 bits and supports a
maximum of 127 counter_id. However, there are actually scenarios
where the counter_id exceeds 127.
This patch adds an additional bit to HCLGE_FD_AD_QID to ensure
that counter_id greater than 127 are supported.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260123094756.3718516-3-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, HCLGE_FD_AD_QID has only 10 bits and supports a
maximum of 1023 queues. However, there are actually scenarios
where the queue_id exceeds 1023.
This patch adds an additional bit to HCLGE_FD_AD_QID to ensure
that queue_id greater than 1023 are supported.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260123094756.3718516-2-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle says:
====================
net: dsa: lantiq: add support for Intel GSW150
The Intel GSW150 Ethernet Switch (aka. Lantiq PEB7084) is the predecessor of
MaxLinear's GSW1xx series of switches. It shares most features, but has a
slightly different port layout and different MII interfaces.
Adding support for this switch to the mxl-gsw1xx driver is quite trivial.
====================
Link: https://patch.msgid.link/cover.1769099517.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The Lantiq GSWIP and MaxLinear GSW1xx drivers are currently relying on a
hard-coded mapping of MII ports to their respective MII_CFG and MII_PCDU
registers and only allow applying an offset to the port index.
While this is sufficient for the currently supported hardware, the very
similar Intel GSW150 (aka. Lantiq PEB7084) cannot be described using
this arrangement.
Introduce two arrays to specify the MII_CFG and MII_PCDU registers for
each port, replacing the current bitmap used to safeguard MII ports as
well as the port index offset.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/63fc01195196384f5e244a0ce9ec2ae3a6c08fe3.1769099517.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests that validate vsock sockets are resilient to deleting
namespaces. The vsock sockets should still function normally.
The function check_ns_delete_doesnt_break_connection() is added to
re-use the step-by-step logic of 1) setup connections, 2) delete ns,
3) check that the connections are still ok.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-12-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests to validate namespace correctness using vsock_test and socat.
The vsock_test tool is used to validate expected success tests, but
socat is used for expected failure tests. socat is used to ensure that
connections are rejected outright instead of failing due to some other
socket behavior (as tested in vsock_test). Additionally, socat is
already required for tunneling TCP traffic from vsock_test. Using only
one of the vsock_test tests like 'test_stream_client_close_client' would
have yielded a similar result, but doing so wouldn't remove the socat
dependency.
Additionally, check for the dependency socat. socat needs special
handling beyond just checking if it is on the path because it must be
compiled with support for both vsock and unix. The function
check_socat() checks that this support exists.
Add more padding to test name printf strings because the tests added in
this patch would otherwise overflow.
Add vm_dmesg_* helpers to encapsulate checking dmesg
for oops and warnings.
Add ability to pass extra args to host-side vsock_test so that tests
that cause false positives may be skipped with arg --skip.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-11-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests to verify CID collision rules across different vsock namespace
modes.
1. Two VMs with the same CID cannot start in different global namespaces
(ns_global_same_cid_fails)
2. Two VMs with the same CID can start in different local namespaces
(ns_local_same_cid_ok)
3. VMs with the same CID can coexist when one is in a global namespace
and another is in a local namespace (ns_global_local_same_cid_ok and
ns_local_global_same_cid_ok)
The tests ns_global_local_same_cid_ok and ns_local_global_same_cid_ok
make sure that ordering does not matter.
The tests use a shared helper function namespaces_can_boot_same_cid()
that attempts to start two VMs with identical CIDs in the specified
namespaces and verifies whether VM initialization failed or succeeded.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-10-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests for the /proc/sys/net/vsock/{ns_mode,child_ns_mode}
interfaces. Namely, that they accept/report "global" and "local" strings
and enforce their access policies.
Start a convention of commenting the test name over the test
description. Add test name comments over test descriptions that existed
before this convention.
Add a check_netns() function that checks if the test requires namespaces
and if the current kernel supports namespaces. Skip tests that require
namespaces if the system does not have namespace support.
This patch is the first to add tests that do *not* re-use the same
shared VM. For that reason, it adds a run_ns_tests() function to run
these tests and filter out the shared VM tests.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-9-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Replace /proc/net parsing with ss(8) for detecting listening sockets in
wait_for_listener() functions and add support for TCP, VSOCK, and Unix
socket protocols.
The previous implementation parsed /proc/net/tcp using awk to detect
listening sockets, but this approach could not support vsock because
vsock does not export socket information to /proc/net/.
Instead, use ss so that we can detect listeners on tcp, vsock, and unix.
The protocol parameter is now required for all wait_for_listener family
functions (wait_for_listener, vm_wait_for_listener,
host_wait_for_listener) to explicitly specify which socket type to wait
for.
ss is added to the dependency check in check_deps().
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-8-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add namespace support to vm management, ssh helpers, and vsock_test
wrapper functions. This enables running VMs and test helpers in specific
namespaces, which is required for upcoming namespace isolation tests.
The functions still work correctly within the init ns, though the caller
must now pass "init_ns" explicitly.
No functional changes for existing tests. All have been updated to pass
"init_ns" explicitly.
Affected functions (such as vm_start() and vm_ssh()) now wrap their
commands with 'ip netns exec' when executing commands in non-init
namespaces.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-6-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add functions for initializing namespaces with the different vsock NS
modes. Callers can use add_namespaces() and del_namespaces() to create
namespaces global0, global1, local0, and local1.
The add_namespaces() function initializes global0, local0, etc... with
their respective vsock NS mode by toggling child_ns_mode before creating
the namespace.
Remove namespaces upon exiting the program in cleanup(). This is
unlikely to be needed for a healthy run, but it is useful for tests that
are manually killed mid-test.
This patch is in preparation for later namespace tests.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-5-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Associate reply packets with the sending socket. When vsock must reply
with an RST packet and there exists a sending socket (e.g., for
loopback), setting the skb owner to the socket correctly handles
reference counting between the skb and sk (i.e., the sk stays alive
until the skb is freed).
This allows the net namespace to be used for socket lookups for the
duration of the reply skb's lifetime, preventing race conditions between
the namespace lifecycle and vsock socket search using the namespace
pointer.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-2-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add netns logic to vsock core. Additionally, modify transport hook
prototypes to be used by later transport-specific patches (e.g.,
*_seqpacket_allow()).
Namespaces are supported primarily by changing socket lookup functions
(e.g., vsock_find_connected_socket()) to take into account the socket
namespace and the namespace mode before considering a candidate socket a
"match".
This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to
report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode
for new namespaces.
Add netns functionality (initialization, passing to transports, procfs,
etc...) to the af_vsock socket layer. Later patches that add netns
support to transports depend on this patch.
This patch changes the allocation of random ports for connectible vsocks
in order to avoid leaking the random port range starting point to other
namespaces.
dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
modified to take a vsk in order to perform logic on namespace modes. In
future patches, the net will also be used for socket
lookups in these functions.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-1-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On 64bit arches, struct u64_stats_sync is empty and provides no help
against load/store tearing. Convert to u64_stats_t to ensure atomic
operations.
Note that does not mean the code is now tear-free: there're u32 counters
unprotected by u64_stats or anything else.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260123164841.2890054-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The __exit__ method receives ex_type as the exception class when an
exception occurs. The previous code used implicit boolean evaluation:
terminate = self.terminate or (self._exit_wait and ex_type)
^^^^^^^^^^^
In Python, the and operator can be used with non-boolean values, but it
does not always return a boolean result.
This is probably not what we want, because 'self._exit_wait and ex_type'
could return the actual ex_type value (the exception class) rather than
a boolean True when an exception occurs.
Use explicit `ex_type is not None` check to properly evaluate whether
an exception occurred, returning a boolean result.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260125105524.773993-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bits in register TxConfig used for chip identification aren't
sufficient for the number of upcoming chip versions. Therefore a register
is added with extended chip version information, for compatibility
purposes it's called TX_CONFIG_V2. First chip to use the extended chip
identification is RTL9151AS.
Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
[hkallweit1@gmail.com: add support for extended XID where XID is printed]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/a3525b74-a1aa-43f6-8413-56615f6fa795@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>