As long as the MSE102x is not operational, every packet transmission
will run into a TX timeout and flood the kernel log. So log only the
first TX timeout and a user is at least informed about this issue.
The amount of timeouts are still available via netstat.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240827191000.3244-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This macro has the advantage over SET_SYSTEM_SLEEP_PM_OPS that we don't
have to care about when the functions are actually used.
Also make use of pm_sleep_ptr() to discard all PM_SLEEP related
stuff if CONFIG_PM_SLEEP isn't enabled.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240827191000.3244-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change VMBus channels macro (VRSS_CHANNEL_DEFAULT) in
Linux netvsc from 8 to 16 to align with Azure Windows VM
and improve networking throughput.
For VMs having less than 16 vCPUS, the channels depend
on number of vCPUs. For greater than 16 vCPUs,
set the channels to maximum of VRSS_CHANNEL_DEFAULT and
number of physical cores / 2 which is returned by
netif_get_num_default_rss_queues() as a way to optimize CPU
resource utilization and scale for high-end processors with
many cores.
Maximum number of channels are by default set to 64.
Based on this change the channel creation would change as follows:
-----------------------------------------------------------------
| No. of vCPU | dev_info->num_chn | channels created |
-----------------------------------------------------------------
| 1-16 | 16 | vCPU |
| >16 | max(16,#cores/2) | min(64 , max(16,#cores/2)) |
-----------------------------------------------------------------
Performance tests showed significant improvement in throughput:
- 0.54% for 16 vCPUs
- 0.83% for 32 vCPUs
- 0.86% for 48 vCPUs
- 9.72% for 64 vCPUs
- 13.57% for 96 vCPUs
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/1724735791-22815-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says:
====================
tcp: take better care of tw_substate and tw_rcv_nxt
While reviewing Jason Xing recent commit (0d9e5df4a2 "tcp: avoid reusing
FIN_WAIT2 when trying to find port in connect() process") I saw
we could remove the volatile qualifier for tw_substate field,
and I also added missing data-race annotations around tcptw->tw_rcv_nxt.
====================
Link: https://patch.msgid.link/20240827015250.3509197-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the list_for_each_entry_rcu iteration call of xenvif_flush_hash,
kfree_rcu does not exist inside the rcu read critical section, so if
kfree_rcu is called when the rcu grace period ends during the iteration,
UAF occurs when accessing head->next after the entry becomes free.
Therefore, to solve this, you need to change it to list_for_each_entry_safe.
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Link: https://patch.msgid.link/20240822181109.2577354-1-aha310510@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-08-26 (ice)
This series contains updates to ice driver only.
Jake implements and uses rd32_poll_timeout to replace a jiffies loop for
calling ice_sq_done. The rd32_poll_timeout() function is designed to allow
simplifying other places in the driver where we need to read a register
until it matches a known value.
Jake, Bruce, and Przemek update ice_debug_cq() to be more robust, and more
useful for tracing control queue messages sent and received by the device
driver.
Jake rewords several commands in the ice_control.c file which previously
referred to the "Admin queue" when they were actually generic functions
usable on any control queue.
Jake removes the unused and unnecessary cmd_buf array allocation for send
queues. This logic originally was going to be useful if we ever implemented
asynchronous completion of transmit messages. This support is unlikely to
materialize, so the overhead of allocating a command buffer is unnecessary.
Sergey improves the log messages when the ice driver reports that the NVM
version on the device is not supported by the driver. Now, these messages
include both the discovered NVM version and the requested/expected NVM
version.
Aleksandr Mishin corrects overallocation of memory related to adding
scheduler nodes.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Adjust over allocation of memory in ice_sched_add_root_node() and ice_sched_add_node()
ice: Report NVM version numbers on mismatch during load
ice: remove unnecessary control queue cmd_buf arrays
ice: reword comments referring to control queues
ice: stop intermixing AQ commands/responses debug dumps
ice: do not clutter debug logs with unused data
ice: improve debug print for control queue messages
ice: implement and use rd32_poll_timeout for ice_sq_done timeout
====================
Link: https://patch.msgid.link/20240826224655.133847-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KSZ8895/KSZ8864 is a switch family between KSZ8863/73 and KSZ8795, so it
shares some registers and functions in those switches already
implemented in the KSZ DSA driver.
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Tested-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Liao Chen says:
====================
net: fix module autoloading
This patchset aims to enable autoloading of some net modules.
By registering MDT, the kernel is allowed to automatically bind
modules to devices that match the specified compatible strings.
====================
Link: https://patch.msgid.link/20240826091858.369910-1-liaochen4@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, of_get_ethdev_address() return is checked for any return error
code which means that trying to get the MAC from NVMEM cells that is backed
by MTD will fail if it was not probed before ag71xx.
So, lets check the return error code for EPROBE_DEFER and defer the ag71xx
probe in that case until the underlying NVMEM device is live.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240824200249.137209-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit da15c78b56 ("liquidio CN23XX: VF register access") declared
cn23xx_dump_vf_initialized_regs() but never implemented it.
octeon_dump_soft_command() is never implemented and used since introduction in
commit 35878618c9 ("liquidio: Added delayed work for periodically updating
the link statistics.").
And finally, a few other declarations were never implenmented since introduction
in commit f21fb3ed36 ("Add support of Cavium Liquidio ethernet adapters").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240824083107.3639602-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 4863dea3fa ("net: Adding support for Cavium ThunderX network
controller") declared nicvf_qset_reg_{write,read}() but never implemented.
Commit 4863dea3fa ("net: Adding support for Cavium ThunderX network
controller") declared bgx_add_dmac_addr() but no implementation.
After commit 5fc7cf1794 ("net: thunderx: Cleanup PHY probing code.")
octeon_mdiobus_force_mod_depencency() is not used any more.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240824082754.3637963-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmitry Safonov via says:
====================
net/selftests: TCP-AO selftests updates
First 3 patches are more-or-less cleanups/preparations.
Patches 4/5 are fixes for netns file descriptors leaks/open.
Patch 6 was sent to me/contributed off-list by Mohammad, who wants 32-bit
kernels to run TCP-AO.
Patch 7 is a workaround/fix for slow VMs. Albeit, I can't reproduce
the issue, but I hope it will fix netdev flakes for connect-deny-*
tests.
And the biggest change is adding TCP-AO tracepoints to selftests.
I think it's a good addition by the following reasons:
- The related tracepoints are now tested;
- It allows tcp-ao selftests to raise expectations on the kernel
behavior - up from the syscalls exit statuses + net counters.
- Provides tracepoints usage samples.
As tracepoints are not a stable ABI, any kernel changes done to them
will be reflected to the selftests, which also will allow users
to see how to change their code. It's quite better than parsing dmesg
(what BGP was doing pre-tracepoints, ugh).
Somewhat arguably, the code parses trace_pipe, rather than uses
libtraceevent (which any sane user should do). The reason behind that is
the same as for rt-netlink macros instead of libmnl: I'm trying
to minimize the library dependencies of the selftests. And the
performance of formatting text in kernel and parsing it again in a test
is not critical.
Current output sample:
> ok 73 Trace events matched expectations: 13 tcp_hash_md5_required[2] tcp_hash_md5_unexpected[4] tcp_hash_ao_required[3] tcp_ao_key_not_found[4]
Previously, tracepoints selftests were part of kernel tcp tracepoints
submission [1], but since then the code was quite changed:
- Now generic tracing setup is in lib/ftrace.c, separate from
lib/ftrace-tcp.c which utilizes TCP trace points. This separation
allows future selftests to trace non-TCP events, i.e. to find out
an skb's drop reason, which was useful in the creation of TCP-CLOSE
stress-test (not in this patch set, but used in attempt to reproduce
the issue from [2]).
- Another change is that in the previous submission the trace events
where used only to detect unexpected TCP-AO/TCP-MD5 events. In this
version the selftests will fail if an expected trace event didn't
appear.
Let's see how reliable this is on the netdev bot - it obviously passes
on my testing, but potentially may require a temporary XFAIL patch
if it misbehaves on a slow VM.
[1] https://lore.kernel.org/lkml/20240224-tcp-ao-tracepoints-v1-0-15f31b7f30a7@arista.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=33700a0c9b56
v3: https://lore.kernel.org/20240815-tcp-ao-selftests-upd-6-12-v3-0-7bd2e22bb81c@gmail.com
v2: https://lore.kernel.org/20240802-tcp-ao-selftests-upd-6-12-v2-0-370c99358161@gmail.com
v1: https://lore.kernel.org/20240730-tcp-ao-selftests-upd-6-12-v1-0-ffd4bf15d638@gmail.com
====================
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-0-05623636fe8c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Setup trace points, add a new ftrace instance in order to not interfere
with the rest of the system, filtering by net namespace cookies.
Raise a new background thread that parses trace_pipe, matches them with
the list of expected events.
Wiring up trace events to selftests provides another insight if there is
anything unexpected happining in the tcp-ao code (i.e. key rotation when
it's not expected).
Note: in real programs libtraceevent should be used instead of this
manual labor of setting ftrace up and parsing. I'm not using it here
as I don't want to have an .so library dependency that one would have to
bring into VM or DUT (Device Under Test). Please, don't copy it over
into any real world programs, that aren't tests.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-8-05623636fe8c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On tests that are expecting failure the timeout value is
TEST_RETRANSMIT_SEC == 1 second. Which is big enough for most of devices
under tests. But on a particularly slow machine/VM, 1 second might be
not enough for another thread to be scheduled and attempt to connect().
It is not a problem for tests that expect connect() to succeed as
the timeout value for them (TEST_TIMEOUT_SEC) is intentionally bigger.
One obvious way to solve this would be to increase TEST_RETRANSMIT_SEC.
But as all tests would increase the timeouts, that's going to sum up.
But here is less obvious way that keeps timeouts for expected connect()
failures low: just synchronize the two threads, which will assure that
before counter checks the other thread got a chance to run and timeout
on connect(). The expected increase of the related counter for listen()
socket will yet test the expected failure.
Never happens on my machine, but I suppose the majority of netdev's
connect-deny-* flakes [1] are caused by this.
Prevents the following testing issue:
> # selftests: net/tcp_ao: connect-deny_ipv6
> # 1..21
> # # 462[lib/setup.c:243] rand seed 1720905426
> # TAP version 13
> # ok 1 Non-AO server + AO client
> # not ok 2 Non-AO server + AO client: TCPAOKeyNotFound counter did not increase: 0 <= 0
> # ok 3 AO server + Non-AO client
> # ok 4 AO server + Non-AO client: counter TCPAORequired increased 0 => 1
...
[1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao/results/681741/6-connect-deny-ipv6/stdout
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-7-05623636fe8c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It's not safe to use '%zu' specifier for printing uint64_t on 32-bit
systems. For uint64_t, we should use the 'PRIu64' macro from
the inttypes.h library. This ensures that the uint64_t is printed
correctly from the selftests regardless of the system architecture.
Signed-off-by: Mohammad Nassiri <mnassiri@ciena.com>
[Added missing spaces in fail/ok messages and uint64_t cast in
setsockopt-closed, as otherwise it was giving warnings on 64bit.
And carried it to netdev ml]
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-6-05623636fe8c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It turns to be that open_netns() is called rarely from the child-thread
and more often from parent-thread. Yet, on initialization of kconfig
checks, either of threads may reach kconfig_lock mutex first.
VRF-related checks do create a temporary ksft-check VRF in
an unshare()'d namespace and than setns() back to the original.
As original was opened from "/proc/self/ns/net", it's valid for
thread-leader (parent), but it's invalid for the child, resulting
in the following failure on tests that check has_vrfs() support:
> # ok 54 TCP-AO required on socket + TCP-MD5 key: prefailed as expected: Key was rejected by service
> # not ok 55 # error 381[unsigned-md5.c:24] Failed to add a VRF: -17
> # not ok 56 # error 383[unsigned-md5.c:33] Failed to add a route to VRF: -22: Key was rejected by service
> not ok 1 selftests: net/tcp_ao: unsigned-md5_ipv6 # exit=1
Use "/proc/thread-self/ns/net" which is valid for any thread.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-4-05623636fe8c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Most of the functions in tcp-ao lib/ return negative errno or -1 in case
of a failure. That creates inconsistencies in lib/kconfig, which saves
what was the error code. As well as the uninitialized kconfig value is
-1, which also may be the result of a check.
Define KCONFIG_UNKNOWN and save negative return code, rather than
libc-style errno.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-3-05623636fe8c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of pre-allocating a fixed-sized buffer of TEST_MSG_BUFFER_SIZE
and printing into it, call vsnprintf() with str = NULL, which will
return the needed size of the buffer. This hack is documented in
man 3 vsnprintf.
Essentially, in C++ terms, it re-invents std::stringstream, which is
going to be used to print different tracing paths and formatted strings.
Use it straight away in __test_print() - which is thread-safe version of
printing in selftests.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-2-05623636fe8c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent commit fc7ec7f554 ("l2tp: delete sessions using work queue")
incorrectly uses drain_workqueue. The use of drain_workqueue in
l2tp_pre_exit_net is flawed because the workqueue is shared by all
nets and it is therefore possible for new work items to be queued
for other nets while drain_workqueue runs.
Instead of using drain_workqueue, use __flush_workqueue twice. The
first one will run all tunnel delete work items and any work already
queued. When tunnel delete work items are run, they may queue
new session delete work items, which the second __flush_workqueue will
run.
In l2tp_exit_net, warn if any of the net's idr lists are not empty.
Fixes: fc7ec7f554 ("l2tp: delete sessions using work queue")
Signed-off-by: James Chapman <jchapman@katalix.com>
Link: https://patch.msgid.link/20240823142257.692667-1-jchapman@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>