linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-03 08:50:17 -04:00

Author	SHA1	Message	Date
Michael Kelley	c86ab60b92	hv_netvsc: Don't assume cpu_possible_mask is dense Current code allocates the pcpu_sum array with size num_possible_cpus(). This code assumes the cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask is sparse, the array might be indexed by a value beyond the size of the array. However, the configurations that Hyper-V provides to guest VMs on x86 and ARM64 hardware, in combination with how architecture specific code assigns Linux CPU numbers, does always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to allocate and initialize the array using size "nr_cpu_ids". While this leaves unused array entries corresponding to holes in cpu_possible_mask, the holes are assumed to be minimal and hence the amount of memory wasted by unused entries is minimal. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20241003035333.49261-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 13:09:20 -07:00
Daniel Zahka	5c2ab978f9	ethtool: rss: fix rss key initialization warning This warning is emitted when a driver does not default populate an rss key when one is not provided from userspace. Some devices do not support individual rss keys per context. For these devices, it is ok to leave the key zeroed out in ethtool_rxfh_context. Do not warn on zeroed key when ethtool_ops.rxfh_per_ctx_key == 0. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20241003162310.1310576-1-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 12:34:13 -07:00
Jakub Kicinski	00110c5eeb	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-10-01 (ice) This series contains updates to ice driver only. Karol cleans up current PTP GPIO pin handling, fixes minor bugs, refactors implementation for all products, introduces SDP (Software Definable Pins) for E825C and implements reading SDP section from NVM for E810 products. Sergey replaces multiple aux buses and devices used in the PTP support code with struct ice_adapter holding the necessary shared data. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Drop auxbus use for PTP to finalize ice_adapter move ice: Use ice_adapter for PTP shared data instead of auxdev ice: Initial support for E825C hardware in ice_adapter ice: Add ice_get_ctrl_ptp() wrapper to simplify the code ice: Introduce ice_get_phy_model() wrapper ice: Enable 1PPS out from CGU for E825C products ice: Read SDP section from NVM for pin definitions ice: Disable shared pin on E810 on setfunc ice: Cache perout/extts requests and check flags ice: Align E810T GPIO to other products ice: Add SDPs support for E825C ice: Implement ice_ptp_pin_desc ==================== Link: https://patch.msgid.link/20241001201702.3252954-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 12:30:18 -07:00
Jakub Kicinski	a73f214e89	Merge branch 'add-option-to-provide-opt_id-value-via-cmsg' Vadim Fedorenko says: ==================== Add option to provide OPT_ID value via cmsg SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps and packets sent via socket. Unfortunately, there is no way to reliably predict socket timestamp ID value in case of error returned by sendmsg. For UDP sockets it's impossible because of lockless nature of UDP transmit, several threads may send packets in parallel. In case of RAW sockets MSG_MORE option makes things complicated. More details are in the conversation [1]. This patch adds new control message type to give user-space software an opportunity to control the mapping between packets and values by providing ID with each sendmsg. The first patch in the series adds all needed definitions and implements the function for UDP sockets. The explicit check of socket's type is not added because subsequent patches in the series will add support for other types of sockets. The documentation is also included into the first patch. Patch 2/4 adds support for TCP sockets. This part is simple and straight forward. Patch 3/4 adds support for RAW sockets. It's a bit tricky because sock_tx_timestamp functions has to be refactored to receive full socket cookie information to fill in ID. The commit `b534dc46c8` ("net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP") did the conversion of sk_tsflags to u32 but sock_tx_timestamp functions were not converted and still receive 16b flags. It wasn't a problem because SOF_TIMESTAMPING_OPT_ID_TCP was not checked in these functions, that's why no backporting is needed. Patch 4/4 adds selftests for new feature. [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/ ==================== Link: https://patch.msgid.link/20241001125716.2832769-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:23 -07:00
Vadim Fedorenko	a89568e9be	selftests: txtimestamp: add SCM_TS_OPT_ID test Extend txtimestamp test to run with fixed tskey using SCM_TS_OPT_ID control message for all types of sockets. Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-4-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:20 -07:00
Vadim Fedorenko	822b5bc6db	net_tstamp: add SCM_TS_OPT_ID for RAW sockets The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW sockets. To add new option this patch converts all callers (direct and indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of tsflags. And while here fix __sock_tx_timestamp to receive tsflags as __u32 instead of __u16. Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-3-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:19 -07:00
Vadim Fedorenko	4aecca4c76	net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps and packets sent via socket. Unfortunately, there is no way to reliably predict socket timestamp ID value in case of error returned by sendmsg. For UDP sockets it's impossible because of lockless nature of UDP transmit, several threads may send packets in parallel. In case of RAW sockets MSG_MORE option makes things complicated. More details are in the conversation [1]. This patch adds new control message type to give user-space software an opportunity to control the mapping between packets and values by providing ID with each sendmsg for UDP sockets. The documentation is also added in this patch. [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/ Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-2-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:19 -07:00
Jakub Kicinski	34ea1df802	Merge branch 'net-mlx5-hw-counters-refactor' Tariq Toukan says: ==================== net/mlx5: hw counters refactor This is a patchset re-post, see: https://lore.kernel.org/20240815054656.2210494-7-tariqt@nvidia.com In this patchset, Cosmin refactors hw counters and solves perf scaling issue. Series generated against: commit `c824deb1a8` ("cxgb4: clip_tbl: Fix spelling mistake "wont" -> "won't"") HW counters are central to mlx5 driver operations. They are hardware objects created and used alongside most steering operations, and queried from a variety of places. Most counters are queried in bulk from a periodic task in fs_counters.c. Counter performance is important and as such, a variety of improvements have been done over the years. Currently, counters are allocated from pools, which are bulk allocated to amortize the cost of firmware commands. Counters are managed through an IDR, a doubly linked list and two atomic single linked lists. Adding/removing counters is a complex dance between user contexts requesting it and the mlx5_fc_stats_work task which does most of the work. Under high load (e.g. from connection tracking flow insertion/deletion), the counter code becomes a bottleneck, as seen on flame graphs. Whenever a counter is deleted, it gets added to a list and the wq task is scheduled to run immediately to actually delete it. This is done via mod_delayed_work which uses an internal spinlock. In some tests, waiting for this spinlock took up to 66% of all samples. This series refactors the counter code to use a more straight-forward approach, avoiding the mod_delayed_work problem and making the code easier to understand. For that: - patch #1 moves counters data structs to a more appropriate place. - patch #2 simplifies the bulk query allocation scheme by using vmalloc. - patch #3 replaces the IDR+3 lists with an xarray. This is the main patch of the series, solving the spinlock congestion issue. - patch #4 removes an unnecessary cacheline alignment causing a lot of memory to be wasted. - patches #5 and #6 are small cleanups enabled by the refactoring. ==================== Link: https://patch.msgid.link/20241001103709.58127-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:48 -07:00
Cosmin Ratiu	d1c9cffe4b	net/mlx5: hw counters: Remove mlx5_fc_create_ex It no longer serves any purpose and is identical to mlx5_fc_create upon which it was originally based of. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:47 -07:00
Cosmin Ratiu	4a67ebf85f	net/mlx5: hw counters: Don't maintain a counter count num_counters is only used for deciding whether to grow the bulk query buffer, which is done once more counters than a small initial threshold are present. After that, maintaining num_counters serves no purpose. This commit replaces that with an actual xarray traversal to count the counters. This appears expensive at first sight, but is only done when the number of counters is less than the initial threshold (8) and only once every sampling interval. Once the number of counters goes above the threshold, the bulk query buffer is grown to max size and the xarray traversal is never done again. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:46 -07:00
Cosmin Ratiu	d95f77f119	net/mlx5: hw counters: Drop unneeded cacheline alignment The mlx5_fc struct has a cache for values queried from hw, which is cacheline aligned. On x86_64, this results in: struct mlx5_fc { u32 id; /* 0 4 / bool aging; / 4 1 / / XXX 3 bytes hole, try to pack / struct mlx5_fc_bulk bulk; /* 8 8 / / XXX 48 bytes hole, try to pack / / --- cacheline 1 boundary (64 bytes) --- / struct mlx5_fc_cache cache __attribute__((__aligned__(64))); / 64 24 / u64 lastpackets; / 88 8 / u64 lastbytes; / 96 8 / / size: 128, cachelines: 2, members: 6 / / sum members: 53, holes: 2, sum holes: 51 / / padding: 24 / / forced aligns: 1, forced holes: 1, sum forced holes: 48 / } __attribute__((__aligned__(64))); (output from pahole). ...So a 48+24=72 byte waste. As far as I can determine, this serves no purpose other than maybe making sure that the values in the cache do not span two cachelines in the worst case scenario, but that's not a valid enough reason to waste 72 bytes per counter, especially since this code is not performance-critical. There could potentially be hundreds of thousands of counters (e.g. for connection-tracking), so this quickly adds up to multiple MB wasted. This commit removes the alignment, resulting in: struct mlx5_fc { [...] / size: 56, cachelines: 1, members: 6 / / sum members: 53, holes: 1, sum holes: 3 / / last cacheline: 56 bytes */ }; Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:46 -07:00
Cosmin Ratiu	918af0219a	net/mlx5: hw counters: Replace IDR+lists with xarray Previously, managing counters was a complicated affair involving an IDR, a sorted double linked list, two single linked lists and a complex dance between a non-periodic wq task and users adding/deleting counters. Adding was done by inserting new counters into the IDR and into a single linked list, leaving the wq to process the list and actually add the counters into the double linked list, maintained sorted with the IDR. Deleting involved adding the counter into another single linked list, leaving the wq to actually unlink the counter from the other structures and release it. Dumping the counters is done with the bulk query API, which relies on the counter list being sorted and unmutable during querying to efficiently retrieve cached counter values. Finally, the IDR data struct is deprecated. This commit replaces all of that with an xarray. Adding is now done directly, by using xa_lock. Deleting is also done directly, under the xa_lock. Querying is done from a periodic task running every sampling_interval (default 1s) and uses the bulk query API for efficiency. It works by iterating over the xarray: - when a new bulk needs to be started, the bulk information is computed under the xa_lock. - the xa iteration state is saved and the xa_lock dropped. - the HW is queried for bulk counter values. - the xa_lock is reacquired. - counter caches with ids covered by the bulk response are updated. Querying always requests the max bulk length, for simplicity. Counters could be added/deleted while the HW is queried. This is safe, as the HW API simply returns unknown values for counters not in HW, but those values won't be accessed. Only counters present in xarray before bulk query will actually read queried cache values. This cuts down the size of mlx5_fc by 4 pointers (88->56 bytes), which amounts to ~3MB / 100K counters. But more importantly, this solves the wq spinlock congestion issue seen happening on high-rate counter insertion+deletion. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:46 -07:00
Cosmin Ratiu	10cd92df83	net/mlx5: hw counters: Use kvmalloc for bulk query buffer The bulk query buffer starts out small (see [1]) and as soon as the number of counters goes past the initial threshold grows to max size (32K entries, 512KB) with a retry scheme. This commit switches to using kvmalloc for the buffer, which has a near zero likelihood of failing, and thus the explicit retry scheme becomes superfluous and is taken out. On the low chance the allocation fails, it will still be retried every sampling_interval, when the wq task runs. [1] commit `b247f32aec` ("net/mlx5: Dynamically resize flow counters query buffer") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:46 -07:00
Cosmin Ratiu	5acd957a98	net/mlx5: hw counters: Make fc_stats & fc_pool private The mlx5_fc_stats and mlx5_fc_pool structs are only used from fs_counters.c. As such, make them private there. mlx5_fc_pool is not used or referenced at all outside fs_counters. mlx5_fc_stats is referenced from mlx5_core_dev, so instead of having it as a direct member (which requires exporting it from fs_counters), store a pointer to it, allocate it on init and clear it on destroy. One caveat is that a simple container_of to get from a 'work' struct to the outermost mlx5_core_dev struct directly no longer works, so an extra pointer had to be added to mlx5_fc_stats back to the parent dev. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:46 -07:00
Riyan Dhiman	c55ff46aee	octeontx2-af: Change block parameter to const pointer in get_lf_str_list Convert struct rvu_block block to const struct rvu_block *block in get_lf_str_list() function parameter. This improves efficiency by avoiding structure copying and reflects the function's read-only access to block. Signed-off-by: Riyan Dhiman <riyandhiman14@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001110542.5404-2-riyandhiman14@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:12:28 -07:00
Aleksander Jan Bajkowski	8389cdb5c1	net: macb: Adding support for Jumbo Frames up to 10240 Bytes in SAMA5D2 As per the SAMA5D2 device specification it supports Jumbo frames. But the suggested flag and length of bytes it supports was not updated in this driver config_structure. The maximum jumbo frames the device supports: 10240 bytes as per the device spec. While changing the MTU value greater than 1500, it threw error: sudo ifconfig eth1 mtu 9000 SIOCSIFMTU: Invalid argument Add this support to driver so that it works as expected and designed. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://patch.msgid.link/20241003171941.8814-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:11:12 -07:00
Jakub Kicinski	7bc22763d5	Merge branch 'net-airoha-fix-pse-memory-configuration' Lorenzo Bianconi says: ==================== net: airoha: Fix PSE memory configuration Align PSE memory configuration to vendor SDK. Increase initial value of PSE reserved memory in airoha_fe_pse_ports_init() by the value used for the second Packet Processor Engine (PPE2). Do not overwrite the default value for the number of PSE reserved pages in airoha_fe_set_pse_oq_rsv(). These changes fix issues which are not visible to the user. v1: https://lore.kernel.org/20240930-airoha-eth-pse-fix-v1-0-f41f2f35abb9@kernel.org ==================== Link: https://patch.msgid.link/20241001-airoha-eth-pse-fix-v2-0-9a56cdffd074@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:55:16 -07:00
Lorenzo Bianconi	8e38e08f2c	net: airoha: fix PSE memory configuration in airoha_fe_pse_ports_init() Align PSE memory configuration to vendor SDK. In particular, increase initial value of PSE reserved memory in airoha_fe_pse_ports_init() routine by the value used for the second Packet Processor Engine (PPE2) and do not overwrite the default value. Introduced by commit `23020f0493` ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001-airoha-eth-pse-fix-v2-2-9a56cdffd074@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:54:28 -07:00
Lorenzo Bianconi	1f3e7ff4f2	net: airoha: read default PSE reserved pages value before updating Store the default value for the number of PSE reserved pages in orig_val at the beginning of airoha_fe_set_pse_oq_rsv routine, before updating it with airoha_fe_set_pse_queue_rsv_pages(). Introduce airoha_fe_get_pse_all_rsv utility routine. Introduced by commit `23020f0493` ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001-airoha-eth-pse-fix-v2-1-9a56cdffd074@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:54:28 -07:00
Jakub Kicinski	7d68b6f664	Merge branch 'net-switch-to-scoped-device_for_each_child_node' Javier Carrasco says: ==================== net: switch to scoped device_for_each_child_node() This series switches from the device_for_each_child_node() macro to its scoped variant. This makes the code more robust if new early exits are added to the loops, because there is no need for explicit calls to fwnode_handle_put(), which also simplifies existing code. The non-scoped macros to walk over nodes turn error-prone as soon as the loop contains early exits (break, goto, return), and patches to fix them show up regularly, sometimes due to new error paths in an existing loop [1]. Note that the child node is now declared in the macro, and therefore the explicit declaration is no longer required. The general functionality should not be affected by this modification. If functional changes are found, please report them back as errors. Link: https://lore.kernel.org/20240901160829.709296395@linuxfoundation.org v1: https://lore.kernel.org/r/20240930-net-device_for_each_child_node_scoped-v1-0-bbdd7f9fd649@gmail.com ==================== Link: https://patch.msgid.link/20240930-net-device_for_each_child_node_scoped-v2-0-35f09333c1d7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:28:28 -07:00
Javier Carrasco	e97dccd3e9	net: hns: hisilicon: hns_dsaf_mac: switch to scoped device_for_each_child_node() Use device_for_each_child_node_scoped() to simplify the code by removing the need for explicit calls to fwnode_handle_put() in every error path. This approach also accounts for any error path that could be added. Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://patch.msgid.link/20240930-net-device_for_each_child_node_scoped-v2-2-35f09333c1d7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:28:26 -07:00
Javier Carrasco	1d39d02a15	net: mdio: thunder: switch to scoped device_for_each_child_node() There has already been an issue with the handling of early exits from device_for_each_child() in this driver, and it was solved with commit `b1de5c78eb` ("net: mdio: thunder: Add missing fwnode_handle_put()") by adding a call to fwnode_handle_put() right after the loop. That solution is valid indeed, but if a new error path with a 'return' is added to the loop, this solution will fail. A more secure approach is using the scoped variant of the macro, which automatically decrements the refcount of the child node when it goes out of scope, removing the need for explicit calls to fwnode_handle_put(). Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://patch.msgid.link/20240930-net-device_for_each_child_node_scoped-v2-1-35f09333c1d7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:28:26 -07:00
Jakub Kicinski	6443cf1bdf	Merge branch 'qed-ethtool-d-faster-less-latency' Michal Schmidt says: ==================== qed: 'ethtool -d' faster, less latency Here is a patch to make 'ethtool -d' on a qede network device a lot faster and 3 patches to make it cause less latency for other tasks on non-preemptible kernels. ==================== Link: https://patch.msgid.link/20240930201307.330692-1-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:25:18 -07:00
Michal Schmidt	2efeaf1d2a	qed: put cond_resched() in qed_dmae_operation_wait() It is OK to sleep in qed_dmae_operation_wait, because it is called only in process context, while holding p_hwfn->dmae_info.mutex from one of the qed_dmae_{host,grc}2{host,grc} functions. The udelay(DMAE_MIN_WAIT_TIME=2) in the function is too short to replace with usleep_range, but at least it's a suitable point for checking if we should give up the CPU with cond_resched(). This lowers the latency caused by 'ethtool -d' from 10 ms to less than 2 ms on my test system with voluntary preemption. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://patch.msgid.link/20240930201307.330692-5-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:25:15 -07:00
Michal Schmidt	cf54ae6b59	qed: allow the callee of qed_mcp_nvm_read() to sleep qed_mcp_nvm_read has a loop where it calls qed_mcp_nvm_rd_cmd with the argument b_can_sleep=false. And it sleeps once every 0x1000 bytes read. Simplify this by letting qed_mcp_nvm_rd_cmd itself sleep (b_can_sleep=true). It will have slept at least once when successful (in the "Wait for the MFW response" loop). So the extra sleep once every 0x1000 bytes becomes superfluous. Delete it. On my test system with voluntary preemption, this lowers the latency caused by 'ethtool -d' from 53 ms to 10 ms. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://patch.msgid.link/20240930201307.330692-4-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:25:15 -07:00
Michal Schmidt	6cd695706f	qed: put cond_resched() in qed_grc_dump_ctx_data() On a kernel with preemption none or voluntary, 'ethtool -d' on a qede network device can cause a big latency spike. The biggest part of it is the loop in qed_grc_dump_ctx_data. The function is called only from the .get_size and .perform_dump callbacks for the "grc" feature defined in qed_features_lookup[]. As far as I can see, they are used in: - qed's devlink healh reporter .dump op - qede's ethtool get_regs/get_regs_len/get_dump_data ops - qedf's qedf_get_grc_dump, called from: - qedf_sysfs_write_grcdump - "grcdump" sysfs attribute write - qedf_wq_grcdump - a workqueue It is safe to sleep in all of them. Let's insert a cond_resched() in the outer loop to let other tasks run. Measured using this script: #!/bin/bash DEV=ens3f1 echo wakeup_rt > /sys/kernel/tracing/current_tracer echo 0 > /sys/kernel/tracing/tracing_max_latency echo 1 > /sys/kernel/tracing/tracing_on echo "Setting the task CPU affinity" taskset -p 1 $$ > /dev/null echo "Starting the real-time task" chrt -f 50 bash -c 'while sleep 0.01; do :; done' & sleep 1 echo "Running: ethtool -d $DEV" time ethtool -d $DEV > /dev/null kill %1 echo 0 > /sys/kernel/tracing/tracing_on echo "Measured latency: $(</sys/kernel/tracing/tracing_max_latency) us" echo "To see the latency trace: less /sys/kernel/tracing/trace" The patch lowers the latency from 180 ms to 53 ms on my test system with voluntary preemption. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://patch.msgid.link/20240930201307.330692-3-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:25:15 -07:00
Michal Schmidt	b8db67d4df	qed: make 'ethtool -d' 10 times faster As a side effect of commit `5401c3e099` ("qed: allow sleep in qed_mcp_trace_dump()"), 'ethtool -d' became much slower. Almost all the time is spent collecting the "mcp_trace". It is caused by sleeping too long in _qed_mcp_cmd_and_union. When called with sleeping not allowed, the function delays for 10 µs between firmware polls. But if sleeping is allowed, it sleeps for 10 ms instead. The sleeps in _qed_mcp_cmd_and_union are unnecessarily long. Replace msleep with usleep_range, which allows to achieve a similar polling interval like in the no-sleeping mode (10 - 20 µs). The only caller, qed_mcp_cmd_and_union, can stop doing the multiplication/division of the usecs/max_retries. The polling interval and the number of retries do not need to be parameters at all. On my test system, 'ethtool -d' now takes 4 seconds instead of 44. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://patch.msgid.link/20240930201307.330692-2-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 09:25:14 -07:00
Jakub Kicinski	4cd0bd19ce	Merge branch 'net-mv643xx-devm-fixes' Rosen Penev says: ==================== net: mv643xx: devm fixes Small simplification and a fix for a seemingly wrong function usage. ==================== Link: https://patch.msgid.link/20240930202951.297737-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:59:49 -07:00
Rosen Penev	50c3a7fbaa	net: mv643xx: fix wrong devm_clk_get usage This clock should be optional. In addition, PTR_ERR can be -EPROBE_DEFER in which case it should return. devm_clk_get_optional_enabled also allows removing explicit clock enable and disable calls. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240930202951.297737-3-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:59:47 -07:00
Rosen Penev	4d77e88ab4	net: mv643xx: use devm_platform_ioremap_resource This combines multiple steps in one function. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240930202951.297737-2-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:59:47 -07:00
Jakub Kicinski	59169e0a13	Merge branch 'net-ag71xx-small-cleanups' Rosen Penev says: ==================== net: ag71xx: small cleanups More devm and some loose ends. ==================== Link: https://patch.msgid.link/20240930181823.288892-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:56:41 -07:00
Rosen Penev	d14fe43e00	net: ag71xx: move assignment into main loop Effectively what's going on here is there's a main loop and an identical one below with a single assignment. Simpler to move it up. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240930181823.288892-6-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:56:27 -07:00
Rosen Penev	8b4ed4d5ff	net: ag71xx: replace INIT_LIST_HEAD LIST_HEAD is a shorter macro. No real difference. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240930181823.288892-5-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:56:27 -07:00
Rosen Penev	94656823c1	net: ag71xx: remove platform_set_drvdata platform_get_drvdata is never called as a result of all the devm conversions. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240930181823.288892-4-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:56:27 -07:00
Rosen Penev	27dc497b7b	net: ag71xx: use some dev_err_probe These functions can return EPROBE_DEFER. Don't warn in such a case. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240930181823.288892-3-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:56:27 -07:00
Rosen Penev	ab4239c8a7	net: ag71xx: use devm_ioremap_resource We can just use res directly. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240930181823.288892-2-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 08:56:27 -07:00
Dr. David Alan Gilbert	b63c755cb6	appletalk: Remove deadcode alloc_ltalkdev in net/appletalk/dev.c is dead since commit `00f3696f75` ("net: appletalk: remove cops support") Removing it (and it's helper) leaves dev.c and if_ltalk.h empty; remove them and the Makefile entry. tun.c was including that if_ltalk.h but actually wanted the uapi version for LTALK_ALEN, fix up the path. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-04 12:42:32 +01:00
Nick Child	2ee73c54a6	ibmvnic: Add stat for tx direct vs tx batched Allow tracking of packets sent with send_subcrq direct vs indirect. `ethtool -S <dev>` will now provide a counter of the number of uses of each xmit method. This metric will be useful in performance debugging. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001163531.1803152-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 17:37:32 -07:00
Rosen Penev	4c5107b8f5	net: marvell: mvmdio: use clk_get_optional The code seems to be handling EPROBE_DEFER explicitly and if there's no error, enables the clock. clk_get_optional exists for that. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240930211628.330703-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 17:00:01 -07:00
Jakub Kicinski	b7074e4375	Merge branch 'gve-link-irqs-queues-and-napi-instances' Joe Damato says: ==================== gve: Link IRQs, queues, and NAPI instances This series uses the netdev-genl API to link IRQs and queues to NAPI IDs so that this information is queryable by user apps. This is particularly useful for epoll-based busy polling apps which rely on having access to the NAPI ID. I've tested these commits on a GCP instance with a GVE NIC configured and have included test output in the commit messages for each patch showing how to query the information. [1]: https://lore.kernel.org/netdev/20240926030025.226221-1-jdamato@fastly.com/ ==================== Link: https://patch.msgid.link/20240930210731.1629-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:58:28 -07:00
Joe Damato	021f9e671e	gve: Map NAPI instances to queues Use the netdev-genl interface to map NAPI instances to queues so that this information is accessible to user programs via netlink. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8313, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8314, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8315, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8316, 'type': 'rx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8317, 'type': 'rx'}, [...] {'id': 0, 'ifindex': 2, 'napi-id': 8297, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8298, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8299, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8300, 'type': 'tx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8301, 'type': 'tx'}, [...] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Link: https://patch.msgid.link/20240930210731.1629-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:58:26 -07:00
Joe Damato	3017238b60	gve: Map IRQs to NAPI instances Use netdev-genl interface to map IRQs to NAPI instances so that this information is accessible by user apps via netlink. $ cat /proc/interrupts \| grep gve \| grep -v mgmnt \| cut -f1 --delimiter=':' 34 35 36 37 38 39 40 [...] 65 $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8288, 'ifindex': 2, 'irq': 65}, [...] {'id': 8263, 'ifindex': 2, 'irq': 40}, {'id': 8262, 'ifindex': 2, 'irq': 39}, {'id': 8261, 'ifindex': 2, 'irq': 38}, {'id': 8260, 'ifindex': 2, 'irq': 37}, {'id': 8259, 'ifindex': 2, 'irq': 36}, {'id': 8258, 'ifindex': 2, 'irq': 35}, {'id': 8257, 'ifindex': 2, 'irq': 34}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Link: https://patch.msgid.link/20240930210731.1629-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:58:25 -07:00
Sean Anderson	d772cc25cc	selftests: net: csum: Clean up recv_verify_packet_ipv6 Rename ip_len to payload_len since the length in this case refers only to the payload, and not the entire IP packet like for IPv4. While we're at it, just use the variable directly when calling recv_verify_packet_udp/tcp. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240930162935.980712-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:44:28 -07:00
Amit Cohen	be4e323544	selftests: mlxsw: rtnetlink: Use devlink_reload() API The test runs "devlink reload" explicitly. Instead, it is better to use devlink_reload() which waits for udev events to be processed. Do not sleep after reload, as devlink_reload() blocks until all the netdevs are renamed. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/844509e3057b65277a7181a23c95b71ec95e8a56.1727706741.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:43:44 -07:00
Dr. David Alan Gilbert	25ba2a5ada	net/rds: remove unused struct 'rds_ib_dereg_odp_mr' 'rds_ib_dereg_odp_mr' has been unused since the original commit `2eafa1746f` ("net/rds: Handle ODP mr registration/unregistration"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20240930134358.48647-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:42:52 -07:00
FUJITA Tomonori	3ed8d344e0	rust: net::phy always define device_table in module_phy_driver macro device_table in module_phy_driver macro is defined only when the driver is built as a module. So a PHY driver imports phy::DeviceId module in the following way then hits `unused import` warning when it's compiled as built-in: use kernel::net::phy::DeviceId; kernel::module_phy_driver! { drivers: [PhyQT2025], device_table: [ DeviceId::new_with_driver::<PhyQT2025>(), ], Put device_table in a const. It's not included in the kernel image if unused (when the driver is compiled as built-in), and the compiler doesn't complain. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20240930134038.1309-1-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:42:18 -07:00
Divya Koppera	5fad1c1a09	net: phy: microchip_t1: Interrupt support for lan887x Add support for link up and link down interrupts in lan887x. Signed-off-by: Divya Koppera <divya.koppera@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241001144421.6661-1-divya.koppera@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:22:12 -07:00
Jakub Kicinski	046e64f547	Merge branch 'ipv4-convert-ip_route_input_slow-and-its-callers-to-dscp_t' Guillaume Nault says: ==================== ipv4: Convert ip_route_input_slow() and its callers to dscp_t. Prepare ip_route_input_slow() and its call chain to future conversion of ->flowi4_tos. The ->flowi4_tos field of "struct flowi4" is used in many different places, which makes it hard to convert it from __u8 to dscp_t. In order to avoid a big patch updating all its users at once, this patch series gradually converts some users to dscp_t. Those users now set ->flowi4_tos from a dscp_t variable that is converted to __u8 using inet_dscp_to_dsfield(). When all users of ->flowi4_tos will use a dscp_t variable, converting that field to dscp_t will just be a matter of removing all the inet_dscp_to_dsfield() conversions. This series concentrates on ip_route_input_slow() and its direct and indirect callers. ==================== Link: https://patch.msgid.link/cover.1727807926.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:21:23 -07:00
Guillaume Nault	783946aa03	ipv4: Convert ip_route_input_slow() to dscp_t. Pass a dscp_t variable to ip_route_input_slow(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Only ip_route_input_rcu() actually calls ip_route_input_slow(). Since it already has a dscp_t variable to pass as parameter, we only need to remove the inet_dscp_to_dsfield() conversion. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/d6bca5f87eea9e83a3861e6e05594cdd252583c9.1727807926.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:21:21 -07:00
Guillaume Nault	be612f5e99	ipv4: Convert ip_route_input_rcu() to dscp_t. Pass a dscp_t variable to ip_route_input_rcu(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Callers of ip_route_input_rcu() to consider are: * ip_route_input_noref(), which already has a dscp_t variable to pass as parameter. We just need to remove the inet_dscp_to_dsfield() conversion. * inet_rtm_getroute(), which receives a u8 from user space and needs to convert it with inet_dsfield_to_dscp(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/c4dbb5aa9cbc79c4fcb317abbffa7c7156bc56a7.1727807926.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:21:21 -07:00

1 2 3 4 5 ...

1308975 Commits