linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-09 11:23:09 -04:00

Author	SHA1	Message	Date
Pavel Begunkov	7cb31c46b9	net: cache for same cpu skb_attempt_defer_free Optimise skb_attempt_defer_free() when run by the same CPU the skb was allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can disable softirqs and put the buffer into cpu local caches. CPU bound TCP ping pong style benchmarking (i.e. netbench) showed a 1% throughput increase (392.2 -> 396.4 Krps). Cross checking with profiles, the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note, I'd expect the win doubled with rx only benchmarks, as the optimisation is for the receive path, but the test spends >55% of CPU doing writes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/a887463fb219d973ec5ad275e31194812571f1f5.1712711977.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:27:32 -07:00
Eric Dumazet	9b9fd45869	tcp: tweak tcp_sock_write_txrx size assertion I forgot 32bit arches might have 64bit alignment for u64 fields. tcp_sock_write_txrx group does not contain pointers, but two u64 fields. It is possible that on 32bit kernel, a 32bit hole is before tp->tcp_clock_cache. I will try to remember a group can be bigger on 32bit kernels in the future. With help from Vladimir Oltean. Fixes: `d2c3a7eb1a` ("tcp: more struct tcp_sock adjustments") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202404082207.HCEdQhUO-lkp@intel.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20240409140914.4105429-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 18:55:05 -07:00
Jakub Kicinski	414e576fb0	Merge branch 'selftests-move-bpf-offload-test-from-bpf-to-net' Jakub Kicinski says: ==================== selftests: move bpf-offload test from bpf to net The test_offload.py test fits in networking and bpf equally well. We started adding more Python tests in networking and some of the code in test_offload.py can be reused, so move it to networking. Looks like it bit rotted over time and some fixes are needed. Admittedly more code could be extracted but I only had the time for a minor cleanup :( ==================== Link: https://lore.kernel.org/r/20240409031549.3531084-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 14:03:14 -07:00
Jakub Kicinski	6ce2b68993	selftests: net: reuse common code in bpf_offload net/lib/py/nsim.py already contains the most useful parts of the netdevsim wrapper classes. Reuse them. Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240409031549.3531084-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 14:03:12 -07:00
Jakub Kicinski	b1c2ce11d4	selftests: net: declare section names for bpf_offload Non-ancient ip (iproute2-5.15.0, libbpf 0.7.0) refuses to load the sample with maps because we don't generate BTF: libbpf: BTF is required, but is missing or corrupted. ERROR: opening BPF object file failed Enable BTF by adding -g to clang flags. With that done neither of the programs load: libbpf: prog 'func': error relocating .BTF.ext function info: -22 libbpf: prog 'func': failed to relocate calls: -22 libbpf: failed to load object 'ksft-net-drv/net/sample_ret0.bpf.o' Andrii explains that this is because we don't specify section names for the code. Add the section names, too. Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240409031549.3531084-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 14:03:12 -07:00
Jakub Kicinski	fc50c698c2	selftests: net: bpf_offload: wait for maps Maps are removed asynchronously. Either there's a bigger delay now or the test has always been flaky. Retry waiting in the loop. Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240409031549.3531084-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 14:03:12 -07:00
Jakub Kicinski	e59f0e93e9	selftests: move bpf-offload test from bpf to net We're building more python tests on the netdev side, and some of the classes from the venerable BPF offload tests can be reused. Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240409031549.3531084-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 14:03:12 -07:00
Jianbo Liu	2ecd487b67	net: sched: cls_api: fix slab-use-after-free in fl_dump_key The filter counter is updated under the protection of cb_lock in the cited commit. While waiting for the lock, it's possible the filter is being deleted by other thread, and thus causes UAF when dump it. Fix this issue by moving tcf_block_filter_cnt_update() after tfilter_put(). ================================================================== BUG: KASAN: slab-use-after-free in fl_dump_key+0x1d3e/0x20d0 [cls_flower] Read of size 4 at addr ffff88814f864000 by task tc/2973 CPU: 7 PID: 2973 Comm: tc Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_02_12_41 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x7e/0xc0 print_report+0xc1/0x600 ? __virt_addr_valid+0x1cf/0x390 ? fl_dump_key+0x1d3e/0x20d0 [cls_flower] ? fl_dump_key+0x1d3e/0x20d0 [cls_flower] kasan_report+0xb9/0xf0 ? fl_dump_key+0x1d3e/0x20d0 [cls_flower] fl_dump_key+0x1d3e/0x20d0 [cls_flower] ? lock_acquire+0x1c2/0x530 ? fl_dump+0x172/0x5c0 [cls_flower] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? fl_dump_key_options.part.0+0x10f0/0x10f0 [cls_flower] ? do_raw_spin_lock+0x12d/0x270 ? spin_bug+0x1d0/0x1d0 fl_dump+0x21d/0x5c0 [cls_flower] ? fl_tmplt_dump+0x1f0/0x1f0 [cls_flower] ? nla_put+0x15f/0x1c0 tcf_fill_node+0x51b/0x9a0 ? tc_skb_ext_tc_enable+0x150/0x150 ? __alloc_skb+0x17b/0x310 ? __build_skb_around+0x340/0x340 ? down_write+0x1b0/0x1e0 tfilter_notify+0x1a5/0x390 ? fl_terse_dump+0x400/0x400 [cls_flower] tc_new_tfilter+0x963/0x2170 ? tc_del_tfilter+0x1490/0x1490 ? print_usage_bug.part.0+0x670/0x670 ? lock_downgrade+0x680/0x680 ? security_capable+0x51/0x90 ? tc_del_tfilter+0x1490/0x1490 rtnetlink_rcv_msg+0x75e/0xac0 ? if_nlmsg_stats_size+0x4c0/0x4c0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? __netlink_lookup+0x35e/0x6e0 netlink_rcv_skb+0x12c/0x360 ? if_nlmsg_stats_size+0x4c0/0x4c0 ? netlink_ack+0x15e0/0x15e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? netlink_deliver_tap+0xcd/0xa60 ? netlink_deliver_tap+0xcd/0xa60 ? netlink_deliver_tap+0x1c9/0xa60 netlink_unicast+0x43e/0x700 ? netlink_attachskb+0x750/0x750 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x534/0x6b0 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? entry_SYSCALL_64_after_hwframe+0x46/0x4e ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x390 ___sys_sendmsg+0xeb/0x170 ? __virt_addr_valid+0x1ca/0x390 ? copy_msghdr_from_user+0x110/0x110 ? __delete_object+0xb8/0x100 ? __virt_addr_valid+0x1cf/0x390 ? do_sys_openat2+0x102/0x150 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? do_sys_openat2+0x102/0x150 ? __fget_light+0x53/0x1d0 ? sockfd_lookup_light+0x1a/0x150 __sys_sendmsg+0xb5/0x140 ? __sys_sendmsg_sock+0x20/0x20 ? lock_downgrade+0x680/0x680 do_syscall_64+0x70/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f98e3713367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffc74a64608 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000047eae0 RCX: 00007f98e3713367 RDX: 0000000000000000 RSI: 00007ffc74a64670 RDI: 0000000000000003 RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f98e360c5e8 R11: 0000000000000246 R12: 00007ffc74a6a508 R13: 00000000660d518d R14: 0000000000484a80 R15: 00007ffc74a6a50b </TASK> Allocated by task 2973: kasan_save_stack+0x20/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0x77/0x90 fl_change+0x27a6/0x4540 [cls_flower] tc_new_tfilter+0x879/0x2170 rtnetlink_rcv_msg+0x75e/0xac0 netlink_rcv_skb+0x12c/0x360 netlink_unicast+0x43e/0x700 netlink_sendmsg+0x749/0xc10 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x534/0x6b0 ___sys_sendmsg+0xeb/0x170 __sys_sendmsg+0xb5/0x140 do_syscall_64+0x70/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e Freed by task 283: kasan_save_stack+0x20/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x50 poison_slab_object+0x105/0x190 __kasan_slab_free+0x11/0x30 kfree+0x111/0x340 process_one_work+0x787/0x1490 worker_thread+0x586/0xd30 kthread+0x2df/0x3b0 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x11/0x20 Last potentially related work creation: kasan_save_stack+0x20/0x40 __kasan_record_aux_stack+0x9b/0xb0 insert_work+0x25/0x1b0 __queue_work+0x640/0xc90 rcu_work_rcufn+0x42/0x70 rcu_core+0x6a9/0x1850 __do_softirq+0x264/0x88f Second to last potentially related work creation: kasan_save_stack+0x20/0x40 __kasan_record_aux_stack+0x9b/0xb0 __call_rcu_common.constprop.0+0x6f/0xac0 queue_rcu_work+0x56/0x70 fl_mask_put+0x20d/0x270 [cls_flower] __fl_delete+0x352/0x6b0 [cls_flower] fl_delete+0x97/0x160 [cls_flower] tc_del_tfilter+0x7d1/0x1490 rtnetlink_rcv_msg+0x75e/0xac0 netlink_rcv_skb+0x12c/0x360 netlink_unicast+0x43e/0x700 netlink_sendmsg+0x749/0xc10 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x534/0x6b0 ___sys_sendmsg+0xeb/0x170 __sys_sendmsg+0xb5/0x140 do_syscall_64+0x70/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e Fixes: `2081fd3445` ("net: sched: cls_api: add filter counter") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Tested-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-10 08:28:26 +01:00
Jakub Kicinski	811b836285	Merge branch 'minor-cleanups-to-skb-frag-ref-unref' Mina Almasry says: ==================== Minor cleanups to skb frag ref/unref (part) This series is largely motivated by a recent discussion where there was some confusion on how to properly ref/unref pp pages vs non pp pages: https://lore.kernel.org/netdev/CAHS8izOoO-EovwMwAm9tLYetwikNPxC0FKyVGu1TPJWSz4bGoA@mail.gmail.com/T/#t There is some subtely there because pp uses page->pp_ref_count for refcounting, while non-pp uses get_page()/put_page() for ref counting. Getting the refcounting pairs wrong can lead to kernel crash. [...] https://lore.kernel.org/lkml/CAHS8izN436pn3SndrzsCyhmqvJHLyxgCeDpWXA4r1ANt3RCDLQ@mail.gmail.com/T/ ==================== Link: https://lore.kernel.org/r/20240408153000.2152844-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 18:20:32 -07:00
Mina Almasry	f58f3c9563	net: remove napi_frag_unref With the changes in the last patches, napi_frag_unref() is now reduandant. Remove it and use skb_page_unref directly. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240408153000.2152844-4-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 18:20:32 -07:00
Mina Almasry	959fa5c188	net: make napi_frag_unref reuse skb_page_unref The implementations of these 2 functions are almost identical. Remove the implementation of napi_frag_unref, and make it a call into skb_page_unref so we don't duplicate the implementation. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240408153000.2152844-2-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 18:20:29 -07:00
Jakub Kicinski	445e603038	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== net/e1000e, igb, igc: Remove redundant runtime resume Bjorn Helgaas says: e1000e, igb, and igc all have code to runtime resume the device during ethtool operations. Since `f32a213765` ("ethtool: runtime-resume netdev parent before ethtool ioctl ops"), dev_ethtool() does this for us, so remove it from the individual drivers. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Remove redundant runtime resume for ethtool ops igb: Remove redundant runtime resume for ethtool_ops e1000e: Remove redundant runtime resume for ethtool_ops ==================== Link: https://lore.kernel.org/r/20240408210849.3641172-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 17:33:40 -07:00
Jakub Kicinski	91f2210ce3	Merge branch 'bonding-remove-rtnl-from-three-sysfs-files' Eric Dumazet says: ==================== bonding: remove RTNL from three sysfs files First patch might fix a potential deadlock. sysfs handlers should use rtnl_trylock() instead of rtnl_lock(). Following files can be read without acquiring RTNL : - /sys/class/net/bonding_masters - /sys/class/net/<name>/bonding/slaves - /sys/class/net/<name>/bonding/queue_id ==================== Link: https://lore.kernel.org/r/20240408190437.2214473-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 17:31:48 -07:00
Eric Dumazet	662e451d9a	bonding: no longer use RTNL in bonding_show_queue_id() Annotate lockless reads of slave->queue_id. Annotate writes of slave->queue_id. Switch bonding_show_queue_id() to rcu_read_lock() and bond_for_each_slave_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 17:31:45 -07:00
Eric Dumazet	d67fed98ca	bonding: no longer use RTNL in bonding_show_slaves() Slave devices are already RCU protected, simply switch to bond_for_each_slave_rcu(), Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 17:31:45 -07:00
Eric Dumazet	6c5d17143f	bonding: no longer use RTNL in bonding_show_bonds() netdev structures are already RCU protected. Change bond_init() and bond_uninit() to use RCU enabled list_add_tail_rcu() and list_del_rcu(). Then bonding_show_bonds() can use rcu_read_lock() while iterating through bn->dev_list. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 17:31:45 -07:00
Kuan-Wei Chiu	d034d02de8	net: sched: cake: Optimize the number of function calls and branches in heap construction When constructing a heap, heapify operations are required on all non-leaf nodes. Thus, determining the index of the first non-leaf node is crucial. In a heap, the left child's index of node i is 2 * i + 1 and the right child's index is 2 * i + 2. Node CAKE_MAX_TINS * CAKE_QUEUES / 2 has its left and right children at indexes CAKE_MAX_TINS * CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2, respectively, which are beyond the heap's range, indicating it as a leaf node. Conversely, node CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1 has a left child at index CAKE_MAX_TINS * CAKE_QUEUES - 1, confirming its non-leaf status. The loop should start from it since it's not a leaf node. By starting the loop from CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1, we minimize function calls and branch condition evaluations. This adjustment theoretically reduces two function calls (one for cake_heapify() and another for cake_heap_get_backlog()) and five branch evaluations (one for iterating all non-leaf nodes, one within cake_heapify()'s while loop, and three more within the while loop with if conditions). Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20240408174716.751069-1-visitorckw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 17:30:14 -07:00
Asbjørn Sloth Tønnesen	545d95e5f1	cxgb4: flower: use NL_SET_ERR_MSG_MOD for validation errors Replace netdev_{warn,err} with NL_SET_ERR_MSG_{FMT_,}MOD to better inform the user about the problem. Only compile-tested, no access to HW. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://lore.kernel.org/r/20240408165506.94483-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 17:11:04 -07:00
Catalin Popescu	9ef9ecfa9e	net: phy: dp8382x: keep WOL settings across suspends Unlike other ethernet PHYs from TI, PHY dp8382x has WOL enabled at reset. The driver explicitly disables WOL in config_init callback which is called during init and during resume from suspend. Hence, WOL is unconditionally disabled during resume, even if it was enabled before the suspend. We make sure that WOL configuration is persistent across suspends. Signed-off-by: Catalin Popescu <catalin.popescu@leica-geosystems.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240408082602.3654090-1-catalin.popescu@leica-geosystems.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-09 16:57:55 -07:00
Paolo Abeni	6a053f07d5	Merge branch 'net-phy-micrel-lan8814-enable-ptp_pf_perout' Horatiu Vultur says: ==================== net: phy: micrel: lan8814: Enable PTP_PF_PEROUT Add support for PTP_PF_PEROUT to lan8814. First patch just enables the LTC at probe time, such that it is not required to enable timestamping to have the LTC enabled. While the second patch actually adds support for PTP_PF_PEROUT. ==================== Link: https://lore.kernel.org/r/20240408064432.3881636-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 13:34:06 +02:00
Horatiu Vultur	9e63941b89	net: phy: micrel: lan8814: Add support for PTP_PF_PEROUT Lan8814 has 24 GPIOs but only 2 GPIOs (GPIO 0 and GPIO 1) can be configured to generate period signals. And there are 2 events (EVENT_A and EVENT_B) but these events are hardcoded to the GPIO 0 and GPIO 1. These events are used to generate period signals. It is possible to configure the length, the start time and the period of the signal by configuring the event. These events are generated by comparing the target time with the PHC time. In case the PHC time is changed to a value bigger than the target time + reload time, then it would generate only 1 event and then it would stop because target time + reload time is smaller than PHC time. Therefore it is required to change also the target time every time when the PHC is changed. The same will apply also when the PHC time is changed to a smaller value. This was tested using: testptp -i 1 -L 1,2 testptp -i 1 -p 1000000000 -w 200000000 Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 13:34:03 +02:00
Horatiu Vultur	9f6b3a4981	net: phy: micrel: lan8814: Enable LTC at probe time The LTC for lan8814 was enabled only if timestamping was enabled, otherwise it would be stopped. Meaning that LTC will not increase by itself. This might break other features that don't required timestamping like generating 1PPS. Therefore enable the LTC at probe time. Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 13:34:03 +02:00
Sascha Hauer	220d63f249	dt-bindings: net: rockchip-dwmac: use rgmii-id in example The dwmac supports specifying the RGMII clock delays, but it is recommended to use rgmii-id and to specify the delays in the phy node instead [1]. Change the example accordingly to no longer promote this undesired setting. [1] https://lore.kernel.org/all/1a0de7b4-f0f7-4080-ae48-f5ffa9e76be3@lunn.ch/ Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Dragan Simic <dsimic@manjaro.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240408-rockchip-dwmac-rgmii-id-binding-v1-1-3886d1a8bd54@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 13:29:02 +02:00
Paolo Abeni	d2fd6cf39a	Merge branch 'tcp-fix-isn-selection-in-timewait-syn_recv' Eric Dumazet says: ==================== tcp: fix ISN selection in TIMEWAIT -> SYN_RECV TCP can transform a TIMEWAIT socket into a SYN_RECV one from a SYN packet, and the ISN of the SYNACK packet is normally generated using TIMEWAIT tw_snd_nxt. This SYN packet also bypasses normal checks against listen queue being full or not. Unfortunately this has been broken almost one decade ago. This series fixes the issue, in two patches. First patch refactors code to add tcp_tw_isn as a parameter to ->route_req(), to make the second patch smaller. Second patch fixes the issue, by no longer using TCP_SKB_CB(skb) to store the tcp_tw_isn. Following packetdrill test passes after this series: // Set up a server listening socket. 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // Establish connection +0 < S 0:0(0) win 32792 <mss 1460,nop,nop,sackOK> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK> +.01 < . 1:1(0) ack 1 win 32792 +0 accept(3, ..., ...) = 4 // We close(), send a FIN, and get an ACK and FIN, in order to get into TIME_WAIT. +.01 close(4) = 0 +0 > F. 1:1(0) ack 1 +.01 < F. 1:1(0) ack 2 win 32792 +0 > . 2:2(0) ack 2 // SYN hitting a TIME_WAIT -> should use an ISN based on TIMEWAIT tw_snd_nxt +.01 < S 1000:1000(0) win 65535 <mss 1460,nop,nop,sackOK> +0 > S. 65539:65539(0) ack 1001 <mss 1460,nop,nop,sackOK> ==================== Link: https://lore.kernel.org/r/20240407093322.3172088-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 11:47:44 +02:00
Eric Dumazet	41eecbd712	tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field TCP can transform a TIMEWAIT socket into a SYN_RECV one from a SYN packet, and the ISN of the SYNACK packet is normally generated using TIMEWAIT tw_snd_nxt : tcp_timewait_state_process() ... u32 isn = tcptw->tw_snd_nxt + 65535 + 2; if (isn == 0) isn++; TCP_SKB_CB(skb)->tcp_tw_isn = isn; return TCP_TW_SYN; This SYN packet also bypasses normal checks against listen queue being full or not. tcp_conn_request() ... __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn; ... /* TW buckets are converted to open requests without * limitations, they conserve resources and peer is * evidently real one. */ if ((syncookies == 2 \|\| inet_csk_reqsk_queue_is_full(sk)) && !isn) { want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name); if (!want_cookie) goto drop; } This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb. Unfortunately this field has been accidentally cleared after the call to tcp_timewait_state_process() returning TCP_TW_SYN. Using a field in TCP_SKB_CB(skb) for a temporary state is overkill. Switch instead to a per-cpu variable. As a bonus, we do not have to clear tcp_tw_isn in TCP receive fast path. It is temporarily set then cleared only in the TCP_TW_SYN dance. Fixes: `4ad19de877` ("net: tcp6: fix double call of tcp_v6_fill_cb()") Fixes: `eeea10b83a` ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 11:47:40 +02:00
Eric Dumazet	b9e8104058	tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() tcp_v6_init_req() reads TCP_SKB_CB(skb)->tcp_tw_isn to find out if the request socket is created by a SYN hitting a TIMEWAIT socket. This has been buggy for a decade, lets directly pass the information from tcp_conn_request(). This is a preparatory patch to make the following one easier to review. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 11:47:40 +02:00
Paolo Abeni	1c25fe9a04	Merge branch 'add-support-for-flower-actions-mirred-and-redirect' Daniel Machon says: ==================== Add support for flower actions mirred and redirect This series adds support for the two tc flower actions mirred and redirect. Both actions are implemented by means of a port mask and a mask mode. The mask mode controls how the mask is applied, and together they are used by the switch to make a forwarding decision. Both actions are configurable via the IS0 or IS2 VCAP's (ingress stage 0 and 2, respectively). Patch #1: adds support for tc flower mirred action. Patch #2: adds support for tc flower redirect action. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> ==================== Link: https://lore.kernel.org/r/20240405-mirror-redirect-actions-v2-0-875d4c1927c8@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 10:45:13 +02:00
Daniel Machon	1164b8e0b1	net: sparx5: add support for tc flower redirect action Add support for the flower redirect action. Two VCAP actions are encoded in the rule - one for the port mask, and one for the port mask mode. When the rule is hit, the port mask is used as the final destination set, replacing all other port masks. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 10:45:10 +02:00
Daniel Machon	48ba00da2e	net: sparx5: add support for tc flower mirred action. Add support for tc flower mirred action. Two VCAP actions are encoded in the rule - one for the port mask, and one for the port mask mode. When the rule is hit, the destination mask is OR'ed with the port mask. Also add new VCAP function for supporting 72-bit wide actions, and a tc helper for setting the port forwarding mask. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 10:45:10 +02:00
Paolo Abeni	74bd5dbe1b	Merge branch 'support-icssg-based-ethernet-on-am65x-sr1-0-devices' Diogo Ivo says: ==================== Support ICSSG-based Ethernet on AM65x SR1.0 devices This series extends the current ICSSG-based Ethernet driver to support AM65x Silicon Revision 1.0 devices. Notable differences between the Silicon Revisions are that there is no TX core in SR1.0 with this being handled by the firmware, requiring extra DMA channels to manage communication with the firmware (with the firmware being different as well) and in the packet classifier. The motivation behind it is that a significant number of Siemens devices containing SR1.0 silicon have been deployed in the field and need to be supported and updated to newer kernel versions without losing functionality. This series is based on TI's 5.10 SDK [1]. The fifth version of this patch series can be found in [2]. Compared to the last version of the patch set there are only changes in patch 05/10, where the fields of a struct are now explicitly declared as __le32 so that we can properly interpret them. Both of the problems mentioned in v4 have been addressed by disabling those functionalities, meaning that this driver currently only supports one TX queue and does not support a 100Mbit/s half-duplex connection. The removal of these features has been commented in the appropriate locations in the code. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y [2]: https://lore.kernel.org/netdev/20240326110709.26165-1-diogo.ivo@siemens.com/ ==================== Link: https://lore.kernel.org/r/20240403104821.283832-1-diogo.ivo@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:32 +02:00
Diogo Ivo	e654b85a69	net: ti: icssg-prueth: Add ICSSG Ethernet driver for AM65x SR1.0 platforms Add the PRUeth driver for the ICSSG subsystem found in AM65x SR1.0 devices. The main differences that set SR1.0 and SR2.0 apart are the missing TXPRU core in SR1.0, two extra DMA channels for management purposes and different firmware that needs to be configured accordingly. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	ce95cb4c8d	net: ti: icssg-prueth: Modify common functions for SR1.0 Some parts of the logic differ only slightly between Silicon Revisions. In these cases add the bits that differ to a common function that executes those bits conditionally based on the Silicon Revision. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	0a74a9de79	net: ti: icssg-prueth: Add functions to configure SR1.0 packet classifier Add the functions to configure the SR1.0 packet classifier. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	604e603d73	net: ti: icssg-prueth: Adjust the number of TX channels for SR1.0 As SR1.0 uses the current higher priority channel to send commands to the firmware, take this into account when setting/getting the number of channels to/from the user. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	95c2e68933	net: ti: icssg-prueth: Adjust IPG configuration for SR1.0 Correctly adjust the IPG based on the Silicon Revision. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	8623dea207	net: ti: icssg-prueth: Add SR1.0-specific description bits Add a field to distinguish between SR1.0 and SR2.0 in the driver as well as the necessary structures to program SR1.0. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	6d6a5751cd	net: ti: icssg-prueth: Add SR1.0-specific configuration bits Define the firmware configuration structure and commands needed to communicate with SR1.0 firmware, as well as SR1.0 buffer information where it differs from SR2.0. Based on the work of Roger Quadros, Murali Karicheri and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	e2dc7bfd67	net: ti: icssg-prueth: Move common functions into a separate file In order to allow code sharing between Silicon Revisions 1.0 and 2.0 move all functions that can be shared into a common file. This commit introduces no functional changes. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:29 +02:00
Diogo Ivo	e1900d7ba9	eth: Move IPv4/IPv6 multicast address bases to their own symbols As these addresses can be useful outside of checking if an address is a multicast address (for example in device drivers) make them accessible to users of etherdevice.h to avoid code duplication. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:28 +02:00
Diogo Ivo	dc073430db	dt-bindings: net: Add support for AM65x SR1.0 in ICSSG Silicon Revision 1.0 of the AM65x came with a slightly different ICSSG support: Only 2 PRUs per slice are available and instead 2 additional DMA channels are used for management purposes. We have no restrictions on specified PRUs, but the DMA channels need to be adjusted. Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-04-09 09:47:28 +02:00
Dan Carpenter	87c33315af	net: phy: air_en8811h: fix some error codes These error paths accidentally return "ret" which is zero/success instead of the correct error code. Fixes: `71e7943011` ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7ef2e230-dfb7-4a77-8973-9e5be1a99fc2@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-08 19:46:16 -07:00
Allen Pais	775d2e2b30	archnet: Convert from tasklet to BH workqueue The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts drivers/net/archnet/* from tasklet to BH workqueue. Based on the work done by Tejun Heo <tj@kernel.org> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20240403162306.20258-1-apais@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-08 19:44:09 -07:00
Bjorn Helgaas	75f16e06df	igc: Remove redundant runtime resume for ethtool ops `8c5ad0dae9` ("igc: Add ethtool support") added ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to resume suspended devices before any ethtool_ops callback and allow suspend after it completed. Subsequently, `f32a213765` ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path, so the device is resumed before any ethtool_ops callback even if the driver didn't supply a .begin() callback. Remove the .begin() and .complete() callbacks, which are now redundant because dev_ethtool() already resumes the device. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-04-08 13:25:39 -07:00
Bjorn Helgaas	461359c4f3	igb: Remove redundant runtime resume for ethtool_ops `749ab2cd12` ("igb: add basic runtime PM support") added ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to resume suspended devices before any ethtool_ops callback and allow suspend after it completed. Subsequently, `f32a213765` ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path, so the device is resumed before any ethtool_ops callback even if the driver didn't supply a .begin() callback. Remove the .begin() and .complete() callbacks, which are now redundant because dev_ethtool() already resumes the device. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-04-08 13:25:39 -07:00
Bjorn Helgaas	b2c289415b	e1000e: Remove redundant runtime resume for ethtool_ops `e60b22c5b7` ("e1000e: fix accessing to suspended device") added ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to resume suspended devices before any ethtool_ops callback and allow suspend after it completed. `3ef672ab18` ("e1000e: ethtool unnecessarily takes device out of RPM suspend") removed ethtool_ops.begin() and .complete() and instead did pm_runtime_get_sync() only in the individual ethtool_ops callbacks that access device registers. Subsequently, `f32a213765` ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path, so the device is resumed before any ethtool_ops callback, as it was before `3ef672ab18`. Remove most runtime resumes from ethtool_ops, which are now redundant because the resume has already been done by dev_ethtool(). This is essentially a revert of `3ef672ab18` ("e1000e: ethtool unnecessarily takes device out of RPM suspend"). There are a couple subtleties: - Prior to `3ef672ab18`, the device was resumed only for the duration of a single ethtool callback. `3ef672ab18` changed e1000_set_phys_id() so the device was resumed for ETHTOOL_ID_ACTIVE and remained resumed until a subsequent callback for ETHTOOL_ID_INACTIVE. Preserve that part of `3ef672ab18` so the device will not be runtime suspended while in the ETHTOOL_ID_ACTIVE state. - `3ef672ab18` added "if (!pm_runtime_suspended())" in before reading the STATUS register in e1000_get_settings(). This was racy and is now unnecessary because dev_ethtool() has resumed the device already, so revert that. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-04-08 13:25:39 -07:00
Heiner Kallweit	39f59c72ad	r8169: add support for RTL8168M A user reported an unknown chip version. According to the r8168 vendor driver it's called RTL8168M, but handling is identical to RTL8168H. So let's simply treat it as RTL8168H. Tested-by: Евгений <octobergun@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-08 14:25:53 +01:00
David S. Miller	358961f51f	Merge branch 'devlink-io-eqs' Parav Pandit says: ==================== devlink: Add port function attribute for IO EQs Currently, PCI SFs and VFs use IO event queues to deliver netdev per channel events. The number of netdev channels is a function of IO event queues. In the second scenario of an RDMA device, the completion vectors are also a function of IO event queues. Currently, an administrator on the hypervisor has no means to provision the number of IO event queues for the SF device or the VF device. Device/firmware determines some arbitrary value for these IO event queues. Due to this, the SF netdev channels are unpredictable, and consequently, the performance is too. This short series introduces a new port function attribute: max_io_eqs. The goal is to provide administrators at the hypervisor level with the ability to provision the maximum number of IO event queues for a function. This gives the control to the administrator to provision right number of IO event queues and have predictable performance. Examples of when an administrator provisions (set) maximum number of IO event queues when using switchdev mode: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10 $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20 $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20 This sets the corresponding maximum IO event queues of the function before it is enumerated. Thus, when the VF/SF driver reads the capability from the device, it sees the value provisioned by the hypervisor. The driver is then able to configure the number of channels for the net device, as well as the number of completion vectors for the RDMA device. The device/firmware also honors the provisioned value, hence any VF/SF driver attempting to create IO EQs beyond provisioned value results in an error. With above setting now, the administrator is able to achieve the 2x performance on SFs with 20 channels. In second example when SF was provisioned for a container with 2 cpus, the administrator provisioned only 2 IO event queues, thereby saving device resources. With the above settings now in place, the administrator achieved 2x performance with the SF device with 20 channels. In the second example, when the SF was provisioned for a container with 2 CPUs, the administrator provisioned only 2 IO event queues, thereby saving device resources. changelog: v2->v3: - limited to 80 chars per line in devlink - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock on error path v1->v2: - limited comment to 80 chars per line in header file - fixed set function variables for reverse christmas tree - fixed comments from Kalesh - fixed missing kfree in get call - returning error code for get cmd failure - fixed error msg copy paste error in set on cmd failure ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-08 14:10:45 +01:00
Parav Pandit	93197c7c50	mlx5/core: Support max_io_eqs for a function Implement get and set for the maximum IO event queues for SF and VF. This enables administrator on the hypervisor to control the maximum IO event queues which are typically used to derive the maximum and default number of net device channels or rdma device completion vectors. Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-08 14:10:45 +01:00
Parav Pandit	5af3e3876d	devlink: Support setting max_io_eqs Many devices send event notifications for the IO queues, such as tx and rx queues, through event queues. Enable a privileged owner, such as a hypervisor PF, to set the number of IO event queues for the VF and SF during the provisioning stage. example: Get maximum IO event queues of the VF device:: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10 Set maximum IO event queues of the VF device:: $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32 Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-08 14:10:45 +01:00
Eric Dumazet	4308811ba9	net: display more skb fields in skb_dump() Print these additional fields in skb_dump() to ease debugging. - mac_len - csum_start (in v2, at Willem suggestion) - csum_offset (in v2, at Willem suggestion) - priority - mark - alloc_cpu - vlan_all - encapsulation - inner_protocol - inner_mac_header - inner_network_header - inner_transport_header Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-08 14:06:31 +01:00

1 2 3 4 5 ...

1265679 Commits