Commit Graph

1265679 Commits

Author SHA1 Message Date
Pavel Begunkov
7cb31c46b9 net: cache for same cpu skb_attempt_defer_free
Optimise skb_attempt_defer_free() when run by the same CPU the skb was
allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
disable softirqs and put the buffer into cpu local caches.

CPU bound TCP ping pong style benchmarking (i.e. netbench) showed a 1%
throughput increase (392.2 -> 396.4 Krps). Cross checking with profiles,
the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note,
I'd expect the win doubled with rx only benchmarks, as the optimisation
is for the receive path, but the test spends >55% of CPU doing writes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/a887463fb219d973ec5ad275e31194812571f1f5.1712711977.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 19:27:32 -07:00
Eric Dumazet
9b9fd45869 tcp: tweak tcp_sock_write_txrx size assertion
I forgot 32bit arches might have 64bit alignment for u64
fields.

tcp_sock_write_txrx group does not contain pointers,
but two u64 fields. It is possible that on 32bit kernel,
a 32bit hole is before tp->tcp_clock_cache.
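
The effect described above can be reproduced with a small layout calculator. This is only an illustrative sketch: the field names (`flags32`, `clock64`, `stamp64`) and the group starting at offset 0 are assumptions made for the example, not the real tcp_sock layout. It shows how the same group grows by 4 bytes when u64 fields need 8-byte rather than 4-byte alignment.

```python
def layout(fields, start, u64_align):
    """Compute field offsets for (name, size) pairs laid out in order.

    u64_align is the alignment applied to 8-byte fields; every other
    field is aligned to its own size. Returns (offsets, end_offset).
    """
    offsets = {}
    off = start
    for name, size in fields:
        align = u64_align if size == 8 else size
        # Round the running offset up to the field's alignment.
        off = (off + align - 1) // align * align
        offsets[name] = off
        off += size
    return offsets, off


# Hypothetical group: one 32-bit field followed by two u64 fields.
GROUP = [("flags32", 4), ("clock64", 8), ("stamp64", 8)]

offs4, end4 = layout(GROUP, 0, u64_align=4)  # u64 aligned to 4 bytes
offs8, end8 = layout(GROUP, 0, u64_align=8)  # u64 aligned to 8 bytes
```

With these assumed fields the group is 20 bytes when u64 is 4-byte aligned and 24 bytes when it is 8-byte aligned, so a size assertion written for one case fails in the other.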

I will try to remember a group can be bigger on 32bit
kernels in the future.

With help from Vladimir Oltean.

Fixes: d2c3a7eb1a ("tcp: more struct tcp_sock adjustments")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404082207.HCEdQhUO-lkp@intel.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240409140914.4105429-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 18:55:05 -07:00
Jakub Kicinski
414e576fb0 Merge branch 'selftests-move-bpf-offload-test-from-bpf-to-net'
Jakub Kicinski says:

====================
selftests: move bpf-offload test from bpf to net

The test_offload.py test fits in networking and bpf equally
well. We started adding more Python tests in networking
and some of the code in test_offload.py can be reused,
so move it to networking. Looks like it bit rotted over
time and some fixes are needed.

Admittedly more code could be extracted but I only had
the time for a minor cleanup :(
====================

Link: https://lore.kernel.org/r/20240409031549.3531084-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 14:03:14 -07:00
Jakub Kicinski
6ce2b68993 selftests: net: reuse common code in bpf_offload
net/lib/py/nsim.py already contains the most useful parts
of the netdevsim wrapper classes. Reuse them.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 14:03:12 -07:00
Jakub Kicinski
b1c2ce11d4 selftests: net: declare section names for bpf_offload
Non-ancient ip (iproute2-5.15.0, libbpf 0.7.0) refuses to load
the sample with maps because we don't generate BTF:

   libbpf: BTF is required, but is missing or corrupted.
   ERROR: opening BPF object file failed

Enable BTF by adding -g to clang flags. With that done
neither of the programs loads:

  libbpf: prog 'func': error relocating .BTF.ext function info: -22
  libbpf: prog 'func': failed to relocate calls: -22
  libbpf: failed to load object 'ksft-net-drv/net/sample_ret0.bpf.o'

Andrii explains that this is because we don't specify
section names for the code. Add the section names, too.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 14:03:12 -07:00
Jakub Kicinski
fc50c698c2 selftests: net: bpf_offload: wait for maps
Maps are removed asynchronously. Either there's a bigger delay
now or the test has always been flaky. Retry waiting in the loop.
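
A retry-style wait of the kind described above might look like the following. This is a minimal sketch, not the code from test_offload.py: the helper name `wait_until` and its `attempts`/`delay` parameters are hypothetical.

```python
import time


def wait_until(predicate, attempts=20, delay=0.05):
    """Poll predicate() until it returns True or the attempts run out.

    Returns True as soon as the condition holds, False otherwise.
    """
    for _ in range(attempts):
        if predicate():
            return True
        time.sleep(delay)
    # One final check after the last sleep.
    return predicate()
```

A caller would pass a predicate such as "the number of maps reported matches the expected count" and fail the test only if `wait_until()` returns False.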

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 14:03:12 -07:00
Jakub Kicinski
e59f0e93e9 selftests: move bpf-offload test from bpf to net
We're building more python tests on the netdev side, and some
of the classes from the venerable BPF offload tests can be reused.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 14:03:12 -07:00
Jianbo Liu
2ecd487b67 net: sched: cls_api: fix slab-use-after-free in fl_dump_key
The filter counter is updated under the protection of cb_lock in the
cited commit. While waiting for the lock, the filter may be deleted by
another thread, which causes a UAF when the filter is dumped.

Fix this issue by moving tcf_block_filter_cnt_update() after
tfilter_put().

 ==================================================================
 BUG: KASAN: slab-use-after-free in fl_dump_key+0x1d3e/0x20d0 [cls_flower]
 Read of size 4 at addr ffff88814f864000 by task tc/2973

 CPU: 7 PID: 2973 Comm: tc Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_02_12_41 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x7e/0xc0
  print_report+0xc1/0x600
  ? __virt_addr_valid+0x1cf/0x390
  ? fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  ? fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  kasan_report+0xb9/0xf0
  ? fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  ? lock_acquire+0x1c2/0x530
  ? fl_dump+0x172/0x5c0 [cls_flower]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? fl_dump_key_options.part.0+0x10f0/0x10f0 [cls_flower]
  ? do_raw_spin_lock+0x12d/0x270
  ? spin_bug+0x1d0/0x1d0
  fl_dump+0x21d/0x5c0 [cls_flower]
  ? fl_tmplt_dump+0x1f0/0x1f0 [cls_flower]
  ? nla_put+0x15f/0x1c0
  tcf_fill_node+0x51b/0x9a0
  ? tc_skb_ext_tc_enable+0x150/0x150
  ? __alloc_skb+0x17b/0x310
  ? __build_skb_around+0x340/0x340
  ? down_write+0x1b0/0x1e0
  tfilter_notify+0x1a5/0x390
  ? fl_terse_dump+0x400/0x400 [cls_flower]
  tc_new_tfilter+0x963/0x2170
  ? tc_del_tfilter+0x1490/0x1490
  ? print_usage_bug.part.0+0x670/0x670
  ? lock_downgrade+0x680/0x680
  ? security_capable+0x51/0x90
  ? tc_del_tfilter+0x1490/0x1490
  rtnetlink_rcv_msg+0x75e/0xac0
  ? if_nlmsg_stats_size+0x4c0/0x4c0
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? __netlink_lookup+0x35e/0x6e0
  netlink_rcv_skb+0x12c/0x360
  ? if_nlmsg_stats_size+0x4c0/0x4c0
  ? netlink_ack+0x15e0/0x15e0
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? netlink_deliver_tap+0xcd/0xa60
  ? netlink_deliver_tap+0xcd/0xa60
  ? netlink_deliver_tap+0x1c9/0xa60
  netlink_unicast+0x43e/0x700
  ? netlink_attachskb+0x750/0x750
  ? lock_acquire+0x1c2/0x530
  ? __might_fault+0xbb/0x170
  netlink_sendmsg+0x749/0xc10
  ? netlink_unicast+0x700/0x700
  ? __might_fault+0xbb/0x170
  ? netlink_unicast+0x700/0x700
  __sock_sendmsg+0xc5/0x190
  ____sys_sendmsg+0x534/0x6b0
  ? import_iovec+0x7/0x10
  ? kernel_sendmsg+0x30/0x30
  ? __copy_msghdr+0x3c0/0x3c0
  ? entry_SYSCALL_64_after_hwframe+0x46/0x4e
  ? lock_acquire+0x1c2/0x530
  ? __virt_addr_valid+0x116/0x390
  ___sys_sendmsg+0xeb/0x170
  ? __virt_addr_valid+0x1ca/0x390
  ? copy_msghdr_from_user+0x110/0x110
  ? __delete_object+0xb8/0x100
  ? __virt_addr_valid+0x1cf/0x390
  ? do_sys_openat2+0x102/0x150
  ? lockdep_hardirqs_on_prepare+0x284/0x400
  ? do_sys_openat2+0x102/0x150
  ? __fget_light+0x53/0x1d0
  ? sockfd_lookup_light+0x1a/0x150
  __sys_sendmsg+0xb5/0x140
  ? __sys_sendmsg_sock+0x20/0x20
  ? lock_downgrade+0x680/0x680
  do_syscall_64+0x70/0x140
  entry_SYSCALL_64_after_hwframe+0x46/0x4e
 RIP: 0033:0x7f98e3713367
 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 RSP: 002b:00007ffc74a64608 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 000000000047eae0 RCX: 00007f98e3713367
 RDX: 0000000000000000 RSI: 00007ffc74a64670 RDI: 0000000000000003
 RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
 R10: 00007f98e360c5e8 R11: 0000000000000246 R12: 00007ffc74a6a508
 R13: 00000000660d518d R14: 0000000000484a80 R15: 00007ffc74a6a50b
  </TASK>

 Allocated by task 2973:
  kasan_save_stack+0x20/0x40
  kasan_save_track+0x10/0x30
  __kasan_kmalloc+0x77/0x90
  fl_change+0x27a6/0x4540 [cls_flower]
  tc_new_tfilter+0x879/0x2170
  rtnetlink_rcv_msg+0x75e/0xac0
  netlink_rcv_skb+0x12c/0x360
  netlink_unicast+0x43e/0x700
  netlink_sendmsg+0x749/0xc10
  __sock_sendmsg+0xc5/0x190
  ____sys_sendmsg+0x534/0x6b0
  ___sys_sendmsg+0xeb/0x170
  __sys_sendmsg+0xb5/0x140
  do_syscall_64+0x70/0x140
  entry_SYSCALL_64_after_hwframe+0x46/0x4e

 Freed by task 283:
  kasan_save_stack+0x20/0x40
  kasan_save_track+0x10/0x30
  kasan_save_free_info+0x37/0x50
  poison_slab_object+0x105/0x190
  __kasan_slab_free+0x11/0x30
  kfree+0x111/0x340
  process_one_work+0x787/0x1490
  worker_thread+0x586/0xd30
  kthread+0x2df/0x3b0
  ret_from_fork+0x2d/0x70
  ret_from_fork_asm+0x11/0x20

 Last potentially related work creation:
  kasan_save_stack+0x20/0x40
  __kasan_record_aux_stack+0x9b/0xb0
  insert_work+0x25/0x1b0
  __queue_work+0x640/0xc90
  rcu_work_rcufn+0x42/0x70
  rcu_core+0x6a9/0x1850
  __do_softirq+0x264/0x88f

 Second to last potentially related work creation:
  kasan_save_stack+0x20/0x40
  __kasan_record_aux_stack+0x9b/0xb0
  __call_rcu_common.constprop.0+0x6f/0xac0
  queue_rcu_work+0x56/0x70
  fl_mask_put+0x20d/0x270 [cls_flower]
  __fl_delete+0x352/0x6b0 [cls_flower]
  fl_delete+0x97/0x160 [cls_flower]
  tc_del_tfilter+0x7d1/0x1490
  rtnetlink_rcv_msg+0x75e/0xac0
  netlink_rcv_skb+0x12c/0x360
  netlink_unicast+0x43e/0x700
  netlink_sendmsg+0x749/0xc10
  __sock_sendmsg+0xc5/0x190
  ____sys_sendmsg+0x534/0x6b0
  ___sys_sendmsg+0xeb/0x170
  __sys_sendmsg+0xb5/0x140
  do_syscall_64+0x70/0x140
  entry_SYSCALL_64_after_hwframe+0x46/0x4e

Fixes: 2081fd3445 ("net: sched: cls_api: add filter counter")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Tested-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-10 08:28:26 +01:00
Jakub Kicinski
811b836285 Merge branch 'minor-cleanups-to-skb-frag-ref-unref'
Mina Almasry says:

====================
Minor cleanups to skb frag ref/unref (part)

This series is largely motivated by a recent discussion where there was
some confusion on how to properly ref/unref pp pages vs non pp pages:

https://lore.kernel.org/netdev/CAHS8izOoO-EovwMwAm9tLYetwikNPxC0FKyVGu1TPJWSz4bGoA@mail.gmail.com/T/#t

There is some subtlety there because pp uses page->pp_ref_count for
refcounting, while non-pp uses get_page()/put_page() for refcounting.
Getting the refcounting pairs wrong can lead to a kernel crash.
[...]

https://lore.kernel.org/lkml/CAHS8izN436pn3SndrzsCyhmqvJHLyxgCeDpWXA4r1ANt3RCDLQ@mail.gmail.com/T/
====================

Link: https://lore.kernel.org/r/20240408153000.2152844-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 18:20:32 -07:00
Mina Almasry
f58f3c9563 net: remove napi_frag_unref
With the changes in the last patches, napi_frag_unref() is now
redundant. Remove it and use skb_page_unref directly.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240408153000.2152844-4-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 18:20:32 -07:00
Mina Almasry
959fa5c188 net: make napi_frag_unref reuse skb_page_unref
The implementations of these 2 functions are almost identical. Remove
the implementation of napi_frag_unref, and make it a call into
skb_page_unref so we don't duplicate the implementation.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240408153000.2152844-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 18:20:29 -07:00
Jakub Kicinski
445e603038 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
net/e1000e, igb, igc: Remove redundant runtime resume

Bjorn Helgaas says:

e1000e, igb, and igc all have code to runtime resume the device during
ethtool operations.

Since f32a213765 ("ethtool: runtime-resume netdev parent before ethtool
ioctl ops"), dev_ethtool() does this for us, so remove it from the
individual drivers.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Remove redundant runtime resume for ethtool ops
  igb: Remove redundant runtime resume for ethtool_ops
  e1000e: Remove redundant runtime resume for ethtool_ops
====================

Link: https://lore.kernel.org/r/20240408210849.3641172-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:33:40 -07:00
Jakub Kicinski
91f2210ce3 Merge branch 'bonding-remove-rtnl-from-three-sysfs-files'
Eric Dumazet says:

====================
bonding: remove RTNL from three sysfs files

First patch might fix a potential deadlock.
sysfs handlers should use rtnl_trylock() instead of rtnl_lock().

Following files can be read without acquiring RTNL :

- /sys/class/net/bonding_masters
- /sys/class/net/<name>/bonding/slaves
- /sys/class/net/<name>/bonding/queue_id
====================

Link: https://lore.kernel.org/r/20240408190437.2214473-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:31:48 -07:00
Eric Dumazet
662e451d9a bonding: no longer use RTNL in bonding_show_queue_id()
Annotate lockless reads of slave->queue_id.

Annotate writes of slave->queue_id.

Switch bonding_show_queue_id() to rcu_read_lock()
and bond_for_each_slave_rcu().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240408190437.2214473-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:31:45 -07:00
Eric Dumazet
d67fed98ca bonding: no longer use RTNL in bonding_show_slaves()
Slave devices are already RCU protected, simply
switch to bond_for_each_slave_rcu().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240408190437.2214473-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:31:45 -07:00
Eric Dumazet
6c5d17143f bonding: no longer use RTNL in bonding_show_bonds()
netdev structures are already RCU protected.

Change bond_init() and bond_uninit() to use RCU
enabled list_add_tail_rcu() and list_del_rcu().

Then bonding_show_bonds() can use rcu_read_lock()
while iterating through bn->dev_list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240408190437.2214473-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:31:45 -07:00
Kuan-Wei Chiu
d034d02de8 net: sched: cake: Optimize the number of function calls and branches in heap construction
When constructing a heap, heapify operations are required on all
non-leaf nodes. Thus, determining the index of the first non-leaf node
is crucial. In a heap, the left child's index of node i is 2 * i + 1
and the right child's index is 2 * i + 2. Node CAKE_MAX_TINS *
CAKE_QUEUES / 2 has its left and right children at indexes
CAKE_MAX_TINS * CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2,
respectively, which are beyond the heap's range, indicating it as a
leaf node. Conversely, node CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1 has a
left child at index CAKE_MAX_TINS * CAKE_QUEUES - 1, confirming its
non-leaf status. The loop should start from it since it's not a leaf
node.

By starting the loop from CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1, we
minimize function calls and branch condition evaluations. This
adjustment theoretically reduces two function calls (one for
cake_heapify() and another for cake_heap_get_backlog()) and five branch
evaluations (one for iterating all non-leaf nodes, one within
cake_heapify()'s while loop, and three more within the while loop
with if conditions).
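
The index argument above can be checked mechanically. The sketch below uses a generic heap size `n` in place of CAKE_MAX_TINS * CAKE_QUEUES; the helper names are illustrative and do not come from sch_cake.

```python
def left_child(i):
    """Index of the left child of node i in a 0-based binary heap."""
    return 2 * i + 1


def is_leaf(i, n):
    """A node is a leaf when its left child falls outside the n-node heap."""
    return left_child(i) >= n


def first_non_leaf(n):
    """Last node that still has a child; heap construction starts here."""
    return n // 2 - 1
```

For any `n >= 2`, node `n // 2` is a leaf and node `n // 2 - 1` is not, so starting the loop one index lower skips a heapify call that could never move anything.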

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20240408174716.751069-1-visitorckw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:30:14 -07:00
Asbjørn Sloth Tønnesen
545d95e5f1 cxgb4: flower: use NL_SET_ERR_MSG_MOD for validation errors
Replace netdev_{warn,err} with NL_SET_ERR_MSG_{FMT_,}MOD
to better inform the user about the problem.

Only compile-tested, no access to HW.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://lore.kernel.org/r/20240408165506.94483-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:11:04 -07:00
Catalin Popescu
9ef9ecfa9e net: phy: dp8382x: keep WOL settings across suspends
Unlike other ethernet PHYs from TI, PHY dp8382x has WOL enabled
at reset. The driver explicitly disables WOL in config_init callback
which is called during init and during resume from suspend. Hence,
WOL is unconditionally disabled during resume, even if it was enabled
before the suspend. We make sure that WOL configuration is persistent
across suspends.

Signed-off-by: Catalin Popescu <catalin.popescu@leica-geosystems.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240408082602.3654090-1-catalin.popescu@leica-geosystems.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 16:57:55 -07:00
Paolo Abeni
6a053f07d5 Merge branch 'net-phy-micrel-lan8814-enable-ptp_pf_perout'
Horatiu Vultur says:

====================
net: phy: micrel: lan8814: Enable PTP_PF_PEROUT

Add support for PTP_PF_PEROUT to lan8814. The first patch just enables
the LTC at probe time, so that it is not required to enable
timestamping to have the LTC enabled. The second patch adds the
actual support for PTP_PF_PEROUT.
====================

Link: https://lore.kernel.org/r/20240408064432.3881636-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 13:34:06 +02:00
Horatiu Vultur
9e63941b89 net: phy: micrel: lan8814: Add support for PTP_PF_PEROUT
Lan8814 has 24 GPIOs but only 2 GPIOs (GPIO 0 and GPIO 1) can be
configured to generate periodic signals. And there are 2 events (EVENT_A
and EVENT_B) but these events are hardcoded to GPIO 0 and GPIO 1.
These events are used to generate periodic signals. It is possible to
configure the length, the start time and the period of the signal by
configuring the event.

These events are generated by comparing the target time with the PHC
time. In case the PHC time is changed to a value bigger than the target
time + reload time, then it would generate only 1 event and then it
would stop because target time + reload time is smaller than PHC time.
Therefore it is required to change also the target time every time when
the PHC is changed. The same will apply also when the PHC time is
changed to a smaller value.

This was tested using:
testptp -i 1 -L 1,2
testptp -i 1 -p 1000000000 -w 200000000

Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 13:34:03 +02:00
Horatiu Vultur
9f6b3a4981 net: phy: micrel: lan8814: Enable LTC at probe time
The LTC for lan8814 was enabled only if timestamping was enabled,
otherwise it would be stopped, meaning that the LTC would not increase
by itself. This might break other features that don't require
timestamping, like generating 1PPS. Therefore enable the LTC at probe time.

Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 13:34:03 +02:00
Sascha Hauer
220d63f249 dt-bindings: net: rockchip-dwmac: use rgmii-id in example
The dwmac supports specifying the RGMII clock delays, but it is
recommended to use rgmii-id and to specify the delays in the phy node
instead [1].

Change the example accordingly to no longer promote this undesired
setting.

[1] https://lore.kernel.org/all/1a0de7b4-f0f7-4080-ae48-f5ffa9e76be3@lunn.ch/

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Dragan Simic <dsimic@manjaro.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240408-rockchip-dwmac-rgmii-id-binding-v1-1-3886d1a8bd54@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 13:29:02 +02:00
Paolo Abeni
d2fd6cf39a Merge branch 'tcp-fix-isn-selection-in-timewait-syn_recv'
Eric Dumazet says:

====================
tcp: fix ISN selection in TIMEWAIT -> SYN_RECV

TCP can transform a TIMEWAIT socket into a SYN_RECV one from
a SYN packet, and the ISN of the SYNACK packet is normally
generated using TIMEWAIT tw_snd_nxt.

This SYN packet also bypasses normal checks against listen queue
being full or not.

Unfortunately this was broken almost a decade ago.

This series fixes the issue, in two patches.

First patch refactors code to add tcp_tw_isn as a parameter
to ->route_req(), to make the second patch smaller.

Second patch fixes the issue, by no longer using TCP_SKB_CB(skb)
to store the tcp_tw_isn.

Following packetdrill test passes after this series:

// Set up a server listening socket.
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

// Establish connection
   +0 < S 0:0(0) win 32792 <mss 1460,nop,nop,sackOK>
   +0 > S. 0:0(0) ack 1    <mss 1460,nop,nop,sackOK>
 +.01 < . 1:1(0) ack 1 win 32792

   +0 accept(3, ..., ...) = 4

// We close(), send a FIN, and get an ACK and FIN, in order to get into TIME_WAIT.

 +.01 close(4) = 0
   +0 > F. 1:1(0) ack 1
 +.01 < F. 1:1(0) ack 2 win 32792
   +0 > . 2:2(0) ack 2

// SYN hitting a TIME_WAIT -> should use an ISN based on TIMEWAIT tw_snd_nxt

 +.01 < S 1000:1000(0) win 65535 <mss 1460,nop,nop,sackOK>
   +0 > S. 65539:65539(0) ack 1001 <mss 1460,nop,nop,sackOK>
====================

Link: https://lore.kernel.org/r/20240407093322.3172088-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 11:47:44 +02:00
Eric Dumazet
41eecbd712 tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field
TCP can transform a TIMEWAIT socket into a SYN_RECV one from
a SYN packet, and the ISN of the SYNACK packet is normally
generated using TIMEWAIT tw_snd_nxt :

tcp_timewait_state_process()
...
    u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
    if (isn == 0)
        isn++;
    TCP_SKB_CB(skb)->tcp_tw_isn = isn;
    return TCP_TW_SYN;
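
The arithmetic in the excerpt above can be modelled outside the kernel. This is a sketch only: `timewait_isn` is a hypothetical name, and the u32 wraparound of the C code is emulated with an explicit mask.

```python
U32_MASK = 0xFFFFFFFF


def timewait_isn(tw_snd_nxt):
    """Model of the ISN derived from a TIMEWAIT socket's tw_snd_nxt.

    The sum wraps like a C u32. Zero is skipped because the later check
    in tcp_conn_request() treats isn == 0 as "no TIMEWAIT ISN".
    """
    isn = (tw_snd_nxt + 65535 + 2) & U32_MASK
    if isn == 0:
        isn = 1
    return isn
```

With a relative tw_snd_nxt of 2 this gives 65539, which is consistent with the SYNACK sequence number expected by the packetdrill script in the series cover letter.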

This SYN packet also bypasses normal checks against listen queue
being full or not.

tcp_conn_request()
...
       __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
...
        /* TW buckets are converted to open requests without
         * limitations, they conserve resources and peer is
         * evidently real one.
         */
        if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) {
                want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
                if (!want_cookie)
                        goto drop;
        }

This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb.

Unfortunately this field has been accidentally cleared
after the call to tcp_timewait_state_process() returning
TCP_TW_SYN.

Using a field in TCP_SKB_CB(skb) for a temporary state
is overkill.

Switch instead to a per-cpu variable.

As a bonus, we do not have to clear tcp_tw_isn in TCP receive
fast path.
It is temporarily set then cleared only in the TCP_TW_SYN dance.

Fixes: 4ad19de877 ("net: tcp6: fix double call of tcp_v6_fill_cb()")
Fixes: eeea10b83a ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 11:47:40 +02:00
Eric Dumazet
b9e8104058 tcp: propagate tcp_tw_isn via an extra parameter to ->route_req()
tcp_v6_init_req() reads TCP_SKB_CB(skb)->tcp_tw_isn to find
out if the request socket is created by a SYN hitting a TIMEWAIT socket.

This has been buggy for a decade, let's directly pass the information
from tcp_conn_request().

This is a preparatory patch to make the following one easier to review.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 11:47:40 +02:00
Paolo Abeni
1c25fe9a04 Merge branch 'add-support-for-flower-actions-mirred-and-redirect'
Daniel Machon says:

====================
Add support for flower actions mirred and redirect

This series adds support for the two tc flower actions mirred and
redirect. Both actions are implemented by means of a port mask and a
mask mode. The mask mode controls how the mask is applied, and together
they are used by the switch to make a forwarding decision. Both actions
are configurable via the IS0 or IS2 VCAP's (ingress stage 0 and 2,
respectively).

Patch #1: adds support for tc flower mirred action.
Patch #2: adds support for tc flower redirect action.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================

Link: https://lore.kernel.org/r/20240405-mirror-redirect-actions-v2-0-875d4c1927c8@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 10:45:13 +02:00
Daniel Machon
1164b8e0b1 net: sparx5: add support for tc flower redirect action
Add support for the flower redirect action. Two VCAP actions are encoded
in the rule - one for the port mask, and one for the port mask mode.
When the rule is hit, the port mask is used as the final destination
set, replacing all other port masks.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 10:45:10 +02:00
Daniel Machon
48ba00da2e net: sparx5: add support for tc flower mirred action.
Add support for tc flower mirred action. Two VCAP actions are encoded in
the rule - one for the port mask, and one for the port mask mode. When
the rule is hit, the destination mask is OR'ed with the port mask.
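
The difference between the two port mask modes can be illustrated with plain bitmasks. This is a conceptual sketch, not the sparx5 VCAP code; the function names and the one-bit-per-port encoding are assumptions for the example. The redirect behaviour (replacing the destination set) is taken from the companion redirect commit.

```python
def apply_mirred(dest_mask, port_mask):
    """Mirred: OR the rule's port mask into the existing destination set."""
    return dest_mask | port_mask


def apply_redirect(dest_mask, port_mask):
    """Redirect: the rule's port mask replaces the destination set."""
    return port_mask


# Frame originally destined to ports 0 and 1; the rule names port 2.
mirrored = apply_mirred(0b0011, 0b0100)
redirected = apply_redirect(0b0011, 0b0100)
```

Here `mirrored` covers ports 0, 1 and 2, while `redirected` covers port 2 only.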

Also add new VCAP function for supporting 72-bit wide actions, and a tc
helper for setting the port forwarding mask.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 10:45:10 +02:00
Paolo Abeni
74bd5dbe1b Merge branch 'support-icssg-based-ethernet-on-am65x-sr1-0-devices'
Diogo Ivo says:

====================
Support ICSSG-based Ethernet on AM65x SR1.0 devices

This series extends the current ICSSG-based Ethernet driver to support
AM65x Silicon Revision 1.0 devices.

Notable differences between the Silicon Revisions are that there is
no TX core in SR1.0 with this being handled by the firmware, requiring
extra DMA channels to manage communication with the firmware (with the
firmware being different as well) and in the packet classifier.

The motivation behind it is that a significant number of Siemens
devices containing SR1.0 silicon have been deployed in the field
and need to be supported and updated to newer kernel versions
without losing functionality.

This series is based on TI's 5.10 SDK [1].

The fifth version of this patch series can be found in [2].

Compared to the last version of the patch set there are only changes in
patch 05/10, where the fields of a struct are now explicitly declared as
__le32 so that we can properly interpret them.

Both of the problems mentioned in v4 have been addressed by disabling
those functionalities, meaning that this driver currently only supports
one TX queue and does not support a 100Mbit/s half-duplex connection.
The removal of these features has been commented in the appropriate
locations in the code.

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y
[2]: https://lore.kernel.org/netdev/20240326110709.26165-1-diogo.ivo@siemens.com/
====================

Link: https://lore.kernel.org/r/20240403104821.283832-1-diogo.ivo@siemens.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:32 +02:00
Diogo Ivo
e654b85a69 net: ti: icssg-prueth: Add ICSSG Ethernet driver for AM65x SR1.0 platforms
Add the PRUeth driver for the ICSSG subsystem found in AM65x SR1.0 devices.
The main differences that set SR1.0 and SR2.0 apart are the missing TXPRU
core in SR1.0, two extra DMA channels for management purposes and different
firmware that needs to be configured accordingly.

Based on the work of Roger Quadros, Vignesh Raghavendra and
Grygorii Strashko in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
ce95cb4c8d net: ti: icssg-prueth: Modify common functions for SR1.0
Some parts of the logic differ only slightly between Silicon Revisions.
In these cases add the bits that differ to a common function that
executes those bits conditionally based on the Silicon Revision.

Based on the work of Roger Quadros, Vignesh Raghavendra and
Grygorii Strashko in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
0a74a9de79 net: ti: icssg-prueth: Add functions to configure SR1.0 packet classifier
Add the functions to configure the SR1.0 packet classifier.

Based on the work of Roger Quadros in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
604e603d73 net: ti: icssg-prueth: Adjust the number of TX channels for SR1.0
As SR1.0 uses the current higher priority channel to send commands to
the firmware, take this into account when setting/getting the number
of channels to/from the user.

Based on the work of Roger Quadros in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
95c2e68933 net: ti: icssg-prueth: Adjust IPG configuration for SR1.0
Correctly adjust the IPG based on the Silicon Revision.

Based on the work of Roger Quadros, Vignesh Raghavendra
and Grygorii Strashko in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
8623dea207 net: ti: icssg-prueth: Add SR1.0-specific description bits
Add a field to distinguish between SR1.0 and SR2.0 in the driver
as well as the necessary structures to program SR1.0.

Based on the work of Roger Quadros in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
6d6a5751cd net: ti: icssg-prueth: Add SR1.0-specific configuration bits
Define the firmware configuration structure and commands needed to
communicate with SR1.0 firmware, as well as SR1.0 buffer information
where it differs from SR2.0.

Based on the work of Roger Quadros, Murali Karicheri and
Grygorii Strashko in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
e2dc7bfd67 net: ti: icssg-prueth: Move common functions into a separate file
In order to allow code sharing between Silicon Revisions 1.0 and 2.0
move all functions that can be shared into a common file. This commit
introduces no functional changes.

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:29 +02:00
Diogo Ivo
e1900d7ba9 eth: Move IPv4/IPv6 multicast address bases to their own symbols
As these addresses can be useful outside of checking if an address
is a multicast address (for example in device drivers) make them
accessible to users of etherdevice.h to avoid code duplication.
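
As a rough illustration of why shared address bases are handy, here is a
minimal user-space sketch that classifies a MAC address by comparing it
against the IPv4 (01:00:5e, 25-bit prefix) and IPv6 (33:33, 16-bit prefix)
multicast bases. All names below (MAC_LEN, ipv4_mcast_base, mac_matches,
and so on) are hypothetical stand-ins and not the symbols exported by
etherdevice.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_LEN 6 /* hypothetical stand-in for the kernel's ETH_ALEN */

/* Hypothetical user-space copies of the multicast address bases. */
static const uint8_t ipv4_mcast_base[MAC_LEN] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x00 };
static const uint8_t ipv6_mcast_base[MAC_LEN] = { 0x33, 0x33, 0x00, 0x00, 0x00, 0x00 };

/* Masks covering the fixed prefix: 25 bits for IPv4, 16 bits for IPv6. */
static const uint8_t ipv4_mcast_mask[MAC_LEN] = { 0xff, 0xff, 0xff, 0x80, 0x00, 0x00 };
static const uint8_t ipv6_mcast_mask[MAC_LEN] = { 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 };

/* Return true if addr, restricted to mask, equals base. */
static bool mac_matches(const uint8_t *addr, const uint8_t *base,
                        const uint8_t *mask)
{
    for (int i = 0; i < MAC_LEN; i++) {
        if ((addr[i] & mask[i]) != base[i])
            return false;
    }
    return true;
}

/* True for MAC addresses in the IPv4 multicast range 01:00:5e:00:00:00/25. */
bool is_ipv4_mcast_mac(const uint8_t *addr)
{
    return mac_matches(addr, ipv4_mcast_base, ipv4_mcast_mask);
}

/* True for MAC addresses in the IPv6 multicast range 33:33:00:00:00:00/16. */
bool is_ipv6_mcast_mac(const uint8_t *addr)
{
    return mac_matches(addr, ipv6_mcast_base, ipv6_mcast_mask);
}
```

A driver that programs a hardware classifier could reuse the same bases
and masks as filter entries instead of redefining the byte patterns
locally.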

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:28 +02:00
Diogo Ivo
dc073430db dt-bindings: net: Add support for AM65x SR1.0 in ICSSG
Silicon Revision 1.0 of the AM65x came with slightly different ICSSG
support: only 2 PRUs per slice are available, and 2 additional DMA
channels are used instead for management purposes. We have no restrictions
on specified PRUs, but the DMA channels need to be adjusted.

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-09 09:47:28 +02:00
Dan Carpenter
87c33315af net: phy: air_en8811h: fix some error codes
These error paths accidentally return "ret" which is zero/success
instead of the correct error code.
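
The bug class described here can be sketched in a few lines of plain C.
This is only an illustration under assumed names: struct fw_blob,
load_fw_buggy() and load_fw_fixed() are hypothetical and do not come from
the air_en8811h driver, and the choice of -EINVAL is likewise an
assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical firmware descriptor; not a type from the real driver. */
struct fw_blob {
    const void *data;
    size_t size;
};

/* Buggy variant: 'ret' is still 0 when the size check fails. */
int load_fw_buggy(const struct fw_blob *fw, size_t max_size)
{
    int ret = 0;

    if (!fw || !fw->data)
        return -EINVAL;
    if (fw->size > max_size)
        return ret; /* bug: reports success on an error path */
    return 0;
}

/* Fixed variant: the error path returns an explicit negative errno. */
int load_fw_fixed(const struct fw_blob *fw, size_t max_size)
{
    if (!fw || !fw->data)
        return -EINVAL;
    if (fw->size > max_size)
        return -EINVAL; /* assumed error code for this sketch */
    return 0;
}
```

With an oversized blob, the buggy variant returns 0 and the caller
carries on as if the load succeeded, whereas the fixed variant reports
the failure.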

Fixes: 71e7943011 ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/7ef2e230-dfb7-4a77-8973-9e5be1a99fc2@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-08 19:46:16 -07:00
Allen Pais
775d2e2b30 archnet: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/net/arcnet/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo <tj@kernel.org>
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Link: https://lore.kernel.org/r/20240403162306.20258-1-apais@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-08 19:44:09 -07:00
Bjorn Helgaas
75f16e06df igc: Remove redundant runtime resume for ethtool ops
8c5ad0dae9 ("igc: Add ethtool support") added ethtool_ops.begin() and
.complete(), which used pm_runtime_get_sync() to resume suspended devices
before any ethtool_ops callback and allow suspend after it completed.

Subsequently, f32a213765 ("ethtool: runtime-resume netdev parent before
ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path,
so the device is resumed before any ethtool_ops callback even if the driver
didn't supply a .begin() callback.

Remove the .begin() and .complete() callbacks, which are now redundant
because dev_ethtool() already resumes the device.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-08 13:25:39 -07:00
Bjorn Helgaas
461359c4f3 igb: Remove redundant runtime resume for ethtool_ops
749ab2cd12 ("igb: add basic runtime PM support") added
ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to
resume suspended devices before any ethtool_ops callback and allow suspend
after it completed.

Subsequently, f32a213765 ("ethtool: runtime-resume netdev parent before
ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path,
so the device is resumed before any ethtool_ops callback even if the driver
didn't supply a .begin() callback.

Remove the .begin() and .complete() callbacks, which are now redundant
because dev_ethtool() already resumes the device.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-08 13:25:39 -07:00
Bjorn Helgaas
b2c289415b e1000e: Remove redundant runtime resume for ethtool_ops
e60b22c5b7 ("e1000e: fix accessing to suspended device") added
ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to
resume suspended devices before any ethtool_ops callback and allow suspend
after it completed.

3ef672ab18 ("e1000e: ethtool unnecessarily takes device out of RPM
suspend") removed ethtool_ops.begin() and .complete() and instead did
pm_runtime_get_sync() only in the individual ethtool_ops callbacks that
access device registers.

Subsequently, f32a213765 ("ethtool: runtime-resume netdev parent before
ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path,
so the device is resumed before *any* ethtool_ops callback, as it was
before 3ef672ab18.

Remove most runtime resumes from ethtool_ops, which are now redundant
because the resume has already been done by dev_ethtool().  This is
essentially a revert of 3ef672ab18 ("e1000e: ethtool unnecessarily takes
device out of RPM suspend").

There are a couple of subtleties:

  - Prior to 3ef672ab18, the device was resumed only for the duration of
    a single ethtool callback.  3ef672ab18 changed e1000_set_phys_id() so
    the device was resumed for ETHTOOL_ID_ACTIVE and remained resumed until
    a subsequent callback for ETHTOOL_ID_INACTIVE.  Preserve that part of
    3ef672ab18 so the device will not be runtime suspended while in the
    ETHTOOL_ID_ACTIVE state.

  - 3ef672ab18 added "if (!pm_runtime_suspended())" before reading the
    STATUS register in e1000_get_settings().  This was racy and is now
    unnecessary because dev_ethtool() has resumed the device already, so
    revert that.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-08 13:25:39 -07:00
Heiner Kallweit
39f59c72ad r8169: add support for RTL8168M
A user reported an unknown chip version. According to the r8168 vendor
driver it's called RTL8168M, but handling is identical to RTL8168H.
So let's simply treat it as RTL8168H.

Tested-by: Евгений <octobergun@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08 14:25:53 +01:00
David S. Miller
358961f51f Merge branch 'devlink-io-eqs'
Parav Pandit says:

====================
devlink: Add port function attribute for IO EQs

Currently, PCI SFs and VFs use IO event queues to deliver netdev per
channel events. The number of netdev channels is a function of IO
event queues. Similarly, for an RDMA device, the number of completion
vectors is also a function of IO event queues. Currently, an
administrator on the hypervisor has no means to provision the number
of IO event queues for the SF device or the VF device. Device/firmware
determines some arbitrary value for these IO event queues. Due to this,
the SF netdev channels are unpredictable, and consequently, the
performance is too.

This short series introduces a new port function attribute: max_io_eqs.
The goal is to provide administrators at the hypervisor level with the
ability to provision the maximum number of IO event queues for a
function. This gives the administrator control to provision the
right number of IO event queues and have predictable performance.

Example of an administrator provisioning (setting) the maximum number of
IO event queues when using switchdev mode:

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10

  $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20

This sets the corresponding maximum IO event queues of the function
before it is enumerated. Thus, when the VF/SF driver reads the
capability from the device, it sees the value provisioned by the
hypervisor. The driver is then able to configure the number of channels
for the net device, as well as the number of completion vectors
for the RDMA device. The device/firmware also honors the provisioned
value, hence any attempt by a VF/SF driver to create IO EQs
beyond the provisioned value results in an error.
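
To make the relationship between the provisioned limit and the driver's
behaviour concrete, here is a small sketch in plain C. It assumes a
simple policy (default channels = min(online CPUs, max_io_eqs)) and a
-ENOSPC error once the limit is reached. The struct, function names,
policy and error code are all hypothetical; the actual mlx5 logic may
differ.

```c
#include <errno.h>

/* Hypothetical view of what a VF/SF driver reads from the device. */
struct func_caps {
    unsigned int max_io_eqs;    /* limit provisioned by the hypervisor */
    unsigned int io_eqs_in_use; /* IO EQs created so far */
};

/* Assumed policy: one channel per online CPU, capped by the EQ limit. */
unsigned int default_channels(const struct func_caps *caps,
                              unsigned int online_cpus)
{
    return online_cpus < caps->max_io_eqs ? online_cpus : caps->max_io_eqs;
}

/* Model of the device rejecting IO EQ creation beyond the provisioned limit. */
int create_io_eq(struct func_caps *caps)
{
    if (caps->io_eqs_in_use >= caps->max_io_eqs)
        return -ENOSPC; /* assumed error code for this sketch */
    caps->io_eqs_in_use++;
    return 0;
}
```

Under this assumed policy, a function provisioned with max_io_eqs 2 on a
host with 64 online CPUs would default to 2 channels, and a third
create_io_eq() call would fail.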

With the above settings now in place, the administrator achieved 2x
performance with the SF device with 20 channels. In the second example,
when the SF was provisioned for a container with 2 CPUs, the administrator
provisioned only 2 IO event queues, thereby saving device resources.

changelog:
v2->v3:
- limited to 80 chars per line in devlink
- fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
  on error path
v1->v2:
- limited comment to 80 chars per line in header file
- fixed set function variables for reverse christmas tree
- fixed comments from Kalesh
- fixed missing kfree in get call
- returning error code for get cmd failure
- fixed error msg copy paste error in set on cmd failure
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08 14:10:45 +01:00
Parav Pandit
93197c7c50 mlx5/core: Support max_io_eqs for a function
Implement get and set for the maximum IO event queues for SF and VF.
This enables an administrator on the hypervisor to control the maximum
IO event queues, which are typically used to derive the maximum and
default number of net device channels or RDMA device completion vectors.

Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08 14:10:45 +01:00
Parav Pandit
5af3e3876d devlink: Support setting max_io_eqs
Many devices send event notifications for the IO queues,
such as tx and rx queues, through event queues.

Enable a privileged owner, such as a hypervisor PF, to set the number
of IO event queues for the VF and SF during the provisioning stage.

example:
Get maximum IO event queues of the VF device::

  $ devlink port show pci/0000:06:00.0/2
  pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
      function:
          hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10

Set maximum IO event queues of the VF device::

  $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32

  $ devlink port show pci/0000:06:00.0/2
  pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
      function:
          hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08 14:10:45 +01:00
Eric Dumazet
4308811ba9 net: display more skb fields in skb_dump()
Print these additional fields in skb_dump() to ease debugging.

- mac_len
- csum_start (in v2, at Willem's suggestion)
- csum_offset (in v2, at Willem's suggestion)
- priority
- mark
- alloc_cpu
- vlan_all
- encapsulation
- inner_protocol
- inner_mac_header
- inner_network_header
- inner_transport_header

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08 14:06:31 +01:00