Commit Graph

1370072 Commits

Author SHA1 Message Date
Ryan Wanner
dce32ece3b net: cadence: macb: Expose REFCLK as a device tree property
The RMII and RGMII can both support internal or external provided
REFCLKs 50MHz and 125MHz respectively. Since this is dependent on
the board that the SoC is on this needs to be set via the device tree.

This property flag is checked in the MACB DT node so the REFCLK cap is
configured the correct way for the RMII or RGMII is configured on the
board.

Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
Link: https://patch.msgid.link/7f9b65896d6b7b48275bc527b72a16347f8ce10a.1752510727.git.Ryan.Wanner@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 17:28:37 -07:00
Ryan Wanner
1b7531c094 dt-bindings: net: cdns,macb: Add external REFCLK property
REFCLK can be provided by an external source so this should be exposed
by a DT property. The REFCLK is used for RMII and in some SoCs that use
this driver the RGMII 125MHz clk can also be provided by an external
source.

Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/d558467c4d5b27fb3135ffdead800b14cd9c6c0a.1752510727.git.Ryan.Wanner@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 17:28:36 -07:00
Jakub Kicinski
27b0286d00 Merge branch 'selftest-net-add-selftest-for-netpoll'
Breno Leitao says:

====================
selftest: net: Add selftest for netpoll

I am submitting a new selftest for the netpoll subsystem specifically
targeting the case where the RX is polling in the TX path, which is
a case that we don't have any test in the tree today. This is done when
netpoll_poll_dev() called, and this test creates a scenario when that is
probably.

The test does the following:

 1) Configuring a single RX/TX queue to increase contention on the
    interface.
 2) Generating background traffic to saturate the network, mimicking
    real-world congestion.
 3) Sending netconsole messages to trigger netpoll polling and monitor
    its behavior.
 4) Using dynamic netconsole targets via configfs, with the ability to
    delete and recreate targets during the test.
 5) Running bpftrace in parallel to verify that netpoll_poll_dev() is
    called when expected. If it is called, then the test passes,
    otherwise the test is marked as skipped.

In order to achieve it, I stole Jakub's bpftrace helper from [1], and
did some small changes that I found useful to use the helper.

So, this patchset basically contains:

 1) The code stolen from Jakub
 2) Improvements on bpftrace() helper
 3) The selftest itself

Link: https://lore.kernel.org/all/20250421222827.283737-22-kuba@kernel.org/ [1]
====================

Link: https://patch.msgid.link/20250714-netpoll_test-v7-0-c0220cfaa63e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 17:25:51 -07:00
Breno Leitao
b3019343e4 selftests: net: add netpoll basic functionality test
Add a basic selftest for the netpoll polling mechanism, specifically
targeting the netpoll poll() side.

The test creates a scenario where network transmission is running at
maximum speed, and netpoll needs to poll the NIC. This is achieved by:

  1. Configuring a single RX/TX queue to create contention
  2. Generating background traffic to saturate the interface
  3. Sending netconsole messages to trigger netpoll polling
  4. Using dynamic netconsole targets via configfs
  5. Delete and create new netconsole targets after some messages
  6. Start a bpftrace in parallel to make sure netpoll_poll_dev() is
     called
  7. If bpftrace exists and netpoll_poll_dev() was called, stop.

The test validates a critical netpoll code path by monitoring traffic
flow and ensuring netpoll_poll_dev() is called when the normal TX path
is blocked.

This addresses a gap in netpoll test coverage for a path that is
tricky for the network stack.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250714-netpoll_test-v7-3-c0220cfaa63e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 17:25:49 -07:00
Breno Leitao
fd2aadcefb selftests: drv-net: Strip '@' prefix from bpftrace map keys
The '@' prefix in bpftrace map keys is specific to bpftrace and can be
safely removed when processing results. This patch modifies the bpftrace
utility to strip the '@' from map keys before storing them in the result
dictionary, making the keys more consistent with Python conventions.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250714-netpoll_test-v7-2-c0220cfaa63e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 17:25:49 -07:00
Jakub Kicinski
3c561c547c selftests: drv-net: add helper/wrapper for bpftrace
bpftrace is very useful for low level driver testing. perf or trace-cmd
would also do for collecting data from tracepoints, but they require
much more post-processing.

Add a wrapper for running bpftrace and sanitizing its output.
bpftrace has JSON output, which is great, but it prints loose objects
and in a slightly inconvenient format. We have to read the objects
line by line, and while at it return them indexed by the map name.

Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250714-netpoll_test-v7-1-c0220cfaa63e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 17:25:49 -07:00
Yue Haibing
6c628ed95e ipv6: mcast: Simplify mld_clear_{report|query}()
Use __skb_queue_purge() instead of re-implementing it. Note that it uses
kfree_skb_reason() instead of kfree_skb() internally, and pass
SKB_DROP_REASON_QUEUE_PURGE drop reason to the kfree_skb tracepoint.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250715120709.3941510-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 16:14:42 -07:00
Stefano Garzarella
47ee43e4bf vsock/test: fix vsock_ioctl_int() check for unsupported ioctl
`vsock_do_ioctl` returns -ENOIOCTLCMD if an ioctl support is not
implemented, like for SIOCINQ before commit f7c7226592 ("vsock: Add
support for SIOCINQ ioctl"). In net/socket.c, -ENOIOCTLCMD is re-mapped
to -ENOTTY for the user space. So, our test suite, without that commit
applied, is failing in this way:

    34 - SOCK_STREAM ioctl(SIOCINQ) functionality...ioctl(21531): Inappropriate ioctl for device

Return false in vsock_ioctl_int() to skip the test in this case as well,
instead of failing.

Fixes: 53548d6bff ("test/vsock: Add retry mechanism to ioctl wrapper")
Cc: niuxuewei.nxw@antgroup.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Link: https://patch.msgid.link/20250715093233.94108-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 16:14:00 -07:00
Paolo Abeni
7eeabfb237 tcp: fix UaF in tcp_prune_ofo_queue()
The CI reported a UaF in tcp_prune_ofo_queue():

BUG: KASAN: slab-use-after-free in tcp_prune_ofo_queue+0x55d/0x660
Read of size 4 at addr ffff8880134729d8 by task socat/20348

CPU: 0 UID: 0 PID: 20348 Comm: socat Not tainted 6.16.0-rc5-virtme #1 PREEMPT(full)
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0x82/0xd0
 print_address_description.constprop.0+0x2c/0x400
 print_report+0xb4/0x270
 kasan_report+0xca/0x100
 tcp_prune_ofo_queue+0x55d/0x660
 tcp_try_rmem_schedule+0x855/0x12e0
 tcp_data_queue+0x4dd/0x2260
 tcp_rcv_established+0x5e8/0x2370
 tcp_v4_do_rcv+0x4ba/0x8c0
 __release_sock+0x27a/0x390
 release_sock+0x53/0x1d0
 tcp_sendmsg+0x37/0x50
 sock_write_iter+0x3c1/0x520
 vfs_write+0xc09/0x1210
 ksys_write+0x183/0x1d0
 do_syscall_64+0xc1/0x380
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fcf73ef2337
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffd4f924708 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcf73ef2337
RDX: 0000000000002000 RSI: 0000555f11d1a000 RDI: 0000000000000008
RBP: 0000555f11d1a000 R08: 0000000000002000 R09: 0000000000000000
R10: 0000000000000040 R11: 0000000000000246 R12: 0000000000000008
R13: 0000000000002000 R14: 0000555ee1a44570 R15: 0000000000002000
 </TASK>

Allocated by task 20348:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 __kasan_slab_alloc+0x59/0x70
 kmem_cache_alloc_node_noprof+0x110/0x340
 __alloc_skb+0x213/0x2e0
 tcp_collapse+0x43f/0xff0
 tcp_try_rmem_schedule+0x6b9/0x12e0
 tcp_data_queue+0x4dd/0x2260
 tcp_rcv_established+0x5e8/0x2370
 tcp_v4_do_rcv+0x4ba/0x8c0
 __release_sock+0x27a/0x390
 release_sock+0x53/0x1d0
 tcp_sendmsg+0x37/0x50
 sock_write_iter+0x3c1/0x520
 vfs_write+0xc09/0x1210
 ksys_write+0x183/0x1d0
 do_syscall_64+0xc1/0x380
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 20348:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0x38/0x50
 kmem_cache_free+0x149/0x330
 tcp_prune_ofo_queue+0x211/0x660
 tcp_try_rmem_schedule+0x855/0x12e0
 tcp_data_queue+0x4dd/0x2260
 tcp_rcv_established+0x5e8/0x2370
 tcp_v4_do_rcv+0x4ba/0x8c0
 __release_sock+0x27a/0x390
 release_sock+0x53/0x1d0
 tcp_sendmsg+0x37/0x50
 sock_write_iter+0x3c1/0x520
 vfs_write+0xc09/0x1210
 ksys_write+0x183/0x1d0
 do_syscall_64+0xc1/0x380
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff888013472900
 which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 216 bytes inside of
 freed 232-byte region [ffff888013472900, ffff8880134729e8)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13472
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x80000000000040(head|node=0|zone=1)
page_type: f5(slab)
raw: 0080000000000040 ffff88800198fb40 ffffea0000347b10 ffffea00004f5290
raw: 0000000000000000 0000000000120012 00000000f5000000 0000000000000000
head: 0080000000000040 ffff88800198fb40 ffffea0000347b10 ffffea00004f5290
head: 0000000000000000 0000000000120012 00000000f5000000 0000000000000000
head: 0080000000000001 ffffea00004d1c81 00000000ffffffff 00000000ffffffff
head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888013472880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888013472900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888013472980: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
                                                    ^
 ffff888013472a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888013472a80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb

Indeed tcp_prune_ofo_queue() is reusing the skb dropped a few lines
above. The caller wants to enqueue 'in_skb', lets check space vs the
latter.

Fixes: 1d2fbaad7c ("tcp: stronger sk_rcvbuf checks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: syzbot+865aca08c0533171bf6a@syzkaller.appspotmail.com
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/b78d2d9bdccca29021eed9a0e7097dd8dc00f485.1752567053.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 16:13:26 -07:00
Jakub Kicinski
511ad4c264 selftests: packetdrill: correct the expected timing in tcp_rcv_big_endseq
Commit f5fda1a868 ("selftests/net: packetdrill: add tcp_rcv_big_endseq.pkt")
added this test recently, but it's failing with:

  # tcp_rcv_big_endseq.pkt:41: error handling packet: timing error: expected outbound packet at 1.230105 sec but happened at 1.190101 sec; tolerance 0.005046 sec
  # script packet:  1.230105 . 1:1(0) ack 54001 win 0
  # actual packet:  1.190101 . 1:1(0) ack 54001 win 0

It's unclear why the test expects the ack to be delayed.
Correct it.

Link: https://patch.msgid.link/20250715142849.959444-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 15:05:56 -07:00
Gal Pressman
410b0ace88 ethtool: Don't check for RXFH fields conflict when no input_xfrm is requested
The requirement of ->get_rxfh_fields() in ethtool_set_rxfh() is there to
verify that we have no conflict of input_xfrm with the RSS fields
options, there is no point in doing it if input_xfrm is not
supported/requested.

This is under the assumption that a driver that supports input_xfrm will
also support ->get_rxfh_fields(), so add a WARN_ON() to
ethtool_check_ops() to verify it, and remove the op NULL check.

This fixes the following error in mlx4_en, which doesn't support
getting/setting RXFH fields.
$ ethtool --set-rxfh-indir eth2 hfunc xor
Cannot set RX flow hash configuration: Operation not supported

Fixes: 72792461c8 ("net: ethtool: don't mux RXFH via rxnfc callbacks")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250715140754.489677-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 15:03:56 -07:00
Hangbin Liu
3047957cc7 selftests: rtnetlink: fix addrlft test flakiness on power-saving systems
Jakub reported that the rtnetlink test for the preferred lifetime of an
address has become quite flaky. The issue started appearing around the 6.16
merge window in May, and the test fails with:

    FAIL: preferred_lft addresses remaining

The flakiness might be related to power-saving behavior, as address
expiration is handled by a "power-efficient" workqueue.

To address this, use slowwait to check more frequently whether the address
still exists. This reduces the likelihood of the system entering a low-power
state during the test, improving reliability.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250715043459.110523-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16 15:03:26 -07:00
Jakub Kicinski
c3886ccaad Merge branch 'net-hns3-use-seq_file-for-debugfs'
Jijie Shao says:

====================
net: hns3: use seq_file for debugfs

Arnd reported that there are two build warning for on-stasck
buffer oversize. As Arnd's suggestion, using seq file way
to avoid the stack buffer or kmalloc buffer allocating.

v2: https://lore.kernel.org/20250711061725.225585-1-shaojijie@huawei.com
v1: https://lore.kernel.org/20250708130029.1310872-1-shaojijie@huawei.com
====================

Link: https://patch.msgid.link/20250714061037.2616413-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:07 -07:00
Jian Shen
b0aabb3b1e net: hns3: use seq_file for files in tx_bd_info/ and rx_bd_info/ in debugfs
This patch use seq_file for the following nodes:
tx_bd_queue_*/rx_bd_queue_*

This patch is the last modification to debugfs file.
Unused functions and variables are removed together.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-11-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:05 -07:00
Yonglong Liu
9e1545b488 net: hns3: use seq_file for files in common/ of hclge layer
This patch use seq_file for the following nodes:
mng_tbl/loopback/interrupt_info/reset_info/imp_info/ncl_config/
mac_tnl_status/service_task_info/vlan_config/ptp_info

This patch is the last modification to debugfs file
of hclge layer. Unused functions and variables
are removed together.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-10-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:05 -07:00
Jijie Shao
3945d94c9f net: hns3: use seq_file for files in fd/ in debugfs
This patch use seq_file for the following nodes:
fd_tcam/fd_counter

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-9-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:05 -07:00
Jijie Shao
2363145ad8 net: hns3: use seq_file for files in reg/ in debugfs
This patch use seq_file for the following nodes:
bios_common/ssu/igu_egu/rpu/ncsi/rtc/ppp/rcb/tqp/mac/dcb

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-8-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:05 -07:00
Yonglong Liu
00f9ea261d net: hns3: use seq_file for files in mac_list/ in debugfs
This patch use seq_file for the following nodes:
uc/mc

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-7-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:05 -07:00
Jian Shen
08a6476e28 net: hns3: use seq_file for files in tm/ in debugfs
Use seq_file for files in debugfs. This is the first
modification for reading in hclge_debugfs.c.

This patch use seq_file for the following nodes:
tm_nodes/tm_priority/tm_qset/tm_map/tm_pg/tm_port/tc_sch_info/
qos_pause_cfg/qos_pri_map/qos_dscp_map/qos_buf_cfg

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-6-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:04 -07:00
Jijie Shao
2b65524d10 net: hns3: use seq_file for files in common/ of hns3 layer
This patch use seq_file for the following nodes:
dev_info/coalesce_info/page_pool_info

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:04 -07:00
Jian Shen
eced3d1c41 net: hns3: use seq_file for files in queue/ in debugfs
This patch use seq_file for the following nodes:
rx_queue_info/queue_map

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:04 -07:00
Jian Shen
c557c18326 net: hns3: clean up the build warning in debugfs by use seq file
Arnd reported that there are two build warning for on-stasck
buffer oversize. As Arnd's suggestion, using seq file way
to avoid the stack buffer or kmalloc buffer allocating.

Reported-by: Arnd Bergmann <arnd@kernel.org>
Closes: https://lore.kernel.org/all/20250610092113.2639248-1-arnd@kernel.org/
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:04 -07:00
Jijie Shao
277ed0cc9d net: hns3: remove tx spare info from debugfs.
The tx spare info in debugfs is not very useful,
and there are related statistics available for troubleshooting.

This patch removes the tx spare info from debugfs.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:50:04 -07:00
Yue Haibing
ce6030afe4 ipv6: mcast: Remove unnecessary null check in ip6_mc_find_dev()
These is no need to check null for idev before return NULL.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250714081732.3109764-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:34:01 -07:00
Al Viro
5cc7fce349 don't open-code kernel_accept() in rds_tcp_accept_one()
rds_tcp_accept_one() starts with a pretty much verbatim
copy of kernel_accept().  Might as well use the real thing...

	That code went into mainline in 2009, kernel_accept()
had been added in Aug 2006, the copyright on rds/tcp_listen.c
is "Copyright (c) 2006 Oracle", so it's entirely possible
that it predates the introduction of kernel_accept().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://patch.msgid.link/20250713180134.GC1880847@ZenIV
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:19:54 -07:00
Andy Gospodarek
c34632dbb2 bnxt: move bnxt_hsi.h to include/linux/bnxt/hsi.h
This moves bnxt_hsi.h contents to a common location so it can be
properly referenced by bnxt_en, bnxt_re, and bnge.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250714170202.39688-1-gospo@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15 16:03:24 -07:00
Paolo Abeni
55e8757c69 Merge branch 'net-mctp-improved-bind-handling'
Matt Johnston says:

====================
net: mctp: Improved bind handling

This series improves a couple of aspects of MCTP bind() handling.

MCTP wasn't checking whether the same MCTP type was bound by multiple
sockets. That would result in messages being received by an arbitrary
socket, which isn't useful behaviour. Instead it makes more sense to
have the duplicate binds fail, the same as other network protocols.
An exception is made for more-specific binds to particular MCTP
addresses.

It is also useful to be able to limit a bind to only receive incoming
request messages (MCTP TO bit set) from a specific peer+type, so that
individual processes can communicate with separate MCTP peers. One
example is a PLDM firmware update requester, which will initiate
communication with a device, and then the device will connect back to the
requester process.

These limited binds are implemented by a connect() call on the socket
prior to bind. connect() isn't used in the general case for MCTP, since
a plain send() wouldn't provide the required MCTP tag argument for
addressing.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
====================

Link: https://patch.msgid.link/20250710-mctp-bind-v4-0-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:41 +02:00
Matt Johnston
e6d8e7dbc5 net: mctp: Add bind lookup test
Test the preference order of bound socket matches with a series of test
packets.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-8-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Matt Johnston
b7e28129b6 net: mctp: Test conflicts of connect() with bind()
The addition of connect() adds new conflict cases to test.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-7-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Matt Johnston
3549eb08e5 net: mctp: Allow limiting binds to a peer address
Prior to calling bind() a program may call connect() on a socket to
restrict to a remote peer address.

Using connect() is the normal mechanism to specify a remote network
peer, so we use that here. In MCTP connect() is only used for bound
sockets - send() is not available for MCTP since a tag must be provided
for each message.

The smctp_type must match between connect() and bind() calls.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-6-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Matt Johnston
1aeed732f4 net: mctp: Use hashtable for binds
Ensure that a specific EID (remote or local) bind will match in
preference to a MCTP_ADDR_ANY bind.

This adds infrastructure for binding a socket to receive messages from a
specific remote peer address, a future commit will expose an API for
this.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-5-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Matt Johnston
4ec4b7fc04 net: mctp: Add test for conflicting bind()s
Test pairwise combinations of bind addresses and types.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-4-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Matt Johnston
5000268c29 net: mctp: Treat MCTP_NET_ANY specially in bind()
When a specific EID is passed as a bind address, it only makes sense to
interpret with an actual network ID, so resolve that to the default
network at bind time.

For bind address of MCTP_ADDR_ANY, we want to be able to capture traffic
to any network and address, so keep the current behaviour of matching
traffic from any network interface (don't interpret MCTP_NET_ANY as
the default network ID).

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-3-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Matt Johnston
3954502377 net: mctp: Prevent duplicate binds
Disallow bind() calls that have the same arguments as existing bound
sockets.  Previously multiple sockets could bind() to the same
type/local address, with an arbitrary socket receiving matched messages.

This is only a partial fix, a future commit will define precedence order
for MCTP_ADDR_ANY versus specific EID bind(), which are allowed to exist
together.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-2-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Matt Johnston
3558ab79a2 net: mctp: mctp_test_route_extaddr_input cleanup
The sock was not being released. Other than leaking, the stale socket
will conflict with subsequent bind() calls in unrelated MCTP tests.

Fixes: 46ee16462f ("net: mctp: test: Add extaddr routing output test")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-1-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 12:08:39 +02:00
Yue Haibing
a8594c956c ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()
Avoid duplicate non-null pointer check for pmc in mld_del_delrec().
No functional changes.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250714081949.3109947-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15 11:02:53 +02:00
Jakub Kicinski
06baf9bfa6 Merge branch 'tcp-receiver-changes'
Eric Dumazet says:

====================
tcp: receiver changes

Before accepting an incoming packet:

- Make sure to not accept a packet beyond advertized RWIN.
  If not, increment a new SNMP counter (LINUX_MIB_BEYOND_WINDOW)

- ooo packets should update rcv_mss and tp->scaling_ratio.

- Make sure to not accept packet beyond sk_rcvbuf limit.

This series includes three associated packetdrill tests.
====================

Link: https://patch.msgid.link/20250711114006.480026-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:43 -07:00
Eric Dumazet
906893cf2c selftests/net: packetdrill: add tcp_rcv_toobig.pkt
Check that TCP receiver behavior after "tcp: stronger sk_rcvbuf checks"

Too fat packet is dropped unless receive queue is empty.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:43 -07:00
Eric Dumazet
1d2fbaad7c tcp: stronger sk_rcvbuf checks
Currently, TCP stack accepts incoming packet if sizes of receive queues
are below sk->sk_rcvbuf limit.

This can cause memory overshoot if the packet is big, like an 1/2 MB
BIG TCP one.

Refine the check to take into account the incoming skb truesize.

Note that we still accept the packet if the receive queue is empty,
to not completely freeze TCP flows in pathological conditions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:43 -07:00
Eric Dumazet
75dff0584c tcp: add const to tcp_try_rmem_schedule() and sk_rmem_schedule() skb
These functions to not modify the skb, add a const qualifier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:43 -07:00
Eric Dumazet
445e0cc38d selftests/net: packetdrill: add tcp_ooo_rcv_mss.pkt
We make sure tcpi_rcv_mss and tp->scaling_ratio
are correctly updated if no in-order packet has been received yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:43 -07:00
Eric Dumazet
38d7e44433 tcp: call tcp_measure_rcv_mss() for ooo packets
tcp_measure_rcv_mss() is used to update icsk->icsk_ack.rcv_mss
(tcpi_rcv_mss in tcp_info) and tp->scaling_ratio.

Calling it from tcp_data_queue_ofo() makes sure these
fields are updated, and permits a better tuning
of sk->sk_rcvbuf, in the case a new flow receives many ooo
packets.

Fixes: dfa2f04833 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:43 -07:00
Eric Dumazet
f5fda1a868 selftests/net: packetdrill: add tcp_rcv_big_endseq.pkt
This test checks TCP behavior when receiving a packet beyond the window.

It checks the new TcpExtBeyondWindow SNMP counter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:43 -07:00
Eric Dumazet
6c758062c6 tcp: add LINUX_MIB_BEYOND_WINDOW
Add a new SNMP MIB : LINUX_MIB_BEYOND_WINDOW

Incremented when an incoming packet is received beyond the
receiver window.

nstat -az | grep TcpExtBeyondWindow

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:42 -07:00
Eric Dumazet
9ca48d616e tcp: do not accept packets beyond window
Currently, TCP accepts incoming packets which might go beyond the
offered RWIN.

Add to tcp_sequence() the validation of packet end sequence.

Add the corresponding check in the fast path.

We relax this new constraint if the receive queue is empty,
to not freeze flows from buggy peers.

Add a new drop reason : SKB_DROP_REASON_TCP_INVALID_END_SEQUENCE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:41:15 -07:00
Arnd Bergmann
a86eb2a60d net: wangxun: fix LIBWX dependencies again
Two more drivers got added that use LIBWX and cause a build warning

WARNING: unmet direct dependencies detected for LIBWX
  Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
  Selected by [y]:
  - NGBEVF [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PCI_MSI [=y]
  Selected by [m]:
  - NGBE [=m] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PCI [=y] && PTP_1588_CLOCK_OPTIONAL [=m]

ld: drivers/net/ethernet/wangxun/libwx/wx_lib.o: in function `wx_clean_tx_irq':
wx_lib.c:(.text+0x5a68): undefined reference to `ptp_schedule_worker'
ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_nway_reset':
wx_ethtool.c:(.text+0x880): undefined reference to `phylink_ethtool_nway_reset'

Add the same dependency on PTP_1588_CLOCK_OPTIONAL to the two driver
using this library module, following the pattern from commit
8fa19c2c69 ("net: wangxun: fix LIBWX dependencies").

Fixes: 377d180bd7 ("net: wangxun: add txgbevf build")
Fixes: a0008a3658 ("net: wangxun: add ngbevf build")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://patch.msgid.link/20250711082339.1372821-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:15:32 -07:00
Samiullah Khawaja
2677010e77 Add support to set NAPI threaded for individual NAPI
A net device has a threaded sysctl that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.

Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.

Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.

Tested
 ./tools/testing/selftests/net/nl_netdev.py
 TAP version 13
 1..7
 ok 1 nl_netdev.empty_check
 ok 2 nl_netdev.lo_check
 ok 3 nl_netdev.page_pool_check
 ok 4 nl_netdev.napi_list_check
 ok 5 nl_netdev.dev_set_threaded
 ok 6 nl_netdev.napi_set_threaded
 ok 7 nl_netdev.nsim_rxq_reset_down
 # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 18:02:37 -07:00
Sean Anderson
a44312d58e net: phy: Don't register LEDs for genphy
If a PHY has no driver, the genphy driver is probed/removed directly in
phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the
LEDs will be (un)registered when probing/removing the genphy driver.
This could occur if the leds are for a non-generic driver that isn't
loaded for whatever reason. Synchronously removing the PHY device in
phy_detach leads to the following deadlock:

rtnl_lock()
ndo_close()
    ...
    phy_detach()
        phy_remove()
            phy_leds_unregister()
                led_classdev_unregister()
                    led_trigger_set()
                        netdev_trigger_deactivate()
                            unregister_netdevice_notifier()
                                rtnl_lock()

There is a corresponding deadlock on the open/register side of things
(and that one is reported by lockdep), but it requires a race while this
one is deterministic. Regular drivers do not have this problem since
they are probed asynchronously (without RTNL held).

Generic PHYs do not support LEDs anyway, so don't bother registering
them.

[JakubL this is a net-next version of
 commit f0f2b992d8 ("net: phy: Don't register LEDs for genphy"),
 which uses APIs removed in -next.]

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250710201454.1280277-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 17:54:06 -07:00
Breno Leitao
ff2ac4df58 netdevsim: implement peer queue flow control
Add flow control mechanism between paired netdevsim devices to stop the
TX queue during high traffic scenarios. When a receive queue becomes
congested (approaching NSIM_RING_SIZE limit), the corresponding transmit
queue on the peer device is stopped using netif_subqueue_try_stop().

Once the receive queue has sufficient capacity again, the peer's
transmit queue is resumed with netif_tx_wake_queue().

Key changes:
  * Add nsim_stop_peer_tx_queue() to pause peer TX when RX queue is full
  * Add nsim_start_peer_tx_queue() to resume peer TX when RX queue drains
  * Implement queue mapping validation to ensure TX/RX queue count match
  * Wake all queues during device unlinking to prevent stuck queues
  * Use RCU protection when accessing peer device references
  * wake the queues when changing the queue numbers
  * Remove IFF_NO_QUEUE given it will enqueue packets now

The flow control only activates when devices have matching TX/RX queue
counts to ensure proper queue mapping.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250711-netdev_flow_control-v3-1-aa1d5a155762@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 17:36:50 -07:00
Oscar Maes
5777d1871b selftests: net: add test for variable PMTU in broadcast routes
Added a test for variable PMTU in broadcast routes.

This test uses iputils' ping and attempts to send a ping between
two peers, which should result in a regular echo reply.

This test will fail when the receiving peer does not receive the echo
request due to a lack of packet fragmentation.

Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250710142714.12986-2-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14 17:29:41 -07:00