Commit Graph

1428061 Commits

Author SHA1 Message Date
Eric Dumazet
6f459eda8b tcp: add tcp_release_cb_cond() helper
Majority of tcp_release_cb() calls do nothing at all.

Provide tcp_release_cb_cond() helper so that release_sock()
can avoid these calls.

Also hint the compiler that __release_sock() and wake_up()
are rarely called.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-77 (-77)
Function                                     old     new   delta
release_sock                                 258     181     -77
Total: Before=25235790, After=25235713, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260310124451.2280968-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 13:22:03 +01:00
Wojciech Slenska
4320f1f111 dt-bindings: net: qcom,ipa: document qcm2290 compatible
Document that the IPA on qcm2290 uses version 4.2, the same
as sc7180.

Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Wojciech Slenska <wojciech.slenska@gmail.com>
Link: https://patch.msgid.link/20260310112309.79261-2-wojciech.slenska@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:28:11 +01:00
Paolo Abeni
52e4d5da6d Merge branch 'net-hinic3-pf-initialization'
Fan Gong says:

====================
net: hinic3: PF initialization

This is part [2/3] of the second submission of the hinic3 Ethernet
driver. With this series, hinic3 becomes a complete Ethernet driver
with PF and VF support.

Add cmdq detailed-response interfaces.
Add dump interfaces for cmdq, aeq, ceq and mailbox.
Add msg_send_lock for message sending concurrency.
Add PF device support and chip_present_flag to check cards.
Add rx vlan offload support.
Add PF FLR wait and timeout handling.
Add 5 ethtool ops for information of driver and link.

v1: https://lore.kernel.org/netdev/cover.1771916043.git.zhuyikai1@h-partners.com/
v2: https://lore.kernel.org/netdev/cover.1772697509.git.zhuyikai1@h-partners.com/
====================

Link: https://patch.msgid.link/cover.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:50 +01:00
Fan Gong
00608d02dd hinic3: Add ethtool basic ops
Implement the following ethtool callback functions:
.get_link_ksettings
.get_drvinfo
.get_msglevel
.set_msglevel
.get_link

These callbacks allow users to utilize ethtool for detailed
network configuration and monitoring.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/b56d490c2a06cae9541a0297d76b11d869f37161.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
330adcedd0 hinic3: Add PF/VF capability parsing and parameter validation
Add the ability to parse PF and VF capabilities and validate
related parameters (SQ & RQ).

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/ac4733f2c0409bb778b4624ed1632dcb2ded6632.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
33cf53672b hinic3: Add PF FLR wait and timeout handling
Add a mechanism for PF to wait for the completion of FLR, ensuring
hardware state consistency after an FLR event.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/7a1b21426fd4274831733aca962eb209b806f4bd.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
3d36efc280 hinic3: Add PF device support and function type validation
Add the PF device ID to support PF devices in the driver and enhance
function type validation to ensure proper handling of both PF and
VF.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/895cf7ac341c475e383aa8726039dc8ea3b96ffb.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
2a76f900d1 hinic3: Add msg_send_lock for message sending concurrency
send_mbox_msg is invoked by three functions: hinic3_send_mbox_to_mgmt,
hinic3_response_mbox_to_mgmt and hinic3_send_mbox_to_mgmt_no_ack. Only
hinic3_response_mbox_to_mgmt does not take a mutex; the other two hold
mbox->mbox_send_lock because their send actions are mutually exclusive.
hinic3_response_mbox_to_mgmt does not conflict with them in send
actions but does share mailbox resources with them, so add a new mutex
(msg_send_lock) in send_mbox_msg to serialize message sending.

Besides, in mbox_send_seg, change FIELD_PREP to FIELD_GET in
MBOX_STATUS_FINISHED and MBOX_STATUS_SUCCESS to be more reasonable.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/d83f7f6eb4b5e94642a558fab75d61292c347e48.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
0f746fc5bc hinic3: Add RX VLAN offload support
Add VLAN offload processing to the RX path.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/22cf02a014c2beb7b5f92ab5e6de38c4dd928125.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
d69ee992fb hinic3: Add chip_present_flag checks to prevent errors when card is absent
Add chip_present_flag so the driver can avoid errors when the card is
not present. The flag is checked in multiple critical functions, including
command queue, mailbox and network device operations, ensuring that the
existence of the network card is verified before performing operations.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/3954f22df125f5e843aaa62953d7506eb66922ac.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
678c5b3b6b hinic3: Add Command Queue/Async Event Queue/Complete Event Queue/Mailbox dump interfaces
Add dump interfaces for CMDQ, AEQ, CEQ and mailbox to enhance debugging
capabilities.
  Dump the WQE header for CMDQ.
  Dump the detailed queue information for AEQ and CEQ.
  Dump the related register status for mailbox.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/1644c5021e2059594e878812339ea025ed677f71.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Fan Gong
15b5be9389 hinic3: Add command queue detailed-response interfaces
Add new detailed response interfaces for the hinic3 command
queue (CMDQ), enhancing its functionality to handle commands
requiring input and output buffer pairs.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/cc3cff8458aeb27b07749dc9dcee43c11c45a4c1.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 12:13:48 +01:00
Alexander Graf
0de607dc4f vsock: add G2H fallback for CIDs not owned by H2G transport
When no H2G transport is loaded, vsock currently routes all CIDs to the
G2H transport (commit 65b422d9b6 ("vsock: forward all packets to the
host when no H2G is registered")). Extend that existing behavior: when
an H2G transport is loaded but does not claim a given CID, the
connection falls back to G2H in the same way.

This matters in environments like Nitro Enclaves, where an instance may
run nested VMs via vhost-vsock (H2G) while also needing to reach sibling
enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old
code, any CID > 2 was unconditionally routed to H2G when vhost was
loaded, making those enclaves unreachable without setting
VMADDR_FLAG_TO_HOST explicitly on every connect.

Requiring every application to set VMADDR_FLAG_TO_HOST creates friction:
tools like socat, iperf, and others would all need to learn about it.
The flag was introduced 6 years ago and I am still not aware of any tool
that supports it. Even if there was support, it would be cumbersome to
use. The most natural experience is a single CID address space where H2G
only wins for CIDs it actually owns, and everything else falls through to
G2H, extending the behavior that already exists when H2G is absent.

To give user space at least a hint that the kernel applied this logic,
automatically set the VMADDR_FLAG_TO_HOST on the remote address so it
can determine the path taken via getpeername().

Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1).
At 0 it forces strict routing: H2G always wins for CID > VMADDR_CID_HOST,
or ENODEV if H2G is not loaded.
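
The routing rule described above can be modeled as a small decision function. This is only a sketch of the behavior as stated in this message, not the kernel implementation: the function name, the parameters and the Transport enum are hypothetical, and it assumes a G2H transport is registered and only covers remote CIDs above VMADDR_CID_HOST.

```python
import errno
import os
from enum import Enum

VMADDR_CID_HOST = 2  # well-known CID of the host


class Transport(Enum):
    H2G = "host-to-guest"
    G2H = "guest-to-host"


def pick_transport(cid, h2g_loaded, h2g_owns_cid, g2h_fallback=True):
    """Toy model of transport selection for a remote CID (hypothetical names)."""
    if cid <= VMADDR_CID_HOST:
        raise ValueError("this sketch only models CIDs above VMADDR_CID_HOST")
    if not g2h_fallback:
        # Strict routing: H2G always wins, or ENODEV if it is not loaded.
        if not h2g_loaded:
            raise OSError(errno.ENODEV, os.strerror(errno.ENODEV))
        return Transport.H2G
    if h2g_loaded and h2g_owns_cid:
        # H2G only wins for CIDs it actually owns.
        return Transport.H2G
    # No H2G transport, or H2G does not claim this CID: fall back to G2H.
    return Transport.G2H


print(pick_transport(16, h2g_loaded=True, h2g_owns_cid=False))
```

In this model, a sibling enclave at CID 16 is reached over G2H even while vhost-vsock is loaded, as long as the H2G side does not own that CID.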

Signed-off-by: Alexander Graf <graf@amazon.com>
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-12 10:59:36 +01:00
Christophe Leroy (CS GROUP)
17edc4e820 net: Convert move_addr_to_user() to scoped user access
move_addr_to_user() is a critical function that was converted to
masked user access by commit 1fb0e47161 ("net: remove one stac/clac
pair from move_addr_to_user()").

Convert it to scoped user access to simplify the code.

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/36d7f2e7f504d620c1b88526b25ebc89e3cb61d9.1773142315.git.chleroy@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 20:38:00 -07:00
Wesley Atwell
dc9902bbd4 tcp: use WRITE_ONCE() for tsoffset in tcp_v6_connect()
Commit dd23c9f1e8 ("tcp: annotate data-races around tp->tsoffset")
updated do_tcp_getsockopt() to read tp->tsoffset with READ_ONCE()
for TCP_TIMESTAMP because another CPU may change it concurrently.

tcp_v6_connect() still stores tp->tsoffset with a plain write. That
store runs under lock_sock() via inet_stream_connect(), but the socket
lock does not serialize a concurrent getsockopt(TCP_TIMESTAMP) from
another task sharing the socket.

Use WRITE_ONCE() for the tcp_v6_connect() store so the connect-time
writer matches the lockless TCP_TIMESTAMP reader. This also makes the
IPv6 path consistent with tcp_v4_connect().

Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260310012604.145661-1-atwellwea@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 20:20:03 -07:00
Jakub Kicinski
87aa0f539d Merge branch 'selftests-net-fix-cmd-process-timeout-handling'
Gal Pressman says:

====================
selftests: net: fix cmd.process() timeout handling

Pass the timeout argument correctly in cmd.process().
As Jakub noted, fixing the timeout broke the bpftrace() command
in netpoll_basic.py, so fix it first.
====================

Link: https://patch.msgid.link/20260310115803.2521050-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 19:11:44 -07:00
Gal Pressman
f0bd193166 selftests: net: fix timeout passed as positional argument to communicate()
The cited commit refactored the hardcoded timeout=5 into a parameter,
but dropped the keyword from the communicate() call.
Since Popen.communicate()'s first positional argument is 'input' (not
'timeout'), the timeout value is silently treated as stdin input and the
call never enforces a timeout.

Pass timeout as a keyword argument to restore the intended behavior.
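
The pitfall can be reproduced outside the selftests with the standard subprocess module. The snippet below is a minimal illustration rather than the selftest code itself; the sleeping child process and the 0.5 second timeout are arbitrary choices made for this example.

```python
import subprocess
import sys

# A child that sleeps far longer than the timeout we want to enforce.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

timed_out = False
try:
    # Correct: timeout is passed by keyword. Writing proc.communicate(0.5)
    # would bind 0.5 to 'input' and leave the call without any timeout.
    proc.communicate(timeout=0.5)
except subprocess.TimeoutExpired:
    timed_out = True
finally:
    proc.kill()
    proc.communicate()  # reap the child after killing it

print("timed out:", timed_out)
```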

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260310115803.2521050-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 19:11:40 -07:00
Gal Pressman
82562972b8 selftests: net: pass bpftrace timeout to cmd()
The bpftrace() helper configures an interval-based exit timer but does
not propagate the timeout to the cmd object, which defaults to 5
seconds. Since the default BPFTRACE_TIMEOUT is 10 seconds, cmd.process()
always raises a TimeoutExpired exception before bpftrace has a chance to
exit gracefully.

Pass timeout+5 to cmd() to allow bpftrace to complete gracefully.

Note: this issue is masked by a bug in the way cmd() passes timeout,
which is fixed in the next commit.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260310115803.2521050-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 19:11:36 -07:00
Jakub Kicinski
bd4d6b955d Merge branch 'net-macb-clean-up-several-member-settings-of-macb_config-instances'
Kevin Hao says:

====================
net: macb: Clean up several member settings of macb_config instances

While debugging an issue in the macb driver, I noticed that many macb_config
instances have very similar member settings. This makes it difficult to
identify the actual differences between these instances. This patch series
aims to clean up some of these settings and clarify the specific configurations
of each macb_config instance. No functional changes are introduced.
====================

Link: https://patch.msgid.link/20260310-macb-cleanup-v1-0-928c1a91a7dc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 19:00:38 -07:00
Kevin Hao
0ae998c4ef net: macb: Clean up the .usrio settings in macb_config instances
All instances of macb_config currently have the .usrio set, but most of
them use &macb_default_usrio. In fact, there is no need to duplicate
this across all macb_config instances. Remove the .usrio setting from
instances that use &macb_default_usrio, and ensure that the default is
selected at runtime when no other value is explicitly set.
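
The pattern of leaving a member unset and resolving the default at runtime can be sketched as follows. This is a language-neutral illustration written in Python, not the driver code: Config, Usrio, DEFAULT_USRIO and effective_usrio() are hypothetical names standing in for macb_config, its .usrio member and &macb_default_usrio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usrio:
    name: str


# Stand-in for the shared default that most instances used to repeat.
DEFAULT_USRIO = Usrio("default")


@dataclass(frozen=True)
class Config:
    caps: int = 0
    usrio: Optional[Usrio] = None  # None means "use the default"


def effective_usrio(config: Config) -> Usrio:
    """Return the explicit setting if present, otherwise the default."""
    return config.usrio if config.usrio is not None else DEFAULT_USRIO


plain = Config(caps=1)                      # no usrio given
special = Config(caps=2, usrio=Usrio("x"))  # only real differences are spelled out
print(effective_usrio(plain).name, effective_usrio(special).name)
```

With this shape, an instance definition lists only the members that differ from the default, which is the readability gain the series is after.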

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://patch.msgid.link/20260310-macb-cleanup-v1-3-928c1a91a7dc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 19:00:33 -07:00
Kevin Hao
9179711ee2 net: macb: Clean up the .init settings in macb_config instances
All instances of macb_config currently have the .init field set, but most
of them use macb_init(). In fact, there is no need to duplicate this
across all macb_config instances. Introduce a new macb_init() function
that executes the specific .init if it is set; otherwise, it runs a
default initialization function.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://patch.msgid.link/20260310-macb-cleanup-v1-2-928c1a91a7dc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 19:00:33 -07:00
Kevin Hao
f97977944d net: macb: Clean up the .clk_init setting in the macb_config instances
All instances of macb_config currently have .clk_init set, but most of
them use macb_clk_init(). In fact, there is no need to duplicate this
across all macb_config instances. Introduce a new macb_clk_init()
function that executes the specific .clk_init if it is set; otherwise,
it runs the default clock initialization function.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://patch.msgid.link/20260310-macb-cleanup-v1-1-928c1a91a7dc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 19:00:33 -07:00
Daniel Golle
7e27d6202e selftests: net: local_termination: test link-local protocols
Add tests to local_termination.sh to verify that link-local frames
arrive. On some switches the DSA driver uses bridges to connect the
user ports to their CPU ports. More "intelligent" switches typically
don't forward link-local frames, but may trap them to an internal
microcontroller. The driver may have to change trapping rules, so
link-local frames end up on the DSA CPU ports instead of being
silently dropped or trapped to the internal microcontroller of the
switch.

Add two tests which help to validate this has been done correctly:
 - Link-local STP BPDU should arrive at the Linux netdev when the
   bridge has STP disabled (BR_NO_STP), in which case the bridge
   forwards them rather than consuming them in the control plane
 - Link-local LLDP should arrive at standalone ports (and the test
   should be skipped on bridged ports similar to how it is done
   for the IEEE1588v2/PTP tests)

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/1a67081b2ede1e6d2d32f7dd54ae9688f3566152.1773166131.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 18:58:05 -07:00
Eric Dumazet
410593fec7 tcp: add sysctl_tcp_shrink_window to netns_ipv4_sysctl.rst
Add missing entry for sysctl_tcp_shrink_window.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260310073855.564927-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 18:21:08 -07:00
Soichiro Ueda
34c0378b15 selftests: af_unix: validate SO_PEEK_OFF advancement and reset
Extend the so_peek_off selftest to ensure the socket peek offset is handled
correctly after both MSG_PEEK and actual data consumption.

Verify that the peek offset advances by the same amount as the number of
bytes read when performing a read with MSG_PEEK.

After exercising SO_PEEK_OFF via MSG_PEEK, drain the receive queue with a
non-peek recv() and verify that it can receive all the content in the
buffer and SO_PEEK_OFF returns back to 0.

The verification after actual data consumption was suggested by Miao Wang
when the original so_peek_off selftest was introduced.

Link: https://lore.kernel.org/all/7B657CC7-B5CA-46D2-8A4B-8AB5FB83C6DA@gmail.com/
Suggested-by: Miao Wang <shankerwangmiao@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Soichiro Ueda <the.latticeheart@gmail.com>
Link: https://patch.msgid.link/20260310072832.127848-1-the.latticeheart@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 18:20:17 -07:00
Jakub Kicinski
482aac8b56 Merge branch 'net-stmmac-start-to-shrink-memory-usage'
Russell King says:

====================
net: stmmac: start to shrink memory usage

Start shrinking stmmac's memory usage by avoiding using "int" for
members that are only used for 0/1 (boolean) values, or values that
can't be larger than 255.

In addition, struct stmmac_dma_cfg is approximately a cache line in
size, shrinks below a cache line as a result of this patch set, and is
always required, so there is no point in allocating it separately from
struct plat_stmmacenet_data. Embed it at the end of this struct
and set the existing pointer to it to avoid widespread changes.

Lastly, add documentation for struct stmmac_dma_cfg, and document
the stmmac clocks as best we can given the driver history.
====================

Link: https://patch.msgid.link/aa6VEsmBK-S9eNYU@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:43 -07:00
Russell King (Oracle)
315bab9411 net: stmmac: add documentation for clocks
Add documentation covering stmmac_clk, pclk, clk_ptp_ref and clk_tx_i
in the hope that this will help readers understand what each of these
clocks is for.

There is confusion around stmmac_clk and pclk which can't be easily
resolved today as the Imagination Technologies Pistachio board that
pclk was introduced for has no public documentation and is likely now
obsolete. So the origins of pclk are lost to the winds of time.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX5Z-0000000CVsb-1XTm@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:07 -07:00
Russell King (Oracle)
9fe167ab79 net: stmmac: add documentation for stmmac_dma_cfg members
Add documentation of each of the struct stmmac_dma_cfg members. dche
remains undocumented as I don't have documentation that covers this.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX5U-0000000CVsQ-162V@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:07 -07:00
Russell King (Oracle)
758ed85aad net: stmmac: use u8 for host_dma_width and similar struct members
We aren't going to see >= 256-bit address busses soon, so reduce
host_dma_width and associated other struct members that initialise
this from u32 to u8.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> # qcom-ethqos
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX5P-0000000CVsK-0iwX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:07 -07:00
Russell King (Oracle)
94808793fe net: stmmac: use u8 for ?x_queues_to_use and number_?x_queues
The maximum number of queues is a compile time constant of only eight.
This makes using a 32-bit quantity wasteful. Instead, use u8 for
these and their associated variables.

When reading the DT properties, saturate at U8_MAX. Provided the core
provides DMA capabilities to describe the number of queues, this will
be capped by stmmac_hw_init() with a warning.
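
The two-stage clamping described above can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions, not the driver code: saturate_u8() and cap_to_hw() are hypothetical helpers, and the hardware queue count of 8 used below is only an example value.

```python
U8_MAX = 255  # largest value representable in an unsigned 8-bit field


def saturate_u8(value: int) -> int:
    """Clamp a 32-bit property value so it fits in a u8 without wrapping."""
    return min(value, U8_MAX)


def cap_to_hw(requested: int, hw_queues: int) -> int:
    """Second stage: limit the request to what the hardware reports.

    The message says the driver warns at this point; the sketch only
    returns the capped value.
    """
    return min(requested, hw_queues)


# A bogus property value of 1000 would wrap to 232 if simply truncated
# to 8 bits; saturating keeps it at 255 so the later cap still applies.
stored = saturate_u8(1000)
print(stored, 1000 & 0xFF, cap_to_hw(stored, hw_queues=8))
```

Saturating instead of truncating matters because a truncated value could land inside the valid range and silently pass the later check.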

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX5K-0000000CVsE-0J0Y@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:07 -07:00
Russell King (Oracle)
3357642e65 net: stmmac: reorder structs to reduce memory consumption
Reorder some of the stmmac structures to allow them to pack better,
thereby using less memory. On aarch64, sizeof(struct stmmac_priv)
was 880, and with this change becomes 816, saving 64 bytes, which
is roughly a 7% saving.
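
The effect of member ordering on structure size can be observed with Python's ctypes module, which lays out fields using the platform's native C alignment rules. The two structures below are made-up examples chosen for this illustration and are unrelated to the actual stmmac structs.

```python
import ctypes


class Unordered(ctypes.Structure):
    # Small members interleaved with large ones force padding holes.
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("counter", ctypes.c_uint64),
        ("flag_b", ctypes.c_uint8),
        ("length", ctypes.c_uint32),
    ]


class Reordered(ctypes.Structure):
    # Same members, sorted from largest to smallest alignment.
    _fields_ = [
        ("counter", ctypes.c_uint64),
        ("length", ctypes.c_uint32),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


before = ctypes.sizeof(Unordered)
after = ctypes.sizeof(Reordered)
print(f"before={before} after={after} saved={before - after}")
```

The exact sizes depend on the ABI, so the script prints them rather than assuming particular numbers.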

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX5E-0000000CVs8-40w4@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:06 -07:00
Russell King (Oracle)
c3d08424e0 net: stmmac: convert plat_stmmacenet_data booleans to type bool
Convert members of struct plat_stmmacenet_data that are booleans to
type 'bool' and ensure their initialisers are true/false. Move the
has_xxx for the GMAC cores together, and move the COE members to the
end of the list of bool to avoid unused holes in the struct.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX59-0000000CVs2-3MHc@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:06 -07:00
Russell King (Oracle)
7a6387dec8 net: stmmac: provide plat_dat->dma_cfg in stmmac_plat_dat_alloc()
plat_dat->dma_cfg is unconditionally required for the operation of the
driver, so it would make sense to allocate it along with the plat_dat.

On Arm64, sizeof(*plat_dat) has recently shrunk from 880 to 816 bytes
and sizeof(*plat_dat->dma_cfg) has shrunk from 32 to 20 bytes.

Given that dma_cfg is required, and it is now less than a cache line,
it doesn't make sense to allocate this separately, so place it at the
end of struct plat_stmmacenet_data, and set plat_dat->dma_cfg to point
at that to avoid mass changes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX54-0000000CVrw-2jfu@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:54:06 -07:00
Johan Hovold
1a6ca6497a net: mdio: mvusb: drop redundant device reference
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.

Drop the redundant device reference to reduce cargo culting, make it
easier to spot drivers where an extra reference is needed, and reduce
the risk of memory leaks when drivers fail to release it.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260309082641.15574-1-johan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:53:19 -07:00
Jakub Kicinski
07f56c8f54 Merge branch 'amd-xgbe-improve-power-management-for-s0i3'
Raju Rangoju says:

====================
amd-xgbe: Improve power management for S0i3

Improve the amd-xgbe power management handling to allow AMD platforms to
reach the deepest suspend state (S0i3) when modern standby is used.

The first patch cleans up the xgbe_powerdown() and xgbe_powerup()
helpers by removing an unused caller distinction and aligning the
ordering of operations with xgbe_stop().

The second patch adds proper PCI power management operations, following
the standard PCI PM model, so that the device can be cleanly put into
D3 and resumed back to D0. Without this, the amd_pmc driver reports:

  "Last suspend didn't reach deepest state"

when the amd-xgbe driver is enabled.

These changes have been tested on AMD platforms using S0i3 modern
standby.
====================

Link: https://patch.msgid.link/20260308092851.1510214-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:51:27 -07:00
Raju Rangoju
7644e76956 amd-xgbe: add PCI power management for S0i3 support
The current suspend/resume implementation does not correctly handle PCI
device power state transitions, which prevents AMD platforms from
reaching the deepest suspend state (S0i3) when the amd-xgbe driver is
enabled.

In particular, the amd_pmc driver reports:

  "Last suspend didn't reach deepest state"

when this device is present.

Implement proper PCI power management operations following the standard
PCI PM model so that the device can be cleanly powered down and resumed.

Suspend path:
- Power down the network interface
- Put the PHY into low-power mode
- Disable bus mastering to prevent DMA activity
- Save PCI configuration space
- Disable the PCI device
- Disable wake from D3 (S0i3 does not require Wake-on-LAN)
- Set the device to D3hot

Resume path:
- Restore the PCI power state to D0
- Restore PCI configuration space
- Enable the PCI device
- Re-enable bus mastering
- Re-enable device interrupts
- Clear the PHY low-power mode
- Power up the network interface

This allows systems using amd-xgbe to reach the deepest suspend state
when entering modern standby (S0i3).

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260308092851.1510214-3-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:51:23 -07:00
Raju Rangoju
fe81629217 amd-xgbe: Simplify powerdown/powerup paths
The caller parameter in xgbe_powerdown() and xgbe_powerup() was intended
to differentiate between driver and ioctl contexts, but the only
remaining usage is from the driver suspend/resume path.

Simplify this by:
- Removing the unused XGMAC_DRIVER_CONTEXT and XGMAC_IOCTL_CONTEXT
  macros
- Dropping the now-unused caller parameter
- Reordering operations in xgbe_powerdown() to disable NAPI before
  stopping TX/RX, matching the order used in xgbe_stop()

This makes the powerdown/powerup paths easier to follow and keeps the
ordering consistent with the rest of the driver.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260308092851.1510214-2-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:51:22 -07:00
Jiayuan Chen
34bd3c6b0b net: sched: cls_u32: Avoid memcpy() false-positive warning in u32_init_knode()
Syzbot reported a warning in u32_init_knode() [1].

Similar to commit 7cba18332e ("net: sched: cls_u32: Avoid memcpy()
false-positive warning") which addressed the same issue in u32_change(),
use unsafe_memcpy() in u32_init_knode() to work around the compiler's
inability to see into composite flexible array structs.

This silences the false-positive reported by syzbot:

  memcpy: detected field-spanning write (size 32) of single field
  "&new->sel" at net/sched/cls_u32.c:855 (size 16)

Since the memory is correctly allocated with kzalloc_flex() using
s->nkeys, this is purely a false positive and does not need a Fixes tag.

[1] https://syzkaller.appspot.com/bug?extid=d5ace703ed883df56e42

Reported-by: syzbot+d5ace703ed883df56e42@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69a811b9.a70a0220.b118c.0019.GAE@google.com/T/
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260309123917.402183-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:39:35 -07:00
Yoshihiro Shimoda
9278b88892 net: ethernet: ravb: Disable interrupts when closing device
Disable E-MAC interrupts when closing the device.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
[Niklas: Rebase from BSP and reword commit message]
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260307095532.2118495-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:37:21 -07:00
Eric Joyner
1f9cab56e7 ionic: Report additional media types from firmware
The device firmware supports reporting more media types than what was
there in the past, so map these new media types to existing ethtool
bits, which appears to be what other drivers do for media types that
match speeds but not physical spec.

And while here, make a very small cleanup in ionic_get_link_ksettings()
to remove some unnecessary code duplication.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Eric Joyner <eric.joyner@amd.com>
Link: https://patch.msgid.link/20260306215634.64550-1-eric.joyner@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:34:14 -07:00
Jakub Kicinski
7bb1970494 Merge branch 'tools-ynl-policy-query-support'
Jakub Kicinski says:

====================
tools: ynl: policy query support

Improve the Netlink policy support in YNL. This series grew out of
improvements to policy checking: when writing selftests I realized
that instead of doing all the policy parsing in the test we're
better off making it part of YNL itself.

Patch 1 adds pad handling; apparently we never hit pad with commonly
used families, while nlctrl policy dumps use pad more frequently.
Patch 2 is a trivial refactor.
Patch 3 pays off some technical debt in terms of documentation.
The YnlFamily class is growing in size and it's quite hard to
find its members. So document it a little bit.
Patch 4 is the main dish, the implementation of get_policy(op)
in YnlFamily.
Patch 5 plugs the new functionality into the CLI.
====================

Link: https://patch.msgid.link/20260310005337.3594225-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:33:07 -07:00
Jakub Kicinski
d6df5e9b2a tools: ynl: cli: add --policy support
Add --policy flag which can be combined with --do or --dump to query
the kernel's netlink policy for an operation instead of executing it.

Examples:

  $ ynl --family netdev --do dev-get --policy
  {'ifindex': {'max-value': 4294967295, 'min-value': 1, 'type': 'u32'}}

  $ ynl --family ethtool --do channels-get --policy --output-json
  {"header": {"type": "nested", "policy": {"dev-index": ...}}}

  $ ynl --family netdev --dump dev-get --policy
  {}

Link: https://patch.msgid.link/20260310005337.3594225-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:33:00 -07:00
Jakub Kicinski
77a6401a87 tools: ynl: add Python API for easier access to policies
The format of Netlink policy dump is a bit curious with messages
in the same dump carrying both attrs and mapping info. Plus each
message carries a single piece of the puzzle the caller must then
reassemble.

I need to do this reassembly for a test, but I think it's generally
useful. So let's add proper support to YnlFamily to return a more
user-friendly representation. See the various docs in the patch
for more details.
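
The reassembly described above can be sketched roughly as follows. This
is not the code added to YnlFamily. The message shape (dicts with
"op-policy" and "policy" keys, and a "policy-idx" property on nested
attributes) is a simplified assumption made for illustration, and the
function names are hypothetical.

```python
def reassemble_policy(messages):
    """Merge per-message fragments of a policy dump (assumed shape).

    Each message is assumed to carry either a piece of the op -> policy
    index mapping, or a single attribute of a single policy.
    """
    op_map = {}    # op id -> {"do": policy index, "dump": policy index}
    policies = {}  # policy index -> {attr id -> properties}
    for msg in messages:
        for op_id, modes in msg.get("op-policy", {}).items():
            op_map.setdefault(op_id, {}).update(modes)
        for pol_idx, attrs in msg.get("policy", {}).items():
            policies.setdefault(pol_idx, {}).update(attrs)
    return op_map, policies


def resolve(policies, pol_idx, _seen=()):
    """Expand nested policy indexes into nested dicts.

    A policy that refers back to one already being expanded is left as
    an index so that recursive policies cannot loop forever.
    """
    out = {}
    for attr_id, props in policies.get(pol_idx, {}).items():
        props = dict(props)  # do not mutate the caller's data
        nested = props.get("policy-idx")
        if nested is not None and nested not in _seen and nested != pol_idx:
            props.pop("policy-idx")
            props["policy"] = resolve(policies, nested, _seen + (pol_idx,))
        out[attr_id] = props
    return out
```

A caller would look up the policy index for an operation in op_map and
pass it to resolve() to get one nested dictionary per operation.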

Link: https://patch.msgid.link/20260310005337.3594225-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:32:46 -07:00
Jakub Kicinski
8bbcfce5db tools: ynl: add short doc to class YnlFamily
The class is quite long. It's getting hard to find the user-facing
methods. Add a short doc at the class level explaining the main API.

Link: https://patch.msgid.link/20260310005337.3594225-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:30:03 -07:00
Jakub Kicinski
c26fda6212 tools: ynl: move policy decoding out of NlMsg
We'll soon need to decode policies from a dump, so move
_decode_policy() out of class NlMsg.

Link: https://patch.msgid.link/20260310005337.3594225-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:30:03 -07:00
Jakub Kicinski
7b1309c339 tools: ynl: handle pad type during decode
Apparently Python code only handled the 'pad' type in structs
until now. Add it to attr decoding. nlctrl policy dumps need it.

Link: https://patch.msgid.link/20260310005337.3594225-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:30:03 -07:00
Rosen Penev
73a8643525 net: mvneta: support EPROBE_DEFER when reading MAC address
If nvmem loads after the ethernet driver, MAC address assignments will
not take effect. of_get_ethdev_address() returns EPROBE_DEFER in such a
case, so we need to handle that to avoid eth_hw_addr_random().

Add extra goto section to just free stats as they are allocated right
above.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260307031709.640141-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:02:25 -07:00
Dimitri Daskalakis
690043b95c selftests: drv-net: rss: Add retries to test_rss_key_indir to reduce flakes
The test generates 16 flows and verifies that traffic is distributed
across two queues via the NIC's RSS indirection table. The likelihood of
the flows skewing to a single queue is high, so we retry sending traffic
up to 3 times.
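
A minimal sketch of the retry pattern described above, not the selftest
itself. The helper names, the simulated traffic, and the 25% minimum
share per queue are assumptions made for illustration. The actual pass
criterion of test_rss_key_indir is not stated here.

```python
import random


def retry(check, attempts=3):
    """Call check() up to `attempts` times.

    Returns the 1-based attempt number that passed, or None if every
    attempt failed.
    """
    for n in range(1, attempts + 1):
        if check():
            return n
    return None


def traffic_hits_both_queues(rng, flows=16, min_share=0.25):
    """Simulate hashing `flows` flows onto two queues (hypothetical check).

    Passes when the less-loaded queue still received at least
    `min_share` of the flows.
    """
    hits = [0, 0]
    for _ in range(flows):
        hits[rng.randrange(2)] += 1
    return min(hits) >= flows * min_share
```

For example, retry(lambda: traffic_hits_both_queues(random.Random()))
re-sends the simulated traffic until the distribution is acceptable or
three attempts have been used.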

Alternatively, we could increase the number of generated flows. But
debug kernels may struggle to ramp this many flows.

During manual testing, the test passed for 10,000 consecutive runs.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260309204215.2110486-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 19:01:51 -07:00
Fernando Fernandez Mancera
7da62262ec inet: add ip_local_port_step_width sysctl to improve port usage distribution
With the current port selection algorithm, ports that follow a reserved
port range or a long-used port are selected more often than others [1].
This causes an uneven port usage distribution. This combines with cloud
environments blocking connections between the application server and the
database server if there was a previous connection with the same source
port, leading to connectivity problems between applications on cloud
environments.

The real issue here is that these firewalls cannot cope with
standards-compliant port reuse. This is a workaround for such situations
and an improvement on the distribution of ports selected.

The proposed solution is to implement a variant of RFC 6056 Algorithm 5.
The step size is selected randomly on every connect() call, ensuring it
is coprime with the size of the range of ports we want to scan. This
way, we can ensure that all ports within the range are scanned before
returning an error. To enable this algorithm, the user must configure
the new sysctl option "net.ipv4.ip_local_port_step_width".
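
Why a coprime step visits every port can be sketched in a few lines.
This is an illustration of the idea only, not the kernel code. The
function names are hypothetical, and treating the sysctl value as an
upper bound on the random step is an assumption.

```python
import math
import random


def pick_step(range_size, step_width, rng):
    """Pick a random step in [1, step_width] coprime with range_size.

    If the first draw shares a factor with range_size, walk down until
    it does not. A step of 1 is always coprime, so this terminates.
    """
    step = rng.randint(1, max(1, step_width))
    while math.gcd(step, range_size) != 1:
        step -= 1
    return step


def scan_order(low, high, step_width, rng):
    """Return the order in which ports in [low, high] would be probed."""
    n = high - low + 1
    step = pick_step(n, step_width, rng)
    offset = rng.randrange(n)  # random starting point within the range
    # Because gcd(step, n) == 1, (offset + i * step) % n takes every
    # value in 0..n-1 exactly once for i in 0..n-1.
    return [low + (offset + i * step) % n for i in range(n)]


def find_free_port(low, high, in_use, step_width, rng):
    """Return the first free port in scan order, or None if all are taken."""
    for port in scan_order(low, high, step_width, rng):
        if port not in in_use:
            return port
    return None
```

Since the step is redrawn on every call, consecutive calls walk the
range in different orders, which is what spreads the selected ports more
evenly than a fixed step of 1.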

In addition, the generated graphs show that the distribution of source
ports is more even with the proposed approach. [2]

[1] https://0xffsoftware.com/port_graph_current_alg.html

[2] https://0xffsoftware.com/port_graph_random_step_alg.html

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260309023946.5473-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:59:39 -07:00
Jakub Kicinski
ae95cbaedb Merge branch 'selftests-rds-ksft-cleanups'
Allison Henderson says:

====================
selftests: rds: ksft cleanups

This set addresses a few rds selftests cleanups and bugs encountered
when running in the ksft framework.  The first patch is a cleanup
patch that addresses pylint warnings, with no functional changes.
The next patch moves the test timeout to a ksft settings file so that
the timeout is set appropriately.  And lastly we fix a tcpdump
segfault caused by a deprecated os.fork() call.
====================

Link: https://patch.msgid.link/20260308055835.1338257-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10 18:54:25 -07:00