linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 09:02:21 -04:00

Author	SHA1	Message	Date
Alexander Graf	0de607dc4f	vsock: add G2H fallback for CIDs not owned by H2G transport When no H2G transport is loaded, vsock currently routes all CIDs to the G2H transport (commit `65b422d9b6` ("vsock: forward all packets to the host when no H2G is registered")). Extend that existing behavior: when an H2G transport is loaded but does not claim a given CID, the connection falls back to G2H in the same way. This matters in environments like Nitro Enclaves, where an instance may run nested VMs via vhost-vsock (H2G) while also needing to reach sibling enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old code, any CID > 2 was unconditionally routed to H2G when vhost was loaded, making those enclaves unreachable without setting VMADDR_FLAG_TO_HOST explicitly on every connect. Requiring every application to set VMADDR_FLAG_TO_HOST creates friction: tools like socat, iperf, and others would all need to learn about it. The flag was introduced 6 years ago and I am still not aware of any tool that supports it. Even if there was support, it would be cumbersome to use. The most natural experience is a single CID address space where H2G only wins for CIDs it actually owns, and everything else falls through to G2H, extending the behavior that already exists when H2G is absent. To give user space at least a hint that the kernel applied this logic, automatically set the VMADDR_FLAG_TO_HOST on the remote address so it can determine the path taken via getpeername(). Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1). At 0 it forces strict routing: H2G always wins for CID > VMADDR_CID_HOST, or ENODEV if H2G is not loaded. Signed-off-by: Alexander Graf <graf@amazon.com> Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-12 10:59:36 +01:00
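The routing rule described in the vsock commit above can be modelled as a small decision function. This is a simplified sketch in Python, not the kernel code: the function name `route_guest_cid` and its boolean parameters are hypothetical, it only covers connects to CIDs above VMADDR_CID_HOST made without an explicit VMADDR_FLAG_TO_HOST, it ignores loopback, and it assumes the kernel-set flag hint applies whenever the fallback path is taken.

```python
VMADDR_CID_HOST = 2  # CIDs above this value are the ones the commit message discusses


def route_guest_cid(cid, h2g_loaded, h2g_owns_cid, g2h_fallback=1):
    """Return (transport, to_host_hint_set) for a connect to `cid`.

    Hypothetical model of the behavior described in the commit message:
    - H2G wins only for CIDs it actually owns.
    - With g2h_fallback enabled, everything else falls through to G2H.
    - With g2h_fallback disabled, routing is strict: H2G always wins,
      or the connect fails with ENODEV when no H2G transport is loaded.
    """
    if cid <= VMADDR_CID_HOST:
        raise ValueError("this sketch only models CID > VMADDR_CID_HOST")

    if h2g_loaded and h2g_owns_cid:
        return ("H2G", False)

    if g2h_fallback:
        # Assumption: the hint flag is set on every fallback, so that
        # getpeername() can reveal the path taken.
        return ("G2H", True)

    if not h2g_loaded:
        return ("ENODEV", False)

    # Strict mode: H2G is chosen even though it does not own the CID.
    return ("H2G", False)
```

In the Nitro Enclaves scenario from the message, a sibling enclave CID that vhost-vsock does not own corresponds to `route_guest_cid(cid, True, False)`, which this model sends to G2H.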
Christophe Leroy (CS GROUP)	17edc4e820	net: Convert move_addr_to_user() to scoped user access move_addr_to_user() is a critical function that was converted to masked user access by commit `1fb0e47161` ("net: remove one stac/clac pair from move_addr_to_user()"). Convert it to scoped user access to simplify the code. Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/36d7f2e7f504d620c1b88526b25ebc89e3cb61d9.1773142315.git.chleroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 20:38:00 -07:00
Wesley Atwell	dc9902bbd4	tcp: use WRITE_ONCE() for tsoffset in tcp_v6_connect() Commit `dd23c9f1e8` ("tcp: annotate data-races around tp->tsoffset") updated do_tcp_getsockopt() to read tp->tsoffset with READ_ONCE() for TCP_TIMESTAMP because another CPU may change it concurrently. tcp_v6_connect() still stores tp->tsoffset with a plain write. That store runs under lock_sock() via inet_stream_connect(), but the socket lock does not serialize a concurrent getsockopt(TCP_TIMESTAMP) from another task sharing the socket. Use WRITE_ONCE() for the tcp_v6_connect() store so the connect-time writer matches the lockless TCP_TIMESTAMP reader. This also makes the IPv6 path consistent with tcp_v4_connect(). Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://patch.msgid.link/20260310012604.145661-1-atwellwea@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 20:20:03 -07:00
Jakub Kicinski	87aa0f539d	Merge branch 'selftests-net-fix-cmd-process-timeout-handling' Gal Pressman says: ==================== selftests: net: fix cmd.process() timeout handling Pass the timeout argument correctly in cmd.process(). As Jakub noted, fixing the timeout broke the bpftrace() command in netpoll_basic.py, so fix it first. ==================== Link: https://patch.msgid.link/20260310115803.2521050-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 19:11:44 -07:00
Gal Pressman	f0bd193166	selftests: net: fix timeout passed as positional argument to communicate() The cited commit refactored the hardcoded timeout=5 into a parameter, but dropped the keyword from the communicate() call. Since Popen.communicate()'s first positional argument is 'input' (not 'timeout'), the timeout value is silently treated as stdin input and the call never enforces a timeout. Pass timeout as a keyword argument to restore the intended behavior. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260310115803.2521050-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 19:11:40 -07:00
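The positional-argument pitfall described in the commit above can be reproduced with the standard library alone. The sketch below is illustrative and is not the selftest code: the helper names `first_positional_param` and `run_with_timeout` are made up for this example. It checks which parameter of `Popen.communicate()` comes first, then shows that passing `timeout` by keyword is what enforces the limit.

```python
import inspect
import subprocess
import sys


def first_positional_param():
    """Name of the first parameter of Popen.communicate() after 'self'."""
    params = list(inspect.signature(subprocess.Popen.communicate).parameters)
    return params[1]


def run_with_timeout(argv, timeout):
    """Run argv and report whether it finished within `timeout` seconds."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # Keyword form: a positional value would bind to 'input' instead.
        proc.communicate(timeout=timeout)
        return "finished"
    except subprocess.TimeoutExpired:
        # Reap the child so it does not linger after the timeout.
        proc.kill()
        proc.communicate()
        return "timed out"
```

Because the first parameter is `input`, a call such as `proc.communicate(5)` binds 5 to the stdin data rather than to the timeout, which matches the failure mode the commit message describes.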
Gal Pressman	82562972b8	selftests: net: pass bpftrace timeout to cmd() The bpftrace() helper configures an interval based exit timer but does not propagate the timeout to the cmd object, which defaults to 5 seconds. Since the default BPFTRACE_TIMEOUT is 10 seconds, cmd.process() always raises a TimeoutExpired exception before bpftrace has a chance to exit gracefully. Pass timeout+5 to cmd() to allow bpftrace to complete gracefully. Note: this issue is masked by a bug in the way cmd() passes timeout, this is fixed in the next commit. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260310115803.2521050-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 19:11:36 -07:00
Jakub Kicinski	bd4d6b955d	Merge branch 'net-macb-clean-up-several-member-settings-of-macb_config-instances' Kevin Hao says: ==================== net: macb: Clean up several member settings of macb_config instances While debugging an issue in the macb driver, I noticed that many macb_config instances have very similar member settings. This makes it difficult to identify the actual differences between these instances. This patch series aims to clean up some of these settings and clarify the specific configurations of each macb_config instance. No functional changes are introduced. ==================== Link: https://patch.msgid.link/20260310-macb-cleanup-v1-0-928c1a91a7dc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 19:00:38 -07:00
Kevin Hao	0ae998c4ef	net: macb: Clean up the .usrio settings in macb_config instances All instances of macb_config currently have the .usrio set, but most of them use &macb_default_usrio. In fact, there is no need to duplicate this across all macb_config instances. Remove the .usrio setting from instances that use &macb_default_usrio, and ensure that the default is selected at runtime when no other value is explicitly set. Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://patch.msgid.link/20260310-macb-cleanup-v1-3-928c1a91a7dc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 19:00:33 -07:00
Kevin Hao	9179711ee2	net: macb: Clean up the .init settings in macb_config instances All instances of macb_config currently have the .init field set, but most of them use macb_init(). In fact, there is no need to duplicate this across all macb_config instances. Introduce a new macb_init() function that executes the specific .init if it is set; otherwise, it runs a default initialization function. Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://patch.msgid.link/20260310-macb-cleanup-v1-2-928c1a91a7dc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 19:00:33 -07:00
Kevin Hao	f97977944d	net: macb: Clean up the .clk_init setting in the macb_config instances All instances of macb_config currently have .clk_init set, but most of them use macb_clk_init(). In fact, there is no need to duplicate this across all macb_config instances. Introduce a new macb_clk_init() function that executes the specific .clk_init if it is set; otherwise, it runs the default clock initialization function. Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://patch.msgid.link/20260310-macb-cleanup-v1-1-928c1a91a7dc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 19:00:33 -07:00
Daniel Golle	7e27d6202e	selftests: net: local_termination: test link-local protocols Add tests to local_termination.sh to verify that link-local frames arrive. On some switches the DSA driver uses bridges to connect the user ports to their CPU ports. More "intelligent" switches typically don't forward link-local frames, but may trap them to an internal microcontroller. The driver may have to change trapping rules, so link-local frames end up on the DSA CPU ports instead of being silently dropped or trapped to the internal microcontroller of the switch. Add two tests which help to validate this has been done correctly: - Link-local STP BPDU should arrive at the Linux netdev when the bridge has STP disabled (BR_NO_STP), in which case the bridge forwards them rather than consuming them in the control plane - Link-local LLDP should arrive at standalone ports (and the test should be skipped on bridged ports similar to how it is done for the IEEE1588v2/PTP tests) Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/1a67081b2ede1e6d2d32f7dd54ae9688f3566152.1773166131.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 18:58:05 -07:00
Eric Dumazet	410593fec7	tcp: add sysctl_tcp_shrink_window to netns_ipv4_sysctl.rst Add missing entry for sysctl_tcp_shrink_window. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260310073855.564927-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 18:21:08 -07:00
Soichiro Ueda	34c0378b15	selftests: af_unix: validate SO_PEEK_OFF advancement and reset Extend the so_peek_off selftest to ensure the socket peek offset is handled correctly after both MSG_PEEK and actual data consumption. Verify that the peek offset advances by the same amount as the number of bytes read when performing a read with MSG_PEEK. After exercising SO_PEEK_OFF via MSG_PEEK, drain the receive queue with a non-peek recv() and verify that it can receive all the content in the buffer and SO_PEEK_OFF returns back to 0. The verification after actual data consumption was suggested by Miao Wang when the original so_peek_off selftest was introduced. Link: https://lore.kernel.org/all/7B657CC7-B5CA-46D2-8A4B-8AB5FB83C6DA@gmail.com/ Suggested-by: Miao Wang <shankerwangmiao@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Soichiro Ueda <the.latticeheart@gmail.com> Link: https://patch.msgid.link/20260310072832.127848-1-the.latticeheart@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-11 18:20:17 -07:00
Jakub Kicinski	482aac8b56	Merge branch 'net-stmmac-start-to-shrink-memory-usage' Russell King says: ==================== net: stmmac: start to shrink memory usage Start shrinking stmmac's memory usage by avoiding using "int" for members that are only used for 0/1 (boolean) values, or values that can't be larger than 255. In addition, as struct stmmac_dma_cfg is approximately a cache line, shrinks below a cache line as a result of this patch set, and is required, there is no point separately allocating this from struct plat_stmmacenet_data. Embed it into the end of this struct and set the existing pointer to avoid large wide-spread changes. Lastly, add documentation for struct stmmac_dma_cfg, and document the stmmac clocks as best we can given the driver history. ==================== Link: https://patch.msgid.link/aa6VEsmBK-S9eNYU@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:43 -07:00
Russell King (Oracle)	315bab9411	net: stmmac: add documentation for clocks Add documentation covering stmmac_clk, pclk, clk_ptp_ref and clk_tx_i in the hope that this will help understand what each of these clocks are for. There is confusion around stmmac_clk and pclk which can't be easily resolved today as the Imagination Technologies Pistachio board that pclk was introduced for has no public documentation and is likely now obsolete. So the origins of pclk are lost to the winds of time. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vzX5Z-0000000CVsb-1XTm@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:07 -07:00
Russell King (Oracle)	9fe167ab79	net: stmmac: add documentation for stmmac_dma_cfg members Add documentation of each of the struct stmmac_dma_cfg members. dche remains undocumented as I don't have documentation that covers this. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vzX5U-0000000CVsQ-162V@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:07 -07:00
Russell King (Oracle)	758ed85aad	net: stmmac: use u8 for host_dma_width and similar struct members We aren't going to see >= 256-bit address busses soon, so reduce host_dma_width and associated other struct members that initialise this from u32 to u8. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> # qcom-ethqos Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vzX5P-0000000CVsK-0iwX@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:07 -07:00
Russell King (Oracle)	94808793fe	net: stmmac: use u8 for ?x_queues_to_use and number_?x_queues The maximum number of queues is a compile time constant of only eight. This makes using a 32-bit quantity wasteful. Instead, use u8 for these and their associated variables. When reading the DT properties, saturate at U8_MAX. Provided the core provides DMA capabilities to describe the number of queues, this will be capped by stmmac_hw_init() with a warning. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vzX5K-0000000CVsE-0J0Y@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:07 -07:00
Russell King (Oracle)	3357642e65	net: stmmac: reorder structs to reduce memory consumption Reorder some of the stmmac structures to allow them to pack better, thereby using less memory. On aarch64, sizeof(struct stmmac_priv) was 880, and with this change becomes 816, saving 64 bytes, which is an 8% saving. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vzX5E-0000000CVs8-40w4@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:06 -07:00
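The effect of member reordering on structure size, as used in the commit above, can be illustrated with Python's `ctypes`, which follows the platform's native C alignment rules. The two structures below are hypothetical and do not correspond to any stmmac structure; they only show how interleaving small and large members introduces padding that grouping avoids.

```python
import ctypes


class Interleaved(ctypes.Structure):
    # Each u8 is followed by padding so the next u32 is 4-byte aligned.
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("count_a", ctypes.c_uint32),
        ("flag_b", ctypes.c_uint8),
        ("count_b", ctypes.c_uint32),
    ]


class Grouped(ctypes.Structure):
    # Same members, largest first: the two u8 members share one padded slot.
    _fields_ = [
        ("count_a", ctypes.c_uint32),
        ("count_b", ctypes.c_uint32),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


def sizes():
    """Return (interleaved_size, grouped_size) in bytes for this platform."""
    return ctypes.sizeof(Interleaved), ctypes.sizeof(Grouped)
```

On ABIs where a 32-bit integer is 4-byte aligned, the grouped layout is smaller even though both structures hold the same members.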
Russell King (Oracle)	c3d08424e0	net: stmmac: convert plat_stmmacenet_data booleans to type bool Convert members of struct plat_stmmacenet_data that are booleans to type 'bool' and ensure their initialisers are true/false. Move the has_xxx for the GMAC cores together, and move the COE members to the end of the list of bool to avoid unused holes in the struct. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vzX59-0000000CVs2-3MHc@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:06 -07:00
Russell King (Oracle)	7a6387dec8	net: stmmac: provide plat_dat->dma_cfg in stmmac_plat_dat_alloc() plat_dat->dma_cfg is unconditionally required for the operation of the driver, so it would make sense to allocate it along with the plat_dat. On Arm64, sizeof(plat_dat) has recently shrunk from 880 to 816 bytes and sizeof(plat_dat->dma_cfg) has shrunk from 32 to 20 bytes. Given that dma_cfg is required, and it is now less than a cache line, it doesn't make sense to allocate this separately, so place it at the end of struct plat_stmmacenet_data, and set plat_dat->dma_cfg to point at that to avoid mass changes. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vzX54-0000000CVrw-2jfu@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:54:06 -07:00
Johan Hovold	1a6ca6497a	net: mdio: mvusb: drop redundant device reference Driver core holds a reference to the USB interface and its parent USB device while the interface is bound to a driver and there is no need to take additional references unless the structures are needed after disconnect. Drop the redundant device reference to reduce cargo culting, make it easier to spot drivers where an extra reference is needed, and reduce the risk of memory leaks when drivers fail to release it. Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260309082641.15574-1-johan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:53:19 -07:00
Jakub Kicinski	07f56c8f54	Merge branch 'amd-xgbe-improve-power-management-for-s0i3' Raju Rangoju says: ==================== amd-xgbe: Improve power management for S0i3 Improve the amd-xgbe power management handling to allow AMD platforms to reach the deepest suspend state (S0i3) when modern standby is used. The first patch cleans up the xgbe_powerdown() and xgbe_powerup() helpers by removing an unused caller distinction and aligning the ordering of operations with xgbe_stop(). The second patch adds proper PCI power management operations, following the standard PCI PM model, so that the device can be cleanly put into D3 and resumed back to D0. Without this, the amd_pmc driver reports: "Last suspend didn't reach deepest state" when the amd-xgbe driver is enabled. These changes have been tested on AMD platforms using S0i3 modern standby. ==================== Link: https://patch.msgid.link/20260308092851.1510214-1-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:51:27 -07:00
Raju Rangoju	7644e76956	amd-xgbe: add PCI power management for S0i3 support The current suspend/resume implementation does not correctly handle PCI device power state transitions, which prevents AMD platforms from reaching the deepest suspend state (S0i3) when the amd-xgbe driver is enabled. In particular, the amd_pmc driver reports: "Last suspend didn't reach deepest state" when this device is present. Implement proper PCI power management operations following the standard PCI PM model so that the device can be cleanly powered down and resumed. Suspend path: - Power down the network interface - Put the PHY into low-power mode - Disable bus mastering to prevent DMA activity - Save PCI configuration space - Disable the PCI device - Disable wake from D3 (S0i3 does not require Wake-on-LAN) - Set the device to D3hot Resume path: - Restore the PCI power state to D0 - Restore PCI configuration space - Enable the PCI device - Re-enable bus mastering - Re-enable device interrupts - Clear the PHY low-power mode - Power up the network interface This allows systems using amd-xgbe to reach the deepest suspend state when entering modern standby (S0i3). Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260308092851.1510214-3-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:51:23 -07:00
Raju Rangoju	fe81629217	amd-xgbe: Simplify powerdown/powerup paths The caller parameter in xgbe_powerdown() and xgbe_powerup() was intended to differentiate between driver and ioctl contexts, but the only remaining usage is from the driver suspend/resume path. Simplify this by: - Removing the unused XGMAC_DRIVER_CONTEXT and XGMAC_IOCTL_CONTEXT macros - Dropping the now-unused caller parameter - Reordering operations in xgbe_powerdown() to disable NAPI before stopping TX/RX, matching the order used in xgbe_stop() This makes the powerdown/powerup paths easier to follow and keeps the ordering consistent with the rest of the driver. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260308092851.1510214-2-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:51:22 -07:00
Jiayuan Chen	34bd3c6b0b	net: sched: cls_u32: Avoid memcpy() false-positive warning in u32_init_knode() Syzbot reported a warning in u32_init_knode() [1]. Similar to commit `7cba18332e` ("net: sched: cls_u32: Avoid memcpy() false-positive warning") which addressed the same issue in u32_change(), use unsafe_memcpy() in u32_init_knode() to work around the compiler's inability to see into composite flexible array structs. This silences the false-positive reported by syzbot: memcpy: detected field-spanning write (size 32) of single field "&new->sel" at net/sched/cls_u32.c:855 (size 16) Since the memory is correctly allocated with kzalloc_flex() using s->nkeys, this is purely a false positive and does not need a Fixes tag. [1] https://syzkaller.appspot.com/bug?extid=d5ace703ed883df56e42 Reported-by: syzbot+d5ace703ed883df56e42@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69a811b9.a70a0220.b118c.0019.GAE@google.com/T/ Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://patch.msgid.link/20260309123917.402183-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:39:35 -07:00
Yoshihiro Shimoda	9278b88892	net: ethernet: ravb: Disable interrupts when closing device Disable E-MAC interrupts when closing the device. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> [Niklas: Rebase from BSP and reword commit message] Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://patch.msgid.link/20260307095532.2118495-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:37:21 -07:00
Eric Joyner	1f9cab56e7	ionic: Report additional media types from firmware The device firmware supports reporting more media types than what was there in the past, so map these new media types to existing ethtool bits, which appears to be what other drivers do for media types that match speeds but not physical spec. And while here, make a very small cleanup in ionic_get_link_ksettings() to remove some unnecessary code duplication. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Eric Joyner <eric.joyner@amd.com> Link: https://patch.msgid.link/20260306215634.64550-1-eric.joyner@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:34:14 -07:00
Jakub Kicinski	7bb1970494	Merge branch 'tools-ynl-policy-query-support' Jakub Kicinski says: ==================== tools: ynl: policy query support Improve the Netlink policy support in YNL. This series grew out of improvements to policy checking, when writing selftests I realized that instead of doing all the policy parsing in the test we're better off making it part of YNL itself. Patch 1 adds pad handling, apparently we never hit pad with commonly used families. nlctrl policy dumps use pad more frequently. Patch 2 is a trivial refactor. Patch 3 pays off some technical debt in terms of documentation. The YnlFamily class is growing in size and it's quite hard to find its members. So document it a little bit. Patch 4 is the main dish, the implementation of get_policy(op) in YnlFamily. Patch 5 plugs the new functionality into the CLI. ==================== Link: https://patch.msgid.link/20260310005337.3594225-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:33:07 -07:00
Jakub Kicinski	d6df5e9b2a	tools: ynl: cli: add --policy support Add --policy flag which can be combined with --do or --dump to query the kernel's netlink policy for an operation instead of executing it. Examples: $ ynl --family netdev --do dev-get --policy {'ifindex': {'max-value': 4294967295, 'min-value': 1, 'type': 'u32'}} $ ynl --family ethtool --do channels-get --policy --output-json {"header": {"type": "nested", "policy": {"dev-index": ...}}} $ ynl --family netdev --dump dev-get --policy {} Link: https://patch.msgid.link/20260310005337.3594225-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:33:00 -07:00
Jakub Kicinski	77a6401a87	tools: ynl: add Python API for easier access to policies The format of Netlink policy dump is a bit curious with messages in the same dump carrying both attrs and mapping info. Plus each message carries a single piece of the puzzle the caller must then reassemble. I need to do this reassembly for a test, but I think it's generally useful. So let's add proper support to YnlFamily to return more user-friendly representation. See the various docs in the patch for more details. Link: https://patch.msgid.link/20260310005337.3594225-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:32:46 -07:00
Jakub Kicinski	8bbcfce5db	tools: ynl: add short doc to class YnlFamily The class is quite long. It's getting hard to find the user-facing methods. Add a short doc at the class level explaining the main API. Link: https://patch.msgid.link/20260310005337.3594225-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:30:03 -07:00
Jakub Kicinski	c26fda6212	tools: ynl: move policy decoding out of NlMsg We'll soon need to decode policies from dump so move _decode_policy() out of class NlMsg. Link: https://patch.msgid.link/20260310005337.3594225-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:30:03 -07:00
Jakub Kicinski	7b1309c339	tools: ynl: handle pad type during decode Apparently Python code only handled the 'pad' type in structs until now. Add it to attr decoding. nlctrl policy dumps need it. Link: https://patch.msgid.link/20260310005337.3594225-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:30:03 -07:00
Rosen Penev	73a8643525	net: mvneta: support EPROBE_DEFER when reading MAC address If nvmem loads after the ethernet driver, mac address assignments will not take effect. of_get_ethdev_address returns EPROBE_DEFER in such a case so we need to handle that to avoid eth_hw_addr_random. Add extra goto section to just free stats as they are allocated right above. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260307031709.640141-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:02:25 -07:00
Dimitri Daskalakis	690043b95c	selftests: drv-net: rss: Add retries to test_rss_key_indir to reduce flakes The test generates 16 flows, and verifies that traffic is distributed across two queues via the NICs RSS indirection table. The likelihood of the flows skewing to a single queue is high, so we retry sending traffic up to 3 times. Alternatively, we could increase the number of generated flows. But debug kernels may struggle to ramp this many flows. During manual testing, the test passed for 10,000 consecutive runs. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260309204215.2110486-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 19:01:51 -07:00
Fernando Fernandez Mancera	7da62262ec	inet: add ip_local_port_step_width sysctl to improve port usage distribution With the current port selection algorithm, ports after a reserved port range or long time used port are used more often than others [1]. This causes an uneven port usage distribution. This combines with cloud environments blocking connections between the application server and the database server if there was a previous connection with the same source port, leading to connectivity problems between applications on cloud environments. The real issue here is that these firewalls cannot cope with standards-compliant port reuse. This is a workaround for such situations and an improvement on the distribution of ports selected. The proposed solution is to implement a variant of RFC 6056 Algorithm 5. The step size is selected randomly on every connect() call ensuring it is a coprime with respect to the size of the range of ports we want to scan. This way, we can ensure that all ports within the range are scanned before returning an error. To enable this algorithm, the user must configure the new sysctl option "net.ipv4.ip_local_port_step_width". In addition, on graphs generated we can observe that the distribution of source ports is more even with the proposed approach. [2] [1] https://0xffsoftware.com/port_graph_current_alg.html [2] https://0xffsoftware.com/port_graph_random_step_alg.html Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260309023946.5473-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 18:59:39 -07:00
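The coprime-step idea in the commit above can be sketched outside the kernel. The code below is a user-space illustration in Python, not the kernel implementation: the function names and the `max_step` parameter are hypothetical, and how the sysctl value bounds the step is not stated in the message. It picks a random step that is coprime with the size of the port range and walks the range with that step, which visits every port exactly once before wrapping.

```python
import math
import random


def pick_coprime_step(range_size, max_step, rng):
    """Pick a random step in [1, max_step] that is coprime with range_size.

    A step of 1 is always coprime, so the loop terminates.
    """
    while True:
        step = rng.randint(1, max(1, max_step))
        if math.gcd(step, range_size) == 1:
            return step


def scan_order(low, high, start_offset, step):
    """Yield candidate ports in [low, high], advancing by `step` modulo the range size.

    When step and the range size are coprime, every port is yielded exactly once.
    """
    size = high - low + 1
    offset = start_offset % size
    for _ in range(size):
        yield low + offset
        offset = (offset + step) % size
```

The coprimality requirement is what guarantees full coverage: a step sharing a factor with the range size would cycle through only a fraction of the ports, so the scan could report exhaustion while free ports remain.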
Jakub Kicinski	ae95cbaedb	Merge branch 'selftests-rds-ksft-cleanups' Allison Henderson says: ==================== selftests: rds: ksft cleanups This set addresses a few rds selftests clean ups and bugs encountered when running in the ksft framework. The first patch is a clean up patch that addresses pylint warnings, but otherwise no functional changes. The next patch moves the test time out to a ksft settings file so that the time out is set appropriately. And lastly we fix a tcpdump segfault caused by a deprecated os.fork() call. ==================== Link: https://patch.msgid.link/20260308055835.1338257-1-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 18:54:25 -07:00
Allison Henderson	87fdf57ded	selftests: rds: Fix tcpdump segfault in rds selftests net/rds/test.py sees a segfault in tcpdump when executed through the ksft runner. [ 21.903713] tcpdump[1469]: segfault at 0 ip 000072100e99126d sp 00007ffccf740fd0 error 4 [ 21.903721] in libc.so.6[16a26d,7798b149a000+188000] [ 21.905074] in libc.so.6[16a26d,72100e84f000+188000] likely on CPU 5 (core 5, socket 0) [ 21.905084] Code: 00 0f 85 a0 00 00 00 48 83 c4 38 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 05 91 8b 09 00 8b 4d ac 64 89 08 <41> 0f b6 07 83 e8 2b a8 fd 0f 84 54 ff ff ff 49 8b 36 4c 89 ff e8 [ 21.906760] likely on CPU 9 (core 9, socket 0) [ 21.913469] Code: 00 0f 85 a0 00 00 00 48 83 c4 38 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 05 91 8b 09 00 8b 4d ac 64 89 08 <41> 0f b6 07 83 e8 2b a8 fd 0f 84 54 ff ff ff 49 8b 36 4c 89 ff e8 The os.fork() call creates extra complexity because it forks the entire process including the python interpreter. ip() then calls cmd() which creates a subprocess.Popen. We can avoid the extra layering by simply calling subprocess.Popen directly. Track the process handles directly and terminate them at cleanup rather than relying on killall. Further tcpdump's -Z flag attempts to change savefile ownership, which is not supported by the 9p protocol. Fix this by writing pcap captures to "/tmp" during the test and move them to the log directory after tcpdump exits. Signed-off-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260308055835.1338257-4-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 18:54:24 -07:00
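The approach described in the commit above, starting helpers with `subprocess.Popen` and terminating the tracked handles at cleanup instead of relying on killall, can be sketched as follows. This is a generic illustration rather than the code in net/rds/test.py; the `ProcTracker` class and its method names are hypothetical.

```python
import subprocess
import sys


class ProcTracker:
    """Start background processes and terminate exactly those at cleanup."""

    def __init__(self):
        self.procs = []

    def start(self, argv):
        """Launch argv without forking the Python interpreter itself."""
        proc = subprocess.Popen(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        self.procs.append(proc)
        return proc

    def cleanup(self, grace=5):
        """Ask each live process to exit, then force-kill any stragglers."""
        for proc in self.procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in self.procs:
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self.procs.clear()
```

Keeping the handles means cleanup affects only the processes the test started, whereas a name-based kill can also hit unrelated processes with the same name.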
Allison Henderson	b873b4e160	selftests: rds: Add ksft timeout rds/run.sh sets a timer of 400s when calling test.py. However when tests are run through ksft, a default 45s timer is applied. Fix this by adding a ksft timeout in tools/testing/selftests/net/rds/settings Signed-off-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260308055835.1338257-3-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 18:54:15 -07:00
Allison Henderson	5a0c5702bd	selftests: rds: Fix pylint warnings Tidy up all existing pylint errors in test.py. No functional changes are introduced in this patch. Signed-off-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260308055835.1338257-2-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 18:53:42 -07:00
Jakub Kicinski	b8a0e5eb6a	tools: ynl: cli: order set->list conversion in JSON output NIPA tries to make sure that HW tests don't modify system state. It dumps some well known configs before and after the test and compares the outputs. Make sure that YNL json output is stable. Converting sets to lists with a naive list(o) results in a random order. Link: https://patch.msgid.link/20260307175916.1652518-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 17:54:02 -07:00
Jakub Kicinski	16767c72a4	Merge branch 'smc-sysctl-formatting-and-missing-entries' Kyoji Ogasawara says: ==================== smc-sysctl formatting and missing entries update SMC sysctl documentation in two small steps. - patch 1 fixes indentation in the smcr_buf_type section - patch 2 documents missing sysctl parameters limit_smc_hs and hs_ctrl, including values/defaults and hs_ctrl usage notes ==================== Link: https://patch.msgid.link/20260309124541.22723-1-sawara04.o@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 17:53:10 -07:00
Kyoji Ogasawara	aa5ec9d03b	net/smc: Add documentation for limit_smc_hs and hs_ctrl Document missing SMC sysctl parameters limit_smc_hs and hs_ctrl. Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://patch.msgid.link/20260309124541.22723-3-sawara04.o@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 17:53:03 -07:00
Kyoji Ogasawara	4a51ac9056	net/smc: fix indentation in smcr_buf_type section smcr_buf_type section used inconsistent indentation compared with the rest of this document. Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://patch.msgid.link/20260309124541.22723-2-sawara04.o@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-10 17:52:36 -07:00
Paolo Abeni	05e059510e	Merge branch 'eth-fbnic-add-fbnic-self-tests' Mike Marciniszyn says: ==================== eth fbnic: Add fbnic self tests From: "Mike Marciniszyn (Meta)" <mike.marciniszyn@gmail.com> This series adds self tests to test the registers, the msix interrupts, the tlv, and the firmware mailbox. This series assumes that the [PATCH net-next 0/2] Add debugfs hooks [1] is present. When the self tests are run with ethtool -t: ethtool -t eth0 The test result is PASS The test extra info: Register test (offline) 0 MSI-X Interrupt test (offline) 0 FW mailbox test (on/offline) 0 ==================== Link: https://patch.msgid.link/20260307105847.1438-1-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-10 13:53:55 +01:00
Mike Marciniszyn (Meta)	8e5218199d	eth fbnic: Add mailbox self test The mailbox self test ensures the interface to and from the firmware is healthy by sending a test message and fielding the response from the firmware. This patch uses the new completion API [1][2] that allocates a completion structure, binds the completion to the TEST message, and uses a new FW parsing routine that wraps the completion processing around the TLV parser. Link: https://patch.msgid.link/20250516164804.741348-1-lee@trager.us [1] Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com [2] Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-6-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-10 13:53:53 +01:00
Mike Marciniszyn (Meta)	d522b1b004	eth fbnic: TLV support for use by MBX self test The TLV (Type-Length-Value) self test uses a known set of data to create a TLV message. These routines support the MBX self test by creating the test messages and parsing the response message coming back from the firmware. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-5-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-10 13:53:53 +01:00
Mike Marciniszyn (Meta)	99fc8d3d00	eth fbnic: Add msix self test This function is meant to test the global interrupt registers and the PCIe IP MSI-X functionality. It essentially goes through and tests various combinations of the set, clear, and mask bits in order to verify the behavior is as we expect it to be from the driver. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-4-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-10 13:53:53 +01:00
Mike Marciniszyn (Meta)	b43498b7e9	eth fbnic: Add register self test The register test will be used to verify hardware is behaving as expected. The test itself will have us writing to registers that should have no side effects due to us resetting after the test has been completed. While the test is being run the interface should be offline. This patch counts on the first patch of this series to export netif_open() and also ensures that the half close calls netif_close() to avoid deadlock. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-3-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-10 13:53:53 +01:00

1428049 Commits