linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-13 11:48:58 -04:00

Author	SHA1	Message	Date
Lucien.Jheng	6b9c9def95	net: phy: air_en8811h: Introduce resume/suspend and clk_restore_context to ensure correct CKO settings after network interface reinitialization. If the user reinitializes the network interface, the PHY will reinitialize, and the CKO settings will revert to their initial configuration(be enabled). To prevent CKO from being re-enabled, en8811h_clk_restore_context and en8811h_resume were added to ensure the CKO settings remain correct. Signed-off-by: Lucien.Jheng <lucienzx159@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250630154147.80388-1-lucienzx159@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:35:43 -07:00
Christophe JAILLET	10c38949e0	net: dsa: hellcreek: Constify struct devlink_region_ops and struct hellcreek_fdb_entry 'struct devlink_region_ops' and 'struct hellcreek_fdb_entry' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 55320 19216 320 74856 12468 drivers/net/dsa/hirschmann/hellcreek.o After: ===== text data bss dec hex filename 55960 18576 320 74856 12468 drivers/net/dsa/hirschmann/hellcreek.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://patch.msgid.link/2f7e8dc30db18bade94999ac7ce79f333342e979.1751231174.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:33:31 -07:00
Jakub Kicinski	215891acb4	Merge branch 'seg6-fix-typos-in-comments-within-the-srv6-subsystem' Andrea Mayer says: ==================== seg6: fix typos in comments within the SRv6 subsystem In this patchset, we correct some typos found both in the SRv6 Endpoints implementation (i.e., seg6local) and in some SRv6 selftests, using codespell. ==================== Link: https://patch.msgid.link/20250629171226.4988-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:32:47 -07:00
Andrea Mayer	3bedaff19b	selftests: seg6: fix instaces typo in comments Fix a typo: instaces -> instances The typo has been identified using codespell, and the tool does not report any additional issues in the selftests considered. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250629171226.4988-3-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:32:45 -07:00
Andrea Mayer	db3e2ceab3	seg6: fix lenghts typo in a comment Fix a typo: lenghts -> length The typo has been identified using codespell, and the tool currently does not report any additional issues in comments. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250629171226.4988-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:32:45 -07:00
Christophe JAILLET	a63b5a0bb7	net: dsa: mv88e6xxx: Use kcalloc() Use kcalloc() instead of hand writing it. This is less verbose. Also move the initialization of 'count' to save some LoC. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 18652 5920 64 24636 603c drivers/net/dsa/mv88e6xxx/devlink.o After: ===== text data bss dec hex filename 18498 5920 64 24482 5fa2 drivers/net/dsa/mv88e6xxx/devlink.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/2f4fca4ff84950da71e007c9169f18a0272476f3.1751200453.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:31:38 -07:00
Christophe JAILLET	ff2d4cfdaf	net: dsa: mv88e6xxx: Constify struct devlink_region_ops and struct mv88e6xxx_region 'struct devlink_region_ops' and 'struct mv88e6xxx_region' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 18076 6496 64 24636 603c drivers/net/dsa/mv88e6xxx/devlink.o After: ===== text data bss dec hex filename 18652 5920 64 24636 603c drivers/net/dsa/mv88e6xxx/devlink.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/46040062161dda211580002f950a6d60433243dc.1751200453.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:31:38 -07:00
Eric Work	fad9cf2165	net: atlantic: add set_power to fw_ops for atl2 to fix wol Aquantia AQC113(C) using ATL2FW doesn't properly prepare the NIC for enabling wake-on-lan. The FW operation `set_power` was only implemented for `hw_atl` and not `hw_atl2`. Implement the `set_power` functionality for `hw_atl2`. Tested with both AQC113 and AQC113C devices. Confirmed you can shutdown the system and wake from S5 using magic packets. NIC was previously powered off when entering S5. If the NIC was configured for WOL by the Windows driver, loading the atlantic driver would disable WOL. Partially cherry-picks changes from commit, https://github.com/Aquantia/AQtion/commit/37bd5cc Attributing original authors from Marvell for the referenced commit. Closes: https://github.com/Aquantia/AQtion/issues/70 Co-developed-by: Igor Russkikh <irusskikh@marvell.com> Co-developed-by: Mark Starovoitov <mstarovoitov@marvell.com> Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com> Co-developed-by: Pavel Belous <pbelous@marvell.com> Co-developed-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Eric Work <work.eric@gmail.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Link: https://patch.msgid.link/20250629051535.5172-1-work.eric@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:30:58 -07:00
Kohei Enju	34a500caf4	rose: fix dangling neighbour pointers in rose_rt_device_down() There are two bugs in rose_rt_device_down() that can cause use-after-free: 1. The loop bound `t->count` is modified within the loop, which can cause the loop to terminate early and miss some entries. 2. When removing an entry from the neighbour array, the subsequent entries are moved up to fill the gap, but the loop index `i` is still incremented, causing the next entry to be skipped. For example, if a node has three neighbours (A, A, B) with count=3 and A is being removed, the second A is not checked. i=0: (A, A, B) -> (A, B) with count=2 ^ checked i=1: (A, B) -> (A, B) with count=2 ^ checked (B, not A!) i=2: (doesn't occur because i < count is false) This leaves the second A in the array with count=2, but the rose_neigh structure has been freed. Code that accesses these entries assumes that the first `count` entries are valid pointers, causing a use-after-free when it accesses the dangling pointer. Fix both issues by iterating over the array in reverse order with a fixed loop bound. This ensures that all entries are examined and that the removal of an entry doesn't affect subsequent iterations. Reported-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e04e2c007ba2c80476cb Tested-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250629030833.6680-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:28:48 -07:00
Haiyang Zhang	fbe346ce9d	net: mana: Handle Reset Request from MANA NIC Upon receiving the Reset Request, pause the connection and clean up queues, wait for the specified period, then resume the NIC. In the cleanup phase, the HWC is no longer responding, so set hwc_timeout to zero to skip waiting on the response. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1751055983-29760-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:17:53 -07:00
Oleksij Rempel	f461c7a885	phy: micrel: add Signal Quality Indicator (SQI) support for KSZ9477 switch PHYs Add support for the Signal Quality Indicator (SQI) feature on KSZ9477 family switches, providing a relative measure of receive signal quality. The hardware exposes separate SQI readings per channel. For 1000BASE-T, all four channels are read. For 100BASE-TX, only one channel is reported, but which receive pair is active depends on Auto MDI-X negotiation, which is not exposed by the hardware. Therefore, it is not possible to reliably map the measured channel to a specific wire pair. This resolves an earlier discussion about how to handle multi-channel SQI. Originally, the plan was to expose all channels individually. However, since pair mapping is sometimes unavailable, this implementation treats SQI as a per-link metric instead. This fallback avoids ambiguity and ensures consistent behavior. The existing get_sqi() UAPI was designed for single-pair Ethernet (SPE), where per-pair and per-link are effectively equivalent. Restricting its use to per-link metrics does not introduce regressions for existing users. The raw 7-bit SQI value (0–127, lower is better) is converted to the standard 0–7 (high is better) scale. Empirical testing showed that the link becomes unstable around a raw value of 8. The SQI raw value remains zero if no data is received, even if noise is present. This confirms that the measurement reflects the "quality" during active data reception rather than the passive line state. User space must ensure that traffic is present on the link to obtain valid SQI readings. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250627112539.895255-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:13:18 -07:00
Alok Tiwari	aaf2b24803	enic: fix incorrect MTU comparison in enic_change_mtu() The comparison in enic_change_mtu() incorrectly used the current netdev->mtu instead of the new new_mtu value when warning about an MTU exceeding the port MTU. This could suppress valid warnings or issue incorrect ones. Fix the condition and log to properly reflect the new_mtu. Fixes: `ab123fe071` ("enic: handle mtu change for vf properly") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: John Daley <johndale@cisco.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250628145612.476096-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 18:56:28 -07:00
Mina Almasry	be75d319d1	selftests: pp-bench: remove page_pool_put_page wrapper Minor cleanup: remove the pointless looking _ wrapper around page_pool_put_page, and just do the call directly. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20250627200501.1712389-2-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:42:11 -07:00
Mina Almasry	8d3e0982f7	selftests: pp-bench: remove unneeded linux/version.h linux/version.h was used by the out-of-tree version, but not needed in the upstream one anymore. While I'm at it, sort the includes. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506271434.Gk0epC9H-lkp@intel.com/ Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20250627200501.1712389-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:42:11 -07:00
Nicolas Dichtel	0341e34727	ip6_tunnel: enable to change proto of fb tunnels This is possible via the ioctl API: > ip -6 tunnel change ip6tnl0 mode any Let's align the netlink API: > ip link set ip6tnl0 type ip6tnl mode any Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20250630145602.1027220-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:40:57 -07:00
Sebastian Andrzej Siewior	131e0a1123	selftests/tc-testing: Enable CONFIG_IP_SET The config snippet specifies CONFIG_NET_EMATCH_IPSET. This option depends on CONFIG_IP_SET. Set CONFIG_IP_SET to be enabled at part for tc-testing. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250630153341.Wgh3SzGi@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:40:18 -07:00
Frank Li	69fcb70c43	dt-bindings: net: convert nxp,lpc1850-dwmac.txt to yaml format Convert nxp,lpc1850-dwmac.txt to yaml format. Additional changes: - compatible string add fallback as "nxp,lpc1850-dwmac", "snps,dwmac-3.611" "snps,dwmac". - add common interrupts, interrupt-names, clocks, clock-names, resets and reset-names properties. - add ref snps,dwmac.yaml. - add phy-mode in example to avoid dt_binding_check warning. - update examples to align lpc18xx.dtsi. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250630161613.2838039-1-Frank.Li@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:39:57 -07:00
Dave Marquardt	16f87fb243	docs: netdevsim: fixe typo in netdevsim documentation Fixed a typographical error in "Rate objects" section Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com> Link: https://patch.msgid.link/20250630-netdevsim-typo-fix-v3-1-e1eae3a5f018@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:38:50 -07:00
Raju Rangoju	42fd432fe6	amd-xgbe: align CL37 AN sequence as per databook Update the Clause 37 Auto-Negotiation implementation to properly align with the PCS hardware specifications: - Fix incorrect bit settings in Link Status and Link Duplex fields - Implement missing sequence steps 2 and 7 These changes ensure CL37 auto-negotiation protocol follows the exact sequence patterns as specified in the hardware databook. Fixes: `1bf40ada62` ("amd-xgbe: Add support for clause 37 auto-negotiation") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20250630192636.3838291-1-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:37:41 -07:00
Dan Carpenter	e6ed134a4e	lib: test_objagg: Set error message in check_expect_hints_stats() Smatch complains that the error message isn't set in the caller: lib/test_objagg.c:923 test_hints_case2() error: uninitialized symbol 'errmsg'. This static checker warning only showed up after a recent refactoring but the bug dates back to when the code was originally added. This likely doesn't affect anything in real life. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202506281403.DsuyHFTZ-lkp@intel.com/ Fixes: `0a020d416d` ("lib: introduce initial implementation of object aggregation manager") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8548f423-2e3b-4bb7-b816-5041de2762aa@sabinyo.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:29:00 -07:00
Jakub Kicinski	3249eae7e4	net: ethtool: fix leaking netdev ref if ethnl_default_parse() failed Ido spotted that I made a mistake in commit under Fixes, ethnl_default_parse() may acquire a dev reference even when it returns an error. This may have been driven by the code structure in dumps (which unconditionally release dev before handling errors), but it's too much of a trap. Functions should undo what they did before returning an error, rather than expecting caller to clean up. Rather than fixing ethnl_default_set_doit() directly make ethnl_default_parse() clean up errors. Reported-by: Ido Schimmel <idosch@idosch.org> Link: https://lore.kernel.org/aGEPszpq9eojNF4Y@shredder Fixes: `963781bdfe` ("net: ethtool: call .parse_request for SET handlers") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250630154053.1074664-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:27:53 -07:00
Fushuai Wang	ca899622c5	sfc: siena: eliminate xdp_rxq_info_valid using XDP base API Commit `d48523cb88` ("sfc: Copy shared files needed for Siena (part 2)") use xdp_rxq_info_valid to track failures of xdp_rxq_info_reg(). However, this driver-maintained state becomes redundant since the XDP framework already provides xdp_rxq_info_is_reg() for checking registration status. Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20250628051033.51133-1-wangfushuai@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:02:12 -07:00
Fushuai Wang	582643672d	sfc: eliminate xdp_rxq_info_valid using XDP base API Commit `eb9a36be7f` ("sfc: perform XDP processing on received packets") use xdp_rxq_info_valid to track failures of xdp_rxq_info_reg(). However, this driver-maintained state becomes redundant since the XDP framework already provides xdp_rxq_info_is_reg() for checking registration status. Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20250628051016.51022-1-wangfushuai@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:01:15 -07:00
Linus Torvalds	65c1736c8e	Merge tag 'mfd-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD fix from Lee Jones: - Fix some -Werror=unused-variable build errors * tag 'mfd-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: Fix building without CONFIG_OF	2025-07-01 13:50:21 -07:00
Linus Torvalds	ce95858aee	Merge tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Fix loop in GSS sequence number cache - Clean up /proc/net/rpc/nfs if nfs_fs_proc_net_init() fails - Fix a race to wake on NFS_LAYOUT_DRAIN - Fix handling of NFS level errors in I/O * tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4/flexfiles: Fix handling of NFS level errors in I/O NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAIN nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails. sunrpc: fix loop in gss seqno cache	2025-07-01 13:42:30 -07:00
Vitaly Lifshits	0325143b59	igc: disable L1.2 PCI-E link substate to avoid performance issue I226 devices advertise support for the PCI-E link L1.2 substate. However, due to a hardware limitation, the exit latency from this low-power state is longer than the packet buffer can tolerate under high traffic conditions. This can lead to packet loss and degraded performance. To mitigate this, disable the L1.2 substate. The increased power draw between L1.1 and L1.2 is insignificant. Fixes: `4354621173` ("igc: Add new device ID's") Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2025-07-01 08:26:08 -07:00
Ahmed Zaki	b2beb5bb2c	idpf: convert control queue mutex to a spinlock With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated on module load: [ 324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578 [ 324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager [ 324.701689] preempt_count: 201, expected: 0 [ 324.701693] RCU nest depth: 0, expected: 0 [ 324.701697] 2 locks held by NetworkManager/1582: [ 324.701702] #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0 [ 324.701730] #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870 [ 324.701749] Preemption disabled at: [ 324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870 [ 324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary) [ 324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022 [ 324.701774] Call Trace: [ 324.701777] <TASK> [ 324.701779] dump_stack_lvl+0x5d/0x80 [ 324.701788] ? __dev_open+0x3dd/0x870 [ 324.701793] __might_resched.cold+0x1ef/0x23d <..> [ 324.701818] __mutex_lock+0x113/0x1b80 <..> [ 324.701917] idpf_ctlq_clean_sq+0xad/0x4b0 [idpf] [ 324.701935] ? kasan_save_track+0x14/0x30 [ 324.701941] idpf_mb_clean+0x143/0x380 [idpf] <..> [ 324.701991] idpf_send_mb_msg+0x111/0x720 [idpf] [ 324.702009] idpf_vc_xn_exec+0x4cc/0x990 [idpf] [ 324.702021] ? rcu_is_watching+0x12/0xc0 [ 324.702035] idpf_add_del_mac_filters+0x3ed/0xb50 [idpf] <..> [ 324.702122] __hw_addr_sync_dev+0x1cf/0x300 [ 324.702126] ? find_held_lock+0x32/0x90 [ 324.702134] idpf_set_rx_mode+0x317/0x390 [idpf] [ 324.702152] __dev_open+0x3f8/0x870 [ 324.702159] ? __pfx___dev_open+0x10/0x10 [ 324.702174] __dev_change_flags+0x443/0x650 <..> [ 324.702208] netif_change_flags+0x80/0x160 [ 324.702218] do_setlink.isra.0+0x16a0/0x3960 <..> [ 324.702349] rtnl_newlink+0x12fd/0x21e0 The sequence is as follows: rtnl_newlink()-> __dev_change_flags()-> __dev_open()-> dev_set_rx_mode() - > # disables BH and grabs "dev->addr_list_lock" idpf_set_rx_mode() -> # proceed only if VIRTCHNL2_CAP_MACFILTER is ON __dev_uc_sync() -> idpf_add_mac_filter -> idpf_add_del_mac_filters -> idpf_send_mb_msg() -> idpf_mb_clean() -> idpf_ctlq_clean_sq() # mutex_lock(cq_lock) Fix by converting cq_lock to a spinlock. All operations under the new lock are safe except freeing the DMA memory, which may use vunmap(). Fix by requesting a contiguous physical memory for the DMA mapping. Fixes: `a251eee621` ("idpf: add SRIOV support and other ndo_ops") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2025-07-01 08:25:01 -07:00
Michal Swiatkowski	f77bf1ebf8	idpf: return 0 size for RSS key if not supported Returning -EOPNOTSUPP from function returning u32 is leading to cast and invalid size value as a result. -EOPNOTSUPP as a size probably will lead to allocation fail. Command: ethtool -x eth0 It is visible on all devices that don't have RSS caps set. [ 136.615917] Call Trace: [ 136.615921] <TASK> [ 136.615927] ? __warn+0x89/0x130 [ 136.615942] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.615953] ? report_bug+0x164/0x190 [ 136.615968] ? handle_bug+0x58/0x90 [ 136.615979] ? exc_invalid_op+0x17/0x70 [ 136.615987] ? asm_exc_invalid_op+0x1a/0x20 [ 136.616001] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616016] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.616028] __alloc_pages_noprof+0xe/0x20 [ 136.616038] ___kmalloc_large_node+0x80/0x110 [ 136.616072] __kmalloc_large_node_noprof+0x1d/0xa0 [ 136.616081] __kmalloc_noprof+0x32c/0x4c0 [ 136.616098] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616105] rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616114] ethnl_default_doit+0x107/0x3d0 [ 136.616131] genl_family_rcv_msg_doit+0x100/0x160 [ 136.616147] genl_rcv_msg+0x1b8/0x2c0 [ 136.616156] ? __pfx_ethnl_default_doit+0x10/0x10 [ 136.616168] ? __pfx_genl_rcv_msg+0x10/0x10 [ 136.616176] netlink_rcv_skb+0x58/0x110 [ 136.616186] genl_rcv+0x28/0x40 [ 136.616195] netlink_unicast+0x19b/0x290 [ 136.616206] netlink_sendmsg+0x222/0x490 [ 136.616215] __sys_sendto+0x1fd/0x210 [ 136.616233] __x64_sys_sendto+0x24/0x30 [ 136.616242] do_syscall_64+0x82/0x160 [ 136.616252] ? __sys_recvmsg+0x83/0xe0 [ 136.616265] ? syscall_exit_to_user_mode+0x10/0x210 [ 136.616275] ? do_syscall_64+0x8e/0x160 [ 136.616282] ? __count_memcg_events+0xa1/0x130 [ 136.616295] ? count_memcg_events.constprop.0+0x1a/0x30 [ 136.616306] ? handle_mm_fault+0xae/0x2d0 [ 136.616319] ? do_user_addr_fault+0x379/0x670 [ 136.616328] ? clear_bhb_loop+0x45/0xa0 [ 136.616340] ? clear_bhb_loop+0x45/0xa0 [ 136.616349] ? clear_bhb_loop+0x45/0xa0 [ 136.616359] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 136.616369] RIP: 0033:0x7fd30ba7b047 [ 136.616376] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d bd d5 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 c3 55 48 83 ec 30 44 89 4c 24 2c 4c 89 44 [ 136.616381] RSP: 002b:00007ffde1796d68 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 136.616388] RAX: ffffffffffffffda RBX: 000055d7bd89f2a0 RCX: 00007fd30ba7b047 [ 136.616392] RDX: 0000000000000028 RSI: 000055d7bd89f3b0 RDI: 0000000000000003 [ 136.616396] RBP: 00007ffde1796e10 R08: 00007fd30bb4e200 R09: 000000000000000c [ 136.616399] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7bd89f340 [ 136.616403] R13: 000055d7bd89f3b0 R14: 000055d78943f200 R15: 0000000000000000 Fixes: `02cbfba1ad` ("idpf: add ethtool callbacks") Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2025-07-01 08:23:52 -07:00
RubenKelevra	21deb2d966	net: ieee8021q: fix insufficient table-size assertion _Static_assert(ARRAY_SIZE(map) != IEEE8021Q_TT_MAX - 1) rejects only a length of 7 and allows any other mismatch. Replace it with a strict equality test via a helper macro so that every mapping table must have exactly IEEE8021Q_TT_MAX (8) entries. Signed-off-by: RubenKelevra <rubenkelevra@gmail.com> Link: https://patch.msgid.link/20250626205907.1566384-1-rubenkelevra@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-01 12:55:49 +02:00
Jakub Kicinski	8b79380dfe	docs: fbnic: explain the ring config fbnic takes 4 parameters to configure the Rx queues. The semantics are similar to other existing NICs but confusing to newcomers. Document it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250626191554.32343-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-01 12:48:28 +02:00
Oleksij Rempel	c22f056e49	net: usb: lan78xx: fix possible NULL pointer dereference in lan78xx_phy_init() If no PHY device is found (e.g., for LAN7801 in fixed-link mode), lan78xx_phy_init() may proceed to dereference a NULL phydev pointer, leading to a crash. Update the logic to perform MAC configuration first, then check for the presence of a PHY. For the fixed-link case, set up the fixed link and return early, bypassing any code that assumes a valid phydev pointer. It is safe to move lan78xx_mac_prepare_for_phy() earlier because this function only uses information from dev->interface, which is configured by lan78xx_get_phy() beforehand. The function does not access phydev or any data set up by later steps. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: `e110bc8258` ("net: usb: lan78xx: Convert to PHYLINK for improved PHY and MAC management") Link: https://patch.msgid.link/20250626103731.3986545-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-01 12:42:51 +02:00
Paolo Abeni	8f24003079	Merge branch 'clean-up-usage-of-ffi-types' Tamir Duberstein says: ==================== Clean up usage of ffi types Remove qualification of ffi types which are included in the prelude and change `as` casts to target the proper ffi type alias rather than the underlying primitive. Signed-off-by: Tamir Duberstein <tamird@gmail.com> ==================== Link: https://patch.msgid.link/20250625-correct-type-cast-v2-0-6f2c29729e69@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-01 09:49:22 +02:00
Tamir Duberstein	c9a7bcd2c0	Cast to the proper type Use the ffi type rather than the resolved underlying type. Acked-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Signed-off-by: Tamir Duberstein <tamird@gmail.com> Link: https://patch.msgid.link/20250625-correct-type-cast-v2-2-6f2c29729e69@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-01 09:49:18 +02:00
Tamir Duberstein	22955d942f	Use unqualified references to ffi types Remove unnecessary qualifications; `kernel::ffi::*` is included in `kernel::prelude`. Signed-off-by: Tamir Duberstein <tamird@gmail.com> Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20250625-correct-type-cast-v2-1-6f2c29729e69@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-01 09:49:18 +02:00
Jakub Kicinski	72fb83735c	Merge tag 'for-net-2025-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - MGMT: set_mesh: update LE scan interval and window - MGMT: mesh_send: check instances prior disabling advertising - hci_sync: revert some mesh modifications - hci_sync: Set extended advertising data synchronously - hci_sync: Prevent unintended pause by checking if advertising is active * tag 'for-net-2025-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: HCI: Set extended advertising data synchronously Bluetooth: MGMT: mesh_send: check instances prior disabling advertising Bluetooth: MGMT: set_mesh: update LE scan interval and window Bluetooth: hci_sync: revert some mesh modifications Bluetooth: Prevent unintended pause by checking if advertising is active ==================== Link: https://patch.msgid.link/20250627181601.520435-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:58:47 -07:00
Eric Dumazet	aed4969f2b	net: net->nsid_lock does not need BH safety At the time of commit `bc51dddf98` ("netns: avoid disabling irq for netns id") peernet2id() was not yet using RCU. Commit `2dce224f46` ("netns: protect netns ID lookups with RCU") changed peernet2id() to no longer acquire net->nsid_lock (potentially from BH context). We do not need to block soft interrupts when acquiring net->nsid_lock anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Guillaume Nault <gnault@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250627163242.230866-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:58:20 -07:00
Lukas Bulwahn	3b2c45cb1b	MAINTAINERS: adjust file entry after renaming rzv2h-gbeth dtb Commit `d53320aeef` ("dt-bindings: net: Rename renesas,r9a09g057-gbeth.yaml") renames the net devicetree binding renesas,r9a09g057-gbeth.yaml to renesas,rzv2h-gbeth.yaml, but misses to adjust the file entry in the RENESAS RZ/V2H(P) DWMAC GBETH GLUE LAYER DRIVER section in MAINTAINERS. Adjust the file entry after this file renaming. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20250627134453.51780-1-lukas.bulwahn@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:32:37 -07:00
Oleksij Rempel	6c7ffc9af7	net: usb: lan78xx: fix WARN in __netif_napi_del_locked on disconnect Remove redundant netif_napi_del() call from disconnect path. A WARN may be triggered in __netif_napi_del_locked() during USB device disconnect: WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350 This happens because netif_napi_del() is called in the disconnect path while NAPI is still enabled. However, it is not necessary to call netif_napi_del() explicitly, since unregister_netdev() will handle NAPI teardown automatically and safely. Removing the redundant call avoids triggering the warning. Full trace: lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV lan78xx 1-1:1.0 enu1: Link is Down lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350 Modules linked in: flexcan can_dev fuse CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT Hardware name: SKOV IMX8MP CPU revC - bd500 (DT) Workqueue: usb_hub_wq hub_event pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __netif_napi_del_locked+0x2b4/0x350 lr : __netif_napi_del_locked+0x7c/0x350 sp : ffffffc085b673c0 x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8 x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000 x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000 x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028 x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8 x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000 x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001 x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000 Call trace: __netif_napi_del_locked+0x2b4/0x350 (P) lan78xx_disconnect+0xf4/0x360 usb_unbind_interface+0x158/0x718 device_remove+0x100/0x150 device_release_driver_internal+0x308/0x478 device_release_driver+0x1c/0x30 bus_remove_device+0x1a8/0x368 device_del+0x2e0/0x7b0 usb_disable_device+0x244/0x540 usb_disconnect+0x220/0x758 hub_event+0x105c/0x35e0 process_one_work+0x760/0x17b0 worker_thread+0x768/0xce8 kthread+0x3bc/0x690 ret_from_fork+0x10/0x20 irq event stamp: 211604 hardirqs last enabled at (211603): [<ffffffc0828cc9ec>] _raw_spin_unlock_irqrestore+0x84/0x98 hardirqs last disabled at (211604): [<ffffffc0828a9a84>] el1_dbg+0x24/0x80 softirqs last enabled at (211296): [<ffffffc080095f10>] handle_softirqs+0x820/0xbc8 softirqs last disabled at (210993): [<ffffffc080010288>] __do_softirq+0x18/0x20 ---[ end trace 0000000000000000 ]--- lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0 Fixes: `ec4c7e1239` ("lan78xx: Introduce NAPI polling support") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250627051346.276029-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:26:22 -07:00
Jakub Kicinski	7878e21e40	Merge branch 'net-enetc-change-some-statistics-to-64-bit' Wei Fang says: ==================== net: enetc: change some statistics to 64-bit The port MAC counters of ENETC are 64-bit registers and the statistics of ethtool are also u64 type, so add enetc_port_rd64() helper function to read 64-bit statistics from these registers, and also change the statistics of ring to unsigned long type to be consistent with the statistics type in struct net_device_stats. v1: https://lore.kernel.org/20250620102140.2020008-1-wei.fang@nxp.com v2: https://lore.kernel.org/20250624101548.2669522-1-wei.fang@nxp.com ==================== Link: https://patch.msgid.link/20250627021108.3359642-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:23:58 -07:00
Wei Fang	4c7ef31984	net: enetc: read 64-bit statistics from port MAC counters The counters of port MAC are all 64-bit registers, and the statistics of ethtool are u64 type, so replace enetc_port_rd() with enetc_port_rd64() to read 64-bit statistics. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250627021108.3359642-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:23:55 -07:00
Wei Fang	9fe5f7145a	net: enetc: separate 64-bit counters from enetc_port_counters Some counters in enetc_port_counters are 32-bit registers, and some are 64-bit registers. But in the current driver, they are all read through enetc_port_rd(), which can only read a 32-bit value. Therefore, separate 64-bit counters (enetc_pm_counters) from enetc_port_counters and use enetc_port_rd64() to read the 64-bit statistics. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250627021108.3359642-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:23:54 -07:00
Wei Fang	f5ed33771b	net: enetc: change the statistics of ring to unsigned long type The statistics of the ring are all unsigned int type, so the statistics will overflow quickly under heavy traffic. In addition, the statistics of struct net_device_stats are obtained from struct enetc_ring_stats, but the statistics of net_device_stats are unsigned long type. So it is better to keep the statistics types consistent in these two structures. Considering these two factors, and the fact that both LS1028A and i.MX95 are arm64 architecture, the statistics of enetc_ring_stats are changed to unsigned long type. Note that unsigned int and unsigned long are the same thing on some systems, and on such systems there is no overflow advantage of one over the other. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250627021108.3359642-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:23:54 -07:00
Jonas Rebmann	b7ad21258f	net: fec: allow disable coalescing In the current implementation, IP coalescing is always enabled and cannot be disabled. As setting maximum frames to 0 or 1, or setting delay to zero implies immediate delivery of single packets/IRQs, disable coalescing in hardware in these cases. This also guarantees that coalescing is never enabled with ICFT or ICTT set to zero, a configuration that could lead to unpredictable behaviour according to i.MX8MP reference manual. Signed-off-by: Jonas Rebmann <jre@pengutronix.de> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250626-fec_deactivate_coalescing-v2-1-0b217f2e80da@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:22:32 -07:00
Jiawen Wu	e39ed71c7a	net: txgbe: fix the issue of TX failure There is a occasional problem that ping is failed between AML devices. That is because the manual enablement of the security Tx path on the hardware is missing, no matter what its previous state was. Fixes: `6f8b4c01a8` ("net: txgbe: Implement PHYLINK for AML 25G/10G devices") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/5BDFB14C57D1C42A+20250626085153.86122-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:15:53 -07:00
Jakub Kicinski	b28fe7f20a	Merge branch 'add-support-for-externally-validated-neighbor-entries' Ido Schimmel says: ==================== Add support for externally validated neighbor entries Patch #1 adds a new neighbor flag ("extern_valid") that prevents the kernel from invalidating or removing a neighbor entry, while allowing the kernel to notify user space when the entry becomes reachable. See motivation and implementation details in the commit message. Patch #2 adds a selftest. v1: https://lore.kernel.org/20250611141551.462569-1-idosch@nvidia.com ==================== Link: https://patch.msgid.link/20250626073111.244534-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:14:26 -07:00
Ido Schimmel	171f2ee31a	selftests: net: Add a selftest for externally validated neighbor entries Add test cases for externally validated neighbor entries, testing both IPv4 and IPv6. Name the file "test_neigh.sh" so that it could be possibly extended in the future with more neighbor test cases. Example output: # ./test_neigh.sh TEST: IPv4 "extern_valid" flag: Add entry [ OK ] TEST: IPv4 "extern_valid" flag: Add with an invalid state [ OK ] TEST: IPv4 "extern_valid" flag: Add with "use" flag [ OK ] TEST: IPv4 "extern_valid" flag: Replace entry [ OK ] TEST: IPv4 "extern_valid" flag: Replace entry with "managed" flag [ OK ] TEST: IPv4 "extern_valid" flag: Replace with an invalid state [ OK ] TEST: IPv4 "extern_valid" flag: Interface down [ OK ] TEST: IPv4 "extern_valid" flag: Carrier down [ OK ] TEST: IPv4 "extern_valid" flag: Transition to "reachable" state [ OK ] TEST: IPv4 "extern_valid" flag: Transition back to "stale" state [ OK ] TEST: IPv4 "extern_valid" flag: Forced garbage collection [ OK ] TEST: IPv4 "extern_valid" flag: Periodic garbage collection [ OK ] TEST: IPv6 "extern_valid" flag: Add entry [ OK ] TEST: IPv6 "extern_valid" flag: Add with an invalid state [ OK ] TEST: IPv6 "extern_valid" flag: Add with "use" flag [ OK ] TEST: IPv6 "extern_valid" flag: Replace entry [ OK ] TEST: IPv6 "extern_valid" flag: Replace entry with "managed" flag [ OK ] TEST: IPv6 "extern_valid" flag: Replace with an invalid state [ OK ] TEST: IPv6 "extern_valid" flag: Interface down [ OK ] TEST: IPv6 "extern_valid" flag: Carrier down [ OK ] TEST: IPv6 "extern_valid" flag: Transition to "reachable" state [ OK ] TEST: IPv6 "extern_valid" flag: Transition back to "stale" state [ OK ] TEST: IPv6 "extern_valid" flag: Forced garbage collection [ OK ] TEST: IPv6 "extern_valid" flag: Periodic garbage collection [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250626073111.244534-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:14:24 -07:00
Ido Schimmel	03dc03fa04	neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries tl;dr ===== Add a new neighbor flag ("extern_valid") that can be used to indicate to the kernel that a neighbor entry was learned and determined to be valid externally. The kernel will not try to remove or invalidate such an entry, leaving these decisions to the user space control plane. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed. Background ========== In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches (VTEPs). VTEPs that are connected to the same ES are called ES peers. When a neighbor entry is learned on a VTEP, it is distributed to both ES peers and remote VTEPs using EVPN MAC/IP advertisement routes. ES peers use the neighbor entry when routing traffic towards the multi-homed host and remote VTEPs use it for ARP/NS suppression. Motivation ========== If the ES link between a host and the VTEP on which the neighbor entry was locally learned goes down, the EVPN MAC/IP advertisement route will be withdrawn and the neighbor entries will be removed from both ES peers and remote VTEPs. Routing towards the multi-homed host and ARP/NS suppression can fail until another ES peer locally learns the neighbor entry and distributes it via an EVPN MAC/IP advertisement route. "draft-rbickhart-evpn-ip-mac-proxy-adv-03" [1] suggests avoiding these intermittent failures by having the ES peers install the neighbor entries as before, but also injecting EVPN MAC/IP advertisement routes with a proxy indication. When the previously mentioned ES link goes down and the original EVPN MAC/IP advertisement route is withdrawn, the ES peers will not withdraw their neighbor entries, but instead start aging timers for the proxy indication. If an ES peer locally learns the neighbor entry (i.e., it becomes "reachable"), it will restart its aging timer for the entry and emit an EVPN MAC/IP advertisement route without a proxy indication. An ES peer will stop its aging timer for the proxy indication if it observes the removal of the proxy indication from at least one of the ES peers advertising the entry. In the event that the aging timer for the proxy indication expired, an ES peer will withdraw its EVPN MAC/IP advertisement route. If the timer expired on all ES peers and they all withdrew their proxy advertisements, the neighbor entry will be completely removed from the EVPN fabric. Implementation ============== In the above scheme, when the control plane (e.g., FRR) advertises a neighbor entry with a proxy indication, it expects the corresponding entry in the data plane (i.e., the kernel) to remain valid and not be removed due to garbage collection or loss of carrier. The control plane also expects the kernel to notify it if the entry was learned locally (i.e., became "reachable") so that it will remove the proxy indication from the EVPN MAC/IP advertisement route. That is why these entries cannot be programmed with dummy states such as "permanent" or "noarp". Instead, add a new neighbor flag ("extern_valid") which indicates that the entry was learned and determined to be valid externally and should not be removed or invalidated by the kernel. The kernel can probe the entry and notify user space when it becomes "reachable" (it is initially installed as "stale"). However, if the kernel does not receive a confirmation, have it return the entry to the "stale" state instead of the "failed" state. In other words, an entry marked with the "extern_valid" flag behaves like any other dynamically learned entry other than the fact that the kernel cannot remove or invalidate it. One can argue that the "extern_valid" flag should not prevent garbage collection and that instead a neighbor entry should be programmed with both the "extern_valid" and "extern_learn" flags. There are two reasons for not doing that: 1. Unclear why a control plane would like to program an entry that the kernel cannot invalidate but can completely remove. 2. The "extern_learn" flag is used by FRR for neighbor entries learned on remote VTEPs (for ARP/NS suppression) whereas here we are concerned with local entries. This distinction is currently irrelevant for the kernel, but might be relevant in the future. Given that the flag only makes sense when the neighbor has a valid state, reject attempts to add a neighbor with an invalid state and with this flag set. For example: # ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid Error: Cannot create externally validated neighbor with an invalid state. # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 nud failed dev br0.10 extern_valid Error: Cannot mark neighbor as externally validated with an invalid state. The above means that a neighbor cannot be created with the "extern_valid" flag and flags such as "use" or "managed" as they result in a neighbor being created with an invalid state ("none") and immediately getting probed: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use Error: Cannot create externally validated neighbor with an invalid state. However, these flags can be used together with "extern_valid" after the neighbor was created with a valid state: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use One consequence of preventing the kernel from invalidating a neighbor entry is that by default it will only try to determine reachability using unicast probes. This can be changed using the "mcast_resolicit" sysctl: # sysctl net.ipv4.neigh.br0/10.mcast_resolicit 0 # tcpdump -nn -e -i br0.10 -Q out arp & # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 # sysctl -wq net.ipv4.neigh.br0/10.mcast_resolicit=3 # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 iproute2 patches can be found here [2]. [1] https://datatracker.ietf.org/doc/html/draft-rbickhart-evpn-ip-mac-proxy-adv-03 [2] https://github.com/idosch/iproute2/tree/submit/extern_valid_v1 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250626073111.244534-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:14:23 -07:00
Eric Dumazet	af232e7615	ipv6: guard ip6_mr_output() with rcu syzbot found at least one path leads to an ip_mr_output() without RCU being held. Add guard(rcu)() to fix this in a concise way. WARNING: net/ipv6/ip6mr.c:2376 at ip6_mr_output+0xe0b/0x1040 net/ipv6/ip6mr.c:2376, CPU#1: kworker/1:2/121 Call Trace: <TASK> ip6tunnel_xmit include/net/ip6_tunnel.h:162 [inline] udp_tunnel6_xmit_skb+0x640/0xad0 net/ipv6/ip6_udp_tunnel.c:112 send6+0x5ac/0x8d0 drivers/net/wireguard/socket.c:152 wg_socket_send_skb_to_peer+0x111/0x1d0 drivers/net/wireguard/socket.c:178 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline] wg_packet_tx_worker+0x1c8/0x7c0 drivers/net/wireguard/send.c:276 process_one_work kernel/workqueue.c:3239 [inline] process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3322 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3403 kthread+0x70e/0x8a0 kernel/kthread.c:464 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Fixes: `96e8f5a9fe` ("net: ipv6: Add ip6_mr_output()") Reported-by: syzbot+0141c834e47059395621@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/685e86b3.a00a0220.129264.0003.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Benjamin Poirier <bpoirier@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250627115822.3741390-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 17:57:29 -07:00
Linus Torvalds	66701750d5	Merge tag 'io_uring-6.16-20250630' of git://git.kernel.dk/linux Pull io_uring fix from Jens Axboe: "Now that anonymous inodes set S_IFREG, this breaks the io_uring read/write retries for short reads/writes. As things like timerfd and eventfd are anon inodes, applications that previously did: unsigned long event_data[2]; io_uring_prep_read(sqe, evfd, event_data, sizeof(event_data), 0); and just got a short read when 1 event was posted, will now wait for the full amount before posting a completion. This caused issues for the ghostty application, making it basically unusable due to excessive buffering" * tag 'io_uring-6.16-20250630' of git://git.kernel.dk/linux: io_uring: gate REQ_F_ISREG on !S_ANON_INODE as well	2025-06-30 16:32:43 -07:00
Mark Bloch	60f7f4afaf	MAINTAINERS: Add myself as mlx5 core and mlx5e co-maintainer I have been working on mlx5 related code for several years, contributing features, code reviews, and occasional maintainer tasks when needed. This patch makes my maintainer role official. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20250627014252.1262592-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 08:43:19 -07:00

1 2 3 4 5 ...

1369260 Commits