linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 05:31:37 -04:00

Author	SHA1	Message	Date
Weiming Shi	2091c6aa0d	openvswitch: cap upcall PID array size and pre-size vport replies The vport netlink reply helpers allocate a fixed-size skb with nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID array via ovs_vport_get_upcall_portids(). Since ovs_vport_set_upcall_portids() accepts any non-zero multiple of sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID array large enough to overflow the reply buffer, causing nla_put() to fail with -EMSGSIZE and hitting BUG_ON(err < 0). On systems with unprivileged user namespaces enabled (e.g., Ubuntu default), this is reachable via unshare -Urn since OVS vport mutation operations use GENL_UNS_ADMIN_PERM. kernel BUG at net/openvswitch/datapath.c:2414! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 1 UID: 0 PID: 65 Comm: poc Not tainted 7.0.0-rc7-00195-geb216e422044 #1 RIP: 0010:ovs_vport_cmd_set+0x34c/0x400 Call Trace: <TASK> genl_family_rcv_msg_doit (net/netlink/genetlink.c:1116) genl_rcv_msg (net/netlink/genetlink.c:1194) netlink_rcv_skb (net/netlink/af_netlink.c:2550) genl_rcv (net/netlink/genetlink.c:1219) netlink_unicast (net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) __sys_sendto (net/socket.c:2206) __x64_sys_sendto (net/socket.c:2209) do_syscall_64 (arch/x86/entry/syscall_64.c:63) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> Kernel panic - not syncing: Fatal exception Reject attempts to set more PIDs than nr_cpu_ids in ovs_vport_set_upcall_portids(), and pre-compute the worst-case reply size in ovs_vport_cmd_msg_size() based on that bound, similar to the existing ovs_dp_cmd_msg_size(). nr_cpu_ids matches the cap already used by the per-CPU dispatch configuration on the datapath side (ovs_dp_cmd_fill_info() serialises at most nr_cpu_ids PIDs), so the two sides stay consistent. 
Fixes: `5cd667b0a4` ("openvswitch: Allow each vport to have an array of 'port_id's.") Reported-by: Xiang Mei <xmei5@asu.edu> Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Ilya Maximets <i.maximets@ovn.org> Link: https://patch.msgid.link/20260416024653.153456-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-20 11:43:04 -07:00
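The bound described in the openvswitch commit above can be modelled outside the kernel. This is a minimal Python sketch, not the kernel code: `validate_upcall_portids`, `worst_case_reply_size` and the `base` parameter are hypothetical names, the returned errno is an assumption (the log only says the request is rejected), and the attribute sizing is simplified to a 4-byte header with 4-byte alignment.

```python
import errno

U32_SIZE = 4      # sizeof(u32)
NLA_HDRLEN = 4    # netlink attribute header length
NLA_ALIGNTO = 4   # netlink attribute alignment


def nla_total_size(payload: int) -> int:
    """Size of one attribute: header plus payload, padded to 4 bytes."""
    return (NLA_HDRLEN + payload + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def validate_upcall_portids(attr_len: int, nr_cpu_ids: int) -> int:
    """Return 0 if the PID array is accepted, else a negative errno.

    Hypothetical model of the rule in the commit message: the array must be
    a non-zero multiple of sizeof(u32) and hold at most nr_cpu_ids entries.
    """
    if attr_len == 0 or attr_len % U32_SIZE != 0:
        return -errno.EINVAL  # assumption: errno not stated in the log
    if attr_len // U32_SIZE > nr_cpu_ids:
        return -errno.EINVAL  # assumption: errno not stated in the log
    return 0


def worst_case_reply_size(base: int, nr_cpu_ids: int) -> int:
    """Fixed overhead (hypothetical `base`) plus the largest allowed PID array."""
    return base + nla_total_size(nr_cpu_ids * U32_SIZE)
```

Because the accepted array can never exceed `nr_cpu_ids` entries, a reply buffer sized with `worst_case_reply_size()` is always large enough in this model, which is the property the commit relies on.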
Prathamesh Deshpande	d03fc81a57	net/mlx5: Fix HCA caps leak on notifier init failure mlx5_mdev_init() allocates HCA caps via mlx5_hca_caps_alloc() before calling mlx5_notifiers_init(). If notifier initialization fails, the error path jumps to err_hca_caps and skips mlx5_hca_caps_free(), leaking allocated caps. Add a dedicated unwind label for notifier-init failure that frees HCA caps before continuing the existing cleanup sequence. Fixes: `b6b03097f9` ("net/mlx5: Initialize events outside devlink lock") Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260415005022.34764-1-prathameshdeshpande7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-20 11:42:30 -07:00
Qingfang Deng	cc1ff87bce	pppoe: drop PFC frames RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT RECOMMENDED for PPPoE. In practice, pppd does not support negotiating PFC for PPPoE sessions, and the current PPPoE driver assumes an uncompressed (2-byte) protocol field. However, the generic PPP layer function ppp_input() is not aware of the negotiation result, and still accepts PFC frames. If a peer with a broken implementation or an attacker sends a frame with a compressed (1-byte) protocol field, the subsequent PPP payload is shifted by one byte. This causes the network header to be 4-byte misaligned, which may trigger unaligned access exceptions on some architectures. To reduce the attack surface, drop PPPoE PFC frames. Introduce ppp_skb_is_compressed_proto() helper function to be used in both ppp_generic.c and pppoe.c to avoid open-coding. Fixes: `7fb1b8ca8f` ("ppp: Move PFC decompression to PPP generic layer") Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260415022456.141758-2-qingfang.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-20 11:35:17 -07:00
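The one-byte shift described in the PFC commit above can be illustrated with a small Python sketch. It is not the kernel helper: `is_compressed_proto` and `network_header_offset` are hypothetical names, and the 2-byte pad in front of the Ethernet header is an assumption about how the receive buffer is laid out. The detection rule itself comes from the PPP protocol numbering, where the first octet of an uncompressed protocol field is even and the last octet is odd.

```python
ETH_HLEN = 14      # Ethernet header length
PPPOE_HLEN = 6     # PPPoE session header length
IP_ALIGN_PAD = 2   # assumption: buffer is padded so an IP header after
                   # a plain Ethernet header would be 4-byte aligned


def is_compressed_proto(ppp_payload: bytes) -> bool:
    """True if the PPP protocol field was sent as a single (odd) byte."""
    return bool(ppp_payload[0] & 0x01)


def network_header_offset(ppp_payload: bytes) -> int:
    """Offset of the network header from the start of the padded buffer."""
    proto_len = 1 if is_compressed_proto(ppp_payload) else 2
    return IP_ALIGN_PAD + ETH_HLEN + PPPOE_HLEN + proto_len


# PPP protocol 0x0021 (IPv4), followed by the first byte of an IPv4 header.
uncompressed = bytes([0x00, 0x21, 0x45])
compressed = bytes([0x21, 0x45])

for frame in (uncompressed, compressed):
    offset = network_header_offset(frame)
    print(is_compressed_proto(frame), offset, offset % 4)
```

Under these assumptions the uncompressed frame puts the network header at a multiple of 4, while the compressed frame leaves it one byte short, which is the misalignment the commit avoids by dropping such frames.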
Qingfang Deng	d6c19b31a3	flow_dissector: do not dissect PPPoE PFC frames RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT RECOMMENDED for PPPoE. In practice, pppd does not support negotiating PFC for PPPoE sessions, and the flow dissector driver has assumed an uncompressed frame until the blamed commit. During the review process of that commit [1], support for PFC is suggested. However, having a compressed (1-byte) protocol field means the subsequent PPP payload is shifted by one byte, causing 4-byte misalignment for the network header and an unaligned access exception on some architectures. The exception can be reproduced by sending a PPPoE PFC frame to an ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE session is active on that interface: $ 0 : 00000000 80c40000 00000000 85144817 $ 4 : 00000008 00000100 80a75758 81dc9bb8 $ 8 : 00000010 8087ae2c 0000003d 00000000 $12 : 000000e0 00000039 00000000 00000000 $16 : 85043240 80a75758 81dc9bb8 00006488 $20 : 0000002f 00000007 85144810 80a70000 $24 : 81d1bda0 00000000 $28 : 81dc8000 81dc9aa8 00000000 805ead08 Hi : 00009d51 Lo : 2163358a epc : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50 ra : 805ead08 __skb_get_hash_net+0x74/0x12c Status: 11000403 KERNEL EXL IE Cause : 40800010 (ExcCode 04) BadVA : 85144817 PrId : 0001992f (MIPS 1004Kc) Call Trace: [<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50 [<805ead08>] __skb_get_hash_net+0x74/0x12c [<805ef330>] get_rps_cpu+0x1b8/0x3fc [<805fca70>] netif_receive_skb_list_internal+0x324/0x364 [<805fd120>] napi_complete_done+0x68/0x2a4 [<8058de5c>] mtk_napi_rx+0x228/0xfec [<805fd398>] __napi_poll+0x3c/0x1c4 [<805fd754>] napi_threaded_poll_loop+0x234/0x29c [<805fd848>] napi_threaded_poll+0x8c/0xb0 [<80053544>] kthread+0x104/0x12c [<80002bd8>] ret_from_kernel_thread+0x14/0x1c Code: 02d51821 1060045b 00000000 <8c640000> 3084000f 2c820005 144001a2 00042080 8e220000 To reduce the attack surface and maintain performance, do not process PPPoE PFC frames. 
[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home Fixes: `46126db9c8` ("flow_dissector: Add PPPoE dissectors") Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev> Link: https://patch.msgid.link/20260415022456.141758-1-qingfang.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-20 11:35:16 -07:00
Michael Bommarito	0cf004ffb6	sctp: fix OOB write to userspace in sctp_getsockopt_peer_auth_chunks sctp_getsockopt_peer_auth_chunks() checks that the caller's optval buffer is large enough for the peer AUTH chunk list with if (len < num_chunks) return -EINVAL; but then writes num_chunks bytes to p->gauth_chunks, which lives at offset offsetof(struct sctp_authchunks, gauth_chunks) == 8 inside optval. The check is missing the sizeof(struct sctp_authchunks) = 8-byte header. When the caller supplies len == num_chunks (for any num_chunks > 0) the test passes but copy_to_user() writes sizeof(struct sctp_authchunks) = 8 bytes past the declared buffer. The sibling function sctp_getsockopt_local_auth_chunks() at the next line already has the correct check: if (len < sizeof(struct sctp_authchunks) + num_chunks) return -EINVAL; Align the peer variant with its sibling. Reproducer confirms on v7.0-13-generic: an unprivileged userspace caller that opens a loopback SCTP association with AUTH enabled, queries num_chunks with a short optval, then issues the real getsockopt with len == num_chunks and sentinel bytes painted past the buffer observes those sentinel bytes overwritten with the peer's AUTH chunk type. The bytes written are under the peer's control but land in the caller's own userspace; this is not a kernel memory corruption, but it is a kernel-side contract violation that can silently corrupt adjacent userspace data. Fixes: `65b07e5d0d` ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260416031903.1447072-1-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:16:14 -07:00
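The difference between the two length checks quoted in the SCTP commit above is easy to see with numbers. The following Python sketch models only those checks; the function names are hypothetical and the 8-byte header size is the value given in the commit message.

```python
import errno

AUTHCHUNKS_HDR = 8  # sizeof(struct sctp_authchunks), as quoted in the commit


def check_old(optlen: int, num_chunks: int) -> int:
    """Pre-fix check: forgets the header in front of gauth_chunks."""
    if optlen < num_chunks:
        return -errno.EINVAL
    return 0


def check_fixed(optlen: int, num_chunks: int) -> int:
    """Post-fix check: same form as the local_auth_chunks sibling."""
    if optlen < AUTHCHUNKS_HDR + num_chunks:
        return -errno.EINVAL
    return 0


def bytes_past_buffer(optlen: int, num_chunks: int) -> int:
    """How far a write of num_chunks bytes at offset 8 overruns optlen."""
    return max(0, AUTHCHUNKS_HDR + num_chunks - optlen)
```

For example, with `optlen == num_chunks == 3` the old check passes and `bytes_past_buffer(3, 3)` is 8, matching the 8-byte overrun described in the message, while the fixed check rejects the call.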
Marek Vasut	22230e68b2	net: ks8851: Avoid excess softirq scheduling The code injects a packet into netif_rx() repeatedly, which will add it to its internal NAPI and schedule a softirq, and process it. It is more efficient to queue multiple packets and process them all at the local_bh_enable() time. Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Fixes: `e0863634bf` ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs") Cc: stable@vger.kernel.org Signed-off-by: Marek Vasut <marex@nabladev.com> Link: https://patch.msgid.link/20260415231020.455298-2-marex@nabladev.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:14:19 -07:00
Marek Vasut	5c9fcac3c8	net: ks8851: Reinstate disabling of BHs around IRQ handler If the driver executes ks8851_irq() AND a TX packet has been sent, then the driver enables TX queue via netif_wake_queue() which schedules TX softirq to queue packets for this device. If CONFIG_PREEMPT_RT=y is set AND a packet has also been received by the MAC, then ks8851_rx_pkts() calls netdev_alloc_skb_ip_align() to allocate SKBs for the received packets. If netdev_alloc_skb_ip_align() is called with BH enabled, then local_bh_enable() at the end of netdev_alloc_skb_ip_align() will trigger the pending softirq processing, which may ultimately call the .xmit callback ks8851_start_xmit_par(). The ks8851_start_xmit_par() will try to lock struct ks8851_net_par .lock spinlock, which is already locked by ks8851_irq() from which ks8851_start_xmit_par() was called. This leads to a deadlock, which is reported by the kernel, including a trace listed below. If CONFIG_PREEMPT_RT is not set, then since commit `0913ec336a` ("net: ks8851: Fix deadlock with the SPI chip variant") the deadlock can also be triggered without received packet in the RX FIFO. The pending softirqs will be processed on return from spin_unlock_bh(&ks->statelock) in ks8851_irq(), which triggers the deadlock as well. Fix the problem by disabling BH around critical sections, including the IRQ handler, thus preventing the net_tx_action() softirq from triggering during these critical sections. The net_tx_action() softirq is triggered once BH are re-enabled and at the end of the IRQ handler, once all the other IRQ handler actions have been completed. 
__schedule from schedule_rtlock+0x1c/0x34 schedule_rtlock from rtlock_slowlock_locked+0x548/0x904 rtlock_slowlock_locked from rt_spin_lock+0x60/0x9c rt_spin_lock from ks8851_start_xmit_par+0x74/0x1a8 ks8851_start_xmit_par from netdev_start_xmit+0x20/0x44 netdev_start_xmit from dev_hard_start_xmit+0xd0/0x188 dev_hard_start_xmit from sch_direct_xmit+0xb8/0x25c sch_direct_xmit from __qdisc_run+0x1f8/0x4ec __qdisc_run from qdisc_run+0x1c/0x28 qdisc_run from net_tx_action+0x1f0/0x268 net_tx_action from handle_softirqs+0x1a4/0x270 handle_softirqs from __local_bh_enable_ip+0xcc/0xe0 __local_bh_enable_ip from __alloc_skb+0xd8/0x128 __alloc_skb from __netdev_alloc_skb+0x3c/0x19c __netdev_alloc_skb from ks8851_irq+0x388/0x4d4 ks8851_irq from irq_thread_fn+0x24/0x64 irq_thread_fn from irq_thread+0x178/0x28c irq_thread from kthread+0x12c/0x138 kthread from ret_from_fork+0x14/0x28 Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Fixes: `e0863634bf` ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs") Cc: stable@vger.kernel.org Signed-off-by: Marek Vasut <marex@nabladev.com> Link: https://patch.msgid.link/20260415231020.455298-1-marex@nabladev.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:14:19 -07:00
Kuniyuki Iwashima	965dc93481	af_unix: Drop all SCM attributes for SOCKMAP. SOCKMAP can hide inflight fd from AF_UNIX GC. When a socket in SOCKMAP receives skb with inflight fd, sk_psock_verdict_data_ready() looks up the mapped socket and enqueue skb to its psock->ingress_skb. Since neither the old nor the new GC can inspect the psock queue, the hidden skb leaks the inflight sockets. Note that this cannot be detected via kmemleak because inflight sockets are linked to a global list. In addition, SOCKMAP redirect breaks the Tarjan-based GC's assumption that unix_edge.successor is always alive, which is no longer true once skb is redirected, resulting in use-after-free below. [0] Moreover, SOCKMAP does not call scm_stat_del() properly, so unix_show_fdinfo() could report an incorrect fd count. sk_msg_recvmsg() does not support any SCM attributes in the first place. Let's drop all SCM attributes before passing skb to the SOCKMAP layer. [0]: BUG: KASAN: slab-use-after-free in unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251) Read of size 8 at addr ffff888125362670 by task kworker/56:1/496 CPU: 56 UID: 0 PID: 496 Comm: kworker/56:1 Not tainted 7.0.0-rc7-00263-gb9d8b856689d #3 PREEMPT(lazy) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 Workqueue: events sk_psock_backlog Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:122) print_report (mm/kasan/report.c:379) kasan_report (mm/kasan/report.c:597) unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251) unix_destroy_fpl (net/unix/garbage.c:317) unix_destruct_scm (./include/net/scm.h:80 ./include/net/scm.h:86 net/unix/af_unix.c:1976) sk_psock_backlog (./include/linux/skbuff.h:?) process_scheduled_works (kernel/workqueue.c:?) worker_thread (kernel/workqueue.c:?) 
kthread (kernel/kthread.c:438) ret_from_fork (arch/x86/kernel/process.c:164) ret_from_fork_asm (arch/x86/entry/entry_64.S:258) </TASK> Allocated by task 955: kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78) __kasan_slab_alloc (mm/kasan/common.c:369) kmem_cache_alloc_noprof (mm/slub.c:4539) sk_prot_alloc (net/core/sock.c:2240) sk_alloc (net/core/sock.c:2301) unix_create1 (net/unix/af_unix.c:1099) unix_create (net/unix/af_unix.c:1169) __sock_create (net/socket.c:1606) __sys_socketpair (net/socket.c:1811) __x64_sys_socketpair (net/socket.c:1863 net/socket.c:1860 net/socket.c:1860) do_syscall_64 (arch/x86/entry/syscall_64.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Freed by task 496: kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78) kasan_save_free_info (mm/kasan/generic.c:587) __kasan_slab_free (mm/kasan/common.c:287) kmem_cache_free (mm/slub.c:6165) __sk_destruct (net/core/sock.c:2282 net/core/sock.c:2384) sk_psock_destroy (./include/net/sock.h:?) process_scheduled_works (kernel/workqueue.c:?) worker_thread (kernel/workqueue.c:?) kthread (kernel/kthread.c:438) ret_from_fork (arch/x86/kernel/process.c:164) ret_from_fork_asm (arch/x86/entry/entry_64.S:258) Fixes: `c63829182c` ("af_unix: Implement ->psock_update_sk_prot()") Fixes: `77462de14a` ("af_unix: Add read_sock for stream socket types") Reported-by: Xingyu Jin <xingyuj@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260415184830.3988432-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:12:28 -07:00
KhaiWenTan	8cff9dbe89	net: stmmac: Update default_an_inband before passing value to phylink_config get_interfaces() updates both plat->phy_interfaces and mdio_bus_data->default_an_inband based on reading a SERDES register. Because get_interfaces() was called after default_an_inband had already been read, dwmac-intel regressed, with an incorrect default_an_inband value ending up in phylink_config. Therefore, call priv->plat->get_interfaces() first, before assigning priv->plat->default_an_inband to config->default_an_inband, so that default_an_inband holds the correct value. Fixes: `d3836052fe` ("net: stmmac: intel: convert speed_mode_2500() to get_interfaces()") Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260416102609.7953-1-khai.wen.tan@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:10:16 -07:00
Eric Dumazet	f996edd761	ipv6: fix possible UAF in icmpv6_rcv() Caching saddr and daddr before pskb_pull() is problematic since skb->head can change. Remove these temporary variables: - We only access &ipv6_hdr(skb)->saddr and &ipv6_hdr(skb)->daddr when net_dbg_ratelimited() is called in the slow path. - Avoid potential future misuse after pskb_pull() call. Fixes: `4b3418fba0` ("ipv6: icmp: include addresses in debug messages") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260416103505.2380753-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:09:52 -07:00
Jakub Kicinski	dcf6d5e629	Merge branch 'intel-wired-lan-driver-updates-2026-04-14-ice-i40e-iavf-idpf-e1000e' Jacob Keller says: ==================== Intel Wired LAN Driver Updates 2026-04-14 (ice, i40e, iavf, e1000e) Grzegorz updates the logic for adjusting the PTP hardware clock on E830, fixing a bug that prevented adjustments below S32_MAX/MIN nanoseconds. Grzegorz and Zoli update the PCS latency settings for E825 devices at 10GbE and 25GbE, improving the accuracy of timestamps based on data from production hardware. Michal Schmidt fixes a double-free that could happen if a particular error path is taken in ice_xmit_frame_ring(). Guangshuo fixes a double-free that could happen during error paths in the ice_sf_eth_activate() function. Paul Greenwalt fixes the PHY link configuration when the link-down-on-close driver parameter is enabled and new media is inserted. Paul Greenwalt fixes the ICE_AQ_LINK_SPEED_M macro for 200G, enabling 200G link speed advertisement. Keita Morisaki fixes a race condition in the ice Tx timestamp ring cleanup, preventing a possible NULL pointer dereference. Kohei Enju fixes a potential NULL pointer dereference in ice_set_ring_param(). Kohei Enju fixes i40e to stop advertising IFF_SUPP_NOFCS, when the driver does not actually support the feature. Petr fixes the VLAN L2TAG2 mask when the iAVF VF and a PF negotiate use of the legacy Rx descriptor format. Matt fixes the unrolling logic for PTP when the e1000e probe fails after the PTP clock has been registered. A note to stable backports The patches [7/12] ("ice: fix race condition in TX timestamp ring cleanup") and [8/12] ("ice: fix potential NULL pointer deref in error path of ice_set_ringparam()") must be backported together. Otherwise the fix in patch 8 will not work properly. ==================== Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-0-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:42 -07:00
Matt Vollrath	aa3f7fe409	e1000e: Unroll PTP in probe error handling If probe fails after registering the PTP clock and its delayed work, these resources must be released. This was not an issue until a 2016 fix moved the e1000e_ptp_init() call before the jump to err_register. Fixes: `aa524b66c5` ("e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl") Signed-off-by: Matt Vollrath <tactii@gmail.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-12-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:41 -07:00
Petr Oros	496d9f9106	iavf: fix wrong VLAN mask for legacy Rx descriptors L2TAG2 The IAVF_RXD_LEGACY_L2TAG2_M mask was incorrectly defined as GENMASK_ULL(63, 32), extracting 32 bits from qw2 instead of the 16-bit VLAN tag. In the legacy Rx descriptor layout, the 2nd L2TAG2 (VLAN tag) occupies bits 63:48 of qw2, not 63:32. The oversized mask causes FIELD_GET to return a 32-bit value where the actual VLAN tag sits in bits 31:16. When this value is passed to iavf_receive_skb() as a u16 parameter, it gets truncated to the lower 16 bits (which contain the 1st L2TAG2, typically zero). As a result, __vlan_hwaccel_put_tag() is never called and software VLAN interfaces on VFs receive no traffic. This affects VFs behind ice PF (VIRTCHNL VLAN v2) when the PF advertises VLAN stripping into L2TAG2_2 and legacy descriptors are used. The flex descriptor path already uses the correct mask (IAVF_RXD_FLEX_L2TAG2_2_M = GENMASK_ULL(63, 48)). Reproducer: 1. Create 2 VFs on ice PF (echo 2 > sriov_numvfs) 2. Disable spoofchk on both VFs 3. Move each VF into a separate network namespace 4. On each VF: create VLAN interface (e.g. vlan 198), assign IP, bring up 5. Set rx-vlan-offload OFF on both VFs 6. Ping between VLAN interfaces -> expect PASS (VLAN tag stays in packet data, kernel matches in-band) 7. Set rx-vlan-offload ON on both VFs 8. Ping between VLAN interfaces -> expect FAIL if bug present (HW strips VLAN tag into descriptor L2TAG2 field, wrong mask extracts bits 47:32 instead of 63:48, truncated to u16 -> zero, __vlan_hwaccel_put_tag() never called, packet delivered to parent interface, not VLAN interface) The reproducer requires legacy Rx descriptors. On modern ice + iavf with full PTP support, flex descriptors are always negotiated and the buggy legacy path is never reached. 
Flex descriptors require all of: - CONFIG_PTP_1588_CLOCK enabled - VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC granted by PF - PTP capabilities negotiated (VIRTCHNL_VF_CAP_PTP) - VIRTCHNL_1588_PTP_CAP_RX_TSTAMP supported - VIRTCHNL_RXDID_2_FLEX_SQ_NIC present in DDP profile If any condition is not met, iavf_select_rx_desc_format() falls back to legacy descriptors (RXDID=1) and the wrong L2TAG2 mask is hit. Fixes: `2dc8e7c36d` ("iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-10-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:35 -07:00
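The truncation described in the iavf commit above can be reproduced with plain integer arithmetic. This Python sketch re-implements `GENMASK_ULL` and `FIELD_GET` as small hypothetical helpers (they are kernel macros, not Python functions) and uses VLAN 198 from the reproducer as the tag value; the descriptor word is built under the assumption that bits 47:32 are zero.

```python
def genmask_ull(high: int, low: int) -> int:
    """Contiguous bit mask covering bits high..low inclusive."""
    return ((1 << (high - low + 1)) - 1) << low


def field_get(mask: int, value: int) -> int:
    """Extract the field selected by mask, shifted down to bit 0."""
    shift = (mask & -mask).bit_length() - 1
    return (value & mask) >> shift


def to_u16(value: int) -> int:
    """Model passing the value through a u16 parameter."""
    return value & 0xFFFF


VLAN_TAG = 198
qw2 = VLAN_TAG << 48  # 2nd L2TAG2 lives in bits 63:48 of qw2

wrong = to_u16(field_get(genmask_ull(63, 32), qw2))  # old mask
right = to_u16(field_get(genmask_ull(63, 48), qw2))  # fixed mask
print(wrong, right)
```

With the oversized mask the tag ends up in bits 31:16 of the extracted value and is lost when narrowed to 16 bits; with the 63:48 mask the tag survives intact.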
Kohei Enju	a24162f188	i40e: don't advertise IFF_SUPP_NOFCS i40e advertises IFF_SUPP_NOFCS, allowing users to use the SO_NOFCS socket option. However, this option is silently ignored, as the driver does not check skb->no_fcs, and always enables FCS insertion offload. Fix this by removing the advertisement of IFF_SUPP_NOFCS. This behavior can be reproduced with a simple AF_PACKET socket: import socket s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW) s.setsockopt(socket.SOL_SOCKET, 43, 1) # SO_NOFCS s.bind(("eth0", 0)) s.send(b'\xff' * 64) Previously, send() succeeds but the driver ignores SO_NOFCS. With this change, send() fails with -EPROTONOSUPPORT, as expected. Fixes: `41c445ff0f` ("i40e: main driver core") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-9-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:34 -07:00
Kohei Enju	fa28351f97	ice: fix potential NULL pointer deref in error path of ice_set_ringparam() ice_set_ringparam nullifies tstamp_ring of temporary tx_rings, without clearing ICE_TX_RING_FLAGS_TXTIME bit. When ICE_TX_RING_FLAGS_TXTIME is set and the subsequent ice_setup_tx_ring() call fails, a NULL pointer dereference could happen in the unwinding sequence: ice_clean_tx_ring() -> ice_is_txtime_cfg() == true (ICE_TX_RING_FLAGS_TXTIME is set) -> ice_free_tx_tstamp_ring() -> ice_free_tstamp_ring() -> tstamp_ring->desc (NULL deref) Clear ICE_TX_RING_FLAGS_TXTIME bit to avoid the potential issue. Note that this potential issue is found by manual code review. Compile test only since unfortunately I don't have E830 devices. Fixes: `ccde82e909` ("ice: add E830 Earliest TxTime First Offload support") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-8-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:34 -07:00
Keita Morisaki	7c72ec18c2	ice: fix race condition in TX timestamp ring cleanup Fix a race condition between ice_free_tx_tstamp_ring() and ice_tx_map() that can cause a NULL pointer dereference. ice_free_tx_tstamp_ring currently clears the ICE_TX_FLAGS_TXTIME flag after NULLing the tstamp_ring. This could allow a concurrent ice_tx_map call on another CPU to dereference the tstamp_ring, which could lead to a NULL pointer dereference.
CPU A:ice_free_tx_tstamp_ring() | CPU B:ice_tx_map()
--------------------------------|---------------------------------
  tx_ring->tstamp_ring = NULL   |
                                | ice_is_txtime_cfg() -> true
                                | tstamp_ring = tx_ring->tstamp_ring
                                | tstamp_ring->count // NULL deref!
  flags &= ~ICE_TX_FLAGS_TXTIME |
Fix by:
1. Reordering ice_free_tx_tstamp_ring() to clear the flag before NULLing the pointer, with smp_wmb() to ensure proper ordering.
2. Adding smp_rmb() in ice_tx_map() after the flag check to order the flag read before the pointer read, using READ_ONCE() for the pointer, and adding a NULL check as a safety net.
3. Converting tx_ring->flags from u8 to DECLARE_BITMAP() and using atomic bitops (set_bit(), clear_bit(), test_bit()) for all flag operations throughout the driver:
   - ICE_TX_RING_FLAGS_XDP
   - ICE_TX_RING_FLAGS_VLAN_L2TAG1
   - ICE_TX_RING_FLAGS_VLAN_L2TAG2
   - ICE_TX_RING_FLAGS_TXTIME
Fixes: `ccde82e909` ("ice: add E830 Earliest TxTime First Offload support") Signed-off-by: Keita Morisaki <kmta1236@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-7-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:34 -07:00
Paul Greenwalt	4a3a940059	ice: fix ICE_AQ_LINK_SPEED_M for 200G When setting PHY configuration during driver initialization, 200G link speed is not being advertised even when the PHY is capable. This is because the get PHY capabilities link speed response is being masked by ICE_AQ_LINK_SPEED_M, which does not include the 200G link speed bit. ICE_AQ_LINK_SPEED_200GB is defined as BIT(11), but the mask 0x7FF only covers bits 0-10. Fix ICE_AQ_LINK_SPEED_M to use GENMASK(11, 0) so that it covers all defined link speed bits including 200G. Fixes: `24407a01e5` ("ice: Add 200G speed/phy type use") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-6-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:34 -07:00
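The mask arithmetic in the 200G commit above can be checked directly. This Python sketch uses hypothetical `bit` and `genmask` helpers in place of the kernel's `BIT()` and `GENMASK()` macros; the capability word is made up for illustration and combines the 200G bit with one lower speed bit.

```python
def bit(n: int) -> int:
    """Single-bit value, like the kernel's BIT(n)."""
    return 1 << n


def genmask(high: int, low: int) -> int:
    """Contiguous bit mask covering bits high..low inclusive."""
    return ((1 << (high - low + 1)) - 1) << low


ICE_AQ_LINK_SPEED_200GB = bit(11)  # value given in the commit message
OLD_MASK = 0x7FF                   # covers bits 0-10 only
NEW_MASK = genmask(11, 0)          # covers bits 0-11

# Hypothetical PHY capability response: 200G plus one lower speed bit.
caps = ICE_AQ_LINK_SPEED_200GB | bit(10)

print(hex(caps & OLD_MASK), hex(caps & NEW_MASK))
```

Masking with `0x7FF` silently drops bit 11, so the 200G capability never reaches the advertisement logic; the 12-bit mask keeps it.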
Paul Greenwalt	55e74f9ea7	ice: fix PHY config on media change with link-down-on-close Commit `1a3571b593` ("ice: restore PHY settings on media insertion") introduced separate flows for setting PHY configuration on media present: ice_configure_phy() when link-down-on-close is disabled, and ice_force_phys_link_state() when enabled. The latter incorrectly uses the previous configuration even after module change, causing link issues such as wrong speed or no link. Unify PHY configuration into a single ice_phy_cfg() function with a link_en parameter, ensuring PHY capabilities are always fetched fresh from hardware. Fixes: `1a3571b593` ("ice: restore PHY settings on media insertion") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-5-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:34 -07:00
Michal Schmidt	1a303baa71	ice: fix double-free of tx_buf skb If ice_tso() or ice_tx_csum() fails, the error path in ice_xmit_frame_ring() frees the skb, but the 'first' tx_buf still points to it and is marked as valid (ICE_TX_BUF_SKB). 'next_to_use' remains unchanged, so the potential problem will likely fix itself when the next packet is transmitted and the tx_buf gets overwritten. But if there is no next packet and the interface is brought down instead, ice_clean_tx_ring() -> ice_unmap_and_free_tx_buf() will find the tx_buf and free the skb for the second time. The fix is to reset the tx_buf type to ICE_TX_BUF_EMPTY in the error path, so that ice_unmap_and_free_tx_buf() does not free the skb a second time. Move the initialization of 'first' up, to ensure it's already valid in case we hit the linearization error path. The bug was spotted by AI while I had it looking for something else. It also proposed an initial version of the patch. I reproduced the bug and tested the fix by adding code to inject failures, on a build with KASAN. I looked for similar bugs in related Intel drivers and did not find any. Fixes: `d76a60ba7a` ("ice: Add support for VLANs and offloads") Assisted-by: Claude:claude-4.6-opus-high Cursor Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-4-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:33 -07:00
Guangshuo Li	9aab1c3d72	ice: fix double free in ice_sf_eth_activate() error path When auxiliary_device_add() fails, ice_sf_eth_activate() jumps to aux_dev_uninit and calls auxiliary_device_uninit(&sf_dev->adev). The device release callback ice_sf_dev_release() frees sf_dev, but the current error path falls through to sf_dev_free and calls kfree(sf_dev) again, causing a double free. Keep kfree(sf_dev) for the auxiliary_device_init() failure path, but avoid falling through to sf_dev_free after auxiliary_device_uninit(). Fixes: `13acc5c4cd` ("ice: subfunction activation and base devlink ops") Cc: stable@vger.kernel.org Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-3-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:33 -07:00
Grzegorz Nitka	05567e4052	ice: update PCS latency settings for E825 10G/25Gb modes Update MAC Rx/Tx offset registers settings (PHY_MAC_[RX|TX]_OFFSET registers) with the data obtained with the latest research. It applies to PCS latency settings for the following speeds/modes:
* 10Gb NO-FEC
  - TX latency changed from 71.25 ns to 73 ns
  - RX latency changed from -25.6 ns to -28 ns
* 25Gb NO-FEC
  - TX latency changed from 28.17 ns to 33 ns
  - RX latency changed from -12.45 ns to -12 ns
* 25Gb RS-FEC
  - TX latency changed from 64.5 ns to 69 ns
  - RX latency changed from -3.6 ns to -3 ns
The original data came from simulation and pre-production hardware. The new data measures the actual delays and as such is more accurate. Fixes: `7cab44f1c3` ("ice: Introduce ETH56G PHY model for E825C products") Co-developed-by: Zoltan Fodor <zoltan.fodor@intel.com> Signed-off-by: Zoltan Fodor <zoltan.fodor@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-2-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:33 -07:00
Grzegorz Nitka	885c5e5792	ice: fix 'adjust' timer programming for E830 devices Fix the incorrect 'adjust the timer' programming sequence for the E830 device series. Only the GLTSYN_SHADJ shadow registers were programmed in the current implementation. According to the specification [1], a write to the GLTSYN_CMD command register, with the CMD field set to the "Adjust the Time" value, is also required for the timer adjustment to take effect. The flow was broken for adjustments smaller than the S32_MAX/MIN range (around +/- 2 seconds). For bigger adjustments, a non-atomic programming flow is used, involving set timer programming. The non-atomic flow is implemented correctly. Testing hints: Run command: phc_ctl /dev/ptpX get adj 2 get Expected result: Returned timestamps differ at least by 2 seconds [1] Intel® Ethernet Controller E830 Datasheet rev 1.3, chapter 9.7.5.4 https://cdrdv2.intel.com/v1/dl/getContent/787353?explicitVersion=true Fixes: `f003075227` ("ice: Implement PTP support for E830 devices") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-1-686c33c9828d@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 12:01:33 -07:00
Jakub Kicinski	0916664f99	Merge tag 'ovpn-net-20260417' of https://github.com/OpenVPN/ovpn-net-next Antonio Quartulli says: ==================== This batch includes only fixes to the selftest harness: * switch to TAP test orchestration * parse slurped notifications as returned by jq -s * add ovpn_ prefix to helpers and global variables to avoid clashes * fail test in case of netlink notification mismatch * add missing kernel config dependencies * add delay when launching multiple ynl/cli.py listeners * tag 'ovpn-net-20260417' of https://github.com/OpenVPN/ovpn-net-next: selftests: ovpn: serialize YNL listener startup selftests: ovpn: align command flow with TAP selftests: ovpn: add prefix to helpers and shared variables selftests: ovpn: flatten slurped notification JSON before filtering selftests: ovpn: fail notification check on mismatch selftests: ovpn: add nftables config dependencies for test-mark ==================== Link: https://patch.msgid.link/20260417090305.2775723-1-antonio@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:44:12 -07:00
Jakub Kicinski	f3a0e90d4d	Merge branch 'tcp-take-care-of-tcp_get_timestamping_opt_stats-races' Eric Dumazet says: ==================== tcp: take care of tcp_get_timestamping_opt_stats() races tcp_get_timestamping_opt_stats() does not own the socket lock, this is intentional. It calls tcp_get_info_chrono_stats() while other threads could change chrono fields in tcp_chrono_set(). It also reads many tcp socket fields that can be modified by other cpus/threads. I do not think we need coherent TCP socket state snapshot in tcp_get_timestamping_opt_stats(). Add READ_ONCE()/WRITE_ONCE() or data_race() annotations. Note that icsk_ca_state is a bitfield, thus not covered in this series. ==================== Link: https://patch.msgid.link/20260416200319.3608680-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:16 -07:00
Eric Dumazet	9e89b9d03a	tcp: annotate data-races around tp->plb_rehash tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `29c1c44646` ("tcp: add u32 counter in tcp_sock and an SNMP counter for PLB") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-15-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:14 -07:00
Eric Dumazet	3a63b3d160	tcp: annotate data-races around (tp->write_seq - tp->snd_nxt) tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() annotations to keep KCSAN happy. WRITE_ONCE() annotations are already present. Fixes: `e08ab0b377` ("tcp: add bytes not sent to SCM_TIMESTAMPING_OPT_STATS") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-14-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:13 -07:00
Eric Dumazet	71c675358b	tcp: annotate data-races around tp->timeout_rehash tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `32efcc06d2` ("tcp: export count for rehash attempts") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-13-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:13 -07:00
Eric Dumazet	290b693ce7	tcp: annotate data-races around tp->srtt_us tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `e8bd8fca67` ("tcp: add SRTT to SCM_TIMESTAMPING_OPT_STATS") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:13 -07:00
Eric Dumazet	62585690e6	tcp: annotate data-races around tp->reord_seen tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `7ec65372ca` ("tcp: add stat of data packet reordering events") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:13 -07:00
Eric Dumazet	a984705ca8	tcp: annotate data-races around tp->dsack_dups tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `7e10b6554f` ("tcp: add dsack blocks received stats") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:13 -07:00
Eric Dumazet	5efc7b9f7c	tcp: annotate data-races around tp->bytes_retrans tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `fb31c9b9f6` ("tcp: add data bytes retransmitted stats") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:13 -07:00
Eric Dumazet	ee43e957ce	tcp: annotate data-races around tp->bytes_sent tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `ba113c3aa7` ("tcp: add data bytes sent stats") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:12 -07:00
Eric Dumazet	124199444d	tcp: add data-race annotations for TCP_NLA_SNDQ_SIZE tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `87ecc95d81` ("tcp: add send queue size stat in SCM_TIMESTAMPING_OPT_STATS") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:12 -07:00
Eric Dumazet	faa886ad3c	tcp: annotate data-races around tp->delivered and tp->delivered_ce tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `feb5f2ec64` ("tcp: export packets delivery info") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:12 -07:00
Eric Dumazet	fd571afb05	tcp: annotate data-races around tp->snd_ssthresh tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `7156d194a0` ("tcp: add snd_ssthresh stat in SCM_TIMESTAMPING_OPT_STATS") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:12 -07:00
Eric Dumazet	829ba1f329	tcp: add data-races annotations around tp->reordering, tp->snd_cwnd tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE(), WRITE_ONCE() and data_race() annotations to keep KCSAN happy. Fixes: `bb7c19f960` ("tcp: add related fields into SCM_TIMESTAMPING_OPT_STATS") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:12 -07:00
Eric Dumazet	21e92a38cf	tcp: add data-race annotations around tp->data_segs_out and tp->total_retrans tcp_get_timestamping_opt_stats() intentionally runs lockless, we must add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy. Fixes: `7e98102f48` ("tcp: record pkts sent and retransmistted") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:12 -07:00
Eric Dumazet	267bf3cf9a	tcp: annotate data-races in tcp_get_info_chrono_stats() tcp_get_timestamping_opt_stats() does not own the socket lock, this is intentional. It calls tcp_get_info_chrono_stats() while other threads could change chrono fields in tcp_chrono_set(). I do not think we need coherent TCP socket state snapshot in tcp_get_timestamping_opt_stats(), I chose to only add annotations to keep KCSAN happy. Fixes: `1c885808e4` ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260416200319.3608680-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-18 11:10:11 -07:00
Ralf Lici	6c9b1dc218	selftests: ovpn: serialize YNL listener startup Starting one background YNL notification listener per peer back-to-back can intermittently stall the test setup before the listeners even reach the Python main function. This was reproducible in a reduced test.sh setup-only loop: a single listener stayed stable across repeated runs, while starting listeners for all peers could hang early in the listener launch phase. Adding a short delay between listener launches makes the listeners start cleanly and eliminates the reproduced hangs in repeated normal and slow-runner tests. Serialize listener startup with a small sleep between setup_listener calls. Fixes: `77de28cd7c` ("selftests: ovpn: add notification parsing and matching") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>	2026-04-17 10:54:04 +02:00
Ralf Lici	1be93bb979	selftests: ovpn: align command flow with TAP Current tests do not properly adhere to the TAP infrastructure, so they do not properly report failures, leading to hangs of the CI machinery. Restructure ovpn selftests to use the TAP infrastructure: split each test into stages, execute stage bodies with fail-fast semantics, and emit KTAP pass/fail for each stage. Centralize behavior control in common.sh and make the scripts use dedicated wrappers for required-success, expected-failure, and non-fatal commands. Also add the OVPN_VERBOSE mode that exposes captured command output for debugging. This way tests won't hang anymore in case of failure when executed within the CI machinery. This change also makes default OVPN_CLI and YNL resolution independent from the caller CWD by anchoring both to COMMON_DIR, so behavior is stable across direct execution and run_tests-style execution. Fixes: `959bc330a4` ("testing/selftests: add test tool and scripts for ovpn module") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>	2026-04-17 10:54:03 +02:00
Ralf Lici	7c29665a3a	selftests: ovpn: add prefix to helpers and shared variables Current naming for shared variables, helpers and network namespaces is a bit unfortunate as it doesn't come with a clean prefix. This proved problematic in case of name clashes with external scripts or in case of abrupt test termination (leftover namespaces weren't easily traceable to ovpn). Rename common helper entry points and all shared globals in the ovpn selftests to ovpn_ or OVPN_ names so test scripts and wrappers use a single explicit prefix. Also rename the temporary network namespaces created by the tests from peerN to ovpn_peerN. This makes leaked namespaces easier to identify. This is a mechanical refactor only, behavior is unchanged. Fixes: `959bc330a4` ("testing/selftests: add test tool and scripts for ovpn module") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>	2026-04-17 10:54:03 +02:00
Ralf Lici	222e7f8d1c	selftests: ovpn: flatten slurped notification JSON before filtering Notification comparison uses jq -s, which slurps all inputs into an array. Some inputs can be arrays themselves, and applying the .msg.peer filter directly on those entries triggers jq type errors. Expand any array-valued JSON items returned by jq -s before selecting .msg.peer, so the filter handles both normal notification objects and [] entries without type errors. Fixes: `77de28cd7c` ("selftests: ovpn: add notification parsing and matching") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>	2026-04-17 10:54:03 +02:00
Ralf Lici	c409da0fe1	selftests: ovpn: fail notification check on mismatch compare_ntfs doesn't fail when expected and received notification streams diverge. Fix this bug by tracking the diff exit status explicitly and return it to the caller so notification mismatches propagate as test failures. Fixes: `77de28cd7c` ("selftests: ovpn: add notification parsing and matching") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>	2026-04-17 10:54:03 +02:00
Ralf Lici	e5fd34ab8d	selftests: ovpn: add nftables config dependencies for test-mark test-mark.sh installs nftables rules in an inet/filter output chain and verifies packet drops via nft counters. In vmksft this can fail when the nftables core is not enabled by the ovpn selftest config. Add the missing kernel options required by this test: - CONFIG_NETFILTER - CONFIG_NF_TABLES - CONFIG_NF_TABLES_INET Fixes: `7b80d8a335` ("selftests: ovpn: add test for the FW mark feature") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/all/20260319124114.42f91f72@kernel.org/ Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>	2026-04-17 10:54:03 +02:00
Jakub Kicinski	82c2106902	selftests: net: add missing CMAC to tcp_ao config Recent changes to crypto and wifi made CMAC no longer selected by default on x86 and tcp_ao needs it. Add the missing config. Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260416010439.1053587-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-16 19:35:43 -07:00
Jakub Kicinski	946e991465	Merge branch 'vsock-virtio-fix-msg_peek-calculation-on-bytes-to-copy' Luigi Leonardi says: ==================== vsock/virtio: fix MSG_PEEK calculation on bytes to copy `virtio_transport_stream_do_peek`, when calculating the number of bytes to copy, didn't consider the `offset` caused by partial reads that happened before. This might cause an out-of-bounds read that leads to an EFAULT. More details in the commits. Commit 1 introduces the fix. Commit 2 introduces some preliminary work for adding a test and fixes a problem in existing tests. Commit 3 introduces a test that checks for this bug to avoid future regressions. For disclosure: this bug was found initially by claude opus 4.6, I then analyzed it and worked on the fix and the test. ==================== Link: https://patch.msgid.link/20260415-fix_peek-v4-0-8207e872759e@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-16 19:34:26 -07:00
Luigi Leonardi	2a2675ef61	vsock/test: add MSG_PEEK after partial recv test Add a test that verifies MSG_PEEK works correctly after a partial recv(). This is to test a bug that was present in the `virtio_transport_stream_do_peek()` when computing the number of bytes to copy: After a partial read, the peek function didn't take into consideration the number of bytes that were already read. So peeking the whole buffer would cause an out-of-bounds read, that resulted in a -EFAULT. This test does exactly this: do a partial recv on a buffer, then try to peek the whole buffer content. The test re-uses `test_stream_msg_peek_client()` to also cover this scenario. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20260415-fix_peek-v4-3-8207e872759e@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-16 19:34:22 -07:00
Luigi Leonardi	a3f77afbf6	vsock/test: fix MSG_PEEK handling in recv_buf() `recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling `recv` until all requested bytes are available or an error occurs. The problem is how it calculates the number of bytes read: MSG_PEEK doesn't consume any bytes and will re-read the same bytes from the buffer head, so summing the return value every time is wrong. Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if more bytes are requested than are available, the loop will never terminate, because `recv` will never return EOF. For this reason, we need to compare the number of bytes read with the number of bytes expected. Add a check: if the MSG_PEEK flag is present, update the byte counter and break out of the loop only after at least the expected number of bytes have been received; otherwise, retry after a short delay to avoid consuming too many CPU cycles. This allows us to simplify the `test_stream_credit_update_test` by reusing `recv_buf`, like some other tests already do. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260415-fix_peek-v4-2-8207e872759e@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-16 19:34:22 -07:00
Luigi Leonardi	080f22f5d3	vsock/virtio: fix MSG_PEEK ignoring skb offset when calculating bytes to copy `virtio_transport_stream_do_peek()` does not account for the skb offset when computing the number of bytes to copy. This means that, after a partial recv() that advances the offset, a peek requesting more bytes than are available in the sk_buff causes `skb_copy_datagram_iter()` to go past the valid payload, resulting in a -EFAULT. The dequeue path already handles this correctly. Apply the same logic to the peek path. Fixes: `0df7cd3c13` ("vsock/virtio/vhost: read data from non-linear skb") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20260415-fix_peek-v4-1-8207e872759e@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-16 19:34:22 -07:00
Jakub Kicinski	d2dced26bc	Merge branch 'net-enetc-fix-command-bd-ring-issues' Wei Fang says: ==================== net: enetc: fix command BD ring issues Currently, the implementation of the command BD ring has two issues. One is that the driver may obtain the wrong consumer index of the ring: the driver does not mask out the SBE bit of the CIR value, so a wrong index will be obtained when an SBE error occurs. The other is that the DMA buffer may be used after free. If netc_xmit_ntmp_cmd() times out and returns an error, the pending command is not explicitly aborted, while ntmp_free_data_mem() unconditionally frees the DMA buffer. If the buffer has already been reallocated elsewhere, this may lead to silent memory corruption, because the hardware eventually processes the pending command and performs a DMA write of the response to the physical address of the freed buffer. This patch set fixes these two issues. ==================== Link: https://patch.msgid.link/20260415060833.2303846-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-16 19:31:09 -07:00


1434360 Commits