Commit e0bffe3e68 ("net: asix: ax88772: migrate to phylink") replaced
the asix_adjust_link() PHY callback with phylink's mac_link_up() and
mac_link_down() handlers, but did not carry over the usbnet_link_change()
notification that commit 805206e66f ("net: asix: fix "can't send until
first packet is send" issue") had added.
As a result, the original symptom returns: when the link comes up,
usbnet is never notified, so the RX URB submission stays dormant until
some other event (e.g. a transmitted packet triggering the status
endpoint interrupt) wakes it up.
This is reproducible with the Apple A1277 USB Ethernet Adapter
(05ac:1402, AX88772A based) on a Banana Pro using a static IPv4
configuration. After bringing the interface up, no incoming packets are
received until the first outgoing frame triggers usbnet's RX path.
Restore the link change notification, gated on a carrier transition so
the call remains idempotent if the status endpoint also reports the
change later.
Fixes: e0bffe3e68 ("net: asix: ax88772: migrate to phylink")
Signed-off-by: Markus Baier <Markus.Baier@soslab.tu-darmstadt.de>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260501163941.107668-1-Markus.Baier@soslab.tu-darmstadt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
egress_dev() formats np->dev_mac via snprintf() but receives buf as
a bare char *, so it cannot derive the buffer size from the pointer. The
size argument was hardcoded to MAC_ADDR_STR_LEN (3 * ETH_ALEN - 1 = 17),
which is wrong in two ways:
1) misleading kernel log output on the MAC-selected target path
(np->dev_name[0] == '\0'); for example "aa:bb:cc:dd:ee:ff doesn't
exist, aborting" was logged as "aa:bb:cc:dd:ee:f doesn't exist,
aborting".
2) the second argument of snprintf is the size of the buffer, not the
size of what you want to write.
Add a bufsz parameter to egress_dev() and pass sizeof(buf) from each
caller, matching the standard snprintf() idiom and removing the
hardcoded size from the helper.
Every caller already declares "char buf[MAC_ADDR_STR_LEN + 1]" so the
formatted MAC continues to fit.
Tested by booting with
netconsole=6665@/aa:bb:cc:dd:ee:ff,6666@10.0.0.1/00:11:22:33:44:55
on a kernel without a matching device. Pre-fix dmesg shows
"aa:bb:cc:dd:ee:f doesn't exist, aborting"; post-fix shows the full
"aa:bb:cc:dd:ee:ff doesn't exist, aborting".
Fixes: f8a10bed32 ("netconsole: allow selection of egress interface via MAC address")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260501-netpoll_snprintf_fix-v1-1-84b0566e6597@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Igor Ushakov reported that unix_gc() could run with gc_in_progress
being false if the work is scheduled while running:
Thread 1                   Thread 2                  Thread 3
--------                   --------                  --------
                           unix_schedule_gc()        unix_schedule_gc()
                           `- if (!gc_in_progress)   `- if (!gc_in_progress)
                              |- gc_in_progress = true  |
                              `- queue_work()           |
unix_gc() <------------------/                          |
|                                                       |- gc_in_progress = true
...                                                     `- queue_work()
|                                                          |
`- gc_in_progress = false                                  |
                                                           |
unix_gc() <------------------------------------------------'
|
... /* gc_in_progress == false */
|
`- gc_in_progress = false
unix_peek_fpl() relies on gc_in_progress so that MSG_PEEK does not
confuse the GC.
Let's set gc_in_progress to true in unix_gc().
Fixes: 8b90a9f819 ("af_unix: Run GC on only one CPU.")
Reported-by: Igor Ushakov <sysroot314@gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260501073945.1884564-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tls_sw_splice_read() uses len when advancing rxm->offset / rxm->full_len
after skb_splice_bits(), rather than copied (the actual number of bytes
successfully spliced into the pipe). When the destination pipe cannot
accept all the requested bytes, splice_to_pipe() returns fewer bytes
than len, and 'len - copied' of data is effectively skipped over.
Fixes: e062fe99cc ("tls: splice_read: fix accessing pre-processed records")
Link: https://patch.msgid.link/20260429222944.2139041-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 5e72ce3e39 ("net: ipv6: Use link netns in newlink() of
rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
ip6gre hash via link_net. ip6erspan_changelink() was not converted in
that series and still uses dev_net(dev), which diverges from the
device's creation netns after IFLA_NET_NS_FD migration.
This re-inserts the tunnel into the wrong per-netns hash. The
original netns keeps a stale entry. When that netns is later
destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
slab-use-after-free reported by KASAN, followed by a kernel BUG at
net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
Reachable from an unprivileged user namespace (unshare --user
--map-root-user --net).
ip6gre_changelink() earlier in the same file already uses the cached
t->net; only ip6erspan_changelink() got this wrong.
Fixes: 2d665034f2 ("net: ip6_gre: Fix ip6erspan hlen calculation")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260430103318.3206018-1-maoyi.xie@ntu.edu.sg
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jamal Hadi Salim says:
====================
Replace direct dequeue call with qdisc_dequeue_peeked
When sfb and red qdiscs have children (eg qfq qdisc) whose peek() callback is
qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such
qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from
its child (red/sfb in this case), it will do the following:
1a. do a peek() - and when sensing there's an skb the child can offer, then
    - the child in this case (red/sfb) calls its child's (qfq) peek.
      qfq does the right thing and will return the packet queued in gso_skb.
      Note: if there wasn't a gso_skb entry then qfq will store the peeked
      packet there.
1b. invoke a dequeue() on the child (red/sfb). And herein lies the problem.
    - red/sfb will call the child's dequeue(), which will essentially just
      try to grab something off qfq's queue.
The right thing to do in #1b is to grab the skb off gso_skb queue.
This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked()
method instead.
Patch 1 fixes the issue for red qdisc. Patch 2 fixes it for sfb.
Patch 3 adds testcases for the two setups.
====================
Link: https://patch.msgid.link/20260430152957.194015-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create 4 test cases:
- Force red to dequeue from its child's gso_skb with qfq leaf
- Force sfb to dequeue from its child's gso_skb with qfq leaf
- Force red to dequeue from its child's gso_skb with dualpi2 leaf
- Force sfb to dequeue from its child's gso_skb with dualpi2 leaf
All of them have tbf followed by red (or sfb) followed by qfq (or
dualpi2). Since tbf calls its child's peek followed by
qdisc_dequeue_peeked, it will force red/sfb to call their child's peek.
In this case, since the child (qfq/dualpi2) has qdisc_peek_dequeued as
its peek callback, the packet will be stored in its gso_skb queue. During
the subsequent call to qdisc_dequeue_peeked, red/sfb will have to dequeue
from the child's gso_skb to retrieve the packet.
Not doing so causes a NULL ptr deref, which is what happened before the
fixes earlier in this series.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260430152957.194015-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When sfb has children (eg qfq qdisc) whose peek() callback is
qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such
qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from
its child (sfb in this case), it will do the following:
1a. do a peek() - and when sensing there's an skb the child can offer, then
    - the child in this case (sfb) calls its child's (qfq) peek.
      qfq does the right thing and will return the packet queued in gso_skb.
      Note: if there wasn't a gso_skb entry then qfq will store the peeked
      packet there.
1b. invoke a dequeue() on the child (sfb). And herein lies the problem.
    - sfb will call the child's dequeue(), which will essentially just
      try to grab something off qfq's queue.
[ 127.594489][ T453] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 127.594741][ T453] CPU: 2 UID: 0 PID: 453 Comm: ping Not tainted 7.1.0-rc1-00035-gac961974495b-dirty #793 PREEMPT(full)
[ 127.595059][ T453] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 127.595254][ T453] RIP: 0010:qfq_dequeue+0x35c/0x1650 [sch_qfq]
[ 127.595461][ T453] Code: 00 fc ff df 80 3c 02 00 0f 85 17 0e 00 00 4c 8d 73 48 48 89 9d b8 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 76 0c 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b
[ 127.596081][ T453] RSP: 0018:ffff88810e5af440 EFLAGS: 00010216
[ 127.596337][ T453] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[ 127.596623][ T453] RDX: 0000000000000009 RSI: 0000001880000000 RDI: ffff888104fd82b0
[ 127.596917][ T453] RBP: ffff888104fd8000 R08: ffff888104fd8280 R09: 1ffff110211893a3
[ 127.597165][ T453] R10: 1ffff110211893a6 R11: 1ffff110211893a7 R12: 0000001880000000
[ 127.597404][ T453] R13: ffff888104fd82b8 R14: 0000000000000048 R15: 0000000040000000
[ 127.597644][ T453] FS: 00007fc380cbfc40(0000) GS:ffff88816f2a8000(0000) knlGS:0000000000000000
[ 127.597956][ T453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 127.598160][ T453] CR2: 00005610aa9890a8 CR3: 000000010369e000 CR4: 0000000000750ef0
[ 127.598390][ T453] PKRU: 55555554
[ 127.598509][ T453] Call Trace:
[ 127.598629][ T453] <TASK>
[ 127.598718][ T453] ? mark_held_locks+0x40/0x70
[ 127.598890][ T453] ? srso_alias_return_thunk+0x5/0xfbef5
[ 127.599053][ T453] sfb_dequeue+0x88/0x4d0
[ 127.599174][ T453] ? ktime_get+0x137/0x230
[ 127.599328][ T453] ? srso_alias_return_thunk+0x5/0xfbef5
[ 127.599480][ T453] ? qdisc_peek_dequeued+0x7b/0x350 [sch_qfq]
[ 127.599670][ T453] ? srso_alias_return_thunk+0x5/0xfbef5
[ 127.599831][ T453] tbf_dequeue+0x6b1/0x1098 [sch_tbf]
[ 127.599988][ T453] __qdisc_run+0x169/0x1900
The right thing to do in #1b is to grab the skb off gso_skb queue.
This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked()
method instead.
Fixes: e13e02a3c6 ("net_sched: SFB flow scheduler")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430152957.194015-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When red qdisc has children (eg qfq qdisc) whose peek() callback is
qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such
qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from
its child (red in this case), it will do the following:
1a. do a peek() - and when sensing there's an skb the child can offer, then
    - the child in this case (red) calls its child's (qfq) peek.
      qfq does the right thing and will return the packet queued in gso_skb.
      Note: if there wasn't a gso_skb entry then qfq will store the peeked
      packet there.
1b. invoke a dequeue() on the child (red). And herein lies the problem.
    - red will call the child's dequeue(), which will essentially just
      try to grab something off qfq's queue.
[ 78.667668][ T363] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[ 78.667927][ T363] CPU: 1 UID: 0 PID: 363 Comm: ping Not tainted 7.1.0-rc1-00033-g46f74a3f7d57-dirty #790 PREEMPT(full)
[ 78.668263][ T363] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 78.668486][ T363] RIP: 0010:qfq_dequeue+0x446/0xc90 [sch_qfq]
[ 78.668718][ T363] Code: 54 c0 e8 dd 90 00 f1 48 c7 c7 e0 03 54 c0 48 89 de e8 ce 90 00 f1 48 8d 7b 48 b8 ff ff 37 00 48 89 fa 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 05 e8 ef a1 e1 f1 48 8b 7b 48 48 8d 54 24 58 48 8d
[ 78.669312][ T363] RSP: 0018:ffff88810de573e0 EFLAGS: 00010216
[ 78.669533][ T363] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 78.669790][ T363] RDX: 0000000000000009 RSI: 0000000000000004 RDI: 0000000000000048
[ 78.670044][ T363] RBP: ffff888110dc4000 R08: ffffffffb1b0885a R09: fffffbfff6ba9078
[ 78.670297][ T363] R10: 0000000000000003 R11: ffff888110e31c80 R12: 0000001880000000
[ 78.670560][ T363] R13: ffff888110dc4150 R14: ffff888110dc42b8 R15: 0000000000000200
[ 78.670814][ T363] FS: 00007f66a8f09c40(0000) GS:ffff888163428000(0000) knlGS:0000000000000000
[ 78.671110][ T363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.671324][ T363] CR2: 000055db4c6a30a8 CR3: 000000010da67000 CR4: 0000000000750ef0
[ 78.671585][ T363] PKRU: 55555554
[ 78.671713][ T363] Call Trace:
[ 78.671843][ T363] <TASK>
[ 78.671936][ T363] ? __pfx_qfq_dequeue+0x10/0x10 [sch_qfq]
[ 78.672148][ T363] ? __pfx__printk+0x10/0x10
[ 78.672322][ T363] ? srso_alias_return_thunk+0x5/0xfbef5
[ 78.672496][ T363] ? lockdep_hardirqs_on_prepare+0xa8/0x1a0
[ 78.672706][ T363] ? srso_alias_return_thunk+0x5/0xfbef5
[ 78.672875][ T363] ? trace_hardirqs_on+0x19/0x1a0
[ 78.673047][ T363] red_dequeue+0x65/0x270 [sch_red]
[ 78.673217][ T363] ? srso_alias_return_thunk+0x5/0xfbef5
[ 78.673385][ T363] tbf_dequeue.cold+0xb0/0x70c [sch_tbf]
[ 78.673566][ T363] __qdisc_run+0x169/0x1900
The right thing to do in #1b is to grab the skb off gso_skb queue.
This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked()
method instead.
Fixes: 77be155cba ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.")
Reported-by: Manas <ghandatmanas@gmail.com>
Reported-by: Rakshit Awasthi <rakshitawasthi17@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430152957.194015-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
XGBE_PTP_ACT_CLK_FREQ and XGBE_V2_PTP_ACT_CLK_FREQ were 10x too
large (500MHz/1GHz instead of 50MHz/100MHz), causing the computed
addend to overflow the 32-bit tstamp_addend. In the general case
this would result in the clock advancing at the wrong rate. For v2
(PCI), ptpclk_rate is hardcoded to 125MHz, so the addend formula
(ACT_CLK_FREQ << 32) / ptpclk_rate yields exactly 8 * 2^32, and
when stored to the 32-bit tstamp_addend the value is zero. With
addend = 0 the hardware accumulator never overflows and the PTP
clock is fully stopped. For v1 (platform), ptpclk_rate is read from
ACPI/DT so the exact overflow behavior depends on the
firmware-reported frequency.
Define the constants as NSEC_PER_SEC / SSINC so the relationship is
explicit and cannot drift out of sync.
Fixes: fbd47be098 ("amd-xgbe: add hardware PTP timestamping support")
Tested-by: Gregory Fuchedgi <gfuchedgi@gmail.com>
Signed-off-by: Gregory Fuchedgi <gfuchedgi@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429-fix-xgbe-ptp-addend-v1-1-fca5b0ca5e62@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the driver is used in non-TDM mode, priv->utdm is a NULL pointer.
Therefore we need to check this pointer first before checking si_regs.
Fixes: c19b6d246a ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unmapping of uf_regs is done from ucc_fast_free() and doesn't need to be
done explicitly. If uf_regs is already unmapped, ucc_fast_free() will crash.
Fixes: c19b6d246a ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sagarika Sharma says:
====================
ipv6: fix ECMP route failover on carrier loss
This patchset resolves an issue where established IPv6 connections are
unable to transition to alternative ECMP nexthops upon carrier loss.
Unlike IPv4, the IPv6 routing subsystem does not actively invalidate
cached destinations during a NETDEV_CHANGE event. Sockets persist
with dead routes, leading to stalled traffic or connection drops.
This series introduces a fix to trigger route invalidation by
updating the route serial number on link carrier loss and provides
a corresponding selftest to validate the failover behavior for IPv4
and IPv6.
====================
Link: https://patch.msgid.link/20260430200909.527827-1-sharmasagarika@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without the previous commit, TCP failed to switch to alternative
IPv6 routes immediately upon carrier loss.
It would persist with the dead route until reaching the threshold
net.ipv4.tcp_retries1, leading to unnecessary delays in failover.
Let's add a selftest for this scenario to ensure TCP fails over
immediately upon a carrier loss event.
Before:
TEST: TCP IPv4 failover [ OK ]
TEST: TCP IPv6 failover [FAIL]
After:
TEST: TCP IPv4 failover [ OK ]
TEST: TCP IPv6 failover [ OK ]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Sagarika Sharma <sharmasagarika@google.com>
Link: https://patch.msgid.link/20260430200909.527827-3-sharmasagarika@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When using IPv6 ECMP routes, if a netdev listed as a nexthop experiences
a carrier change event (e.g., a bond device generating a NETDEV_CHANGE
event after its slaves go linkdown), established connections utilizing
that nexthop fail to fail over to other available nexthops. Instead,
these connections stall or drop.
This happens because the IPv6 FIB code does not invalidate the socket's
cached destination when a NETDEV_CHANGE event occurs. While
fib6_ifdown() correctly marks the nexthop with RTNH_F_LINKDOWN, it
leaves the route's serial number unchanged. As a result, sockets with a
previously cached dst do not realize the route is no longer viable and
continue to try using the non-functional nexthop.
This behavior contrasts with IPv4, which actively flushes cached
destinations on a NETDEV_CHANGE event (see fib_netdev_event() in
net/ipv4/fib_frontend.c).
Fix this by updating the route serial number in fib6_ifdown() when
setting RTNH_F_LINKDOWN. This invalidates stale cached destinations,
forcing sockets to perform a new route lookup and fail over to a
functioning nexthop.
Fixes: 51ebd31815 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Sagarika Sharma <sharmasagarika@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430200909.527827-2-sharmasagarika@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Apple Silicon Macs expose two CDC NCM "private" data interfaces over
USB-C with VID:PID 0x05ac:0x1905 and product string "Mac". This is the
same protocol Apple already ships on iPhone (0x05ac:0x12a8) and iPad
(0x05ac:0x12ab) for RemoteXPC since iOS 17 -- both data interfaces lack
an interrupt status endpoint, so they rely on the FLAG_LINK_INTR-
conditional bind path introduced in commit 3ec8d7572a ("CDC-NCM: add
support for Apple's private interface").
The id_table currently has entries for iPhone and iPad but not for the
Mac. Without a match, cdc_ncm falls through to the generic CDC NCM
class-match entry, which uses the cdc_ncm_info struct with FLAG_LINK_INTR
set, so bind_common() fails on the missing status endpoint and no
netdev appears.
Add id_table entries for both interface numbers (0 and 2) of the Mac,
bound to the existing apple_private_interface_info driver_info.
Verified empirically on a Mac Studio M3 Ultra running macOS 26.5: when
a Mac is connected via USB-C, ioreg shows VID 0x05ac, PID 0x1905,
product string "Mac", with two NCM data interfaces at numbers 0 and 2.
The same PID is presented by all current Apple Silicon Mac models
(MacBook Pro/Air, Mac mini, Mac Studio across the M-series), mirroring
Apple's single-PID-per-family pattern from iPhone/iPad.
After this patch, plugging a Mac into a Linux host running the patched
kernel produces two enx... interfaces (one per data interface),
"ip -br link" lists them as UP, and standard userspace networking
(DHCP, NetworkManager shared mode, etc.) works without any modprobe
overrides or out-of-tree modules.
Signed-off-by: Alex Cheema <alex@exolabs.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429175739.34426-1-alex@exolabs.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rtnl_fill_vfinfo() declares struct ifla_vf_broadcast on the stack
without initialisation:
struct ifla_vf_broadcast vf_broadcast;
The struct contains a single fixed 32-byte field:
/* include/uapi/linux/if_link.h */
struct ifla_vf_broadcast {
__u8 broadcast[32];
};
The function then copies dev->broadcast into it using dev->addr_len
as the length:
memcpy(vf_broadcast.broadcast, dev->broadcast, dev->addr_len);
On Ethernet devices (the overwhelming majority of SR-IOV NICs)
dev->addr_len is 6, so only the first 6 bytes of broadcast[] are
written. The remaining 26 bytes retain whatever was previously on
the kernel stack. The full struct is then handed to userspace via:
nla_put(skb, IFLA_VF_BROADCAST,
sizeof(vf_broadcast), &vf_broadcast)
leaking up to 26 bytes of uninitialised kernel stack per VF per
RTM_GETLINK request, repeatable.
The other vf_* structs in the same function are explicitly zeroed
for exactly this reason - see the memset() calls for ivi,
vf_vlan_info, node_guid and port_guid a few lines above.
vf_broadcast was simply missed when it was added.
Reachability: any unprivileged local process can open AF_NETLINK /
NETLINK_ROUTE without capabilities and send RTM_GETLINK with an
IFLA_EXT_MASK attribute carrying RTEXT_FILTER_VF. The kernel walks
each VF and emits IFLA_VF_BROADCAST, leaking 26 bytes of stack per
VF per request. Stack residue at this call site can include return
addresses and transient sensitive data; KASAN with stack
instrumentation, or KMSAN, will flag the nla_put() when reproduced.
Zero the on-stack struct before the partial memcpy, matching the
existing pattern used for the other vf_* structs in the same
function.
Fixes: 75345f888f ("ipoib: show VF broadcast address")
Cc: stable@vger.kernel.org
Signed-off-by: Kai Zen <kai.aizen.dev@gmail.com>
Link: https://patch.msgid.link/3c506e8f936e52b57620269b55c348af05d413a2.1777557228.git.kai.aizen.dev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yiming Qian reported:
<quote>
`ipmr_cache_report()` allocates a report skb with `alloc_skb(128,
GFP_ATOMIC)` and appends a `struct igmphdr` using `skb_put()`. In the
non-`IGMPMSG_WHOLEPKT` path it initializes only:
- `igmp->type`
- `igmp->code`
but does not initialize:
- `igmp->csum`
- `igmp->group`
Later, `igmpmsg_netlink_event()` copies the bytes after `sizeof(struct
igmpmsg)` into the `IPMRA_CREPORT_PKT` netlink attribute and emits
`RTM_NEWCACHEREPORT` on `RTNLGRP_IPV4_MROUTE_R`.
As a result, 6 bytes of stale heap data from the skb head are
disclosed to userspace.
</quote>
Let's use skb_put_zero() instead of skb_put() to fix this bug.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260430070611.4004529-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Replace skb_try_make_writable() by skb_ensure_writable() in
nft_fwd_netdev and the flowtable to deal with uncloned packets
having their network header in paged fragments.
2) Drop packet if output device does not exist and ensure sufficient
headroom in nft_fwd_netdev before transmitting the skb.
3) Use the existing dup recursion counter in nft_fwd_netdev for the
neigh_xmit variant, from Weiming Shi.
4) Add .check_hooks interface to x_tables to detach the control plane
hook check based on the match/target configuration. Then, update
nft_compat to use .check_hooks from .validate path, this fixes a
lack of hook validation for several match/targets.
5) Fix incorrect .usersize in xt_CT, from Florian Westphal.
6) Fix a memleak with netdev tables in dormant state,
from Florian Westphal.
7) Several patches to check if the packet is a fragment, then skip
layer 4 inspection, for x_tables and nf_tables; as well as common
nf_socket infrastructure. The xt_hashlimit match drops fragments
to stay consistent with the existing approach when failing to parse
the layer 4 protocol header.
8) Ensure sufficient headroom in the flowtable before transmitting
the skb.
9) Fix the flowtable inline vlan approach for double-tagged vlan:
Reverse the iteration over .encap[] since it represents the
encapsulation as seen from the ingress path. Postpone pushing
layer 2 header so output device is available to calculate needed
headroom. Finally, add and use nf_flow_vlan_push() to fix it.
10) Fix flowtable inline pppoe with GSO packets. Moreover, use
FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware
address since neighbour cache does not exist in pppoe.
11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for
double-tagged vlan in particular this should provide some benefits
in certain scenarios.
More notes regarding 9-11):
- sashiko is also signalling to use it for IPIP headers, but that needs
  more adjustments, such as setting skb->protocol after removing the IPIP
  header; this will follow up in a separate patch.
- I plan to submit selftests to cover double-tagged-vlan. As for pppoe,
it should be possible but that would mandate a few userspace dependencies.
This has been semi-automatically tested by me and reporters describing
broken double-vlan-tagged and pppoe currently in the flowtable.
* tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header
netfilter: flowtable: fix inline pppoe encapsulation in xmit path
netfilter: flowtable: fix inline vlan encapsulation in xmit path
netfilter: flowtable: ensure sufficient headroom in xmit path
netfilter: xtables: fix L4 header parsing for non-first fragments
netfilter: nf_tables: skip L4 header parsing for non-first fragments
netfilter: nf_socket: skip socket lookup for non-first fragments
netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
netfilter: xt_CT: fix usersize for v1 and v2 revision
netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
netfilter: x_tables: add .check_hooks to matches and targets
netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
netfilter: replace skb_try_make_writable() by skb_ensure_writable()
====================
Link: https://patch.msgid.link/20260501122237.296262-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adjusts the checksum, if required, after pulling the layer 2
header, either the pppoe header or the inner vlan header in the
double-tagged vlan packets.
Fixes: 4cd91f7c29 ("netfilter: flowtable: add vlan support")
Fixes: 72efd585f7 ("netfilter: flowtable: add pppoe support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ratheesh Kannoth says:
====================
octeontx2-af: npc: cn20k: MCAM fixes
This series tightens Marvell OcteonTX2 AF NPC support for CN20K silicon
around MCAM key typing, optional debugfs setup, defrag allocation rollback,
defrag entry relocation bookkeeping, logical MCAM clear and programming,
default-rule index handling with explicit teardown, and NIXLF reserved-slot
lookup when default rules are missing.
Patches 1 through 3 focus on AF error handling: propagate
npc_mcam_idx_2_key_type() failures through cn20k MCAM enable, config, copy,
and read paths; treat cn20k NPC debugfs nodes as optional so probe does not
fail when debugfs is unavailable; and fix defrag MCAM allocation rollback
so allocation errno is not overwritten during subbank index resolution.
Patch 4 fixes npc_defrag_move_vdx_to_free(): when an MCAM line is moved to
a new physical index, move entry2target_pffunc[] association to the new
slot, clear the old slot, and retarget the matching mcam_rules entry so
software state matches hardware after defrag.
Patches 5 through 7 refine cn20k MCAM programming: clear entries using the
logical MCAM index and resolved key width, fix bank/CFG sequencing in
npc_cn20k_config_mcam_entry(), and read action metadata from the correct
bank in npc_cn20k_read_mcam_entry().
Patches 8 through 10 complete default-rule lifecycle handling: initialize
default-rule index outputs eagerly, tear down reserved default MCAM rules
explicitly (coordinated with npc_mcam_free_all_entries()), and reject
USHRT_MAX sentinel indices from npc_get_nixlf_mcam_index() on cn20k.
====================
Link: https://patch.msgid.link/20260429022722.1110289-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When cn20k default L2 rules are not installed,
npc_cn20k_dft_rules_idx_get() leaves broadcast, multicast, promiscuous, and
unicast slots at USHRT_MAX. npc_get_nixlf_mcam_index() previously returned
that sentinel as a valid MCAM index, so callers could program hardware with
an invalid index.
Return -EINVAL from the cn20k branches of npc_get_nixlf_mcam_index() when
the requested slot is still USHRT_MAX. Harden cn20k NPC MCAM entry helpers
to reject out-of-range indices before touching hardware.
Drop the early bounds check in npc_enable_mcam_entry() for cn20k so invalid
indices are validated inside npc_cn20k_enable_mcam_entry() instead of being
silently ignored.
In rvu_npc_update_flowkey_alg_idx(), treat negative MCAM indices like
out-of-range values, and only update RSS actions for promiscuous and
all-multi paths when the resolved index is non-negative.
Cc: Suman Ghosh <sumang@marvell.com>
Fixes: 6d1e70282f ("octeontx2-af: npc: cn20k: Use common APIs")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-11-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
npc_cn20k_dft_rules_free() used the NPC MCAM mbox "free all" path, which
does not match how cn20k tracks default-rule MCAM slot indexes.
Resolve the default-rule indices, then for each valid slot clear the bitmap
entry, drop the PF/VF map, disable the MCAM line, clear the target
function, and call npc_cn20k_idx_free(). Remove any matching software
mcam_rules nodes. On hard failure from idx_free, WARN and stop so the box
stays up for analysis.
In npc_mcam_free_all_entries(), prefetch the same default-rule indices and,
on cn20k, skip bitmap clear and idx_free when the scanned entry is one of
those reserved defaults (they are released by npc_cn20k_dft_rules_free).
Fixes: 09d3b7a140 ("octeontx2-af: npc: cn20k: Allocate default MCAM indexes")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-10-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
npc_cn20k_dft_rules_idx_get() wrote USHRT_MAX into individual outputs only
on some error paths (lbk promisc lookup, VF ucast lookup, and the PF rule
walk), which could leave other caller slots stale across retries.
Set every non-NULL bcast/mcast/promisc/ucast pointer to USHRT_MAX once at
entry, then drop the duplicate assignments on failure. Successful lookups
still overwrite the relevant slot before returning.
Fixes: 09d3b7a140 ("octeontx2-af: npc: cn20k: Allocate default MCAM indexes")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-9-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
npc_cn20k_read_mcam_entry() always reloaded action and vtag_action from
bank 0 after programming the CAM words. Use the bank returned by
npc_get_bank() for the ACTION reads as well, and read those registers once
up front so both X2 and X4 paths share the same metadata.
Return directly from the X2 keyword path now that the action fields are
already populated.
Cc: Suman Ghosh <sumang@marvell.com>
Fixes: 6d1e70282f ("octeontx2-af: npc: cn20k: Use common APIs")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-8-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For X4 keys, the loop in npc_cn20k_config_mcam_entry() reused the bank
parameter as the loop counter, so bank no longer reflected the caller's
bank after the loop and the control flow was hard to follow.
Program NPC_AF_CN20K_MCAMEX_BANKX_CFG_EXT directly in
npc_cn20k_config_mcam_entry(): one CFG write for X2 using the computed
bank, and one CFG write per bank inside the X4 action loop. Enable the
entry at the end with npc_cn20k_enable_mcam_entry(..., true) instead of
embedding the enable bit in bank_cfg via the removed helper.
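The control-flow hazard the patch removes can be seen in this tiny,
hypothetical reduction (not the driver code itself):

```c
#include <assert.h>

/*
 * When a function parameter doubles as the loop counter, any use of it
 * after the loop sees the loop's final value, not the caller's argument.
 */
static int bank_after_loop(int bank, int nbanks)
{
	for (bank = 0; bank < nbanks; bank++)
		;	/* one CFG write per bank would happen here */

	return bank;	/* no longer the caller's bank */
}
```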
Cc: Suman Ghosh <sumang@marvell.com>
Fixes: 4e527f1e5c ("octeontx2-af: npc: cn20k: Add new mailboxes for CN20K silicon")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-7-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace the old four-argument CN20K MCAM clear with a per-bank static
helper and npc_cn20k_clear_mcam_entry() that takes a logical MCAM index,
resolves the key width via npc_mcam_idx_2_key_type(), and clears either one
bank (X2) or every bank (X4).
Call it from npc_clear_mcam_entry() on cn20k and log when key-type lookup
fails. Use the per-bank helper from npc_cn20k_config_mcam_entry() for
pre-program clears.
For loopback VFs, use the promisc MCAM index as ucast_idx when copying RSS
action for promisc, matching cn20k default-rule layout.
Cc: Suman Ghosh <sumang@marvell.com>
Fixes: 6d1e70282f ("octeontx2-af: npc: cn20k: Use common APIs")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-6-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
npc_defrag_move_vdx_to_free() disables, copies, and enables the MCAM entry
at a new index but previously left entry2target_pffunc[] and the mcam_rules
list still keyed to the old index. Copy the target PF association to the
new slot, clear the old one, and retarget the rule entry so software state
matches the relocated hardware context.
Fixes: 645c6e3c19 ("octeontx2-af: npc: cn20k: virtual index support")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-5-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
npc_defrag_alloc_free_slots() allocates MCAM indexes in up to two passes on
bank0 then bank1. On failure it rolls back by freeing entries already
placed in save[].
__npc_subbank_alloc() can return a negative errno while only part of the
indexes are valid. The rollback loop reused rc for the
npc_mcam_idx_2_subbank_idx() return value as well, so a successful lookup
stored zero in rc, and the function could still return 0 after a later
__npc_subbank_free() failure when the allocation path had also left rc at
zero (for example, a shortfall after the alloc helpers returned zero).
Jump to the rollback path immediately when either __npc_subbank_alloc()
call fails, preserving its errno. If both calls succeed but the total
allocated count is still less than cnt, set rc to -ENOSPC before rollback.
Use a separate err variable for npc_mcam_idx_2_subbank_idx() so a
successful lookup no longer clears a non-zero rc from the allocation phase.
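The separate-error-variable pattern can be reduced to a minimal
hypothetical sketch (stand-in names, always-succeeding lookup):

```c
#include <assert.h>
#include <errno.h>

/* stand-in for npc_mcam_idx_2_subbank_idx(); here it always succeeds */
static int subbank_lookup(void)
{
	return 0;
}

/*
 * Keep the allocation-phase status in rc and use a separate err for
 * lookups during rollback, so a successful lookup (returning 0) cannot
 * clobber the errno we intend to return.
 */
static int alloc_with_rollback(int shortfall)
{
	int rc = 0, err;

	if (shortfall) {
		rc = -ENOSPC;	/* fewer indexes than requested */
		goto rollback;
	}
	return 0;

rollback:
	err = subbank_lookup();	/* the buggy variant did: rc = ...; */
	if (err)
		return err;
	return rc;		/* alloc-phase errno survives rollback */
}
```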
Cc: Dan Carpenter <error27@gmail.com>
Fixes: 645c6e3c19 ("octeontx2-af: npc: cn20k: virtual index support")
Link: https://lore.kernel.org/netdev/adjNJEpILRZATB2N@stanley.mountain/
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-4-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
npc_mcam_idx_2_key_type() can fail, but callers ignored the error and still
used kw_type when enabling, configuring, copying, and reading MCAM entries.
That could program or decode hardware with an undefined key type.
Return -EINVAL when key-type lookup fails. Return -EINVAL from
npc_cn20k_copy_mcam_entry() when src and dest key types differ instead of
failing silently.
Change npc_cn20k_{enable,config,copy,read}_mcam_entry() to return int on
success or error. Thread those errors through the cn20k MCAM write and read
mbox handlers, the cn20k baseline steer read path, NPC defrag move
(disable/copy/enable with dev_err and -EFAULT), and the DMAC update path in
rvu_npc_fs.c.
Make npc_copy_mcam_entry() return int so the cn20k branch can return
npc_cn20k_copy_mcam_entry() without a void/int mismatch, and fail
NPC_MCAM_SHIFT_ENTRY when copy fails.
Cc: Suman Ghosh <sumang@marvell.com>
Cc: Dan Carpenter <error27@gmail.com>
Fixes: 6d1e70282f ("octeontx2-af: npc: cn20k: Use common APIs")
Link: https://lore.kernel.org/netdev/adiQJvuKlEhq2ILx@stanley.mountain/
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260429022722.1110289-2-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to respect the original descriptor order and avoid any
potential IOMMU fault or memory corruption, move pending queue entries
to the head of hw queue tx_list if the DMA mapping of current inflight
packet fails in airoha_dev_xmit routine.
Fixes: 3f47e67dff ("net: airoha: Add the capability to consume out-of-order DMA tx descriptors")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260429-airoha-xmit-unmap-error-path-v2-1-32e43b7c6d25@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, request_threaded_irq() is used with a primary handler but a
NULL threaded handler, while also setting the IRQF_ONESHOT flag. This
specific combination triggers a WARNING since the commit aef30c8d56
("genirq: Warn about using IRQF_ONESHOT without a threaded handler").
WARNING: kernel/irq/manage.c:1502 at __setup_irq+0x4fa/0x760
Fix the issue by switching to request_irq(), which is the appropriate
interface for a non-threaded interrupt handler, and removing the
unnecessary IRQF_ONESHOT flag.
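The condition the genirq code warns on can be modeled with a simplified
predicate (this is an illustration, not the kernel's actual check):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IRQF_ONESHOT	0x00002000	/* flag value as in the kernel */

typedef void (*irq_handler_t)(int irq);

static void primary_handler(int irq) { (void)irq; }

/*
 * IRQF_ONESHOT keeps the interrupt line masked until the threaded
 * handler has run, so the flag is meaningless without a thread_fn.
 */
static bool oneshot_without_thread_fn(irq_handler_t thread_fn,
				      unsigned long flags)
{
	return (flags & IRQF_ONESHOT) && !thread_fn;
}
```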
Fixes: eb4898fde1 ("net: libwx: add wangxun vf common api")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/786DDC7D5CCA6D0A+20260429083743.88961-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the current VSI mailbox implementation, the VSI allocates a DMA buffer
to store the message sent to the PSI. When the PSI receives the message
request from the VSI, the hardware copies the message data from this DMA
buffer to PSI's DMA buffer for processing.
When enetc_msg_vsi_send() times out, two scenarios can occur:
1) Use-after-free: If the hardware hasn't completed message copying when
the VSI frees the buffer, the hardware may subsequently copy the data
from freed memory to PSI's DMA buffer.
2) Message race: If PSI hasn't processed the previous message when the
next message is sent, the VSI may receive the previous message's
reply, leading to incorrect handling.
To address these issues, implement the following changes:
- Check the mailbox busy status before sending a new message. If the
mailbox is in busy state, it indicates the previous message is still
being processed, so return an error immediately.
- Add the 'msg' field to struct enetc_si to preserve the DMA buffer
information. The caller of enetc_msg_vsi_send() no longer frees the
DMA buffer. Instead, defer freeing until it is safe to do so (when
the mailbox is not busy on the next send).
- Add cleanup in enetc_vf_remove() to free the last message buffer.
This ensures the DMA buffer remains valid during message copying and
prevents message reply mismatches.
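The deferred-free scheme can be sketched as a small hypothetical model
(plain malloc/free standing in for the DMA buffer management):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * The buffer from the previous send stays allocated until the mailbox
 * reports not busy, since hardware may still be copying from it after
 * a send timeout.
 */
struct vsi_mbox {
	bool busy;		/* stand-in for the hardware busy status */
	void *prev_msg;		/* message buffer kept alive across sends */
};

static int vsi_mbox_send(struct vsi_mbox *mb, size_t size)
{
	if (mb->busy)
		return -1;	/* previous message still being processed */

	free(mb->prev_msg);	/* only now is the old buffer safe to free */
	mb->prev_msg = malloc(size);
	return mb->prev_msg ? 0 : -2;
}
```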
Fixes: beb74ac878 ("enetc: Add vf to pf messaging support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260429081930.3259824-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv6_{skip_exthdr,find_hdr}() and ip6_{tnl_parse_tlv_enc_lim,
protocol_deliver_rcu}() iterate over IPv6 extension headers until they
find a non-extension-header protocol or run out of packet data. The
loops have no iteration counter, relying solely on the packet length
to bound them. For a crafted packet with 8-byte extension headers
filling a 64KB jumbogram, this means a worst case of up to ~8k
iterations with a skb_header_pointer call each. ipv6_skip_exthdr(),
for example, is used where it parses the inner quoted packet inside
an incoming ICMPv6 error:
- icmpv6_rcv
- checksum validation
- case ICMPV6_DEST_UNREACH
- icmpv6_notify
- pskb_may_pull() <- pull inner IPv6 header
- ipv6_skip_exthdr() <- iterates here
- pskb_may_pull()
- ipprot->err_handler() <- sk lookup
The per-iteration cost of ipv6_skip_exthdr itself is generally
light, but skb_header_pointer becomes more costly on reassembled
packets: the first ~1232 bytes of the inner packet are in the skb's
linear area, but the remaining ~63KB are in the frag_list where
skb_copy_bits is needed to read data.
Initially, the idea was to add a configurable limit via a new
sysctl knob with default 8, in line with knobs from commit
47d3d7ac65 ("ipv6: Implement limits on Hop-by-Hop and Destination
options"), but two reasons eventually argued against it:
- It adds to UAPI that needs to be maintained forever, and
upcoming work is restricting extension header ordering anyway,
leaving little reason for another sysctl knob
- exthdrs_core.c is always built-in even when CONFIG_IPV6=n,
where struct net has no .ipv6 member, so the read site would
need an ifdef'd fallback to a constant anyway
Therefore, just use a constant (IP6_MAX_EXT_HDRS_CNT). All four
extension header walking functions are now bound by this limit.
Note that the check in ip6_protocol_deliver_rcu() happens right
before the goto resubmit, such that we don't have to have a test
for ipv6_ext_hdr() in the fast-path.
There's an ongoing IETF draft-iurman-6man-eh-occurrences to enforce
IPv6 extension headers ordering and occurrence. The latter also
discusses security implications. As per RFC8200 section 4.1, the
occurrence rules for extension headers provide a practical upper
bound which is 8. In order to be conservative, let's define
IP6_MAX_EXT_HDRS_CNT as 12 to leave enough room for quirky setups.
In the unlikely event that this is still not enough, then we might
need to reconsider a sysctl.
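The bounded walk can be illustrated with a toy model (this is not the
kernel implementation; a zero byte stands in for an extension header):

```c
#include <assert.h>
#include <stddef.h>

#define IP6_MAX_EXT_HDRS_CNT	12	/* limit introduced by this patch */

/*
 * Step through "next header" values and give up once the iteration
 * budget is exhausted, instead of letting a crafted 64KB jumbogram
 * force thousands of per-header reads.
 */
static int skip_exthdrs(const unsigned char *nexthdr, size_t len)
{
	size_t off = 0;
	int cnt = 0;

	/* here 0 marks an extension header, anything else a payload */
	while (off < len && nexthdr[off] == 0) {
		if (++cnt > IP6_MAX_EXT_HDRS_CNT)
			return -1;	/* too many extension headers */
		off++;
	}
	return (int)off;	/* offset of the upper-layer header */
}
```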
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
Link: https://patch.msgid.link/20260429154648.809751-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
LAN8814 QSGMII soft reset was moved into the probe function to avoid
triggering it for each of the 4 PHYs in the package.
However, that broke the QSGMII link between the MAC and PHY on most LAN8814
PHYs, specifically for us on the Microchip LAN969x switch.
Reading the QSGMII status registers it was visible that lanes were only
partially synced.
It looks like the reset timing is crucial, so let's move the reset back
into the .config_init function but guard it with phy_package_init_once()
to avoid it being triggered for each of the 4 PHYs in the package.
Change the probe function to use phy_package_probe_once() for coma and PtP
setup.
Fixes: 96a9178a29 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Link: https://patch.msgid.link/20260428134138.1741253-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Address two issues in the inline pppoe encapsulation:
- Add needs_gso_segment flag to segment PPPoE packets in software
given that there is no GSO support for this.
- Use FLOW_OFFLOAD_XMIT_DIRECT since the neighbour cache is not
available for point-to-point devices; use the hardware address that is
obtained via flowtable path discovery (ie. fill_forward_path).
Fixes: 18d27bed08 ("netfilter: flowtable: inline pppoe encapsulation in xmit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Several issues in the inline vlan support:
- The layer 2 encapsulation representation in the tuple takes encap[0] as
the outer header and encap[1] as the inner header as seen from the ingress
path. Reverse the encap loop to push first the inner then the outer vlan
header.
- Postpone pushing the layer 2 header until the destination device is
known. This makes it possible to calculate the needed headroom via
LL_RESERVED_SPACE to accommodate the layer 2 headers.
- Add and use nf_flow_vlan_push() as suggested by Eric Woudstra, this
is a simplified version of skb_vlan_push() for egress path only.
Fixes: c653d5a78f ("netfilter: flowtable: inline vlan encapsulation in xmit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In our test cases, we typically feed a packet sequence into the routing
code, then inspect the device's TXed skbs to assert specific behaviours.
Using dev_queue_xmit() for our TX path introduces a fair bit of
complexity between the test packet sequence and the test device's
ndo_start_xmit callback, which may mean that the skbs have not hit the
device at the point we're inspecting the TXed skb list.
Use dev_direct_xmit instead, as we want as direct a path as possible
here, and the test dev does not need any queueing, scheduling or flow
control.
Fixes: 6ab578739a ("net: mctp: test: move TX packetqueue from dst to dev")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202604281320.525eee17-lkp@intel.com
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429-dev-mctp-test-fixes-v1-2-1127b7425809@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>