linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-20 20:15:23 -04:00

Author	SHA1	Message	Date
Paolo Abeni	83310d6133	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Merge in late fixes in preparation for the net-next PR. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-11 15:14:35 +01:00
Eric Dumazet	97d7ae6e14	tcp: inet6_csk_xmit() optimization After prior patches, inet6_csk_xmit() can reuse inet->cork.fl.u.ip6 if __sk_dst_check() returns a valid dst. Otherwise call inet6_csk_route_socket() to refresh inet->cork.fl.u.ip6 content and get a new dst. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	a6eee39cc2	tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock() As explained in commit `85d05e2817` ("ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6"): TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. After this patch, passive TCP ipv6 flows have correctly initialized inet->cork.fl.u.ip6 structure. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	19bdb267f7	tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect() Instead of using private @fl6 and @final variables use respectively inet->cork.fl.u.ip6 and np->final. As explained in commit `85d05e2817` ("ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6"): TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. After this patch, active TCP ipv6 flows have correctly initialized inet->cork.fl.u.ip6 structure. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	969a20198b	ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6 Convert inet6_csk_route_socket() to use np->final instead of an automatic variable to get rid of a stack canary. Convert inet6_csk_xmit() and inet6_csk_update_pmtu() to use inet->cork.fl.u.ip6 instead of @fl6 automatic variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:50 -08:00
Eric Dumazet	4e6c91cf60	ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update() Get rid of @fl6 and &final variables in ip6_datagram_dst_update(). Use instead inet->cork.fl.u.ip6 and np->final so that a stack canary is no longer needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:49 -08:00
Eric Dumazet	3d3f075e80	ipv6: use np->final in inet6_sk_rebuild_header() Instead of using an automatic variable, use np->final to get rid of the stack canary in inet6_sk_rebuild_header(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260206173426.1638518-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-10 20:57:49 -08:00
Alice Mikityanska	1676ebba39	net/ipv6: Remove jumbo_remove step from TX path Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove unnecessary steps from the GSO TX path, that used to check and remove HBH. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-5-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:12 -08:00
Alice Mikityanska	81be30c1f5	net/ipv6: Drop HBH for BIG TCP on RX side Complementary to the previous commit, stop inserting HBH when building BIG TCP GRO SKBs. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-4-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:12 -08:00
Alice Mikityanska	741d069aa4	net/ipv6: Drop HBH for BIG TCP on TX side BIG TCP IPv6 inserts a hop-by-hop extension header to indicate the real IPv6 payload length when it doesn't fit into the 16-bit field in the IPv6 header itself. While it helps tools parse the packet, it also requires every driver that supports TSO and BIG TCP to remove this 8-byte extension header. It might not sound that bad until we try to apply it to tunneled traffic. Currently, the drivers don't attempt to strip HBH if skb->encapsulation = 1. Moreover, trying to do so would require dissecting different tunnel protocols and making corresponding adjustments on case-by-case basis, which would slow down the fastpath (potentially also requiring adjusting checksums in outer headers). At the same time, BIG TCP IPv4 doesn't insert any extra headers and just calculates the payload length from skb->len, significantly simplifying implementing BIG TCP for tunnels. Stop inserting HBH when building BIG TCP GSO SKBs. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-3-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:12 -08:00
Alice Mikityanska	b2936b4fd5	net/ipv6: Introduce payload_len helpers The next commits will transition away from using the hop-by-hop extension header to encode packet length for BIG TCP. Add wrappers around ip6->payload_len that return the actual value if it's non-zero, and calculate it from skb->len if payload_len is set to zero (and a symmetrical setter). The new helpers are used wherever the surrounding code supports the hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h). No behavioral change in this commit. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:50:03 -08:00
Eric Dumazet	a14d931790	ipv6: do not use skb_header_pointer() in icmpv6_filter() Prefer pskb_may_pull() to avoid a stack canary in raw6_local_deliver(). Note: skb->head can change, hence we reload ip6h pointer in ipv6_raw_deliver() $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-86 (-86) Function old new delta raw6_local_deliver 780 694 -86 Total: Before=24889784, After=24889698, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205211909.4115285-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-06 20:34:20 -08:00
Eric Dumazet	c89477ad79	inet: RAW sockets using IPPROTO_RAW MUST drop incoming ICMP Yizhou Zhao reported that simply having one RAW socket on protocol IPPROTO_RAW (255) was dangerous. socket(AF_INET, SOCK_RAW, 255); A malicious incoming ICMP packet can set the protocol field to 255 and match this socket, leading to FNHE cache changes. inner = IP(src="192.168.2.1", dst="8.8.8.8", proto=255)/Raw("TEST") pkt = IP(src="192.168.1.1", dst="192.168.2.1")/ICMP(type=3, code=4, nexthopmtu=576)/inner "man 7 raw" states: A protocol of IPPROTO_RAW implies enabled IP_HDRINCL and is able to send any IP protocol that is specified in the passed header. Receiving of all IP protocols via IPPROTO_RAW is not possible using raw sockets. Make sure we drop these malicious packets. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Link: https://lore.kernel.org/netdev/20251109134600.292125-1-zhaoyz24@mails.tsinghua.edu.cn/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260203192509.682208-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 12:36:49 -08:00
Jakub Kicinski	a182a62ff7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc9). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c `3125fc1701` ("net: spacemit: k1-emac: fix jumbo frame support") `f66086798f` ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:54:08 -08:00
Eric Dumazet	85d05e2817	ipv6: change inet6_sk_rebuild_header() to use inet->cork.fl.u.ip6 TCP v6 spends a good amount of time rebuilding a fresh fl6 at each transmit in inet6_csk_xmit()/inet6_csk_route_socket(). TCP v4 caches the information in inet->cork.fl.u.ip4 instead. This patch is a first step converting IPv6 to the same strategy: Before this patch inet6_sk_rebuild_header() only validated/rebuilt a dst. Automatic variable @fl6 content was lost. After this patch inet6_sk_rebuild_header() also initializes inet->cork.fl.u.ip6, which can be reused in the future. This makes inet6_sk_rebuild_header() very similar to inet_sk_rebuild_header(). Also remove the EXPORT_SYMBOL_GPL(), inet6_sk_rebuild_header() is not called from any module. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260204163035.4123817-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:24:10 -08:00
Shigeru Yoshida	bbf4a17ad9	ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF syzbot reported a kernel BUG in fib6_add_rt2node() when adding an IPv6 route. [0] Commit `f72514b3c5` ("ipv6: clear RA flags when adding a static route") introduced logic to clear RTF_ADDRCONF from existing routes when a static route with the same nexthop is added. However, this causes a problem when the existing route has a gateway. When RTF_ADDRCONF is cleared from a route that has a gateway, that route becomes eligible for ECMP, i.e. rt6_qualify_for_ecmp() returns true. The issue is that this route was never added to the fib6_siblings list. This leads to a mismatch between the following counts: - The sibling count computed by iterating fib6_next chain, which includes the newly ECMP-eligible route - The actual siblings in fib6_siblings list, which does not include that route When a subsequent ECMP route is added, fib6_add_rt2node() hits BUG_ON(sibling->fib6_nsiblings != rt->fib6_nsiblings) because the counts don't match. Fix this by only clearing RTF_ADDRCONF when the existing route does not have a gateway. Routes without a gateway cannot qualify for ECMP anyway (rt6_qualify_for_ecmp() requires fib_nh_gw_family), so clearing RTF_ADDRCONF on them is safe and matches the original intent of the commit. [0]: kernel BUG at net/ipv6/ip6_fib.c:1217! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6010 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:fib6_add_rt2node+0x3433/0x3470 net/ipv6/ip6_fib.c:1217 [...] Call Trace: <TASK> fib6_add+0x8da/0x18a0 net/ipv6/ip6_fib.c:1532 __ip6_ins_rt net/ipv6/route.c:1351 [inline] ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3946 ipv6_route_ioctl+0x35c/0x480 net/ipv6/route.c:4571 inet6_ioctl+0x219/0x280 net/ipv6/af_inet6.c:577 sock_do_ioctl+0xdc/0x300 net/socket.c:1245 sock_ioctl+0x576/0x790 net/socket.c:1366 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: `f72514b3c5` ("ipv6: clear RA flags when adding a static route") Reported-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cb809def1baaac68ab92 Tested-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260204095837.1285552-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 08:38:40 -08:00
Eric Dumazet	b409a7f717	ipv6: colocate inet6_cork in inet_cork_full All inet6_cork users also use one inet_cork_full. Reduce number of parameters and increase data locality. This saves ~275 bytes of code on x86_64. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:30 -08:00
Eric Dumazet	fe8570186f	ipv4: use dst4_mtu() instead of dst_mtu() When we expect an IPv4 dst, use dst4_mtu() instead of dst_mtu() to save some code space. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:29 -08:00
Eric Dumazet	b40f0130a2	ipv6: use dst6_mtu() instead of dst_mtu() When we expect an IPv6 dst, use dst6_mtu() instead of dst_mtu() to save some code space. Due to current dst6_mtu() implementation, only convert users in IPv6 stack. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:29 -08:00
Eric Dumazet	94a150bf14	ipv6: use SKB_DROP_REASON_PKT_TOO_BIG in ip6_xmit() When a too big packet is dropped, use SKB_DROP_REASON_PKT_TOO_BIG. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:29 -08:00
Eric Dumazet	b5b1b676a3	ipv6: use __skb_push() in ip6_xmit() ip6_xmit() makes sure there is enough headroom in the skb, it can uses __skb_push() instead of the out-of-line skb_push(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:29 -08:00
Eric Dumazet	2855e49254	ipv6: add some unlikely()/likely() clauses in ip6_output.c 1) daddr is unlikely a multicast in ip6_finish_output2(). 2) ip6_finish_output_gso_slowpath_drop() should not be called often. 3) ip6_fragment() should not be called often. 4) opt is unlikely to be set. 5) ip6_xmit() and ip6_forward() mostly sends not too big packets. 6) Most __ip6_make_skb() calls are for UDP packets, not ICMPV6 ones. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:29 -08:00
Eric Dumazet	1bc46dd209	ipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts() With CONFIG_STACKPROTECTOR_STRONG=y, it is better to avoid passing a pointer to an automatic variable. Change these exported functions to return 'u8 proto' instead of void. - ipv6_push_nfrag_opts() - ipv6_push_frag_opts() For instance, replace ipv6_push_frag_opts(skb, opt, &proto); with: proto = ipv6_push_frag_opts(skb, opt, proto); Note that even after this change, ip6_xmit() has to use a stack canary because of @first_hop variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:49:28 -08:00
Eric Dumazet	ed9b70040d	tcp: reduce tcp sockets size by one cache line By default, when a kmem_cache is created with SLAB_TYPESAFE_BY_RCU, slub has to use extra storage for the freelist pointer after each object, because slub assumes that any bit in the object can be used by RCU readers. Because proto_register() is also using SLAB_HWCACHE_ALIGN, this forces slub to use one extra cache line per object. We can instead put the slub freelist anywhere in the object, granted the concurrent RCU readers are not supposed to use the pointer value. Add a new (struct sock)sk_freeptr field, in an union with sk_rcu: No RCU readers would need to look at sk_rcu, which is only used at free phase. Tested: grep . /sys/kernel/slab/TCP/{object_size,slab_size,objs_per_slab} grep . /sys/kernel/slab/TCPv6/{object_size,slab_size,objs_per_slab} Before: /sys/kernel/slab/TCP/object_size:2368 /sys/kernel/slab/TCP/slab_size:2432 /sys/kernel/slab/TCP/objs_per_slab:13 /sys/kernel/slab/TCPv6/object_size:2496 /sys/kernel/slab/TCPv6/slab_size:2560 /sys/kernel/slab/TCPv6/objs_per_slab:12 After this patch, we can pack one more TCPv6 object per slab, and object_size == slab_size. /sys/kernel/slab/TCP/object_size:2368 /sys/kernel/slab/TCP/slab_size:2368 /sys/kernel/slab/TCP/objs_per_slab:13 /sys/kernel/slab/TCPv6/object_size:2496 /sys/kernel/slab/TCPv6/slab_size:2496 /sys/kernel/slab/TCPv6/objs_per_slab:13 Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260129153458.4163797-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-30 17:15:51 -08:00
Eric Dumazet	b1cd687e3e	ipv6: optimize fl6_update_dst() fl6_update_dst() is called for every TCP (and others) transmit, and is a nop for common cases. Split it in two parts : 1) fl6_update_dst() inline helper, small and fast. 2) __fl6_update_dst() for the exception, out of line. Small size increase to get better TX performance. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/2 grow/shrink: 8/0 up/down: 296/-125 (171) Function old new delta __fl6_update_dst - 104 +104 rawv6_sendmsg 2244 2284 +40 udpv6_sendmsg 3013 3043 +30 tcp_v6_connect 1514 1534 +20 cookie_v6_check 1501 1519 +18 ip6_datagram_dst_update 673 690 +17 inet6_sk_rebuild_header 499 516 +17 inet6_csk_route_socket 507 524 +17 inet6_csk_route_req 343 360 +17 __pfx___fl6_update_dst - 16 +16 __pfx_fl6_update_dst 16 - -16 fl6_update_dst 109 - -109 Total: Before=22570304, After=22570475, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260128185548.3738781-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-29 18:47:21 -08:00
Jakub Kicinski	a010fe8d86	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc8). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c `2c84959167` ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()") `f66086798f` ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-29 17:28:54 -08:00
Jibin Zhang	426ca15c7f	net: fix segmentation of forwarding fraglist GRO This patch enhances GSO segment handling by properly checking the SKB_GSO_DODGY flag for frag_list GSO packets, addressing low throughput issues observed when a station accesses IPv4 servers via hotspots with an IPv6-only upstream interface. Specifically, it fixes a bug in GSO segmentation when forwarding GRO packets containing a frag_list. The function skb_segment_list cannot correctly process GRO skbs that have been converted by XLAT, since XLAT only translates the header of the head skb. Consequently, skbs in the frag_list may remain untranslated, resulting in protocol inconsistencies and reduced throughput. To address this, the patch explicitly sets the SKB_GSO_DODGY flag for GSO packets in XLAT's IPv4/IPv6 protocol translation helpers (bpf_skb_proto_4_to_6 and bpf_skb_proto_6_to_4). This marks GSO packets as potentially modified after protocol translation. As a result, GSO segmentation will avoid using skb_segment_list and instead falls back to skb_segment for packets with the SKB_GSO_DODGY flag. This ensures that only safe and fully translated frag_list packets are processed by skb_segment_list, resolving protocol inconsistencies and improving throughput when forwarding GRO packets converted by XLAT. Signed-off-by: Jibin Zhang <jibin.zhang@mediatek.com> Fixes: `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260126152114.1211-1-jibin.zhang@mediatek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-29 14:40:12 +01:00
Lorenzo Bianconi	d98103575d	netfilter: flowtable: Add IP6IP6 rx sw acceleration Introduce sw acceleration for rx path of IP6IP6 tunnels relying on the netfilter flowtable infrastructure. Subsequent patches will add sw acceleration for IP6IP6 tunnels tx path. IP6IP6 rx sw acceleration can be tested running the following scenario where the traffic is forwarded between two NICs (eth0 and eth1) and an IP6IP6 tunnel is used to access a remote site (using eth1 as the underlay device): ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (2001:db8:3::2) $ip addr show 6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff inet6 2001:db8:1::2/64 scope global nodad valid_lft forever preferred_lft forever 7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff inet6 2001:db8:2::1/64 scope global nodad valid_lft forever preferred_lft forever 8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000 link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr ce9c:2940:7dcc:: inet6 2002:db8:1::1/64 scope global nodad valid_lft forever preferred_lft forever $ip -6 route show 2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium 2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium default via 2002:db8:1::2 dev tun0 metric 1024 pref medium $nft list ruleset table inet filter { flowtable ft { hook ingress priority filter devices = { eth0, eth1 } } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp } flow add @ft } } Reproducing the scenario described above using veths I got the following results: - TCP stream received from the IPIP tunnel: - net-next: (baseline) ~ 81Gbps - net-next + IP6IP6 flowtbale support: ~112Gbps Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-29 09:52:06 +01:00
Eric Dumazet	2f80b2797a	ipv6: remove __inet6_csk_dst_check() __inet6_csk_dst_check() is a very simple wrapper with no value, it is used only once. Directly use __sk_dst_check(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260127211203.1524339-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-28 19:37:45 -08:00
Eric Biggers	5023479627	ipv6: Switch to higher-level SHA-1 functions There's now a proper SHA-1 API that follows the usual conventions for hash function APIs: sha1_init(), sha1_update(), sha1_final(), sha1(). The only remaining user of the older low-level SHA-1 API, sha1_init_raw() and sha1_transform(), is ipv6_generate_stable_address(). I'd like to remove this older API, which is too low-level. Unfortunately, ipv6_generate_stable_address() does in fact skip the SHA-1 finalization for some reason. So the values it computes are not standard SHA-1 values, and it sort of does want the low-level API. Still, it's still possible to use the higher-level functions sha1_init() and sha1_update() to get the same result, provided that the resulting state is used directly, skipping sha1_final(). So, let's do that instead. This will allow removing the low-level API. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260123051656.396371-2-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-27 15:47:40 -08:00
Fernando Fernandez Mancera	03cbcdf938	ipv6: use the right ifindex when replying to icmpv6 from localhost When replying to a ICMPv6 echo request that comes from localhost address the right output ifindex is 1 (lo) and not rt6i_idev dev index. Use the skb device ifindex instead. This fixes pinging to a local address from localhost source address. $ ping6 -I ::1 2001:1:1::2 -c 3 PING 2001:1:1::2 (2001:1:1::2) from ::1 : 56 data bytes 64 bytes from 2001:1:1::2: icmp_seq=1 ttl=64 time=0.037 ms 64 bytes from 2001:1:1::2: icmp_seq=2 ttl=64 time=0.069 ms 64 bytes from 2001:1:1::2: icmp_seq=3 ttl=64 time=0.122 ms 2001:1:1::2 ping statistics 3 packets transmitted, 3 received, 0% packet loss, time 2032ms rtt min/avg/max/mdev = 0.037/0.076/0.122/0.035 ms Fixes: `1b70d792cf` ("ipv6: Use rt6i_idev index for echo replies to a local address") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260121194409.6749-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-25 13:16:47 -08:00
Jakub Kicinski	9abf22075d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc7). Conflicts: drivers/net/ethernet/huawei/hinic3/hinic3_irq.c `b35a6fd37a` ("hinic3: Add adaptive IRQ coalescing with DIM") `fb2bb2a1eb` ("hinic3: Fix netif_queue_set_napi queue_index input parameter error") https://lore.kernel.org/fc0a7fdf08789a52653e8ad05281a0a849e79206.1768915707.git.zhuyikai1@h-partners.com drivers/net/wireless/ath/ath12k/mac.c drivers/net/wireless/ath/ath12k/wifi7/hw.c `3170757210` ("wifi: ath12k: Fix wrong P2P device link id issue") `c26f294fef` ("wifi: ath12k: Move ieee80211_ops callback to the arch specific module") https://lore.kernel.org/20260114123751.6a208818@canb.auug.org.au Adjacent changes: drivers/net/wireless/ath/ath12k/mac.c `8b8d6ee53d` ("wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel") `914c890d3b` ("wifi: ath12k: Add framework for hardware specific ieee80211_ops registration") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-22 20:14:36 -08:00
Eric Dumazet	b8d9b7daf0	gro: inline tcp6_gro_complete() Remove one function call from GRO stack for native IPv6 + TCP packets. $ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3 add/remove: 0/0 grow/shrink: 1/1 up/down: 298/-5 (293) Function old new delta ipv6_gro_complete 435 733 +298 tcp6_gro_complete 311 306 -5 Total: Before=22593532, After=22593825, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260120164903.1912995-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-21 19:28:32 -08:00
Eric Dumazet	87737cd76e	gro: inline tcp6_gro_receive() FDO/LTO are unable to inline tcp6_gro_receive() from ipv6_gro_receive() Make sure tcp6_check_fraglist_gro() is only called only when needed, so that compiler can leave it out-of-line. $ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2 add/remove: 2/0 grow/shrink: 3/1 up/down: 1123/-253 (870) Function old new delta ipv6_gro_receive 1069 1846 +777 tcp6_check_fraglist_gro - 272 +272 ipv6_offload_init 218 274 +56 __pfx_tcp6_check_fraglist_gro - 16 +16 ipv6_gro_complete 433 435 +2 tcp6_gro_receive 959 706 -253 Total: Before=22592662, After=22593532, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260120164903.1912995-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-21 19:28:32 -08:00
Eric Dumazet	a4674aa58b	tcp: preserve const qualifier in tcp_rsk() and inet_rsk() We can change tcp_rsk() and inet_rsk() to propagate their argument const qualifier thanks to container_of_const(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260120125353.1470456-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-21 19:20:04 -08:00
Eric Dumazet	9a063f96d8	ipv6: annotate data-race in ndisc_router_discovery() syzbot found that ndisc_router_discovery() could read and write in6_dev->ra_mtu without holding a lock [1] This looks fine, IFLA_INET6_RA_MTU is best effort. Add READ_ONCE()/WRITE_ONCE() to document the race. Note that we might also reject illegal MTU values (mtu < IPV6_MIN_MTU \|\| mtu > skb->dev->mtu) in a future patch. [1] BUG: KCSAN: data-race in ndisc_router_discovery / ndisc_router_discovery read to 0xffff888119809c20 of 4 bytes by task 25817 on cpu 1: ndisc_router_discovery+0x151d/0x1c90 net/ipv6/ndisc.c:1558 ndisc_rcv+0x2ad/0x3d0 net/ipv6/ndisc.c:1841 icmpv6_rcv+0xe5a/0x12f0 net/ipv6/icmp.c:989 ip6_protocol_deliver_rcu+0xb2a/0x10d0 net/ipv6/ip6_input.c:438 ip6_input_finish+0xf0/0x1d0 net/ipv6/ip6_input.c:489 NF_HOOK include/linux/netfilter.h:318 [inline] ip6_input+0x5e/0x140 net/ipv6/ip6_input.c:500 ip6_mc_input+0x27c/0x470 net/ipv6/ip6_input.c:590 dst_input include/net/dst.h:474 [inline] ip6_rcv_finish+0x336/0x340 net/ipv6/ip6_input.c:79 ... write to 0xffff888119809c20 of 4 bytes by task 25816 on cpu 0: ndisc_router_discovery+0x155a/0x1c90 net/ipv6/ndisc.c:1559 ndisc_rcv+0x2ad/0x3d0 net/ipv6/ndisc.c:1841 icmpv6_rcv+0xe5a/0x12f0 net/ipv6/icmp.c:989 ip6_protocol_deliver_rcu+0xb2a/0x10d0 net/ipv6/ip6_input.c:438 ip6_input_finish+0xf0/0x1d0 net/ipv6/ip6_input.c:489 NF_HOOK include/linux/netfilter.h:318 [inline] ip6_input+0x5e/0x140 net/ipv6/ip6_input.c:500 ip6_mc_input+0x27c/0x470 net/ipv6/ip6_input.c:590 dst_input include/net/dst.h:474 [inline] ip6_rcv_finish+0x336/0x340 net/ipv6/ip6_input.c:79 ... value changed: 0x00000000 -> 0xe5400659 Fixes: `49b99da2c9` ("ipv6: add IFLA_INET6_RA_MTU to expose mtu value") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Rocco Yue <rocco.yue@mediatek.com> Link: https://patch.msgid.link/20260118152941.2563857-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-20 18:37:45 -08:00
Eric Dumazet	f062e8e251	ipv6: annotate data-races in net/ipv6/route.c sysctls are read while their values can change, add READ_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:43 -08:00
Eric Dumazet	978b67d283	ipv6: exthdrs: annotate data-race over multiple sysctl Following four sysctls can change under us, add missing READ_ONCE(). - ipv6.sysctl.max_dst_opts_len - ipv6.sysctl.max_dst_opts_cnt - ipv6.sysctl.max_hbh_opts_len - ipv6.sysctl.max_hbh_opts_cnt Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:43 -08:00
Eric Dumazet	12eddc6857	ipv6: annotate data-races around sysctl.ip6_rt_gc_interval Add READ_ONCE() on lockless reads of net->ipv6.sysctl.ip6_rt_gc_interval Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	5ade47c974	ipv6: annotate data-races over sysctl.flowlabel_reflect Add missing READ_ONCE() when reading ipv6.sysctl.flowlabel_reflect, as its value can be changed under us. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Jakub Kicinski	c27022497d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-15 18:02:48 -08:00
Paolo Abeni	5ce234a8fe	Merge tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-01-14 1) Fix inner mode lookup in tunnel mode GSO segmentation. The protocol was taken from the wrong field. 2) Set ipv4 no_pmtu_disc flag only on output SAs. The insertation of input SAs can fail if no_pmtu_disc is set. Please pull or let me know if there are problems. ipsec-2026-01-14 * tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is set xfrm: Fix inner mode lookup in tunnel mode GSO segmentation ==================== Link: https://patch.msgid.link/20260114121817.1106134-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-15 12:46:12 +01:00
Kuniyuki Iwashima	ddf96c393a	ipv6: Fix use-after-free in inet6_addr_del(). syzbot reported use-after-free of inet6_ifaddr in inet6_addr_del(). [0] The cited commit accidentally moved ipv6_del_addr() for mngtmpaddr before reading its ifp->flags for temporary addresses in inet6_addr_del(). Let's move ipv6_del_addr() down to fix the UAF. [0]: BUG: KASAN: slab-use-after-free in inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117 Read of size 4 at addr ffff88807b89c86c by task syz.3.1618/9593 CPU: 0 UID: 0 PID: 9593 Comm: syz.3.1618 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xcd/0x630 mm/kasan/report.c:482 kasan_report+0xe0/0x110 mm/kasan/report.c:595 inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117 addrconf_del_ifaddr+0x11e/0x190 net/ipv6/addrconf.c:3181 inet6_ioctl+0x1e5/0x2b0 net/ipv6/af_inet6.c:582 sock_do_ioctl+0x118/0x280 net/socket.c:1254 sock_ioctl+0x227/0x6b0 net/socket.c:1375 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f164cf8f749 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f164de64038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f164d1e5fa0 RCX: 00007f164cf8f749 RDX: 0000200000000000 RSI: 0000000000008936 RDI: 0000000000000003 RBP: 00007f164d013f91 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f164d1e6038 R14: 00007f164d1e5fa0 R15: 00007ffde15c8288 </TASK> Allocated by task 9593: kasan_save_stack+0x33/0x60 mm/kasan/common.c:56 kasan_save_track+0x14/0x30 mm/kasan/common.c:77 poison_kmalloc_redzone mm/kasan/common.c:397 [inline] __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:414 kmalloc_noprof include/linux/slab.h:957 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] ipv6_add_addr+0x4e3/0x2010 net/ipv6/addrconf.c:1120 inet6_addr_add+0x256/0x9b0 net/ipv6/addrconf.c:3050 addrconf_add_ifaddr+0x1fc/0x450 net/ipv6/addrconf.c:3160 inet6_ioctl+0x103/0x2b0 net/ipv6/af_inet6.c:580 sock_do_ioctl+0x118/0x280 net/socket.c:1254 sock_ioctl+0x227/0x6b0 net/socket.c:1375 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 6099: kasan_save_stack+0x33/0x60 mm/kasan/common.c:56 kasan_save_track+0x14/0x30 mm/kasan/common.c:77 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584 poison_slab_object mm/kasan/common.c:252 [inline] __kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284 kasan_slab_free include/linux/kasan.h:234 [inline] slab_free_hook mm/slub.c:2540 [inline] slab_free_freelist_hook mm/slub.c:2569 [inline] slab_free_bulk mm/slub.c:6696 [inline] kmem_cache_free_bulk mm/slub.c:7383 [inline] kmem_cache_free_bulk+0x2bf/0x680 mm/slub.c:7362 kfree_bulk include/linux/slab.h:830 [inline] kvfree_rcu_bulk+0x1b7/0x1e0 mm/slab_common.c:1523 kvfree_rcu_drain_ready mm/slab_common.c:1728 [inline] kfree_rcu_monitor+0x1d0/0x2f0 mm/slab_common.c:1801 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257 process_scheduled_works kernel/workqueue.c:3340 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421 kthread+0x3c5/0x780 kernel/kthread.c:463 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 Fixes: `00b5b7aab9` ("net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged") Reported-by: syzbot+72e610f4f1a930ca9d8a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/696598e9.050a0220.3be5c5.0009.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260113010538.2019411-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-13 19:09:11 -08:00
Eric Dumazet	9a6f0c4d57	dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list() syzbot was able to crash the kernel in rt6_uncached_list_flush_dev() in an interesting way [1] Crash happens in list_del_init()/INIT_LIST_HEAD() while writing list->prev, while the prior write on list->next went well. static inline void INIT_LIST_HEAD(struct list_head *list) { WRITE_ONCE(list->next, list); // This went well WRITE_ONCE(list->prev, list); // Crash, @list has been freed. } Issue here is that rt6_uncached_list_del() did not attempt to lock ul->lock, as list_empty(&rt->dst.rt_uncached) returned true because the WRITE_ONCE(list->next, list) happened on the other CPU. We might use list_del_init_careful() and list_empty_careful(), or make sure rt6_uncached_list_del() always grabs the spinlock whenever rt->dst.rt_uncached_list has been set. A similar fix is neeed for IPv4. [1] BUG: KASAN: slab-use-after-free in INIT_LIST_HEAD include/linux/list.h:46 [inline] BUG: KASAN: slab-use-after-free in list_del_init include/linux/list.h:296 [inline] BUG: KASAN: slab-use-after-free in rt6_uncached_list_flush_dev net/ipv6/route.c:191 [inline] BUG: KASAN: slab-use-after-free in rt6_disable_ip+0x633/0x730 net/ipv6/route.c:5020 Write of size 8 at addr ffff8880294cfa78 by task kworker/u8:14/3450 CPU: 0 UID: 0 PID: 3450 Comm: kworker/u8:14 Tainted: G L syzkaller #0 PREEMPT_{RT,(full)} Tainted: [L]=SOFTLOCKUP Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 Workqueue: netns cleanup_net Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xca/0x240 mm/kasan/report.c:482 kasan_report+0x118/0x150 mm/kasan/report.c:595 INIT_LIST_HEAD include/linux/list.h:46 [inline] list_del_init include/linux/list.h:296 [inline] rt6_uncached_list_flush_dev net/ipv6/route.c:191 [inline] rt6_disable_ip+0x633/0x730 net/ipv6/route.c:5020 addrconf_ifdown+0x143/0x18a0 net/ipv6/addrconf.c:3853 addrconf_notify+0x1bc/0x1050 net/ipv6/addrconf.c:-1 notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85 call_netdevice_notifiers_extack net/core/dev.c:2268 [inline] call_netdevice_notifiers net/core/dev.c:2282 [inline] netif_close_many+0x29c/0x410 net/core/dev.c:1785 unregister_netdevice_many_notify+0xb50/0x2330 net/core/dev.c:12353 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline] ops_undo_list+0x3dc/0x990 net/core/net_namespace.c:248 cleanup_net+0x4de/0x7b0 net/core/net_namespace.c:696 process_one_work kernel/workqueue.c:3257 [inline] process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 </TASK> Allocated by task 803: kasan_save_stack mm/kasan/common.c:57 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:78 unpoison_slab_object mm/kasan/common.c:340 [inline] __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366 kasan_slab_alloc include/linux/kasan.h:253 [inline] slab_post_alloc_hook mm/slub.c:4953 [inline] slab_alloc_node mm/slub.c:5263 [inline] kmem_cache_alloc_noprof+0x18d/0x6c0 mm/slub.c:5270 dst_alloc+0x105/0x170 net/core/dst.c:89 ip6_dst_alloc net/ipv6/route.c:342 [inline] icmp6_dst_alloc+0x75/0x460 net/ipv6/route.c:3333 mld_sendpack+0x683/0xe60 net/ipv6/mcast.c:1844 mld_send_cr net/ipv6/mcast.c:2154 [inline] mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693 process_one_work kernel/workqueue.c:3257 [inline] process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 Freed by task 20: kasan_save_stack mm/kasan/common.c:57 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:78 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584 poison_slab_object mm/kasan/common.c:253 [inline] __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285 kasan_slab_free include/linux/kasan.h:235 [inline] slab_free_hook mm/slub.c:2540 [inline] slab_free mm/slub.c:6670 [inline] kmem_cache_free+0x18f/0x8d0 mm/slub.c:6781 dst_destroy+0x235/0x350 net/core/dst.c:121 rcu_do_batch kernel/rcu/tree.c:2605 [inline] rcu_core kernel/rcu/tree.c:2857 [inline] rcu_cpu_kthread+0xba5/0x1af0 kernel/rcu/tree.c:2945 smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 Last potentially related work creation: kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556 __call_rcu_common kernel/rcu/tree.c:3119 [inline] call_rcu+0xee/0x890 kernel/rcu/tree.c:3239 refdst_drop include/net/dst.h:266 [inline] skb_dst_drop include/net/dst.h:278 [inline] skb_release_head_state+0x71/0x360 net/core/skbuff.c:1156 skb_release_all net/core/skbuff.c:1180 [inline] __kfree_skb net/core/skbuff.c:1196 [inline] sk_skb_reason_drop+0xe9/0x170 net/core/skbuff.c:1234 kfree_skb_reason include/linux/skbuff.h:1322 [inline] tcf_kfree_skb_list include/net/sch_generic.h:1127 [inline] __dev_xmit_skb net/core/dev.c:4260 [inline] __dev_queue_xmit+0x26aa/0x3210 net/core/dev.c:4785 NF_HOOK_COND include/linux/netfilter.h:307 [inline] ip6_output+0x340/0x550 net/ipv6/ip6_output.c:247 NF_HOOK+0x9e/0x380 include/linux/netfilter.h:318 mld_sendpack+0x8d4/0xe60 net/ipv6/mcast.c:1855 mld_send_cr net/ipv6/mcast.c:2154 [inline] mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693 process_one_work kernel/workqueue.c:3257 [inline] process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 The buggy address belongs to the object at ffff8880294cfa00 which belongs to the cache ip6_dst_cache of size 232 The buggy address is located 120 bytes inside of freed 232-byte region [ffff8880294cfa00, ffff8880294cfae8) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x294cf memcg:ffff88803536b781 flags: 0x80000000000000(node=0\|zone=1) page_type: f5(slab) raw: 0080000000000000 ffff88802ff1c8c0 ffffea0000bf2bc0 dead000000000006 raw: 0000000000000000 00000000800c000c 00000000f5000000 ffff88803536b781 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52820(GFP_ATOMIC\|__GFP_NOWARN\|__GFP_NORETRY\|__GFP_COMP), pid 9, tgid 9 (kworker/0:0), ts 91119585830, free_ts 91088628818 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x234/0x290 mm/page_alloc.c:1857 prep_new_page mm/page_alloc.c:1865 [inline] get_page_from_freelist+0x28c0/0x2960 mm/page_alloc.c:3915 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5210 alloc_pages_mpol+0xd1/0x380 mm/mempolicy.c:2486 alloc_slab_page mm/slub.c:3075 [inline] allocate_slab+0x86/0x3b0 mm/slub.c:3248 new_slab mm/slub.c:3302 [inline] ___slab_alloc+0xb10/0x13e0 mm/slub.c:4656 __slab_alloc+0xc6/0x1f0 mm/slub.c:4779 __slab_alloc_node mm/slub.c:4855 [inline] slab_alloc_node mm/slub.c:5251 [inline] kmem_cache_alloc_noprof+0x101/0x6c0 mm/slub.c:5270 dst_alloc+0x105/0x170 net/core/dst.c:89 ip6_dst_alloc net/ipv6/route.c:342 [inline] icmp6_dst_alloc+0x75/0x460 net/ipv6/route.c:3333 mld_sendpack+0x683/0xe60 net/ipv6/mcast.c:1844 mld_send_cr net/ipv6/mcast.c:2154 [inline] mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693 process_one_work kernel/workqueue.c:3257 [inline] process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158 page last free pid 5859 tgid 5859 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1406 [inline] __free_frozen_pages+0xfe1/0x1170 mm/page_alloc.c:2943 discard_slab mm/slub.c:3346 [inline] __put_partials+0x149/0x170 mm/slub.c:3886 __slab_free+0x2af/0x330 mm/slub.c:5952 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:350 kasan_slab_alloc include/linux/kasan.h:253 [inline] slab_post_alloc_hook mm/slub.c:4953 [inline] slab_alloc_node mm/slub.c:5263 [inline] kmem_cache_alloc_noprof+0x18d/0x6c0 mm/slub.c:5270 getname_flags+0xb8/0x540 fs/namei.c:146 getname include/linux/fs.h:2498 [inline] do_sys_openat2+0xbc/0x200 fs/open.c:1426 do_sys_open fs/open.c:1436 [inline] __do_sys_openat fs/open.c:1452 [inline] __se_sys_openat fs/open.c:1447 [inline] __x64_sys_openat+0x138/0x170 fs/open.c:1447 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94 Fixes: `8d0b94afdc` ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister") Fixes: `78df76a065` ("ipv4: take rt_uncached_lock only if needed") Reported-by: syzbot+179fc225724092b8b2b2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6964cdf2.050a0220.eaf7.009d.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260112103825.3810713-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-13 19:08:18 -08:00
Ido Schimmel	b853b94e84	ipv6: Allow for nexthop device mismatch with "onlink" IPv4 allows for a nexthop device mismatch when the "onlink" keyword is specified: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip link add name dummy2 up type dummy # ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2 Error: Nexthop has invalid gateway. # ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2 onlink # echo $? 0 This seems to be consistent with the description of "onlink" in the ip-route man page: "Pretend that the nexthop is directly attached to this link, even if it does not match any interface prefix". On the other hand, IPv6 rejects a nexthop device mismatch, even when "onlink" is specified: # ip link add name dummy1 up type dummy # ip address add 2001:db8:1::1/64 dev dummy1 # ip link add name dummy2 up type dummy # ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 RTNETLINK answers: No route to host # ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink Error: Nexthop has invalid gateway or device mismatch. This is intentional according to commit `fc1e64e109` ("net/ipv6: Add support for onlink flag") which added IPv6 "onlink" support and states that "any unicast gateway is allowed as long as the gateway is not a local address and if it resolves it must match the given device". The condition was later relaxed in commit `4ed591c8ab` ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route") to allow for a nexthop device mismatch if the gateway address is resolved via the default route: # ip link add name dummy1 up type dummy # ip route add ::/0 dev dummy1 # ip link add name dummy2 up type dummy # ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 RTNETLINK answers: No route to host # ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink # echo $? 0 While the decision to forbid a nexthop device mismatch in IPv6 seems to be intentional, it is unclear why it was made. Especially when it differs from IPv4 and seems to go against the intended behavior of "onlink". Therefore, relax the condition further and allow for a nexthop device mismatch when "onlink" is specified: # ip link add name dummy1 up type dummy # ip address add 2001:db8:1::1/64 dev dummy1 # ip link add name dummy2 up type dummy # ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink # echo $? 0 The motivating use case is the fact that FRR would like to be able to configure overlay routes of the following form: # ip route add <host-Z> vrf <VRF> encap ip id <ID> src <VTEP-A> dst <VTEP-Z> via <VTEP-Z> dev vxlan0 onlink Where vxlan0 is in the default VRF in which "VTEP-Z" is reachable via one of the underlay routes (e.g., via swpX). Without this patch, the above only works with IPv4, but not with IPv6. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260111120813.159799-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-13 18:57:34 -08:00
Eric Dumazet	81c734dae2	ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv() Blamed commit did not take care of VLAN encapsulations as spotted by syzbot [1]. Use skb_vlan_inet_prepare() instead of pskb_inet_may_pull(). [1] BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] BUG: KMSAN: uninit-value in IP6_ECN_decapsulate+0x7a8/0x1fa0 include/net/inet_ecn.h:321 __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] IP6_ECN_decapsulate+0x7a8/0x1fa0 include/net/inet_ecn.h:321 ip6ip6_dscp_ecn_decapsulate+0x16f/0x1b0 net/ipv6/ip6_tunnel.c:729 __ip6_tnl_rcv+0xed9/0x1b50 net/ipv6/ip6_tunnel.c:860 ip6_tnl_rcv+0xc3/0x100 net/ipv6/ip6_tunnel.c:903 gre_rcv+0x1529/0x1b90 net/ipv6/ip6_gre.c:-1 ip6_protocol_deliver_rcu+0x1c89/0x2c60 net/ipv6/ip6_input.c:438 ip6_input_finish+0x1f4/0x4a0 net/ipv6/ip6_input.c:489 NF_HOOK include/linux/netfilter.h:318 [inline] ip6_input+0x9c/0x330 net/ipv6/ip6_input.c:500 ip6_mc_input+0x7ca/0xc10 net/ipv6/ip6_input.c:590 dst_input include/net/dst.h:474 [inline] ip6_rcv_finish+0x958/0x990 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:318 [inline] ipv6_rcv+0xf1/0x3c0 net/ipv6/ip6_input.c:311 __netif_receive_skb_one_core net/core/dev.c:6139 [inline] __netif_receive_skb+0x1df/0xac0 net/core/dev.c:6252 netif_receive_skb_internal net/core/dev.c:6338 [inline] netif_receive_skb+0x57/0x630 net/core/dev.c:6397 tun_rx_batched+0x1df/0x980 drivers/net/tun.c:1485 tun_get_user+0x5c0e/0x6c60 drivers/net/tun.c:1953 tun_chr_write_iter+0x3e9/0x5c0 drivers/net/tun.c:1999 new_sync_write fs/read_write.c:593 [inline] vfs_write+0xbe2/0x15d0 fs/read_write.c:686 ksys_write fs/read_write.c:738 [inline] __do_sys_write fs/read_write.c:749 [inline] __se_sys_write fs/read_write.c:746 [inline] __x64_sys_write+0x1fb/0x4d0 fs/read_write.c:746 x64_sys_call+0x30ab/0x3e70 arch/x86/include/generated/asm/syscalls_64.h:2 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd3/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4960 [inline] slab_alloc_node mm/slub.c:5263 [inline] kmem_cache_alloc_node_noprof+0x9e7/0x17a0 mm/slub.c:5315 kmalloc_reserve+0x13c/0x4b0 net/core/skbuff.c:586 __alloc_skb+0x805/0x1040 net/core/skbuff.c:690 alloc_skb include/linux/skbuff.h:1383 [inline] alloc_skb_with_frags+0xc5/0xa60 net/core/skbuff.c:6712 sock_alloc_send_pskb+0xacc/0xc60 net/core/sock.c:2995 tun_alloc_skb drivers/net/tun.c:1461 [inline] tun_get_user+0x1142/0x6c60 drivers/net/tun.c:1794 tun_chr_write_iter+0x3e9/0x5c0 drivers/net/tun.c:1999 new_sync_write fs/read_write.c:593 [inline] vfs_write+0xbe2/0x15d0 fs/read_write.c:686 ksys_write fs/read_write.c:738 [inline] __do_sys_write fs/read_write.c:749 [inline] __se_sys_write fs/read_write.c:746 [inline] __x64_sys_write+0x1fb/0x4d0 fs/read_write.c:746 x64_sys_call+0x30ab/0x3e70 arch/x86/include/generated/asm/syscalls_64.h:2 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd3/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 0 UID: 0 PID: 6465 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(none) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 Fixes: `8d975c15c0` ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()") Reported-by: syzbot+d4dda070f833dc5dc89a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/695e88b2.050a0220.1c677c.036d.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260107163109.4188620-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-09 18:14:08 -08:00
Eric Dumazet	e9cd04b281	udp: udplite is unlikely Add some unlikely() annotations to speed up the fast path, at least with clang compiler. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260105101719.2378881-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:06:03 -08:00
Yumei Huang	cb3de96eea	ipv6: preserve insertion order for same-scope addresses IPv6 addresses with the same scope are returned in reverse insertion order, unlike IPv4. For example, when adding a -> b -> c, the list is reported as c -> b -> a, while IPv4 preserves the original order. This behavior causes: a. When using `ip -6 a save` and `ip -6 a restore`, addresses are restored in the opposite order from which they were saved. See example below showing addresses added as 1::1, 1::2, 1::3 but displayed and saved in reverse order. # ip -6 a a 1::1 dev x # ip -6 a a 1::2 dev x # ip -6 a a 1::3 dev x # ip -6 a s dev x 2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 inet6 1::3/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::2/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::1/128 scope global tentative valid_lft forever preferred_lft forever # ip -6 a save > dump # ip -6 a d 1::1 dev x # ip -6 a d 1::2 dev x # ip -6 a d 1::3 dev x # ip a d ::1 dev lo # ip a restore < dump # ip -6 a s dev x 2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 inet6 1::1/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::2/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::3/128 scope global tentative valid_lft forever preferred_lft forever # ip a showdump < dump if1: inet6 ::1/128 scope host proto kernel_lo valid_lft forever preferred_lft forever if2: inet6 1::3/128 scope global tentative valid_lft forever preferred_lft forever if2: inet6 1::2/128 scope global tentative valid_lft forever preferred_lft forever if2: inet6 1::1/128 scope global tentative valid_lft forever preferred_lft forever b. Addresses in pasta to appear in reversed order compared to host addresses. The ipv6 addresses were added in reverse order by commit `e55ffac601` ("[IPV6]: order addresses by scope"), then it was changed by commit `502a2ffd73` ("ipv6: convert idev_list to list macros"), and restored by commit `b54c9b98bb` ("ipv6: Preserve pervious behavior in ipv6_link_dev_addr()."). However, this reverse ordering within the same scope causes inconsistency with IPv4 and the issues described above. This patch aligns IPv6 address ordering with IPv4 for consistency by changing the comparison from >= to > when inserting addresses into the address list. Also updates the ioam6 selftest to reflect the new address ordering behavior. Combine these two changes into one patch for bisectability. Link: https://bugs.passt.top/show_bug.cgi?id=175 Suggested-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Yumei Huang <yuhuang@redhat.com> Acked-by: Justin Iurman <justin.iurman@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260104032357.38555-1-yuhuang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 16:41:35 -08:00
Jiayuan Chen	1adaea51c6	ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT On PREEMPT_RT kernels, after rt6_get_pcpu_route() returns NULL, the current task can be preempted. Another task running on the same CPU may then execute rt6_make_pcpu_route() and successfully install a pcpu_rt entry. When the first task resumes execution, its cmpxchg() in rt6_make_pcpu_route() will fail because rt6i_pcpu is no longer NULL, triggering the BUG_ON(prev). It's easy to reproduce it by adding mdelay() after rt6_get_pcpu_route(). Using preempt_disable/enable is not appropriate here because ip6_rt_pcpu_alloc() may sleep. Fix this by handling the cmpxchg() failure gracefully on PREEMPT_RT: free our allocation and return the existing pcpu_rt installed by another task. The BUG_ON is replaced by WARN_ON_ONCE for non-PREEMPT_RT kernels where such races should not occur. Link: https://syzkaller.appspot.com/bug?extid=9b35e9bc0951140d13e6 Fixes: `d2d6422f8b` ("x86: Allow to enable PREEMPT_RT.") Reported-by: syzbot+9b35e9bc0951140d13e6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6918cd88.050a0220.1c914e.0045.GAE@google.com/T/ Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20251223051413.124687-1-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-30 12:04:36 +01:00
Will Rosenberg	58fc7342b5	ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr() There exists a kernel oops caused by a BUG_ON(nhead < 0) at net/core/skbuff.c:2232 in pskb_expand_head(). This bug is triggered as part of the calipso_skbuff_setattr() routine when skb_cow() is passed headroom > INT_MAX (i.e. (int)(skb_headroom(skb) + len_delta) < 0). The root cause of the bug is due to an implicit integer cast in __skb_cow(). The check (headroom > skb_headroom(skb)) is meant to ensure that delta = headroom - skb_headroom(skb) is never negative, otherwise we will trigger a BUG_ON in pskb_expand_head(). However, if headroom > INT_MAX and delta <= -NET_SKB_PAD, the check passes, delta becomes negative, and pskb_expand_head() is passed a negative value for nhead. Fix the trigger condition in calipso_skbuff_setattr(). Avoid passing "negative" headroom sizes to skb_cow() within calipso_skbuff_setattr() by only using skb_cow() to grow headroom. PoC: Using `netlabelctl` tool: netlabelctl map del default netlabelctl calipso add pass doi:7 netlabelctl map add default address:0::1/128 protocol:calipso,7 Then run the following PoC: int fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP); // setup msghdr int cmsg_size = 2; int cmsg_len = 0x60; struct msghdr msg; struct sockaddr_in6 dest_addr; struct cmsghdr * cmsg = (struct cmsghdr ) calloc(1, sizeof(struct cmsghdr) + cmsg_len); msg.msg_name = &dest_addr; msg.msg_namelen = sizeof(dest_addr); msg.msg_iov = NULL; msg.msg_iovlen = 0; msg.msg_control = cmsg; msg.msg_controllen = cmsg_len; msg.msg_flags = 0; // setup sockaddr dest_addr.sin6_family = AF_INET6; dest_addr.sin6_port = htons(31337); dest_addr.sin6_flowinfo = htonl(31337); dest_addr.sin6_addr = in6addr_loopback; dest_addr.sin6_scope_id = 31337; // setup cmsghdr cmsg->cmsg_len = cmsg_len; cmsg->cmsg_level = IPPROTO_IPV6; cmsg->cmsg_type = IPV6_HOPOPTS; char hop_hdr = (char )cmsg + sizeof(struct cmsghdr); hop_hdr[1] = 0x9; //set hop size - (0x9 + 1) 8 = 80 sendmsg(fd, &msg, 0); Fixes: `2917f57b6b` ("calipso: Allow the lsm to label the skbuff directly.") Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Will Rosenberg <whrosenb@asu.edu> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20251219173637.797418-1-whrosenb@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-29 19:36:45 +01:00

1 2 3 4 5 ...

8870 Commits