linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-04 02:34:10 -04:00

Author	SHA1	Message	Date
Gang Yan	3fea468dca	selftests: mptcp: refactor send_query parameters for code clarity This patch uses 'inet_diag_req_v2' instead of 'token' as the parameter of send_query, and constructs the req in 'get_mptcpinfo'. This modification enhances the clarity of the code and prepares for dump_subflow_info. Co-developed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-4-68eec95898fb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 16:52:00 -07:00
Gang Yan	cd732d5110	selftests: mptcp: add struct params in mptcp_diag This patch adds a struct named 'params' to save 'target_token' and other future parameters. This structure facilitates future function expansions. Co-developed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-3-68eec95898fb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 16:52:00 -07:00
Geliang Tang	dd367e81b7	selftests: mptcp: sockopt: use IPPROTO_MPTCP for getaddrinfo glibc recently added MPTCP support to getaddrinfo, and mptcp_connect.c already passes IPPROTO_MPTCP to getaddrinfo. mptcp_sockopt.c and mptcp_inq.c still pass IPPROTO_TCP, so this patch updates them. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-2-68eec95898fb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 16:51:59 -07:00
Matthieu Baerts (NGI0)	6d0eb15c65	selftests: mptcp: info: hide 'grep: write error' warnings mptcp_lib_get_info_value() will only print the first entry that matches the filter because of the ';q' at the end. As a consequence, the 'sed' command could finish before the previous 'grep' one and print a 'write error' warning because it is trying to write data to the closed pipe. Such warnings are not interesting, they can be hidden by muting stderr here for grep. While at it, clearly indicate that mptcp_lib_get_info_value() will only print the first matched entry to avoid confusion later on. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-1-68eec95898fb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 16:51:59 -07:00
Dr. David Alan Gilbert	ac8f09b921	sctp: Remove unused sctp_assoc_del_peer and sctp_chunk_iif sctp_assoc_del_peer() last use was removed in 2015 by commit `73e6742027` ("sctp: Do not try to search for the transport twice") which now uses rm_peer instead of del_peer. sctp_chunk_iif() last use was removed in 2016 by commit `1f45f78f8e` ("sctp: allow GSO frags to access the chunk too") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20250501233815.99832-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 16:51:12 -07:00
Andy Shevchenko	1f586017f5	net: phy: Refactor fwnode_get_phy_node() Refactor to check if the fwnode we got is correct and return if so, otherwise do additional checks. Using same pattern in all conditionals makes it slightly easier to read and understand. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20250430143802.3714405-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 16:50:26 -07:00
Dr. David Alan Gilbert	320a66f840	strparser: Remove unused __strp_unpause The last use of __strp_unpause() was removed in 2022 by commit `84c61fe1a7` ("tls: rx: do not use the standard strparser") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250501002402.308843-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 16:48:12 -07:00
Jakub Kicinski	b4cd2ee54c	Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-05-02 We've added 14 non-merge commits during the last 10 day(s) which contain a total of 13 files changed, 740 insertions(+), 121 deletions(-). The main changes are: 1) Avoid skipping or repeating a sk when using a UDP bpf_iter, from Jordan Rife. 2) Fixed a crash when a bpf qdisc is set in the net.core.default_qdisc, from Amery Hung. 3) A few other fixes in the bpf qdisc, from Amery Hung. - Always call qdisc_watchdog_init() in the .init prologue such that the .reset/.destroy epilogue can always call qdisc_watchdog_cancel() without issue. - bpf_qdisc_init_prologue() was incorrectly returning an error when the bpf qdisc is set as the default_qdisc and the mq is creating the default_qdisc. It is now fixed. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Cleanup bpf qdisc selftests selftests/bpf: Test attaching a bpf qdisc with incomplete operators bpf: net_sched: Make some Qdisc_ops ops mandatory selftests/bpf: Test setting and creating bpf qdisc as default qdisc bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc selftests/bpf: Add tests for bucket resume logic in UDP socket iterators selftests/bpf: Return socket cookies from sock_iter_batch progs bpf: udp: Avoid socket skips and repeats during iteration bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items bpf: udp: Get rid of st_bucket_done bpf: udp: Make sure iter->batch always contains a full bucket snapshot bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch bpf: net_sched: Fix using bpf qdisc as default qdisc selftests/bpf: Fix compilation errors ==================== Link: https://patch.msgid.link/20250503010755.4030524-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 13:22:58 -07:00
Ido Schimmel	836b313a14	ipv4: Honor "ignore_routes_with_linkdown" sysctl in nexthop selection Commit `32607a332c` ("ipv4: prefer multipath nexthop that matches source address") changed IPv4 nexthop selection to prefer a nexthop whose nexthop device is assigned the specified source address for locally generated traffic. While the selection honors the "fib_multipath_use_neigh" sysctl and will not choose a nexthop with an invalid neighbour, it does not honor the "ignore_routes_with_linkdown" sysctl and can choose a nexthop without a carrier: $ sysctl net.ipv4.conf.all.ignore_routes_with_linkdown net.ipv4.conf.all.ignore_routes_with_linkdown = 1 $ ip route show 198.51.100.0/24 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy1 weight 1 nexthop via 192.0.2.18 dev dummy2 weight 1 dead linkdown $ ip route get 198.51.100.1 from 192.0.2.17 198.51.100.1 from 192.0.2.17 via 192.0.2.18 dev dummy2 uid 0 Solve this by skipping over nexthops whose assigned hash upper bound is minus one, which is the value assigned to nexthops that do not have a carrier when the "ignore_routes_with_linkdown" sysctl is set. In practice, this probably does not matter a lot as the initial route lookup for the source address would not choose a nexthop that does not have a carrier in the first place, but the change does make the code clearer. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-05-03 21:52:38 +01:00
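The selection rule described in the commit above can be modelled in user space. The following is a minimal sketch, not the kernel's C code: the `Nexthop` class, the `select_nexthop` function, the `192.0.2.1` address for dummy1, and the upper-bound values are all assumptions made for illustration. Only the "skip nexthops whose upper bound is minus one" rule and the preference for a nexthop matching the source address come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Nexthop:
    dev: str
    saddr: str        # source address assigned to the nexthop device (illustrative)
    upper_bound: int  # -1 models "no carrier while ignore_routes_with_linkdown is set"


def select_nexthop(nexthops: Sequence[Nexthop], flow_hash: int,
                   saddr: Optional[str] = None) -> Optional[Nexthop]:
    """Pick a nexthop, preferring one whose device owns `saddr`."""
    fallback = None
    for nh in nexthops:
        # The fix: never consider a nexthop marked with an upper bound of -1.
        if nh.upper_bound == -1:
            continue
        if fallback is None:
            fallback = nh  # first usable nexthop, used if nothing matches saddr
        if flow_hash > nh.upper_bound:
            continue
        if saddr is None or nh.saddr == saddr:
            return nh
    return fallback


# Mirrors the example in the commit message: dummy2 owns the source address
# but has no carrier, so the lookup should fall back to dummy1.
nexthops = [
    Nexthop("dummy1", "192.0.2.1", 2**31 - 1),
    Nexthop("dummy2", "192.0.2.17", -1),
]
chosen = select_nexthop(nexthops, flow_hash=12345, saddr="192.0.2.17")
print(chosen.dev)  # dummy1
```

If both nexthops had a carrier, the same call would return dummy2, because its device owns the requested source address.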
Kuniyuki Iwashima	586ceac9ac	ipv6: Restore fib6_config validation for SIOCADDRT. syzkaller reported out-of-bounds read in ipv6_addr_prefix(), where the prefix length was over 128. The cited commit accidentally removed some fib6_config validation from the ioctl path. Let's restore the validation. [0]: BUG: KASAN: slab-out-of-bounds in ip6_route_info_create (./include/net/ipv6.h:616 net/ipv6/route.c:3814) Read of size 1 at addr ff11000138020ad4 by task repro/261 CPU: 3 UID: 0 PID: 261 Comm: repro Not tainted 6.15.0-rc3-00614-g0d15a26b247d #87 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) print_report (mm/kasan/report.c:409 mm/kasan/report.c:521) kasan_report (mm/kasan/report.c:636) ip6_route_info_create (./include/net/ipv6.h:616 net/ipv6/route.c:3814) ip6_route_add (net/ipv6/route.c:3902) ipv6_route_ioctl (net/ipv6/route.c:4523) inet6_ioctl (net/ipv6/af_inet6.c:577) sock_do_ioctl (net/socket.c:1190) sock_ioctl (net/socket.c:1314) __x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:906 fs/ioctl.c:892 fs/ioctl.c:892) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f518fb2de5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007fff14f38d18 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f518fb2de5d RDX: 00000000200015c0 RSI: 000000000000890b RDI: 0000000000000003 RBP: 00007fff14f38d30 R08: 0000000000000800 R09: 0000000000000800 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff14f38e48 R13: 0000000000401136 R14: 0000000000403df0 R15: 00007f518fd3c000 </TASK> Fixes: `fa76c1674f` ("ipv6: Move some validation from 
ip6_route_info_create() to rtm_to_fib6_config().") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: Yi Lai <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/netdev/aBAcKDEFoN%2FLntBF@ly-workstation/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250501005335.53683-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:48:48 -07:00
Pedro Falcato	a2f6476ed1	mptcp: Align mptcp_inet6_sk with other protocols Since commit `f5f80e32de` ("ipv6: remove hard coded limitation on ipv6_pinfo"), protocols have stopped using the old "obj_size - sizeof(struct ipv6_pinfo)" way of grabbing ipv6_pinfo, which severely restricted struct layout and caused fun, hard-to-see issues. However, mptcp_inet6_sk wasn't fixed (unlike tcp_inet6_sk). Do so. The non-cloned sockets already do the right thing using ipv6_pinfo_offset + the generic IPv6 code. Signed-off-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250430154541.1038561-1-pfalcato@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:28:21 -07:00
Jakub Kicinski	b88c382bed	Merge branch 'net-stmmac-replace-speed_mode_2500-method' Russell King says: ==================== net: stmmac: replace speed_mode_2500() method This series replaces the speed_mode_2500() method with a new method that is more flexible, allowing the platform glue driver to populate phylink's supported_interfaces and set the PHY-side interface mode. The only user of this method is currently dwmac-intel, which we update to use this new method. ==================== Link: https://patch.msgid.link/aBNe0Vt81vmqVCma@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:25:11 -07:00
Russell King (Oracle)	9d165dc580	net: stmmac: remove speed_mode_2500() method Remove the speed_mode_2500() platform method which is no longer used or necessary, being superseded by the more flexible get_interfaces() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uASM3-0021R3-2B@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:25:08 -07:00
Russell King (Oracle)	d3836052fe	net: stmmac: intel: convert speed_mode_2500() to get_interfaces() TGL platforms support either SGMII or 2500BASE-X, which is determined by reading a SERDES register. Thus, plat->phy_interface (and phylink's supported_interfaces) depend on this. Use the new .get_interfaces() method to set both plat->phy_interface and the supported_interfaces bitmap. This removes the only user of the .speed_mode_2500() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uASLx-0021Qs-Uz@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:25:08 -07:00
Russell King (Oracle)	0f455d2d1b	net: stmmac: intel: move phy_interface init to tgl_common_data() Move the initialisation of plat->phy_interface to tgl_common_data() as all callers set this same interface mode. This moves it to a single location to make the change to get_interfaces() more obvious. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uASLs-0021Qk-Qt@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:25:08 -07:00
Russell King (Oracle)	ca732e990f	net: stmmac: add get_interfaces() platform method Add a get_interfaces() platform method to allow platforms to indicate to phylink which interface modes they support - which then allows phylink to validate on initialisation that the configured PHY interface mode is actually supported. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1uASLn-0021Qd-Mi@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:25:08 -07:00
Russell King (Oracle)	1966be55da	net: stmmac: use priv->plat->phy_interface directly Avoid using a local variable for priv->plat->phy_interface as this may be modified in the .get_interfaces() method added in a future commit. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uASLi-0021QX-HG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:25:08 -07:00
Russell King (Oracle)	5ad39ceaea	net: stmmac: use a local variable for priv->phylink_config Use a local variable for priv->phylink_config in stmmac_phy_setup() which makes the code a bit easier to read, allowing some lines to be merged. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uASLd-0021QR-Cu@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-02 18:25:08 -07:00
Martin KaFai Lau	30190f82a1	Merge branch 'fix-bpf-qdisc-bugs-and-clean-up' Amery Hung says: ==================== Fix bpf qdisc bugs and clean up This patchset fixes the following bugs in bpf qdisc and clean up the selftest. - A null-pointer dereference can happen in qdisc_watchdog_cancel() if the timer is not initialized when 1) .init is not defined by user so init prologue is not generated. 2) .init fails and qdisc_create() calls .destroy - bpf qdisc fails to attach to mq/mqprio when being set as the default qdisc due to failed qdisc_lookup() in init prologue v2 - Rebase to bpf-next/net - Fix erroneous commit messages - Fix and simplify selftests cleanup v1: https://lore.kernel.org/bpf/20250501223025.569020-1-ameryhung@gmail.com/ ==================== Link: https://patch.msgid.link/20250502201624.3663079-1-ameryhung@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 15:51:17 -07:00
Amery Hung	2f9838e257	selftests/bpf: Cleanup bpf qdisc selftests Some cleanups: - Remove unnecessary kfuncs declaration - Use _ns in the test name to run tests in a separate net namespace - Call skeleton __attach() instead of bpf_map__attach_struct_ops() to simplify tests. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 15:51:17 -07:00
Amery Hung	6cda0e2c47	selftests/bpf: Test attaching a bpf qdisc with incomplete operators Implement .destroy in bpf_fq and bpf_fifo as it is now mandatory. Test attaching a bpf qdisc with a missing operator .init. This is not allowed, as qdisc_watchdog_cancel() could otherwise be called with an uninitialized timer. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 15:42:48 -07:00
Amery Hung	64d6e3b9df	bpf: net_sched: Make some Qdisc_ops ops mandatory The patch makes all currently supported Qdisc_ops (i.e., .enqueue, .dequeue, .init, .reset, and .destroy) mandatory. Make .init, .reset and .destroy mandatory as bpf qdisc relies on prologue and epilogue to check attach points and correctly initialize/cleanup resources. The prologue/epilogue is generated for a struct_ops operator only if users implement the operator. Make .enqueue and .dequeue mandatory as bpf qdisc infra does not provide a default data path. Fixes: `c824034495` ("bpf: net_sched: Support implementation of Qdisc_ops in bpf") Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 15:35:37 -07:00
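The "all five operators are mandatory" rule from the commit above amounts to a completeness check at attach time. The sketch below models that check in user space only; `REQUIRED_OPS` and `missing_ops` are hypothetical names, and how the kernel reports the failure is not stated in the commit message, so it is left out.

```python
# Operators the commit message lists as mandatory for a bpf qdisc.
REQUIRED_OPS = ("enqueue", "dequeue", "init", "reset", "destroy")


def missing_ops(ops):
    """Return the mandatory operators that `ops` does not implement."""
    return [name for name in REQUIRED_OPS if not callable(ops.get(name))]


def noop(*args):
    """Stand-in for a user-provided operator body."""


complete = {name: noop for name in REQUIRED_OPS}
without_init = {name: noop for name in REQUIRED_OPS if name != "init"}

print(missing_ops(complete))      # []
print(missing_ops(without_init))  # ['init']
```

A qdisc without `.init` is rejected in this model for the reason the commit gives: without the operator there is no generated prologue to initialize the watchdog timer.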
Amery Hung	6d080362c3	selftests/bpf: Test setting and creating bpf qdisc as default qdisc First, test that bpf qdisc can be set as default qdisc. Then, attach an mq qdisc to see if bpf qdisc can be successfully created and grafted. The test is a sequential test as net.core.default_qdisc is global. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 15:28:29 -07:00
Amery Hung	659b3b2c48	bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc Allow .init to proceed if qdisc_lookup() returns NULL as it only happens when called by qdisc_create_dflt() in mq/mqprio_init and the parent qdisc has not been added to qdisc_hash yet. In qdisc_create(), the caller, __tc_modify_qdisc(), would have made sure the parent qdisc already exists. In addition, call qdisc_watchdog_init() whether .init succeeds or not to prevent null-pointer dereference. In qdisc_create() and qdisc_create_dflt(), if .init fails, .destroy will be called. As a result, the destroy epilogue could call qdisc_watchdog_cancel() with an uninitialized timer, causing null-pointer dereference in hrtimer_cancel(). Fixes: `c824034495` ("bpf: net_sched: Support implementation of Qdisc_ops in bpf") Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 14:50:08 -07:00
Martin KaFai Lau	1b1f563a25	Merge branch 'bpf-udp-exactly-once-socket-iteration' Jordan Rife says: ==================== bpf: udp: Exactly-once socket iteration Both UDP and TCP socket iterators use iter->offset to track progress through a bucket, which is a measure of the number of matching sockets from the current bucket that have been seen or processed by the iterator. On subsequent iterations, if the current bucket has unprocessed items, we skip at least iter->offset matching items in the bucket before adding any remaining items to the next batch. However, iter->offset isn't always an accurate measure of "things already seen" when the underlying bucket changes between reads which can lead to repeated or skipped sockets. Instead, this series remembers the cookies of the sockets we haven't seen yet in the current bucket and resumes from the first cookie in that list that we can find on the next iteration. This series focuses on UDP socket iterators, but a later series will apply a similar approach to TCP socket iterators. To be more specific, this series replaces struct sock *batch inside struct bpf_udp_iter_state with union bpf_udp_iter_batch_item batch, where union bpf_udp_iter_batch_item can contain either a pointer to a socket or a socket cookie. During reads, batch contains pointers to all sockets in the current batch while between reads batch contains all the cookies of the sockets in the current bucket that have yet to be processed. On subsequent reads, when iteration resumes, bpf_iter_udp_batch finds the first saved cookie that matches a socket in the bucket's socket list and picks up from there to construct the next batch. On average, assuming it's rare that the next socket disappears before the next read occurs, we should only need to scan as much as we did with the offset-based approach to find the starting point. In the case that the next socket is no longer there, we keep scanning through the saved cookies list until we find a match. 
The worst case is when none of the sockets from last time exist anymore, but again, this should be rare. CHANGES ======= v6 -> v7: * Move initialization of iter->state.bucket to -1 from patch five ("bpf: udp: Avoid socket skips and repeats during iteration") to patch three ("bpf: udp: Get rid of st_bucket_done") to avoid skipping the first bucket in the patch three and four (Martin). * Rename sock to sk in bpf_iter_batch_item (Martin). * Use ASSERT_OK_PTR in do_resume_test to check if counts is NULL (Martin). * goto done in do_resume_test when calloc or sock_iter_batch__open fails to make sure things are cleaned up properly, and initialize pointers to NULL explicitly to silence warnings from llvm 20 in CI. v5 -> v6: * Rework the logic in patch two ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot") again to simplify it: * Only try realloc with GFP_USER one time instead of two (Alexei). * v5 introduced a second call to bpf_iter_udp_realloc_batch inside the loop to handle the GFP_ATOMIC case. In v6, move the GFP_USER case inside the loop as well, so it's all in once place. This, I feel, makes it a bit easier to understand the control flow. Consequently, it also simplifies the logic outside the loop. * Use GFP_NOWAIT instead of GFP_ATOMIC to avoid depleting memory reserves, since iterators are not critical operation (Alexei). Alexei suggested using __GFP_NOWARN as well with GFP_NOWAIT, but this is already set inside bpf_iter_udp_realloc_batch, so no change was needed there. * Introduce patch three ("bpf: udp: Get rid of st_bucket_done") to simplify things further, since with patch two, st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk. * In patch five ("bpf: udp: Avoid socket skips and repeats during iteration"), initialize iter->state.bucket to -1 so that on the first call to bpf_iter_udp_batch, the resume_bucket condition is not hit. 
This avoids adding a special case to the condition around bpf_iter_udp_resume for bucket zero. v4 -> v5: * Rework the logic from patch two ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot") to move the handling of the GFP_ATOMIC case inside the main loop and get rid of the extra lock variable. This makes the logic clearer and makes it clearer that the bucket lock is always released (Martin). * Introduce udp_portaddr_for_each_entry_from in patch two instead of patch four ("bpf: udp: Avoid socket skips and repeats during iteration"), since patch two now needs to be able to resume list iteration from an arbitrary point in the GFP_ATOMIC case. * Similarly, introduce the memcpy inside bpf_iter_udp_realloc_batch in patch two instead of patch four, since in the GFP_ATOMIC case the new batch needs to remember the sockets from the old batch. * Use sock_gen_cookie instead of __sock_gen_cookie inside bpf_iter_udp_put_batch, since it can be called from a preemptible context (Martin). v3 -> v4: * Explicitly assign sk = NULL on !iter->end_sk exit condition (Kuniyuki). * Reword the commit message of patch two ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot") to make the reasoning for GFP_ATOMIC more clear. v2 -> v3: * Guarantee that iter->batch is always a full snapshot of a bucket to prevent socket repeat scenarios [3]. This supercedes the patch from v2 that simply propagated ENOMEM up from bpf_iter_udp_batch and covers the scenario where the batch size is still too small after a realloc. * Fix up self tests (Martin) * ASSERT_EQ(nread, sizeof(out), "nread") instead of ASSERT_GE(nread, 1, "nread) in read_n. * Use ASSERT_OK and ASSERT_OK_FD in several places. * Add missing free(counts) to do_resume_test. * Move int local_port declaration to the top of do_resume_test. * Remove unnecessary guards before close and free. v1 -> v2: * Drop WARN_ON_ONCE from bpf_iter_udp_realloc_batch (Kuniyuki). 
* Fixed memcpy size parameter in bpf_iter_udp_realloc_batch; before it was missing sizeof(elem) * (Kuniyuki). * Move "bpf: udp: Propagate ENOMEM up from bpf_iter_udp_batch" to patch two in the series (Kuniyuki). rfc [1] -> v1: * Use hlist_entry_safe directly to retrieve the first socket in the current bucket's linked list instead of immediately breaking from udp_portaddr_for_each_entry (Martin). * Cancel iteration if bpf_iter_udp_realloc_batch() can't grab enough memory to contain a full snapshot of the current bucket to prevent unwanted skips or repeats [2]. [1]: https://lore.kernel.org/bpf/20250404220221.1665428-1-jordan@jrife.io/ [2]: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/ [3]: https://lore.kernel.org/bpf/d323d417-3e8b-48af-ae94-bc28469ac0c1@linux.dev/ ==================== Link: https://patch.msgid.link/20250502161528.264630-1-jordan@jrife.io Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 12:07:54 -07:00
Jordan Rife	c58dcc1dbe	selftests/bpf: Add tests for bucket resume logic in UDP socket iterators Introduce a set of tests that exercise various bucket resume scenarios: * remove_seen resumes iteration after removing a socket from the bucket that we've already processed. Before, with the offset-based approach, this test would have skipped an unseen socket after resuming iteration. With the cookie-based approach, we now see all sockets exactly once. * remove_unseen exercises the condition where the next socket that we would have seen is removed from the bucket before we resume iteration. This tests the scenario where we need to scan past the first cookie in our remembered cookies list to find the socket from which to resume iteration. * remove_all exercises the condition where all sockets we remembered were removed from the bucket to make sure iteration terminates and returns no more results. * add_some exercises the condition where a few, but not enough to trigger a realloc, sockets are added to the head of the current bucket between reads. Before, with the offset-based approach, this test would have repeated sockets we've already seen. With the cookie-based approach, we now see all sockets exactly once. * force_realloc exercises the condition that we need to realloc the batch on a subsequent read, since more sockets than can be held in the current batch array were added to the current bucket. This exercises the logic inside bpf_iter_udp_realloc_batch that copies cookies into the new batch to make sure nothing is skipped or repeated. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 12:07:53 -07:00
Jordan Rife	4a0614e18c	selftests/bpf: Return socket cookies from sock_iter_batch progs Extend the iter_udp_soreuse and iter_tcp_soreuse programs to write the cookie of the current socket, so that we can track the identity of the sockets that the iterator has seen so far. Update the existing do_test function to account for this change to the iterator program output. At the same time, teach both programs to work with AF_INET as well. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 12:07:53 -07:00
Jordan Rife	5668f73f09	bpf: udp: Avoid socket skips and repeats during iteration Replace the offset-based approach for tracking progress through a bucket in the UDP table with one based on socket cookies. Remember the cookies of unprocessed sockets from the last batch and use this list to pick up where we left off or, in the case that the next socket disappears between reads, find the first socket after that point that still exists in the bucket and resume from there. This approach guarantees that all sockets that existed when iteration began and continue to exist throughout will be visited exactly once. Sockets that are added to the table during iteration may or may not be seen, but if they are they will be seen exactly once. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 12:07:46 -07:00
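The difference between the offset-based and cookie-based resume strategies described above can be shown with a small user-space model. This is a sketch under simplifying assumptions, not the kernel implementation: a bucket is a plain list of integer cookies, and `resume_by_offset` and `resume_by_cookies` are hypothetical helper names.

```python
def resume_by_offset(bucket, offset):
    """Old approach: skip the first `offset` sockets currently in the bucket."""
    return bucket[offset:]


def resume_by_cookies(bucket, unseen_cookies):
    """New approach: resume at the first remembered cookie still in the bucket."""
    position = {cookie: i for i, cookie in enumerate(bucket)}
    for cookie in unseen_cookies:
        if cookie in position:
            return bucket[position[cookie]:]
    return []  # none of the remembered sockets exist any more


# The first read processed sockets 1 and 2 out of [1, 2, 3, 4].
offset, unseen = 2, [3, 4]

# remove_seen: socket 1 is closed between reads.
bucket = [2, 3, 4]
print(resume_by_offset(bucket, offset))   # [4]     (socket 3 is skipped)
print(resume_by_cookies(bucket, unseen))  # [3, 4]

# add_some: socket 9 is added to the head of the bucket between reads.
bucket = [9, 1, 2, 3, 4]
print(resume_by_offset(bucket, offset))   # [2, 3, 4] (socket 2 is repeated)
print(resume_by_cookies(bucket, unseen))  # [3, 4]
```

In this model the cookie list only needs to be scanned past its first entry when the next unseen socket has disappeared, which matches the commit's expectation that the common case costs about the same as the offset-based scan.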
Jordan Rife	251c6636e0	bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items Prepare for the next patch that tracks cookies between iterations by converting struct sock *batch to union bpf_udp_iter_batch_item batch inside struct bpf_udp_iter_state. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>	2025-05-02 11:46:42 -07:00
Jordan Rife	3fae8959cd	bpf: udp: Get rid of st_bucket_done Get rid of the st_bucket_done field to simplify UDP iterator state and logic. Before, st_bucket_done could be false if bpf_iter_udp_batch returned a partial batch; however, with the last patch ("bpf: udp: Make sure iter->batch always contains a full bucket snapshot"), st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 10:54:38 -07:00
Jordan Rife	66d454e99d	bpf: udp: Make sure iter->batch always contains a full bucket snapshot Require that iter->batch always contains a full bucket snapshot. This invariant is important to avoid skipping or repeating sockets during iteration when combined with the next few patches. Before, there were two cases where a call to bpf_iter_udp_batch may only capture part of a bucket: 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1]. 2. When more sockets are added to the bucket while calling bpf_iter_udp_realloc_batch(), making the updated batch size insufficient [2]. In cases where the batch size only covers part of a bucket, it is possible to forget which sockets were already visited, especially if we have to process a bucket in more than two batches. This forces us to choose between repeating or skipping sockets, so don't allow this: 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation fails instead of continuing with a partial batch. 2. Try bpf_iter_udp_realloc_batch() with GFP_USER just as before, but if we still aren't able to capture the full bucket, call bpf_iter_udp_realloc_batch() again while holding the bucket lock to guarantee the bucket does not change. On the second attempt use GFP_NOWAIT since we hold onto the spin lock. Introduce the udp_portaddr_for_each_entry_from macro and use it instead of udp_portaddr_for_each_entry to make it possible to continue iteration from an arbitrary socket. This is required for this patch in the GFP_NOWAIT case to allow us to fill the rest of a batch starting from the middle of a bucket and the later patch which skips sockets that were already seen. Testing all scenarios directly is a bit difficult, but I did some manual testing to exercise the code paths where GFP_NOWAIT is used and where ERR_PTR(err) is returned. 
I used the realloc test case included later in this series to trigger a scenario where a realloc happens inside bpf_iter_udp_batch and made a small code tweak to force the first realloc attempt to allocate a too-small batch, thus requiring another attempt with GFP_NOWAIT. Some printks showed both reallocs with the tests passing: Apr 25 23:16:24 crow kernel: go again GFP_USER Apr 25 23:16:24 crow kernel: go again GFP_NOWAIT With this setup, I also forced each of the bpf_iter_udp_realloc_batch calls to return -ENOMEM to ensure that iteration ends and that the read() in userspace fails. [1]: https://lore.kernel.org/bpf/CABi4-ogUtMrH8-NVB6W8Xg_F_KDLq=yy-yu-tKr2udXE2Mu1Lg@mail.gmail.com/ [2]: https://lore.kernel.org/bpf/7ed28273-a716-4638-912d-f86f965e54bb@linux.dev/ Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-05-02 10:54:37 -07:00
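The two-attempt reallocation strategy from the commit above can be sketched as a user-space model. Everything here is illustrative: `capture_full_bucket`, `grow`, and `while_unlocked` are hypothetical names, the 3/2 growth factor is an assumption, and locking is only described in comments. What the sketch takes from the commit message is the order of attempts (GFP_USER first, then GFP_NOWAIT with the bucket lock held) and the rule that an allocation failure stops iteration instead of yielding a partial batch.

```python
def capture_full_bucket(bucket, capacity, grow, while_unlocked=None):
    """Return (snapshot, flags_used) where snapshot covers the whole bucket.

    `grow(size, flag)` returns the new capacity or raises MemoryError.
    """
    flags_used = []
    if len(bucket) > capacity:
        # First attempt: the lock is dropped so the allocation may sleep.
        capacity = grow(len(bucket) * 3 // 2, "GFP_USER")
        flags_used.append("GFP_USER")
        if while_unlocked is not None:
            while_unlocked(bucket)  # other tasks may add sockets meanwhile
    if len(bucket) > capacity:
        # Second attempt: the lock stays held so the bucket cannot change,
        # which rules out a sleeping allocation.
        capacity = grow(len(bucket), "GFP_NOWAIT")
        flags_used.append("GFP_NOWAIT")
    return list(bucket), flags_used


def grow_ok(size, flag):
    return size


def grow_fail(size, flag):
    raise MemoryError(flag)  # models -ENOMEM being propagated to the reader


# Five sockets arrive while the lock is dropped, so the first resize
# (to 6 entries) is still too small for the 9 sockets now in the bucket.
bucket = [1, 2, 3, 4]
snapshot, flags = capture_full_bucket(
    bucket, capacity=2, grow=grow_ok,
    while_unlocked=lambda b: b.extend(range(5, 10)))
print(len(snapshot), flags)  # 9 ['GFP_USER', 'GFP_NOWAIT']
```

Passing `grow_fail` instead of `grow_ok` raises `MemoryError` before any snapshot is returned, which is the model's analogue of the read() in userspace failing.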
Jordan Rife	3e485e15a1	bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch Prepare for the next patch which needs to be able to choose either GFP_USER or GFP_NOWAIT for calls to bpf_iter_udp_realloc_batch. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>	2025-05-02 10:54:35 -07:00
Paolo Abeni	cb9d6b2c2a	Merge branch 'tools-ynl-gen-additional-c-types-and-classic-netlink-handling' Jakub Kicinski says: ==================== tools: ynl-gen: additional C types and classic netlink handling This series is a bit of a random grab bag adding things we need to generate code for rt-link. First two patches are pretty random code cleanups. Patch 3 adds default values if the spec is missing them. Patch 4 adds support for setting Netlink request flags (NLM_F_CREATE, NLM_F_REPLACE etc.). Classic netlink uses those quite a bit. Patches 5 and 6 extend the notification handling for variations used in classic netlink. Patch 6 adds support for when notification ID is the same as the ID of the response message to GET. Next 4 patches add support for handling a couple of complex types. These are supported by the schema and Python but C code gen wasn't there. Patch 11 is a bit of a hack, it skips code related to kernel policy generation, since we don't need it for classic netlink. Patch 12 adds support for having different fixed headers per op. Something we could avoid in previous rtnetlink specs but some specs do mix. v2: https://lore.kernel.org/20250425024311.1589323-1-kuba@kernel.org v1: https://lore.kernel.org/20250424021207.1167791-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250429154704.2613851-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:06 +02:00
Jakub Kicinski	777c8029b5	tools: ynl: allow fixed-header to be specified per op rtnetlink has a variety of ops with different fixed headers. Detect that op fixed header is not the same as family one, and use sizeof() directly. For reverse parsing we need to pass the fixed header len along the policy (in the socket state). Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-13-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:03 +02:00
Jakub Kicinski	18d574c8dd	tools: ynl-gen: don't init enum checks for classic netlink rt-link has a vlan-protocols enum with: name: 8021q value: 33024 name: 8021ad value: 34984 It's nice to have, since it converts the values to strings in Python. For C, however, the codegen is trying to use enums to generate strict policy checks. Parsing such sparse enums is not possible via policies. Since for classic netlink we don't support kernel codegen and policy generation - skip the auto-generation of checks from enums. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-12-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:03 +02:00
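Why a sparse enum does not map onto a strict check can be shown with a small sketch. It assumes the generated check is a min/max range derived from the enum, which the commit message does not spell out; the `demo_` names and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the vlan-protocols enum quoted in the commit message. */
enum demo_vlan_proto {
	DEMO_VLAN_8021Q  = 33024,
	DEMO_VLAN_8021AD = 34984,
};

/* Assumed shape of an enum-derived check: lowest to highest value. */
static bool demo_range_check(uint32_t v)
{
	return v >= DEMO_VLAN_8021Q && v <= DEMO_VLAN_8021AD;
}

/* What a strict check would need: membership in the enum. */
static bool demo_exact_check(uint32_t v)
{
	return v == DEMO_VLAN_8021Q || v == DEMO_VLAN_8021AD;
}
```

A value such as 33025 passes the range check although it is not a member of the enum, so a range is too loose to count as a strict check here.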
Jakub Kicinski	5f7804dd83	tools: ynl-gen: array-nest: support binary array with exact-len IPv6 addresses are expressed as binary arrays since we don't have u128. Since they are not variable length, however, they are relatively easy to represent as an array of known size. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-11-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:03 +02:00
Jakub Kicinski	18b1886447	tools: ynl-gen: array-nest: support put for scalar C codegen supports ArrayNest AKA indexed-array carrying scalars, but only for the netlink -> struct parsing. Support rendering from struct to netlink. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-10-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
Jakub Kicinski	3456084d63	tools: ynl-gen: multi-attr: support binary types with struct Binary types with struct are fixed size, relatively easy to handle for multi attr. Declare the member as a pointer. Count the members, allocate an array, copy in the data. Allow the netlink attr to be smaller or larger than our view of the struct in case the build headers are newer or older than the running kernel. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-9-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
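Tolerating an attribute that is smaller or larger than the local struct can be sketched as a bounded copy. This is a minimal illustration, not the generated code: `struct demo_stats` and `demo_copy_struct_attr()` are hypothetical names, and zero-filling the missing tail is an assumption about the desired behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size struct carried in a binary attribute. */
struct demo_stats {
	uint32_t rx;
	uint32_t tx;
};

/*
 * Copy an attribute payload into our view of the struct.
 * A shorter payload (older kernel) leaves the tail zeroed;
 * a longer payload (newer kernel) is truncated to what we know.
 */
static void demo_copy_struct_attr(void *dst, size_t dst_size,
				  const void *payload, size_t payload_len)
{
	size_t n = payload_len < dst_size ? payload_len : dst_size;

	memset(dst, 0, dst_size);
	memcpy(dst, payload, n);
}
```

For a multi-attr member, the same copy would run once per attribute into successive elements of the allocated array.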
Jakub Kicinski	0ea8cf56cc	tools: ynl-gen: multi-attr: type gen for string Add support for multi attr strings (needed for link alt_names). We record the length of individual strings in a len member, to do the same for multi-attr create a struct ynl_string in ynl.h and use it as a layer holding both the string and its length. Since strings may be arbitrary length dynamically allocate each individual one. Adjust arg_member and struct member to avoid spacing the double pointers to get "type **name;" rather than "type * *name;" Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-8-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
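A length-plus-string layer of the kind the commit describes could look roughly like the sketch below. The names `demo_string` and `demo_string_new()` are hypothetical and the layout is an assumption; the actual `struct ynl_string` in ynl.h may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper holding a string together with its length. */
struct demo_string {
	unsigned int len;
	char str[];	/* flexible array member, allocated per string */
};

/* Allocate one wrapper sized for exactly `len` bytes plus a terminator. */
static struct demo_string *demo_string_new(const char *data, unsigned int len)
{
	struct demo_string *s = malloc(sizeof(*s) + len + 1);

	if (!s)
		return NULL;
	s->len = len;
	memcpy(s->str, data, len);
	s->str[len] = '\0';
	return s;
}
```

Because each string is allocated separately, a multi-attr member becomes an array of pointers to such wrappers, which is where the double pointer mentioned in the commit message comes from.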
Jakub Kicinski	49398830a4	tools: ynl-gen: support CRUD-like notifications for classic Netlink Allow CRUD-style notification where the notification is more like the response to the request, which can optionally be looped back onto the requesting socket. Since the notification and request are different ops in the spec, for example: - name: delrule doc: Remove an existing FIB rule attribute-set: fib-rule-attrs do: request: value: 33 attributes: *fib-rule-all - name: delrule-ntf doc: Notify a rule deletion value: 33 notify: getrule We need to find the request by ID. Ideally we'd detect this model from the spec properties, rather than assume that it's what all classic netlink families do. But maybe that'd cause this model to spread and it's easy to get wrong. For now assume CRUD == classic. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-7-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
Jakub Kicinski	bbfb3c557c	tools: ynl-gen: support using dump types for ntf Classic Netlink has GET callbacks with no doit support, just dumps. Support using their responses in notifications. If notification points at a type which only has a dump - use the dump's type. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-6-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
Jakub Kicinski	fe7d57e040	tools: ynl: let classic netlink requests specify extra nlflags Classic netlink makes extensive use of flags. Support specifying them the same way as attributes are specified (using a helper), for example: rt_link_newlink_req_set_nlflags(req, NLM_F_CREATE \| NLM_F_ECHO); Wrap the code up in a RenderInfo predicate. I think that some genetlink families may want this, too. It should be easy to add a spec property later. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-5-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
Jakub Kicinski	d12a7be025	tools: ynl-gen: fill in missing empty attr lists The C codegen refers to op attribute lists all over the place, without checking if they are present, even though the attribute list is technically an optional property. Add them automatically at init if missing so that we don't have to make specs longer. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-4-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
Jakub Kicinski	2286905f1b	tools: ynl-gen: factor out free_needs_iter for a struct Instead of walking the entries in the code gen add a method for the struct class to return if any of the members need an iterator. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
Jakub Kicinski	a6471da774	tools: ynl-gen: fix comment about nested struct dict The dict stores struct objects (of class Struct), not just a trivial set with directions. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250429154704.2613851-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-02 12:41:02 +02:00
Alexey Charkov	630cb33ccf	dt-bindings: net: via-rhine: Convert to YAML Rewrite the textual description for the VIA Rhine platform Ethernet controller as YAML schema, and switch the filename to follow the compatible string. These are used in several VIA/WonderMedia SoCs. Signed-off-by: Alexey Charkov <alchark@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250430-rhine-binding-v2-1-4290156c0f57@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-01 18:04:59 -07:00
Andrea Mayer	14a0087e72	ipv6: sr: switch to GFP_ATOMIC flag to allocate memory during seg6local LWT setup Recent updates to the locking mechanism that protects IPv6 routing tables [1] have affected the SRv6 networking subsystem. Such changes cause problems with some SRv6 Endpoint behaviors, like End.B6.Encaps, and also impact SRv6 counters. Starting from commit `169fd62799` ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE."), the inet6_rtm_newroute() function no longer needs to acquire the RTNL lock for creating and configuring IPv6 routes and setting up lwtunnels. The RTNL lock can be avoided because the ip6_route_add() function finishes setting up a new route in a section protected by RCU. This makes sure that no dev/nexthops can disappear during the operation. Because of this, the steps for setting up lwtunnels - i.e., calling lwtunnel_build_state() - are now done in an RCU lock section and not under the RTNL lock anymore. However, creating and configuring a lwtunnel instance in an RCU-protected section can be problematic when that tunnel needs to allocate memory using the GFP_KERNEL flag. 
For example, the following trace shows what happens when an SRv6 End.B6.Encaps behavior is instantiated after commit `169fd62799` ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE."): [ 3061.219696] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 [ 3061.226136] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 445, name: ip [ 3061.232101] preempt_count: 0, expected: 0 [ 3061.235414] RCU nest depth: 1, expected: 0 [ 3061.238622] 1 lock held by ip/445: [ 3061.241458] #0: ffffffff83ec64a0 (rcu_read_lock){....}-{1:3}, at: ip6_route_add+0x41/0x1e0 [ 3061.248520] CPU: 1 UID: 0 PID: 445 Comm: ip Not tainted 6.15.0-rc3-micro-vm-dev-00590-ge527e891492d #2058 PREEMPT(full) [ 3061.248532] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 3061.248549] Call Trace: [ 3061.248620] <TASK> [ 3061.248633] dump_stack_lvl+0xa9/0xc0 [ 3061.248846] __might_resched+0x218/0x360 [ 3061.248871] __kmalloc_node_track_caller_noprof+0x332/0x4e0 [ 3061.248889] ? rcu_is_watching+0x3a/0x70 [ 3061.248902] ? parse_nla_srh+0x56/0xa0 [ 3061.248938] kmemdup_noprof+0x1c/0x40 [ 3061.248952] parse_nla_srh+0x56/0xa0 [ 3061.248969] seg6_local_build_state+0x2e0/0x580 [ 3061.248992] ? __lock_acquire+0xaff/0x1cd0 [ 3061.249013] ? do_raw_spin_lock+0x111/0x1d0 [ 3061.249027] ? __pfx_seg6_local_build_state+0x10/0x10 [ 3061.249068] ? lwtunnel_build_state+0xe1/0x3a0 [ 3061.249274] lwtunnel_build_state+0x10d/0x3a0 [ 3061.249303] fib_nh_common_init+0xce/0x1e0 [ 3061.249337] ? __pfx_fib_nh_common_init+0x10/0x10 [ 3061.249352] ? in6_dev_get+0xaf/0x1f0 [ 3061.249369] ? __rcu_read_unlock+0x64/0x2e0 [ 3061.249392] fib6_nh_init+0x290/0xc30 [ 3061.249422] ? __pfx_fib6_nh_init+0x10/0x10 [ 3061.249447] ? __lock_acquire+0xaff/0x1cd0 [ 3061.249459] ? _raw_spin_unlock_irqrestore+0x22/0x70 [ 3061.249624] ? ip6_route_info_create+0x423/0x520 [ 3061.249641] ? 
rcu_is_watching+0x3a/0x70 [ 3061.249683] ip6_route_info_create_nh+0x190/0x390 [ 3061.249715] ip6_route_add+0x71/0x1e0 [ 3061.249730] ? __pfx_inet6_rtm_newroute+0x10/0x10 [ 3061.249743] inet6_rtm_newroute+0x426/0xc50 [ 3061.249764] ? avc_has_perm_noaudit+0x13d/0x360 [ 3061.249853] ? __pfx_inet6_rtm_newroute+0x10/0x10 [ 3061.249905] ? __lock_acquire+0xaff/0x1cd0 [ 3061.249962] ? rtnetlink_rcv_msg+0x52f/0x890 [ 3061.249996] ? __pfx_inet6_rtm_newroute+0x10/0x10 [ 3061.250012] rtnetlink_rcv_msg+0x551/0x890 [ 3061.250040] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 3061.250065] ? __lock_acquire+0xaff/0x1cd0 [ 3061.250092] netlink_rcv_skb+0xbd/0x1f0 [ 3061.250108] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 3061.250124] ? __pfx_netlink_rcv_skb+0x10/0x10 [ 3061.250179] ? netlink_deliver_tap+0x10b/0x700 [ 3061.250210] netlink_unicast+0x2e7/0x410 [ 3061.250232] ? __pfx_netlink_unicast+0x10/0x10 [ 3061.250241] ? __lock_acquire+0xaff/0x1cd0 [ 3061.250280] netlink_sendmsg+0x366/0x670 [ 3061.250306] ? __pfx_netlink_sendmsg+0x10/0x10 [ 3061.250313] ? find_held_lock+0x2d/0xa0 [ 3061.250344] ? import_ubuf+0xbc/0xf0 [ 3061.250370] ? __pfx_netlink_sendmsg+0x10/0x10 [ 3061.250381] __sock_sendmsg+0x13e/0x150 [ 3061.250420] ____sys_sendmsg+0x33d/0x450 [ 3061.250442] ? __pfx_____sys_sendmsg+0x10/0x10 [ 3061.250453] ? __pfx_copy_msghdr_from_user+0x10/0x10 [ 3061.250489] ? __pfx_slab_free_after_rcu_debug+0x10/0x10 [ 3061.250514] ___sys_sendmsg+0xe5/0x160 [ 3061.250530] ? __pfx____sys_sendmsg+0x10/0x10 [ 3061.250568] ? __lock_acquire+0xaff/0x1cd0 [ 3061.250617] ? find_held_lock+0x2d/0xa0 [ 3061.250678] ? __virt_addr_valid+0x199/0x340 [ 3061.250704] ? preempt_count_sub+0xf/0xc0 [ 3061.250736] __sys_sendmsg+0xca/0x140 [ 3061.250750] ? __pfx___sys_sendmsg+0x10/0x10 [ 3061.250786] ? 
syscall_exit_to_user_mode+0xa2/0x1e0 [ 3061.250825] do_syscall_64+0x62/0x140 [ 3061.250844] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 3061.250855] RIP: 0033:0x7f0b042ef914 [ 3061.250868] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53 [ 3061.250876] RSP: 002b:00007ffc2d113ef8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 3061.250885] RAX: ffffffffffffffda RBX: 00000000680f93fa RCX: 00007f0b042ef914 [ 3061.250891] RDX: 0000000000000000 RSI: 00007ffc2d113f60 RDI: 0000000000000003 [ 3061.250897] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008 [ 3061.250902] R10: fffffffffffff26d R11: 0000000000000246 R12: 0000000000000001 [ 3061.250907] R13: 000055a961f8a520 R14: 000055a961f63eae R15: 00007ffc2d115270 [ 3061.250952] </TASK> To solve this issue, we replace the GFP_KERNEL flag with the GFP_ATOMIC one in those SRv6 Endpoints that need to allocate memory during the setup phase. This change makes sure that memory allocations are handled in a way that works with RCU critical sections. [1] - https://lore.kernel.org/all/20250418000443.43734-1-kuniyu@amazon.com/ Fixes: `169fd62799` ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250429132453.31605-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-01 18:04:08 -07:00
Heiner Kallweit	a3e1c0ad83	net: phy: factor out provider part from mdio_bus.c After `52358dd63e` ("net: phy: remove function stubs") there's a problem if CONFIG_MDIO_BUS is set, but CONFIG_PHYLIB is not. mdiobus_scan() uses phylib functions like get_phy_device(). Bringing back the stub wouldn't make much sense, because it would allow compiling mdiobus_scan(), but the function would be unusable. The stub returned NULL, and we have the following in mdiobus_scan(): phydev = get_phy_device(bus, addr, c45); if (IS_ERR(phydev)) return phydev; So calling mdiobus_scan() w/o CONFIG_PHYLIB would cause a crash later in mdiobus_scan(). In general the PHYLIB functionality isn't optional here. Consequently, MDIO bus providers depend on PHYLIB. Therefore factor it out and build it together with the libphy core modules. In addition make all MDIO bus providers under /drivers/net/mdio depend on PHYLIB. Same applies to enetc MDIO bus provider. Note that PHYLIB selects MDIO_DEVRES, therefore we can omit this here. Fixes: `52358dd63e` ("net: phy: remove function stubs") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504270639.mT0lh2o1-lkp@intel.com/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/c74772a9-dab6-44bf-a657-389df89d85c2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-01 18:03:29 -07:00
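The reason a NULL-returning stub slips past the `IS_ERR()` check quoted in the commit above can be reproduced in userspace. The helpers below are hypothetical `demo_` re-implementations modeled on the kernel's error-pointer convention (the top 4095 addresses encode negative errno values); they are a sketch, not the kernel's own macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True only for pointers in the top DEMO_MAX_ERRNO addresses. */
static inline bool demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

`demo_is_err(NULL)` is false, so a caller that only tests for error pointers treats NULL as a valid device and dereferences it later, which matches the crash scenario the commit message describes.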
Daniel Golle	51cf06ddaf	net: ethernet: mtk_eth_soc: add support for MT7988 internal 2.5G PHY The MediaTek MT7988 SoC comes with a single built-in Ethernet PHY for 2500Base-T/1000Base-T/100Base-TX/10Base-T link partners in addition to the built-in 1GE switch. The built-in PHY only supports full duplex. Add muxes allowing to select GMAC2->2.5G PHY path and add basic support for XGMAC as the built-in 2.5G PHY is internally connected via XGMII. The XGMAC features will also be used by 5GBase-R, 10GBase-R and USXGMII SerDes modes which are going to be added once support for standalone PCS drivers is in place. In order to make use of the built-in 2.5G PHY the appropriate PHY driver as well as (proprietary) PHY firmware has to be present as well. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/9072cefbff6db969720672ec98ed5cef65e8218c.1745715380.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-01 18:00:15 -07:00
Eric Biggers	7a4f15cadc	r8152: use SHA-256 library API instead of crypto_shash API This user of SHA-256 does not support any other algorithm, so the crypto_shash abstraction provides no value. Just use the SHA-256 library API instead, which is much simpler and easier to use. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://patch.msgid.link/20250428191606.856198-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-01 17:59:32 -07:00

1353052 Commits