linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 19:31:42 -04:00

Author	SHA1	Message	Date
Pengpeng Hou	a44ce6aa2e	rxrpc: proc: size address buffers for %pISpc output The AF_RXRPC procfs helpers format local and remote socket addresses into fixed 50-byte stack buffers with "%pISpc". That is too small for the longest current-tree IPv6-with-port form the formatter can produce. In lib/vsprintf.c, the compressed IPv6 path uses a dotted-quad tail not only for v4mapped addresses, but also for ISATAP addresses via ipv6_addr_is_isatap(). As a result, a case such as [ffff:ffff:ffff:ffff:0:5efe:255.255.255.255]:65535 is possible with the current formatter. That is 50 visible characters, so 51 bytes including the trailing NUL, which does not fit in the existing char[50] buffers used by net/rxrpc/proc.c. Size the buffers from the formatter's maximum textual form and switch the call sites to scnprintf(). Changes since v1: - correct the changelog to cite the actual maximum current-tree case explicitly - frame the proof around the ISATAP formatting path instead of the earlier mapped-v4 example Fixes: `75b54cb57c` ("rxrpc: Add IPv6 support") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Anderson Nascimento <anderson@allelesecurity.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-22-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:45:32 -07:00
Wang Jie	c43ffdcfdb	rxrpc: only handle RESPONSE during service challenge Only process RESPONSE packets while the service connection is still in RXRPC_CONN_SERVICE_CHALLENGING. Check that state under state_lock before running response verification and security initialization, then use a local secured flag to decide whether to queue the secured-connection work after the state transition. This keeps duplicate or late RESPONSE packets from re-running the setup path and removes the unlocked post-transition state test. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jie Wang <jiewang2024@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-21-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:45:05 -07:00
David Howells	f564af387c	rxrpc: Fix buffer overread in rxgk_do_verify_authenticator() Fix rxgk_do_verify_authenticator() to check the buffer size before checking the nonce. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-20-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	7e1876caa8	rxrpc: Fix leak of rxgk context in rxgk_verify_response() Fix rxgk_verify_response() to clean up the rxgk context it creates. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-19-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	699e52180f	rxrpc: Fix integer overflow in rxgk_verify_response() In rxgk_verify_response(), there's a potential integer overflow due to rounding up token_len before checking it, thereby allowing the length check to be bypassed. Fix this by checking the unrounded value against len too (len is limited as the response must fit in a single UDP packet). Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-18-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	f93af41b9f	rxrpc: Fix missing error checks for rxkad encryption/decryption failure Add error checking for failure of crypto_skcipher_en/decrypt() to various rxkad function as the crypto functions can fail with ENOMEM at least. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-17-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	2afd86ccbb	rxrpc: Fix key/keyring checks in setsockopt(RXRPC_SECURITY_KEY/KEYRING) An AF_RXRPC socket can be both client and server at the same time. When sending new calls (ie. it's acting as a client), it uses rx->key to set the security, and when accepting incoming calls (ie. it's acting as a server), it uses rx->securities. setsockopt(RXRPC_SECURITY_KEY) sets rx->key to point to an rxrpc-type key and setsockopt(RXRPC_SECURITY_KEYRING) sets rx->securities to point to a keyring of rxrpc_s-type keys. Now, it should be possible to use both rx->key and rx->securities on the same socket - but for userspace AF_RXRPC sockets rxrpc_setsockopt() prevents that. Fix this by: (1) Remove the incorrect check rxrpc_setsockopt(RXRPC_SECURITY_KEYRING) makes on rx->key. (2) Move the check that rxrpc_setsockopt(RXRPC_SECURITY_KEY) makes on rx->key down into rxrpc_request_key(). (3) Remove rxrpc_request_key()'s check on rx->securities. This (in combination with a previous patch) pushes the checks down into the functions that set those pointers and removes the cross-checks that prevent both key and keyring being set. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Anderson Nascimento <anderson@allelesecurity.com> cc: Luxiao Xu <rakukuip@gmail.com> cc: Yuan Tan <yuantan098@gmail.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
Luxiao Xu	f125846ee7	rxrpc: fix reference count leak in rxrpc_server_keyring() This patch fixes a reference count leak in rxrpc_server_keyring() by checking if rx->securities is already set. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Luxiao Xu <rakukuip@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-15-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Keenan Dong	a2567217ad	rxrpc: fix oversized RESPONSE authenticator length check rxgk_verify_response() decodes auth_len from the packet and is supposed to verify that it fits in the remaining bytes. The existing check is inverted, so oversized RESPONSE authenticators are accepted and passed to rxgk_decrypt_skb(), which can later reach skb_to_sgvec() with an impossible length and hit BUG_ON(len). Decoded from the original latest-net reproduction logs with scripts/decode_stacktrace.sh: RIP: __skb_to_sgvec() [net/core/skbuff.c:5285 (discriminator 1)] Call Trace: skb_to_sgvec() [net/core/skbuff.c:5305] rxgk_decrypt_skb() [net/rxrpc/rxgk_common.h:81] rxgk_verify_response() [net/rxrpc/rxgk.c:1268] rxrpc_process_connection() [net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364 net/rxrpc/conn_event.c:386] process_one_work() [kernel/workqueue.c:3281] worker_thread() [kernel/workqueue.c:3353 kernel/workqueue.c:3440] kthread() [kernel/kthread.c:436] ret_from_fork() [arch/x86/kernel/process.c:164] Reject authenticator lengths that exceed the remaining packet payload. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: Willy Tarreau <w@1wt.eu> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-14-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Keenan Dong	3e31380078	rxrpc: fix RESPONSE authenticator parser OOB read rxgk_verify_authenticator() copies auth_len bytes into a temporary buffer and then passes p + auth_len as the parser limit to rxgk_do_verify_authenticator(). Since p is a __be32 *, that inflates the parser end pointer by a factor of four and lets malformed RESPONSE authenticators read past the kmalloc() buffer. Decoded from the original latest-net reproduction logs with scripts/decode_stacktrace.sh: BUG: KASAN: slab-out-of-bounds in rxgk_verify_response() Call Trace: dump_stack_lvl() [lib/dump_stack.c:123] print_report() [mm/kasan/report.c:379 mm/kasan/report.c:482] kasan_report() [mm/kasan/report.c:597] rxgk_verify_response() [net/rxrpc/rxgk.c:1103 net/rxrpc/rxgk.c:1167 net/rxrpc/rxgk.c:1274] rxrpc_process_connection() [net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364 net/rxrpc/conn_event.c:386] process_one_work() [kernel/workqueue.c:3281] worker_thread() [kernel/workqueue.c:3353 kernel/workqueue.c:3440] kthread() [kernel/kthread.c:436] ret_from_fork() [arch/x86/kernel/process.c:164] Allocated by task 54: rxgk_verify_response() [include/linux/slab.h:954 net/rxrpc/rxgk.c:1155 net/rxrpc/rxgk.c:1274] rxrpc_process_connection() [net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364 net/rxrpc/conn_event.c:386] Convert the byte count to __be32 units before constructing the parser limit. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: Willy Tarreau <w@1wt.eu> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-13-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Yuqi Xu	fe4447cd95	rxrpc: reject undecryptable rxkad response tickets rxkad_decrypt_ticket() decrypts the RXKAD response ticket and then parses the buffer as plaintext without checking whether crypto_skcipher_decrypt() succeeded. A malformed RESPONSE can therefore use a non-block-aligned ticket length, make the decrypt operation fail, and still drive the ticket parser with attacker-controlled bytes. Check the decrypt result and abort the connection with RXKADBADTICKET when ticket decryption fails. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Yuqi Xu <xuyuqiabc@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-12-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Douya Le	6331f1b24a	rxrpc: Only put the call ref if one was acquired rxrpc_input_packet_on_conn() can process a to-client packet after the current client call on the channel has already been torn down. In that case chan->call is NULL, rxrpc_try_get_call() returns NULL and there is no reference to drop. The client-side implicit-end error path does not account for that and unconditionally calls rxrpc_put_call(). This turns a protocol error path into a kernel crash instead of rejecting the packet. Only drop the call reference if one was actually acquired. Keep the existing protocol error handling unchanged. Fixes: `5e6ef4f101` ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Signed-off-by: Douya Le <ldy3087146292@gmail.com> Co-developed-by: Yuan Tan <tanyuan98@gmail.com> Signed-off-by: Yuan Tan <tanyuan98@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ao Zhou <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-11-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Marc Dionne	0cd3e3f3f2	rxrpc: Fix to request an ack if window is limited Peers may only send immediate acks for every 2 UDP packets received. When sending a jumbogram, it is important to check that there is sufficient window space to send another same sized jumbogram following the current one, and request an ack if there isn't. Failure to do so may cause the call to stall waiting for an ack until the resend timer fires. Where jumbograms are in use this causes a very significant drop in performance. Fixes: `fe24a54943` ("rxrpc: Send jumbo DATA packets") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-10-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Anderson Nascimento	d666540d21	rxrpc: Fix key reference count leak from call->key When creating a client call in rxrpc_alloc_client_call(), the code obtains a reference to the key. This is never cleaned up and gets leaked when the call is destroyed. Fix this by freeing call->key in rxrpc_destroy_call(). Before the patch, it shows the key reference counter elevated: $ cat /proc/keys \| grep afs@54321 1bffe9cd I--Q--i 8053480 4169w 3b010000 1000 1000 rxrpc afs@54321: ka $ After the patch, the invalidated key is removed when the code exits: $ cat /proc/keys \| grep afs@54321 $ Fixes: `f3441d4125` ("rxrpc: Copy client call parameters into rxrpc_call earlier") Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-9-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
Alok Tiwari	65b3ffe097	rxrpc: Fix rack timer warning to report unexpected mode rxrpc_rack_timer_expired() clears call->rack_timer_mode to OFF before the switch. The default case warning therefore always prints OFF and doesn't identify the unexpected timer mode. Log the saved mode value instead so the warning reports the actual unexpected rack timer mode. Fixes: `7c48266593` ("rxrpc: Implement RACK/TLP to deal with transmission stalls [RFC8985]") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-8-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
Alok Tiwari	b33f5741bb	rxrpc: Fix use of wrong skb when comparing queued RESP challenge serial In rxrpc_post_response(), the code should be comparing the challenge serial number from the cached response before deciding to switch to a newer response, but looks at the newer packet private data instead, rendering the comparison always false. Fix this by switching to look at the older packet. Fix further[1] to substitute the new packet in place of the old one if newer and also to release whichever we don't use. Fixes: `5800b1cf3f` ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com [1] Link: https://patch.msgid.link/20260408121252.2249051-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
Oleh Konko	d179a868dd	rxrpc: Fix RxGK token loading to check bounds rxrpc_preparse_xdr_yfs_rxgk() reads the raw key length and ticket length from the XDR token as u32 values and passes each through round_up(x, 4) before using the rounded value for validation and allocation. When the raw length is >= 0xfffffffd, round_up() wraps to 0, so the bounds check and kzalloc both use 0 while the subsequent memcpy still copies the original ~4 GiB value, producing a heap buffer overflow reachable from an unprivileged add_key() call. Fix this by: (1) Rejecting raw key lengths above AFSTOKEN_GK_KEY_MAX and raw ticket lengths above AFSTOKEN_GK_TOKEN_MAX before rounding, consistent with the caps that the RxKAD path already enforces via AFSTOKEN_RK_TIX_MAX. (2) Sizing the flexible-array allocation from the validated raw key length via struct_size_t() instead of the rounded value. (3) Caching the raw lengths so that the later field assignments and memcpy calls do not re-read from the token, eliminating a class of TOCTOU re-parse. The control path (valid token with lengths within bounds) is unaffected. Fixes: `0ca100ff4d` ("rxrpc: Add YFS RxGK (GSSAPI) security class") Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
David Howells	146d4ab94c	rxrpc: Fix call removal to use RCU safe deletion Fix rxrpc call removal from the rxnet->calls list to use list_del_rcu() rather than list_del_init() to prevent stuffing up reading /proc/net/rxrpc/calls from potentially getting into an infinite loop. This, however, means that list_empty() no longer works on an entry that's been deleted from the list, making it harder to detect prior deletion. Fix this by: Firstly, make rxrpc_destroy_all_calls() only dump the first ten calls that are unexpectedly still on the list. Limiting the number of steps means there's no need to call cond_resched() or to remove calls from the list here, thereby eliminating the need for rxrpc_put_call() to check for that. rxrpc_put_call() can then be fixed to unconditionally delete the call from the list as it is the only place that the deletion occurs. Fixes: `2baec2c3f8` ("rxrpc: Support network namespacing") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
David Howells	6a59d84b4f	rxrpc: Fix anonymous key handling In rxrpc_new_client_call_for_sendmsg(), a key with no payload is meant to be substituted for a NULL key pointer, but the variable this is done with is subsequently not used. Fix this by using "key" rather than "rx->key" when filling in the connection parameters. Note that this only affects direct use of AF_RXRPC; the kAFS filesystem doesn't use sendmsg() directly and so bypasses the issue. Further, AF_RXRPC passes a NULL key in if no key is set, so using an anonymous key in that manner works. Since this hasn't been noticed to this point, it might be better just to remove the "key" variable and the code that sets it - and, arguably, rxrpc_init_client_call_security() would be a better place to handle it. Fixes: `19ffa01c9c` ("rxrpc: Use structs to hold connection params and protocol info") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:31 -07:00
David Howells	b555912b9b	rxrpc: Fix key parsing memleak In rxrpc_preparse_xdr_yfs_rxgk(), the memory attached to token->rxgk can be leaked in a few error paths after it's allocated. Fix this by freeing it in the "reject_token:" case. Fixes: `0ca100ff4d` ("rxrpc: Add YFS RxGK (GSSAPI) security class") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:31 -07:00
David Howells	bdbfead6d3	rxrpc: Fix key quota calculation for multitoken keys In the rxrpc key preparsing, every token extracted sets the proposed quota value, but for multitoken keys, this will overwrite the previous proposed quota, losing it. Fix this by adding to the proposed quota instead. Fixes: `8a7a3eb4dd` ("KEYS: RxRPC: Use key preparsing") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:31 -07:00
Felix Gu	c09ea768bd	net: mdio: realtek-rtl9300: use scoped device_for_each_child_node loop Switch to device_for_each_child_node_scoped() to auto-release fwnode references on early exit. Fixes: `24e31e4747` ("net: mdio: Add RTL9300 MDIO driver") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260405-rtl9300-v1-1-08e4499cf944@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:42:08 -07:00
Jakub Kicinski	f821664dde	Merge branch 'seg6-fix-dst_cache-sharing-in-seg6-lwtunnel' Andrea Mayer says: ==================== seg6: fix dst_cache sharing in seg6 lwtunnel The seg6 lwtunnel encap uses a single per-route dst_cache shared between seg6_input_core() and seg6_output_core(). These two paths can perform the post-encap SID lookup in different routing contexts (e.g., ip rules matching on the ingress interface, or VRF table separation). Whichever path runs first populates the cache, and the other reuses it blindly, bypassing its own lookup. Patch 1 fixes this by splitting the cache into cache_input and cache_output. Patch 2 adds a selftest that validates the isolation. ==================== Link: https://patch.msgid.link/20260404004405.4057-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 20:21:00 -07:00
Andrea Mayer	32dfd742f0	selftests: seg6: add test for dst_cache isolation in seg6 lwtunnel Add a selftest that verifies the dst_cache in seg6 lwtunnel is not shared between the input (forwarding) and output (locally generated) paths. The test creates three namespaces (ns_src, ns_router, ns_dst) connected in a line. An SRv6 encap route on ns_router encapsulates traffic destined to cafe::1 with SID fc00::100. The SID is reachable only for forwarded traffic (from ns_src) via an ip rule matching the ingress interface (iif veth-r0 lookup 100), and blackholed in the main table. The test verifies that: 1. A packet generated locally on ns_router does not reach ns_dst with an empty cache, since the SID is blackholed; 2. A forwarded packet from ns_src populates the input cache from table 100 and reaches ns_dst; 3. A packet generated locally on ns_router still does not reach ns_dst after the input cache is populated, confirming the output path does not reuse the input cache entry. Both the forwarded and local packets are pinned to the same CPU with taskset, since dst_cache is per-cpu. Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260404004405.4057-3-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 20:20:56 -07:00
Andrea Mayer	c3812651b5	seg6: separate dst_cache for input and output paths in seg6 lwtunnel The seg6 lwtunnel uses a single dst_cache per encap route, shared between seg6_input_core() and seg6_output_core(). These two paths can perform the post-encap SID lookup in different routing contexts (e.g., ip rules matching on the ingress interface, or VRF table separation). Whichever path runs first populates the cache, and the other reuses it blindly, bypassing its own lookup. Fix this by splitting the cache into cache_input and cache_output, so each path maintains its own cached dst independently. Fixes: `6c8702c60b` ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Cc: stable@vger.kernel.org Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260404004405.4057-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 20:20:56 -07:00
Daniel Golle	efaa71faf2	selftests: net: bridge_vlan_mcast: wait for h1 before querier check The querier-interval test adds h1 (currently a slave of the VRF created by simple_if_init) to a temporary bridge br1 acting as an outside IGMP querier. The kernel VRF driver (drivers/net/vrf.c) calls cycle_netdev() on every slave add and remove, toggling the interface admin-down then up. Phylink takes the PHY down during the admin-down half of that cycle. Since h1 and swp1 are cable-connected, swp1 also loses its link may need several seconds to re-negotiate. Use setup_wait_dev $h1 0 which waits for h1 to return to UP state, so the test can rely on the link being back up at this point. Fixes: `4d8610ee8b` ("selftests: net: bridge: add vlan mcast_querier_interval tests") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Link: https://patch.msgid.link/c830f130860fd2efae08bfb9e5b25fd028e58ce5.1775424423.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 20:16:16 -07:00
Jakub Kicinski	944b3b734c	net: avoid nul-deref trying to bind mp to incapable device Sashiko points out that we use qops in __net_mp_open_rxq() but never validate they are null. This was introduced when check was moved from netdev_rx_queue_restart(). Look at ops directly instead of the locking config. qops imply netdev_need_ops_lock(). We used netdev_need_ops_lock() initially to signify that the real_num_rx_queues check below is safe without rtnl_lock, but I'm not sure if this is actually clear to most people, anyway. Fixes: `da7772a2b4` ("net: move mp->rx_page_size validation to __net_mp_open_rxq()") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20260404001938.2425670-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 18:57:56 -07:00
Johan Alvarado	f2777d5cb5	net: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure This patch fixes an issue where reading the MAC address from the eFUSE fails due to a race condition. The root cause was identified by comparing the driver's behavior with a custom U-Boot port. In U-Boot, the MAC address was read successfully every time because the driver was loaded later in the boot process, giving the hardware ample time to initialize. In Linux, reading the eFUSE immediately returns all zeros, resulting in a fallback to a random MAC address. Hardware cold-boot testing revealed that the eFUSE controller requires a short settling time to load its internal data. Adding a 2000-5000us delay after the reset ensures the hardware is fully ready, allowing the native MAC address to be read consistently. Fixes: `02ff155ea2` ("net: stmmac: Add glue driver for Motorcomm YT6801 ethernet controller") Reported-by: Georg Gottleuber <ggo@tuxedocomputers.com> Closes: https://lore.kernel.org/24cfefff-1233-4745-8c47-812b502d5d19@tuxedocomputers.com Signed-off-by: Johan Alvarado <contact@c127.dev> Reviewed-by: Yao Zi <me@ziyao.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/fc5992a4-9532-49c3-8ec1-c2f8c5b84ca1@smtp-relay.sendinblue.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 18:21:00 -07:00
John Pavlick	95aca8602e	net: sfp: add quirks for Hisense and HSGQ GPON ONT SFP modules Several GPON ONT SFP sticks based on Realtek RTL960x report 1000BASE-LX at 1300MBd in their EEPROM but can operate at 2500base-X. On hosts capable of 2500base-X (e.g. Banana Pi R3 / MT7986), the kernel negotiates only 1G because it trusts the incorrect EEPROM data. Add quirks for: - Hisense-Leox LXT-010S-H - Hisense ZNID-GPON-2311NA - HSGQ HSGQ-XPON-Stick Each quirk advertises 2500base-X and ignores TX_FAULT during the module's ~40s Linux boot time. Tested on Banana Pi R3 (MT7986) with OpenWrt 25.12.1, confirmed 2.5Gbps link and full throughput with flow offloading. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Suggested-by: Marcin Nita <marcin.nita@leolabs.pl> Signed-off-by: John Pavlick <jspavlick@posteo.net> Link: https://patch.msgid.link/20260406132321.72563-1-jspavlick@posteo.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 18:13:51 -07:00
Muhammad Alifa Ramdhan	a9b8b18364	net/tls: fix use-after-free in -EBUSY error path of tls_do_encryption The -EBUSY handling in tls_do_encryption(), introduced by commit `8590541473` ("net: tls: handle backlogging of crypto requests"), has a use-after-free due to double cleanup of encrypt_pending and the scatterlist entry. When crypto_aead_encrypt() returns -EBUSY, the request is enqueued to the cryptd backlog and the async callback tls_encrypt_done() will be invoked upon completion. That callback unconditionally restores the scatterlist entry (sge->offset, sge->length) and decrements ctx->encrypt_pending. However, if tls_encrypt_async_wait() returns an error, the synchronous error path in tls_do_encryption() performs the same cleanup again, double-decrementing encrypt_pending and double-restoring the scatterlist. The double-decrement corrupts the encrypt_pending sentinel (initialized to 1), making tls_encrypt_async_wait() permanently skip the wait for pending async callbacks. A subsequent sendmsg can then free the tls_rec via bpf_exec_tx_verdict() while a cryptd callback is still pending, resulting in a use-after-free when the callback fires on the freed record. Fix this by skipping the synchronous cleanup when the -EBUSY async wait returns an error, since the callback has already handled encrypt_pending and sge restoration. Fixes: `8590541473` ("net: tls: handle backlogging of crypto requests") Cc: stable@vger.kernel.org Signed-off-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260403013617.2838875-1-ramdhan@starlabs.sg Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-07 14:53:42 +02:00
Michael Guralnik	a9d4f4f6e6	net/mlx5: Update the list of the PCI supported devices Add the upcoming ConnectX-10 NVLink-C2C device ID to the table of supported PCI device IDs. Cc: stable@vger.kernel.org Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260403091756.139583-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 19:17:42 -07:00
Jiayuan Chen	0f42e3f4fe	net: skb: fix cross-cache free of KFENCE-allocated skb head SKB_SMALL_HEAD_CACHE_SIZE is intentionally set to a non-power-of-2 value (e.g. 704 on x86_64) to avoid collisions with generic kmalloc bucket sizes. This ensures that skb_kfree_head() can reliably use skb_end_offset to distinguish skb heads allocated from skb_small_head_cache vs. generic kmalloc caches. However, when KFENCE is enabled, kfence_ksize() returns the exact requested allocation size instead of the slab bucket size. If a caller (e.g. bpf_test_init) allocates skb head data via kzalloc() and the requested size happens to equal SKB_SMALL_HEAD_CACHE_SIZE, then slab_build_skb() -> ksize() returns that exact value. After subtracting skb_shared_info overhead, skb_end_offset ends up matching SKB_SMALL_HEAD_HEADROOM, causing skb_kfree_head() to incorrectly free the object to skb_small_head_cache instead of back to the original kmalloc cache, resulting in a slab cross-cache free: kmem_cache_free(skbuff_small_head): Wrong slab cache. Expected skbuff_small_head but got kmalloc-1k Fix this by always calling kfree(head) in skb_kfree_head(). This keeps the free path generic and avoids allocator-specific misclassification for KFENCE objects. Fixes: `bf9f1baa27` ("net: add dedicated kmem_cache for typical/small skb->head") Reported-by: Antonius <antonius@bluedragonsec.com> Closes: https://lore.kernel.org/netdev/CAK8a0jxC5L5N7hq-DT2_NhUyjBxrPocoiDazzsBk4TGgT1r4-A@mail.gmail.com/ Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260403014517.142550-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:46:53 -07:00
Stefano Garzarella	24ad7ff668	vsock/test: fix send_buf()/recv_buf() EINTR handling When send() or recv() returns -1 with errno == EINTR, the code skips the break but still adds the return value to nwritten/nread, making it decrease by 1. This leads to wrong buffer offsets and wrong bytes count. Fix it by explicitly continuing the loop on EINTR, so the return value is only added when it is positive. Fixes: `a8ed71a27e` ("vsock/test: add recv_buf() utility function") Fixes: `12329bd51f` ("vsock/test: add send_buf() utility function") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20260403093251.30662-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:46:03 -07:00
Jakub Kicinski	270c0637b9	Merge branch 'xsk-tailroom-reservation-and-mtu-validation' Maciej Fijalkowski says: ==================== xsk: tailroom reservation and MTU validation here we fix a long-standing issue regarding multi-buffer scenario in ZC mode - we have not been providing space at the end of the buffer where multi-buffer XDP works on skb_shared_info. This has been brought to our attention via [0]. Unaligned mode does not get any specific treatment, it is user's responsibility to properly handle XSK addresses in queues. With adjustments included here in this set against xskxceiver I have been able to pass the full test suite on ice. [0]: https://community.intel.com/t5/Ethernet-Products/X710-XDP-Packet-Corruption-Issue-DRV-MODE-Zero-Copy-Multi-Buffer/m-p/1724208 ==================== Link: https://patch.msgid.link/20260402154958.562179-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:54 -07:00
Maciej Fijalkowski	62838e363e	selftests: bpf: adjust rx_dropped xskxceiver's test to respect tailroom Since we have changed how big user defined headroom in umem can be, change the logic in testapp_stats_rx_dropped() so we pass updated headroom validation in xdp_umem_reg() and still drop half of frames. Test works on non-mbuf setup so __xsk_pool_get_rx_frame_size() that is called on xsk_rcv_check() will not account skb_shared_info size. Taking the tailroom size into account in test being fixed is needed as xdp_umem_reg() defaults to respect it. Reviewed-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-9-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:52 -07:00
Maciej Fijalkowski	16546954e1	selftests: bpf: have a separate variable for drop test Currently two different XDP programs share a static variable for different purposes (picking where to redirect on shared umem test & whether to drop a packet). This can be a problem when running full test suite - idx can be written by shared umem test and this value can cause a false behavior within XDP drop half test. Introduce a dedicated variable for drop half test so that these two don't step on each other toes. There is no real need for using __sync_fetch_and_add here as XSK tests are executed on single CPU. Reviewed-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-8-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:52 -07:00
Maciej Fijalkowski	3197c51ce2	selftests: bpf: fix pkt grow tests Skip tail adjust tests in xskxceiver for SKB mode as it is not very friendly for it. multi-buffer case does not work as xdp_rxq_info that is registered for generic XDP does not report ::frag_size. The non-mbuf path copies packet via skb_pp_cow_data() which only accounts for headroom, leaving us with no tailroom and causing underlying XDP prog to drop packets therefore. For multi-buffer test on other modes, change the amount of bytes we use for growth, assume worst-case scenario and take care of headroom and tailroom. Reviewed-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-7-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Maciej Fijalkowski	c5866a6be4	selftests: bpf: introduce a common routine for reading procfs Parametrize current way of getting MAX_SKB_FRAGS value from {sys,proc}fs so that it can be re-used to get cache line size of system's CPU. All that just to mimic and compute size of kernel's struct skb_shared_info which for xsk and test suite interpret as tailroom. Introduce two variables to ifobject struct that will carry count of skb frags and tailroom size. Do the reading and computing once, at the beginning of test suite execution in xskxceiver, but for test_progs such way is not possible as in this environment each test setups and torns down ifobject structs. Reviewed-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-6-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Maciej Fijalkowski	36ee60b569	xsk: validate MTU against usable frame size on bind AF_XDP bind currently accepts zero-copy pool configurations without verifying that the device MTU fits into the usable frame space provided by the UMEM chunk. This becomes a problem since we started to respect tailroom which is subtracted from chunk_size (among with headroom). 2k chunk size might not provide enough space for standard 1500 MTU, so let us catch such settings at bind time. Furthermore, validate whether underlying HW will be able to satisfy configured MTU wrt XSK's frame size multiplied by supported Rx buffer chain length (that is exposed via net_device::xdp_zc_max_segs). Fixes: `24ea50127e` ("xsk: support mbuf on ZC RX") Reviewed-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-5-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Maciej Fijalkowski	93e84fe45b	xsk: fix XDP_UMEM_SG_FLAG issues Currently xp_assign_dev_shared() is missing XDP_USE_SG being propagated to flags so set it in order to preserve mtu check that is supposed to be done only when no multi-buffer setup is in picture. Also, this flag has the same value as XDP_UMEM_TX_SW_CSUM so we could get unexpected SG setups for software Tx checksums. Since csum flag is UAPI, modify value of XDP_UMEM_SG_FLAG. Fixes: `d609f3d228` ("xsk: add multi-buffer support for sockets sharing umem") Reviewed-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-4-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Maciej Fijalkowski	1ee1605138	xsk: respect tailroom for ZC setups Multi-buffer XDP stores information about frags in skb_shared_info that sits at the tailroom of a packet. The storage space is reserved via xdp_data_hard_end(): ((xdp)->data_hard_start + (xdp)->frame_sz - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) and then we refer to it via macro below: static inline struct skb_shared_info * xdp_get_shared_info_from_buff(const struct xdp_buff xdp) { return (struct skb_shared_info )xdp_data_hard_end(xdp); } Currently we do not respect this tailroom space in multi-buffer AF_XDP ZC scenario. To address this, introduce xsk_pool_get_tailroom() and use it within xsk_pool_get_rx_frame_size() which is used in ZC drivers to configure length of HW Rx buffer. Typically drivers on Rx Hw buffers side work on 128 byte alignment so let us align the value returned by xsk_pool_get_rx_frame_size() in order to avoid addressing this on driver's side. This addresses the fact that idpf uses mentioned function before pool->dev being set so we were at risk that after subtracting tailroom we would not provide 128-byte aligned value to HW. Since xsk_pool_get_rx_frame_size() is actively used in xsk_rcv_check() and __xsk_rcv(), add a variant of this routine that will not include 128 byte alignment and therefore old behavior is preserved. Reviewed-by: Björn Töpel <bjorn@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: `24ea50127e` ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-3-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Maciej Fijalkowski	a315e022a7	xsk: tighten UMEM headroom validation to account for tailroom and min frame The current headroom validation in xdp_umem_reg() could leave us with insufficient space dedicated to even receive minimum-sized ethernet frame. Furthermore if multi-buffer would come to play then skb_shared_info stored at the end of XSK frame would be corrupted. HW typically works with 128-aligned sizes so let us provide this value as bare minimum. Multi-buffer setting is known later in the configuration process so besides accounting for 128 bytes, let us also take care of tailroom space upfront. Reviewed-by: Björn Töpel <bjorn@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: `99e3a236dd` ("xsk: Add missing check on user supplied headroom size") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20260402154958.562179-2-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:43:51 -07:00
Jakub Kicinski	1caa871bb0	Merge branch 'net-stmmac-fix-tegra234-mgbe-clock' Jon Hunter says: ==================== net: stmmac: Fix Tegra234 MGBE clock The name of the PTP ref clock for the Tegra234 MGBE ethernet controller does not match the generic name in the stmmac platform driver. Despite this basic ethernet is functional on the Tegra234 platforms that use this driver and as far as I know, we have not tested PTP support with this driver. Hence, the risk of breaking any functionality is low. The previous attempt to fix this in the stmmac platform driver, by supporting the Tegra234 PTP clock name, was rejected [0]. The preference from the netdev maintainers is to fix this in the DT binding for Tegra234. This series fixes this by correcting the device-tree binding to align with the generic name for the PTP clock. I understand that this is breaking the ABI for this device, which we should never do, but this is a last resort for getting this fixed. I am open to any better ideas to fix this. Please note that we still maintain backward compatibility in the driver to allow older device-trees to work, but we don't advertise this via the binding, because I did not see any value in doing so. ==================== Link: https://patch.msgid.link/20260401102941.17466-1-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 16:02:31 -07:00
Jon Hunter	fb22b1fc5b	dt-bindings: net: Fix Tegra234 MGBE PTP clock The PTP clock for the Tegra234 MGBE device is incorrectly named 'ptp-ref' and should be 'ptp_ref'. This is causing the following warning to be observed on Tegra234 platforms that use this device: ERR KERN tegra-mgbe 6800000.ethernet eth0: Invalid PTP clock rate WARNING KERN tegra-mgbe 6800000.ethernet eth0: PTP init failed Although this constitutes an ABI breakage in the binding for this device, PTP support has clearly never worked and so fix this now so we can correct the device-tree for this device. Note that the MGBE driver still supports the legacy 'ptp-ref' clock name and so older/existing device-trees will still work, but given that this is not the correct name, there is no point to advertise this in the binding. Fixes: `189c2e5c76` ("dt-bindings: net: Add Tegra234 MGBE") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260401102941.17466-3-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 16:02:30 -07:00
Jon Hunter	1345e9f4e3	net: stmmac: Fix PTP ref clock for Tegra234 Since commit `030ce919e1` ("net: stmmac: make sure that ptp_rate is not 0 before configuring timestamping") was added the following error is observed on Tegra234: ERR KERN tegra-mgbe 6800000.ethernet eth0: Invalid PTP clock rate WARNING KERN tegra-mgbe 6800000.ethernet eth0: PTP init failed It turns out that the Tegra234 device-tree binding defines the PTP ref clock name as 'ptp-ref' and not 'ptp_ref' and the above commit now exposes this and that the PTP clock is not configured correctly. In order to update device-tree to use the correct 'ptp_ref' name, update the Tegra MGBE driver to use 'ptp_ref' by default and fallback to using 'ptp-ref' if this clock name is present. Fixes: `d8ca113724` ("net: stmmac: tegra: Add MGBE support") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260401102941.17466-2-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 16:02:21 -07:00
Pengpeng Hou	5c14a19d5b	nfc: s3fwrn5: allocate rx skb before consuming bytes s3fwrn82_uart_read() reports the number of accepted bytes to the serdev core. The current code consumes bytes into recv_skb and may already deliver a complete frame before allocating a fresh receive buffer. If that alloc_skb() fails, the callback returns 0 even though it has already consumed bytes, and it leaves recv_skb as NULL for the next receive callback. That breaks the receive_buf() accounting contract and can also lead to a NULL dereference on the next skb_put_u8(). Allocate the receive skb lazily before consuming the next byte instead. If allocation fails, return the number of bytes already accepted. Fixes: `3f52c2cb7e` ("nfc: s3fwrn5: Support a UART interface") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Link: https://patch.msgid.link/20260402042148.65236-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:57:46 -07:00
Chris J Arges	77facb3522	net: increase IP_TUNNEL_RECURSION_LIMIT to 5 In configurations with multiple tunnel layers and MPLS lwtunnel routing, a single tunnel hop can increment the counter beyond this limit. This causes packets to be dropped with the "Dead loop on virtual device" message even when a routing loop doesn't exist. Increase IP_TUNNEL_RECURSION_LIMIT from 4 to 5 to handle this use-case. Fixes: `6f1a9140ec` ("net: add xmit recursion limit to tunnel xmit functions") Link: https://lore.kernel.org/netdev/88deb91b-ef1b-403c-8eeb-0f971f27e34f@redhat.com/ Signed-off-by: Chris J Arges <carges@cloudflare.com> Link: https://patch.msgid.link/20260402222401.3408368-1-carges@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:52:10 -07:00
Yiqi Sun	fde29fd934	ipv4: icmp: fix null-ptr-deref in icmp_build_probe() ipv6_stub->ipv6_dev_find() may return ERR_PTR(-EAFNOSUPPORT) when the IPv6 stack is not active (CONFIG_IPV6=m and not loaded), and passing this error pointer to dev_hold() will cause a kernel crash with null-ptr-deref. Instead, silently discard the request. RFC 8335 does not appear to define a specific response for the case where an IPv6 interface identifier is syntactically valid but the implementation cannot perform the lookup at runtime, and silently dropping the request may safer than misreporting "No Such Interface". Fixes: `d329ea5bd8` ("icmp: add response to RFC 8335 PROBE messages") Signed-off-by: Yiqi Sun <sunyiqixm@gmail.com> Link: https://patch.msgid.link/20260402070419.2291578-1-sunyiqixm@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:46:17 -07:00
Fernando Fernandez Mancera	14cf0cd353	ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop() When querying a nexthop object via RTM_GETNEXTHOP, the kernel currently allocates a fixed-size skb using NLMSG_GOODSIZE. While sufficient for single nexthops and small Equal-Cost Multi-Path groups, this fixed allocation fails for large nexthop groups like 512 nexthops. This results in the following warning splat: WARNING: net/ipv4/nexthop.c:3395 at rtm_get_nexthop+0x176/0x1c0, CPU#20: rep/4608 [...] RIP: 0010:rtm_get_nexthop (net/ipv4/nexthop.c:3395) [...] Call Trace: <TASK> rtnetlink_rcv_msg (net/core/rtnetlink.c:6989) netlink_rcv_skb (net/netlink/af_netlink.c:2550) netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) ____sys_sendmsg (net/socket.c:721 net/socket.c:736 net/socket.c:2585) ___sys_sendmsg (net/socket.c:2641) __sys_sendmsg (net/socket.c:2671) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> Fix this by allocating the size dynamically using nh_nlmsg_size() and using nlmsg_new(), this is consistent with nexthop_notify() behavior. In addition, adjust nh_nlmsg_size_grp() so it calculates the size needed based on flags passed. While at it, also add the size of NHA_FDB for nexthop group size calculation as it was missing too. This cannot be reproduced via iproute2 as the group size is currently limited and the command fails as follows: addattr_l ERROR: message exceeded bound of 1048 Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8Li2h4KO+AQFXW4S6Yb_u5X4oSKnkywW+LPFjuErhqELA@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:34:27 -07:00
Fernando Fernandez Mancera	06aaf04ca8	ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump Currently NHA_HW_STATS_ENABLE is included twice everytime a dump of nexthop group is performed with NHA_OP_FLAG_DUMP_STATS. As all the stats querying were moved to nla_put_nh_group_stats(), leave only that instance of the attribute querying. Fixes: `5072ae00ae` ("net: nexthop: Expose nexthop group HW stats to user space") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:34:27 -07:00

1 2 3 4 5 ...

1429547 Commits