linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-21 23:04:54 -04:00

Author	SHA1	Message	Date
Matt Johnston	a58893aa17	net: mctp: Fix bad kfree_skb in bind lookup test The kunit test's skb_pkt is consumed by mctp_dst_input() so shouldn't be freed separately. Fixes: `e6d8e7dbc5` ("net: mctp: Add bind lookup test") Reported-by: Alexandre Ghiti <alex@ghiti.fr> Closes: https://lore.kernel.org/all/734b02a3-1941-49df-a0da-ec14310d41e4@ghiti.fr/ Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://patch.msgid.link/20250812-fix-mctp-bind-test-v1-1-5e2128664eb3@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-13 17:07:34 -07:00
Wang Liang	4d18083d6b	vsock: use sizeof(struct sockaddr_storage) instead of magic value Previous commit `230b183921` ("net: Use standard structures for generic socket address structures.") use 'struct sockaddr_storage address;' to replace 'char address[MAX_SOCK_ADDR];'. The macro MAX_SOCK_ADDR is removed by commit `01893c82b4` ("net: Remove MAX_SOCK_ADDR constant"). The comment in vsock_getname() is outdated, use sizeof(struct sockaddr_storage) instead of magic value 128. Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20250812015929.1419896-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-13 17:05:21 -07:00
Pablo Neira Ayuso	cf5fb87fcd	netfilter: nf_tables: reject duplicate device on updates A chain/flowtable update with duplicated devices in the same batch is possible. Unfortunately, netdev event path only removes the first device that is found, leaving unregistered the hook of the duplicated device. Check if a duplicated device exists in the transaction batch, bail out with EEXIST in such case. WARNING is hit when unregistering the hook: [49042.221275] WARNING: CPU: 4 PID: 8425 at net/netfilter/core.c:340 nf_hook_entry_head+0xaa/0x150 [49042.221375] CPU: 4 UID: 0 PID: 8425 Comm: nft Tainted: G S 6.16.0+ #170 PREEMPT(full) [...] [49042.221382] RIP: 0010:nf_hook_entry_head+0xaa/0x150 Fixes: `78d9f48f7f` ("netfilter: nf_tables: add devices to existing flowtable") Fixes: `b9703ed44f` ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-08-13 08:34:55 +02:00
Frederic Weisbecker	c0a23bbc98	ipvs: Fix estimator kthreads preferred affinity The estimator kthreads' affinity are defined by sysctl overwritten preferences and applied through a plain call to the scheduler's affinity API. However since the introduction of managed kthreads preferred affinity, such a practice shortcuts the kthreads core code which eventually overwrites the target to the default unbound affinity. Fix this with using the appropriate kthread's API. Fixes: `d1a8919758` ("kthread: Default affine kthread to its preferred NUMA node") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-08-13 08:34:33 +02:00
Florian Westphal	30c1d25b98	netfilter: nft_set_pipapo: fix null deref for empty set Blamed commit broke the check for a null scratch map: - if (unlikely(!m \|\| !raw_cpu_ptr(m->scratch))) + if (unlikely(!raw_cpu_ptr(m->scratch))) This should have been "if (!raw_ ...)". Use the pattern of the avx2 version which is more readable. This can only be reproduced if avx2 support isn't available. Fixes: `d8d871a35c` ("netfilter: nft_set_pipapo: merge pipapo_get/lookup") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-08-13 08:34:32 +02:00
Jakub Kicinski	6db015fc4b	tls: handle data disappearing from under the TLS ULP TLS expects that it owns the receive queue of the TCP socket. This cannot be guaranteed in case the reader of the TCP socket entered before the TLS ULP was installed, or uses some non-standard read API (eg. zerocopy ones). Replace the WARN_ON() and a buggy early exit (which leaves anchor pointing to a freed skb) with real error handling. Wipe the parsing state and tell the reader to retry. We already reload the anchor every time we (re)acquire the socket lock, so the only condition we need to avoid is an out of bounds read (not having enough bytes in the socket for previously parsed record len). If some data was read from under TLS but there's enough in the queue we'll reload and decrypt what is most likely not a valid TLS record. Leading to some undefined behavior from TLS perspective (corrupting a stream? missing an alert? missing an attack?) but no kernel crash should take place. Reported-by: William Liu <will@willsroot.io> Reported-by: Savino Dicanosa <savy@syst3mfailure.io> Link: https://lore.kernel.org/tFjq_kf7sWIG3A7CrCg_egb8CVsT_gsmHAK0_wxDPJXfIzxFAMxqmLwp3MlU5EHiet0AwwJldaaFdgyHpeIUCS-3m3llsmRzp9xIOBR4lAI=@syst3mfailure.io Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250807232907.600366-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-12 18:59:05 -07:00
Thorsten Blum	b3ba7d929c	net/sched: Remove redundant memset(0) call in reset_policy() The call to nla_strscpy() already zero-pads the tail of the destination buffer which makes the additional memset(0) call redundant. Remove it. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250811164039.43250-1-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-12 17:13:29 -07:00
Eric Dumazet	86e3d52bd3	phonet: add __rcu annotations Removes following sparse errors. make C=2 net/phonet/socket.o net/phonet/af_phonet.o CHECK net/phonet/socket.c net/phonet/socket.c:619:14: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:619:14: struct sock [noderef] __rcu * net/phonet/socket.c:619:14: struct sock * net/phonet/socket.c:642:17: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:642:17: struct sock [noderef] __rcu * net/phonet/socket.c:642:17: struct sock * net/phonet/socket.c:658:17: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:658:17: struct sock [noderef] __rcu * net/phonet/socket.c:658:17: struct sock * net/phonet/socket.c:677:25: error: incompatible types in comparison expression (different address spaces): net/phonet/socket.c:677:25: struct sock [noderef] __rcu * net/phonet/socket.c:677:25: struct sock * net/phonet/socket.c:726:21: warning: context imbalance in 'pn_res_seq_start' - wrong count at exit net/phonet/socket.c:741:13: warning: context imbalance in 'pn_res_seq_stop' - wrong count at exit CHECK net/phonet/af_phonet.c net/phonet/af_phonet.c:35:14: error: incompatible types in comparison expression (different address spaces): net/phonet/af_phonet.c:35:14: struct phonet_protocol const [noderef] __rcu * net/phonet/af_phonet.c:35:14: struct phonet_protocol const * net/phonet/af_phonet.c:474:17: error: incompatible types in comparison expression (different address spaces): net/phonet/af_phonet.c:474:17: struct phonet_protocol const [noderef] __rcu * net/phonet/af_phonet.c:474:17: struct phonet_protocol const * net/phonet/af_phonet.c:486:9: error: incompatible types in comparison expression (different address spaces): net/phonet/af_phonet.c:486:9: struct phonet_protocol const [noderef] __rcu * net/phonet/af_phonet.c:486:9: struct phonet_protocol const * Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Rémi Denis-Courmont <courmisch@gmail.com> Link: https://patch.msgid.link/20250811145252.1007242-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-12 14:15:24 -07:00
Thorsten Blum	63fe077c21	caif: Replace memset(0) + strscpy() with strscpy_pad() Replace memset(0) followed by strscpy() with strscpy_pad() to improve cfctrl_linkup_request(). This avoids zeroing the memory before copying the string and ensures the destination buffer is only written to once, simplifying the code and improving efficiency. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20250811093442.5075-2-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-12 14:08:56 -07:00
Qianfeng Rong	7792232a4e	RDS: remove redundant __GFP_NOWARN GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the redundant __GFP_NOWARN. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20250810072944.438574-3-rongqianfeng@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-12 14:05:44 -07:00
Qianfeng Rong	526c2530cb	tcp: cdg: remove redundant __GFP_NOWARN GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the redundant __GFP_NOWARN. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250810072944.438574-2-rongqianfeng@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-12 14:05:43 -07:00
Jedrzej Jagielski	c5ec7f49b4	devlink: let driver opt out of automatic phys_port_name generation Currently when adding devlink port, phys_port_name is automatically generated within devlink port initialization flow. As a result adding devlink port support to driver may result in forced changes of interface names, which breaks already existing network configs. This is an expected behavior but in some scenarios it would not be preferable to provide such limitation for legacy driver not being able to keep 'pre-devlink' interface name. Add flag no_phys_port_name to devlink_port_attrs struct which indicates if devlink should not alter name of interface. Suggested-by: Jiri Pirko <jiri@resnulli.us> Link: https://lore.kernel.org/all/nbwrfnjhvrcduqzjl4a2jafnvvud6qsbxlvxaxilnryglf4j7r@btuqrimnfuly/ Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2025-08-12 13:23:39 -07:00
Martin KaFai Lau	9e293d47bf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Cross merge bpf/master after 6.17-rc1. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-08-12 12:14:02 -07:00
Paolo Abeni	c04fdca8a9	Merge tag 'ipsec-2025-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-08-11 1) Fix flushing of all states in xfrm_state_fini. From Sabrina Dubroca. 2) Fix some IPsec software offload features. These got lost with some recent HW offload changes. From Sabrina Dubroca. Please pull or let me know if there are problems. * tag 'ipsec-2025-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: udp: also consider secpath when evaluating ipsec use for checksumming xfrm: bring back device check in validate_xmit_xfrm xfrm: restore GSO for SW crypto xfrm: flush all states in xfrm_state_fini ==================== Link: https://patch.msgid.link/20250811092008.731573-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-12 15:01:09 +02:00
Jakub Kicinski	b3fc08ab9a	net: prevent deadlocks when enabling NAPIs with mixed kthread config The following order of calls currently deadlocks if: - device has threaded=1; and - NAPI has persistent config with threaded=0. netif_napi_add_weight_config() dev->threaded == 1 napi_kthread_create() napi_enable() napi_restore_config() napi_set_threaded(0) napi_stop_kthread() while (NAPIF_STATE_SCHED) msleep(20) We deadlock because disabled NAPI has STATE_SCHED set. Creating a thread in netif_napi_add() just to destroy it in napi_disable() is fairly ugly in the first place. Let's read both the device config and the NAPI config in netif_napi_add(). Fixes: `e6d7626881` ("net: Update threaded state in napi config in netif_set_threaded") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250809001205.1147153-4-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-12 14:43:05 +02:00
Jakub Kicinski	ccba9f6baa	net: update NAPI threaded config even for disabled NAPIs We have to make sure that all future NAPIs will have the right threaded state when the state is configured on the device level. We chose not to have an "unset" state for threaded, and not to wipe the NAPI config clean when channels are explicitly disabled. This means the persistent config structs "exist" even when their NAPIs are not instantiated. Differently put - the NAPI persistent state lives in the net_device (ncfg == struct napi_config): ,--- [napi 0] - [napi 1] [dev] \| \| `--- [ncfg 0] - [ncfg 1] so say we a device with 2 queues but only 1 enabled: ,--- [napi 0] [dev] \| `--- [ncfg 0] - [ncfg 1] now we set the device to threaded=1: ,---------- [napi 0 (thr:1)] [dev(thr:1)] \| `---------- [ncfg 0 (thr:1)] - [ncfg 1 (thr:?)] Since [ncfg 1] was not attached to a NAPI during configuration we skipped it. If we create a NAPI for it later it will have the old setting (presumably disabled). One could argue if this is right or not "in principle", but it's definitely not how things worked before per-NAPI config.. Fixes: `2677010e77` ("Add support to set NAPI threaded for individual NAPI") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250809001205.1147153-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-12 14:43:05 +02:00
Linus Torvalds	53e760d894	Merge tag 'nfsd-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - A correctness fix for delegated timestamps - Address an NFSD shutdown hang when LOCALIO is in use - Prevent a remotely exploitable crasher when TLS is in use * tag 'nfsd-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: sunrpc: fix handling of server side tls alerts nfsd: avoid ref leak in nfsd_open_local_fh() nfsd: don't set the ctime on delegated atime updates	2025-08-11 07:38:55 -07:00
Linus Torvalds	ccc1ead23c	Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - don't inherit NFS filesystem capabilities when crossing from one filesystem to another Bugfixes: - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers - NFS improve bounds checking in nfs_fh_to_dentry() - NFS Fix allocation errors when writing to a NFS file backed loopback device - NFSv4: More listxattr fixes - SUNRPC: fix client handling of TLS alerts - pNFS block/scsi layout fix for an uninitialised pointer dereference - pNFS block/scsi layout fixes for the extent encoding, stripe mapping, and disk offset overflows - pNFS layoutcommit work around for RPC size limitations - pNFS/flexfiles avoid looping when handling fatal errors after layoutget - localio: fix various race conditions Features and cleanups: - Add NFSv4 support for retrieving the btime - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC - NFS: Support using a kernel keyring to store TLS certificates - NFSv4: Speed up delegation lookup using a hash table - Assorted cleanups to remove unused variables and struct fields - Assorted new tracepoints to improve debugging" * tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits) NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh() NFS/localio: nfs_close_local_fh() fix check for file closed NFSv4: Remove duplicate lookups, capability probes and fsinfo calls NFS: Fix the setting of capabilities when automounting a new filesystem sunrpc: fix client side handling of tls alerts nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock() NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY NFSv4.2: another fix for listxattr NFS: Fix filehandle bounds checking in nfs_fh_to_dentry() SUNRPC: Silence warnings about parameters not being described NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final() NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate() NFS: use a hash table for delegation lookup NFS: track active delegations per-server NFS: move the delegation_watermark module parameter NFS: cleanup nfs_inode_reclaim_delegation NFS: cleanup error handling in nfs4_server_common_setup pNFS/flexfiles: don't attempt pnfs on fatal DS errors NFS: drop __exit from nfs_exit_keyring ...	2025-08-09 07:20:44 +03:00
Xin Long	fd60d8a086	sctp: linearize cloned gso packets in sctp_rcv A cloned head skb still shares these frag skbs in fraglist with the original head skb. It's not safe to access these frag skbs. syzbot reported two use-of-uninitialized-memory bugs caused by this: BUG: KMSAN: uninit-value in sctp_inq_pop+0x15b7/0x1920 net/sctp/inqueue.c:211 sctp_inq_pop+0x15b7/0x1920 net/sctp/inqueue.c:211 sctp_assoc_bh_rcv+0x1a7/0xc50 net/sctp/associola.c:998 sctp_inq_push+0x2ef/0x380 net/sctp/inqueue.c:88 sctp_backlog_rcv+0x397/0xdb0 net/sctp/input.c:331 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1122 __release_sock+0x1da/0x330 net/core/sock.c:3106 release_sock+0x6b/0x250 net/core/sock.c:3660 sctp_wait_for_connect+0x487/0x820 net/sctp/socket.c:9360 sctp_sendmsg_to_asoc+0x1ec1/0x1f00 net/sctp/socket.c:1885 sctp_sendmsg+0x32b9/0x4a80 net/sctp/socket.c:2031 inet_sendmsg+0x25a/0x280 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:718 [inline] and BUG: KMSAN: uninit-value in sctp_assoc_bh_rcv+0x34e/0xbc0 net/sctp/associola.c:987 sctp_assoc_bh_rcv+0x34e/0xbc0 net/sctp/associola.c:987 sctp_inq_push+0x2a3/0x350 net/sctp/inqueue.c:88 sctp_backlog_rcv+0x3c7/0xda0 net/sctp/input.c:331 sk_backlog_rcv+0x142/0x420 include/net/sock.h:1148 __release_sock+0x1d3/0x330 net/core/sock.c:3213 release_sock+0x6b/0x270 net/core/sock.c:3767 sctp_wait_for_connect+0x458/0x820 net/sctp/socket.c:9367 sctp_sendmsg_to_asoc+0x223a/0x2260 net/sctp/socket.c:1886 sctp_sendmsg+0x3910/0x49f0 net/sctp/socket.c:2032 inet_sendmsg+0x269/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:712 [inline] This patch fixes it by linearizing cloned gso packets in sctp_rcv(). Fixes: `90017accff` ("sctp: Add GSO support") Reported-by: syzbot+773e51afe420baaf0e2b@syzkaller.appspotmail.com Reported-by: syzbot+70a42f45e76bede082be@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/dd7dc337b99876d4132d0961f776913719f7d225.1754595611.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-08 13:08:06 -07:00
Budimir Markovic	aba0c94f61	vsock: Do not allow binding to VMADDR_PORT_ANY It is possible for a vsock to autobind to VMADDR_PORT_ANY. This can cause a use-after-free when a connection is made to the bound socket. The socket returned by accept() also has port VMADDR_PORT_ANY but is not on the list of unbound sockets. Binding it will result in an extra refcount decrement similar to the one fixed in `fcdd2242c0` (vsock: Keep the binding until socket destruction). Modify the check in __vsock_bind_connectible() to also prevent binding to VMADDR_PORT_ANY. Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Reported-by: Budimir Markovic <markovicbudimir@gmail.com> Signed-off-by: Budimir Markovic <markovicbudimir@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250807041811.678-1-markovicbudimir@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-08 12:55:00 -07:00
Jakub Kicinski	64fdaa94bf	net: page_pool: allow enabling recycling late, fix false positive warning Page pool can have pages "directly" (locklessly) recycled to it, if the NAPI that owns the page pool is scheduled to run on the same CPU. To make this safe we check that the NAPI is disabled while we destroy the page pool. In most cases NAPI and page pool lifetimes are tied together so this happens naturally. The queue API expects the following order of calls: -> mem_alloc alloc new pp -> stop napi_disable -> start napi_enable -> mem_free free old pp Here we allocate the page pool in ->mem_alloc and free in ->mem_free. But the NAPIs are only stopped between ->stop and ->start. We created page_pool_disable_direct_recycling() to safely shut down the recycling in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't have to worry about recycling any more. Unfortunately, the page_pool_disable_direct_recycling() is not enough to deal with failures which necessitate freeing the _new_ page pool. If we hit a failure in ->mem_alloc or ->stop the new page pool has to be freed while the NAPI is active (assuming driver attaches the page pool to an existing NAPI instance and doesn't reallocate NAPIs). Freeing the new page pool is technically safe because it hasn't been used for any packets, yet, so there can be no recycling. But the check in napi_assert_will_not_race() has no way of knowing that. We could check if page pool is empty but that'd make the check much less likely to trigger during development. Add page_pool_enable_direct_recycling(), pairing with page_pool_disable_direct_recycling(). It will allow us to create the new page pools in "disabled" state and only enable recycling when we know the reconfig operation will not fail. Coincidentally it will also let us re-enable the recycling for the old pool, if the reconfig failed: -> mem_alloc (new) -> stop (old) # disables direct recycling for old -> start (new) # fail!! -> start (old) # go back to old pp but direct recycling is lost :( -> mem_free (new) The new helper is idempotent to make the life easier for drivers, which can operate in HDS mode and support zero-copy Rx. The driver can call the helper twice whether there are two pools or it has multiple references to a single pool. Fixes: `40eca00ae6` ("bnxt_en: unlink page pool when stopping Rx queue") Tested-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250805003654.2944974-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-08 12:54:42 -07:00
Jakub Kicinski	f6a2a31043	Merge tag 'nf-25-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Reinstantiate Florian Westphal as a Netfilter maintainer. 2) Depend on both NETFILTER_XTABLES and NETFILTER_XTABLES_LEGACY, from Arnd Bergmann. 3) Use id to annotate last conntrack/expectation visited to resume netlink dump, patches from Florian Westphal. 4) Fix bogus element in nft_pipapo avx2 lookup, introduced in the last nf-next batch of updates, also from Florian. 5) Return 0 instead of recycling ret variable in nf_conntrack_log_invalid_sysctl(), introduced in the last nf-next batch of updates, from Dan Carpenter. 6) Fix WARN_ON_ONCE triggered by syzbot with larger cgroup level in nft_socket. * tag 'nf-25-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_socket: remove WARN_ON_ONCE with huge level value netfilter: conntrack: clean up returns in nf_conntrack_log_invalid_sysctl() netfilter: nft_set_pipapo: don't return bogus extension pointer netfilter: ctnetlink: remove refcounting in expectation dumpers netfilter: ctnetlink: fix refcount leak on table dump netfilter: add back NETFILTER_XTABLES dependencies MAINTAINERS: resurrect my netfilter maintainer entry ==================== Link: https://patch.msgid.link/20250807112948.1400523-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-08 11:45:14 -07:00
Linus Torvalds	3781648824	Merge tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: Previous releases - regressions: - netlink: avoid infinite retry looping in netlink_unicast() Previous releases - always broken: - packet: fix a race in packet_set_ring() and packet_notifier() - ipv6: reject malicious packets in ipv6_gso_segment() - sched: mqprio: fix stack out-of-bounds write in tc entry parsing - net: drop UFO packets (injected via virtio) in udp_rcv_segment() - eth: mlx5: correctly set gso_segs when LRO is used, avoid false positive checksum validation errors - netpoll: prevent hanging NAPI when netcons gets enabled - phy: mscc: fix parsing of unicast frames for PTP timestamping - a number of device tree / OF reference leak fixes" * tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits) pptp: fix pptp_xmit() error path net: ti: icssg-prueth: Fix skb handling for XDP_PASS net: Update threaded state in napi config in netif_set_threaded selftests: netdevsim: Xfail nexthop test on slow machines eth: fbnic: Lock the tx_dropped update eth: fbnic: Fix tx_dropped reporting eth: fbnic: remove the debugging trick of super high page bias net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect dt-bindings: net: Replace bouncing Alexandru Tachici emails dpll: zl3073x: ZL3073X_I2C and ZL3073X_SPI should depend on NET net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing Revert "net: mdio_bus: Use devm for getting reset GPIO" selftests: net: packetdrill: xfail all problems on slow machines net/packet: fix a race in packet_set_ring() and packet_notifier() benet: fix BUG when creating VFs net: airoha: npu: Add missing MODULE_FIRMWARE macros net: devmem: fix DMA direction on unmapping ipa: fix compile-testing with qcom-mdt=m eth: fbnic: unlink NAPIs from queues on error to open net: Add locking to protect skb->dev access in ip_output ...	2025-08-08 07:03:25 +03:00
Li Jun	fa47913284	bpf: Standardize function declaration style 'noinlne' after 'int' cause "ERROR: inline keyword should sit between storage class and type" by checkpatch.pl - Standardize function declaration style by moving 'noinline' modifier - Fix asm volatile statement formatting Signed-off-by: Li Jun <lijun01@kylinos.cn> Link: https://lore.kernel.org/r/20250730105019.436235-1-lijun01@kylinos.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-07 19:18:03 -07:00
Pablo Neira Ayuso	1dee968d22	netfilter: nft_socket: remove WARN_ON_ONCE with huge level value syzbot managed to reach this WARN_ON_ONCE by passing a huge level value, remove it. WARNING: CPU: 0 PID: 5853 at net/netfilter/nft_socket.c:220 nft_socket_init+0x2f4/0x3d0 net/netfilter/nft_socket.c:220 Reported-by: syzbot+a225fea35d7baf8dbdc3@syzkaller.appspotmail.com Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-08-07 13:19:26 +02:00
Dan Carpenter	f54186df80	netfilter: conntrack: clean up returns in nf_conntrack_log_invalid_sysctl() Smatch complains that these look like error paths with missing error codes, especially the one where we return if nf_log_is_registered() is true: net/netfilter/nf_conntrack_standalone.c:575 nf_conntrack_log_invalid_sysctl() warn: missing error code? 'ret' In fact, all these return zero deliberately. Change them to return a literal instead which helps readability as well as silencing the warning. Fixes: `e89a680466` ("netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Lance Yang <lance.yang@linux.dev> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-08-07 13:19:26 +02:00
Florian Westphal	c8a7c2c608	netfilter: nft_set_pipapo: don't return bogus extension pointer Dan Carpenter says: Commit `17a20e09f0` ("netfilter: nft_set: remove one argument from lookup and update functions") [..] leads to the following Smatch static checker warning: net/netfilter/nft_set_pipapo_avx2.c:1269 nft_pipapo_avx2_lookup() error: uninitialized symbol 'ext'. Fix this by initing ext to NULL and set it only once we've found a match. Fixes: `17a20e09f0` ("netfilter: nft_set: remove one argument from lookup and update functions") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/netfilter-devel/aJBzc3V5wk-yPOnH@stanley.mountain/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-08-07 13:19:26 +02:00
Florian Westphal	1492e3dcb2	netfilter: ctnetlink: remove refcounting in expectation dumpers Same pattern as previous patch: do not keep the expectation object alive via refcount, only store a cookie value and then use that as the skip hint for dump resumption. AFAICS this has the same issue as the one resolved in the conntrack dumper, when we do if (!refcount_inc_not_zero(&exp->use)) to increment the refcount, there is a chance that exp == last, which causes a double-increment of the refcount and subsequent memory leak. Fixes: `cf6994c2b9` ("[NETFILTER]: nf_conntrack_netlink: sync expectation dumping with conntrack table dumping") Fixes: `e844a92843` ("netfilter: ctnetlink: allow to dump expectation per master conntrack") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-08-07 13:19:26 +02:00
Florian Westphal	de788b2e62	netfilter: ctnetlink: fix refcount leak on table dump There is a reference count leak in ctnetlink_dump_table(): if (res < 0) { nf_conntrack_get(&ct->ct_general); // HERE cb->args[1] = (unsigned long)ct; ... While its very unlikely, its possible that ct == last. If this happens, then the refcount of ct was already incremented. This 2nd increment is never undone. This prevents the conntrack object from being released, which in turn keeps prevents cnet->count from dropping back to 0. This will then block the netns dismantle (or conntrack rmmod) as nf_conntrack_cleanup_net_list() will wait forever. This can be reproduced by running conntrack_resize.sh selftest in a loop. It takes ~20 minutes for me on a preemptible kernel on average before I see a runaway kworker spinning in nf_conntrack_cleanup_net_list. One fix would to change this to: if (res < 0) { if (ct != last) nf_conntrack_get(&ct->ct_general); But this reference counting isn't needed in the first place. We can just store a cookie value instead. A followup patch will do the same for ctnetlink_exp_dump_table, it looks to me as if this has the same problem and like ctnetlink_dump_table, we only need a 'skip hint', not the actual object so we can apply the same cookie strategy there as well. Fixes: `d205dc4079` ("[NETFILTER]: ctnetlink: fix deadlock in table dumping") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-08-07 13:19:25 +02:00
Arnd Bergmann	25a8b88f00	netfilter: add back NETFILTER_XTABLES dependencies Some Kconfig symbols were changed to depend on the 'bool' symbol NETFILTER_XTABLES_LEGACY, which means they can now be set to built-in when the xtables code itself is in a loadable module: x86_64-linux-ld: vmlinux.o: in function `arpt_unregister_table_pre_exit': (.text+0x1831987): undefined reference to `xt_find_table' x86_64-linux-ld: vmlinux.o: in function `get_info.constprop.0': arp_tables.c:(.text+0x1831aab): undefined reference to `xt_request_find_table_lock' x86_64-linux-ld: arp_tables.c:(.text+0x1831bea): undefined reference to `xt_table_unlock' x86_64-linux-ld: vmlinux.o: in function `do_arpt_get_ctl': arp_tables.c:(.text+0x183205d): undefined reference to `xt_find_table_lock' x86_64-linux-ld: arp_tables.c:(.text+0x18320c1): undefined reference to `xt_table_unlock' x86_64-linux-ld: arp_tables.c:(.text+0x183219a): undefined reference to `xt_recseq' Change these to depend on both NETFILTER_XTABLES and NETFILTER_XTABLES_LEGACY. Fixes: `9fce66583f` ("netfilter: Exclude LEGACY TABLES on PREEMPT_RT.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Florian Westphal <fw@strlen.de> Tested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-08-07 13:19:25 +02:00
Sabrina Dubroca	1118aaa3b3	udp: also consider secpath when evaluating ipsec use for checksumming Commit `b40c5f4fde` ("udp: disable inner UDP checksum offloads in IPsec case") tried to fix checksumming in UFO when the packets are going through IPsec, so that we can't rely on offloads because the UDP header and payload will be encrypted. But when doing a TCP test over VXLAN going through IPsec transport mode with GSO enabled (esp4_offload module loaded), I'm seeing broken UDP checksums on the encap after successful decryption. The skbs get to udp4_ufo_fragment/__skb_udp_tunnel_segment via __dev_queue_xmit -> validate_xmit_skb -> skb_gso_segment and at this point we've already dropped the dst (unless the device sets IFF_XMIT_DST_RELEASE, which is not common), so need_ipsec is false and we proceed with checksum offload. Make need_ipsec also check the secpath, which is not dropped on this callpath. Fixes: `b40c5f4fde` ("udp: disable inner UDP checksum offloads in IPsec case") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-08-07 08:07:15 +02:00
Sabrina Dubroca	65f079a6c4	xfrm: bring back device check in validate_xmit_xfrm This is partial revert of commit `d53dda291b`. This change causes traffic using GSO with SW crypto running through a NIC capable of HW offload to no longer get segmented during validate_xmit_xfrm, and is unrelated to the bonding use case mentioned in the commit. Fixes: `d53dda291b` ("xfrm: Remove unneeded device check from validate_xmit_xfrm") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-08-07 08:07:01 +02:00
Sabrina Dubroca	234d1eff5d	xfrm: restore GSO for SW crypto Commit `49431af6c4` incorrectly assumes that the GSO path is only used by HW offload, but it's also useful for SW crypto. This patch re-enables GSO for SW crypto. It's not an exact revert to preserve the other changes made to xfrm_dev_offload_ok afterwards, but it reverts all of its effects. Fixes: `49431af6c4` ("xfrm: rely on XFRM offload") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-08-07 08:05:10 +02:00
Olga Kornievskaia	bee47cb026	sunrpc: fix handling of server side tls alerts Scott Mayhew discovered a security exploit in NFS over TLS in tls_alert_recv() due to its assumption it can read data from the msg iterator's kvec.. kTLS implementation splits TLS non-data record payload between the control message buffer (which includes the type such as TLS aler or TLS cipher change) and the rest of the payload (say TLS alert's level/description) which goes into the msg payload buffer. This patch proposes to rework how control messages are setup and used by sock_recvmsg(). If no control message structure is setup, kTLS layer will read and process TLS data record types. As soon as it encounters a TLS control message, it would return an error. At that point, NFS can setup a kvec backed msg buffer and read in the control message such as a TLS alert. Msg iterator can advance the kvec pointer as a part of the copy process thus we need to revert the iterator before calling into the tls_alert_recv. Reported-by: Scott Mayhew <smayhew@redhat.com> Fixes: `5e052dda12` ("SUNRPC: Recognize control messages in server-side TCP socket code") Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-08-06 09:57:50 -04:00
Sabrina Dubroca	42e42562c9	xfrm: flush all states in xfrm_state_fini While reverting commit `f75a2804da` ("xfrm: destroy xfrm_state synchronously on net exit path"), I incorrectly changed xfrm_state_flush's "proto" argument back to IPSEC_PROTO_ANY. This reverts some of the changes in commit `dbb2483b2a` ("xfrm: clean up xfrm protocol checks"), and leads to some states not being removed when we exit the netns. Pass 0 instead of IPSEC_PROTO_ANY from both xfrm_state_fini xfrm6_tunnel_net_exit, so that xfrm_state_flush deletes all states. Fixes: `2a198bbec6` ("Revert "xfrm: destroy xfrm_state synchronously on net exit path"") Reported-by: syzbot+6641a61fe0e2e89ae8c5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6641a61fe0e2e89ae8c5 Tested-by: syzbot+6641a61fe0e2e89ae8c5@syzkaller.appspotmail.com Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-08-06 09:23:38 +02:00
Samiullah Khawaja	e6d7626881	net: Update threaded state in napi config in netif_set_threaded Commit `2677010e77` ("Add support to set NAPI threaded for individual NAPI") added support to enable/disable threaded napi using netlink. This also extended the napi config save/restore functionality to set the napi threaded state. This breaks netdev reset for drivers that use napi threaded at device level and also use napi config save/restore on napi_disable/napi_enable. Basically on netdev with napi threaded enabled at device level, a napi_enable call will get stuck trying to stop the napi kthread. This is because the napi->config->threaded is set to disabled when threaded is enabled at device level. The issue can be reproduced on virtio-net device using qemu. To reproduce the issue run following, echo 1 > /sys/class/net/threaded ethtool -L eth0 combined 1 Update the threaded state in napi config in netif_set_threaded and add a new test that verifies this scenario. Tested on qemu with virtio-net: NETIF=eth0 ./tools/testing/selftests/drivers/net/napi_threaded.py TAP version 13 1..2 ok 1 napi_threaded.change_num_queues ok 2 napi_threaded.enable_dev_threaded_disable_napi_threaded # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: `2677010e77` ("Add support to set NAPI threaded for individual NAPI") Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Link: https://patch.msgid.link/20250804164457.2494390-1-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-05 17:46:15 -07:00
Maher Azzouzi	ffd2dc4c6c	net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing TCA_MQPRIO_TC_ENTRY_INDEX is validated using NLA_POLICY_MAX(NLA_U32, TC_QOPT_MAX_QUEUE), which allows the value TC_QOPT_MAX_QUEUE (16). This leads to a 4-byte out-of-bounds stack write in the fp[] array, which only has room for 16 elements (0–15). Fix this by changing the policy to allow only up to TC_QOPT_MAX_QUEUE - 1. Fixes: `f62af20bed` ("net/sched: mqprio: allow per-TC user input of FP adminStatus") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Maher Azzouzi <maherazz04@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250802001857.2702497-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-04 17:22:20 -07:00
Quang Le	01d3c8417b	net/packet: fix a race in packet_set_ring() and packet_notifier() When packet_set_ring() releases po->bind_lock, another thread can run packet_notifier() and process an NETDEV_UP event. This race and the fix are both similar to that of commit `15fe076ede` ("net/packet: fix a race in packet_bind() and packet_notifier()"). There too the packet_notifier NETDEV_UP event managed to run while a po->bind_lock critical section had to be temporarily released. And the fix was similarly to temporarily set po->num to zero to keep the socket unhooked until the lock is retaken. The po->bind_lock in packet_set_ring and packet_notifier precede the introduction of git history. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250801175423.2970334-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-04 17:21:27 -07:00
Jakub Kicinski	fa516c0d8b	net: devmem: fix DMA direction on unmapping Looks like we always unmap the DMA_BUF with DMA_FROM_DEVICE direction. While at it unexport __net_devmem_dmabuf_binding_free(), it's internal. Found by code inspection. Fixes: `bd61848900` ("net: devmem: Implement TX path") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250801011335.2267515-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-04 17:15:38 -07:00
Olga Kornievskaia	cc5d59081f	sunrpc: fix client side handling of tls alerts A security exploit was discovered in NFS over TLS in tls_alert_recv due to its assumption that there is valid data in the msghdr's iterator's kvec. Instead, this patch proposes the rework how control messages are setup and used by sock_recvmsg(). If no control message structure is setup, kTLS layer will read and process TLS data record types. As soon as it encounters a TLS control message, it would return an error. At that point, NFS can setup a kvec backed control buffer and read in the control message such as a TLS alert. Scott found that a msg iterator can advance the kvec pointer as a part of the copy process thus we need to revert the iterator before calling into the tls_alert_recv. Fixes: `dea034b963` ("SUNRPC: Capture CMSG metadata on client-side receive") Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Suggested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Link: https://lore.kernel.org/r/20250731180058.4669-3-okorniev@redhat.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-08-03 12:45:47 -07:00
Linus Torvalds	a6923c06a3	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix kCFI failures in JITed BPF code on arm64 (Sami Tolvanen, Puranjay Mohan, Mark Rutland, Maxwell Bland) - Disallow tail calls between BPF programs that use different cgroup local storage maps to prevent out-of-bounds access (Daniel Borkmann) - Fix unaligned access in flow_dissector and netfilter BPF programs (Paul Chaignon) - Avoid possible use of uninitialized mod_len in libbpf (Achill Gilgenast) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test for unaligned flow_dissector ctx access bpf: Improve ctx access verifier error message bpf: Check netfilter ctx accesses are aligned bpf: Check flow_dissector ctx accesses are aligned arm64/cfi,bpf: Support kCFI + BPF on arm64 cfi: Move BPF CFI types and helpers to generic code cfi: add C CFI type macro libbpf: Avoid possible use of uninitialized mod_len bpf: Fix oob access in cgroup local storage bpf: Move cgroup iterator helpers to bpf.h bpf: Move bpf map owner out of common struct bpf: Add cookie object to bpf maps	2025-08-01 17:13:26 -07:00
Sharath Chandra Vurukala	1dbf1d590d	net: Add locking to protect skb->dev access in ip_output In ip_output() skb->dev is updated from the skb_dst(skb)->dev this can become invalid when the interface is unregistered and freed, Introduced new skb_dst_dev_rcu() function to be used instead of skb_dst_dev() within rcu_locks in ip_output.This will ensure that all the skb's associated with the dev being deregistered will be transnmitted out first, before freeing the dev. Given that ip_output() is called within an rcu_read_lock() critical section or from a bottom-half context, it is safe to introduce an RCU read-side critical section within it. Multiple panic call stacks were observed when UL traffic was run in concurrency with device deregistration from different functions, pasting one sample for reference. [496733.627565][T13385] Call trace: [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0 [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498 [496733.627595][T13385] ip_finish_output+0xa4/0xf4 [496733.627605][T13385] ip_output+0x100/0x1a0 [496733.627613][T13385] ip_send_skb+0x68/0x100 [496733.627618][T13385] udp_send_skb+0x1c4/0x384 [496733.627625][T13385] udp_sendmsg+0x7b0/0x898 [496733.627631][T13385] inet_sendmsg+0x5c/0x7c [496733.627639][T13385] __sys_sendto+0x174/0x1e4 [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c [496733.627653][T13385] invoke_syscall+0x58/0x11c [496733.627662][T13385] el0_svc_common+0x88/0xf4 [496733.627669][T13385] do_el0_svc+0x2c/0xb0 [496733.627676][T13385] el0_svc+0x2c/0xa4 [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4 [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8 Changes in v3: - Replaced WARN_ON() with WARN_ON_ONCE(), as suggested by Willem de Bruijn. - Dropped legacy lines mistakenly pulled in from an outdated branch. Changes in v2: - Addressed review comments from Eric Dumazet - Used READ_ONCE() to prevent potential load/store tearing - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250730105118.GA26100@hu-sharathv-hyd.qualcomm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-01 15:17:52 -07:00
Takamitsu Iwai	ae8508b25d	net/sched: taprio: enforce minimum value for picos_per_byte Syzbot reported a WARNING in taprio_get_start_time(). When link speed is 470,589 or greater, q->picos_per_byte becomes too small, causing length_to_duration(q, ETH_ZLEN) to return zero. This zero value leads to validation failures in fill_sched_entry() and parse_taprio_schedule(), allowing arbitrary values to be assigned to entry->interval and cycle_time. As a result, sched->cycle can become zero. Since SPEED_800000 is the largest defined speed in include/uapi/linux/ethtool.h, this issue can occur in realistic scenarios. To ensure length_to_duration() returns a non-zero value for minimum-sized Ethernet frames (ETH_ZLEN = 60), picos_per_byte must be at least 17 (60 * 17 > PSEC_PER_NSEC which is 1000). This patch enforces a minimum value of 17 for picos_per_byte when the calculated value would be lower, and adds a warning message to inform users that scheduling accuracy may be affected at very high link speeds. Fixes: `fb66df20a7` ("net/sched: taprio: extend minimum interval restriction to entire cycle too") Reported-by: syzbot+398e1ee4ca2cac05fddb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=398e1ee4ca2cac05fddb Signed-off-by: Takamitsu Iwai <takamitz@amazon.co.jp> Link: https://patch.msgid.link/20250728173149.45585-1-takamitz@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-01 15:15:28 -07:00
Eric Dumazet	d45cf1e7d7	ipv6: reject malicious packets in ipv6_gso_segment() syzbot was able to craft a packet with very long IPv6 extension headers leading to an overflow of skb->transport_header. This 16bit field has a limited range. Add skb_reset_transport_header_careful() helper and use it from ipv6_gso_segment() WARNING: CPU: 0 PID: 5871 at ./include/linux/skbuff.h:3032 skb_reset_transport_header include/linux/skbuff.h:3032 [inline] WARNING: CPU: 0 PID: 5871 at ./include/linux/skbuff.h:3032 ipv6_gso_segment+0x15e2/0x21e0 net/ipv6/ip6_offload.c:151 Modules linked in: CPU: 0 UID: 0 PID: 5871 Comm: syz-executor211 Not tainted 6.16.0-rc6-syzkaller-g7abc678e3084 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:skb_reset_transport_header include/linux/skbuff.h:3032 [inline] RIP: 0010:ipv6_gso_segment+0x15e2/0x21e0 net/ipv6/ip6_offload.c:151 Call Trace: <TASK> skb_mac_gso_segment+0x31c/0x640 net/core/gso.c:53 nsh_gso_segment+0x54a/0xe10 net/nsh/nsh.c:110 skb_mac_gso_segment+0x31c/0x640 net/core/gso.c:53 __skb_gso_segment+0x342/0x510 net/core/gso.c:124 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x857/0x11b0 net/core/dev.c:3950 validate_xmit_skb_list+0x84/0x120 net/core/dev.c:4000 sch_direct_xmit+0xd3/0x4b0 net/sched/sch_generic.c:329 __dev_xmit_skb net/core/dev.c:4102 [inline] __dev_queue_xmit+0x17b6/0x3a70 net/core/dev.c:4679 Fixes: `d1da932ed4` ("ipv6: Separate ipv6 offload support") Reported-by: syzbot+af43e647fd835acc02df@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/688a1a05.050a0220.5d226.0008.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250730131738.3385939-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-01 14:40:53 -07:00
Linus Torvalds	821c9e515d	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: - vhost can now support legacy threading if enabled in Kconfig - vsock memory allocation strategies for large buffers have been improved, reducing pressure on kmalloc - vhost now supports the in-order feature. guest bits missed the merge window. - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (30 commits) vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers vsock/virtio: Rename virtio_vsock_skb_rx_put() vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers vsock/virtio: Move SKB allocation lower-bound check to callers vsock/virtio: Rename virtio_vsock_alloc_skb() vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page vsock/virtio: Move length check to callers of virtio_vsock_skb_rx_put() vsock/virtio: Validate length in packet header before skb_put() vhost/vsock: Avoid allocating arbitrarily-sized SKBs vhost_net: basic in_order support vhost: basic in order support vhost: fail early when __vhost_add_used() fails vhost: Reintroduce kthread API and add mode selection vdpa: Fix IDR memory leak in VDUSE module exit vdpa/mlx5: Fix release of uninitialized resources on error path vhost-scsi: Fix check for inline_sg_cnt exceeding preallocated limit virtio: virtio_dma_buf: fix missing parameter documentation vhost: Fix typos vhost: vringh: Remove unused functions vhost: vringh: Remove unused iotlb functions ...	2025-08-01 14:17:48 -07:00
Paul Chaignon	9e6448f7b1	bpf: Check netfilter ctx accesses are aligned Similarly to the previous patch fixing the flow_dissector ctx accesses, nf_is_valid_access also doesn't check that ctx accesses are aligned. Contrary to flow_dissector programs, netfilter programs don't have context conversion. The unaligned ctx accesses are therefore allowed by the verifier. Fixes: `fd9c663b9a` ("bpf: minimal support for programs hooked into netfilter framework") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/853ae9ed5edaa5196e8472ff0f1bb1cc24059214.1754039605.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-01 09:22:44 -07:00
Paul Chaignon	ead3d7b2b6	bpf: Check flow_dissector ctx accesses are aligned flow_dissector_is_valid_access doesn't check that the context access is aligned. As a consequence, an unaligned access within one of the exposed field is considered valid and later rejected by flow_dissector_convert_ctx_access when we try to convert it. The later rejection is problematic because it's reported as a verifier bug with a kernel warning and doesn't point to the right instruction in verifier logs. Fixes: `d58e468b11` ("flow_dissector: implements flow dissector BPF hook") Reported-by: syzbot+ccac90e482b2a81d74aa@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ccac90e482b2a81d74aa Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/cc1b036be484c99be45eddf48bd78cc6f72839b1.1754039605.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-01 09:22:44 -07:00
Will Deacon	6693731487	vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers When transmitting a vsock packet, virtio_transport_send_pkt_info() calls virtio_transport_alloc_linear_skb() to allocate and fill SKBs with the transmit data. Unfortunately, these are always linear allocations and can therefore result in significant pressure on kmalloc() considering that the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB allocation for each packet. Rework the vsock SKB allocation so that, for sizes with page order greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated instead with the packet header in the SKB and the transmit data in the fragments. Note that this affects both the vhost and virtio transports. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-10-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2025-08-01 09:11:09 -04:00
Will Deacon	8ca76151d2	vsock/virtio: Rename virtio_vsock_skb_rx_put() In preparation for using virtio_vsock_skb_rx_put() when populating SKBs on the vsock TX path, rename virtio_vsock_skb_rx_put() to virtio_vsock_skb_put(). No functional change. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-9-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2025-08-01 09:11:09 -04:00
Will Deacon	2304c64a28	vsock/virtio: Rename virtio_vsock_alloc_skb() In preparation for nonlinear allocations for large SKBs, rename virtio_vsock_alloc_skb() to virtio_vsock_alloc_linear_skb() to indicate that it returns linear SKBs unconditionally and switch all callers over to this new interface for now. No functional change. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-6-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2025-08-01 09:11:09 -04:00

... 13 14 15 16 17 ...

82231 Commits