linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-20 08:41:12 -04:00

Author	SHA1	Message	Date
Eric Dumazet	5f92385309	tcp: fix __tcp_close() to only send RST when required If the receive queue contains payload that was already received, __tcp_close() can send an unexpected RST. Refine the code to take tp->copied_seq into account, as we already do in tcp recvmsg(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250903084720.1168904-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-04 19:13:41 -07:00
Jakub Kicinski	5ef04a7b06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc5). No conflicts. Adjacent changes: include/net/sock.h `c51613fa27` ("net: add sk->sk_drop_counters") `5d6b58c932` ("net: lockless sock_i_ino()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-04 13:33:00 -07:00
Alexandra Winter	c975e1dfcc	net/smc: Improve log message for devices w/o pnetid Explicitly state in the log message, when a device has no pnetid. "with pnetid" and "has pnetid" was misleading for devices without pnetid. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250901145842.1718373-3-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-04 08:57:04 -07:00
Jakub Kicinski	d93b10e894	Merge tag 'nf-25-09-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net 1) Fix a silly bug in conntrack selftest, busyloop may get optimized to for (;;), reported by Yi Chen. 2) Introduce new NFTA_DEVICE_PREFIX attribute in nftables netlink api, re-using old NFTA_DEVICE_NAME led to confusion with different kernel/userspace versions. This refines the wildcard interface support added in 6.16 release. From Phil Sutter. * tag 'nf-25-09-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX selftests: netfilter: fix udpclash tool hang ==================== Link: https://patch.msgid.link/20250904072548.3267-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-04 06:59:27 -07:00
Jakub Kicinski	3ceb08838b	net: add helper to pre-check if PP for an Rx queue will be unreadable mlx5 pokes into the rxq state to check if the queue has a memory provider, and therefore whether it may produce unreadable mem. Add a helper for doing this in the page pool API. fbnic will want a similar thing (tho, for a slightly different reason). Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250901211214.1027927-11-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-04 10:19:17 +02:00
Yue Haibing	61481d72e1	ipv6: sit: Add ipip6_tunnel_dst_find() for cleanup Extract the dst lookup logic from ipip6_tunnel_xmit() into new helper ipip6_tunnel_dst_find() to reduce code duplication and enhance readability. No functional change intended. On a x86_64, with allmodconfig object size is also reduced: ./scripts/bloat-o-meter net/ipv6/sit.o net/ipv6/sit-new.o add/remove: 5/3 grow/shrink: 3/4 up/down: 1841/-2275 (-434) Function old new delta ipip6_tunnel_dst_find - 1697 +1697 __pfx_ipip6_tunnel_dst_find - 64 +64 __UNIQUE_ID_modinfo2094 - 43 +43 ipip6_tunnel_xmit.isra.cold 79 88 +9 __UNIQUE_ID_modinfo2096 12 20 +8 __UNIQUE_ID___addressable_init_module2092 - 8 +8 __UNIQUE_ID___addressable_cleanup_module2093 - 8 +8 __func__ 55 59 +4 __UNIQUE_ID_modinfo2097 20 18 -2 __UNIQUE_ID___addressable_init_module2093 8 - -8 __UNIQUE_ID___addressable_cleanup_module2094 8 - -8 __UNIQUE_ID_modinfo2098 18 - -18 __UNIQUE_ID_modinfo2095 43 12 -31 descriptor 112 56 -56 ipip6_tunnel_xmit.isra 9910 7758 -2152 Total: Before=72537, After=72103, chg -0.60% Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250901114857.1968513-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-04 10:03:59 +02:00
Wang Liang	0a228624bc	net: atm: fix memory leak in atm_register_sysfs when device_register fail When device_register() return error in atm_register_sysfs(), which can be triggered by kzalloc fail in device_private_init() or other reasons, kmemleak reports the following memory leaks: unreferenced object 0xffff88810182fb80 (size 8): comm "insmod", pid 504, jiffies 4294852464 hex dump (first 8 bytes): 61 64 75 6d 6d 79 30 00 adummy0. backtrace (crc 14dfadaf): __kmalloc_node_track_caller_noprof+0x335/0x450 kvasprintf+0xb3/0x130 kobject_set_name_vargs+0x45/0x120 dev_set_name+0xa9/0xe0 atm_register_sysfs+0xf3/0x220 atm_dev_register+0x40b/0x780 0xffffffffa000b089 do_one_initcall+0x89/0x300 do_init_module+0x27b/0x7d0 load_module+0x54cd/0x5ff0 init_module_from_file+0xe4/0x150 idempotent_init_module+0x32c/0x610 __x64_sys_finit_module+0xbd/0x120 do_syscall_64+0xa8/0x270 entry_SYSCALL_64_after_hwframe+0x77/0x7f When device_create_file() return error in atm_register_sysfs(), the same issue also can be triggered. Function put_device() should be called to release kobj->name memory and other device resource, instead of kfree(). Fixes: `1fa5ae857b` ("driver core: get rid of struct device's bus_id string array") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250901063537.1472221-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-04 09:53:44 +02:00
Phil Sutter	4039ce7ef4	netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX This new attribute is supposed to be used instead of NFTA_DEVICE_NAME for simple wildcard interface specs. It holds a NUL-terminated string representing an interface name prefix to match on. While kernel code to distinguish full names from prefixes in NFTA_DEVICE_NAME is simpler than this solution, reusing the existing attribute with different semantics leads to confusion between different versions of kernel and user space though: * With old kernels, wildcards submitted by user space are accepted yet silently treated as regular names. * With old user space, wildcards submitted by kernel may cause crashes since libnftnl expects NUL-termination when there is none. Using a distinct attribute type sanitizes these situations as the receiving part detects and rejects the unexpected attribute nested in *_HOOK_DEVS attributes. Fixes: `6d07a28950` ("netfilter: nf_tables: Support wildcard netdev hook specs") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-04 09:19:25 +02:00
Eric Dumazet	8156210d36	ax25: properly unshare skbs in ax25_kiss_rcv() Bernard Pidoux reported a regression apparently caused by commit `c353e8983e` ("net: introduce per netns packet chains"). skb->dev becomes NULL and we crash in __netif_receive_skb_core(). Before above commit, different kind of bugs or corruptions could happen without a major crash. But the root cause is that ax25_kiss_rcv() can queue/mangle input skb without checking if this skb is shared or not. Many thanks to Bernard Pidoux for his help, diagnosis and tests. We had a similar issue years ago fixed with commit `7aaed57c5c` ("phonet: properly unshare skbs in phonet_rcv()"). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Bernard Pidoux <f6bvp@free.fr> Closes: https://lore.kernel.org/netdev/1713f383-c538-4918-bc64-13b3288cd542@free.fr/ Tested-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Joerg Reuter <jreuter@yaina.de> Cc: David Ranch <dranch@trinnet.net> Cc: Folkert van Heusden <folkert@vanheusden.com> Reviewed-by: Dan Cross <crossd@gmail.com> Link: https://patch.msgid.link/20250902124642.212705-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 17:06:30 -07:00
Alok Tiwari	a125c8fb9d	mctp: return -ENOPROTOOPT for unknown getsockopt options In mctp_getsockopt(), unrecognized options currently return -EINVAL. In contrast, mctp_setsockopt() returns -ENOPROTOOPT for unknown options. Update mctp_getsockopt() to also return -ENOPROTOOPT for unknown options. This aligns the behavior of getsockopt() and setsockopt(), and matches the standard kernel socket API convention for handling unsupported options. Fixes: `99ce45d5e7` ("mctp: Implement extended addressing") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20250902102059.1370008-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 17:01:52 -07:00
Mahanta Jambigi	cc282f73bc	net/smc: Remove validation of reserved bits in CLC Decline message Currently SMC code is validating the reserved bits while parsing the incoming CLC decline message & when this validation fails, its treated as a protocol error. As a result, the SMC connection is terminated instead of falling back to TCP. As per RFC7609[1] specs we shouldn't be validating the reserved bits that is part of CLC message. This patch fixes this issue. CLC Decline message format can viewed here[2]. [1] https://datatracker.ietf.org/doc/html/rfc7609#page-92 [2] https://datatracker.ietf.org/doc/html/rfc7609#page-105 Fixes: `8ade200c26` ("net/smc: add v2 format of CLC decline message") Signed-off-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Link: https://patch.msgid.link/20250902082041.98996-1-mjambigi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 17:01:07 -07:00
Dan Carpenter	a51160f8da	ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init() The inetdev_init() function never returns NULL. Check for error pointers instead. Fixes: `22600596b6` ("ipv4: give an IPv4 dev to blackhole_netdev") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/aLaQWL9NguWmeM1i@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 16:58:44 -07:00
Eric Dumazet	5d6b58c932	net: lockless sock_i_ino() Followup of commit `c51da3f7a1` ("net: remove sock_i_uid()") A recent syzbot report was the trigger for this change. Over the years, we had many problems caused by the read_lock[_bh](&sk->sk_callback_lock) in sock_i_uid(). We could fix smc_diag_dump_proto() or make a more radical move: Instead of waiting for new syzbot reports, cache the socket inode number in sk->sk_ino, so that we no longer need to acquire sk->sk_callback_lock in sock_i_ino(). This makes socket dumps faster (one less cache line miss, and two atomic ops avoided). Prior art: commit `25a9c8a443` ("netlink: Add __sock_i_ino() for __netlink_diag_dump().") commit `4f9bf2a2f5` ("tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.") commit `efc3dbc374` ("rds: Make rds_sock_lock BH rather than IRQ safe.") Fixes: `d2d6422f8b` ("x86: Allow to enable PREEMPT_RT.") Reported-by: syzbot+50603c05bbdf4dfdaffa@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68b73804.050a0220.3db4df.01d8.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250902183603.740428-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 16:08:24 -07:00
Jakub Kicinski	24ee9feeb3	Merge tag 'nf-next-25-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next 1) prefer vmalloc_array in ebtables, from Qianfeng Rong. 2) Use csum_replace4 instead of open-coding it, from Christophe Leroy. 3+4) Get rid of GFP_ATOMIC in transaction object allocations, those cause silly failures with large sets under memory pressure, from myself. 5) Remove test for AVX cpu feature in nftables pipapo set type, testing for AVX2 feature is sufficient. 6) Unexport a few function in nf_reject infra: no external callers. 7) Extend payload offset to u16, this was restricted to values <=255 so far, from Fernando Fernandez Mancera. * tag 'nf-next-25-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nft_payload: extend offset to 65535 bytes netfilter: nf_reject: remove unneeded exports netfilter: nft_set_pipapo: remove redundant test for avx feature bit netfilter: nf_tables: all transaction allocations can now sleep netfilter: nf_tables: allow iter callbacks to sleep netfilter: nft_payload: Use csum_replace4() instead of opencoding netfilter: ebtables: Use vmalloc_array() to improve code ==================== Link: https://patch.msgid.link/20250902133549.15945-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 16:06:45 -07:00
Asbjørn Sloth Tønnesen	9f9581ba74	netlink: specs: fou: change local-v6/peer-v6 check While updating the binary min-len implementation, I noticed that the only user, should AFAICT be using exact-len instead. In net/ipv4/fou_core.c FOU_ATTR_LOCAL_V6 and FOU_ATTR_PEER_V6 are only used for singular IPv6 addresses, and there are AFAICT no known implementations trying to send more, it therefore appears safe to change it to an exact-len policy. This patch therefore changes the local-v6/peer-v6 attributes to use an exact-len check, instead of a min-len check. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250902154640.759815-2-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 15:16:49 -07:00
Christoph Paasch	3bd4f98a4e	mptcp: record subflows in RPS table Accelerated Receive Flow Steering (aRFS) relies on sockets recording their RX flow hash into the rps_sock_flow_table so that incoming packets are steered to the CPU where the application runs. With MPTCP, the application interacts with the parent MPTCP socket while data is carried over per-subflow TCP sockets. Without recording these subflows, aRFS cannot steer interrupts and RX processing for the flows to the desired CPU. Record all subflows in the RPS table by calling sock_rps_record_flow() for each subflow at the start of mptcp_sendmsg(), mptcp_recvmsg() and mptcp_stream_accept(), by using the new helper mptcp_rps_record_subflows(). It does not by itself improve throughput, but ensures that IRQ and RX processing are directed to the right CPU, which is a prerequisite for effective aRFS. Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-4-fa02bb3188b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 15:08:20 -07:00
Eric Biggers	2d5be5629c	mptcp: use HMAC-SHA256 library instead of open-coded HMAC Now that there are easy-to-use HMAC-SHA256 library functions, use these in net/mptcp/crypto.c instead of open-coding the HMAC algorithm. Remove the WARN_ON_ONCE() for messages longer than SHA256_DIGEST_SIZE. The new implementation handles all message lengths correctly. The mptcp-crypto KUnit test still passes after this change. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-1-fa02bb3188b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 15:08:20 -07:00
Jakub Kicinski	c5142df58d	Merge tag 'wireless-2025-09-03' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few updates: - a set of buffer overflow fixes - ath11k: a fix for GTK rekeying - ath12k: a missed WiFi7 capability * tag 'wireless-2025-09-03' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: wilc1000: avoid buffer overflow in WID string configuration wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result() wifi: libertas: cap SSID len in lbs_associate() wifi: cw1200: cap SSID length in cw1200_do_join() wifi: ath11k: fix group data packet drops during rekey wifi: ath12k: Set EMLSR support flag in MLO flags for EML-capable stations ==================== Link: https://patch.msgid.link/20250903075602.30263-4-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-03 14:56:15 -07:00
Dan Carpenter	62b635dcd6	wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result() If the ssid->datalen is more than IEEE80211_MAX_SSID_LEN (32) it would lead to memory corruption so add some bounds checking. Fixes: `c38c701851` ("wifi: cfg80211: Set SSID if it is not already set") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/0aaaae4a3ed37c6252363c34ae4904b1604e8e32.1756456951.git.dan.carpenter@linaro.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-03 09:37:55 +02:00
Yue Haibing	3d95261eeb	ipv6: Add sanity checks on ipv6_devconf.rpl_seg_enabled In ipv6_rpl_srh_rcv() we use min(net->ipv6.devconf_all->rpl_seg_enabled, idev->cnf.rpl_seg_enabled) is intended to return 0 when either value is zero, but if one of the values is negative it will in fact return non-zero. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250901123726.1972881-3-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-02 17:01:14 -07:00
Yue Haibing	3a5f55500f	ipv6: annotate data-races around devconf->rpl_seg_enabled devconf->rpl_seg_enabled can be changed concurrently from /proc/sys/net/ipv6/conf, annotate lockless reads on it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250901123726.1972881-2-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-02 17:01:06 -07:00
Christoph Paasch	fa390321ab	net/tcp: Fix socket memory leak in TCP-AO failure handling for IPv6 When tcp_ao_copy_all_matching() fails in tcp_v6_syn_recv_sock() it just exits the function. This ends up causing a memory-leak: unreferenced object 0xffff0000281a8200 (size 2496): comm "softirq", pid 0, jiffies 4295174684 hex dump (first 32 bytes): 7f 00 00 06 7f 00 00 06 00 00 00 00 cb a8 88 13 ................ 0a 00 03 61 00 00 00 00 00 00 00 00 00 00 00 00 ...a............ backtrace (crc 5ebdbe15): kmemleak_alloc+0x44/0xe0 kmem_cache_alloc_noprof+0x248/0x470 sk_prot_alloc+0x48/0x120 sk_clone_lock+0x38/0x3b0 inet_csk_clone_lock+0x34/0x150 tcp_create_openreq_child+0x3c/0x4a8 tcp_v6_syn_recv_sock+0x1c0/0x620 tcp_check_req+0x588/0x790 tcp_v6_rcv+0x5d0/0xc18 ip6_protocol_deliver_rcu+0x2d8/0x4c0 ip6_input_finish+0x74/0x148 ip6_input+0x50/0x118 ip6_sublist_rcv+0x2fc/0x3b0 ipv6_list_rcv+0x114/0x170 __netif_receive_skb_list_core+0x16c/0x200 netif_receive_skb_list_internal+0x1f0/0x2d0 This is because in tcp_v6_syn_recv_sock (and the IPv4 counterpart), when exiting upon error, inet_csk_prepare_forced_close() and tcp_done() need to be called. They make sure the newsk will end up being correctly free'd. tcp_v4_syn_recv_sock() makes this very clear by having the put_and_exit label that takes care of things. So, this patch here makes sure tcp_v4_syn_recv_sock and tcp_v6_syn_recv_sock have similar error-handling and thus fixes the leak for TCP-AO. Fixes: `06b22ef295` ("net/tcp: Wire TCP-AO to request sockets") Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250830-tcpao_leak-v1-1-e5878c2c3173@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-02 15:58:22 -07:00
Eric Dumazet	5d14bbf9d1	net_sched: act: remove tcfa_qstats tcfa_qstats is currently only used to hold drops and overlimits counters. tcf_action_inc_drop_qstats() and tcf_action_inc_overlimit_qstats() currently acquire a->tcfa_lock to increment these counters. Switch to two atomic_t to get lock-free accounting. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20250901093141.2093176-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-02 15:52:24 -07:00
Eric Dumazet	3016024d75	net_sched: add back BH safety to tcf_lock Jamal reported that we had to use BH safety after all, because stats can be updated from BH handler. Fixes: `3133d5c15c` ("net_sched: remove BH blocking in eight actions") Fixes: `53df77e785` ("net_sched: act_skbmod: use RCU in tcf_skbmod_dump()") Fixes: `e97ae74297` ("net_sched: act_tunnel_key: use RCU in tunnel_key_dump()") Fixes: `48b5e5dbdb` ("net_sched: act_vlan: use RCU in tcf_vlan_dump()") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Closes: https://lore.kernel.org/netdev/CAM0EoMmhq66EtVqDEuNik8MVFZqkgxFbMu=fJtbNoYD7YXg4bA@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20250901092608.2032473-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-02 15:51:45 -07:00
James Flowers	d250f14f5f	net/smc: Replace use of strncpy on NUL-terminated string with strscpy strncpy is deprecated for use on NUL-terminated strings, as indicated in Documentation/process/deprecated.rst. strncpy NUL-pads the destination buffer and doesn't guarantee the destination buffer will be NUL terminated. Signed-off-by: James Flowers <bold.zone2373@fastmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250901030512.80099-1-bold.zone2373@fastmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-02 14:05:49 -07:00
Fernando Fernandez Mancera	077dc4a275	netfilter: nft_payload: extend offset to 65535 bytes In some situations 255 bytes offset is not enough to match or manipulate the desired packet field. Increase the offset limit to 65535 or U16_MAX. In addition, the nla policy maximum value is not set anymore as it is limited to s16. Instead, the maximum value is checked during the payload expression initialization function. Tested with the nft command line tool. table ip filter { chain output { @nh,2040,8 set 0xff @nh,524280,8 set 0xff @nh,524280,8 0xff @nh,2040,8 0xff } } Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:18 +02:00
Florian Westphal	f4f9e05904	netfilter: nf_reject: remove unneeded exports These functions have no external callers and can be static. Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:17 +02:00
Florian Westphal	8959f27d39	netfilter: nft_set_pipapo: remove redundant test for avx feature bit Sebastian points out that avx2 depends on avx, see check_cpufeature_deps() in arch/x86/kernel/cpu/cpuid-deps.c: avx2 feature bit will be cleared when avx isn't available. No functional change intended. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:17 +02:00
Florian Westphal	3d95a2e016	netfilter: nf_tables: all transaction allocations can now sleep Now that nft_setelem_flush is not called with rcu read lock held or disabled softinterrupts anymore this can now use GFP_KERNEL too. This is the last atomic allocation of transaction elements, so remove all gfp_t arguments and the wrapper function. This makes attempts to delete large sets much more reliable, before this was prone to transient memory allocation failures. Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:17 +02:00
Florian Westphal	a60a5abe19	netfilter: nf_tables: allow iter callbacks to sleep Quoting Sven Auhagen: we do see on occasions that we get the following error message, more so on x86 systems than on arm64: Error: Could not process rule: Cannot allocate memory delete table inet filter It is not a consistent error and does not happen all the time. We are on Kernel 6.6.80, seems to me like we have something along the lines of the nf_tables: allow clone callbacks to sleep problem using GFP_ATOMIC. As hinted at by Sven, this is because of GFP_ATOMIC allocations during set flush. When set is flushed, all elements are deactivated. This triggers a set walk and each element gets added to the transaction list. The rbtree and rhashtable sets don't allow the iter callback to sleep: rbtree walk acquires read side of an rwlock with bh disabled, rhashtable walk happens with rcu read lock held. Rbtree is simple enough to resolve: When the walk context is ITER_READ, no change is needed (the iter callback must not deactivate elements; we're not in a transaction). When the iter type is ITER_UPDATE, the rwlock isn't needed because the caller holds the transaction mutex, this prevents any and all changes to the ruleset, including add/remove of set elements. Rhashtable is slightly more complex. When the iter type is ITER_READ, no change is needed, like rbtree. For ITER_UPDATE, we hold transaction mutex which prevents elements from getting free'd, even outside of rcu read lock section. So build a temporary list of all elements while doing the rcu iteration and then call the iterator in a second pass. The disadvantage is the need to iterate twice, but this cost comes with the benefit to allow the iter callback to use GFP_KERNEL allocations in a followup patch. The new list based logic makes it necessary to catch recursive calls to the same set earlier. Such walk -> iter -> walk recursion for the same set can happen during ruleset validation in case userspace gave us a bogus (cyclic) ruleset where verdict map m jumps to chain that sooner or later also calls "vmap @m". Before the new ->in_update_walk test, the ruleset is rejected because the infinite recursion causes ctx->level to exceed the allowed maximum. But with the new logic added here, elements would get skipped: nft_rhash_walk_update would see elements that are on the walk_list of an older stack frame. As all recursive calls into same map results in -EMLINK, we can avoid this problem by using the new in_update_walk flag and reject immediately. Next patch converts the problematic GFP_ATOMIC allocations. Reported-by: Sven Auhagen <Sven.Auhagen@belden.com> Closes: https://lore.kernel.org/netfilter-devel/BY1PR18MB5874110CAFF1ED098D0BC4E7E07BA@BY1PR18MB5874.namprd18.prod.outlook.com/ Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:17 +02:00
Christophe Leroy	c015e17ba1	netfilter: nft_payload: Use csum_replace4() instead of opencoding Open coded calculation can be avoided and replaced by the equivalent csum_replace4() in nft_csum_replace(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:17 +02:00
Qianfeng Rong	46015e6b3e	netfilter: ebtables: Use vmalloc_array() to improve code Remove array_size() calls and replace vmalloc() with vmalloc_array() to simplify the code. vmalloc_array() is also optimized better, uses fewer instructions, and handles overflow more concisely[1]. [1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/ Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-02 15:28:17 +02:00
Jeremy Kerr	773b27a8a2	net: mctp: mctp_fraq_queue should take ownership of passed skb As of commit `f5d83cf0ee` ("net: mctp: unshare packets when reassembling"), we skb_unshare() in mctp_frag_queue(). The unshare may invalidate the original skb pointer, so we need to treat the skb as entirely owned by the fraq queue, even on failure. Fixes: `f5d83cf0ee` ("net: mctp: unshare packets when reassembling") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250829-mctp-skb-unshare-v1-1-1c28fe10235a@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-02 14:45:51 +02:00
Liu Jian	ba1e9421cf	net/smc: fix one NULL pointer dereference in smc_ib_is_sg_need_sync() BUG: kernel NULL pointer dereference, address: 00000000000002ec PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 28 UID: 0 PID: 343 Comm: kworker/28:1 Kdump: loaded Tainted: G OE 6.17.0-rc2+ #9 NONE Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_ib_is_sg_need_sync+0x9e/0xd0 [smc] ... Call Trace: <TASK> smcr_buf_map_link+0x211/0x2a0 [smc] __smc_buf_create+0x522/0x970 [smc] smc_buf_create+0x3a/0x110 [smc] smc_find_rdma_v2_device_serv+0x18f/0x240 [smc] ? smc_vlan_by_tcpsk+0x7e/0xe0 [smc] smc_listen_find_device+0x1dd/0x2b0 [smc] smc_listen_work+0x30f/0x580 [smc] process_one_work+0x18c/0x340 worker_thread+0x242/0x360 kthread+0xe7/0x220 ret_from_fork+0x13a/0x160 ret_from_fork_asm+0x1a/0x30 </TASK> If the software RoCE device is used, ibdev->dma_device is a null pointer. As a result, the problem occurs. Null pointer detection is added to prevent problems. Fixes: `0ef69e7884` ("net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://patch.msgid.link/20250828124117.2622624-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-02 10:51:16 +02:00
Jakub Kicinski	aca701c618	Merge tag 'batadv-net-pullrequest-20250901' of https://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix OOB read/write in network-coding decode, by Stanislav Fort * tag 'batadv-net-pullrequest-20250901' of https://git.open-mesh.org/linux-merge: batman-adv: fix OOB read/write in network-coding decode ==================== Link: https://patch.msgid.link/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 13:35:37 -07:00
Eric Dumazet	51ba2d26bc	inet: ping: use EXPORT_IPV6_MOD[_GPL]() There is no neeed to export ping symbols when CONFIG_IPV6=y Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250829153054.474201-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 13:15:14 -07:00
Eric Dumazet	689adb36bd	inet: ping: make ping_port_rover per netns Provide isolation between netns for ping idents. Randomize initial ping_port_rover value at netns creation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250829153054.474201-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 13:15:14 -07:00
Eric Dumazet	10343e7e6c	inet: ping: remove ping_hash() There is no point in keeping ping_hash(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250829153054.474201-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 13:15:14 -07:00
Eric Dumazet	59f26d86b2	inet: ping: check sock_net() in ping_get_port() and ping_lookup() We need to check socket netns before considering them in ping_get_port(). Otherwise, one malicious netns could 'consume' all ports. Add corresponding check in ping_lookup(). Fixes: `c319b4d76b` ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250829153054.474201-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 13:15:14 -07:00
Fabian Bläse	c6dd1aa2cb	icmp: fix icmp_ndo_send address translation for reply direction The icmp_ndo_send function was originally introduced to ensure proper rate limiting when icmp_send is called by a network device driver, where the packet's source address may have already been transformed by SNAT. However, the original implementation only considers the IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's source address with that of the original-direction tuple. This causes two problems: 1. For SNAT: Reply-direction packets were incorrectly translated using the source address of the CT original direction, even though no translation is required. 2. For DNAT: Reply-direction packets were not handled at all. In DNAT, the original direction's destination is translated. Therefore, in the reply direction the source address must be set to the reply-direction source, so rate limiting works as intended. Fix this by using the connection direction to select the correct tuple for source address translation, and adjust the pre-checks to handle reply-direction packets in case of DNAT. Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids possible KCSAN reports about concurrent updates to `ct->status`. Fixes: `0b41713b60` ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by: Fabian Bläse <fabian@blaese.de> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 12:54:41 -07:00
Kuniyuki Iwashima	7051b54fb5	tcp: Remove sk->sk_prot->orphan_count. TCP tracks the number of orphaned (SOCK_DEAD but not yet destructed) sockets in tcp_orphan_count. In some code that was shared with DCCP, tcp_orphan_count is referenced via sk->sk_prot->orphan_count. Let's reference tcp_orphan_count directly. inet_csk_prepare_for_destroy_sock() is moved to inet_connection_sock.c due to header dependency. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250829215641.711664-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 12:52:09 -07:00
Jakub Kicinski	0dffd938db	Merge tag 'for-net-2025-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - vhci: Prevent use-after-free by removing debugfs files early - L2CAP: Fix use-after-free in l2cap_sock_cleanup_listen() * tag 'for-net-2025-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix use-after-free in l2cap_sock_cleanup_listen() Bluetooth: vhci: Prevent use-after-free by removing debugfs files early ==================== Link: https://patch.msgid.link/20250829191210.1982163-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-01 12:32:05 -07:00
Stanislav Fort	d77b6ff0ce	batman-adv: fix OOB read/write in network-coding decode batadv_nc_skb_decode_packet() trusts coded_len and checks only against skb->len. XOR starts at sizeof(struct batadv_unicast_packet), reducing payload headroom, and the source skb length is not verified, allowing an out-of-bounds read and a small out-of-bounds write. Validate that coded_len fits within the payload area of both destination and source sk_buffs before XORing. Fixes: `2df5278b02` ("batman-adv: network coding - receive coded packets and decode them") Cc: stable@vger.kernel.org Reported-by: Stanislav Fort <disclosure@aisle.com> Signed-off-by: Stanislav Fort <stanislav.fort@aisle.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2025-08-31 17:01:35 +02:00
Eric Dumazet	6ad8de3cef	ipv4: start using dst_dev_rcu() Change icmpv4_xrlim_allow(), ip_defrag() to prevent possible UAF. Change ipmr_prepare_xmit(), ipmr_queue_fwd_xmit(), ip_mr_output(), ipv4_neigh_lookup() to use lockdep enabled dst_dev_rcu(). Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 19:36:32 -07:00
Eric Dumazet	b62a59c18b	tcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check() Use RCU to avoid a pair of atomic operations and a potential UAF on dst_dev()->flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 19:36:32 -07:00
Eric Dumazet	50c127a69c	tcp_metrics: use dst_dev_net_rcu() Replace three dst_dev() with a lockdep enabled helper. Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 19:36:32 -07:00
Eric Dumazet	99a2ace61b	net: use dst_dev_rcu() in sk_setup_caps() Use RCU to protect accesses to dst->dev from sk_setup_caps() and sk_dst_gso_max_size(). Also use dst_dev_rcu() in ip6_dst_mtu_maybe_forward(), and ip_dst_mtu_maybe_forward(). ip4_dst_hoplimit() can use dst_dev_net_rcu(). Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 19:36:32 -07:00
Eric Dumazet	11709573cc	ipv6: use RCU in ip6_output() Use RCU in ip6_output() in order to use dst_dev_rcu() to prevent possible UAF. We can remove rcu_read_lock()/rcu_read_unlock() pairs from ip6_finish_output2(). Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 19:36:32 -07:00
Eric Dumazet	9085e56501	ipv6: use RCU in ip6_xmit() Use RCU in ip6_xmit() in order to use dst_dev_rcu() to prevent possible UAF. Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 19:36:32 -07:00
Eric Dumazet	b775ecf165	ipv6: start using dst_dev_rcu() Refactor icmpv6_xrlim_allow() and ip6_dst_hoplimit() so that we acquire rcu_read_lock() a bit longer to be able to use dst_dev_rcu() instead of dst_dev(). __ip6_rt_update_pmtu() and rt6_do_redirect can directly use dst_dev_rcu() in sections already holding rcu_read_lock(). Small changes to use dst_dev_net_rcu() in ip6_default_advmss(), ipv6_sock_ac_join(), ip6_mc_find_dev() and ndisc_send_skb(). Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-29 19:36:32 -07:00

1 2 3 4 5 ...

81724 Commits