linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-03 22:03:25 -04:00

Author	SHA1	Message	Date
Jakub Kicinski	317bbe5301	Merge tag 'nf-26-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Fix small race windows in nf_ct_helper_log() when accessing helper, from Florian Westphal. 2) Fix potential infinite loop and race conditions in IPVS caused by frequent user-triggered service table changes, from Julia Anastasov. 3) Fix a race condition when dumping ipsets for restore, from Jozsef Kadlecsik. 4) Fix inner transport offset in IPv6 in nft_inner when extension headers come before the layer 4 transport header, from Yizhou Zhao. 5) Fix incorrect iteration over IPv4 ranges in several hash set types, from Nan Li. 6) Fix incorrect order when restoring BH in nft_inner_restore_tun_ctx(), from Florian Westphal. 7) Validate option array from ip6t_hbh checkpath() to fix an off-by-one access, from Zhengchuan Liang. 8) Fix race condition between ipset list -terse and concurrent updates, from Jozsef Kadlecisk. 9) Fix race condition when inserting elements into a hash bucket, also from Jozsef. 10) Annotate access to first free slot in hashtable, from Jozsef Kadlecsik. 11) Ensure sufficient headroom in br_netfilter neigh transmission, from Lorenzo Bianconi. 12) Hold reference on skb->dev in nfqueue exit path, bridge local input is speciall since skb->dev != state->indev, allowing for net_device to go away while packet is sitting in nfqueue. From Haoze Xie. * tag 'nf-26-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_queue: hold bridge skb->dev while queued netfilter: br_netfilter: Reallocate headroom if necessary in neigh_hh_bridge() netfilter: ipset: annotate "pos" for concurrent readers/writers netfilter: ipset: Fix data race between add and dump in all hash types netfilter: ipset: Fix data race between add and list header in all hash types netfilter: ip6t_hbh: reject oversized option lists netfilter: nft_inner: release local_lock before re-enabling softirqs netfilter: ipset: stop hash:* range iteration at end netfilter: nft_inner: Fix IPv6 inner_thoff desync netfilter: ipset: fix a potential dump-destroy race ipvs: avoid possible loop in ip_vs_dst_event on resizing netfilter: nf_conntrack_helper: fix possible null deref during error log ==================== Link: https://patch.msgid.link/20260516115627.967773-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-18 16:59:30 -07:00
Jakub Kicinski	aeff7e8e50	Merge tag 'batadv-net-pullrequest-20260515' of https://git.open-mesh.org/batadv Simon Wunderlich says: ==================== Here are various batman-adv bugfixes: - fix tp_meter counter underflow during shutdown, by Luxiao Xu - fix tp_meter tp_vars reference leak in receiver shutdown, by Sven Eckelmann - fix various translation table integer handling issues, by Sven Eckelmann (3 patches) - fix various translation table counter issues, by Sven Eckelmann (3 patches) - fix fragment reassembly length accounting, by Ruide Cao - clear current gateway during teardown, by Ruijie Li - handle forward allocation error in DAT, by Sven Eckelmann - tp_meter: avoid use of uninitialized sender variables in tp_meter, by Sven Eckelmann - disallow unicast fragment in fragment, by Sven Eckelmann - directly shut down tp_meter timer on cleanup, by Sven Eckelmann * tag 'batadv-net-pullrequest-20260515' of https://git.open-mesh.org/batadv: batman-adv: tp_meter: directly shut down timer on cleanup batman-adv: frag: disallow unicast fragment in fragment batman-adv: tp_meter: avoid use of uninit sender vars batman-adv: dat: handle forward allocation error batman-adv: clear current gateway during teardown batman-adv: fix fragment reassembly length accounting batman-adv: tt: prevent TVLV entry number overflow batman-adv: tt: avoid empty VLAN responses batman-adv: tt: fix TOCTOU race for reported vlans batman-adv: tt: fix negative last_changeset_len batman-adv: tt: fix negative tt_buff_len batman-adv: tt: reject oversized local TVLV buffers batman-adv: tp_meter: fix tp_vars reference leak in receiver shutdown batman-adv: fix tp_meter counter underflow during shutdown ==================== Link: https://patch.msgid.link/20260515095540.325586-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-18 16:50:04 -07:00
Jakub Kicinski	23f3b04f15	Merge tag 'for-net-2026-05-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - af_bluetooth: serialize accept_q access - L2CAP: ecred_reconfigure: send packed pdu, not stack pointer - btmtk: accept too short WMT FUNC_CTRL events - hci_qca: Convert timeout from jiffies to ms * tag 'for-net-2026-05-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_qca: Convert timeout from jiffies to ms Bluetooth: L2CAP: ecred_reconfigure: send packed pdu, not stack pointer Bluetooth: btmtk: accept too short WMT FUNC_CTRL events Bluetooth: serialize accept_q access ==================== Link: https://patch.msgid.link/20260514172340.1515042-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-18 16:40:05 -07:00
Ilya Maximets	bae3ee802c	openvswitch: vport: fix race between linking and the device notifier Sashiko reports that it is technically possible that we got the device reference, but by the time we're linking it to the OVS datapath, it may be already in the process of being deleted. In this case if the notifier wins the race for RTNL, it will see that the device is not yet in the OVS datapath (ovs_netdev_get_vport() will fail in the dp_device_event()) and will do nothing. Then the ovs_netdev_link() will take the RTNL and link the unregistering device to OVS datapath. Eventually, netdev_wait_allrefs_any() will re-broadcast the event and the device will be properly detached, but it will take at least a second before that happens, so it's not something we should rely on. Let's avoid linking the non-registered device in the first place. Note: As per documentation, RTNL doesn't protect the reg_state, but it actually does for all the state transitions we care about here, so it should not be necessary to use READ_ONCE or taking the instance lock. We can still do that, but we have a few more places even in this file where the reg_state is accessed without those while under RTNL, and many more places like this across the kernel code, so it might make more sense to change all of them in a more centralized fashion in the future, if necessary. Fixes: `ccb1352e76` ("net: Add Open vSwitch kernel components.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://patch.msgid.link/20260514184702.2461435-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-18 16:38:45 -07:00
Weiming Shi	9e7f36ab5b	net: appletalk: fix NULL pointer dereference in aarp_send_ddp() aarp_send_ddp() calls atalk_find_dev_addr(dev) in the LocalTalk fast path without checking for NULL. When the device has no AppleTalk interface configured (dev->atalk_ptr == NULL), this leads to a NULL pointer dereference at the at->s_net access. KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:aarp_send_ddp (net/appletalk/aarp.c:552 (discriminator 2)) Call Trace: <TASK> atalk_sendmsg (net/appletalk/ddp.c:1715) __sys_sendto (net/socket.c:2265 (discriminator 1)) __x64_sys_sendto (net/socket.c:2272) do_syscall_64 (arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) Add a NULL check consistent with the other callers of atalk_find_dev_addr(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Link: https://patch.msgid.link/20260514123806.3085961-3-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-18 16:33:34 -07:00
Haoze Xie	e196115ec3	netfilter: nf_queue: hold bridge skb->dev while queued br_pass_frame_up() rewrites skb->dev from the ingress port to the bridge master before queueing bridge LOCAL_IN packets. NFQUEUE only holds references on state.in/out and bridge physdevs, so a queued bridge packet can retain a freed bridge master in skb->dev until reinjection. When the verdict is reinjected later, br_netif_receive_skb() re-enters the receive path with skb->dev still pointing at the freed bridge master, triggering a use-after-free. Store skb->dev in the queue entry, hold a reference on it for the queue lifetime, and use the saved device when dropping queued packets during NETDEV_DOWN handling. Fixes: `ac28634456` ("netfilter: bridge: add nf_afinfo to enable queuing to userspace") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:23:01 +02:00
Lorenzo Bianconi	b2870fc216	netfilter: br_netfilter: Reallocate headroom if necessary in neigh_hh_bridge() neigh_hh_bridge() assumes the skb always has sufficient headroom to copy the aligned L2 header. This assumption can trigger the crash reported below using the following netfilter setup: $modprobe br_netfilter $sysctl -w net.bridge.bridge-nf-call-iptables=1 $root@OpenWrt:~# nft list ruleset table ip nat { chain prerouting { type nat hook prerouting priority dstnat; policy accept; ip daddr 192.168.83.123 dnat to 192.168.83.120 } } - iperf3 client (192.168.83.119) --> bridge (192.168.83.118) --> iperf3 server (192.168.83.120) the iperf3 client is sending packet for 192.168.83.123 to the bridge device. [ 1579.036575] Unable to handle kernel write to read-only memory at virtual address ffffff8004d76ffe [ 1579.045482] Mem abort info: [ 1579.048273] ESR = 0x000000009600004f [ 1579.052024] EC = 0x25: DABT (current EL), IL = 32 bits [ 1579.057363] SET = 0, FnV = 0 [ 1579.060417] EA = 0, S1PTW = 0 [ 1579.063550] FSC = 0x0f: level 3 permission fault [ 1579.068345] Data abort info: [ 1579.071224] ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000 [ 1579.076720] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 [ 1579.081770] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1579.087092] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080dc4000 [ 1579.093794] [ffffff8004d76ffe] pgd=180000009ffff003, p4d=180000009ffff003, pud=180000009ffff003, pmd=180000009ffe3003, pte=0060000084d76787 [ 1579.106343] Internal error: Oops: 000000009600004f [#1] SMP [ 1579.193824] CPU: 0 UID: 0 PID: 235 Comm: napi/qdma_eth-3 Tainted: G O 6.12.57 #0 [ 1579.202614] Tainted: [O]=OOT_MODULE [ 1579.206102] Hardware name: Airoha AN7581 Evaluation Board (DT) [ 1579.211929] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1579.218889] pc : br_nf_pre_routing_finish_bridge+0x1ac/0xcc8 [br_netfilter] [ 1579.225859] lr : br_nf_pre_routing_finish_bridge+0x18c/0xcc8 [br_netfilter] [ 1579.232822] sp : ffffffc0817cba20 [ 1579.236128] x29: ffffffc0817cba20 x28: 0000000000000000 x27: ffffff8002b89000 [ 1579.243273] x26: ffffff8004d7700e x25: 0000000000000008 x24: 0000000000000000 [ 1579.250416] x23: ffffffc08179d4c0 x22: 0000000000000000 x21: ffffffc08179d4c0 [ 1579.257561] x20: ffffff8004d9b800 x19: ffffff8015010000 x18: 0000000000000014 [ 1579.264704] x17: ffffffbf9e930000 x16: ffffffc0817c8000 x15: 0000000000000070 [ 1579.271848] x14: 0000000000000080 x13: 0000000000000001 x12: 0000000000000000 [ 1579.278993] x11: ffffffc0798caae0 x10: ffffff8014db6fd8 x9 : 0000000000000000 [ 1579.286136] x8 : 0000000000000003 x7 : ffffffc08171f628 x6 : 000000001a3b83d3 [ 1579.293281] x5 : 0000000000000000 x4 : 1beb76f22fee0000 x3 : ffffff8004d7700e [ 1579.300425] x2 : 0000000000000000 x1 : ffffff8004d9b8bc x0 : ffffff80026ed000 [ 1579.307570] Call trace: [ 1579.310018] br_nf_pre_routing_finish_bridge+0x1ac/0xcc8 [br_netfilter] [ 1579.316632] br_nf_hook_thresh+0xd4/0x14bc [br_netfilter] [ 1579.322032] br_nf_hook_thresh+0x250/0x14bc [br_netfilter] [ 1579.327517] br_nf_hook_thresh+0x76c/0x14bc [br_netfilter] [ 1579.333003] br_handle_frame+0x180/0x480 [ 1579.336935] __netif_receive_skb_core.constprop.0+0x540/0xf40 [ 1579.342682] __netif_receive_skb_one_core+0x28/0x50 [ 1579.347561] process_backlog+0x98/0x1e0 [ 1579.351398] __napi_poll+0x34/0x1c4 [ 1579.354887] net_rx_action+0x178/0x330 [ 1579.358638] handle_softirqs+0x108/0x2d4 [ 1579.362560] __do_softirq+0x10/0x18 [ 1579.366051] ____do_softirq+0xc/0x20 [ 1579.369627] call_on_irq_stack+0x30/0x4c [ 1579.373550] do_softirq_own_stack+0x18/0x20 [ 1579.377734] do_softirq+0x4c/0x60 [ 1579.381050] __local_bh_enable_ip+0x88/0x98 [ 1579.385234] napi_threaded_poll_loop+0x188/0x21c [ 1579.389853] napi_threaded_poll+0x70/0x80 [ 1579.393863] kthread+0xd8/0xdc [ 1579.396918] ret_from_fork+0x10/0x20 [ 1579.400499] Code: 88dffc22 3707ffc2 f9406663 f9406684 (f81f0064) [ 1579.406589] ---[ end trace 0000000000000000 ]--- [ 1579.411209] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 1579.418083] SMP: stopping secondary CPUs [ 1579.422012] Kernel Offset: disabled Fix the issue reallocating the skb headroom if necessary in neigh_hh_bridge routine. Fixes: `e179e6322a` ("netfilter: bridge-netfilter: Fix MAC header handling with IP DNAT") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:22:50 +02:00
Jozsef Kadlecsik	7f7445840b	netfilter: ipset: annotate "pos" for concurrent readers/writers The "pos" structure member of struct hbucket stores the first free slot in the hash bucket of a hash type of set and there are concurrent readers/writers. Annotate accesses properly. Fixes: `18f84d41d3` ("netfilter: ipset: Introduce RCU locking in hash:* types") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:21:42 +02:00
Jozsef Kadlecsik	2358f7427c	netfilter: ipset: Fix data race between add and dump in all hash types When adding a new entry to the next position in the existing hash bucket, the position index was incremented too early and parallel dump could read it before the entry was populated with the value. Move the setting of the position index after populating the entry. v2: Position counting fixed, noticed by Florian Westphal. Fixes: `18f84d41d3` ("netfilter: ipset: Introduce RCU locking in hash:* types") Reported-by: syzbot+786c889f046e8b003ca6@syzkaller.appspotmail.com Reported-by: syzbot+1da17e4b41d795df059e@syzkaller.appspotmail.com Reported-by: syzbot+421c5f3ff8e9493084d9@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:21:42 +02:00
Jozsef Kadlecsik	c0c42a0fb2	netfilter: ipset: Fix data race between add and list header in all hash types The "ipset list -terse" command is actually a dump operation which may run parallel with "ipset add" commands, which can trigger an internal resizing of the hash type of sets just being dumped. However, dumping just the header part of the set was not protected against underlying resizing. Fix it by protecting the header dumping part as well. Fixes: `c4c997839c` ("netfilter: ipset: Fix parallel resizing and listing of the same set") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:21:41 +02:00
Zhengchuan Liang	4322dcde6b	netfilter: ip6t_hbh: reject oversized option lists struct ip6t_opts stores at most IP6T_OPTS_OPTSNR option descriptors, but hbh_mt6_check() does not reject larger optsnr values supplied from userspace. Validate optsnr in the rule setup path so only match data that fits the fixed-size opts array can be installed. This follows the existing xtables pattern of rejecting invalid user-provided counts in checkentry() and keeps the packet matching path unchanged. `struct ip6t_opts` has a fixed `opts[IP6T_OPTS_OPTSNR]` array, where `IP6T_OPTS_OPTSNR` is 16, then off-by-one array access is possible: [ 137.924693][ T8692] UBSAN: array-index-out-of-bounds in ../net/ipv6/netfilter/ip6t_hbh.c:110:29 [ 137.926167][ T8692] index 16 is out of range for type '__u16 [16]' Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:21:41 +02:00
Florian Westphal	a6cb3ff979	netfilter: nft_inner: release local_lock before re-enabling softirqs Quoting sashiko: In the error path, local_bh_enable() is called before local_unlock_nested_bh(). Fixes: `ba36fada9a` ("netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctx") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:21:41 +02:00
Nan Li	0d3a282ab5	netfilter: ipset: stop hash:* range iteration at end The following hash set variants: hash:ip,mark hash:ip,port hash:ip,port,ip hash:ip,port,net iterate IPv4 ranges with a 32-bit iterator. The iterator must stop once the last address in the requested range has been processed. Advancing it once more can move the traversal state past the end of the request, so a later retry may continue from an unintended position. Handle the iterator increment explicitly at the end of the loop and stop once the upper bound has been processed. This keeps the existing retry behaviour intact for valid ranges while preventing traversal from continuing past the original boundary. Fixes: `48596a8ddc` ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Nan Li <tonanli66@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 13:21:15 +02:00
Yizhou Zhao	b6a91f68eb	netfilter: nft_inner: Fix IPv6 inner_thoff desync In nft_inner_parse_l2l3(), when processing inner IPv6 packets, ipv6_find_hdr() correctly computes the transport header offset traversing all extension headers, but the result is immediately overwritten with nhoff + sizeof(_ip6h) (40 bytes), which only accounts for the IPv6 base header. This creates a desync between inner_thoff (wrong — points to extension header start) and l4proto (correct — e.g., IPPROTO_TCP), enabling transport header forgery and potential firewall bypass. This issue affects stable versions from Linux 6.2. For comparison, the normal (non-inner) IPv6 path correctly preserves ipv6_find_hdr()'s result. Removing the incorrect overwrite ensures that ipv6_find_hdr()'s calculated transport header offset is preserved, thereby fixing the desynchronization. Fixes: `3a07327d10` ("netfilter: nft_inner: support for inner tunnel header matching") Cc: stable@vger.kernel.org Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Assisted-by: GLM:5.1 Z.ai Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 12:19:56 +02:00
Jozsef Kadlecsik	53d7fd878c	netfilter: ipset: fix a potential dump-destroy race When dumping sets in order to create the proper order for restore, the list type of sets dumped last. Therefore internally we run the dumping loop twice: first with all non-list type of sets and skipping the list type ones and then secondly for the list type of sets. Sashiko noticed that there's a potential race between dump and destroy if in the first loop the last set was a list type of set: its pointer remains unreferenced and a concurrent destroy can free it. Fix the issue by resetting the variable holding the pointer. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 12:19:56 +02:00
Julian Anastasov	5522d65d81	ipvs: avoid possible loop in ip_vs_dst_event on resizing Sashiko points out that unprivileged user can frequently call ip_vs_flush() or ip_vs_del_service() to trigger svc_table_changes updates that can lead to infinite loop in ip_vs_dst_event(). This can also happen if the user triggers frequent table resizing without deleting all services. We should also consider the possible effects if the user triggers many NETDEV_DOWN events. One way to solve it is to hold svc_resize_sem in ip_vs_dst_event() but this can block the dev notifier during the whole resizing process. Instead, use new rw_semaphore svc_replace_sem to protect just the svc_table replacement which is a short code section. Then hold svc_replace_sem in ip_vs_dst_event() to serialize with replacing the svc_table. As result, loop is avoided as there is no need to repeat the table walking from the start. By this way changes in svc_table_changes can happen only when all services are removed and all dev references dropped which allows us to abort the table walking. As IP_VS_WORK_SVC_NORESIZE is the flag used to stop the svc_resize_work under service_mutex, we should check only this flag often but not while under service_mutex. To remove the mutex_trylock() for service_mutex in the second phase where the resizer installs the new table after rehashing, we will avoid holding the service_mutex there. As result, the code in configuration context which is under service_mutex should access ipvs->svc_table under RCU because it can be replaced at anytime and released after a RCU grace period. As for ip_vs_zero_all(), it needs different solution as a table walker which can escape single RCU read-side critical section: to hold the svc_replace_sem to prevent table to be replaced. In ip_vs_status_show() prefer to hold svc_replace_sem to avoid many loops, just detect if the svc_table is removed. Prefer the newly attached table for the u_thresh/l_thresh checks to know when to grow/shrink while adding or deleting services because the new table size is based on the latest parameters. Link: https://sashiko.dev/#/patchset/20260505001648.360569-1-pablo%40netfilter.org Fixes: `840aac3d90` ("ipvs: use resizable hash table for services") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 12:19:56 +02:00
Florian Westphal	1afc25ae75	netfilter: nf_conntrack_helper: fix possible null deref during error log Reported by sashiko: there is a small race window. If a helper module is unloaded or a userspace-defined helper is removed, nf_conntrack_helper_unregister() sets ->helper to NULL. Handle this safely. This needs a second patch to close related race during nf_conntrack_helper_unregister(). Fixes: `b20ab9cc63` ("netfilter: nf_ct_helper: better logging for dropped packets") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-16 12:19:56 +02:00
Michael Bommarito	aaec7096f9	net: hsr: defer node table free until after RCU readers HSR node-list and node-status generic-netlink operations run under rcu_read_lock(). They walk hsr->node_db through hsr_get_next_node() and hsr_get_node_data(), but RTM_DELLINK teardown removes the same node table with plain list_del() and frees each node immediately. That lets a generic-netlink reader hold a struct hsr_node pointer across hsr_dellink(). In a KASAN build, widening the reader window after hsr_get_next_node() obtains the node reproduces a slab-use-after-free when the reader copies node->macaddress_A; the freeing stack is hsr_del_nodes() from hsr_dellink(). Use list_del_rcu() and defer the free through the existing hsr_free_node_rcu() callback. This matches the lifetime rule used by the HSR prune paths, which already delete nodes with list_del_rcu() and call_rcu(). Fixes: `b9a1e62740` ("hsr: implement dellink to clean up resources") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260513233838.3064715-2-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-15 18:25:26 -07:00
Stefano Garzarella	ae38d91791	vsock/virtio: fix zerocopy completion for multi-skb sends When a large message is fragmented into multiple skbs, the zerocopy uarg is only allocated and attached to the last skb in the loop. Non-final skbs carry pinned user pages with no completion tracking, so the kernel has no way to notify userspace when those pages are safe to reuse. If the loop breaks early the uarg is never allocated at all, leaking pinned pages with no completion notification. Fix this by following the approach used by TCP: allocate the zerocopy uarg (if not provided by the caller) before the send loop and attach it to every skb via skb_zcopy_set(), which takes a reference per skb. Each skb's completion properly decrements the refcount, and the notification only fires after the last skb is freed. On failure, if no data was sent, the uarg is cleanly aborted via net_zcopy_put_abort(). This issue was initially discovered by sashiko while reviewing commit `1cb36e2522` ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting") but was pre-existing. Fixes: `581512a6dc` ("vsock/virtio: MSG_ZEROCOPY flag support") Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com Reported-by: Maher Azzouzi <maherazz04@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Link: https://patch.msgid.link/20260514092948.268720-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-15 17:38:15 -07:00
Michael Bommarito	915fab6982	ipv4: raw: reject IP_HDRINCL packets with ihl < 5 raw_send_hdrinc() validates that the caller-supplied IPv4 header fits within the message length: iphlen = iph->ihl * 4; err = -EINVAL; if (iphlen > length) goto error_free; if (iphlen >= sizeof(iph)) { / fix up saddr, tot_len, id, csum, transport_header / } It does not, however, reject ihl < 5. For such a packet the "if (iphlen >= sizeof(iph))" branch is skipped, leaving the crafted iphdr untouched, but the packet is still handed to __ip_local_out() and onward. Downstream consumers that read iph->ihl assume a sane value: net/ipv4/ah4.c:ah_output() in particular subtracts sizeof(struct iphdr) from top_iph->ihl * 4 and passes the (signed-int-negative, then cast to size_t) result to memcpy(), producing an OOB access of length close to SIZE_MAX and a host kernel panic. An IPv4 header with ihl < 5 is malformed by definition (RFC 791: "Internet Header Length is the length of the internet header in 32 bit words ... Note that the minimum value for a correct header is 5."). The kernel should not be willing to inject such a packet into its own output path. Reject "iphlen < sizeof(iph)" alongside the existing "iphlen > length" check. This matches the principle that locally constructed packets that re-enter the IP stack must pass the same basic sanity tests that a foreign packet would be subjected to. Once this lands, the "if (iphlen >= sizeof(iph))" wrapper around the fixup branch becomes redundant; left in place to keep the patch minimal and backport-friendly. A follow-up can unwrap it. Note that commit `86f4c90a1c` ("ipv4, ipv6: ensure raw socket message is big enough to hold an IP header") ensures the message buffer is large enough to hold an iphdr, but does not constrain the self-reported iph->ihl. Reachability: the malformed packet source is any caller with CAP_NET_RAW, including an unprivileged process in a user+net namespace on a kernel with CONFIG_USER_NS=y. The reproduced AH crash also requires a matching xfrm AH policy on the outgoing route; a container granted CAP_NET_ADMIN can install that state and policy in its netns. Loopback bypasses xfrm_output, so the trigger uses a real netdev. Reproduced on UML + KASAN: kernel-mode fault at addr 0x0 with memcpy_orig at the crash site. Same shape reproduces inside a rootless Docker container with --cap-add NET_ADMIN on a stock distro kernel. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/77ec2b5e8111961c2c39883c92e8aa2709039c17.1778614451.git.michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-15 15:55:02 -07:00
Sven Eckelmann	d5487249a8	batman-adv: tp_meter: directly shut down timer on cleanup batadv_tp_sender_cleanup() was calling timer_delete_sync() followed by timer_delete() to guard against the timer handler re-arming itself between the two calls. This double-deletion hack relied on the sending status being set to 0 to suppress re-arming. Replace both calls with a single timer_shutdown_sync(). This function both waits for any running timer callback to complete (like timer_delete_sync()) and permanently disarms the timer so it cannot be re-armed afterwards, making re-arming prevention unconditional and self-documenting. The re-arming property is also required because otherwise: 1. context 0 (batadv_tp_recv_ack()) checks in batadv_tp_reset_sender_timer() if sending is still 1 -> it is 2. context 1 changes in batadv_tp_sender_shutdown() sending to 0 and in this process forces the kthread to stop timer in batadv_tp_sender_cleanup() 3. context 0 continues in batadv_tp_reset_sender_timer() and rearms the timer -> but the reference for it is already gone Cc: stable@kernel.org Fixes: `33a3bb4a33` ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-15 10:41:55 +02:00
Sven Eckelmann	bc62216dc8	batman-adv: frag: disallow unicast fragment in fragment batadv_frag_skb_buffer() is called by batadv_batman_skb_recv() when a BATADV_UNICAST_FRAG packet is received. Once all fragments are collected and the packet is reassembled, batadv_recv_frag_packet() calls batadv_batman_skb_recv() again to process the defragmented payload. A malicious sender can craft a BATADV_UNICAST_FRAG packet whose reassembled payload is itself a BATADV_UNICAST_FRAG packet (matryoshka-style nesting). Each nesting level recurses through batadv_batman_skb_recv() without bound, growing the kernel stack until it is exhausted. Since refragmentation or fragments in fragments are not actually allowed, discard all packets which are still BATADV_UNICAST_FRAG packets after the defragmentation process. Cc: stable@kernel.org Fixes: `610bfc6bc9` ("batman-adv: Receive fragmented packets and merge") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Reviewed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-15 10:41:49 +02:00
Chuck Lever	f508262ae9	tls: Preserve sk_err across recvmsg() when data has been copied The sk_err check in tls_rx_rec_wait() consumes the error via sock_error(), which clears sk_err atomically. When the caller (tls_sw_recvmsg, tls_sw_splice_read, or tls_sw_read_sock) already has bytes copied to userspace, it returns those bytes and discards the error from this call. sk_err is now zero on the socket, so the next read syscall observes only RCV_SHUTDOWN and reports a clean EOF instead of the actual error (typically -ECONNRESET). The race is reachable when tls_read_flush_backlog()'s periodic sk_flush_backlog() triggers tcp_reset() in the middle of a multi-record read. Pass a has_copied flag to tls_rx_rec_wait(). When has_copied is false, consume sk_err via sock_error() as before. When has_copied is true, report the error from READ_ONCE() but leave sk_err set: the caller returns the byte count and discards the err from this call, and the next read syscall surfaces the preserved sk_err. This mirrors the tcp_recvmsg() preserve-and-surface pattern. The decrypt-abort path is unaffected: tls_err_abort() raises sk_err to EBADMSG after tls_rx_rec_wait() returns, and nothing on the caller's return path consumes it, so the EBADMSG surfaces on the next read. tls_sw_splice_read() passes has_copied=false: it processes one record per call, so no bytes have been copied within the function when tls_rx_rec_wait() runs. A reset that arrives between iterations of splice_direct_to_actor() (the sendfile() path) is still consumed by sock_error() in the later call, and the outer loop returns the prior iterations' byte count and drops the error. tcp_splice_read() exhibits the same pattern at the iteration boundary; addressing it belongs at the splice_direct_to_actor() layer and is out of scope here. Fixes: `c46b01839f` ("tls: rx: periodically flush socket backlog") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260513125825.205189-1-cel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-14 18:19:44 -07:00
William Bowling	f84eca5817	net: skbuff: preserve shared-frag marker during coalescing skb_try_coalesce() can attach paged frags from @from to @to. If @from has SKBFL_SHARED_FRAG set, the resulting @to skb can contain the same externally-owned or page-cache-backed frags, but the shared-frag marker is currently lost. That breaks the invariant relied on by later in-place writers. In particular, ESP input checks skb_has_shared_frag() before deciding whether an uncloned nonlinear skb can skip skb_cow_data(). If TCP receive coalescing has moved shared frags into an unmarked skb, ESP can see skb_has_shared_frag() as false and decrypt in place over page-cache backed frags. Propagate SKBFL_SHARED_FRAG when skb_try_coalesce() transfers paged frags. The tailroom copy path does not need the marker because it copies bytes into @to's linear data rather than transferring frag descriptors. Fixes: `cef401de7b` ("net: fix possible wrong checksum generation") Fixes: `f4c50a4034` ("xfrm: esp: avoid in-place decrypt on shared skb frags") Signed-off-by: William Bowling <vakzz@zellic.io> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260513041635.1289541-1-vakzz@zellic.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-14 17:22:00 -07:00
Maoyi Xie	d2bfdbb69c	rds_tcp: close NULL deref window in rds_tcp_set_callbacks rds_tcp_set_callbacks() links a new rds_tcp_connection onto rds_tcp_tc_list under rds_tcp_tc_list_lock. It releases the lock, then assigns tc->t_sock = sock outside the lock. rds_tcp_tc_info() and rds6_tcp_tc_info() walk rds_tcp_tc_list under the same lock. Both dereference tc->t_sock->sk without a NULL check. A reader can acquire rds_tcp_tc_list_lock between the writer's spin_unlock and the t_sock store. It then sees a list entry whose t_sock is NULL. The dereference of tc->t_sock->sk is a NULL access. Move tc->t_sock = sock inside rds_tcp_tc_list_lock, before list_add_tail. A reader holding the lock then observes the linkage and the t_sock store together. The restore path is safe. rds_tcp_restore_callbacks() does list_del_init inside the lock. The matching tc->t_sock = NULL after unlink is harmless to readers holding the lock. Fixes: `70041088e3` ("RDS: Add TCP transport to RDS") Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260512142807.1855619-1-maoyi.xie@ntu.edu.sg Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-14 17:06:59 -07:00
Sven Eckelmann	6c65cf23d4	batman-adv: tp_meter: avoid use of uninit sender vars batadv_tp_recv_ack() and batadv_tp_stop() are only valid for tp_vars in the BATADV_TP_SENDER role. When called with a BATADV_TP_RECEIVER role, it proceeds to read sender-only members that were never initialized, leading to undefined behavior. This can be triggered when a node that is currently acting as a receiver in an ongoing tp_meter session receives a malicious ACK packet. Guard against this by checking tp_vars->role immediately after the lookup and bailing out if it is not BATADV_TP_SENDER, before any of those members are accessed. Cc: stable@kernel.org Fixes: `33a3bb4a33` ("batman-adv: throughput meter implementation") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Reviewed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-14 20:01:31 +02:00
Sven Eckelmann	2d8826a2d3	batman-adv: dat: handle forward allocation error batadv_dat_forward_data() calls pskb_copy_for_clone() to duplicate an skb for each DHT candidate, but does not check the return value before passing it to batadv_send_skb_prepare_unicast_4addr(). That function dereferences the skb unconditionally, so a failed allocation triggers a NULL pointer dereference. Skip forwarding to the current DHT candidate on allocation failure. Cc: stable@kernel.org Fixes: `785ea11441` ("batman-adv: Distributed ARP Table - create DHT helper functions") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Reviewed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-14 20:01:31 +02:00
Ruijie Li	a340a51ed8	batman-adv: clear current gateway during teardown batadv_gw_node_free() removes the gateway list entries during mesh teardown, but it does not clear the currently selected gateway. This leaves stale gateway state behind across cleanup and can break a later mesh recreation. Clear bat_priv->gw.curr_gw before walking the gateway list so the selected gateway reference is dropped as part of teardown. Fixes: `2265c14108` ("batman-adv: gateway election code refactoring") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ruijie Li <ruijieli51@gmail.com> Signed-off-by: Zhanpeng Li <lzhanpeng2025@lzu.edu.cn> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-14 18:48:40 +02:00
Ruide Cao	9cd3f16c32	batman-adv: fix fragment reassembly length accounting batman-adv keeps a running payload length for queued fragments and uses it to validate a fragment chain before reassembly. That accounting currently allows the accumulated fragment length to be truncated during updates. As a result, malformed fragment chains can bypass the intended validation and drive reassembly with inconsistent length state, leading to a local denial of service. Fix the accounting by storing the accumulated length in a length-typed field and rejecting update overflows before the existing validation logic runs. The fix was verified against the original reproducer and against valid fragment reassembly paths. Fixes: `610bfc6bc9` ("batman-adv: Receive fragmented packets and merge") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ruide Cao <caoruide123@gmail.com> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-14 18:33:29 +02:00
Linus Torvalds	66182ca873	Merge tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Previous releases - regressions: - ethtool: fix NULL pointer dereference in phy_reply_size - netfilter: - allocate hook ops while under mutex - close dangling table module init race - restore nf_conntrack helper propagation via expectation - tcp: - fix potential UAF in reqsk_timer_handler(). - fix out-of-bounds access for twsk in tcp_ao_established_key(). - vsock: fix empty payload in tap skb for non-linear buffers - hsr: fix NULL pointer dereference in hsr_get_node_data() - eth: - cortina: fix RX drop accounting - ice: fix locking in ice_dcb_rebuild() Previous releases - always broken: - napi: avoid gro timer misfiring at end of busypoll - sched: - dualpi2: initialize timer earlier in dualpi2_init() - sch_cbs: Call qdisc_reset for child qdisc - shaper: - fix ordering issue in net_shaper_commit() - reject handle IDs exceeding internal bit-width - ipv6: flowlabel: enforce per-netns limit for unprivileged callers - tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring - smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint - sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL - batman-adv: - reject new tp_meter sessions during teardown - purge non-released claims - eth: - i40e: cleanup PTP registration on probe failure - idpf: fix double free and use-after-free in aux device error paths - ena: fix potential use-after-free in get_timestamp" * tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) net: phy: DP83TC811: add reading of abilities net: tls: prevent chain-after-chain in plain text SG net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring net/smc: reject CHID-0 ACCEPT that matches an empty ism_dev slot macsec: use rcu_work to defer TX SA crypto cleanup out of softirq macsec: use rcu_work to defer RX SA crypto cleanup out of softirq macsec: introduce dedicated workqueue for SA crypto cleanup net: net_failover: Fix the deadlock in slave register MAINTAINERS: update atlantic driver maintainer selftests/tc-testing: Add QFQ/CBS qlen underflow test net/sched: sch_cbs: Call qdisc_reset for child qdisc FDDI: defza: Sanitise the reset safety timer net: ethernet: ravb: Do not check URAM suspension when WoL is active ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS net: atm: fix skb leak in sigd_send() default branch net: ethtool: phy: avoid NULL deref when PHY driver is unbound net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled net: shaper: reject QUEUE scope handle with missing id ...	2026-05-14 08:57:43 -07:00
Michael Bommarito	3374ef8cf9	Bluetooth: L2CAP: ecred_reconfigure: send packed pdu, not stack pointer Commit `1c08108f30` ("Bluetooth: L2CAP: Avoid -Wflex-array-member-not-at-end warnings") converted the on-stack request PDU in l2cap_ecred_reconfigure() from an explicit packed struct to DEFINE_RAW_FLEX(), but did not adjust the size and source-pointer arguments to l2cap_send_cmd(): - struct { - struct l2cap_ecred_reconf_req req; - __le16 scid; - } pdu; + DEFINE_RAW_FLEX(struct l2cap_ecred_reconf_req, pdu, scid, 1); ... l2cap_send_cmd(conn, chan->ident, L2CAP_ECRED_RECONF_REQ, sizeof(pdu), &pdu); After the conversion, DEFINE_RAW_FLEX() expands to declare an anonymous union pdu_u plus a local pointer "pdu" pointing at it. Therefore: - sizeof(pdu) is now sizeof(struct l2cap_ecred_reconf_req ) = 8 on 64-bit (4 on 32-bit), not the 6 bytes of (mtu, mps, scid[1]). - &pdu is the address of the local pointer's stack storage, not the address of the request payload. l2cap_send_cmd() forwards (data, count) to l2cap_build_cmd(), which calls skb_put_data(skb, data, count). The L2CAP_ECRED_RECONFIGURE_REQ packet body therefore contains 8 bytes copied from the kernel stack starting at &pdu -- the 8 bytes overlap the pdu pointer's value, leaking a kernel stack address to the paired Bluetooth peer. The intended (mtu, mps, scid) fields are not transmitted at all, so the peer rejects the request as malformed and the L2CAP_ECRED_RECONFIGURE feature itself has been broken for the local-side initiator since the introducing commit landed. The sibling site l2cap_ecred_conn_req() in the same commit was converted correctly (sizeof(pdu) + len, pdu); only this site was missed. Restore the original semantics: pass the full flex-struct size via struct_size(pdu, scid, 1) and the pdu pointer (the struct address) as the source. Validated on a stock 7.0-based host kernel via the real call path: setsockopt(SOL_BLUETOOTH, BT_RCVMTU, ...) on a BT_CONNECTED L2CAP_MODE_EXT_FLOWCTL socket emits an L2CAP_ECRED_RECONFIGURE_REQ whose body is 8 bytes (the on-stack pdu local's value) rather than the expected 6. Three captures from fresh socket / fresh hciemu peer on the same host -- low bytes vary per call, high 0xffff confirms a kernel virtual address (KASLR-randomised stack slot, not a fixed string): RECONF_REQ body (ident=0x02 len=8): 42 fb 54 af 0e ca ff ff RECONF_REQ body (ident=0x02 len=8): 52 3d 2e af 0e ca ff ff RECONF_REQ body (ident=0x02 len=8): b2 fc 5b af 0e ca ff ff After this patch the body is 6 bytes carrying the expected little-endian (mtu, mps, scid). Cc: stable@vger.kernel.org Fixes: `1c08108f30` ("Bluetooth: L2CAP: Avoid -Wflex-array-member-not-at-end warnings") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-14 09:57:45 -04:00
Jiexun Wang	e83f5e24da	Bluetooth: serialize accept_q access bt_sock_poll() walks the accept queue without synchronization, while child teardown can unlink the same socket and drop its last reference. The unsynchronized accept queue walk has existed since the initial Bluetooth import. Protect accept_q with a dedicated lock for queue updates and polling. Also rework bt_accept_dequeue() to take temporary child references under the queue lock before dropping it and locking the child socket. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Reported-by: Jann Horn <jannh@google.com> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-14 09:49:56 -04:00
Jakub Kicinski	ff26a0e837	net: tls: prevent chain-after-chain in plain text SG Sashiko points out that if end = 0 (start != 0) the current code will create a chain link to content type right after the wrap link: This would create a chain where the wrap link points directly to another chain link. The scatterlist API sg_next iterator does not recursively resolve consecutive chain links. meaning this is illegal input to crypto. The wrapping link is unnecessary if end = 0. end is the entry after the last one used so end = 0 means there's nothing pushed after the wrap: end start i v v v [ ]...[ ][ d ][ d ][ d ][ d ][rsv for wrap] Skip the wrapping in this case. TLS 1.3 can use the "wrapping slot" for it's chaining if end = 0. This avoids the chain-after-chain. Move the wrap chaining before marking END and chaining off content type, that feels like more logical ordering to me, but should not matter from functional perspective. Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: `9aaaa56845` ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260511174920.433155-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-14 13:18:40 +02:00
Jakub Kicinski	285943c6e7	net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring When an sk_msg scatterlist ring wraps (sg.end < sg.start), tls_push_record() chains the tail portion of the ring to the head using sg_chain(). An extra entry in the sg array is reserved for this: struct sk_msg_sg { [...] /* The extra two elements: * 1) used for chaining the front and sections when the list becomes * partitioned (e.g. end < start). The crypto APIs require the * chaining; * 2) to chain tailer SG entries after the message. */ struct scatterlist data[MAX_MSG_FRAGS + 2]; The current code uses MAX_SKB_FRAGS + 1 as the ring size: sg_chain(&msg_pl->sg.data[msg_pl->sg.start], MAX_SKB_FRAGS - msg_pl->sg.start + 1, msg_pl->sg.data); This places the chain pointer at sg_chain(data[start], (MAX_SKB_FRAGS - msg_start + 1) .. = &data[start] + (MAX_SKB_FRAGS - msg_start + 1) - 1 = data[start + (MAX_SKB_FRAGS - start + 1) - 1] = data[MAX_SKB_FRAGS] instead of the true last entry. This is likely due to a "race" of the commit under Fixes landing close to commit `031097d9e0` ("bpf: sk_msg, zap ingress queue on psock down") Convert to ARRAY_SIZE and drop the data[start] / - start (as suggested by Sabrina). Reported-by: 钱一铭 <yimingqian591@gmail.com> Fixes: `9aaaa56845` ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260511174920.433155-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-14 13:18:40 +02:00
Xiang Mei	277740023d	net/smc: reject CHID-0 ACCEPT that matches an empty ism_dev slot On the SMC-D client, slot 0 of ini->ism_dev[]/ini->ism_chid[] is reserved for an SMC-Dv1 device. smc_find_ism_v2_device_clnt() populates V2 entries starting at index 1, so when no V1 device is selected slot 0 is left in its kzalloc()'ed state with ism_dev[0] == NULL and ism_chid[0] == 0. smc_v2_determine_accepted_chid() then matches the peer's CHID against the array starting from index 0 using the CHID alone. A malicious peer replying to a SMC-Dv2-only proposal with d1.chid == 0 matches the empty slot, ini->ism_selected becomes 0, and the subsequent ism_dev[0]->lgr_lock dereference in smc_conn_create() faults at offsetof(struct smcd_dev, lgr_lock) == 0x68: BUG: KASAN: null-ptr-deref in _raw_spin_lock_bh+0x79/0xe0 Write of size 4 at addr 0000000000000068 by task exploit/144 Call Trace: _raw_spin_lock_bh smc_conn_create (net/smc/smc_core.c:1997) __smc_connect (net/smc/af_smc.c:1447) smc_connect (net/smc/af_smc.c:1720) __sys_connect __x64_sys_connect do_syscall_64 Require ism_dev[i] to be non-NULL before accepting a CHID match. Fixes: `a7c9c5f4af` ("net/smc: CLC accept / confirm V2") Reported-by: Weiming Shi <bestswngs@gmail.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260511062138.2839584-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-14 12:20:06 +02:00
Faicker Mo	b84c5632c7	net: net_failover: Fix the deadlock in slave register There is netdev_lock_ops() before the NETDEV_REGISTER notifier in register_netdevice(), so use the non-locking functions in net_failover_slave_register(). failover_slave_register() in failover_existing_slave_register() adds lock and unlock ops too. Call Trace: <TASK> __schedule+0x30d/0x7a0 schedule+0x27/0x90 schedule_preempt_disabled+0x15/0x30 __mutex_lock.constprop.0+0x538/0x9e0 __mutex_lock_slowpath+0x13/0x20 mutex_lock+0x3b/0x50 dev_set_mtu+0x40/0xe0 net_failover_slave_register+0x24/0x280 failover_slave_register+0x103/0x1b0 failover_event+0x15e/0x210 ? dropmon_net_event+0xac/0xe0 notifier_call_chain+0x5e/0xe0 raw_notifier_call_chain+0x16/0x30 call_netdevice_notifiers_info+0x52/0xa0 register_netdevice+0x5f4/0x7c0 register_netdev+0x1e/0x40 _mlx5e_probe+0xe2/0x370 [mlx5_core] mlx5e_probe+0x59/0x70 [mlx5_core] ? __pfx_mlx5e_probe+0x10/0x10 [mlx5_core] Fixes: `4c975fd700` ("net: hold instance lock during NETDEV_REGISTER/UP") Signed-off-by: Faicker Mo <faicker.mo@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-13 19:01:03 -07:00
Jamal Hadi Salim	320fb29ea2	net/sched: sch_cbs: Call qdisc_reset for child qdisc During a reset, CBS is not calling reset on its child qdisc, which might cause qlen/backlog accounting issues. For example, if we have CBS with a QFQ parent and a netem child with delay, we can create a scenario where the parent's qlen underflows. QFQ, specifically, uses qlen to check whether it should deference a pointer, so this scenario may cause a null-ptr deref in QFQ: [ 43.875639][ T319] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] SMP KASAN NOPTI [ 43.876124][ T319] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] [ 43.876417][ T319] CPU: 10 UID: 0 PID: 319 Comm: ping Not tainted 7.0.0-13039-ge728258debd5 #773 PREEMPT(full) [ 43.876751][ T319] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 43.876949][ T319] RIP: 0010:qfq_dequeue+0x35c/0x1650 [ 43.877123][ T319] Code: 00 fc ff df 80 3c 02 00 0f 85 17 0e 00 00 4c 8d 73 48 48 89 9d b8 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 76 0c 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b [ 43.877648][ T319] RSP: 0018:ffff8881017ef4f0 EFLAGS: 00010216 [ 43.877845][ T319] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 43.878073][ T319] RDX: 0000000000000009 RSI: 0000000c40000000 RDI: ffff88810eef02b0 [ 43.878306][ T319] RBP: ffff88810eef0000 R08: ffff88810eef0280 R09: 1ffff1102120fd63 [ 43.878523][ T319] R10: 1ffff1102120fd66 R11: 1ffff1102120fd67 R12: 0000000c40000000 [ 43.878742][ T319] R13: ffff88810eef02b8 R14: 0000000000000048 R15: 0000000020000000 [ 43.878959][ T319] FS: 00007f9c51c47c40(0000) GS:ffff88817a0be000(0000) knlGS:0000000000000000 [ 43.879214][ T319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 43.879403][ T319] CR2: 000055e69a2230a8 CR3: 000000010c07a000 CR4: 0000000000750ef0 [ 43.879621][ T319] PKRU: 55555554 [ 43.879735][ T319] Call Trace: [ 43.879844][ T319] <TASK> [ 43.879924][ T319] __qdisc_run+0x169/0x1900 [ 43.880075][ T319] ? dev_qdisc_enqueue+0x8b/0x210 [ 43.880222][ T319] __dev_queue_xmit+0x2346/0x37a0 [ 43.880376][ T319] ? register_lock_class+0x3f/0x800 [ 43.880531][ T319] ? srso_alias_return_thunk+0x5/0xfbef5 [ 43.880684][ T319] ? __pfx___dev_queue_xmit+0x10/0x10 [ 43.880834][ T319] ? srso_alias_return_thunk+0x5/0xfbef5 [ 43.880977][ T319] ? __lock_acquire+0x819/0x1df0 [ 43.881124][ T319] ? srso_alias_return_thunk+0x5/0xfbef5 [ 43.881275][ T319] ? srso_alias_return_thunk+0x5/0xfbef5 [ 43.881418][ T319] ? __asan_memcpy+0x3c/0x60 [ 43.881563][ T319] ? srso_alias_return_thunk+0x5/0xfbef5 [ 43.881708][ T319] ? eth_header+0x165/0x1a0 [ 43.881853][ T319] ? lockdep_hardirqs_on_prepare+0xdb/0x1a0 [ 43.882031][ T319] ? srso_alias_return_thunk+0x5/0xfbef5 [ 43.882174][ T319] ? neigh_resolve_output+0x3cc/0x7e0 [ 43.882325][ T319] ? srso_alias_return_thunk+0x5/0xfbef5 [ 43.882471][ T319] ip_finish_output2+0x6b6/0x1e10 Fix this by calling qdisc_reset for CBS' child qdisc. Sashiko caught an issue which could result in a null ptr deref if qdisc_create_dflt() is invoked on an unitialised cbs qdisc which is exposed by this patch. We add an early return if the qdisc is null to address this. This is a similar approach used by two other fixes[1][2]. The proper fix for this specific issue elucidated by sashiko is to remove the call to qdisc_reset when qdisc_create_dflt fails. Since the dflt qdisc isn't attached anywhere yet at that point, calling the reset callback doesn't make much sense (and as stated has been a source of two other bugs). We plan on submitting this fix in a later patch. [1] https://lore.kernel.org/netdev/20221018063201.306474-2-shaozhengchao@huawei.com/ [2] https://lore.kernel.org/netdev/20221018063201.306474-4-shaozhengchao@huawei.com/ Fixes: `585d763af0` ("net/sched: Introduce Credit Based Shaper (CBS) qdisc") Reported-by: Junyoung Jang <graypanda.inzag@gmail.com> Tested-by: Junyoung Jang <graypanda.inzag@gmail.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-13 17:53:39 -07:00
Chenguang Zhao	3d042592eb	ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics ethnl_bitmap32_not_zero() should return true if some bit in [start, end) is set: - Fix inverted memchr_inv() sense: return true when the scan finds a non-zero byte, not when the middle words are all zero. - Return false for an empty interval (end <= start). - When end is 32-bit aligned, indices in [start, end) do not include any bits from map[end_word]; return false after earlier checks found no non-zero data. Fixes: `10b518d4e6` ("ethtool: netlink bitset handling") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 18:45:13 -07:00
Xiang Mei	7bf563badd	net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint The smc_msg_event tracepoint class, shared by smc_tx_sendmsg and smc_rx_recvmsg, unconditionally dereferences smc->conn.lnk: __string(name, smc->conn.lnk->ibname) conn->lnk is only set for SMC-R; for SMC-D it is NULL. Other code on these paths already handles this (e.g. !conn->lnk in SMC_STAT_RMB_TX_SIZE_SMALL()). With the tracepoint enabled, the first sendmsg()/recvmsg() on an SMC-D socket crashes: Oops: general protection fault, probably for non-canonical address KASAN: null-ptr-deref in range [...] RIP: 0010:strlen+0x1e/0xa0 Call Trace: trace_event_raw_event_smc_msg_event (net/smc/smc_tracepoint.h:44) smc_rx_recvmsg (net/smc/smc_rx.c:515) smc_recvmsg (net/smc/af_smc.c:2859) __sys_recvfrom (net/socket.c:2315) __x64_sys_recvfrom (net/socket.c:2326) do_syscall_64 The faulting address 0x3e0 is offsetof(struct smc_link, ibname), confirming the NULL ->lnk deref. Enabling the tracepoint requires root, but the trigger itself is unprivileged: socket(AF_SMC, ...) has no capability check, and SMC-D negotiation needs no admin step on s390 or on x86 with the loopback ISM device loaded. Log an empty device name for SMC-D instead of dereferencing NULL. Fixes: `aff3083f10` ("net/smc: Introduce tracepoints for tx and rx msg") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 18:44:36 -07:00
Nicolò Coccia	a3fdd924d8	net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS A logic flaw in __smc_setsockopt() allows a local unprivileged user to cause a Denial of Service (DoS) by holding the socket lock indefinitely. The function __smc_setsockopt() calls copy_from_sockptr() while holding lock_sock(sk). By passing a userfaultfd-monitored memory page (or FUSE-backed memory on systems where unprivileged userfaultfd is disabled) as the optval, an attacker can halt execution during the copy operation, keeping the lock held. Combined with asynchronous tear-down operations like shutdown(), this exhausts the kernel wq (kworkers) and triggers the hung task watchdog. [ 240.123456] INFO: task kworker/u8:2 blocked for more than 120 seconds. [ 240.123489] Call Trace: [ 240.123501] smc_shutdown+... [ 240.123512] lock_sock_nested+... This patch moves the user-space copy outside the lock_sock() critical section to prevent the issue. Fixes: `a6a6fe27ba` ("net/smc: Dynamic control handshake limitation by socket options") Signed-off-by: Nicolò Coccia <n.coccia96@gmail.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 18:43:40 -07:00
Wei Yang	f9e2342046	net: atm: fix skb leak in sigd_send() default branch The default branch in sigd_send() calls sock_put() and returns -EINVAL without freeing the skb, while all other exit paths do so. Add the missing dev_kfree_skb() before sock_put() to fix the leak. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Wei Yang <albinwyang@tencent.com> Link: https://patch.msgid.link/20260509122358.1102997-1-albin_yang@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 18:07:02 -07:00
David Carlier	e3adf69f8e	net: ethtool: phy: avoid NULL deref when PHY driver is unbound phydev->drv can become NULL while the phy_device is still attached to its net_device, namely after the PHY driver is unbound via sysfs: echo <mdio_id> > /sys/bus/mdio_bus/drivers/<phy_drv>/unbind phy_remove() clears phydev->drv but doesn't call phy_detach(), so the phy_device stays in the link topology xarray and ethnl_req_get_phydev() still hands it back. ETHTOOL_MSG_PHY_GET then oopses on: rep_data->drvname = kstrdup(phydev->drv->name, GFP_KERNEL); drvname is already treated as optional by phy_reply_size(), phy_fill_reply() and phy_cleanup_data(), so just skip the allocation when there is no driver bound. Fixes: `9dd2ad5e92` ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Cc: stable@vger.kernel.org # 6.13.x Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260509215046.107157-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 17:45:12 -07:00
Jakub Kicinski	ce372e869f	net: shaper: reject QUEUE scope handle with missing id net_shaper_parse_handle() does not enforce that the user provides the handle ID. For NODE the ID defaults to UNSPEC for all other cases it defaults to 0. For NETDEV 0 is the only option. For QUEUE defaulting to 0 makes less intuitive sense. Specifically because the behavior should (IMHO) be the same for all cases where there may be more than one ID (QUEUE and NODE). We should either document this as intentional or reject. I picked the latter with no strong conviction. Fixes: `4b623f9f0f` ("net-shapers: implement NL get operation") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-11-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:15:00 +02:00
Jakub Kicinski	b62b29e6de	net: shaper: enforce singleton NETDEV scope with id 0 The NETDEV scope represents a singleton root shaper in the per-device hierarchy. All code assumes NETDEV shapers have id 0: net_shaper_default_parent() hardcodes parent->id = 0 when returning the NETDEV parent for QUEUE/NODE children, and the UAPI documentation describes NETDEV scope as "the main shaper" (singular, not plural). Make sure we reject non-0 IDs. Fixes: `4b623f9f0f` ("net-shapers: implement NL get operation") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-10-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:15:00 +02:00
Jakub Kicinski	8d5806c600	net: shaper: reject handle IDs exceeding internal bit-width net_shaper_parse_handle() reads the user-supplied handle ID via nla_get_u32(), accepting the full u32 range. However, the xarray key is built by net_shaper_handle_to_index() using FIELD_PREP(NET_SHAPER_ID_MASK, handle->id), where NET_SHAPER_ID_MASK is GENMASK(25, 0) - only 26 bits wide. FIELD_PREP silently masks off the upper bits at runtime. A user-supplied NODE id like 0x04000123 becomes id 0x123. Additionally, a user-supplied id equal to NET_SHAPER_ID_UNSPEC (0x03FFFFFF, which is NET_SHAPER_ID_MASK itself) would collide with the sentinel used internally by the group operation to signal "allocate a new NODE id". Reject user-supplied IDs >= NET_SHAPER_ID_MASK (i.e., >= 0x03FFFFFF) in the policy. Fixes: `4b623f9f0f` ("net-shapers: implement NL get operation") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-9-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:15:00 +02:00
Jakub Kicinski	0f9a857e34	net: shaper: fix undersized reply skb allocation in GROUP command net_shaper_group_send_reply() writes both the NET_SHAPER_A_IFINDEX attribute (via net_shaper_fill_binding()) and the nested NET_SHAPER_A_HANDLE attribute (via net_shaper_fill_handle()), but the reply skb at the call site in net_shaper_nl_group_doit() is allocated using net_shaper_handle_size(), which only accounts for the nested handle. The allocation is therefore short by nla_total_size(sizeof(u32)) (8 bytes) for the IFINDEX attribute. In practice the slab allocator rounds up the small allocation so the bug is latent, but the size accounting is wrong and could bite if the reply grew further. Introduce net_shaper_group_reply_size() that accounts for the full reply payload and use it both at the genlmsg_new() call site and in the defensive WARN_ONCE message. Fixes: `5d5d4700e7` ("net-shapers: implement NL group operation") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-7-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:14:59 +02:00
Jakub Kicinski	8054f85b83	net: shaper: set ret to -ENOMEM when genlmsg_new() fails in group_doit genlmsg_new() alloc failure path in net_shaper_nl_group_doit() forgets to set ret before jumping to error handling. Fixes: `5d5d4700e7` ("net-shapers: implement NL group operation") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-6-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:14:59 +02:00
Jakub Kicinski	a9a2fa1da6	net: shaper: reject duplicate leaves in GROUP request net_shaper_nl_group_doit() does not deduplicate NET_SHAPER_A_LEAVES entries. When userspace supplies the same leaf handle twice, the same old-parent pointer lands twice in old_nodes[]. The cleanup loop double frees the parent. Of course the same parent may still be in old_nodes[] twice if we are moving multiple of its leaves. Note that this patch also implicitly fixes the fact that the i >= leaves_count path forgets to set ret. Fixes: `5d5d4700e7` ("net-shapers: implement NL group operation") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-4-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:14:59 +02:00
Jakub Kicinski	235fb53761	net: shaper: fix trivial ordering issue in net_shaper_commit() We should update the entry before we mark it as valid. Fixes: `93954b40f6` ("net-shapers: implement NL set and delete operations") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:14:59 +02:00
Jakub Kicinski	7cee43fcb0	net: shaper: flip the polarity of the valid flag The usual way of inserting entries which are not yet fully ready into XArray is to have a VALID flag. The shaper code has a NOT_VALID flag. Since XArray code does not let us create entries with marks already set - the creation of entries is currently not atomic. Flip the polarity of the VALID flag. This closes the tiny race in net_shaper_pre_insert() of entries being created without the NOT_VALID flag. Fixes: `93954b40f6` ("net-shapers: implement NL set and delete operations") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 16:14:59 +02:00

1 2 3 4 5 ...

84300 Commits