linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-17 17:47:52 -04:00

Author	SHA1	Message	Date
Thorsten Blum	e405b3c9d4	net: ipconfig: Remove outdated comment and indent code block The comment has been around ever since commit `1da177e4c3` ("Linux-2.6.12-rc2") and can be removed. Remove it and indent the code block accordingly. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20260109121128.170020-2-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-12 19:11:35 -08:00
Bobby Eshleman	5c024716f5	net: devmem: convert binding refcount to percpu_ref Convert net_devmem_dmabuf_binding refcount from refcount_t to percpu_ref to optimize common-case reference counting on the hot path. The typical devmem workflow involves binding a dmabuf to a queue (acquiring the initial reference on binding->ref), followed by high-volume traffic where every skb fragment acquires a reference. Eventually traffic stops and the unbind operation releases the initial reference. Additionally, the high traffic hot path is often multi-core. This access pattern is ideal for percpu_ref as the first and last reference during bind/unbind normally book-ends activity in the hot path. __net_devmem_dmabuf_binding_free becomes the percpu_ref callback invoked when the last reference is dropped. kperf test: - 4MB message sizes - 60s of workload each run - 5 runs - 4 flows Throughput: Before: 45.31 GB/s (+/- 3.17 GB/s) After: 48.67 GB/s (+/- 0.01 GB/s) Picking throughput-matched kperf runs (both before and after matched at ~48 GB/s) for apples-to-apples comparison: Summary (averaged across 4 workers): TX worker CPU idle %: Before: 34.44% After: 87.13% RX worker CPU idle %: Before: 5.38% After: 9.73% kperf before: client: == Source client: Tx 98.100 Gbps (735764807680 bytes in 60001149 usec) client: Tx102.798 Gbps (770996961280 bytes in 60001149 usec) client: Tx101.534 Gbps (761517834240 bytes in 60001149 usec) client: Tx 82.794 Gbps (620966707200 bytes in 60001149 usec) client: net CPU 56: usr: 0.01% sys: 0.12% idle:17.06% iow: 0.00% irq: 9.89% sirq:72.91% client: app CPU 60: usr: 0.08% sys:63.30% idle:36.24% iow: 0.00% irq: 0.30% sirq: 0.06% client: net CPU 57: usr: 0.03% sys: 0.08% idle:75.68% iow: 0.00% irq: 2.96% sirq:21.23% client: app CPU 61: usr: 0.06% sys:67.67% idle:31.94% iow: 0.00% irq: 0.28% sirq: 0.03% client: net CPU 58: usr: 0.01% sys: 0.06% idle:76.87% iow: 0.00% irq: 2.84% sirq:20.19% client: app CPU 62: usr: 0.06% sys:69.78% idle:29.79% iow: 0.00% irq: 0.30% sirq: 0.05% client: net CPU 59: usr: 0.06% sys: 0.16% idle:74.97% iow: 0.00% irq: 3.76% sirq:21.03% client: app CPU 63: usr: 0.06% sys:59.82% idle:39.80% iow: 0.00% irq: 0.25% sirq: 0.05% client: == Target client: Rx 98.092 Gbps (735764807680 bytes in 60006084 usec) client: Rx102.785 Gbps (770962161664 bytes in 60006084 usec) client: Rx101.523 Gbps (761499566080 bytes in 60006084 usec) client: Rx 82.783 Gbps (620933136384 bytes in 60006084 usec) client: net CPU 2: usr: 0.00% sys: 0.01% idle:24.51% iow: 0.00% irq: 1.67% sirq:73.79% client: app CPU 6: usr: 1.51% sys:96.43% idle: 1.13% iow: 0.00% irq: 0.36% sirq: 0.55% client: net CPU 1: usr: 0.00% sys: 0.01% idle:25.18% iow: 0.00% irq: 1.99% sirq:72.80% client: app CPU 5: usr: 2.21% sys:94.54% idle: 2.54% iow: 0.00% irq: 0.38% sirq: 0.30% client: net CPU 3: usr: 0.00% sys: 0.01% idle:26.34% iow: 0.00% irq: 2.12% sirq:71.51% client: app CPU 7: usr: 2.22% sys:94.28% idle: 2.52% iow: 0.00% irq: 0.59% sirq: 0.37% client: net CPU 0: usr: 0.00% sys: 0.03% idle: 0.00% iow: 0.00% irq:10.44% sirq:89.51% client: app CPU 4: usr: 2.39% sys:81.46% idle:15.33% iow: 0.00% irq: 0.50% sirq: 0.30% kperf after: client: == Source client: Tx 99.257 Gbps (744447016960 bytes in 60001303 usec) client: Tx101.013 Gbps (757617131520 bytes in 60001303 usec) client: Tx 88.179 Gbps (661357854720 bytes in 60001303 usec) client: Tx101.002 Gbps (757533245440 bytes in 60001303 usec) client: net CPU 56: usr: 0.00% sys: 0.01% idle: 6.22% iow: 0.00% irq: 8.68% sirq:85.06% client: app CPU 60: usr: 0.08% sys:12.56% idle:87.21% iow: 0.00% irq: 0.08% sirq: 0.05% client: net CPU 57: usr: 0.00% sys: 0.05% idle:69.53% iow: 0.00% irq: 2.02% sirq:28.38% client: app CPU 61: usr: 0.11% sys:13.40% idle:86.36% iow: 0.00% irq: 0.08% sirq: 0.03% client: net CPU 58: usr: 0.00% sys: 0.03% idle:70.04% iow: 0.00% irq: 3.38% sirq:26.53% client: app CPU 62: usr: 0.10% sys:11.46% idle:88.31% iow: 0.00% irq: 0.08% sirq: 0.03% client: net CPU 59: usr: 0.01% sys: 0.06% idle:71.18% iow: 0.00% irq: 1.97% sirq:26.75% client: app CPU 63: usr: 0.10% sys:13.10% idle:86.64% iow: 0.00% irq: 0.10% sirq: 0.05% client: == Target client: Rx 99.250 Gbps (744415182848 bytes in 60003297 usec) client: Rx101.006 Gbps (757589737472 bytes in 60003297 usec) client: Rx 88.171 Gbps (661319475200 bytes in 60003297 usec) client: Rx100.996 Gbps (757514792960 bytes in 60003297 usec) client: net CPU 2: usr: 0.00% sys: 0.01% idle:28.02% iow: 0.00% irq: 1.95% sirq:70.00% client: app CPU 6: usr: 2.03% sys:87.20% idle:10.04% iow: 0.00% irq: 0.37% sirq: 0.33% client: net CPU 3: usr: 0.00% sys: 0.00% idle:27.63% iow: 0.00% irq: 1.90% sirq:70.45% client: app CPU 7: usr: 1.78% sys:89.70% idle: 7.79% iow: 0.00% irq: 0.37% sirq: 0.34% client: net CPU 0: usr: 0.00% sys: 0.01% idle: 0.00% iow: 0.00% irq: 9.96% sirq:90.01% client: app CPU 4: usr: 2.33% sys:83.51% idle:13.24% iow: 0.00% irq: 0.64% sirq: 0.26% client: net CPU 1: usr: 0.00% sys: 0.01% idle:27.60% iow: 0.00% irq: 1.94% sirq:70.43% client: app CPU 5: usr: 1.88% sys:89.61% idle: 7.86% iow: 0.00% irq: 0.35% sirq: 0.27% Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260107-upstream-precpu-ref-v2-v2-1-a709f098b3dc@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-12 18:00:12 -08:00
Jakub Kicinski	669aa3e3fa	Merge tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== First set of changes for the current -next cycle, of note: - ath12k gets an overhaul to support multi-wiphy device wiphy and pave the way for future device support in the same driver (rather than splitting to ath13k) - mac80211 gets some better iteration macros * tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (120 commits) wifi: mac80211: remove width argument from ieee80211_parse_bitrates wifi: mac80211_hwsim: remove NAN by default wifi: mac80211: improve station iteration ergonomics wifi: mac80211: improve interface iteration ergonomics wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel wifi: mac80211: unexport ieee80211_get_bssid() wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version wifi: iwlegacy: 3945-rs: remove redundant pointer check in il3945_rs_tx_status() and il3945_rs_get_rate() wifi: mac80211: don't send an unused argument to ieee80211_check_combinations wifi: libertas: fix WARNING in usb_tx_block wifi: mwifiex: Allocate dev name earlier for interface workqueue name wifi: wlcore: sdio: Use pm_ptr instead of #ifdef CONFIG_PM wifi: cfg80211: Fix use_for flag update on BSS refresh wifi: brcmfmac: rename function that frees vif wifi: brcmfmac: fix/add kernel-doc comments wifi: mac80211: Update csa_finalize to use link_id wifi: cfg80211: add cfg80211_stop_link() for per-link teardown wifi: ath12k: Skip DP peer creation for scan vdev wifi: ath12k: move firmware stats request outside of atomic context wifi: ath12k: add the missing RCU lock in ath12k_dp_tx_free_txbuf() ... ==================== Link: https://patch.msgid.link/20260112185836.378736-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-12 17:02:02 -08:00
Miri Korenblit	46e7ced3ef	wifi: mac80211: remove width argument from ieee80211_parse_bitrates The width parameter in ieee80211_parse_bitrates() is unused. Remove it. While at it, use the already fetched sband pointer as an argument instead of dereferencing it once again. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260108143257.d13dbbda93f0.Ie70b24af583e3812883b4004ce227e7af1646855@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:48:41 +01:00
Johannes Berg	f813117f20	wifi: mac80211: improve station iteration ergonomics Right now, the only way to iterate stations is to declare an iterator function, possibly data structure to use, and pass all that to the iteration helper function. This is annoying, and there's really no inherent need for it. Add a new for_each_station() macro that does the iteration in a more ergonomic way. To avoid even more exported functions, do the old ieee80211_iterate_stations_mtx() as an inline using the new way, which may also let the compiler optimise it a bit more, e.g. via inlining the iterator function. Link: https://patch.msgid.link/20260108143431.d2b641f6f6af.I4470024f7404446052564b15bcf8b3f1ada33655@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:48:17 +01:00
Johannes Berg	6b3bafa2bd	wifi: mac80211: improve interface iteration ergonomics Right now, the only way to iterate interfaces is to declare an iterator function, possibly data structure to use, and pass all that to the iteration helper function. This is annoying, and there's really no inherent need for it, except it was easier to implement with the iflist mutex, but that's not used much now. Add a new for_each_interface() macro that does the iteration in a more ergonomic way. To avoid even more exported functions, do the old ieee80211_iterate_active_interfaces_mtx() as an inline using the new way, which may also let the compiler optimise it a bit more, e.g. via inlining the iterator function. Also provide for_each_active_interface() for the common case of just iterating active interfaces. Link: https://patch.msgid.link/20260108143431.f2581e0c381a.Ie387227504c975c109c125b3c57f0bb3fdab2835@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:48:17 +01:00
Lachlan Hodges	e1cbdf78f6	wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel When sending a channel ensure we include the IEEE80211_CHAN_S1G_NO_PRIMARY flag. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20260109081439.3168-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:47:35 +01:00
Johannes Berg	391234eb48	wifi: mac80211: unexport ieee80211_get_bssid() This is only used within mac80211, and not even declared in a public header file. Don't export it. Link: https://patch.msgid.link/20260109095029.2b4d2fe53fc9.I9f5fa5c84cd42f749be0b87cc61dac8631c4c6d0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:47:25 +01:00
Miri Korenblit	33821a2b20	wifi: mac80211: don't send an unused argument to ieee80211_check_combinations When ieee80211_check_combinations is called with NULL as the chandef, the chanmode argument is not relevant. Send a don't care (0) instead. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260111192411.9aa743647b43.I407b3d878d94464ce01e25f16c6e2b687bcd8b5a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-12 19:42:39 +01:00
Jakub Kicinski	59ba823e68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc5). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-08 11:38:33 -08:00
Eric Dumazet	c92510f5e3	arp: do not assume dev_hard_header() does not change skb->head arp_create() is the only dev_hard_header() caller making assumption about skb->head being unchanged. A recent commit broke this assumption. Initialize @arp pointer after dev_hard_header() call. Fixes: `db5b4e39c4` ("ip6_gre: make ip6gre_header() robust") Reported-by: syzbot+58b44a770a1585795351@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260107212250.384552-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-08 09:04:24 -08:00
Jakub Kicinski	804809ae40	Merge tag 'wireless-2026-01-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of fixes: - mac80211: - long-standing injection bug due to chanctx rework - more recent interface iteration issue - collect statistics before removing stations - hwsim: - fix NAN frequency typo (potential NULL ptr deref) - fix locking of radio lock (needs softirqs disabled) - wext: - ancient issue with compat and events copying some uninitialized stack data to userspace * tag 'wireless-2026-01-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: collect station statistics earlier when disconnect wifi: mac80211: restore non-chanctx injection behaviour wifi: mac80211_hwsim: disable BHs for hwsim_radio_lock wifi: mac80211: don't iterate not running interfaces wifi: mac80211_hwsim: fix typo in frequency notification wifi: avoid kernel-infoleak from struct iw_point ==================== Link: https://patch.msgid.link/20260108140141.139687-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-08 08:49:24 -08:00
Willem de Bruijn	7d11e047ed	net: do not write to msg_get_inq in callee NULL pointer dereference fix. msg_get_inq is an input field from caller to callee. Don't set it in the callee, as the caller may not clear it on struct reuse. This is a kernel-internal variant of msghdr only, and the only user does reinitialize the field. So this is not critical for that reason. But it is more robust to avoid the write, and slightly simpler code. And it fixes a bug, see below. Callers set msg_get_inq to request the input queue length to be returned in msg_inq. This is equivalent to but independent from the SO_INQ request to return that same info as a cmsg (tp->recvmsg_inq). To reduce branching in the hot path the second also sets the msg_inq. That is WAI. This is a fix to commit `4d1442979e` ("af_unix: don't post cmsg for SO_INQ unless explicitly asked for"), which fixed the inverse. Also avoid NULL pointer dereference in unix_stream_read_generic if state->msg is NULL and msg->msg_get_inq is written. A NULL state->msg can happen when splicing as of commit `2b514574f7` ("net: af_unix: implement splice for stream af_unix sockets"). Also collapse two branches using a bitwise or. Cc: stable@vger.kernel.org Fixes: `4d1442979e` ("af_unix: don't post cmsg for SO_INQ unless explicitly asked for") Link: https://lore.kernel.org/netdev/willemdebruijn.kernel.24d8030f7a3de@gmail.com/ Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260106150626.3944363-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-08 08:45:13 -08:00
Xiang Mei	c1d73b1480	net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset `qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class itself is active. Two qfq_class objects may point to the same leaf_qdisc. This happens when: 1. one QFQ qdisc is attached to the dev as the root qdisc, and 2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get() / qdisc_put()) and is pending to be destroyed, as in function tc_new_tfilter. When packets are enqueued through the root QFQ qdisc, the shared leaf_qdisc->q.qlen increases. At the same time, the second QFQ qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters qfq_reset() with its own q->q.qlen == 0, but its class's leaf qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate an inactive aggregate and trigger a null-deref in qfq_deactivate_agg: [ 0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 0.903571] #PF: supervisor write access in kernel mode [ 0.903860] #PF: error_code(0x0002) - not-present page [ 0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0 [ 0.904502] Oops: Oops: 0002 [#1] SMP NOPTI [ 0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE [ 0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.909179] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.909572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.910247] PKRU: 55555554 [ 0.910391] Call Trace: [ 0.910527] <TASK> [ 0.910638] qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485) [ 0.910826] qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036) [ 0.911040] __qdisc_destroy (net/sched/sch_generic.c:1076) [ 0.911236] tc_new_tfilter (net/sched/cls_api.c:2447) [ 0.911447] rtnetlink_rcv_msg (net/core/rtnetlink.c:6958) [ 0.911663] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861) [ 0.911894] netlink_rcv_skb (net/netlink/af_netlink.c:2550) [ 0.912100] netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) [ 0.912296] ? __alloc_skb (net/core/skbuff.c:706) [ 0.912484] netlink_sendmsg (net/netlink/af_netlink.c:1894) [ 0.912682] sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1)) [ 0.912880] vfs_write (fs/read_write.c:593 fs/read_write.c:686) [ 0.913077] ksys_write (fs/read_write.c:738) [ 0.913252] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 0.913438] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) [ 0.913687] RIP: 0033:0x424c34 [ 0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9 Code starting with the faulting instruction =========================================== 0: 89 02 mov %eax,(%rdx) 2: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax 9: eb bd jmp 0xffffffffffffffc8 b: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 15: 90 nop 16: f3 0f 1e fa endbr64 1a: 80 3d 2d 44 09 00 00 cmpb $0x0,0x9442d(%rip) # 0x9444e 21: 74 13 je 0x36 23: b8 01 00 00 00 mov $0x1,%eax 28: 0f 05 syscall 2a: 09 .byte 0x9 [ 0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34 [ 0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003 [ 0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000 [ 0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28 [ 0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001 [ 0.917039] </TASK> [ 0.917158] Modules linked in: [ 0.917316] CR2: 0000000000000000 [ 0.917484] ---[ end trace 0000000000000000 ]--- [ 0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.921014] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.921424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.922097] PKRU: 55555554 [ 0.922240] Kernel panic - not syncing: Fatal exception [ 0.922590] Kernel Offset: disabled Fixes: `0545a30377` ("pkt_sched: QFQ - quick fair queue scheduler") Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-08 08:22:28 -08:00
Baochen Qiang	a203dbeeca	wifi: mac80211: collect station statistics earlier when disconnect In __sta_info_destroy_part2(), station statistics are requested after the IEEE80211_STA_NONE -> IEEE80211_STA_NOTEXIST transition. This is problematic because the driver may be unable to handle the request due to the STA being in the NOTEXIST state (i.e. if the driver destroys the underlying data when transitioning to NOTEXIST). Move the statistics collection to before the state transition to avoid this issue. Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251222-mac80211-move-station-stats-collection-earlier-v1-1-12cd4e42c633@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:33:11 +01:00
Johannes Berg	d594cc6f2c	wifi: mac80211: restore non-chanctx injection behaviour During the transition to use channel contexts throughout, the ability to do injection while in monitor mode concurrent with another interface was lost, since the (virtual) monitor won't have a chanctx assigned in this scenario. It's harder to fix drivers that actually transitioned to using channel contexts themselves, such as mt76, but it's easy to do those that are (still) just using the emulation. Do that. Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=218763 Reported-and-tested-by: Oscar Alfonso Diaz <oscar.alfonso.diaz@gmail.com> Fixes: `0a44dfc070` ("wifi: mac80211: simplify non-chanctx drivers") Link: https://patch.msgid.link/20251216105242.18366-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:33:10 +01:00
Miri Korenblit	c0d82ba961	wifi: mac80211: don't iterate not running interfaces for_each_chanctx_user_* was introdcued as a replacement for for_each_sdata_link, which visits also other chanctx users that are not link. for_each_sdata_link skips not running interfaces, do the same for for_each_chanctx_user_* Fixes: `1ce954c98b` ("wifi: mac80211: add and use chanctx usage iteration") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260107143736.55c084e2a976.I38b7b904a135dadca339321923b501b2c2c5c8c0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:33:10 +01:00
Eric Dumazet	21cbf883d0	wifi: avoid kernel-infoleak from struct iw_point struct iw_point has a 32bit hole on 64bit arches. struct iw_point { void __user pointer; / Pointer to the data (in user space) / __u16 length; / number of fields or size in bytes / __u16 flags; / Optional params */ }; Make sure to zero the structure to avoid disclosing 32bits of kernel data to user space. Fixes: `87de87d5e4` ("wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c") Reported-by: syzbot+bfc7323743ca6dbcc3d3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/695f83f3.050a0220.1c677c.0392.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260108101927.857582-1-edumazet@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:33:05 +01:00
Huang Chenming	4073ea5161	wifi: cfg80211: Fix use_for flag update on BSS refresh Userspace may fail to connect to certain BSS that were initially marked as unusable due to regulatory restrictions (use_for = 0, e.g., 6 GHz power type mismatch). Even after these restrictions are removed and the BSS becomes usable, connection attempts still fail. The issue occurs in cfg80211_update_known_bss() where the use_for flag is updated using bitwise AND (&=) instead of direct assignment. Once a BSS is marked with use_for = 0, the AND operation masks out any subsequent non-zero values, permanently keeping the flag at 0. This causes __cfg80211_get_bss(), invoked by nl80211_assoc_bss(), to fail the check "(bss->pub.use_for & use_for) != use_for", thereby blocking association. Replace the bitwise AND operation with direct assignment so the use_for flag accurately reflects the current BSS state. Fixes: `d02a12b8e4` ("wifi: cfg80211: add BSS usage reporting") Signed-off-by: Huang Chenming <chenming.huang@oss.qualcomm.com> Link: https://patch.msgid.link/20251209025733.2098456-1-chenming.huang@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:23:08 +01:00
Aditya Kumar Singh	bfcc957b6a	wifi: mac80211: Update csa_finalize to use link_id With cfg80211_stop_link() adding support to stop a link in AP/P2P_GO mode, in failure cases only the corresponding link can be stopped, instead of stopping the whole interface. Hence, invoke cfg80211_stop_link() directly with the link_id set for AP/P2P_GO mode when CSA finalization fails. Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com> Signed-off-by: Manish Dharanenthiran <manish.dharanenthiran@oss.qualcomm.com> Link: https://patch.msgid.link/20251127-stop_link-v2-2-43745846c5fd@qti.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:11:01 +01:00
Manish Dharanenthiran	dc4b176cce	wifi: cfg80211: add cfg80211_stop_link() for per-link teardown Currently, whenever cfg80211_stop_iface() is called, the entire iface is stopped. However, there could be a need in AP/P2P_GO mode, where one would like to stop a single link in MLO operation instead of the whole MLD interface. Hence, introduce cfg80211_stop_link() to allow drivers to tear down only a specified AP/P2P_GO link during MLO operation. Passing -1 preserves the existing behavior of stopping the whole interface. Make cfg80211_stop_iface() call this function by passing -1 to keep the default behavior the same, that is, to stop all links and use cfg80211_stop_link() with the desired link_id for AP/P2P_GO mode, to stop only that link. This brings no behavioral change for single-link/non-MLO interfaces, and enables drivers to stop an AP/P2P_GO link without disrupting other links on the same interface. Signed-off-by: Manish Dharanenthiran <manish.dharanenthiran@oss.qualcomm.com> Link: https://patch.msgid.link/20251127-stop_link-v2-1-43745846c5fd@qti.qualcomm.com [make cfg80211_stop_iface() inline] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-01-08 13:11:01 +01:00
Shivani Gupta	adb25a46dc	net/sched: act_api: avoid dereferencing ERR_PTR in tcf_idrinfo_destroy syzbot reported a crash in tc_act_in_hw() during netns teardown where tcf_idrinfo_destroy() passed an ERR_PTR(-EBUSY) value as a tc_action pointer, leading to an invalid dereference. Guard against ERR_PTR entries when iterating the action IDR so teardown does not call tc_act_in_hw() on an error pointer. Fixes: `84a7d6797e` ("net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit()") Link: https://syzkaller.appspot.com/bug?extid=8f1c492ffa4644ff3826 Reported-by: syzbot+8f1c492ffa4644ff3826@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8f1c492ffa4644ff3826 Signed-off-by: Shivani Gupta <shivani07g@gmail.com> Link: https://patch.msgid.link/20260105005905.243423-1-shivani07g@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:27:18 -08:00
Eric Dumazet	27a01c1969	net: fully inline backlog_unlock_irq_restore() Some arches (like x86) do not inline spin_unlock_irqrestore(). backlog_unlock_irq_restore() is in RPS/RFS critical path, we prefer using spin_unlock() + local_irq_restore() for optimal performance. Also change backlog_unlock_irq_restore() second argument to avoid a pointless dereference. No difference in net/core/dev.o code size. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260105163054.13698-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:14:35 -08:00
Eric Dumazet	e9cd04b281	udp: udplite is unlikely Add some unlikely() annotations to speed up the fast path, at least with clang compiler. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260105101719.2378881-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:06:03 -08:00
Eric Dumazet	e5c8eda39a	udp: call skb_orphan() before skb_attempt_defer_free() Standard UDP receive path does not use skb->destructor. But skmsg layer does use it, since it calls skb_set_owner_sk_safe() from udp_read_skb(). This then triggers this warning in skb_attempt_defer_free(): DEBUG_NET_WARN_ON_ONCE(skb->destructor); We must call skb_orphan() to fix this issue. Fixes: `6471658dc6` ("udp: use skb_attempt_defer_free()") Reported-by: syzbot+3e68572cf2286ce5ebe9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/695b83bd.050a0220.1c9965.002b.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260105093630.1976085-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:05:17 -08:00
Gustavo A. R. Silva	c86af46b9c	ipv4/inet_sock.h: Avoid thousands of -Wflex-array-member-not-at-end warnings Use DEFINE_RAW_FLEX() to avoid thousands of -Wflex-array-member-not-at-end warnings. Remove struct ip_options_data, and adjust the rest of the code so that flexible-array member struct ip_options_rcu::opt.__data[] ends last in struct icmp_bxm. Compensate for this by using the DEFINE_RAW_FLEX() helper to define each on-stack struct instance that contained struct ip_options_data as a member, and to define struct ip_options_rcu with a fixed on-stack size for its nested flexible-array member opt.__data[]. Also, add a couple of code comments to prevent people from adding members to a struct after another member that contains a flexible array. With these changes, fix 2600 warnings of the following type: include/net/inet_sock.h:65:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/aVteBadWA6AbTp7X@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 17:02:52 -08:00
Yumei Huang	cb3de96eea	ipv6: preserve insertion order for same-scope addresses IPv6 addresses with the same scope are returned in reverse insertion order, unlike IPv4. For example, when adding a -> b -> c, the list is reported as c -> b -> a, while IPv4 preserves the original order. This behavior causes: a. When using `ip -6 a save` and `ip -6 a restore`, addresses are restored in the opposite order from which they were saved. See example below showing addresses added as 1::1, 1::2, 1::3 but displayed and saved in reverse order. # ip -6 a a 1::1 dev x # ip -6 a a 1::2 dev x # ip -6 a a 1::3 dev x # ip -6 a s dev x 2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 inet6 1::3/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::2/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::1/128 scope global tentative valid_lft forever preferred_lft forever # ip -6 a save > dump # ip -6 a d 1::1 dev x # ip -6 a d 1::2 dev x # ip -6 a d 1::3 dev x # ip a d ::1 dev lo # ip a restore < dump # ip -6 a s dev x 2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 inet6 1::1/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::2/128 scope global tentative valid_lft forever preferred_lft forever inet6 1::3/128 scope global tentative valid_lft forever preferred_lft forever # ip a showdump < dump if1: inet6 ::1/128 scope host proto kernel_lo valid_lft forever preferred_lft forever if2: inet6 1::3/128 scope global tentative valid_lft forever preferred_lft forever if2: inet6 1::2/128 scope global tentative valid_lft forever preferred_lft forever if2: inet6 1::1/128 scope global tentative valid_lft forever preferred_lft forever b. Addresses in pasta to appear in reversed order compared to host addresses. The ipv6 addresses were added in reverse order by commit `e55ffac601` ("[IPV6]: order addresses by scope"), then it was changed by commit `502a2ffd73` ("ipv6: convert idev_list to list macros"), and restored by commit `b54c9b98bb` ("ipv6: Preserve pervious behavior in ipv6_link_dev_addr()."). However, this reverse ordering within the same scope causes inconsistency with IPv4 and the issues described above. This patch aligns IPv6 address ordering with IPv4 for consistency by changing the comparison from >= to > when inserting addresses into the address list. Also updates the ioam6 selftest to reflect the new address ordering behavior. Combine these two changes into one patch for bisectability. Link: https://bugs.passt.top/show_bug.cgi?id=175 Suggested-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Yumei Huang <yuhuang@redhat.com> Acked-by: Justin Iurman <justin.iurman@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260104032357.38555-1-yuhuang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-06 16:41:35 -08:00
Mohammad Heib	238e03d046	net: fix memory leak in skb_segment_list for GRO packets When skb_segment_list() is called during packet forwarding, it handles packets that were aggregated by the GRO engine. Historically, the segmentation logic in skb_segment_list assumes that individual segments are split from a parent SKB and may need to carry their own socket memory accounting. Accordingly, the code transfers truesize from the parent to the newly created segments. Prior to commit `ed4cccef64` ("gro: fix ownership transfer"), this truesize subtraction in skb_segment_list() was valid because fragments still carry a reference to the original socket. However, commit `ed4cccef64` ("gro: fix ownership transfer") changed this behavior by ensuring that fraglist entries are explicitly orphaned (skb->sk = NULL) to prevent illegal orphaning later in the stack. This change meant that the entire socket memory charge remained with the head SKB, but the corresponding accounting logic in skb_segment_list() was never updated. As a result, the current code unconditionally adds each fragment's truesize to delta_truesize and subtracts it from the parent SKB. Since the fragments are no longer charged to the socket, this subtraction results in an effective under-count of memory when the head is freed. This causes sk_wmem_alloc to remain non-zero, preventing socket destruction and leading to a persistent memory leak. The leak can be observed via KMEMLEAK when tearing down the networking environment: unreferenced object 0xffff8881e6eb9100 (size 2048): comm "ping", pid 6720, jiffies 4295492526 backtrace: kmem_cache_alloc_noprof+0x5c6/0x800 sk_prot_alloc+0x5b/0x220 sk_alloc+0x35/0xa00 inet6_create.part.0+0x303/0x10d0 __sock_create+0x248/0x640 __sys_socket+0x11b/0x1d0 Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST packets constructed by GRO, the truesize adjustment is removed. The call to skb_release_head_state() must be preserved. As documented in commit `cf673ed0e0` ("net: fix fraglist segmentation reference count leak"), it is still required to correctly drop references to SKB extensions that may be overwritten during __copy_skb_header(). Fixes: `ed4cccef64` ("gro: fix ownership transfer") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260104213101.352887-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-05 17:01:28 -08:00
Jamal Hadi Salim	9892353726	net/sched: act_mirred: Fix leak when redirecting to self on egress Whenever a mirred redirect to self on egress happens, mirred allocates a new skb (skb_to_send). The loop to self check was done after that allocation, but was not freeing the newly allocated skb, causing a leak. Fix this by moving the if-statement to before the allocation of the new skb. The issue was found by running the accompanying tdc test in 2/2 with config kmemleak enabled. After a few minutes the kmemleak thread ran and reported the leak coming from mirred. Fixes: `1d856251a0` ("net/sched: act_mirred: fix loop detection") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260101135608.253079-2-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-05 16:23:42 -08:00
Clara Engler	48a4aa9d9c	ipv4: Improve martian logs At the current moment, the logs for martian packets are as follows: ``` martian source {DST} from {SRC}, on dev {DEV} martian destination {DST} from {SRC}, dev {DEV} ``` These messages feel rather hard to understand in production, especially the "martian source" one, mostly because it is grammatically ambitious to parse which part is now the source address and which part is the destination address. For example, "{DST}" may there be interpreted as the actual source address due to following the word "source", thereby implying the actual source address to be the destination one. Personally, I discovered this bug while toying around with TUN interfaces and using them as a tunnel (receiving packets via a TUN interface and sending them over a TCP stream; receiving packets from a TCP stream and writing them to a TUN).[^1] When these IP addresses contained local IPs (i.e. 10.0.0.0/8 in source and destination), everything worked fine. However, sending them to a real routable IP address on the internet led to them being treated as a martian packet, obviously. Using a few sysctl(8) and iptables(8) settings[^2] fixed it, but while debugging I found the log message starting with "martian source" rather confusing, as I was unsure on whether the packet that gets dropped was the packet originating from me or the response from the endpoint, as "martian source <ROUTABLE IP>" could also be falsely interpreted as the response packet being martian, due to the word "source" followed by the routable IP address, implying the source address of that packet is set to this IP, as explained above. In the end, I had to look into the source code of the kernel on where this error message gets generated, which is usually an indicator of there being room for improvement with regard to this error message. In terms of improvement, this commit changes the error messages for martian source and martian destination packets as follows: ``` martian source (src={SRC}, dst={DST}, dev={DEV}) martian destination (src={SRC}, dst={DST}, dev={DEV}) ``` These new wordings leave pretty much no room for ambiguity as all parameters are prefixed with a respective key explaining their semantic meaning. See also the following thread on LKML.[^3] [^1]: <https://backreference.org/2010/03/26/tuntap-interface-tutorial> [^2]: sysctl net.ipv4.ip_forward=1 && \ iptables -A INPUT -i tun0 -j ACCEPT && \ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE [^3]: <https://lore.kernel.org/all/aSd4Xj8rHrh-krjy@4944566b5c925f79/> Signed-off-by: Clara Engler <cve@cve.cx> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260101125114.2608-1-cve@cve.cx Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-05 16:15:50 -08:00
Michal Luczaj	ce5e612dd4	vsock: Make accept()ed sockets use custom setsockopt() SO_ZEROCOPY handling in vsock_connectible_setsockopt() does not get called on accept()ed sockets due to a missing flag. Flip it. Fixes: `e0718bd82e` ("vsock: enable setting SO_ZEROCOPY") Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-1-64778d6c4f88@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-05 16:14:50 -08:00
Jakub Kicinski	d6f6c6d909	Merge tag 'nf-26-01-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net The following patchset contains Netfilter fixes for net: 1) Fix overlap detection for nf_tables with concatenated ranges. There are cases where element could not be added due to a conflict with existing range, while kernel reports success to userspace. 2) update selftest to cover this bug. 3) synproxy update path should use READ/WRITE once as we replace config struct while packet path might read it in parallel. This relies on said config struct to fit sizeof(long). From Fernando Fernandez Mancera. 4) Don't return -EEXIST from xtables in module load path, a pending patch to module infra will spot a warning if this happens. From Daniel Gomez. 5) Fix a memory leak in nf_tables when chain hits 2*32 users and rule is to be hw-offloaded, from Zilin Guan. 6) Avoid infinite list growth when insert rate is high in nf_conncount, also from Fernando. tag 'nf-26-01-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conncount: update last_gc only when GC has been performed netfilter: nf_tables: fix memory leak in nf_tables_newrule() netfilter: replace -EEXIST with -EBUSY netfilter: nft_synproxy: avoid possible data-race on update operation selftests: netfilter: nft_concat_range.sh: add check for overlap detection bug netfilter: nft_set_pipapo: fix range overlap detection ==================== Link: https://patch.msgid.link/20260102114128.7007-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 10:59:59 -08:00
Florian Westphal	2ef02ac38d	inet: frags: drop fraglist conntrack references Jakub added a warning in nf_conntrack_cleanup_net_list() to make debugging leaked skbs/conntrack references more obvious. syzbot reports this as triggering, and I can also reproduce this via ip_defrag.sh selftest: conntrack cleanup blocked for 60s WARNING: net/netfilter/nf_conntrack_core.c:2512 [..] conntrack clenups gets stuck because there are skbs with still hold nf_conn references via their frag_list. net.core.skb_defer_max=0 makes the hang disappear. Eric Dumazet points out that skb_release_head_state() doesn't follow the fraglist. ip_defrag.sh can only reproduce this problem since commit `6471658dc6` ("udp: use skb_attempt_defer_free()"), but AFAICS this problem could happen with TCP as well if pmtu discovery is off. The relevant problem path for udp is: 1. netns emits fragmented packets 2. nf_defrag_v6_hook reassembles them (in output hook) 3. reassembled skb is tracked (skb owns nf_conn reference) 4. ip6_output refragments 5. refragmented packets also own nf_conn reference (ip6_fragment calls ip6_copy_metadata()) 6. on input path, nf_defrag_v6_hook skips defragmentation: the fragments already have skb->nf_conn attached 7. skbs are reassembled via ipv6_frag_rcv() 8. skb_consume_udp -> skb_attempt_defer_free() -> skb ends up in pcpu freelist, but still has nf_conn reference. Possible solutions: 1 let defrag engine drop nf_conn entry, OR 2 export kick_defer_list_purge() and call it from the conntrack netns exit callback, OR 3 add skb_has_frag_list() check to skb_attempt_defer_free() 2 & 3 also solve ip_defrag.sh hang but share same drawback: Such reassembled skbs, queued to socket, can prevent conntrack module removal until userspace has consumed the packet. While both tcp and udp stack do call nf_reset_ct() before placing skb on socket queue, that function doesn't iterate frag_list skbs. Therefore drop nf_conn entries when they are placed in defrag queue. Keep the nf_conn entry of the first (offset 0) skb so that reassembled skb retains nf_conn entry for sake of TX path. Note that fixes tag is incorrect; it points to the commit introducing the 'ip_defrag.sh reproducible problem': no need to backport this patch to every stable kernel. Reported-by: syzbot+4393c47753b7808dac7d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/693b0fa7.050a0220.4004e.040d.GAE@google.com/ Fixes: `6471658dc6` ("udp: use skb_attempt_defer_free()") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260102140030.32367-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 10:58:04 -08:00
Weiming Shi	2a71a1a8d0	net: sock: fix hardened usercopy panic in sock_recv_errqueue skbuff_fclone_cache was created without defining a usercopy region, [1] unlike skbuff_head_cache which properly whitelists the cb[] field. [2] This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled and the kernel attempts to copy sk_buff.cb data to userspace via sock_recv_errqueue() -> put_cmsg(). The crash occurs when: 1. TCP allocates an skb using alloc_skb_fclone() (from skbuff_fclone_cache) [1] 2. The skb is cloned via skb_clone() using the pre-allocated fclone [3] 3. The cloned skb is queued to sk_error_queue for timestamp reporting 4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE) 5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb [4] 6. __check_heap_object() fails because skbuff_fclone_cache has no usercopy whitelist [5] When cloned skbs allocated from skbuff_fclone_cache are used in the socket error queue, accessing the sock_exterr_skb structure in skb->cb via put_cmsg() triggers a usercopy hardening violation: [ 5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)! [ 5.382796] kernel BUG at mm/usercopy.c:102! [ 5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI [ 5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7 [ 5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 5.384903] RIP: 0010:usercopy_abort+0x6c/0x80 [ 5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490 [ 5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246 [ 5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74 [ 5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0 [ 5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74 [ 5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001 [ 5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00 [ 5.384903] FS: 0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000 [ 5.384903] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0 [ 5.384903] PKRU: 55555554 [ 5.384903] Call Trace: [ 5.384903] <TASK> [ 5.384903] __check_heap_object+0x9a/0xd0 [ 5.384903] __check_object_size+0x46c/0x690 [ 5.384903] put_cmsg+0x129/0x5e0 [ 5.384903] sock_recv_errqueue+0x22f/0x380 [ 5.384903] tls_sw_recvmsg+0x7ed/0x1960 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? schedule+0x6d/0x270 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5.384903] ? mutex_unlock+0x81/0xd0 [ 5.384903] ? __pfx_mutex_unlock+0x10/0x10 [ 5.384903] ? __pfx_tls_sw_recvmsg+0x10/0x10 [ 5.384903] ? _raw_spin_lock_irqsave+0x8f/0xf0 [ 5.384903] ? _raw_read_unlock_irqrestore+0x20/0x40 [ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5 The crash offset 296 corresponds to skb2->cb within skbuff_fclones: - sizeof(struct sk_buff) = 232 - offsetof(struct sk_buff, cb) = 40 - offset of skb2.cb in fclones = 232 + 40 = 272 - crash offset 296 = 272 + 24 (inside sock_exterr_skb.ee) This patch uses a local stack variable as a bounce buffer to avoid the hardened usercopy check failure. [1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885 [2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104 [3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566 [4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491 [5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719 Fixes: `6d07d1cd30` ("usercopy: Restrict non-usercopy caches to size 0") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251223203534.1392218-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 09:54:32 -08:00
yuan.gao	4c0856c225	inet: ping: Fix icmp out counting When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO messages, ICMP_MIB_OUTMSGS is counted twice. ping_v4_sendmsg ping_v4_push_pending_frames ip_push_pending_frames ip_finish_skb __ip_make_skb icmp_out_count(net, icmp_type); // first count icmp_out_count(sock_net(sk), user_icmph.type); // second count However, when the ping program uses an IPPROTO_RAW socket, ICMP_MIB_OUTMSGS is counted correctly only once. Therefore, the first count should be removed. Fixes: `c319b4d76b` ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: yuan.gao <yuan.gao@ucloud.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251224063145.3615282-1-yuan.gao@ucloud.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 09:48:53 -08:00
Alexandre Knecht	3128df6be1	bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is incorrectly stripped from frames during egress processing. br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also moves any "next" VLAN from the payload into hwaccel: /* move next vlan tag to hw accel tag */ __skb_vlan_pop(skb, &vlan_tci); __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci); For QinQ frames where the C-VLAN sits in the payload, this moves it to hwaccel where it gets lost during VXLAN encapsulation. Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only the hwaccel S-VLAN and leaves the payload untouched. This path is only taken when vlan_tunnel is enabled and tunnel_info is configured, so 802.1Q bridges are unaffected. Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN preserved in VXLAN payload via tcpdump. Fixes: `11538d039a` ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 09:45:35 -08:00
Fernando Fernandez Mancera	7811ba4524	netfilter: nf_conncount: update last_gc only when GC has been performed Currently last_gc is being updated everytime a new connection is tracked, that means that it is updated even if a GC wasn't performed. With a sufficiently high packet rate, it is possible to always bypass the GC, causing the list to grow infinitely. Update the last_gc value only when a GC has been actually performed. Fixes: `d265929930` ("netfilter: nf_conncount: reduce unnecessary GC") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-02 10:44:28 +01:00
Zilin Guan	d077e8119d	netfilter: nf_tables: fix memory leak in nf_tables_newrule() In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to the err_release_rule label without freeing the allocated flow, leading to a memory leak. Fix this by adding a new label err_destroy_flow and jumping to it when nft_use_inc() fails. This ensures that the flow is properly released in this error case. Fixes: `1689f25924` ("netfilter: nf_tables: report use refcount overflow") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-02 10:44:28 +01:00
Daniel Gomez	2bafeb8d2f	netfilter: replace -EEXIST with -EBUSY The -EEXIST error code is reserved by the module loading infrastructure to indicate that a module is already loaded. When a module's init function returns -EEXIST, userspace tools like kmod interpret this as "module already loaded" and treat the operation as successful, returning 0 to the user even though the module initialization actually failed. Replace -EEXIST with -EBUSY to ensure correct error reporting in the module initialization path. Affected modules: * ebtable_broute ebtable_filter ebtable_nat arptable_filter * ip6table_filter ip6table_mangle ip6table_nat ip6table_raw * ip6table_security iptable_filter iptable_mangle iptable_nat * iptable_raw iptable_security Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-01 11:31:48 +01:00
Fernando Fernandez Mancera	36a3200575	netfilter: nft_synproxy: avoid possible data-race on update operation During nft_synproxy eval we are reading nf_synproxy_info struct which can be modified on update operation concurrently. As nf_synproxy_info struct fits in 32 bits, use READ_ONCE/WRITE_ONCE annotations. Fixes: `ee394f96ad` ("netfilter: nft_synproxy: add synproxy stateful object support") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-01 11:31:48 +01:00
Florian Westphal	7711f4bb4b	netfilter: nft_set_pipapo: fix range overlap detection set->klen has to be used, not sizeof(). The latter only compares a single register but a full check of the entire key is needed. Example: table ip t { map s { typeof iifname . ip saddr : verdict flags interval } } nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected nft add element t s '{ "lo" . 10.0.0.0/8 : drop }' # bug: no error The 3rd 'add element' should be rejected via -ENOTEMPTY, not -EEXIST, so userspace / nft can report an error to the user. The latter is only correct for the 2nd case (re-add of existing element). As-is, userspace is told that the command was successful, but no elements were added. After this patch, 3rd command gives: Error: Could not process rule: File exists add element t s { "lo" . 127.0.0.0/8 . "lo" : drop } ^^^^^^^^^^^^^^^^^^^^^^^^^ Fixes: `0eb4b5ee33` ("netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion") Signed-off-by: Florian Westphal <fw@strlen.de>	2026-01-01 11:31:48 +01:00
Linus Torvalds	dbf8fe85a1	Merge tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth and WiFi. Notably this includes the fix for the iwlwifi issue you reported. Current release - regressions: - core: avoid prefetching NULL pointers - wifi: - iwlwifi: implement settime64 as stub for MVM/MLD PTP - mac80211: fix list iteration in ieee80211_add_virtual_monitor() - handshake: fix null-ptr-deref in handshake_complete() - eth: mana: fix use-after-free in reset service rescan path Previous releases - regressions: - openvswitch: avoid needlessly taking the RTNL on vport destroy - dsa: properly keep track of conduit reference - ipv4: - fix error route reference count leak with nexthop objects - fib: restore ECMP balance from loopback - mptcp: ensure context reset on disconnect() - bluetooth: fix potential UaF in btusb - nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write - eth: - gve: defer interrupt enabling until NAPI registration - i40e: fix scheduling in set_rx_mode - macb: relocate mog_init_rings() callback from macb_mac_link_up() to macb_open() - rtl8150: fix memory leak on usb_submit_urb() failure - wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc() Previous releases - always broken: - ip6_gre: make ip6gre_header() robust - ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT - af_unix: don't post cmsg for SO_INQ unless explicitly asked for - phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration - wifi: mac80211: discard beacon frames to non-broadcast address - eth: - iavf: fix off-by-one issues in iavf_config_rss_reg() - stmmac: fix the crash issue for zero copy XDP_TX action - team: fix check for port enabled when priority changes" * tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT net: rose: fix invalid array index in rose_kill_by_device() net: enetc: do not print error log if addr is 0 net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open() selftests: fib_test: Add test case for ipv4 multi nexthops net: fib: restore ECMP balance from loopback selftests: fib_nexthops: Add test cases for error routes deletion ipv4: Fix reference count leak when using error routes with nexthop objects net: usb: sr9700: fix incorrect command used to write single register ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr() usbnet: avoid a possible crash in dql_completed() gve: defer interrupt enabling until NAPI registration net: stmmac: fix the crash issue for zero copy XDP_TX action octeontx2-pf: fix "UBSAN: shift-out-of-bounds error" af_unix: don't post cmsg for SO_INQ unless explicitly asked for net: mana: Fix use-after-free in reset service rescan path net: avoid prefetching NULL pointers net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct net: nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write net: usb: asix: validate PHY address before use ...	2025-12-30 08:45:58 -08:00
Jiayuan Chen	1adaea51c6	ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT On PREEMPT_RT kernels, after rt6_get_pcpu_route() returns NULL, the current task can be preempted. Another task running on the same CPU may then execute rt6_make_pcpu_route() and successfully install a pcpu_rt entry. When the first task resumes execution, its cmpxchg() in rt6_make_pcpu_route() will fail because rt6i_pcpu is no longer NULL, triggering the BUG_ON(prev). It's easy to reproduce it by adding mdelay() after rt6_get_pcpu_route(). Using preempt_disable/enable is not appropriate here because ip6_rt_pcpu_alloc() may sleep. Fix this by handling the cmpxchg() failure gracefully on PREEMPT_RT: free our allocation and return the existing pcpu_rt installed by another task. The BUG_ON is replaced by WARN_ON_ONCE for non-PREEMPT_RT kernels where such races should not occur. Link: https://syzkaller.appspot.com/bug?extid=9b35e9bc0951140d13e6 Fixes: `d2d6422f8b` ("x86: Allow to enable PREEMPT_RT.") Reported-by: syzbot+9b35e9bc0951140d13e6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6918cd88.050a0220.1c914e.0045.GAE@google.com/T/ Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20251223051413.124687-1-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-30 12:04:36 +01:00
Pwnverse	6595beb40f	net: rose: fix invalid array index in rose_kill_by_device() rose_kill_by_device() collects sockets into a local array[] and then iterates over them to disconnect sockets bound to a device being brought down. The loop mistakenly indexes array[cnt] instead of array[i]. For cnt < ARRAY_SIZE(array), this reads an uninitialized entry; for cnt == ARRAY_SIZE(array), it is an out-of-bounds read. Either case can lead to an invalid socket pointer dereference and also leaks references taken via sock_hold(). Fix the index to use i. Fixes: `64b8bc7d5f` ("net/rose: fix races in rose_kill_by_device()") Co-developed-by: Fatma Alwasmi <falwasmi@purdue.edu> Signed-off-by: Fatma Alwasmi <falwasmi@purdue.edu> Signed-off-by: Pwnverse <stanksal@purdue.edu> Link: https://patch.msgid.link/20251222212227.4116041-1-ritviktanksalkar@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-30 11:45:51 +01:00
Vadim Fedorenko	6e17474aa9	net: fib: restore ECMP balance from loopback Preference of nexthop with source address broke ECMP for packets with source addresses which are not in the broadcast domain, but rather added to loopback/dummy interfaces. Original behaviour was to balance over nexthops while now it uses the latest nexthop from the group. To fix the issue introduce next hop scoring system where next hops with source address equal to requested will always have higher priority. For the case with 198.51.100.1/32 assigned to dummy0 and routed using 192.0.2.0/24 and 203.0.113.0/24 networks: 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether d6:54:8a:ff:78:f5 brd ff:ff:ff:ff:ff:ff inet 198.51.100.1/32 scope global dummy0 valid_lft forever preferred_lft forever 7: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 06:ed:98:87:6d:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 192.0.2.2/24 scope global veth1 valid_lft forever preferred_lft forever inet6 fe80::4ed:98ff:fe87:6d8a/64 scope link proto kernel_ll valid_lft forever preferred_lft forever 9: veth3@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether ae:75:23:38:a0:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 203.0.113.2/24 scope global veth3 valid_lft forever preferred_lft forever inet6 fe80::ac75:23ff:fe38:a0d2/64 scope link proto kernel_ll valid_lft forever preferred_lft forever ~ ip ro list: default nexthop via 192.0.2.1 dev veth1 weight 1 nexthop via 203.0.113.1 dev veth3 weight 1 192.0.2.0/24 dev veth1 proto kernel scope link src 192.0.2.2 203.0.113.0/24 dev veth3 proto kernel scope link src 203.0.113.2 before: for i in {1..255} ; do ip ro get 10.0.0.$i; done \| grep veth \| awk ' {print $(NF-2)}' \| sort \| uniq -c: 255 veth3 after: for i in {1..255} ; do ip ro get 10.0.0.$i; done \| grep veth \| awk ' {print $(NF-2)}' \| sort \| uniq -c: 122 veth1 133 veth3 Fixes: `32607a332c` ("ipv4: prefer multipath nexthop that matches source address") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20251221192639.3911901-1-vadim.fedorenko@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-30 11:07:38 +01:00
Ido Schimmel	ac782f4e3b	ipv4: Fix reference count leak when using error routes with nexthop objects When a nexthop object is deleted, it is marked as dead and then fib_table_flush() is called to flush all the routes that are using the dead nexthop. The current logic in fib_table_flush() is to only flush error routes (e.g., blackhole) when it is called as part of network namespace dismantle (i.e., with flush_all=true). Therefore, error routes are not flushed when their nexthop object is deleted: # ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip route add 198.51.100.1/32 nhid 1 # ip route add blackhole 198.51.100.2/32 nhid 1 # ip nexthop del id 1 # ip route show blackhole 198.51.100.2 nhid 1 dev dummy1 As such, they keep holding a reference on the nexthop object which in turn holds a reference on the nexthop device, resulting in a reference count leak: # ip link del dev dummy1 [ 70.516258] unregister_netdevice: waiting for dummy1 to become free. Usage count = 2 Fix by flushing error routes when their nexthop is marked as dead. IPv6 does not suffer from this problem. Fixes: `493ced1ac4` ("ipv4: Allow routes to use nexthop objects") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Closes: https://lore.kernel.org/netdev/d943f806-4da6-4970-ac28-b9373b0e63ac@I-love.SAKURA.ne.jp/ Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251221144829.197694-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-30 10:39:22 +01:00
Will Rosenberg	58fc7342b5	ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr() There exists a kernel oops caused by a BUG_ON(nhead < 0) at net/core/skbuff.c:2232 in pskb_expand_head(). This bug is triggered as part of the calipso_skbuff_setattr() routine when skb_cow() is passed headroom > INT_MAX (i.e. (int)(skb_headroom(skb) + len_delta) < 0). The root cause of the bug is due to an implicit integer cast in __skb_cow(). The check (headroom > skb_headroom(skb)) is meant to ensure that delta = headroom - skb_headroom(skb) is never negative, otherwise we will trigger a BUG_ON in pskb_expand_head(). However, if headroom > INT_MAX and delta <= -NET_SKB_PAD, the check passes, delta becomes negative, and pskb_expand_head() is passed a negative value for nhead. Fix the trigger condition in calipso_skbuff_setattr(). Avoid passing "negative" headroom sizes to skb_cow() within calipso_skbuff_setattr() by only using skb_cow() to grow headroom. PoC: Using `netlabelctl` tool: netlabelctl map del default netlabelctl calipso add pass doi:7 netlabelctl map add default address:0::1/128 protocol:calipso,7 Then run the following PoC: int fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP); // setup msghdr int cmsg_size = 2; int cmsg_len = 0x60; struct msghdr msg; struct sockaddr_in6 dest_addr; struct cmsghdr * cmsg = (struct cmsghdr ) calloc(1, sizeof(struct cmsghdr) + cmsg_len); msg.msg_name = &dest_addr; msg.msg_namelen = sizeof(dest_addr); msg.msg_iov = NULL; msg.msg_iovlen = 0; msg.msg_control = cmsg; msg.msg_controllen = cmsg_len; msg.msg_flags = 0; // setup sockaddr dest_addr.sin6_family = AF_INET6; dest_addr.sin6_port = htons(31337); dest_addr.sin6_flowinfo = htonl(31337); dest_addr.sin6_addr = in6addr_loopback; dest_addr.sin6_scope_id = 31337; // setup cmsghdr cmsg->cmsg_len = cmsg_len; cmsg->cmsg_level = IPPROTO_IPV6; cmsg->cmsg_type = IPV6_HOPOPTS; char hop_hdr = (char )cmsg + sizeof(struct cmsghdr); hop_hdr[1] = 0x9; //set hop size - (0x9 + 1) 8 = 80 sendmsg(fd, &msg, 0); Fixes: `2917f57b6b` ("calipso: Allow the lsm to label the skbuff directly.") Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Will Rosenberg <whrosenb@asu.edu> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20251219173637.797418-1-whrosenb@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-29 19:36:45 +01:00
Paolo Abeni	a6694b7e39	Merge tag 'wireless-2025-12-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Various fixes all over, most are recent regressions but also some long-standing issues: - cfg80211: - fix an issue with overly long SSIDs - mac80211: - long-standing beacon protection issue on some devices - for for a multi-BSSID AP-side issue - fix a syzbot warning on OCB (not really used in practice) - remove WARN on connections using disabled channels, as that can happen due to changes in the disable flag - fix monitor mode list iteration - iwlwifi: - fix firmware loading on certain (really old) devices - add settime64 to PTP clock to avoid a warning and clock registration failure, but it's not actually supported - rtw88: - remove WQ_UNBOUND since it broke USB adapters (because it can't be used with WQ_BH) - fix SDIO issues with certain devices - rtl8192cu: fix TID array out-of-bounds (since 6.9) - wlcore (TI): add missing skb push headroom increase * tag 'wireless-2025-12-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: Implement settime64 as stub for MVM/MLD PTP wifi: iwlwifi: Fix firmware version handling wifi: mac80211: ocb: skip rx_no_sta when interface is not joined wifi: mac80211: do not use old MBSSID elements wifi: mac80211: don't WARN for connections on invalid channels wifi: wlcore: ensure skb headroom before skb_push wifi: cfg80211: sme: store capped length in __cfg80211_connect_result() wifi: mac80211: fix list iteration in ieee80211_add_virtual_monitor() wifi: mac80211: Discard Beacon frames to non-broadcast address Revert "wifi: rtw88: add WQ_UNBOUND to alloc_workqueue users" wifi: rtlwifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc() wifi: rtw88: limit indirect IO under powered off for RTL8822CS ==================== Link: https://patch.msgid.link/20251217201441.59876-3-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-29 17:04:01 +01:00
Jens Axboe	4d1442979e	af_unix: don't post cmsg for SO_INQ unless explicitly asked for A previous commit added SO_INQ support for AF_UNIX (SOCK_STREAM), but it posts a SCM_INQ cmsg even if just msg->msg_get_inq is set. This is incorrect, as ->msg_get_inq is just the caller asking for the remainder to be passed back in msg->msg_inq, it has nothing to do with cmsg. The original commit states that this is done to make sockets io_uring-friendly", but it's actually incorrect as io_uring doesn't use cmsg headers internally at all, and it's actively wrong as this means that cmsg's are always posted if someone does recvmsg via io_uring. Fix that up by only posting a cmsg if u->recvmsg_inq is set. Additionally, mirror how TCP handles inquiry handling in that it should only be done for a successful return. This makes the logic for the two identical. Cc: stable@vger.kernel.org Fixes: `df30285b36` ("af_unix: Introduce SO_INQ.") Reported-by: Julian Orth <ju.orth@gmail.com> Link: https://github.com/axboe/liburing/issues/1509 Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/07adc0c2-2c3b-4d08-8af1-1c466a40b6a8@kernel.dk Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-28 16:11:22 +01:00
Eric Dumazet	c04de0c795	net: avoid prefetching NULL pointers Aditya Gupta reported PowerPC crashes bisected to the blamed commit. Apparently some platforms do not allow prefetch() on arbitrary pointers. prefetch(next); prefetch(&next->priority); // CRASH when next == NULL Only NULL seems to be supported, with specific handling in prefetch(). Add a conditional to avoid the two prefetches and the skb->next clearing for the last skb in the list. Fixes: `b2e9821cff` ("net: prefech skb->priority in __dev_xmit_skb()") Reported-by: Aditya Gupta <adityag@linux.ibm.com> Closes: https://lore.kernel.org/netdev/e9f4abee-b132-440f-a50e-bced0868b5a7@linux.ibm.com/T/#mddc372b64ec5a3b181acc9ee3909110c391cc18a Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Aditya Gupta <adityag@linux.ibm.com> Link: https://patch.msgid.link/20251218081844.809008-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-28 10:19:11 +01:00

1 2 3 4 5 ...

82728 Commits