Paolo pointed out that we can avoid separately allocating struct
cake_sched_config even in the non-mq case, by embedding it into struct
cake_sched_data. This reduces the complexity of the logic that swaps the
pointers and frees the old value, at the cost of adding 56 bytes to the
latter. Since cake_sched_data is already almost 17k bytes, this seems
like a reasonable tradeoff.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Fixes: bc0ce2bad3 ("net/sched: sch_cake: Factor out config variables into separate struct")
Link: https://patch.msgid.link/20260113143157.2581680-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, can and IPsec.
Current release - regressions:
- net: add net.core.qdisc_max_burst
- can: propagate CAN device capabilities via ml_priv
Previous releases - regressions:
- dst: fix races in rt6_uncached_list_del() and
rt_del_uncached_list()
- ipv6: fix use-after-free in inet6_addr_del().
- xfrm: fix inner mode lookup in tunnel mode GSO segmentation
- ip_tunnel: spread netdev_lockdep_set_classes()
- ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv()
- bluetooth: hci_sync: enable PA sync lost event
- eth: virtio-net:
- fix the deadlock when disabling rx NAPI
- fix misalignment bug in struct virtnet_info
Previous releases - always broken:
- ipv4: ip_gre: make ipgre_header() robust
- can: fix SSP_SRC in cases when bit-rate is higher than 1 MBit.
- eth:
- mlx5e: profile change fix
- octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
- macvlan: fix possible UAF in macvlan_forward_source()"
* tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
virtio_net: Fix misalignment bug in struct virtnet_info
net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
can: raw: instantly reject disabled CAN frames
can: propagate CAN device capabilities via ml_priv
Revert "can: raw: instantly reject unsupported CAN frames"
net/sched: sch_qfq: do not free existing class in qfq_change_class()
selftests: drv-net: fix RPS mask handling for high CPU numbers
selftests: drv-net: fix RPS mask handling in toeplitz test
ipv6: Fix use-after-free in inet6_addr_del().
dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list()
net: hv_netvsc: reject RSS hash key programming without RX indirection table
tools: ynl: render event op docs correctly
net: add net.core.qdisc_max_burst
net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definition
net: phy: motorcomm: fix duplex setting error for phy leds
net: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
net/mlx5e: Restore destroying state bit after profile cleanup
net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv
net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv
net/mlx5e: Fix crash on profile change rollback failure
...
Marc Kleine-Budde says:
====================
pull-request: can 2026-01-15
this is a pull request of 4 patches for net/main, it super-seeds the
"can 2026-01-14" pull request. The dev refcount leak in patch #3 is
fixed.
The first 3 patches are by Oliver Hartkopp and revert the approach to
instantly reject unsupported CAN frames introduced in
net-next-for-v6.19 and replace it by placing the needed data into the
CAN specific ml_priv.
The last patch is by Tetsuo Handa and fixes a J1939 refcount leak for
j1939_session in session deactivation upon receiving the second RTS.
linux-can-fixes-for-6.19-20260115
* tag 'linux-can-fixes-for-6.19-20260115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
can: raw: instantly reject disabled CAN frames
can: propagate CAN device capabilities via ml_priv
Revert "can: raw: instantly reject unsupported CAN frames"
====================
Link: https://patch.msgid.link/20260115090603.1124860-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Steffen Klassert says:
====================
pull request (net): ipsec 2026-01-14
1) Fix inner mode lookup in tunnel mode GSO segmentation.
The protocol was taken from the wrong field.
2) Set ipv4 no_pmtu_disc flag only on output SAs. The
insertation of input SAs can fail if no_pmtu_disc
is set.
Please pull or let me know if there are problems.
ipsec-2026-01-14
* tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is set
xfrm: Fix inner mode lookup in tunnel mode GSO segmentation
====================
Link: https://patch.msgid.link/20260114121817.1106134-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We (Paolo and I) noticed that in the sending path touching an extra
cacheline due to cq_cached_prod_lock will impact the performance. After
moving the lock from struct xsk_buff_pool to struct xsk_queue, the
performance is increased by ~5% which can be observed by xdpsock.
An alternative approach [1] can be using atomic_try_cmpxchg() to have the
same effect. But unfortunately I don't have evident performance numbers to
prove the atomic approach is better than the current patch. The advantage
is to save the contention time among multiple xsks sharing the same pool
while the disadvantage is losing good maintenance. The full discussion can
be found at the following link.
[1]: https://lore.kernel.org/all/20251128134601.54678-1-kerneljasonxing@gmail.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20260104012125.44003-3-kerneljasonxing@gmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In the shared umem mode with different queues or devices, either
uninitialized cq or fq is not allowed which was previously done in
xp_assign_dev_shared(). The patch advances the check at the beginning
so that 1) we can avoid a few memory allocation and stuff if cq or fq
is NULL, 2) it can be regarded as preparation for the next patch in
the series.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20260104012125.44003-2-kerneljasonxing@gmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 1a620a7238
and its follow-up fixes for the introduced dependency issues.
commit 1a620a7238 ("can: raw: instantly reject unsupported CAN frames")
commit cb2dc6d286 ("can: Kconfig: select CAN driver infrastructure by default")
commit 6abd4577bc ("can: fix build dependency")
commit 5a5aff6338 ("can: fix build dependency")
The entire problem was caused by the requirement that a new network layer
feature needed to know about the protocol capabilities of the CAN devices.
Instead of accessing CAN device internal data structures which caused the
dependency problems a better approach has been developed which makes use of
CAN specific ml_priv data which is accessible from both sides.
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260109144135.8495-2-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull bpf fixes from Alexei Starovoitov:
- Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK in riscv JIT (Menglong
Dong)
- Fix reference count leak in bpf_prog_test_run_xdp() (Tetsuo Handa)
- Fix metadata size check in bpf_test_run() (Toke Høiland-Jørgensen)
- Check that BPF insn array is not allowed as a map for const strings
(Deepanshu Kartikey)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix reference count leak in bpf_prog_test_run_xdp()
bpf: Reject BPF_MAP_TYPE_INSN_ARRAY in check_reg_const_str()
selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
bpf, test_run: Subtract size of xdp_frame from allowed metadata size
riscv, bpf: Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK
IPv4 allows for a nexthop device mismatch when the "onlink" keyword is
specified:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2
Error: Nexthop has invalid gateway.
# ip route add 198.51.100.0/24 nexthop via 192.0.2.2 dev dummy2 onlink
# echo $?
0
This seems to be consistent with the description of "onlink" in the
ip-route man page: "Pretend that the nexthop is directly attached to
this link, even if it does not match any interface prefix".
On the other hand, IPv6 rejects a nexthop device mismatch, even when
"onlink" is specified:
# ip link add name dummy1 up type dummy
# ip address add 2001:db8:1::1/64 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2
RTNETLINK answers: No route to host
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
Error: Nexthop has invalid gateway or device mismatch.
This is intentional according to commit fc1e64e109 ("net/ipv6: Add
support for onlink flag") which added IPv6 "onlink" support and states
that "any unicast gateway is allowed as long as the gateway is not a
local address and if it resolves it must match the given device".
The condition was later relaxed in commit 4ed591c8ab ("net/ipv6: Allow
onlink routes to have a device mismatch if it is the default route") to
allow for a nexthop device mismatch if the gateway address is resolved
via the default route:
# ip link add name dummy1 up type dummy
# ip route add ::/0 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2
RTNETLINK answers: No route to host
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
# echo $?
0
While the decision to forbid a nexthop device mismatch in IPv6 seems to
be intentional, it is unclear why it was made. Especially when it
differs from IPv4 and seems to go against the intended behavior of
"onlink".
Therefore, relax the condition further and allow for a nexthop device
mismatch when "onlink" is specified:
# ip link add name dummy1 up type dummy
# ip address add 2001:db8:1::1/64 dev dummy1
# ip link add name dummy2 up type dummy
# ip route add 2001:db8:10::/64 nexthop via 2001:db8:1::2 dev dummy2 onlink
# echo $?
0
The motivating use case is the fact that FRR would like to be able to
configure overlay routes of the following form:
# ip route add <host-Z> vrf <VRF> encap ip id <ID> src <VTEP-A> dst <VTEP-Z> via <VTEP-Z> dev vxlan0 onlink
Where vxlan0 is in the default VRF in which "VTEP-Z" is reachable via
one of the underlay routes (e.g., via swpX). Without this patch, the
above only works with IPv4, but not with IPv6.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ethernet provides a wide variety of layer 1 protocols and standards for
data transmission. The front-facing ports of an interface have their own
complexity and configurability.
Introduce a representation of these front-facing ports. The current code
is minimalistic and only support ports controlled by PHY devices, but
the plan is to extend that to SFP as well as raw Ethernet MACs that
don't use PHY devices.
This minimal port representation allows describing the media and number
of pairs of a BaseT port. From that information, we can derive the
linkmodes usable on the port, which can be used to limit the
capabilities of an interface.
For now, the port pairs and medium is derived from devicetree, defined
by the PHY driver, or populated with default values (as we assume that
all PHYs expose at least one port).
The typical example is 100M ethernet. 100BaseTX works using only 2
pairs on a Cat 5 cables. However, in the situation where a 10/100/1000
capable PHY is wired to its RJ45 port through 2 pairs only, we have no
way of detecting that. The "max-speed" DT property can be used, but a
more accurate representation can be used :
mdi {
connector-0 {
media = "BaseT";
pairs = <2>;
};
};
From that information, we can derive the max speed reachable on the
port.
Another benefit of having that is to avoid vendor-specific DT properties
(micrel,fiber-mode or ti,fiber-mode).
This basic representation is meant to be expanded, by the introduction
of port ops, userspace listing of ports, and support for multi-port
devices.
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260108080041.553250-4-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In an effort to have a better representation of Ethernet ports,
introduce enumeration values representing the various ethernet Mediums.
This is part of the 802.3 naming convention, for example :
1000 Base T 4
| | | |
| | | \_ pairs (4)
| | \___ Medium (T == Twisted Copper Pairs)
| \_______ Baseband transmission
\____________ Speed
Other example :
10000 Base K X 4
| | \_ lanes (4)
| \___ encoding (BaseX is 8b/10b while BaseR is 66b/64b)
\_____ Medium (K is backplane ethernet)
In the case of representing a physical port, only the medium and number
of pairs should be relevant. One exception would be 1000BaseX, which is
currently also used as a medium in what appears to be any of 1000BaseSX,
1000BaseCX, 1000BaseLX, 1000BaseEX, 1000BaseBX10 and some other.
This was reflected in the mediums associated with the 1000BaseX linkmode.
These mediums are set in the net/ethtool/common.c lookup table that
maintains a list of all linkmodes with their number of pairs, medium,
encoding, speed and duplex.
One notable exception to this is 100BaseT Ethernet. It emcompasses 100BaseTX,
which is a 2-pairs protocol but also 100BaseT4, that will also work on 4-pairs
cables. As we don't make a disctinction between these, the lookup table
contains 2 sets of pair numbers, indicating the min number of pairs for a
protocol to work and the "nominal" number of pairs as well.
Another set of exceptions are linkmodes such 100000baseLR4_ER4, where
the same link mode seems to represent 100GBaseLR4 and 100GBaseER4. The
macro __DEFINE_LINK_MODE_PARAMS_MEDIUMS is here used to populate the
.mediums bitfield with all appropriate mediums.
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260108080041.553250-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RDS was written to require ordered workqueues for "cp->cp_wq":
Work is executed in the order scheduled, one item at a time.
If these workqueues are shared across connections,
then work executed on behalf of one connection blocks work
scheduled for a different and unrelated connection.
Luckily we don't need to share these workqueues.
While it obviously makes sense to limit the number of
workers (processes) that ought to be allocated on a system,
a workqueue that doesn't have a rescue worker attached,
has a tiny footprint compared to the connection as a whole:
A workqueue costs ~900 bytes, including the workqueue_struct,
pool_workqueue, workqueue_attrs, wq_node_nr_active and the
node_nr_active flex array. Each connection can have up to 8
(RDS_MPATH_WORKERS) paths for a worst case of ~7 KBytes per
connection. While an RDS/IB connection totals only ~5 MBytes.
So we're getting a signficant performance gain
(90% of connections fail over under 3 seconds vs. 40%)
for a less than 0.02% overhead.
RDS doesn't even benefit from the additional rescue workers:
of all the reasons that RDS blocks workers, allocation under
memory pressue is the least of our concerns. And even if RDS
was stalling due to the memory-reclaim process, the work
executed by the rescue workers are highly unlikely to free up
any memory. If anything, they might try to allocate even more.
By giving each connection path its own workqueues, we allow
RDS to better utilize the unbound workers that the system
has available.
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260109224843.128076-3-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch adds a per connection workqueue which can be initialized
and used independently of the globally shared rds_wq.
This patch is the first in a series that aims to address tcp ack
timeouts during the tcp socket shutdown sequence.
This initial refactoring lays the ground work needed to alleviate
queue congestion during heavy reads and writes. The independently
managed queues will allow shutdowns and reconnects respond more quickly
before the peer(s) timeout waiting for the proper acks.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260109224843.128076-2-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This commit adds shared shaper state across the cake instances beneath a
cake_mq qdisc. It works by periodically tracking the number of active
instances, and scaling the configured rate by the number of active
queues.
The scan is lockless and simply reads the qlen and the last_active state
variable of each of the instances configured beneath the parent cake_mq
instance. Locking is not required since the values are only updated by
the owning instance, and eventual consistency is sufficient for the
purpose of estimating the number of active queues.
The interval for scanning the number of active queues is set to 200 us.
We found this to be a good tradeoff between overhead and response time.
For a detailed analysis of this aspect see the Netdevconf talk:
https://netdevconf.info/0x19/docs/netdev-0x19-paper16-talk-paper.pdf
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-5-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a cake_mq qdisc which installs cake instances on each hardware
queue on a multi-queue device.
This is just a copy of sch_mq that installs cake instead of the default
qdisc on each queue. Subsequent commits will add sharing of the config
between cake instances, as well as a multi-queue aware shaper algorithm.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-3-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
To enable the cake_mq qdisc to reuse code from the mq qdisc, export a
bunch of functions from sch_mq. Split common functionality out from some
functions so it can be composed with other code, and export other
functions wholesale. To discourage wanton reuse, put the symbols into a
new NET_SCHED_INTERNAL namespace, and a sch_priv.h header file.
No functional change intended.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-1-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In blamed commit, I added a check against the temporary queue
built in __dev_xmit_skb(). Idea was to drop packets early,
before any spinlock was acquired.
if (unlikely(defer_count > READ_ONCE(q->limit))) {
kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
return NET_XMIT_DROP;
}
It turned out that HTB Qdisc has a zero q->limit.
HTB limits packets on a per-class basis.
Some of our tests became flaky.
Add a new sysctl : net.core.qdisc_max_burst to control
how many packets can be stored in the temporary lockless queue.
Also add a new QDISC_BURST_DROP drop reason to better diagnose
future issues.
Thanks Neal !
Fixes: 100dfa74ca ("net: dev_queue_xmit() llist adoption")
Reported-and-bisected-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
clang is unable to inline the memset() calls in net/core/skbuff.c
when initializing allocated sk_buff.
memset(skb, 0, offsetof(struct sk_buff, tail));
This is unfortunate, because:
1) calling external memset_orig() helper adds a call/ret and
typical setup cost.
2) offsetof(struct sk_buff, tail) == 0xb8 = 0x80 + 0x38
On x86_64, memset_orig() performs two 64 bytes clear,
then has to loop 7 times to clear the final 56 bytes.
skbuff_clear() makes sure the minimal and optimal code
is generated.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260109203836.1667441-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
First set of changes for the current -next cycle, of note:
- ath12k gets an overhaul to support multi-wiphy device
wiphy and pave the way for future device support in
the same driver (rather than splitting to ath13k)
- mac80211 gets some better iteration macros
* tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (120 commits)
wifi: mac80211: remove width argument from ieee80211_parse_bitrates
wifi: mac80211_hwsim: remove NAN by default
wifi: mac80211: improve station iteration ergonomics
wifi: mac80211: improve interface iteration ergonomics
wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel
wifi: mac80211: unexport ieee80211_get_bssid()
wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version
wifi: iwlegacy: 3945-rs: remove redundant pointer check in il3945_rs_tx_status() and il3945_rs_get_rate()
wifi: mac80211: don't send an unused argument to ieee80211_check_combinations
wifi: libertas: fix WARNING in usb_tx_block
wifi: mwifiex: Allocate dev name earlier for interface workqueue name
wifi: wlcore: sdio: Use pm_ptr instead of #ifdef CONFIG_PM
wifi: cfg80211: Fix use_for flag update on BSS refresh
wifi: brcmfmac: rename function that frees vif
wifi: brcmfmac: fix/add kernel-doc comments
wifi: mac80211: Update csa_finalize to use link_id
wifi: cfg80211: add cfg80211_stop_link() for per-link teardown
wifi: ath12k: Skip DP peer creation for scan vdev
wifi: ath12k: move firmware stats request outside of atomic context
wifi: ath12k: add the missing RCU lock in ath12k_dp_tx_free_txbuf()
...
====================
Link: https://patch.msgid.link/20260112185836.378736-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Right now, the only way to iterate stations is to declare an
iterator function, possibly data structure to use, and pass all
that to the iteration helper function. This is annoying, and
there's really no inherent need for it.
Add a new for_each_station() macro that does the iteration in
a more ergonomic way. To avoid even more exported functions, do
the old ieee80211_iterate_stations_mtx() as an inline using the
new way, which may also let the compiler optimise it a bit more,
e.g. via inlining the iterator function.
Link: https://patch.msgid.link/20260108143431.d2b641f6f6af.I4470024f7404446052564b15bcf8b3f1ada33655@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Right now, the only way to iterate interfaces is to declare an
iterator function, possibly data structure to use, and pass all
that to the iteration helper function. This is annoying, and
there's really no inherent need for it, except it was easier to
implement with the iflist mutex, but that's not used much now.
Add a new for_each_interface() macro that does the iteration in
a more ergonomic way. To avoid even more exported functions, do
the old ieee80211_iterate_active_interfaces_mtx() as an inline
using the new way, which may also let the compiler optimise it
a bit more, e.g. via inlining the iterator function.
Also provide for_each_active_interface() for the common case of
just iterating active interfaces.
Link: https://patch.msgid.link/20260108143431.f2581e0c381a.Ie387227504c975c109c125b3c57f0bb3fdab2835@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: enable PA Sync Lost event
* tag 'for-net-2026-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: enable PA Sync Lost event
====================
Link: https://patch.msgid.link/20260109211949.236218-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Analog to commit db5b4e39c4 ("ip6_gre: make ip6gre_header() robust")
Over the years, syzbot found many ways to crash the kernel
in ipgre_header() [1].
This involves team or bonding drivers ability to dynamically
change their dev->needed_headroom and/or dev->hard_header_len
In this particular crash mld_newpack() allocated an skb
with a too small reserve/headroom, and by the time mld_sendpack()
was called, syzbot managed to attach an ipgre device.
[1]
skbuff: skb_under_panic: text:ffffffff89ea3cb7 len:2030915468 put:2030915372 head:ffff888058b43000 data:ffff887fdfa6e194 tail:0x120 end:0x6c0 dev:team0
kernel BUG at net/core/skbuff.c:213 !
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 1322 Comm: kworker/1:9 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Workqueue: mld mld_ifc_work
RIP: 0010:skb_panic+0x157/0x160 net/core/skbuff.c:213
Call Trace:
<TASK>
skb_under_panic net/core/skbuff.c:223 [inline]
skb_push+0xc3/0xe0 net/core/skbuff.c:2641
ipgre_header+0x67/0x290 net/ipv4/ip_gre.c:897
dev_hard_header include/linux/netdevice.h:3436 [inline]
neigh_connected_output+0x286/0x460 net/core/neighbour.c:1618
NF_HOOK_COND include/linux/netfilter.h:307 [inline]
ip6_output+0x340/0x550 net/ipv6/ip6_output.c:247
NF_HOOK+0x9e/0x380 include/linux/netfilter.h:318
mld_sendpack+0x8d4/0xe60 net/ipv6/mcast.c:1855
mld_send_cr net/ipv6/mcast.c:2154 [inline]
mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Reported-by: syzbot+7c134e1c3aa3283790b9@syzkaller.appspotmail.com
Closes: https://www.spinics.net/lists/netdev/msg1147302.html
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260108190214.1667040-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add missing entries in netdev_lock_type[] and netdev_lock_name[] :
CAN, MCTP, RAWIP, CAIF, IP6GRE, 6LOWPAN, NETLINK, VSOCKMON,
IEEE802154_MONITOR.
Also add a WARN_ONCE() in netdev_lock_pos() to help future bug hunting
next time a protocol is added without updating these arrays.
Fixes: 1a33e10e4a ("net: partially revert dynamic lockdep key changes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260108093244.830280-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ceph fixes from Ilya Dryomov:
"A bunch of libceph fixes split evenly between memory safety and
implementation correctness issues (all marked for stable) and a change
in maintainers for CephFS: Slava and Alex have formally taken over
Xiubo's role"
* tag 'ceph-for-6.19-rc5' of https://github.com/ceph/ceph-client:
libceph: make calc_target() set t->paused, not just clear it
libceph: reset sparse-read state in osd_fault()
libceph: return the handler error from mon_handle_auth_done()
libceph: make free_choose_arg_map() resilient to partial allocation
ceph: update co-maintainers list in MAINTAINERS
libceph: replace overzealous BUG_ON in osdmap_apply_incremental()
libceph: prevent potential out-of-bounds reads in handle_auth_done()
Enable the PA Sync Lost event mask to ensure PA sync loss is properly
reported and handled.
Fixes: 485e0626e5 ("Bluetooth: hci_event: Fix not handling PA Sync Lost event")
Signed-off-by: Yang Li <yang.li@amlogic.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Johannes Berg says:
====================
Couple of fixes:
- mac80211:
- long-standing injection bug due to chanctx rework
- more recent interface iteration issue
- collect statistics before removing stations
- hwsim:
- fix NAN frequency typo (potential NULL ptr deref)
- fix locking of radio lock (needs softirqs disabled)
- wext:
- ancient issue with compat and events copying some
uninitialized stack data to userspace
* tag 'wireless-2026-01-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: collect station statistics earlier when disconnect
wifi: mac80211: restore non-chanctx injection behaviour
wifi: mac80211_hwsim: disable BHs for hwsim_radio_lock
wifi: mac80211: don't iterate not running interfaces
wifi: mac80211_hwsim: fix typo in frequency notification
wifi: avoid kernel-infoleak from struct iw_point
====================
Link: https://patch.msgid.link/20260108140141.139687-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>