Commit Graph

19097 Commits

Author SHA1 Message Date
Eric Dumazet
a435163d31 net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
Instead of storing the @log at the beginning of rps_dev_flow_table
use 5 low order bits of the rps_tag_ptr to store the log of the size.

This removes a potential cache line miss (for light traffic).

This allows us to switch to one high-order allocation instead of vmalloc()
when CONFIG_RFS_ACCEL is not set.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:54:10 -08:00
Eric Dumazet
b2cc61857e net-sysfs: remove rcu field from 'struct rps_dev_flow_table'
Remove rps_dev_flow_table_release() in favor of kvfree_rcu_mightsleep().

In the following pach, we will remove "u8 @log" field
and 'struct rps_dev_flow_table' size will be a power-of-two.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:54:10 -08:00
Eric Dumazet
dd378109d2 net-sysfs: use rps_tag_ptr and remove metadata from rps_sock_flow_table
Instead of storing the @mask at the beginning of rps_sock_flow_table,
use 5 low order bits of the rps_tag_ptr to store the log of the size.

This removes a potential cache line miss to fetch @mask.

More importantly, we can switch to vmalloc_huge() without wasting memory.

Tested with:

numactl --interleave=all bash -c "echo 4194304 >/proc/sys/net/core/rps_sock_flow_entries"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:54:09 -08:00
Eric Dumazet
9cde131cdd net-sysfs: add rps_sock_flow_table_mask() helper
In preparation of the following patch, abstract access
to the @mask field in 'struct rps_sock_flow_table'.

Also cleanup rps_sock_flow_sysctl() a bit :

- Rename orig_sock_table to o_sock_table.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:54:09 -08:00
Eric Dumazet
61753849b8 net-sysfs: remove rcu field from 'struct rps_sock_flow_table'
Removing rcu_head (and @mask in a following patch)
will allow a power-of-two allocation and thus high-order
allocation for better performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:54:09 -08:00
Eric Dumazet
42a101775b net: add rps_tag_ptr type and helpers
Add a new rps_tag_ptr type to encode a pointer and a size
to a power-of-two table.

Three helpers are added converting an rps_tag_ptr to:

1) A log of the size.

2) A mask : (size - 1).

3) A pointer to the array.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260302181432.1836150-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:54:09 -08:00
Eric Dumazet
c26b8c4e29 net: fix off-by-one in udp_flow_src_port() / psp_write_headers()
udp_flow_src_port() and psp_write_headers() use ip_local_port_range.

ip_local_port_range is inclusive : all ports between min and max
can be used.

Before this patch, if ip_local_port_range was set to 40000-40001
40001 would not be used as a source port.

Use reciprocal_scale() to help code readability.

Not tagged for stable trees, as this change could break user
expectations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260302163933.1754393-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:51:10 -08:00
Jakub Kicinski
dbbda7dd68 Merge tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:

====================
Notable features this time:
 - cfg80211/mac80211
   - finished assoc frame encryption/EPPKE/802.1X-over-auth
     (also hwsim)
   - radar detection improvements
   - 6 GHz incumbent signal detection APIs
   - multi-link support for FILS, probe response
     templates and client probling
 - ath12k:
   - monitor mode support on IPQ5332
   - basic hwmon temperature reporting

* tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (38 commits)
  wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing
  wifi: mac80211_hwsim: change hwsim_class to a const struct
  wifi: mac80211: give the AP more time for EPPKE as well
  wifi: ath12k: Remove the unused argument from the Rx data path
  wifi: ath12k: Enable monitor mode support on IPQ5332
  wifi: ath12k: Set up MLO after SSR
  wifi: ath11k: Silence remoteproc probe deferral prints
  wifi: cfg80211: support key installation on non-netdev wdevs
  wifi: cfg80211: make cluster id an array
  wifi: mac80211: update outdated comment
  wifi: mac80211: Advertise IEEE 802.1X authentication support
  wifi: mac80211: Add support for IEEE 802.1X authentication protocol in non-AP STA mode
  wifi: cfg80211: add support for IEEE 802.1X Authentication Protocol
  wifi: mac80211: Advertise EPPKE support based on driver capabilities
  wifi: mac80211_hwsim: Advertise support for (Re)Association frame encryption
  wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO
  wifi: rt2x00: use generic nvmem_cell_get
  wifi: mac80211: fetch unsolicited probe response template by link ID
  wifi: mac80211: fetch FILS discovery template by link ID
  wifi: nl80211: don't allow DFS channels for NAN
  ...
====================

Link: https://patch.msgid.link/20260304113707.175181-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 15:30:05 -08:00
Dipayaan Roy
2b12ffb669 net: mana: Trigger VF reset/recovery on health check failure due to HWC timeout
The GF stats periodic query is used as mechanism to monitor HWC health
check. If this HWC command times out, it is a strong indication that
the device/SoC is in a faulty state and requires recovery.

Today, when a timeout is detected, the driver marks
hwc_timeout_occurred, clears cached stats, and stops rescheduling the
periodic work. However, the device itself is left in the same failing
state.

Extend the timeout handling path to trigger the existing MANA VF
recovery service by queueing a GDMA_EQE_HWC_RESET_REQUEST work item.
This is expected to initiate the appropriate recovery flow by suspende
resume first and if it fails then trigger a bus rescan.

This change is intentionally limited to HWC command timeouts and does
not trigger recovery for errors reported by the SoC as a normal command
response.

Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/aaFShvKnwR5FY8dH@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-03 11:14:22 +01:00
Kuniyuki Iwashima
425e080a1c dccp Remove inet_hashinfo2_init_mod().
Commit c92c81df93 ("net: dccp: fix kernel crash on module load")
added inet_hashinfo2_init_mod() for DCCP.

Commit 22d6c9eebf ("net: Unexport shared functions for DCCP.")
removed EXPORT_SYMBOL_GPL() it but forgot to remove the function
itself.

Let's remove inet_hashinfo2_init_mod().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260301063756.1581685-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02 18:50:28 -08:00
Kuniyuki Iwashima
3c1e53e554 ipmr: Add dedicated mutex for mrt->{mfc_hash,mfc_cache_list}.
We will no longer hold RTNL for ipmr_rtm_route() to modify the
MFC hash table.

Only __dev_get_by_index() in rtm_to_ipmr_mfcc() is the RTNL
dependant, otherwise, we just need protection for mrt->mfc_hash
and mrt->mfc_cache_list.

Let's add a new mutex for ipmr_mfc_add(), ipmr_mfc_delete(),
and mroute_clean_tables() (setsockopt(MRT_FLUSH or MRT_DONE)).

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-15-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02 18:49:41 -08:00
Kuniyuki Iwashima
4480d5fa1f ipmr/ip6mr: Convert net->ipv[46].ipmr_seq to atomic_t.
We will no longer hold RTNL for ipmr_mfc_add() and ipmr_mfc_delete().

MFC entry can be loosely connected with VIF by its index for
mrt->vif_table[] (stored in mfc_parent), but the two tables are
not synchronised.  i.e. Even if VIF 1 is removed, MFC for VIF 1
is not automatically removed.

The only field that the MFC/VIF interfaces share is
net->ipv[46].ipmr_seq, which is protected by RTNL.

Adding a new mutex for both just to protect a single field is overkill.

Let's convert the field to atomic_t.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02 18:49:41 -08:00
Kuniyuki Iwashima
1c36d186a0 ipmr: Define net->ipv4.{ipmr_notifier_ops,ipmr_seq} under CONFIG_IP_MROUTE.
net->ipv4.ipmr_notifier_ops and net->ipv4.ipmr_seq are used
only in net/ipv4/ipmr.c.

Let's move these definitions under CONFIG_IP_MROUTE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260228221800.1082070-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02 18:49:41 -08:00
Eric Dumazet
8341c989ac net: remove addr_len argument of recvmsg() handlers
Use msg->msg_namelen as a place holder instead of a
temporary variable, notably in inet[6]_recvmsg().

This removes stack canaries and allows tail-calls.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux
add/remove: 0/0 grow/shrink: 2/19 up/down: 26/-532 (-506)
Function                                     old     new   delta
rawv6_recvmsg                                744     767     +23
vsock_dgram_recvmsg                           55      58      +3
vsock_connectible_recvmsg                     50      47      -3
unix_stream_recvmsg                          161     158      -3
unix_seqpacket_recvmsg                        62      59      -3
unix_dgram_recvmsg                            42      39      -3
tcp_recvmsg                                  546     543      -3
mptcp_recvmsg                               1568    1565      -3
ping_recvmsg                                 806     800      -6
tcp_bpf_recvmsg_parser                       983     974      -9
ip_recv_error                                588     576     -12
ipv6_recv_rxpmtu                             442     428     -14
udp_recvmsg                                 1243    1224     -19
ipv6_recv_error                             1046    1024     -22
udpv6_recvmsg                               1487    1461     -26
raw_recvmsg                                  465     437     -28
udp_bpf_recvmsg                             1027     984     -43
sock_common_recvmsg                          103      27     -76
inet_recvmsg                                 257     175     -82
inet6_recvmsg                                257     175     -82
tcp_bpf_recvmsg                              663     568     -95
Total: Before=25143834, After=25143328, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260227151120.1346573-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02 18:17:17 -08:00
Avraham Stern
7c6084d7fa wifi: cfg80211: support key installation on non-netdev wdevs
Currently key installation is only supported for netdev. For NAN,
support most key operations (except setting default data key) on
wdevs instead of netdevs, and adjust all the APIs and tracing to
match.

Since nothing currently sets NL80211_EXT_FEATURE_SECURE_NAN, this
doesn't change anything (P2P Device already isn't allowed.)

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260107150057.69a0cfad95fa.I00efdf3b2c11efab82ef6ece9f393382bcf33ba8@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 11:28:33 +01:00
Miri Korenblit
94d8657392 wifi: cfg80211: make cluster id an array
cfg80211_nan_conf::cluster_id is currently a pointer, but there is no real
reason to not have it an array. It makes things easier as there is no
need to check the pointer validity each time.
If a cluster ID wasn't provided by user space it will be randomized.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260302091108.2b12e4ccf5bb.Ib16bf5cca55463d4c89e18099cf1dfe4de95d405@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 11:01:02 +01:00
Sai Pratyusha Magam
a536be9231 wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO
Per IEEE Std 802.11be-2024, 12.5.2.3.3, if the MPDU is an
individually addressed Data frame between an AP MLD and a
non-AP MLD associated with the AP MLD, then A1/A2/A3
will be MLD MAC addresses. Otherwise, Al/A2/A3 will be
over-the-air link MAC addresses.

Currently, during AAD and Nonce computation for software based
encryption/decryption cases, mac80211 directly uses the addresses it
receives in the skb frame header. However, after the first
authentication, management frame addresses for non-AP MLD stations
are translated to MLD addresses from over the air link addresses in
software. This means that the skb header could contain translated MLD
addresses, which when used as is, can lead to incorrect AAD/Nonce
computation.

In the following manner, ensure that the right set of addresses are used:

In the receive path, stash the pre-translated link addresses in
ieee80211_rx_data and use them for the AAD/Nonce computations
when required.

In the transmit path, offload the encryption for a CCMP/GCMP key
to the hwsim driver that can then ensure that encryption and hence
the AAD/Nonce computations are performed on the frame containing the
right set of addresses, i.e, MLD addresses if unicast data frame and
link addresses otherwise.

To do so, register the set key handler in hwsim driver so mac80211 is
aware that it is the driver that would take care of encrypting the
frame. Offload encryption for a CCMP/GCMP key, while keeping the
encryption for WEP/TKIP and MMIE generation for a AES_CMAC or a
AES_GMAC key still at the SW crypto in mac layer

Co-developed-by: Rohan Dutta <quic_drohan@quicinc.com>
Signed-off-by: Rohan Dutta <quic_drohan@quicinc.com>
Signed-off-by: Sai Pratyusha Magam <sai.magam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260226042959.3766157-1-sai.magam@oss.qualcomm.com
[only store and apply link_addrs for unicast non-data
 rather storing always and applying for !unicast_data]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 09:53:19 +01:00
Sriram R
e098c26b35 wifi: mac80211: fetch unsolicited probe response template by link ID
Currently, the unsolicited probe response template is always fetched from
the default link of a virtual interface in both Multi-Link Operation (MLO)
and non-MLO cases. However, in the MLO case there is a need to fetch the
unsolicited probe response template from a specific link instead of the
default link.

Hence, add support for fetching the unsolicited probe response template
based on the link ID from the corresponding link data.

Signed-off-by: Sriram R <quic_srirrama@quicinc.com>
Co-developed-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com>
Signed-off-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com>
Link: https://patch.msgid.link/20260220-fils-prob-by-link-v1-2-a2746a853f75@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 09:29:15 +01:00
Sriram R
0495b64132 wifi: mac80211: fetch FILS discovery template by link ID
Currently, the FILS discovery template is always fetched from the default
link of a virtual interface in both Multi-Link Operation (MLO) and
non-MLO cases. However, in the MLO case there is a need to fetch the FILS
discovery template from a specific link instead of the default link.

Hence, add support for fetching the FILS discovery template based on the
link ID from the corresponding link data.

Signed-off-by: Sriram R <quic_srirrama@quicinc.com>
Co-developed-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com>
Signed-off-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com>
Link: https://patch.msgid.link/20260220-fils-prob-by-link-v1-1-a2746a853f75@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 09:29:15 +01:00
Miri Korenblit
033fe322f5 wifi: nl80211/cfg80211: support stations of non-netdev interfaces
Currently, a station can only be added to a netdev interface,
mainly because there was no need for a station of a non-netdev
interface.

But for NAN, we will have stations that belong to the NL80211_IFTYPE_NAN
interface.

Prepare for adding/changing/deleting a station that belongs to a non-netdev
interface. This doesn't actually allow such stations - this will be done
in a different patch.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260219114327.65c9cc96f814.Ic02066b88bb8ad6b21e15cbea8d720280008c83b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 09:23:03 +01:00
Hari Chandrakanthan
6a584e336c wifi: cfg80211: add support to handle incumbent signal detected event from mac80211/driver
When any incumbent signal is detected by an AP/mesh interface operating
in 6 GHz band, FCC mandates the AP/mesh to vacate the channels affected
by it [1].

Add a new API cfg80211_incumbent_signal_notify() that can be used
by mac80211 or drivers to notify the higher layers about the signal
interference event with the interference bitmap in which each bit
denotes the affected 20 MHz in the operating channel.

Add support for the new nl80211 event and nl80211 attribute as well to
notify userspace on the details about the interference event. Userspace is
expected to process it and take further action - vacate the channel, or
reduce the bandwidth.

[1] - https://apps.fcc.gov/kdb/GetAttachment.html?id=nXQiRC%2B4mfiA54Zha%2BrW4Q%3D%3D&desc=987594%20D02%20U-NII%206%20GHz%20EMC%20Measurement%20v03&tracking_number=277034

Signed-off-by: Hari Chandrakanthan <quic_haric@quicinc.com>
Signed-off-by: Amith A <amith.a@oss.qualcomm.com>
Link: https://patch.msgid.link/20260216032027.2310956-2-amith.a@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 09:14:54 +01:00
Janusz Dziedzic
d69cb039ab wifi: cfg80211: set and report chandef CAC ongoing
Allow to track and check CAC state from user mode by
simple check phy channels eg. using iw phy1 channels
command.
This is done for regular CAC and background CAC.
It is important for background CAC while we can start
it from any app (eg. iw or hostapd).

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@gmail.com>
Link: https://patch.msgid.link/20260206171830.553879-3-janusz.dziedzic@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-02 09:10:28 +01:00
Jesper Dangaard Brouer
67713dff63 net: sched: sch_dualpi2: use qdisc_dequeue_drop() for dequeue drops
DualPI2 drops packets during dequeue but was using kfree_skb_reason()
directly, bypassing trace_qdisc_drop. Convert to qdisc_dequeue_drop()
and add QDISC_DROP_L4S_STEP_NON_ECN to the qdisc drop reason enum.

- Set TCQ_F_DEQUEUE_DROPS flag in dualpi2_init()
- Use enum qdisc_drop_reason in drop_and_retry()
- Replace kfree_skb_reason() with qdisc_dequeue_drop()

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/177211351978.3011628.11267023360997620069.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 15:31:35 -08:00
Jesper Dangaard Brouer
9d3e7f9718 net: sched: rename QDISC_DROP_CAKE_FLOOD to QDISC_DROP_FLOOD_PROTECTION
Rename QDISC_DROP_CAKE_FLOOD to QDISC_DROP_FLOOD_PROTECTION to use a
generic name without embedding the qdisc name. This follows the
principle that drop reasons should describe the drop mechanism rather
than being tied to a specific qdisc implementation.

The flood protection drop reason is used by qdiscs implementing
probabilistic drop algorithms (like BLUE) that detect unresponsive
flows indicating potential DoS or flood attacks. CAKE uses this via
its Cobalt AQM component.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/177211347537.3011628.13759059534638729639.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 15:31:35 -08:00
Jesper Dangaard Brouer
f30d9073ec net: sched: rename QDISC_DROP_FQ_* to generic names
Rename FQ-specific drop reasons to generic names:
- QDISC_DROP_FQ_BAND_LIMIT -> QDISC_DROP_BAND_LIMIT
- QDISC_DROP_FQ_HORIZON_LIMIT -> QDISC_DROP_HORIZON_LIMIT

This follows the principle that drop reasons should describe the drop
mechanism rather than being tied to a specific qdisc implementation.
These concepts (priority band limits, timestamp horizon) could apply
to other qdiscs as well.

Remove the local macro define FQDR() and instead use the
full QDISC_DROP_* name to make it easier to navigate code.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/177211346902.3011628.12523261489552097455.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 15:31:35 -08:00
Jesper Dangaard Brouer
3e28f8ad47 net: sched: sfq: convert to qdisc drop reasons
Convert SFQ to use the new qdisc-specific drop reason infrastructure.

This patch demonstrates how to convert a flow-based qdisc to use the
new enum qdisc_drop_reason. As part of this conversion:

- Add QDISC_DROP_MAXFLOWS for flow table exhaustion
- Rename FQ_FLOW_LIMIT to generic FLOW_LIMIT, now shared by FQ and SFQ
- Use QDISC_DROP_OVERLIMIT for sfq_drop() when overall limit exceeded
- Use QDISC_DROP_FLOW_LIMIT for per-flow depth limit exceeded

The FLOW_LIMIT reason is now a common drop reason for per-flow limits,
applicable to both FQ and SFQ qdiscs.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/177211345946.3011628.12770616071857185664.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 15:31:34 -08:00
Jesper Dangaard Brouer
ff2998f29f net: sched: introduce qdisc-specific drop reason tracing
Create new enum qdisc_drop_reason and trace_qdisc_drop tracepoint
for qdisc layer drop diagnostics with direct qdisc context visibility.

The new tracepoint includes qdisc handle, parent, kind (name), and
device information. Existing SKB_DROP_REASON_QDISC_DROP is retained
for backwards compatibility via kfree_skb_reason().

Convert qdiscs with drop reasons to use the new infrastructure.

Change CAKE's cobalt_should_drop() return type from enum skb_drop_reason
to enum qdisc_drop_reason to fix implicit enum conversion warnings.
Use QDISC_DROP_UNSPEC as the 'not dropped' sentinel instead of
SKB_NOT_DROPPED_YET. Both have the same compiled value (0), so the
comparison logic remains semantically equivalent.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/177211345275.3011628.1974310302645218067.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 15:31:34 -08:00
Eric Dumazet
5151ec54f5 net: use try_cmpxchg() in lock_sock_nested()
Add a fast path in lock_sock_nested(), to avoid acquiring
the socket spinlock only to set @owned to one:

        spin_lock_bh(&sk->sk_lock.slock);
        if (unlikely(sock_owned_by_user_nocheck(sk)))
                __lock_sock(sk);
        sk->sk_lock.owned = 1;
        spin_unlock_bh(&sk->sk_lock.slock);

On x86_64 compiler generates something quite efficient:

00000000000077c0 <lock_sock_nested>:
    77c0:       f3 0f 1e fa                 endbr64
    77c4:       e8 00 00 00 00              call   __fentry__
    77c9:       b9 01 00 00 00              mov    $0x1,%ecx
    77ce:       31 c0                       xor    %eax,%eax
    77d0:       f0 48 0f b1 8f 48 01 00 00  lock cmpxchg %rcx,0x148(%rdi)
    77d9:       75 06                       jne    slow_path
    77db:       2e e9 00 00 00 00           cs jmp __x86_return_thunk-0x4
slow_path:      ...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20260226021215.1764237-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 17:25:45 -08:00
Byungchul Park
fd6dad4e1a netmem: remove the pp fields from net_iov
Now that the pp fields in net_iov have no users, remove them from
net_iov and clean up.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260224061424.11219-1-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 19:45:24 -08:00
Jakub Kicinski
0314e382cf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-7.0-rc2).

Conflicts:

tools/testing/selftests/drivers/net/hw/rss_ctx.py
  19c3a2a81d ("selftests: drv-net: rss: Generate unique ports for RSS context tests")
  ce5a0f4612 ("selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up")

include/net/inet_connection_sock.h
  858d2a4f67 ("tcp: fix potential race in tcp_v6_syn_recv_sock()")
  fcd3d039fa ("tcp: make tcp_v{4,6}_send_check() static")
https://lore.kernel.org/aZ8PSFLzBrEU3I89@sirena.org.uk

drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
  69050f8d6d ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
  bf4afc53b7 ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument")
  8a96b9144f ("net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk()")

Adjacent changes:

net/netfilter/ipvs/ip_vs_ctl.c
  c59bd9e62e ("ipvs: use more counters to avoid service lookups")
  bf4afc53b7 ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 10:23:00 -08:00
Linus Torvalds
b9c8fc2cae Merge tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from IPsec, Bluetooth and netfilter

  Current release - regressions:

   - wifi: fix dev_alloc_name() return value check

   - rds: fix recursive lock in rds_tcp_conn_slots_available

  Current release - new code bugs:

   - vsock: lock down child_ns_mode as write-once

  Previous releases - regressions:

   - core:
      - do not pass flow_id to set_rps_cpu()
      - consume xmit errors of GSO frames

   - netconsole: avoid OOB reads, msg is not nul-terminated

   - netfilter: h323: fix OOB read in decode_choice()

   - tcp: re-enable acceptance of FIN packets when RWIN is 0

   - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb().

   - wifi: brcmfmac: fix potential kernel oops when probe fails

   - phy: register phy led_triggers during probe to avoid AB-BA deadlock

   - eth:
      - bnxt_en: fix deleting of Ntuple filters
      - wan: farsync: fix use-after-free bugs caused by unfinished tasklets
      - xscale: check for PTP support properly

  Previous releases - always broken:

   - tcp: fix potential race in tcp_v6_syn_recv_sock()

   - kcm: fix zero-frag skb in frag_list on partial sendmsg error

   - xfrm:
      - fix race condition in espintcp_close()
      - always flush state and policy upon NETDEV_UNREGISTER event

   - bluetooth:
      - purge error queues in socket destructors
      - fix response to L2CAP_ECRED_CONN_REQ

   - eth:
      - mlx5:
         - fix circular locking dependency in dump
         - fix "scheduling while atomic" in IPsec MAC address query
      - gve: fix incorrect buffer cleanup for QPL
      - team: avoid NETDEV_CHANGEMTU event when unregistering slave
      - usb: validate USB endpoints"

* tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  netfilter: nf_conntrack_h323: fix OOB read in decode_choice()
  dpaa2-switch: validate num_ifs to prevent out-of-bounds write
  net: consume xmit errors of GSO frames
  vsock: document write-once behavior of the child_ns_mode sysctl
  vsock: lock down child_ns_mode as write-once
  selftests/vsock: change tests to respect write-once child ns mode
  net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query
  net/mlx5: Fix missing devlink lock in SRIOV enable error path
  net/mlx5: E-switch, Clear legacy flag when moving to switchdev
  net/mlx5: LAG, disable MPESW in lag_disable_change()
  net/mlx5: DR, Fix circular locking dependency in dump
  selftests: team: Add a reference count leak test
  team: avoid NETDEV_CHANGEMTU event when unregistering slave
  net: mana: Fix double destroy_workqueue on service rescan PCI path
  MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER
  dpll: zl3073x: Remove redundant cleanup in devm_dpll_init()
  selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0
  tcp: re-enable acceptance of FIN packets when RWIN is 0
  vsock: Use container_of() to get net namespace in sysctl handlers
  net: usb: kaweth: validate USB endpoints
  ...
2026-02-26 08:00:13 -08:00
Bobby Eshleman
102eab95f0 vsock: lock down child_ns_mode as write-once
Two administrator processes may race when setting child_ns_mode as one
process sets child_ns_mode to "local" and then creates a namespace, but
another process changes child_ns_mode to "global" between the write and
the namespace creation. The first process ends up with a namespace in
"global" mode instead of "local". While this can be detected after the
fact by reading ns_mode and retrying, it is fragile and error-prone.

Make child_ns_mode write-once so that a namespace manager can set it
once and be sure it won't change. Writing a different value after the
first write returns -EBUSY. This applies to all namespaces, including
init_net, where an init process can write "local" to lock all future
namespaces into local mode.

Fixes: eafb64f40c ("vsock: add netns to vsock core")
Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Co-developed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26 11:10:03 +01:00
Florian Westphal
6b94d081f8 netfilter: nf_tables: remove register tracking infrastructure
This facility was disabled in commit
9e539c5b6d ("netfilter: nf_tables: disable expression reduction infra"),
because not all nft_exprs guarantee they will update the destination
register: some may set NFT_BREAK instead to cancel evaluation of the
rule.

This has been dead code ever since.
There are no plans to salvage this at this time, so remove this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-10-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:36:26 -08:00
Julian Anastasov
09b71fb459 ipvs: no_cport and dropentry counters can be per-net
Change the no_cport counters to be per-net and address family.
This should reduce the extra conn lookups done during present
NO_CPORT connections.

By changing from global to per-net dropentry counters, one net
will not affect the drop rate of another net.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-7-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:36:26 -08:00
Julian Anastasov
c59bd9e62e ipvs: use more counters to avoid service lookups
When new connection is created we can lookup for services multiple
times to support fallback options. We already have some counters
to skip specific lookups because it costs CPU cycles for hash
calculation, etc.

Add more counters for fwmark/non-fwmark services (fwm_services and
nonfwm_services) and make all counters per address family.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-6-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:36:26 -08:00
Julian Anastasov
b24ae1a387 ipvs: use single svc table
fwmark based services and non-fwmark based services can be hashed
in same service table. This reduces the burden of working with two
tables.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-4-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:36:25 -08:00
Jiejian Wu
74455a5b43 ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netns
Current ipvs uses one global mutex "__ip_vs_mutex" to keep the global
"ip_vs_svc_table" and "ip_vs_svc_fwm_table" safe. But when there are
tens of thousands of services from different netns in the table, it
takes a long time to look up the table, for example, using "ipvsadm
-ln" from different netns simultaneously.

We make "ip_vs_svc_table" and "ip_vs_svc_fwm_table" per netns, and we
add "service_mutex" per netns to keep these two tables safe instead of
the global "__ip_vs_mutex" in current version. To this end, looking up
services from different netns simultaneously will not get stuck,
shortening the time consumption in large-scale deployment. It can be
reproduced using the simple scripts below.

init.sh: #!/bin/bash
for((i=1;i<=4;i++));do
        ip netns add ns$i
        ip netns exec ns$i ip link set dev lo up
        ip netns exec ns$i sh add-services.sh
done

add-services.sh: #!/bin/bash
for((i=0;i<30000;i++)); do
        ipvsadm -A  -t 10.10.10.10:$((80+$i)) -s rr
done

runtest.sh: #!/bin/bash
for((i=1;i<4;i++));do
        ip netns exec ns$i ipvsadm -ln > /dev/null &
done
ip netns exec ns4 ipvsadm -ln > /dev/null

Run "sh init.sh" to initiate the network environment. Then run "time
./runtest.sh" to evaluate the time consumption. Our testbed is a 4-core
Intel Xeon ECS. The result of the original version is around 8 seconds,
while the result of the modified version is only 0.8 seconds.

Signed-off-by: Jiejian Wu <jiejian@linux.alibaba.com>
Co-developed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260224205048.4718-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:36:25 -08:00
Kuniyuki Iwashima
fc1f97929a bonding: Optimise is_netpoll_tx_blocked().
bond_start_xmit() spends some cycles in is_netpoll_tx_blocked():

  if (unlikely(is_netpoll_tx_blocked(dev)))
      return NETDEV_TX_BUSY;

because of the "pushf;pop reg" sequence (aka irqs_disabled()).

Let's swap the conditions in is_netpoll_tx_blocked() and
convert netpoll_block_tx to a static key.

Before:

   1.23 │       mov    %gs:0x28,%rax
   1.24 │       mov    %rax,0x18(%rsp)
  29.45 │       pushfq
   0.50 │       pop    %rax
   0.47 │       test   $0x200,%eax
        │     ↓ je     1b4
   0.49 │ 32:   lea    0x980(%rsi),%rbx

After:

   0.72 │       mov    %gs:0x28,%rax
   0.81 │       mov    %rax,0x18(%rsp)
   0.82 │       nop
   2.77 │ 2a:   lea    0x980(%rsi),%rbx

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260223230749.2376145-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 18:13:38 -08:00
Eric Dumazet
539a6cf084 tcp: move inet6_csk_update_pmtu() to tcp_ipv6.c
This function is only called from tcp_v6_mtu_reduced() and can be
(auto)inlined by the compiler.

Note that inet6_csk_route_socket() is no longer (auto)inlined,
which is a good thing as it is slow path.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1

add/remove: 0/2 grow/shrink: 2/0 up/down: 93/-129 (-36)
Function                                     old     new   delta
tcp_v6_mtu_reduced                           139     228     +89
inet6_csk_route_socket                       486     490      +4
__pfx_inet6_csk_update_pmtu                   16       -     -16
inet6_csk_update_pmtu                        113       -    -113
Total: Before=25076512, After=25076476, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223153047.886683-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 17:47:27 -08:00
Eric Dumazet
fcd3d039fa tcp: make tcp_v{4,6}_send_check() static
tcp_v{4,6}_send_check() are only called from tcp_output.c
and should be made static so that the compiler does not need
to put an out of line copy of them.

Remove (struct inet_connection_sock_af_ops) send_check field
and use instead @net_header_len.

Move @net_header_len close to @queue_xmit for data locality
as both are used in TCP tx fast path.

$ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3
add/remove: 0/2 grow/shrink: 0/3 up/down: 0/-172 (-172)
Function                                     old     new   delta
__tcp_transmit_skb                          3426    3423      -3
tcp_v4_send_check                            136     132      -4
mptcp_subflow_init                           777     763     -14
__pfx_tcp_v6_send_check                       16       -     -16
tcp_v6_send_check                            135       -    -135
Total: Before=25143196, After=25143024, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 17:16:09 -08:00
Eric Dumazet
255688652b tcp: move tcp_v6_send_check() to tcp_output.c
Move tcp_v6_send_check() so that __tcp_transmit_skb() can inline it.

$ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2
add/remove: 0/0 grow/shrink: 1/0 up/down: 105/0 (105)
Function                                     old     new   delta
__tcp_transmit_skb                          3321    3426    +105
Total: Before=25143091, After=25143196, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 17:16:09 -08:00
Eric Dumazet
bd5e5e1d41 tcp: inline __tcp_v4_send_check()
Inline __tcp_v4_send_check(), like __tcp_v6_send_check().

Move tcp_v4_send_check() to tcp_output.c close to
its fast path caller (__tcp_transmit_skb()).

Note __tcp_v4_send_check() is still out-of-line for tcp4_gso_segment()
because it is called in an unlikely() section.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-9 (-9)
Function                                     old     new   delta
__tcp_v4_send_check                          130     121      -9
Total: Before=25143100, After=25143091, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223100729.3761597-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 17:16:09 -08:00
Eric Dumazet
f033335937 udp: move udp6_csum_init() back to net/ipv6/udp.c
This function has a single caller in net/ipv6/udp.c.

Move it there so that the compiler can decide to (auto)inline
it if he prefers to. IBT glue is removed anyway.

With clang, we can see it was able to inline it and also
inlined one other helper at the same time.

UDPLITE removal will also help.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 1/0 up/down: 840/-785 (55)
Function                                     old     new   delta
__udp6_lib_rcv                              1247    2087    +840
__pfx_udp6_csum_init                          16       -     -16
udp6_csum_init                               769       -    -769
Total: Before=25074399, After=25074454, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223093445.3696368-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 16:30:40 -08:00
Eric Dumazet
2550def53b net: __lock_sock() can be static
After commit 6511882cdd ("mptcp: allocate fwd memory separately
on the rx and tx path") __lock_sock() can be static again.

Make sure __lock_sock() is not inlined, so that lock_sock_nested()
no longer needs a stack canary.

Add a noinline attribute on lock_sock_nested() so that calls
to lock_sock() from net/core/sock.c are not inlined,
none of them are fast path to deserve that:

 - sockopt_lock_sock()
 - sock_set_reuseport()
 - sock_set_reuseaddr()
 - sock_set_mark()
 - sock_set_keepalive()
 - sock_no_linger()
 - sock_bindtoindex()
 - sk_wait_data()
 - sock_set_rcvbuf()

$ scripts/bloat-o-meter -t vmlinux.old vmlinux
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-312 (-312)
Function                                     old     new   delta
__lock_sock                                  192     188      -4
__lock_sock_fast                             239      86    -153
lock_sock_nested                             227      72    -155
Total: Before=24888707, After=24888395, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260223092716.3673939-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24 16:30:33 -08:00
Paolo Abeni
1348659dc9 Merge tag 'for-net-2026-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - purge error queues in socket destructors
 - hci_sync: Fix CIS host feature condition
 - L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ
 - L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short
 - L2CAP: Fix response to L2CAP_ECRED_CONN_REQ
 - L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ
 - L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ
 - hci_qca: Cleanup on all setup failures

* tag 'for-net-2026-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ
  Bluetooth: L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ
  Bluetooth: Fix CIS host feature condition
  Bluetooth: L2CAP: Fix response to L2CAP_ECRED_CONN_REQ
  Bluetooth: hci_qca: Cleanup on all setup failures
  Bluetooth: purge error queues in socket destructors
  Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short
  Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ
====================

Link: https://patch.msgid.link/20260223211634.3800315-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-24 15:03:08 +01:00
Sebastian Andrzej Siewior
983512f3a8 net: Drop the lock in skb_may_tx_timestamp()
skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
not be taken in IRQ context, only softirq is okay. A few drivers receive
the timestamp via a dedicated interrupt and complete the TX timestamp
from that handler. This will lead to a deadlock if the lock is already
write-locked on the same CPU.

Taking the lock can be avoided. The socket (pointed by the skb) will
remain valid until the skb is released. The ->sk_socket and ->file
member will be set to NULL once the user closes the socket which may
happen before the timestamp arrives.
If we happen to observe the pointer while the socket is closing but
before the pointer is set to NULL then we may use it because both
pointer (and the file's cred member) are RCU freed.

Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a
matching WRITE_ONCE() where the pointer are cleared.

Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de
Fixes: b245be1f4d ("net-timestamp: no-payload only sysctl")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260220183858.N4ERjFW6@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-24 11:27:29 +01:00
Luiz Augusto von Dentz
c28d2bff70 Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short
Test L2CAP/ECFC/BV-26-C expect the response to L2CAP_ECRED_CONN_REQ with
and MTU value < L2CAP_ECRED_MIN_MTU (64) to be L2CAP_CR_LE_INVALID_PARAMS
rather than L2CAP_CR_LE_UNACCEPT_PARAMS.

Also fix not including the correct number of CIDs in the response since
the spec requires all CIDs being rejected to be included in the
response.

Link: https://github.com/bluez/bluez/issues/1868
Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 15:28:56 -05:00
Luiz Augusto von Dentz
7accb1c432 Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ
This fixes responding with an invalid result caused by checking the
wrong size of CID which should have been (cmd_len - sizeof(*req)) and
on top of it the wrong result was use L2CAP_CR_LE_INVALID_PARAMS which
is invalid/reserved for reconf when running test like L2CAP/ECFC/BI-03-C:

> ACL Data RX: Handle 64 flags 0x02 dlen 14
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
        MTU: 64
        MPS: 64
        Source CID: 64
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!        Result: Reserved (0x000c)
         Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003)

Fiix L2CAP/ECFC/BI-04-C which expects L2CAP_RECONF_INVALID_MPS (0x0002)
when more than one channel gets its MPS reduced:

> ACL Data RX: Handle 64 flags 0x02 dlen 16
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 8
        MTU: 264
        MPS: 99
        Source CID: 64
!       Source CID: 65
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!        Result: Reconfiguration successful (0x0000)
         Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)

Fix L2CAP/ECFC/BI-05-C when SCID is invalid (85 unconnected):

> ACL Data RX: Handle 64 flags 0x02 dlen 14
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
        MTU: 65
        MPS: 64
!        Source CID: 85
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!        Result: Reconfiguration successful (0x0000)
         Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003)

Fix L2CAP/ECFC/BI-06-C when MPS < L2CAP_ECRED_MIN_MPS (64):

> ACL Data RX: Handle 64 flags 0x02 dlen 14
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
        MTU: 672
!       MPS: 63
        Source CID: 64
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!       Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)
        Result: Reconfiguration failed - other unacceptable parameters (0x0004)

Fix L2CAP/ECFC/BI-07-C when MPS reduced for more than one channel:

> ACL Data RX: Handle 64 flags 0x02 dlen 16
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 3 len 8
        MTU: 84
!       MPS: 71
        Source CID: 64
!        Source CID: 65
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!       Result: Reconfiguration successful (0x0000)
        Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)

Link: https://github.com/bluez/bluez/issues/1865
Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 15:23:37 -05:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00