Commit Graph

79398 Commits

Author SHA1 Message Date
Benjamin Berg
98934687f8 wifi: mac80211: skip all known membership selectors
The GLK and EPD Selectors are also not rates, so add a new macro for the
minimum value of a selector and test against that instead of the entire
list. Also fix the typo in the EPD selector define.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.2c19a2dc53db.If187b7d93d8b43a6c70e422c837b7636538fb358@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:45 +01:00
Benjamin Berg
2e3de34f5c wifi: mac80211: log link information in ieee80211_determine_chan_mode
ieee80211_determine_chan_mode is called for each link and if there is a
downgrade, then it is interesting to know on which link it happened.
Pass through the link_id where relevant and use the new link_id_info
macro instead of sdata_info so that the link ID is printed when
relevant.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.d400da710fc4.I64775ec914603d3c7b0c6ea14b507c0370c11622@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:45 +01:00
Benjamin Berg
2996843287 wifi: mac80211: fix typo in HE MCS check
It printed the AP RX MCS value instead of the TX one.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.743c7c1914f4.I1e5888ac6c8324d078fe91d01da31daa76d0e328@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:44 +01:00
Benjamin Berg
c30e9a8558 wifi: mac80211: Remove unused basic_rates variable
The basic_rates variable was passed to mesh_sta_info_init as an out
parameter even though the result is not used. Passing NULL instead is
safe here, so do that.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.23a86a9bad0c.If79bc2c1c98d01cfb4c7e93c18b198fe6c6ea44c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:44 +01:00
Emmanuel Grumbach
00e3daadfe wifi: mac80211: improve stop/wake queue tracing
Add the refcount. This can be useful when we want to understand why a
queue stays stopped after it is woken.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.bd320c6e6702.I6ae0f19d922aea1f28236d72bf260acac428fc02@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:44 +01:00
Johannes Berg
2bf502251b wifi: cfg80211: check extended MLD capa/ops in assoc
Check that additionally extended MLD capa/ops for the MLD is
consistent, i.e. the same value is reported by all affiliated
APs/links.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.e29f42c7ae21.Ib2cdce608321ad154e4b13103cc315c3e3cb6b2b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:44 +01:00
Johannes Berg
993ace39ef wifi: cfg80211: scan: skip duplicate RNR entries
There really shouldn't be duplicate entries when we give
the list to the driver, and since we already have a list
it's easy to avoid.

While at it, remove the unnecessary allocation there.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.b0012c70f503.Id6fcad979434c1437340aa283abae2906345cca1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:44 +01:00
Johannes Berg
69f9556556 wifi: mac80211: mlme: improve messages from config_bw()
The ieee80211_config_bw() function is called in different
contexts: during association with the association response
and during beacon tracking with the beacon. This can be a
bit misleading, so disambiguate the messages for those.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.ee574cf7553b.Ie7c78877d20b5e9de4cce3cf8e4f1b9e0c7ee005@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:44 +01:00
Johannes Berg
f52de501d1 wifi: mac80211: reject per-band vendor elements with MLO
The MLME code doesn't currently handle adding vendor elements
correctly with multi-link due to element inheritance. Simply
prevent that for now completely, if someone needs it we can
fix this later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.bb82d3aaf6ef.Ib30573d0666430a3d7a905e513dfc661edf0bf65@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:44 +01:00
Emmanuel Grumbach
c3219371d6 wifi: mac80211: remove an unneeded check in Rx
Coverity pointed out that __ieee80211_rx_h_amsdu() checks if rx->sta is
NULL before dereferencing it but not always.

Since rx->sta can't be NULL at this point, just remove the check to
avoid confusion

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.033096029d0a.I0923387246a6152f589d278f27f27bce52daee79@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:43 +01:00
Johannes Berg
da7f40c05c wifi: mac80211: add some support for RX OMI power saving
In order to save power, it can be desirable to change the
RX operating mode using OMI to reduce the bandwidth. As the
handshake must be done in the HTC+ field, it cannot be done
by mac80211 directly, so expose functions to the driver to
request and finalize the necessary updates.

Note that RX OMI really only changes what the peer (AP) will
transmit to us, but in order to use it to actually save some
power (by reducing the listen bandwidth) we also update rate
scaling and then the channel context's mindef accordingly.

The updates are split into two in order to sequence them
correctly, when reducing bandwidth first reduce the rate
scaling and thus TX, then send OMI, then reduce the listen
bandwidth (chandef); when increasing bandwidth this is the
other way around. This also requires tracking in different
variables which part is applicable already.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250101070249.2c1a1934bd73.I4e90fd503504e37f9eac5bdae62e3f07e7071275@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 15:26:43 +01:00
Dr. David Alan Gilbert
aa93ca6f65 wifi: mac80211: Remove unused ieee80211_smps_is_restrictive
The last use of ieee80211_smps_is_restrictive() was removed in 2020 by
commit 52b4810bed ("mac80211: Remove support for changing AP SMPS mode")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20241226170119.108947-1-linux@treblig.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 14:01:23 +01:00
Michael-CY Lee
61dcfa8c2a wifi: cfg80211: copy multi-link element from the multi-link probe request's frame body to the generated elements
According to Draft P802.11be_D7.0 clause 35.3.4.2, if a multi-link
request requests an MLD with which an AP corresponding to the
nontransmitted BSSID, the corresponding multi-link probe response
shall carry a basic multi-mink element of that MLD in the frame body
of the multi-link probe response, whose location is outside of the
Multiple BSSID element carried in the frame.

Therefore additional handing is needed for parsing multi-link probe
response and generating the merged elements so that the MLD in the frame
body can be correctly copied to the generated elements. Otherwise, the
nontransmitted BSS looks like non-MLD.

Signed-off-by: Money Wang <money.wang@mediatek.com>
Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com>
Link: https://patch.msgid.link/20241225073725.847062-1-michael-cy.lee@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 13:53:53 +01:00
Miri Korenblit
687a7c8a72 wifi: mac80211: change disassoc sequence a bit
Currently, the sequence goes like this (among others):
1. flush all stations (including the AP ones) -> this will tell the
   drivers to remove the stations
2. notify the driver the vif is not associated.

Which means that in between 1 and 2, the state is that the vif is
associated, but there is no AP station, which makes no sense, and may be
problematic for some drivers (for example iwlwifi)

Change the sequence to:
1. flush the TDLS stations
2. move the AP station to IEEE80211_STA_NONE
3. notify the driver about the vif being unassociated
4. flush the AP station

In order to not break other drivers, add a vif flag to indicate whether
the driver wants to new sequence or not. If the flag is not set, then
things will be done in the old sequence.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241224192322.996ad1be6cb3.I7815d33415aa1d65c0120b54be7a15a45388f807@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 13:53:04 +01:00
Miri Korenblit
a10723ced4 wifi: mac80211: add an option to filter a sta from being flushed
Sometimes we might want to flush only part of the stations of a vif,
for example only the TDLS ones.
To allow this, add a do_not_flush_sta argument to __sta_info_flush,
which in turn, will not flush this station.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241224192322.535e1fcca192.Icecf7f443bf98c9535ce8ec03b24d0d17dfbc28e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 13:53:04 +01:00
Dr. David Alan Gilbert
98d1e33b09 wifi: mac80211: Clean up debugfs_key deadcode
The last use of ieee80211_debugfs_key_sta_del() was removed in 2007 by
commit 11a843b7e1 ("[MAC80211]: rework key handling")

The last use of ieee80211_debugfs_key_add_mgmt_default() was removed
in 2010 by
commit f7e0104c1a ("mac80211: support separate default keys")

The last use of ieee80211_debugfs_key_add_beacon_default() was
removed in 2020 by
commit e5473e80d4 ("mac80211: Support BIGTK configuration for Beacon
protection")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20241224013257.185742-2-linux@treblig.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13 13:51:19 +01:00
David S. Miller
7b24f164cf Merge tag 'ipsec-next-2025-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
ipsec-next-2025-01-09

1) Implement the AGGFRAG protocol and basic IP-TFS (RFC9347) functionality.
   From Christian Hopps.

2) Support ESN context update to hardware for TX.
   From Jianbo Liu.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-10 09:15:17 +00:00
Jakub Kicinski
14ea4cd1b1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc7).

Conflicts:
  a42d71e322 ("net_sched: sch_cake: Add drop reasons")
  737d4d91d3 ("sched: sch_cake: add bounds checks to host bulk flow fairness counts")

Adjacent changes:

drivers/net/ethernet/meta/fbnic/fbnic.h
  3a856ab347 ("eth: fbnic: add IRQ reuse support")
  95978931d5 ("eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 16:11:47 -08:00
Jakub Kicinski
b5cf67a8f7 Merge tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix imbalance between flowtable BIND and UNBIND calls to configure
   hardware offload, this fixes a possible kmemleak.

2) Clamp maximum conntrack hashtable size to INT_MAX to fix a possible
   WARN_ON_ONCE splat coming from kvmalloc_array(), only possible from
   init_netns.

* tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: conntrack: clamp maximum hashtable size to INT_MAX
  netfilter: nf_tables: imbalance in flowtable binding
====================

Link: https://patch.msgid.link/20250109123532.41768-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:54:49 -08:00
Matthieu Baerts (NGI0)
7f5611cbc4 rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).

Fixes: c6a58ffed5 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:35 -08:00
Matthieu Baerts (NGI0)
6259d2484d sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is
used.

Fixes: d1e462a7a5 ("sctp: add probe_interval in sysctl and sock/asoc/transport")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-8-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:35 -08:00
Matthieu Baerts (NGI0)
c10377bbc1 sctp: sysctl: udp_port: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.

Fixes: 046c052b47 ("sctp: enable udp tunneling socks")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-7-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:35 -08:00
Matthieu Baerts (NGI0)
15649fd541 sctp: sysctl: auth_enable: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.

Fixes: b14878ccb7 ("net: sctp: cache auth_enable per endpoint")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-6-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:35 -08:00
Matthieu Baerts (NGI0)
9fc17b76fc sctp: sysctl: rto_min/max: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.

Fixes: 4f3fdf3bc5 ("sctp: add check rto_min and rto_max in sysctl")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-5-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:34 -08:00
Matthieu Baerts (NGI0)
ea62dd1383 sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' structure can be obtained from the table->data using
container_of().

Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is
used.

Fixes: 3c68198e75 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-4-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:34 -08:00
Matthieu Baerts (NGI0)
92cf7a51bd mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'pernet' structure can be obtained from the table->data using
container_of().

Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-3-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:34 -08:00
Matthieu Baerts (NGI0)
d38e26e362 mptcp: sysctl: sched: avoid using current->nsproxy
Using the 'net' structure via 'current' is not recommended for different
reasons.

First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table->data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.

Another reason is that access to current->nsproxy->netns can oops if
attempted when current->nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
  CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206

  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
   __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
   __kernel_write+0xf6/0x140 fs/read_write.c:632
   do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
   acct_pin_kill+0x2d/0x100 kernel/acct.c:192
   pin_kill+0x194/0x7c0 fs/fs_pin.c:44
   mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
   cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
   task_work_run+0x14e/0x250 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xad8/0x2d70 kernel/exit.c:938
   do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
   get_signal+0x2576/0x2610 kernel/signal.c:3017
   arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
   do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fee3cb87a6a
  Code: Unable to access opcode bytes at 0x7fee3cb87a40.
  RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
  RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
  RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
  R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
  R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
   </TASK>
  Modules linked in:
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  ----------------
  Code disassembly (best guess), 1 bytes skipped:
     0:	42 80 3c 38 00       	cmpb   $0x0,(%rax,%r15,1)
     5:	0f 85 fe 02 00 00    	jne    0x309
     b:	4d 8b a4 24 08 09 00 	mov    0x908(%r12),%r12
    12:	00
    13:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
    1a:	fc ff df
    1d:	49 8d 7c 24 28       	lea    0x28(%r12),%rdi
    22:	48 89 fa             	mov    %rdi,%rdx
    25:	48 c1 ea 03          	shr    $0x3,%rdx
  * 29:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
    2d:	0f 85 cc 02 00 00    	jne    0x2ff
    33:	4d 8b 7c 24 28       	mov    0x28(%r12),%r15
    38:	48                   	rex.W
    39:	8d                   	.byte 0x8d
    3a:	84 24 c8             	test   %ah,(%rax,%rcx,8)

Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table->data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.

Fixes: 6963c508fd ("mptcp: only allow set existing scheduler for net.mptcp.scheduler")
Cc: stable@vger.kernel.org
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:34 -08:00
Matthieu Baerts (NGI0)
771ec78dc8 mptcp: sysctl: avail sched: remove write access
'net.mptcp.available_schedulers' sysctl knob is there to list available
schedulers, not to modify this list.

There are then no reasons to give write access to it.

Nothing would have been written anyway, but no errors would have been
returned, which is unexpected.

Fixes: 73c900aa36 ("mptcp: add net.mptcp.available_schedulers")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-1-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:53:34 -08:00
Toke Høiland-Jørgensen
737d4d91d3 sched: sch_cake: add bounds checks to host bulk flow fairness counts
Even though we fixed a logic error in the commit cited below, syzbot
still managed to trigger an underflow of the per-host bulk flow
counters, leading to an out of bounds memory access.

To avoid any such logic errors causing out of bounds memory accesses,
this commit factors out all accesses to the per-host bulk flow counters
to a series of helpers that perform bounds-checking before any
increments and decrements. This also has the benefit of improving
readability by moving the conditional checks for the flow mode into
these helpers, instead of having them spread out throughout the
code (which was the cause of the original logic error).

As part of this change, the flow quantum calculation is consolidated
into a helper function, which means that the dithering applied to the
ost load scaling is now applied both in the DRR rotation and when a
sparse flow's quantum is first initiated. The only user-visible effect
of this is that the maximum packet size that can be sent while a flow
stays sparse will now vary with +/- one byte in some cases. This should
not make a noticeable difference in practice, and thus it's not worth
complicating the code to preserve the old behaviour.

Fixes: 546ea84d07 ("sched: sch_cake: fix bulk flow accounting logic for host fairness")
Reported-by: syzbot+f63600d288bfb7057424@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Link: https://patch.msgid.link/20250107120105.70685-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 08:18:41 -08:00
Jakub Kicinski
0b7bdc7fab netdev: define NETDEV_INTERNAL
Linus suggested during one of past maintainer summits (in context of
a DMA_BUF discussion) that symbol namespaces can be used to prevent
unwelcome but in-tree code from using all exported functions.
Create a namespace for netdev.

Export netdev_rx_queue_restart(), drivers may want to use it since
it gives them a simple and safe way to restart a queue to apply
config changes. But it's both too low level and too actively developed
to be used outside netdev.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Jakub Kicinski
d6c7b03497 net: make sure we retain NAPI ordering on netdev->napi_list
Netlink code depends on NAPI instances being sorted by ID on
the netdev list for dump continuation. We need to be able to
find the position on the list where we left off if dump does
not fit in a single skb, and in the meantime NAPI instances
can come and go.

This was trivially true when we were assigning a new ID to every
new NAPI instance. Since we added the NAPI config API, we try
to retain the ID previously used for the same queue, but still
add the new NAPI instance at the start of the list.

This is fine if we reset the entire netdev and all NAPIs get
removed and added back. If driver replaces a NAPI instance
during an operation like DEVMEM queue reset, or recreates
a subset of NAPI instances in other ways we may end up with
broken ordering, and therefore Netlink dumps with either
missing or duplicated entries.

At this stage the problem is theoretical. Only two drivers
support queue API, bnxt and gve. gve recreates NAPIs during
queue reset, but it doesn't support NAPI config.
bnxt supports NAPI config but doesn't recreate instances
during reset.

We need to save the ID in the config as soon as it is assigned
because otherwise the new NAPI will not know what ID it will
get at enable time, at the time it is being added.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 15:33:08 +01:00
Pablo Neira Ayuso
b541ba7d1f netfilter: conntrack: clamp maximum hashtable size to INT_MAX
Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
resizing hashtable because __GFP_NOWARN is unset. See:

  0708a0afe2 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")

Note: hashtable resize is only possible from init_netns.

Fixes: 9cc1c73ad6 ("netfilter: conntrack: avoid integer overflow when resizing")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-09 13:29:45 +01:00
Pablo Neira Ayuso
13210fc63f netfilter: nf_tables: imbalance in flowtable binding
All these cases cause imbalance between BIND and UNBIND calls:

- Delete an interface from a flowtable with multiple interfaces

- Add a (device to a) flowtable with --check flag

- Delete a netns containing a flowtable

- In an interactive nft session, create a table with owner flag and
  flowtable inside, then quit.

Fix it by calling FLOW_BLOCK_UNBIND when unregistering hooks, then
remove late FLOW_BLOCK_UNBIND call when destroying flowtable.

Fixes: ff4bf2f42a ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: Phil Sutter <phil@nwl.cc>
Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-09 13:29:38 +01:00
Eric Dumazet
a3b3d2dc38 net: hsr: remove synchronize_rcu() from hsr_add_port()
A synchronize_rcu() was added by mistake in commit
c5a7591172 ("net/hsr: Use list_head (and rcu) instead
of array for slave devices.")

RCU does not mandate to observe a grace period after
list_add_tail_rcu().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250107144701.503884-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 13:24:57 +01:00
Eric Dumazet
2170a1f091 net: no longer reset transport_header in __netif_receive_skb_core()
In commit 66e4c8d950 ("net: warn if transport header was not set")
I added a debug check in skb_transport_header() to detect
if a caller expects the transport_header to be set to a meaningful
value by a prior code path.

Unfortunately, __netif_receive_skb_core() resets the transport header
to the same value than the network header, defeating this check
in receive paths.

Pretending the transport and network headers are the same
is usually wrong.

This patch removes this reset for CONFIG_DEBUG_NET=y builds
to let fuzzers and CI find bugs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250107144342.499759-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 13:18:41 +01:00
Yuyang Huang
33d97a07b3 netlink: add IPv6 anycast join/leave notifications
This change introduces a mechanism for notifying userspace
applications about changes to IPv6 anycast addresses via netlink. It
includes:

* Addition and deletion of IPv6 anycast addresses are reported using
  RTM_NEWANYCAST and RTM_DELANYCAST.
* A new netlink group (RTNLGRP_IPV6_ACADDR) for subscribing to these
  notifications.

This enables user space applications(e.g. ip monitor) to efficiently
track anycast addresses through netlink messages, improving metrics
collection and system monitoring. It also unlocks the potential for
advanced anycast management in user space, such as hardware offload
control and fine grained network control.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250107114355.1766086-1-yuyanghuang@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 12:54:45 +01:00
Jakub Kicinski
6730ee8f08 Merge tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - btmtk: Fix failed to send func ctrl for MediaTek devices.
 - hci_sync: Fix not setting Random Address when required
 - MGMT: Fix Add Device to responding before completing
 - btnxpuart: Fix driver sending truncated data

* tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
  Bluetooth: btnxpuart: Fix driver sending truncated data
  Bluetooth: MGMT: Fix Add Device to responding before completing
  Bluetooth: hci_sync: Fix not setting Random Address when required
====================

Link: https://patch.msgid.link/20250108162627.1623760-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08 19:08:19 -08:00
Daniel Borkmann
80fb40baba tcp: Annotate data-race around sk->sk_mark in tcp_v4_send_reset
This is a follow-up to 3c5b4d69c3 ("net: annotate data-races around
sk->sk_mark"). sk->sk_mark can be read and written without holding
the socket lock. IPv6 equivalent is already covered with READ_ONCE()
annotation in tcp_v6_send_response().

Fixes: 3c5b4d69c3 ("net: annotate data-races around sk->sk_mark")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/f459d1fc44f205e13f6d8bdca2c8bfb9902ffac9.1736244569.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08 10:22:02 -08:00
Jakub Kicinski
d1cacd7477 netdev: prevent accessing NAPI instances from another namespace
The NAPI IDs were not fully exposed to user space prior to the netlink
API, so they were never namespaced. The netlink API must ensure that
at the very least NAPI instance belongs to the same netns as the owner
of the genl sock.

napi_by_id() can become static now, but it needs to move because of
dev_get_by_napi_id().

Cc: stable@vger.kernel.org
Fixes: 1287c1ae0f ("netdev-genl: Support setting per-NAPI config values")
Fixes: 27f91aaf49 ("netdev-genl: Add netlink framework functions for napi")
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250106180137.1861472-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-08 10:21:00 -08:00
Chris Lu
67dba2c28f Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
Use usb_autopm_get_interface() and usb_autopm_put_interface()
in btmtk_usb_shutdown(), it could send func ctrl after enabling
autosuspend.

Bluetooth: btmtk_usb_hci_wmt_sync() hci0: Execution of wmt command
           timed out
Bluetooth: btmtk_usb_shutdown() hci0: Failed to send wmt func ctrl
           (-110)

Fixes: 5c5e8c52e3 ("Bluetooth: btmtk: move btusb_mtk_[setup, shutdown] to btmtk.c")
Signed-off-by: Chris Lu <chris.lu@mediatek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-08 11:14:03 -05:00
Luiz Augusto von Dentz
a182d9c84f Bluetooth: MGMT: Fix Add Device to responding before completing
Add Device with LE type requires updating resolving/accept list which
requires quite a number of commands to complete and each of them may
fail, so instead of pretending it would always work this checks the
return of hci_update_passive_scan_sync which indicates if everything
worked as intended.

Fixes: e8907f7654 ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-08 11:14:01 -05:00
Luiz Augusto von Dentz
c2994b0084 Bluetooth: hci_sync: Fix not setting Random Address when required
This fixes errors such as the following when Own address type is set to
Random Address but it has not been programmed yet due to either be
advertising or connecting:

< HCI Command: LE Set Exte.. (0x08|0x0041) plen 13
        Own address type: Random (0x03)
        Filter policy: Ignore not in accept list (0x01)
        PHYs: 0x05
        Entry 0: LE 1M
          Type: Passive (0x00)
          Interval: 60.000 msec (0x0060)
          Window: 30.000 msec (0x0030)
        Entry 1: LE Coded
          Type: Passive (0x00)
          Interval: 180.000 msec (0x0120)
          Window: 90.000 msec (0x0090)
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Exten.. (0x08|0x0042) plen 6
        Extended scan: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
        Duration: 0 msec (0x0000)
        Period: 0.00 sec (0x0000)
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
        Status: Invalid HCI Command Parameters (0x12)

Fixes: c45074d68a ("Bluetooth: Fix not generating RPA when required")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-08 11:14:00 -05:00
Russell King (Oracle)
60c6e3a592 net: dsa: no longer call ds->ops->get_mac_eee()
All implementations of get_mac_eee() now just return zero without doing
anything useful. Remove the call to this method in preparation to
removing the method from each DSA driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUlkz-007Uyl-UA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 18:06:17 -08:00
Kuniyuki Iwashima
cb358ff941 ipvlan: Fix use-after-free in ipvlan_get_iflink().
syzbot presented an use-after-free report [0] regarding ipvlan and
linkwatch.

ipvlan does not hold a refcnt of the lower device unlike vlan and
macvlan.

If the linkwatch work is triggered for the ipvlan dev, the lower dev
might have already been freed, resulting in UAF of ipvlan->phy_dev in
ipvlan_get_iflink().

We can delay the lower dev unregistration like vlan and macvlan by
holding the lower dev's refcnt in dev->netdev_ops->ndo_init() and
releasing it in dev->priv_destructor().

Jakub pointed out calling .ndo_XXX after unregister_netdevice() has
returned is error prone and suggested [1] addressing this UAF in the
core by taking commit 750e516033 ("net: avoid potential UAF in
default_operstate()") further.

Let's assume unregistering devices DOWN and use RCU protection in
default_operstate() not to race with the device unregistration.

[0]:
BUG: KASAN: slab-use-after-free in ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
Read of size 4 at addr ffff0000d768c0e0 by task kworker/u8:35/6944

CPU: 0 UID: 0 PID: 6944 Comm: kworker/u8:35 Not tainted 6.13.0-rc2-g9bc5c9515b48 #12 4c3cb9e8b4565456f6a355f312ff91f4f29b3c47
Hardware name: linux,dummy-virt (DT)
Workqueue: events_unbound linkwatch_event
Call trace:
 show_stack+0x38/0x50 arch/arm64/kernel/stacktrace.c:484 (C)
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xbc/0x108 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x16c/0x6f0 mm/kasan/report.c:489
 kasan_report+0xc0/0x120 mm/kasan/report.c:602
 __asan_report_load4_noabort+0x20/0x30 mm/kasan/report_generic.c:380
 ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
 dev_get_iflink+0x7c/0xd8 net/core/dev.c:674
 default_operstate net/core/link_watch.c:45 [inline]
 rfc2863_policy+0x144/0x360 net/core/link_watch.c:72
 linkwatch_do_dev+0x60/0x228 net/core/link_watch.c:175
 __linkwatch_run_queue+0x2f4/0x5b8 net/core/link_watch.c:239
 linkwatch_event+0x64/0xa8 net/core/link_watch.c:282
 process_one_work+0x700/0x1398 kernel/workqueue.c:3229
 process_scheduled_works kernel/workqueue.c:3310 [inline]
 worker_thread+0x8c4/0xe10 kernel/workqueue.c:3391
 kthread+0x2b0/0x360 kernel/kthread.c:389
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862

Allocated by task 9303:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x30/0x68 mm/kasan/common.c:68
 kasan_save_alloc_info+0x44/0x58 mm/kasan/generic.c:568
 poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
 __kasan_kmalloc+0x84/0xa0 mm/kasan/common.c:394
 kasan_kmalloc include/linux/kasan.h:260 [inline]
 __do_kmalloc_node mm/slub.c:4283 [inline]
 __kmalloc_node_noprof+0x2a0/0x560 mm/slub.c:4289
 __kvmalloc_node_noprof+0x9c/0x230 mm/util.c:650
 alloc_netdev_mqs+0xb4/0x1118 net/core/dev.c:11209
 rtnl_create_link+0x2b8/0xb60 net/core/rtnetlink.c:3595
 rtnl_newlink_create+0x19c/0x868 net/core/rtnetlink.c:3771
 __rtnl_newlink net/core/rtnetlink.c:3896 [inline]
 rtnl_newlink+0x122c/0x15c0 net/core/rtnetlink.c:4011
 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
 netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg net/socket.c:726 [inline]
 __sys_sendto+0x2ec/0x438 net/socket.c:2197
 __do_sys_sendto net/socket.c:2204 [inline]
 __se_sys_sendto net/socket.c:2200 [inline]
 __arm64_sys_sendto+0xe4/0x110 net/socket.c:2200
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600

Freed by task 10200:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x30/0x68 mm/kasan/common.c:68
 kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:582
 poison_slab_object mm/kasan/common.c:247 [inline]
 __kasan_slab_free+0x48/0x68 mm/kasan/common.c:264
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2338 [inline]
 slab_free mm/slub.c:4598 [inline]
 kfree+0x140/0x420 mm/slub.c:4746
 kvfree+0x4c/0x68 mm/util.c:693
 netdev_release+0x94/0xc8 net/core/net-sysfs.c:2034
 device_release+0x98/0x1c0
 kobject_cleanup lib/kobject.c:689 [inline]
 kobject_release lib/kobject.c:720 [inline]
 kref_put include/linux/kref.h:65 [inline]
 kobject_put+0x2b0/0x438 lib/kobject.c:737
 netdev_run_todo+0xdd8/0xf48 net/core/dev.c:10924
 rtnl_unlock net/core/rtnetlink.c:152 [inline]
 rtnl_net_unlock net/core/rtnetlink.c:209 [inline]
 rtnl_dellink+0x484/0x680 net/core/rtnetlink.c:3526
 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
 netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg net/socket.c:726 [inline]
 ____sys_sendmsg+0x410/0x708 net/socket.c:2583
 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2637
 __sys_sendmsg net/socket.c:2669 [inline]
 __do_sys_sendmsg net/socket.c:2674 [inline]
 __se_sys_sendmsg net/socket.c:2672 [inline]
 __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2672
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600

The buggy address belongs to the object at ffff0000d768c000
 which belongs to the cache kmalloc-cg-4k of size 4096
The buggy address is located 224 bytes inside of
 freed 4096-byte region [ffff0000d768c000, ffff0000d768d000)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x117688
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff0000c77ef981
flags: 0xbfffe0000000040(head|node=0|zone=2|lastcpupid=0x1ffff)
page_type: f5(slab)
raw: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
head: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000003 fffffdffc35da201 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff0000d768bf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff0000d768c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff0000d768c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
 ffff0000d768c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff0000d768c180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 8c55facecd ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250102174400.085fd8ac@kernel.org/ [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106071911.64355-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 17:50:49 -08:00
Kuniyuki Iwashima
7fb1073300 net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
(un)?register_netdevice_notifier_dev_net() hold RTNL before triggering
the notifier for all netdev in the netns.

Let's convert the RTNL to rtnl_net_lock().

Note that move_netdevice_notifiers_dev_net() is assumed to be (but not
yet) protected by per-netns RTNL of both src and dst netns; we need to
convert wireless and hyperv drivers that call dev_change_net_namespace().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106070751.63146-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 17:49:20 -08:00
Kuniyuki Iwashima
ca779f4065 net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net().
(un)?register_netdevice_notifier_net() hold RTNL before triggering the
notifier for all netdev in the netns.

Let's convert the RTNL to rtnl_net_lock().

Note that the per-netns netdev notifier is protected by per-netns RTNL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106070751.63146-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 17:49:20 -08:00
Kuniyuki Iwashima
a239e06250 net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier().
(un)?register_netdevice_notifier() hold pernet_ops_rwsem and RTNL,
iterate all netns, and trigger the notifier for all netdev.

Let's hold __rtnl_net_lock() before triggering the notifier.

Note that we will need protection for netdev_chain when RTNL is
removed.  (e.g. blocking_notifier conversion [0] with a lockdep
annotation [1])

Link: https://lore.kernel.org/netdev/20250104063735.36945-2-kuniyu@amazon.com/ [0]
Link: https://lore.kernel.org/netdev/20250105075957.67334-1-kuniyu@amazon.com/ [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106070751.63146-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 17:49:19 -08:00
Eric Dumazet
1b960cd193 net: watchdog: rename __dev_watchdog_up() and dev_watchdog_down()
In commit d7811e623d ("[NET]: Drop tx lock in dev_watchdog_up")
dev_watchdog_up() became a simple wrapper for __netdev_watchdog_up()

Herbert also said : "In 2.6.19 we can eliminate the unnecessary
__dev_watchdog_up and replace it with dev_watchdog_up."

This patch consolidates things to have only two functions, with
a common prefix.

- netdev_watchdog_up(), exported for the sake of one freescale driver.
  This replaces __netdev_watchdog_up() and dev_watchdog_up().

- netdev_watchdog_down(), static to net/sched/sch_generic.c
  This replaces dev_watchdog_down().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250105090924.1661822-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 17:43:01 -08:00
Benjamin Coddington
b341ca51d2 tls: Fix tls_sw_sendmsg error handling
We've noticed that NFS can hang when using RPC over TLS on an unstable
connection, and investigation shows that the RPC layer is stuck in a tight
loop attempting to transmit, but forever getting -EBADMSG back from the
underlying network.  The loop begins when tcp_sendmsg_locked() returns
-EPIPE to tls_tx_records(), but that error is converted to -EBADMSG when
calling the socket's error reporting handler.

Instead of converting errors from tcp_sendmsg_locked(), let's pass them
along in this path.  The RPC layer handles -EPIPE by reconnecting the
transport, which prevents the endless attempts to transmit on a broken
connection.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Link: https://patch.msgid.link/9594185559881679d81f071b181a10eb07cd079f.1736004079.git.bcodding@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 17:00:19 -08:00
Ted Chen
a1942da8a3 bridge: Make br_is_nd_neigh_msg() accept pointer to "const struct sk_buff"
The skb_buff struct in br_is_nd_neigh_msg() is never modified. Mark it as
const.

Signed-off-by: Ted Chen <znscnchen@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250104083846.71612-1-znscnchen@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07 15:13:10 +01:00