Right now we have a broken sequence where we enable DMA channel interrupts
which can be left enabled and never disabled if we hit an error path.
Worse still when we unload the driver, the DMA channel interrupt bits are
left intact. About the only saving grace here is that we do remember to
disable the wcnss interrupt when we unload the driver.
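The intended pairing can be sketched in plain C as follows. This is a minimal userspace illustration, not the driver code: the mask register variable, the bit values and all function names are hypothetical stand-ins.

```c
#include <stdint.h>

/* Hypothetical stand-in for the DXE interrupt mask register. */
uint32_t int_mask_reg;

/* Hypothetical per-channel mask bits; the real values may differ. */
#define CH_TX_L_BIT (1u << 0)
#define CH_TX_H_BIT (1u << 1)
#define CH_RX_L_BIT (1u << 2)
#define CH_RX_H_BIT (1u << 3)
#define CH_ALL_BITS (CH_TX_L_BIT | CH_TX_H_BIT | CH_RX_L_BIT | CH_RX_H_BIT)

void chan_irq_enable(uint32_t bits)  { int_mask_reg |= bits; }
void chan_irq_disable(uint32_t bits) { int_mask_reg &= ~bits; }

/*
 * Enable the channel interrupts, then run a later init step.
 * If that step fails, the error path undoes the enables so that
 * no channel interrupt bit is left set behind our back.
 */
int dxe_init_sketch(int later_step_fails)
{
	chan_irq_enable(CH_ALL_BITS);

	if (later_step_fails)
		goto out_disable;

	return 0;

out_disable:
	chan_irq_disable(CH_ALL_BITS);
	return -1;
}

/* Unload path: clear the channel interrupt bits as well. */
void dxe_deinit_sketch(void)
{
	chan_irq_disable(CH_ALL_BITS);
}
```

The point of the sketch is only that every enable has a matching disable on both the error path and the unload path.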
Fixes: 8e84c25821 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211105122152.1580542-2-bryan.odonoghue@linaro.org
Firmware can trigger a missed beacon indication; this is not the same as a
lost signal.
Flag to Linux the missed beacon and let the WiFi stack decide for itself if
the link is up or down by sending its own probe to determine this.
We should only be signalling the link is lost when the firmware indicates
Fixes: 8e84c25821 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211027232529.657764-1-bryan.odonoghue@linaro.org
An SMD capture from the downstream prima driver on WCN3680B shows the
following command sequence for connected scans:
- init_scan_req
- start_scan_req, channel 1
- end_scan_req, channel 1
- start_scan_req, channel 2
- ...
- end_scan_req, channel 3
- finish_scan_req
- init_scan_req
- start_scan_req, channel 4
- ...
- end_scan_req, channel 6
- finish_scan_req
- ...
- end_scan_req, channel 165
- finish_scan_req
Upstream currently never calls wcn36xx_smd_end_scan, and in some cases[1]
still sends finish_scan_req twice in a row or before init_scan_req. A
typical connected scan looks like this:
- init_scan_req
- start_scan_req, channel 1
- finish_scan_req
- init_scan_req
- start_scan_req, channel 2
- ...
- start_scan_req, channel 165
- finish_scan_req
- finish_scan_req
This patch cleans up scanning so that init/finish and start/end are always
paired together and correctly nested.
- init_scan_req
- start_scan_req, channel 1
- end_scan_req, channel 1
- finish_scan_req
- init_scan_req
- start_scan_req, channel 2
- end_scan_req, channel 2
- ...
- start_scan_req, channel 165
- end_scan_req, channel 165
- finish_scan_req
Note that upstream will not do batching of 3 active-probe scans before
returning to the operating channel, and this patch does not change that.
To match downstream in this aspect, adjust IEEE80211_PROBE_DELAY and/or
the 125ms max off-channel time in ieee80211_scan_state_decision.
[1]: commit d195d7aac0 ("wcn36xx: Ensure finish scan is not requested
before start scan") addressed one case of finish_scan_req being sent
without a preceding init_scan_req (the case of the operating channel
coinciding with the first scan channel); two other cases are:
1) if SW scan is started and aborted immediately, without scanning any
channels, we send a finish_scan_req without ever sending init_scan_req,
and
2) as SW scan logic always returns us to the operating channel before
calling wcn36xx_sw_scan_complete, finish_scan_req is always sent twice
at the end of a SW scan
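The pairing rule described above (init/finish around start/end, correctly nested) can be expressed as a small state machine. The following is only a sketch for checking a command trace; the enum and function names are made up for illustration and do not exist in the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the four SMD scan commands. */
enum scan_cmd { INIT_SCAN, START_SCAN, END_SCAN, FINISH_SCAN };

enum scan_state { SCAN_IDLE, SCAN_INITED, SCAN_ON_CHANNEL };

/*
 * Returns true if the trace keeps init/finish and start/end paired
 * and nested: start/end only inside init/finish, no finish without
 * a preceding init, and nothing left open at the end.
 */
bool scan_sequence_ok(const enum scan_cmd *cmds, size_t n)
{
	enum scan_state st = SCAN_IDLE;

	for (size_t i = 0; i < n; i++) {
		switch (cmds[i]) {
		case INIT_SCAN:
			if (st != SCAN_IDLE)
				return false;
			st = SCAN_INITED;
			break;
		case START_SCAN:
			if (st != SCAN_INITED)
				return false;
			st = SCAN_ON_CHANNEL;
			break;
		case END_SCAN:
			if (st != SCAN_ON_CHANNEL)
				return false;
			st = SCAN_INITED;
			break;
		case FINISH_SCAN:
			if (st != SCAN_INITED)
				return false;
			st = SCAN_IDLE;
			break;
		}
	}
	return st == SCAN_IDLE;
}
```

With this check, the old upstream trace (start_scan_req followed directly by finish_scan_req) and the doubled finish_scan_req are both rejected, while the downstream batches of several start/end pairs inside one init/finish are accepted.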
Fixes: 8e84c25821 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Benjamin Li <benl@squareup.com>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211027170306.555535-4-benl@squareup.com
Without ieee80211_ops->flush implemented to empty HW queues, mac80211 will
do a 100ms dead wait after stopping SW queues, before leaving the operating
channel to resume a software connected scan[1].
(see ieee80211_scan_state_resume)
This wait is correctly included in the calculation for whether or not
we've exceeded max off-channel time, as it occurs after sending the null
frame with PS bit set. Thus, with 125 ms max off-channel time we only
have 25 ms of scan time, which technically isn't even enough to scan one
channel (although mac80211 always scans at least one channel per off-
channel window).
Moreover, for passive probes we end up spending at least 100 ms + 111 ms
(IEEE80211_PASSIVE_CHANNEL_TIME) "off-channel"[2], which exceeds the listen
interval of 200 ms that we provide in our association request frame. That's
technically out-of-spec.
[1]: Until recently, wcn36xx performed software (rather than FW-offloaded)
scanning when 5GHz channels are requested. This apparent limitation is now
resolved -- see commit 1395f8a6a4d5 ("wcn36xx: Enable hardware scan offload
for 5Ghz band").
[2]: in quotes because about 100 ms of it is still on-channel but with PS
set
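The arithmetic above can be written out as a short sketch. The constants are the values quoted in this message; the enum and function names are illustrative only.

```c
/* Values quoted in the text above, all in milliseconds. */
enum {
	FLUSH_DEAD_WAIT_MS = 100, /* wait when ieee80211_ops->flush is missing */
	MAX_OFF_CHANNEL_MS = 125, /* max off-channel time */
	PASSIVE_CHANNEL_MS = 111, /* IEEE80211_PASSIVE_CHANNEL_TIME */
	LISTEN_INTERVAL_MS = 200  /* listen interval as stated above */
};

/* Time left for actual scanning once the dead wait is accounted for. */
int scan_budget_ms(void)
{
	return MAX_OFF_CHANNEL_MS - FLUSH_DEAD_WAIT_MS;
}

/* Minimum time spent "off-channel" for one passive channel. */
int passive_away_ms(void)
{
	return FLUSH_DEAD_WAIT_MS + PASSIVE_CHANNEL_MS;
}

/* Non-zero if a single passive dwell already exceeds the listen interval. */
int exceeds_listen_interval(void)
{
	return passive_away_ms() > LISTEN_INTERVAL_MS;
}
```

This gives 25 ms of scan budget and 211 ms away for a passive channel, which is above the 200 ms listen interval.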
Signed-off-by: Benjamin Li <benl@squareup.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211027170306.555535-3-benl@squareup.com
ATH10K chips are used in a wide range of routers,
access points, range extenders, and network appliances.
On these embedded devices, calibration data is often
stored on the main system's flash and was out of reach
for the driver.
To bridge this gap, ath10k is getting extended to pull
the (pre-)calibration data through nvmem subsystem.
To do this, an nvmem-cell containing the information can
either be specified in the platform data or via device-tree.
Tested with:
Netgear EX6150v2 (IPQ4018 - pre-calibration method)
TP-Link Archer C7 v2 (QCA9880v2 - old calibration method)
Cc: Robert Marko <robimarko@gmail.com>
Cc: Thibaut VARÈNE <hacks@slashdirt.org>
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211016234609.1568317-1-chunkeey@gmail.com
Commit 6f4d70308e ("ath11k: support SMPS configuration for 6 GHz") changed
"if (ht_cap & WMI_HT_CAP_DYNAMIC_SMPS)" to "if (ht_cap &
WMI_HT_CAP_DYNAMIC_SMPS || ar->supports_6ghz)" which means
NL80211_FEATURE_DYNAMIC_SMPS is enabled for all chips which support 6 GHz.
However, WCN6855 supports 6 GHz but it does not support feature
NL80211_FEATURE_DYNAMIC_SMPS, and this can lead to MU-MIMO test failures for
WCN6855.
Disable NL80211_FEATURE_DYNAMIC_SMPS for WCN6855 since its ht_cap does not
support WMI_HT_CAP_DYNAMIC_SMPS. Enable the feature only on QCN9074 as that's
the only other device supporting 6 GHz band.
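The change in the condition can be sketched as below. This is an illustration under assumptions: the struct, the bit value and the per-hardware flag name are hypothetical and only model the logic described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value standing in for WMI_HT_CAP_DYNAMIC_SMPS. */
#define HT_CAP_DYNAMIC_SMPS_SKETCH (1u << 0)

struct hw_sketch {
	uint32_t ht_cap;
	bool supports_6ghz;
	bool dynamic_smps_6ghz; /* hypothetical per-hardware opt-in flag */
};

/* Rule before the fix: any 6 GHz capable chip gets the feature. */
bool smps_feature_old(const struct hw_sketch *hw)
{
	return (hw->ht_cap & HT_CAP_DYNAMIC_SMPS_SKETCH) || hw->supports_6ghz;
}

/* Rule after the fix: 6 GHz chips get it only if they opt in. */
bool smps_feature_new(const struct hw_sketch *hw)
{
	return (hw->ht_cap & HT_CAP_DYNAMIC_SMPS_SKETCH) || hw->dynamic_smps_6ghz;
}
```

With a WCN6855-like entry (6 GHz capable, no SMPS bit in ht_cap, no opt-in) the old rule enables the feature and the new rule does not.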
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210914163726.38604-3-jouni@codeaurora.org
Add the missing endpoint max-packet sanity check to probe() to avoid
division by zero in ath10k_usb_hif_tx_sg() in case a malicious device
has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4f ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: 9cbee35868 ("ath6kl: add full USB support")
Cc: stable@vger.kernel.org # 3.5
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211027080819.6675-3-johan@kernel.org
Add the missing endpoint max-packet sanity check to probe() to avoid
division by zero in ath10k_usb_hif_tx_sg() in case a malicious device
has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4f ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
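A sanity check of this kind can be sketched as follows. The struct and function names are hypothetical and the packet-size calculation is only an example of a division that would otherwise trap; it is not the driver's actual send path.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of an endpoint. */
struct ep_sketch {
	uint16_t max_packet_size;
};

/* Probe-time check: refuse endpoints reporting a zero max packet size. */
int check_endpoint(const struct ep_sketch *ep)
{
	if (ep->max_packet_size == 0)
		return -EINVAL;
	return 0;
}

/*
 * Example of a calculation that divides by the max packet size:
 * does a transfer of len bytes end exactly on a packet boundary?
 * Only safe to call after check_endpoint() has succeeded.
 */
int ends_on_packet_boundary(const struct ep_sketch *ep, uint32_t len)
{
	return (len % ep->max_packet_size) == 0;
}
```

Rejecting the endpoint once at probe time keeps later calculations free of per-call zero checks.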
Fixes: 4db66499df ("ath10k: add initial USB support")
Cc: stable@vger.kernel.org # 4.14
Cc: Erik Stromdahl <erik.stromdahl@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211027080819.6675-2-johan@kernel.org
The channel scan list must be updated before triggering a hardware scan
so that firmware takes into account the regulatory info for each single
channel such as active/passive config, power, DFS, etc. Without this
the firmware uses its own internal default channel configuration, which
is not aligned with mac80211 regulatory rules, and misses several
channels (e.g. 144).
Fixes: 2f3bef4b24 ("wcn36xx: Add hardware scan offload support")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1635175328-25642-1-git-send-email-loic.poulain@linaro.org
Firmware link offload monitoring can be made to work in 3/4 cases by
switching on firmware feature bit WLANACTIVE_OFFLOAD:
- Secure power-save on
- Secure power-save off
- Open power-save on
However, with an open AP if we switch off power-saving - thus never
entering Beacon Mode Power Save - BMPS, firmware never forwards loss
of beacon upwards.
We had hoped that WLANACTIVE_OFFLOAD and some fixes for sequence numbers
would unblock this, but it hasn't, and further investigation is required.
It's possible to have a complete set of Secure power-save on/off and Open
power-save on/off provided we use Linux' link monitoring mechanism.
While we debug the Open AP failure we need to fix upstream.
This reverts commit c973fdad79f6eaf247d48b5fc77733e989eb01e1.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211025093037.3966022-2-bryan.odonoghue@linaro.org
If the system is resumed because of an incoming packet, the wcn36xx RX
interrupt is fired before actual resuming of the wireless/mac80211
stack, causing any received packets to be simply dropped. E.g. a ping
request causes a system resume, but is dropped and so never forwarded
to the IP stack.
This change fixes that, disabling DMA interrupts on suspend to not pass
packets until mac80211 is resumed and ready to handle them.
Note that it's not incompatible with RX irq wake.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1635150496-19290-1-git-send-email-loic.poulain@linaro.org
The firmware offers features such as ARP offload, for which the
firmware crafts its own (QoS) packets without waking up the host.
The problem is that the sequence numbers generated by the firmware
are not in sync with the host mac80211 layer, which can cause packets
such as firmware ARP responses to be dropped by the AP (too old SN).
To fix this we need to let the firmware manage the sequence
numbers on its own (except for QoS null frames). There is a SN
counter for each QoS queue and one global/baseline counter for
Non-QoS.
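The counter layout mentioned above (one counter per QoS queue plus one baseline counter) can be modelled as below. This is a sketch of the bookkeeping only, not firmware or driver code: the number of queues and all names are assumptions. The 12-bit wrap follows from the width of the 802.11 sequence number field.

```c
#include <stdint.h>

#define SN_MODULO 4096u        /* 802.11 sequence numbers are 12 bits wide */
#define NUM_QOS_QUEUES_SKETCH 8 /* assumption made for this sketch */

struct sn_state {
	uint16_t qos[NUM_QOS_QUEUES_SKETCH];
	uint16_t baseline; /* shared by all non-QoS frames */
};

/*
 * Returns the sequence number to use and advances the counter.
 * A queue outside 0..NUM_QOS_QUEUES_SKETCH-1 selects the baseline counter.
 */
uint16_t next_sn(struct sn_state *s, int queue)
{
	uint16_t *ctr = (queue >= 0 && queue < NUM_QOS_QUEUES_SKETCH)
			? &s->qos[queue] : &s->baseline;
	uint16_t sn = *ctr;

	*ctr = (uint16_t)((sn + 1u) % SN_MODULO);
	return sn;
}
```

Two independent owners of such counters (host and firmware) would drift apart as soon as one of them sends a frame the other does not know about, which is the mismatch described above.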
Fixes: 84aff52e4f ("wcn36xx: Use sequence number allocated by mac80211")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1635150336-18736-1-git-send-email-loic.poulain@linaro.org
This is essentially exactly following the dma_wmb()/dma_rmb() usage
instructions in Documentation/memory-barriers.txt.
The theoretical races here are:
1. DXE (the DMA Transfer Engine in the Wi-Fi subsystem) seeing the
dxe->ctrl & WCN36xx_DXE_CTRL_VLD write before the dxe->dst_addr_l
write, thus performing DMA into the wrong address.
2. CPU reading dxe->dst_addr_l before DXE unsets dxe->ctrl &
WCN36xx_DXE_CTRL_VLD. This should generally be harmless since DXE
doesn't write dxe->dst_addr_l (no risk of freeing the wrong skb).
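The ordering requirement for both races can be illustrated in userspace with C11 fences. This is only an analogy: atomic_thread_fence() stands in for dma_wmb()/dma_rmb(), and the descriptor struct, the valid bit value and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CTRL_VLD_SKETCH 0x1u /* hypothetical "descriptor valid" bit */

struct desc_sketch {
	uint32_t dst_addr_l;
	_Atomic uint32_t ctrl;
};

/* Race 1: publish the address before the descriptor is marked valid. */
void desc_publish(struct desc_sketch *d, uint32_t addr)
{
	d->dst_addr_l = addr;
	atomic_thread_fence(memory_order_release); /* stands in for dma_wmb() */
	atomic_store_explicit(&d->ctrl, CTRL_VLD_SKETCH, memory_order_relaxed);
}

/*
 * Race 2: only read the address after seeing that the valid bit has
 * been cleared. Returns 1 and fills *addr if the descriptor is done,
 * 0 if it is still owned by the other side.
 */
int desc_reap(struct desc_sketch *d, uint32_t *addr)
{
	if (atomic_load_explicit(&d->ctrl, memory_order_relaxed) & CTRL_VLD_SKETCH)
		return 0;
	atomic_thread_fence(memory_order_acquire); /* stands in for dma_rmb() */
	*addr = d->dst_addr_l;
	return 1;
}
```

In both functions the fence sits between the access to the payload field and the access to the ownership flag, which is the pattern Documentation/memory-barriers.txt describes for descriptor rings.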
Fixes: 8e84c25821 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Benjamin Li <benl@squareup.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211023001528.3077822-1-benl@squareup.com
On an open AP when you pull the plug on the AP, if we are not already in
BMPS mode then the firmware will not generate a disconnection event.
Instead we need to monitor for failure to enter BMPS and treat a string of
failures as connection loss.
Secure AP connections don't appear to demonstrate this behavior so the
work-around is limited to open APs only.
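The idea of treating a string of BMPS entry failures as connection loss can be sketched like this. The threshold value and all names are hypothetical; the actual count used by the driver is not stated here.

```c
#include <stdbool.h>

/* Hypothetical threshold; not taken from the driver. */
#define BMPS_FAIL_THRESHOLD_SKETCH 3

struct bmps_monitor {
	bool open_ap;             /* work-around applies to open APs only */
	int consecutive_failures;
};

/*
 * Record the result of one attempt to enter BMPS.
 * Returns true when the run of failures should be treated as
 * connection loss.
 */
bool bmps_attempt_result(struct bmps_monitor *m, bool entered)
{
	if (entered) {
		m->consecutive_failures = 0;
		return false;
	}
	if (!m->open_ap)
		return false;

	return ++m->consecutive_failures >= BMPS_FAIL_THRESHOLD_SKETCH;
}
```

A successful entry resets the counter, so only an unbroken run of failures on an open AP is reported.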
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211022140447.2846248-2-bryan.odonoghue@linaro.org
WCNSS RX DMA transfer support is limited to 3872 bytes, which is
enough for simple MPDUs (single MSDU), but not enough for cases
with A-MSDU (depending on max AMSDU size or max MPDU size).
In that case the MPDU is spread over multiple transfers, with the
first transfer containing the MPDU header and (at least) the first
A-MSDU subframe and additional transfer(s) containing the following
A-MSDUs. This can be handled with a series of flags tagging the
first and last A-MSDU transfers.
In that case we have to buffer and re-linearize the A-MSDU buffers
into a proper MPDU skb before forwarding to mac80211 (in the same way
as it is done in ath10k).
This change also includes a sanity check of the buffer descriptor to
prevent skb overflow.
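The buffering and bounds check can be sketched with a flat buffer in place of an skb. This is a simplified model: the buffer cap, the struct and the function name are hypothetical, and the first/last arguments stand in for the descriptor flags mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define REASM_MAX_SKETCH 8192 /* hypothetical cap on the rebuilt MPDU */

struct reasm {
	unsigned char buf[REASM_MAX_SKETCH];
	size_t len;
	bool active;
};

/*
 * Append one DMA transfer to the MPDU being rebuilt.
 * Returns 1 when a complete MPDU is available in r->buf,
 * 0 when more transfers are expected, and -1 on error
 * (unexpected transfer or overflow); state is reset on error.
 */
int reasm_add(struct reasm *r, const void *data, size_t n,
	      bool first, bool last)
{
	if (first) {
		r->len = 0;
		r->active = true;
	}
	if (!r->active || n > REASM_MAX_SKETCH - r->len) {
		r->active = false;
		r->len = 0;
		return -1;
	}
	memcpy(r->buf + r->len, data, n);
	r->len += n;
	if (!last)
		return 0;

	r->active = false;
	return 1;
}
```

The length check happens before the copy, so an inconsistent descriptor can at worst drop the frame being rebuilt.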
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634557705-11120-1-git-send-email-loic.poulain@linaro.org
Until now, offload scanning for 5GHz channels was considered broken.
However it was mostly a driver issue, caused by bad reporting of the
beacons/probe-resp bands and frequencies, which has been fixed.
We can now allow offload scan for the 5GHz band, which reduces scanning
time compared to software-driven scanning.
Note that offloaded scan is limited to 48 channels, so check for this.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634554678-7993-2-git-send-email-loic.poulain@linaro.org
For packets originating from hardware scan, the channel and band are
included in the buffer descriptor (bd->rf_band & bd->rx_ch).
For the 2GHz band the channel value is directly reported in the 4-bit
rx_ch field. For the 5GHz band, the rx_ch field contains a mapping
index (given the 4-bit limitation).
The reserved0 field is also used to extend the 4-bit mapping to a
5-bit mapping to support more than 16 5GHz channels.
This change adds correct reporting of the frequency/band, which is
used by the scan mechanism and is required for 5GHz hardware scan
support.
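The index handling can be sketched as follows. Several details are assumptions made for illustration: that reserved0 supplies the high bit of the 5-bit index, that the index is 1-based, and the function names. The index-to-channel table is passed in by the caller and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combine the 4-bit rx_ch with one extra bit from reserved0 into a
 * 5-bit mapping index. Assumes reserved0 carries the high bit.
 */
unsigned int rx_index_5bit(uint8_t rx_ch, uint8_t reserved0)
{
	return ((reserved0 & 0x1u) << 4) | (rx_ch & 0xfu);
}

/*
 * Resolve the channel number of a received frame.
 * 2GHz: rx_ch is the channel itself.
 * 5GHz: the 5-bit index selects an entry of the caller-supplied map
 * (assumed 1-based here). Returns 0 if the index is out of range.
 */
int rx_channel(int is_5ghz, uint8_t rx_ch, uint8_t reserved0,
	       const uint8_t *map, size_t map_len)
{
	unsigned int idx;

	if (!is_5ghz)
		return rx_ch;

	idx = rx_index_5bit(rx_ch, reserved0);
	if (idx == 0 || idx > map_len)
		return 0;

	return map[idx - 1];
}
```

The range check matters because a 5-bit index can address up to 31 entries, which may be more than the table holds.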
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634554678-7993-1-git-send-email-loic.poulain@linaro.org
This change fixes the TX ack mechanism in various ways:
- For NO_ACK tagged packets, we don't need to wait for TX_ACK indication
and so are not subject to the single packet ack limitation. So we don't
have to stop the tx queue, and can call the tx status callback as soon
as DMA transfer has completed.
- Fix skb ownership/reference. Only start status indication timeout
once the DMA transfer has been completed. This avoids the skb being
referenced both in the DMA tx ring and by the tx_ack_skb pointer,
preventing any use-after-free or double-free.
- This adds a sanity (paranoia?) check on the skb tx ack pointer.
- Resume the TX queue if transmission of a TX-status-tagged packet fails.
Cc: stable@vger.kernel.org
Fixes: fdf21cc371 ("wcn36xx: Add TX ack support")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634567281-28997-1-git-send-email-loic.poulain@linaro.org
We observe unexpected connection drops with some APs due to
non-acked mac80211 generated null data frames (keep-alive).
After debugging and capture, we noticed that null frames are
submitted at standard data bitrate and that the affected APs
have trouble with that.
After setting the null frame bitrate to control bitrate, all
null frames are acked as expected and connection is maintained.
Not sure if it's a requirement of the specification, but it seems
the right thing to do anyway, null frames are mostly used for control
purpose (power-saving, keep-alive...), and submitting them with
a slower/simpler bitrate/modulation is more robust.
Cc: stable@vger.kernel.org
Fixes: 512b191d96 ("wcn36xx: Fix TX data path")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634560399-15290-1-git-send-email-loic.poulain@linaro.org
Commit 9af7c32cec ("ath10k: add target IRAM recovery feature support")
introduced a new firmware feature flag ATH10K_FW_FEATURE_IRAM_RECOVERY. But
this caused ath10k_pci module load to fail if ATH10K_FW_CRASH_DUMP_RAM_DATA bit
was not enabled in the ath10k coredump_mask module parameter:
[ 2209.328190] ath10k_pci 0000:02:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[ 2209.434414] ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 1
[ 2209.547191] ath10k_pci 0000:02:00.0: firmware ver 10.4-3.9.0.2-00099 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate,iram-recovery crc32 cbade90a
[ 2210.896485] ath10k_pci 0000:02:00.0: board_file api 1 bmi_id 0:1 crc32 a040efc2
[ 2213.603339] ath10k_pci 0000:02:00.0: failed to copy target iram contents: -12
[ 2213.839027] ath10k_pci 0000:02:00.0: could not init core (-12)
[ 2213.933910] ath10k_pci 0000:02:00.0: could not probe fw (-12)
And by default coredump_mask does not have ATH10K_FW_CRASH_DUMP_RAM_DATA
enabled so anyone using a firmware with iram-recovery feature would fail. To my
knowledge only QCA9984 firmwares starting from release 10.4-3.9.0.2-00099
enabled the feature.
The reason for regression was that ath10k_core_copy_target_iram() used
ath10k_coredump_get_mem_layout() to get the memory layout, but when
ATH10K_FW_CRASH_DUMP_RAM_DATA was disabled it would get just NULL and bail out
with an error.
While looking at all this I noticed another bug: if CONFIG_DEV_COREDUMP is
disabled but the firmware has iram-recovery enabled the module load fails with
similar error messages. I fixed that by returning 0 from
ath10k_core_copy_target_iram() when _ath10k_coredump_get_mem_layout() returns
NULL.
Tested-on: QCA9984 hw2.0 PCI 10.4-3.9.0.2-00139
Fixes: 9af7c32cec ("ath10k: add target IRAM recovery feature support")
Signed-off-by: Abinaya Kalaiselvan <akalaise@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211020075054.23061-1-kvalo@codeaurora.org
Using a kernel pointer in place of a dma_addr_t token can
lead to undefined behavior if that makes it into cache
management functions. The compiler caught one such attempt
in a cast:
drivers/net/wireless/ath/ath10k/mac.c: In function 'ath10k_add_interface':
drivers/net/wireless/ath/ath10k/mac.c:5586:47: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
5586 | arvif->beacon_paddr = (dma_addr_t)arvif->beacon_buf;
| ^
Looking through how this gets used down the way, I'm fairly
sure that beacon_paddr is never accessed again for ATH10K_DEV_TYPE_HL
devices, and if it was accessed, that would be a bug.
Change the assignment to use a known-invalid address token
instead, which avoids the warning and makes it easier to catch
bugs if it does end up getting used.
Fixes: e263bdab9c ("ath10k: high latency fixes for beacon buffer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211014075153.3655910-1-arnd@kernel.org
ath.git patches for v5.16. Major changes:
ath9k
* add option to reset the wifi chip via debugfs
* convert Device Tree bindings to the json-schema
* support Device Tree ieee80211-freq-limit property to limit channels
When powersaving (so either wifi powersaving or deep sleep, depending on
which state the firmware is in) is disabled, the way the firmware goes
into host sleep is different: Usually the firmware implicitly enters
host sleep on the next SLEEP event we get when we configured host sleep
via HSCFG before. When powersaving is disabled though, there are no
SLEEP events, the way we enter host sleep in that case is different: The
firmware will send us a HS_ACT_REQ event and after that we "manually"
make the firmware enter host sleep by sending it another HSCFG command
with the action HS_ACTIVATE.
Now waking up from host sleep appears to be different depending on
whether powersaving is enabled again: When powersaving is enabled, the
firmware implicitly leaves host sleep as soon as it wakes up and sends
us an AWAKE event. When powersaving is disabled though, it apparently
doesn't implicitly leave host sleep, but instead we need to send it a
HSCFG command with the HS_CONFIGURE action and the HS_CFG_CANCEL
condition. We didn't do that so far, which is why waking up from host
sleep was broken when powersaving is disabled.
So add some additional state to mwifiex_adapter where we keep track of
whether host sleep was activated manually via HS_ACTIVATE, and if that
was the case, deactivate it manually again via HS_CFG_CANCEL.
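The extra state can be sketched as a single flag with two event handlers. The struct, enum and function names below are hypothetical; only the HS_ACTIVATE and HS_CFG_CANCEL concepts come from the text above.

```c
#include <stdbool.h>

/* Commands the host would send next; names are illustrative. */
enum hs_cmd { HS_CMD_NONE, HS_CMD_ACTIVATE, HS_CMD_CANCEL };

struct hs_state {
	bool hs_activated_manually;
};

/*
 * The firmware sent HS_ACT_REQ (the powersaving-disabled path):
 * activate host sleep ourselves and remember that we did.
 */
enum hs_cmd on_hs_act_req(struct hs_state *s)
{
	s->hs_activated_manually = true;
	return HS_CMD_ACTIVATE;
}

/*
 * Waking up: a cancel is only needed if host sleep was activated
 * manually; otherwise the firmware leaves host sleep by itself.
 */
enum hs_cmd on_wakeup(struct hs_state *s)
{
	if (!s->hs_activated_manually)
		return HS_CMD_NONE;

	s->hs_activated_manually = false;
	return HS_CMD_CANCEL;
}
```

Clearing the flag on wakeup keeps a later powersaving-enabled cycle from sending a cancel it does not need.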
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211016153244.24353-6-verdre@v0yd.nl
While looking at on-air packets using Wireshark, I noticed we're never
setting the initiator bit when sending DELBA requests to the AP: While
we set the bit on our del_ba_param_set bitmask, we forget to actually
copy that bitmask over to the command struct, which means we never
actually set the initiator bit.
Fix that and copy the bitmask over to the host_cmd_ds_11n_delba command
struct.
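The bug pattern (building a bitmask locally but never storing it in the command) can be sketched as below. The struct and function names are hypothetical; the bit positions follow the 802.11 DELBA parameter set layout (initiator at bit 11, TID in bits 12 to 15). Byte-order conversion, which a real command would need, is left out.

```c
#include <stdint.h>

#define DELBA_INITIATOR_BIT_SKETCH (1u << 11)
#define DELBA_TID_SHIFT_SKETCH     12

/* Hypothetical, simplified command layout. */
struct delba_cmd_sketch {
	uint16_t del_ba_param_set;
};

void build_delba(struct delba_cmd_sketch *cmd, unsigned int tid, int initiator)
{
	uint16_t param = (uint16_t)((tid & 0xfu) << DELBA_TID_SHIFT_SKETCH);

	if (initiator)
		param |= DELBA_INITIATOR_BIT_SKETCH;

	/*
	 * The step that was missing: without this store the initiator
	 * bit only ever exists in the local variable.
	 */
	cmd->del_ba_param_set = param;
}
```

For TID 5 with the initiator bit set, the stored value is 0x5800.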
Fixes: 5e6e3a92b9 ("wireless: mwifiex: initial commit for Marvell mwifiex driver")
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Acked-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211016153244.24353-5-verdre@v0yd.nl
Sometimes the KEY_MATERIAL command can fail with the 88W8897 firmware
(when this happens exactly seems pretty random). This appears to prevent
the access point from starting, so it seems like a good idea to log an
error in that case.
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211016153244.24353-3-verdre@v0yd.nl
It's not an error if someone chooses to put their computer to sleep, not
wanting it to wake up because the person next door has just discovered
what a magic packet is. So change the loglevel of this annoying message
from ERROR to INFO.
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211016153244.24353-2-verdre@v0yd.nl
When we fail to init the coex module, 'common' and 'adapter' are freed
directly, but common->tx_thread, which accesses 'common' and 'adapter', is
still running at the same time. That triggers a use-after-free (UAF) bug.
==================================================================
BUG: KASAN: use-after-free in rsi_tx_scheduler_thread+0x50f/0x520 [rsi_91x]
Read of size 8 at addr ffff8880076dc000 by task Tx-Thread/124777
CPU: 0 PID: 124777 Comm: Tx-Thread Not tainted 5.15.0-rc5+ #19
Call Trace:
dump_stack_lvl+0xe2/0x152
print_address_description.constprop.0+0x21/0x140
? rsi_tx_scheduler_thread+0x50f/0x520
kasan_report.cold+0x7f/0x11b
? rsi_tx_scheduler_thread+0x50f/0x520
rsi_tx_scheduler_thread+0x50f/0x520
...
Freed by task 111873:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0x109/0x140
kfree+0x117/0x4c0
rsi_91x_init+0x741/0x8a0 [rsi_91x]
rsi_probe+0x9f/0x1750 [rsi_usb]
Stop the thread before freeing 'common' and 'adapter' to fix it.
Fixes: 2108df3c4b ("rsi: add coex support")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211015040335.1021546-1-william.xuanziyang@huawei.com
It seems that the PCIe+USB firmware (latest version 15.68.19.p21) of the
88W8897 card sometimes ignores or misses when we try to wake it up by
writing to the firmware status register. This leads to the firmware
wakeup timeout expiring and the driver resetting the card because we
assume the firmware has hung up or crashed.
Turns out that the firmware actually didn't hang up, but simply "missed"
our wakeup request and didn't send us an interrupt with an AWAKE event.
Trying again to read the firmware status register after a short timeout
usually makes the firmware wake up as expected, so add a small retry
loop to mwifiex_pm_wakeup_card() that looks at the interrupt status to
check whether the card woke up.
The number of tries and timeout lengths for this were determined
experimentally: The firmware usually takes about 500 us to wake up
after we attempt to read the status register. In some cases where the
firmware is very busy (for example while doing a bluetooth scan) it
might even miss our requests for multiple milliseconds, which is why
after 15 tries the waiting time gets increased to 10 ms. The maximum
number of tries it took to wake the firmware when testing this was
around 20, so a maximum number of 50 tries should give us plenty of
safety margin.
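The retry schedule described above can be sketched against a fake firmware model. The try counts and the 10 ms delay are taken from the text; the 500 us short delay is an assumption based on the "about 500 us" observation, and the struct and function names are hypothetical.

```c
#define WAKE_MAX_TRIES      50
#define WAKE_SHORT_TRIES    15
#define WAKE_SHORT_DELAY_US 500u   /* assumption, see lead-in */
#define WAKE_LONG_DELAY_US  10000u

/* Fake firmware: wakes up after a given number of polls. */
struct fake_fw {
	int polls_until_awake;
	unsigned long waited_us; /* total delay accumulated by the loop */
};

/* Delay to wait after the given 1-based attempt. */
unsigned int wake_delay_us(int attempt)
{
	return attempt <= WAKE_SHORT_TRIES ? WAKE_SHORT_DELAY_US
					   : WAKE_LONG_DELAY_US;
}

/*
 * Returns the number of tries it took to see the firmware awake,
 * or -1 after WAKE_MAX_TRIES unsuccessful tries.
 */
int wake_with_retries(struct fake_fw *fw)
{
	for (int n = 1; n <= WAKE_MAX_TRIES; n++) {
		/* A real driver would access the status register here. */
		fw->waited_us += wake_delay_us(n);
		if (--fw->polls_until_awake <= 0)
			return n;
	}
	return -1;
}
```

With a firmware that needs 20 polls, the loop spends 15 short waits and 5 long ones, 57.5 ms in total under these assumptions.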
Here's a reproducer for those firmware wakeup failures I've found:
1) Make sure wifi powersaving is enabled (iw dev wlp1s0 set power_save on)
2) Connect to any wifi network (makes firmware go into wifi powersaving
mode, not deep sleep)
3) Make sure bluetooth is turned off (to ensure the firmware actually
enters powersave mode and doesn't keep the radio active doing bluetooth
stuff)
4) To confirm that wifi powersaving is entered ping a device on the LAN,
pings should be a few ms higher than without powersaving
5) Run "while true; do iwconfig; sleep 0.0001; done", this wakes and
suspends the firmware extremely often
6) Wait until things explode, for me it consistently takes <5 minutes
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=109681
Cc: stable@vger.kernel.org
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211011133224.15561-3-verdre@v0yd.nl
On the 88W8897 PCIe+USB card the firmware randomly crashes after setting
the TX ring write pointer. The issue is present in the latest firmware
version 15.68.19.p21 of the PCIe+USB card.
Those firmware crashes can be worked around by reading any PCI register
of the card after setting that register, so read the PCI_VENDOR_ID
register here. The reason this works is probably because we keep the bus
from entering an ASPM state for a bit longer, because that's what causes
the card's firmware to crash.
This fixes a bug where during RX/TX traffic and with ASPM L1 substates
enabled (the specific substates where the issue happens appear to be
platform dependent), the firmware crashes and eventually a command
timeout appears in the logs.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=109681
Cc: stable@vger.kernel.org
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211011133224.15561-2-verdre@v0yd.nl