A race can occur between the MCQ completion path and the abort handler:
once a request completes, __blk_mq_free_request() sets rq->mq_hctx to
NULL, meaning the subsequent ufshcd_mcq_req_to_hwq() call in
ufshcd_mcq_abort() can return a NULL pointer. If this NULL pointer is
dereferenced, the kernel will crash.
Add a NULL check for the returned hwq pointer. If hwq is NULL, log an
error and return FAILED, preventing a potential NULL-pointer
dereference. As suggested by Bart, the ufshcd_cmd_inflight() check is
removed.
This is similar to the fix in commit 74736103fb ("scsi: ufs: core: Fix
ufshcd_abort_one racing issue").
This is found by our static analysis tool KNighter.
Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Link: https://lore.kernel.org/r/20250410001320.2219341-1-chenyuan0y@gmail.com
Fixes: f1304d4420 ("scsi: ufs: mcq: Added ufshcd_mcq_abort()")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ufs device JEDEC specification version 4.1 adds support for the
device level exception events. To support this new device level
exception feature, expose two new sysfs nodes below to provide the user
space access to the device level exception information.
/sys/bus/platform/drivers/ufshcd/*/device_lvl_exception_count
/sys/bus/platform/drivers/ufshcd/*/device_lvl_exception_id
The device_lvl_exception_count sysfs node reports the number of device
level exceptions that have occurred since the last time this variable is
reset. Writing a value of 0 will reset it. The device_lvl_exception_id
reports the exception ID which is the qDeviceLevelExceptionID attribute
of the device JEDEC specifications version 4.1 and later. The user space
application can query these sysfs nodes to get more information about
the device level exception.
Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Link: https://lore.kernel.org/r/6278d7c125b2f0cf5056f4a647a4b9c1fdd24fc7.1743198325.git.quic_nguyenb@quicinc.com
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Arthur Simchaev <arthur.simchaev@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Peter Griffin <peter.griffin@linaro.org> says:
Hi folks,
This series fixes several stability issues with the upstream ufs-exynos
driver, specifically for the gs101 SoC found in Pixel 6.
The main fix is regarding the IO cache coherency setting and ensuring
that it is correctly applied depending on if the dma-coherent property
is specified in device tree. This fixes the UFS stability issues on gs101
and I would imagine will also fix issues on exynosauto platform that
seems to have similar iocc shareability bits.
Additionally the phy reference counting is fixed which allows module
load/unload to work reliably and keeps the phy state machine in sync
with the controller glue driver.
regards,
Peter
Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-0-96722cc2ba1b@linaro.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This reverts commit 36f5f026df, reversing
changes made to 43a7eec035.
Thomas says:
"I just noticed that for some incomprehensible reason, probably sheer
incompetemce when trying to utilize b4, I managed to merge an outdated
_and_ buggy version of that series.
Can you please revert that merge completely?"
Done.
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SoC driver updates from Arnd Bergmann:
"These are the updates for SoC specific drivers and related subsystems:
- Firmware driver updates for SCMI, FF-A and SMCCC firmware
interfaces, adding support for additional firmware features
including SoC identification and FF-A SRI callbacks as well as
various bugfixes
- Memory controller updates for Nvidia and Mediatek
- Reset controller support for microchip sam9x7 and imx8qxp/imx8qm
- New hardware support for multiple Mediatek, Renesas and Samsung
Exynos chips
- Minor updates on Zynq, Qualcomm, Amlogic, TI, Samsung, Nvidia and
Apple chips
There will be a follow up with a few more driver updates that are
still causing build regressions at the moment"
* tag 'soc-drivers-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (97 commits)
irqchip: Add support for Amlogic A4 and A5 SoCs
dt-bindings: interrupt-controller: Add support for Amlogic A4 and A5 SoCs
reset: imx: fix incorrect module device table
dt-bindings: power: qcom,kpss-acc-v2: add qcom,msm8916-acc compatible
bus: qcom-ssc-block-bus: Fix the error handling path of qcom_ssc_block_bus_probe()
bus: qcom-ssc-block-bus: Remove some duplicated iounmap() calls
soc: qcom: pd-mapper: Add support for SDM630/636
reset: imx: Add SCU reset driver for i.MX8QXP and i.MX8QM
dt-bindings: firmware: imx: add property reset-controller
dt-bindings: reset: atmel,at91sam9260-reset: add sam9x7
memory: mtk-smi: Add ostd setting for mt8192
dt-bindings: soc: samsung: exynos-usi: Drop unnecessary status from example
firmware: tegra: bpmp: Fix typo in bpmp-abi.h
soc/tegra: pmc: Use str_enable_disable-like helpers
soc: samsung: include linux/array_size.h where needed
firmware: arm_scmi: use ioread64() instead of ioread64_hi_lo()
soc: mediatek: mtk-socinfo: Add extra entry for MT8395AV/ZA Genio 1200
soc: mediatek: mt8188-mmsys: Add support for DSC on VDO0
soc: mediatek: mmsys: Migrate all tables to MMSYS_ROUTE() macro
soc: mediatek: mt8365-mmsys: Fix routing table masks and values
...
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (scsi_debug, ufs, lpfc, st, fnic, mpi3mr,
mpt3sas) and the removal of cxlflash.
The only non-trivial core change is an addition to unit attention
handling to recognize UAs for power on/reset and new media so the tape
driver can use it"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (107 commits)
scsi: st: Tighten the page format heuristics with MODE SELECT
scsi: st: ERASE does not change tape location
scsi: st: Fix array overflow in st_setup()
scsi: target: tcm_loop: Fix wrong abort tag
scsi: lpfc: Restore clearing of NLP_UNREG_INP in ndlp->nlp_flag
scsi: hisi_sas: Fixed failure to issue vendor specific commands
scsi: fnic: Remove unnecessary NUL-terminations
scsi: fnic: Remove redundant flush_workqueue() calls
scsi: core: Use a switch statement when attaching VPD pages
scsi: ufs: renesas: Add initialization code for R-Car S4-8 ES1.2
scsi: ufs: renesas: Add reusable functions
scsi: ufs: renesas: Refactor 0x10ad/0x10af PHY settings
scsi: ufs: renesas: Remove register control helper function
scsi: ufs: renesas: Add register read to remove save/set/restore
scsi: ufs: renesas: Replace init data by init code
scsi: ufs: dt-bindings: renesas,ufs: Add calibration data
scsi: mpi3mr: Task Abort EH Support
scsi: storvsc: Don't report the host packet status as the hv status
scsi: isci: Make most module parameters static
scsi: megaraid_sas: Make most module parameters static
...
Pull block updates from Jens Axboe:
- Fixes for integrity handling
- NVMe pull request via Keith:
- Secure concatenation for TCP transport (Hannes)
- Multipath sysfs visibility (Nilay)
- Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li)
- Correct use of 64-bit BARs for pci-epf target (Niklas)
- Socket fix for selinux when used in containers (Peijie)
- MD pull request via Yu:
- fix recovery can preempt resync (Li Nan)
- fix md-bitmap IO limit (Su Yue)
- fix raid10 discard with REQ_NOWAIT (Xiao Ni)
- fix raid1 memory leak (Zheng Qixing)
- fix mddev uaf (Yu Kuai)
- fix raid1,raid10 IO flags (Yu Kuai)
- some refactor and cleanup (Yu Kuai)
- Series cleaning up and fixing bugs in the bad block handling code
- Improve support for write failure simulation in null_blk
- Various lock ordering fixes
- Fixes for locking for debugfs attributes
- Various ublk related fixes and improvements
- Cleanups for blk-rq-qos wait handling
- blk-throttle fixes
- Fixes for loop dio and sync handling
- Fixes and cleanups for the auto-PI code
- Block side support for hardware encryption keys in blk-crypto
- Various cleanups and fixes
* tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits)
nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi)
nvme-tcp: fix selinux denied when calling sock_sendmsg
nvmet: pci-epf: Always configure BAR0 as 64-bit
nvmet: Remove duplicate uuid_copy
nvme: zns: Simplify nvme_zone_parse_entry()
nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls
nvmet-fc: Remove unused functions
nvme-pci: remove stale comment
nvme-fc: Utilise min3() to simplify queue count calculation
nvme-multipath: Add visibility for queue-depth io-policy
nvme-multipath: Add visibility for numa io-policy
nvme-multipath: Add visibility for round-robin io-policy
nvmet: add tls_concat and tls_key debugfs entries
nvmet-tcp: support secure channel concatenation
nvmet: Add 'sq' argument to alloc_ctrl_args
nvme-fabrics: reset admin connection for secure concatenation
nvme-tcp: request secure channel concatenation
nvme-keyring: add nvme_tls_psk_refresh()
nvme: add nvme_auth_derive_tls_psk()
nvme: add nvme_auth_generate_digest()
...
Pull MSI irq updates from Thomas Gleixner:
- Switch the MSI descriptor locking to guards
- Replace the broken PCI/TPH implementation, which lacks any form of
serialization against concurrent modifications with a properly
serialized mechanism in the PCI/MSI core code
- Replace the MSI descriptor abuse in the SCSI/UFS Qualcom driver with
dedicated driver internal storage
* tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi: Rename msi_[un]lock_descs()
scsi: ufs: qcom: Remove the MSI descriptor abuse
PCI/TPH: Replace the broken MSI-X control word update
PCI/MSI: Provide a sane mechanism for TPH
PCI: hv: Switch MSI descriptor locking to guard()
PCI/MSI: Switch to MSI descriptor locking to guard()
NTB/msi: Switch MSI descriptor locking to lock guard()
soc: ti: ti_sci_inta_msi: Switch MSI descriptor locking to guard()
genirq/msi: Use lock guards for MSI descriptor locking
cleanup: Provide retain_ptr()
genirq/msi: Make a few functions static
There is a TOCTOU race in ufshcd_compl_one_cqe(): hba->dev_cmd.complete may
be cleared from another thread after it has been checked and before it is
used. Fix this race by moving the device command completion from the stack
of the device command submitter into struct ufs_hba. This patch fixes the
following kernel crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Call trace:
_raw_spin_lock_irqsave+0x34/0x80
complete+0x24/0xb8
ufshcd_compl_one_cqe+0x13c/0x4f0
ufshcd_mcq_poll_cqe_lock+0xb4/0x108
ufshcd_intr+0x2f4/0x444
__handle_irq_event_percpu+0xbc/0x250
handle_irq_event+0x48/0xb0
Fixes: 5a0b0cb9be ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250314225206.1487838-1-bvanassche@acm.org
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Qualcomm driver updates for v6.15
Improve the client interface for the Qualcomm ICE driver to avoid
leaking references, including fixing the client drivers to call the new
function.
Adopt str_on_off() helper in AOSS driver and mark non-global servreg QMI
element info array in the PDR driver static.
* tag 'qcom-drivers-for-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
soc: qcom: Do not expose internal servreg_location_entry_ei array
soc: qcom: ice: make of_qcom_ice_get() static
scsi: ufs: qcom: fix dev reference leaked through of_qcom_ice_get
mmc: sdhci-msm: fix dev reference leaked through of_qcom_ice_get
soc: qcom: ice: introduce devm_of_qcom_ice_get
dt-bindings: soc: qcom: qcom,pmic-glink: Document SM8750 compatible
soc: qcom: Use str_enable_disable-like helpers
Link: https://lore.kernel.org/r/20250317210158.2025380-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The driver abuses the MSI descriptors for internal purposes. Aside of core
code and MSI providers nothing has to care about their existence. They have
been encapsulated with a lot of effort because this kind of abuse caused
all sorts of issues including a maintainability nightmare.
Rewrite the code so it uses dedicated storage to hand the required
information to the interrupt handler.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250313130321.963504017@linutronix.de
If the device doesn't support arpmb we'll crash due to copying user data in
bsg_transport_sg_io_fn().
In the case where ufs_bsg_exec_advanced_rpmb_req() returns an error, do not
set the job's reply_len.
Memory crash backtrace:
3,1290,531166405,-;ufshcd 0000:00:12.5: ARPMB OP failed: error code -22
4,1308,531166555,-;Call Trace:
4,1309,531166559,-; <TASK>
4,1310,531166565,-; ? show_regs+0x6d/0x80
4,1311,531166575,-; ? die+0x37/0xa0
4,1312,531166583,-; ? do_trap+0xd4/0xf0
4,1313,531166593,-; ? do_error_trap+0x71/0xb0
4,1314,531166601,-; ? usercopy_abort+0x6c/0x80
4,1315,531166610,-; ? exc_invalid_op+0x52/0x80
4,1316,531166622,-; ? usercopy_abort+0x6c/0x80
4,1317,531166630,-; ? asm_exc_invalid_op+0x1b/0x20
4,1318,531166643,-; ? usercopy_abort+0x6c/0x80
4,1319,531166652,-; __check_heap_object+0xe3/0x120
4,1320,531166661,-; check_heap_object+0x185/0x1d0
4,1321,531166670,-; __check_object_size.part.0+0x72/0x150
4,1322,531166679,-; __check_object_size+0x23/0x30
4,1323,531166688,-; bsg_transport_sg_io_fn+0x314/0x3b0
Fixes: 6ff265fc5e ("scsi: ufs: core: bsg: Add advanced RPMB support in ufs_bsg")
Cc: stable@vger.kernel.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Arthur Simchaev <arthur.simchaev@sandisk.com>
Link: https://lore.kernel.org/r/20250220142039.250992-1-arthur.simchaev@sandisk.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ziqi Chen <quic_ziqichen@quicinc.com> says:
With OPP V2 enabled, devfreq can scale clocks amongst multiple frequency
plans. However, the gear speed is only toggled between min and max during
clock scaling. Enable multi-level gear scaling by mapping clock frequencies
to gear speeds, so that when devfreq scales clock frequencies we can put
the UFS link at the appropraite gear speeds accordingly.
This series has been tested on below platforms -
sm8550 mtp + UFS3.1
SM8650 MTP + UFS3.1
SM8750 MTP + UFS4.0
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-HDK
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-HDK
Link: https://lore.kernel.org/r/20250213080008.2984807-1-quic_ziqichen@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of only two frequencies, if OPP V2 is used, the UFS devfreq clock
scaling may scale the clock among multiple frequencies. In the case of
scaling up, the devfreq may decide to scale the clock to an intermediate
freq based on load, but the clock scale up pre change operation uses
settings for the max clock freq unconditionally. Fix it by passing the
target_freq to clock scale up pre change so that the correct settings for
the target_freq can be used.
In the case of scaling down, the clock scale down post change operation is
doing fine, because it reads the actual clock rate to tell freq, but to
keep symmetry with clock scale up pre change operation, just use the
target_freq instead of reading clock rate.
Signed-off-by: Can Guo <quic_cang@quicinc.com>
Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com>
Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com>
Link: https://lore.kernel.org/r/20250213080008.2984807-3-quic_ziqichen@quicinc.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit bb9850704c ("scsi: ufs: core: Honor runtime/system PM levels if
set by host controller drivers") introduced the check for setting default
PM levels only if the levels are uninitialized by the host controller
drivers. But it missed the fact that the levels could be initialized to 0
(UFS_PM_LVL_0) on purpose by the controller drivers. Even though none of
the drivers are doing so now, the logic should be fixed irrespectively.
So set the default levels unconditionally before calling ufshcd_hba_init()
API which initializes the controller drivers. It ensures that the
controller drivers could override the default levels if required.
Fixes: bb9850704c ("scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers")
Reported-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250219105047.49932-1-manivannan.sadhasivam@linaro.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ufshcd_is_ufs_dev_busy(), ufshcd_print_host_state() and
ufshcd_eh_timed_out() are used in both modes (legacy mode and MCQ mode).
hba->outstanding_reqs only represents the outstanding requests in legacy
mode. Hence, change hba->outstanding_reqs into scsi_host_busy(hba->host) in
these functions.
Fixes: eacb139b77 ("scsi: ufs: core: mcq: Enable multi-circular queue")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250214224352.3025151-1-bvanassche@acm.org
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>