ufshcd_queuecommand() may be called two times in a row for a SCSI command
before it is completed. Hence make the following changes:
- In the functions that submit a command, do not check the old value of
lrbp->cmd nor clear lrbp->cmd in error paths.
- In ufshcd_release_scsi_cmd(), do not clear lrbp->cmd.
See also scsi_send_eh_cmnd().
This commit prevents that the following appears if a command times out:
WARNING: at drivers/ufs/core/ufshcd.c:2965 ufshcd_queuecommand+0x6f8/0x9a8
Call trace:
ufshcd_queuecommand+0x6f8/0x9a8
scsi_send_eh_cmnd+0x2c0/0x960
scsi_eh_test_devices+0x100/0x314
scsi_eh_ready_devs+0xd90/0x114c
scsi_error_handler+0x2b4/0xb70
kthread+0x16c/0x1e0
Fixes: 5a0b0cb9be ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230524203659.1394307-3-bvanassche@acm.org
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
One UFS vendor asked to increase the UFS timeout from 1 s to 3 s. Another
UFS vendor asked to increase the UFS timeout from 1 s to 10 s. Hence this
patch that increases the UFS timeout to 10 s. This patch can cause the
total timeout to exceed 20 s, the Android shutdown timeout. This is fine
since the loop around ufshcd_execute_start_stop() exists to deal with unit
attentions and because unit attentions are reported quickly.
Fixes: dcd5b7637c ("scsi: ufs: Reduce the START STOP UNIT timeout")
Fixes: 8f2c96420c ("scsi: ufs: core: Reduce the power mode change timeout")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230524203659.1394307-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the ufshcd_clear_cmds(), the 2nd parameter would be the bit mask of the
command to be cleared in the transfer request door bell register. This bit
mask mechanism does not scale well in MCQ mode when the queue depth becomes
much greater than 64. Change the 2nd parameter to the function to be the
task_tag number of the corresponding bit to be cleared in the door bell
register. By doing so, MCQ mode with a large queue depth can reuse this
function.
Since the behavior of this function is changed from handling multiple
commands into a single command, rename ufshcd_clear_cmds() into
ufshcd_clear_cmd().
Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Link: https://lore.kernel.org/r/8411fb5363acc90519bced30ea2c2ac582ff2340.1685396241.git.quic_nguyenb@quicinc.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Tested-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Various improvements are made for collecting congestion statistics:
- Pre-existing logic is replaced with use of an hrtimer for increased
reporting accuracy.
- Congestion timestamp information is reorganized into a single struct.
- Common statistic collection logic is refactored into a helper routine.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is mishandling of SLI-4 CQE status values larger than what is allowed
by the LPFC_IOCB_STATUS_MASK of 4 bits. The LPFC_IOCB_STATUS_MASK is a
leftover SLI-3 construct and serves no purpose in SLI-4 path.
Remove the LPFC_IOCB_STATUS_MASK and clean up general CQE status handling
in SLI-4 completion paths.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A firmware upgrade does not necessitate dumping of phba->dbg_log[] to kmsg
via LOG_TRACE_EVENT. A simple KERN_NOTICE log message should suffice to
notify the user of successful or unsuccessful firmware upgrade. As such,
firmware upgrade log messages are updated to use KERN_NOTICE instead of
LOG_TRACE_EVENT. Additionally, in order to notify the user of reset type
for instantiating newly downloaded firmware, lpfc_log_msg's default
KERN_LEVEL is updated to 5 or KERN_NOTICE.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When NPIV ports are zoned to devices that support both initiator and target
mode, a remote device's initiated PRLI results in unintended final kref
clean up of the device's ndlp structure. This disrupts NPIV ports'
discovery for target devices that support both initiator and target mode.
Modify the NPIV lpfc_drop_node clause such that we allow the ndlp to live
so long as it was in NLP_STE_PLOGI_ISSUE, NLP_STE_REG_LOGIN_ISSUE, or
NLP_STE_PRLI_ISSUE nlp_state. This allows lpfc's issued PRLI completion
routine to determine if the final kref clean up should execute rather than
a remote device's issued PRLI.
Fixes: db651ec225 ("scsi: lpfc: Correct used_rpi count when devloss tmo fires with no recovery")
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS
flag only handled Fabric Port Login, Fabric Controller, Management, and
Name Server addresses.
Fabric domain controllers fall under the same category for usage of the
NLP_IN_RECOV_POST_DEV_LOSS flag. Add a default case statement to mark an
ndlp for device loss recovery.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In dev_loss_tmo callback routine, we early return if the ndlp is in a state
of rediscovery. This occurs when a target proactively PLOGIs or PRLIs
after an RSCN before the dev_loss_tmo callback routine is scheduled to run.
Move clear of the NLP_IN_DEV_LOSS flag before the ndlp state check in such
cases.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Due to a target port D_ID swap, it is possible for the
lpfc_register_remote_port() routine to touch post mortem fc_rport memory
when trying to access fc_rport->dd_data.
The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(),
where fc_remote_port_delete() reclaims fc_rport memory.
Remove the fc_rport->dd_data->pnode NULL assignment because the following
line reassigns ndlp->rport with an fc_rport object from
fc_remote_port_add() anyways. The pnode nullification is superfluous.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche <bvanassche@acm.org> says:
In the traces we recorded while testing zoned storage we noticed that UFS
commands are requeued while the clock is being ungated. Command requeueing
makes it harder than necessary to preserve the command order. Hence this
patch series that modifies the SCSI core and also the UFS driver such that
clock ungating does not trigger command requeueing.
Link: https://lore.kernel.org/r/20230529202640.11883-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ungating the clock asynchronously causes ufshcd_queuecommand() to return
SCSI_MLQUEUE_HOST_BUSY and hence causes commands to be requeued. This is
suboptimal. Allow ufshcd_queuecommand() to sleep such that clock ungating
does not trigger command requeuing. Remove the ufshcd_scsi_block_requests()
and ufshcd_scsi_unblock_requests() calls because these are no longer
needed. The flush_work(&hba->clk_gating.ungate_work) call is sufficient to
make the SCSI core wait for clock ungating to complete.
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230529202640.11883-6-bvanassche@acm.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>