Admin reply processing can be called from multiple contexts. The driver
uses an atomic flag for synchronization among multiple threads/context for
draining the admin replies.
Upon entering the admin processing routine, the driver will set the atomic
flag and start reply processing. When exiting the routine, the driver
resets the flag. However, there is a race condition when one thread (Thread
1) has processed replies and is about to reset the flag but in the meantime
few more replies are posted and another thread (Thread 2) is called to
process replies. Since the synchronization flag is still set, Thread 2 will
return without processing replies and those new replies will not be
flushed.
Make the watchdog thread monitor cases where admin ISR/poll call returns
due to another thread processing admin replies. If such an instance is
found, make driver call admin ISR to drain replies (if any).
Co-developed-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250220142528.20837-4-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the task management thread processes reply queues while the reset
thread resets them, the task management thread accesses an invalid queue ID
(0xFFFF), set by the reset thread, which points to unallocated memory,
causing a crash.
Add flag 'io_admin_reset_sync' to synchronize access between the reset,
I/O, and admin threads. Before a reset, the reset handler sets this flag to
block I/O and admin processing threads. If any thread bypasses the initial
check, the reset thread waits up to 10 seconds for processing to finish. If
the wait exceeds 10 seconds, the controller is marked as unrecoverable.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250129100850.25430-4-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Allocate segmented trace buffer if firmware advertises the capability in
IOCfacts.
Upon driver load, read the trace buffer size from driver page 1, calculate
the required segments for trace buffer, and allocate segmented buffers.
Each segment is 4096 bytes in size.
While posting driver diagnostic buffer to firmware, advertise that trace
buffer is segmented.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250129100850.25430-3-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To avoid reply queue full condition, update the driver to check IOCFacts
capabilities for qfull.
Update the operational reply queue's Consumer Index after processing 100
replies. If pending I/Os on a reply queue exceeds a threshold
(reply_queue_depth - 200), then return I/O back to OS to retry.
Also increase default admin reply queue size to 2K.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250129100850.25430-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During controller transitioning to READY state, if the controller is found
in transient states ("becoming ready" or "reset requested"), driver waits
for 510 secs even if the controller transitions out of these states
early. This causes an unnecessary delay of 510 secs in the overall firmware
initialization sequence.
Poll the controller state periodically (every 100 milliseconds) while
waiting for the controller to come out of the mentioned transient
states. Once the controller transits out of the transient states, come out
of the wait loop.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240905102753.105310-5-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche <bvanassche@acm.org> says:
Hi Martin,
Multiple SCSI drivers use snprintf() to format a workqueue name before
invoking one of the create*_workqueue() macros. This patch series
simplifies such code by passing the format string and arguments to
alloc_workqueue(). Additionally, the structure members that are only
used as a temporary buffer for formatting workqueue names are
removed. Please consider this patch series for the next merge window.
Thanks,
Bart.
Link: https://lore.kernel.org/r/20240822195944.654691-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The workqueue maintainer wants to remove the create*_workqueue() macros
because these macros always set the WQ_MEM_RECLAIM flag and because these
only support literal workqueue names. Hence this patch that replaces the
create*_workqueue() invocations with the definition of this macro. The
WQ_MEM_RECLAIM flag has been retained because I think that flag is necessary
for workqueues created by storage drivers. This patch has been generated by
running spatch and git clang-format. spatch has been invoked as follows:
spatch --in-place --sp-file expand-create-workqueue.spatch $(git grep -lEw 'create_(freezable_|singlethread_|)workqueue' */scsi */ufs)
The contents of the expand-create-workqueue.spatch file is as follows:
@@
expression name;
@@
-create_workqueue(name)
+alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, name)
@@
expression name;
@@
-create_freezable_workqueue(name)
+alloc_workqueue("%s", WQ_FREEZABLE | WQ_UNBOUND | WQ_MEM_RECLAIM, 1, name)
@@
expression name;
@@
-create_singlethread_workqueue(name)
+alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240822195944.654691-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a timestamp update or an event acknowledgment command times out, the
driver invokes the soft reset handler to recover the controller while
holding a mutex lock. The soft reset handler also tries to acquire the same
mutex to send initialization commands to the controller which leads to a
deadlock scenario.
To resolve the issue the driver will check thestatus and if this indicates
the controller is operational, the driver will issue a diagnostic fault
reset and exit out of the command processing function. If the controller is
already faulted or asynchronously reset, then the driver will just exit the
command processing function.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
mpi3mr, hisi_sas, arcmsr).
The major core change is the constification of the host templates
(which touches everything) along with other minor fixups and clean
ups"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command()
scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately
scsi: cxlflash: s/semahpore/semaphore/
scsi: lpfc: Silence an incorrect device output
scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation
scsi: scsi_debug: Fix missing error code in scsi_debug_init()
scsi: hisi_sas: Work around build failure in suspend function
scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
scsi: mpt3sas: Fix an issue when driver is being removed
scsi: mpt3sas: Remove HBA BIOS version in the kernel log
scsi: target: core: Fix invalid memory access
scsi: scsi_debug: Drop sdebug_queue
scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()
scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
scsi: scsi_debug: Use scsi_block_requests() to block queues
scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
scsi: scsi_debug: Change shost list lock to a mutex
...
The driver is exiting from the fault watchdog thread if it sees the 0xF002
(Soft reset in progress) fault code.
If the driver initiates the soft reset, then the driver restarts the
watchdog at the end of the soft reset completion. However, if the soft
reset is initiated by the firmware asynchronously, then the driver will
never restart the watchdog and never re-initialize the controller after the
asynchronous soft reset completion.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230331122317.11391-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>