Don Brace <don.brace@microchip.com> says:
These patches are based on Martin Petersen's 6.12/scsi-queue tree
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
6.12/scsi-queue
There are two functional changes:
smartpqi-add-fw-log-to-kdump
smartpqi-add-counter-for-parity-write-stream-requests
There are three minor bug fixes:
smartpqi-fix-stream-detection
smartpqi-fix-rare-system-hang-during-LUN-reset
smartpqi-fix-volume-size-updates
The other two patches add PCI-IDs for new controllers and change the
driver version.
This set of changes consists of:
* smartpqi-add-fw-log-to-kdump
During a kdump, the driver tells the controller to copy its logging
information to some pre-allocated buffers that can be analyzed
later.
This is a "feature" driven capability and is backward compatible
with existing controller FW.
This patch renames some prefixes for OFA (Online-Firmware Activation
ofa_*) buffers to host_memory_*. So, not a lot of actual functional
changes to smartpqi_init.c, mainly determining the memory size
allocation.
We added a function to notify the controller to copy debug data into
host memory before continuing kdump.
Most of the functional changes are in smartpqi_sis.c where the
actual handshaking is done.
* smartpqi-fix-stream-detection
Correct some false write-stream detections. The data structure used
to check for write-streams was not initialized to all 0's causing
some false write stream detections. The driver sends down streamed
requests to the raid engine instead of using AIO bypass for some
extra performance. (Potential full-stripe write versus Read Modify
Write).
False detections have not caused any data corruption. Found by
internal testing. No known externally reported bugs.
* smartpqi-add-counter-for-parity-write-stream-requests
Adding some counters for raid_bypass and write streams. These two
counters are related because write stream detection is only checked
if an I/O request is eligible for bypass (AIO).
The bypass counter (raid_bypass_cnt) was moved into a common
structure (pqi_raid_io_stats) and changed to type __percpu. The
write stream counter (write_stream_cnt) has been added to this
same structure.
These counters are __percpu counters for performance. We added a
sysfs entry to show the write stream count. The raid bypass counter
sysfs entry already exists.
Useful for checking streaming writes. The change in the sysfs entry
write_stream_cnt can be checked during AIO eligible write
operations.
* smartpqi-add-new-controller-PCI-IDs
Adding support for new controller HW. No functional changes.
* smartpqi-fix-rare-system-hang-during-LUN-reset
We found a rare race condition that can occur during a LUN reset. We
were not emptying our internal queue completely.
There have been some rare conditions where our internal request
queue has requests for multiple LUNs and a reset comes in for one of
the LUNs. The driver waits for this internal queue to empty. We were
only clearing out the requests for the LUN being reset so the
request queue was never empty causing a hang.
The Fix:
For all requests in our internal request queue:
Complete requests with DID_RESET for queued requests for the
device undergoing a reset.
Complete requests with DID_REQUEUE for all other queued requests.
Found by internal testing. No known externally reported bugs.
* smartpqi-fix-volume-size-updates
The current code only checks for a size change if there is also a
queue depth change. We are separating the check for queue depth and
the size changes.
Found by internal testing. No known bugs were filed.
* smartpqi-update-version-to-2.1.30-031
No functional changes.
Link: https://lore.kernel.org/r/20240827185501.692804-1-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct a rare case in which a LUN reset occurs on a device and I/O
requests for other devices persist in the driver's internal request queue.
Part of a LUN reset involves waiting for our internal request queue to
empty before proceeding. The internal request queue contains requests not
yet sent down to the controller.
We were clearing the requests queued for the LUN undergoing a reset, but
not all of the queued requests, so the queue never emptied and the reset
hung.
For all requests in our internal request queue:
Complete requests with DID_RESET for queued requests for the device
undergoing a reset.
Complete requests with DID_REQUEUE for all other queued requests.
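The completion rule above can be sketched in user space as follows. This
is a minimal illustration, not the driver implementation: the demo_*
types, the singly linked queue and the enum values are assumptions, and
only the names DID_RESET and DID_REQUEUE come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the SCSI host-byte results. */
enum demo_result { DEMO_PENDING = 0, DEMO_DID_RESET, DEMO_DID_REQUEUE };

struct demo_request {
    int lun;                    /* LUN this request targets */
    enum demo_result result;    /* completion status */
    struct demo_request *next;  /* next request in the internal queue */
};

struct demo_queue {
    struct demo_request *head;
};

/*
 * Complete every queued request so that the queue ends up empty:
 * requests for reset_lun get DID_RESET, all others get DID_REQUEUE.
 * Returns the number of requests completed.
 */
size_t demo_drain_for_reset(struct demo_queue *q, int reset_lun)
{
    size_t completed = 0;
    struct demo_request *req;

    assert(q != NULL);

    req = q->head;
    while (req != NULL) {
        struct demo_request *next = req->next;

        req->result = (req->lun == reset_lun) ?
            DEMO_DID_RESET : DEMO_DID_REQUEUE;
        req->next = NULL;
        completed++;
        req = next;
    }
    q->head = NULL;

    return completed;
}
```

Because every entry is completed, a wait for the queue to become empty can
finish even when requests for other LUNs were queued.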
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20240827185501.692804-6-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add sysfs entry to check for write stream requests.
Move existing raid_bypass_cnt into a structure named pqi_raid_io_stats and
add member write_stream_cnt. These two counters are related because write
stream detection is only checked if an I/O request is eligible for bypass
(AIO).
Example usage:
lsscsi
[15:1:0:0] disk Adaptec LOGICAL VOLUME 0129 /dev/sdae
cat /sys/block/sdae/device/ssd_smart_path_enabled
1
^
|
+---- NOTE: here bypass has been enabled on device sdae
To read the counter for parity write stream requests:
cat /sys/block/sdae/device/write_stream_cnt
0x60cd507
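The per-CPU counting idea can be sketched in user space with one slot per
CPU that is summed on read. This is only an approximation of the kernel's
__percpu mechanism: the demo_* names, the fixed DEMO_NR_CPUS and the
simplified accounting rule (a detected write stream bumps write_stream_cnt,
any other eligible request bumps raid_bypass_cnt) are assumptions made for
illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_NR_CPUS 4  /* hypothetical CPU count for the sketch */

/* Mirrors the idea of grouping both counters in one structure. */
struct demo_raid_io_stats {
    uint64_t raid_bypass_cnt;
    uint64_t write_stream_cnt;
};

struct demo_device {
    struct demo_raid_io_stats stats[DEMO_NR_CPUS];  /* one slot per CPU */
};

/* Account one bypass-eligible request on the given CPU. */
void demo_account_eligible_io(struct demo_device *dev, unsigned int cpu,
                              int is_write_stream)
{
    assert(cpu < DEMO_NR_CPUS);

    if (is_write_stream)
        dev->stats[cpu].write_stream_cnt++;
    else
        dev->stats[cpu].raid_bypass_cnt++;
}

/* Sum the per-CPU slots; done only when the value is read. */
uint64_t demo_write_stream_total(const struct demo_device *dev)
{
    uint64_t total = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        total += dev->stats[cpu].write_stream_cnt;

    return total;
}

/* Format the total in hex, similar to the sysfs output shown above. */
int demo_write_stream_cnt_show(const struct demo_device *dev, char *buf,
                               size_t len)
{
    return snprintf(buf, len, "0x%" PRIx64 "\n",
                    demo_write_stream_total(dev));
}
```

Each CPU increments only its own slot on the I/O path, and the cost of
summing is paid when the counter is read.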
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Co-developed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20240827185501.692804-4-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct stream detection by initializing the structure
pqi_scsi_dev_raid_map_data to 0s.
Because the structure was not zeroed, the 'is_write' flag might be set for
SCSI READ commands as well. The driver may then treat SCSI READ commands as
SCSI WRITE commands and, if they are identified as sequential I/Os, submit
them via the RAID path instead of the AIO path.
Note: This does not cause data corruption.
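The effect of the zero initialization can be sketched in user space. The
structure and helpers below are hypothetical stand-ins, not the driver's
pqi_scsi_dev_raid_map_data definition; the assumption is that the parsing
step sets is_write only for writes and relies on the caller having cleared
the structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum demo_opcode { DEMO_READ, DEMO_WRITE };

/* Hypothetical stand-in for the per-request RAID map data. */
struct demo_raid_map_data {
    bool is_write;
    uint64_t first_block;
    uint32_t block_cnt;
};

/* Sets is_write for writes only; never clears it for reads. */
static void demo_parse_request(struct demo_raid_map_data *rmd,
                               enum demo_opcode op, uint64_t first_block,
                               uint32_t block_cnt)
{
    assert(rmd != NULL);

    if (op == DEMO_WRITE)
        rmd->is_write = true;
    rmd->first_block = first_block;
    rmd->block_cnt = block_cnt;
}

/* Returns whether the request would be treated as a write. */
bool demo_is_write_request(enum demo_opcode op, uint64_t first_block,
                           uint32_t block_cnt)
{
    struct demo_raid_map_data rmd;

    /* Clear the structure so stale stack contents cannot set is_write. */
    memset(&rmd, 0, sizeof(rmd));
    demo_parse_request(&rmd, op, first_block, block_cnt);

    return rmd.is_write;
}
```

Without the memset(), is_write would hold whatever happened to be on the
stack for a read, which is the kind of false detection described above.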
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20240827185501.692804-3-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add controller logs to kdump.
Driver allocates DMA memory and communicates this address to FW. In the
event of system crash, host driver notifies the firmware about the crash
and firmware posts all the necessary logs in the pre-allocated host buffer
for firmware debugging.
Once firmware signals that the log upload to host memory is complete, the
host continues saving the OS crash dump.
This is a "feature" driven capability and is backward compatible with
existing controller FW.
Rename some prefixes for OFA (Online-Firmware Activation ofa_*) buffers to
host_memory_*. So, not a lot of actual functional changes to
smartpqi_init.c, mainly determining the memory size allocation.
Added a function to notify the controller to copy debug data into host
memory before continuing kdump.
Most of the functional changes are in smartpqi_sis.c where the actual
handshaking is done.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20240827185501.692804-2-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche <bvanassche@acm.org> says:
Hi Martin,
Multiple SCSI drivers use snprintf() to format a workqueue name before
invoking one of the create*_workqueue() macros. This patch series
simplifies such code by passing the format string and arguments to
alloc_workqueue(). Additionally, the structure members that are only
used as a temporary buffer for formatting workqueue names are
removed. Please consider this patch series for the next merge window.
Thanks,
Bart.
Link: https://lore.kernel.org/r/20240822195944.654691-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
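The simplification described in the cover letter above can be sketched in
user space. demo_alloc_workqueue() is a hypothetical stand-in that only
mirrors the argument order of alloc_workqueue() (format string, flags,
max_active, then format arguments); it formats the name itself, so the
caller needs no snprintf() call and no temporary name buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEMO_WQ_NAME_LEN 32  /* hypothetical name length limit */

/* Hypothetical stand-in for a workqueue object. */
struct demo_workqueue {
    char name[DEMO_WQ_NAME_LEN];
    unsigned int flags;
    int max_active;
};

/* Format the name from fmt and the trailing arguments. */
struct demo_workqueue demo_alloc_workqueue(const char *fmt,
                                           unsigned int flags,
                                           int max_active, ...)
{
    struct demo_workqueue wq;
    va_list args;

    assert(fmt != NULL);

    va_start(args, max_active);
    vsnprintf(wq.name, sizeof(wq.name), fmt, args);
    va_end(args);

    wq.flags = flags;
    wq.max_active = max_active;

    return wq;
}
```

A caller that used to format "scsi_wq_%d" into a structure member and then
pass that member on can instead pass the format string and the host number
directly, for example demo_alloc_workqueue("scsi_wq_%d", flags, 1, host_no).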
The workqueue maintainer wants to remove the create*_workqueue() macros
because these macros always set the WQ_MEM_RECLAIM flag and because they
only support literal workqueue names. Hence this patch, which replaces the
create*_workqueue() invocations with the definitions of these macros. The
WQ_MEM_RECLAIM flag has been retained because I think that flag is necessary
for workqueues created by storage drivers. This patch has been generated by
running spatch and git clang-format. spatch has been invoked as follows:
spatch --in-place --sp-file expand-create-workqueue.spatch $(git grep -lEw 'create_(freezable_|singlethread_|)workqueue' */scsi */ufs)
The contents of the expand-create-workqueue.spatch file are as follows:
@@
expression name;
@@
-create_workqueue(name)
+alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, name)
@@
expression name;
@@
-create_freezable_workqueue(name)
+alloc_workqueue("%s", WQ_FREEZABLE | WQ_UNBOUND | WQ_MEM_RECLAIM, 1, name)
@@
expression name;
@@
-create_singlethread_workqueue(name)
+alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240822195944.654691-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 13247018d6 ("scsi: target: iscsi: Fix hang in the iSCSI login
code") removed iscsi_handle_login_thread_timeout() but left declaration.
Commit 3e1c81a95f ("iscsi-target: Refactor RX PDU logic + export request
PDU handling") left iscsi_target_get_initial_payload() declaration.
Commit d703ce2f7f ("iscsi/iser-target: Convert to command priv_size
usage") removed iscsit_alloc_cmd() but left declaration.
And finally, a few other declarations were never implemented since
introduction in commit e48354ce07 ("iscsi-target: Add iSCSI fabric
support for target v4.1").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20240810093437.2586476-1-yuehaibing@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Finn Thain <fthain@linux-m68k.org> says:
This series begins with some work on the mac_scsi driver to improve
compatibility with SCSI2SD v5 devices. Better error handling is needed
there because the PDMA hardware does not tolerate the write latency
spikes which SD cards can produce.
A bug is fixed in the 5380 core driver so that scatter/gather can be
enabled in mac_scsi.
Several patches at the end of this series improve robustness and
correctness in the core driver.
This series has been tested on a variety of mac_scsi hosts. A variety
of SCSI targets was also tested, including Quantum HDD, Fujitsu HDD,
Iomega FDD, Ricoh CD-RW, Matsushita CD-ROM, SCSI2SD and BlueSCSI.
Link: https://lore.kernel.org/r/cover.1723001788.git.fthain@linux-m68k.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It's not an error for a target to change the bus phase during a transfer.
Unfortunately, the FLAG_DMA_FIXUP workaround does not allow for that -- a
phase change produces a DRQ timeout error and the device borken flag will
be set.
Check the phase match bit during FLAG_DMA_FIXUP processing. Don't forget to
decrement the command residual. While we are here, change shost_printk()
into scmd_printk() for better consistency with other DMA error messages.
Tested-by: Stan Johnson <userm57@yahoo.com>
Fixes: 55181be8ce ("ncr5380: Replace redundant flags with FLAG_NO_DMA_FIXUP")
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Link: https://lore.kernel.org/r/99dc7d1f4c825621b5b120963a69f6cd3e9ca659.1723001788.git.fthain@linux-m68k.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SD cards can produce write latency spikes on the order of a hundred
milliseconds. If the target firmware does not hide that latency during DATA
IN and OUT phases it can cause the PDMA circuitry to raise a processor bus
fault which in turn leads to an unreliable byte count and a DMA overrun.
The Last Byte Sent flag is used to detect the overrun but this mechanism is
unreliable on some systems. Instead, set a DID_ERROR result whenever there
is a bus fault during a PDMA send, unless the cause was a phase mismatch.
Cc: stable@vger.kernel.org # 5.15+
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Fixes: 7c1f3e3447 ("scsi: mac_scsi: Treat Last Byte Sent time-out as failure")
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Link: https://lore.kernel.org/r/cc38df687ace2c4ffc375a683b2502fc476b600d.1723001788.git.fthain@linux-m68k.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>