Stop the OS from re-discovering multiple LUNs for tape drive and medium
changer.
Duplicate device nodes for Ultrium tape drive and medium changer are being
created.
The Ultrium tape drive is a multi-LUN SCSI target. It presents a LUN for
the tape drive and a 2nd LUN for the medium changer. Our controller FW
lists both LUNs in the RPL results.
As a result, the smartpqi driver exposes both devices to the OS. Then the
OS does its normal device discovery via the SCSI REPORT LUNS command, which
causes it to re-discover both devices a 2nd time, which results in the
duplicate device nodes.
Link: https://lore.kernel.org/r/20210928235442.201875-10-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Send a TEST UNIT READY to HBA disks and do not present them to the OS if
0x02/0x04/0x1b (SANITIZE IN PROGRESS) is returned.
During boot-up, some OSes appear to hang when there are one or more disks
undergoing a sanitize operation.
According to SCSI SBC4 specification section 4.11.2 "Commands allowed
during SANITIZE", some SCSI commands are permitted, but read/write
operations are not.
When the OS attempts to read the disk partition table a CHECK CONDITION ASC
0x04 ASCQ 0x1b is returned which causes the OS to retry the read until
SANITIZE has completed. This can take hours.
According to document HPE Smart Storage Administrator User Guide, during
the sanitize erase operation, the drive is unusable. I.e. the expected
behavior for SANITIZE is the that disk remains offline even after SANITIZE
has completed. The customer is expected to re-enable the disk using the
management utility.
Link: https://lore.kernel.org/r/20210928235442.201875-6-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enhance check for commands queued to the controller. Add new function
pqi_nonempty_inbound_queue_count() that will wait for all I/O queued for
submission to controller across all queue groups to drain. Add helper
functions to obtain queue command counts for each queue group. These
queues should drain quickly as they are already staged to be submitted down
to the controller's IB queue.
Enhance check for outstanding command completion. Update the count of
outstanding commands while waiting. This value was not re-obtained and was
potentially causing infinite wait for all completions.
Link: https://lore.kernel.org/r/20210928235442.201875-5-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct kdump hangs when controller is locked up.
There are occasions when a controller reboot (controller soft reset) is
issued when a controller firmware crash dump is in progress.
This leads to incomplete controller firmware crash dump:
- When the controller crash dump is in progress, and a kdump is initiated,
the driver issues inbound doorbell reset to bring back the controller in
SIS mode.
- If the controller is in locked up state, the inbound doorbell reset does
not work causing controller initialization failures. This results in the
driver hanging waiting for SIS mode.
To avoid an incomplete controller crash dump, add in a controller crash
dump handshake:
- Controller will indicate start and end of the controller crash dump by
setting some register bits.
- Driver will look these bits when a kdump is initiated. If a controller
crash dump is in progress, the driver will wait for the controller crash
dump to complete before issuing the controller soft reset then complete
driver initialization.
Link: https://lore.kernel.org/r/20210928235442.201875-3-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This function is more complicated than necessary.
If we change from scnprintf() to snprintf() that lets us remove the if
bytes_wrote < sizeof(protocol) checks. Also, we can use bytes_wrote ? ","
: "" to print the comma and remove the separate if statement and the
"is_string_nonempty" variable.
[mkp: a few formatting cleanups and s/wrote/written/]
Link: https://lore.kernel.org/r/20210916132605.GF25094@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
IRQ polling thread calls ISR after enable_irq() to handle any missed I/O
completion. The atomic flag "in_used" was added to have the synchronization
between the IRQ polling thread and the interrupt context. There is a bug
around it leading to a race condition.
Below is the sequence:
- IRQ polling thread accesses ISR, fetches the reply descriptor.
- Real interrupt arrives and pre-empts polling thread (enable_irq() is
already called).
- Interrupt context picks the same reply descriptor as fetched by polling
thread, processes it, and exits.
- Polling thread resumes and processes the descriptor which is already
processed by interrupt thread leads to kernel crash.
Setting the "in_used" flag before fetching the reply descriptor ensures
synchronized access to ISR.
Link: https://www.spinics.net/lists/linux-scsi/msg159440.html
Link: https://lore.kernel.org/r/20210929124022.24605-2-sumit.saxena@broadcom.com
Fixes: 9bedd36e91 ("scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs")
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change the log level of the following message to debug:
Unsupported SCSI Opcode 0xXX, sending CHECK_CONDITION.
This message is mostly helpful during debugging sessions in order to
understand errors on the initiator side. But most of the time it's just
useless and makes reading logs much harder.
It gets particularly annoying if there are many initiators that come and go
or if an initiator runs a program that does not care whether the command is
supported and just keeps sending it.
Link: https://lore.kernel.org/r/20210929114959.705852-1-k.shelekhin@yadro.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Callers of ufshcd_reset_and_restore() expect it to return in an operational
state. However, the code only checks direct errors and so the ufshcd_state
may not be UFSHCD_STATE_OPERATIONAL due to error interrupts.
Fix by also checking ufshcd_state, still allowing non-fatal errors which
are left for the error handler to deal with.
Link: https://lore.kernel.org/r/20211002154550.128511-2-adrian.hunter@intel.com
Reviewed-by: Avri altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 57d104c153 ("ufs: add UFS power management support") made the UFS
driver submit a REQUEST SENSE command before submitting a power management
command to a WLUN to clear the POWER ON unit attention. Instead of
submitting a REQUEST SENSE command before submitting a power management
command, retry the power management command until it succeeds.
This is the preparation to get rid of all UNIT ATTENTION code which should
be handled by users.
Link: https://lore.kernel.org/r/20211001182015.1347587-2-jaegeuk@kernel.org
Cc: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The MODE SELECT(6) command allows handling mode page buffers that are up to
255 bytes, including the 4 byte header needed in front of the page
buffer. For requests larger than this limit, automatically use the MODE
SELECT(10) command.
In both cases, since scsi_mode_select() adds the mode select page header,
checks on the buffer length value must include this header size to avoid
overflows of the command CDB allocation length field.
While at it, use put_unaligned_be16() for setting the header block
descriptor length and CDB allocation length when using MODE SELECT(10).
[mkp: fix MODE SENSE vs. MODE SELECT confusion]
Link: https://lore.kernel.org/r/20210820070255.682775-3-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Several problems exist with scsi_mode_sense() buffer length handling:
1) The allocation length field of the MODE SENSE(10) command is 16-bits,
occupying bytes 7 and 8 of the CDB. With this command, access to mode
pages larger than 255 bytes is thus possible. However, the CDB
allocation length field is set by assigning len to byte 8 only, thus
truncating buffer length larger than 255.
2) If scsi_mode_sense() is called with len smaller than 8 with
sdev->use_10_for_ms set, or smaller than 4 otherwise, the buffer length
is increased to 8 and 4 respectively, and the buffer is zero filled
with these increased values, thus corrupting the memory following the
buffer.
Fix these 2 problems by using put_unaligned_be16() to set the allocation
length field of MODE SENSE(10) CDB and by returning an error when len is
too small.
Furthermore, if len is larger than 255B, always try MODE SENSE(10) first,
even if the device driver did not set sdev->use_10_for_ms. In case of
invalid opcode error for MODE SENSE(10), access to mode pages larger than
255 bytes are not retried using MODE SENSE(6). To avoid buffer length
overflows for the MODE_SENSE(10) case, check that len is smaller than 65535
bytes.
While at it, also fix the folowing:
* Use get_unaligned_be16() to retrieve the mode data length and block
descriptor length fields of the mode sense reply header instead of using
an open coded calculation.
* Fix the kdoc dbd argument explanation: the DBD bit stands for Disable
Block Descriptor, which is the opposite of what the dbd argument
description was.
Link: https://lore.kernel.org/r/20210820070255.682775-2-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building an allmodconfig kernel, the following build error shows up:
aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_probe':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:177: undefined reference to `hwmon_device_register_with_info'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:177:(.text+0x510): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_device_register_with_info'
aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_remove':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:195: undefined reference to `hwmon_device_unregister'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:195:(.text+0x5c8): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_device_unregister'
aarch64-linux-gnu-ld: drivers/scsi/ufs/ufs-hwmon.o: in function `ufs_hwmon_notify_event':
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:206: undefined reference to `hwmon_notify_event'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:206:(.text+0x64c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_notify_event'
aarch64-linux-gnu-ld: /home/anders/src/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:209: undefined reference to `hwmon_notify_event'
/kernel/next/drivers/scsi/ufs/ufs-hwmon.c:209:(.text+0x66c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `hwmon_notify_event'
Since SCSI_UFS_HWMON can't be built as a module, SCSI_UFS_HWMON has to
depend on HWMON=y.
Link: https://lore.kernel.org/r/20210927084615.1938432-1-anders.roxell@linaro.org
Fixes: e88e2d3220 ("scsi: ufs: core: Probe for temperature notification support")
Also-reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.
Use the struct_size() helper to do the arithmetic instead of the argument
"size + count * size" in the kzalloc() function.
This code was detected with the help of Coccinelle and audited and fixed
manually.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Link: https://lore.kernel.org/r/20210925114205.11377-1-len.baker@gmx.com
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Contention for the mailbox interface may occur during driver initialization
(immediately after a function reset), between mailbox commands initiated
via ioctl (bsg) and those driver requested by the driver.
After setting SLI_ACTIVE flag for a port, there is a window in which the
driver will allow an ioctl to be initiated while the adapter is
initializing and issuing mailbox commands via polling. The polling logic
then gets confused.
Correct by having thread setting SLI_ACTIVE spot an active mailbox command
and allow it complete before proceeding.
Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>