The error recovery is handled by management firmware (MFW) with the help of
qed/qedi drivers. Upon detecting errors, driver informs MFW about this
event which in turn starts a recovery process. MFW sends ERROR_RECOVERY
notification to the driver which performs the required cleanup/recovery
from the driver side.
Link: https://lore.kernel.org/r/20200908095657.26821-9-mrangankar@marvell.com
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add support to initiate MFW process recovery for all the devices if storage
function receives the event first.
Also added fix for kernel test robot warning,
>> drivers/scsi/qedi/qedi_main.c:1119:6: warning: no previous prototype
>> for 'qedi_schedule_hw_err_handler' [-Wmissing-prototypes]
Link: https://lore.kernel.org/r/20200908095657.26821-8-mrangankar@marvell.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For short time cable pulls, the in-flight I/O to the firmware is never
cleaned up, resulting in the behaviour of stale I/O completion causing
list_del corruption and soft lockup of the system.
On link down event, mark all the connections for recovery, causing cleanup
of all the in-flight I/O immediately.
Link: https://lore.kernel.org/r/20200908095657.26821-7-mrangankar@marvell.com
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While aborting the I/O, the firmware cleanup task timed out and driver
deleted the I/O from active command list. Some time later the firmware
sent the cleanup task response and driver again deleted the I/O from
active command list causing firmware to send completion for non-existent
I/O and list_del corruption of active command list.
Add fix to check if I/O is present before deleting it from the active
command list to ensure firmware sends valid I/O completion and protect
against list_del corruption.
Link: https://lore.kernel.org/r/20200908095657.26821-4-mrangankar@marvell.com
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
VIOS partitions with SLI-4 enabled Emulex adapters will be capable of
driving I/O in parallel through mulitple work queues or channels, and with
new hypervisor firmware that supports multiple interrupt sources an ibmvfc
NPIV single initiator can be modified to exploit end-to-end channelization
in a PowerVM environment.
VIOS hosts will also be able to expose fabric perfromance impact
notifications (FPIN) via a new asynchronous event to ibmvfc clients that
advertise support via IBMVFC_CAN_HANDLE_FPIN in their capabilities flag
during NPIV_LOGIN.
This patch introduces three new Management Datagrams (MADs) for
channelization support negotiation as well as the FPIN asynchronous
event and FPIN status flags. Follow up work is required to plumb the
ibmvfc client driver to use these new interfaces.
Link: https://lore.kernel.org/r/20200904232936.840193-2-tyreld@linux.ibm.com
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently when using the num_parts parameter, partitions are aligned and
the end sector is one prior to the next start. This creates different
sized partitions. Create instead equally sized partitions by trimming the
end of each partition to the size of the smallest partition. This aligns
better with what one would expect from automatically created partitions and
can be helpful with testing things such as raid which often expect legs of
the same size. Minimal space is lost as the initial partition starting
size is calculated by dividing num_sectors by sdebug_num_parts.
Link: https://lore.kernel.org/r/20200902211434.9979-2-jpittman@redhat.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add BIST support for phy FFE (Feed forward equalizer) setting. The user can
configure FFE through the new debugfs interface.
FFE is a parameter used for link layer control. It will affect the link
quality between the SAS controller and the backplane. In the BIST test, the
FFE interface is provided to assist board testers in optimizing link
parameters.
The modification of the FFE parameter will affect the test after BIST or
the normal running of the board. The user should save the initial FFE
values and restore them after BIST test is complete.
Link: https://lore.kernel.org/r/1598958790-232272-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When updating PROG_PHY_LINK_RATE to set linkrate for a phy we used a
hard-coded initial value instead of getting the current value from the
register. The assumption was that this register would not be modified, but
in fact it was partially modified in a new version of hardware. The
hard-coded value we used changed the default value of the register to a an
incorrect setting and as a result the SAS controller could not change
linkrate for the phy.
Delete hard-coded value and always read the latest value of register before
updating it.
Link: https://lore.kernel.org/r/1598958790-232272-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This addresses the following gcc warning with "make W=1":
drivers/scsi/fnic/fnic_fcs.c: In function ‘fnic_rq_cmpl_frame_recv’:
drivers/scsi/fnic/fnic_fcs.c:840:15: warning: variable
‘eth_hdrs_stripped’ set but not used [-Wunused-but-set-variable]
840 | unsigned int eth_hdrs_stripped;
| ^~~~~~~~~~~~~~~~~
Link: https://lore.kernel.org/r/20200831081126.3251288-5-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This addresses the following gcc warning with "make W=1":
drivers/scsi/fnic/fnic_fcs.c: In function ‘is_fnic_fip_flogi_reject’:
drivers/scsi/fnic/fnic_fcs.c:317:9: warning: variable ‘els_len’ set but
not used [-Wunused-but-set-variable]
317 | size_t els_len = 0;
| ^~~~~~~
drivers/scsi/fnic/fnic_fcs.c:312:21: warning: variable ‘els_dtype’ set
but not used [-Wunused-but-set-variable]
312 | enum fip_desc_type els_dtype = 0;
| ^~~~~~~~~
Link: https://lore.kernel.org/r/20200831081126.3251288-3-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
PA Layer issues a LINERESET to the PHY at the recovery step in the Power
Mode change operation. If it happens during auto or manual hibern8 enter,
even if hibern8 enter succeeds, UFS power mode shall be set to PWM-G1 mode
and kept in that mode after exit from hibern8, leading to bad performance.
Handle the LINERESET in the eh_work by restoring power mode to HS mode
after all pending reqs and tasks are cleared from doorbell.
Link: https://lore.kernel.org/r/1598321228-21093-3-git-send-email-cang@codeaurora.org
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To recover non-fatal errors, no full reset is required, err_handler only
clears those pending TRs/TMRs so that SCSI layer can re-issue them. In
current err_handler, TRs are directly cleared from UFS host's doorbell but
not aborted from device side. However, according to the UFSHCI JEDEC spec,
the host software shall use UTP Transfer Request List Clear Register to
clear a task from UFS host's doorbell only when a UTP Transfer Request is
expected to not be completed, e.g. when the host software receives a
“FUNCTION COMPLETE” Task Management response which means a Transfer Request
was aborted. To follow the UFSHCI JEDEC spec, in err_handler, abort one TR
before clearing it from doorbell.
Link: https://lore.kernel.org/r/1598321228-21093-2-git-send-email-cang@codeaurora.org
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have two knobs to control flush for write booster,
fWriteBoosterBufferFlushDuringHibernate and fWriteBoosterBufferFlushEn.
Some vendors use only fWriteBoosterBufferFlushDuringHibernate because this
can reportedly cover most scenarios. Also, there have been some reports
that flush by fWriteBoosterBufferFlushEn could lead to increased power
consumption thanks to unexpected internal operations. Consequently, we need
a way to enable or disable fWriteBoosterEn operations. Add quirk to bypass
manual flush.
Link: https://lore.kernel.org/r/ffdb0eda30515809f0ad9ee936b26917ee9b4593.1598319701.git.kwmad.kim@samsung.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>