SPC4 has:
The first ISCSI INITIATOR SESSION ID field byte containing an ASCII null
character terminates the ISCSI INITIATOR SESSION ID field without regard
for the specified length of the iSCSI TransportID or the contents of the
ADDITIONAL LENGTH field.
----------------------------------------
which sounds like we can get an iSID shorter than 12 chars. SPC and the
iSCSI RFC do not say how to handle that case other than just cutting off
the iSID. This patch just makes sure that if we get an iSID like that, we
only copy/send that string.
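A minimal sketch of that receive-side rule, assuming the iSID arrives as a hex string of at most 12 characters. `copy_isid()` and `ISID_MAX_CHARS` are hypothetical names, not the target core's code. The copy stops at the first NUL, the 12-character limit, or the end of the field, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ISID_MAX_CHARS 12  /* hypothetical: iSID as a 12-char hex string */

/*
 * Copy the ISCSI INITIATOR SESSION ID out of a received TransportID.
 * The first NUL ends the field regardless of ADDITIONAL LENGTH, so the
 * result may be shorter than 12 characters. out must have room for
 * ISID_MAX_CHARS + 1 bytes. Returns the number of characters copied.
 */
static size_t copy_isid(char *out, const char *field, size_t field_len)
{
	size_t limit = field_len < ISID_MAX_CHARS ? field_len : ISID_MAX_CHARS;
	size_t n = 0;

	while (n < limit && field[n] != '\0')
		n++;
	memcpy(out, field, n);
	out[n] = '\0';
	return n;
}
```

With a field of `'1','2','3','\0','9','9'`, only "123" is copied even though more bytes follow the NUL.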
There is no OS that does this right now, so there was no test case. I did
test with sg utils to check it works as expected and nothing breaks.
Link: https://lore.kernel.org/r/1593654203-12442-8-git-send-email-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This fixes the following bugs with the transport id setup for iscsi:
1. Incorrectly adding a NUL after the initiator name for TPID format 1.
2. For TPID format 1 buffer setup we are doing off+len, off++ and then
also len+=some_value. This results in the isid going past the buffer
boundaries when we then do buf[off+len].
3. The pr_reg_isid is the isid in string format which is 12 bytes, but we
are only copying 6 bytes.
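The three fixes above can be sketched as a single builder that tracks one running offset. This is a userspace illustration under stated assumptions, not the target core's iscsi_get_pr_transport_id(): `build_iscsi_tpid_fmt1()` and the constants are hypothetical, and the byte layout follows a reading of SPC-4 (protocol identifier 5h for iSCSI, format code 01b in bits 7-6 of byte 0, big-endian ADDITIONAL LENGTH rounded up to a multiple of 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TPID_HDR_LEN 4   /* byte 0: format/protocol, byte 1: reserved, 2-3: ADDITIONAL LENGTH */
#define ISID_STR_LEN 12  /* iSID in string form, as the commit text says */

/*
 * Build "<name>,i,0x<isid>" as an iSCSI TransportID, format code 01b.
 * Every write goes through one running offset, and ADDITIONAL LENGTH is
 * computed once up front, so nothing is written past the sized area.
 * Returns the total length, or 0 if buf is too small.
 */
static size_t build_iscsi_tpid_fmt1(uint8_t *buf, size_t bufsz,
				    const char *name, const char *isid)
{
	static const char sep[] = ",i,0x";
	const size_t sep_len = sizeof(sep) - 1;
	size_t name_len = strlen(name);
	size_t isid_len = 0;
	size_t body, padded, off = TPID_HDR_LEN;

	while (isid_len < ISID_STR_LEN && isid[isid_len] != '\0')
		isid_len++;

	/* name + separator + iSID + NUL, padded to a multiple of 4 */
	body = name_len + sep_len + isid_len + 1;
	padded = (body + 3) & ~(size_t)3;
	if (padded > 0xffff || TPID_HDR_LEN + padded > bufsz)
		return 0;

	memset(buf, 0, TPID_HDR_LEN + padded);
	buf[0] = 0x40 | 0x05;              /* FORMAT CODE 01b | PROTOCOL ID iSCSI */
	buf[2] = (uint8_t)(padded >> 8);   /* ADDITIONAL LENGTH, big-endian */
	buf[3] = (uint8_t)padded;

	memcpy(buf + off, name, name_len); /* no NUL after the name in format 01b */
	off += name_len;
	memcpy(buf + off, sep, sep_len);
	off += sep_len;
	memcpy(buf + off, isid, isid_len); /* the whole iSID string, not 6 bytes */
	off += isid_len;

	/* The NUL terminator and pad bytes were already zeroed by memset(). */
	assert(off < TPID_HDR_LEN + padded);
	return TPID_HDR_LEN + padded;
}
```

For name "iqn.x" and iSID "23d000000001" the body is 23 bytes, padded to 24, for a 28-byte TransportID.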
Link: https://lore.kernel.org/r/1593654203-12442-6-git-send-email-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
__core_scsi3_add_registration clears the t10_pr_registration pr_reg_deve
and does a core_scsi3_lunacl_undepend_item which does an undepend and also
does a kref_put from the get done in __core_scsi3_alloc_registration. So
when we get to the bottom of core_scsi3_decode_spec_i_port the pr_reg_deve
is NULL and we crash when trying to access the local_pr_reg's pr_reg_deve.
We've also done an extra undepend for local_pr_reg and if we didn't crash
on the NULL we would have done an extra kref_put too.
This patch has us do a core_scsi3_lunacl_depend_item for local_pr_reg and
then let __core_scsi3_add_registration handle the cleanup for the
pr_reg_deve. We then just skip the undepend for the acl and tpg for the
local pr_reg.
The error path then works in a similar way, but we always do the
core_scsi3_lunacl_undepend_item since we never call
__core_scsi3_add_registration in that code path.
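The reference flow described above, modeled with a plain counter. Everything here is hypothetical: `struct pr_reg_model`, `lunacl_depend()`/`lunacl_undepend()` and `add_registration()` stand in for the target core's depend/undepend helpers and __core_scsi3_add_registration(), and the acl/tpg dependencies are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: lunacl_refs stands in for the LUN ACL depend count
 * and has_deve for pr_reg_deve. Not the target core's real structures.
 */
struct pr_reg_model {
	int lunacl_refs;
	bool has_deve;
};

static void lunacl_depend(struct pr_reg_model *r)
{
	r->lunacl_refs++;
}

static void lunacl_undepend(struct pr_reg_model *r)
{
	assert(r->lunacl_refs > 0);  /* an extra undepend would trip this */
	r->lunacl_refs--;
}

/* Models __core_scsi3_add_registration(): drops one reference, clears deve. */
static void add_registration(struct pr_reg_model *r)
{
	lunacl_undepend(r);
	r->has_deve = false;
}

/* Fixed flow for local_pr_reg; 'fail' simulates taking the error path. */
static void register_local_pr_reg(struct pr_reg_model *r, bool fail)
{
	lunacl_depend(r);            /* reference for add_registration to consume */
	if (fail) {
		lunacl_undepend(r);  /* add_registration never ran: drop it here */
		return;
	}
	add_registration(r);
	/* Success: no further undepend, and r->has_deve must not be used. */
}
```

Both paths end with the counter balanced; only the success path clears the deve.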
Link: https://lore.kernel.org/r/1593654203-12442-4-git-send-email-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Heavy testing indicates the irqsave() spinlock around the __set_bit() is
insufficient to stop subsequent clear_bit() calls from occasionally being
applied out of order. Also, the nearby failed kzalloc() path leading to
SCSI_MLQUEUE_HOST_BUSY does not properly undo the in_use bitmap and
num_in_q changes. Fix both.
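One way to get both properties, sketched in userspace with C11 atomics standing in for set_bit()/clear_bit(). This is an assumption about the shape of a fix, not the scsi_debug code: `struct queue_state`, `queue_cmd()` and the slot count are hypothetical, and `alloc_fail` simulates the failed kzalloc().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define QUEUE_SLOTS 32   /* hypothetical queue depth; fits in one word */
#define MLQUEUE_OK   0
#define MLQUEUE_BUSY 1   /* stands in for SCSI_MLQUEUE_HOST_BUSY */

struct queue_state {
	atomic_ulong in_use;   /* one bit per slot, only updated atomically */
	atomic_int num_in_q;
	void *slot_data[QUEUE_SLOTS];
};

/* Atomically claim the lowest free slot; returns -1 if all are busy. */
static int claim_slot(struct queue_state *q)
{
	for (int i = 0; i < QUEUE_SLOTS; i++) {
		unsigned long bit = 1UL << i;
		/* fetch_or is the atomic analogue of set_bit(); the old value
		 * says whether another caller already held this slot. */
		if (!(atomic_fetch_or(&q->in_use, bit) & bit))
			return i;
	}
	return -1;
}

static void release_slot(struct queue_state *q, int i)
{
	atomic_fetch_and(&q->in_use, ~(1UL << i));  /* analogue of clear_bit() */
}

static int queue_cmd(struct queue_state *q, bool alloc_fail)
{
	int slot = claim_slot(q);
	void *p;

	if (slot < 0)
		return MLQUEUE_BUSY;
	atomic_fetch_add(&q->num_in_q, 1);

	p = alloc_fail ? NULL : calloc(1, 64);
	if (!p) {
		/* Undo both pieces of bookkeeping before reporting busy. */
		atomic_fetch_sub(&q->num_in_q, 1);
		release_slot(q, slot);
		return MLQUEUE_BUSY;
	}
	q->slot_data[slot] = p;
	return MLQUEUE_OK;
}
```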
Link: https://lore.kernel.org/r/20200702145355.522283-1-dgilbert@interlog.com
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we are doing async removal of the session, we could be doing a
scsi_remove_target from the removal workqueue, and for the offload case we
could be doing a new session addition and scan to the same host. The
add/scan might then end up trying to use the target_id of the target we are
removing.
This patch just delays freeing the target_id until after
scsi_remove_target() has completed, so we know it's no longer in use.
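A sequential sketch of the ordering, assuming a simple per-host id table. `alloc_target_id()`, `removal_work()` and the structs are hypothetical, not iscsi transport code. The point is only that the id is released after the target removal step, so an add/scan that runs before removal finishes cannot reuse it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TARGET_IDS 16  /* hypothetical per-host limit */

struct host_ids {
	bool used[MAX_TARGET_IDS];
};

/* Returns the lowest free target_id, or -1 if none is available. */
static int alloc_target_id(struct host_ids *h)
{
	for (int i = 0; i < MAX_TARGET_IDS; i++) {
		if (!h->used[i]) {
			h->used[i] = true;
			return i;
		}
	}
	return -1;
}

static void free_target_id(struct host_ids *h, int id)
{
	h->used[id] = false;
}

/* Stand-in for the async removal work item. */
struct removal {
	int target_id;
	bool target_removed;
};

static void removal_work(struct host_ids *h, struct removal *r)
{
	/* 1. scsi_remove_target() equivalent; the id is still reserved. */
	r->target_removed = true;
	/* 2. Only now make the id available for a new session. */
	free_target_id(h, r->target_id);
}
```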
Link: https://lore.kernel.org/r/1593632868-6808-2-git-send-email-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With the current logging methods, diagnosing a problem typically ends with
a request for a reproduction with a different logging level set. This was
mainly by design: it avoids cluttering the kernel log with messages that
are usually not interesting, and the messages themselves could cause other
issues.
When looking at how to make a better system, it was seen that more data
was usually wanted when another message, typically at KERN_ERR level, had
been logged. In most cases, the additional logging then enabled was much
the same set of messages, and most of them came from the discovery state
machine.
Based on this summary, the following design has been put in place: The
driver will maintain an internal log (256 elements of 256 bytes). The
"additional logging" messages that are usually enabled in a reproduction
will be changed to now log all the time to the internal log. A new logging
level is defined - LOG_TRACE_EVENT. When this level is set (it is not by
default) and a message marked as KERN_ERR is logged, all the messages in
the internal log will be dumped to the kernel log before the KERN_ERR
message is logged.
There is a timestamp on each message added to the internal log. However,
this timestamp is not converted to wall time when logged. The value of the
timestamp is solely to give a crude time reference for the messages.
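A userspace sketch of the design above. Only the 256 x 256 sizing and the dump-before-KERN_ERR rule come from the text; `struct dbg_log`, `dbg_log_add()`, `log_err()`, the LOG_TRACE_EVENT bit value and the tick counter used as a timestamp are hypothetical, and a `FILE *` stands in for the kernel log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DBG_LOG_ENTRIES  256   /* from the commit text */
#define DBG_LOG_ENTRY_SZ 256   /* from the commit text */
#define LOG_TRACE_EVENT  0x1u  /* hypothetical bit value */

struct dbg_log {
	char msg[DBG_LOG_ENTRIES][DBG_LOG_ENTRY_SZ];
	unsigned long long stamp[DBG_LOG_ENTRIES]; /* crude, not wall time */
	unsigned int next;        /* slot to overwrite next */
	unsigned int count;       /* valid entries, saturates at DBG_LOG_ENTRIES */
	unsigned long long ticks; /* hypothetical monotonic counter */
};

/* Always record into the ring; the oldest entry is overwritten when full. */
static void dbg_log_add(struct dbg_log *l, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(l->msg[l->next], DBG_LOG_ENTRY_SZ, fmt, ap);
	va_end(ap);
	l->stamp[l->next] = l->ticks++;
	l->next = (l->next + 1) % DBG_LOG_ENTRIES;
	if (l->count < DBG_LOG_ENTRIES)
		l->count++;
}

/* Dump oldest to newest; returns the number of lines written. */
static unsigned int dbg_log_dump(const struct dbg_log *l, FILE *out)
{
	unsigned int start = (l->next + DBG_LOG_ENTRIES - l->count) % DBG_LOG_ENTRIES;

	for (unsigned int i = 0; i < l->count; i++) {
		unsigned int idx = (start + i) % DBG_LOG_ENTRIES;
		fprintf(out, "[%llu] %s\n", l->stamp[idx], l->msg[idx]);
	}
	return l->count;
}

/* KERN_ERR path: flush the ring first, but only if LOG_TRACE_EVENT is set. */
static unsigned int log_err(const struct dbg_log *l, unsigned int verbose,
			    const char *msg, FILE *out)
{
	unsigned int dumped = 0;

	if (verbose & LOG_TRACE_EVENT)
		dumped = dbg_log_dump(l, out);
	fprintf(out, "ERR: %s\n", msg);
	return dumped;
}
```

After 300 calls to `dbg_log_add()`, the ring holds messages 44 through 299, and an error with LOG_TRACE_EVENT set dumps all 256 before the error line.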
Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although the existing implementation is very good at high I/O load, in
tests involving light load, especially with only a few hardware queues,
latency was a little higher than it could be because completions are
scheduled through a workqueue, and other tasks in the system can delay that
handling.
Change the lower level to use irq_poll by default, which uses a softirq for
I/O completion. This gives better latency because it reduces the variance in
when the CQ is processed compared with the workqueue interface. However, as
high load is better served by not running in softirq when the CPU is
loaded, workqueues are still used under high I/O load.
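The text does not say how load is measured or where the switch happens, so the following only illustrates the policy: a per-CQ mode with hypothetical thresholds on entries handled per interrupt. `struct cq_model`, `cq_update_mode()` and both thresholds are assumptions; using two thresholds (hysteresis) is a design choice of this sketch so the mode does not flap when load hovers near a single cutoff.

```c
#include <assert.h>

/* Hypothetical thresholds on CQ entries handled per interrupt. */
#define CQ_LOAD_HIGH 256
#define CQ_LOAD_LOW   64

enum cq_mode { CQ_MODE_IRQ_POLL, CQ_MODE_WORKQUEUE };

struct cq_model {
	enum cq_mode mode;   /* low-latency irq_poll is the default */
};

/*
 * Pick the completion mode for the next interrupt: softirq (irq_poll) by
 * default, a workqueue once the CQ is busy, and back to irq_poll only after
 * load has clearly dropped.
 */
static enum cq_mode cq_update_mode(struct cq_model *cq, unsigned int entries)
{
	if (cq->mode == CQ_MODE_IRQ_POLL && entries >= CQ_LOAD_HIGH)
		cq->mode = CQ_MODE_WORKQUEUE;
	else if (cq->mode == CQ_MODE_WORKQUEUE && entries <= CQ_LOAD_LOW)
		cq->mode = CQ_MODE_IRQ_POLL;
	return cq->mode;
}
```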
Link: https://lore.kernel.org/r/20200630215001.70793-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change vocabulary of 0373 log msg from "error" to "cmpl".
The current language of the 0373 message contains the word "error", which
caused a number of customers to ask about the "error" and whether it should
be a concern. It isn't an error; it's simply an I/O completion status.
Revise the message to replace the word "error" with "cmpl" for completion.
Link: https://lore.kernel.org/r/20200630215001.70793-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the kdump kernel shuts down, lpfc calls flush_work_queue on an
interrupt to schedule the cq handler. When there is only one CPU active in
the kdump kernel, work queued with queue_work_on() can end up bound to a
non-active CPU, so it is never scheduled.
When in the kdump environment, per-CPU affinity of CQs to CPUs is not
necessary. In those cases, use a general queue_work() rather than
queue_work_on().
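In the kernel, queue_work() is queue_work_on() with WORK_CPU_UNBOUND, and is_kdump_kernel() reports whether a kdump kernel is running. The model below captures the decision in userspace; `cq_work_cpu()`, `work_will_run()` and `CPU_UNBOUND` are hypothetical stand-ins, not lpfc code.

```c
#include <assert.h>
#include <stdbool.h>

/* Plays the role of the kernel's WORK_CPU_UNBOUND in this model. */
#define CPU_UNBOUND (-1)

/*
 * Where CQ work is queued: normally on the CQ's affinity CPU
 * (queue_work_on()), but unbound in a kdump kernel (queue_work()) so it
 * can run on whichever CPU is actually online.
 */
static int cq_work_cpu(bool kdump_kernel, int cq_affinity_cpu)
{
	return kdump_kernel ? CPU_UNBOUND : cq_affinity_cpu;
}

/* Model of the hazard: work bound to an offline CPU never runs. */
static bool work_will_run(int target_cpu, const bool *cpu_online)
{
	return target_cpu == CPU_UNBOUND || cpu_online[target_cpu];
}
```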
Link: https://lore.kernel.org/r/20200630215001.70793-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When vports are deleted, memory and kthreads are observed to leak because
the vport isn't being fully released.
There is a shost reference taken in scsi_add_host_dma that is not released
during scsi_remove_host. It was noticed that other drivers resolve this by
doing a scsi_host_put after calling scsi_remove_host.
The vport_delete routine takes two references: one that corresponds to an
access to the scsi_host in the vport_delete routine, and another that is
released after the adapter mailbox command that destroys the VPI
corresponding to the vport completes.
Remove one of the references taken such that the second reference that is
put will complete the missing scsi_add_host_dma reference and the shost
will be terminated.
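The reference accounting can be checked with a toy counter. `struct shost_model` and the two `vport_delete_*()` functions are hypothetical; the model starts with only the scsi_add_host_dma reference and ignores every other reference the real Scsi_Host holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the Scsi_Host refcount; not the SCSI midlayer. */
struct shost_model {
	int refs;
	bool released;
};

static void host_get(struct shost_model *s)
{
	s->refs++;
}

static void host_put(struct shost_model *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		s->released = true;  /* memory and kthreads would be freed here */
}

/* Before: two gets, two puts, so the scsi_add_host_dma reference leaks. */
static void vport_delete_old(struct shost_model *s)
{
	host_get(s);  /* access during vport_delete */
	host_get(s);  /* held until the VPI-destroy mailbox completes */
	host_put(s);  /* end of vport_delete */
	host_put(s);  /* mailbox completion */
}

/* After: one get, same two puts; the second put drops the add_host ref. */
static void vport_delete_new(struct shost_model *s)
{
	host_get(s);
	host_put(s);
	host_put(s);
}
```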
Link: https://lore.kernel.org/r/20200630215001.70793-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Call traces have been observed while running different tests that involve aborts
and setting the rrq active flag. The lpfc_set_rrq_active routine is doing
a mempool_alloc under the soft_irq processing level. When the mempool needs
to get a new buffer from the free pool and has to wait for memory to become
free it will check the flags passed in on the alloc and dump the stack if
the thread is running in interrupt context.
Replace the GFP_KERNEL flag with GFP_ATOMIC so that the memory allocation
will not attempt to sleep if there is no memory available.
Link: https://lore.kernel.org/r/20200630215001.70793-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During driver unload/reload testing, the NVMe initiator would not
re-establish connectivity to NVMe controllers on reload.
The failing NVMe array supports concurrent FCP and NVMe operation via
different nport_id's. The array was repeatedly sending an ADISC every 2
seconds after PLOGI completed and while NVMe subsystems were executing
discovery. The target would continue this state for roughly 45 seconds.
The driver's current behavior on ADISC receipt is to validate the ADISC
against the device and issue a RESUME_RPI to restore transmission. The
receipt of the ADISC effectively caused the driver to take actions similar
to a logout and login for the remote port, causing the deregistration of
the nvme rport and a subsequent re-registration. This caused a constant
reset and re-connect of the NVMe controller during this 45s window. There
was no need for the state changes, as ADISC does not change login state.
This patch corrects this behavior by checking whether the remote port is
already logged in (MAPPED) and, when it is, avoiding the call to set the
ndlp state to MAPPED, which triggers the unreg/re-reg. Thus ADISC does not
change the login state of the node.
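A toy model of the fixed ADISC handling, assuming the ADISC has already been validated against the node. `struct node_model`, `set_node_state()` and `handle_adisc()` are hypothetical; the counter in `set_node_state()` stands in for the nvme rport unreg/re-reg that the state change triggers.

```c
#include <assert.h>

enum node_state { NODE_UNMAPPED, NODE_MAPPED };

/* Hypothetical model of an ndlp; counters record side effects. */
struct node_model {
	enum node_state state;
	int resume_rpi;    /* RESUME_RPI commands issued */
	int rport_rereg;   /* NVMe rport unregister/re-register cycles */
};

/* Models the state setter: entering MAPPED re-registers the rport. */
static void set_node_state(struct node_model *n, enum node_state s)
{
	if (s == NODE_MAPPED)
		n->rport_rereg++;
	n->state = s;
}

/* ADISC receipt after the fix. */
static void handle_adisc(struct node_model *n)
{
	n->resume_rpi++;                /* still restore transmission */
	if (n->state != NODE_MAPPED)    /* already logged in: leave state alone */
		set_node_state(n, NODE_MAPPED);
}
```

Feeding a MAPPED node an ADISC every 2 s for roughly 45 s now issues a RESUME_RPI each time without a single rport re-registration.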
Link: https://lore.kernel.org/r/20200630215001.70793-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Coverity reported the following error:
Assigned value that is never used may represent unnecessary computation.
The rc variable was initially assigned a value but in several cases, when
an error case is detected, it is reassigned a new value. The initial value
had little use.
Code review showed that this routine could use some cleanup:
- Setting the initialization value to -ENODEV is a much better choice and
reduces the code in the routine.
- The routine wasn't distinguishing logic errors from success and from
mailbox failure. Better to resolve this by adding a status to track the
mailbox failure and merge it with the logic error when the routine returns.
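The cleanup pattern, sketched with hypothetical names (`example_routine()`, `issue_mbox()`). Mapping a mailbox failure to -EIO is an assumption of this sketch, since the text does not say which error code is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for issuing a mailbox command; nonzero = failure. */
static int issue_mbox(bool mbox_ok)
{
	return mbox_ok ? 0 : 1;
}

/*
 * rc starts as -ENODEV, so "nothing found" exits need no assignment;
 * mailbox failures are tracked separately and merged only on return.
 */
static int example_routine(bool found, bool mbox_ok)
{
	int rc = -ENODEV;
	int mbx_status = 0;

	if (!found)
		goto out;        /* logic error: rc already says -ENODEV */

	mbx_status = issue_mbox(mbox_ok);
	if (mbx_status)
		goto out;

	rc = 0;
out:
	return mbx_status ? -EIO : rc;
}
```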
Link: https://lore.kernel.org/r/20200630215001.70793-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The request_t 'handle' member is 32-bits wide, hence use wrt_reg_dword().
Change the cast in the wrt_reg_byte() call to make it clear that a regular
pointer is cast to an __iomem pointer.
Note: 'pkt' points to I/O memory for the qlafx00 adapter family and to
coherent memory for all other adapter families.
This patch fixes the following Coverity complaint:
CID 358864 (#1 of 1): Reliance on integer endianness (INCOMPATIBLE_CAST)
incompatible_cast: Pointer &pkt->handle points to an object whose effective
type is unsigned int (32 bits, unsigned) but is dereferenced as a narrower
unsigned short (16 bits, unsigned). This may lead to unexpected results
depending on machine endianness.
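A userspace demonstration of the Coverity concern: reading the first two bytes of a 32-bit handle yields a different half depending on byte order, while a 32-bit access (what wrt_reg_dword() does in the driver) carries the whole value. `narrow_read()` and `host_is_little_endian()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read the first two bytes of a 32-bit object as a uint16_t, which is what
 * a 16-bit access through a cast pointer effectively does. memcpy() avoids
 * strict-aliasing problems in this userspace demo.
 */
static uint16_t narrow_read(const uint32_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static int host_is_little_endian(void)
{
	const uint16_t one = 1;
	uint8_t first;

	memcpy(&first, &one, 1);
	return first == 1;
}
```

For a handle of 0x00010002, the narrow read returns 0x0002 on a little-endian host and 0x0001 on a big-endian one.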
Link: https://lore.kernel.org/r/20200629225454.22863-7-bvanassche@acm.org
Fixes: 8ae6d9c7eb ("[SCSI] qla2xxx: Enhancements to support ISPFx00.")
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Quinn Tran <qutran@marvell.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If tcmu_handle_completions() has to process a padding shorter than
sizeof(struct tcmu_cmd_entry), the current call to
tcmu_flush_dcache_range() with sizeof(struct tcmu_cmd_entry) as length
param is wrong and causes crashes on e.g. ARM, because
tcmu_flush_dcache_range() in this case calls
flush_dcache_page(vmalloc_to_page(start)); with start being an invalid
address above the end of the vmalloc'ed area.
The fix is to use the minimum of remaining ring space and sizeof(struct
tcmu_cmd_entry) as the length param.
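The length calculation as a standalone helper. `pad_flush_len()` is hypothetical, `entry_size` stands in for sizeof(struct tcmu_cmd_entry), and the numbers below are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length to flush for a padding entry at offset 'tail' in a ring of
 * 'ring_size' bytes: at most one entry, and never past the end of the
 * ring, because the padding may be shorter than an entry.
 */
static size_t pad_flush_len(size_t ring_size, size_t tail, size_t entry_size)
{
	size_t ring_left;

	assert(tail < ring_size);
	ring_left = ring_size - tail;
	return ring_left < entry_size ? ring_left : entry_size;
}
```

With a 1024-byte ring, 44-byte entries and padding starting at offset 1000, only the last 24 bytes are flushed.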
The patch was tested on kernel 4.19.118.
See https://bugzilla.kernel.org/show_bug.cgi?id=208045#c10
Link: https://lore.kernel.org/r/20200629093756.8947-1-bstroesser@ts.fujitsu.com
Tested-by: JiangYu <lnsyyj@hotmail.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Allow the Exynos UFS driver to build as a module. This patch fixes the
following build issue reported by the kernel build robot.
drivers/scsi/ufs/ufs-exynos.o: in function `exynos_ufs_probe':
drivers/scsi/ufs/ufs-exynos.c:1231: undefined reference to `ufshcd_pltfrm_init'
drivers/scsi/ufs/ufs-exynos.o: in function `exynos_ufs_pre_pwr_mode':
drivers/scsi/ufs/ufs-exynos.c:635: undefined reference to `ufshcd_get_pwr_dev_param'
drivers/scsi/ufs/ufs-exynos.o:undefined reference to `ufshcd_pltfrm_shutdown'
drivers/scsi/ufs/ufs-exynos.o:undefined reference to `ufshcd_pltfrm_suspend'
drivers/scsi/ufs/ufs-exynos.o:undefined reference to `ufshcd_pltfrm_resume'
drivers/scsi/ufs/ufs-exynos.o:undefined reference to `ufshcd_pltfrm_runtime_suspend'
drivers/scsi/ufs/ufs-exynos.o:undefined reference to `ufshcd_pltfrm_runtime_resume'
drivers/scsi/ufs/ufs-exynos.o:undefined reference to `ufshcd_pltfrm_runtime_idle'
Link: https://lore.kernel.org/r/20200620173232.52521-1-alim.akhtar@samsung.com
Fixes: 55f4b1f736 ("scsi: ufs: ufs-exynos: Add UFS host support for Exynos SoCs")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>