Currently there is an error return path that neglects to free the
allocation for lcb_context. Fix this by adding a new error free exit path
that kfree's lcb_context before returning. Use this new kfree exit path in
another exit error path that also kfree's the same object, allowing a line
of code to be removed.
Link: https://lore.kernel.org/r/20201118141314.462471-1-colin.king@canonical.com
Fixes: 4430f7fd09 ("scsi: lpfc: Rework locations of ndlp reference taking")
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Resource leak")
Currently there is a null check on the pointer ndlp that exits via error
path issue_ct_rsp_exit followed by another null check on the same pointer
that is almost identical to the previous null check stanza and yet can
never can be reached because the previous check exited via
issue_ct_rsp_exit. This is deadcode and can be removed.
Link: https://lore.kernel.org/r/20201118133744.461385-1-colin.king@canonical.com
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Logically dead code")
There is a null check on pointer lpfc_cmd after the pointer has been
dereferenced when pointers rdata and ndlp are initialized at the start of
the function. Fix this by only assigning rdata and ndlp after the pointer
lpfc_cmd has been null checked.
Link: https://lore.kernel.org/r/20201118131345.460631-1-colin.king@canonical.com
Fixes: 96e209be6e ("scsi: lpfc: Convert SCSI I/O completions to SLI-3 and SLI-4 handlers")
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Dereference before null check")
The FC iu and response payloads are located at different offsets depending
on the ibmvfc_cmd version. This is a result of the version 2 vfcFrame
definition adding an extra 64bytes of reserved space to the structure prior
to the payloads.
Add helper routines to determine the current vfcFrame version and return a
pointer to the proper iu or response structure within that ibmvfc_cmd.
Link: https://lore.kernel.org/r/20201118011104.296999-5-tyreld@linux.ibm.com
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Testing the NPIV Login response capabilities is a long winded process of
dereferencing the vhost->login_buf->resp.capabilities field, then byte
swapping that value to host endian, and performing the bitwise test.
Currently we only ever check this in ibmvfc_cancel_all(), but follow-up
patches will need to regularly check for targetWWPN and channelization
support.
Add a helper to simplify checking various VIOS capabilities, namely
ibmvfc_check_caps().
Link: https://lore.kernel.org/r/20201118011104.296999-4-tyreld@linux.ibm.com
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce a target_wwpn field to several MADs. Its possible that a SCSI ID
of a target can change due to some fabric changes. The WWPN of the SCSI
target provides a better way to identify the target. Also, add flags for
receiving MAD versioning information and advertising client support for
targetWWPN with the VIOS. This latter capability flag will be required for
future clients capable of requesting multiple hardware queues from the host
adapter.
Link: https://lore.kernel.org/r/20201118011104.296999-3-tyreld@linux.ibm.com
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The virtual FC frame command exchanged with the VIOS is used for device
reset and command abort TMF as well as normally queued commands. When
initializing the ibmvfc_cmd there are several elements of the command that
are set the same way regardless of the command type.
Deduplicate code by moving these commonally set fields into a
initialization helper routine, namely ibmvfc_init_vfc_cmd().
Link: https://lore.kernel.org/r/20201118011104.296999-2-tyreld@linux.ibm.com
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Both ibmvfc_show_host_(capabilities|npiv_version) functions retrieve values
from vhost->login_buf.resp buffer. This is the MAD response buffer from the
VIOS and as such any multi-byte non-string values are in big endian format.
Byte swap these values to host CPU endian format for better human
readability.
Link: https://lore.kernel.org/r/20201117185031.129939-1-tyreld@linux.ibm.com
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following call stack prevents clk_gating at every I/O completion. We
can remove the condition, ufshcd_any_tag_in_use(), since clkgating_work
will check it again.
ufshcd_complete_requests(struct ufs_hba *hba)
ufshcd_transfer_req_compl()
__ufshcd_transfer_req_compl()
__ufshcd_release(hba)
if (ufshcd_any_tag_in_use() == 1)
return;
ufshcd_tmc_handler(hba);
blk_mq_tagset_busy_iter();
Note that this still requires work to deal with a potential race condition
when user sets clkgating.delay_ms to very small value. That can cause
preventing clkgating by the check of ufshcd_any_tag_in_use() in gate_work.
Link: https://lore.kernel.org/r/20201117165839.1643377-7-jaegeuk@kernel.org
Fixes: 7252a36030 ("scsi: ufs: Avoid busy-waiting by eliminating tag conflicts")
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nowadays many vendors initialize their device parameters in their own
vendor drivers. The initialization code is almost the same as well as the
pre-defined definitions. Introduce a common device parameter initialization
function which assign the most common initial values. With this function,
we could remove those duplicated codes in vendor drivers.
Link: https://lore.kernel.org/r/20201116065054.7658-3-stanley.chu@mediatek.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch reworks the abort interfaces such that SLI-3 retains the
iocb-based formatting and completions and SLI-4 now uses native WQEs and
completion routines.
The following changes are made:
- The code is refactored from a confusing 2 routine sequence of
xx_abort_iotag_issue(), which creates/formats and abort cmd, and
xx_issue_abort_tag(), which then issues and handles the completion of
the abort cmd - into a single interface of xx_issue_abort_iotag(). The
new interface will determine whether SLI-3 or SLI-4 and then call the
appropriate handler. A completion handler can now be specified to
address the differences in completion handling. Note: original code is
all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path,
and NVMe natively using wqes.
- The SLI-3 side is refactored:
The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined
with the logic of lpfc_sli_abort_iotag_issue() as well as the
iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to
create the new single SLI-3 abort routine that formats and issues the
iocb.
- The SLI-4 side is refactored and added to:
The native WQE abort code in NVMe is moved to the new SLI-4
issue_abort_iotag() routine. Items in SCSI that set fields not set by
NVMe is migrated into the new routine. Thus the routine supports NVMe
and SCSI initiators. The nvmet block (target) formats the abort slightly
different (like the old NVMe initiator) thus it has its own prep routine
stolen from NVMe initiator and it retains the current code it has for
issuing the WQE (does not use the commonized routine the initiators
do). SLI-4 completion handlers were also added.
- lpfc_abort_handler now becomes a wrapper that determines whether
SLI-3 or SLI-4 and calls the proper abort handler.
Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current driver implementation uses SLI-4 WQE to iocb conversion before
calling the cmpl callback function.
Rework the FCP I/O completion path to utilize the SLI-4 WQE.
This patch converts the SCSI I/O completion paths from the iocb-centric
interfaces to the routines are native for whether I/Os are iocb-based
(SLI-3) or WQE-based (SLI-4).
Most existing routines were iocb-based, so this creates a lot of SLI-4
specific routines to provide the functionality.
Link: https://lore.kernel.org/r/20201115192646.12977-15-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch converts the SCSI I/O path from the iocb-centric interfaces to
the common I/O submission path which supports native SLI-4 WQEs.
A wrapper routine is put in place to distinguish SLI-3 from SLI. If SLI-3,
the same iocb-centric paths are used, perhaps with refactored code that is
explicitly for SLI-3. For SLI-4, any iocb-related formatting is replaced
by wqe-based formatting, although much of that is addressed by the common
wqe templates in the SLI-4 path.
Link: https://lore.kernel.org/r/20201115192646.12977-14-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing initiator-side cable swaps with NPIV, oops occur. The
reference counts for the Fabric nodes on the NPIV vports isn't balanced,
resulting in premature node removal.
The following fixes were made:
- Removed the FC_LBIT check in lpfc_linkup_port. This removed the special
case for vports that didn't have them clean up just like the physical
port.
- Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the
node is being removed in the context of a reference count release and a
mailbox command can't be issued at this point.
- Remove special case handling in the default mailbox completion handler
that allowed the skipping of a node reference. Now, reference counting
always requires the removal of the reference.
- Move the location of the DEVICE_RM event is done during LOGO handling as
the driver has additional work to do on the ndlp before puts/releases
can be performed.
Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing NPIV and link bounces, the vport would not show a fabric node
for the F_Port, would not transition into NPR state during a link fault, or
leave the FDMI node untouched during error injection. Cause for this was
determined to be an inconsistent manner in which F_Port, Nameserver, and
FDMI controller nodes were created and linked. In some cases, the nodes
would never be unregistered from the transport, leaving references
active. In other cases, the fabric nodes may register with the transport
multiple times while still registered.
The following changes were made:
- Fix the FDISC issue routine, which starts vport (re)creation, to mark
the F_Port as a fabric node (NLP_FABRIC) and allow the F_Port node to
fully be created and show up in the node list.
- When remote ports are cleaned up on vport termination, cleanup the
nameserver and FDMI controller nodes on the vport so they unregister
from the transport.
- On link bounces, don't exclude the NPIV Fabric remote ports from
transitioning to the NPR state, allowing them to avoid re-registration
if already registered.
Link: https://lore.kernel.org/r/20201115192646.12977-9-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a target swap happens, under certain conditions the node sends a
LOGO. The unsolicited ELS logic responds with a reject. The logic may
allocate a new node to handle this. Afterward, the new nodes are dropped
incorrectly leaving them in a mis-matched state and refcounting causes a
use-after-free situation leading to a crash.
It is also possible that the unsolicited els handling finds a node which is
in an UNUSED state. The handling moves these nodes to NPR state with a
refcount of 1. Although the end of the discovery logic assumes a final put
will free such a node, there are codes paths which could increment the
reference count, thus the node is in NPR state and not released.
Eventually this mismatch in state and refcount leads to premature release
of the node causing a crash.
Fix by always using the discovery engine DEVICE RM event to decrement and
release the nodes (rather than explicit code that tried to do it before).
This will take care of moving the node to the UNUSED state and then removes
the final ref count. If there is a trigger to reuse this node, the
transition from the UNUSED state clearly indicates that the initial
reference is then incremented and use can continue.
Link: https://lore.kernel.org/r/20201115192646.12977-8-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a PLOGI/ADISC/PRLI/REG_RPI fails, the node remains in the nodelist in
that state. Although the driver now frees a node when the ref count goes
to zero, in this case the ref cnt doesn't reach zero because there isn't a
mechanism to release the final reference. Discovery just stops.
Fix by calling the node discovery state machine DEVICE_RM event whenever
one of these commands fail. This will remove the final reference count and
trigger node release.
Link: https://lore.kernel.org/r/20201115192646.12977-7-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the discovery layers within the driver use the SCSI midlayer
host_lock to access node-specific structures. This can contend with the I/O
path and is too coarse of a lock.
Rework the driver so that it uses a lock specific to the remote port node
structure when accessing the structure contents. A few of the changes
brought out spots were some slightly reorganized routines worked better.
Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Due to bug history and code review, the node reference counting approach in
the driver isn't implemented consistently with how the scsi and nvme
transport perform registrations and unregistrations and their callbacks.
This resulted in many bad/stale node pointers.
Reword the driver so that reference handling is performed as follows:
- The initial node reference is taken on structure allocation
- Take a reference on any add/register call to the transport
- Remove a reference on any delete/unregister call to the transport
- After the node has fully removed from both the SCSI and NVMEe transports
(dev_loss_callbacks have called back) call the discovery engine
DEVICE_RM event which will remove the final reference and release the
node structure.
- Alter dev_loss handling when a vport or base port is unloading.
- Remove the put_node handling - no longer needed.
- Rewrite the vport_delete handling on reference counts. Part of this
effort was driven from the FDISC not registering with the transport and
disrupting the model for node reference counting.
- Deleted lpfc_nlp_remove. Pushed it's remaining ops into
lpfc_nlp_release.
- Several other small code cleanups.
Link: https://lore.kernel.org/r/20201115192646.12977-5-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lpfc driver is calling get_device and put_device on scsi_fc_transport
device structure. When this code was removed, the driver triggered an oops
in "scsi_is_host_dev" when the first SCSI target was unregistered from the
transport.
The reason the calls were necessary is that the driver is calling
scsi_remove_host too early, before the target rports are unregistered and
the scsi devices disconnected from the scsi_host. The fc_host was torn
down during fc_remove_host.
Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to
after the nodes are cleaned up. Remove the get_device and put_device calls
and the supporting code.
Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Now that the driver has gone to a normal ref interface (with no odd logic)
the discovery logic needs to be updated to reworked so that it properly
takes references when it should and give them up when it should.
Rework the driver for the following get/put model:
- Move gets to just before an I/O is issued. Add gets for places where an
I/O was issued without one.
- Ensure that failures from lpfc_nlp_get() are handled by the driver.
- Check and fix the placement of lpfc_nlp_puts relative to io completions.
Note: some of these paths may not release the reference on the exact io
completion as the reference is held as the code takes another step in
the discovery thread and which may cause another io to be issued.
- Rearrange some code for error processing and calling lpfc_nlp_put.
- Fix some places of incorrect reference freeing that was causing the
premature releasing of the structure.
- Nvmet plogi handling performs unreg_rpi's. The reference counts were
unbalanced resulting in premature node removal. In some cases this
caused loss of node discovery. Corrected the reftaking around nvmet
plogis.
Nodes that experience devloss now get released from the node list now that
there is a proper reference taking.
Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>