Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Provide a new interface for affinity hints to provide a separation
between hint and actual affinity change which has become a hidden
property of the current interface
- Fix up the in tree usage of the affinity hint interfaces
Drivers:
- No new irqchip drivers!
- Fix GICv3 redistributor table reservation with RT across kexec
- Fix GICv4.1 redistributor view of the VPE table across kexec
- Add support for extra interrupts on spear-shirq
- Make obtaining some interrupts optional for the Renesas drivers
- Various cleanups and bug fixes"
* tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/renesas-intc-irqpin: Use platform_get_irq_optional() to get the interrupt
irqchip/renesas-irqc: Use platform_get_irq_optional() to get the interrupt
irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
irqchip/ingenic-tcu: Use correctly sized arguments for bit field
irqchip/gic-v2m: Add const to of_device_id
irqchip/imx-gpcv2: Mark imx_gpcv2_instance with __ro_after_init
irqchip/spear-shirq: Add support for IRQ 0..6
irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime
irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve
irqchip/gic-v3-its: Give the percpu rdist struct its own flags field
net/mlx4: Use irq_update_affinity_hint()
net/mlx5: Use irq_set_affinity_and_hint()
hinic: Use irq_set_affinity_and_hint()
scsi: lpfc: Use irq_set_affinity()
mailbox: Use irq_update_affinity_hint()
ixgbe: Use irq_update_affinity_hint()
be2net: Use irq_update_affinity_hint()
enic: Use irq_update_affinity_hint()
RDMA/irdma: Use irq_update_affinity_hint()
scsi: mpt3sas: Use irq_set_affinity_and_hint()
...
Pull block updates from Jens Axboe:
- Unify where the struct request handling code is located in the blk-mq
code (Christoph)
- Header cleanups (Christoph)
- Clean up the io_context handling code (Christoph, me)
- Get rid of ->rq_disk in struct request (Christoph)
- Error handling fix for add_disk() (Christoph)
- request allocation cleanusp (Christoph)
- Documentation updates (Eric, Matthew)
- Remove trivial crypto unregister helper (Eric)
- Reduce shared tag overhead (John)
- Reduce poll_stats memory overhead (me)
- Known indirect function call for dio (me)
- Use atomic references for struct request (me)
- Support request list issue for block and NVMe (me)
- Improve queue dispatch pinning (Ming)
- Improve the direct list issue code (Keith)
- BFQ improvements (Jan)
- Direct completion helper and use it in mmc block (Sebastian)
- Use raw spinlock for the blktrace code (Wander)
- fsync error handling fix (Ye)
- Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me)
* tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits)
MAINTAINERS: add entries for block layer documentation
docs: block: remove queue-sysfs.rst
docs: sysfs-block: document virt_boundary_mask
docs: sysfs-block: document stable_writes
docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
docs: sysfs-block: add contact for nomerges
docs: sysfs-block: sort alphabetically
docs: sysfs-block: move to stable directory
block: don't protect submit_bio_checks by q_usage_counter
block: fix old-style declaration
nvme-pci: fix queue_rqs list splitting
block: introduce rq_list_move
block: introduce rq_list_for_each_safe macro
block: move rq_list macros to blk-mq.h
block: drop needless assignment in set_task_ioprio()
block: remove unnecessary trailing '\'
bio.h: fix kernel-doc warnings
block: check minor range in device_add_disk()
block: use "unsigned long" for blk_validate_block_size().
block: fix error unwinding in device_add_disk
...
Pull SCSI fixes from James Bottomley:
"Three fixes, all in drivers. The lpfc one doesn't look exploitable,
but nasty things could happen in string operations if mybuf ends up
with an on stack unterminated string"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: vmw_pvscsi: Set residual data length conditionally
scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
The PVSCSI implementation in the VMware hypervisor under specific
configuration ("SCSI Bus Sharing" set to "Physical") returns zero dataLen
in the completion descriptor for READ CAPACITY(16). As a result, the kernel
can not detect proper disk geometry. This can be recognized by the kernel
message:
[ 0.776588] sd 1:0:0:0: [sdb] Sector size 0 reported, assuming 512.
The PVSCSI implementation in QEMU does not set dataLen at all, keeping it
zeroed. This leads to a boot hang as was reported by Shmulik Ladkani.
It is likely that the controller returns the garbage at the end of the
buffer. Residual length should be set by the driver in that case. The SCSI
layer will erase corresponding data. See commit bdb2b8cab4 ("[SCSI] erase
invalid data returned by device") for details.
Commit e662502b3a ("scsi: vmw_pvscsi: Set correct residual data length")
introduced the issue by setting residual length unconditionally, causing
the SCSI layer to erase the useful payload beyond dataLen when this value
is returned as 0.
As a result, considering existing issues in implementations of PVSCSI
controllers, we do not want to call scsi_set_resid() when dataLen ==
0. Calling scsi_set_resid() has no effect if dataLen equals buffer length.
Link: https://lore.kernel.org/lkml/20210824120028.30d9c071@blondie/
Link: https://lore.kernel.org/r/20211220190514.55935-1-amakhalov@vmware.com
Fixes: e662502b3a ("scsi: vmw_pvscsi: Set correct residual data length")
Cc: Matt Wang <wwentao@vmware.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Vishal Bhakta <vbhakta@vmware.com>
Cc: VMware PV-Drivers <pv-drivers@vmware.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: linux-scsi@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI fix from James Bottomley:
"One driver fix: the pm8001 has never actually worked on a system with
an IOMMU and this fixes that use case"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: pm8001: Fix phys_to_virt() usage on dma_addr_t
The driver supports a "direct" mode of operation, where the SMP req frame
is directly copied into the command payload (and vice-versa for the SMP
resp).
To get at the SMP req frame data in the scatterlist the driver uses
phys_to_virt() on the DMA mapped memory dma_addr_t . This is broken, and
subsequently crashes as follows when an IOMMU is enabled:
Unable to handle kernel paging request at virtual address
ffff0000fcebfb00
...
pc : pm80xx_chip_smp_req+0x2d0/0x3d0
lr : pm80xx_chip_smp_req+0xac/0x3d0
pm80xx_chip_smp_req+0x2d0/0x3d0
pm8001_task_exec.constprop.0+0x368/0x520
pm8001_queue_command+0x1c/0x30
smp_execute_task_sg+0xdc/0x204
sas_discover_expander.part.0+0xac/0x6cc
sas_discover_root_expander+0x8c/0x150
sas_discover_domain+0x3ac/0x6a0
process_one_work+0x1d0/0x354
worker_thread+0x13c/0x470
kthread+0x17c/0x190
ret_from_fork+0x10/0x20
Code: 371806e1 910006d6 6b16033f 54000249 (38766b05)
---[ end trace b91d59aaee98ea2d ]---
note: kworker/u192:0[7] exited with preempt_count 1
Instead use kmap_atomic().
--
Difference to v1:
- use kmap_atomic() in both locations
Difference to v2:
- add whitespace around arithmetic (Damien)
Link: https://lore.kernel.org/r/1639390248-213603-1-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI fixes from James Bottomley:
"Four fixes, all in drivers.
Three are small and obvious, the qedi one is a bit larger but also
pretty obvious"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Format log strings only if needed
scsi: scsi_debug: Fix buffer size of REPORT ZONES command
scsi: qedi: Fix cmd_cleanup_cmpl counter mismatch issue
scsi: pm80xx: Do not call scsi_remove_host() in pm8001_alloc()
The driver uses irq_set_affinity_hint to set the affinity for the lpfc
interrupts to a mask corresponding to the local NUMA node to avoid
performance overhead on AMD architectures.
However, irq_set_affinity_hint() setting the affinity is an undocumented
side effect that this function also sets the affinity under the hood.
To remove this side effect irq_set_affinity_hint() has been marked as
deprecated and new interfaces have been introduced.
Also, as per the commit dcaa213679 ("scsi: lpfc: Change default IRQ model
on AMD architectures"):
"On AMD architecture, revert the irq allocation to the normal style
(non-managed) and then use irq_set_affinity_hint() to set the cpu affinity
and disable user-space rebalancing."
we don't really need to set the affinity_hint as user-space rebalancing for
the lpfc interrupts is not desired.
Hence, replace the irq_set_affinity_hint() with irq_set_affinity() which
only applies the affinity for the interrupts.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Link: https://lore.kernel.org/r/20210903152430.244937-12-nitesh@redhat.com
The driver uses irq_set_affinity_hint() specifically for the high IOPS
queue interrupts for two purposes:
- To set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- To apply an affinity that it provides
The driver enforces its own affinity to bind the high IOPS queue interrupts
to the local NUMA node. However, irq_set_affinity_hint() applying the
provided cpumask as an affinity (if not NULL) for the interrupt is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_set_affinity_and_hint()
where the provided mask needs to be applied as the affinity and
affinity_hint pointer needs to be set and replace with
irq_update_affinity_hint() where only affinity_hint needs to be updated.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Link: https://lore.kernel.org/r/20210903152430.244937-6-nitesh@redhat.com
The driver uses irq_set_affinity_hint() specifically for the high IOPS
queue interrupts for two purposes:
- To set the affinity_hint which is consumed by the userspace for
distributing the interrupts
- To apply an affinity that it provides
The driver enforces its own affinity to bind the high IOPS queue interrupts
to the local NUMA node. However, irq_set_affinity_hint() applying the
provided cpumask as an affinity for the interrupt is an undocumented side
effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_set_affinity_and_hint()
where the provided mask needs to be applied as the affinity and
affinity_hint pointer needs to be set and replace with
irq_update_affinity_hint() where only affinity_hint needs to be updated.
Change the megasas_set_high_iops_queue_affinity_hint function name to
megasas_set_high_iops_queue_affinity_and_hint to clearly indicate that the
function is setting both affinity and affinity_hint.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20210903152430.244937-5-nitesh@redhat.com
According to ZBC and SPC specifications, the unit of ALLOCATION LENGTH
field of REPORT ZONES command is byte. However, current scsi_debug
implementation handles it as number of zones to calculate buffer size to
report zones. When the ALLOCATION LENGTH has a large number, this results
in too large buffer size and causes memory allocation failure. Fix the
failure by handling ALLOCATION LENGTH as byte unit.
Link: https://lore.kernel.org/r/20211207010638.124280-1-shinichiro.kawasaki@wdc.com
Fixes: f0d1cf9378 ("scsi: scsi_debug: Add ZBC zone commands")
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI fixes from James Bottomley:
"Two patches, both in drivers.
One is a fix to FC recovery (lpfc) and the other is an enhancement to
support the Intel Alder Motherboard with the UFS driver which comes
under the -rc exception process for hardware enabling"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: ufs-pci: Add support for Intel ADL
scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGO
Calling scsi_remove_host() before scsi_add_host() results in a crash:
BUG: kernel NULL pointer dereference, address: 0000000000000108
RIP: 0010:device_del+0x63/0x440
Call Trace:
device_unregister+0x17/0x60
scsi_remove_host+0xee/0x2a0
pm8001_pci_probe+0x6ef/0x1b90 [pm80xx]
local_pci_probe+0x3f/0x90
We cannot call scsi_remove_host() in pm8001_alloc() because scsi_add_host()
has not been called yet at that point in time.
Function call tree:
pm8001_pci_probe()
|
`- pm8001_pci_alloc()
| |
| `- pm8001_alloc()
| |
| `- scsi_remove_host()
|
`- scsi_add_host()
Link: https://lore.kernel.org/r/20211201041627.1592487-1-ipylypiv@google.com
Fixes: 05c6c029a4 ("scsi: pm80xx: Increase number of supported queues")
Reviewed-by: Vishakha Channapattan <vishakhavc@google.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
All modern drivers can support extra partitions using the extended
dev_t. In fact except for the ioctl method drivers never even see
partitions in normal operation.
So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all
block devices that do support partitions, and require those that
do not support partitions to explicit disallow them using
GENHD_FL_NO_PART.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
GENHD_FL_CD marks a gendisk as a vaguely CD-ROM like device.
Besides being used internally inside of sunvdc.c an xen-blkfront it
is used by xen-blkback as a hint to claim a device exported to a
guest is a CD-ROM like device. Just check for disk->cdi instead
which is the right indicator for "real" CD-ROM or DVD drivers. This
will miss the paravirtualized guest drivers, but those make little
sense to report anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin:
"Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken
does not seem to work, breaking too many guests in the process. We'll
need to do length validation using some other approach"
[ This merge also ends up reverting commit f7a36b03a7 ("vsock/virtio:
suppress used length validation"), which came in through the
networking tree in the meantime, and was part of that whole used
length validation series - Linus ]
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa_sim: avoid putting an uninitialized iova_domain
vhost-vdpa: clean irqs before reseting vdpa device
virtio-blk: modify the value type of num in virtio_queue_rq()
vhost/vsock: cleanup removing `len` variable
vhost/vsock: fix incorrect used length reported to the guest
Revert "virtio_ring: validate used buffer length"
Revert "virtio-net: don't let virtio core to validate used length"
Revert "virtio-blk: don't let virtio core to validate used length"
Revert "virtio-scsi: don't let virtio core to validate used buffer length"
This reverts commit c57911ebfb.
Attempts to validate length in the core did not work out. We'll drop
them for now, so revert the dependent changes in drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A commit introduced formal regstration of all Fabric nodes to the SCSI
transport as well as REG/UNREG RPI mailbox requests. The commit introduced
the NLP_RELEASE_RPI flag for rports set in the lpfc_cmpl_els_logo_acc()
routine to help clean up the RPIs. This new code caused the driver to
release the RPI value used for the remote port and marked the RPI invalid.
When the driver later attempted to re-login, it would use the invalid RPI
and the adapter rejected the PLOGI request. As no login occurred, the
devloss timer on the rport expired and connectivity was lost.
This patch corrects the code by removing the snippet that requests the rpi
to be unregistered. This change only occurs on a node that is already
marked to be rediscovered. This puts the code back to its original
behavior, preserving the already-assigned rpi value (registered or not)
which can be used on the re-login attempts.
Link: https://lore.kernel.org/r/20211123165646.62740-1-jsmart2021@gmail.com
Fixes: fe83e3b9b4 ("scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller")
Cc: <stable@vger.kernel.org> # v5.14+
Co-developed-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This fixes an issue added in commit 4edd8cd4e8 ("scsi: core: sysfs: Fix
hang when device state is set via sysfs") where if userspace is requesting
to set the device state to SDEV_RUNNING when the state is already
SDEV_RUNNING, we return -EINVAL instead of count. The commmit above set ret
to count for this case, when it should have set it to 0.
Link: https://lore.kernel.org/r/20211120164917.4924-1-michael.christie@oracle.com
Fixes: 4edd8cd4e8 ("scsi: core: sysfs: Fix hang when device state is set via sysfs")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For updating the IOC firmware's timestamp with system timestamp, the driver
issues the Mpi26IoUnitControlRequest message. While framing the
Mpi26IoUnitControlRequest, the driver should copy the lower 32 bits of the
current timestamp into IOCParameterValue field and the higher 32 bits into
Reserved7 field.
Link: https://lore.kernel.org/r/20211117123215.25487-1-sreekanth.reddy@broadcom.com
Fixes: f98790c003 ("scsi: mpt3sas: Sync time periodically between driver and firmware")
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While determining the SAS address of a drive, the driver checks whether the
handle number is less than the HBA phy count or not. If the handle number
is less than the HBA phy count then driver assumes that this handle belongs
to HBA and hence it assigns the HBA SAS address.
During IOC firmware downgrade operation, if the number of HBA phys is
reduced and the OS drive's device handle drops below the phy count while
determining the drive's SAS address, the driver ends up using the HBA's SAS
address. This leads to a mismatch of drive's SAS address and hence the
driver unregisters the OS drive and the system goes into read-only mode.
Update the IOC's num_phys to the HBA phy count provided by actual loaded
firmware.
Link: https://lore.kernel.org/r/20211117105058.3505-1-sreekanth.reddy@broadcom.com
Fixes: a5e99fda01 ("scsi: mpt3sas: Update hba_port objects after host reset")
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While looping over shost's sdev list it is possible that one
of the drives is getting removed and its sas_target object is
freed but its sdev object remains intact.
Consequently, a kernel panic can occur while the driver is trying to access
the sas_address field of sas_target object without also checking the
sas_target object for NULL.
Link: https://lore.kernel.org/r/20211117104909.2069-1-sreekanth.reddy@broadcom.com
Fixes: f92363d123 ("[SCSI] mpt3sas: add new driver supporting 12GB SAS")
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the following sparse warnings in ufshpb_set_hpb_read_to_upiu():
sparse warnings: (new ones prefixed by >>)
drivers/scsi/ufs/ufshpb.c:335:27: sparse: sparse: cast from restricted __be64
drivers/scsi/ufs/ufshpb.c:335:25: sparse: expected restricted __be64 [usertype] ppn_tmp
drivers/scsi/ufs/ufshpb.c:335:25: sparse: got unsigned long long [usertype]
Link: https://lore.kernel.org/r/20211111222452.384089-1-huobean@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
hba->outstanding_tasks, which is read under host_lock spinlock, tells the
interrupt handler what task management tags are in use by the driver. The
doorbell register bits indicate which tags are in use by the hardware. A
doorbell bit that is 0 is because the bit has yet to be set by the driver,
or because the task is complete. It is only possible to disambiguate the 2
cases, if reading/writing the doorbell register is synchronized with
reading/writing hba->outstanding_tasks.
For that reason, reading REG_UTP_TASK_REQ_DOOR_BELL must be done under
spinlock.
Link: https://lore.kernel.org/r/20211108064815.569494-3-adrian.hunter@intel.com
Fixes: f5ef336fd2 ("scsi: ufs: core: Fix task management completion")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
__ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, which
races with the completion function ufshcd_tmc_handler() which expects
req->end_io_data to have a value.
Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
host_lock spinlock.
It is also not necessary (nor typical) to clear req->end_io_data because
the block layer does it before allocating out requests e.g. via
blk_get_request().
So fix by not clearing it.
Link: https://lore.kernel.org/r/20211108064815.569494-2-adrian.hunter@intel.com
Fixes: f5ef336fd2 ("scsi: ufs: core: Fix task management completion")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This fixes a regression added with:
commit f0f82e2476 ("scsi: core: Fix capacity set to zero after
offlinining device")
The problem is that after iSCSI recovery, iscsid will call into the kernel
to set the dev's state to running, and with that patch we now call
scsi_rescan_device() with the state_mutex held. If the SCSI error handler
thread is just starting to test the device in scsi_send_eh_cmnd() then it's
going to try to grab the state_mutex.
We are then stuck, because when scsi_rescan_device() tries to send its I/O
scsi_queue_rq() calls -> scsi_host_queue_ready() -> scsi_host_in_recovery()
which will return true (the host state is still in recovery) and I/O will
just be requeued. scsi_send_eh_cmnd() will then never be able to grab the
state_mutex to finish error handling.
To prevent the deadlock move the rescan-related code to after we drop the
state_mutex.
This also adds a check for if we are already in the running state. This
prevents extra scans and helps the iscsid case where if the transport class
has already onlined the device during its recovery process then we don't
need userspace to do it again plus possibly block that daemon.
Link: https://lore.kernel.org/r/20211105221048.6541-3-michael.christie@oracle.com
Fixes: f0f82e2476 ("scsi: core: Fix capacity set to zero after offlinining device")
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: lijinlin <lijinlin3@huawei.com>
Cc: Wu Bo <wubo40@huawei.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We can race where iscsi_session_recovery_timedout() has woken up the error
handler thread and it's now setting the devices to offline, and
session_recovery_timedout()'s call to scsi_target_unblock() is also trying
to set the device's state to transport-offline. We can then get a mix of
states.
For the case where we can't relogin we want the devices to be in
transport-offline so when we have repaired the connection
__iscsi_unblock_session() can set the state back to running.
Set the device state then call into libiscsi to wake up the error handler.
Link: https://lore.kernel.org/r/20211105221048.6541-2-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following has been observed on a test setup:
WARNING: CPU: 4 PID: 250 at drivers/scsi/ufs/ufshcd.c:2737 ufshcd_queuecommand+0x468/0x65c
Call trace:
ufshcd_queuecommand+0x468/0x65c
scsi_send_eh_cmnd+0x224/0x6a0
scsi_eh_test_devices+0x248/0x418
scsi_eh_ready_devs+0xc34/0xe58
scsi_error_handler+0x204/0x80c
kthread+0x150/0x1b4
ret_from_fork+0x10/0x30
That warning is triggered by the following statement:
WARN_ON(lrbp->cmd);
Fix this warning by clearing lrbp->cmd from the abort handler.
Link: https://lore.kernel.org/r/20211104181059.4129537-1-bvanassche@acm.org
Fixes: 7a3e97b0dc ("[SCSI] ufshcd: UFS Host controller driver")
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull more SCSI updates from James Bottomley:
"This series is all the stragglers that didn't quite make the first
merge window pull. It's mostly minor updates and bug fixes of merge
window code but it also has two driver updates: ufs and qla2xxx"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (46 commits)
scsi: scsi_debug: Don't call kcalloc() if size arg is zero
scsi: core: Remove command size deduction from scsi_setup_scsi_cmnd()
scsi: scsi_ioctl: Validate command size
scsi: ufs: ufshpb: Properly handle max-single-cmd
scsi: core: Avoid leaving shost->last_reset with stale value if EH does not run
scsi: bsg: Fix errno when scsi_bsg_register_queue() fails
scsi: sr: Remove duplicate assignment
scsi: ufs: ufs-exynos: Introduce ExynosAuto v9 virtual host
scsi: ufs: ufs-exynos: Multi-host configuration for ExynosAuto v9
scsi: ufs: ufs-exynos: Support ExynosAuto v9 UFS
scsi: ufs: ufs-exynos: Add pre/post_hce_enable drv callbacks
scsi: ufs: ufs-exynos: Factor out priv data init
scsi: ufs: ufs-exynos: Add EXYNOS_UFS_OPT_SKIP_CONFIG_PHY_ATTR option
scsi: ufs: ufs-exynos: Support custom version of ufs_hba_variant_ops
scsi: ufs: ufs-exynos: Add setup_clocks callback
scsi: ufs: ufs-exynos: Add refclkout_stop control
scsi: ufs: ufs-exynos: Simplify drv_data retrieval
scsi: ufs: ufs-exynos: Change pclk available max value
scsi: ufs: Add quirk to enable host controller without PH configuration
scsi: ufs: Add quirk to handle broken UIC command
...
Pull block fixes from Jens Axboe:
- Set of fixes for the batched tag allocation (Ming, me)
- add_disk() error handling fix (Luis)
- Nested queue quiesce fixes (Ming)
- Shared tags init error handling fix (Ye)
- Misc cleanups (Jean, Ming, me)
* tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block:
nvme: wait until quiesce is done
scsi: make sure that request queue queiesce and unquiesce balanced
scsi: avoid to quiesce sdev->request_queue two times
blk-mq: add one API for waiting until quiesce is done
blk-mq: don't free tags if the tag_set is used by other device in queue initialztion
block: fix device_add_disk() kobject_create_and_add() error handling
block: ensure cached plug request matches the current queue
block: move queue enter logic into blk_mq_submit_bio()
block: make bio_queue_enter() fast-path available inline
block: split request allocation components into helpers
block: have plug stored requests hold references to the queue
blk-mq: update hctx->nr_active in blk_mq_end_request_batch()
blk-mq: add RQF_ELV debug entry
blk-mq: only try to run plug merge if request has same queue with incoming bio
block: move RQF_ELV setting into allocators
dm: don't stop request queue after the dm device is suspended
block: replace always false argument with 'false'
block: assign correct tag before doing prefetch of request
blk-mq: fix redundant check of !e expression
For fixing queue quiesce race between driver and block layer(elevator
switch, update nr_requests, ...), we need to support concurrent quiesce
and unquiesce, which requires the two call balanced.
It isn't easy to audit that in all scsi drivers, especially the two may
be called from different contexts, so do it in scsi core with one
per-device atomic variable to balance quiesce and unquiesce.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b3e ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For fixing queue quiesce race between driver and block layer(elevator
switch, update nr_requests, ...), we need to support concurrent quiesce
and unquiesce, which requires the two to be balanced.
blk_mq_quiesce_queue() calls blk_mq_quiesce_queue_nowait() for updating
quiesce depth and marking the flag, then scsi_internal_device_block() calls
blk_mq_quiesce_queue_nowait() two times actually.
Fix the double quiesce and keep quiesce and unquiesce balanced.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b3e ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>