On Google gs101, the number of UTP transfer request slots (nutrs) is 32,
and in this case the driver ends up programming the UTRL_NEXUS_TYPE
incorrectly as 0.
This is because the left hand side of the shift is 1, which is of type
int, i.e. 32 bits wide with only 31 value bits. Shifting by the type's
width or more results in undefined behaviour.
Fix this by switching to the BIT() macro, which applies correct type
casting as required. This ensures the correct value is written to
UTRL_NEXUS_TYPE (0xffffffff on gs101), and it also fixes a UBSAN shift
warning:
UBSAN: shift-out-of-bounds in drivers/ufs/host/ufs-exynos.c:1113:21
shift exponent 32 is too large for 32-bit type 'int'
For consistency, apply the same change to the nutmrs / UTMRL_NEXUS_TYPE
write.
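The width problem can be reproduced outside the kernel. The following is
a minimal user-space sketch, not the driver code. DEMO_BIT() and
demo_nexus_mask() are hypothetical names, and the sketch assumes the
register value is computed as "one bit per slot", i.e. (1 << nutrs) - 1.
The macro uses a 64-bit operand on the assumption that this matches what
BIT() provides on a 64-bit kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's BIT() macro. The 64-bit operand
 * keeps a shift count of 32 well defined.
 */
#define DEMO_BIT(n) (UINT64_C(1) << (n))

/* Mask with one bit set per request slot (assumed register layout). */
uint32_t demo_nexus_mask(unsigned int nslots)
{
	assert(nslots <= 32);	/* the register is 32 bits wide */

	/*
	 * Writing (1 << nslots) here would be undefined for nslots == 32,
	 * because the literal 1 has type int.
	 */
	return (uint32_t)(DEMO_BIT(nslots) - 1);
}
```

With 32 slots this yields 0xffffffff, the value the text expects on
gs101.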
Fixes: 55f4b1f736 ("scsi: ufs: ufs-exynos: Add UFS host support for Exynos SoCs")
Cc: stable@vger.kernel.org
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://lore.kernel.org/r/20250707-ufs-exynos-shift-v1-1-1418e161ae40@linaro.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sphinx reports an indentation warning on scsi_track_queue_full() return
values:
Documentation/driver-api/scsi:101: ./drivers/scsi/scsi.c:247: ERROR: Unexpected indentation. [docutils]
Fix the warning by making the return values listing a bullet list.
Fixes: eb44820c28 ("[SCSI] Add Documentation and integrate into docbook build")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20250702035822.18072-2-bagasdotme@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some functions return a negative value to indicate an error while other
functions return a value != 0 to indicate an error. Document the return
value behavior where this documentation is missing and fix the return
value documentation where necessary. Add warnings to detect mismatches
between documentation and implementation. This matters because several
sysfs callback functions only work correctly if a negative value is
returned upon error.
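To illustrate why the sign matters, here is a small user-space sketch
under stated assumptions: demo_send_cmd() and demo_store() are
hypothetical names, and demo_store() only models the sysfs store
convention (number of bytes consumed on success, negative errno value on
error). A positive error value returned from such a callback would be
read as a byte count.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: returns 0 on success, a non-zero value on error. */
int demo_send_cmd(int fail)
{
	return fail ? 1 : 0;
}

/*
 * Hypothetical sysfs-style store callback: returns the number of bytes
 * consumed on success or a negative errno value on error.
 */
long demo_store(int fail, long count)
{
	int ret;

	assert(count >= 0);
	ret = demo_send_cmd(fail);
	if (ret)
		return ret < 0 ? ret : -EIO;	/* never leak a positive error */
	return count;
}
```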
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250623215909.4169007-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The qla2x00_dfs_tgt_port_database_show() function constructs a fake
fc_port_t object on the stack, which--depending on the configuration--is
large enough to exceed the stack size warning limit:
drivers/scsi/qla2xxx/qla_dfs.c:176:1: error: stack frame size (1392) exceeds limit (1280) in 'qla2x00_dfs_tgt_port_database_show' [-Werror,-Wframe-larger-than]
Rework this function to no longer need the structure but instead call a
custom helper function that just prints the data directly from the
port_database_24xx structure.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250620173232.864179-1-arnd@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add pm80xx_fatal_error_uevent_emit() which is called when the pm80xx
driver encounters a fatal error. The uevent has the following additional
custom key/value pair sets:
- DRIVER: driver name, pm80xx in this case
- HBA_NUM: the scsi host id of the device
- EVENT_TYPE: to indicate a fatal error
- REPORTED_BY: either driver or firmware
The uevent is anchored to the kernel object that represents the SCSI
controller, which includes other useful core variables such as ACTION,
DEVPATH, SUBSYSTEM, and more.
The pm80xx_fatal_error_uevent_emit() function is called when the
controller fatal error state changes. Since this doesn't happen often
for a specific SCSI host, there is no risk of a uevent storm.
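The key/value pairs listed above could be formatted roughly as in the
following user-space sketch. It is not the driver code:
demo_build_fatal_env() is a hypothetical name, and the values
"FATAL_ERROR", "driver" and "firmware" are assumed spellings. In the
kernel such strings are normally handed to kobject_uevent_env() as a
NULL-terminated array; whether the driver does exactly that is also an
assumption here.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_ENV_COUNT 4
#define DEMO_ENV_LEN   32

/*
 * Format the four custom KEY=value strings into env[]. Returns 0 on
 * success or -1 if any string was truncated.
 */
int demo_build_fatal_env(char env[DEMO_ENV_COUNT][DEMO_ENV_LEN],
			 unsigned int host_no, int by_firmware)
{
	int n[DEMO_ENV_COUNT];
	int i;

	assert(env);
	n[0] = snprintf(env[0], DEMO_ENV_LEN, "DRIVER=pm80xx");
	n[1] = snprintf(env[1], DEMO_ENV_LEN, "HBA_NUM=%u", host_no);
	/* "FATAL_ERROR" is an assumed value, not taken from the driver. */
	n[2] = snprintf(env[2], DEMO_ENV_LEN, "EVENT_TYPE=FATAL_ERROR");
	n[3] = snprintf(env[3], DEMO_ENV_LEN, "REPORTED_BY=%s",
			by_firmware ? "firmware" : "driver");

	for (i = 0; i < DEMO_ENV_COUNT; i++)
		if (n[i] < 0 || n[i] >= DEMO_ENV_LEN)
			return -1;
	return 0;
}
```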
Signed-off-by: Salomon Dushimirimana <salomondush@google.com>
Link: https://lore.kernel.org/r/20250616190018.2136260-1-salomondush@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.4.0.10
This patch set contains bug fixes related to diagnostic log messaging,
driver initialization and removal, updates to mailbox command handling,
and string modifications for obsolete adapter model descriptions.
The patches were cut against Martin's 6.17/scsi-queue tree.
Link: https://lore.kernel.org/r/20250618192138.124116-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move clearing of HBA_SETUP flag out of lpfc_sli_brdrestart_s4 and before
lpfc_sli4_queue_unset. lpfc_sli4_queue_unset kfrees phba queues, so
clear the HBA_SETUP atomic flag to signal that the phba struct is no
longer initialized.
Also, add a check for the HBA_SETUP flag in the lpfc_sli4_io_xri_aborted
routine before dereferencing the ELS WQ.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250618192138.124116-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During rmmod, all ndlp objects are cleaned up and marked with the
NLP_DROPPED flag indicating that an ndlp object is currently being
released. Thus, if an RSCN is received during driver unload, then
walking the fc_nodes list to process the RSCN is unnecessary because the
ndlp objects are very shortly going to be released.
In the lpfc_rscn_recovery_check routine, early return if the driver is in
the middle of unloading by checking for the FC_UNLOADING flag.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250618192138.124116-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a call to lpfc_sli4_read_rev() from lpfc_sli4_hba_setup() fails, the
resultant cleanup routine lpfc_sli4_vport_delete_fcp_xri_aborted() may
occur before sli4_hba.hdwqs are allocated. This may result in a null
pointer dereference when attempting to take the abts_io_buf_list_lock for
the first hardware queue. Fix by adding a null ptr check on
phba->sli4_hba.hdwq and early return because this situation means there
must have been an error during port initialization.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250618192138.124116-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With the ATA error model, an NCQ command failure always triggers an abort
(termination) of all NCQ commands queued on the device. In such a case, the
SAT or the host must handle the failed command according to the command
sense data and immediately retry all other NCQ commands that were aborted
due to the failed NCQ command.
For SAS HBAs controlled by the mpt3sas driver, NCQ command aborts are not
handled by the HBA SAT and are instead sent back to the host, with an ioc log
information equal to 0x31080000 (IOC_LOGINFO_PREFIX_PL with the PL code
PL_LOGINFO_CODE_SATA_NCQ_FAIL_ALL_CMDS_AFTR_ERR). The function
_scsih_io_done() always forces a retry of commands terminated with the
status MPI2_IOCSTATUS_SCSI_IOC_TERMINATED using the SCSI result
DID_SOFT_ERROR, regardless of the log_info for the command. This
correctly forces the retry of collateral NCQ abort commands, but with the
retry counter for the command being incremented. If a command to an ATA
device is subject to too many retries due to other NCQ commands failing
(e.g. read commands trying to access unreadable sectors), the collateral
NCQ abort commands may be terminated with an error as they run out of
retries. This violates the SAT specification and causes hard-to-debug
command errors.
Solve this issue by modifying the handling of the
MPI2_IOCSTATUS_SCSI_IOC_TERMINATED status to check if a command is for an
ATA device and if the command loginfo indicates an NCQ collateral
abort. If that is the case, force the command retry using the SCSI result
DID_IMM_RETRY to avoid incrementing the command retry count.
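The decision described above can be sketched as follows. This is a
simplified user-space model, not the driver code: the struct, the enum
and the function name are hypothetical stand-ins, and only the log info
value 0x31080000 is taken from the text. The function models what
happens once a command has already been reported as IOC terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Log info value quoted in the text for NCQ collateral aborts. */
#define DEMO_LOGINFO_NCQ_COLLATERAL_ABORT 0x31080000u

/* Stand-ins for the two SCSI results named in the text. */
enum demo_host_result {
	DEMO_DID_SOFT_ERROR,	/* retry, consuming one retry */
	DEMO_DID_IMM_RETRY,	/* retry without consuming a retry */
};

/* Hypothetical summary of a command reported as IOC terminated. */
struct demo_terminated_cmd {
	bool is_ata;		/* command was sent to an ATA device */
	uint32_t log_info;	/* log info reported with the reply */
};

enum demo_host_result
demo_terminated_result(const struct demo_terminated_cmd *cmd)
{
	assert(cmd);
	if (cmd->is_ata &&
	    cmd->log_info == DEMO_LOGINFO_NCQ_COLLATERAL_ABORT)
		return DEMO_DID_IMM_RETRY;
	return DEMO_DID_SOFT_ERROR;
}
```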
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250606052747.742998-3-dlemoal@kernel.org
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With the ATA error model, an NCQ command failure always triggers an abort
(termination) of all NCQ commands queued on the device. In such a case, the
SAT or the host must handle the failed command according to the command
sense data and immediately retry all other NCQ commands that were aborted
due to the failed NCQ command.
For SAS HBAs controlled by the mpi3mr driver, NCQ command aborts are not
handled by the HBA SAT and are instead sent back to the host, with an ioc log
information equal to 0x31080000 (IOC_LOGINFO_PREFIX_PL with the PL code
PL_LOGINFO_CODE_SATA_NCQ_FAIL_ALL_CMDS_AFTR_ERR). The function
mpi3mr_process_op_reply_desc() always forces a retry of commands
terminated with the status MPI3_IOCSTATUS_SCSI_IOC_TERMINATED using the
SCSI result DID_SOFT_ERROR, regardless of the ioc_loginfo for the
command. This correctly forces the retry of collateral NCQ abort
commands, but with the retry counter for the command being incremented.
If a command to an ATA device is subject to too many retries due to other
NCQ commands failing (e.g. read commands trying to access unreadable
sectors), the collateral NCQ abort commands may be terminated with an
error as they run out of retries. This violates the SAT specification and
causes hard-to-debug command errors.
Solve this issue by modifying the handling of the
MPI3_IOCSTATUS_SCSI_IOC_TERMINATED status to check if a command is for an
ATA device and if the command ioc_loginfo indicates an NCQ collateral
abort. If that is the case, force the command retry using the SCSI result
DID_IMM_RETRY to avoid incrementing the command retry count.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250606052747.742998-2-dlemoal@kernel.org
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log. Since commit ad67b74d24 ("printk: hash
addresses printed with %p") the regular %p has been improved to avoid
this issue. Furthermore, restricted pointers ("%pK") were never meant to
be used through printk(). They can still unintentionally leak raw
pointers or acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and easier to
reason about.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20250611-restricted-pointers-scsi-v1-1-fe31bfbc4910@linutronix.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_add_lun() tests the device vendor string of SCSI devices to detect
if a SCSI device is in fact an ATA device, in order to correctly handle
SATL power management. The function scsi_cdl_enable() also requires
knowing if a SCSI device is an ATA device to control the state of the
device CDL feature but this function does that by testing for the
presence of the VPD page 89h (ATA INFORMATION page).
sd_read_write_same() also has a similar test.
Simplify these different methods by adding the is_ata field to struct
scsi_device to remember that a SCSI device is in fact an ATA one based
on the device vendor name test. This field can also allow low level
SCSI host adapter drivers to take special actions for ATA devices
(e.g. to better handle ATA NCQ errors).
With this, simplify scsi_cdl_enable() and sd_read_write_same().
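As a rough illustration of the vendor name test, consider the sketch
below. It is a user-space model with hypothetical names
(struct demo_scsi_device, demo_set_is_ata()), and it assumes the vendor
field is the 8-byte, space-padded INQUIRY vendor identification that a
SATL reports as "ATA" followed by five spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_VENDOR_LEN 8	/* INQUIRY vendor identification size */

/* Hypothetical, heavily reduced model of a SCSI device. */
struct demo_scsi_device {
	char vendor[DEMO_VENDOR_LEN];	/* space padded, no NUL */
	bool is_ata;
};

/* Record once whether the device sits behind an ATA translation layer. */
void demo_set_is_ata(struct demo_scsi_device *sdev)
{
	assert(sdev);
	sdev->is_ata = memcmp(sdev->vendor, "ATA     ",
			      DEMO_VENDOR_LEN) == 0;
}
```

Later code can then test the cached flag instead of repeating the string
comparison or probing for a VPD page.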
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250611093421.2901633-1-dlemoal@kernel.org
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With W=1, gcc complains correctly:
mpt3sas_ctl.c: In function ‘mpt3sas_send_mctp_passthru_req’:
mpt3sas_ctl.c:2917:29: error: variable ‘mpi_reply’ set but not used [-Werror=unused-but-set-variable]
2917 | MPI2DefaultReply_t *mpi_reply;
| ^~~~~~~~~
Drop the unused assignment and variable.
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://lore.kernel.org/r/20250606-mpt3sas-v1-1-906ffe49fb6b@linaro.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
By default the scsi_dispatch_cmd_error() return value is displayed in
decimal:
kworker/3:1H-183 [003] .... 51.035474: scsi_dispatch_cmd_error: host_no=0 channel=0 id=0 lun=4 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(READ_10 lba=3907214 txlen=1 protect=0 raw=28 00 00 3b 9e 8e 00 00 01 00) rtn=4181
However, these numbers are not particularly helpful when debugging
errors, especially since the kernel code consistently uses the following
defines in hexadecimal:
SCSI_MLQUEUE_HOST_BUSY 0x1055
SCSI_MLQUEUE_DEVICE_BUSY 0x1056
SCSI_MLQUEUE_EH_RETRY 0x1057
SCSI_MLQUEUE_TARGET_BUSY 0x1058
Switch to using the string form of these values in the trace output:
dd-1059 [007] ..... 31.689529: scsi_dispatch_cmd_error: host_no=0 channel=0 id=0 lun=4 data_sgl=65 prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=23 scheduler_tag=117 cmnd=(READ_10 lba=0 txlen=128 protect=0 raw=28 00 00 00 00 00 00 00 80 00) rtn=SCSI_MLQUEUE_DEVICE_BUSY
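The mapping from value to name can be sketched in user space as below.
The four values are the ones listed above; demo_rtn_name() and
demo_format_rtn() are hypothetical helpers, and the "UNKNOWN" fallback
is a choice made for this sketch only. The trace event itself may well
rely on the tracing infrastructure's own symbol printing rather than a
switch statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values as listed in the text. */
#define SCSI_MLQUEUE_HOST_BUSY   0x1055
#define SCSI_MLQUEUE_DEVICE_BUSY 0x1056
#define SCSI_MLQUEUE_EH_RETRY    0x1057
#define SCSI_MLQUEUE_TARGET_BUSY 0x1058

/* Translate a dispatch return value into its symbolic name. */
const char *demo_rtn_name(int rtn)
{
	switch (rtn) {
	case SCSI_MLQUEUE_HOST_BUSY:	return "SCSI_MLQUEUE_HOST_BUSY";
	case SCSI_MLQUEUE_DEVICE_BUSY:	return "SCSI_MLQUEUE_DEVICE_BUSY";
	case SCSI_MLQUEUE_EH_RETRY:	return "SCSI_MLQUEUE_EH_RETRY";
	case SCSI_MLQUEUE_TARGET_BUSY:	return "SCSI_MLQUEUE_TARGET_BUSY";
	default:			return "UNKNOWN";
	}
}

/* Format the trailing "rtn=" field of the trace line into buf. */
int demo_format_rtn(char *buf, size_t len, int rtn)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "rtn=%s", demo_rtn_name(rtn));
}
```

The decimal rtn=4181 from the first trace line is 0x1055, so it maps to
SCSI_MLQUEUE_HOST_BUSY.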
Signed-off-by: Kassey Li <quic_yingangl@quicinc.com>
Link: https://lore.kernel.org/r/20250521011711.1983625-1-quic_yingangl@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch updates the scsi_fc_transport.rst documentation by replacing
the outdated << To Be Supplied >> placeholder under the "FC Remote Ports
(rports)" section with a detailed explanation of remote port
functionality in the Fibre Channel (FC) transport class.
The new documentation covers:
- What rports are and their role in FC-based SCSI communication
- Their representation in sysfs (/sys/class/fc_remote_ports/)
- Common sysfs attributes such as port_id, port_name, node_name, and
port_state
- Their typical lifecycle (creation and removal)
- Guidance for driver developers on using fc_remote_port_add() and
fc_remote_port_delete()
This change improves the completeness and usefulness of the FC transport
documentation for developers and users interacting with Fibre Channel
drivers in the Linux SCSI subsystem.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://lore.kernel.org/r/20250607162304.1765430-1-alok.a.tiwari@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The function fcoe_select_cpu() is only used to distribute incoming skbs
which start a new FC command sequence. But the network stack has already
received (and processed) that skb, and there is a _really_ good chance
that all subsequent skbs for this sequence will be handled on the same
CPU. So we should just use the CPU on which this skb was allocated and
save ourselves some overhead due to pointless scheduling.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Link: https://lore.kernel.org/r/20250605062014.105302-1-hare@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull timer cleanup from Thomas Gleixner:
"The delayed from_timer() API cleanup:
The renaming to the timer_*() namespace was delayed due to massive
conflicts against Linux-next. Now that everything is upstream, finish
the conversion"
* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename from_timer() to timer_container_of()
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Cure IO bitmap inconsistencies
A failed fork cleans up all resources of the newly created thread
via exit_thread(). exit_thread() invokes io_bitmap_exit() which
does the IO bitmap cleanups, which unfortunately assume that the
cleanup is related to the current task, which is obviously bogus.
Make it work correctly
- A lockdep fix in the resctrl code removed the clearing of the
command buffer in two places, which keeps stale error messages
around. Bring them back.
- Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
x86/iopl: Cure TIF_IO_BITMAP inconsistencies
x86/fpu: Remove unused trace events
Pull timer fix from Thomas Gleixner:
"Add the missing seq_file forward declaration in the timer namespace
header"
* tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timens: Add struct seq_file forward declaration
Add initial DMR support, which required smarter RAPL probe
Fix AMD MSR RAPL energy reporting
Add RAPL power limit configuration output
Minor fixes
Signed-off-by: Len Brown <len.brown@intel.com>