In preparation for making the kmalloc family of allocators type aware, we
need to make sure that the returned type from the allocation matches the
type of the variable being assigned. (Before, the allocator would always
return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct crb_addr_pair *" and the returned type will
be a _different_ "struct crb_addr_pair *", causing a warning. This really
stumped me for a bit. :) Drop the redundant declaration.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250426061951.work.272-kees@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull in fixes from 6.15 and resolve a few conflicts so we can have a
clean base for UFS patches.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SATA devices are lost when FLR is performed while the controller and disks
are in suspended state.
This is because the libata layer is called to initialize the SATA device
during controller resume. If FLR is executed at this time, the IDENTIFY
command fails. As a result, the revalidate fails, and the SATA device is
disabled by the libata layer.
Wait until error handling completes.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20250414080845.1220997-5-liyihang9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In commit 21c7e97247 ("scsi: hisi_sas: Disable SATA disk phy for severe
I_T nexus reset failure"), if the softreset fails upon certain
conditions, the PHY connected to the disk is disabled directly. Manual
recovery is required, which is inconvenient for users in actual use.
In addition, SATA disks do not support simultaneous connection of multiple
hosts. Therefore, when multiple controllers are connected to a SATA disk
at the same time, the controller which is connected later failed to issue
an ATA softreset to the SATA disk. As a result, the PHY associated with
the disk is disabled and cannot be automatically recovered.
Now that, we will not focus on the execution result of softreset. No
matter whether the execution is successful or not, we will directly carry
out I_T_nexus_reset.
Fixes: 21c7e97247 ("scsi: hisi_sas: Disable SATA disk phy for severe I_T nexus reset failure")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20250414080845.1220997-4-liyihang9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commands that have not been completed with scsi_done() do not clear the
SCMD_INITIALIZED flag and therefore will not be properly reinitialized.
Thus, the next time the scsi_cmnd structure is used, the command may
fail in scsi_cmd_runtime_exceeded() due to the old jiffies_at_alloc
value:
kernel: sd 16:0:1:84: [sdts] tag#405 timing out command, waited 720s
kernel: sd 16:0:1:84: [sdts] tag#405 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=66636s
Clear flags for commands that have not been completed by SCSI.
Fixes: 4abafdc436 ("block: remove the initialize_rq_fn blk_mq_ops method")
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Link: https://lore.kernel.org/r/20250324084933.15932-2-a.kovaleva@yadro.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NOPIN response timer may expire on a deleted connection and crash with
such logs:
Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus (null),i,0x00023d000125,iqn.2017-01.com.iscsi.target,t,0x3d
BUG: Kernel NULL pointer dereference on read at 0x00000000
NIP strlcpy+0x8/0xb0
LR iscsit_fill_cxn_timeout_err_stats+0x5c/0xc0 [iscsi_target_mod]
Call Trace:
iscsit_handle_nopin_response_timeout+0xfc/0x120 [iscsi_target_mod]
call_timer_fn+0x58/0x1f0
run_timer_softirq+0x740/0x860
__do_softirq+0x16c/0x420
irq_exit+0x188/0x1c0
timer_interrupt+0x184/0x410
That is because nopin response timer may be re-started on nopin timer
expiration.
Stop nopin timer before stopping the nopin response timer to be sure
that no one of them will be re-started.
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Link: https://lore.kernel.org/r/20241224101757.32300-1-d.bogdanov@yadro.com
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Follow JESD220G, support a WB buffer resize function through sysfs. The
host can obtain resize hint and resize status, and enable the resize
operation. Add three sysfs nodes:
1. wb_resize_enable
2. wb_resize_hint
3. wb_resize_status
The detailed definition of the three nodes can be found in the sysfs
documentation.
Signed-off-by: Huan Tang <tanghuan@vivo.com>
Signed-off-by: Lu Hongfei <luhongfei@vivo.com>
Link: https://lore.kernel.org/r/20250411092924.1116-1-tanghuan@vivo.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A race can occur between the MCQ completion path and the abort handler:
once a request completes, __blk_mq_free_request() sets rq->mq_hctx to
NULL, meaning the subsequent ufshcd_mcq_req_to_hwq() call in
ufshcd_mcq_abort() can return a NULL pointer. If this NULL pointer is
dereferenced, the kernel will crash.
Add a NULL check for the returned hwq pointer. If hwq is NULL, log an
error and return FAILED, preventing a potential NULL-pointer
dereference. As suggested by Bart, the ufshcd_cmd_inflight() check is
removed.
This is similar to the fix in commit 74736103fb ("scsi: ufs: core: Fix
ufshcd_abort_one racing issue").
This is found by our static analysis tool KNighter.
Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Link: https://lore.kernel.org/r/20250410001320.2219341-1-chenyuan0y@gmail.com
Fixes: f1304d4420 ("scsi: ufs: mcq: Added ufshcd_mcq_abort()")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The strlcat() with FORTIFY support is triggering a panic because it
thinks the target buffer will overflow although the correct target
buffer size is passed in.
Anyway, instead of memset() with 0 followed by a strlcat(), just use
memcpy() and ensure that the resulting buffer is NULL terminated.
BIOSVersion is only used for the lpfc_printf_log() which expects a
properly terminated string.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Link: https://lore.kernel.org/r/20250409-fix-lpfc-bios-str-v1-1-05dac9e51e13@kernel.org
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
strncpy() is deprecated for NUL-terminated destination buffers; use
strscpy() instead.
Since sli_config_cmd_init() already zeroes out the destination buffers,
the potential NUL-padding by strncpy() is unnecessary. strscpy() copies
only the required characters and guarantees NUL-termination.
And since all three destination buffers have a fixed length, strscpy()
automatically determines their size using sizeof() when the argument is
omitted. This makes any explicit sizeof() calls unnecessary.
The source strings are also NUL-terminated and meet the __must_be_cstr()
requirement of strscpy().
No functional changes intended.
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20250408102843.804083-2-thorsten.blum@linux.dev
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Neil Armstrong <neil.armstrong@linaro.org> says:
On systems with a large number request slots and unavailable MCQ,
the current design of the interrupt handler can delay handling of
other subsystems interrupts causing display artifacts, GPU stalls
or system firmware requests timeouts.
Example of errors reported on a loaded system:
[drm:dpu_encoder_frame_done_timeout:2706] [dpu error]enc32 frame done timeout
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: hangcheck detected gpu lockup rb 2!
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: completed fence: 74285
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: submitted fence: 74286
Error sending AMC RPMH requests (-110)
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250407-topic-ufs-use-threaded-irq-v3-0-08bee980f71e@linaro.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On systems with a large number request slots and unavailable MCQ ESI,
the current design of the interrupt handler can delay handling of other
subsystems interrupts causing display artifacts, GPU stalls or system
firmware requests timeouts.
Since the interrupt routine can take quite some time, it's preferable to
move it to a threaded handler and leave the hard interrupt handler wake
up the threaded interrupt routine, the interrupt line would be masked
until the processing is finished in the thread thanks to the
IRQS_ONESHOT flag.
When MCQ & ESI interrupts are enabled the I/O completions are now
directly handled in the "hard" interrupt routine to keep IOPs high since
queues handling is done in separate per-queue interrupt routines.
This fixes all encountered issued when running FIO tests on the Qualcomm
SM8650 platform.
Example of errors reported on a loaded system:
[drm:dpu_encoder_frame_done_timeout:2706] [dpu error]enc32 frame done timeout
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: hangcheck detected gpu lockup rb 2!
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: completed fence: 74285
msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.20.1: submitted fence: 74286
Error sending AMC RPMH requests (-110)
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250407-topic-ufs-use-threaded-irq-v3-3-08bee980f71e@linaro.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Eric Biggers <ebiggers@kernel.org> says:
Add support for hardware-wrapped inline encryption keys to the Qualcomm
ICE (Inline Crypto Engine) and UFS (Universal Flash Storage) drivers.
I'd like these patches to be taken through the scsi tree for 6.16.
But the Qualcomm / msm tree would be okay too if that is preferred.
The block layer framework for this feature was merged in 6.15; refer to
the "Hardware-wrapped keys" section of
Documentation/block/inline-encryption.rst. This patchset wires it up
for the newer Qualcomm SoCs, such as SM8650, which have a HWKM (Hardware
Key Manager) and support the SCM calls needed to easily use it.
Tested on the SM8650 HDK with xfstests, specifically generic/368 and
generic/369, in combination with the required fscrypt patch
https://lore.kernel.org/r/20250404225859.172344-1-ebiggers@kernel.org
which I plan to apply separately.
Link: https://lore.kernel.org/r/20250404231533.174419-1-ebiggers@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Qualcomm's Inline Crypto Engine (ICE) version 3.2 and later includes a
key management hardware block called the Hardware Key Manager (HWKM).
Add support for HWKM to the ICE driver. HWKM provides hardware-wrapped
key support where the ICE (storage) keys are not exposed to software and
instead are protected in hardware. Later patches will wire up this
feature to ufs-qcom and sdhci-msm using the support added in this patch.
HWKM and legacy mode are currently mutually exclusive. The selection of
which mode to use has to be made before the storage driver(s) registers
any inline encryption capable disk(s) with the block layer (i.e.,
generally at boot time) so that the appropriate crypto capabilities can
be advertised to upper layers. Therefore, make the ICE driver select
HWKM mode when the all of the following are true:
- The new module parameter qcom_ice.use_wrapped_keys=1 is specified.
- HWKM is present and is at least v2, i.e. ICE is v3.2.1 or later.
- The SCM calls needed to fully use HWKM are supported by TrustZone.
[EB: merged related patches; fixed the module parameter to work
correctly; dropped unnecessary support for HWKM v1; fixed error
handling; improved log messages, comments, and commit message;
fixed naming; merged enable and init functions; and other cleanups]
Signed-off-by: Gaurav Kashyap <quic_gaurkash@quicinc.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20250404231533.174419-3-ebiggers@kernel.org
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qcom_ice_program_key() currently accepts the key as an array of bytes,
algorithm ID, key size enum, and data unit size. However both callers
have a struct blk_crypto_key which contains all that information. Thus
they both have similar code that converts the blk_crypto_key into the
form that qcom_ice_program_key() wants. Once wrapped key support is
added, the key type would need to be added to the arguments too.
Therefore, this patch changes qcom_ice_program_key() to take in all this
information as a struct blk_crypto_key directly. The calling code is
updated accordingly. This ends up being much simpler, and it makes the
key type be passed down automatically once wrapped key support is added.
Based on a patch by Gaurav Kashyap <quic_gaurkash@quicinc.com> that
replaced the byte array argument only. This patch makes the
blk_crypto_key replace other arguments like the algorithm ID too,
ensuring that there remains only one source of truth.
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20250404231533.174419-2-ebiggers@kernel.org
Acked-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kai Mäkisara <Kai.Makisara@kolumbus.fi> says:
The patch set includes changes to better support different device types.
The first patch fixes two obvious typos in the existing definitions.
The second patch adds a device type mask to the command definitions (struct
opcode_info_t). This makes possible for different command definitions for
different device types and makes easy to add opcodes specific to certain
device types. The mask is 32 bits wide and the bit positions are derived
from the Peripheral Device Type field returned from INQUIRY and used in
the struct scsi_device.
In addition to the mask, the second patch adds command filtering based on
device type to command queuing and building of the response in Report
Supported Opcodes.
The third patch splits definitions of READ(6), WRITE(6) and PRE-FETCH/READ
POSITION to versions for tapes and for other devices.
The fourth patch changes obtaining device type from sdebug_ptype to
struct scsi_device->type whenever it is set correctly. This improves
support for using different device types in the same debug host.
The patch set applies to 6.15/scsi-staging
Link: https://lore.kernel.org/r/20250310155557.2872-1-Kai.Makisara@kolumbus.fi
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Devices of several types can be created for a single host. The module
device type should be used only when the devices are created.
Scsi_scan sets the device type initially to 0xff and sets the correct
type based in Inquiry results. This means that Inquiry must report
sdebug_ptype as long as scsi_device->type is not set (the limit 32
comes from the 5-bit length of the Peripheral Device Type in Inquiry).
Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Link: https://lore.kernel.org/r/20250310155557.2872-5-Kai.Makisara@kolumbus.fi
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>