Commit Graph

1396681 Commits

Author SHA1 Message Date
Hannes Reinecke
d604e1ec24 scsi: core: Support allocating reserved commands
Quite some drivers are using management commands internally. These
commands typically use the same tag pool as regular SCSI commands. Tags
for these management commands are set aside before allocating the
block-mq tag bitmap for regular SCSI commands. The block layer already
supports this via the reserved tag mechanism. Add a new field
'nr_reserved_cmds' to the SCSI host template to instruct the block layer
to set aside a tag space for these management commands by using reserved
tags. Exclude reserved commands from .can_queue because .can_queue is
visible in sysfs.

[ bvanassche: modified patch title and patch description. Left out the
  following statements: "if (sht->nr_reserved_cmds)" and also
  "if (sdev->host->nr_reserved_cmds) flags |= BLK_MQ_REQ_RESERVED;". Moved
  nr_reserved_cmds declarations and statements close to the
  corresponding can_queue declarations and statements. See also
  https://lore.kernel.org/linux-scsi/20210503150333.130310-11-hare@suse.de/ ]

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-12 17:02:30 -05:00
Alok Tiwari
e9ff858c9a scsi: qla4xxx: Use correct variable in memset for clarity
Both mbox_cmd and mbox_sts have the same size, so using sizeof(mbox_cmd)
when clearing mbox_sts did not cause any functional issue. However, it
is misleading and reduces code readability.

Update the memset() calls to use sizeof(mbox_sts) to make the intent
clear

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20251021090354.1804327-1-alok.a.tiwari@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 23:02:15 -04:00
Bart Van Assche
e414748b7e scsi: aacraid: Improve code readability
aac_queuecommand() is a scsi_host_template.queuecommand()
implementation.  Any value returned by this function other than one of
the following values is translated into SCSI_MLQUEUE_HOST_BUSY:

* 0
* SCSI_MLQUEUE_HOST_BUSY
* SCSI_MLQUEUE_DEVICE_BUSY
* SCSI_MLQUEUE_EH_RETRY
* SCSI_MLQUEUE_TARGET_BUSY

Improve readability of aac_queuecommand() by returning
SCSI_MLQUEUE_HOST_BUSY instead of FAILED.

Cc: Gilbert Wu <gilbert.wu@microchip.com>
Cc: Sagar Biradar <Sagar.Biradar@microchip.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251021201743.3539900-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 23:01:00 -04:00
John Garry
bb798c1f43 scsi: advansys: Don't call asc_prt_scsi_host() -> scsi_host_busy()
The driver calls asc_prt_scsi_host() -> scsi_host_busy() prior to
calling scsi_add_host(). This should not be done, and has raised issues
for other drivers, like [0].

Function asc_prt_scsi_host() only has a single callsite, as above, where
the shost busy count would always be 0.

Avoid printing the shost busy count to avoid this problem.

[0] https://lore.kernel.org/linux-scsi/20251014200118.3390839-3-bvanassche@acm.org/

Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251023085451.3933666-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:58:53 -04:00
John Garry
dcc98c1136 scsi: core: Minor comment fixes for scsi_host_busy()
I guess that the @shost comment on scsi_host_busy() was copied from
scsi_host_get() (as it is the same), however they do not do the same
thing.

Also drop reference to busy counter, which has been removed.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251023082759.3927000-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:56:18 -04:00
Martin K. Petersen
a7480fda0f Merge patch series "Eight small UFS patches"
Bart Van Assche <bvanassche@acm.org> says:

Hi Martin,

This patch series includes two bug fixes for this development cycle
and six small patches that are intended for the next merge window. If
applying the first two patches only during the current development
cycle would be inconvenient, postponing all patches until the next
merge window is fine with me.

Please consider including these patches in the upstream kernel.

Thanks,

Bart.

[mkp: Applied patches #1 and #2 to 6.18/scsi-fixes]

Link: https://patch.msgid.link/20251014200118.3390839-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:20:44 -04:00
Bart Van Assche
bfe5f5dacf scsi: ufs: core: Simplify ufshcd_mcq_sq_cleanup() using guard()
Simplify ufshcd_mcq_sq_cleanup() by using guard(mutex)() instead of
explicit mutex_lock() and mutex_unlock() calls. No functionality has
been changed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/20251014200118.3390839-9-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:18:12 -04:00
Bart Van Assche
047f190494 scsi: ufs: core: Remove a goto label from ufshcd_uic_cmd_compl()
Return directly instead of jumping to a return statement. No
functionality has been changed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/20251014200118.3390839-8-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:18:12 -04:00
Bart Van Assche
b30006b5be scsi: ufs: core: Move the ufshcd_enable_intr() declaration
ufshcd_enable_intr() is not exported and hence should not be declared in
include/ufs/ufshcd.h.

Fixes: 2537577979 ("scsi: ufs: core: Change MCQ interrupt enable flow")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/20251014200118.3390839-7-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:18:12 -04:00
Bart Van Assche
a332735a53 scsi: ufs: core: Remove UFS_DEV_COMP
Remove the UFS_DEV_COMP constant because it is not used.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/20251014200118.3390839-6-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:18:12 -04:00
Bart Van Assche
b3b0842bcb scsi: ufs: core: Change the type of uic_command::cmd_active
Since uic_command::cmd_active is used as a boolean variable, change its
type from 'int' into 'bool'. No functionality has been changed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251014200118.3390839-5-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:18:12 -04:00
Bart Van Assche
7b2c4224fa scsi: ufs: core: Improve documentation in include/ufs/ufshci.h
Make it easier to find the sections in the UFSHCI standard where these
constants come from.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251014200118.3390839-4-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-23 22:18:12 -04:00
Magnus Lindholm
ea0e278a5c scsi: qla1280: Fix compiler warnings (DEBUG mode)
Building the qla1280 driver with DEBUG_QLA1280 set will emit compiler
warnings. Fix some print formatting strings to reflect the correct type
of printed variables as well as remove unused code. (static function
ql1280_dump_device) in order to avoid compiler warnings.

[mkp: fixed a few more checkpatch warnings]

Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://patch.msgid.link/20251002052604.24590-1-linmag7@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 22:06:22 -04:00
Martin K. Petersen
36e6daa543 Merge patch series "Enhance UFS Mediatek Driver"
Peter Wang <peter.wang@mediatek.com says>:

Improves the UFS Mediatek driver by correcting clock scaling with PM
QoS, and adjusting power management flows. It addresses
shutdown/suspend race conditions, and removes redundant
functions. Support for new platforms is added with the MMIO_OTSD_CTRL
register, and MT6991 performance is optimized with MRTT and random
improvements. These changes collectively enhance driver performance,
stability, and compatibility.

Changes since v1:

 1. Remove two patches that will be fixed in UFS core.
    - ufs: host: mediatek: Fix runtime suspend error deadlock
    - ufs: host: mediatek: Enable interrupts for MCQ mode
 2. Use hba->shutting_down instead of ufshcd_is_user_access_allowed

v1:
   https://patch.msgid.link/20250918104000.208856-1-peter.wang@mediatek.com

Link: https://patch.msgid.link/20250924094527.2992256-1-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:37:57 -04:00
Naomi Chu
9ce37e94c3 scsi: ufs: host: mediatek: Support new features for MT6991
Add support for the MT6991 platform by enabling MRTT settings and random
performance improvements. These enhancements aim to optimize performance
and efficiency on the MT6991 hardware.

Enable multi-Round Trip Time (MRTT) for improved data handling.  Enable
random performance improvement features to boost overall system
responsiveness.

Signed-off-by: Naomi Chu <naomi.chu@mediatek.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-9-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:46 -04:00
Peter Wang
4fb4c835a9 scsi: ufs: host: mediatek: Add support for new platform with MMIO_OTSD_CTR
Introduce support for a new UFS Mediatek platform by adding the
REG_UFS_UFS_MMIO_OTSD_CTRL register. This update includes checks for
legacy platforms and uses the new register to replace debug selection
and handle specific operations.  The changes ensure compatibility across
different hardware versions and prevent potential issues with debug
usage on newer platforms.

Additional updates include error logging improvements during link setup
for newer and legacy platforms, ensuring proper event logging and
debugging.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-8-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:46 -04:00
Peter Wang
9b2b03b361 scsi: ufs: host: mediatek: Remove duplicate function
Remove the duplicate ufs_mtk_us_to_ahit() function in the UFS Mediatek
driver and export the existing ufshcd_us_to_ahit() function for shared
use. This change reduces redundancy and maintains consistency across the
codebase.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-7-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:46 -04:00
Peter Wang
014de20bb3 scsi: ufs: host: mediatek: Fix shutdown/suspend race condition
Address a race condition between shutdown and suspend operations in the
UFS Mediatek driver. Before entering suspend, check if a shutdown is in
progress to prevent conflicts and ensure system stability.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-6-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:46 -04:00
Peter Wang
1fd05367d5 scsi: ufs: host: mediatek: Adjust sync length for FASTAUTO mode
Set the sync length for FASTAUTO G1 mode in the UFS Mediatek
driver. This ensures the sync length meets minimum values for high-speed
gears, improving stability during power mode changes.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-5-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:45 -04:00
Peter Wang
16b42c4281 scsi: ufs: host: mediatek: Handle clock scaling for high gear in PM flow
Add clock scaling down for power management flow in the UFS Mediatek
driver. If clock scaling is disabled and fixed in high gear, ensure the
clock scales down during suspend and scales up again after resume to
support high gear.  This adjustment maintains proper power management.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-4-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:45 -04:00
Peter Wang
55ce691dc7 scsi: ufs: host: mediatek: Adjust clock scaling for PM flow
Adjust clock scaling during suspend and resume in the UFS Mediatek
driver. Ensure that the clock scales down during suspend if it was
scaled up, and scales up again after resume.  This adjustment maintains
proper power management.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-3-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:45 -04:00
Peter Wang
7162536410 scsi: ufs: host: mediatek: Correct clock scaling with PM QoS flow
Correct clock scaling with PM QoS during suspend and resume.  Ensure PM
QoS is released during suspend if scaling up and re-applied after
resume. This prevents performance issues and maintains proper power
management.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-2-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:36:45 -04:00
Martin K. Petersen
1c6279dc25 Merge patch series "Remove UFS_DEVICE_QUIRK_DELAY_AFTER_LPM quirk"
Bao D. Nguyen <quic_nguyenb@quicinc.com> says:

Multiple ufs device manufacturers request support for the
UFS_DEVICE_QUIRK_DELAY_AFTER_LPM quirk in the Qualcomm's platform
driver.  After checking further with the major UFS manufacturers
engineering teams such as Samsung, Kioxia, SK Hynix and Micron, all
the manufacturers require this quirk. Since the quirk is needed by all
the ufs device manufacturers, remove the quirk in the ufs core driver
and implement a universal delay for all the ufs devices.

In addition to verifying with the public device's datasheets, the ufs
device manufacturer's engineering teams confirmed the required vcc
power-off time for the devices is a minimum of 1ms before vcc can be
powered on again. The existing 5ms delay implemented in the ufs core
driver seems too conservative, so replace the hard coded 5ms delay
with a variable default to 2ms setting to improve the system resume
latency.  The platform drivers can override this setting as needed.

Link: https://patch.msgid.link/cover.1760383740.git.quic_nguyenb@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:23:50 -04:00
Bao D. Nguyen
4760b639b4 scsi: ufs: core: Replace hard coded vcc-off delay with a variable
After the UFS device VCC is powered off, all the UFS device
manufacturers require a minimum of 1ms of power-off time before VCC can
be powered on again. This requirement has been verified with all the UFS
device manufacturer's datasheets.

Replace the hard coded 5ms delay with a variable with a default setting
of 2ms to improve the system resume latency. The platform drivers can
override this setting as needed.

Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/72fa649406a0bf02271575b7d58f22c968aa5d7e.1760383740.git.quic_nguyenb@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:22:17 -04:00
Bao D. Nguyen
f8e82ae65e scsi: ufs: core: Remove UFS_DEVICE_QUIRK_DELAY_AFTER_LPM quirk
After the UFS device VCC is turned off, all the UFS device manufacturers
require a period of power-off time before the VCC can be turned on
again. This requirement has been confirmed with all the UFS device
manufacturer's datasheets.

Remove the UFS_DEVICE_QUIRK_DELAY_AFTER_LPM quirk in the UFS core driver
and implement a universal delay that is required by all the UFS device
manufacturers. In addition, remove the support for this quirk in the
platform drivers.

Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/25f134d5a42e8b8365be64d512d1bb5fc2bce6ff.1760383740.git.quic_nguyenb@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:22:17 -04:00
Peter Wang
8627f322cb scsi: ufs: core: Support dumping CQ entry in MCQ Mode
Enhance the ufshcd_print_tr() function to support dumping completion
queue (CQ) entries in MCQ mode when an error occurs.  This addition
provides more detailed debugging information by including the CQ entry
data in the error logs, aiding in the diagnosis of issues in MCQ mode.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251016023507.1000664-3-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:16:04 -04:00
Peter Wang
bfe0d22f12 scsi: ufs: core: Update CQ Entry to UFS 4.1 format
Update the completion queue (CQ) entry format according to the UFS 4.1
specification. UFS 4.1 introduces new members in reserved record
DW5. Also refine DW4 with detailed members defined in UFS 4.0. Modify
the code to incorporate these changes by updating the overall_status in
the CQ entry structure.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251016023507.1000664-2-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:16:04 -04:00
Bart Van Assche
ce085ecdba scsi: core: Do not declare scsi_cmnd pointers const
This change allows removing multiple casts and hence improves type
checking by the compiler.

Cc: Hannes Reinecke <hare@suse.de>
Suggested-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251014220426.3690007-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-21 21:11:35 -04:00
Qiang Liu
3d0d1c7a5c scsi: fnic: Self-assignment of intr_time_type has no effect
Remove the self-assignment statement of the intr_time_type variable.

Signed-off-by: Qiang Liu <liuqiang@kylinos.cn>
Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://patch.msgid.link/20251017075504.143491-1-liuqiangneo@163.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:43:07 -04:00
André Draszik
79a2287c1d scsi: ufs: dt-bindings: exynos: Add power-domains
The UFS controller can be part of a power domain, so we need to allow
the relevant property 'power-domains'.

Signed-off-by: André Draszik <andre.draszik@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Link: https://patch.msgid.link/20251007-power-domains-scsi-ufs-dt-bindings-exynos-v1-1-1acfa81a887a@linaro.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:03:14 -04:00
Bhanu Seshu Kumar Valluri
05e66c18e3 scsi: smartpqi: Prefer kmalloc_array() over kmalloc()
As a best practice use kmalloc_array() to safely calculate dynamic
object sizes without overflow.

[mkp: line exceeding 100 chars, added newline]

Acked-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Bhanu Seshu Kumar Valluri <bhanuseshukumar@gmail.com>
Link: https://patch.msgid.link/20251007065345.8853-1-bhanuseshukumar@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:01:20 -04:00
Gustavo A. R. Silva
81cb6c228f scsi: megaraid_sas: Avoid a couple -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Use the new TRAILING_OVERLAP() helper to fix the following warnings:

drivers/scsi/megaraid/megaraid_sas_fusion.h:1153:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/scsi/megaraid/megaraid_sas_fusion.h:1198:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

This helper creates a union between a flexible-array member (FAM) and a
set of MEMBERS that would otherwise follow it --in this case 'struct
MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN]' and 'struct
MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES]' in the corresponding
structures.

This overlays the trailing members onto the FAM (struct MR_LD_SPAN_MAP
ldSpanMap[];) while keeping the FAM and the start of MEMBERS aligned.

The static_assert() ensures this alignment remains, and it's
intentionally placed inmediately after the corresponding structures --no
blank line in between.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/aM1E7Xa8qYdZ598N@kspp
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:00:42 -04:00
Gustavo A. R. Silva
11956e4b91 scsi: isci: Avoid -Wflex-array-member-not-at-end warning
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declaration (which happens to be in a union, so
we're moving the entire union) to the end of the corresponding
structure. Notice that `struct ssp_response_iu` is a flexible structure,
this is a structure that contains a flexible-array member.

With these changes fix the following warning:

drivers/scsi/isci/task.h:92:11: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/aM09bpl1xj9KZSZl@kspp
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20 12:00:11 -04:00
Linus Torvalds
3a86608788 Linux 6.18-rc1 v6.18-rc1 2025-10-12 13:42:36 -07:00
Linus Torvalds
3dd7b81235 Merge tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "One revert because of a regression in the I2C core which has sadly not
  showed up during its time in -next"

* tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Revert "i2c: boardinfo: Annotate code used in init phase only"
2025-10-12 13:27:56 -07:00
Linus Torvalds
8765f46791 Merge tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:

 - Skip interrupt ID 0 in sifive-plic during suspend/resume because
   ID 0 is reserved and accessing reserved register space could result
   in undefined behavior

 - Fix a function's retval check in aspeed-scu-ic

* tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume
  irqchip/aspeed-scu-ic: Fix an IS_ERR() vs NULL check
2025-10-12 08:45:52 -07:00
Linus Torvalds
67029a49db Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
 "The previous fix to trace_marker required updating trace_marker_raw as
  well. The difference between trace_marker_raw from trace_marker is
  that the raw version is for applications to write binary structures
  directly into the ring buffer instead of writing ASCII strings. This
  is for applications that will read the raw data from the ring buffer
  and get the data structures directly. It's a bit quicker than using
  the ASCII version.

  Unfortunately, it appears that our test suite has several tests that
  test writes to the trace_marker file, but lacks any tests to the
  trace_marker_raw file (this needs to be remedied). Two issues came
  about the update to the trace_marker_raw file that syzbot found:

   - Fix tracing_mark_raw_write() to use per CPU buffer

     The fix to use the per CPU buffer to copy from user space was
     needed for both the trace_maker and trace_maker_raw file.

     The fix for reading from user space into per CPU buffers properly
     fixed the trace_marker write function, but the trace_marker_raw
     file wasn't fixed properly. The user space data was correctly
     written into the per CPU buffer, but the code that wrote into the
     ring buffer still used the user space pointer and not the per CPU
     buffer that had the user space data already written.

   - Stop the fortify string warning from writing into trace_marker_raw

     After converting the copy_from_user_nofault() into a memcpy(),
     another issue appeared. As writes to the trace_marker_raw expects
     binary data, the first entry is a 4 byte identifier. The entry
     structure is defined as:

     struct {
   	struct trace_entry ent;
   	int id;
   	char buf[];
     };

     The size of this structure is reserved on the ring buffer with:

       size = sizeof(*entry) + cnt;

     Then it is copied from the buffer into the ring buffer with:

       memcpy(&entry->id, buf, cnt);

     This use to be a copy_from_user_nofault(), but now converting it to
     a memcpy() triggers the fortify-string code, and causes a warning.

     The allocated space is actually more than what is copied, as the
     cnt used also includes the entry->id portion. Allocating
     sizeof(*entry) plus cnt is actually allocating 4 bytes more than
     what is needed.

     Change the size function to:

       size = struct_size(entry, buf, cnt - sizeof(entry->id));

     And update the memcpy() to unsafe_memcpy()"

* tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Stop fortify-string from warning in tracing_mark_raw_write()
  tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
2025-10-11 16:06:04 -07:00
Linus Torvalds
c04022dccb Merge tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:

 - Fix UAPI types check in headers_check.pl

 - Only enable -Werror for hostprogs with CONFIG_WERROR / W=e

 - Ignore fsync() error when output of gen_init_cpio is a pipe

 - Several little build fixes for recent modules.builtin.modinfo series

* tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols
  s390/vmlinux.lds.S: Move .vmlinux.info to end of allocatable sections
  kbuild: Add '.rel.*' strip pattern for vmlinux
  kbuild: Restore pattern to avoid stripping .rela.dyn from vmlinux
  gen_init_cpio: Ignore fsync() returning EINVAL on pipes
  scripts/Makefile.extrawarn: Respect CONFIG_WERROR / W=e for hostprogs
  kbuild: uapi: Strip comments before size type check
2025-10-11 15:47:12 -07:00
Wolfram Sang
a8482d2c90 Revert "i2c: boardinfo: Annotate code used in init phase only"
This reverts commit 1a2b423be6 because we
got a regression report and need time to find out the details.

Reported-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Closes: https://lore.kernel.org/r/29ec0082-4dd4-4120-acd2-44b35b4b9487@oss.qualcomm.com
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-10-11 23:57:33 +02:00
Linus Torvalds
98906f9d85 Merge tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
 "This cycle, we have a new RTC driver, for the SpacemiT P1. The optee
  driver gets alarm support. We also get a fix for a race condition that
  was fairly rare unless while stress testing the alarms.

  Subsystem:
   - Fix race when setting alarm
   - Ensure alarm irq is enabled when UIE is enabled
   - remove unneeded 'fast_io' parameter in regmap_config

  New driver:
   - SpacemiT P1 RTC

  Drivers:
   - efi: Remove wakeup functionality
   - optee: add alarms support
   - s3c: Drop support for S3C2410
   - zynqmp: Restore alarm functionality after kexec transition"

* tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (29 commits)
  rtc: interface: Ensure alarm irq is enabled when UIE is enabled
  rtc: tps6586x: Fix initial enable_irq/disable_irq balance
  rtc: cpcap: Fix initial enable_irq/disable_irq balance
  rtc: isl12022: Fix initial enable_irq/disable_irq balance
  rtc: interface: Fix long-standing race when setting alarm
  rtc: pcf2127: fix watchdog interrupt mask on pcf2131
  rtc: zynqmp: Restore alarm functionality after kexec transition
  rtc: amlogic-a4: Optimize global variables
  rtc: sd2405al: Add I2C address.
  rtc: Kconfig: move symbols to proper section
  rtc: optee: make optee_rtc_pm_ops static
  rtc: optee: Fix error code in optee_rtc_read_alarm()
  rtc: optee: fix error code in probe()
  dt-bindings: rtc: Convert apm,xgene-rtc to DT schema
  rtc: spacemit: support the SpacemiT P1 RTC
  rtc: optee: add alarm related rtc ops to optee rtc driver
  rtc: optee: remove unnecessary memory operations
  rtc: optee: fix memory leak on driver removal
  rtc: x1205: Fix Xicor X1205 vendor prefix
  dt-bindings: rtc: Fix Xicor X1205 vendor prefix
  ...
2025-10-11 11:56:47 -07:00
Linus Torvalds
2a6edd867b Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Fixes only in drivers (ufs, mvsas, qla2xxx, target) that came in just
  before or during the merge window.

  The most important one is the qla2xxx which reverts a conversion to
  fix flexible array member warnings, that went up in this merge window
  but which turned out on further testing to be causing data corruption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Include UTP error in INT_FATAL_ERRORS
  scsi: ufs: sysfs: Make HID attributes visible
  scsi: mvsas: Fix use-after-free bugs in mvs_work_queue
  scsi: ufs: core: Fix PM QoS mutex initialization
  scsi: ufs: core: Fix runtime suspend error deadlock
  Revert "scsi: qla2xxx: Fix memcpy() field-spanning write issue"
  scsi: target: target_core_configfs: Add length check to avoid buffer overflow
2025-10-11 11:49:00 -07:00
Linus Torvalds
9591fdb061 Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Borislav Petkov:

 - Remove a bunch of asm implementing condition flags testing in KVM's
   emulator in favor of int3_emulate_jcc() which is written in C

 - Replace KVM fastops with C-based stubs which avoids problems with the
   fastop infra related to latter not adhering to the C ABI due to their
   special calling convention and, more importantly, bypassing compiler
   control-flow integrity checking because they're written in asm

 - Remove wrongly used static branches and other ugliness accumulated
   over time in hyperv's hypercall implementation with a proper static
   function call to the correct hypervisor call variant

 - Add some fixes and modifications to allow running FRED-enabled
   kernels in KVM even on non-FRED hardware

 - Add kCFI improvements like validating indirect calls and prepare for
   enabling kCFI with GCC. Add cmdline params documentation and other
   code cleanups

 - Use the single-byte 0xd6 insn as the official #UD single-byte
   undefined opcode instruction as agreed upon by both x86 vendors

 - Other smaller cleanups and touchups all over the place

* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86,retpoline: Optimize patch_retpoline()
  x86,ibt: Use UDB instead of 0xEA
  x86/cfi: Remove __noinitretpoline and __noretpoline
  x86/cfi: Add "debug" option to "cfi=" bootparam
  x86/cfi: Standardize on common "CFI:" prefix for CFI reports
  x86/cfi: Document the "cfi=" bootparam options
  x86/traps: Clarify KCFI instruction layout
  compiler_types.h: Move __nocfi out of compiler-specific header
  objtool: Validate kCFI calls
  x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
  x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
  x86/fred: Install system vector handlers even if FRED isn't fully enabled
  x86/hyperv: Use direct call to hypercall-page
  x86/hyperv: Clean up hv_do_hypercall()
  KVM: x86: Remove fastops
  KVM: x86: Convert em_salc() to C
  KVM: x86: Introduce EM_ASM_3WCL
  KVM: x86: Introduce EM_ASM_1SRC2
  KVM: x86: Introduce EM_ASM_2CL
  KVM: x86: Introduce EM_ASM_2W
  ...
2025-10-11 11:19:16 -07:00
Linus Torvalds
2f0a750453 Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:

 - Simplify inline asm flag output operands now that the minimum
   compiler version supports the =@ccCOND syntax

 - Remove a bunch of AS_* Kconfig symbols which detect assembler support
   for various instruction mnemonics now that the minimum assembler
   version supports them all

 - The usual cleanups all over the place

* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
  x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
  x86/mtrr: Remove license boilerplate text with bad FSF address
  x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
  x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
  x86/entry/fred: Push __KERNEL_CS directly
  x86/kconfig: Remove CONFIG_AS_AVX512
  crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
  crypto: X86 - Remove CONFIG_AS_VAES
  crypto: x86 - Remove CONFIG_AS_GFNI
  x86/kconfig: Drop unused and needless config X86_64_SMP
2025-10-11 10:51:14 -07:00
Linus Torvalds
6bb71f0fe5 Merge tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
 "A NULL pointer deref hotfix"

* tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: fix barn NULL pointer dereference on memoryless nodes
2025-10-11 10:40:24 -07:00
Linus Torvalds
fbde105f13 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Finish constification of 1st parameter of bpf_d_path() (Rong Tao)

 - Harden userspace-supplied xdp_desc validation (Alexander Lobakin)

 - Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
   Borkmann)

 - Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)

 - Use correct context to unpin bpf hash map with special types (KaFai
   Wan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add test for unpinning htab with internal timer struct
  bpf: Avoid RCU context warning when unpinning htab with internal structs
  xsk: Harden userspace-supplied xdp_desc validation
  bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
  libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
  bpf: Finish constification of 1st parameter of bpf_d_path()
2025-10-11 10:31:38 -07:00
Linus Torvalds
ae13bd2310 Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more updates from Andrew Morton:
 "Just one series here - Mike Rappoport has taught KEXEC handover to
  preserve vmalloc allocations across handover"

* tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt
  kho: add support for preserving vmalloc allocations
  kho: replace kho_preserve_phys() with kho_preserve_pages()
  kho: check if kho is finalized in __kho_preserve_order()
  MAINTAINERS, .mailmap: update Umang's email address
2025-10-11 10:27:52 -07:00
Linus Torvalds
971370a88c Merge tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "7 hotfixes.  All 7 are cc:stable and all 7 are for MM.

  All singletons, please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: hugetlb: avoid soft lockup when mprotect to large memory area
  fsnotify: pass correct offset to fsnotify_mmap_perm()
  mm/ksm: fix flag-dropping behavior in ksm_madvise
  mm/damon/vaddr: do not repeat pte_offset_map_lock() until success
  mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage
  mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
  memcg: skip cgroup_file_notify if spinning is not allowed
2025-10-11 10:14:55 -07:00
Steven Rostedt
54b91e54b1 tracing: Stop fortify-string from warning in tracing_mark_raw_write()
The way tracing_mark_raw_write() records its data is that it has the
following structure:

  struct {
	struct trace_entry;
	int id;
	char buf[];
  };

But memcpy(&entry->id, buf, size) triggers the following warning when the
size is greater than the id:

 ------------[ cut here ]------------
 memcpy: detected field-spanning write (size 6) of single field "&entry->id" at kernel/trace/trace.c:7458 (size 4)
 WARNING: CPU: 7 PID: 995 at kernel/trace/trace.c:7458 write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
 Modules linked in:
 CPU: 7 UID: 0 PID: 995 Comm: bash Not tainted 6.17.0-test-00007-g60b82183e78a-dirty #211 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
 Code: 04 00 75 a7 b9 04 00 00 00 48 89 de 48 89 04 24 48 c7 c2 e0 b1 d1 b2 48 c7 c7 40 b2 d1 b2 c6 05 2d 88 6a 04 01 e8 f7 e8 bd ff <0f> 0b 48 8b 04 24 e9 76 ff ff ff 49 8d 7c 24 04 49 8d 5c 24 08 48
 RSP: 0018:ffff888104c3fc78 EFLAGS: 00010292
 RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 1ffffffff6b363b4 RDI: 0000000000000001
 RBP: ffff888100058a00 R08: ffffffffb041d459 R09: ffffed1020987f40
 R10: 0000000000000007 R11: 0000000000000001 R12: ffff888100bb9010
 R13: 0000000000000000 R14: 00000000000003e3 R15: ffff888134800000
 FS:  00007fa61d286740(0000) GS:ffff888286cad000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000560d28d509f1 CR3: 00000001047a4006 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  tracing_mark_raw_write+0x1fe/0x290
  ? __pfx_tracing_mark_raw_write+0x10/0x10
  ? security_file_permission+0x50/0xf0
  ? rw_verify_area+0x6f/0x4b0
  vfs_write+0x1d8/0xdd0
  ? __pfx_vfs_write+0x10/0x10
  ? __pfx_css_rstat_updated+0x10/0x10
  ? count_memcg_events+0xd9/0x410
  ? fdget_pos+0x53/0x5e0
  ksys_write+0x182/0x200
  ? __pfx_ksys_write+0x10/0x10
  ? do_user_addr_fault+0x4af/0xa30
  do_syscall_64+0x63/0x350
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fa61d318687
 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
 RSP: 002b:00007ffd87fe0120 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 00007fa61d286740 RCX: 00007fa61d318687
 RDX: 0000000000000006 RSI: 0000560d28d509f0 RDI: 0000000000000001
 RBP: 0000560d28d509f0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
 R13: 00007fa61d4715c0 R14: 00007fa61d46ee80 R15: 0000000000000000
  </TASK>
 ---[ end trace 0000000000000000 ]---

This is because fortify string sees that the size of entry->id is only 4
bytes, but it is writing more than that. But this is OK as the
dynamic_array is allocated to handle that copy.

The size allocated on the ring buffer was actually a bit too big:

  size = sizeof(*entry) + cnt;

But cnt includes the 'id' and the buffer data, so adding cnt to the size
of *entry actually allocates too much on the ring buffer.

Change the allocation to:

  size = struct_size(entry, buf, cnt - sizeof(entry->id));

and the memcpy() to unsafe_memcpy() with an added justification.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011112032.77be18e4@gandalf.local.home
Fixes: 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-11 11:27:27 -04:00
Vlastimil Babka
fd6db58867 slab: fix barn NULL pointer dereference on memoryless nodes
Phil reported a boot failure once sheaves become used in commits
59faa4da7c ("maple_tree: use percpu sheaves for maple_node_cache") and
3accabda4d ("mm, vma: use percpu sheaves for vm_area_struct cache"):

 BUG: kernel NULL pointer dereference, address: 0000000000000040
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 21 UID: 0 PID: 818 Comm: kworker/u398:0 Not tainted 6.17.0-rc3.slab+ #5 PREEMPT(voluntary)
 Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.26.0 07/30/2025
 RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0
 Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89
 RSP: 0018:ffffd2d10950bdb0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff8a775dab74b0 RCX: 00000000ffffffff
 RDX: 0000000000000cc0 RSI: ffff8a6800804000 RDI: ffff8a680004e300
 RBP: ffffd2d10950be40 R08: 0000000000000060 R09: ffffffffb9367388
 R10: 00000000000149e8 R11: ffff8a6f87a38000 R12: 0000000000000cc0
 R13: 0000000000000cc0 R14: ffff8a680004e300 R15: 00000000000000c0
 FS:  0000000000000000(0000) GS:ffff8a77a3541000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 0000000e1aa24000 CR4: 00000000003506f0
 Call Trace:
  <TASK>
  ? srso_return_thunk+0x5/0x5f
  ? vm_area_alloc+0x1e/0x60
  kmem_cache_alloc_noprof+0x4ec/0x5b0
  vm_area_alloc+0x1e/0x60
  create_init_stack_vma+0x26/0x210
  alloc_bprm+0x139/0x200
  kernel_execve+0x4a/0x140
  call_usermodehelper_exec_async+0xd0/0x190
  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
  ret_from_fork+0xf0/0x110
  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 Modules linked in:
 CR2: 0000000000000040
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0
 Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89
 RSP: 0018:ffffd2d10950bdb0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff8a775dab74b0 RCX: 00000000ffffffff
 RDX: 0000000000000cc0 RSI: ffff8a6800804000 RDI: ffff8a680004e300
 RBP: ffffd2d10950be40 R08: 0000000000000060 R09: ffffffffb9367388
 R10: 00000000000149e8 R11: ffff8a6f87a38000 R12: 0000000000000cc0
 R13: 0000000000000cc0 R14: ffff8a680004e300 R15: 00000000000000c0
 FS:  0000000000000000(0000) GS:ffff8a77a3541000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 0000000e1aa24000 CR4: 00000000003506f0
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: 0x36a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception ]---

And noted "this is an AMD EPYC 7401 with 8 NUMA nodes configured such
that memory is only on 2 of them."

 # numactl --hardware
 available: 8 nodes (0-7)
 node 0 cpus: 0 8 16 24 32 40 48 56 64 72 80 88
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 1 cpus: 2 10 18 26 34 42 50 58 66 74 82 90
 node 1 size: 31584 MB
 node 1 free: 30397 MB
 node 2 cpus: 4 12 20 28 36 44 52 60 68 76 84 92
 node 2 size: 0 MB
 node 2 free: 0 MB
 node 3 cpus: 6 14 22 30 38 46 54 62 70 78 86 94
 node 3 size: 0 MB
 node 3 free: 0 MB
 node 4 cpus: 1 9 17 25 33 41 49 57 65 73 81 89
 node 4 size: 0 MB
 node 4 free: 0 MB
 node 5 cpus: 3 11 19 27 35 43 51 59 67 75 83 91
 node 5 size: 32214 MB
 node 5 free: 31625 MB
 node 6 cpus: 5 13 21 29 37 45 53 61 69 77 85 93
 node 6 size: 0 MB
 node 6 free: 0 MB
 node 7 cpus: 7 15 23 31 39 47 55 63 71 79 87 95
 node 7 size: 0 MB
 node 7 free: 0 MB

Linus decoded the stacktrace to get_barn() and get_node() and determined
that kmem_cache->node[numa_mem_id()] is NULL.

The problem is due to a wrong assumption that memoryless nodes only
exist on systems with CONFIG_HAVE_MEMORYLESS_NODES, where numa_mem_id()
points to the nearest node that has memory. SLUB has been allocating its
kmem_cache_node structures only on nodes with memory and so it does with
struct node_barn.

For kmem_cache_node, get_partial_node() checks if get_node() result is
not NULL, which I assumed was for protection from a bogus node id passed
to kmalloc_node() but apparently it's also for systems where
numa_mem_id() (used when no specific node is given) might return a
memoryless node.

Fix the sheaves code the same way by checking the result of get_node()
and bailing out if it's NULL. Note that cpus on such memoryless nodes
will have degraded sheaves performance, which can be improved later,
preferably by making numa_mem_id() work properly on such systems.

Fixes: 2d517aa09b ("slab: add opt-in caching layer of percpu sheaves")
Reported-and-tested-by: Phil Auld <pauld@redhat.com>
Closes: https://lore.kernel.org/all/20251010151116.GA436967@pauld.westford.csb/
Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-%3Dwg1xK%2BBr%3DFJ5QipVhzCvq7uQVPt5Prze6HDhQQ%3DQD_BcQ@mail.gmail.com/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-10-11 10:47:57 +02:00
Steven Rostedt
bda745ee8f tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
The fix to use a per CPU buffer to read user space tested only the writes
to trace_marker. But it appears that the selftests are missing tests to
the trace_maker_raw file. The trace_maker_raw file is used by applications
that writes data structures and not strings into the file, and the tools
read the raw ring buffer to process the structures it writes.

The fix that reads the per CPU buffers passes the new per CPU buffer to
the trace_marker file writes, but the update to the trace_marker_raw write
read the data from user space into the per CPU buffer, but then still used
then passed the user space address to the function that records the data.

Pass in the per CPU buffer and not the user space address.

TODO: Add a test to better test trace_marker_raw.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011035243.386098147@kernel.org
Fixes: 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-10 23:58:44 -04:00