Commit Graph

8780 Commits

Author SHA1 Message Date
Linus Torvalds
0c00ed308d Merge tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:

 - Support for batch request processing for ublk, improving the
   efficiency of the kernel/ublk server communication. This can yield
   nice 7-12% performance improvements

 - Support for integrity data for ublk

 - Various other ublk improvements and additions, including a ton of
   selftests additions and updated

 - Move the handling of blk-crypto software fallback from below the
   block layer to above it. This reduces the complexity of dealing with
   bio splitting

 - Series fixing a number of potential deadlocks in blk-mq related to
   the queue usage counter and writeback throttling and rq-qos debugfs
   handling

 - Add an async_depth queue attribute, to resolve a performance
   regression that's been around for a qhilw related to the scheduler
   depth handling

 - Only use task_work for IOPOLL completions on NVMe, if it is necessary
   to do so. An earlier fix for an issue resulted in all these
   completions being punted to task_work, to guarantee that completions
   were only run for a given io_uring ring when it was local to that
   ring. With the new changes, we can detect if it's necessary to use
   task_work or not, and avoid it if possible.

 - rnbd fixes:
      - Fix refcount underflow in device unmap path
      - Handle PREFLUSH and NOUNMAP flags properly in protocol
      - Fix server-side bi_size for special IOs
      - Zero response buffer before use
      - Fix trace format for flags
      - Add .release to rnbd_dev_ktype

 - MD pull requests via Yu Kuai
      - Fix raid5_run() to return error when log_init() fails
      - Fix IO hang with degraded array with llbitmap
      - Fix percpu_ref not resurrected on suspend timeout in llbitmap
      - Fix GPF in write_page caused by resize race
      - Fix NULL pointer dereference in process_metadata_update
      - Fix hang when stopping arrays with metadata through dm-raid
      - Fix any_working flag handling in raid10_sync_request
      - Refactor sync/recovery code path, improve error handling for
        badblocks, and remove unused recovery_disabled field
      - Consolidate mddev boolean fields into mddev_flags
      - Use mempool to allocate stripe_request_ctx and make sure
        max_sectors is not less than io_opt in raid5
      - Fix return value of mddev_trylock
      - Fix memory leak in raid1_run()
      - Add Li Nan as mdraid reviewer

 - Move phys_vec definitions to the kernel types, mostly in preparation
   for some VFIO and RDMA changes

 - Improve the speed for secure erase for some devices

 - Various little rust updates

 - Various other minor fixes, improvements, and cleanups

* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
  blk-mq: ABI/sysfs-block: fix docs build warnings
  selftests: ublk: organize test directories by test ID
  block: decouple secure erase size limit from discard size limit
  block: remove redundant kill_bdev() call in set_blocksize()
  blk-mq: add documentation for new queue attribute async_dpeth
  block, bfq: convert to use request_queue->async_depth
  mq-deadline: covert to use request_queue->async_depth
  kyber: covert to use request_queue->async_depth
  blk-mq: add a new queue sysfs attribute async_depth
  blk-mq: factor out a helper blk_mq_limit_depth()
  blk-mq-sched: unify elevators checking for async requests
  block: convert nr_requests to unsigned int
  block: don't use strcpy to copy blockdev name
  blk-mq-debugfs: warn about possible deadlock
  blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
  blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
  blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
  blk-rq-qos: fix possible debugfs_mutex deadlock
  blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
  blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
  ...
2026-02-09 17:57:21 -08:00
Xiao Ni
05c8de4f09 md: fix return value of mddev_trylock
A return value of 0 is treaded as successful lock acquisition. In fact, a
return value of 1 means getting the lock successfully.

Link: https://lore.kernel.org/linux-raid/20260127073951.17248-1-xni@redhat.com
Fixes: 9e59d60976 ("md: call del_gendisk in control path")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Closes: https://lore.kernel.org/linux-raid/20250611073108.25463-1-xni@redhat.com/T/#mfa369ef5faa4aa58e13e6d9fdb88aecd862b8f2f
Signed-off-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by:  Li Nan <linan122@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-02-02 15:39:55 +08:00
Zilin Guan
6abc7d5dcf md/raid1: fix memory leak in raid1_run()
raid1_run() calls setup_conf() which registers a thread via
md_register_thread(). If raid1_set_limits() fails, the previously
registered thread is not unregistered, resulting in a memory leak
of the md_thread structure and the thread resource itself.

Add md_unregister_thread() to the error path to properly cleanup
the thread, which aligns with the error handling logic of other paths
in this function.

Compile tested only. Issue found using a prototype static analysis tool
and code review.

Link: https://lore.kernel.org/linux-raid/20260126071533.606263-1-zilin@seu.edu.cn
Fixes: 97894f7d3c ("md/raid1: use the atomic queue limit update APIs")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Li Nan <linan122@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-02-02 15:35:03 +08:00
Linus Torvalds
03610bd6b5 Merge tag 'block-6.19-20260130' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - Fix for an accounting leak in bcache that's been there forever,
   and a related dead code removal

 - Revert of a fix for rnbd that went into this series, but depends
   on other changes that are staged for 7.0

 - NVMe pull request via Keith:
      - TCP target completion race condition fix (Ming)
      - DMA descriptor cleanup fix (Roger)

* tag 'block-6.19-20260130' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  bcache: fix I/O accounting leak in detached_dev_do_request
  bcache: remove dead code in detached_dev_do_request
  nvme-pci: DMA unmap the correct regions in nvme_free_sgls
  Revert "rnbd-clt: fix refcount underflow in device unmap path"
  nvmet: fix race in nvmet_bio_done() leading to NULL pointer dereference
2026-01-30 13:18:32 -08:00
Shida Zhang
4da7c5c3ec bcache: fix I/O accounting leak in detached_dev_do_request
When a bcache device is detached, discard requests are completed
immediately. However, the I/O accounting started in
cached_dev_make_request() is not ended, leading to 100% disk
utilization reports in iostat. Add the missing bio_end_io_acct() call.

Fixes: cafe563591 ("bcache: A block layer cache")
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Acked-by: Coly Li <colyli@fnnas.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-28 19:06:55 -07:00
Shida Zhang
6ea84d7a92 bcache: remove dead code in detached_dev_do_request
bio_alloc_clone() with GFP_NOIO and a mempool will not return NULL.
Remove the unnecessary NULL check.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-28 19:06:55 -07:00
Heinz Mauelshagen
cefcb9297f md raid: fix hang when stopping arrays with metadata through dm-raid
When using device-mapper's dm-raid target, stopping a RAID array can cause
the system to hang under specific conditions.

This occurs when:

- A dm-raid managed device tree is suspended from top to bottom
   (the top-level RAID device is suspended first, followed by its
    underlying metadata and data devices)

- The top-level RAID device is then removed

Removing the top-level device triggers a hang in the following sequence:
the dm-raid destructor calls md_stop(), which tries to flush the
write-intent bitmap by writing to the metadata sub-devices. However, these
devices are already suspended, making them unable to complete the write-intent
operations and causing an indefinite block.

Fix:

- Prevent bitmap flushing when md_stop() is called from dm-raid
destructor context
  and avoid a quiescing/unquescing cycle which could also cause I/O

- Still allow write-intent bitmap flushing when called from dm-raid
suspend context

This ensures that RAID array teardown can complete successfully even when the
underlying devices are in a suspended state.

This second patch uses md_is_rdwr() to distinguish between suspend and
destructor paths as elaborated on above.

Link: https://lore.kernel.org/linux-raid/CAM23VxqYrwkhKEBeQrZeZwQudbiNey2_8B_SEOLqug=pXxaFrA@mail.gmail.com
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:46:40 +08:00
Jiasheng Jiang
f150e753cb md-cluster: fix NULL pointer dereference in process_metadata_update
The function process_metadata_update() blindly dereferences the 'thread'
pointer (acquired via rcu_dereference_protected) within the wait_event()
macro.

While the code comment states "daemon thread must exist", there is a valid
race condition window during the MD array startup sequence (md_run):

1. bitmap_load() is called, which invokes md_cluster_ops->join().
2. join() starts the "cluster_recv" thread (recv_daemon).
3. At this point, recv_daemon is active and processing messages.
4. However, mddev->thread (the main MD thread) is not initialized until
   later in md_run().

If a METADATA_UPDATED message is received from a remote node during this
specific window, process_metadata_update() will be called while
mddev->thread is still NULL, leading to a kernel panic.

To fix this, we must validate the 'thread' pointer. If it is NULL, we
release the held lock (no_new_dev_lockres) and return early, safely
ignoring the update request as the array is not yet fully ready to
process it.

Link: https://lore.kernel.org/linux-raid/20260117145903.28921-1-jiashengjiangcool@gmail.com
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:36:08 +08:00
Jack Wang
46ef85f854 md/bitmap: fix GPF in write_page caused by resize race
A General Protection Fault occurs in write_page() during array resize:
RIP: 0010:write_page+0x22b/0x3c0 [md_mod]

This is a use-after-free race between bitmap_daemon_work() and
__bitmap_resize(). The daemon iterates over `bitmap->storage.filemap`
without locking, while the resize path frees that storage via
md_bitmap_file_unmap(). `quiesce()` does not stop the md thread,
allowing concurrent access to freed pages.

Fix by holding `mddev->bitmap_info.mutex` during the bitmap update.

Link: https://lore.kernel.org/linux-raid/20260120102456.25169-1-jinpu.wang@ionos.com
Closes: https://lore.kernel.org/linux-raid/CAMGffE=Mbfp=7xD_hYxXk1PAaCZNSEAVeQGKGy7YF9f2S4=NEA@mail.gmail.com/T/#u
Cc: stable@vger.kernel.org
Fixes: d60b479d17 ("md/bitmap: add bitmap_resize function to allow bitmap resizing.")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:36:06 +08:00
Yu Kuai
d119bd2e16 md/md-llbitmap: fix percpu_ref not resurrected on suspend timeout
When llbitmap_suspend_timeout() times out waiting for percpu_ref to
become zero, it returns -ETIMEDOUT without resurrecting the percpu_ref.
The caller (md_llbitmap_daemon_fn) then continues to the next page
without calling llbitmap_resume(), leaving the percpu_ref in a killed
state permanently.

Fix this by resurrecting the percpu_ref before returning the error,
ensuring the page control structure remains usable for subsequent
operations.

Link: https://lore.kernel.org/linux-raid/20260123182623.3718551-3-yukuai@fnnas.com
Fixes: 5ab829f197 ("md/md-llbitmap: introduce new lockless bitmap")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2026-01-26 13:25:31 +08:00
Yu Kuai
cd1635d844 md/raid5: fix IO hang with degraded array with llbitmap
When llbitmap bit state is still unwritten, any new write should force
rcw, as bitmap_ops->blocks_synced() is checked in handle_stripe_dirtying().
However, later the same check is missing in need_this_block(), causing
stripe to deadloop during handling because handle_stripe() will decide
to go to handle_stripe_fill(), meanwhile need_this_block() always return
0 and nothing is handled.

Link: https://lore.kernel.org/linux-raid/20260123182623.3718551-2-yukuai@fnnas.com
Fixes: 5ab829f197 ("md/md-llbitmap: introduce new lockless bitmap")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2026-01-26 13:18:59 +08:00
Li Nan
5d1dd57929 md: remove recovery_disabled
'recovery_disabled' logic is complex and confusing, originally intended to
preserve raid in extreme scenarios. It was used in following cases:
- When sync fails and setting badblocks also fails, kick out non-In_sync
  rdev and block spare rdev from joining to preserve raid [1]
- When last backup is unavailable, prevent repeated add-remove of spares
  triggering recovery [2]

The original issues are now resolved:
- Error handlers in all raid types prevent last rdev from being kicked out
- Disks with failed recovery are marked Faulty and can't re-join

Therefore, remove 'recovery_disabled' as it's no longer needed.

[1] 5389042ffa ("md: change managed of recovery_disabled.")
[2] 4044ba58dd ("md: don't retry recovery of raid1 that fails due to error on source drive.")

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-13-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:17:38 +08:00
Li Nan
7435b73f05 md/raid10: cleanup skip handling in raid10_sync_request
Skip a sector in raid10_sync_request() when it needs no syncing or no
readable device exists. Current skip handling is unnecessary:

- Use 'skip' label to reissue the next sector instead of return directly
- Complete sync and return 'max_sectors' when multiple sectors are skipped
  due to badblocks

The first is error-prone. For example, commit bc49694a9e ("md: pass in
max_sectors for pers->sync_request()") removed redundant max_sector
assignments. Since skip modifies max_sectors, `goto skip` leaves
max_sectors equal to sector_nr after the jump, which is incorrect.

The second causes sync to complete erroneously when no actual sync occurs.
For recovery, recording badblocks and continue syncing subsequent sectors
is more suitable. For resync, just skip bad sectors and syncing subsequent
sectors.

Clean up complex and unnecessary skip code. Return immediately when a
sector should be skipped. Reduce code paths and lower regression risk.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-12-linan666@huaweicloud.com
Fixes: bc49694a9e ("md: pass in max_sectors for pers->sync_request()")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:26 +08:00
Li Nan
99582edb3f md/raid10: fix any_working flag handling in raid10_sync_request
In raid10_sync_request(), 'any_working' indicates if any IO will
be submitted. When there's only one In_sync disk with badblocks,
'any_working' might be set to 1 but no IO is submitted. Fix it by
setting 'any_working' after badblock checks.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-11-linan666@huaweicloud.com
Fixes: e875ecea26 ("md/raid10 record bad blocks as needed during recovery.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:25 +08:00
Li Nan
8ff59a7247 md: move finish_reshape to md_finish_sync()
finish_reshape implementations of raid10 and raid5 only update mddev
and rdev configurations. Move these operations to md_finish_sync() as
it is more appropriate.

No functional changes.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-10-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:23 +08:00
Li Nan
6dd3aa08e8 md: factor out sync completion update into helper
Repeatedly reading 'mddev->recovery' flags in md_do_sync() may introduce
potential risk if this flag is modified during sync, leading to incorrect
offset updates. Therefore, replace direct 'mddev->recovery' checks with
'action'.

Move sync completion update logic into helper md_finish_sync(), which
improves readability and maintainability.

The reshape completion update remains safe as it only updated after
successful reshape when MD_RECOVERY_INTR is not set and 'curr_resync'
equals 'max_sectors'.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-9-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:21 +08:00
Li Nan
af9c40ff5a md: remove MD_RECOVERY_ERROR handling and simplify resync_offset update
Following previous patch "md: update curr_resync_completed even when
MD_RECOVERY_INTR is set", 'curr_resync_completed' always equals
'curr_resync' for resync, so MD_RECOVERY_ERROR can be removed.

Also, simplify resync_offset update logic.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-8-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:19 +08:00
Li Nan
cc0dab317a md: update curr_resync_completed even when MD_RECOVERY_INTR is set
An error sync IO may be done and sub 'recovery_active' while its
error handling work is pending. This work sets 'recovery_disabled'
and MD_RECOVERY_INTR, then later removes the bad disk without Faulty
flag. If 'curr_resync_completed' is updated before the disk is removed,
it could lead to reading from sync-failed regions.

With the previous patch, error IO will set badblocks or mark rdev as
Faulty, sync-failed regions are no longer readable. After waiting for
'recovery_active' to reach 0 (in the previous line), all sync IO has
*completed*, regardless of whether MD_RECOVERY_INTR is set. Thus, the
MD_RECOVERY_INTR check can be removed.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-7-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:17 +08:00
Li Nan
fd4d44c14f md: mark rdev Faulty when badblocks setting fails
Currently when sync read fails and badblocks set fails (exceeding
512 limit), rdev isn't immediately marked Faulty. Instead
'recovery_disabled' is set and non-In_sync rdevs are removed later.
This preserves array availability if bad regions aren't read, but bad
sectors might be read by users before rdev removal. This occurs due
to incorrect resync/recovery_offset updates that include these bad
sectors.

When badblocks exceed 512, keeping the disk provides little benefit
while adding complexity. Prompt disk replacement is more important.
Therefore when badblocks set fails, directly call md_error to mark rdev
Faulty immediately, preventing potential data access issues.

After this change, cleanup of offset update logic and 'recovery_disabled'
handling will follow.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-6-linan666@huaweicloud.com
Fixes: 5e5702898e ("md/raid10: Handle read errors during recovery better.")
Fixes: 3a9f28a511 ("md/raid1: improve handling of read failure during recovery.")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:15 +08:00
Li Nan
aa9d12cfa1 md: break remaining operations on badblocks set failure in narrow_write_error
Mark device faulty and exit at once when setting badblocks fails in
narrow_write_error(). No need to continue processing remaining sections.
With this change, narrow_write_error() no longer needs to return a value,
so adjust its return type to void.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-5-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:12 +08:00
Li Nan
4870b0f59c md/raid1,raid10: support narrow_write_error when badblocks is disabled
When badblocks.shift < 0 (badblocks disabled), narrow_write_error()
return false, preventing write error handling. Since narrow_write_error()
only splits IO into smaller sizes and re-submits, it can work with
badblocks disabled.

Adjust to use the logical block size for block_sectors when badblocks is
disabled, allowing narrow_write_error() to function in this case.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-4-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:09 +08:00
Li Nan
2a5d4549a2 md: factor error handling out of md_done_sync into helper
The 'ok' parameter in md_done_sync() is redundant for most callers that
always pass 'true'. Factor error handling logic into a separate helper
function md_sync_error() to eliminate unnecessary parameter passing and
improve code clarity.

No functional changes introduced.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-3-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:07 +08:00
Li Nan
090856dd85 md/raid1: simplify uptodate handling in end_sync_write
In end_sync_write, r1bio state is always set to either R1BIO_WriteError
or R1BIO_MadeGood. Consequently, put_sync_write_buf() never takes the
'else' branch that calls md_done_sync(), making the uptodate parameter
have no practical effect.

Pass 1 to put_sync_write_buf(). A more complete cleanup will be done in
a follow-up patch.

Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-2-linan666@huaweicloud.com
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:16:05 +08:00
Yu Kuai
4ffe28ed0d md/raid5: make sure max_sectors is not less than io_opt
Otherwise, even if user issue IO by io_opt, such IO will be split
by max_sectors before they are submitted to raid5. For consequence,
full stripe IO is impossible.

BTW, dm-raid5 is not affected and still have such problem.

Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-7-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2026-01-26 13:11:51 +08:00
Yu Kuai
9340a95d48 md/raid5: use mempool to allocate stripe_request_ctx
On the one hand, stripe_request_ctx is 72 bytes, and it's a bit huge for
a stack variable.

On the other hand, the bitmap sectors_to_do is a fixed size, result in
max_hw_sector_kb of raid5 array is at most 256 * 4k = 1Mb, and this will
make full stripe IO impossible for the array that chunk_size * data_disks
is bigger. Allocate ctx during runtime will make it possible to get rid
of this limit.

Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-6-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2026-01-26 13:11:29 +08:00
Yu Kuai
10787568cc md: merge mddev serialize_policy into mddev_flags
There is not need to use a separate field in struct mddev, there are no
functional changes.

Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-5-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2026-01-26 13:10:51 +08:00
Yu Kuai
4f6d2e648c md: merge mddev faillast_dev into mddev_flags
There is not need to use a separate field in struct mddev, there are no
functional changes.

Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-4-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2026-01-26 13:10:24 +08:00
Yu Kuai
fba4a98040 md: merge mddev has_superblock into mddev_flags
There is not need to use a separate field in struct mddev, there are no
functional changes.

Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-3-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
2026-01-26 13:09:55 +08:00
Yu Kuai
2d9f7150ac md/raid5: fix raid5_run() to return error when log_init() fails
Since commit f63f17350e ("md/raid5: use the atomic queue limit
update APIs"), the abort path in raid5_run() returns 'ret' instead of
-EIO. However, if log_init() fails, 'ret' is still 0 from the previous
successful call, causing raid5_run() to return success despite the
failure.

Fix this by capturing the return value from log_init().

Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-2-yukuai@fnnas.com
Fixes: f63f17350e ("md/raid5: use the atomic queue limit update APIs")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202601130531.LGfcZsa4-lkp@intel.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-26 13:09:42 +08:00
Linus Torvalds
00d20db21e Merge tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - A set of selftest fixes for ublk

 - Fix for a pid mismatch in ublk, comparing PIDs in different
   namespaces if run inside a namespace

 - Fix for a regression added in this release with polling, where the
   nvme tcp connect code would spin forever

 - Zoned device error path fix

 - Tweak the blkzoned uapi additions from this kernel release, making
   them more easily discoverable

 - Fix for a regression in bcache with bio endio handling added in this
   release

* tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  bcache: use bio cloning for detached device requests
  blk-mq: use BLK_POLL_ONESHOT for synchronous poll completion
  selftests/ublk: fix garbage output in foreground mode
  selftests/ublk: fix error handling for starting device
  selftests/ublk: fix IO thread idle check
  block: make the new blkzoned UAPI constants discoverable
  ublk: fix ublksrv pid handling for pid namespaces
  block: Fix an error path in disk_update_zone_resources()
2026-01-23 12:53:56 -08:00
Shida Zhang
3ef825dfd4 bcache: use bio cloning for detached device requests
Previously, bcache hijacked the bi_end_io and bi_private fields of
the incoming bio when the backing device was in a detached state.
This is fragile and breaks if the bio is needed to be processed by
other layers.

This patch transitions to using a cloned bio embedded within a private
structure. This ensures the original bio's metadata remains untouched.

Fixes: 53280e3984 ("bcache: fix improper use of bi_end_io")
Co-developed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Acked-by: Coly Li <colyli@fnnas.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-22 07:24:50 -07:00
Ming Lei
5e2fde1a94 block: pass io_comp_batch to rq_end_io_fn callback
Add a third parameter 'const struct io_comp_batch *' to the rq_end_io_fn
callback signature. This allows end_io handlers to access the completion
batch context when requests are completed via blk_mq_end_request_batch().

The io_comp_batch is passed from blk_mq_end_request_batch(), while NULL
is passed from __blk_mq_end_request() and blk_mq_put_rq_ref() which don't
have batch context.

This infrastructure change enables drivers to detect whether they're
being called from a batched completion path (like iopoll) and access
additional context stored in the io_comp_batch.

Update all rq_end_io_fn implementations:
- block/blk-mq.c: blk_end_sync_rq
- block/blk-flush.c: flush_end_io, mq_flush_data_end_io
- drivers/nvme/host/ioctl.c: nvme_uring_cmd_end_io
- drivers/nvme/host/core.c: nvme_keep_alive_end_io
- drivers/nvme/host/pci.c: abort_endio, nvme_del_queue_end, nvme_del_cq_end
- drivers/nvme/target/passthru.c: nvmet_passthru_req_done
- drivers/scsi/scsi_error.c: eh_lock_door_done
- drivers/scsi/sg.c: sg_rq_end_io
- drivers/scsi/st.c: st_scsi_execute_end
- drivers/target/target_core_pscsi.c: pscsi_req_done
- drivers/md/dm-rq.c: end_clone_request

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-20 10:12:54 -07:00
Jens Axboe
5df832ba5f Merge branch 'block-6.19' into for-7.0/block
Merge in fixes that went to 6.19 after for-7.0/block was branched.
Pending ublk changes depend on particularly the async scan work.

* block-6.19:
  block: zero non-PI portion of auto integrity buffer
  ublk: fix use-after-free in ublk_partition_scan_work
  blk-mq: avoid stall during boot due to synchronize_rcu_expedited
  loop: add missing bd_abort_claiming in loop_set_status
  block: don't merge bios with different app_tags
  blk-rq-qos: Remove unlikely() hints from QoS checks
  loop: don't change loop device under exclusive opener in loop_set_status
  block, bfq: update outdated comment
  blk-mq: skip CPU offline notify on unmapped hctx
  selftests/ublk: fix Makefile to rebuild on header changes
  selftests/ublk: add test for async partition scan
  ublk: scan partition in async way
  block,bfq: fix aux stat accumulation destination
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs
2026-01-11 13:16:36 -07:00
Linus Torvalds
bea82c80a5 Merge tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - Scan partition tables asynchronously for ublk, similarly to how nvme
   does it. This avoids potential deadlocks, which is why nvme does it
   that way too. Includes a set of selftests as well.

 - MD pull request via Yu:
     - Fix null-pointer dereference in raid5 sysfs group_thread_cnt
       store (Tuo Li)
     - Fix possible mempool corruption during raid1 raid_disks update
       via sysfs (FengWei Shih)
     - Fix logical_block_size configuration being overwritten during
       super_1_validate() (Li Nan)
     - Fix forward incompatibility with configurable logical block size:
       arrays assembled on new kernels could not be assembled on older
       kernels (v6.18 and before) due to non-zero reserved pad rejection
       (Li Nan)
     - Fix static checker warning about iterator not incremented (Li Nan)

 - Skip CPU offlining notifications on unmapped hardware queues

 - bfq-iosched block stats fix

 - Fix outdated comment in bfq-iosched

* tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block, bfq: update outdated comment
  blk-mq: skip CPU offline notify on unmapped hctx
  selftests/ublk: fix Makefile to rebuild on header changes
  selftests/ublk: add test for async partition scan
  ublk: scan partition in async way
  block,bfq: fix aux stat accumulation destination
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs
2026-01-02 12:15:59 -08:00
Li Nan
a4166f1c48 md: Fix forward incompatibility from configurable logical block size
Commit 62ed1b5822 ("md: allow configuring logical block size") used
reserved pad to add 'logical_block_size' to metadata. RAID rejects
non-zero reserved pad, so arrays fail when rolling back to old kernels
after booting new ones.

Set 'logical_block_size' only for newly created arrays to support rollback
to old kernels. Importantly new arrays still won't work on old kernels to
prevent data loss issue from LBS changes.

For arrays created on old kernels which confirmed not to rollback,
configure LBS by echo current LBS (queue/logical_block_size) to
md/logical_block_size.

Fixes: 62ed1b5822 ("md: allow configuring logical block size")
Reported-by: BugReports <bugreports61@gmail.com>
Closes: https://lore.kernel.org/linux-raid/825e532d-d1e1-44bb-5581-692b7c091796@huaweicloud.com/T/#t
Signed-off-by: Li Nan <linan122@huawei.com>
Link: https://lore.kernel.org/linux-raid/20251226024221.724201-2-linan666@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-27 10:14:07 +08:00
Li Nan
864466c38c md: Fix logical_block_size configuration being overwritten
In super_1_validate(), mddev->logical_block_size is directly overwritten
with the value from metadata. This causes the previously configured lbs
to be lost, making the configuration ineffective. Fix it.

Fixes: 62ed1b5822 ("md: allow configuring logical block size")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/linux-raid/20251226024221.724201-1-linan666@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-27 10:13:40 +08:00
FengWei Shih
2cc583653b md: suspend array while updating raid_disks via sysfs
In raid1_reshape(), freeze_array() is called before modifying the r1bio
memory pool (conf->r1bio_pool) and conf->raid_disks, and
unfreeze_array() is called after the update is completed.

However, freeze_array() only waits until nr_sync_pending and
(nr_pending - nr_queued) of all buckets reaches zero. When an I/O error
occurs, nr_queued is increased and the corresponding r1bio is queued to
either retry_list or bio_end_io_list. As a result, freeze_array() may
unblock before these r1bios are released.

This can lead to a situation where conf->raid_disks and the mempool have
already been updated while queued r1bios, allocated with the old
raid_disks value, are later released. Consequently, free_r1bio() may
access memory out of bounds in put_all_bios() and release r1bios of the
wrong size to the new mempool, potentially causing issues with the
mempool as well.

Since only normal I/O might increase nr_queued while an I/O error occurs,
suspending the array avoids this issue.

Note: Updating raid_disks via ioctl SET_ARRAY_INFO already suspends
the array. Therefore, we suspend the array when updating raid_disks
via sysfs to avoid this issue too.

Signed-off-by: FengWei Shih <dannyshih@synology.com>
Link: https://lore.kernel.org/linux-raid/20251226101816.4506-1-dannyshih@synology.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-27 09:54:50 +08:00
Tuo Li
7ad6ef91d8 md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
The variable mddev->private is first assigned to conf and then checked:

  conf = mddev->private;
  if (!conf) ...

If conf is NULL, then mddev->private is also NULL. In this case,
null-pointer dereferences can occur when calling raid5_quiesce():

  raid5_quiesce(mddev, true);
  raid5_quiesce(mddev, false);

since mddev->private is assigned to conf again in raid5_quiesce(), and conf
is dereferenced in several places, for example:

  conf->quiesce = 0;
  wake_up(&conf->wait_for_quiescent);

To fix this issue, the function should unlock mddev and return before
invoking raid5_quiesce() when conf is NULL, following the existing pattern
in raid5_change_consistency_policy().

Fixes: fa1944bbe6 ("md/raid5: Wait sync io to finish before changing group cnt")
Signed-off-by: Tuo Li <islituo@gmail.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/linux-raid/20251225130326.67780-1-islituo@gmail.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-27 09:51:35 +08:00
Li Nan
00f6c1b4d1 md: Fix static checker warning in analyze_sbs
The following warn is reported:

 drivers/md/md.c:3912 analyze_sbs()
 warn: iterator 'i' not incremented

Fixes: d8730f0cf4 ("md: Remove deprecated CONFIG_MD_MULTIPATH")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-raid/7e2e95ce-3740-09d8-a561-af6bfb767f18@huaweicloud.com/T/#t
Signed-off-by: Li Nan <linan122@huawei.com>
Link: https://lore.kernel.org/linux-raid/20251215124412.4015572-1-linan666@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-25 15:45:21 +08:00
Linus Torvalds
6a1636e066 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "The only core fix is in doc; all the others are in drivers, with the
  biggest impacts in libsas being the rollback on error handling and in
  ufs coming from a couple of error handling fixes, one causing a crash
  if it's activated before scanning and the other fixing W-LUN
  resumption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Fix confusing cleanup.h syntax
  scsi: libsas: Add rollback handling when an error occurs
  scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
  scsi: ufs: core: Fix a deadlock in the frequency scaling code
  scsi: ufs: core: Fix an error handler crash
  scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
  scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
  scsi: qla4xxx: Use time conversion macros
  scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
  scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
  scsi: imm: Fix use-after-free bug caused by unfinished delayed work
  scsi: target: sbp: Remove KMSG_COMPONENT macro
  scsi: core: Correct documentation for scsi_device_quiesce()
  scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
  scsi: target: Reset t_task_cdb pointer in error case
  scsi: ufs: core: Fix EH failure after W-LUN resume error
2025-12-14 15:35:35 +12:00
Linus Torvalds
35ebee7e72 Merge tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - Always initialize DMA state, fixing a potentially nasty issue on the
   block side

 - btrfs zoned write fix with cached zone reports

 - Fix corruption issues in bcache with chained bio's, and further make
   it clear that the chained IO handler is simply a marker, it's not
   code meant to be executed

 - Kill old code dealing with synchronous IO polling in the block layer,
   that has been dead for a long time. Only async polling is supported
   these days

 - Fix a lockdep issue in tag_set management, moving it to RCU

 - Fix an issue with ublks bio_vec iteration

 - Don't unconditionally enforce blocking issue of ublk control
   commands, allow some of them with non-blocking issue as they
   do not block

* tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  blk-mq-dma: always initialize dma state
  blk-mq: delete task running check in blk_hctx_poll()
  block: fix cached zone reports on devices with native zone append
  block: Use RCU in blk_mq_[un]quiesce_tagset() instead of set->tag_list_lock
  ublk: don't mutate struct bio_vec in iteration
  block: prohibit calls to bio_chain_endio
  bcache: fix improper use of bi_end_io
  ublk: allow non-blocking ctrl cmds in IO_URING_F_NONBLOCK issue
2025-12-12 22:04:18 +12:00
Linus Torvalds
d358e52546 Merge tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:

 - convert crypto_shash users to direct crypto library use with simpler
   and faster code and reduced stack usage (Eric Biggers):

     - the dm-verity SHA-256 conversion also teaches it to do two-way
       interleaved hashing for added performance

     - dm-crypt MD5 conversion (used for Loop-AES compatibility)

 - added document for for takeover/reshape raid1 -> raid5 examples (Heinz Mauelshagen)

 - fix dm-vdo kerneldoc warnings (Matthew Sakai)

 - various random fixes and cleanups

* tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
  dm pcache: fix segment info indexing
  dm pcache: fix cache info indexing
  dm-pcache: advance slot index before writing slot
  dm raid: add documentation for takeover/reshape raid1 -> raid5 table line examples
  dm log-writes: Add missing set_freezable() for freezable kthread
  dm-raid: fix possible NULL dereference with undefined raid type
  dm-snapshot: fix 'scheduling while atomic' on real-time kernels
  dm: ignore discard return value
  MAINTAINERS: add Benjamin Marzinski as a device mapper maintainer
  dm-mpath: Simplify the setup_scsi_dh code
  dm vdo: fix kerneldoc warnings
  dm-bufio: align write boundary on physical block size
  dm-crypt: enable DM_TARGET_ATOMIC_WRITES
  dm: test for REQ_ATOMIC in dm_accept_partial_bio()
  dm-verity: remove useless mempool
  dm-verity: disable recursive forward error correction
  dm-ebs: Mark full buffer dirty even on partial write
  dm mpath: enable DM_TARGET_ATOMIC_WRITES
  dm verity fec: Expose corrected block count via status
  dm: Don't warn if IMA_DISABLE_HTABLE is not enabled
  ...
2025-12-11 12:13:29 +09:00
Li Chen
13ea55ea20 dm pcache: fix segment info indexing
Segment info indexing also used sizeof(struct) instead of the
4K metadata stride, so info_index could point between slots and
subsequent writes would advance incorrectly. Derive info_index
from the pointer returned by the segment meta search using
PCACHE_SEG_INFO_SIZE and advance to the next slot for future
updates.

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org	# 6.18
2025-12-10 19:28:23 +01:00
Li Chen
ee76331783 dm pcache: fix cache info indexing
The on-media cache_info index used sizeof(struct) instead of the
4K metadata stride, so gc_percent updates from dmsetup message
were written between slots and lost after reboot. Use
PCACHE_CACHE_INFO_SIZE in get_cache_info_addr() and align
info_index with the slot returned by pcache_meta_find_latest().

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org	# 6.18
2025-12-10 19:28:23 +01:00
Dongsheng Yang
ebbb90344a dm-pcache: advance slot index before writing slot
In dm-pcache, in order to ensure crash-consistency, a dual-copy scheme
is used to alternately update metadata, and there is a slot index that
records the current slot. However, in the write path the current
implementation writes directly to the current slot indexed by slot
index, and then advances the slot — which ends up overwriting the
existing slot, violating the crash-consistency guarantee.

This patch fixes that behavior, preventing metadata from being
overwritten incorrectly.

In addition, this patch add a missing pmem_wmb() after memcpy_flushcache().

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org	# 6.18
2025-12-10 19:28:23 +01:00
Haotian Zhang
ab08f9c8b3 dm log-writes: Add missing set_freezable() for freezable kthread
The log_writes_kthread() calls try_to_freeze() but lacks set_freezable(),
rendering the freeze attempt ineffective since kernel threads are
non-freezable by default. This prevents proper thread suspension during
system suspend/hibernate.

Add set_freezable() to explicitly mark the thread as freezable.

Fixes: 0e9cebe724 ("dm: add log writes target")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10 19:28:22 +01:00
Alexey Simakov
2f6cfd6d7c dm-raid: fix possible NULL dereference with undefined raid type
rs->raid_type is assigned from get_raid_type_by_ll(), which may return
NULL. This NULL value could be dereferenced later in the condition
'if (!(rs_is_raid10(rs) && rt_is_raid0(rs->raid_type)))'.

Add a fail-fast check to return early with an error if raid_type is NULL,
similar to other uses of this function.

Found by Linux Verification Center (linuxtesting.org) with Svace.

Fixes: 33e53f0685 ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping")
Signed-off-by: Alexey Simakov <bigalex934@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10 19:28:22 +01:00
Mikulas Patocka
8581b19eb2 dm-snapshot: fix 'scheduling while atomic' on real-time kernels
There is reported 'scheduling while atomic' bug when using dm-snapshot on
real-time kernels. The reason for the bug is that the hlist_bl code does
preempt_disable() when taking the lock and the kernel attempts to take
other spinlocks while holding the hlist_bl lock.

Fix this by converting a hlist_bl spinlock into a regular spinlock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Jiping Ma <jiping.ma2@windriver.com>
2025-12-10 19:28:22 +01:00
Chaitanya Kulkarni
f4412c7d5a dm: ignore discard return value
__blkdev_issue_discard() always returns 0, making all error checking
at call sites dead code.

For dm-thin change issue_discard() return type to void, in
passdown_double_checking_shared_status() remove the r assignment from
return value of the issue_discard(), for end_discard() hardcode value of
r to 0 that matches only value returned from __blkdev_issue_discard().

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10 19:28:22 +01:00
Benjamin Marzinski
27f204c215 dm-mpath: Simplify the setup_scsi_dh code
There's no point to the MPATHF_RETAIN_ATTACHED_HW_HANDLER flag any more.
The way setup_scsi_dh() worked, if that flag wasn't set, it would
attempt to attach any passed in hardware handler. This would always fail
if a different hardware handler was attached, which caused
setup_scsi_dh() to rerun as if the flag was set. So the code would
already retain any attached handler, because attaching a different one
would always fail.

Also, the code had a bug. If attached_handler_name was NULL but there
was a scsi device handler attached (because either
scsi_dh_attached_handler_name failed() to allocate a name, a handler got
attached after it was called) the code would loop endlessly.

Instead, ignore MPATHF_RETAIN_ATTACHED_HW_HANDLER, and always free the
passed in handler if *attached_handler_name is set. This simplifies the
code, and avoids the endless loop bug, while keeping the functionality
the same.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10 19:28:22 +01:00