linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 07:51:31 -04:00

Author	SHA1	Message	Date
Nilay Shroff	886f352015	nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown Cancelling the I/O and admin tagsets during nvme-loop controller reset or shutdown is unnecessary. The subsequent destruction of the I/O and admin queues already waits for all in-flight target operations to complete. Cancelling the tagsets first also opens a race window. After a request tag has been cancelled, a late completion from the target may still arrive before the queues are destroyed. In that case the completion path may access a request whose tag has already been cancelled or freed, which can lead to a kernel crash. Please see below the kernel crash encountered while running blktests nvme/040: run blktests nvme/040 at 2026-03-08 06:34:27 loop0: detected capacity change from 0 to 2097152 nvmet: adding nsid 1 to subsystem blktests-subsystem-1 nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349. nvme nvme6: creating 96 I/O queues. nvme nvme6: new ctrl: "blktests-subsystem-1" nvme_log_error: 1 callbacks suppressed block nvme6n1: no usable path - requeuing I/O nvme6c6n1: Read(0x2) @ LBA 2096384, 128 blocks, Host Aborted Command (sct 0x3 / sc 0x71) blk_print_req_error: 1 callbacks suppressed I/O error, dev nvme6c6n1, sector 2096384 op 0x0:(READ) flags 0x2880700 phys_seg 1 prio class 2 block nvme6n1: no usable path - requeuing I/O Kernel attempted to read user page (236) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000236 Faulting instruction address: 0xc000000000961274 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: nvme_loop nvme_fabrics loop nvmet null_blk rpadlpar_io rpaphp xsk_diag bonding rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink pseries_rng dax_pmem vmx_crypto drm drm_panel_orientation_quirks xfs mlx5_core nvme bnx2x sd_mod nd_pmem nd_btt nvme_core sg papr_scm tls libnvdimm ibmvscsi ibmveth scsi_transport_srp nvme_keyring nvme_auth mdio hkdf pseries_wdt dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: loop] CPU: 25 UID: 0 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 7.0.0-rc3+ #14 PREEMPT Hardware name: IBM,9043-MRX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1120.00 (RF1120_128) hv:phyp pSeries NIP: c000000000961274 LR: c008000009af1808 CTR: c00000000096124c REGS: c0000007ffc0f910 TRAP: 0300 Not tainted (7.0.0-rc3+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22222222 XER: 00000000 CFAR: c008000009af232c DAR: 0000000000000236 DSISR: 40000000 IRQMASK: 0 GPR00: c008000009af17fc c0000007ffc0fbb0 c000000001c78100 c0000000be05cc00 GPR04: 0000000000000001 0000000000000000 0000000000000007 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000002 c008000009af2318 GPR12: c00000000096124c c0000007ffdab880 0000000000000000 0000000000000000 GPR16: 0000000000000010 0000000000000000 0000000000000004 0000000000000000 GPR20: 0000000000000001 c000000002ca2b00 0000000100043bb2 000000000000000a GPR24: 000000000000000a 0000000000000000 0000000000000000 0000000000000000 GPR28: c000000084021d40 c000000084021d50 c0000000be05cd60 c0000000be05cc00 NIP [c000000000961274] blk_mq_complete_request_remote+0x28/0x2d4 LR [c008000009af1808] nvme_loop_queue_response+0x110/0x290 [nvme_loop] Call Trace: 0xc00000000502c640 (unreliable) nvme_loop_queue_response+0x104/0x290 [nvme_loop] __nvmet_req_complete+0x80/0x498 [nvmet] nvmet_req_complete+0x24/0xf8 [nvmet] nvmet_bio_done+0x58/0xcc [nvmet] bio_endio+0x250/0x390 blk_update_request+0x2e8/0x68c blk_mq_end_request+0x30/0x5c lo_complete_rq+0x94/0x110 [loop] blk_complete_reqs+0x78/0x98 handle_softirqs+0x148/0x454 do_softirq_own_stack+0x3c/0x50 __irq_exit_rcu+0x18c/0x1b4 irq_exit+0x1c/0x34 do_IRQ+0x114/0x278 hardware_interrupt_common_virt+0x28c/0x290 Since the queue teardown path already guarantees that all target-side operations have completed, cancelling the tagsets is redundant and unsafe. So avoid cancelling the I/O and admin tagsets during controller reset and shutdown. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:06 -07:00
Marco Crivellari	e8e1a4c0fb	nvme: add WQ_PERCPU to alloc_workqueue users This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The refactoring is going to alter the default behavior of alloc_workqueue() to be unbound by default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. For more details see the Link tag below. In order to keep alloc_workqueue() behavior identical, explicitly request WQ_PERCPU. Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Suggested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:06 -07:00
Marco Crivellari	12f5fb5ee1	nvmet-fc: add WQ_PERCPU to alloc_workqueue users This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The refactoring is going to alter the default behavior of alloc_workqueue() to be unbound by default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. For more details see the Link tag below. In order to keep alloc_workqueue() behavior identical, explicitly request WQ_PERCPU. Cc: Justin Tee <justin.tee@broadcom.com> Cc: Naresh Gottumukkala <nareshgottumukkala83@gmail.com> CC: Paul Ely <paul.ely@broadcom.com> Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Suggested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:05 -07:00
Marco Crivellari	3d553be6d2	nvmet: replace use of system_wq with system_percpu_wq This patch continues the effort to refactor workqueue APIs, which began with the changes that introduced new workqueues and a new alloc_workqueue flag: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The point of the refactoring is to eventually alter the default behavior of workqueues to become unbound by default so that their workload placement is optimized by the scheduler. Before that can happen, workqueue users must be converted to the better-named new workqueues with no intended behaviour changes: system_wq -> system_percpu_wq system_unbound_wq -> system_dfl_wq This way the old obsolete workqueues (system_wq, system_unbound_wq) can be removed in the future. Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Suggested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:05 -07:00
Alistair Francis	33eb451044	nvme-auth: Don't propose NVME_AUTH_DHGROUP_NULL with SC_C Section 8.3.4.5.2 of the NVMe 2.1 base spec states that """ The 00h identifier shall not be proposed in an AUTH_Negotiate message that requests secure channel concatenation (i.e., with the SC_C field set to a non-zero value). """ We need to ensure that we don't set the NVME_AUTH_DHGROUP_NULL idlist if SC_C is set. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chris Leech <cleech@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kamaljit Singh <kamaljit.singh@opensource.wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:05 -07:00
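A minimal sketch of the rule quoted in commit 33eb451044, written as a standalone C helper. The function name build_dhgroup_idlist() and the flat array layout are hypothetical and do not mirror the kernel's AUTH_Negotiate code; only the DH group identifier values (00h for NULL, 01h through 05h for ffdhe2048 through ffdhe8192) are taken from the NVMe specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DH group identifiers as listed in the NVMe base spec (00h = NULL DH). */
enum {
    DHGROUP_NULL = 0x00, DHGROUP_2048 = 0x01, DHGROUP_3072 = 0x02,
    DHGROUP_4096 = 0x03, DHGROUP_6144 = 0x04, DHGROUP_8192 = 0x05,
};

/*
 * Hypothetical helper: fill the DH group ID list of an AUTH_Negotiate
 * message and return how many entries were written. The NULL group (00h)
 * is omitted whenever secure channel concatenation (SC_C) is requested.
 */
static size_t build_dhgroup_idlist(uint8_t *idlist, size_t max, uint8_t sc_c)
{
    static const uint8_t groups[] = {
        DHGROUP_NULL, DHGROUP_2048, DHGROUP_3072,
        DHGROUP_4096, DHGROUP_6144, DHGROUP_8192,
    };
    size_t n = 0;

    for (size_t i = 0; i < sizeof(groups) / sizeof(groups[0]) && n < max; i++) {
        /* NVMe 2.1, 8.3.4.5.2: 00h shall not be proposed with SC_C != 0. */
        if (sc_c && groups[i] == DHGROUP_NULL)
            continue;
        idlist[n++] = groups[i];
    }
    return n;
}
```

Without SC_C the list starts with 00h; with SC_C set the same call yields five entries starting at 01h.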
Alistair Francis	09e8f0f934	nvme: Add the DHCHAP maximum HD IDs In preparation for using DHCHAP length in upcoming host and target patches, let's add the hash and Diffie-Hellman ID length macros. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yunje Shin <ioerts@kookmin.ac.kr> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:05 -07:00
Robert Beckett	a8eebf9699	nvme-pci: add NVME_QUIRK_DISABLE_WRITE_ZEROES for Kingston OM3SGP4 The Kingston OM3SGP42048K2-A00 (PCI ID 2646:502f) firmware has a race condition when processing concurrent write zeroes and DSM (discard) commands, causing spurious "LBA Out of Range" errors and IOMMU page faults at address 0x0. The issue is reliably triggered by running two concurrent mkfs commands on different partitions of the same drive, which generates interleaved write zeroes and discard operations. Disable write zeroes for this device, matching the pattern used for other Kingston OM* drives that have similar firmware issues. Cc: stable@vger.kernel.org Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Assisted-by: claude-opus-4-6-v1 Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:05 -07:00
Robert Beckett	40f0496b61	nvme: respect NVME_QUIRK_DISABLE_WRITE_ZEROES when wzsl is set The NVM Command Set Identify Controller data may report a non-zero Write Zeroes Size Limit (wzsl). When present, nvme_init_non_mdts_limits() unconditionally overrides max_zeroes_sectors from wzsl, even if NVME_QUIRK_DISABLE_WRITE_ZEROES previously set it to zero. This effectively re-enables write zeroes for devices that need it disabled, defeating the quirk. Several Kingston OM* drives rely on this quirk to avoid firmware issues with write zeroes commands. Check for the quirk before applying the wzsl override. Fixes: `5befc7c26e` ("nvme: implement non-mdts command limits") Cc: stable@vger.kernel.org Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Assisted-by: claude-opus-4-6-v1 Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:05 -07:00
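The interaction between the Kingston quirk (a8eebf9699) and the wzsl override fix (40f0496b61) can be sketched as a small userspace model. Everything here is hypothetical: the quirk bit value, the table layout, and the helper names are invented for illustration. The conversion assumes wzsl is a power-of-two exponent in units of a 4 KiB minimum memory page size and that max_zeroes_sectors counts 512-byte sectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUIRK_DISABLE_WRITE_ZEROES (1u << 0)   /* hypothetical bit value */

struct pci_quirk {
    uint16_t vendor, device;
    unsigned quirks;
};

/* Hypothetical quirk table entry for the Kingston OM3SGP42048K2-A00. */
static const struct pci_quirk quirk_table[] = {
    { 0x2646, 0x502f, QUIRK_DISABLE_WRITE_ZEROES },
};

static unsigned lookup_quirks(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++)
        if (quirk_table[i].vendor == vendor && quirk_table[i].device == device)
            return quirk_table[i].quirks;
    return 0;
}

/*
 * Apply the Write Zeroes Size Limit only when the quirk is absent, so a
 * non-zero wzsl cannot re-enable write zeroes on a quirked device.
 */
static uint32_t apply_wzsl(uint32_t max_zeroes_sectors, uint8_t wzsl,
                           unsigned quirks)
{
    if (wzsl && !(quirks & QUIRK_DISABLE_WRITE_ZEROES)) {
        unsigned shift = wzsl + 12u - 9u;   /* 2^wzsl * 4096 B / 512 B */
        max_zeroes_sectors = shift >= 32 ? UINT32_MAX : 1u << shift;
    }
    return max_zeroes_sectors;
}
```

For the quirked device, apply_wzsl(0, 5, q) stays 0, while an unquirked device reporting wzsl = 5 gets 256 sectors.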
Caleb Sander Mateos	c4cfe8c328	nvmet: report NPDGL and NPDAL A block device with a very large discard_granularity queue limit may not be able to report it in the 16-bit NPDG and NPDA fields in the Identify Namespace data structure. For this reason, version 2.1 of the NVMe specs added 32-bit fields NPDGL and NPDAL to the NVM Command Set Specific Identify Namespace structure. So report the discard_granularity there too and set OPTPERF to 11b to indicate those fields are supported. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:05 -07:00
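To illustrate why the 32-bit fields matter (c4cfe8c328), here is a sketch of encoding a discard granularity into both NPDG and NPDGL. The struct and function names are hypothetical, the granularity is assumed to be expressed in whole logical blocks, and saturating NPDGL at UINT32_MAX is an assumption rather than documented nvmet behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of encoded fields. */
struct npd_fields {
    uint16_t npdg;   /* 16-bit, 0's based: can express 1..65536 blocks */
    uint32_t npdgl;  /* 32-bit block count, 0 = not reported */
};

static struct npd_fields encode_discard_granularity(uint64_t gran_bytes,
                                                    uint32_t lba_bytes)
{
    uint64_t blocks = gran_bytes / lba_bytes;
    struct npd_fields f;

    if (blocks == 0)
        blocks = 1;   /* at least one logical block */

    /* NPDG cannot represent more than 65536 blocks, so saturate. */
    f.npdg = (uint16_t)((blocks > 65536 ? 65536 : blocks) - 1);

    /* NPDGL has 32 bits; saturate instead of wrapping. */
    f.npdgl = blocks > UINT32_MAX ? UINT32_MAX : (uint32_t)blocks;
    return f;
}
```

A 4 MiB granularity on 512-byte blocks (8192 blocks) fits in both fields, but 64 MiB (131072 blocks) saturates NPDG at 65535 while NPDGL still carries the exact value. Per the commit, a target reporting NPDGL/NPDAL also sets OPTPERF to 11b.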
Caleb Sander Mateos	e0d56e7055	nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT Use the NVME_NS_FEAT_OPTPERF_SHIFT constant in nvmet_bdev_set_limits() to set the OPTPERF bits of the nvme_id_ns NSFEAT field instead of the magic number 4. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:04 -07:00
Caleb Sander Mateos	1029298da3	nvme: set discard_granularity from NPDG/NPDA Currently, nvme_config_discard() always sets the discard_granularity queue limit to the logical block size. However, NVMe namespaces can advertise a larger preferred discard granularity in the NPDG or NPDA field of the Identify Namespace structure or the NPDGL or NPDAL fields of the I/O Command Set Specific Identify Namespace structure. Use these fields to compute the discard_granularity limit. The logic is somewhat involved. First, the fields are optional. NPDG is only reported if the low bit of OPTPERF is set in NSFEAT. NPDA is reported if any bit of OPTPERF is set. And NPDGL and NPDAL are reported if the high bit of OPTPERF is set. NPDGL and NPDAL can also each be set to 0 to opt out of reporting a limit. I/O Command Set Specific Identify Namespace may also not be supported by older NVMe controllers. Another complication is that multiple values may be reported among NPDG, NPDGL, NPDA, and NPDAL. The spec says to prefer the values reported in the L variants. The spec says NPDG should be a multiple of NPDA and NPDGL should be a multiple of NPDAL, but it doesn't specify a relationship between NPDG and NPDAL or NPDGL and NPDA. So use the maximum of the reported NPDG(L) and NPDA(L) values as the discard_granularity. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:04 -07:00
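The selection logic described in 1029298da3 can be sketched in standalone C. The struct discard_hints layout and the helper names are hypothetical. The sketch assumes that OPTPERF has already been extracted as a 2-bit value, that NPDG and NPDA are 0's based 16-bit counts of logical blocks, and that NPDGL and NPDAL are plain block counts where 0 means "not reported". The result is in logical blocks and would be multiplied by the logical block size to obtain discard_granularity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields involved. */
struct discard_hints {
    unsigned optperf;  /* 2-bit OPTPERF value (NSFEAT bits 4-5) */
    uint16_t npdg;     /* 0's based, logical blocks */
    uint16_t npda;     /* 0's based, logical blocks */
    uint32_t npdgl;    /* logical blocks, 0 = not reported */
    uint32_t npdal;    /* logical blocks, 0 = not reported */
    int have_nvm_id;   /* NVM Command Set Identify Namespace available? */
};

static uint32_t decode_0based(uint16_t v) { return (uint32_t)v + 1; }

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Discard granularity in logical blocks (1 if nothing is reported). */
static uint32_t discard_granularity_blocks(const struct discard_hints *h)
{
    uint32_t gran = 1, align = 1;

    if (h->optperf & 0x1)          /* low bit: NPDG is reported */
        gran = decode_0based(h->npdg);
    if (h->optperf)                /* any bit: NPDA is reported */
        align = decode_0based(h->npda);
    if ((h->optperf & 0x2) && h->have_nvm_id) {
        /* High bit: prefer the 32-bit "L" variants when non-zero. */
        if (h->npdgl)
            gran = h->npdgl;
        if (h->npdal)
            align = h->npdal;
    }
    /* No defined relation between the G and A values: take the max. */
    return max_u32(gran, align);
}
```

For example, NPDG = 7 (8 blocks) with NPDAL = 64 and NPDGL = 0 yields 64 blocks, because the non-zero NPDAL overrides NPDA and is larger than the granularity.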
Caleb Sander Mateos	b465046c8c	nvme: add from0based() helper The NVMe specifications are big fans of "0's based"/"0-based" fields for encoding values that must be positive. The encoded value is 1 less than the value it represents. nvmet already provides a helper to0based() for encoding 0's based values, so add a corresponding helper to decode these fields on the host side. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:04 -07:00
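A sketch of the encode/decode pair for 0's based fields (b465046c8c). These are illustrative helpers, not the kernel's to0based()/from0based() definitions, and the clamping range assumes a 16-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a positive count into a 16-bit 0's based field, clamping the
 * input to the representable range 1..65536. */
static uint16_t to0based_u16(uint32_t value)
{
    if (value < 1)
        value = 1;
    if (value > 65536)
        value = 65536;
    return (uint16_t)(value - 1);
}

/* Decode a 0's based field: the stored value is one less than the real
 * one. Widening to 32 bits means 0xffff decodes to 65536 without overflow. */
static uint32_t from0based(uint16_t field)
{
    return (uint32_t)field + 1;
}
```

Round-tripping any value in 1..65536 returns the original count; a stored 0 decodes to 1, which is why such fields cannot express zero.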
Caleb Sander Mateos	823340b7e8	nvme: always issue I/O Command Set specific Identify Namespace Currently, the I/O Command Set specific Identify Namespace structure is only fetched for controllers that support extended LBA formats. This is because struct nvme_id_ns_nvm is only used by nvme_configure_pi_elbas(), which is only called when the ELBAS bit is set in the CTRATT field of the Identify Controller structure. However, the I/O Command Set specific Identify Namespace structure will soon be used in nvme_update_disk_info(), so always try to obtain it in nvme_update_ns_info_block(). This Identify structure is first defined in NVMe spec version 2.0, but controllers reporting older versions could still implement it. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:04 -07:00
Caleb Sander Mateos	d3c04a6ea5	nvme: update nvme_id_ns OPTPERF constants In NVMe version 2.0 and below, OPTPERF comprises only bit 4 of NSFEAT in the Identify Namespace structure. Since version 2.1, OPTPERF includes both bits 4 and 5 of NSFEAT. Replace the NVME_NS_FEAT_IO_OPT constant with NVME_NS_FEAT_OPTPERF_SHIFT, NVME_NS_FEAT_OPTPERF_MASK, and NVME_NS_FEAT_OPTPERF_MASK_2_1, representing the first bit, pre-2.1 bit width, and post-2.1 bit width of OPTPERF. Update nvme_update_disk_info() to check both OPTPERF bits for controllers that report version 2.1 or newer, as NPWG and NOWS are supported even if only bit 5 is set. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:04 -07:00
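The version-dependent OPTPERF extraction from d3c04a6ea5 might look like this. The shift of 4 comes from the commit text (e0d56e7055 replaces the magic number 4 with it); the two mask values, the un-prefixed macro names, and the ns_optperf() helper are assumptions for illustration. The locally defined NVME_VS() follows the NVMe Version register layout (major in bits 31:16, minor in bits 15:8, tertiary in bits 7:0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of the constants described in the commit. */
#define NS_FEAT_OPTPERF_SHIFT    4
#define NS_FEAT_OPTPERF_MASK     0x1u  /* up to 2.0: bit 4 only */
#define NS_FEAT_OPTPERF_MASK_2_1 0x3u  /* 2.1 and later: bits 4 and 5 */

/* NVMe Version register encoding, defined locally for this sketch. */
#define NVME_VS(major, minor, ter) \
    (((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(ter))

/* Extract OPTPERF from NSFEAT, honoring the controller's reported version. */
static unsigned ns_optperf(uint8_t nsfeat, uint32_t vs)
{
    unsigned mask = vs >= NVME_VS(2, 1, 0) ? NS_FEAT_OPTPERF_MASK_2_1
                                           : NS_FEAT_OPTPERF_MASK;
    return (nsfeat >> NS_FEAT_OPTPERF_SHIFT) & mask;
}
```

For a namespace that sets only bit 5 of NSFEAT (0x20), ns_optperf() returns 2 on a 2.1 controller but 0 on a 2.0 controller, matching the commit's point that NPWG and NOWS can be valid with only bit 5 set.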
Caleb Sander Mateos	9110b85244	nvme: fold nvme_config_discard() into nvme_update_disk_info() The choice of what queue limits are set in nvme_update_disk_info() vs. nvme_config_discard() seems a bit arbitrary. A subsequent commit will compute the discard_granularity limit using struct nvme_id_ns, which is only passed to nvme_update_disk_info() currently. So move the logic in nvme_config_discard() to nvme_update_disk_info(). Replace several instances of ns->ctrl in nvme_update_disk_info() with the ctrl variable brought from nvme_config_discard(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:04 -07:00
Caleb Sander Mateos	ac61e869be	nvme: add preferred I/O size fields to struct nvme_id_ns_nvm A subsequent change will use the NPDGL and NPDAL fields of the NVM Command Set Specific Identify Namespace structure, so add them (and the handful of intervening fields) to struct nvme_id_ns_nvm. Add an assertion that the size is still 4 KB. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:03 -07:00
Alistair Francis	ed6a9f7dab	nvme: Allow reauth from sysfs Allow userspace to trigger a reauth (REPLACETLSPSK) from sysfs. This can be done by writing a zero to the sysfs file. echo 0 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/tls_configured_key In order to use the new keys for the admin queue we call controller reset. This isn't ideal, but I can't find a simpler way to reset the admin queue TLS connection. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:03 -07:00
Alistair Francis	56d25f1a6e	nvme: Expose the tls_configured sysfs for secure concat connections Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:03 -07:00
Alistair Francis	2e6eb6b277	nvmet-tcp: Don't free SQ on authentication success Currently, after the host sends a REPLACETLSPSK, we free the TLS keys as part of calling nvmet_auth_sq_free() on success. This means that when the host sends a follow-up REPLACETLSPSK, we return CONCAT_MISMATCH because the check for !nvmet_queue_tls_keyid(req->sq) fails. This patch ensures we don't free the TLS key on success as we might need it again in the future. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:03 -07:00
Alistair Francis	ecf4d2d883	nvmet-tcp: Don't error if TLS is enabed on a reset If the host sends an AUTH_Negotiate message on the admin queue with REPLACETLSPSK set, then we expect and require a TLS connection and shouldn't report an error if TLS is enabled. This change only enforces the nvmet_queue_tls_keyid() check if we aren't resetting the negotiation. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:03 -07:00
Eric Biggers	6d888db2cf	crypto: remove HKDF library Remove crypto/hkdf.c, since it's no longer used. Originally it had two users, but now both of them just inline the needed HMAC computations using the HMAC library APIs. That ends up being better, since it eliminates all the complexity and performance issues associated with the crypto_shash abstraction and multi-step HMAC input formatting. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:03 -07:00
Eric Biggers	26c8c2ddde	nvme-auth: common: remove selections of no-longer used crypto modules Now that nvme-auth uses the crypto library instead of crypto_shash, remove obsolete selections from the NVME_AUTH kconfig option. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:03 -07:00
Eric Biggers	844d950bb2	nvme-auth: common: remove nvme_auth_digest_name() Since nvme_auth_digest_name() is no longer used, remove it and the associated data from the hash_map array. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:02 -07:00
Eric Biggers	16977e7755	nvme-auth: target: use crypto library in nvmet_auth_ctrl_hash() For the HMAC computation in nvmet_auth_ctrl_hash(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Notably, this eliminates the crypto transformation object allocation for every call, which was very slow. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:02 -07:00
Eric Biggers	e501533f67	nvme-auth: target: use crypto library in nvmet_auth_host_hash() For the HMAC computation in nvmet_auth_host_hash(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Notably, this eliminates the crypto transformation object allocation for every call, which was very slow. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:02 -07:00
Eric Biggers	efe8df9f9c	nvme-auth: target: remove obsolete crypto_has_shash() checks Since nvme-auth is now doing its HMAC computations using the crypto library, it's guaranteed that all the algorithms actually work. Therefore, remove the crypto_has_shash() checks which are now obsolete. However, the caller in nvmet_auth_negotiate() seems to have also been relying on crypto_has_shash(nvme_auth_hmac_name(host_hmac_id)) to validate the host_hmac_id. Therefore, make it validate the ID more directly by checking whether nvme_auth_hmac_hash_len() returns 0 or not. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:02 -07:00
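A sketch of validating an HMAC ID by digest length instead of probing crypto_has_shash() (efe8df9f9c), using a fully const lookup table in the spirit of e57406c07b. The table, the helper names, and MAX_DIGEST_SIZE are hypothetical stand-ins. The hash identifiers 01h/02h/03h for SHA-256/SHA-384/SHA-512 follow the NVMe DH-HMAC-CHAP definitions, and the 32/48/64-byte digest lengths are those of SHA-2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hash identifiers used by NVMe in-band authentication. */
enum { HASH_SHA256 = 0x01, HASH_SHA384 = 0x02, HASH_SHA512 = 0x03 };

#define MAX_DIGEST_SIZE 64   /* SHA-512 is the largest supported digest */

/* Fully const table: entries cannot be modified at runtime. */
static const struct {
    uint8_t id;
    size_t len;
} hash_map[] = {
    { HASH_SHA256, 32 },
    { HASH_SHA384, 48 },
    { HASH_SHA512, 64 },
};

/* Digest length for an HMAC ID, or 0 if the ID is unknown. */
static size_t hmac_hash_len(uint8_t id)
{
    for (size_t i = 0; i < sizeof(hash_map) / sizeof(hash_map[0]); i++)
        if (hash_map[i].id == id)
            return hash_map[i].len;
    return 0;
}

/* Validate a peer-proposed HMAC ID without allocating any crypto object. */
static int hmac_id_valid(uint8_t id)
{
    return hmac_hash_len(id) != 0;
}
```

The check becomes a table lookup: an unknown ID yields length 0 and is rejected, and no transformation object is needed just to ask whether an algorithm exists.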
Eric Biggers	ac9a49cf6e	nvme-auth: host: remove allocation of crypto_shash Now that the crypto_shash that is being allocated in nvme_auth_process_dhchap_challenge() and stored in the struct nvme_dhchap_queue_context is no longer used, remove it. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:02 -07:00
Eric Biggers	c4f216c2a9	nvme-auth: host: use crypto library in nvme_auth_dhchap_setup_ctrl_response() For the HMAC computation in nvme_auth_dhchap_setup_ctrl_response(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:02 -07:00
Eric Biggers	6be8d3f043	nvme-auth: host: use crypto library in nvme_auth_dhchap_setup_host_response() For the HMAC computation in nvme_auth_dhchap_setup_host_response(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:02 -07:00
Eric Biggers	d126cbaa7d	nvme-auth: common: use crypto library in nvme_auth_derive_tls_psk() For the HKDF-Expand-Label computation in nvme_auth_derive_tls_psk(), use the crypto library instead of crypto_shash and crypto/hkdf.c. While this means the HKDF "helper" functions are no longer utilized, they clearly weren't buying us much: it's simpler to just inline the HMAC computations directly, and this code needs to be tested anyway. (A similar result was seen in fs/crypto/. As a result, this eliminates the last user of crypto/hkdf.c, which we'll be able to remove as well.) As usual this is also a lot more efficient, eliminating the allocation of a transformation object and multiple other dynamic allocations. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:01 -07:00
Eric Biggers	0002764c2f	nvme-auth: common: use crypto library in nvme_auth_generate_digest() For the HMAC computation in nvme_auth_generate_digest(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Notably, this eliminates the crypto transformation object allocation for every call, which was very slow. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:01 -07:00
Eric Biggers	be01b841d3	nvme-auth: common: use crypto library in nvme_auth_generate_psk() For the HMAC computation in nvme_auth_generate_psk(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Notably, this eliminates the crypto transformation object allocation for every call, which was very slow. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:01 -07:00
Eric Biggers	a67d096fe9	nvme-auth: common: use crypto library in nvme_auth_augmented_challenge() For the hash and HMAC computations in nvme_auth_augmented_challenge(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Notably, this eliminates two crypto transformation object allocations for every call, which was very slow. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:01 -07:00
Eric Biggers	092c05f8de	nvme-auth: common: use crypto library in nvme_auth_transform_key() For the HMAC computation in nvme_auth_transform_key(), use the crypto library instead of crypto_shash. This is simpler, faster, and more reliable. Notably, this eliminates the transformation object allocation for every call, which was very slow. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:01 -07:00
Eric Biggers	4263ca1cae	nvme-auth: common: add HMAC helper functions Add some helper functions for computing HMAC-SHA256, HMAC-SHA384, or HMAC-SHA512 values using the crypto library instead of crypto_shash. These will enable some significant simplifications and performance improvements in nvme-auth. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:01 -07:00
Eric Biggers	4454820b4e	nvme-auth: common: explicitly verify psk_len == hash_len nvme_auth_derive_tls_psk() is always called with psk_len == hash_len. And based on the comments above nvme_auth_generate_psk() and nvme_auth_derive_tls_psk(), this isn't an implementation choice but rather just the length the spec uses. Add a check which makes this explicit, so that when cleaning up nvme_auth_derive_tls_psk() we don't have to retain support for arbitrary values of psk_len. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:00 -07:00
Eric Biggers	0beeca72cf	nvme-auth: rename nvme_auth_generate_key() to nvme_auth_parse_key() This function does not generate a key. It parses the key from the string that the caller passes in. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:00 -07:00
Eric Biggers	f990ad67f0	nvme-auth: common: add KUnit tests for TLS key derivation Unit-test the sequence of function calls that derive tls_psk, so that we can be more confident that changes in the implementation don't break it. Since the NVMe specification doesn't seem to include any test vectors for this (nor does its description of the algorithm seem to match what was actually implemented, for that matter), I just set the expected values to the values that the code currently produces. In the case of SHA-512, nvme_auth_generate_digest() currently returns -EINVAL, so for now the test tests for that too. If it is later determined that some other behavior is needed, the test can be updated accordingly. Tested with: tools/testing/kunit/kunit.py run --kunitconfig drivers/nvme/common/ Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:00 -07:00
Eric Biggers	bf0e2567a6	nvme-auth: use proper argument types For input parameters, use pointer to const. This makes it easier to understand which parameters are inputs and which are outputs. In addition, consistently use char for strings and u8 for binary. This makes it easier to understand what is a string and what is binary data. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:00 -07:00
Eric Biggers	e57406c07b	nvme-auth: common: constify static data Fully constify the dhgroup_map and hash_map arrays. Remove 'const' from individual fields, as it is now redundant. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:00 -07:00
Eric Biggers	9100a28c8b	nvme-auth: add NVME_AUTH_MAX_DIGEST_SIZE constant Define a NVME_AUTH_MAX_DIGEST_SIZE constant and use it in the appropriate places. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-03-27 07:35:00 -07:00
Vasily Gorbik	67807fbaf1	block: fix bio_alloc_bioset slowpath GFP handling bio_alloc_bioset() first strips __GFP_DIRECT_RECLAIM from the optimistic fast allocation attempt with try_alloc_gfp(). If that fast path fails, the slowpath checks saved_gfp to decide whether blocking allocation is allowed, but then still calls mempool_alloc() with the stripped gfp mask. That can lead to a NULL bio pointer being passed into bio_init(). Fix the slowpath by using saved_gfp for the bio and bvec mempool allocations. Fixes: `b520c4eef8` ("block: split bio_alloc_bioset more clearly into a fast and slowpath") Reported-by: syzbot+09ddb593eea76a158f42@syzkaller.appspotmail.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/p01.gc6e9ad5845ad.ttca29g@ub.hpns Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-23 07:58:32 -06:00
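The bug pattern in 67807fbaf1 can be reproduced with a small userspace model. All names and flag values below are hypothetical mocks, not the kernel's GFP bits or allocators. The mock mempool is assumed to be exhausted, so it succeeds only when the caller is allowed to block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
#define MOCK_GFP_DIRECT_RECLAIM 0x400u
#define MOCK_GFP_NOWAIT         0x1u
#define MOCK_GFP_KERNEL         (MOCK_GFP_DIRECT_RECLAIM | 0x1u)

static int pool_obj;

/* Exhausted mempool: only a caller that may block can get an element. */
static void *mock_mempool_alloc(unsigned gfp)
{
    return (gfp & MOCK_GFP_DIRECT_RECLAIM) ? &pool_obj : NULL;
}

/* Opportunistic non-blocking fast path, modeled as always failing. */
static void *mock_fast_alloc(unsigned gfp)
{
    (void)gfp;
    return NULL;
}

/* fixed == 0 models the bug; fixed != 0 models the fix. */
static void *mock_bio_alloc(unsigned gfp, int fixed)
{
    unsigned saved_gfp = gfp;
    void *p;

    gfp &= ~MOCK_GFP_DIRECT_RECLAIM;   /* strip reclaim for the fast path */
    p = mock_fast_alloc(gfp);
    if (p)
        return p;

    /* Slowpath: the decision is based on the caller's original mask... */
    if (!(saved_gfp & MOCK_GFP_DIRECT_RECLAIM))
        return NULL;

    /* ...but the buggy variant still allocates with the stripped mask. */
    return mock_mempool_alloc(fixed ? saved_gfp : gfp);
}
```

With fixed == 0 the slowpath returns NULL even though the caller allowed blocking, which is the situation that fed a NULL bio into bio_init(); passing saved_gfp restores the expected behavior.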
Ming Lei	24d4c90286	ublk: move cold paths out of __ublk_batch_dispatch() for icache efficiency Mark ublk_filter_unused_tags() as noinline since it is only called from the unlikely(needs_filter) branch. Extract the error-handling block from __ublk_batch_dispatch() into a new noinline ublk_batch_dispatch_fail() function to keep the hot path compact and icache-friendly. This also makes __ublk_batch_dispatch() more readable by separating the error recovery logic from the normal dispatch flow. Before: __ublk_batch_dispatch is ~1419 bytes After: __ublk_batch_dispatch is ~1090 bytes (-329 bytes, -23%) Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://patch.msgid.link/20260318014112.3125432-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-22 19:03:52 -06:00
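The sketch below shows the same technique using GCC/Clang extensions. ex_dispatch() and ex_dispatch_fail() are hypothetical stand-ins, not ublk code. The commit uses noinline; the cold attribute here is an extra hint that the function is rarely called. Together they keep the error handling out of the hot function's body.

```c
#include <stddef.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical error path: requeue every tag. Kept out of line and marked cold. */
static __attribute__((noinline, cold)) int ex_dispatch_fail(int *tags, size_t n, int err)
{
    for (size_t i = 0; i < n; i++)
        tags[i] = 0; /* 0 = requeued */
    return err;
}

/* Hot path: in the common case, mark each tag dispatched and return the count. */
static int ex_dispatch(int *tags, size_t n, int err)
{
    if (unlikely(err))
        return ex_dispatch_fail(tags, n, err);
    for (size_t i = 0; i < n; i++)
        tags[i] = 1; /* 1 = dispatched */
    return (int)n;
}
```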
Jens Axboe	713db70d6d	Merge tag 'md-7.1-20260323' of git://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into for-7.1/block Pull MD changes from Yu Kuai: "Bug Fixes: - md: suppress spurious superblock update error message for dm-raid (Chen Cheng) - md/raid1: fix the comparing region of interval tree (Xiao Ni) - md/raid10: fix deadlock with check operation and nowait requests (Josh Hunt) - md/raid5: skip 2-failure compute when other disk is R5_LOCKED (FengWei Shih) - md/md-llbitmap: raise barrier before state machine transition (Yu Kuai) - md/md-llbitmap: skip reading rdevs that are not in_sync (Yu Kuai) Improvements: - md/raid5: set chunk_sectors to enable full stripe I/O splitting (Yu Kuai) Cleanups: - md: remove unused mddev argument from export_rdev (Chen Cheng) - md/raid5: remove stale md_raid5_kick_device() declaration (Chen Cheng) - md/raid5: move handle_stripe() comment to correct location (Chen Cheng)" * tag 'md-7.1-20260323' of git://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: md: remove unused mddev argument from export_rdev md/raid5: move handle_stripe() comment to correct location md/raid5: remove stale md_raid5_kick_device() declaration md/raid1: fix the comparing region of interval tree md/raid5: skip 2-failure compute when other disk is R5_LOCKED md/md-llbitmap: raise barrier before state machine transition md/md-llbitmap: skip reading rdevs that are not in_sync md/raid5: set chunk_sectors to enable full stripe I/O splitting md/raid10: fix deadlock with check operation and nowait requests md: suppress spurious superblock update error message for dm-raid	2026-03-22 13:37:45 -06:00
Chen Cheng	6f507eb2bb	md: remove unused mddev argument from export_rdev The mddev argument in export_rdev() is never used. Remove it to simplify callers. Signed-off-by: Chen Cheng <chencheng@fnnas.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/linux-raid/20260304111417.20777-1-chencheng@fnnas.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com>	2026-03-23 02:15:11 +08:00
Chen Cheng	81c041260a	md/raid5: move handle_stripe() comment to correct location Move the handle_stripe() documentation comment from above analyse_stripe() to directly above handle_stripe() where it belongs. Signed-off-by: Chen Cheng <chencheng@fnnas.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> Link: https://lore.kernel.org/linux-raid/20260304111001.15767-1-chencheng@fnnas.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com>	2026-03-23 02:15:11 +08:00
Chen Cheng	af5c99b8ea	md/raid5: remove stale md_raid5_kick_device() declaration Remove the unused md_raid5_kick_device() declaration from raid5.h - no definition exists for this function. Signed-off-by: Chen Cheng <chencheng@fnnas.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> Link: https://lore.kernel.org/linux-raid/20260304110919.15071-1-chencheng@fnnas.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com>	2026-03-23 02:15:11 +08:00
Xiao Ni	de3544d2e5	md/raid1: fix the comparing region of interval tree An interval tree stores each region as a closed interval [start, end], but raid1 used the wrong end value. For example, bio(A,B) is too big and needs to be split into bio1(A,C-1) and bio2(C,B). With the wrong end value, the region of bio1 is [A,C] and the region of bio2 is [C,B], so bio1 and bio2 overlap at C, which is incorrect. Fix this problem by using the correct end value for the region. Fixes: `d0d2d8ba04` ("md/raid1: introduce wait_for_serialization") Signed-off-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/linux-raid/20260305011839.5118-2-xni@redhat.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com>	2026-03-23 02:15:10 +08:00
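Here is a worked version of the split example, with hypothetical names. It assumes a bio is described by its first sector start and a length nr of at least 1. Such a bio occupies the closed interval [start, start + nr - 1], and the two halves of a split then no longer overlap.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Closed interval [start, last], the form an interval tree stores. */
struct ex_region {
    sector_t start, last;
};

/* A bio of nr (>= 1) sectors from start ends at start + nr - 1, inclusive. */
static struct ex_region ex_bio_region(sector_t start, sector_t nr)
{
    struct ex_region r = { start, start + nr - 1 };
    return r;
}

/* Two closed intervals overlap iff each one starts no later than the other ends. */
static bool ex_overlap(struct ex_region a, struct ex_region b)
{
    return a.start <= b.last && b.start <= a.last;
}
```

For bio(0,15) split at C = 8, the halves are [0,7] and [8,15], which do not overlap. Using C itself as bio1's end gives [0,8], which overlaps [8,15].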
FengWei Shih	52e4324935	md/raid5: skip 2-failure compute when other disk is R5_LOCKED When skip_copy is enabled on a doubly-degraded RAID6, a device that is being written to will be in R5_LOCKED state with R5_UPTODATE cleared. If a new read triggers fetch_block() while the write is still in flight, the 2-failure compute path may select this locked device as a compute target because it is not R5_UPTODATE. Because skip_copy makes the device page point directly to the bio page, reconstructing data into it might be risky. Also, since the compute marks the device R5_UPTODATE, it triggers the WARN_ON in ops_run_io(), which checks that R5_SkipCopy and R5_UPTODATE are not both set. This can be reproduced by running small-range concurrent read/write on a doubly-degraded RAID6 with skip_copy enabled, for example: mdadm -C /dev/md0 -l6 -n6 -R -f /dev/loop[0-3] missing missing echo 1 > /sys/block/md0/md/skip_copy fio --filename=/dev/md0 --rw=randrw --bs=4k --numjobs=8 --iodepth=32 --size=4M --runtime=30 --time_based --direct=1 Fix by checking R5_LOCKED before proceeding with the compute. The compute will be retried once the lock is cleared on I/O completion. Signed-off-by: FengWei Shih <dannyshih@synology.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> Link: https://lore.kernel.org/linux-raid/20260319053351.3676794-1-dannyshih@synology.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com>	2026-03-22 09:57:33 +08:00
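The sketch below is a simplified model of the added check. Hypothetical flag bits stand in for R5_UPTODATE and R5_LOCKED. The real fetch_block() logic in raid5.c considers much more state, so treat this only as an outline of the decision. A 2-failure compute is deferred when the other target device is locked by an in-flight write.

```c
#include <stdbool.h>

/* Hypothetical per-device bits, loosely modelled on R5_UPTODATE / R5_LOCKED. */
#define EX_UPTODATE 0x1u
#define EX_LOCKED   0x2u

/*
 * Decide whether a 2-failure compute may proceed for 'target' when the
 * second compute target is 'other'. A locked 'other' means a write is in
 * flight; returning false defers the compute until I/O completion clears
 * the lock and the stripe is handled again.
 */
static bool ex_can_compute_pair(unsigned int target, unsigned int other)
{
    if (target & EX_UPTODATE)
        return false; /* target already valid: nothing to compute */
    if (other & EX_LOCKED)
        return false; /* other disk busy with a write: retry later */
    return true;
}
```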
Kees Cook	c2d466b9fe	block: partitions: Replace pp_buf with struct seq_buf In preparation for removing the strlcat API[1], replace the char *pp_buf with a struct seq_buf, which tracks the current write position and remaining space internally. This allows for: - Direct use of seq_buf_printf() in place of snprintf()+strlcat() pairs, eliminating local tmp buffers throughout. - Adjacent strlcat() calls that build strings piece-by-piece (e.g., strlcat("["); strlcat(name); strlcat("]")) to be collapsed into single seq_buf_printf() calls. - Simpler call sites: seq_buf_puts() takes only the buffer and string, with no need to pass PAGE_SIZE at every call. The backing buffer allocation is unchanged (__get_free_page), and the output path uses seq_buf_str() to NUL-terminate before passing to printk(). Link: https://github.com/KSPP/linux/issues/370 [1] Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Josh Law <objecting@objecting.org> Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Josh Law <objecting@objecting.org> Link: https://patch.msgid.link/20260321004840.work.670-kees@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-21 08:27:08 -06:00
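For illustration, here is a hypothetical userspace stand-in for struct seq_buf; it is not the kernel implementation or its exact API. It tracks the write position and remaining space internally. One printf-style call can therefore replace a strlcat("[") / strlcat(name) / strlcat("]") sequence, and output that does not fit is truncated while staying NUL-terminated.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct seq_buf. */
struct ex_seq_buf {
    char *buf;
    size_t size; /* total capacity, including the terminating NUL */
    size_t len;  /* characters written so far */
};

static void ex_seq_buf_init(struct ex_seq_buf *s, char *buf, size_t size)
{
    s->buf = buf;
    s->size = size;
    s->len = 0;
    if (size)
        buf[0] = '\0';
}

/* Append formatted text at the tracked position; truncate safely when full. */
static void ex_seq_buf_printf(struct ex_seq_buf *s, const char *fmt, ...)
{
    va_list ap;
    size_t room;
    int n;

    if (s->len + 1 >= s->size)
        return; /* no room left: drop further output */
    room = s->size - s->len;
    va_start(ap, fmt);
    n = vsnprintf(s->buf + s->len, room, fmt, ap); /* always NUL-terminates */
    va_end(ap);
    if (n < 0)
        return;
    /* On truncation, vsnprintf wrote room - 1 characters. */
    s->len += ((size_t)n < room) ? (size_t)n : room - 1;
}
```

For example, ex_seq_buf_printf(&s, " %s", "sda1") followed by ex_seq_buf_printf(&s, "[%s]", "EFI") builds " sda1[EFI]" with no temporary buffers and no size argument at each call site.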


1427967 Commits