linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-21 20:45:27 -04:00

Author	SHA1	Message	Date
Li Zhijian	87bf646921	RDMA/rxe: Fix race condition in QP timer handlers I encontered the following warning: WARNING: drivers/infiniband/sw/rxe/rxe_task.c:249 at rxe_sched_task+0x1c8/0x238 [rdma_rxe], CPU#0: swapper/0/0 ... libsha1 [last unloaded: ip6_udp_tunnel] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G C 6.19.0-rc5-64k-v8+ #37 PREEMPT Tainted: [C]=CRAP Hardware name: Raspberry Pi 4 Model B Rev 1.2 Call trace: rxe_sched_task+0x1c8/0x238 [rdma_rxe] (P) retransmit_timer+0x130/0x188 [rdma_rxe] call_timer_fn+0x68/0x4d0 __run_timers+0x630/0x888 ... WARNING: drivers/infiniband/sw/rxe/rxe_task.c:38 at rxe_sched_task+0x1c0/0x238 [rdma_rxe], CPU#0: swapper/0/0 ... WARNING: drivers/infiniband/sw/rxe/rxe_task.c:111 at do_work+0x488/0x5c8 [rdma_rxe], CPU#3: kworker/u17:4/93400 ... refcount_t: underflow; use-after-free. WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x138/0x1a0, CPU#3: kworker/u17:4/93400 The issue is caused by a race condition between retransmit_timer() and rxe_destroy_qp, leading to the Queue Pair's (QP) reference count dropping to zero during timer handler execution. It seems this warning is harmless because rxe_qp_do_cleanup() will flush all pending timers and requests. Example of flow causing the issue: CPU0 CPU1 retransmit_timer() { spin_lock_irqsave rxe_destroy_qp() __rxe_cleanup() __rxe_put() // qp->ref_count decrease to 0 rxe_qp_do_cleanup() { if (qp->valid) { rxe_sched_task() { WARN_ON(rxe_read(task->qp) <= 0); } } spin_unlock_irqrestore } spin_lock_irqsave qp->valid = 0 spin_unlock_irqrestore } Ensure the QP's reference count is maintained and its validity is checked within the timer callbacks by adding calls to rxe_get(qp) and corresponding rxe_put(qp) after use. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Fixes: `d946716325` ("RDMA/rxe: Rewrite rxe_task.c") Link: https://patch.msgid.link/20260120074437.623018-1-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-28 05:02:30 -05:00
Konstantin Taranov	a01745ccf7	RDMA/mana_ib: Add device‑memory support Introduce a basic DM implementation that enables creating and registering device memory, and using the associated memory keys for networking operations. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/20260127082649.429018-1-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-27 09:16:11 -05:00
Zilin Guan	9b9d253908	RDMA/mlx5: Fix memory leak in GET_DATA_DIRECT_SYSFS_PATH handler The UVERBS_HANDLER(MLX5_IB_METHOD_GET_DATA_DIRECT_SYSFS_PATH) function allocates memory for the device path using kobject_get_path(). If the length of the device path exceeds the output buffer length, the function returns -ENOSPC but does not free the allocated memory, resulting in a memory leak. Add a kfree() call to the error path to ensure the allocated memory is properly freed. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: `ec7ad65309` ("RDMA/mlx5: Introduce GET_DATA_DIRECT_SYSFS_PATH ioctl") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Link: https://patch.msgid.link/20260126074801.627898-1-zilin@seu.edu.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-27 07:04:18 -05:00
Yi Liu	1956f0a74c	RDMA/uverbs: Validate wqe_size before using it in ib_uverbs_post_send ib_uverbs_post_send() uses cmd.wqe_size from userspace without any validation before passing it to kmalloc() and using the allocated buffer as struct ib_uverbs_send_wr. If a user provides a small wqe_size value (e.g., 1), kmalloc() will succeed, but subsequent accesses to user_wr->opcode, user_wr->num_sge, and other fields will read beyond the allocated buffer, resulting in an out-of-bounds read from kernel heap memory. This could potentially leak sensitive kernel information to userspace. Additionally, providing an excessively large wqe_size can trigger a WARNING in the memory allocation path, as reported by syzkaller. This is inconsistent with ib_uverbs_unmarshall_recv() which properly validates that wqe_size >= sizeof(struct ib_uverbs_recv_wr) before proceeding. Add the same validation for ib_uverbs_post_send() to ensure wqe_size is at least sizeof(struct ib_uverbs_send_wr). Fixes: `c3bea3d2dc` ("RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()") Signed-off-by: Yi Liu <liuy22@mails.tsinghua.edu.cn> Link: https://patch.msgid.link/20260122142900.2356276-2-liuy22@mails.tsinghua.edu.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-26 07:57:44 -05:00
Jacob Moroni	2529aead51	RDMA/irdma: Use CQ ID for CEQE context The hardware allows for an opaque CQ context field to be carried over into CEQEs for the CQ. Previously, a pointer to the CQ was used for this context. In the normal CQ destroy flow, the CEQ ring is scrubbed to remove any preexisting CEQEs for the CQ that may not have been processed yet so that the CQ structure is not dereferenced in the CEQ ISR after the CQ has been freed. However, in some cases, it is possible for a CEQE to be in flight in HW even after the CQ destroy command completion is received, so it could be missed during the scrub. To protect against this, we can take advantage of the CQ table that already exists and use the CQ ID for this context rather than a CQ pointer. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260120212546.1893076-2-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-25 08:54:20 -05:00
Jacob Moroni	2b7c2ba130	RDMA/irdma: Add enum defs for reserved CQs/QPs Added definitions for the special reserved CQs and QPs. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260120212546.1893076-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-25 08:54:20 -05:00
Li Zhijian	12985e5915	RDMA/rxe: Fix iova-to-va conversion for MR page sizes != PAGE_SIZE The current implementation incorrectly handles memory regions (MRs) with page sizes different from the system PAGE_SIZE. The core issue is that rxe_set_page() is called with mr->page_size step increments, but the page_list stores individual struct page pointers, each representing PAGE_SIZE of memory. ib_sg_to_page() has ensured that when i>=1 either a) SG[i-1].dma_end and SG[i].dma_addr are contiguous or b) SG[i-1].dma_end and SG[i].dma_addr are mr->page_size aligned. This leads to incorrect iova-to-va conversion in scenarios: 1) page_size < PAGE_SIZE (e.g., MR: 4K, system: 64K): ibmr->iova = 0x181800 sg[0]: dma_addr=0x181800, len=0x800 sg[1]: dma_addr=0x173000, len=0x1000 Access iova = 0x181800 + 0x810 = 0x182010 Expected VA: 0x173010 (second SG, offset 0x10) Before fix: - index = (0x182010 >> 12) - (0x181800 >> 12) = 1 - page_offset = 0x182010 & 0xFFF = 0x10 - xarray[1] stores system page base 0x170000 - Resulting VA: 0x170000 + 0x10 = 0x170010 (wrong) 2) page_size > PAGE_SIZE (e.g., MR: 64K, system: 4K): ibmr->iova = 0x18f800 sg[0]: dma_addr=0x18f800, len=0x800 sg[1]: dma_addr=0x170000, len=0x1000 Access iova = 0x18f800 + 0x810 = 0x190010 Expected VA: 0x170010 (second SG, offset 0x10) Before fix: - index = (0x190010 >> 16) - (0x18f800 >> 16) = 1 - page_offset = 0x190010 & 0xFFFF = 0x10 - xarray[1] stores system page for dma_addr 0x170000 - Resulting VA: system page of 0x170000 + 0x10 = 0x170010 (wrong) Yi Zhang reported a kernel panic[1] years ago related to this defect. Solution: 1. Replace xarray with pre-allocated rxe_mr_page array for sequential indexing (all MR page indices are contiguous) 2. Each rxe_mr_page stores both struct page* and offset within the system page 3. Handle MR page_size != PAGE_SIZE relationships: - page_size > PAGE_SIZE: Split MR pages into multiple system pages - page_size <= PAGE_SIZE: Store offset within system page 4. Add boundary checks and compatibility validation This ensures correct iova-to-va conversion regardless of MR page size and system PAGE_SIZE relationship, while improving performance through array-based sequential access. Tests on 4K and 64K PAGE_SIZE hosts: - rdma-core/pytests $ ./build/bin/run_tests.py --dev eth0_rxe - blktest: $ TIMEOUT=30 QUICK_RUN=1 USE_RXE=1 NVMET_TRTYPES=rdma ./check nvme srp rnbd [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/ Fixes: `592627ccbd` ("RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20260116032753.2574363-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-25 08:33:19 -05:00
Li Zhijian	d3922f6dad	RDMA/rxe: Remove unused page_offset member In rxe_map_mr_sg(), the `page_offset` member of the `rxe_mr` struct was initialized based on `ibmr.iova`, which will be updated inside ib_sg_to_pages() later. Consequently, the value assigned to `page_offset` was incorrect. However, since `page_offset` was never utilized throughout the code, it can be safely removed to clean up the codebase and avoid future confusion. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20260116032833.2574627-1-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-18 11:46:15 -05:00
Or Har-Toov	18ea78e2ae	IB/mlx5: Fix port speed query for representors When querying speed information for a representor in switchdev mode, the code previously used the first device in the eswitch, which may not match the device that actually owns the representor. In setups such as multi-port eswitch or LAG, this led to incorrect port attributes being reported. Fix this by retrieving the correct core device from the representor's eswitch before querying its port attributes. Fixes: `27f9e0ccb6` ("net/mlx5: Lag, Add single RDMA device in multiport mode") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260115-port-speed-query-fix-v2-1-3bde6a3c78e7@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-18 11:36:59 -05:00
Chiara Meiohas	ebc2164a4c	RDMA/mlx5: Fix UMR hang in LAG error state unload During firmware reset in LAG mode, a race condition causes the driver to hang indefinitely while waiting for UMR completion during device unload. See [1]. In LAG mode the bond device is only registered on the master, so it never sees sys_error events from the slave. During firmware reset this causes UMR waits to hang forever on unload as the slave is dead but the master hasn't entered error state yet, so UMR posts succeed but completions never arrive. Fix this by adding a sys_error notifier that gets registered before MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device(). This ensures error events reach the bond device throughout teardown. [1] Call Trace: __schedule+0x2bd/0x760 schedule+0x37/0xa0 schedule_preempt_disabled+0xa/0x10 __mutex_lock.isra.6+0x2b5/0x4a0 __mlx5_ib_dereg_mr+0x606/0x870 [mlx5_ib] ? __xa_erase+0x4a/0xa0 ? _cond_resched+0x15/0x30 ? wait_for_completion+0x31/0x100 ib_dereg_mr_user+0x48/0xc0 [ib_core] ? rdmacg_uncharge_hierarchy+0xa0/0x100 destroy_hw_idr_uobject+0x20/0x50 [ib_uverbs] uverbs_destroy_uobject+0x37/0x150 [ib_uverbs] __uverbs_cleanup_ufile+0xda/0x140 [ib_uverbs] uverbs_destroy_ufile_hw+0x3a/0xf0 [ib_uverbs] ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs] remove_client_context+0x8b/0xd0 [ib_core] disable_device+0x8c/0x130 [ib_core] __ib_unregister_device+0x10d/0x180 [ib_core] ib_unregister_device+0x21/0x30 [ib_core] __mlx5_ib_remove+0x1e4/0x1f0 [mlx5_ib] auxiliary_bus_remove+0x1e/0x30 device_release_driver_internal+0x103/0x1f0 bus_remove_device+0xf7/0x170 device_del+0x181/0x410 mlx5_rescan_drivers_locked.part.10+0xa9/0x1d0 [mlx5_core] mlx5_disable_lag+0x253/0x260 [mlx5_core] mlx5_lag_disable_change+0x89/0xc0 [mlx5_core] mlx5_eswitch_disable+0x67/0xa0 [mlx5_core] mlx5_unload+0x15/0xd0 [mlx5_core] mlx5_unload_one+0x71/0xc0 [mlx5_core] mlx5_sync_reset_reload_work+0x83/0x100 [mlx5_core] process_one_work+0x1a7/0x360 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x116/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x22/0x40 Fixes: `ede132a5cf` ("RDMA/mlx5: Move events notifier registration to be after device registration") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260113-umr-hand-lag-fix-v1-1-3dc476e00cd9@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-18 11:04:07 -05:00
Konstantin Taranov	f972bde732	RDMA/mana_ib: Take CQ type from the device type Get CQ type from the used gdma device. The MANA_IB_CREATE_RNIC_CQ flag is ignored. It was used in older kernel versions where the mana_ib was shared between ethernet and rnic. Fixes: `d4293f96ce` ("RDMA/mana_ib: unify mana_ib functions to support any gdma device") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/20260115093625.177306-1-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-15 06:03:22 -05:00
Jacob Moroni	7874eeacfa	RDMA/iwcm: Fix workqueue list corruption by removing work_list The commit `e1168f0` ("RDMA/iwcm: Simplify cm_event_handler()") changed the work submission logic to unconditionally call queue_work() with the expectation that queue_work() would have no effect if work was already pending. The problem is that a free list of struct iwcm_work is used (for which struct work_struct is embedded), so each call to queue_work() is basically unique and therefore does indeed queue the work. This causes a problem in the work handler which walks the work_list until it's empty to process entries. This means that a single run of the work handler could process item N+1 and release it back to the free list while the actual workqueue entry is still queued. It could then get reused (INIT_WORK...) and lead to list corruption in the workqueue logic. Fix this by just removing the work_list. The workqueue already does this for us. This fixes the following error that was observed when stress testing with ucmatose on an Intel E830 in iWARP mode: [ 151.465780] list_del corruption. next->prev should be ffff9f0915c69c08, but was ffff9f0a1116be08. (next=ffff9f0a15b11c08) [ 151.466639] ------------[ cut here ]------------ [ 151.466986] kernel BUG at lib/list_debug.c:67! [ 151.467349] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 151.467753] CPU: 14 UID: 0 PID: 2306 Comm: kworker/u64:18 Not tainted 6.19.0-rc4+ #1 PREEMPT(voluntary) [ 151.468466] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 151.469192] Workqueue: 0x0 (iw_cm_wq) [ 151.469478] RIP: 0010:__list_del_entry_valid_or_report+0xf0/0x100 [ 151.469942] Code: c7 58 5f 4c b2 e8 10 50 aa ff 0f 0b 48 89 ef e8 36 57 cb ff 48 8b 55 08 48 89 e9 48 89 de 48 c7 c7 a8 5f 4c b2 e8 f0 4f aa ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 [ 151.471323] RSP: 0000:ffffb15644e7bd68 EFLAGS: 00010046 [ 151.471712] RAX: 000000000000006d RBX: ffff9f0915c69c08 RCX: 0000000000000027 [ 151.472243] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f0a37d9c600 [ 151.472768] RBP: ffff9f0a15b11c08 R08: 0000000000000000 R09: c0000000ffff7fff [ 151.473294] R10: 0000000000000001 R11: ffffb15644e7bba8 R12: ffff9f092339ee68 [ 151.473817] R13: ffff9f0900059c28 R14: ffff9f092339ee78 R15: 0000000000000000 [ 151.474344] FS: 0000000000000000(0000) GS:ffff9f0a847b5000(0000) knlGS:0000000000000000 [ 151.474934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 151.475362] CR2: 0000559e233a9088 CR3: 000000020296b004 CR4: 0000000000770ef0 [ 151.475895] PKRU: 55555554 [ 151.476118] Call Trace: [ 151.476331] <TASK> [ 151.476497] move_linked_works+0x49/0xa0 [ 151.476792] __pwq_activate_work.isra.46+0x2f/0xa0 [ 151.477151] pwq_dec_nr_in_flight+0x1e0/0x2f0 [ 151.477479] process_scheduled_works+0x1c8/0x410 [ 151.477823] worker_thread+0x125/0x260 [ 151.478108] ? __pfx_worker_thread+0x10/0x10 [ 151.478430] kthread+0xfe/0x240 [ 151.478671] ? __pfx_kthread+0x10/0x10 [ 151.478955] ? __pfx_kthread+0x10/0x10 [ 151.479240] ret_from_fork+0x208/0x270 [ 151.479523] ? __pfx_kthread+0x10/0x10 [ 151.479806] ret_from_fork_asm+0x1a/0x30 [ 151.480103] </TASK> Fixes: `e1168f09b3` ("RDMA/iwcm: Simplify cm_event_handler()") Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260112020006.1352438-1-jmoroni@google.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-15 04:59:53 -05:00
Jiasheng Jiang	0beefd0e15	RDMA/rxe: Fix double free in rxe_srq_from_init In rxe_srq_from_init(), the queue pointer 'q' is assigned to 'srq->rq.queue' before copying the SRQ number to user space. If copy_to_user() fails, the function calls rxe_queue_cleanup() to free the queue, but leaves the now-invalid pointer in 'srq->rq.queue'. The caller of rxe_srq_from_init() (rxe_create_srq) eventually calls rxe_srq_cleanup() upon receiving the error, which triggers a second rxe_queue_cleanup() on the same memory, leading to a double free. The call trace looks like this: kmem_cache_free+0x.../0x... rxe_queue_cleanup+0x1a/0x30 [rdma_rxe] rxe_srq_cleanup+0x42/0x60 [rdma_rxe] rxe_elem_release+0x31/0x70 [rdma_rxe] rxe_create_srq+0x12b/0x1a0 [rdma_rxe] ib_create_srq_user+0x9a/0x150 [ib_core] Fix this by moving 'srq->rq.queue = q' after copy_to_user. Fixes: `aae0484e15` ("IB/rxe: avoid srq memory leak") Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Link: https://patch.msgid.link/20260112015412.29458-1-jiashengjiangcool@gmail.com Reviewed-by: Zhu Yanjun <yanjun.Zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-15 04:59:53 -05:00
Chengchang Tang	354e7a6d44	RDMA/hns: Support drain SQ and RQ Some ULPs, e.g. rpcrdma, rely on drain_qp() to ensure all outstanding requests are completed before releasing related memory. If drain_qp() fails, ULPs may release memory directly, and in-flight WRs may later be flushed after the memory is freed, potentially leading to UAF. drain_qp() failures can happen when HW enters an error state or is reset. Add support to drain SQ and RQ in such cases by posting a fake WR during reset, so the driver can process all remaining WRs in sequence and generate corresponding completions. Always invoke comp_handler() in drain process to ensure completions are not lost under concurrency (e.g. concurrent post_send() and reset, or QPs created during reset). If the CQ is already processed, cancel any already scheduled comp_handler() to avoid concurrency issues. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260108113032.856306-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-15 04:59:53 -05:00
Jacob Moroni	5c3f795d17	RDMA/irdma: Remove fixed 1 ms delay during AH wait loop The AH CQP command wait loop executes in an atomic context and was using a fixed 1 ms delay. Since many AH create commands can complete much faster than 1 ms, use poll_timeout_us_atomic with a 1 us delay. Also, use the timeout value indicated during the capability exchange rather than a hard-coded value. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260105180550.2907858-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:19:11 -05:00
Jacob Moroni	52f3d34c29	RDMA/irdma: Remove redundant dma_wmb() before writel() A dma_wmb() is not necessary before a writel() because writel() already has an even stronger store barrier. A dma_wmb() is only required to order writes to consistent/DMA memory whereas the barrier in writel() is specified to order writes to DMA memory as well as MMIO. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260103172517.2088895-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:37 -05:00
Grzegorz Prajsner	88f2bf22d9	RDMA/rtrs-srv: Fix error print in process_info_req() rtrs_srv_change_state() returns bool (true on success) therefore there is no reason to print error when it fails as it always will be 0. Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-11-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:14 -05:00
Md Haris Iqbal	fc29063070	RDMA/rtrs-clt: For conn rejection use actual err number When the connection establishment request is rejected from the server side, then the actual error number sent back should be used. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-10-haris.iqbal@ionos.com Reviewed-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:14 -05:00
Kim Zhu	6405f72e7a	RDMA/rtrs: Extend log message when a port fails Add HCA name and port of this HCA. This would help with analysing and debugging the logs. The logs would looks something like this, rtrs_server L2516: Handling event: port error (10). HCA name: mlx4_0, port num: 2 rtrs_client L3326: Handling event: port error (10). HCA name: mlx4_0, port num: 1 Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-9-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:14 -05:00
Kim Zhu	b034a10fdf	RDMA/rtrs-srv: Rate-limit I/O path error logging Excessive error logging is making it difficult to identify the root cause of issues. Implement rate limiting to improve log clarity. Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-8-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:13 -05:00
Md Haris Iqbal	c32eaba2d7	RDMA/rtrs-srv: Add check and closure for possible zombie paths During several network incidents, a number of RTRS paths for a session went through disconnect and reconnect phase. However, some of those did not auto-reconnect successfully. Instead they failed with the following logs, On client, kernel: rtrs_client L1991: <sess-name>: Connect rejected: status 28 (consumer defined), rtrs errno -104 kernel: rtrs_client L2698: <sess-name>: init_conns() failed: err=-104 path=gid:<gid1>@gid:<gid2> [mlx4_0:1] On server, (log a) kernel: ibtrs_server L1868: <>: Connection already exists: 0 When the misbehaving path was removed, and add_path was called to re-add the path, the log on client side changed to, (log b) kernel: rtrs_client L1991: <sess-name>: Connect rejected: status 28 (consumer defined), rtrs errno -17 There was no log on the server side for this, which is expected since there is no logging in that path, if (unlikely(__is_path_w_addr_exists(srv, &cm_id->route.addr))) { err = -EEXIST; goto err; Because of the following check on server side, if (unlikely(sess->state != IBTRS_SRV_CONNECTING)) { ibtrs_err(s, "Session in wrong state: %s\n", .. we know that the path in (log a) was in CONNECTING state. The above state of the path persists for as long as we leave the session be. This means that the path is in some zombie state, probably waiting for the info_req packet to arrive, which never does. The changes in this commits does 2 things. 1) Add logs at places where we see the errors happening. The logs would shed more light at the state and lifetime of such zombie paths. 2) Close such zombie sessions, only if they are in CONNECTING state, and after an inactivity period of 30 seconds. i) The state check prevents closure of paths which are CONNECTED. Also, from the above logs and code, we already know that the path could only be on CONNECTING state, so we play safe and narrow our impact surface area by closing only CONNECTING paths. ii) The inactivity period is to allow requests for other cid to finish processing, or for any stray packets to arrive/fail. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-7-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:13 -05:00
Jack Wang	781c35b5d5	RDMA/rtrs-clt: Remove unused members in rtrs_clt_io_req Remove unused members from rtrs_clt_io_req. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-6-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:13 -05:00
Kim Zhu	f85febf57b	RDMA/rtrs: Improve error logging for RDMA cm events The member variable status in the struct rdma_cm_event is used for both linux errors and the errors definded in rdma stack. Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:13 -05:00
Md Haris Iqbal	9293e04278	RDMA/rtrs: Add optional support for IB_MR_TYPE_SG_GAPS Support IB_MR_TYPE_SG_GAPS, which has less limitations than standard IB_MR_TYPE_MEM_REG, a few ULP support this. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:13 -05:00
Kim Zhu	d6cc7b0d61	RDMA/rtrs: Add error description to the logs Print error description instead of the error number. Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-3-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:13 -05:00
Roman Penyaev	83835f7c07	RDMA/rtrs-srv: fix SG mapping This fixes the following error on the server side: RTRS server session allocation failed: -EINVAL caused by the caller of the `ib_dma_map_sg()`, which does not expect less mapped entries, than requested, which is in the order of things and can be easily reproduced on the machine with enabled IOMMU. The fix is to treat any positive number of mapped sg entries as a successful mapping and cache DMA addresses by traversing modified SG table. Fixes: `9cb8374804` ("RDMA/rtrs: server: main functionality") Signed-off-by: Roman Penyaev <r.peniaev@gmail.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:13 -05:00
Leon Romanovsky	325e3b5431	RDMA/ocrdma: Remove unused OCRDMA_UVERBS definition The OCRDMA_UVERBS() macro is unused, so remove it to clean up the code. Link: https://patch.msgid.link/20260104-ib-core-misc-v1-6-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-01-05 04:04:01 -05:00
Leon Romanovsky	cc016ebeb1	RDMA/qedr: Remove unused defines Perform basic cleanup by removing unused defines from qedr.h. Link: https://patch.msgid.link/20260104-ib-core-misc-v1-5-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-01-05 04:03:46 -05:00
Leon Romanovsky	522a5c1c56	RDMA/mlx5: Avoid direct access to DMA device pointer The dma_device field is marked as internal and must not be accessed by drivers or ULPs. Remove all direct mlx5 references to this field. Link: https://patch.msgid.link/20260104-ib-core-misc-v1-4-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-01-05 04:02:51 -05:00
Maher Sanalla	6dc78c53de	RDMA/mlx5: Fix ucaps init error flow In mlx5_ib_stage_caps_init(), if mlx5_ib_init_ucaps() fails after mlx5_ib_init_var_table() succeeds, the VAR bitmap is leaked since the function returns without cleanup. Thus, cleanup the var table bitmap in case of error of initializing ucaps before exiting, preventing the leak above. Fixes: `cf7174e898` ("RDMA/mlx5: Create UCAP char devices for supported device capabilities") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/20260104-ib-core-misc-v1-3-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 04:02:40 -05:00
Parav Pandit	8d466b155f	RDMA/core: Avoid exporting module local functions and remove not-used ones Some of the functions are local to the module and some are not used starting from commit `36783dec8d` ("RDMA/rxe: Delete deprecated module parameters interface"). Delete and avoid exporting them. Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://patch.msgid.link/20260104-ib-core-misc-v1-2-00367f77f3a8@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 04:02:29 -05:00
Leon Romanovsky	ac7dea328a	RDMA/umem: Remove redundant DMABUF ops check ib_umem_dmabuf_get_with_dma_device() is an in-kernel function and does not require a defensive check for the .move_notify callback. All current callers guarantee that this callback is always present. Link: https://patch.msgid.link/20260104-ib-core-misc-v1-1-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-01-05 04:02:18 -05:00
Or Har-Toov	aaecff5e13	RDMA/mlx5: Implement query_port_speed callback Implement the query_port_speed callback for mlx5 driver to support querying effective port bandwidth. For LAG configurations, query the aggregated speed from the LAG layer or from the modified vport max_tx_speed. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:43:17 -05:00
Or Har-Toov	3fd984d5cd	RDMA/mlx5: Raise async event on device speed change Raise IB_EVENT_DEVICE_SPEED_CHANGE whenever the speed of one of the device's ports changes. Usually all ports of the device changes together. This ensures user applications and upper-layer software are immediately notified when bandwidth changes, improving traffic management in dynamic environments. This is especially useful for vports which are part of a LAG configuration, to know if the effective speed of the LAG was changed. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:43:12 -05:00
Or Har-Toov	51a07ce2fe	IB/core: Add query_port_speed verb Add new ibv_query_port_speed() verb to enable applications to query the effective bandwidth of a port. This verb is particularly useful when the speed is not a multiplication of IB speed and width where width is 2^n. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:43:05 -05:00
Or Har-Toov	d4adeff26c	IB/core: Refactor rate_show to use ib_port_attr_to_rate() Update sysfs rate_show() to rely on ib_port_attr_to_speed_info() for converting IB port speed and width attributes to data rate and speed string. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:42:59 -05:00
Or Har-Toov	2941abac6d	IB/core: Add helper to convert port attributes to data rate Introduce ib_port_attr_to_rate() to compute the data rate in 100 Mbps units (deci-Gb/sec) from a port's active_speed and active_width attributes. This generic helper removes duplicated speed-to-rate calculations, which are used by sysfs and the upcoming new verb. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:42:53 -05:00
Or Har-Toov	263d1d9975	IB/core: Add async event on device speed change Add IB_EVENT_DEVICE_SPEED_CHANGE for notifying user applications on device's ports speed changes. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:42:43 -05:00
Leon Romanovsky	976cc7ac12	Support effective VF bandwidth query in LAG mode Currently, mlx5 driver exposes only the parent function's speed to VFs, providing no way to query the actual effective bandwidth in LAG and MPESW configurations. This limitation prevents userspace and upper-layer software from obtaining accurate bandwidth information, which impacts traffic scheduling decisions. This series addresses this by: 1. Adding mlx5 internal logic to calculate and propagate the effective aggregated LAG speed to all attached vports. The vport speeds are dynamically updated when LAG member link states change. 2. Extending RDMA core with a new ib_query_port_speed() verb and an IB_EVENT_DEVICE_SPEED_CHANGE async event. These interfaces expose the effective port speed to userspace, supporting speeds that are not expressible as IB speed * width (where width is 2^n). This series enables userspace applications to query the effective port speed and receive notifications on speed changes in real-time. In LAG configurations, each mlx5 port reports the aggregated bandwidth of all active LAG members. Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:39:29 -05:00
Or Har-Toov	f0b2fde980	net/mlx5: Add support for querying bond speed Add mlx5_lag_query_bond_speed() to query the aggregated speed of lag configurations with a bond device. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:38:44 -05:00
Or Har-Toov	28ea6036da	net/mlx5: Handle port and vport speed change events in MPESW Add port change event handling logic for MPESW LAG mode, ensuring VFs are updated when the speed of LAG physical ports changes. This triggers a speed update workflow when relevant port state changes occur, enabling consistent and accurate reporting of VF bandwidth. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:38:25 -05:00
Or Har-Toov	50f1d188c5	net/mlx5: Propagate LAG effective max_tx_speed to vports Currently, vports report only their parent's uplink speed, which in LAG setups does not reflect the true aggregated bandwidth. This makes it hard for upper-layer software to optimize load balancing decisions based on accurate bandwidth information. Fix the issue by calculating the possible maximum speed of a LAG as the sum of speeds of all active uplinks that are part of the LAG. Propagate this effective max speed to vports associated with the LAG whenever a relevant event occurs, such as physical port link state changes or LAG creation/modification. With this change, upper-layer components receive accurate bandwidth information corresponding to the active members of the LAG and can make better load balancing decisions. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:38:17 -05:00
Or Har-Toov	3df5dd46fc	net/mlx5: Add max_tx_speed and its CAP bit to IFC Introduce the max_tx_speed field to the query and modify_vport_state structures. Add the esw_vport_state_max_tx_speed capability bit, indicating the firmware support modifying the max_tx_speed field via the MODIFY_VPORT_STATE command. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:37:43 -05:00
Chengchang Tang	0789f92990	RDMA/hns: Notify ULP of remaining soft-WCs during reset During a reset, software-generated WCs cannot be reported via interrupts. This may cause the ULP to miss some WCs. To avoid this, add check in the CQ arm process: if a hardware reset has occurred and there are still unreported soft-WCs, notify the ULP to handle the remaining WCs, thereby preventing any loss of completions. Fixes: `626903e935` ("RDMA/hns: Add support for reporting wc as software mode") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Junxian Huang	84bd5d60f0	RDMA/hns: Fix RoCEv1 failure due to DSCP DSCP is not supported in RoCEv1, but get_dscp() is still called. If get_dscp() returns an error, it'll eventually cause create_ah to fail even when using RoCEv1. Correct the return value and avoid calling get_dscp() when using RoCEv1. Fixes: `ee20cc17e9` ("RDMA/hns: Support DSCP") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Junxian Huang	8cda8acbb1	RDMA/hns: Return actual error code instead of fixed EINVAL query_cqc() and query_mpt() may return various error codes in different cases. Return actual error code instead of fixed EINVAL. Fixes: `f2b070f36d` ("RDMA/hns: Support CQ's restrack raw ops for hns driver") Fixes: `3d67e7e236` ("RDMA/hns: Support MR's restrack raw ops for hns driver") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Chengchang Tang	c0a26bbd3f	RDMA/hns: Fix WQ_MEM_RECLAIM warning When sunrpc is used, if a reset triggered, our wq may lead the following trace: workqueue: WQ_MEM_RECLAIM xprtiod:xprt_rdma_connect_worker [rpcrdma] is flushing !WQ_MEM_RECLAIM hns_roce_irq_workq:flush_work_handle [hns_roce_hw_v2] WARNING: CPU: 0 PID: 8250 at kernel/workqueue.c:2644 check_flush_dependency+0xe0/0x144 Call trace: check_flush_dependency+0xe0/0x144 start_flush_work.constprop.0+0x1d0/0x2f0 __flush_work.isra.0+0x40/0xb0 flush_work+0x14/0x30 hns_roce_v2_destroy_qp+0xac/0x1e0 [hns_roce_hw_v2] ib_destroy_qp_user+0x9c/0x2b4 rdma_destroy_qp+0x34/0xb0 rpcrdma_ep_destroy+0x28/0xcc [rpcrdma] rpcrdma_ep_put+0x74/0xb4 [rpcrdma] rpcrdma_xprt_disconnect+0x1d8/0x260 [rpcrdma] xprt_rdma_connect_worker+0xc0/0x120 [rpcrdma] process_one_work+0x1cc/0x4d0 worker_thread+0x154/0x414 kthread+0x104/0x144 ret_from_fork+0x10/0x18 Since QP destruction frees memory, this wq should have the WQ_MEM_RECLAIM. Fixes: `ffd541d457` ("RDMA/hns: Add the workqueue framework for flush cqe handler") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Etienne AUJAMES	ddd6c8c873	IB/cache: update gid cache on client reregister event Some HCAs (e.g: ConnectX4) do not trigger a IB_EVENT_GID_CHANGE on subnet prefix update from SM (PortInfo). Since the commit `d58c23c925` ("IB/core: Only update PKEY and GID caches on respective events"), the GID cache is updated exclusively on IB_EVENT_GID_CHANGE. If this event is not emitted, the subnet prefix in the IPoIB interface’s hardware address remains set to its default value (0xfe80000000000000). Then rdma_bind_addr() failed because it relies on hardware address to find the port GID (subnet_prefix + port GUID). This patch fixes this issue by updating the GID cache on IB_EVENT_CLIENT_REREGISTER event (emitted on PortInfo::ClientReregister=1). Fixes: `d58c23c925` ("IB/core: Only update PKEY and GID caches on respective events") Signed-off-by: Etienne AUJAMES <eaujames@ddn.com> Link: https://patch.msgid.link/aVUfsO58QIDn5bGX@eaujamesFR0130 Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 09:07:22 -05:00
Lianfa Weng	8818ffb04b	RDMA/hns: Introduce limit_bank mode with better performance In limit_bank mode, QPs/CQs are restricted to using half of the banks. HW concentrates resources on these banks, thereby improving performance compared to the default mode. Switch between limit_bank mode and default mode by setting the cap flag in FW. Since the number of QPs and CQs will be halved, this mode is suitable for scenarios where fewer QPs and CQs are required. Signed-off-by: Lianfa Weng <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20251230154911.3397584-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-31 05:33:34 -05:00
Linus Torvalds	f8f9c1f4d0	Linux 6.19-rc3 v6.19-rc3	2025-12-28 13:24:26 -08:00

1 2 3 4 5 ...

1412584 Commits