linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-02 04:02:30 -04:00

Author	SHA1	Message	Date
Siva Reddy Kallam	4f830cd8d7	RDMA/bng_re: Add infrastructure for enabling Firmware channel Add infrastructure for enabling Firmware channel. Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20251117171136.128193-6-siva.kallam@broadcom.com Reviewed-by: Usman Ansari <usman.ansari@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-24 02:58:29 -05:00
Siva Reddy Kallam	53310b698f	RDMA/bng_re: Allocate required memory resources for Firmware channel Allocate required memory resources for Firmware channel. Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20251117171136.128193-5-siva.kallam@broadcom.com Reviewed-by: Usman Ansari <usman.ansari@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-24 02:58:29 -05:00
Siva Reddy Kallam	745065770c	RDMA/bng_re: Register and get the resources from bnge driver Register and get the basic required resources from bnge driver. Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20251117171136.128193-4-siva.kallam@broadcom.com Reviewed-by: Usman Ansari <usman.ansari@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-24 02:58:29 -05:00
Siva Reddy Kallam	d0da769c19	RDMA/bng_re: Add Auxiliary interface Add basic Auxiliary interface to the driver which supports the BCM5770X NIC family. Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20251117171136.128193-3-siva.kallam@broadcom.com Reviewed-by: Usman Ansari <usman.ansari@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-24 02:58:29 -05:00
Vikas Gupta	8ac050ec3b	bng_en: Add RoCE aux device support Add an auxiliary (aux) device to support RoCE. The base driver is responsible for creating the auxiliary device and allocating the required resources to it, which will be owned by the bnge RoCE driver in future patches. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Link: https://patch.msgid.link/20251117171136.128193-2-siva.kallam@broadcom.com Reviewed-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-20 10:35:56 -05:00
Kalesh AP	fecaa0c74f	RDMA/bnxt_re: Fix wrong check for CQ coalesc support Driver is not creating the debugfs hooks for CQ coalesc parameters because of a wrong check. Fixed the condition check inside bnxt_re_init_cq_coal_debugfs(). Fixes: `cf27490790` ("RDMA/bnxt_re: Add a debugfs entry for CQE coalescing tuning") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20251117061306.1140588-1-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-20 10:30:52 -05:00
Li RongQing	d056bc45b6	RDMA/core: Prevent soft lockup during large user memory region cleanup When a process exits with numerous large, pinned memory regions consisting of 4KB pages, the cleanup of the memory region through __ib_umem_release() may cause soft lockups. This is because unpin_user_page_range_dirty_lock() is called in a tight loop for unpin and releasing page without yielding the CPU. watchdog: BUG: soft lockup - CPU#44 stuck for 26s! [python3:73464] Kernel panic - not syncing: softlockup: hung tasks CPU: 44 PID: 73464 Comm: python3 Tainted: G OEL asm_sysvec_apic_timer_interrupt+0x1b/0x20 RIP: 0010:free_unref_page+0xff/0x190 ? free_unref_page+0xe3/0x190 __put_page+0x77/0xe0 put_compound_head+0xed/0x100 unpin_user_page_range_dirty_lock+0xb2/0x180 __ib_umem_release+0x57/0xb0 [ib_core] ib_umem_release+0x3f/0xd0 [ib_core] mlx5_ib_dereg_mr+0x2e9/0x440 [mlx5_ib] ib_dereg_mr_user+0x43/0xb0 [ib_core] uverbs_free_mr+0x15/0x20 [ib_uverbs] destroy_hw_idr_uobject+0x21/0x60 [ib_uverbs] uverbs_destroy_uobject+0x38/0x1b0 [ib_uverbs] __uverbs_cleanup_ufile+0xd1/0x150 [ib_uverbs] uverbs_destroy_ufile_hw+0x3f/0x100 [ib_uverbs] ib_uverbs_close+0x1f/0xb0 [ib_uverbs] __fput+0x9c/0x280 ____fput+0xe/0x20 task_work_run+0x6a/0xb0 do_exit+0x217/0x3c0 do_group_exit+0x3b/0xb0 get_signal+0x150/0x900 arch_do_signal_or_restart+0xde/0x100 exit_to_user_mode_loop+0xc4/0x160 exit_to_user_mode_prepare+0xa0/0xb0 syscall_exit_to_user_mode+0x27/0x50 do_syscall_64+0x63/0xb0 Fix soft lockup issues by incorporating cond_resched() calls within __ib_umem_release(), and this SG entries are typically grouped in 2MB chunks on x86_64, adding cond_resched() should has minimal performance impact. Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://patch.msgid.link/20251113095317.2628-1-lirongqing@baidu.com Acked-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-13 08:35:16 -05:00
Kalesh AP	d43358cda7	RDMA/restrack: Fix typos in the comments Fix couple of occurrences of the misspelled word "reource" in the comments with the correct spelling "resource". Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20251113105457.879903-1-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-13 06:27:39 -05:00
Tuo Li	a49a9f4555	RDMA/irdma: Remove redundant NULL check of udata in irdma_create_user_ah() The variable udata cannot be NULL because irdma_create_user_ah() always receives it. Therefore, the if() check can be safely removed. Signed-off-by: Tuo Li <islituo@gmail.com> Link: https://patch.msgid.link/20251112120253.68945-1-islituo@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-12 07:10:25 -05:00
Randy Dunlap	4e5cba5bb6	RDMA/cm: Correct typedef and bad line warnings In include/rdma/ib_cm.h: Correct a typedef's kernel-doc notation by adding the 'typedef' keyword to it to avoid a warning. Add a leading " *" to a kernel-doc line to avoid a warning. Warning: ib_cm.h:289 function parameter 'ib_cm_handler' not described in 'int' Warning: ib_cm.h:289 expecting prototype for ib_cm_handler(). Prototype was for int() instead Warning: ib_cm.h:484 bad line: connection message in case duplicates are received. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251112062908.2711007-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-12 04:39:12 -05:00
Leon Romanovsky	736c561950	Expose definition for 1600Gbps link mode Single patch to expose new link mode for 1600Gbps, utilizing 8 lanes at 200Gbps per lane. Signed-off-by: Leon Romanovsky <leon@kernel.org> * mlx5-next: net/mlx5: Expose definition for 1600Gbps link mode	2025-11-12 03:52:18 -05:00
Tariq Toukan	5422318e27	net/mlx5: Expose definition for 1600Gbps link mode This patch exposes new link mode for 1600Gbps, utilizing 8 lanes at 200Gbps per lane. Co-developed-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762863888-1092798-1-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-12 03:35:14 -05:00
Ma Ke	a338d6e849	RDMA/rtrs: server: Fix error handling in get_or_create_srv After device_initialize() is called, use put_device() to release the device according to kernel device management rules. While direct kfree() work in this case, using put_device() is more correct. Found by code review. Fixes: `9cb8374804` ("RDMA/rtrs: server: main functionality") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Link: https://patch.msgid.link/20251110005158.13394-1-make24@iscas.ac.cn Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-10 03:09:41 -05:00
Marco Crivellari	5c467151f6	IB/isert: add WQ_PERCPU to alloc_workqueue users Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251107133626.190952-1-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 06:30:58 -05:00
Marco Crivellari	65d21dee53	IB/iser: add WQ_PERCPU to alloc_workqueue users Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251107133306.187939-1-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 06:30:48 -05:00
Jacob Moroni	5dd68a5914	RDMA/irdma: Remove unused CQ registry The CQ registry was never actually used (ceq->reg_cq was always NULL), so remove the dead code. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20251105162841.31786-1-jmoroni@google.com Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2025-11-09 06:13:57 -05:00
Patrisious Haddad	6e79e21005	RDMA/mlx5: Add other eswitch support to userspace tables Allows the creation of RDMA TRANSPORT tables over VFs/SFs that belong to another eswitch manager. Which is only possible for PFs that were connected via a create_lag PRM command. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-7-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:21:46 -05:00
Patrisious Haddad	f277662b73	RDMA/mlx5: Refactor _get_prio() function Refactor the _get_prio() function to remove redundant arguments by reusing the existing flow table attributes struct instead of passing attributes separately. This improves code clarity and maintainability. In addition allows downstream patch to add new parameter without needing to change __get_prio() arguments. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-6-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:21:42 -05:00
Patrisious Haddad	5939decc64	RDMA/mlx5: Add other_eswitch support for devx destruction When building a devx object destruction command for steering objects add consideration for other_eswitch argument to allow proper destruction for objects that were created with it. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-5-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:21:38 -05:00
Patrisious Haddad	3506242da0	RDMA/mlx5: Change default device for LAG slaves in RDMA TRANSPORT namespaces In case of a LAG configuration change the root namespace core device for all of the LAG slaves to be the core device of the master device for RDMA_TRANSPORT namespaces, in order to ensure all tables are created through the master device. Once the LAG is disabled revert back to the native core device. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-4-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:21:33 -05:00
Leon Romanovsky	d06ccdc952	Add other eswitch support When the device in switchdev mode, the RDMA device manages all the vports which belong to its representors, which can lead to a situation where the PF that is used to manage the RDMA device isn't the native PF of some of the vports it manages. Add infrastructure to allow the master PF to manage all the hardware resources for the vports under its management. Whereas currently the only such resource is RDMA TRANSPORT steering domains. That is done by adding new FW argument other_eswitch which is passed by the driver to the FW to allow the master PF to properly manage vports belonging to other native PF. Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:17:37 -05:00
Patrisious Haddad	583b4fe1c1	net/mlx5: fs, set non default device per namespace Add mlx5_fs_set_root_dev() function which swaps the root namespace core device with another one for a given table_type. It is intended for usage only by RDMA_TRANSPORT tables in case of LAG configuration, to allow the creation of tables during LAG always through the LAG master device, which is valid since during LAG the master is allowed to manage the RDMA_TRANSPORT tables of its slaves. In addition move the table_type enum to global include to allow its use in a downstream patch in the RDMA driver. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-3-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:16:58 -05:00
Patrisious Haddad	3b848dec7e	net/mlx5: fs, Add other_eswitch support for steering tables Add other_eswitch support which allows flow tables creation above vports that reside on different esw managers. The new flag MLX5_FLOW_TABLE_OTHER_ESWITCH indicates if the esw_owner_vhca_id attribute is supported. Note that this is only supported if the Advanced-RDMA cap- rdma_transport_manager_other_eswitch is set. And it is the caller responsibility to check that. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-2-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:16:53 -05:00
Patrisious Haddad	6948417b3f	net/mlx5: Add OTHER_ESWITCH HW capabilities Add OTHER_ESWITCH capabilities which includes other_eswitch and eswitch_owner_vhca_id to all steering objects. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251029-support-other-eswitch-v1-1-98bb707b5d57@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:16:47 -05:00
Yishai Hadas	2d838c11e1	net/mlx5: Add direct ST mode support for RDMA Add support for direct ST mode where ST Table Location equals PCI_TPH_LOC_NONE. In that case, no steering table exists, the steering tag itself will be used directly by the SW, FW, HW from the mkey. This enables RDMA users to use the current exposed APIs to work in direct mode. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251027-st-direct-mode-v1-2-e0ad953866b6@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:13:34 -05:00
Yishai Hadas	7b8a8ec20c	PCI/TPH: Expose pcie_tph_get_st_table_loc() Expose pcie_tph_get_st_table_loc() to be used by drivers as will be done in the next patch from the series. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20251027-st-direct-mode-v1-1-e0ad953866b6@nvidia.com Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 05:13:02 -05:00
Kalesh AP	cf27490790	RDMA/bnxt_re: Add a debugfs entry for CQE coalescing tuning This patch adds debugfs interfaces that allows the user to enable/disable the RoCE CQ coalescing and fine tune certain CQ coalescing parameters which would be helpful during debug. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20251103043425.234846-1-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-09 04:02:27 -05:00
Randy Dunlap	512c832657	IB/rdmavt: rdmavt_qp.h: clean up kernel-doc comments Correct the kernel-doc comments format to avoid around 35 kernel-doc warnings: - use struct keyword to introduce struct kernel-doc comments - use correct variable name for some struct members - use correct function name in comments for some functions - fix spelling in a few comments - use a ':' instead of '-' to separate struct members from their descriptions - add a function name heading for rvt_div_mtu() This leaves one struct member that is not described: rdmavt_qp.h:206: warning: Function parameter or struct member 'wq' not described in 'rvt_krwq' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251105045127.106822-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Marco Crivellari	7196156b0c	IB/rdmavt: WQ_PERCPU added to alloc_workqueue users Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. CC: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251101163121.78400-6-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Marco Crivellari	5267feda50	RDMA/mlx4: WQ_PERCPU added to alloc_workqueue users Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. CC: Yishai Hadas <yishaih@nvidia.com> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251101163121.78400-5-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Marco Crivellari	5f93287fa9	hfi1: WQ_PERCPU added to alloc_workqueue users Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. CC: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251101163121.78400-4-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Marco Crivellari	e60c5583b6	RDMA/core: WQ_PERCPU added to alloc_workqueue users Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251101163121.78400-3-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Marco Crivellari	f673fb3449	RDMA/core: RDMA/mlx5: replace use of system_unbound_wq with system_dfl_wq Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251101163121.78400-2-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Jay Bhat	da58d4223b	RDMA/irdma: Take a lock before moving SRQ tail in poll_cq Need to take an SRQ lock in poll_cq before moving SRQ tail. Signed-off-by: Jay Bhat <jay.bhat@intel.com> Reviewed-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-7-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Jay Bhat	0a19274555	RDMA/irdma: CQ size and shadow update changes for GEN3 CQ shadow area should not be updated at the end of a page (once every 64th CQ entry), except when CQ has no more CQEs. SW must also increase the requested CQ size by 1 and make sure the CQ is not exactly one page in size. This is to address a quirk in the hardware. Signed-off-by: Jay Bhat <jay.bhat@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-4-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-02 06:52:58 -05:00
Jay Bhat	cd84d8001e	RDMA/irdma: Silently consume unsignaled completions In case we get an unsignaled error completion, we silently consume the CQE by pretending the QP does not exist. Without this, bookkeeping for signaled completions does not work correctly. Signed-off-by: Jay Bhat <jay.bhat@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-5-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-02 06:52:51 -05:00
Jay Bhat	153243086e	RDMA/irdma: Initialize cqp_cmds_info to prevent resource leaks Failure to initialize info.create field to false in certain cases was resulting in incorrect status code going to rdma-core when dereg_mr failed during reset. To fix this, memset entire cqp_request->info in irdma_alloc_and_get_cqp_request() function, so that this is not spread all over the code. Signed-off-by: Bhat, Jay <jay.bhat@intel.com> Reviewed-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-2-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-02 06:52:44 -05:00
Jacob Moroni	69e8e429bc	RDMA/irdma: Enforce local fence for LOCAL_INV WRs Enforce local fence for LOCAL_INV WRs to avoid spurious FASTREG_VALID_MKEY async events during heavy invalidation/registration activity. Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-3-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-02 06:52:33 -05:00
Zhu Yanjun	503a5e4690	RDMA/rxe: Fix null deref on srq->rq.queue after resize failure A NULL pointer dereference can occur in rxe_srq_chk_attr() when ibv_modify_srq() is invoked twice in succession under certain error conditions. The first call may fail in rxe_queue_resize(), which leads rxe_srq_from_attr() to set srq->rq.queue = NULL. The second call then triggers a crash (null deref) when accessing srq->rq.queue->buf->index_mask. Call Trace: <TASK> rxe_modify_srq+0x170/0x480 [rdma_rxe] ? __pfx_rxe_modify_srq+0x10/0x10 [rdma_rxe] ? uverbs_try_lock_object+0x4f/0xa0 [ib_uverbs] ? rdma_lookup_get_uobject+0x1f0/0x380 [ib_uverbs] ib_uverbs_modify_srq+0x204/0x290 [ib_uverbs] ? __pfx_ib_uverbs_modify_srq+0x10/0x10 [ib_uverbs] ? tryinc_node_nr_active+0xe6/0x150 ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2c0/0x470 [ib_uverbs] ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs] ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs] ib_uverbs_run_method+0x55a/0x6e0 [ib_uverbs] ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs] ib_uverbs_cmd_verbs+0x54d/0x800 [ib_uverbs] ? __pfx_ib_uverbs_cmd_verbs+0x10/0x10 [ib_uverbs] ? __pfx___raw_spin_lock_irqsave+0x10/0x10 ? __pfx_do_vfs_ioctl+0x10/0x10 ? ioctl_has_perm.constprop.0.isra.0+0x2c7/0x4c0 ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10 ib_uverbs_ioctl+0x13e/0x220 [ib_uverbs] ? __pfx_ib_uverbs_ioctl+0x10/0x10 [ib_uverbs] __x64_sys_ioctl+0x138/0x1c0 do_syscall_64+0x82/0x250 ? fdget_pos+0x58/0x4c0 ? ksys_write+0xf3/0x1c0 ? __pfx_ksys_write+0x10/0x10 ? do_syscall_64+0xc8/0x250 ? __pfx_vm_mmap_pgoff+0x10/0x10 ? fget+0x173/0x230 ? fput+0x2a/0x80 ? ksys_mmap_pgoff+0x224/0x4c0 ? do_syscall_64+0xc8/0x250 ? do_user_addr_fault+0x37b/0xfe0 ? clear_bhb_loop+0x50/0xa0 ? clear_bhb_loop+0x50/0xa0 ? clear_bhb_loop+0x50/0xa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `8700e3e7c4` ("Soft RoCE driver") Tested-by: Liu Yi <asatsuyu.liu@gmail.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20251027215203.1321-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-28 04:04:07 -04:00
Håkon Bugge	58aca1f3de	RDMA/cm: Base cm_id destruction timeout on CMA values When a GSI MAD packet is sent on the QP, it will potentially be retried CMA_MAX_CM_RETRIES times with a timeout value of: 4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT The above equates to ~64 seconds using the default CMA values. The cm_id_priv's refcount will be incremented for this period. Therefore, the timeout value waiting for a cm_id destruction must be based on the effective timeout of MAD packets. To provide additional leeway, we add 25% to this timeout and use that instead of the constant 10 seconds timeout, which may result in false negatives. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Link: https://patch.msgid.link/20251021132738.4179604-1-haakon.bugge@oracle.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-27 07:36:39 -04:00
Adithya Jayachandran	eea31f21dc	{rdma,net}/mlx5: Query vports mac address from device Before this patch during either switchdev or legacy mode enablement we cleared the mac address of vports between changes. This change allows us to preserve the vports mac address between eswitch mode changes. Vports hold information for VFs/SFs such as the permanent mac address. VF/SF mac can be set either by iproute vf interface or devlink function interface. For no obvious reason we reset it to 0 on switchdev/legacy mode changes, this patch is fixing that, to align with other vport information that are never reset, e.g GUID,mtu,promisc mode, etc .. Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Leon Romanovsky <leon@kernel.org> # RDMA	2025-10-24 20:16:01 -07:00
Randy Dunlap	be180c847a	RDMA/uverbs: fix some kernel-doc warnings Fix 49 kernel-doc warnings in ib_verbs.h: - Add struct short description for rdma_stat_desc, rdma_hw_stats. - Fix kernel-doc format for struct members (use ':' instead of '-') for several structs. - Don't use "/**" kernel-doc notation for struct members in ib_device_ops (most members are not documented and most of the kernel-doc was not formatted correctly). - Spell function parameters correctly in ib_dma_map_sgtable_attrs(), ib_device_try_get(), rdma_roce_rescan_device(). - Add kernel-doc for the function parameter in rdma_flow_label_to_udp_sport(). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251020034320.3011094-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-20 11:45:46 -04:00
Colin Ian King	1511efaca0	RDMA/rxe: Remove redundant assignment to variable page_offset The variable page_offset is being assigned a value at the start of a loop and being redundantly zero'd at the end of the loop, there is no code that reads the zero'd value. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King <coking@nvidia.com> Link: https://patch.msgid.link/20251014120343.2528608-1-coking@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-19 08:06:21 -04:00
Stefan Metzmacher	879424832d	RDMA/core: let rdma_connect_locked() call lockdep_assert_held(&id_priv->handler_mutex) rdma_accept() also has this, so this is now more consistent and may prevent bugs in future. Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20251008165913.444276-1-metze@samba.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-19 07:03:25 -04:00
Alok Tiwari	cfd51fcf11	RDMA/cxgb4: fix typo in write_pbl() debug message Correct the debug log format string from "pdb_addr" -> "pbl_addr" to match the actual variable name. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20250929091102.50384-1-alok.a.tiwari@oracle.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-19 06:51:46 -04:00
Linus Torvalds	3a86608788	Linux 6.18-rc1 v6.18-rc1	2025-10-12 13:42:36 -07:00
Linus Torvalds	3dd7b81235	Merge tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "One revert because of a regression in the I2C core which has sadly not showed up during its time in -next" * tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: boardinfo: Annotate code used in init phase only"	2025-10-12 13:27:56 -07:00
Linus Torvalds	8765f46791	Merge tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Skip interrupt ID 0 in sifive-plic during suspend/resume because ID 0 is reserved and accessing reserved register space could result in undefined behavior - Fix a function's retval check in aspeed-scu-ic * tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume irqchip/aspeed-scu-ic: Fix an IS_ERR() vs NULL check	2025-10-12 08:45:52 -07:00
Linus Torvalds	67029a49db	Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "The previous fix to trace_marker required updating trace_marker_raw as well. The difference between trace_marker_raw from trace_marker is that the raw version is for applications to write binary structures directly into the ring buffer instead of writing ASCII strings. This is for applications that will read the raw data from the ring buffer and get the data structures directly. It's a bit quicker than using the ASCII version. Unfortunately, it appears that our test suite has several tests that test writes to the trace_marker file, but lacks any tests to the trace_marker_raw file (this needs to be remedied). Two issues came about the update to the trace_marker_raw file that syzbot found: - Fix tracing_mark_raw_write() to use per CPU buffer The fix to use the per CPU buffer to copy from user space was needed for both the trace_maker and trace_maker_raw file. The fix for reading from user space into per CPU buffers properly fixed the trace_marker write function, but the trace_marker_raw file wasn't fixed properly. The user space data was correctly written into the per CPU buffer, but the code that wrote into the ring buffer still used the user space pointer and not the per CPU buffer that had the user space data already written. - Stop the fortify string warning from writing into trace_marker_raw After converting the copy_from_user_nofault() into a memcpy(), another issue appeared. As writes to the trace_marker_raw expects binary data, the first entry is a 4 byte identifier. The entry structure is defined as: struct { struct trace_entry ent; int id; char buf[]; }; The size of this structure is reserved on the ring buffer with: size = sizeof(entry) + cnt; Then it is copied from the buffer into the ring buffer with: memcpy(&entry->id, buf, cnt); This use to be a copy_from_user_nofault(), but now converting it to a memcpy() triggers the fortify-string code, and causes a warning. The allocated space is actually more than what is copied, as the cnt used also includes the entry->id portion. Allocating sizeof(entry) plus cnt is actually allocating 4 bytes more than what is needed. Change the size function to: size = struct_size(entry, buf, cnt - sizeof(entry->id)); And update the memcpy() to unsafe_memcpy()" * tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Stop fortify-string from warning in tracing_mark_raw_write() tracing: Fix tracing_mark_raw_write() to use buf and not ubuf	2025-10-11 16:06:04 -07:00
Linus Torvalds	c04022dccb	Merge tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Fix UAPI types check in headers_check.pl - Only enable -Werror for hostprogs with CONFIG_WERROR / W=e - Ignore fsync() error when output of gen_init_cpio is a pipe - Several little build fixes for recent modules.builtin.modinfo series * tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols s390/vmlinux.lds.S: Move .vmlinux.info to end of allocatable sections kbuild: Add '.rel.*' strip pattern for vmlinux kbuild: Restore pattern to avoid stripping .rela.dyn from vmlinux gen_init_cpio: Ignore fsync() returning EINVAL on pipes scripts/Makefile.extrawarn: Respect CONFIG_WERROR / W=e for hostprogs kbuild: uapi: Strip comments before size type check	2025-10-11 15:47:12 -07:00

1 2 3 4 5 ...

1396693 Commits