Commit Graph

1427409 Commits

Author SHA1 Message Date
Marco Crivellari
2bb02691df RDMA/rxe: Replace use of system_unbound_wq with rxe_wq
This patch continues the effort to refactor workqueue APIs, which began
with the changes that introduced new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that can happen, workqueue users must be converted to the better-named
new workqueues, with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

This specific driver already allocates an unbound workqueue named "rxe_wq",
so replace system_unbound_wq with it rather than with system_dfl_wq.
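
A minimal sketch of the conversion this describes (the call site is
hypothetical; only the two queue names come from this message):

```
-	queue_work(system_unbound_wq, &task->work);
+	queue_work(rxe_wq, &task->work);
```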

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20260318152748.837388-1-marco.crivellari@suse.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-30 13:47:43 -04:00
Jacob Moroni
f3cf74933c RDMA/irdma: Add support for GEN4 hardware
GEN4 hardware is similar to GEN3 and requires only a few special cases.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-30 13:47:37 -04:00
Jay Bhat
9d6ba4ced7 RDMA/irdma: Provide scratch buffers to firmware for internal use
For GEN3 and higher, FW requires scratch buffers for bookkeeping
during cleanup, specifically during QP and MR destroy ops.

Signed-off-by: Jay Bhat <jay.bhat@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-30 13:46:56 -04:00
Michael Margolin
5aeb6e0399 RDMA/efa: Rename alloc_ucontext comp_mask to supported_caps
Following discussion [1], rename the comp_mask field in
efa_ibv_alloc_ucontext_cmd to supported_caps to reflect its actual
usage as a capabilities handshake mechanism rather than a standard
comp_mask. Rename related constants and align function and macro names.

[1] https://lore.kernel.org/linux-rdma/20260312120858.GH1448102@nvidia.com/

Signed-off-by: Michael Margolin <mrgolin@amazon.com>
Link: https://patch.msgid.link/20260316180846.30273-1-mrgolin@amazon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17 07:04:03 -04:00
Dean Luick
6be4ca0ab3 RDMA/rdmavt: Add driver mmap callback
Add a reserved range and a driver callback to allow the driver to
have custom mmaps.

Generated mmap offsets are cookies and are not related to the size of
the mmap.  Advance the mmap offset by the minimum, PAGE_SIZE, rather
than the size of the mmap.
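
The cookie scheme can be sketched in a standalone snippet (the function
name is hypothetical, not rdmavt code):

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Illustrative sketch: since generated mmap offsets are opaque cookies
 * rather than addresses, the allocator only needs each offset to be
 * distinct -- so it advances by the minimum step, PAGE_SIZE, regardless
 * of how large the mapping behind the cookie is. */
static unsigned long next_mmap_offset(unsigned long *cursor)
{
	unsigned long off = *cursor;

	*cursor += PAGE_SIZE;	/* minimum advance, not the mmap size */
	return off;
}
```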

Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308909972.1279894.15543003811821875042.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-11 15:17:28 -04:00
Dean Luick
0fed679e08 RDMA/rdmavt: Correct multi-port QP iteration
When finding special QPs, the iterator makes an incorrect port
index calculation.  Fix the calculation.

Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308909468.1279894.5073405674644246445.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-11 15:17:28 -04:00
Dean Luick
679eb25de4 RDMA/rdmavt: Add ucontext alloc/dealloc passthrough
Add a private data pointer to the ucontext structure and add
per-client pass-throughs.

Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177325008318.52243.7367786996925601681.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-11 15:17:28 -04:00
Dean Luick
786ee8ddf4 RDMA/OPA: Update OPA link speed list
Update the list of available link speeds.  Fix comments.

Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308908456.1279894.16723781060261360236.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-11 15:17:28 -04:00
Rosen Penev
56521f5877 IB/hfi1: kzalloc to kzalloc_flex
Combine kzalloc and kcalloc calls into a single allocation with a flexible
array member. This avoids having to free the array separately.
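
The pattern can be shown with a userspace analogue (struct and function
names are hypothetical; in-kernel the combined size is typically computed
with struct_size()):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch of the conversion: a header struct plus a separately
 * allocated array becomes one zeroed allocation with a flexible array
 * member, so a single free() releases both. */
struct table {
	size_t nr;
	int entries[];		/* flexible array member */
};

static struct table *table_alloc(size_t nr)
{
	/* one zeroed allocation covers the header and the array */
	struct table *t = calloc(1, sizeof(*t) + nr * sizeof(t->entries[0]));

	if (t)
		t->nr = nr;
	return t;		/* caller frees header and array together */
}
```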

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260309215017.4753-1-rosenp@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-10 14:04:09 -04:00
Dennis Dalessandro
1b50f42049 RDMA/hfi1: Remove opa_vnic
OPA Vnic has been abandoned and left to rot. Time to excise.

Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308912950.1280237.15051663328388849915.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-10 07:51:30 -04:00
Rosen Penev
2afa8b9f5f RDMA/ocrdma: kzalloc_objs to kzalloc_flex
Simplify by eliminating one allocation; the pages no longer need to be
kfree'd separately.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260308201419.5260-1-rosenp@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-10 06:22:59 -04:00
Jacob Moroni
4707bf5f6c RDMA/irdma: Add support for revocable pinned dmabuf import
Use the new API to support importing pinned dmabufs from exporters
that require revocation, such as VFIO. The revoke semantic is
achieved by issuing a HW invalidation command but not freeing
the key. This prevents further accesses to the region (they will
result in an invalid key AE), but also keeps the key reserved
until the region is actually deregistered (i.e., ibv_dereg_mr)
so that a new MR registration cannot acquire the same key.

Tested with lockdep+kasan and a memfd backed dmabuf.

The rereg_mr path is explicitly blocked in libibverbs for dmabuf MRs
(more specifically, any MR not of type IBV_MR_TYPE_MR), so the rereg_mr
path for dmabufs was tested with a modified libibverbs.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-6-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08 08:37:38 -04:00
Jacob Moroni
3a0b171302 RDMA/umem: Add helpers for umem dmabuf revoke lock
Add helpers to acquire and release the umem dmabuf revoke
lock. The intent is to avoid the need for drivers to peek
into the ib_umem_dmabuf internals to get the dma_resv_lock,
and to bring us one step closer to abstracting ib_umem_dmabuf
away from drivers in general.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-5-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08 08:37:38 -04:00
Jacob Moroni
ff85a2ebac RDMA/umem: Add pinned revocable dmabuf import interface
Add an interface for importing a pinned but revocable dmabuf.
This interface can be used by drivers that are capable of revocation,
so that they can import dmabufs from exporters that may require it,
such as VFIO.

This interface implements a two step process, where drivers will first
call ib_umem_dmabuf_get_pinned_revocable_and_lock() which will pin and
map the dmabuf (and provide a functional move_notify/invalidate_mappings
callback), but will return with the lock still held so that the
driver can then populate the callback via
ib_umem_dmabuf_set_revoke_locked() without races from concurrent
revocations. This scheme also allows for easier integration with drivers
that may not have actually allocated their internal MR objects at the time
of the get_pinned_revocable* call.
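
From a driver's perspective, the two-step flow might look roughly like
this (the argument lists, the callback wiring, and the unlock helper name
are assumptions; only the two named functions come from this message):

```
umem = ib_umem_dmabuf_get_pinned_revocable_and_lock(ibdev, fd, offset,
						    length, access);
/* revoke lock still held: no revocation can race the setup below */
ib_umem_dmabuf_set_revoke_locked(umem, my_revoke_cb, my_mr);
ib_umem_dmabuf_revoke_unlock(umem);	/* unlock helper name hypothetical */
```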

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-4-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08 08:37:38 -04:00
Jacob Moroni
797291a66c RDMA/umem: Move umem dmabuf revoke logic into helper function
This same logic will eventually be reused from within the
invalidate_mappings callback which already has the dma_resv_lock
held, so break it out into a separate function so it can be reused.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-3-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08 08:37:38 -04:00
Jacob Moroni
553dfa8cbd RDMA/umem: Add ib_umem_dmabuf_get_pinned_and_lock helper
Move the inner logic of ib_umem_dmabuf_get_pinned_with_dma_device()
to a new static function that returns with the lock held upon success.

The intent is to allow reuse for the future get_pinned_revocable_and_lock
function.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-2-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08 08:37:38 -04:00
Marco Crivellari
1dc469f669 RDMA/rtrs: add WQ_PERCPU to alloc_workqueue users
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

   commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.
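
As a sketch, the change at each alloc_workqueue() call site looks like
this (the queue name and the pre-existing flags are illustrative; only
the added WQ_PERCPU flag comes from this message):

```
-	wq = alloc_workqueue("rtrs_wq", WQ_MEM_RECLAIM, 0);
+	wq = alloc_workqueue("rtrs_wq", WQ_MEM_RECLAIM | WQ_PERCPU, 0);
```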

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20260305154117.326472-1-marco.crivellari@suse.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08 07:02:43 -04:00
Sriharsha Basavapatna
a06165a705 RDMA/bnxt_re: Support application specific CQs
This patch adds support for application-allocated memory for CQs.

The application allocates and manages the CQs directly. To support
this, the driver exports a new comp_mask to indicate direct control
of the CQ. When this comp_mask bit is set in the ureq, the driver
maps this application allocated CQ memory into hardware. As the
application manages this memory, the CQ depth ('cqe') passed by it
must be used as is and the driver shouldn't update it.

For CQs, ib_core supports pinning dmabuf-based application memory,
specified through provider attributes. This umem is managed by ib_core
and is available in ib_cq. Register the 'create_cq_user' devop to
process this umem. The driver also supports the legacy interface that
allocates the umem internally.

Link: https://patch.msgid.link/r/20260302110036.36387-7-sriharsha.basavapatna@broadcom.com
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Sriharsha Basavapatna
cec5157b6c RDMA/bnxt_re: Separate kernel and user CQ creation paths
This patch refactors kernel and user CQ creation logic into
two separate code paths. This will be used to support dmabuf
based user CQ memory in the next patch. There is no functional
change in this patch.

Link: https://patch.msgid.link/r/20260302110036.36387-6-sriharsha.basavapatna@broadcom.com
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Sriharsha Basavapatna
3d4a42360c RDMA/bnxt_re: Refactor bnxt_re_create_cq()
Some applications may allocate dmabuf-based memory for CQs. To support
this, update the existing code to use SZ_4K to specify the supported HW
page size for CQs, as we support only 4K pages for now.
Call ib_umem_find_best_pgsz() to ensure the umem supports the requested
page size. These changes are encapsulated in a helper function.

Link: https://patch.msgid.link/r/20260302110036.36387-5-sriharsha.basavapatna@broadcom.com
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Kalesh AP
1234a9d8ae RDMA/bnxt_re: Support doorbell extensions
Some applications may need multiple doorbells to support parallel
processing of threads that each operate on a group of resources.

The following uapi methods have been implemented in this patch.

- BNXT_RE_METHOD_DBR_ALLOC:
  This allows the application to create extra doorbell regions,
  use the associated doorbell page index in CREATE_QP, and
  use the associated DB address while ringing the doorbell.

- BNXT_RE_METHOD_DBR_FREE:
  Free the allocated doorbell region.

- BNXT_RE_METHOD_GET_DEFAULT_DBR:
  Return the default doorbell page index and doorbell page address
  associated with the ucontext.

Link: https://patch.msgid.link/r/20260302110036.36387-4-sriharsha.basavapatna@broadcom.com
Co-developed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Kalesh AP
13f9a813ee RDMA/bnxt_re: Refactor bnxt_qplib_create_qp() function
Inside bnxt_qplib_create_qp(), the driver currently does many things,
such as allocating HWQ memory for the SQ/RQ/ORRQ/IRRQ and initializing
a few qplib_qp fields.

Refactor the code so that all memory allocation for the HWQs is moved
to bnxt_re_init_qp_attr(), and bnxt_qplib_create_qp() just initializes
the request structure and issues the HWRM command to firmware.

Introduce a couple of new functions, bnxt_re_setup_qp_hwqs() and
bnxt_re_setup_qp_swqs(), and move the HWQ and SWQ memory allocation
logic there.

Link: https://patch.msgid.link/r/20260302110036.36387-3-sriharsha.basavapatna@broadcom.com
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Kalesh AP
eee6268421 RDMA/bnxt_re: Move the UAPI methods to a dedicated file
This is in preparation for upcoming patches in the series.
The driver has to support additional UAPIs for some applications.
Move the current UAPI implementation to a new file, uapi.c.

Link: https://patch.msgid.link/r/20260302110036.36387-2-sriharsha.basavapatna@broadcom.com
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
613713f251 RDMA: Add IB_UVERBS_CORE_SUPPORT_ROBUST_UDATA
This flag can be set by drivers once they have finished auditing and
implementing the full udata support on every udata operation.

My intention going forward is that driver authors proposing new udata uAPI
for their drivers must first do the work and set this flag.

If this flag is not set, userspace should not try to use udata-based
uAPI newer than this commit, though on a case-by-case basis it may be OK
depending on what checks historical kernels performed on the specific call.

Since bnxt_re is audited now, it is the first driver to set the flag.

Link: https://patch.msgid.link/r/13-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
bed686d8dc RDMA/bnxt_re: Use ib_respond_empty_udata()
Like ib_is_udata_in_empty() on the request side, ib_respond_empty_udata()
is called on the response side when there is no response struct.

Link: https://patch.msgid.link/r/12-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
bc30311e49 RDMA/bnxt_re: Use ib_respond_udata()
All the calls to ib_copy_to_udata() can use this helper safely.

Link: https://patch.msgid.link/r/11-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
0cee3acab2 RDMA/bnxt_re: Add missing comp_mask validation
Two existing req driver data structures have comp_mask but nothing
checks them for valid contents. Add the missing checks.

Link: https://patch.msgid.link/r/10-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
3f6b103c4b RDMA/bnxt_re: Add compatibility checks to the uapi path for no data
If drivers ever want to go from empty drvdata to something populated,
they need to have called ib_is_udata_in_empty(). Add the missing calls to
all the system calls that don't have req structures.

Link: https://patch.msgid.link/r/9-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
b33d860a13 RDMA/bnxt_re: Add compatibility checks to the uapi path
Check that the driver data is properly sized and properly zeroed by
calling ib_copy_validate_udata_in().

Use git history to find the commit introducing each req struct and use
that to select the end member.

Link: https://patch.msgid.link/r/8-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
5ebe8832ef RDMA: Provide documentation about the uABI compatibility rules
Write down how all of this is supposed to work using the new helpers.

Link: https://patch.msgid.link/r/7-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
4c379ba04c RDMA: Add ib_is_udata_in_empty()
If the driver doesn't yet support any request driver data it should check
that it is all zeroed. This is a common pattern, add a helper around
_ib_copy_validate_udata_in() to do this.

Link: https://patch.msgid.link/r/6-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
14badc323e RDMA: Add ib_respond_udata()
Wrap the common copy_to_user() pattern used in drivers and enhance it
to zero pad as well. Include debug logging on failures.

Link: https://patch.msgid.link/r/5-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
dbf6491bb9 RDMA: Add ib_copy_validate_udata_in_cm()
For structures with comp_mask, also absorb the check of the comp_mask
valid bits into the helper. This is slightly tricky because ~ might not
fully extend to 64 bits; the helper inserts an explicit type to ensure
that ~ covers all bits.
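
The width pitfall can be demonstrated with a small standalone sketch
(function names are hypothetical, not the kernel helper):

```c
#include <assert.h>
#include <stdint.h>

/* With a 32-bit supported-bits constant, ~ is applied before widening,
 * so unknown high bits of a 64-bit comp_mask are silently cleared
 * instead of detected. */
static uint64_t unknown_bits_narrow(uint64_t comp_mask, uint32_t supported)
{
	return comp_mask & ~supported;		/* ~ at 32 bits, then zero-extended */
}

static uint64_t unknown_bits_wide(uint64_t comp_mask, uint32_t supported)
{
	return comp_mask & ~(uint64_t)supported;	/* explicit 64-bit ~ */
}
```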

Link: https://patch.msgid.link/r/4-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
1de9287ece RDMA: Add ib_copy_validate_udata_in()
Add a new function that consolidates the required compatibility pattern
for driver data: checking against a minimum size, and checking that
unknown trailing bytes are zero.

This new function uses the faster copy_struct_from_user() instead of
trying to directly check for zero.

Incorporate the common ibdev_dbg() logging directly into the error paths
of the helper.
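
The semantics can be sketched in userspace (the function name is
hypothetical; in-kernel the helper relies on copy_struct_from_user(),
which rejects nonzero trailing bytes with -E2BIG):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace sketch of the compatibility rule: copy at most the
 * kernel-known size, zero-fill if the user struct is shorter, and
 * reject nonzero trailing bytes the kernel does not understand. */
static int copy_struct_compat(void *dst, size_t ksize,
			      const void *src, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	if (usize > ksize) {
		const unsigned char *tail = (const unsigned char *)src + ksize;

		for (i = 0; i < usize - ksize; i++)
			if (tail[i] != 0)
				return -1;	/* unknown trailing bytes must be zero */
	}
	memset(dst, 0, ksize);	/* zero-fill when the user struct is shorter */
	memcpy(dst, src, n);
	return 0;
}
```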

Link: https://patch.msgid.link/r/3-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:25 -04:00
Jason Gunthorpe
b51caeb24a RDMA/core: Add rdma_udata_to_dev()
Get an ib_device out of a udata so it can be used for debug prints.

Link: https://patch.msgid.link/r/2-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:24 -04:00
Jason Gunthorpe
38a6e5579d RDMA: Use copy_struct_from_user() instead of open coding
This entire function is just open coding copy_struct_from_user(), call it
directly, it is faster anyhow.

Link: https://patch.msgid.link/r/1-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08 06:20:24 -04:00
Maher Sanalla
75b864f087 RDMA/mlx5: Add support for TLP VAR allocation
Extend the VAR allocation UAPI to accept an optional flags attribute,
allowing userspace to request TLP VAR allocation via the
MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP flag.

When the TLP flag "MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP" is specified,
the driver selects the TLP VAR region for allocation instead of the
regular VirtIO VAR region.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05 05:42:01 -05:00
Maher Sanalla
d3552a1f1e RDMA/mlx5: Add TLP VAR region support and infrastructure
Add support for TLP (Transaction Layer Packet) VAR regions used by
software-defined device emulation. TLP VAR provides dedicated response
gateways for sending TLP responses back to the host in TLP emulation
scenarios.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05 05:42:01 -05:00
Maher Sanalla
ea6641828d RDMA/mlx5: Refactor VAR table to use region abstraction
Extract mlx5_var_region struct from mlx5_var_table to enable
supporting multiple VAR regions in VAR table, which will be used in
the upcoming patches (Virtio emulation VAR and TLP emulation VAR).

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05 05:42:01 -05:00
Leon Romanovsky
f63f1d74e9 Add support for TLP emulation
This series adds support for Transaction Layer Packet (TLP) emulation
response gateway regions, enabling userspace device emulation software
to write TLP responses directly to lower layers without kernel driver
involvement.

Currently, the mlx5 driver exposes VirtIO emulation access regions via
the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
ioctl to also support allocating TLP response gateway channels for
PCI device emulation use cases.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05 05:41:16 -05:00
Cheng Xu
f30bc6f9b9 RDMA/erdma: Remove numa_node from struct erdma_devattr
Using dev_to_node() to get the pci device's numa information
instead of caching it in struct erdma_devattr.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://patch.msgid.link/20260305062929.58881-1-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05 05:41:02 -05:00
Leon Romanovsky
9d2994f97d RDMA/core: Delete not-implemented get_vector_affinity
No drivers implement .get_vector_affinity(), and no callers invoke
ib_get_vector_affinity(), so remove it.

Link: https://patch.msgid.link/20260226-get_vector_affinity-v1-1-910a899c4e5d@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05 05:41:02 -05:00
Michael Guralnik
dbd0472fd7 RDMA/nldev: Expose kernel-internal FRMR pools in netlink
Allow netlink users, through the usage of driver-details netlink
attribute, to get information about internal FRMR pools that use the
kernel_vendor_key FRMR key member.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-11-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05 05:41:02 -05:00
Michael Guralnik
da73d7634f RDMA/nldev: Add command to set pinned FRMR handles
Allow users to set, through netlink, the number of handles that are not
aged for a specific FRMR pool, and fill the pool to that number.

This allows users to warm up the FRMR pools to an expected number of
handles with specific attributes that fit their expected usage.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-10-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05 05:41:02 -05:00
Maher Sanalla
385a06f74f net/mlx5: Expose TLP emulation capabilities
Expose and query TLP device emulation caps on driver load.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05 05:33:58 -05:00
Maher Sanalla
01b7768578 net/mlx5: Add TLP emulation device capabilities
Introduce the hardware structures and definitions needed for the driver
support of TLP emulation in mlx5_ifc.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05 05:33:58 -05:00
Michael Guralnik
d2ea675e86 RDMA/core: Add netlink command to modify FRMR aging
Allow users to set the FRMR pools' aging timer through netlink.
This functionality lets users control how long handles reside in the
kernel before being destroyed, tuning the tradeoff between memory and
HW object consumption and memory registration optimization.
Since FRMR pools are highly beneficial in application restart scenarios,
this command allows users to match the aging timer to their application
restart time, making sure the FRMR handles deregistered on application
teardown are kept in the pools long enough for reuse during application
startup.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-9-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02 13:45:37 -05:00
Michael Guralnik
50c035976a RDMA/nldev: Add command to get FRMR pools
Add support for a new netlink command to dump the state of the FRMR
pools on the devices to the user.
Expose each pool with its key and its usage statistics.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-8-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02 13:45:34 -05:00
Michael Guralnik
ba51cf9fcf net/mlx5: Drop MR cache related code
Following mlx5_ib's move to FRMR pools, drop all the unused MR cache
code.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-7-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02 13:45:30 -05:00
Michael Guralnik
36680ef7bc RDMA/mlx5: Switch from MR cache to FRMR pools
Use the new generic FRMR pools mechanism to optimize the performance of
memory registrations.
The move to the new generic FRMR pools will let users who configure the
MR cache through debugfs use the netlink API for FRMR pools instead,
which is added later in this series. This gives more flexibility in
configuring the kernel and also works on machines where debugfs is not
available.

mlx5_ib will save the mkey index as the handle in the FRMR pools, the
same as the MR cache implementation did.
Upon each memory registration mlx5_ib will try to pull a handle from the
FRMR pools, and upon each deregistration it will push the handle back to
its appropriate pool.

Use the vendor key field in umr pool key to save the access mode of the
mkey.

Use the kernel-only FRMR pool option to manage the mkeys used for
registration with DMAH, as the translation between the DMAH UAPI and the
mkey's st_index property is non-trivial and changes dynamically.
Since the value for no PH is 0xff and not zero, switch between them in
the frmr_key to have a zeroed kernel_vendor_key when not using DMAH.

Remove the MR cache limitation of mkeys up to 2^20 DMA blocks and
support mkeys up to the HW limits according to caps.

Remove all MR cache related code.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-6-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02 13:45:19 -05:00