Commit Graph

14981 Commits

Author SHA1 Message Date
Akiva Goldberger
e5eba42f01 mlx5: Fix default values in create CQ
Currently, CQs without a completion function are assigned the
mlx5_add_cq_to_tasklet function by default. This is problematic since
only user CQs created through the mlx5_ib driver are intended to use
this function.

Additionally, all CQs that will use doorbells instead of polling for
completions must call mlx5_cq_arm. However, the default CQ creation flow
leaves a valid value in the CQ's arm_db field, allowing FW to send
interrupts to polling-only CQs in certain corner cases.

These two factors would allow a polling-only kernel CQ to be triggered
by an EQ interrupt and call a completion function intended only for user
CQs, causing a null pointer exception.

Some areas in the driver have prevented this issue with one-off fixes
but did not address the root cause.

This patch fixes the described issue by adding defaults to the create CQ
flow. It adds a default dummy completion function to protect against
null pointer exceptions, and it sets an invalid command sequence number
by default in kernel CQs to prevent the FW from sending an interrupt to
the CQ until it is armed. User CQs are responsible for their own
initialization values.

Callers of mlx5_core_create_cq are responsible for changing the
completion function and arming the CQ per their needs.

Fixes: cdd04f4d4d ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11 15:12:18 +01:00
Jay Bhat
3202587837 RDMA/irdma: Fix vf_id size to u16 to avoid overflow
Correctly size the vf_id to u16 to avoid overflow.

Signed-off-by: Jay Bhat <jay.bhat@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251031021726.1003-6-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-02 06:46:01 -05:00
Guofeng Yue
b8c9aab4c7 RDMA/hns: Remove an extra blank line
Remove an extra blank line.

Signed-off-by: Guofeng Yue <yueguofeng@h-partners.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-27 05:44:00 -04:00
Junxian Huang
fe9622011f RDMA/hns: Fix wrong WQE data when QP wraps around
When QP wraps around, WQE data from the previous use at the same
position still remains as driver does not clear it. The WQE field
layout differs across different opcodes, causing that the fields
that are not explicitly assigned for the current opcode retain
stale values, and are issued to HW by mistake. Such fields are as
follows:

* MSG_START_SGE_IDX field in ATOMIC WQE
* BLOCK_SIZE and ZBVA fields in FRMR WQE
* DirectWQE fields when DirectWQE not used

For ATOMIC WQE, always set the latest sge index in MSG_START_SGE_IDX
as required by HW.

For FRMR WQE and DirectWQE, clear only those unassigned fields
instead of the entire WQE to avoid performance penalty.

Fixes: 68a997c5d2 ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-27 05:44:00 -04:00
wenglianfa
f5a7cbea54 RDMA/hns: Fix the modification of max_send_sge
The actual sge number may exceed the value specified in init_attr->cap
when HW needs extra sge to enable inline feature. Since these extra
sges are not expected by ULP, return the user-specified value to ULP
instead of the expanded sge number.

Fixes: 0c5e259b06 ("RDMA/hns: Fix incorrect sge nums calculation")
Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-27 05:44:00 -04:00
Chengchang Tang
c4b67b514a RDMA/hns: Fix recv CQ and QP cache affinity
Currently driver enforces affinity between QP cache and send CQ
cache, which helps improve the performance of sending, but doesn't
set affinity with recv CQ cache, resulting in suboptimal performance
of receiving.

Use one CQ bank per context to ensure the affinity among QP, send CQ
and recv CQ. For kernel ULP, CQ bank is fixed to 0.

Fixes: 9e03dbea2b ("RDMA/hns: Fix CQ and QP cache affinity")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-27 05:44:00 -04:00
Shuhao Fu
d8713158fa RDMA/uverbs: Fix umem release in UVERBS_METHOD_CQ_CREATE
In `UVERBS_METHOD_CQ_CREATE`, umem should be released if anything goes
wrong. Currently, if `create_cq_umem` fails, umem would not be
released or referenced, causing a possible leak.

In this patch, we release umem at `UVERBS_METHOD_CQ_CREATE`, the driver
should not release umem if it returns an error code.

Fixes: 1a40c362ae ("RDMA/uverbs: Add a common way to create CQ with umem")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Link: https://patch.msgid.link/aOh1le4YqtYwj-hH@osx.local
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-19 07:31:25 -04:00
Jacob Moroni
5575b7646b RDMA/irdma: Set irdma_cq cq_num field during CQ create
The driver maintains a CQ table that is used to ensure that a CQ is
still valid when processing CQ related AEs. When a CQ is destroyed,
the table entry is cleared, using irdma_cq.cq_num as the index. This
field was never being set, so it was just always clearing out entry
0.

Additionally, the cq_num field size was increased to accommodate HW
supporting more than 64K CQs.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20250923142439.943930-1-jmoroni@google.com
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-19 07:02:11 -04:00
Jacob Moroni
8d158f47f1 RDMA/irdma: Fix SD index calculation
In some cases, it is possible for pble_rsrc->next_fpm_addr to be
larger than u32, so remove the u32 cast to avoid unintentional
truncation.

This fixes the following error that can be observed when registering
massive memory regions:

[  447.227494] (NULL ib_device): cqp opcode = 0x1f maj_err_code = 0xffff min_err_code = 0x800c
[  447.227505] (NULL ib_device): [Update PE SDs Cmd Error][op_code=21] status=-5 waiting=1 completion_err=1 maj=0xffff min=0x800c

Fixes: e8c4dbc2fc ("RDMA/irdma: Add PBLE resource manager")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20250923190850.1022773-1-jmoroni@google.com
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-19 07:01:28 -04:00
YanLong Dai
88de89f184 RDMA/bnxt_re: Fix a potential memory leak in destroy_gsi_sqp
The current error handling path in bnxt_re_destroy_gsi_sqp() could lead
to a resource leak. When bnxt_qplib_destroy_qp() fails, the function
jumps to the 'fail' label and returns immediately, skipping the call
to bnxt_qplib_free_qp_res().

Continue the resource teardown even if bnxt_qplib_destroy_qp() fails,
which aligns with the driver's general error handling strategy and
prevents the potential leak.

Fixes: 8dae419f9e ("RDMA/bnxt_re: Refactor queue pair creation code")
Signed-off-by: YanLong Dai <daiyanlong@kylinos.cn>
Link: https://patch.msgid.link/20250924061444.11288-1-daiyanlong@kylinos.cn
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-19 06:47:55 -04:00
Linus Torvalds
2ccb4d203f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "A new Pensando ionic driver, a new Gen 3 HW support for Intel irdma,
  and lots of small bnxt_re improvements.

   - Small bug fixes and improves to hfi1, efa, mlx5, erdma, rdmarvt,
     siw

   - Allow userspace access to IB service records through the rdmacm

   - Optimize dma mapping for erdma

   - Fix shutdown of the GSI QP in mana

   - Support relaxed ordering MR and fix a corruption bug with mlx5 DMA
     Data Direct

   - Many improvement to bnxt_re:
       - Debugging features and counters
       - Improve performance of some commands
       - Change flow_label reporting in completions
       - Mirror vnic
       - RDMA flow support

   - New RDMA driver for Pensando Ethernet devices: ionic

   - Gen 3 hardware support for the Intel irdma driver

   - Fix rdma routing resolution with VRFs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (85 commits)
  RDMA/ionic: Fix memory leak of admin q_wr
  RDMA/siw: Always report immediate post SQ errors
  RDMA/bnxt_re: improve clarity in ALLOC_PAGE handler
  RDMA/irdma: Remove unused struct irdma_cq fields
  RDMA/irdma: Fix positive vs negative error codes in irdma_post_send()
  RDMA/bnxt_re: Remove non-statistics counters from hw_counters
  RDMA/bnxt_re: Add debugfs info entry for device and resource information
  RDMA/bnxt_re: Fix incorrect errno used in function comments
  RDMA: Use %pe format specifier for error pointers
  RDMA/ionic: Use ether_addr_copy instead of memcpy
  RDMA/ionic: Fix build failure on SPARC due to xchg() operand size
  RDMA/rxe: Fix race in do_task() when draining
  IB/sa: Fix sa_local_svc_timeout_ms read race
  IB/ipoib: Ignore L3 master device
  RDMA/core: Use route entry flag to decide on loopback traffic
  RDMA/core: Resolve MAC of next-hop device without ARP support
  RDMA/core: Squash a single user static function
  RDMA/irdma: Update Kconfig
  RDMA/irdma: Extend CQE Error and Flush Handling for GEN3 Devices
  RDMA/irdma: Add Atomic Operations support
  ...
2025-10-03 18:35:22 -07:00
Abhijit Gangurde
e6d736bd08 RDMA/ionic: Fix memory leak of admin q_wr
The admin queue work request buffer, aq->q_wr, is allocated via kcalloc in
__ionic_create_rdma_adminq. However, it was not being freed in the
corresponding teardown function __ionic_destroy_rdma_adminq. This results
in a memory leak.  Fix this leak by adding the missing kfree(aq->q_wr) in
the destruction path.

Fixes: f3bdbd4270 ("RDMA/ionic: Create device queues to support admin operations")
Link: https://patch.msgid.link/r/20250924142123.18344-1-abhijit.gangurde@amd.com
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-26 13:12:56 -03:00
Bernard Metzler
fdd0fe94d6 RDMA/siw: Always report immediate post SQ errors
In siw_post_send(), any immediate error encountered during processing of
the work request list must be reported to the caller, even if previous
work requests in that list were just accepted and added to the send queue.

Not reporting those errors confuses the caller, which would wait
indefinitely for the failing and potentially subsequently aborted work
requests completion.

This fixes a case where immediate errors were overwritten by subsequent
code in siw_post_send().

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://patch.msgid.link/r/20250923144536.103825-1-bernard.metzler@linux.dev
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-26 13:12:19 -03:00
Jakub Kicinski
203e3beb73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8).

Conflicts:

drivers/net/can/spi/hi311x.c
  6b69680847 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled")
  27ce71e1ce ("net: WQ_PERCPU added to alloc_workqueue users")
https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org

net/smc/smc_loopback.c
drivers/dibs/dibs_loopback.c
  a35c04de25 ("net/smc: fix warning in smc_rx_splice() when calling get_page()")
  cc21191b58 ("dibs: Move data path to dibs layer")
https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org

Adjacent changes:

drivers/net/dsa/lantiq/lantiq_gswip.c
  c0054b25e2 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()")
  7a1eaef0a7 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-25 11:00:59 -07:00
Alok Tiwari
fb0b08297e RDMA/bnxt_re: improve clarity in ALLOC_PAGE handler
Update uverbs_copy_to call to use sizeof(dpi) instead of sizeof(length)
when copying the device page index (DPI) back to user space. Both dpi
and length are declared as u32, so this change has no functional impact
but makes the code clearer.

Link: https://patch.msgid.link/r/20250924110130.340195-1-alok.a.tiwari@oracle.com
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-24 10:53:37 -03:00
Jacob Moroni
880245fd02 RDMA/irdma: Remove unused struct irdma_cq fields
These fields were set but not used anywhere, so remove them.

Link: https://patch.msgid.link/r/20250923142128.943240-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-24 10:52:57 -03:00
Dan Carpenter
4bab6d9584 RDMA/irdma: Fix positive vs negative error codes in irdma_post_send()
This code accidentally returns positive EINVAL instead of negative
-EINVAL.  Some of the callers treat positive returns as success.
Add the missing '-' char.

Fixes: a24a29c874 ("RDMA/irdma: Add Atomic Operations support")
Link: https://patch.msgid.link/r/aNKCjcD6Nab1jWEV@stanley.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-24 10:52:43 -03:00
Anantha Prabhu
7fcf00bd7f RDMA/bnxt_re: Remove non-statistics counters from hw_counters
Remove non-statistics counters from the RDMA hw_counters framework.
The removed data includes:

- Active resource counts (ACTIVE_PD, ACTIVE_QP, etc.)
- Resource watermarks (WATERMARK_PD, WATERMARK_QP, etc.)
- Operational counters (RESIZE_CQ_CNT)
- DB pacing metrics (PACING_RESCHED, PACING_CMPL, etc.)

This change ensures hw_counters contains only true performance
and error statistics.

Link: https://patch.msgid.link/r/20250923062657.981487-3-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-24 10:42:14 -03:00
Anantha Prabhu
604f202bc9 RDMA/bnxt_re: Add debugfs info entry for device and resource information
Add a new debugfs info entry that displays device information and
non-statistics data using the seq_file interface. This entry shows:

- Resource watermarks (peak usage tracking)
- Operational counters (CQ resize count)
- Doorbell pacing information

Link: https://patch.msgid.link/r/20250923062657.981487-2-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-24 10:42:14 -03:00
Alok Tiwari
9b9e32f75a RDMA/bnxt_re: Fix incorrect errno used in function comments
The function comments in qplib_rcfw.c mention -ETIMEOUT as a
possible return value. However, the correct errno is -ETIMEDOUT.

Update the comments to reflect the proper return value to avoid
confusion for developers and users referring to the code.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250921081854.1059094-1-alok.a.tiwari@oracle.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-21 08:53:22 -04:00
Leon Romanovsky
4b6b6233f5 RDMA: Use %pe format specifier for error pointers
Convert error logging throughout the RDMA subsystem to use
the %pe format specifier instead of PTR_ERR() with integer
format specifiers.

Link: https://patch.msgid.link/e81ec02df1e474be20417fb62e779776e3f47a50.1758217936.git.leon@kernel.org
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2025-09-21 07:34:49 -04:00
Abhijit Gangurde
260cce64aa RDMA/ionic: Use ether_addr_copy instead of memcpy
eth header from ib_ud_header structure packs the mac
into 4B high and 2B low parts. But when 4B high is used
in memcpy, it sees it as overflow. However, this is safe
due to the 4B high and 2B low arrangement in the structure.
To avoid the memcpy warning, use ether_addr_copy to copy
the mac address.

In function ‘fortify_memcpy_chk’,
    inlined from ‘ionic_set_ah_attr.isra’ at drivers/infiniband/hw/ionic/ionic_controlpath.c:609:3:
./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  580 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:287: drivers/infiniband/hw/ionic/ionic_controlpath.o] Error 1
make[5]: *** [scripts/Makefile.build:556: drivers/infiniband/hw/ionic] Error 2
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [scripts/Makefile.build:556: drivers/infiniband/hw] Error 2
make[3]: *** [scripts/Makefile.build:556: drivers/infiniband] Error 2
make[2]: *** [scripts/Makefile.build:556: drivers] Error 2
make[1]: *** [/tmp/tmp53nb1nwr/Makefile:2011: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2

Fixes: e8521822c7 ("RDMA/ionic: Register device ops for control path")
Reported-by: Leon Romanovsky <leon@kernel.org>
Closes: https://lore.kernel.org/lkml/20250918180750.GA135135@unreal/
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250919121301.1113759-2-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-21 07:30:37 -04:00
Abhijit Gangurde
ed9836c040 RDMA/ionic: Fix build failure on SPARC due to xchg() operand size
xchg() is used to safely handle the event queue arming.
However SPARC xchg operates only 4B of variable.
Change variable type from bool to int.

Unverified Error/Warning (likely false positive, kindly check if interested):

    ERROR: modpost: "__xchg_called_with_bad_pointer" [drivers/infiniband/hw/ionic/ionic_rdma.ko] undefined!

Error/Warning ids grouped by kconfigs:

recent_errors
`-- sparc-allmodconfig
    `-- ERROR:__xchg_called_with_bad_pointer-drivers-infiniband-hw-ionic-ionic_rdma.ko-undefined

Fixes: f3bdbd4270 ("RDMA/ionic: Create device queues to support admin operations")
Reported-by: Leon Romanovsky <leon@kernel.org>
Closes: https://lore.kernel.org/lkml/20250918180750.GA135135@unreal/
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250919121301.1113759-1-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-21 07:30:37 -04:00
Gui-Dong Han
8ca7eada62 RDMA/rxe: Fix race in do_task() when draining
When do_task() exhausts its iteration budget (!ret), it sets the state
to TASK_STATE_IDLE to reschedule, without a secondary check on the
current task->state. This can overwrite the TASK_STATE_DRAINING state
set by a concurrent call to rxe_cleanup_task() or rxe_disable_task().

While state changes are protected by a spinlock, both rxe_cleanup_task()
and rxe_disable_task() release the lock while waiting for the task to
finish draining in the while(!is_done(task)) loop. The race occurs if
do_task() hits its iteration limit and acquires the lock in this window.
The cleanup logic may then proceed while the task incorrectly
reschedules itself, leading to a potential use-after-free.

This bug was introduced during the migration from tasklets to workqueues,
where the special handling for the draining case was lost.

Fix this by restoring the original pre-migration behavior. If the state is
TASK_STATE_DRAINING when iterations are exhausted, set cont to 1 to
force a new loop iteration. This allows the task to finish its work, so
that a subsequent iteration can reach the switch statement and correctly
transition the state to TASK_STATE_DRAINED, stopping the task as intended.

Fixes: 9b4b7c1f9f ("RDMA/rxe: Add workqueue support for rxe tasks")
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://patch.msgid.link/20250919025212.1682087-1-hanguidong02@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-21 07:22:28 -04:00
Vlad Dumitrescu
1428cd764c IB/sa: Fix sa_local_svc_timeout_ms read race
When computing the delta, the sa_local_svc_timeout_ms is read without
ib_nl_request_lock held. Though unlikely in practice, this can cause
a race condition if multiple local service threads are managing the
timeout.

Fixes: 2ca546b92a ("IB/sa: Route SA pathrecord query through netlink")
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916163112.98414-1-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 05:26:26 -04:00
Vlad Dumitrescu
42f993d343 IB/ipoib: Ignore L3 master device
Currently, all master upper netdevices (e.g., bond, VRF) are treated
equally.

When a VRF netdevice is used over an IPoIB netdevice, the expected
netdev resolution is on the lower IPoIB device which has the IP address
assigned to it and not the VRF device.

The rdma_cm module (CMA) tries to match incoming requests to a
particular netdevice. When successful, it also validates that the return
path points to the same device by performing a routing table lookup.
Currently, the former would resolve to the VRF netdevice, while the
latter to the correct lower IPoIB netdevice, leading to failure in
rdma_cm.

Improve this by ignoring the VRF master netdevice, if it exists, and
instead return the lower IPoIB device.

Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-5-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 05:20:48 -04:00
Parav Pandit
c31e4038c9 RDMA/core: Use route entry flag to decide on loopback traffic
addr_resolve() considers a destination to be local if the next-hop
device of the resolved route for the destination is the loopback
netdevice.

This fails when the source and destination IP addresses belong to
a netdev enslaved to a VRF netdev. In this case the next-hop device
is the VRF itself:

 $ ip link add name myvrf up type vrf table 100
 $ ip link set ens2f0np0 master myvrf up
 $ ip addr add 192.168.1.1/24 dev ens2f0np0
 $ ip route get 192.168.1.1 oif myvrf
 local 192.168.1.1 dev myvrf table 100 src 192.168.1.1 uid 0
    cache <local>

This results in packets being generated with an incorrect destination
MAC of the VRF netdevice and ib_write_bw failing with timeout.

Solve this by determining if a destination is local or not based on
the resolved route's type rather than based on its next-hop netdevice
loopback flag.

This enables to resolve loopback traffic with and without VRF
configurations in a uniform way.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-4-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 05:20:35 -04:00
Parav Pandit
200651b9b8 RDMA/core: Resolve MAC of next-hop device without ARP support
Currently, if the next-hop netdevice does not support ARP resolution,
the destination MAC address is silently set to zero without reporting
an error. This leads to incorrect behavior and may result in packet
transmission failures.

Fix this by deferring MAC resolution to the IP stack via neighbour
lookup, allowing proper resolution or error reporting as appropriate.

Fixes: 7025fcd36b ("IB: address translation to map IP toIB addresses (GIDs)")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-3-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 05:20:20 -04:00
Parav Pandit
fa29d1e887 RDMA/core: Squash a single user static function
To reduce dependencies in IFF_LOOPBACK in route and neighbour resolution
steps, squash the static function to its single caller and simplify the
code.
Until now, network field was set even when neighbour resolution failed.
With this change, dev_addr output fields are valid only when resolution
is successful.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-2-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 05:15:39 -04:00
Tatyana Nikolova
060842fed5 RDMA/irdma: Update Kconfig
Update Kconfig to add dependency on idpf module and
add IPU E2000 to the list of supported devices.

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-17-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:46 -04:00
Shiraz Saleem
42f1d09909 RDMA/irdma: Extend CQE Error and Flush Handling for GEN3 Devices
Enhance the CQE error and flush handling specific to GEN3 devices.
Unlike GEN1/2 devices, which depend on software to generate completions
in error, GEN3 devices leverage firmware to generate CQEs in error for
all WQEs posted after a QP moves to an error state.

Key changes include:
- Updating the CQ poll logic to properly advance the CQ head in the
event of a flush CQE.
- Updating the flush logic for GEN3 to pass error WQE idx
for SQ on an AE to flush out unprocessed WQEs in error.
- Isolating the decoding of AE to flush codes into a separate routine
irdma_ae_to_qp_err_code. This routine can now be leveraged to
flush error CQEs on an AE and when error CQE is received for SRQ.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-16-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:46 -04:00
Faisal Latif
a24a29c874 RDMA/irdma: Add Atomic Operations support
Extend irdma to support atomic operations, namely Compare and Swap and
Fetch and Add, for GEN3 devices.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-15-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:46 -04:00
Shiraz Saleem
eb31dfc2b4 RDMA/irdma: Restrict Memory Window and CQE Timestamping to GEN3
With the deprecation of Memory Window and Timestamping support in GEN2,
move these features to be exclusive to GEN3. This iteration supports
only Type2 Memory Windows. Additionally, it includes the reporting of
the timestamp mask and Host Channel Adapter (HCA) core clock frequency
via the query device verb.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-14-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:46 -04:00
Faisal Latif
563e1feb5f RDMA/irdma: Add SRQ support
Implement verb API and UAPI changes to support SRQ functionality in GEN3
devices.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-13-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:46 -04:00
Shiraz Saleem
9a1d687863 RDMA/irdma: Support 64-byte CQEs and GEN3 CQE opcode decoding
Introduce support for 64-byte CQEs in GEN3 devices. Additionally,
implement GEN3-specific CQE opcode decoding.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-12-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:46 -04:00
Vinoth Kumar Chandra Mohan
419afdd122 RDMA/irdma: Add support for V2 HMC resource management scheme
HMC resource initialization is updated to support V1 or V2 approach
based on the FW capability. In the V2 approach, driver receives the
assigned HMC resources count and verifies if it will fit in the given
local memory. If it doesn't fit, the driver load fails.

Signed-off-by: Vinoth Kumar Chandra Mohan <vinoth.kumar.chandra.mohan@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-11-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Shiraz Saleem
87f413b6c9 RDMA/irdma: Extend QP context programming for GEN3
Extend the QP context structure with support for new fields
specific to GEN3 hardware capabilities.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-10-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Shiraz Saleem
d6ed4b69b8 RDMA/irdma: Add GEN3 virtual QP1 support
Add a new RDMA virtual channel op during QP1 creation that allow the
Control Plane (CP) to virtualize a regular QP as QP1 on non-default
RDMA capable vPorts. Additionally, the CP will return the Qsets to use
on the ib_device of the vPort.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-9-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Mustafa Ismail
2ad49ae733 RDMA/irdma: Introduce GEN3 vPort driver support
In the IPU model, a function can host one or more logical network
endpoints called vPorts. Each vPort may be associated with either a
physical or an internal communication port, and can be RDMA capable. A
vPort features a netdev and, if RDMA capable, must have an associated
ib_dev.

This change introduces a GEN3 auxiliary vPort driver responsible for
registering a verbs device for every RDMA-capable vPort. Additionally,
the UAPI is updated to prevent the binding of GEN3 devices to older
user-space providers.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-8-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Krzysztof Czurylo
da278cb29c RDMA/irdma: Add GEN3 HW statistics support
Plug into the unified HW statistics framework by adding a hardware
statistics map array for GEN3, defining the HW-specific width and
location for each counter in the statistics buffer.

Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-7-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Shiraz Saleem
b800e82feb RDMA/irdma: Add GEN3 support for AEQ and CEQ
Extend support for GEN3 devices by programming the necessary hardware
IRQ registers and the updated descriptor fields for the Asynchronous
Event Queue (AEQ) and Completion Event Queue (CEQ). Introduce a RDMA
virtual channel operation with the Control Plane (CP) to associate
interrupt vectors appropriately with AEQ and CEQ. Add new Asynchronous
Event (AE) definitions specific to GEN3.

Additionally, refactor the AEQ and CEQ setup into the irdma_ctrl_init_hw
device control initialization routine.

This completes the PCI device level initialization for RDMA in the core
driver.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-6-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Krzysztof Czurylo
c7db0abe5f RDMA/irdma: Add GEN3 CQP support with deferred completions
GEN3 introduces asynchronous handling of Control QP (CQP) operations to
minimize head-of-line blocking. Create the CQP using the updated GEN3-
specific descriptor fields and implement the necessary support for this
deferred completion mechanism.

Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-5-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Christopher Bednarz
7d5a7cc7b9 RDMA/irdma: Discover and set up GEN3 hardware register layout
Discover the hardware register layout for GEN3 devices through an RDMA
virtual channel operation with the Control Plane (CP). Set up the
corresponding hardware attributes specific to GEN3 devices.

Signed-off-by: Christopher Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-4-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Mustafa Ismail
d5edd33364 RDMA/irdma: Add GEN3 core driver support
Introduce support for the GEN3 auxiliary core driver, which is
responsible for initializing PCI-level RDMA resources.

Facilitate host-driver communication with the device's Control Plane (CP)
to discover capabilities and perform privileged operations through an
RDMA-specific messaging interface built atop the IDPF mailbox and virtual
channel protocol.

Establish the RDMA virtual channel message interface and incorporate
operations to retrieve the hardware version and discover capabilities
from the CP.

Additionally, set up the RDMA MMIO regions and initialize the RF structure.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Co-developed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-3-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Mustafa Ismail
0c2b80cac9 RDMA/irdma: Refactor GEN2 auxiliary driver
Refactor the irdma auxiliary driver and associated interfaces out of main.c
and into a standalone GEN2-specific source file and rename as gen_2 driver.

This is in preparation for adding GEN3 auxiliary drivers. Each HW
generation will have its own gen-specific interface file.

Additionally, move the Address Handle hash table and associated locks
under rf struct. This will allow GEN3 code to migrate to use it easily.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Co-developed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-2-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18 04:48:45 -04:00
Cosmin Ratiu
aa4595d0ad net/mlx5: Store the global doorbell in mlx5_priv
The global doorbell is used for more than just Ethernet resources, so
move it out of mlx5e_hw_objs into a common place (mlx5_priv), to avoid
non-Ethernet modules (e.g. HWS, ASO) depending on Ethernet structs.

Use this opportunity to consolidate it with the 'uar' pointer already
there, which was used as an RX doorbell. Underneath the 'uar' pointer is
identical to 'bfreg->up', so store a single resource and use that
instead.

For CQ doorbells, care is taken to always use bfreg->up->index instead
of bfreg->index, which may refer to a subsequent UAR page from the same
ALLOC_UAR batch on some NICs.

This paves the way for cleanly supporting multiple doorbells in the
Ethernet driver.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 18:30:32 -07:00
Shiraz Saleem
2bd7dd3836 RDMA/mana_ib: Extend modify QP
Extend modify QP to support further attributes: local_ack_timeout, UD qkey,
rate_limit, qp_access_flags, flow_label, max_rd_atomic.

Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1757923172-4475-1-git-send-email-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-15 04:11:12 -04:00
Håkon Bugge
2bbe1255fc RDMA/cm: Rate limit destroy CM ID timeout error message
When the destroy CM ID timeout kicks in, you typically get a storm of
them which creates a log flooding. Hence, change pr_err() to
pr_err_ratelimited() in cm_destroy_id_wait_timeout().

Fixes: 96d9cbe2f2 ("RDMA/cm: add timeout to cm_destroy_id wait")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Link: https://patch.msgid.link/20250912100525.531102-1-haakon.bugge@oracle.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-15 03:41:45 -04:00
Shravya KN
eba165b290 RDMA/bnxt_re: Avoid GID level QoS update from the driver
The driver inserts a VLAN header into RoCE packets when the
traffic was untagged by modifying the existing GID entries.
This has caused the firmware to enforce only VLAN-based
priority mappings, ignoring other valid priority configurations
set via APP TLVs (e.g., DSCP selectors).

Driver now has support for selecting the service level (vlan id)
and traffic class (dscp) during modify_qp. So no need to override
the priority update using the update gid method. Hence removing
the code that handles the above operation.

Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Link: https://patch.msgid.link/20250908094516.18222-3-kalesh-anakkur.purayil@broadcom.com
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11 02:18:36 -04:00
Anantha Prabhu
72d0b87e1e RDMA/bnxt_re: Update sysfs entries with appropriate data
Updated the existing sysfs entries with correct data.
This change is to align the behavior with our OOB driver.
Added "board_id" sysfs entry which will provide the
VPD Part number, if exists.

Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250908094516.18222-2-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11 02:18:36 -04:00