Currently the rxe driver has a rxe_mw struct object but nothing about
memory windows is enabled. This patch turns on memory windows and some
minor cleanup.
Set device attribute in rxe.c so max_mw = MAX_MW. Change parameters in
rxe_param.h so that MAX_MW is the same as MAX_MR. Reduce the number of
MRs and MWs to 4K from 256K. Add device capability bits for 2a and 2b
memory windows. Removed RXE_MR_TYPE_MW from the rxe_mr_type enum.
Link: https://lore.kernel.org/r/20210608042552.33275-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the rdma_rxe driver attempts to protect atomic responder
resources by taking a reference to the qp which is only freed when the
resource is recycled for a new read or atomic operation. This means that
in normal circumstances there is almost always an extra qp reference once
an atomic operation has been executed which prevents cleaning up the qp
and associated pd and cqs when the qp is destroyed.
This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
destroyed while a peer is retrying an atomic op it will cause the
operation to fail which is acceptable.
Link: https://lore.kernel.org/r/20210604230558.4812-1-rpearsonhpe@gmail.com
Reported-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Fixes: 86af617641 ("IB/rxe: remove unnecessary skb_clone")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The level1 PBL info address is stored as u64. This requires casting
through a uinptr_t before used as a pointer type.
And this leads to sparse warning such as this when uinptr_t is missing:
drivers/infiniband/hw/irdma/hw.c: In function 'irdma_destroy_virt_aeq':
drivers/infiniband/hw/irdma/hw.c:579:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
579 | dma_addr_t *pg_arr = (dma_addr_t *)aeq->palloc.level1.addr;
This can be fixed using an intermediate uintptr_t, but rather it is better
to fix the structure irdm_pble_info to store the address as u64* and the
VA it is assigned in irdma_chunk as a void*. This greatly reduces the
casting on this address.
Fixes: 44d9e52977 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20210609234924.938-1-shiraz.saleem@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The shifting of the u8 integer info->map[i] the left will be promoted
to a 32 bit signed int and then sign-extended to a u64. In the event
that the top bit of the u8 is set then all then all the upper 32 bits
of the u64 end up as also being set because of the sign-extension.
Fix this by casting the u8 values to a u64 before the left shift. This
Link: https://lore.kernel.org/r/20210605122059.25105-1-colin.king@canonical.com
Addresses-Coverity: ("Unitentional integer overflow / bad shift operation")
Fixes: 3f49d68425 ("RDMA/irdma: Implement HW Admin Queue OPs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
To avoid the following failure when trying to load the rdma_rxe module
while IPv6 is disabled, add a check for EAFNOSUPPORT and ignore the
failure, also delete the needless debug print from rxe_setup_udp_tunnel().
$ modprobe rdma_rxe
modprobe: ERROR: could not insert 'rdma_rxe': Operation not permitted
Fixes: dfdd6158ca ("IB/rxe: Fix kernel panic in udp_setup_tunnel")
Link: https://lore.kernel.org/r/20210603090112.36341-1-kamalheib1@gmail.com
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to prevent user space from modifying the index that belongs to
the kernel for shared queues let the kernel use a local copy of the index
and copy any new values of that index to the shared rxe_queue_bus struct.
This adds more switch statements which decreases the performance of the
queue API. Move the type into the parameter list for these functions so
that the compiler can optimize out the switch statements when the explicit
type is known. Modify all the calls in the driver on performance paths to
pass in the explicit queue type.
Link: https://lore.kernel.org/r/20210527194748.662636-4-rpearsonhpe@gmail.com
Link: https://lore.kernel.org/linux-rdma/20210526165239.GP1002214@@nvidia.com/
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Modify the queue APIs to protect all user space index loads with
smp_load_acquire() and all user space index stores with
smp_store_release(). Base this on the types of the queues which can be one
of ..KERNEL, ..FROM_USER, ..TO_USER. Kernel space indices are protected by
locks which also provide memory barriers.
Link: https://lore.kernel.org/r/20210527194748.662636-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
To create optimal code only want to use smp_load_acquire() and
smp_store_release() for user indices in rxe_queue APIs since kernel
indices are protected by locks which also act as memory barriers. By
adding a type to the queues we can determine which indices need to be
protected.
Link: https://lore.kernel.org/r/20210527194748.662636-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Shiraz Saleem says:
====================
Add Intel Ethernet Protocol Driver for RDMA (irdma)
The following patch series introduces a unified Intel Ethernet Protocol
Driver for RDMA (irdma) for the X722 iWARP device and a new E810 device
which supports iWARP and RoCEv2. The irdma module replaces the legacy
i40iw module for X722 and extends the ABI already defined for i40iw. It is
backward compatible with legacy X722 rdma-core provider (libi40iw).
X722 and E810 are PCI network devices that are RDMA capable. The RDMA
block of this parent device is represented via an auxiliary device
exported to 'irdma' using the core auxiliary bus infrastructure recently
added for 5.11 kernel. The parent PCI netdev drivers 'i40e' and 'ice'
register auxiliary RDMA devices with private data/ops encapsulated that
bind to auxiliary drivers registered in irdma module.
Currently, default is RoCEv2 for E810. Runtime support for protocol switch
to iWARP will be made available via devlink in a future patch.
====================
Link: https://lore.kernel.org/r/20210602205138.889-1-shiraz.saleem@intel.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* branch 'irdma':
RDMA/irdma: Update MAINTAINERS file
RDMA/irdma: Add irdma Kconfig/Makefile and remove i40iw
RDMA/irdma: Add ABI definitions
RDMA/irdma: Add dynamic tracing for CM
RDMA/irdma: Add miscellaneous utility definitions
RDMA/irdma: Add user/kernel shared libraries
RDMA/irdma: Add RoCEv2 UD OP support
RDMA/irdma: Implement device supported verb APIs
RDMA/irdma: Add PBLE resource manager
RDMA/irdma: Add connection manager
RDMA/irdma: Add QoS definitions
RDMA/irdma: Add privileged UDA queue implementation
RDMA/irdma: Add HMC backing store setup functions
RDMA/irdma: Implement HW Admin Queue OPs
RDMA/irdma: Implement device initialization definitions
RDMA/irdma: Register auxiliary driver and implement private channel OPs
i40e: Register auxiliary devices to provide RDMA
i40e: Prep i40e header for aux bus conversion
ice: Register auxiliary device to provide RDMA
ice: Implement iidc operations
ice: Initialize RDMA support
iidc: Introduce iidc.h
i40e: Replace one-element array with flexible-array member
Add Kconfig and Makefile to build irdma driver.
Remove i40iw driver and add an alias in irdma.
Remove legacy exported symbols i40e_register_client
and i40e_unregister_client from i40e as they are no
longer used.
irdma is the replacement driver that supports X722.
Link: https://lore.kernel.org/r/20210602205138.889-16-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add definitions for managing the RDMA HW work scheduler (WS) tree.
A WS node is created via a control QP operation with the bandwidth
allocation, arbitration scheme, and traffic class of the QP specified.
The Qset handle returned associates the QoS parameters for the QP.
The Qset is registered with the LAN and an equivalent node is created
in the LAN packet scheduler tree.
Link: https://lore.kernel.org/r/20210602205138.889-7-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
HW uses host memory as a backing store for a number of
protocol context objects and queue state tracking.
The Host Memory Cache (HMC) is a component responsible for
managing these objects stored in host memory.
Add the functions and data structures to manage the allocation
of backing pages used by the HMC for the various objects
Link: https://lore.kernel.org/r/20210602205138.889-5-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>