Commit Graph

752132 Commits

Author SHA1 Message Date
Jason Gunthorpe
c62091bcd9 Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into for-next
mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.

* tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5e: Explicitly set source e-switch in offloaded TC rules
  net/mlx5: Add source e-switch owner
  net/mlx5e: Explicitly set destination e-switch in FDB rules
  net/mlx5: Add destination e-switch owner
  net/mlx5: Properly handle a vport destination when setting FTE
  net/mlx5: Add merged e-switch cap
  IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
2018-05-24 09:40:43 -06:00
Parav Pandit
724631a9c6 IB/core: Introduce and use rdma_gid_table()
There are several places a gid table is accessed.
Have a helper tiny function rdma_gid_table() to avoid code
duplication at such places.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Parav Pandit
25e62655c7 IB/core: Reduce the places that use zgid
Instead of open coding memcmp() to check whether a given GID is zero or
not, use a helper function to do so, and replace instances of
memcpy(z,&zgid) with memset.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Erez Shitrit
7b74a83cf5 IB/mlx5: Fetch soft WQE's on fatal error state
On fatal error the driver simulates CQE's for ULPs that rely on
completion of all their posted work-request.

For the GSI traffic, the mlx5 has its own mechanism that sends the
completions via software CQE's directly to the relevant CQ.

This should be kept in fatal error too, so the driver should simulate
such CQE's with the specified error state in order to complete GSI QP
work requests.

Without the fix the next deadlock might appears:
        schedule_timeout+0x274/0x350
        wait_for_common+0xec/0x240
        mcast_remove_one+0xd0/0x120 [ib_core]
        ib_unregister_device+0x12c/0x230 [ib_core]
        mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
        mlx5_detach_device+0x184/0x1a0 [mlx5_core]
        mlx5_unload_one+0x308/0x340 [mlx5_core]
        mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 89ea94a7b6 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Leon Romanovsky
7a8690ed6f RDMA/ucm: Mark UCM interface as BROKEN
In commit 357d23c811a7 ("Remove the obsolete libibcm library")
in rdma-core [1], we removed obsolete library which used the
/dev/infiniband/ucmX interface.

Following multiple syzkaller reports about non-sanitized
user input in the UCMA module, the short audit reveals the same
issues in UCM module too.

It is better to disable this interface in the kernel,
before syzkaller team invests time and energy to harden
this unused interface.

[1] https://github.com/linux-rdma/rdma-core/pull/279

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Parav Pandit
9906224f60 IB/core: Remove duplicate declaration of gid_cache_wq
Remove duplicate declaration of gid_cache_wq.

Fixes: d41861942 ("IB/core: Add generic function to extract IB speed from netdev")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Leon Romanovsky
8f06228733 RDMA/mlx5: Remove debug prints of VMA pointers
Remove various prints of VMA pointers.

Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
oulijun
cc3391cb53 RDMA/hns: Rename the idx field of db
The lower 15 bit of paramter of db structure means different
meanings when db type is sq, rq and srq.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Mike Marciniszyn
0252f73334 IB/qib: Fix DMA api warning with debug kernel
The following error occurs in a debug build when running MPI PSM:

[  307.415911] WARNING: CPU: 4 PID: 23867 at lib/dma-debug.c:1158
check_unmap+0x4ee/0xa20
[  307.455661] ib_qib 0000:05:00.0: DMA-API: device driver failed to check map
error[device address=0x00000000df82b000] [size=4096 bytes] [mapped as page]
[  307.517494] Modules linked in:
[  307.531584]  ib_isert iscsi_target_mod ib_srpt target_core_mod rpcrdma
sunrpc ib_srp scsi_transport_srp scsi_tgt ib_iser libiscsi ib_ipoib
scsi_transport_iscsi rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
ib_qib intel_powerclamp coretemp rdmavt intel_rapl iosf_mbi kvm_intel kvm
irqbypass crc32_pclmul ghash_clmulni_intel ipmi_ssif ib_core aesni_intel sg
ipmi_si lrw gf128mul dca glue_helper ipmi_devintf iTCO_wdt gpio_ich hpwdt
iTCO_vendor_support ablk_helper hpilo acpi_power_meter cryptd ipmi_msghandler
ie31200_edac shpchp pcc_cpufreq lpc_ich pcspkr ip_tables xfs libcrc32c sd_mod
crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul crct10dif_common
drm crc32c_intel libahci tg3 libata serio_raw ptp i2c_core
[  307.846113]  pps_core dm_mirror dm_region_hash dm_log dm_mod
[  307.866505] CPU: 4 PID: 23867 Comm: mpitests-IMB-MP Kdump: loaded Not
tainted 3.10.0-862.el7.x86_64.debug #1
[  307.911178] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
[  307.944206] Call Trace:
[  307.956973]  [<ffffffffbd9e915b>] dump_stack+0x19/0x1b
[  307.982201]  [<ffffffffbd2a2f58>] __warn+0xd8/0x100
[  308.005999]  [<ffffffffbd2a2fdf>] warn_slowpath_fmt+0x5f/0x80
[  308.034260]  [<ffffffffbd5f667e>] check_unmap+0x4ee/0xa20
[  308.060801]  [<ffffffffbd41acaa>] ? page_add_file_rmap+0x2a/0x1d0
[  308.090689]  [<ffffffffbd5f6c4d>] debug_dma_unmap_page+0x9d/0xb0
[  308.120155]  [<ffffffffbd4082e0>] ? might_fault+0xa0/0xb0
[  308.146656]  [<ffffffffc07761a5>] qib_tid_free.isra.14+0x215/0x2a0 [ib_qib]
[  308.180739]  [<ffffffffc0776bf4>] qib_write+0x894/0x1280 [ib_qib]
[  308.210733]  [<ffffffffbd540b00>] ? __inode_security_revalidate+0x70/0x80
[  308.244837]  [<ffffffffbd53c2b7>] ? security_file_permission+0x27/0xb0
[  308.266025] qib_ib0.8006: multicast join failed for
ff12:401b:8006:0000:0000:0000:ffff:ffff, status -22
[  308.323421]  [<ffffffffbd46f5d3>] vfs_write+0xc3/0x1f0
[  308.347077]  [<ffffffffbd492a5c>] ? fget_light+0xfc/0x510
[  308.372533]  [<ffffffffbd47045a>] SyS_write+0x8a/0x100
[  308.396456]  [<ffffffffbd9ff355>] system_call_fastpath+0x1c/0x21

The code calls a qib_map_page() which has never correctly tested for a
mapping error.

Fix by testing for pci_dma_mapping_error() in all cases and properly
handling the failure in the caller.

Additionally, streamline qib_map_page() arguments to satisfy just
the single caller.

Cc: <stable@vger.kernel.org>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Tested-by: Don Dutile <ddutile@redhat.com>
Reviewed-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Alex Estrin
763b69654b IB/isert: Fix for lib/dma_debug check_sync warning
The following error message occurs on a target host in a debug build
during session login:

[ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0
[ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes]
......snip .....

[ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1
[ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015
[ 3524.555740] Call Trace:
[ 3524.559102]  [<ffffffffa5fe915b>] dump_stack+0x19/0x1b
[ 3524.565477]  [<ffffffffa58a2f58>] __warn+0xd8/0x100
[ 3524.571557]  [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80
[ 3524.578610]  [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0
[ 3524.585177]  [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0
[ 3524.592812]  [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90
[ 3524.601029]  [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20
[ 3524.608574]  [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80
[ 3524.616699]  [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140
[ 3524.623567]  [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert]
[ 3524.633060]  [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert]
[ 3524.641383]  [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430
[ 3524.648561]  [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod]
[ 3524.658557]  [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod]
[ 3524.668084]  [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60
[ 3524.674420]  [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod]
[ 3524.684656]  [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod]
[ 3524.694901]  [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod]
[ 3524.705446]  [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod]
[ 3524.715976]  [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod]
[ 3524.725739]  [<ffffffffa58d60ff>] kthread+0xef/0x100
[ 3524.732007]  [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80
[ 3524.739540]  [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21
[ 3524.747558]  [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80
[ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]---
[ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.

The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it
with dma-mapped address.
login_tx_desc is a part of iser_conn structure and is used only once
during login negotiation, so the issue is fixed by eliminating
dma_sync call for this buffer using a special case routine.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Mike Marciniszyn
3ce459cd68 IB/{rdmavt,hfi1}: Change hrtimer add to use pinned version
Given we are dealing with nano-second level timers, when the timer
pops, ensure it happens on the CPU which caused the timer to be set
in the first place.  This avoids excessive jitter from the desired
expiration time by avoiding the cost of switching our context to
another CPU that is cache cold for this given timer.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Michael J. Ruhl
5938d94cf0 IB/hfi1: Set port number for errorinfo MAD response
For errorinfo MAD requests, the response has a 0 port number left over
from a memset. Instead we should always set the port number in the
response.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Mike Marciniszyn
c8314811f9 IB/hfi1: Cleanup of exp_rcv
The knowledge of the internal workings of the expect receive
is too distributed.

Fix by:
- right size several rcd fields associated with
  expect receive
- making an init entrance to init all the lists
- consolidate all the allocations into an array anchored
  in the rcd

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Don Hiatt
43a68c35c7 IB/hfi1: Add 16B Management Packet trace support
Add trace support for 16B Management Packets.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Don Hiatt
81cd3891f0 IB/hfi1: Add support for 16B Management Packets
16B Management Packets (L4=0x08) replace the BTH and DETH
of normal MAD packet packets with a header containing the
the source and destination queue pair numbers; fields that
were originally retrieved from the BTH/DETH are now populated
from this header as well as from the 16B LRH (e.g. pkey).

16B Management Packets are used as an optimized management
format on 16B fabrics.

These management packets have an opcode of IB_OPCODE_UD_SEND_ONLY,
a fixed 3Byte pad, and a header length of 24Bytes.

The decision as to when we send a management packet is based
upon either the source or destination queue pair number being
0 or 1.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Don Hiatt
4171a693a5 IB/hfi1: Define 16B Management Packets
Add 16B Management Packet definition. This optimized packet
format replaces the ib_other_headers and BTH with a source
and destination QP number.

To support these packets we introduce struct opa_16b_mgmt
into the struct hfi1_16b_header.

This packet format is only used for MAD packets using the
IB_OPCODE_UD_SEND_ONLY opcode on QP0/1.

The original 16B implementation failed to use 16B management
packets so now we add their definition.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Steve Wise
013f64a880 iw_cxgb4: provide detailed driver-specific MR information
Add a table of important fields from the fw_ri_tpte structure to the mr
resource tracking table.  This is helpful in debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Steve Wise
54e7688e54 iw_cxgb4: provide detailed driver-specific CQ information
Add a table of important fields from the c4iw_cq* structures to the cq
resource tracking table.  This is helpful in debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Steve Wise
116aeb8873 iw_cxgb4: provide detailed provider-specific CM_ID information
Add a table of important fields from the c4iw_ep* structures to the cm_id
resource tracking table.  This is helpful in debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Steve Wise
fbdb0a9181 RDMA/CMA: add rdma_iw_cm_id() and rdma_res_to_id() helpers
Add a helper function for iwarp drivers to be able to map an
rdma_cm_id to an iw_cm_id.  This is useful for dumping driver specific
NLDEV/RESTRACK connection state.

Add a helper to return the rdma_cm_id pointer from the rdma_restack
pointer.  This is needed for rdma drivers to map a res entry back to
the public rdma_cm_id struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-22 14:32:30 -04:00
Steve Wise
b06f2efd3b iw_cxgb4: always set iw_cm_id.provider_data
In active side connections, the provider_data field is not
getting set.  This will be used in a subsequent patch to dump
state, so always set it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-22 14:32:30 -04:00
Doug Ledford
fa9391dbad RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events
We do a light flush on CLIENT_REREG and SM_CHANGE events.  This goes
through and marks paths invalid. But we weren't always checking for this
validity when we needed to, and so we could keep using a path marked
invalid.  What's more, once we establish a path with a valid ah, we put
a pointer to the ah in the neigh struct directly, so even if we mark the
path as invalid, as long as the neigh has a direct pointer to the ah, it
keeps using the old, outdated ah.

To fix this we do several things.

1) Put the valid flag in the ah instead of the path struct, so when we
put the ah pointer directly in the neigh struct, we can easily check the
validity of the ah on send events.
2) Check the neigh->ah and neigh->ah->valid elements in the needed
places, and if we have an ah, but it's invalid, then invoke a refresh of
the ah.
3) Fix the various places that check for path, but didn't check for
path->valid (now path->ah && path->ah->valid).

Reported-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>
Fixes: ee1e2c82c2 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-22 13:16:05 -04:00
Shahar Klein
10ff5359f8 net/mlx5e: Explicitly set source e-switch in offloaded TC rules
Set a specific source e-switch when setting a rule that matches on the
ingress port.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:35 -07:00
Shahar Klein
3e99df8772 net/mlx5: Add source e-switch owner
The source e-switch owner allows a vport on one e-switch port be associated
with a rule defined on the second port e-switch.

The role of the source eswitch owner valid bit in the flow group is to
allow the firmware fail driver attempts to wild card the source eswitch
match field. If this bit is not set, the firmware ignores the source
eswitch owner field totally.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Rabie Loulou
56e858df9f net/mlx5e: Explicitly set destination e-switch in FDB rules
Set a specific destination e-switch when setting a destination vport.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Shahar Klein
b17f7fc10f net/mlx5: Add destination e-switch owner
The destination e-switch owner allows a rule in namespace of one e-switch
owner to point to a vport that is natively associated with another
e-switch owner.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Shahar Klein
65360e5451 net/mlx5: Properly handle a vport destination when setting FTE
When creating FTE, properly distinguish between destination being vport
or tir. The previous code just worked accidentally b/c of both dest being
in the same offset within a union.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Roi Dayan
a6d0456912 net/mlx5: Add merged e-switch cap
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.
Under merged eswitch, both the matching on source vport and setting
destination vport can have a 2nd attribute which is the vhca id of the
eswitch owner.

For example:
esw0: {match: <src vport=1 owner=0> action: fwd to <dst vport=7, owner=1>}
is a flow set on eswitch0 matching on source vport=1 from his eswitch
and the action being fwd to dest vport=7 of eswitch1.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz Klein <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Zhu Yanjun
da2f3b286a IB/rxe: avoid calling WARN_ON_ONCE twice
In the exit branch, WARN_ON_ONCE is called to show stack. So it is
not necessary to call WARN_ON_ONCE before going to exit.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-17 09:16:14 -06:00
Yixian Liu
5e6e78dbd3 RDMA/hns: Add 64KB page size support for hip08
This patch adds the support of 64KB page size for hip08
in kernel.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:44:05 -06:00
Sebastian Andrzej Siewior
112f5c8199 IB/ipoib: replace local_irq_disable() with proper locking
In ipoib_mcast_restart_task() the netif_addr_lock() is invoked prior
local_irq_save(). netif_addr_lock() should not be invoked in interrupt disabled
section, only in BH disabled sections.
The priv->lock is always acquired with disabled interrupts. The only place
where netif_addr_lock() and priv->lock nest ist ipoib_mcast_restart_task().

Drop the local_irq_save() and acquire priv->lock with spin_lock_irq() inside
the netif_addr locked section. It's safe to do so because the caller is either
a worker function or __ipoib_ib_dev_flush() which are both calling with
interrupts enabled (and since BH is enabled here, too so
netif_addr_lock_bh() needs to be used).

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:42:06 -06:00
Ariel Levkovich
e818e255a5 IB/mlx5: Expose MPLS related tunneling offloads
This patch reports the device's capbilities to offload
encapsulated MPLS tunnel protocols to user-space:
- Capability to offload MPLS over GRE.
- Capability to offload MPLS over UDP.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
71c6e8638c IB/mlx5: Add support for MPLS flow specification
This patch introduces support for the MPLS flow spec and
allows the creation of rules that are matching on the
MPLS label.

Applying the rule matching depends on the flow specs order and
the location of the MPLS in the spec list as there are different
configurations to be made in the device in the cases of MPLSoGRE
and MPLSoUDP vs. non-encapsulated MPLS.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
da2f22ae77 IB/mlx5: Add support for GRE flow specification
This patch introduces support for the GRE flow spec and
allowing the creation of rules based on the protocol and
key fields that are part of GRE protocol header.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
b04f0f036a IB/uverbs: Introduce a MPLS steering match filter
Add a new MPLS steering match filter that can match against
a single MPLS tag field.

Since the MPLS header can reside in different locations in the packet's
protocol stack as well as be encapsulated with a tunnel protocol, it
is required to know the exact location of the header in the protocol
stack.

Therefore, when including the MPLS protocol spec in the specs list,
it is mandatory to provide the list in an ordered manner, so
that it represents the actual header order in a matching packet.

Drivers that process the spec list and apply the matching rule
should treat the position of the MPLS spec in the spec list as the
actual location of the MPLS label in the packet's protocol stack.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
0d86bbec71 IB/uverbs: Expose MPLS flow spec to the user-kernel ABI header
Add ib_uverbs_flow_spec_mpls to define a rule to match the MPLS
protocol.

The spec includes the generic specs header, type, size and reserved
fields while the filter itself is defined as ib_uverbs_flow_mpls_filter
and includes a single 32bit field named 'label' which consists of:
Bits 0:19  - The MPLS label.
Bits 20:22 - Traffic class field.
Bit  23    - Bottom of stack bit.
Bits 24:31 - Time to live (TTL) field.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:54 -06:00
Ariel Levkovich
d90e5e5038 IB/uverbs: Introduce a GRE steering match filter
Adding a new GRE steering match filter that can match against
key and protocol fields.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:54 -06:00
Ariel Levkovich
20b6563ba1 IB/uverbs: Expose GRE flow spec to the user-kernel ABI header
Add ib_uverbs_flow_spec_gre to define a rule to match the GRE
encapsulation protocol.

The spec includes the generic specs header, type, size and reserved
fields while the filter itself is defined as ib_uverbs_flow_gre_filter
and includes:
1. Checksum present bit, key present bit and version bits in a single
   16bit field.
2. Protocol type field - Indicates the ether protocol type of the
   encapsulated payload.
3. Key field - present if key bit is set and contains an application
   specific key value.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:54 -06:00
Christophe JAILLET
909d434406 IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Fixes: 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16 17:50:19 -07:00
Christophe JAILLET
e574978ae5 net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()'
When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Fixes: fed9ce22bf ("net/mlx5: E-Switch, Add API to create vport rx rules")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16 17:49:01 -07:00
Christophe JAILLET
a5898e97f0 net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Fixes: 9efa752545 ("net/mlx5_core: Introduce access functions to query vport RoCE fields")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16 17:48:17 -07:00
Parav Pandit
e822ff213f IB/cm: Store and restore ah_attr during CM message processing
During CM request processing flow, ah_attr is initialized twice.
First based on wc. Secondly based on primary path record.
ah_attr initialization from path record can fail, which leads to ah_attr
zeroed out.

Therefore, always initialize ah_attr on stack during reinitialization
phase. If ah_attr init is successful, use the new ah_attry by
overwriting the old one. If the ah_attr init fails, continue to use the
last ah_attr.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 14:11:46 -06:00
Parav Pandit
0e225dcb76 IB/cm: Store and restore ah_attr during LAP msg processing
During CM LAP processing, ah_attr is reinitialized on receiving LAP
request. First likely during CM request processing.

ah_attr might get zero out if LAP processing fails.
Therefore, attempt to create new ah_attr for the LAP message.
If the initialization fails, continue with older ah_attr.
If the initialization passes, consider the new ah_attr by overwriting
the older one.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 14:11:46 -06:00
Parav Pandit
a5c57d3272 IB/cm: Avoid AV ah_attr overwriting during LAP message handling
AH attribute of the cm_id can be overwritten if LAP message is received
on CM request which is in progress. This bug got introduced to avoid
sleeping when spin lock is held as part of commit in Fixes tag.

Therefore validate the cm_id state first and continue to perform AV
ah_attr initialization.

Given that Aleternative path related messages are not supported for
RoCE, init_av_from_response/path is such messages are ok to be called
from blocking context.

Fixes: 33f93e1ebc ("IB/cm: Fix sleeping while spin lock is held")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 14:11:46 -06:00
Shiraz Saleem
f43c00c04b i40iw: Extend port reuse support for listeners
If two listeners are created with different IP's but
same port, the second rdma_listen fails due to a
duplicate port entry being added from the CQP add
APBVT OP. commit f16dc0aa5e ("i40iw: Add support
for port reuse on active side connections") does not
account for listener side port reuse.

Check for duplicate port before invoking the CQP command
to add APBVT entry and delete the entry only if the port
is not in use. Additionally, consolidate all port-reuse
logic into i40iw_manage_apbvt.

Fixes: f16dc0aa5e ("i40iw: Add support for port reuse on active side connections")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 13:13:20 -06:00
Yuval Shaia
aec05afe64 IB/core: Remove redundant return
"return" statement at the end of void function is redundant, removing
it.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-15 16:22:02 -06:00
Steve Wise
2d478b2859 iw_cxgb4: remove wr_id attributes
Remove sq/rq wr_id attributes because typically they are pointers and
we don't want to pass up kernel pointers.

Fixes: 056f9c7f39 ("iw_cxgb4: dump detailed driver-specific QP information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-15 16:18:03 -06:00
Steve Wise
e6125a254d RDMA/NLDEV: remove mr iova attribute
Remove mr iova attribute because we don't want to pass up kernel pointers.

Fixes: fccec5b89a ("RDMA/nldev: provide detailed MR information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-15 16:17:38 -06:00
Steve Wise
1ea62e8164 iw_cxgb4: fix uninitialized variable warnings
Fixes: 056f9c7f39 ("iw_cxgb4: dump detailed driver-specific QP information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-15 16:14:25 -06:00
Doug Ledford
0d52d80376 RDMA/uapi: Fix uapi breakage
During this merge window, we added support for addition RDMA netlink
operations.  Unfortunately, we added the items in the middle of our uapi
enum.  Fix that before final release.

Fixes: da5c850782 ("RDMA/nldev: add driver-specific resource
tracking")
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-15 15:54:46 -04:00