OSDN Git Service

uclinux-h8/linux.git
5 years ago  Merge branch 'mlx5-packet-credit-fc' into rdma.git
Jason Gunthorpe [Fri, 7 Dec 2018 20:25:12 +0000 (13:25 -0700)]
Merge branch 'mlx5-packet-credit-fc' into rdma.git

Danit Goldberg says:

Packet based credit mode

Packet based credit mode is an alternative end-to-end credit mode for QPs
set during their creation. Credits are transported from the responder to
the requester to optimize the use of its receive resources.  In
packet-based credit mode, credits are issued on a per packet basis.

This feature is advantageous when sending large RDMA messages through
switches that are short on memory.

The first commit exposes the QP creation flag and the HCA capability. The
second commit adds support for a new DV QP creation flag. The last commit
reports packet based credit mode capability via the MLX5DV device
capabilities.

* branch 'mlx5-packet-credit-fc':
  IB/mlx5: Report packet based credit mode device capability
  IB/mlx5: Add packet based credit mode support
  net/mlx5: Expose packet based credit mode

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
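
As a rough illustration of the two credit modes described in this merge, the
sketch below counts how many end-to-end credits one message would consume in
each mode. It is a simplified userspace model, not mlx5 code: the enum, the
function name credits_needed() and the one-credit-per-unit rule are all
assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical names; this models the accounting only, not the mlx5 driver. */
enum credit_mode {
	CREDIT_MODE_MESSAGE,	/* default: one credit per message */
	CREDIT_MODE_PACKET,	/* packet based: one credit per packet */
};

/* Number of credits one message of msg_len bytes consumes for a given MTU. */
size_t credits_needed(enum credit_mode mode, size_t msg_len, size_t mtu)
{
	size_t packets;

	if (mtu == 0)
		return 0;	/* invalid MTU; nothing can be sent */

	/* A zero-length message still occupies one packet. */
	packets = msg_len ? (msg_len + mtu - 1) / mtu : 1;

	return mode == CREDIT_MODE_PACKET ? packets : 1;
}
```

In this model a 1 MiB message with a 4096-byte MTU consumes 1 credit in
message mode and 256 credits in packet mode, so the responder's receive
resources are metered at packet granularity.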
5 years ago  IB/mlx5: Report packet based credit mode device capability
Danit Goldberg [Fri, 30 Nov 2018 11:22:06 +0000 (13:22 +0200)]
IB/mlx5: Report packet based credit mode device capability

Report packet based credit mode capability via the mlx5 DV interface.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/mlx5: Add packet based credit mode support
Danit Goldberg [Fri, 30 Nov 2018 11:22:05 +0000 (13:22 +0200)]
IB/mlx5: Add packet based credit mode support

The device can support two credit modes, message based (default) and
packet based. In order to enable packet based mode, the QP should be
created with a special flag that indicates this.

This patch adds support for the new DV QP creation flag that can be used
for RC QPs in order to change the credit mode.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  net/mlx5: Expose packet based credit mode
Danit Goldberg [Fri, 30 Nov 2018 11:22:04 +0000 (13:22 +0200)]
net/mlx5: Expose packet based credit mode

The packet based credit mode bit determines whether credits are issued
per message or per packet. Expose the QP creation flag and the HCA
capability.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  IB/rxe: Utilize generic function to validate port number
Yuval Shaia [Thu, 6 Dec 2018 14:02:34 +0000 (16:02 +0200)]
IB/rxe: Utilize generic function to validate port number

Utilize rdma_is_port_valid to validate the given port.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
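
For readers unfamiliar with this kind of helper, the following is a minimal
userspace sketch of an inclusive port range check. It is not the kernel
implementation: struct toy_device, its fields and the toy_* functions are
hypothetical stand-ins, and the switch/CA port numbering is an assumption of
the model.

```c
#include <stdbool.h>

/* Toy stand-in for struct ib_device; field names are assumptions. */
struct toy_device {
	bool is_switch;			/* modelled as exposing only port 0 */
	unsigned int phys_port_cnt;	/* number of physical ports on a CA */
};

unsigned int toy_start_port(const struct toy_device *dev)
{
	return dev->is_switch ? 0 : 1;
}

unsigned int toy_end_port(const struct toy_device *dev)
{
	return dev->is_switch ? 0 : dev->phys_port_cnt;
}

/* A port is valid when it lies in [start, end], inclusive. */
bool toy_is_port_valid(const struct toy_device *dev, unsigned int port)
{
	return port >= toy_start_port(dev) && port <= toy_end_port(dev);
}
```

Centralizing the check in one helper means each driver does not need to
open code its own comparison against the port count.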
5 years ago  IB/rxe: Make function rxe_pool_cleanup return void
Yuval Shaia [Thu, 6 Dec 2018 11:04:38 +0000 (13:04 +0200)]
IB/rxe: Make function rxe_pool_cleanup return void

Since the function always returns 0, make it void.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  RDMA/uverbs: Fix typo in string concatenation macro
Leon Romanovsky [Thu, 6 Dec 2018 10:19:05 +0000 (12:19 +0200)]
RDMA/uverbs: Fix typo in string concatenation macro

Update UVERBS_OBJECT() macro to properly concatenate the object name.

Fixes: e502a864c352 ("IB/core: Introduce DECLARE_UVERBS_GLOBAL_METHODS")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
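
As background on why a small typo in such a macro matters, the sketch below
shows how the preprocessor's ## operator pastes a prefix and an argument into
one identifier. The DEMO_* macros and demo_object_* names are hypothetical
and are not the kernel's UVERBS_OBJECT() definition.

```c
/* Hypothetical macro and object names, for illustration only. */
#define DEMO_OBJECT(id)		demo_object_##id
#define DEMO_STR2(x)		#x
#define DEMO_STR(x)		DEMO_STR2(x)

struct demo_object_def {
	const char *name;
};

/* These two lines declare demo_object_cq and demo_object_qp. */
const struct demo_object_def DEMO_OBJECT(cq) = { "cq" };
const struct demo_object_def DEMO_OBJECT(qp) = { "qp" };
```

If the pasted prefix is misspelled, every user of the macro refers to a
differently named symbol than the one defined by hand elsewhere, which is
the class of problem a fix like this one addresses.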
5 years ago  IB/mlx5: Enable TX on a DEVX flow table
Alex Vesker [Tue, 4 Dec 2018 13:34:05 +0000 (15:34 +0200)]
IB/mlx5: Enable TX on a DEVX flow table

Flow table can be passed as a DEVX object which is a valid destination in
an EGRESS flow. Fix the original code to allow that.

Fixes: a7ee18bdee83 ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  mlx4: Use snprintf instead of complicated strcpy
Qian Cai [Fri, 30 Nov 2018 02:18:07 +0000 (21:18 -0500)]
mlx4: Use snprintf instead of complicated strcpy

This fixes a compilation warning in sysfs.c by eliminating the temporary
stack buffer:

drivers/infiniband/hw/mlx4/sysfs.c:360:2: warning: 'strncpy' output may be
truncated copying 8 bytes from a string of length 31
[-Wstringop-truncation]

Signed-off-by: Qian Cai <cai@gmx.us>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
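
One way to take a fixed-length prefix of a string without a temporary buffer
is the "%.Ns" precision of snprintf. The sketch below illustrates that
technique only; format_vf_name() and its parameters are hypothetical and
this is not claimed to be the exact code in the patch.

```c
#include <stdio.h>

/*
 * Build "<first 8 chars of pci_name><dev>.<fn>" directly into out.
 * The "%.8s" precision copies at most 8 characters of the source, so no
 * temporary prefix buffer (and no strncpy) is needed.
 * Function and parameter names are hypothetical.
 */
int format_vf_name(char *out, size_t max, const char *pci_name, int i)
{
	return snprintf(out, max, "%.8s%.2d.%d", pci_name, i / 8, i % 8);
}
```

For example, format_vf_name(buf, sizeof(buf), "0000:03:00.0", 10) writes
"0000:03:01.2" into buf, and snprintf guarantees NUL termination as long as
max is non-zero.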
5 years ago  IB/hfi1: Reduce lock contention on iowait_lock for sdma and pio
Mike Marciniszyn [Wed, 28 Nov 2018 18:33:00 +0000 (10:33 -0800)]
IB/hfi1: Reduce lock contention on iowait_lock for sdma and pio

Commit 4e045572e2c2 ("IB/hfi1: Add unique txwait_lock for txreq events")
laid the groundwork to support per-resource wait locking.

This patch adds that with a lock unique to each sdma engine and pio
sendcontext and makes necessary changes for verbs, PSM, and vnic to use
the new locks.

This is particularly beneficial for smaller messages that will exhaust
resources at a faster rate.

Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Close VNIC sdma_progress sleep window
Mike Marciniszyn [Wed, 28 Nov 2018 18:32:48 +0000 (10:32 -0800)]
IB/hfi1: Close VNIC sdma_progress sleep window

sdma_progress() is called outside the wait lock.

In this case, there is a race condition where sdma_progress() can return
false and the sdma_engine can idle.  If that happens, there will be no
more sdma interrupts to cause the wakeup and the vnic_sdma xmit will hang.

Fix by moving the lock to enclose the sdma_progress() call.

Also, delete the tx_retry. The need for this was removed by:
commit bcad29137a97 ("IB/hfi1: Serve the most starved iowait entry first")

Fixes: 64551ede6cd1 ("IB/hfi1: VNIC SDMA support")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Allow the driver to initialize QP priv struct
Mike Marciniszyn [Wed, 28 Nov 2018 18:22:31 +0000 (10:22 -0800)]
IB/hfi1: Allow the driver to initialize QP priv struct

This patch adds an interface to allow the driver to initialize the QP priv
struct when the QP is created and after the qpn has been assigned.  A
field is added to the QP priv struct to reference the rcd and two new
files are added to contain the function to initialize the rcd field so
that more TID RDMA related code can be added here later.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Add OPFN and TID RDMA capability bits
Kaike Wan [Wed, 28 Nov 2018 18:22:20 +0000 (10:22 -0800)]
IB/hfi1: Add OPFN and TID RDMA capability bits

The OPFN and TID RDMA capability bits are added to allow users to control
which feature is enabled and disabled.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Unreserve a reserved request when it is completed
Kaike Wan [Wed, 28 Nov 2018 18:22:09 +0000 (10:22 -0800)]
IB/hfi1: Unreserve a reserved request when it is completed

Currently, when a reserved operation is completed, its entry in the send
queue will not be unreserved, which leads to the miscalculation of
qp->s_avail and thus the triggering of a WARN_ON call trace. This patch
fixes the problem by unreserving the reserved operation when it is
completed.

Fixes: 856cc4c237ad ("IB/hfi1: Add the capability for reserved operations")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Consider LMC in 16B/bypass ingress packet check
Ashutosh Dixit [Wed, 28 Nov 2018 18:19:47 +0000 (10:19 -0800)]
IB/hfi1: Consider LMC in 16B/bypass ingress packet check

Ingress packet check for 16B/bypass packets should consider the port
LMC. Not doing this will result in packets sent to the LMC LIDs getting
dropped. The check is implemented in HW for 9B packets.

Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
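
To make the LMC point concrete: a port with LID mask control value lmc owns
2^lmc consecutive LIDs starting at its base LID, so a destination LID check
has to ignore the low lmc bits. The function below is an illustrative sketch
with a hypothetical name, not the hfi1 driver's check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when dlid falls in the range of LIDs owned by a port whose
 * base LID is base_lid and whose LID mask control value is lmc.
 * The low lmc bits are ignored in the comparison.
 */
bool lid_matches_port(uint32_t dlid, uint32_t base_lid, unsigned int lmc)
{
	uint32_t mask = ~((UINT32_C(1) << lmc) - 1);

	return ((dlid ^ base_lid) & mask) == 0;
}
```

With base LID 0x10 and lmc 2, LIDs 0x10 through 0x13 match; a check that
compared against the base LID alone would drop packets sent to 0x11, 0x12
and 0x13.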
5 years ago  IB/hfi1: Incorrect sizing of sge for PIO will OOPs
Michael J. Ruhl [Wed, 28 Nov 2018 18:19:36 +0000 (10:19 -0800)]
IB/hfi1: Incorrect sizing of sge for PIO will OOPs

An incorrect sge sizing in the HFI PIO path will cause an OOPs similar to
this:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] hfi1_verbs_send_pio+0x3d8/0x530 [hfi1]
PGD 0
Oops: 0000 1 SMP
 Call Trace:
 ? hfi1_verbs_send_dma+0xad0/0xad0 [hfi1]
 hfi1_verbs_send+0xdf/0x250 [hfi1]
 ? make_rc_ack+0xa80/0xa80 [hfi1]
 hfi1_do_send+0x192/0x430 [hfi1]
 hfi1_do_send_from_rvt+0x10/0x20 [hfi1]
 rvt_post_send+0x369/0x820 [rdmavt]
 ib_uverbs_post_send+0x317/0x570 [ib_uverbs]
 ib_uverbs_write+0x26f/0x420 [ib_uverbs]
 ? security_file_permission+0x21/0xa0
 vfs_write+0xbd/0x1e0
 ? mntput+0x24/0x40
 SyS_write+0x7f/0xe0
 system_call_fastpath+0x16/0x1b

Fix by adding the missing sizing check to correctly determine the sge
length.

Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Limit VNIC use of SDMA engines to the available count
Michael J. Ruhl [Wed, 28 Nov 2018 18:19:25 +0000 (10:19 -0800)]
IB/hfi1: Limit VNIC use of SDMA engines to the available count

VNIC assumes that all SDMA engines have been configured for use.  This is
not necessarily true (i.e. if the count was constrained by the module
parameter).

Update VNIC to use the configured count rather than the hardware
count.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Correctly process FECN and BECN in packets
Mitko Haralanov [Wed, 28 Nov 2018 18:19:15 +0000 (10:19 -0800)]
IB/hfi1: Correctly process FECN and BECN in packets

A CA is supposed to ignore FECN bits in multicast, ACK, and CNP
packets. This patch corrects the behavior of the HFI1 driver in this
regard by ignoring FECNs in those packet types.

While fixing the above behavior, fix the extraction of the FECN and BECN
bits from the packet headers for both 9B and 16B packets.

Furthermore, this patch corrects the driver's response to a FECN in RDMA
READ RESPONSE packets. Instead of sending an "empty" ACK, the driver now
sends a CNP packet. While editing that code path, add the missing trace
for CNP packets.

Fixes: 88733e3b8450 ("IB/hfi1: Add 16B UD support")
Fixes: f59fb9e05109 ("IB/hfi1: Fix handling of FECN marked multicast packet")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
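
The rules stated in this commit message can be summarized as a small decision
function. The sketch below is a toy model: the enums and fecn_response() are
hypothetical, and packet kinds the message does not discuss are deliberately
left as "not modelled" instead of guessed.

```c
#include <stdbool.h>

/* Hypothetical packet classification; not the hfi1 driver's own types. */
enum pkt_kind {
	PKT_OTHER,
	PKT_RDMA_READ_RESPONSE,
	PKT_MULTICAST,
	PKT_ACK,
	PKT_CNP,
};

enum fecn_action {
	FECN_IGNORE,		/* take no congestion action */
	FECN_SEND_CNP,		/* answer with a CNP packet */
	FECN_NOT_MODELLED,	/* handling not described in this sketch */
};

/* Decide how to react to a packet, given whether its FECN bit is set. */
enum fecn_action fecn_response(enum pkt_kind kind, bool fecn)
{
	if (!fecn)
		return FECN_IGNORE;

	switch (kind) {
	case PKT_MULTICAST:
	case PKT_ACK:
	case PKT_CNP:
		return FECN_IGNORE;	/* a CA ignores FECN on these */
	case PKT_RDMA_READ_RESPONSE:
		return FECN_SEND_CNP;	/* instead of an "empty" ACK */
	default:
		return FECN_NOT_MODELLED;
	}
}
```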
5 years ago  IB/hfi1: Ignore LNI errors before DC8051 transitions to Polling state
Kaike Wan [Wed, 28 Nov 2018 18:19:04 +0000 (10:19 -0800)]
IB/hfi1: Ignore LNI errors before DC8051 transitions to Polling state

When DC8051 is requested to change its physical state back to Offline while
it is in the process of coming up, it will set the ERROR field in the
DC8051_DBG_ERR_INFO_SET_BY_8051 register. This ERROR field will remain
until the next time when DC8051 transitions from Offline to Polling.
Subsequently, when the host requests DC8051 to change its physical state
to Polling again, it may receive a DC8051 interrupt with the stale ERROR
field still in DC8051_DBG_ERR_INFO_SET_BY_8051. If the host link state has
been changed to Polling, this stale ERROR will force the host to
transition to Offline state, resulting in a vicious cycle of Polling
->Offline->Polling->Offline. On the other hand, if the host link state is
still Offline when the stale ERROR is received, the stale ERROR will be
ignored, and the link will come up correctly.  This patch implements the
correct behavior by changing host link state to Polling only after DC8051
changes its physical state to Polling.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  IB/hfi1: Dump pio info for non-user send contexts
Kaike Wan [Wed, 28 Nov 2018 18:14:32 +0000 (10:14 -0800)]
IB/hfi1: Dump pio info for non-user send contexts

This patch dumps the pio info for non-user send contexts to assist
debugging in the field.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  RDMA/hns: Add SRQ asynchronous event support
Lijun Ou [Sat, 24 Nov 2018 08:49:22 +0000 (16:49 +0800)]
RDMA/hns: Add SRQ asynchronous event support

This patch implements the process flow of SRQ asynchronous
event.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  RDMA/hns: Add SRQ support for hip08 kernel mode
Lijun Ou [Sat, 24 Nov 2018 08:49:21 +0000 (16:49 +0800)]
RDMA/hns: Add SRQ support for hip08 kernel mode

This patch implements the SRQ (Shared Receive Queue) verbs
and updates the poll CQ verb to deal with SRQ completions.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  RDMA/hns: Init SRQ table for hip08
Lijun Ou [Sat, 24 Nov 2018 08:49:20 +0000 (16:49 +0800)]
RDMA/hns: Init SRQ table for hip08

This patch initializes the HEM resources for the SRQ table, including
the SRQWQE and SRQWQE index resources.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  RDMA/hns: Eanble SRQ capacity for hip08
Lijun Ou [Sat, 24 Nov 2018 08:49:19 +0000 (16:49 +0800)]
RDMA/hns: Eanble SRQ capacity for hip08

This patch configures the flags that enable the SRQ (Shared Receive
Queue) capability and updates the query device verb to report the SRQ
specifications.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years ago  Merge branch 'mlx5-devx' into wip/dl-for-next
Doug Ledford [Tue, 4 Dec 2018 19:39:23 +0000 (14:39 -0500)]
Merge branch 'mlx5-devx' into wip/dl-for-next

From Yishai,
-----------------------------------
This series enriches DEVX support in a few aspects: it enables interoperability
between DEVX and verbs and improves the mechanism for controlling privileged DEVX
commands.

The first patch updates mlx5 ifc file.

The next 3 patches enable modifying and querying verbs objects via the DEVX
interface.

To achieve that, the core layer introduces the 'UVERBS_IDR_ANY_OBJECT' type
to match any IDR object. Once it is used by a driver's method, the
infrastructure skips checking for the IDR type and that check becomes the
driver handler's responsibility.

The DEVX methods of modify and query were changed to get any object type via
the 'UVERBS_IDR_ANY_OBJECT' mechanism. The type checking is done per object as
part of the driver code.

The next 3 patches introduce a more robust mechanism for controlling privileged
DEVX commands. The responsibility to block or allow each command was moved to
the firmware, based on the UID credentials that the driver reports upon
user context creation. This enables more granularity per command based on the
device security model and the user credentials.

In addition, by introducing a valid range for 'general commands' we avoid the
need to touch the driver's code each time a new command is added.

The last patch fixes the XRC verbs flow once a DEVX context is used. This is
needed as XRCD is a shared kernel resource and as such a kernel UID (=0)
should be used in its related resources.

Thanks

Yishai Hadas
-----------------------------------

The top 6 patches are the mlx5-devx series; the remainder are from the
mlx5-next tree as the mlx5-devx series depended on the mlx5-next
mlx5_ifc file update.

* mlx5-devx: (42 commits)
  IB/mlx5: Allow XRC usage via verbs in DEVX context
  IB/mlx5: Update the supported DEVX commands
  IB/mlx5: Enforce DEVX privilege by firmware
  IB/mlx5: Enable modify and query verbs objects via DEVX
  IB/core: Enable getting an object type from a given uobject
  IB/core: Introduce UVERBS_IDR_ANY_OBJECT
  net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits
  RDMA/mlx5: Unfold modify RMP function
  RDMA/mlx5: Unfold create RMP function
  RDMA/mlx5: Initialize SRQ tables on mlx5_ib
  RDMA/mlx5: Update SRQ functions signatures to mlx5_ib format
  RDMA/mlx5: Use stages for callback to setup and release DEVX
  RDMA/mlx5: Remove SRQ signature global flag
  net/mlx5: Move SRQ functions to RDMA part
  net/mlx5: Remove references to local mlx5_core functions
  net/mlx5: Remove not-used lib/eq.h header file
  net/mlx5: Remove dead transobj code
  net/mlx5: Align SRQ licenses and copyright information
  net/mlx5: Debug print for forwarded async events
  net/mlx5: Forward SRQ resource events
  ...

Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  IB/mlx5: Allow XRC usage via verbs in DEVX context
Yishai Hadas [Mon, 26 Nov 2018 06:28:38 +0000 (08:28 +0200)]
IB/mlx5: Allow XRC usage via verbs in DEVX context

Allow XRC usage from the verbs flow in a DEVX context.
As XRCD is a kernel resource shared between processes, it should be
created with UID=0 to indicate that.

As a result, once XRC QP/SRQ are created they must also be used with
UID=0 so that firmware will allow the XRCD usage.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  IB/mlx5: Update the supported DEVX commands
Yishai Hadas [Mon, 26 Nov 2018 06:28:37 +0000 (08:28 +0200)]
IB/mlx5: Update the supported DEVX commands

Update the supported DEVX commands. This includes additions to the
query/modify command lists and to the encoding handling.

In addition, a valid range for general commands was added to be used for
future commands.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  IB/mlx5: Enforce DEVX privilege by firmware
Yishai Hadas [Mon, 26 Nov 2018 06:28:36 +0000 (08:28 +0200)]
IB/mlx5: Enforce DEVX privilege by firmware

Enforce DEVX privilege by firmware. This enables future device
functionality without the need to make driver changes unless a new
privilege type is introduced.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  IB/mlx5: Enable modify and query verbs objects via DEVX
Yishai Hadas [Mon, 26 Nov 2018 06:28:35 +0000 (08:28 +0200)]
IB/mlx5: Enable modify and query verbs objects via DEVX

Enable modifying and querying verbs objects via the DEVX interface.
To support this, the DEVX modify and query handlers were changed to get any
object type via the UVERBS_IDR_ANY_OBJECT mechanism.

The type checking and handling is done per object as part of the
driver code.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  IB/core: Enable getting an object type from a given uobject
Yishai Hadas [Mon, 26 Nov 2018 06:28:34 +0000 (08:28 +0200)]
IB/core: Enable getting an object type from a given uobject

Enable getting an object type from a given uobject. The type is saved
upon tree merging and is returned by a helper function.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  IB/core: Introduce UVERBS_IDR_ANY_OBJECT
Yishai Hadas [Mon, 26 Nov 2018 06:28:33 +0000 (08:28 +0200)]
IB/core: Introduce UVERBS_IDR_ANY_OBJECT

Introduce the UVERBS_IDR_ANY_OBJECT type to match any IDR object.

Once used, the infrastructure skips checking for the IDR type; that check
becomes the driver handler's responsibility.

This enables drivers to get, in a given method, an object of any of
various types.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
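
The division of responsibility described above can be sketched as a wildcard
in the core's type check plus an explicit check in the driver handler. This
is a toy userspace model: enum obj_type, struct toy_uobject and both
functions are hypothetical and do not reflect the real uverbs structures.

```c
#include <stdbool.h>

/* Hypothetical object types; OBJ_ANY is a wildcard for the core check. */
enum obj_type { OBJ_ANY = -1, OBJ_CQ, OBJ_QP, OBJ_SRQ };

struct toy_uobject {
	enum obj_type type;
};

/* Core-side check: a method that declares OBJ_ANY skips type matching. */
bool core_type_ok(enum obj_type wanted, const struct toy_uobject *uobj)
{
	return wanted == OBJ_ANY || uobj->type == wanted;
}

/* Driver-side check: with OBJ_ANY the handler validates the type itself. */
bool driver_accepts(const struct toy_uobject *uobj)
{
	return uobj->type == OBJ_CQ || uobj->type == OBJ_QP;
}
```

In this model an SRQ object passes the core check of an OBJ_ANY method, so
the handler must reject it on its own if it only supports CQs and QPs.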
5 years ago  Merge 'mlx5-next' into mlx5-devx
Doug Ledford [Tue, 4 Dec 2018 18:36:57 +0000 (13:36 -0500)]
Merge 'mlx5-next' into mlx5-devx

The enhanced devx support series needs commit:
9d43faac02e3 ("net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits")

Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits
Yishai Hadas [Mon, 26 Nov 2018 06:28:32 +0000 (08:28 +0200)]
net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits

Expose device capabilities for DEVX user context. This includes which caps
the device supports and a matching bit to set as part of user
context creation.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  RDMA/mlx5: Unfold modify RMP function
Leon Romanovsky [Wed, 28 Nov 2018 18:53:43 +0000 (20:53 +0200)]
RDMA/mlx5: Unfold modify RMP function

There is no need to perform modify_rmp in two separate functions,
where one uses the stack as a placeholder for data while the other
allocates it dynamically. Combine those two functions into one call.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  RDMA/mlx5: Unfold create RMP function
Leon Romanovsky [Wed, 28 Nov 2018 18:53:42 +0000 (20:53 +0200)]
RDMA/mlx5: Unfold create RMP function

There is no need to perform create_rmp in two separate functions, where
one uses the stack as a placeholder for data while the other allocates
it dynamically. Combine those two functions into one.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  RDMA/mlx5: Initialize SRQ tables on mlx5_ib
Leon Romanovsky [Wed, 28 Nov 2018 18:53:41 +0000 (20:53 +0200)]
RDMA/mlx5: Initialize SRQ tables on mlx5_ib

Transfer initialization and cleanup from mlx5_priv struct of
mlx5_core_dev to be part of mlx5_ib_dev. This completes removal
of SRQ from mlx5_core.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  RDMA/mlx5: Update SRQ functions signatures to mlx5_ib format
Leon Romanovsky [Wed, 28 Nov 2018 18:53:40 +0000 (20:53 +0200)]
RDMA/mlx5: Update SRQ functions signatures to mlx5_ib format

Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by
updating function signatures so that they do not require mlx5_core_dev as
an input, because all operations in mlx5_ib are supposed to use mlx5_ib_dev.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  RDMA/mlx5: Use stages for callback to setup and release DEVX
Leon Romanovsky [Wed, 28 Nov 2018 18:53:39 +0000 (20:53 +0200)]
RDMA/mlx5: Use stages for callback to setup and release DEVX

Reuse existing infrastructure to initialize and release DEVX uid.
The DevX interface is intended for user space access, so it is supposed
to be initialized before ib_register_device(). Also, it isn't supported
in switchdev mode, so there is no need to initialize it in that mode.

Fixes: 76dc5a8406bf ("IB/mlx5: Manage device uid for DEVX white list commands")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  RDMA/mlx5: Remove SRQ signature global flag
Leon Romanovsky [Wed, 28 Nov 2018 18:53:38 +0000 (20:53 +0200)]
RDMA/mlx5: Remove SRQ signature global flag

SRQ signature is not supported, hence there is no need for a special
static global variable to announce it.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  net/mlx5: Move SRQ functions to RDMA part
Leon Romanovsky [Wed, 28 Nov 2018 18:53:37 +0000 (20:53 +0200)]
net/mlx5: Move SRQ functions to RDMA part

There is no need to keep SRQ, which is an RDMA object, in mlx5_core.
In this patch, we partially move the execution code, while the next patches
will move the table initialization/release logic too.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  net/mlx5: Remove references to local mlx5_core functions
Leon Romanovsky [Wed, 28 Nov 2018 18:53:36 +0000 (20:53 +0200)]
net/mlx5: Remove references to local mlx5_core functions

As a preparation to move SRQ functionality to RDMA, drop all references
to mlx5_core logic and make SRQ be dependent on shared code only.

Most of the time, we are interested in knowing whether events are working
or not, and that is possible with the previous commit ("net/mlx5: Debug
print for forwarded async events").

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  net/mlx5: Remove not-used lib/eq.h header file
Leon Romanovsky [Wed, 28 Nov 2018 18:53:35 +0000 (20:53 +0200)]
net/mlx5: Remove not-used lib/eq.h header file

lib/eq.h is needed for EQ manipulation, which is not performed in SRQ.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  net/mlx5: Remove dead transobj code
Leon Romanovsky [Wed, 28 Nov 2018 18:53:34 +0000 (20:53 +0200)]
net/mlx5: Remove dead transobj code

Delete functions which are not called and not needed.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  net/mlx5: Align SRQ licenses and copyright information
Leon Romanovsky [Wed, 28 Nov 2018 18:53:33 +0000 (20:53 +0200)]
net/mlx5: Align SRQ licenses and copyright information

Ensure that both RDMA and netdev parts of the SRQ implementation
have the same copyright and license information annotated by SPDX
tags.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years ago  RDMA/nldev: Export to user space number of contexts
Leon Romanovsky [Wed, 28 Nov 2018 11:16:45 +0000 (13:16 +0200)]
RDMA/nldev: Export to user space number of contexts

[leonro@server ~]$ rdma res show
1: mlx5_0: pd 3 cq 5 qp 4 cm_id 0 mr 0 ctx 0

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  RDMA/uverbs: Annotate alloc/deallloc paths with context tracking
Leon Romanovsky [Wed, 28 Nov 2018 11:16:44 +0000 (13:16 +0200)]
RDMA/uverbs: Annotate alloc/deallloc paths with context tracking

Add restrack annotations to track allocations of ucontexts.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  RDMA/restrack: Track ucontext
Leon Romanovsky [Wed, 28 Nov 2018 11:16:43 +0000 (13:16 +0200)]
RDMA/restrack: Track ucontext

Add the ability to track allocated ib_ucontext objects, which are a
limited resource and worth making visible to users.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years ago  Merge branch 'write-handler-consistent-flow' into for-next
Doug Ledford [Mon, 3 Dec 2018 17:20:53 +0000 (12:20 -0500)]
Merge branch 'write-handler-consistent-flow' into for-next

Make all of the write() handlers use a consistent flow

From Jason,

This series unifies all the write handlers to use a flow that is very
similar to the ioctl handler flow, including having the same basic
assumptions about extensible buffer handling and the same handler
function call signature.

Along the way this consolidates all the copy_to/from_user into a small
set of safe buffer accessor functions tailored to the usage here. These
accessors use the new dispatcher-controlled calling convention for ucore
data, and support a placement of the response that does not rely on the
cmd.response value.

Overall this brings in strong bounds checking to all the write()
handlers and consistent enforcement of the zero-fill/zero-check
methodology for buffer extension.

The end result is a significant complexity reduction for all of the
handlers and creates a high degree of uniformity between the write,
write_ex, and ioctl handlers and dispatch flow.

Thanks

Jason Gunthorpe (12):
  RDMA/uverbs: Remove out_len checks that are now done by the core
  RDMA/uverbs: Use uverbs_attr_bundle to pass ucore for write/write_ex
  RDMA/uverbs: Get rid of the 'callback' scheme in the compat path
  RDMA/uverbs: Use uverbs_response() for remaining response copying
  RDMA/uverbs: Use uverbs_request() for request copying
  RDMA/uverbs: Use uverbs_request() and core for write_ex handlers
  RDMA/uverbs: Fill in the response for IB_USER_VERBS_EX_CMD_MODIFY_QP
  RDMA/uverbs: Simplify ib_uverbs_ex_query_device
  RDMA/uverbs: Add a simple iterator interface for reading the command
  RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()
  RDMA/uverbs: Do not check the input length on create_cq/qp paths
  RDMA/uverbs: Use only attrs for the write() handler signature

 drivers/infiniband/core/rdma_core.h   |    5 +-
 drivers/infiniband/core/uverbs_cmd.c  | 1165 ++++++++++---------------
 drivers/infiniband/core/uverbs_main.c |   23 +-
 drivers/infiniband/core/uverbs_uapi.c |   23 +-
 include/rdma/uverbs_ioctl.h           |    9 +-
 5 files changed, 479 insertions(+), 746 deletions(-)

Signed-off-by: Doug Ledford <dledford@redhat.com>
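
The zero-fill/zero-check methodology mentioned in this series can be sketched
as two small helpers. This is a userspace illustration only: the function
names are hypothetical, memcpy stands in for copy_from_user, and the real
accessors in uverbs may differ in shape.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero-fill: copy a user command of user_len bytes into a kernel-side
 * structure of ksize bytes. Bytes the user did not supply read as zero.
 */
void copy_request_zero_fill(void *kdst, size_t ksize,
			    const void *usrc, size_t user_len)
{
	size_t n = user_len < ksize ? user_len : ksize;

	memcpy(kdst, usrc, n);
	memset((char *)kdst + n, 0, ksize - n);
}

/*
 * Zero-check: return 1 when every byte the user supplied beyond ksize is
 * zero, i.e. a newer user did not ask for anything this kernel lacks.
 */
int trailing_bytes_are_zero(const void *usrc, size_t ksize, size_t user_len)
{
	const unsigned char *p = usrc;
	size_t i;

	for (i = ksize; i < user_len; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

Together the two rules let old user space talk to a newer kernel (short
buffers are zero-extended) and new user space talk to an older kernel
(long buffers are accepted only if the unknown tail is zero).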
5 years ago  RDMA/uverbs: Use only attrs for the write() handler signature
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:45 +0000 (20:58 +0200)]
RDMA/uverbs: Use only attrs for the write() handler signature

All of the old arguments can be derived from the uverbs_attr_bundle
structure, so get rid of the redundant arguments. Most of the prior work
has been removing users of the arguments to allow this to be a simple
patch.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Do not check the input length on create_cq/qp paths
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:44 +0000 (20:58 +0200)]
RDMA/uverbs: Do not check the input length on create_cq/qp paths

If the user did not provide a long enough command buffer then the missing
bytes are forced to zero. There is no reason to check the length if a zero
value is OK.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:43 +0000 (20:58 +0200)]
RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()

This has a very complicated memory layout, with two flex arrays. Use
the iterator API to make reading it clearer.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Add a simple iterator interface for reading the command
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:42 +0000 (20:58 +0200)]
RDMA/uverbs: Add a simple iterator interface for reading the command

Several methods have a command with a trailing flex array, and they
all open code some extraction scheme. Centralize this into a simple
iterator API.
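
As an illustration of the idea, a bounds-checked reader over a command
buffer can be modelled in plain userspace C roughly as below. This is only
a sketch: the names cmd_iter, cmd_iter_start, cmd_iter_next and
cmd_iter_finish are hypothetical and are not the kernel's functions, and
the error codes are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of an iterator over a command buffer. */
struct cmd_iter {
	const unsigned char *cur;	/* next unread byte */
	const unsigned char *end;	/* one past the last byte */
};

static void cmd_iter_start(struct cmd_iter *it, const void *buf, size_t len)
{
	it->cur = buf;
	it->end = it->cur + len;
}

/* Copy the next len bytes out, refusing to read past the end. */
static int cmd_iter_next(struct cmd_iter *it, void *dst, size_t len)
{
	if ((size_t)(it->end - it->cur) < len)
		return -ENOSPC;		/* assumed error code */
	memcpy(dst, it->cur, len);
	it->cur += len;
	return 0;
}

/* Any bytes left unread must be zero (assumed policy). */
static int cmd_iter_finish(const struct cmd_iter *it)
{
	const unsigned char *p;

	for (p = it->cur; p < it->end; p++)
		if (*p != 0)
			return -EOPNOTSUPP;
	return 0;
}
```

A handler would read the fixed header first, then call cmd_iter_next()
once per flex array element, and finally call cmd_iter_finish().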

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Simplify ib_uverbs_ex_query_device
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:41 +0000 (20:58 +0200)]
RDMA/uverbs: Simplify ib_uverbs_ex_query_device

We truncate the response structure if there is not enough room in the
user buffer so there is no reason to have all the mess with finely managing
response_length. Just fully fill the attrs and truncate on copy.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Fill in the response for IB_USER_VERBS_EX_CMD_MODIFY_QP
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:40 +0000 (20:58 +0200)]
RDMA/uverbs: Fill in the response for IB_USER_VERBS_EX_CMD_MODIFY_QP

A response struct was defined, and userspace is providing it (but not
checking it). Fill it in and write it out.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Use uverbs_request() and core for write_ex handlers
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:39 +0000 (20:58 +0200)]
RDMA/uverbs: Use uverbs_request() and core for write_ex handlers

The write_ex handlers have this horrible boilerplate in every function to
do the zero extend/zero check and min size checks. This is now handled in
the core code via the meta-data, and the zero checks are handled by
uverbs_request(). Replace all the occurrences.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Use uverbs_request() for request copying
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:38 +0000 (20:58 +0200)]
RDMA/uverbs: Use uverbs_request() for request copying

This function properly zero-extends, and zero-checks if the user
buffer is not the same size as the kernel command struct.
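
The zero-extend/zero-check rule can be sketched in userspace C as follows.
This is a simplified model, not the kernel's uverbs_request(): the name
model_request is hypothetical, plain memory stands in for the user buffer,
and the -EOPNOTSUPP return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: copy a caller-supplied command into a fixed struct. */
static int model_request(void *req, size_t req_len,
			 const void *in, size_t in_len)
{
	const unsigned char *src = in;
	size_t n = in_len < req_len ? in_len : req_len;
	size_t i;

	memcpy(req, src, n);

	/* Short input: zero-extend the fields the caller did not send. */
	if (in_len < req_len)
		memset((unsigned char *)req + in_len, 0, req_len - in_len);

	/* Long input: the bytes we do not understand must all be zero. */
	for (i = req_len; i < in_len; i++)
		if (src[i] != 0)
			return -EOPNOTSUPP;	/* assumed error code */

	return 0;
}
```

With this rule an older caller sending a shorter struct behaves as if the
new fields were zero, and a newer caller is rejected only when it actually
sets a field this side does not know about.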

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Use uverbs_response() for remaining response copying
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:37 +0000 (20:58 +0200)]
RDMA/uverbs: Use uverbs_response() for remaining response copying

This function properly truncates and zero-fills the response, which is the
standard used by the ioctl uAPI when working with user data.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Get rid of the 'callback' scheme in the compat path
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:36 +0000 (20:58 +0200)]
RDMA/uverbs: Get rid of the 'callback' scheme in the compat path

There is no reason for this. For response processing we simply need to
copy, truncate, and zero fill the response into whatever output buffer
was provided. Add a function uverbs_response() that does this
consistently.
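
The copy, truncate, and zero-fill behaviour described above can be
modelled in userspace C like this. It is a sketch only: model_response is
a hypothetical name, and ordinary memory stands in for the user-provided
output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: write resp into an output buffer of out_len bytes.
 * Returns the number of response bytes that were actually copied.
 */
static size_t model_response(void *out, size_t out_len,
			     const void *resp, size_t resp_len)
{
	size_t n = resp_len < out_len ? resp_len : out_len;

	/* Truncate: never write more than the caller has room for. */
	memcpy(out, resp, n);

	/* Zero-fill: clear whatever the response does not cover. */
	if (out_len > resp_len)
		memset((unsigned char *)out + resp_len, 0, out_len - resp_len);

	return n;
}
```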

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Use uverbs_attr_bundle to pass ucore for write/write_ex
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:35 +0000 (20:58 +0200)]
RDMA/uverbs: Use uverbs_attr_bundle to pass ucore for write/write_ex

This creates a consistent way to access the two core buffers across write
and write_ex handlers.

Remove the open coded ucore conversion in the write/ex compatibility
handlers.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Remove out_len checks that are now done by the core
Jason Gunthorpe [Sun, 25 Nov 2018 18:58:34 +0000 (20:58 +0200)]
RDMA/uverbs: Remove out_len checks that are now done by the core

write() methods must work with fixed sized structures as that is the only
way to know where the udata segment starts. The common udata code now
rejects any write() that has a response buffer shorter than the core's
response.

Thus all the checks of out_len for write methods are redundant and can be
removed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agonet/mlx5: Debug print for forwarded async events
Saeed Mahameed [Mon, 26 Nov 2018 22:39:08 +0000 (14:39 -0800)]
net/mlx5: Debug print for forwarded async events

Print a debug message for every async FW event forwarded to mlx5
interfaces (mlx5e netdev and mlx5_ib rdma module).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Forward SRQ resource events
Saeed Mahameed [Mon, 26 Nov 2018 22:39:07 +0000 (14:39 -0800)]
net/mlx5: Forward SRQ resource events

Allow forwarding of SRQ events to mlx5_core interfaces, e.g. mlx5_ib.
Use mlx5_notifier_register/unregister in srq.c in order to allow seamless
transition of srq.c to infiniband subsystem.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Forward QP/WorkQueues resource events
Saeed Mahameed [Mon, 26 Nov 2018 22:39:06 +0000 (14:39 -0800)]
net/mlx5: Forward QP/WorkQueues resource events

Allow forwarding QP and WQ events to mlx5_core interfaces, e.g. mlx5_ib

Use mlx5_notifier_register/unregister in qp.c in order to allow seamless
transition of qp.c to infiniband subsystem.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Remove all deprecated software versions of FW events
Saeed Mahameed [Mon, 26 Nov 2018 22:39:05 +0000 (14:39 -0800)]
net/mlx5: Remove all deprecated software versions of FW events

Before the new mlx5 event notification infrastructure and API,
mlx5_core used to process all events before forwarding them to mlx5
interfaces (mlx5e/mlx5_ib) and used to translate the event type enum
to a software defined enum. This is not needed anymore, since it is fine
for mlx5e and mlx5_ib to receive FW events as is, at least the few that
mlx5 core allows.

mlx5e and mlx5_ib have already moved to the new API and handle only FW
event types, so it is now safe to remove all equivalent software defined
events and the logic around them.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoIB/mlx5: Handle raw delay drop general event
Saeed Mahameed [Mon, 26 Nov 2018 22:39:04 +0000 (14:39 -0800)]
IB/mlx5: Handle raw delay drop general event

Handle the FW general event for RQ delay drop as it was received from FW
via the mlx5 notifiers API, instead of handling the processed software
version of that event. After this patch we can safely remove all software
processed FW event types and definitions.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Allow forwarding event type general event as is
Saeed Mahameed [Mon, 26 Nov 2018 22:39:03 +0000 (14:39 -0800)]
net/mlx5: Allow forwarding event type general event as is

The FW general event is used by mlx5_ib for RQ delay drop timeout event
handling. This patch allows forwarding the FW general event type to the
mlx5 notifiers chain, so that mlx5_ib can handle it and the software
version of it can be deprecated.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoIB/mlx5: Handle raw port change event rather than the software version
Saeed Mahameed [Mon, 26 Nov 2018 22:39:02 +0000 (14:39 -0800)]
IB/mlx5: Handle raw port change event rather than the software version

Use the FW version of the port change event as forwarded via new mlx5
notifiers API.

After this patch, the processed software version of the port change event
becomes deprecated and will be removed entirely in follow-up patches.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Remove unused events callback and logic
Saeed Mahameed [Mon, 26 Nov 2018 22:39:01 +0000 (14:39 -0800)]
net/mlx5: Remove unused events callback and logic

The mlx5_interface->event callback is not used by mlx5e/mlx5_ib anymore.

Remove the delayed events workaround entirely: with the dynamic notifier
registration API it is no longer needed, because mlx5_ib can register its
notifier and start receiving events exactly at the moment it is ready to
handle them.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoIB/mlx5: Use the new mlx5 core notifier API
Saeed Mahameed [Mon, 26 Nov 2018 22:39:00 +0000 (14:39 -0800)]
IB/mlx5: Use the new mlx5 core notifier API

Remove the deprecated mlx5_interface->event mlx5_ib callback and use new
mlx5 notifier API to subscribe for mlx5 events.

For the native mlx5_ib device profiles (pf_profile/nic_rep_profile),
register the notifier callback mlx5_ib_handle_event, which treats the
notifier context as mlx5_ib_dev.

For vport representors, don't register any notifier. As before, they do
not receive any mlx5 events.

For a slave port (mlx5_ib_multiport_info), register a different notifier
callback, mlx5_ib_event_slave_port, which knows that the event is coming
for mlx5_ib_multiport_info and prepares the event job accordingly.
Previously, the event handler work had to ask mlx5_core whether this is a
slave port via mlx5_core_is_mp_slave(work->dev); that is no longer needed.
The mlx5_ib_multiport_info notifier is registered in
mlx5_ib_bind_slave_port and unregistered in mlx5_ib_unbind_slave_port.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Forward all mlx5 events to mlx5 notifiers chain
Saeed Mahameed [Mon, 26 Nov 2018 22:38:59 +0000 (14:38 -0800)]
net/mlx5: Forward all mlx5 events to mlx5 notifiers chain

This is to allow seamless migration to the new notifier chain API, and to
eventually deprecate interfaces dev->event callback.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Use the new mlx5 core notifier API
Saeed Mahameed [Mon, 26 Nov 2018 22:38:58 +0000 (14:38 -0800)]
net/mlx5e: Use the new mlx5 core notifier API

Remove the deprecated mlx5_interface->event mlx5e callback and use the new
mlx5 notifier API to subscribe to mlx5 events. Handle the port change
event as received from FW rather than the port change software event
processed by mlx5 core.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Allow port change event to be forwarded to driver notifiers chain
Saeed Mahameed [Mon, 26 Nov 2018 22:38:57 +0000 (14:38 -0800)]
net/mlx5: Allow port change event to be forwarded to driver notifiers chain

The idea is to allow mlx5 core interfaces (mlx5e/mlx5_ib) to be able to
receive some allowed FW events as is via the new notifier API.

In this patch we allow forwarding port change event to mlx5 core interfaces
(mlx5e/mlx5_ib) as it was received from FW.
Once mlx5e and mlx5_ib start using this event we can safely remove the
redundant software version of it and its translation logic.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Driver events notifier API
Saeed Mahameed [Mon, 26 Nov 2018 22:38:56 +0000 (14:38 -0800)]
net/mlx5: Driver events notifier API

Use atomic notifier chain to fire events to mlx5 core driver
consumers (mlx5e/mlx5_ib) and provide mlx5 register/unregister notifier
API.

This API will replace the current mlx5_interface->event callback and all
the logic around it, especially the delayed events logic introduced by
commit 97834eba7c19 ("net/mlx5: Delay events till ib registration ends").

That logic is not needed anymore with this new API, where the mlx5
interface can dynamically register/unregister its notifier.
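
The register/unregister pattern can be modelled in userspace C roughly as
follows. This is a single-threaded sketch with hypothetical names
(notifier, chain_register, chain_unregister, chain_call, consumer); it
does not reproduce the locking of the kernel's atomic notifier chain and
is not the mlx5 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a notifier block and chain head. */
struct notifier {
	int (*call)(struct notifier *nb, unsigned long event, void *data);
	struct notifier *next;
};

struct notifier_head {
	struct notifier *first;
};

static void chain_register(struct notifier_head *h, struct notifier *nb)
{
	nb->next = h->first;
	h->first = nb;
}

static void chain_unregister(struct notifier_head *h, struct notifier *nb)
{
	struct notifier **p;

	for (p = &h->first; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Fire an event at every registered consumer; returns how many ran. */
static int chain_call(struct notifier_head *h, unsigned long event, void *data)
{
	struct notifier *nb;
	int calls = 0;

	for (nb = h->first; nb; nb = nb->next) {
		nb->call(nb, event, data);
		calls++;
	}
	return calls;
}

/* Example consumer: embeds the notifier as its first member. */
struct consumer {
	struct notifier nb;
	unsigned long last_event;
	int seen;
};

static int consumer_event(struct notifier *nb, unsigned long event, void *data)
{
	struct consumer *c = (struct consumer *)nb;	/* nb is first member */

	(void)data;
	c->last_event = event;
	c->seen++;
	return 0;
}
```

Because a consumer sees events only between its own register and
unregister calls, nothing has to be queued on its behalf before it is
ready.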

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoIB/mlx5: Use fragmented QP's buffer for in-kernel users
Guy Levi [Mon, 26 Nov 2018 06:15:50 +0000 (08:15 +0200)]
IB/mlx5: Use fragmented QP's buffer for in-kernel users

The current implementation of create QP requires contiguous memory. Such
a requirement is problematic once the memory is fragmented or the system
is low on memory, as it causes failures in dma_zalloc_coherent().

This patch takes advantage of the new mlx5_core API which allocates a
fragmented buffer. This makes the QP creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.
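
To illustrate what the data path has to do with a fragmented buffer, the
index-to-address lookup can be sketched in userspace C as below. The
frag_buf structure and frag_buf_get helper are hypothetical, not the
mlx5_core API, and the sketch assumes power-of-two fragment and stride
sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a queue buffer split into equal fragments. */
struct frag_buf {
	unsigned char **frags;		/* separately allocated fragments */
	unsigned int log_frag_size;	/* log2 of fragment size in bytes */
	unsigned int log_stride;	/* log2 of entry stride in bytes */
};

/* Return the address of entry idx, which may live in any fragment. */
static void *frag_buf_get(const struct frag_buf *fb, size_t idx)
{
	size_t off = idx << fb->log_stride;
	size_t frag = off >> fb->log_frag_size;
	size_t in_frag = off & (((size_t)1 << fb->log_frag_size) - 1);

	return fb->frags[frag] + in_frag;
}
```

Consecutive entries are no longer guaranteed to be adjacent in memory, so
a WQE spanning several strides has to be looked up stride by stride rather
than by adding an offset to the first address.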

We also use the opportunity to fix some cosmetic legacy coding convention
errors which were in the feature scope.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Use fragmented SRQ's buffer for in-kernel users
Guy Levi [Mon, 26 Nov 2018 06:15:39 +0000 (08:15 +0200)]
IB/mlx5: Use fragmented SRQ's buffer for in-kernel users

The current implementation of create SRQ requires contiguous memory. Such
a requirement is problematic once the memory is fragmented or the system
is low on memory, as it causes failures in dma_zalloc_coherent().

This patch takes advantage of the new mlx5_core API which allocates a
fragmented buffer, and makes the SRQ creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agorxe: IB_WR_REG_MR does not capture MR's iova field
Chuck Lever [Sun, 25 Nov 2018 22:13:08 +0000 (17:13 -0500)]
rxe: IB_WR_REG_MR does not capture MR's iova field

FRWR memory registration is done with a series of calls and WRs.
1. ULP invokes ib_dma_map_sg()
2. ULP invokes ib_map_mr_sg()
3. ULP posts an IB_WR_REG_MR on the Send queue

Step 2 generates an iova. It is permissible for ULPs to change this
iova (with certain restrictions) between steps 2 and 3.

rxe_map_mr_sg captures the MR's iova but later when rxe processes the
REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova
after step 2 but before step 3, rxe never captures that change.

When the remote sends an RDMA Read targeting that MR, rxe looks up the
R_key, but the altered iova does not match the iova stored in the MR,
causing the RDMA Read request to fail.

Reported-by: Anna Schumaker <schumaker.anna@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Attach a DEVX counter via raw flow creation
Mark Bloch [Tue, 20 Nov 2018 18:31:08 +0000 (20:31 +0200)]
RDMA/mlx5: Attach a DEVX counter via raw flow creation

Allow a user to attach a DEVX counter via mlx5 raw flow creation. In order
to attach a counter we introduce a new attribute:

MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX

A counter can be attached to multiple flow steering rules.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qib: Remove all occurrences of BUG_ON()
Leon Romanovsky [Thu, 29 Nov 2018 12:15:28 +0000 (14:15 +0200)]
RDMA/qib: Remove all occurrences of BUG_ON()

The QIB driver was added in 2010 with many BUG_ON() calls. Most of them
were cleaned out over years of development and use.

It now looks safe to remove the remaining BUG_ON() calls.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/usnic: fix spelling mistake "miniumum" -> "minimum"
Colin Ian King [Thu, 29 Nov 2018 10:42:13 +0000 (10:42 +0000)]
IB/usnic: fix spelling mistake "miniumum" -> "minimum"

There is a spelling mistake in a usnic_err error message, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/uverbs: fix ptr_ret.cocci warnings
kbuild test robot [Tue, 27 Nov 2018 23:21:30 +0000 (07:21 +0800)]
RDMA/uverbs: fix ptr_ret.cocci warnings

drivers/infiniband/core/uverbs_cmd.c:1095:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
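
For readers unfamiliar with the idiom, here is a userspace model of the
two forms. The model_* helpers below are hypothetical re-implementations
written for this illustration; they only approximate the kernel's err.h
macros.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* A pointer in the top MODEL_MAX_ERRNO addresses carries an error. */
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Open-coded form that the warning flags. */
static long ret_open_coded(const void *p)
{
	if (model_is_err(p))
		return model_ptr_err(p);
	return 0;
}

/* Equivalent single-expression form. */
static long model_ptr_err_or_zero(const void *p)
{
	return model_is_err(p) ? model_ptr_err(p) : 0;
}
```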

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 7106a9769715 ("RDMA/uverbs: Make write() handlers return 0 on success")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/drivers: Fix spelling mistake "initalize" -> "initialize"
Colin Ian King [Wed, 28 Nov 2018 15:11:16 +0000 (15:11 +0000)]
RDMA/drivers: Fix spelling mistake "initalize" -> "initialize"

Fix spelling mistake in usnic_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/uverbs: Use uverbs_attr_bundle to pass udata for ioctl()
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:20 +0000 (20:51 +0200)]
RDMA/uverbs: Use uverbs_attr_bundle to pass udata for ioctl()

Have the core code initialize the driver_udata if the method has a udata
description. This is done using the same create_udata the handler was
supposed to call.

This makes ioctl consistent with the write and write_ex paths.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Use uverbs_attr_bundle to pass udata for write
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:19 +0000 (20:51 +0200)]
RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write

Now that we have metadata describing the command format the core code can
directly compute the udata pointers and all the really ugly
ib_uverbs_init_udata() calls can be removed from the handlers.

This means all the write() handlers are no longer sensitive to the layout
of the command buffer.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Use uverbs_attr_bundle to pass udata for write_ex
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:18 +0000 (20:51 +0200)]
RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write_ex

The core code needs to compute the udata so we may as well pass it in the
uverbs_attr_bundle instead of on the stack. This converts the simple case
of write_ex() which already has a core calculation.

Also change the write() path to use the attrs for ib_uverbs_init_udata()
instead of on the stack. This lets the write to write_ex compatibility
path continue to follow the lead of the _ex path.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Prohibit write() calls with too small buffers
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:17 +0000 (20:51 +0200)]
RDMA/uverbs: Prohibit write() calls with too small buffers

The size meta-data in the prior patch describes the smallest acceptable
buffer for the write() interface. Globally check this in the core code.

This is necessary in the case of write() methods that have a driver udata
to prevent computing a negative udata buffer length.

The return code of -ENOSPC is chosen here as some of the handlers already
use this code; however, many other handlers use -EINVAL.
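
The hazard being closed here can be sketched in userspace C as follows.
The udata_len helper and its parameter names are hypothetical; the point
is only that the subtraction must not happen before the size check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: the driver udata is whatever follows the core
 * request struct. Without the check, in_len - core_req_size would wrap
 * around (or go negative with a signed type) for a short buffer.
 */
static int udata_len(size_t in_len, size_t core_req_size, size_t *len)
{
	if (in_len < core_req_size)
		return -ENOSPC;

	*len = in_len - core_req_size;
	return 0;
}
```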

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Add structure size info to write commands
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:16 +0000 (20:51 +0200)]
RDMA/uverbs: Add structure size info to write commands

We need the structure sizes to compute the location of the udata in the
core code. Annotate the sizes into the new macro language.

This is generated largely by script and checked by comparing against the
similar list in rdma-core.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Do not pass ib_uverbs_file to ioctl methods
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:15 +0000 (20:51 +0200)]
RDMA/uverbs: Do not pass ib_uverbs_file to ioctl methods

The uverbs_attr_bundle already contains this pointer, and most methods
don't actually need it. Get rid of the redundant function argument.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Make write() handlers return 0 on success
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:14 +0000 (20:51 +0200)]
RDMA/uverbs: Make write() handlers return 0 on success

Currently they return the command length, while all other handlers return
0. This makes the write path closer to the write_ex and ioctl path.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for write
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:13 +0000 (20:51 +0200)]
RDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for write

Now that we can add meta-data to the description of write() methods we
need to pass the uverbs_attr_bundle into all write based handlers so
future patches can use it as a container for any new data transferred out
of the core.

This is the first step to bringing the write() and ioctl() methods to a
common interface signature.

This is a simple search/replace, and we push the attr down into the uobj
and other APIs to keep changes minimal.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/uverbs: Add missing driver_data
Jason Gunthorpe [Sun, 25 Nov 2018 18:51:12 +0000 (20:51 +0200)]
RDMA/uverbs: Add missing driver_data

If the struct is used with a driver_udata it should have a trailing
driver_data flex array to mark it as having udata.

In most cases this forces the end of the struct to be aligned to u64 which
is needed to make the trailing driver_data naturally aligned.

Unfortunately we have a few cases where the base struct is not aligned to
8 bytes. These are marked with a u32 driver_data, and userspace will check
for alignment issues when it compiles the driver.
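
The alignment effect of the trailing flex array can be checked with a
small userspace C example. Both structs below are invented for this
illustration and are not real uverbs ABI structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command whose fixed part is a multiple of 8 bytes. */
struct cmd_aligned {
	uint64_t response;
	uint32_t handle;
	uint32_t reserved;
	uint64_t driver_data[];	/* u64 tail forces 8-byte alignment */
};

/* Hypothetical command whose fixed part is only 12 bytes long. */
struct cmd_unaligned {
	uint32_t handle;
	uint32_t flags;
	uint32_t count;
	uint32_t driver_data[];	/* u32 tail: a u64 tail would add padding */
};

/* The flex array starts exactly where the fixed part ends. */
_Static_assert(offsetof(struct cmd_aligned, driver_data) % 8 == 0,
	       "u64 driver_data must be naturally aligned");
_Static_assert(offsetof(struct cmd_unaligned, driver_data) == 12,
	       "u32 driver_data follows the 12-byte base directly");
```

Marking the second struct with a u64 tail instead would push sizeof() from
12 to 16, which is why the 12-byte layout keeps a u32 flex array.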

Also remove the empty ib_uverbs_modify_qp_resp as nothing uses this.

pahole says there is no change to any struct sizes by this change.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoIB/qib: fix spelling mistake "colescing" -> "coalescing"
Colin Ian King [Mon, 26 Nov 2018 16:23:20 +0000 (16:23 +0000)]
IB/qib: fix spelling mistake "colescing" -> "coalescing"

There is a spelling mistake in the module description text, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/mlx5: Improve core device events handling
Saeed Mahameed [Tue, 20 Nov 2018 22:12:28 +0000 (14:12 -0800)]
net/mlx5: Improve core device events handling

Register a separate handler per event type, rather than listening for all
events and looking for the events to handle in a switch case.
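
The difference from a single switch statement can be sketched in
userspace C with a per-type handler table. All names here (the EV_*
constants, ev_register, ev_dispatch, count_hit) are hypothetical and do
not correspond to mlx5 symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical event types for the illustration. */
enum { EV_PORT_CHANGE, EV_CQ_ERROR, EV_MAX };

typedef void (*ev_handler)(void *ctx);

static ev_handler handlers[EV_MAX];
static void *contexts[EV_MAX];

/* Each subsystem registers only for the event type it cares about. */
static int ev_register(int type, ev_handler fn, void *ctx)
{
	if (type < 0 || type >= EV_MAX)
		return -EINVAL;
	handlers[type] = fn;
	contexts[type] = ctx;
	return 0;
}

/* Dispatch without a switch: returns 1 if a handler ran, 0 otherwise. */
static int ev_dispatch(int type)
{
	if (type < 0 || type >= EV_MAX || !handlers[type])
		return 0;
	handlers[type](contexts[type]);
	return 1;
}

/* Example handler: counts how often it was invoked. */
static void count_hit(void *ctx)
{
	++*(int *)ctx;
}
```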

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Device events, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:27 +0000 (14:12 -0800)]
net/mlx5: Device events, Use async events chain

Move all the generic async events handling into new specific events
handling file events.c to keep eq.c file clean from concrete event logic
handling.

Use new API to register for NOTIFY_ANY to handle generic events and
dispatch allowed events to mlx5_core consumers (mlx5_ib and mlx5e)

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: CQ ERR, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:26 +0000 (14:12 -0800)]
net/mlx5: CQ ERR, Use async events chain

Remove the explicit call to mlx5_eq_cq_event on MLX5_EVENT_TYPE_CQ_ERROR
and register a specific CQ ERROR handler via the new API.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Resource tables, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:25 +0000 (14:12 -0800)]
net/mlx5: Resource tables, Use async events chain

Remove the explicit call to QP/SRQ resources events handlers on several FW
events and let resources logic register resources events notifiers via the
new API.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: CmdIF, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:24 +0000 (14:12 -0800)]
net/mlx5: CmdIF, Use async events chain

Remove the explicit call to mlx5_cmd_comp_handler on MLX5_EVENT_TYPE_CMD
and let the command interface register its own handler when it is ready.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: FWPage, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:23 +0000 (14:12 -0800)]
net/mlx5: FWPage, Use async events chain

Remove the explicit call to mlx5_core_req_pages_handler on
MLX5_EVENT_TYPE_PAGE_REQUEST and let the FW page logic register its own
handler when it is ready.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:22 +0000 (14:12 -0800)]
net/mlx5: E-Switch, Use async events chain

Remove the explicit call to mlx5_eswitch_vport_event on
MLX5_EVENT_TYPE_NIC_VPORT_CHANGE and let the eswitch register its own
handler when it is ready.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Clock, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:21 +0000 (14:12 -0800)]
net/mlx5: Clock, Use async events chain

Remove the explicit call to mlx5_pps_event on MLX5_EVENT_TYPE_PPS_EVENT
and let the clock logic register its own handler when it is ready.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: FPGA, Use async events chain
Saeed Mahameed [Tue, 20 Nov 2018 22:12:20 +0000 (14:12 -0800)]
net/mlx5: FPGA, Use async events chain

Remove the explicit call to mlx5_fpga_event on
MLX5_EVENT_TYPE_FPGA_ERROR or MLX5_EVENT_TYPE_FPGA_QP_ERROR and
let the fpga core register its own handler when it is ready.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>