OSDN Git Service

uclinux-h8/linux.git
Don Hiatt [Fri, 4 Aug 2017 20:54:23 +0000 (13:54 -0700)]
IB/hfi1: Add 16B UD support

Add 16B bypass packet support for UD traffic types.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 4 Aug 2017 20:54:16 +0000 (13:54 -0700)]
IB/hfi1: Determine 9B/16B L2 header type based on Address handle

When address handle attributes are initialized, the LIDs are
transformed into the 32-bit LID space.
When constructing the header, the hfi1 driver looks at the LID
to determine which packet header (9B or 16B) to create.
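A minimal userspace sketch of LID-based header selection. The names `pick_header_type`, `PKT_TYPE_9B` and `PKT_TYPE_16B` are hypothetical, and the rule used here (a DLID below the 16-bit multicast base 0xC000 fits a 9B header, anything wider needs 16B) is a simplifying assumption that ignores multicast and permissive LIDs; it is not the hfi1 implementation.

```c
#include <stdint.h>

/* Hypothetical tags for the two L2 header formats. */
enum pkt_type { PKT_TYPE_9B, PKT_TYPE_16B };

/* First multicast LID in the 16-bit IB LID space; unicast LIDs sit below it. */
#define IB_MULTICAST_LID_BASE 0xC000u

/*
 * Pick the header format from a destination LID already widened to 32 bits.
 * ASSUMPTION: multicast and permissive LIDs are ignored; a DLID that still
 * fits the 16-bit unicast range gets a 9B header, anything larger gets 16B.
 */
static enum pkt_type pick_header_type(uint32_t dlid)
{
    return dlid < IB_MULTICAST_LID_BASE ? PKT_TYPE_9B : PKT_TYPE_16B;
}
```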

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 4 Aug 2017 20:54:10 +0000 (13:54 -0700)]
IB/hfi1: Add support to process 16B header errors

Enhance hdr_rcverr() to also handle errors during
16B bypass packet receive.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 4 Aug 2017 20:54:04 +0000 (13:54 -0700)]
IB/hfi1: Add support to send 16B bypass packets

We introduce struct hfi1_opa_header as a union
of ib (9B) and 16B headers.
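A simplified sketch of a tagged union that can hold either header format. The 9B LRH is four 16-bit words (8 bytes) and the 16B LRH is four 32-bit words (16 bytes); all type and field names here are illustrative assumptions, not the actual layout of hfi1_opa_header.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the two local route headers. */
struct ib_lrh_9b   { uint16_t lrh[4]; };  /* 8 bytes  */
struct opa_lrh_16b { uint32_t lrh[4]; };  /* 16 bytes */

enum hdr_type { HDR_9B, HDR_16B };

/* One buffer that can hold either format, plus a tag saying which is valid. */
struct opa_header {
    union {
        struct ib_lrh_9b   ibh;
        struct opa_lrh_16b opah;
    };
    uint8_t hdr_type;
};

/* Length of the LRH actually in use, selected by the tag. */
static size_t lrh_len(const struct opa_header *h)
{
    return h->hdr_type == HDR_16B ? sizeof(h->opah) : sizeof(h->ibh);
}
```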

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 4 Aug 2017 20:53:58 +0000 (13:53 -0700)]
IB/hfi1: Add support to receive 16B bypass packets

We introduce a struct hfi1_16b_header to support 16B headers.
16B bypass packets are received by the driver and processed
similarly to 9B packets. Add basic support to handle 16B packets.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 4 Aug 2017 20:53:51 +0000 (13:53 -0700)]
IB/rdmavt, hfi1, qib: Modify check_ah() to account for extended LIDs

rvt_check_ah() delegates LID verification to the underlying
driver. The underlying driver uses different conditions to
check the DLID depending on whether the device supports
extended LIDs.
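A sketch of the delegation pattern with hypothetical names: the generic layer keeps no LID rules of its own and calls a per-driver hook, which applies either a 16-bit rule or an extended-LID rule. The exact validity conditions shown are assumptions for illustration only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-driver hook table. */
struct dev_ops {
    bool (*check_dlid)(uint32_t dlid);  /* NULL: no extra validation */
};

/* A device without extended LIDs accepts only non-zero 16-bit DLIDs. */
static bool check_dlid_16bit(uint32_t dlid)
{
    return dlid != 0 && dlid <= 0xFFFF;
}

/* A device with extended LIDs accepts the wider 32-bit space. */
static bool check_dlid_extended(uint32_t dlid)
{
    return dlid != 0;
}

/* Generic layer: defers the DLID decision to the driver's hook. */
static int generic_check_ah(const struct dev_ops *ops, uint32_t dlid)
{
    if (ops->check_dlid && !ops->check_dlid(dlid))
        return -EINVAL;
    return 0;
}
```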

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Michael J. Ruhl [Fri, 4 Aug 2017 20:52:44 +0000 (13:52 -0700)]
IB/hf1: User context locking is inconsistent

There is a mixture of mutexes and spinlocks protecting receive context
(rcd/uctxt) information, and they are not used consistently.

Use the mutex to protect device receive context information only.
Use the spinlock to protect sub context information only.

Protect access to items in the rcd array with a spinlock and
reference count.

Remove spinlock around dd->rcd array cleanup.  Since interrupts are
disabled and cleaned up before this point, this lock is not useful.
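A userspace sketch of the lookup-plus-reference-count pattern, assuming a C11 compiler. An `atomic_flag` busy-wait stands in for a kernel spinlock, and `ctxt_set`, `ctxt_get`, `ctxt_put` and `ctxt_clear` are hypothetical names. The point is that the reference is taken while the lock is held, so an element removed from the array stays alive until its last user drops it.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NUM_CTXTS 4

struct ctxt {
    atomic_int refcount;
    int id;
};

static struct ctxt *ctxt_array[NUM_CTXTS];
/* Userspace stand-in for a kernel spinlock. */
static atomic_flag ctxt_lock = ATOMIC_FLAG_INIT;

static void lock_ctxts(void)
{
    while (atomic_flag_test_and_set_explicit(&ctxt_lock, memory_order_acquire))
        ;  /* spin */
}

static void unlock_ctxts(void)
{
    atomic_flag_clear_explicit(&ctxt_lock, memory_order_release);
}

/* Install a new context in slot i; the array holds the first reference. */
static int ctxt_set(int i, int id)
{
    struct ctxt *c = malloc(sizeof(*c));
    int ret = 0;

    if (!c)
        return -1;
    atomic_init(&c->refcount, 1);
    c->id = id;
    lock_ctxts();
    if (ctxt_array[i])
        ret = -1;  /* slot already in use */
    else
        ctxt_array[i] = c;
    unlock_ctxts();
    if (ret)
        free(c);
    return ret;
}

/* Look up slot i and take a reference under the lock, so the element
 * cannot be freed between the lookup and the increment. */
static struct ctxt *ctxt_get(int i)
{
    struct ctxt *c;

    lock_ctxts();
    c = ctxt_array[i];
    if (c)
        atomic_fetch_add(&c->refcount, 1);
    unlock_ctxts();
    return c;
}

/* Drop one reference; the last holder frees the element. Returns 1 if freed. */
static int ctxt_put(struct ctxt *c)
{
    if (atomic_fetch_sub(&c->refcount, 1) == 1) {
        free(c);
        return 1;
    }
    return 0;
}

/* Remove slot i from the array and drop the array's own reference. */
static int ctxt_clear(int i)
{
    struct ctxt *c;

    lock_ctxts();
    c = ctxt_array[i];
    ctxt_array[i] = NULL;
    unlock_ctxts();
    return c ? ctxt_put(c) : 0;
}
```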

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Michael J. Ruhl [Fri, 4 Aug 2017 20:52:38 +0000 (13:52 -0700)]
IB/hfi1: Protect context array set/clear with spinlock

The rcd array can be accessed from user context or during interrupts.
Protecting it with a mutex isn't a good idea because a mutex may
sleep and must not be taken from IRQ context.

Protect the allocation and freeing of rcd array elements with a
spinlock.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bartlomiej Dudek [Fri, 4 Aug 2017 20:52:32 +0000 (13:52 -0700)]
IB/hfi1: Use host_link_state to read state when DC is shut down

When the DC is shut down (e.g. by disconnecting the cable), the
driver should use host_link_state to get the port's current
physical state. This is because the physical state is read from
the DC's CSRs, and while the DC is shut down those registers are
not updated when the state changes.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Byczkowski, Jakub [Fri, 4 Aug 2017 20:52:26 +0000 (13:52 -0700)]
IB/hfi1: Remove lstate from hfi1_pportdata

Do not track logical state separately from host_link_state. Deduce
logical state from host_link_state when required. Transitions in
set_link_state and goto_offline already make sure host_link_state
reflects hardware's logical state properly.
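A sketch of deducing the logical state on demand instead of caching it separately. The `HLS_*` enumerators are a hypothetical subset standing in for the driver's host link states; the `PORT_*` values follow the IB logical port state numbering (Down = 1 through Active = 4).

```c
/* Hypothetical subset of driver host link states. */
enum host_link_state {
    HLS_DN_OFFLINE,
    HLS_DN_POLL,
    HLS_GOING_UP,
    HLS_UP_INIT,
    HLS_UP_ARMED,
    HLS_UP_ACTIVE,
};

/* IB logical port states (PortInfo numbering). */
enum port_logical_state {
    PORT_DOWN = 1,
    PORT_INIT = 2,
    PORT_ARMED = 3,
    PORT_ACTIVE = 4,
};

/* Derive the logical state from the single tracked link state. */
static enum port_logical_state logical_state(enum host_link_state hls)
{
    switch (hls) {
    case HLS_UP_INIT:
        return PORT_INIT;
    case HLS_UP_ARMED:
        return PORT_ARMED;
    case HLS_UP_ACTIVE:
        return PORT_ACTIVE;
    default:
        return PORT_DOWN;  /* every down or transitional state */
    }
}
```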

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Fri, 4 Aug 2017 20:52:20 +0000 (13:52 -0700)]
IB/hfi1: Remove pmtu from the QP structure

The pmtu field doesn't have to be stored in the QP structure,
as it can easily be calculated when needed.
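A sketch of computing the MTU when needed, assuming the IB path-MTU encoding (1 = 256 bytes up to 5 = 4096 bytes). Clamping to a port MTU in the hypothetical `qp_pmtu` helper is one plausible derivation for illustration, not the hfi1 code; OPA's larger MTUs are deliberately left out.

```c
#include <stdint.h>

/*
 * IB path-MTU encoding: 1 = 256 ... 5 = 4096 bytes. OPA's larger MTUs are
 * not handled in this sketch and map to 0, as does any invalid value.
 */
static uint32_t mtu_enum_to_bytes(unsigned int mtu_enum)
{
    if (mtu_enum < 1 || mtu_enum > 5)
        return 0;
    return 256u << (mtu_enum - 1);
}

/* Hypothetical: derive the payload MTU on demand instead of caching it. */
static uint32_t qp_pmtu(unsigned int path_mtu_enum, uint32_t port_mtu_bytes)
{
    uint32_t path = mtu_enum_to_bytes(path_mtu_enum);

    return path < port_mtu_bytes ? path : port_mtu_bytes;
}
```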

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Estrin [Fri, 4 Aug 2017 20:52:13 +0000 (13:52 -0700)]
IB/hfi1: Revert egress pkey check enforcement

The current code has some serious flaws. Disarm the flag
pending an appropriate patch.

Fixes: 53526500f301 ("IB/hfi1: Permanently enable P_Key checking in HFI")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Amrani, Ram [Tue, 27 Jun 2017 14:04:42 +0000 (17:04 +0300)]
IB/core: Fix input len in multiple user verbs

Most user verbs pass user data to the kernel with an input length that
includes the ib_uverbs_cmd_hdr structure. This is problematic because the
vendor has no idea whether the verb was called by a legacy verb or an
extended verb. Also, the inconsistency between the verbs is confusing.

Fixes: 565197dd8fb1 ("IB/core: Extend ib_uverbs_create_cq")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Romain Perier [Tue, 22 Aug 2017 11:46:59 +0000 (13:46 +0200)]
mlx5: Replace PCI pool old API

The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the equivalent DMA pool API functions.

Signed-off-by: Romain Perier <romain.perier@collabora.com>
Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Romain Perier [Tue, 22 Aug 2017 11:46:58 +0000 (13:46 +0200)]
mlx4: Replace PCI pool old API

The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the equivalent DMA pool API functions.

Signed-off-by: Romain Perier <romain.perier@collabora.com>
Acked-by: Peter Senna Tschudin <peter.senna@collabora.com>
Tested-by: Peter Senna Tschudin <peter.senna@collabora.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Romain Perier [Tue, 22 Aug 2017 11:46:56 +0000 (13:46 +0200)]
IB/mthca: Replace PCI pool old API

The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the equivalent DMA pool API functions.

Signed-off-by: Romain Perier <romain.perier@collabora.com>
Acked-by: Peter Senna Tschudin <peter.senna@collabora.com>
Tested-by: Peter Senna Tschudin <peter.senna@collabora.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Somnath Kotur [Wed, 2 Aug 2017 08:46:19 +0000 (01:46 -0700)]
RDMA/bnxt_re: Implement the alloc/get_hw_stats callback

Expose HW counters using the get_hw_stats callback.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Selvin Xavier [Wed, 2 Aug 2017 08:46:18 +0000 (01:46 -0700)]
RDMA/bnxt_re: Allocate multiple notification queues

Enable multiple interrupt vectors. The driver requests the maximum
number of MSI-X vectors based on the number of online CPUs and creates
up to 9 MSI-X vectors (1 for the control path and 8 for the data path).
A tasklet is created for each of these vectors. NQs are assigned
to CQs in round-robin fashion.
This patch also adds an IRQ affinity hint for the MSI-X vector of each NQ.
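A sketch of the vector count and round-robin assignment described above. It assumes the data-path vector count is the number of online CPUs capped at 8, plus one control-path vector; `num_msix_vectors`, `struct nq_rr` and `nq_assign` are hypothetical names.

```c
#define MAX_DATA_NQS 8  /* per the commit: up to 8 data-path vectors */

/* One control-path vector plus one data-path vector per online CPU, capped. */
static int num_msix_vectors(int online_cpus)
{
    int data = online_cpus < MAX_DATA_NQS ? online_cpus : MAX_DATA_NQS;

    return data + 1;
}

/* Round-robin cursor over the data-path NQs. */
struct nq_rr {
    int num_nqs;
    int next;
};

/* Return the NQ index for the next CQ and advance the cursor. */
static int nq_assign(struct nq_rr *rr)
{
    int idx = rr->next;

    rr->next = (rr->next + 1) % rr->num_nqs;
    return idx;
}
```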

Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hiatt, Don [Mon, 14 Aug 2017 18:17:43 +0000 (14:17 -0400)]
Add OPA extended LID support

This patch series primarily increases sizes of variables that hold
lid values from 16 to 32 bits. Additionally, it adds a check in
the IB mad stack to verify a properly formatted MAD when OPA
extended LIDs are used.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Fri, 18 Aug 2017 18:12:04 +0000 (14:12 -0400)]
Merge branch 'k.o/for-4.13-rc' into k.o/for-next

Merging our (hopefully) final -rc pull branch into our for-next branch
because some of our pending patches won't apply cleanly without having
the -rc patches in our tree.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Fri, 18 Aug 2017 18:10:23 +0000 (14:10 -0400)]
Merge branch 'misc' into k.o/for-next

Conflicts:
drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
we aren't safe for that context) touched the same code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Bhumika Goyal [Wed, 2 Aug 2017 10:01:30 +0000 (15:31 +0530)]
IB/hfi1: add const to bin_attribute structures

Add const to bin_attribute structures as they are only passed to the
functions sysfs_{remove/create}_bin_file. The arguments passed are of
type const, so declare the structures to be const.

Done using Coccinelle.

@m disable optional_qualifier@
identifier s;
position p;
@@
static struct bin_attribute s@p={...};

@okay1@
position p;
identifier m.s;
@@
(
sysfs_create_bin_file(...,&s@p,...)
|
sysfs_remove_bin_file(...,&s@p,...)
)

@bad@
position p!={m.p,okay1.p};
identifier m.s;
@@
s@p

@change depends on !bad disable optional_qualifier@
identifier m.s;
@@
static
+const
struct bin_attribute s={...};

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bhumika Goyal [Wed, 2 Aug 2017 10:01:29 +0000 (15:31 +0530)]
IB/qib: add const to bin_attribute structures

Add const to bin_attribute structures as they are only passed to the
functions sysfs_{remove/create}_bin_file. The arguments passed are of
type const, so declare the structures to be const.

Done using Coccinelle.

@m disable optional_qualifier@
identifier s;
position p;
@@
static struct bin_attribute s@p={...};

@okay1@
position p;
identifier m.s;
@@
(
sysfs_create_bin_file(...,&s@p,...)
|
sysfs_remove_bin_file(...,&s@p,...)
)

@bad@
position p!={m.p,okay1.p};
identifier m.s;
@@
s@p

@change depends on !bad disable optional_qualifier@
identifier m.s;
@@
static
+const
struct bin_attribute s={...};

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bharat Potnuri [Tue, 1 Aug 2017 05:28:35 +0000 (10:58 +0530)]
RDMA/uverbs: Initialize cq_context appropriately

Initializing cq_context with ev_queue in create_cq() leads to a NULL pointer
dereference in ib_uverbs_comp_handler() if the application does not use a
completion channel. This patch fixes the cq_context initialization.

Fixes: 1e7710f3f65 ("IB/core: Change completion channel to use the reworked")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Arnd Bergmann [Mon, 31 Jul 2017 06:50:05 +0000 (08:50 +0200)]
infiniband: avoid overflow warning

A sockaddr_in structure on the stack getting passed into rdma_ip2gid
triggers this warning, since we memcpy into a larger sockaddr_in6
structure:

In function 'memcpy',
    inlined from 'rdma_ip2gid' at include/rdma/ib_addr.h:175:3,
    inlined from 'addr_event.isra.4.constprop' at drivers/infiniband/core/roce_gid_mgmt.c:693:2,
    inlined from 'inetaddr_event' at drivers/infiniband/core/roce_gid_mgmt.c:716:9:
include/linux/string.h:305:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter

The warning seems appropriate here, but the code is also clearly
correct, so we really just want to shut up this instance of the
output.

The best way I found so far is to avoid the memcpy() call and instead
replace it with a struct assignment.
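A userspace sketch of the pattern, not the kernel's `rdma_ip2gid()`: `union gid` and `ip_to_gid` are hypothetical. The IPv6 branch copies the address with a struct assignment rather than a 16-byte `memcpy()`, which is the substitution the commit describes; the IPv4 branch builds an IPv4-mapped address (::ffff:a.b.c.d).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a 16-byte GID. */
union gid {
    uint8_t raw[16];
    struct in6_addr in6;
};

static void ip_to_gid(const struct sockaddr *addr, union gid *gid)
{
    if (addr->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)addr;

        /* IPv4-mapped IPv6 address: ::ffff:a.b.c.d */
        memset(gid->raw, 0, 10);
        gid->raw[10] = 0xff;
        gid->raw[11] = 0xff;
        memcpy(&gid->raw[12], &in4->sin_addr.s_addr, 4);
    } else {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)addr;

        /* Struct assignment instead of memcpy(gid->raw, &in6->sin6_addr, 16):
         * it is not a call to a fortified memcpy(), so that function's
         * compile-time object-size check does not apply here. */
        gid->in6 = in6->sin6_addr;
    }
}
```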

Fixes: 6974f0c4555e ("include/linux/string.h: add the option of fortified string.h functions")
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Colin Ian King [Fri, 21 Jul 2017 22:19:33 +0000 (23:19 +0100)]
i40iw: fix spelling mistake: "allloc_buf" -> "alloc_buf"

Trivial fix to spelling mistake in i40iw_debug message, and
also split up a couple of lines that are too long and cause
checkpatch warnings.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yuval Shaia [Fri, 21 Jul 2017 19:20:50 +0000 (22:20 +0300)]
IB/rxe: Remove unneeded check

Port validation is performed in ib_core, no need to duplicate it here.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yuval Shaia [Fri, 21 Jul 2017 19:14:09 +0000 (22:14 +0300)]
IB/rxe: Convert pr_info to pr_warn

This message is a warning, so print it with pr_warn.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Shiraz Saleem [Wed, 19 Jul 2017 18:55:26 +0000 (13:55 -0500)]
i40iw: Fixes for static checker warnings

Remove NULL check for cm_node->listener in i40iw_accept
as listener is always present at this point.

Remove the check for cm_node->accept_pend and related code
in i40iw_cm_event_connected as the cm_node in this context
is only pertinent to active node and cm_node->accept_pend
is always 0.

This fixes the following smatch warnings:

drivers/infiniband/hw/i40iw/i40iw_cm.c:3691 i40iw_accept()
error: we previously assumed 'cm_node->listener' could be null

drivers/infiniband/hw/i40iw/i40iw_cm.c:4061 i40iw_cm_event_connected()
error: we previously assumed 'cm_node->listener' could be null

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christophe Jaillet [Sun, 16 Jul 2017 11:09:23 +0000 (13:09 +0200)]
i40iw: Simplify code

Axe a few lines of code and re-use existing error handling path to avoid
code duplication.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Arvind Yadav [Sun, 16 Jul 2017 06:30:46 +0000 (12:00 +0530)]
infiniband: pvrdma: constify pci_device_id.

pci_device_id structures are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  10774    1872       8   12654    316e infiniband/hw/vmw_pvrdma/pvrdma_main.o

File size after adding 'const':
   text    data     bss     dec     hex filename
  10838    1808       8   12654    316e infiniband/hw/vmw_pvrdma/pvrdma_main.o
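The pvrdma numbers above show 64 bytes moving from data (1872 to 1808) into text (10774 to 10838) while the total stays at 12654. A minimal sketch of a const, zero-terminated ID table and a lookup over it; `struct pci_id`, `match_id` and the ID values are hypothetical simplifications of `struct pci_device_id`.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct pci_device_id (vendor/device only). */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Never written at runtime, so it can be const; the compiler may then place
 * it in read-only storage, which size(1) counts under text, not data. */
static const struct pci_id my_ids[] = {
    { 0x1234, 0x0001 },  /* hypothetical IDs */
    { 0x1234, 0x0002 },
    { 0, 0 }             /* terminator */
};

/* Walk the table up to the all-zero terminator. */
static const struct pci_id *match_id(const struct pci_id *table,
                                     uint16_t vendor, uint16_t device)
{
    for (; table->vendor || table->device; table++)
        if (table->vendor == vendor && table->device == device)
            return table;
    return NULL;
}
```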

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Arvind Yadav [Sun, 16 Jul 2017 06:30:45 +0000 (12:00 +0530)]
infiniband: nes: constify pci_device_id.

pci_device_id structures are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  10429     780      33   11242    2bea drivers/infiniband/hw/nes/nes.o

File size after adding 'const':
   text    data     bss     dec     hex filename
  10541     668      33   11242    2bea drivers/infiniband/hw/nes/nes.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Arvind Yadav [Sun, 16 Jul 2017 06:30:44 +0000 (12:00 +0530)]
infiniband: mthca: constify pci_device_id.

pci_device_id structures are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text    data     bss     dec     hex filename
  13067     805       4   13876    3634 infiniband/hw/mthca/mthca_main.o

File size after adding 'const':
   text    data     bss     dec     hex filename
  13419     453       4   13876    3634 infiniband/hw/mthca/mthca_main.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Greg Kroah-Hartman [Wed, 19 Jul 2017 13:01:06 +0000 (15:01 +0200)]
PCI/IB: add support for pci driver attribute groups

Some drivers (specifically the nes IB driver) want to create a lot of
sysfs driver attributes.  Instead of open-coding the creation and
removal of these files (and getting it wrong btw), it's a better idea to
let the driver core handle all of this logic for us.

So add a new field to the pci driver structure, **groups, that allows
pci drivers to specify an attribute group list it wishes to have created
when it is registered with the driver core.

A big bonus is that the driver no longer races with userspace when the
sysfs files are created vs. when the kobject is announced, so any
script/tool that actually wants to use these files will not have to
poll waiting for them to show up.

Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Colin Ian King [Fri, 14 Jul 2017 07:30:10 +0000 (08:30 +0100)]
RDMA/bnxt_re: fix spelling mistake: "Deallocte" -> "Deallocate"

Trivial fix to spelling mistake in dev_err error message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Colin Ian King [Thu, 13 Jul 2017 22:13:38 +0000 (23:13 +0100)]
IB/hfi1: fix spelling mistake in variable name continious

Trivial fix to spelling mistake: rename variable 'continious'
to the correct spelling 'continuous'.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Colin Ian King [Mon, 3 Jul 2017 09:23:47 +0000 (10:23 +0100)]
IB/qib: fix spelling mistake: "failng" -> "failing"

Trivial fix to spelling mistake in qib_dev_err error message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Tue, 15 Aug 2017 19:20:38 +0000 (22:20 +0300)]
iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM

It's very likely that iwcm work execution will yield memory
allocations (for example cm connection requests).

Reported-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Tue, 15 Aug 2017 19:20:37 +0000 (22:20 +0300)]
cm: Don't allocate ib_cm workqueue with WQ_MEM_RECLAIM

create_workqueue always creates the workqueue with WQ_MEM_RECLAIM
and silences the flush dependency warning for WQ_LEGACY. Instead, we
want to keep the warning in case the allocator tries to flush the
cm workqueue, because it's very likely that cm work execution will
yield memory allocations (for example cm connection requests).

Reported-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Sun, 2 Jul 2017 08:20:52 +0000 (11:20 +0300)]
nvmet-rdma: remove redundant empty device add callout

Now that it's not needed, we can simply not assign it.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Sun, 2 Jul 2017 08:20:51 +0000 (11:20 +0300)]
nvme-rdma: remove redundant empty device add callout

Now that it's not needed, we can simply not assign it.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Sun, 2 Jul 2017 08:20:50 +0000 (11:20 +0300)]
RDMA/core: make ib_device.add method optional

ib_clients can indeed set .add to NULL, but then they will not see
any device removal notifications. The reason is that
ib_register_client and ib_register_device checked for the existence
of .add before creating a corresponding client_data and adding
it to the list. Simply reversing the condition fixes the issue.
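A sketch of the corrected ordering with hypothetical types: the per-client bookkeeping is recorded unconditionally and `.add` is called only when set, so a client that provides only `.remove` still receives the removal notification.

```c
#include <stddef.h>

#define MAX_CLIENTS 4

struct device;

/* Hypothetical client with optional callbacks. */
struct client {
    void (*add)(struct device *dev);     /* may be NULL */
    void (*remove)(struct device *dev);  /* may be NULL */
};

struct device {
    const struct client *clients[MAX_CLIENTS];  /* per-client bookkeeping */
    int nclients;
};

/* Always record the client, then call .add only if present. Skipping the
 * bookkeeping when .add is NULL would hide the client at removal time. */
static int attach_client(struct device *dev, const struct client *c)
{
    if (dev->nclients == MAX_CLIENTS)
        return -1;
    dev->clients[dev->nclients++] = c;
    if (c->add)
        c->add(dev);
    return 0;
}

/* Notify every recorded client, newest first. */
static void remove_device(struct device *dev)
{
    for (int i = dev->nclients - 1; i >= 0; i--)
        if (dev->clients[i]->remove)
            dev->clients[i]->remove(dev);
    dev->nclients = 0;
}

/* Demo callback: counts removal notifications. */
static int removal_notifications;
static void count_remove(struct device *dev)
{
    (void)dev;
    removal_notifications++;
}
```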

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christophe Jaillet [Sat, 10 Jun 2017 09:19:20 +0000 (11:19 +0200)]
cxgb4: Remove some dead code

This 'BUG_ON(!ep)' can never trigger because we have:
   if (!ep)
      return 0;
just a few lines above. So it can be removed safely.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Maor Gottlieb [Wed, 16 Aug 2017 15:57:04 +0000 (18:57 +0300)]
IB/uverbs: Fix NULL pointer dereference during device removal

As part of ib_uverbs_remove_one, which might be triggered during the
reset flow, we send an IB_EVENT_DEVICE_FATAL event to the userspace
application.
If the device was removed after the uverbs fd was opened but before
ib_uverbs_get_context was called, the event file is accessed
before it is allocated, resulting in a NULL pointer dereference:

[ 72.325873] BUG: unable to handle kernel NULL pointer dereference at (null)
...
[ 72.325984] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 72.327123] Call Trace:
[ 72.327168] ib_uverbs_async_handler.isra.8+0x2e/0x160 [ib_uverbs]
[ 72.327216] ? synchronize_srcu_expedited+0x27/0x30
[ 72.327269] ib_uverbs_remove_one+0x120/0x2c0 [ib_uverbs]
[ 72.327330] ib_unregister_device+0xd0/0x180 [ib_core]
[ 72.327373] mlx5_ib_remove+0x74/0x140 [mlx5_ib]
[ 72.327422] mlx5_remove_device+0xfb/0x110 [mlx5_core]
[ 72.327466] mlx5_unregister_interface+0x3c/0xa0 [mlx5_core]
[ 72.327509] mlx5_ib_cleanup+0x10/0x962 [mlx5_ib]
[ 72.327546] SyS_delete_module+0x155/0x230
[ 72.328472] ? exit_to_usermode_loop+0x70/0xa6
[ 72.329370] do_syscall_64+0x54/0xc0
[ 72.330262] entry_SYSCALL64_slow_path+0x25/0x25

Fix it by checking that the user context was allocated before
triggering the event.
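A minimal sketch of the guard, with hypothetical structure names: the async event file exists only once a user context has been created, so the fatal-event path checks for it before queuing anything.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the async event file is allocated only when the
 * application creates a user context. */
struct event_file {
    int queued_events;
};

struct uverbs_file {
    struct event_file *async_file;  /* NULL until a context exists */
};

/* Returns 1 if an event was queued, 0 if there was nothing to notify. */
static int send_device_fatal(struct uverbs_file *f)
{
    if (!f->async_file)
        return 0;  /* no context yet: skip instead of dereferencing NULL */
    f->async_file->queued_events++;
    return 1;
}
```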

Fixes: 036b10635739 ('IB/uverbs: Enable device removal when there are active user space applications')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Shiraz Saleem [Mon, 17 Jul 2017 19:03:50 +0000 (14:03 -0500)]
IB/core: Protect sysfs entry on ib_unregister_device

ib_unregister_device does not protect the removal of sysfs entries.
A call to ib_register_device in that window can result in a
duplicate sysfs entry warning. Move mutex_unlock to after
ib_device_unregister_sysfs to protect against concurrent sysfs entry creation.

This issue is exposed during driver load/unload stress test.

WARNING: CPU: 5 PID: 4445 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5f/0x70
sysfs: cannot create duplicate filename '/class/infiniband/i40iw0'
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H
BIOS F7 01/17/2014
Workqueue: i40e i40e_service_task [i40e]
Call Trace:
dump_stack+0x67/0x98
__warn+0xcc/0xf0
warn_slowpath_fmt+0x4a/0x50
? kernfs_path_from_node+0x4b/0x60
sysfs_warn_dup+0x5f/0x70
sysfs_do_create_link_sd.isra.2+0xb7/0xc0
sysfs_create_link+0x20/0x40
device_add+0x28c/0x600
ib_device_register_sysfs+0x58/0x170 [ib_core]
ib_register_device+0x325/0x570 [ib_core]
? i40iw_register_rdma_device+0x1f4/0x400 [i40iw]
? kmem_cache_alloc_trace+0x143/0x330
? __raw_spin_lock_init+0x2d/0x50
i40iw_register_rdma_device+0x2dc/0x400 [i40iw]
i40iw_open+0x10a6/0x1950 [i40iw]
? i40iw_open+0xeab/0x1950 [i40iw]
? i40iw_make_cm_node+0x9c0/0x9c0 [i40iw]
i40e_client_subtask+0xa4/0x110 [i40e]
i40e_service_task+0xc2d/0x1320 [i40e]
process_one_work+0x203/0x710
? process_one_work+0x16f/0x710
worker_thread+0x126/0x4a0
? trace_hardirqs_on+0xd/0x10
kthread+0x112/0x150
? process_one_work+0x710/0x710
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x2e/0x40
---[ end trace fd11b69e21ea7653 ]---
Couldn't register device i40iw0 with driver model

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Tue, 25 Jul 2017 13:51:15 +0000 (06:51 -0700)]
iw_cxgb4: fix misuse of integer variable

Fixes: ee30f7d507c0 ("iw_cxgb4: Max fastreg depth depends on DSGL support")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Colin Ian King [Tue, 8 Aug 2017 17:41:02 +0000 (18:41 +0100)]
IB/hns: fix memory leak on ah on error return path

When dmac is NULL, ah is not being freed on the error return path. Fix
this by kfree'ing it.
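A sketch of the usual goto-based cleanup that prevents this kind of leak. `struct ah` and `create_ah` are hypothetical simplifications; the shape to note is that every error after the allocation leaves through a label that frees it.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified address handle. */
struct ah {
    unsigned char dmac[6];
};

static struct ah *create_ah(const unsigned char *dmac, int *err)
{
    struct ah *ah = malloc(sizeof(*ah));

    if (!ah) {
        *err = -ENOMEM;
        return NULL;
    }
    if (!dmac) {
        *err = -EINVAL;
        goto err_free;  /* returning directly here would leak ah */
    }
    memcpy(ah->dmac, dmac, sizeof(ah->dmac));
    *err = 0;
    return ah;

err_free:
    free(ah);
    return NULL;
}
```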

Detected by CoverityScan, CID#1452636 ("Resource Leak")

Fixes: d8966fcd4c25 ("IB/core: Use rdma_ah_attr accessor functions")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christopher N Bednarz [Wed, 9 Aug 2017 01:38:48 +0000 (20:38 -0500)]
i40iw: Fix potential fcn_id_array out of bounds

Avoid an out-of-bounds access by using I40IW_MAX_STATS_COUNT
instead of I40IW_INVALID_FCN_ID.
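A sketch of the bounds issue, assuming the sentinel value had been used where the array size was meant. The values (16 slots, a sentinel of 0xff) and names are hypothetical: a loop bounded by the sentinel would walk past the end of the array, whereas bounding it by the array size keeps every index valid and leaves the sentinel only as a "not found" return value.

```c
#include <stdint.h>

/* Hypothetical values: the array has MAX_STATS_COUNT slots, while the
 * invalid-ID sentinel is a larger number that is not a valid index. */
#define MAX_STATS_COUNT 16
#define INVALID_FCN_ID  0xff

static uint8_t fcn_id_array[MAX_STATS_COUNT];

/* Find an unused slot. Bounding the loop by the array size (not by the
 * sentinel) keeps every access in range. */
static int find_free_slot(void)
{
    for (int i = 0; i < MAX_STATS_COUNT; i++)
        if (!fcn_id_array[i])
            return i;
    return INVALID_FCN_ID;
}
```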

Signed-off-by: Christopher N Bednarz <christoper.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christopher N Bednarz [Wed, 9 Aug 2017 01:38:47 +0000 (20:38 -0500)]
i40iw: Use correct alignment for CQ0 memory

Use the correct alignment variable when allocating
DMA memory for CQ0.

Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mustafa Ismail [Wed, 9 Aug 2017 01:38:46 +0000 (20:38 -0500)]
i40iw: Fix typecast of tcp_seq_num

The typecast of tcp_seq_num incorrectly uses u8. Fix by
casting to u32.
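A short illustration of why the cast matters: TCP sequence numbers are 32 bits wide, so a cast through `u8` keeps only the low byte. The function names are hypothetical.

```c
#include <stdint.h>

/* The bug pattern: casting through u8 keeps only the low 8 bits. */
static uint32_t seq_via_u8(uint32_t seq)
{
    return (uint8_t)seq;
}

/* The fix: keep the full 32-bit sequence number. */
static uint32_t seq_via_u32(uint32_t seq)
{
    return (uint32_t)seq;
}
```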

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Correct variable names
Mustafa Ismail [Wed, 9 Aug 2017 01:38:44 +0000 (20:38 -0500)]
i40iw: Correct variable names

Fix incorrect naming of status code and struct. Use inline
instead of immediate.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Fix parsing of query/commit FPM buffers
Chien Tin Tung [Wed, 9 Aug 2017 01:38:43 +0000 (20:38 -0500)]
i40iw: Fix parsing of query/commit FPM buffers

Parsing of the commit/query Host Memory Cache Function Private Memory
(FPM) buffers does not skip over reserved fields, and it incorrectly
assigns those values to the object's base/cnt/max_cnt fields. Skip
over reserved fields and set the correct values. Also correct the
memory alignment requirement for the commit/query FPM buffers.

Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/vmw_pvrdma: Report CQ missed events
Bryan Tan [Thu, 10 Aug 2017 19:05:02 +0000 (12:05 -0700)]
RDMA/vmw_pvrdma: Report CQ missed events

Arming the CQ can race with incoming completions. When the driver
reports missed CQ events, ULPs know they should poll again to
retrieve those completions.
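
The consumer-side pattern this enables can be sketched in userspace. In the kernel, a ULP arms the CQ with ib_req_notify_cq() using IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS. A return value greater than 0 means an event may have been missed, so the ULP polls again. The `fake_cq` type and its helpers below are hypothetical stand-ins that only model that loop.

```c
/* Hypothetical stand-in for a CQ: counts queued and consumed completions. */
struct fake_cq {
	int pending;  /* completions sitting in the queue */
	int consumed; /* completions the consumer has processed */
};

/* Drain everything currently queued, like a loop over ib_poll_cq(). */
static void drain(struct fake_cq *cq)
{
	cq->consumed += cq->pending;
	cq->pending = 0;
}

/* Models arming with IB_CQ_REPORT_MISSED_EVENTS: returns > 0 when
 * completions are already queued, i.e. an event may have been missed. */
static int arm_report_missed(struct fake_cq *cq)
{
	return cq->pending > 0 ? 1 : 0;
}

/* Consumer loop: poll, re-arm, and poll again while arming reports a miss.
 * 'racing' completions land between the poll and the re-arm. */
static void handle_cq_event(struct fake_cq *cq, int racing)
{
	do {
		drain(cq);
		cq->pending += racing; /* completions that raced with arming */
		racing = 0;
	} while (arm_report_missed(cq) > 0);
}
```

Without the missed-event report, the three racing completions in the test below would sit in the queue until some unrelated later event arrived.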

Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver")
Acked-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hns: Avoid compile test under non 64bit environments
Matan Barak [Tue, 25 Jul 2017 14:29:06 +0000 (17:29 +0300)]
IB/hns: Avoid compile test under non 64bit environments

The hns driver uses __raw_writeq, which is only defined in 64BIT
environments. Trying to compile the driver in a 32BIT environment
results in errors. Only allow COMPILE_TEST when 64BIT is defined.

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRevert "RDMA/hns: fix build regression"
Doug Ledford [Mon, 14 Aug 2017 15:15:26 +0000 (11:15 -0400)]
Revert "RDMA/hns: fix build regression"

This reverts commit ecd840ff9b793ac60e3e6658414525535349a17b.

6 years agoMerge branch 'rdma-netlink' into k.o/merge-test
Doug Ledford [Thu, 10 Aug 2017 18:34:18 +0000 (14:34 -0400)]
Merge branch 'rdma-netlink' into k.o/merge-test

Conflicts:
include/rdma/ib_verbs.h - Modified a function signature adjacent
to a newly added function signature from a previous merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test
Doug Ledford [Thu, 10 Aug 2017 18:31:29 +0000 (14:31 -0400)]
Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test

Conflicts:
drivers/infiniband/hw/mlx5/main.c - Both add new code
include/rdma/ib_verbs.h - Both add new code

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge tag 'rdma-next-2017-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Doug Ledford [Thu, 10 Aug 2017 17:43:11 +0000 (13:43 -0400)]
Merge tag 'rdma-next-2017-08-10' of git://git./linux/kernel/git/leon/linux-rdma into rdma-netlink

RDMA netlink infrastructure v2

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/netlink: Export node_type
Leon Romanovsky [Thu, 29 Jun 2017 13:01:29 +0000 (16:01 +0300)]
RDMA/netlink: Export node_type

Add the ability for RDMA netlink users to get node_type.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Provide port state and physical link state
Leon Romanovsky [Thu, 29 Jun 2017 10:12:45 +0000 (13:12 +0300)]
RDMA/netlink: Provide port state and physical link state

Add port state and physical link state to the users of RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Export LID mask control (LMC)
Leon Romanovsky [Wed, 28 Jun 2017 12:49:30 +0000 (15:49 +0300)]
RDMA/netlink: Export LID mask control (LMC)

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Export lids and sm_lids
Leon Romanovsky [Wed, 28 Jun 2017 12:38:36 +0000 (15:38 +0300)]
RDMA/netlink: Export lids and sm_lids

According to the IB specification, the LID and SM_LID
are 16 bits wide, but to support OmniPath users, export
them as 32-bit values from the beginning.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Advertise IB subnet prefix
Leon Romanovsky [Wed, 28 Jun 2017 12:05:14 +0000 (15:05 +0300)]
RDMA/netlink: Advertise IB subnet prefix

Add IB subnet prefix to the port properties exported
by RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Export node_guid and sys_image_guid
Leon Romanovsky [Wed, 28 Jun 2017 11:01:37 +0000 (14:01 +0300)]
RDMA/netlink: Export node_guid and sys_image_guid

Add Node GUID and system image GUID to the device properties
exported by RDMA netlink, to be used by RDMAtool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Export FW version
Leon Romanovsky [Tue, 27 Jun 2017 13:58:59 +0000 (16:58 +0300)]
RDMA/netlink: Export FW version

Add FW version to the device properties exported
by RDMA netlink, to be used by RDMAtool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA: Simplify get firmware interface
Leon Romanovsky [Tue, 27 Jun 2017 13:49:53 +0000 (16:49 +0300)]
RDMA: Simplify get firmware interface

The FW version needs to be forwarded to user-space applications
through RDMA netlink. To do this safely, an nla_policy must be
declared and the size of the FW string limited.

The new define IB_FW_VERSION_NAME_MAX limits the size of the
FW version string. It was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers are already limited
by that value indirectly.

The introduction of this define allows us to remove the string size
from the get_fw_str function signature.
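
A sketch of why the size argument becomes unnecessary: if every caller passes a buffer of IB_FW_VERSION_NAME_MAX bytes, a driver can bound its write by that constant. Assumptions: the value 32 is taken from ETHTOOL_FWVERS_LEN in <linux/ethtool.h>. `example_get_fw_str()` and its version fields are hypothetical and are not the real driver callback.

```c
#include <stdio.h>
#include <string.h>

/* Assumed value: the commit sets this equal to ETHTOOL_FWVERS_LEN (32). */
#define IB_FW_VERSION_NAME_MAX 32

/* Hypothetical driver callback in the style of the simplified interface:
 * the buffer size is implied by IB_FW_VERSION_NAME_MAX, so it is no longer
 * passed in. snprintf() never writes past that bound. */
static void example_get_fw_str(char *str, int major, int minor, int sub)
{
	snprintf(str, IB_FW_VERSION_NAME_MAX, "%d.%d.%d", major, minor, sub);
}
```

A caller would declare `char fw[IB_FW_VERSION_NAME_MAX];` and pass it directly. The same bound can also back an nla_policy length limit.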

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Expose device and port capability masks
Leon Romanovsky [Tue, 20 Jun 2017 11:47:08 +0000 (14:47 +0300)]
RDMA/netlink: Expose device and port capability masks

The port capability mask is exposed to user space via the sysfs interface,
while device capabilities are available through verbs only.

This patch provides those capabilities through the netlink interface.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Implement nldev port doit callback
Leon Romanovsky [Thu, 22 Jun 2017 13:10:38 +0000 (16:10 +0300)]
RDMA/netlink: Implement nldev port doit callback

Provide the ability to get device- and port-specific information.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Add nldev port dumpit implementation
Leon Romanovsky [Tue, 20 Jun 2017 08:30:33 +0000 (11:30 +0300)]
RDMA/netlink: Add nldev port dumpit implementation

This patch implements the query interface to get data
for all ports of a specific device.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Add nldev device doit implementation
Leon Romanovsky [Thu, 15 Jun 2017 17:33:08 +0000 (20:33 +0300)]
RDMA/netlink: Add nldev device doit implementation

Provide the ability to query a specific device.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Implement nldev device dumpit callback
Leon Romanovsky [Tue, 20 Jun 2017 06:59:14 +0000 (09:59 +0300)]
RDMA/netlink: Implement nldev device dumpit callback

This patch adds the ability to return all available devices
together with their properties.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Add nldev initialization flows
Leon Romanovsky [Tue, 20 Jun 2017 06:14:15 +0000 (09:14 +0300)]
RDMA/netlink: Add nldev initialization flows

Add nldev init and exit flows to the RDMA/core.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Add netlink device definitions to UAPI
Leon Romanovsky [Tue, 20 Jun 2017 04:55:53 +0000 (07:55 +0300)]
RDMA/netlink: Add netlink device definitions to UAPI

Introduce new defines to rdma_netlink.h, so the RDMA configuration tool
will be able to communicate with RDMA subsystem by using the shared defines.

The addition of the new client (NLDEV) revealed that we had mistakenly
exposed the RDMA_NL_I40IW define, which is not backed by any RDMA netlink
client now and will not be in the future. So this patch reuses its
value and deletes the old defines.

NLDEV operates on objects. The struct ib_device has two straightforward
objects: the device itself and the ports of that device.

We therefore propose the following commands to work on those objects:
 * RDMA_NLDEV_CMD_{GET,SET,NEW,DEL} - works on ib_device itself
 * RDMA_NLDEV_CMD_PORT_{GET,SET,NEW,DEL} - works on ports of specific ib_device

Those commands receive/return the device index (RDMA_NLDEV_ATTR_DEV_INDEX)
and port index (RDMA_NLDEV_ATTR_PORT_INDEX). For device object accesses,
RDMA_NLDEV_ATTR_PORT_INDEX returns the maximum number of ports of the
specific ib_device; for port accesses, it returns the actual port index.

The port index starts from 1 to follow RDMA/core internal semantics and
the sysfs exposed knobs.
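
The index semantics described above can be modelled in a few lines of standalone C. Assumptions: `struct example_dev` and both helpers are hypothetical and only mirror the described behaviour. A device query reports the port count in the port-index attribute, and real port indices run from 1 to that count.

```c
/* Hypothetical device model: only the port count matters here. */
struct example_dev {
	unsigned int num_ports;
};

/* Value carried in RDMA_NLDEV_ATTR_PORT_INDEX for a device object:
 * the maximum number of ports. */
static unsigned int dev_port_index_attr(const struct example_dev *dev)
{
	return dev->num_ports;
}

/* Port objects use real indices, which start at 1, so 0 is never a port. */
static int port_index_valid(const struct example_dev *dev, unsigned int port)
{
	return port >= 1 && port <= dev->num_ports;
}
```

A user-space tool can read the device attribute and then loop `for (port = 1; port <= max; port++)` to issue port queries.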

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Update copyright
Leon Romanovsky [Sun, 18 Jun 2017 13:37:27 +0000 (16:37 +0300)]
RDMA/netlink: Update copyright

Add Mellanox to the copyright header.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Convert LS to doit callback
Leon Romanovsky [Thu, 15 Jun 2017 11:20:39 +0000 (14:20 +0300)]
RDMA/netlink: Convert LS to doit callback

The RDMA_NL_LS protocol does not actually dump anything;
it sets data, so it should be handled by a doit callback.

This patch converts RDMA_NL_LS to a doit callback, while
preserving the IWCM and RDMA_CM flows through netlink_dump_start().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Reduce indirection access to cb_table
Leon Romanovsky [Thu, 15 Jun 2017 10:14:13 +0000 (13:14 +0300)]
RDMA/netlink: Reduce indirection access to cb_table

Introduce an intermediate variable to simplify access
to the fields of cb_table.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Add and implement doit netlink callback
Leon Romanovsky [Thu, 15 Jun 2017 09:46:33 +0000 (12:46 +0300)]
RDMA/netlink: Add and implement doit netlink callback

The .doit callback is used by the netlink core to differentiate
between get and set operations. The common convention is to use
that call for command operations (SET, ADD, etc.) and/or for
accesses without the NLM_F_DUMP flag.
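
A minimal sketch of the doit/dumpit split, assuming a dispatcher that routes on the message flags. The flag values match <linux/netlink.h> and are restated only so the example builds standalone. `pick_handler()` and `enum handler` are hypothetical and are not the RDMA netlink code.

```c
#include <stdint.h>

/* Values from <linux/netlink.h>, defined here only if missing. */
#ifndef NLM_F_REQUEST
#define NLM_F_REQUEST 0x01
#endif
#ifndef NLM_F_ROOT
#define NLM_F_ROOT 0x100
#endif
#ifndef NLM_F_MATCH
#define NLM_F_MATCH 0x200
#endif
#ifndef NLM_F_DUMP
#define NLM_F_DUMP (NLM_F_ROOT | NLM_F_MATCH)
#endif

enum handler { HANDLER_DOIT, HANDLER_DUMPIT };

/* Hypothetical dispatcher: requests with NLM_F_DUMP go to the dump
 * callback; everything else (single-object GET, SET, ADD, ...) goes
 * to doit. */
static enum handler pick_handler(uint16_t nlmsg_flags)
{
	if ((nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP)
		return HANDLER_DUMPIT;
	return HANDLER_DOIT;
}
```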

This commit adds proper declaration and implementation
to RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/core: Add and expose static device index
Leon Romanovsky [Sun, 18 Jun 2017 11:39:59 +0000 (14:39 +0300)]
RDMA/core: Add and expose static device index

This patch adds a static device index, in a similar fashion to
the one already available in the netdev world (struct net->ifindex).

In downstream patches, RDMA netlink will use this idx-to-ib_device
conversion, so as part of this commit, we expose the translation
function to IB/core users.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/core: Add iterator over ib_devices
Leon Romanovsky [Mon, 19 Jun 2017 11:04:56 +0000 (14:04 +0300)]
RDMA/core: Add iterator over ib_devices

The upcoming nldev needs to iterate over all IB devices in the system,
and to avoid exposing the ib_devices list outside of devices.c,
a function iterator is needed.

The current version is written explicitly for the nldev callback to avoid
over-engineering at this stage, but it can easily be extended for
other types.
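
The callback-iterator idea can be sketched as follows. Assumptions: the device table, `example_enum_all_devs()` and `example_dev_cb` are hypothetical, not the IB/core API. A static array stands in for the private list, which the real core protects with locking. Callers only ever see one device at a time through the callback.

```c
#include <stddef.h>

/* Hypothetical private device table, standing in for the ib_devices list. */
struct example_device {
	unsigned int index;
	const char *name;
};

static struct example_device example_devices[] = {
	{ 1, "dev0" },
	{ 2, "dev1" },
};

/* Callback type: return nonzero to stop the walk early. */
typedef int (*example_dev_cb)(struct example_device *dev, void *ctx);

/* The only exported entry point: walk every device and hand each one to
 * the callback, so the table itself never leaves this file. */
static int example_enum_all_devs(example_dev_cb cb, void *ctx)
{
	size_t i;
	int ret;

	for (i = 0; i < sizeof(example_devices) / sizeof(example_devices[0]); i++) {
		ret = cb(&example_devices[i], ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the devices visited. */
static int count_cb(struct example_device *dev, void *ctx)
{
	(void)dev;
	(*(unsigned int *)ctx)++;
	return 0;
}
```

Passing a context pointer is what lets a netlink dump callback carry its message buffer through the walk without the iterator knowing about it.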

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Rename netlink callback struct
Leon Romanovsky [Mon, 19 Jun 2017 15:23:45 +0000 (18:23 +0300)]
RDMA/netlink: Rename netlink callback struct

The RDMA netlink client infrastructure was removed and made obsolete.
The old infrastructure defined struct ibnl_client_cbs. Now that all
uses of this have been updated to the new infrastructure, rename the
struct to be compliant with the current stack naming standards:
struct rdma_nl_cbs.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Simplify and rename ibnl_chk_listeners
Leon Romanovsky [Sun, 18 Jun 2017 12:51:16 +0000 (15:51 +0300)]
RDMA/netlink: Simplify and rename ibnl_chk_listeners

Reduce the ibnl_chk_listeners function to one line by removing an
unneeded comparison.

Rename the function to be compliant with other functions in RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Rename and remove redundant parameter from ibnl_multicast
Leon Romanovsky [Sun, 18 Jun 2017 12:44:32 +0000 (15:44 +0300)]
RDMA/netlink: Rename and remove redundant parameter from ibnl_multicast

The pointer to netlink header was not used in the ibnl_multicast
function, so let's remove it and simplify the function
signature.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Rename and remove redundant parameter from ibnl_unicast*
Leon Romanovsky [Sun, 18 Jun 2017 12:35:20 +0000 (15:35 +0300)]
RDMA/netlink: Rename and remove redundant parameter from ibnl_unicast*

Netlink message header is not needed for unicast reply, hence remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Simplify the put_msg and put_attr
Leon Romanovsky [Sun, 18 Jun 2017 13:38:04 +0000 (16:38 +0300)]
RDMA/netlink: Simplify the put_msg and put_attr

Reuse standard macros to cancel the netlink message
in case of error.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/netlink: Add flag to consolidate common handling
Leon Romanovsky [Mon, 12 Jun 2017 13:00:19 +0000 (16:00 +0300)]
RDMA/netlink: Add flag to consolidate common handling

Add the ability to provide flags to control RDMA netlink callbacks,
and convert addr.c and sa_query.c to be the first users of this
infrastructure. This allows their CAP_NET_ADMIN checks to be moved
into the netlink core.
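
A sketch of centralising the permission check through a per-callback flag. Assumptions: `EXAMPLE_NL_ADMIN_PERM`, `struct example_nl_cb` and `example_dispatch()` are hypothetical names. The boolean `has_net_admin` stands in for the kernel's capability test. Clients only declare the flag, and the core enforces it once.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-operation flag: the op requires CAP_NET_ADMIN. */
#define EXAMPLE_NL_ADMIN_PERM (1u << 0)

struct example_nl_cb {
	int (*doit)(void);
	unsigned int flags;
};

static int example_set_op(void)
{
	return 0;
}

/* Central check in the netlink core instead of in each client
 * (previously addr.c and sa_query.c each did their own). */
static int example_dispatch(const struct example_nl_cb *cb, bool has_net_admin)
{
	if ((cb->flags & EXAMPLE_NL_ADMIN_PERM) && !has_net_admin)
		return -EPERM;
	return cb->doit();
}
```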

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
6 years agoRDMA/iwcm: Remove extra EXPORT_SYMBOLS
Leon Romanovsky [Thu, 1 Jun 2017 08:59:44 +0000 (11:59 +0300)]
RDMA/iwcm: Remove extra EXPORT_SYMBOLS

The iwcm exports functions which are not used outside of ib_core.
This patch simply removes these EXPORT_SYMBOLS.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
6 years agoRDMA/iwcm: Remove useless check of netlink client validity
Leon Romanovsky [Thu, 1 Jun 2017 09:42:36 +0000 (12:42 +0300)]
RDMA/iwcm: Remove useless check of netlink client validity

RDMA netlink implementation guarantees that supplied
client number is in allowed range.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
6 years agoRDMA/netlink: Avoid double pass for RDMA netlink messages
Leon Romanovsky [Thu, 8 Jun 2017 06:05:12 +0000 (09:05 +0300)]
RDMA/netlink: Avoid double pass for RDMA netlink messages

The standard netlink_rcv_skb function skips messages without the
NLM_F_REQUEST flag, while the SA netlink client issues such messages.

In commit bc10ed7d3d19 ("IB/core: Add rdma netlink helper functions")
a local function was introduced to allow such messages.

This led to a double pass for every incoming message.

In this patch, we unify that local implementation and the netlink_rcv_skb
function, so a double pass is no longer needed.

As an outcome, the combined function gains a stricter check
for the NLM_F_REQUEST flag: messages without it are now allowed
for the SA pathquery client only.
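
The single-pass acceptance rule can be sketched as one predicate per message. Assumptions: the client enum and `example_accept_msg()` are hypothetical. The NLM_F_REQUEST value matches <linux/netlink.h>. Only the client standing in for RDMA_NL_LS (SA path query) may send messages without NLM_F_REQUEST, because the kernel initiates that exchange and user space replies.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef NLM_F_REQUEST
#define NLM_F_REQUEST 0x01 /* value from <linux/netlink.h> */
#endif

/* Hypothetical client identifiers. */
enum example_client {
	EXAMPLE_CLIENT_IWCM,
	EXAMPLE_CLIENT_LS, /* stands in for the SA path query client */
	EXAMPLE_CLIENT_NLDEV,
};

/* One decision per message: requests always pass; non-requests pass
 * only for the path query client. */
static bool example_accept_msg(uint16_t flags, enum example_client client)
{
	if (flags & NLM_F_REQUEST)
		return true;
	return client == EXAMPLE_CLIENT_LS;
}
```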

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Remove redundant owner option for netlink callbacks
Leon Romanovsky [Tue, 30 May 2017 08:29:56 +0000 (11:29 +0300)]
RDMA/netlink: Remove redundant owner option for netlink callbacks

The owner field does not need to be set, because netlink is part of
ib_core, which is unloaded last, after all other modules.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/netlink: Remove netlink clients infrastructure
Leon Romanovsky [Mon, 5 Jun 2017 07:20:11 +0000 (10:20 +0300)]
RDMA/netlink: Remove netlink clients infrastructure

RDMA netlink has a complicated infrastructure for dynamically
registering and de-registering netlink clients to the NETLINK_RDMA
group. The complicated portion of this code is not widely used because
2 of the 3 current clients are statically compiled together with
netlink.c. The infrastructure, therefore, is deemed overkill.

Refactor the code to eliminate the dynamically added clients. Now all
clients are pre-registered in a client array at compile time, and at run
time they merely check in with the infrastructure to pass their callback
table for inclusion in the pre-sized client array.
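
A sketch of the pre-sized client table, assuming clients are identified by small compile-time indices. All `ex_*` names are hypothetical, not the RDMA netlink API. Nothing is allocated at run time: registration only stores a callback-table pointer in a fixed slot, and dispatch indexes that slot directly.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical client indices fixed at compile time (0 is unused). */
enum { EX_CLIENT_RDMA_CM = 1, EX_CLIENT_IWCM, EX_CLIENT_LS, EX_NUM_CLIENTS };

struct ex_nl_cbs {
	int (*doit)(int op);
};

/* Pre-sized table: one slot per known client. */
static struct {
	const struct ex_nl_cbs *cb_table;
} ex_clients[EX_NUM_CLIENTS];

/* Run-time check-in: a client only hands over its callback table. */
static int ex_nl_register(unsigned int index, const struct ex_nl_cbs *cbs)
{
	if (index == 0 || index >= EX_NUM_CLIENTS || ex_clients[index].cb_table)
		return -EINVAL;
	ex_clients[index].cb_table = cbs;
	return 0;
}

static void ex_nl_unregister(unsigned int index)
{
	if (index > 0 && index < EX_NUM_CLIENTS)
		ex_clients[index].cb_table = NULL;
}

/* Dispatch looks up the slot directly; empty slots are rejected. */
static int ex_nl_dispatch(unsigned int index, int op)
{
	if (index == 0 || index >= EX_NUM_CLIENTS || !ex_clients[index].cb_table)
		return -EINVAL;
	return ex_clients[index].cb_table->doit(op);
}

/* Example client callback table. */
static int ls_doit(int op)
{
	return op + 100;
}

static const struct ex_nl_cbs ls_cbs = { ls_doit };
```

Because the array bound is known at compile time, the range check on the client index is the same in every entry point.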

This also allows for future cleanups and removal of unneeded code in the
iwcm* netlink handler.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
6 years agoRDMA/core: Add wait/retry version of ibnl_unicast
Ismail, Mustafa [Wed, 28 Jun 2017 14:02:45 +0000 (09:02 -0500)]
RDMA/core: Add wait/retry version of ibnl_unicast

Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait,
and modify ibnl_unicast to not wait/retry.  This eliminates
the undesirable wait for future users of ibnl_unicast.

Change Portmapper calls originating from kernel to user-space
to use ibnl_unicast_wait and take advantage of the wait/retry
logic in netlink_unicast.
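
The difference between the two variants can be modelled in userspace. In the kernel, netlink_unicast() takes a nonblock argument that selects whether to wait when the receiver is busy. Assumptions: the `ex_*` functions and `struct ex_sock` are hypothetical. The bounded retry loop only models the observable behaviour of the waiting variant and is not how the kernel implements it.

```c
#include <errno.h>

/* Hypothetical receiver: busy_count is how many more sends will fail
 * with -EAGAIN before the receiver has room again. */
struct ex_sock {
	int busy_count;
};

static int ex_try_send(struct ex_sock *sk)
{
	if (sk->busy_count > 0) {
		sk->busy_count--;
		return -EAGAIN;
	}
	return 0;
}

/* Non-waiting variant: report -EAGAIN to the caller immediately. */
static int ex_unicast(struct ex_sock *sk)
{
	return ex_try_send(sk);
}

/* Waiting variant: retry on -EAGAIN, up to max_tries attempts. */
static int ex_unicast_wait(struct ex_sock *sk, int max_tries)
{
	int ret = -EAGAIN;

	while (max_tries-- > 0) {
		ret = ex_try_send(sk);
		if (ret != -EAGAIN)
			break;
	}
	return ret;
}
```

Keeping the non-waiting call as the default means only callers that opt in, such as the Portmapper requests, pay the cost of waiting.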

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
6 years agonvme-rdma: use intelligent affinity based queue mappings
Sagi Grimberg [Thu, 13 Jul 2017 08:09:44 +0000 (11:09 +0300)]
nvme-rdma: use intelligent affinity based queue mappings

Use the generic block layer affinity mapping helper. Also,
limit nr_hw_queues to the number of irq vectors of the rdma
device, as we don't really need more.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoblock: Add rdma affinity based queue mapping helper
Sagi Grimberg [Thu, 13 Jul 2017 08:09:43 +0000 (11:09 +0300)]
block: Add rdma affinity based queue mapping helper

Like pci and virtio, we add an rdma helper for affinity
spreading. This achieves optimal mq affinity assignments
according to the underlying rdma device affinity maps.
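
The mapping idea can be sketched in userspace. Each hardware queue uses one completion vector, and every CPU in that vector's affinity mask is mapped to the queue. When the driver exposes no affinity (a NULL mask, as with a missing get_vector_affinity callout), the helper falls back to a default mapping. Assumptions: 64-bit masks stand in for struct cpumask, `EX_NR_CPUS` is arbitrary, and `ex_naive_map()` is a simple round-robin stand-in for the block layer's generic fallback, not its actual algorithm.

```c
#include <stdint.h>

#define EX_NR_CPUS 8 /* assumed CPU count for the sketch */

/* Stand-in fallback: spread CPUs round-robin over the queues. */
static void ex_naive_map(unsigned int *mq_map, unsigned int nr_queues)
{
	unsigned int cpu;

	for (cpu = 0; cpu < EX_NR_CPUS; cpu++)
		mq_map[cpu] = cpu % nr_queues;
}

/* vec_masks[v] has bit n set if completion vector v is affine to CPU n.
 * Queue q uses vector first_vec + q; a NULL table means no affinity info. */
static void ex_rdma_map_queues(unsigned int *mq_map, unsigned int nr_queues,
			       const uint64_t *vec_masks, unsigned int first_vec)
{
	unsigned int queue, cpu;

	if (!vec_masks) {
		ex_naive_map(mq_map, nr_queues);
		return;
	}
	for (queue = 0; queue < nr_queues; queue++)
		for (cpu = 0; cpu < EX_NR_CPUS; cpu++)
			if (vec_masks[first_vec + queue] & (1ull << cpu))
				mq_map[cpu] = queue;
}
```

With this mapping, a request submitted on a CPU lands on the queue whose completion vector interrupts that same CPU, which is the locality the helper aims for.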

Reviewed-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agomlx5: support ->get_vector_affinity
Sagi Grimberg [Thu, 13 Jul 2017 08:09:42 +0000 (11:09 +0300)]
mlx5: support ->get_vector_affinity

Simply refer to the generic affinity mask helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/core: expose affinity mappings per completion vector
Sagi Grimberg [Thu, 13 Jul 2017 08:09:41 +0000 (11:09 +0300)]
RDMA/core: expose affinity mappings per completion vector

This will allow ULPs to intelligently locate threads based
on completion vector cpu affinity mappings. In case the
driver does not expose a get_vector_affinity callout, return
NULL so the caller can maintain a fallback logic.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agomlx5: move affinity hints assignments to generic code
Sagi Grimberg [Thu, 13 Jul 2017 08:09:40 +0000 (11:09 +0300)]
mlx5: move affinity hints assignments to generic code

The generic API takes care of spreading affinity similarly to
what mlx5 open-coded (and even handles asymmetric
configurations better). Ask the generic API to spread affinity
for us, and feed it pre_vectors that do not participate
in affinity settings (which is an improvement over what we
had before).

The affinity assignments should match what mlx5 tried to
do earlier, but now we do not set affinity for the dedicated
async, cmd and pages vectors.

Also, remove mlx5e_get_cpu and introduce mlx5e_get_node
(used for allocation purposes) and mlx5_get_vector_affinity
(for indirection table construction), as they provide the needed
information. Luckily, we have generic helpers to get the cpumask
and node given an irq vector. mlx5_get_vector_affinity will
be used by mlx5_ib in a subsequent patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agomlx5e: don't assume anything on the irq affinity mappings of the device
Sagi Grimberg [Thu, 13 Jul 2017 08:09:39 +0000 (11:09 +0300)]
mlx5e: don't assume anything on the irq affinity mappings of the device

mlx5e currently assumes that irq affinity spreads the first irq
vectors across the device's home node cpus. With the new generic
affinity mappings this is no longer the case, hence mlx5e should
not rely on it anymore.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agomlx5: convert to generic pci_alloc_irq_vectors
Sagi Grimberg [Thu, 13 Jul 2017 08:09:38 +0000 (11:09 +0300)]
mlx5: convert to generic pci_alloc_irq_vectors

Now that we have generic code to allocate an array of irq
vectors that correctly spreads their affinity, handles cpu
hotplug events and more, we're much better off using it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/CM: Set appropriate slid and dlid when handling CM request
Dasaratharaman Chandramouli [Thu, 8 Jun 2017 17:38:04 +0000 (13:38 -0400)]
IB/CM: Set appropriate slid and dlid when handling CM request

If extended LIDs are being used, a connection request contains
OPA GIDs. Extract the LIDs from the OPA GIDs and populate the
slid/dlid fields in the path records that are created when handling
a connection request.
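
How a LID can be recovered from an OPA GID is easiest to show on the GID's 64-bit interface ID. Assumptions: this sketch follows the layout used by the kernel's OPA address helpers as we understand it, restated here under hypothetical `ex_` names and not verified against this commit. The special OUI 0x00066A sits in bits 63..40 and the 32-bit LID in bits 31..0. The interface ID is assumed to be already converted to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed special OUI marking an OPA GID (bits 63..40 of interface ID). */
#define EX_OPA_SPECIAL_OUI 0x00066AULL

/* interface_id: lower 64 bits of the GID, in host byte order. */
static bool ex_is_opa_gid(uint64_t interface_id)
{
	return (interface_id >> 40) == EX_OPA_SPECIAL_OUI;
}

/* The extended (32-bit) LID occupies the low 32 bits. */
static uint32_t ex_opa_lid_from_gid(uint64_t interface_id)
{
	return (uint32_t)(interface_id & 0xFFFFFFFFULL);
}
```

A CM request handler would test `ex_is_opa_gid()` first. Only for OPA GIDs would it take slid/dlid from the low 32 bits, and otherwise it would keep the 16-bit LIDs carried in the request.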

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/CM: Create appropriate path records when handling CM request
Dasaratharaman Chandramouli [Thu, 8 Jun 2017 17:38:03 +0000 (13:38 -0400)]
IB/CM: Create appropriate path records when handling CM request

When handling an incoming connection request, ib_cm creates
either an IB or an OPA path record based on the gid field
in the request.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>