OSDN Git Service

uclinux-h8/linux.git
6 years agoIB/uverbs: Move non driver related elements from ib_ucontext to ib_ufile
Jason Gunthorpe [Wed, 4 Jul 2018 08:32:07 +0000 (11:32 +0300)]
IB/uverbs: Move non driver related elements from ib_ucontext to ib_ufile

The IDR is part of the ib_ufile so all the machinery to lock it, handle
closing and disassociation rightly belongs to the ufile not the ucontext.

This changes the lifetime of that data to match the lifetime of the file
descriptor which is always strictly longer than the lifetime of the
ucontext.

We need the entire locking machinery to continue to exist after ucontext
destruction to allow us to return the destroy data after a device has been
disassociated.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/uverbs: Add a uobj_perform_destroy helper
Jason Gunthorpe [Wed, 4 Jul 2018 08:32:06 +0000 (11:32 +0300)]
IB/uverbs: Add a uobj_perform_destroy helper

This consolidates a bunch of repeated code patterns into a helper.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Combine MIN_SZ_OR_ZERO with UVERBS_ATTR_STRUCT
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:31 +0000 (08:50 +0300)]
RDMA/uverbs: Combine MIN_SZ_OR_ZERO with UVERBS_ATTR_STRUCT

After all the rework is done it is now possible to include single flags in
the type macros. Any user of UVERBS_ATTR_STRUCT needs to zero check data
past the end of the known struct to be correct, so make this mandatory,
and get rid of MIN_SZ_OR_ZERO as a user flag.

This changes UVERBS_ATTR_TYPE to refer to a struct of exact size with not
possibility of extension, convert the few users of UVERBS_ATTR_TYPE and
MIN_SZ_OR_ZERO to use UVERBS_ATTR_STRUCT.

The one user of UVERBS_ATTR_STRUCT without MIN_SZ_OR_ZERO is just
confused. There is some padding at the end of that struct, but userspace
always provides it with the padding. The construction doesn't test if the
padding is zero, so it is pointless. Just use UVERBS_ATTR_TYPE.

Finally, rename min_sz_or_zero to zero_trailing to better reflect what it
does and hopefully avoid such mis-uses in the future.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Use UVERBS_ATTR_MIN_SIZE correctly and uniformly
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:30 +0000 (08:50 +0300)]
RDMA/uverbs: Use UVERBS_ATTR_MIN_SIZE correctly and uniformly

This newer macro allows specifying a lower bound on the accepted size, and
has an 'unlimited' upper bound. Due to this it never checks for trailing
zeroing so it doesn't make any sense to combine it with MIN_SZ_OR_ZERO, so
drop MIN_SZ_OR_ZERO when they are used together

There were a couple of places that open coded this pattern, switch them to
use the clearer UVERBS_ATTR_MIN_SIZE for clarity.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Remove UA_FLAGS
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:29 +0000 (08:50 +0300)]
RDMA/uverbs: Remove UA_FLAGS

This bit of boilerplate isn't really necessary, we can use bitfields
instead of a flags enum and the macros can then individually initialize
them through the __VA_ARGS__ like everything else.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Get rid of the & in method specifications
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:28 +0000 (08:50 +0300)]
RDMA/uverbs: Get rid of the & in method specifications

Hide it inside the macros. The & is confusing and interferes with using
this as a generic DSL in later patches.

Since this also touches almost every line, also run the specs through
clang-format (with 'BinPackParameters: false') to make the maintenance
easier.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Simplify UVERBS_OBJECT and _TREE family of macros
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:27 +0000 (08:50 +0300)]
RDMA/uverbs: Simplify UVERBS_OBJECT and _TREE family of macros

Instead of the large set of indirecting macros, define the few needed
macros to directly instantiate the struct uverbs_oject_tree_def and
associated objects list.

This is small amount of code duplication but the readability is far
better.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Simplify method definition macros
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:26 +0000 (08:50 +0300)]
RDMA/uverbs: Simplify method definition macros

Instead of the large set of indirecting macros, define the few needed
macros to directly instantiate the struct uverbs_method_def and associated
attributes list.

This is small amount of code duplication but the readability is far
better.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Simplify UVERBS_ATTR family of macros
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:25 +0000 (08:50 +0300)]
RDMA/uverbs: Simplify UVERBS_ATTR family of macros

Instead of using a complex cascade of macros, just directly provide the
initializer list each of the declarations is trying to create.

Now that the macros are simplified this also reworks the uverbs_attr_spec
to be friendly to older compilers by eliminating any unnamed
structures/unions inside, and removing the duplication of some fields. The
structure size remains at 16 bytes which was the original motivation for
some of this oddness.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Split UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:24 +0000 (08:50 +0300)]
RDMA/uverbs: Split UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE

Two methods are sharing the same attribute constant, but the attribute
definitions are not the same. This should not have been done, instead
split them into two attributes with the same number.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Store the specs_root in the struct ib_uverbs_device
Jason Gunthorpe [Wed, 4 Jul 2018 05:50:23 +0000 (08:50 +0300)]
RDMA/uverbs: Store the specs_root in the struct ib_uverbs_device

The specs are required to operate the uverbs file, so they belong inside
the ib_uverbs_device, not inside the ib_device. The spec passed in the
ib_device is just a communication from the driver and should not be used
during runtime.

This also changes the lifetime of the spec memory to match the
ib_uverbs_device, however at this time the spec_root can still contain
driver pointers after disassociation, so it cannot be used if ib_dev is
NULL. This is preparation for another series.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoMerge branch 'mlx5-dump-fill-mkey' into rdma.git for-next
Jason Gunthorpe [Wed, 4 Jul 2018 19:19:46 +0000 (13:19 -0600)]
Merge branch 'mlx5-dump-fill-mkey' into rdma.git for-next

For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Dump and fill MKEY from Leon Romanovsky:

====================
MLX5 IB HCA offers the memory key, dump_fill_mkey to increase performance,
when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access, while
keeping track of the processed length in the ibv_sge handling.

In this three patch series, we expose various bits in our HW spec
file (mlx5_ifc.h), move unneeded for mlx5_core FW command and export such
memory key to user space thought our mlx5-abi header file.
====================

Botched auto-merge in mlx5_ib_alloc_ucontext() resolved by hand.

* branch 'mlx5-dump-fill-mkey':
  IB/mlx5: Expose dump and fill memory key
  net/mlx5: Add hardware definitions for dump_fill_mkey
  net/mlx5: Limit scope of dump_fill_mkey function
  net/mlx5: Rate limit errors in command interface

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Expose dump and fill memory key
Yonatan Cohen [Tue, 19 Jun 2018 05:47:24 +0000 (08:47 +0300)]
IB/mlx5: Expose dump and fill memory key

MLX5 IB HCA offers the memory key, dump_fill_mkey to boost
performance, when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access,
while keeping track of the processed length in the ibv_sge handling.

Meaning, instead of a PCI write access the HCA leaves the target
memory untouched, and skips filling that packet section. Similar
behavior is done upon send, the HCA skips data in memory relevant
to this key and saves PCI bus access.

This functionality saves PCI read/write operations.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/mlx5: Add hardware definitions for dump_fill_mkey
Leon Romanovsky [Tue, 19 Jun 2018 05:47:23 +0000 (08:47 +0300)]
net/mlx5: Add hardware definitions for dump_fill_mkey

MLX5 IB HCA offers the memory key, dump_fill_mkey to boost
performance by forcing local HCA operations to skip the PCI bus
access,

This patch adds needed hardware definitions.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Limit scope of dump_fill_mkey function
Yonatan Cohen [Tue, 19 Jun 2018 05:47:22 +0000 (08:47 +0300)]
net/mlx5: Limit scope of dump_fill_mkey function

mlx5_core_dump_fill_mkey() is going to be used in next
patch in IB and doesn't need to be visible to whole
mlx5_core. Move that command to mlx5_ib.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/bnxt_re: Fix a bunch of off by one bugs in qplib_fp.c
Dan Carpenter [Wed, 4 Jul 2018 09:58:02 +0000 (12:58 +0300)]
RDMA/bnxt_re: Fix a bunch of off by one bugs in qplib_fp.c

The srq->swq[] is allocated in bnxt_qplib_create_srq().  It has
srq->hwq.max_elements elements so these tests should be > instead of >=
or we might go beyond the end of the array.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/bnxt_re: Fix a couple off by one bugs
Dan Carpenter [Wed, 4 Jul 2018 09:57:11 +0000 (12:57 +0300)]
RDMA/bnxt_re: Fix a couple off by one bugs

The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
It has sgid_tbl->max elements.  So the > should be >= to prevent
accessing one element beyond the end of the array.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: type promotion bug in rdma_rw_init_one_mr()
Dan Carpenter [Wed, 4 Jul 2018 09:32:12 +0000 (12:32 +0300)]
IB/core: type promotion bug in rdma_rw_init_one_mr()

"nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
error code then it's type promoted to a high unsigned int which is
treated as success.

Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMAINTAINERS: Moving out...
Or Gerlitz [Tue, 3 Jul 2018 15:02:37 +0000 (18:02 +0300)]
MAINTAINERS: Moving out...

The 2.6.18... was a hell of a ride, and by now both me and
Roi are not dealing with iser any more. Max will replace us
as maintainer, good luck to you dear!

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/i40w: Hold read semaphore while looking after VMA
Leon Romanovsky [Sun, 1 Jul 2018 16:36:24 +0000 (19:36 +0300)]
RDMA/i40w: Hold read semaphore while looking after VMA

VMA lookup is supposed to be performed while mmap_sem is held.

Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx4: Test port number before querying type.
Tarick Bedeir [Mon, 2 Jul 2018 21:02:34 +0000 (14:02 -0700)]
IB/mlx4: Test port number before querying type.

rdma_ah_find_type() can reach into ib_device->port_immutable with a
potentially out-of-bounds port number, so check that the port number is
valid first.

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Tarick Bedeir <tarick@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agovmw_pvrdma: Release netdev when vmxnet3 module is removed
Neil Horman [Fri, 29 Jun 2018 11:52:06 +0000 (07:52 -0400)]
vmw_pvrdma: Release netdev when vmxnet3 module is removed

On repeated module load/unload cycles, its possible for the pvrmda driver
to encounter this crash:

...
[  297.032448] RIP: 0010:[<ffffffff839e4620>]  [<ffffffff839e4620>] netdev_walk_all_upper_dev_rcu+0x50/0xb0
[  297.034078] RSP: 0018:ffff95087780bd08  EFLAGS: 00010286
[  297.034986] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff95087a0c0000
[  297.036196] RDX: ffff95087a0c0000 RSI: ffffffff839e44e0 RDI: ffff950835d0c000
[  297.037421] RBP: ffff95087780bd40 R08: ffff95087a0e0ea0 R09: abddacd03f8e0ea0
[  297.038636] R10: abddacd03f8e0ea0 R11: ffffef5901e9dbc0 R12: ffff95087a0c0000
[  297.039854] R13: ffffffff839e44e0 R14: ffff95087a0c0000 R15: ffff950835d0c828
[  297.041071] FS:  0000000000000000(0000) GS:ffff95087fc00000(0000) knlGS:0000000000000000
[  297.042443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  297.043429] CR2: ffffffffffffffe8 CR3: 000000007a652000 CR4: 00000000003607f0
[  297.044674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  297.045893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  297.047109] Call Trace:
[  297.047545]  [<ffffffff839e4698>] netdev_has_upper_dev_all_rcu+0x18/0x20
[  297.048691]  [<ffffffffc05d31af>] is_eth_port_of_netdev+0x2f/0xa0 [ib_core]
[  297.049886]  [<ffffffffc05d3180>] ? is_eth_active_slave_of_bonding_rcu+0x70/0x70 [ib_core]
...

This occurs because vmw_pvrdma on probe stores a pointer to the netdev
that exists on function 0 of the same bus/device/slot (which represents
the vmxnet3 ethernet driver).  However, it never removes this pointer if
the vmxnet3 module is removed, leading to crashes resulting from use after
free dereferencing incidents like the one above.

The fix is pretty straightforward.  vmw_pvrdma should listen for
NETDEV_REGISTER and NETDEV_UNREGISTER events in its event listener code
block, and update the stored netdev pointer accordingly.  This solution
has been tested by myself and the reporter with successful results.  This
fix also allows the pvrdma driver to find its underlying ethernet device
in the event that vmxnet3 is loaded after pvrdma, which it was not able to
do before.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: ruquin@redhat.com
Tested-by: Adit Ranadive <aditr@vmware.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Fix GRE flow specification
Maor Gottlieb [Sun, 1 Jul 2018 12:50:17 +0000 (15:50 +0300)]
IB/mlx5: Fix GRE flow specification

Currently the driver sets the mask of the gre_protocol to 0xffff
without consideration in the user request.

Fix it by copy the mask from the verbs spec.

Fixes: da2f22ae7707 ("IB/mlx5: Add support for GRE flow specification")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Remove incorrect call to do_interrupt callback
Michael J. Ruhl [Mon, 2 Jul 2018 15:08:37 +0000 (08:08 -0700)]
IB/hfi1: Remove incorrect call to do_interrupt callback

The general interrupt handler is_rcv_avail_int() has two paths,
do_interrupt() (callback) and handle_user_interrupt().  The
do_interrupt() callback is for the threaded receive handling.
is_rcv_avail_int() cannot handle threaded IRQs.

If the do_interrupt() path is taken, and the IRQ returns
IRQ_WAKE_THREAD, the IRQ behavior will be indeterminate.

Remove incorrect call to do_interrupt() from is_rcv_avail_int(),
leaving the un-threaded (handle_user_interrupt()) path.

Fixes: f4f30031c33c ("staging/rdma/hfi1: Thread the receive interrupt.")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Set in_use_ctxts bits for user ctxts only
Michael J. Ruhl [Mon, 2 Jul 2018 15:08:27 +0000 (08:08 -0700)]
IB/hfi1: Set in_use_ctxts bits for user ctxts only

The in_use_ctxts bitmask is for user receive contexts only.  Setting it for
any other type of receive context is incorrect.

Move initial set of in_use_ctxts bits from the general context init to the
user context specific init. Having this bit set can allow contexts to be
incorrectly identified by some IRQ handlers. This will allow
handle_user_interrupt() will now filter user contexts correctly.

Clean up redundant is_rcv_urgent_int() user context check.

A follow on patch will clean up an incorrect code path in the
is_rcv_avail_int().

Fixes: 8737ce95c463 ("IB/hfi1: Fix an assign/ordering issue with shared context IDs")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoib_srpt: Fix a use-after-free in __srpt_close_all_ch()
Bart Van Assche [Mon, 2 Jul 2018 21:08:45 +0000 (14:08 -0700)]
ib_srpt: Fix a use-after-free in __srpt_close_all_ch()

BUG: KASAN: use-after-free in srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
Read of size 4 at addr ffff8801269d23f8 by task check/29726

CPU: 4 PID: 29726 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xa4/0xf5
 print_address_description+0x6f/0x270
 kasan_report+0x241/0x360
 __asan_load4+0x78/0x80
 srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
 srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
 configfs_write_file+0x14e/0x1d0 [configfs]
 __vfs_write+0xd2/0x3b0
 vfs_write+0x101/0x270
 ksys_write+0xab/0x120
 __x64_sys_write+0x43/0x50
 do_syscall_64+0x77/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f235cfe6154

Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoib_srpt: Fix a use-after-free in srpt_close_ch()
Bart Van Assche [Mon, 2 Jul 2018 21:08:18 +0000 (14:08 -0700)]
ib_srpt: Fix a use-after-free in srpt_close_ch()

Avoid that KASAN reports the following:

BUG: KASAN: use-after-free in srpt_close_ch+0x4f/0x1b0 [ib_srpt]
Read of size 4 at addr ffff880151180cb8 by task check/4681

CPU: 15 PID: 4681 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xa4/0xf5
 print_address_description+0x6f/0x270
 kasan_report+0x241/0x360
 __asan_load4+0x78/0x80
 srpt_close_ch+0x4f/0x1b0 [ib_srpt]
 srpt_set_enabled+0xf7/0x1e0 [ib_srpt]
 srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
 configfs_write_file+0x14e/0x1d0 [configfs]
 __vfs_write+0xd2/0x3b0
 vfs_write+0x101/0x270
 ksys_write+0xab/0x120
 __x64_sys_write+0x43/0x50
 do_syscall_64+0x77/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Remove set-but-not-used variables
Bart Van Assche [Mon, 2 Jul 2018 15:59:28 +0000 (08:59 -0700)]
IB/mlx5: Remove set-but-not-used variables

Avoid that the compiler complains about set-but-not-used variables when
building with W=1. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/srp: Remove driver version and release data information
Bart Van Assche [Mon, 2 Jul 2018 15:58:41 +0000 (08:58 -0700)]
IB/srp: Remove driver version and release data information

Remove the driver version and release date information because such
information is not relevant for an upstream driver. See also commit
e1267b01240a ("RDMA: Remove useless MODULE_VERSION").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoinclude/rdma/opa_addr.h: Fix an endianness issue
Bart Van Assche [Mon, 2 Jul 2018 17:06:51 +0000 (10:06 -0700)]
include/rdma/opa_addr.h: Fix an endianness issue

IB_MULTICAST_LID_BASE is defined as follows:

  #define IB_MULTICAST_LID_BASE   cpu_to_be16(0xC000)

Hence use be16_to_cpu() to convert it to CPU endianness. Compile-tested
only.

Fixes: af808ece5ce9 ("IB/SA: Check dlid before SA agent queries for ClassPortInfo")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB: Improve uverbs_cleanup_ucontext algorithm
Yishai Hadas [Wed, 20 Jun 2018 14:11:39 +0000 (17:11 +0300)]
IB: Improve uverbs_cleanup_ucontext algorithm

Improve uverbs_cleanup_ucontext algorithm to work properly when the
topology graph of the objects cannot be determined at compile time.  This
is the case with objects created via the devx interface in mlx5.

Typically uverbs objects must be created in a strict topologically sorted
order, so that LIFO ordering will generally cause them to be freed
properly. There are only a few cases (eg memory windows) where objects can
point to things out of the strict LIFO order.

Instead of using an explicit ordering scheme where the HW destroy is not
allowed to fail, go over the list multiple times and allow the destroy
function to fail. If progress halts then a final, desperate, cleanup is
done before leaking the memory. This indicates a driver bug.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/srpt: Support HCAs with more than two ports
Bart Van Assche [Tue, 26 Jun 2018 22:24:48 +0000 (15:24 -0700)]
IB/srpt: Support HCAs with more than two ports

Since there are adapters that have four ports, increase the size of
the srpt_device.port[] array. This patch avoids that the following
warning is hit with quad port Chelsio adapters:

    WARN_ON(sdev->device->phys_port_cnt > ARRAY_SIZE(sdev->port));

Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/iser: set can_queue earlier to allow setting higher queue depth
Sagi Grimberg [Wed, 27 Jun 2018 08:03:45 +0000 (11:03 +0300)]
IB/iser: set can_queue earlier to allow setting higher queue depth

We need to set can_queue earlier than when enabling the scsi host.
in a blk-mq enabled environment, the tagset allocation is taken
from can_queue which cannot be modified later. Also, pass an updated
.can_queue to iscsi_session_setup to have enough iscsi tasks allocated
in the session kfifo.

Reported-by: Karandeep Chahal <karandeepchahal@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/rxe: don't clear the tx queue on every transfer
Vijay Immanuel [Tue, 19 Jun 2018 01:48:56 +0000 (18:48 -0700)]
IB/rxe: don't clear the tx queue on every transfer

Do not call sk_dst_set() on every packet transfer because
that calls sk_tx_queue_clear(), which clears the tx queue.
A QP must stay on the same tx queue to maintain packet order.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/cm: Remove now useless rcu_lock in dst_fetch_ha
Jason Gunthorpe [Wed, 27 Jun 2018 07:44:26 +0000 (10:44 +0300)]
IB/cm: Remove now useless rcu_lock in dst_fetch_ha

This lock used to be protecting a call to dst_get_neighbour_noref,
however the below commit changed it to dst_neigh_lookup which no longer
requires rcu.

Access to nud_state, neigh_event_send or rdma_copy_addr does not require
RCU, so delete the lock.

Fixes: 02b619555ad6 ("infiniband: Convert dst_fetch_ha() over to dst_neigh_lookup().")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/mlx5: Don't leak UARs in case of free fails
Leon Romanovsky [Wed, 27 Jun 2018 07:44:24 +0000 (10:44 +0300)]
RDMA/mlx5: Don't leak UARs in case of free fails

The failure in releasing one UAR doesn't mean that we can't continue to
release rest of system pages, so don't return too early.

As part of cleanup, there is no need to print warning if
mlx5_cmd_free_uar() fails because such warning will be printed as part of
mlx5_cmd_exec().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/vmw_pvrdma: Delete unused function
Yuval Shaia [Wed, 27 Jun 2018 16:26:11 +0000 (19:26 +0300)]
RDMA/vmw_pvrdma: Delete unused function

This function is not in use - delete it.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Check for rdma_protocol_ib only after validating port_num
Jason Gunthorpe [Mon, 25 Jun 2018 22:03:41 +0000 (16:03 -0600)]
IB/core: Check for rdma_protocol_ib only after validating port_num

port_num is untrusted data from the user, so it should be checked after
calling fill_sgid_attr, which validates it.

Fixes: 8d9ec9addd6c ("IB/core: Add a sgid_attr pointer to struct rdma_ah_attr")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/mlx5: Rate limit errors in command interface
Leon Romanovsky [Sun, 24 Jun 2018 08:23:46 +0000 (11:23 +0300)]
net/mlx5: Rate limit errors in command interface

Any error status returned by FW will trigger a print similar to the
following error message in the dmesg.

[   55.884355] mlx5_core 0000:00:04.0: mlx5_cmd_check:712:(pid 555):
ALLOC_UAR(0x802) op_mod(0x0) failed, status limits exceeded(0x8),
syndrome (0x0)

Those prints are extremely valuable to diagnose issues with running system
and it is important to keep them. However, not-so-careful user can trigger
endless number of such prints by depleting HW resources and will spam
dmesg.

Rate limiting of such messages solves this issue.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx4: Create slave AH's directly
Jason Gunthorpe [Sun, 24 Jun 2018 13:57:50 +0000 (16:57 +0300)]
IB/mlx4: Create slave AH's directly

Since slave GID's do not exist in the core gid table we can no longer use
the core code to help do this without creating inconsistencies. Directly
create the AH using mlx4 internal APIs.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/uverbs: Remove redundant check
Leon Romanovsky [Sun, 24 Jun 2018 08:23:52 +0000 (11:23 +0300)]
RDMA/uverbs: Remove redundant check

kern_spec->reserved is checked prior to calling
kern_spec_to_ib_spec_filter() which makes this second check redundant.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/umem: Don't check for a negative return value of dma_map_sg_attrs()
Leon Romanovsky [Sun, 24 Jun 2018 08:23:48 +0000 (11:23 +0300)]
RDMA/umem: Don't check for a negative return value of dma_map_sg_attrs()

dma_map_sg_attrs() returns 0 on error and can't return a negative number
(ensured by BUG_ON), so don't check.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR
Leon Romanovsky [Sun, 24 Jun 2018 08:23:47 +0000 (11:23 +0300)]
RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR

Number of specs is provided by user and in valid case can be equal to zero.
Such argument causes to call to kcalloc() with zero-length request and in
return the ZERO_SIZE_PTR is assigned. This pointer is different from NULL
and makes various if (..) checks to success.

Fixes: b6ba4a9aa59f ("IB/uverbs: Add support for flow counters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/verbs: Drop kernel variant of destroy_flow
Leon Romanovsky [Sun, 24 Jun 2018 08:23:45 +0000 (11:23 +0300)]
RDMA/verbs: Drop kernel variant of destroy_flow

Following the removal of ib_create_flow(), adjust the code to get rid of
ib_destroy_flow() too.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/verbs: Drop kernel variant of create_flow
Leon Romanovsky [Sun, 24 Jun 2018 08:23:44 +0000 (11:23 +0300)]
RDMA/verbs: Drop kernel variant of create_flow

There are no kernel users of this interface so lets drop it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Check existence of create_flow callback
Jason Gunthorpe [Mon, 25 Jun 2018 21:21:15 +0000 (15:21 -0600)]
RDMA/uverbs: Check existence of create_flow callback

In the accepted series "Refactor ib_uverbs_write path", we presented the
roadmap to get rid of uverbs_cmd_mask and uverbs_ex_cmd_mask fields in
favor of simple check of function pointer. So let's put NULL check of
create_flow function callback despite the fact that uverbs_ex_cmd_mask
still exists.

Link: https://www.spinics.net/lists/linux-rdma/msg60753.html
Suggested-by: Michael J Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMAINTAINERS: Update SRP entries
Bart Van Assche [Fri, 22 Jun 2018 15:11:17 +0000 (08:11 -0700)]
MAINTAINERS: Update SRP entries

Reflect the acquisition of SanDisk by Western Digital in my e-mail
address. Remove the reference to David Dillow's git tree since SRP patches
are queued by Doug and Jason. Remove the reference to the OpenFabrics
website since the srp_daemon source code has been moved from that website
into the rdma-core project. Add an entry for the SRP target driver.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: David Dillow <dillow@google.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/usnic: Update with bug fixes from core code
Jason Gunthorpe [Wed, 13 Jun 2018 17:19:42 +0000 (11:19 -0600)]
IB/usnic: Update with bug fixes from core code

usnic has a modified version of the core codes' ib_umem_get() and
related, and the copy misses many of the bug fixes done over the years:

Commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages")
Commit 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm
                      from ib_umem_get")
Commit 8494057ab5e4 ("IB/uverbs: Prevent integer overflow in ib_umem_get
                      address arithmetic")
Commit 8abaae62f3fd ("IB/core: disallow registering 0-sized memory region")
Commit 66578b0b2f69 ("IB/core: don't disallow registering region starting
                      at 0x0")
Commit 53376fedb9da ("RDMA/core: not to set page dirty bit if it's already
                      set.")
Commit 8e907ed48827 ("IB/umem: Use the correct mm during ib_umem_release")

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx4: Add support for drain SQ & RQ
Yishai Hadas [Tue, 19 Jun 2018 07:43:56 +0000 (10:43 +0300)]
IB/mlx4: Add support for drain SQ & RQ

This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.

Specifically, Upon internal error state modify QP to an error state can
be assumed to be success as each in-progress WR going to be flushed in
error in any case as expected by that modify command.

In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.

In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.

As the above depends on scheduling the code takes the relevant locks
and actions to make sure that the completion handler for that WR will
always be called after that the post_send/recv were issued but not in
parallel to the other task that handles the error flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add support for drain SQ & RQ
Yishai Hadas [Tue, 19 Jun 2018 07:43:55 +0000 (10:43 +0300)]
IB/mlx5: Add support for drain SQ & RQ

This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.

Specifically,
Upon internal error state modify QP to an error state can be assumed to
be success as each in-progress WR going to be flushed in error in any
case as expected by that modify command.

In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.

In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.

As the above depends on scheduling the code takes the relevant locks and
actions to make sure that the completion handler for that WR will always
be called after that the post_send/recv were issued but not in parallel
to the other task that handles the error flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Remove unused ib cache functions
Jason Gunthorpe [Tue, 19 Jun 2018 07:59:21 +0000 (10:59 +0300)]
RDMA/core: Remove unused ib cache functions

Now that all users have been converted to use the version of these APIs
that returns a gid_attr pointer we can delete the old entry points.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/cm: Use sgid_attr from the AV
Parav Pandit [Tue, 19 Jun 2018 07:59:20 +0000 (10:59 +0300)]
IB/cm: Use sgid_attr from the AV

Prior patches now ensure that the AV has a sgid_attr, if one would have
been required.  Instead of querying for one, take it directly from the AH.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'
Parav Pandit [Tue, 19 Jun 2018 07:59:19 +0000 (10:59 +0300)]
IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'

While processing a path record entry in CM messages the associated GID
attribute is now also supplied.

Currently for RoCE a netdevice's net namespace pointer and ifindex are
stored in path record entry. Both of these fields of the netdev can change
anytime while processing CM messages. Additionally storing net namespace
without holding reference will lead to use-after-free crash. Therefore it
is removed. Netdevice information for RoCE is instead provided via
referenced gid attribute in ib_cm requests.

Such a design leads to a situation where the kernel can crash when the net
pointer becomes invalid. However today it is always initialized to
init_net, which cannot become invalid. In order to support processing
packets in any arbitrary namespace of the received packet, it is necessary
to avoid such conditions.

This patch removes the dependency on the net pointer and ifindex; instead
it will rely on SGID attribute which contains a pointer to netdev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/cm: Pass the sgid_attr through various events
Parav Pandit [Tue, 19 Jun 2018 07:59:18 +0000 (10:59 +0300)]
IB/cm: Pass the sgid_attr through various events

Make the sgid_attr available along with path information to the event
consumer, this allows the consumer to keep using the same GID table entry
as the event is related to.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/cm: Keep track of the sgid_attr that created the cm id
Parav Pandit [Tue, 19 Jun 2018 07:59:17 +0000 (10:59 +0300)]
IB/cm: Keep track of the sgid_attr that created the cm id

Hold reference to the the sgid_attr which is used in a cm_id until the
cm_id is destroyed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB: Make init_ah_attr_grh_fields set sgid_attr
Parav Pandit [Tue, 19 Jun 2018 07:59:16 +0000 (10:59 +0300)]
IB: Make init_ah_attr_grh_fields set sgid_attr

Use the sgid and other information from the path record to figure out the
sgid_attrs.

Store the selected table entry in the sgid_attr for everything else to
use.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB: Make ib_init_ah_from_mcmember set sgid_attr
Parav Pandit [Tue, 19 Jun 2018 07:59:15 +0000 (10:59 +0300)]
IB: Make ib_init_ah_from_mcmember set sgid_attr

This is really just a CM support function, normally a multicast address
does not have a specific SGID - but the RDMA CM usage model does restrict
things to the netdevice the CM id is bound to, at least for roce case.

Store the selected table entry in the sgid_attr for everything else to
use.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB: Make ib_init_ah_attr_from_wc set sgid_attr
Parav Pandit [Tue, 19 Jun 2018 07:59:14 +0000 (10:59 +0300)]
IB: Make ib_init_ah_attr_from_wc set sgid_attr

The work completion is inspected to determine what dgid table entry was
used to receieve the packet, produces a sgid_attr that matches and sticks
it in the ah_attr.

All callers of this function are now required to release the ah_attr on
success.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/hfi1: Remove INTx support and simplify MSIx usage
Michael J. Ruhl [Wed, 20 Jun 2018 16:43:23 +0000 (09:43 -0700)]
IB/hfi1: Remove INTx support and simplify MSIx usage

The INTx IRQ support does not work for all HF1 IRQ handlers
(specifically the receive data IRQs).

Remove all supporting code for the INTx IRQ.

If the requested MSIx vector request is unsuccessful, do not allow the
driver to continue.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Reorg ctxtdata and rightsize fields
Mike Marciniszyn [Wed, 20 Jun 2018 16:43:14 +0000 (09:43 -0700)]
IB/hfi1: Reorg ctxtdata and rightsize fields

Many fields in ctxtdata are incorrectly sized and the organization of the
fields within the structure is a jumble.

Fix by:
- Correcting oversize fields.
- Putting fields common to all contexts at the top with hot fields
  at the top.
- Moving PSM fields to the bottom of the structure.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Remove caches of chip CSRs
Mike Marciniszyn [Wed, 20 Jun 2018 16:43:06 +0000 (09:43 -0700)]
IB/hfi1: Remove caches of chip CSRs

Remove the sizeable cache of the chip sizing CSRs and replace with CSR
reads as needed.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Remove unused/writeonly devdata fields
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:57 +0000 (09:42 -0700)]
IB/hfi1: Remove unused/writeonly devdata fields

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Rightsize ctxt_eager_bufs fields
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:49 +0000 (09:42 -0700)]
IB/hfi1: Rightsize ctxt_eager_bufs fields

Fields in this structure are sized excessively based on hardware
limitations and input values.

Fix by reducing fields as appropriate and repositioning to close holes in
the structure.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Remove rcvctrl from ctxtdata
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:40 +0000 (09:42 -0700)]
IB/hfi1: Remove rcvctrl from ctxtdata

It is only ever written.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Remove rcvhdrq_size
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:31 +0000 (09:42 -0700)]
IB/hfi1: Remove rcvhdrq_size

The usage of this ctxt data field is not hot path and the value can be
computed on demand to cut down the ctxtdata bloat.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Free GID table entry during GID deletion
Parav Pandit [Thu, 21 Jun 2018 12:31:25 +0000 (15:31 +0300)]
IB/core: Free GID table entry during GID deletion

If we already hold the table->lock when doing the kref_put it means we are
in a context where it is safe to do the deletion synchronously, with no
need for the work queue.

This helps to eliminate issues when GID change is requested as part of MAC
address change or bonding event change where expectation is to replace the
GID almost immediately.

Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Consider net namespace while leaving multicast group
Parav Pandit [Thu, 21 Jun 2018 12:31:24 +0000 (15:31 +0300)]
RDMA/cma: Consider net namespace while leaving multicast group

When sending multicast leave request, consider the net ns in which this
cm_id is created.

Code was duplicated in cma_leave_mc_groups() and rdma_leave_multicast(),
which is now done using a helper function cma_leave_roce_mc_group().

Fixes: bee3c3c91865 ("IB/cma: Join and leave multicast groups with IGMP")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Delete type and id from uverbs_obj_attr
Jason Gunthorpe [Wed, 20 Jun 2018 21:47:11 +0000 (15:47 -0600)]
IB/uverbs: Delete type and id from uverbs_obj_attr

In this context the uobject is not allowed to be NULL, so type is the same
as uobject->type, and at least for IDR, id is the same as uobject->id.

FD objects should never handle the FD number outside the uAPI boundary
code.

Suggested-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMerge branch 'icrc-counter' into rdma.git for-next
Jason Gunthorpe [Fri, 22 Jun 2018 14:53:27 +0000 (08:53 -0600)]
Merge branch 'icrc-counter' into rdma.git for-next

For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull RoCE ICRC counters from Leon Romanovsky:

====================
This series exposes RoCE ICRC counter through existing RDMA hw_counters
sysfs interface.

The first patch has all HW definitions in mlx5_ifc.h file and second patch
is the actual counter implementation.
====================

* branch 'icrc-counter':
  IB/mlx5: Support RoCE ICRC encapsulated error counter
  net/mlx5: Add RoCE RX ICRC encapsulated counter

6 years agoIB/mlx5: Support RoCE ICRC encapsulated error counter
Talat Batheesh [Thu, 21 Jun 2018 12:37:56 +0000 (15:37 +0300)]
IB/mlx5: Support RoCE ICRC encapsulated error counter

This patch adds support to query the counter that counts the
RoCE packets with corrupted ICRC (Invariant Cyclic Redundancy Code).

This counter will be under
/sys/class/infiniband/<mlx5-dev>/ports/<port>/hw_counters/

rx_icrc_encapsulated - The number of RoCE packets with ICRC
error.

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/mlx5: Add RoCE RX ICRC encapsulated counter
Talat Batheesh [Thu, 17 May 2018 08:14:18 +0000 (11:14 +0300)]
net/mlx5: Add RoCE RX ICRC encapsulated counter

Add capability bit in PCAM register and RoCE ICRC error counter
to PPCNT register.

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/mlx5: Refactor transport domain checks
Leon Romanovsky [Tue, 19 Jun 2018 07:39:06 +0000 (10:39 +0300)]
RDMA/mlx5: Refactor transport domain checks

Put all relevant checks for transport domain in the
mlx5_ib_alloc/dealloc_transport_domain functions.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/rdmavt, IB/hfi1: Create device dependent s_flags
Mike Marciniszyn [Mon, 4 Jun 2018 18:44:02 +0000 (11:44 -0700)]
IB/rdmavt, IB/hfi1: Create device dependent s_flags

Move some s_flags defines out of rdmavt and into hfi1 because they are
hfi1 specific and therefore should remain in the driver instead of
bubbling up to rdmavt.

Document device specific ranges in rdmavt and remap
those in hfi1.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Remove rcvhdrsize
Mike Marciniszyn [Mon, 4 Jun 2018 18:43:46 +0000 (11:43 -0700)]
IB/hfi1: Remove rcvhdrsize

The field is based on a constant that can never change.

Use the define to assign the register instead.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Move rhf_offset from devdata to ctxtdata
Mike Marciniszyn [Mon, 4 Jun 2018 18:43:37 +0000 (11:43 -0700)]
IB/hfi1: Move rhf_offset from devdata to ctxtdata

This field should be in ctxtdata to allow for better locality of access by
eliminating a dd dereference.

The new field is now side-by-side with rcvhdrqentsize since the rhf_offset
is a function of the rcvhdrqentsize.

Both fields are now correctly sized as u8.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/hfi1: Move normal functions from hfi1_devdata to const array
Mike Marciniszyn [Mon, 4 Jun 2018 18:43:29 +0000 (11:43 -0700)]
IB/hfi1: Move normal functions from hfi1_devdata to const array

The current implementation precludes having receive context specific
packet type receive handlers.

Fix this by adding adding c99 const array for the existing handlers and
remove the current 72 bytes of pointers from devdata.

A new pointer in hfi1_ctxtdata will point to the const array.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Expose DEVX tree
Yishai Hadas [Sun, 17 Jun 2018 10:00:06 +0000 (13:00 +0300)]
IB/mlx5: Expose DEVX tree

Expose DEVX tree to be used by upper layers.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add DEVX query EQN support
Yishai Hadas [Sun, 17 Jun 2018 10:00:05 +0000 (13:00 +0300)]
IB/mlx5: Add DEVX query EQN support

Return the matching device EQN for a given user vector number via the
DEVX interface.

Note:
EQs are owned by the kernel and shared by all user processes.
Basically, a user CQ can point to any EQ.
The kernel doesn't enforce any such limitation today either.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add DEVX support for memory registration
Yishai Hadas [Sun, 17 Jun 2018 10:00:04 +0000 (13:00 +0300)]
IB/mlx5: Add DEVX support for memory registration

Add support to register a memory with the firmware via the DEVX
interface.

The driver translates a given user address to ib_umem then it will
register the physical addresses with the firmware and get a unique id
for this registration to be used for this virtual address.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add support for DEVX query UAR
Yishai Hadas [Sun, 17 Jun 2018 10:00:03 +0000 (13:00 +0300)]
IB/mlx5: Add support for DEVX query UAR

Return a device UAR index for a given user index via the DEVX interface.

Security note:
The hardware protection mechanism works like this: Each device object that
is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
the device specification manual) upon its creation. Then upon doorbell,
hardware fetches the object context for which the doorbell was rang, and
validates that the UAR through which the DB was rang matches the UAR ID
of the object.

If no match the doorbell is silently ignored by the hardware.  Of
course, the user cannot ring a doorbell on a UAR that was not mapped to
it.

Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
mailboxes (except tagging them with UID), we expose to the user its UAR
ID, so it can embed it in these objects in the expected specification
format. So the only thing the user can do is hurt itself by creating a
QP/SQ/CQ with a UAR ID other than his, and then in this case other users
may ring a doorbell on its objects.

The consequence of that will be that another user can schedule a QP/SQ
of the buggy user for execution (just insert it to the hardware schedule
queue or arm its CQ for event generation), no further harm is expected.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add DEVX support for modify and query commands
Yishai Hadas [Sun, 17 Jun 2018 10:00:02 +0000 (13:00 +0300)]
IB/mlx5: Add DEVX support for modify and query commands

Add support in DEVX for modify and query commands, the required lock is
taken (i.e. READ/WRITE) by the KABI infrastructure accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add obj create and destroy functionality
Yishai Hadas [Sun, 17 Jun 2018 10:00:01 +0000 (13:00 +0300)]
IB/mlx5: Add obj create and destroy functionality

Add support to create and destroy firmware objects via the DEVX
interface.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add support for DEVX general command
Yishai Hadas [Sun, 17 Jun 2018 10:00:00 +0000 (13:00 +0300)]
IB/mlx5: Add support for DEVX general command

Add support to run general firmware command via the DEVX interface.

A command that works on some object (e.g. CQ, WQ, etc.) will be added
in next patches while maintaining the required object lock.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Introduce DEVX
Yishai Hadas [Sun, 17 Jun 2018 09:59:57 +0000 (12:59 +0300)]
IB/mlx5: Introduce DEVX

Introduce DEVX to enable direct device commands in downstream patches
from this series.

In that mode of work the firmware manages the isolation between
processes' resources and as such a DEVX user id is created and assigned
to the given user context upon allocation request.

A capability check is done to make sure that this feature is really
supported by the firmware prior to creating the DEVX user id.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Expose ib_ucontext from a given ib_uverbs_file
Yishai Hadas [Sun, 17 Jun 2018 09:59:59 +0000 (12:59 +0300)]
IB/core: Expose ib_ucontext from a given ib_uverbs_file

Drivers that use the IOCTL API may have the ib_uverbs_file and need a
way to get the related ib_ucontext from it, this is enabled by this
patch.

Downstream patches from this series will use it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Introduce DECLARE_UVERBS_GLOBAL_METHODS
Yishai Hadas [Sun, 17 Jun 2018 09:59:58 +0000 (12:59 +0300)]
IB/core: Introduce DECLARE_UVERBS_GLOBAL_METHODS

Introduce a new macro to be used for global methods on a singleton
object.

This macros sets internally the type_attrs to be NULL as such an object
can't be created.

Downstream patches from this series will use this macro.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Allow an empty namespace in ioctl() framework
Matan Barak [Sun, 17 Jun 2018 09:59:54 +0000 (12:59 +0300)]
IB/uverbs: Allow an empty namespace in ioctl() framework

The ioctl parser framework wrongly assumed that each namespace is
populated. This could lead to NULL dereferences. Fix the parser to
always check that a given namespace indeed exists.

Fixes: fac9658cabb9 ("IB/core: Add new ioctl interface")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Add a macro to define a type with no kernel known size
Matan Barak [Sun, 17 Jun 2018 09:59:53 +0000 (12:59 +0300)]
IB/uverbs: Add a macro to define a type with no kernel known size

Sometimes the uverbs uAPI  doesn't really care about the structure it gets
from user-space. All it wants to do is to allocate enough space and send
it to the hardware/provider driver. Adding a UVERBS_ATTR_MIN_SIZE that
could be used for this scenarios. We use USHRT_MAX as the kernel known
size to bypass any zero validations.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Add PTR_IN attributes that are allocated/copied automatically
Matan Barak [Sun, 17 Jun 2018 09:59:52 +0000 (12:59 +0300)]
IB/uverbs: Add PTR_IN attributes that are allocated/copied automatically

Adding UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY flag to PTR_IN attributes.
By using this flag, the parse automatically allocates and copies the
user-space data. This data is accessible by using uverbs_attr_get_len
and uverbs_attr_get_alloced_ptr inline accessor functions from the
handler.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Refactor uverbs_finalize_objects
Matan Barak [Sun, 17 Jun 2018 09:59:51 +0000 (12:59 +0300)]
IB/uverbs: Refactor uverbs_finalize_objects

uverbs_finalize_objects is currently used only to commit or abort
objects. Since we want to add automatic allocation/free of PTR_IN
attributes, moving it to uverbs_ioctl.c and renamit it to
uverbs_finalize_attrs.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Export uverbs idr and fd types
Matan Barak [Sun, 17 Jun 2018 09:59:50 +0000 (12:59 +0300)]
IB/uverbs: Export uverbs idr and fd types

As provider drivers could use UVERBS_ATTR_FD and UVERBS_ATTR_IDR macros
need to export them.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMerge branch 'mellanox/mlx5-next' into RDMA for-next
Jason Gunthorpe [Tue, 19 Jun 2018 16:49:42 +0000 (10:49 -0600)]
Merge branch 'mellanox/mlx5-next' into RDMA for-next

From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

* branch 'mellanox/mlx5-next':
  net/mlx5: Expose DEVX specification
  net/mlx5: Prevent warns in dmesg upon firmware commands

6 years agonet/mlx5: Expose DEVX specification
Yishai Hadas [Thu, 8 Mar 2018 12:36:27 +0000 (14:36 +0200)]
net/mlx5: Expose DEVX specification

This patch updates the mlx5_ifc structures and
command interface to support DEVX.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Prevent warns in dmesg upon firmware commands
Yishai Hadas [Sun, 13 May 2018 10:21:40 +0000 (13:21 +0300)]
net/mlx5: Prevent warns in dmesg upon firmware commands

When DEVX is used application builds by itself the command mail box,
this patch prevents warns upon firmware commands as of invalid user
space usage.

In addition,
A failure in destroy_mkey command was changed to be printed only under
debug mode.

This prevents a redundant warn when a memory window was used
with rereg_mr and finally was some kernel cleanup as of reset
flow/process termination.

In that case this command might temporarily fails as part of the cleanup
but finally it expects to succeed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoiw_cxgb4: remove duplicate memcpy() in c4iw_create_listen()
Bharat Potnuri [Fri, 15 Jun 2018 15:28:23 +0000 (20:58 +0530)]
iw_cxgb4: remove duplicate memcpy() in c4iw_create_listen()

memcpy() of mapped addresses is done twice in c4iw_create_listen(),
removing the duplicate memcpy().

Fixes: 170003c894d9 ("iw_cxgb4: remove port mapper related code")
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: add max_send_sge and max_recv_sge attributes
Steve Wise [Mon, 18 Jun 2018 15:05:26 +0000 (08:05 -0700)]
IB/core: add max_send_sge and max_recv_sge attributes

This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths.  For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16.  Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.

Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/rxe: avoid unnecessary NULL check
Zhu Yanjun [Thu, 14 Jun 2018 09:45:42 +0000 (05:45 -0400)]
IB/rxe: avoid unnecessary NULL check

Before goto err2, the variable qp is checked. So it is not necessary
to check qp in label err2.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/rxe: support for 802.1q VLAN on the listener
Vijay Immanuel [Wed, 13 Jun 2018 01:12:05 +0000 (18:12 -0700)]
IB/rxe: support for 802.1q VLAN on the listener

Set the vlan flag and vlan_id field in the wc for rdma_listen()
to work over VLAN. This is required by ib_init_ah_attr_from_wc()
which is called by the CM REQ handler.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/rxe: increase max MR limit
Vijay Immanuel [Thu, 14 Jun 2018 01:48:37 +0000 (18:48 -0700)]
IB/rxe: increase max MR limit

Increase the max MR limit to support more I/O queues
for NVMe over Fabrics hosts.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mad: Use IDR for agent IDs
willy@infradead.org [Wed, 13 Jun 2018 12:34:03 +0000 (05:34 -0700)]
IB/mad: Use IDR for agent IDs

Allocate agent IDs from a global IDR instead of an atomic variable.
This eliminates the possibility of reusing an ID which is already in
use after 4 billion registrations.  We limit the assigned ID to be less
than 2^24 as the mlx4 driver uses the most significant byte of the agent
ID to store the slave number.  Users unlucky enough to see a collision
between agent numbers and slave numbers see messages like:

 mlx4_ib: egress mad has non-null tid msb:1 class:4 slave:0

and the MAD layer stops working.

We look up the agent under protection of the RCU lock, which means we
have to free the agent using kfree_rcu, and only increment the reference
counter if it is not 0.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reported-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Tested-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>